Meta Unveils Latest Llama Protection Tools to Secure AI Applications

Meta is committed to giving developers the tools they need to build secure AI applications. To that end, the company has introduced a set of Llama Protection tools designed to help developers strengthen security when building with Llama. These tools are available through Meta’s Llama Protections page, Hugging Face, and GitHub.
Key Llama Protection Tools
Llama Guard 4: An updated version of Llama Guard, this tool offers unified safeguards across different modalities, supporting text and image understanding. It is also available on the newly launched Llama API in a limited preview.
LlamaFirewall: A security tool designed to prevent AI system risks such as prompt injection, insecure code, and risky plugin interactions. For more details, developers can refer to the LlamaFirewall research paper.
Llama Prompt Guard 2: An updated version of the Llama Prompt Guard classifier model. Prompt Guard 2 86M improves detection of jailbreak and prompt injection attempts. A smaller version, Prompt Guard 2 22M, is also introduced, providing up to 75% reduction in latency and compute costs with minimal performance trade-offs.
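In practice, a classifier like Prompt Guard 2 is typically deployed as a gate in front of the main model: user input is scored first, and high-risk prompts are blocked before they reach the LLM. The sketch below shows that gating pattern only; the label names, threshold, and score format are illustrative assumptions, not the model's documented interface (check the model card on Hugging Face for the actual one).

```python
def gate_prompt(scores: dict, threshold: float = 0.5) -> bool:
    """Return True if the prompt should be allowed through to the model.

    `scores` maps classifier labels to probabilities, in the shape a
    text-classification pipeline might return. "JAILBREAK" is an assumed
    label name for illustration.
    """
    return scores.get("JAILBREAK", 0.0) < threshold


# A likely jailbreak attempt is blocked; a benign prompt passes.
print(gate_prompt({"JAILBREAK": 0.97, "BENIGN": 0.03}))  # False (blocked)
print(gate_prompt({"JAILBREAK": 0.02, "BENIGN": 0.98}))  # True (allowed)
```

The threshold is where the 22M/86M trade-off shows up: a smaller, cheaper model can run on every request, with the larger one reserved for borderline scores.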
Empowering the AI Defense Community
Meta also recognizes the importance of using AI in security operations to combat cyber threats. In response to demand from the security community, the company is sharing updates on how organizations can evaluate the efficacy of AI systems in defense operations.
Meta is also launching the Llama Defenders Program for select partners to support their efforts in improving software system robustness using AI.
New Security Benchmarks and Tools:
CyberSecEval 4: An updated suite of open-source cybersecurity benchmarks, including CyberSOC Eval and AutoPatchBench, to assess AI defense capabilities.
CyberSOC Eval: Developed with CrowdStrike, this framework evaluates AI’s effectiveness in security operation centers. It will be released soon.
AutoPatchBench: A benchmark to assess how well AI systems like Llama can automatically patch vulnerabilities in native code before exploitation.
Automated Sensitive Doc Classification Tool: A tool that automatically applies security classification labels to internal documents to prevent unauthorized access or distribution. Those labels can also be used to keep sensitive documents out of the data that AI systems ingest. Available on GitHub.
Llama Generated Audio Detector & Llama Audio Watermark Detector: These tools help detect AI-generated audio content, enabling organizations to combat scams, fraud, and phishing attempts. Zendesk, Bell Canada, and AT&T are already integrating these tools into their systems.
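The document classification tool described above implies a simple ingestion gate: before a document enters an AI pipeline, its security label is checked against an allow-list. A minimal sketch of that idea, with label names that are purely illustrative (the actual tool on GitHub defines its own labels):

```python
def filter_for_ai_ingestion(docs, allowed=frozenset({"public", "internal"})):
    """Keep only documents whose security label permits AI ingestion.

    Unlabeled documents are excluded by default, which is the safer
    failure mode for this kind of gate.
    """
    return [d for d in docs if d.get("label") in allowed]


corpus = [
    {"id": 1, "label": "public"},
    {"id": 2, "label": "confidential"},
    {"id": 3, "label": "internal"},
    {"id": 4},  # unlabeled: dropped
]
print([d["id"] for d in filter_for_ai_ingestion(corpus)])  # [1, 3]
```

Defaulting to exclusion for unlabeled documents is the key design choice: classification gaps then fail closed rather than leaking sensitive content into an AI system.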
Private Processing for Enhanced AI Privacy
Meta is also introducing Private Processing, a new technology that allows WhatsApp users to leverage AI for tasks like summarizing unread messages while maintaining privacy. The technology ensures that Meta and WhatsApp cannot access users' messages. Meta is collaborating with the security community to audit and improve this architecture, working toward a secure and private AI experience for users.
Building with Security in Mind
Meta’s approach to Private Processing includes a threat model that identifies and defends against potential attack vectors. The company plans to continue developing and strengthening the technology in collaboration with researchers, ensuring robust security before its official product launch.