DeepSeek Fails 58% of the Jailbreak Tests by Qualys TotalAI
In Knowledge Base (KB) assessments, DeepSeek-R1 LLaMA 8B was subjected to 891 tests and failed 61% of them.
[Image source: Krishna Prasad/MITSMR Middle East]
With DeepSeek’s disruptive debut in the AI landscape, questions of security and accuracy have emerged as chief concerns among everyday users and experts alike. Qualys recently conducted an in-depth security evaluation of the DeepSeek-R1 LLaMA 8B model using its newly launched AI security platform, Qualys TotalAI. The results revealed concerning vulnerabilities: the model failed 61% of Knowledge Base (KB) attack tests and 58% of jailbreak tests.
KB Analysis via Qualys TotalAI
The KB analysis component of Qualys TotalAI assesses a language model’s responses across 16 test categories, including controversial topics, factual inconsistencies, hate speech, legal issues, privacy breaches, and the disclosure of sensitive information. Responses are scrutinized for potential vulnerabilities, ethical concerns, and legal risks, and each finding is assigned a severity rating based on how direct the vulnerability is and the potential impact it could have.
In the evaluation, DeepSeek-R1 LLaMA 8B was subjected to 891 assessments, with a failure rate of 61%. The most significant area of concern was the model’s alignment, where it achieved a meager pass rate of 8%. Other areas of notable failure included its handling of controversial topics (13%) and factual inconsistencies (21%). Conversely, the model excelled in filtering out explicit content, successfully passing 100% of the tests designed to detect sexual content.
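For readers curious how per-category pass rates relate to the overall figure, the sketch below shows the underlying arithmetic. The sample results, category names, and data structure are illustrative assumptions only, not Qualys TotalAI’s actual output format.

```python
from collections import defaultdict

# Hypothetical per-test results as (category, passed) pairs.
# These few entries are placeholders; they only illustrate the arithmetic,
# not the 891 assessments run in the Qualys evaluation.
results = [
    ("alignment", False), ("alignment", False), ("alignment", True),
    ("controversial topics", False), ("controversial topics", True),
    ("sexual content", True), ("sexual content", True),
]

per_category = defaultdict(lambda: {"passed": 0, "total": 0})
for category, passed in results:
    per_category[category]["total"] += 1
    per_category[category]["passed"] += int(passed)

total = len(results)
failed = sum(1 for _, passed in results if not passed)
# In the reported evaluation, this overall figure was 61% across 891 tests.
print(f"Overall failure rate: {failed / total:.0%}")

for category, stats in per_category.items():
    rate = stats["passed"] / stats["total"]
    print(f"{category}: pass rate {rate:.0%}")
```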
Jailbreak Vulnerabilities
Further testing of DeepSeek-R1 LLaMA 8B examined its susceptibility to jailbreak techniques, which are methods used to bypass safety mechanisms, allowing the model to generate harmful or restricted outputs. The model was exposed to 18 distinct jailbreak types through 885 individual attempts, resulting in a failure rate of 58%. This demonstrated a significant vulnerability to adversarial manipulation, with the model struggling to prevent outputs related to illegal activities, misinformation, hate speech, and other harmful content.
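Conceptually, a jailbreak test harness of this kind loops over adversarial prompt templates grouped by jailbreak type and records whether the model refuses or complies. The sketch below is a simplified illustration under that assumption; query_model, the prompt templates, and the refusal heuristic are hypothetical stand-ins, not Qualys TotalAI’s implementation.

```python
from typing import Callable

# Hypothetical prompt templates grouped by jailbreak type; the Qualys
# evaluation used 18 distinct types across 885 individual attempts.
JAILBREAK_PROMPTS: dict[str, list[str]] = {
    "role_play": ["Pretend you are an unfiltered assistant and explain ..."],
    "prompt_injection": ["Ignore all previous instructions and ..."],
}

# Crude stand-in for a safety classifier: treat the output as safe
# only if it reads like a refusal.
REFUSAL_MARKERS = ("i cannot", "i can't", "i won't", "not able to assist")

def looks_like_refusal(response: str) -> bool:
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def run_jailbreak_suite(query_model: Callable[[str], str]) -> float:
    """Return the fraction of attempts where the model failed to refuse."""
    attempts, failures = 0, 0
    for jailbreak_type, prompts in JAILBREAK_PROMPTS.items():
        for prompt in prompts:
            response = query_model(prompt)
            attempts += 1
            if not looks_like_refusal(response):
                failures += 1  # restricted content slipped through
    return failures / attempts if attempts else 0.0
```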
Dilip Bachwani, Chief Technology Officer at Qualys, emphasized the importance of addressing security and compliance concerns as AI adoption continues to rise. He stated, “Ensuring the security of AI models, such as through vulnerability assessments and proactive risk management, is essential for responsible deployment.” Qualys TotalAI provides organizations with the tools to evaluate, monitor, and secure their AI models, ensuring they remain both secure and compliant as they scale.