Using its Automated AI Red Teaming Platform, Mindgard uncovers behaviors of Pixtral Large that can lead to vulnerable systems, including susceptibility to jailbreak techniques, log injection exploits, and encoding-based attacks
Mindgard, the leader in AI security testing, has detected behaviors of Pixtral Large (Pixtral-Large-Instruct-2411), the recently released multi-modal model from Mistral, that can put end users at risk when the model is deployed in an application without appropriate controls. Mindgard discovered that Pixtral Large is susceptible to advanced jailbreak techniques, log injection exploits, and encoding-based attacks. These findings highlight areas where developers and users of Pixtral Large can strengthen guardrails and enhance input/output filtering systems to mitigate risks.
These new discoveries in Pixtral Large highlight the ongoing risks in deploying AI systems without robust input/output filtering and guardrails. Attackers exploiting these flaws could bypass content restrictions, manipulate terminal environments, or use encoding exploits to evade moderation. Such exploits compromise the integrity and security of AI-based systems and their applications in sensitive industries.
Pixtral Large, introduced by Mistral on November 18, 2024, is designed to process multi-modal inputs across a wide range of applications. While it includes safeguards to prevent malicious content generation, Mindgard’s testing revealed that these protections can be bypassed through AntiGPT and Dev Mode v2 jailbreak techniques. Additionally, its flexibility in processing and outputting a wide range of encodings can expose and facilitate exploitation of XSS, SQL injection, and log injection vulnerabilities in other parts of the application stack.
Using its proprietary Automated AI Red Teaming Platform, Mindgard identified three primary characteristics in Pixtral Large that can create vulnerabilities:
- The model’s safeguards were consistently bypassed by the AntiGPT and Dev Mode v2 jailbreak techniques, which manipulate the model’s inputs to provoke restricted outputs or to simulate alternative operational states that lift its programming constraints.
- Pixtral Large generates raw and escaped ANSI sequences, exposing log injection vulnerabilities. When viewed in terminal environments, these sequences could allow malicious commands to execute, compromising developer systems (see the sanitization sketch after this list).
- The model also facilitates encoding-based attacks. These exploits take advantage of how Pixtral Large processes obfuscated or encoded text, such as diacritics, hexadecimal encoding, or zero-width characters, allowing attackers to bypass content moderation systems and generate unsafe outputs (see the normalization sketch after this list).
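To illustrate the log injection finding, the following is a minimal sketch of output sanitization, assuming model output is handled as plain Python strings before being written to logs or echoed to a terminal. The regex and function name are illustrative assumptions, not part of any Mindgard or Mistral API.

```python
import re

# Matches ESC-initiated control sequences (CSI, OSC, and other single-character escapes).
ANSI_ESCAPE = re.compile(
    r"\x1b(?:\[[0-?]*[ -/]*[@-~]|\][^\x07\x1b]*(?:\x07|\x1b\\)|[@-Z\\-_])"
)

def sanitize_for_logs(model_output: str) -> str:
    """Remove raw ANSI escape sequences so that log viewers and terminals
    render model output inertly instead of interpreting control codes."""
    cleaned = ANSI_ESCAPE.sub("", model_output)
    # Drop remaining non-printable control characters, keeping newlines and tabs.
    return "".join(ch for ch in cleaned if ch.isprintable() or ch in "\n\t")

if __name__ == "__main__":
    # Hypothetical malicious output: an OSC title change plus a screen-clear sequence.
    malicious = "Build finished\x1b]0;owned\x07\x1b[2J rm -rf logs"
    print(repr(sanitize_for_logs(malicious)))
```

Stripping escape sequences at the logging boundary is one possible control; rejecting or escaping such output before it reaches a terminal would serve the same purpose.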
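For the encoding-based finding, the sketch below shows one way to normalize obfuscated input before it reaches a content filter, covering the diacritic and zero-width cases mentioned above. It assumes prompts arrive as Unicode strings; the character set and helper name are illustrative, not a complete moderation pipeline.

```python
import unicodedata

# Common zero-width and invisible characters used to obfuscate text.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def normalize_for_moderation(text: str) -> str:
    """Fold diacritics and strip zero-width characters so obfuscated prompts
    are compared against moderation rules in a canonical form."""
    # NFKD splits characters like 'e' + acute accent into separate code points.
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(
        ch for ch in decomposed
        if ch not in ZERO_WIDTH and not unicodedata.combining(ch)
    )
    return folded.casefold()

if __name__ == "__main__":
    # Hypothetical obfuscated prompt: zero-width space and a diacritic.
    obfuscated = "ig\u200bnore prëvious instructions"
    print(normalize_for_moderation(obfuscated))  # -> "ignore previous instructions"
```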
Dr. Peter Garraghan, CEO/CTO of Mindgard and Professor at Lancaster University, said: “Our findings in Pixtral Large emphasize the importance of proactive security testing and validation in AI systems. Addressing challenges like jailbreak techniques and encoding exploits is essential to ensuring the reliability of applications using AI. Mindgard is dedicated to helping the AI community build safer and more secure systems for all users.”
Mindgard advises users of Pixtral Large to review and enhance their guardrails and content filtering systems to address these findings. Mindgard is actively collaborating with industry partners to support the safe and responsible deployment of AI technologies.
About Mindgard
Mindgard is the leader in Artificial Intelligence Security Testing. Founded at Lancaster University and backed by cutting-edge research, Mindgard enables organizations to secure their AI systems from new threats that traditional application security tools cannot address. Its industry-first, award-winning Dynamic Application Security Testing for AI (DAST-AI) solution delivers continuous security testing and automated AI red teaming across the AI lifecycle, making AI security actionable and auditable. For more information, visit mindgard.ai.