- Microsoft created an AI red team back in 2018 as it foresaw the rise of AI
- A red team represents the enemy; and adopts the adversarial persona.
- Latest whitepaper from the team hopes to address common vulnerabilities in AI systems and LLMs
Over the past seven years, Microsoft has been addressing the risks in artificial intelligence systems through its dedicated AI ‘red team’.
Established to foresee and counter the growing challenges posed by advanced AI systems, this team adopts the role of threat actors, ultimately aiming to identify vulnerabilities before they can be exploited in the real world.
Now, after years of work, Microsoft has released a whitepaper from the team, showcasing some of its most important findings from its work.
Microsoft’s red team whitepaper findings
Over the years, the focus of Microsoft’s red teaming has expanded beyond traditional vulnerabilities to tackle novel risks unique to AI, working across Microsoft’s own Copilot as well as open-source AI models.
The whitepaper emphasizes the importance of combining human expertise with automation to detect and mitigate risks effectively.
One major lesson learned is the integration of generative AI into modern applications has not only expanded the cyberattack surface, but also brought unique challenges.
Techniques such as prompt injections exploit models’ inability to differentiate between system-level instructions and user inputs, enabling attackers to manipulate outcomes.
Meanwhile, traditional risks, such as outdated software dependencies or improper security engineering, remain significant, and Microsoft deem human expertise indispensable in countering them.
The team found an effective understanding of the risks surrounding automation often requires subject matter experts who can evaluate content in specialized fields such as medicine or cybersecurity.
Furthermore, it highlighted cultural competence and emotional intelligence as vital cybersecurity skills.
Microsoft also stressed the need for continuous testing, updated practices, and “break-fix” cycles, a process of identifying vulnerabilities and implementing fixes on top of additional testing.