Microsoft is launching a range of new tools for AI developers running their software on Azure. This includes a tool to promptly detect injection attacks, perform risk monitoring and recognize hallucinations in output.
The tools will be available for Azure AI Studio, Microsoft writes. Developers who run their AI software in Azure AI can use them. The company says there is increasing demand for tools that help “balance innovation and risk management.” The new tools for Azure AI Studio should help with this.
Microsoft releases five such tools. Prompt Shields is a mechanism to detect and stop jailbreaks and other forms of prompt injection attacks. Many users of AI systems try to 'hack' them by pushing the limits of what those tools allow. This can be done, for example, through jailbreaks such as DAN for ChatGPT, but also by seeing how generative AI responds to certain code or by uploading documents. Prompt Shield looks at commonly used methods and common phrases with which this is done, or at methods that are often used in jailbreaking. These methods are then easier to block.
Azure AI Studio also includes detection of hallucinations in the output of generative AI. This is especially dangerous in fields where data, including output, is best kept confidential. When models are hallucinated, such a model can leak data that it is not actually supposed to leak. 'Groundedness Detection' prevents that. There will also be a Risk & Safety Monitoring tool, a kind of platform on which administrators can see the effect of content filters at a glance.
Finally, two tools will be available with which the output of generative AI can be made safer. The Safety Messages System can automatically impose safety conditions on output. Safety Evaluations is intended to detect an application vulnerability to jailbreak attacks, but also to assess content risks.
Not all tools are immediately available. Prompt Shields, Safety Evaluations and Risk & Safety Monitoring is available in preview in Azure AI and Azure OpenAI, but Groundedness Detection and Safety Messages System are coming 'in the future'.