Citadel AI announces the release of Eval Insight, an LLM-based analysis feature that automatically explains the evaluation results of AI systems in clear, straightforward language.
Eval Insight uses Citadel AI’s proprietary LLM-powered evaluation suite to analyze the reliability and safety of customers’ AI systems, producing plain-English summaries of their performance, safety, and security.
As companies embrace AI across more business areas, balancing innovation (“offense”) with protection against security and reputational risks (“defense”) has become crucial. However, manually evaluating, managing, and ensuring the continuous safe operation of AI systems demands significant expertise, time, and resources.
Citadel AI addresses this challenge with Citadel Lens, a unified platform that automates evaluation and monitoring for both generative and predictive AI systems. Citadel Lens has a successful track record across multiple industries, including healthcare, automotive, finance, and insurance.
Citadel Lens automatically generates two types of reports: “Technical Reports” for engineering teams and “Governance Reports” for management and GRC teams. Technical Reports provide in-depth evaluations of AI systems to help developers drive improvements, while Governance Reports support compliance with international standards (such as ISO) and promote best-practice operational workflows.

Eval Insight saves time by automatically summarizing the key points of these reports for busy users. Built on Citadel AI’s proprietary LLM-powered evaluation suite, it interprets Lens reports from both technical and governance perspectives and highlights the information that matters most to each group of stakeholders.
With Eval Insight, users get an instant snapshot of their AI system, enabling targeted, efficient quality improvements even when time and expertise are limited. The feature benefits not only AI engineering teams but also management and GRC teams committed to ensuring AI safety.

Upcoming features for Eval Insight include:
- Automated analysis of generative AI evaluation reports from the perspectives of safety and reliability. It identifies patterns in requests and responses linked to security risks or hallucination issues, and provides plain-English explanations.
- Automatic interpretation of Technical Reports, highlighting crucial quality and reliability issues such as overall performance metrics, robustness, and performance gaps, summarized clearly for engineers.
- Automated analysis of Governance Reports, emphasizing critical AI safety concerns such as compliance, reputational risk, and fairness. It delivers straightforward explanations and cites the relevant regulatory standards and guidelines.
Citadel AI welcomes user feedback to enhance Eval Insight as features continue to roll out.