Evaluating Agentic AI Systems: A Balanced Framework for Performance, Robustness, Safety and Beyond

Manish Shukla

doi:10.31224/5195


Keywords:

AI, multi-agent systems, Evaluation, Ethical AI, AI Governance

Abstract

Agentic artificial intelligence (AI) systems, multi-agent systems that combine large language models with external tools and autonomous planning, are rapidly moving from research labs into high-stakes domains. Existing evaluations emphasise narrow technical metrics such as task success or latency, leaving important sociotechnical dimensions such as human trust, ethical compliance and economic sustainability under-measured. We propose a balanced evaluation framework spanning five axes (capability & efficiency, robustness & adaptability, safety & ethics, human-centred interaction and economic & sustainability) and introduce novel indicators, including goal-drift scores and harm-reduction indices. Beyond synthesising prior work, we identify gaps in current benchmarks, develop a conceptual diagram to visualise interdependencies, and outline experimental protocols for empirically validating the framework. Case studies from recent industry deployments illustrate that agentic AI can yield productivity gains of 20–60%, yet these deployments often omit assessments of fairness, trust and long-term sustainability. We argue that multidimensional evaluation, combining automated metrics with human-in-the-loop scoring and economic analysis, is essential for the responsible adoption of agentic AI.
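The abstract's core argument, that a single headline number can hide weakness on one dimension, can be illustrated with a minimal Python sketch of a five-axis scorecard. The axis identifiers, the assumption that every axis score is normalised to [0, 1], the equal default weights, and the `AxisReport` class are all hypothetical choices for illustration. The paper's own indicators (such as goal-drift scores and harm-reduction indices) are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical identifiers for the five axes named in the abstract.
AXES = (
    "capability_efficiency",
    "robustness_adaptability",
    "safety_ethics",
    "human_centred_interaction",
    "economic_sustainability",
)


@dataclass
class AxisReport:
    # Assumption: each axis has already been reduced to one score in [0, 1].
    scores: dict[str, float]

    def validate(self) -> None:
        # Reject reports that skip an axis or use out-of-range scores.
        missing = set(AXES) - self.scores.keys()
        if missing:
            raise ValueError(f"missing axes: {sorted(missing)}")
        for axis in AXES:
            value = self.scores[axis]
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{axis} score {value} outside [0, 1]")

    def weighted_mean(self, weights: dict[str, float] | None = None) -> float:
        # A single composite number; equal weights unless told otherwise.
        self.validate()
        weights = weights or {axis: 1.0 for axis in AXES}
        total = sum(weights[axis] for axis in AXES)
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        return sum(weights[axis] * self.scores[axis] for axis in AXES) / total

    def weakest_axis(self) -> tuple[str, float]:
        # Reported alongside the mean so a weak axis is not averaged away.
        self.validate()
        axis = min(AXES, key=lambda a: self.scores[a])
        return axis, self.scores[axis]


report = AxisReport({
    "capability_efficiency": 0.9,
    "robustness_adaptability": 0.8,
    "safety_ethics": 0.3,
    "human_centred_interaction": 0.7,
    "economic_sustainability": 0.8,
})
print(round(report.weighted_mean(), 2))  # 0.7
print(report.weakest_axis())             # ('safety_ethics', 0.3)
```

In this made-up example the composite score of 0.7 looks acceptable, while the weakest-axis report exposes the low safety & ethics score. This is why the sketch surfaces the minimum next to the weighted mean instead of relying on the mean alone.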
