FRAME: Real-World AI Measurement and Evaluation
Building the Next Generation of AI Evaluation
The Forum for Real-World AI Measurement and Evaluation (FRAME) is a global initiative, anchored at Virginia State University's Center for Responsible AI, dedicated to equipping stakeholders outside the AI development stack with evidence to make sound decisions about their AI deployments.
Why FRAME
Across sectors, leaders are under pressure to ensure that AI systems deliver value without creating new risks. But the current evaluation ecosystem offers little visibility into how these systems perform under real-world conditions. Existing evaluations tend to focus on abstract model capabilities rather than operational behavior, leaving decision-makers without the evidence they need to guide deployment, oversight, or investment.
FRAME was established to produce that evidence.
What FRAME Does
FRAME formalizes real-world AI evaluation methods and translates evaluation outcomes into decision-ready evidence. To do this, FRAME combines large‑scale trials of AI systems with structured observation of how people actually use them, what outcomes the systems generate, and how those outcomes arise in context. By tracing the path from an AI system’s output through its practical use and downstream consequences, FRAME refines evaluation methodology and generates evidence that helps organizations compare deployments, understand higher‑order effects, and manage AI as an ongoing part of institutional life.
Mission
FRAME produces evidence about how AI is used, the outcomes it generates, and how those outcomes arise — to enable sound decision-making about real-world deployments.
Vision
A world where our choices about AI are grounded in what it actually does in practice.
Objectives
- Advance methods to evaluate AI-in-use under real-world conditions.
- Translate information about AI deployments into actionable metrics that anyone can understand.
- Foster an evaluation community beyond the AI development pipeline.
To make this work scalable and reusable, FRAME establishes centralized infrastructure that captures “user entropy” at scale and produces comparable indicators across sites:
- Testing Sandbox – A controlled but realistic environment that uses large‑scale remote participant panels to evaluate AI systems under task‑driven scenarios. Panelists act as reporters of their own experience, documenting how they leverage, repurpose, or abandon tools and where friction, workarounds, or risks appear in everyday use. The sandbox maintains strict human‑subjects protections and relies on carefully designed proxy tasks to measure high‑stakes risks without exposing participants to harm or sensitive content.
- Metrics Hub – A translation layer that converts sandbox traces into indicators of how a system performs with real users in real contexts: utility, friction, resilience, access, and impact. These indicators sit alongside existing capability, safety, and compliance metrics, adding a deployment‑focused layer that helps leaders interpret what benchmark scores and safety tests mean for actual use over time.
Who Is Involved
FRAME's members form a global, interdisciplinary coalition spanning measurement science, machine learning, social science, and the humanities across academia, industry, government, and civil society.
FRAME is anchored at Virginia State University's Center for Responsible AI, which serves as the institutional home for sponsored activities. Operational management and strategic coordination are carried out in collaboration with Civitaas Insights. The initiative is structured to safeguard independence while providing stable governance and conflict-of-interest protections.
Membership
FRAME membership is based on active contribution, not affiliation alone. Members are expected to participate in and help build the real-world AI evaluation ecosystem — through working groups, evaluation activities, committee work, or other Forum initiatives. Membership is granted through an open call process and renewed annually. For information about membership, contact FRAME@vsu.edu.
How Organizations Work with FRAME
Organizations and communities can engage with FRAME to access empirical evidence grounded in settings like their own. Partners can support evaluations tailored to their sector, collaborate on the design of studies that reflect the populations and contexts that matter most to them, and draw on FRAME's shared infrastructure to inform their own deployment decisions while maintaining appropriate safeguards for sensitive, proprietary, or confidential information.
FRAME's methods complement existing capability benchmarks, safety pipelines, and adversarial testing by adding deployment-focused evidence about what AI systems actually do in the contexts where they are used.
Governance and Leadership
Day-to-day leadership is shared across three roles, which collectively oversee the Testing Sandbox, Metrics Hub, and member activities. Together, these roles ensure that all evaluations meet FRAME's scientific, ethical, and independence standards and remain aligned with its public-interest mission.
Gabriella Waters, PhD — Institutional Sponsoring Director
Reva Schwartz — FRAME Director
Maurice B. Jones — Operations Director