Announcing User Simulation in ADK Evaluation

Agents are inherently conversational. Users may need to ask follow-up questions, refine earlier requests, and supply extra information as needed. However, manually scripting tests for your agent for such multi-turn conversations is a brittle and time-consuming process. You write dozens of user_input and expected_output pairs, only for them to break with the slightest change in your agent’s behavior, turning test maintenance into a frustrating chore.

Today, we’re excited to announce a new feature in the Agent Development Kit (ADK) that helps address this problem: User Simulation. This new feature allows you to move away from testing a rigid implementation path and instead evaluate your agent’s ability to actually achieve a user’s intent.

What is the User Simulator?

At its core, the User Simulator is an LLM-powered user prompt generator. This first release is integrated directly into the ADK evaluation framework. You provide it with a high-level goal, and it dynamically generates the user side of a conversation to pursue that goal. It isn’t a separate service; it’s a tool within the ADK that you run locally, allowing for a fast, iterative “inner loop” workflow.

How It Works

1. Defining a Conversation Scenario

Instead of a rigid turn-by-turn script, you provide a ConversationScenario. This is a simple JSON object with two key parts:

starting_prompt: A fixed, initial prompt to begin the conversation.
conversation_plan: A natural language guideline that tells the simulator its objective.

Here’s an example evaluation set for an agent with tools to roll dice and check for prime numbers:

{
  "scenarios": [
    {
      "starting_prompt": "What can you do for me?",
      "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime."
    },
    {
      "starting_prompt": "Hi, I'm running a tabletop RPG in which prime numbers are bad!",
      "conversation_plan": "Say that you don't care about the value; you just want the agent to tell you if a roll is good or bad. Once the agent agrees, ask it to roll a d6. Finally, ask the agent to do the same with 2 d20."
    }
  ]
}
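
Because a scenario file is plain JSON, it can be sanity-checked before an evaluation run. The following is a minimal sketch using only the Python standard library, not part of the ADK: the helper name validate_scenarios is hypothetical, and it assumes the top-level key is "scenarios" and that each entry carries the two fields described above.

```python
import json

# The two fields each scenario is expected to carry, per the description above.
REQUIRED_KEYS = ("starting_prompt", "conversation_plan")


def validate_scenarios(document: dict) -> list[str]:
    """Return a list of problems found; an empty list means the file looks well-formed."""
    scenarios = document.get("scenarios")
    if not isinstance(scenarios, list) or not scenarios:
        return ['top-level "scenarios" must be a non-empty list']
    problems = []
    for index, scenario in enumerate(scenarios):
        if not isinstance(scenario, dict):
            problems.append(f"scenario {index}: must be a JSON object")
            continue
        for key in REQUIRED_KEYS:
            value = scenario.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"scenario {index}: {key!r} must be a non-empty string")
    return problems


example = json.loads("""
{
  "scenarios": [
    {
      "starting_prompt": "What can you do for me?",
      "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime."
    }
  ]
}
""")
print(validate_scenarios(example))  # an empty list: no problems found
```

A check like this only catches structural mistakes such as a missing field; whether a conversation_plan is a good test of your agent remains a judgment call.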


When you run the evaluation, the simulator handles the back-and-forth conversation dynamically until the conversation_plan is fulfilled. Here is an example of what the generated conversation for the first scenario shown above might look like (reformatted for readability):

[USER]: What can you do for me?
[AGENT]: I can roll dice and check if numbers are prime. How can I help?
[USER]: Please roll a 20-sided die for me.
[AGENT]: Of course. The result is 17.
[USER]: Thanks. Can you check if 17 is a prime number?
[AGENT]: Yes, 17 is a prime number.
[USER]: 
--------------------
EVALUATION RESULT: COMPLETED


Notice how the conversation_plan defines a sequence of goals. It doesn’t specify the user’s exact prompts or the agent’s exact expected responses. It only cares about the outcome: getting a dice roll, then getting a prime number check on that result. This makes the test resilient to minor changes in your agent’s conversational style or internal logic.

2. Configuring the Simulation

You have direct control over the simulator’s behavior by providing an EvalConfig file. This allows you to fine-tune the simulation for your specific testing needs.

Here are the key parameters you can configure:

Model: Specify which model backs the user simulator (e.g., gemini-2.5-flash).
Model Configuration: Specify options for the model, such as thinking behavior.
Turn Budget: Set the maximum number of user-agent interactions (max_allowed_invocations) before the conversation is terminated, preventing infinite loops.

Custom Behavior: In addition to the above parameters, you can override the default system prompt to change the simulator’s persona. This allows you to test how your agent handles different types of users, such as a confused user or a more demanding one. We plan to add persona configuration support via the EvalConfig soon.

Here is an example of a configuration file with an evaluation criterion and a configuration for the user simulator:

{
  "criteria": {
    "hallucinations_v1": {
      "threshold": 0.5,
      "evaluate_intermediate_nl_responses": true
    }
  },
  "user_simulator_config": {
    "model": "gemini-2.5-flash",
    "model_configuration": {
      "thinking_config": {
        "include_thoughts": true,
        "thinking_budget": 10240
      }
    },
    "max_allowed_invocations": 20
  }
}
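
To make the turn budget concrete, here is a toy sketch of a simulated conversation loop. It is a conceptual illustration only, not the ADK’s implementation: the real simulator is LLM-powered, whereas this one replays a scripted list of follow-ups. Every name except max_allowed_invocations (run_simulated_conversation, scripted_user, echo_agent) is hypothetical.

```python
from typing import Callable, Optional

Transcript = list[tuple[str, str]]  # (user prompt, agent reply) pairs


def run_simulated_conversation(
    starting_prompt: str,
    next_user_turn: Callable[[Transcript], Optional[str]],
    agent: Callable[[str], str],
    max_allowed_invocations: int,
) -> tuple[Transcript, bool]:
    """Alternate user and agent turns until the user side stops or the budget runs out.

    Returns the transcript and whether the user side finished within the budget.
    """
    transcript: Transcript = []
    prompt = starting_prompt
    for _ in range(max_allowed_invocations):
        transcript.append((prompt, agent(prompt)))
        next_prompt = next_user_turn(transcript)
        if next_prompt is None:  # the user side has nothing left to ask
            return transcript, True
        prompt = next_prompt
    return transcript, False  # budget exhausted before the plan was finished


def scripted_user(follow_ups: list[str]) -> Callable[[Transcript], Optional[str]]:
    """Stand-in for the simulator: replays fixed follow-ups, then signals completion."""
    remaining = list(follow_ups)
    return lambda transcript: remaining.pop(0) if remaining else None


def echo_agent(prompt: str) -> str:
    """Stand-in for the agent under test."""
    return f"(agent reply to: {prompt})"


follow_ups = ["Please roll a 20-sided die for me.", "Can you check if that number is prime?"]
transcript, completed = run_simulated_conversation(
    "What can you do for me?", scripted_user(follow_ups), echo_agent, max_allowed_invocations=20
)
print(len(transcript), completed)  # 3 True

_, completed_small_budget = run_simulated_conversation(
    "What can you do for me?", scripted_user(follow_ups), echo_agent, max_allowed_invocations=2
)
print(completed_small_budget)  # False: the budget cut the conversation short
```

The second call shows why the budget matters: a conversation that never reaches its goal is terminated rather than running indefinitely.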


3. Running the Evaluation

With your scenarios and configuration in place, you can run the evaluation to get a detailed breakdown of the interaction.

*********************************************************************
Eval Run Summary
eval_set_with_scenarios:
  Tests passed: 2
  Tests failed: 0
********************************************************************
Eval Set Id: eval_set_with_scenarios
Eval Id: 83903cba
Overall Eval Status: PASSED
---------------------------------------------------------------------
Metric: hallucinations_v1, Status: PASSED, Score: 1.0, Threshold: 0.5
---------------------------------------------------------------------
Invocation Details:
+----+----------------------+------------------------------------------+-------------------+
|    | prompt               | actual_response                          | hallucinations_v1 |
+====+======================+==========================================+===================+
|  0 | What can you do for  | I can roll dice of different sizes and   | Status: PASSED,   |
|    | me?                  | check if numbers are prime. For example, | Score: 1.0        |
|    |                      | I can roll an 8-sided die and tell you   |                   |
|    |                      | the result. Also, I can check if a       |                   |
|    |                      | number (or a list of numbers) is prime.  |                   |
+----+----------------------+------------------------------------------+-------------------+
|  1 | Please roll a        | I rolled a 20-sided die, and the result  | Status: PASSED,   |
|    | 20-sided die for me. | is 16.                                   | Score: 1.0        |
+----+----------------------+------------------------------------------+-------------------+
|  2 | Can you check if 16  | No, 16 is not a prime number.            | Status: PASSED,   |
|    | is a prime number?   |                                          | Score: 1.0        |
+----+----------------------+------------------------------------------+-------------------+

********************************************************************
Eval Set Id: eval_set_with_scenarios
Eval Id: b7944d3f
Overall Eval Status: PASSED
---------------------------------------------------------------------
Metric: hallucinations_v1, Status: PASSED, Score: 0.875, Threshold: 0.5
---------------------------------------------------------------------
Invocation Details:
+----+---------------------------+------------------------------------------+-------------------+
|    | prompt                    | actual_response                          | hallucinations_v1 |
+====+===========================+==========================================+===================+
|  0 | Hi, I'm running a         | Understood! I can roll dice and check if | Status: PASSED,   |
|    | tabletop RPG in which     | the results are prime numbers for you.   | Score: 1.0        |
|    | prime numbers are bad!    | Let me know when you need me to roll a   |                   |
|    |                           | die and check for primes.                |                   |
+----+---------------------------+------------------------------------------+-------------------+
|  1 | I don't really care about | Alright, I can roll dice for you and     | Status: PASSED,   |
|    | the exact number; I just  | tell you whether the result is a prime   | Score: 1.0        |
|    | need you to tell me if    | number (bad) or not (good). Just tell me |                   |
|    | the roll is good or bad.  | how many sides the die should have.      |                   |
+----+---------------------------+------------------------------------------+-------------------+
|  2 | Great, please roll a d6.  | The result is 6, which is not a prime    | Status: PASSED,   |
|    |                           | number. That's a good roll!              | Score: 1.0        |
+----+---------------------------+------------------------------------------+-------------------+
|  3 | Okay, now please roll 2   | Okay, so 19 is prime (bad) and 6 is not  | Status: PASSED,   |
|    | d20.                      | prime (good). One good, one bad.         | Score: 1.0        |
+----+---------------------------+------------------------------------------+-------------------+
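
If you want to track scores across runs, the per-case metric lines in a report like this can be pulled out with a regular expression. This is a rough sketch rather than an ADK feature: the helper parse_metric_lines is hypothetical, and it assumes metric lines of the form "Metric: <name>, Status: <status>, Score: <score>, Threshold: <threshold>". Treating a case as passing when its score is at least the threshold is also an assumption, although it is consistent with both runs shown above.

```python
import re

# Matches one per-case metric line from the printed report.
METRIC_LINE = re.compile(
    r"Metric: (?P<name>\w+), Status: (?P<status>\w+), "
    r"Score: (?P<score>[\d.]+), Threshold: (?P<threshold>[\d.]+)"
)


def parse_metric_lines(report: str) -> list[dict]:
    """Extract name, status, score, and threshold from every metric line in the report."""
    return [
        {
            "name": match["name"],
            "status": match["status"],
            "score": float(match["score"]),
            "threshold": float(match["threshold"]),
        }
        for match in METRIC_LINE.finditer(report)
    ]


report = """\
Metric: hallucinations_v1, Status: PASSED, Score: 1.0, Threshold: 0.5
Metric: hallucinations_v1, Status: PASSED, Score: 0.875, Threshold: 0.5
"""
results = parse_metric_lines(report)
for result in results:
    # Margin above the threshold: a shrinking margin can flag drift before a test fails.
    print(result["name"], result["status"], result["score"] - result["threshold"])
```

Scraping terminal output is fragile, so treat this as a stopgap for quick local comparisons.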


What This Means for Developers

This initial release of User Simulation is focused on solving the immediate toil of creating and maintaining multi-turn tests. It helps you:

Dramatically reduce test creation time: Stop writing complex, turn-by-turn scripts and instead define simple, high-level goals.
Build more resilient tests: By focusing on intent over a specific conversational path, your tests won’t break every time you refactor a prompt.
Create a reliable regression suite: Quickly generate a wide range of test cases to build a safety net that catches regressions before they reach production.

We believe that robust, goal-oriented simulation is a fundamental capability for building reliable and trustworthy AI agents. This feature is the foundational first step in our broader vision to deliver a comprehensive set of simulation capabilities for the entire agent lifecycle. On behalf of the core team who brought this feature to life (Ankur Sharma, Keyur Joshi, Pierre Thodoroff, Sebastian Caldas, and Xiaowei Li), we’re excited to see what you build and welcome your feedback as you start using this feature.

Ready to get started? Dive into the ADK documentation and Colab tutorial and start exploring the User Simulation feature today.