# Testing with AI
TestAgent uses AI to evaluate outputs against criteria or expected values.
```mermaid
graph TB
    subgraph "AI Testing Flow"
        A[Output] --> B[AI Judge]
        C[Criteria] --> B
        D[Expected] --> B
        B --> E[Score]
        E --> F{"Score >= Threshold?"}
        F -->|Yes| G[Pass]
        F -->|No| H[Fail]
    end

    classDef input fill:#6366F1,stroke:#7C90A0,color:#fff
    classDef judge fill:#F59E0B,stroke:#7C90A0,color:#fff
    classDef pass fill:#10B981,stroke:#7C90A0,color:#fff
    classDef fail fill:#8B0000,stroke:#7C90A0,color:#fff

    class A,C,D input
    class B,E,F judge
    class G pass
    class H fail
```
## Core Functions

### test()
The main testing function:
```python
from testagent import test

result = test(
    "The capital of France is Paris",
    criteria="factually correct"
)
```
Parameters:
| Parameter | Type | Description |
|---|---|---|
| `output` | `str` | The output to test |
| `expected` | `str` | Optional expected output |
| `criteria` | `str` | Optional evaluation criteria |
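For example, to judge the output against a reference answer, pass `expected` instead of `criteria`. A sketch based on the parameters above:

```python
from testagent import test

# Judge the output against a reference answer rather than criteria
result = test(
    "The capital of France is Paris",
    expected="Paris is the capital of France",
)
```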
### accuracy()
Compare an output to an expected value:
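A minimal sketch, assuming `accuracy()` takes the output and an `expected` string the same way `test()` does:

```python
from testagent import accuracy

# Assumed signature: mirrors test()'s output/expected parameters
result = accuracy(
    "The capital of France is Paris",
    expected="Paris is the capital of France",
)
```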
### criteria()
Evaluate against custom criteria:
```python
from testagent import criteria

result = criteria(
    "Hello! How can I help?",
    criteria="is a friendly greeting"
)
```
## TestAgent Class
For more control, use the class directly:
```python
from testagent import TestAgent, TestConfig

tester = TestAgent(config=TestConfig(
    model="gpt-4",
    threshold=8.0,
))

result = tester.run("output", criteria="is correct")
```
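With `threshold=8.0`, a result passes only when the judge's 0-10 score reaches 8.0, stricter than the default of 7.0 (see How It Works below).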
## Async Support
```python
from testagent import TestAgent

tester = TestAgent()
result = await tester.run_async("output", criteria="is correct")
```
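`run_async` must be awaited, so outside an existing event loop (for example, in a plain script) wrap the call with `asyncio.run` from the standard library:

```python
import asyncio

from testagent import TestAgent

async def main():
    tester = TestAgent()
    # Await the AI judge's evaluation inside an async context
    return await tester.run_async("output", criteria="is correct")

result = asyncio.run(main())
```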
## How It Works
1. **Input**: Your output plus criteria and/or an expected value
2. **Judge**: The AI judge evaluates the output
3. **Score**: Returns a score from 0 to 10
4. **Result**: Pass if score ≥ threshold (default: 7.0); the sketch below illustrates this final check
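A plain-Python illustration of that final pass/fail decision (not library code; the function name is hypothetical):

```python
def passes(score: float, threshold: float = 7.0) -> bool:
    # The judge scores 0-10; pass when the score meets the threshold
    return score >= threshold

assert passes(8.5) is True    # 8.5 >= 7.0, so it passes
assert passes(6.9) is False   # below the default 7.0 threshold
```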