Core Concepts

Understanding these core concepts will help you get the most out of LayerCode Gym.

Architecture Overview

LayerCode Gym sits between your test code and the Layercode platform, simulating real voice clients.

┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│  Your Test   │────▶│  LayerCode Gym  │────▶│ Your Backend │
│    Code      │     │     Client      │     │    Server    │
└──────────────┘     └─────────────────┘     └──────────────┘
                              │                       │
                              │                       ▼
                              │                ┌──────────────┐
                              └───────────────▶│  Layercode   │
                                               │   Platform   │
                                               └──────────────┘

Authorization Flow

  1. Test code creates a LayercodeClient with a UserSimulator
  2. Client requests authorization from YOUR backend server (SERVER_URL)
  3. Backend returns a client_session_key from Layercode
  4. Client connects to Layercode WebSocket using that key
  5. Conversation proceeds with the UserSimulator providing responses

The client never hits Layercode's API directly - it always goes through your backend first.
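To make steps 2-3 concrete, here is a rough sketch of the server-side request your backend forwards to Layercode. The endpoint URL, payload shape, and header names below are placeholders, not the documented Layercode API; consult the Layercode docs for the real authorization call:

```python
# Rough sketch of the backend's authorization hand-off (steps 2-3 above).
# URL, payload, and header names are placeholders, not the real API.
import json
import urllib.request

LAYERCODE_AUTHORIZE_URL = "https://api.layercode.example/authorize"  # placeholder

def build_authorize_request(api_key: str, agent_id: str) -> urllib.request.Request:
    """Build the server-side request that exchanges your secret API key
    for a client_session_key. The test client only ever sees the key
    your backend returns, never the API key itself."""
    body = json.dumps({"agent_id": agent_id}).encode()
    return urllib.request.Request(
        LAYERCODE_AUTHORIZE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Keeping the API key server-side is the point of the indirection: the gym client can be handed freely to test code without leaking credentials.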

User Simulators

The UserSimulator is the heart of LayerCode Gym. It generates user responses during conversations.

Three Factory Methods

1. from_text() - Fixed Messages

Best for regression testing and fast iteration:

from layercode_gym import UserSimulator

simulator = UserSimulator.from_text(
    messages=[
        "Hello! I'm interested in your services.",
        "Tell me more about pricing.",
        "Thank you, goodbye."
    ],
    send_as_text=True  # Fast mode - no TTS needed
)

When to use:

  • Regression testing with known scenarios
  • Quick debugging of conversation flow
  • Testing specific edge cases
  • CI/CD pipelines (fastest execution)

2. from_files() - Pre-recorded Audio

Best for testing transcription and audio handling:

from pathlib import Path

simulator = UserSimulator.from_files(
    files=[
        Path("audio/intro.wav"),
        Path("audio/question.wav"),
        Path("audio/goodbye.wav")
    ]
)

When to use:

  • Testing transcription accuracy
  • Stress testing with various audio qualities
  • Testing different accents and speaking styles
  • Testing background noise handling

3. from_agent() - AI-Driven Personas

Best for realistic, dynamic conversations:

from layercode_gym import Persona

persona = Persona(
    background_context="You are a 35-year-old small business owner",
    intent="You want to understand pricing and features"
)

simulator = UserSimulator.from_agent(
    persona=persona,
    model="openai:gpt-4o-mini",  # or "anthropic:claude-3-5-sonnet"
    max_turns=5,
    send_as_text=False  # Auto-creates TTS engine
)

When to use:

  • Simulating realistic user behavior
  • Testing conversation flow and context handling
  • Exploratory testing of edge cases
  • Evaluating agent personality and tone

TTS Auto-Creation

When using send_as_text=False, LayerCode Gym automatically creates an OpenAI TTS engine:

# This works out of the box
simulator = UserSimulator.from_text(
    messages=["Hello!"],
    send_as_text=False  # TTS engine auto-created
)

Configure via environment variables:

export OPENAI_TTS_MODEL="gpt-4o-mini-tts"
export OPENAI_TTS_VOICE="coral"  # alloy, echo, fable, onyx, nova, shimmer, coral
export OPENAI_TTS_INSTRUCTIONS="Speak slowly and clearly"

Or pass custom settings:

from layercode_gym import Settings

settings = Settings(
    tts_model="gpt-4o-mini-tts",
    tts_voice="alloy",
    tts_instructions="Speak with a British accent"
)

simulator = UserSimulator.from_text(
    messages=["Hello!"],
    send_as_text=False,
    settings=settings
)

Custom Simulators

For full control, implement the UserSimulatorProtocol:

from layercode_gym.simulator import (
    UserSimulatorProtocol,
    UserRequest,
    UserResponse
)

class MyCustomSimulator(UserSimulatorProtocol):
    async def get_response(
        self,
        request: UserRequest
    ) -> UserResponse | None:
        # Your custom logic here
        # - request.agent_transcript: list of agent messages so far
        # - request.turn_number: current turn
        # - request.conversation_id: unique ID

        if request.turn_number > 5:
            return None  # End conversation

        return UserResponse(
            text="Custom response based on context",
            audio_path=None,  # or Path to audio file
            data=()  # optional additional data
        )

Callbacks

Callbacks allow you to hook into the conversation lifecycle for monitoring, evaluation, and custom logic.

Turn Callbacks

Called after each conversation turn:

from layercode_gym import LayercodeClient
from layercode_gym.callbacks import TurnCallback

async def my_turn_callback(
    turn_number: int,
    user_message: str,
    agent_message: str,
    conversation_id: str
) -> None:
    print(f"Turn {turn_number}:")
    print(f"  User: {user_message}")
    print(f"  Agent: {agent_message}")

client = LayercodeClient(
    simulator=simulator,
    turn_callback=my_turn_callback
)

Conversation Callbacks

Called once at the end of the conversation:

from layercode_gym import LayercodeClient
from layercode_gym.callbacks import ConversationCallback
from layercode_gym.models import ConversationLog

async def my_conversation_callback(
    conversation_log: ConversationLog
) -> None:
    stats = conversation_log.stats
    print("Conversation complete!")
    print(f"  Total turns: {stats['total_turns']}")
    print(f"  Duration: {stats['duration_seconds']}s")
    print(f"  Avg latency: {stats['avg_latency_ms']}ms")

client = LayercodeClient(
    simulator=simulator,
    conversation_callback=my_conversation_callback
)

LLM-as-Judge

Built-in callback for automated quality evaluation:

from layercode_gym import LayercodeClient
from layercode_gym.callbacks import create_judge_callback

judge = create_judge_callback(
    criteria=[
        "Did the agent answer all user questions?",
        "Was the agent polite and professional?",
        "Did the conversation flow naturally?"
    ],
    model="openai:gpt-4o"  # or "anthropic:claude-3-5-sonnet"
)

client = LayercodeClient(
    simulator=simulator,
    turn_callback=judge
)

The judge results are saved to conversations/<id>/judge_results.json:

{
  "overall_score": 8.5,
  "criteria_scores": {
    "Did the agent answer all user questions?": 9,
    "Was the agent polite and professional?": 10,
    "Did the conversation flow naturally?": 7
  },
  "feedback": "The agent was helpful and polite..."
}
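A common use of this file is gating CI runs on a score threshold. A minimal sketch, assuming the judge_results.json shape shown above (the threshold values are arbitrary):

```python
# Gate a CI run on judge output. Field names follow the
# judge_results.json example above; thresholds are arbitrary.
def passes_quality_bar(results: dict,
                       min_overall: float = 7.0,
                       min_criterion: int = 6) -> bool:
    """Fail if the overall score or any single criterion falls below a floor."""
    if results["overall_score"] < min_overall:
        return False
    return all(s >= min_criterion for s in results["criteria_scores"].values())

# Example, using the sample values above:
sample = {
    "overall_score": 8.5,
    "criteria_scores": {
        "Did the agent answer all user questions?": 9,
        "Was the agent polite and professional?": 10,
        "Did the conversation flow naturally?": 7,
    },
}
print(passes_quality_bar(sample))  # True
```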

Conversation Outputs

After each conversation, LayerCode Gym creates a structured output directory:

conversations/<conversation_id>/
├── transcript.json          # Full conversation log with stats
├── conversation_mix.wav     # Combined audio (user + assistant)
├── user_0.wav               # Individual user audio files
├── user_1.wav
├── assistant_0.wav          # Individual assistant audio files
├── assistant_1.wav
└── judge_results.json       # If using judge callback

Transcript Structure

{
  "conversation_id": "conv_abc123",
  "agent_id": "your_agent_id",
  "started_at": "2025-01-15T10:30:00Z",
  "ended_at": "2025-01-15T10:32:15Z",
  "turns": [
    {
      "turn_number": 0,
      "user_message": {
        "text": "Hello!",
        "timestamp": "2025-01-15T10:30:00Z",
        "audio_path": "user_0.wav"
      },
      "agent_message": {
        "text": "Hi there! How can I help you?",
        "timestamp": "2025-01-15T10:30:01.234Z",
        "audio_path": "assistant_0.wav",
        "ttfab_ms": 234
      }
    }
  ],
  "stats": {
    "total_turns": 3,
    "duration_seconds": 135.5,
    "avg_latency_ms": 245,
    "avg_ttfab_ms": 234,
    "total_user_words": 45,
    "total_agent_words": 123
  }
}
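The saved transcript is plain JSON, so post-processing it is straightforward. A small sketch that recomputes a couple of stats, assuming the structure shown above:

```python
# Recompute stats from a saved transcript. Field names follow the
# transcript structure shown above.
import json
from pathlib import Path

def summarize(log: dict) -> dict:
    turns = log["turns"]
    ttfabs = [t["agent_message"]["ttfab_ms"] for t in turns]
    return {
        "total_turns": len(turns),
        "avg_ttfab_ms": sum(ttfabs) / len(ttfabs),
    }

def summarize_file(path: Path) -> dict:
    return summarize(json.loads(path.read_text()))

sample = {
    "turns": [
        {"agent_message": {"ttfab_ms": 234}},
        {"agent_message": {"ttfab_ms": 266}},
    ]
}
print(summarize(sample))  # {'total_turns': 2, 'avg_ttfab_ms': 250.0}
```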

Key Metrics

  • TTFAB (Time To First Audio Byte): How long until the agent starts speaking
  • Latency: Total time from user message to agent response
  • Turn count: Number of back-and-forth exchanges
  • Duration: Total conversation length

Settings and Configuration

All configuration is managed through the Settings class:

from layercode_gym import LayercodeClient, Settings

settings = Settings(
    # Required
    server_url="http://localhost:8001",
    agent_id="your_agent_id",

    # TTS Configuration
    tts_model="gpt-4o-mini-tts",
    tts_voice="coral",
    tts_instructions="Speak clearly",

    # Audio Chunking
    chunk_ms=100,
    chunk_interval=0.0,

    # Storage
    output_root="./conversations"
)

# Use in client
client = LayercodeClient(
    simulator=simulator,
    settings=settings
)

Settings can also be loaded from environment variables (see Getting Started).
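For example, the two required fields might be set like this. The exact variable names are an assumption here (SERVER_URL appears in the authorization flow above); confirm them in Getting Started:

```shell
# Assumed variable names; confirm the exact ones in Getting Started.
export SERVER_URL="http://localhost:8001"
export AGENT_ID="your_agent_id"
```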

Next Steps