Core Concepts
Understanding these core concepts will help you get the most out of LayerCode Gym.
Architecture Overview
LayerCode Gym sits between your test code and the Layercode platform, simulating real voice clients.
┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│  Your Test   │────▶│  LayerCode Gym  │────▶│ Your Backend │
│     Code     │     │     Client      │     │    Server    │
└──────────────┘     └─────────────────┘     └──────────────┘
                              │                      │
                              │                      ▼
                              │              ┌──────────────┐
                              └─────────────▶│  Layercode   │
                                             │   Platform   │
                                             └──────────────┘
Authorization Flow
- Test code creates a LayercodeClient with a UserSimulator
- Client requests authorization from YOUR backend server (SERVER_URL)
- Backend returns a client_session_key from Layercode
- Client connects to the Layercode WebSocket using that key
- Conversation proceeds with the UserSimulator providing responses
The client never hits Layercode's API directly - it always goes through your backend first.
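Your backend's role in this flow is a thin proxy that exchanges your secret API key for a session key. Below is a minimal sketch of such an endpoint using FastAPI and httpx; the Layercode endpoint URL, request shape, and the LAYERCODE_API_KEY variable are assumptions to adapt to your own setup:
import os
import httpx
from fastapi import FastAPI, Request

app = FastAPI()

# Hypothetical proxy endpoint: point SERVER_URL at this app.
@app.post("/authorize")
async def authorize(request: Request) -> dict:
    payload = await request.json()
    async with httpx.AsyncClient() as http:
        # Exchange the secret API key for a short-lived client_session_key.
        # The endpoint path and body shape below are assumptions; check
        # Layercode's authorization docs for the current API.
        resp = await http.post(
            "https://api.layercode.com/v1/agents/web/authorize_session",
            headers={"Authorization": f"Bearer {os.environ['LAYERCODE_API_KEY']}"},
            json={"agent_id": payload["agent_id"]},
        )
    resp.raise_for_status()
    return resp.json()  # includes the client_session_key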
User Simulators
The UserSimulator is the heart of LayerCode Gym. It generates user responses during conversations.
Three Factory Methods
1. from_text() - Fixed Messages
Best for regression testing and fast iteration:
from layercode_gym import UserSimulator
simulator = UserSimulator.from_text(
messages=[
"Hello! I'm interested in your services.",
"Tell me more about pricing.",
"Thank you, goodbye."
],
send_as_text=True # Fast mode - no TTS needed
)
When to use:
- Regression testing with known scenarios
- Quick debugging of conversation flow
- Testing specific edge cases
- CI/CD pipelines (fastest execution)
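Putting it together, a scripted regression test is only a few lines. The sketch below assumes LayercodeClient is importable from the package root and exposes an async run() entry point; check Getting Started for the exact invocation:
import asyncio
from layercode_gym import LayercodeClient, UserSimulator

async def main() -> None:
    simulator = UserSimulator.from_text(
        messages=["Hello!", "Tell me about pricing.", "Goodbye."],
        send_as_text=True,
    )
    client = LayercodeClient(simulator=simulator)
    await client.run()  # run() is an assumed entry point, not confirmed API

asyncio.run(main())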
2. from_files() - Pre-recorded Audio
Best for testing transcription and audio handling:
from pathlib import Path
from layercode_gym import UserSimulator
simulator = UserSimulator.from_files(
files=[
Path("audio/intro.wav"),
Path("audio/question.wav"),
Path("audio/goodbye.wav")
]
)
When to use:
- Testing transcription accuracy
- Stress testing with various audio qualities
- Testing different accents and speaking styles
- Testing background noise handling
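If you have no recordings handy, you can generate fixtures once with any TTS tool and commit them. A sketch using the OpenAI SDK (file names and phrasing are arbitrary):
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
lines = {
    "intro.wav": "Hi, I'm calling about your services.",
    "question.wav": "What does the monthly plan cost?",
    "goodbye.wav": "Great, thanks for your help. Bye!",
}
Path("audio").mkdir(exist_ok=True)
for name, text in lines.items():
    # response_format="wav" matches the .wav files passed to from_files()
    speech = client.audio.speech.create(
        model="gpt-4o-mini-tts",
        voice="coral",
        input=text,
        response_format="wav",
    )
    speech.write_to_file(Path("audio") / name)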
3. from_agent() - AI-Driven Personas
Best for realistic, dynamic conversations:
from layercode_gym import Persona, UserSimulator
persona = Persona(
background_context="You are a 35-year-old small business owner",
intent="You want to understand pricing and features"
)
simulator = UserSimulator.from_agent(
persona=persona,
model="openai:gpt-4o-mini", # or "anthropic:claude-3-5-sonnet"
max_turns=5,
send_as_text=False # Auto-creates TTS engine
)
When to use:
- Simulating realistic user behavior
- Testing conversation flow and context handling
- Exploratory testing of edge cases
- Evaluating agent personality and tone
TTS Auto-Creation
When using send_as_text=False, LayerCode Gym automatically creates an OpenAI TTS engine:
# This works out of the box
simulator = UserSimulator.from_text(
messages=["Hello!"],
send_as_text=False # TTS engine auto-created
)
Configure via environment variables:
export OPENAI_TTS_MODEL="gpt-4o-mini-tts"
export OPENAI_TTS_VOICE="coral" # alloy, echo, fable, onyx, nova, shimmer, coral
export OPENAI_TTS_INSTRUCTIONS="Speak slowly and clearly"
Or pass custom settings:
from layercode_gym import Settings
settings = Settings(
tts_model="gpt-4o-mini-tts",
tts_voice="alloy",
tts_instructions="Speak with a British accent"
)
simulator = UserSimulator.from_text(
messages=["Hello!"],
send_as_text=False,
settings=settings
)
Custom Simulators
For full control, implement the UserSimulatorProtocol:
from layercode_gym.simulator import (
UserSimulatorProtocol,
UserRequest,
UserResponse
)
class MyCustomSimulator(UserSimulatorProtocol):
async def get_response(
self,
request: UserRequest
) -> UserResponse | None:
# Your custom logic here
# - request.agent_transcript: list of agent messages so far
# - request.turn_number: current turn
# - request.conversation_id: unique ID
if request.turn_number > 5:
return None # End conversation
return UserResponse(
text="Custom response based on context",
audio_path=None, # or Path to audio file
data=() # optional additional data
)
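Any object satisfying the protocol can then be passed wherever a built-in simulator is accepted:
client = LayercodeClient(simulator=MyCustomSimulator())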
Callbacks
Callbacks allow you to hook into the conversation lifecycle for monitoring, evaluation, and custom logic.
Turn Callbacks
Called after each conversation turn:
from layercode_gym.callbacks import TurnCallback
async def my_turn_callback(
turn_number: int,
user_message: str,
agent_message: str,
conversation_id: str
) -> None:
print(f"Turn {turn_number}:")
print(f" User: {user_message}")
print(f" Agent: {agent_message}")
client = LayercodeClient(
simulator=simulator,
turn_callback=my_turn_callback
)
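Because callbacks are plain async functions, they are a convenient place for per-turn assertions. A sketch that flags forbidden phrases (the phrase list is illustrative, and it assumes exceptions raised in callbacks propagate to your test runner):
FORBIDDEN = ["as an AI language model", "I cannot help with that"]

async def no_forbidden_phrases(
    turn_number: int,
    user_message: str,
    agent_message: str,
    conversation_id: str
) -> None:
    for phrase in FORBIDDEN:
        # Failing here surfaces the offending turn immediately
        assert phrase.lower() not in agent_message.lower(), (
            f"Turn {turn_number} of {conversation_id} contains {phrase!r}"
        )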
Conversation Callbacks
Called once at the end of the conversation:
from layercode_gym.callbacks import ConversationCallback
from layercode_gym.models import ConversationLog
async def my_conversation_callback(
conversation_log: ConversationLog
) -> None:
stats = conversation_log.stats
print(f"Conversation complete!")
print(f" Total turns: {stats['total_turns']}")
print(f" Duration: {stats['duration_seconds']}s")
print(f" Avg latency: {stats['avg_latency_ms']}ms")
client = LayercodeClient(
simulator=simulator,
conversation_callback=my_conversation_callback
)
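In a test suite, the conversation callback is also a natural place to enforce budgets. A sketch using the stats keys shown above (the 500 ms budget is arbitrary):
from layercode_gym.models import ConversationLog

MAX_AVG_LATENCY_MS = 500

async def enforce_latency_budget(conversation_log: ConversationLog) -> None:
    stats = conversation_log.stats
    assert stats["avg_latency_ms"] <= MAX_AVG_LATENCY_MS, (
        f"Average latency {stats['avg_latency_ms']}ms exceeds "
        f"{MAX_AVG_LATENCY_MS}ms budget"
    )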
LLM-as-Judge
Built-in callback for automated quality evaluation:
from layercode_gym.callbacks import create_judge_callback
judge = create_judge_callback(
criteria=[
"Did the agent answer all user questions?",
"Was the agent polite and professional?",
"Did the conversation flow naturally?"
],
model="openai:gpt-4o" # or "anthropic:claude-3-5-sonnet"
)
client = LayercodeClient(
simulator=simulator,
turn_callback=judge
)
The judge results are saved to conversations/<id>/judge_results.json:
{
"overall_score": 8.5,
"criteria_scores": {
"Did the agent answer all user questions?": 9,
"Was the agent polite and professional?": 10,
"Did the conversation flow naturally?": 7
},
"feedback": "The agent was helpful and polite..."
}
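Because the results are plain JSON, a CI step can gate on them directly. A sketch assuming the layout above (the threshold and conversation id are illustrative):
import json
import sys
from pathlib import Path

THRESHOLD = 7.0

def gate(conversation_dir: Path) -> None:
    results = json.loads((conversation_dir / "judge_results.json").read_text())
    if results["overall_score"] < THRESHOLD:
        # A non-zero exit fails the CI job and prints the message to stderr
        sys.exit(f"Judge score {results['overall_score']} is below {THRESHOLD}")

gate(Path("conversations") / "conv_abc123")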
Conversation Outputs
After each conversation, LayerCode Gym creates a structured output directory:
conversations/<conversation_id>/
├── transcript.json # Full conversation log with stats
├── conversation_mix.wav # Combined audio (user + assistant)
├── user_0.wav # Individual user audio files
├── user_1.wav
├── assistant_0.wav # Individual assistant audio files
├── assistant_1.wav
└── judge_results.json # If using judge callback
Transcript Structure
{
"conversation_id": "conv_abc123",
"agent_id": "your_agent_id",
"started_at": "2025-01-15T10:30:00Z",
"ended_at": "2025-01-15T10:32:15Z",
"turns": [
{
"turn_number": 0,
"user_message": {
"text": "Hello!",
"timestamp": "2025-01-15T10:30:00Z",
"audio_path": "user_0.wav"
},
"agent_message": {
"text": "Hi there! How can I help you?",
"timestamp": "2025-01-15T10:30:01.234Z",
"audio_path": "assistant_0.wav",
"ttfab_ms": 234
}
}
],
"stats": {
"total_turns": 3,
"duration_seconds": 135.5,
"avg_latency_ms": 245,
"avg_ttfab_ms": 234,
"total_user_words": 45,
"total_agent_words": 123
}
}
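Because the transcript is self-describing JSON, post-hoc analysis is straightforward. A sketch that recomputes average TTFAB from the per-turn fields shown above:
import json
from pathlib import Path

def avg_ttfab(transcript_path: Path) -> float:
    log = json.loads(transcript_path.read_text())
    # Collect ttfab_ms from each agent message that reports it
    values = [
        turn["agent_message"]["ttfab_ms"]
        for turn in log["turns"]
        if "ttfab_ms" in turn["agent_message"]
    ]
    return sum(values) / len(values) if values else 0.0

print(avg_ttfab(Path("conversations/conv_abc123/transcript.json")))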
Key Metrics
- TTFAB (Time To First Audio Byte): How long until the agent starts speaking
- Latency: Total time from user message to agent response
- Turn count: Number of back-and-forth exchanges
- Duration: Total conversation length
Settings and Configuration
All configuration is managed through the Settings class:
from layercode_gym import Settings
settings = Settings(
# Required
server_url="http://localhost:8001",
agent_id="your_agent_id",
# TTS Configuration
tts_model="gpt-4o-mini-tts",
tts_voice="coral",
tts_instructions="Speak clearly",
# Audio Chunking
chunk_ms=100,
chunk_interval=0.0,
# Storage
output_root="./conversations"
)
# Use in client
client = LayercodeClient(
simulator=simulator,
settings=settings
)
Settings can also be loaded from environment variables (see Getting Started).
Next Steps
- Examples - See detailed code examples
- API Reference - Full API documentation
- Advanced Usage - Custom implementations and LogFire integration