Observability and Tracing
Why Observability Matters
Agents tend to behave like black boxes. When a task fails, finding out why it failed requires a record of every decision made during the run. Logs alone are not enough: structured traces are what let you spot patterns and find things to improve.
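Observability answers three questions.
- What happened? The order of tool calls and the content of LLM responses.
- Why was that decision made? The context, the system prompt, and the preceding message.
- How expensive was it? Token usage, cost, and time.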
Key Events to Trace

    type TraceEvent =
      | { type: 'agent_start'; taskId: string; task: string; timestamp: string }
      | { type: 'agent_end'; taskId: string; success: boolean; turns: number; totalTokens: number; costUsd: number }
      | { type: 'llm_request'; turn: number; messageCount: number; systemLength: number; toolCount: number }
      | { type: 'llm_response'; turn: number; stopReason: string; outputTokens: number; latencyMs: number }
      | { type: 'tool_call'; turn: number; toolName: string; input: unknown }
      | { type: 'tool_result'; turn: number; toolName: string; success: boolean; outputLength: number; latencyMs: number }
      | { type: 'context_compaction'; turn: number; beforeTokens: number; afterTokens: number }
      | { type: 'approval_requested'; toolName: string; approved: boolean }
      | { type: 'error'; turn: number; errorType: string; message: string; retried: boolean };

Each event type serves a distinct analysis purpose.
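| Event | Analysis purpose |
|---|---|
| llm_request / llm_response | Latency bottlenecks, context size trends |
| tool_call / tool_result | Most frequently failing tools, execution patterns |
| context_compaction | Compaction frequency and effectiveness |
| error | Frequency by error type, retry success rate |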
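
As a rough illustration of the analyses in the table, the sketch below aggregates a list of recorded events into an average LLM latency, a per-tool failure rate, and an overall compaction ratio. The event shapes are copied from the `TraceEvent` union; the names `AnalyzedEvent`, `TraceSummary`, and `summarizeTrace` are hypothetical and exist only for this example.

```typescript
// Subset of the TraceEvent union, repeated here so the sketch is self-contained.
type AnalyzedEvent =
  | { type: 'llm_response'; turn: number; stopReason: string; outputTokens: number; latencyMs: number }
  | { type: 'tool_result'; turn: number; toolName: string; success: boolean; outputLength: number; latencyMs: number }
  | { type: 'context_compaction'; turn: number; beforeTokens: number; afterTokens: number };

// Hypothetical summary shape; not part of the original design.
interface TraceSummary {
  avgLlmLatencyMs: number;                  // mean latency over llm_response events
  toolFailureRate: Record<string, number>;  // failures / calls, per tool name
  compactionCount: number;                  // number of context_compaction events
  compactionRatio: number;                  // total afterTokens / total beforeTokens
}

function summarizeTrace(events: AnalyzedEvent[]): TraceSummary {
  let llmLatencyTotal = 0;
  let llmCount = 0;
  const toolStats: Record<string, { calls: number; failures: number }> = {};
  let compactionCount = 0;
  let beforeTotal = 0;
  let afterTotal = 0;

  for (const event of events) {
    switch (event.type) {
      case 'llm_response':
        llmLatencyTotal += event.latencyMs;
        llmCount += 1;
        break;
      case 'tool_result': {
        const stats = toolStats[event.toolName] ?? { calls: 0, failures: 0 };
        stats.calls += 1;
        if (!event.success) stats.failures += 1;
        toolStats[event.toolName] = stats;
        break;
      }
      case 'context_compaction':
        compactionCount += 1;
        beforeTotal += event.beforeTokens;
        afterTotal += event.afterTokens;
        break;
    }
  }

  const toolFailureRate: Record<string, number> = {};
  for (const name of Object.keys(toolStats)) {
    toolFailureRate[name] = toolStats[name].failures / toolStats[name].calls;
  }

  return {
    avgLlmLatencyMs: llmCount > 0 ? llmLatencyTotal / llmCount : 0,
    toolFailureRate,
    compactionCount,
    compactionRatio: beforeTotal > 0 ? afterTotal / beforeTotal : 0,
  };
}
```

In practice the input would be a batch of events collected over a run or loaded from stored traces; a high failure rate for one tool or a compaction ratio close to 1 are the kinds of signals the table refers to.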
Implementing a Custom Tracer

    interface Tracer {
      trace(event: TraceEvent): void;
      flush(): Promise<void>;
    }

    // Fans each event out to several tracers
    class CompositeTracer implements Tracer {
      constructor(private tracers: Tracer[]) {}

      trace(event: TraceEvent): void {
        for (const tracer of this.tracers) {
          tracer.trace(event);
        }
      }

      async flush(): Promise<void> {
        await Promise.all(this.tracers.map(t => t.flush()));
      }
    }

    // Console output (development)
    class ConsoleTracer implements Tracer {
      trace(event: TraceEvent): void {
        const symbols: Record<string, string> = {
          agent_start: '▶',
          agent_end: '■',
          llm_request: '→',
          llm_response: '←',
          tool_call: '⚙',
          tool_result: '✓',
          error: '✗',
        };
        // Event types without a symbol fall back to '·'
        const sym = symbols[event.type] ?? '·';
        console.debug(`${sym} [${event.type}]`, event);
      }

      // Nothing is buffered, so there is nothing to flush
      async flush(): Promise<void> {}
    }

    import { promises as fs } from 'node:fs';

    // JSONL file output (for analysis)
    class FileTracer implements Tracer {
      private buffer: string[] = [];

      constructor(private filePath: string) {}

      trace(event: TraceEvent): void {
        // Add a timestamp, but keep the event's own timestamp when it has one (agent_start)
        this.buffer.push(JSON.stringify({ timestamp: new Date().toISOString(), ...event }));
      }

      async flush(): Promise<void> {
        if (this.buffer.length === 0) return;
        await fs.appendFile(this.filePath, this.buffer.join('\n') + '\n', 'utf-8');
        this.buffer = [];
      }
    }

LangSmith Integration
LangSmith is the observability platform of the LangChain ecosystem, but it can also be integrated with custom agents.
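
    import { Client, RunTree } from 'langsmith';

    class LangSmithTracer implements Tracer {
      private client: Client;
      private runs = new Map<string, RunTree>();
      // tool_call events carry no taskId, so remember the most recently started task.
      // This assumes one agent run at a time per tracer instance.
      private currentTaskId: string | null = null;
      // Network calls are async; collect them so flush() can wait for them
      private pending: Promise<unknown>[] = [];

      constructor(apiKey: string, private projectName: string) {
        this.client = new Client({ apiKey });
      }

      private track(work: Promise<unknown>): void {
        // A failed upload should not crash the agent
        this.pending.push(work.catch(err => console.warn('LangSmith trace failed', err)));
      }

      trace(event: TraceEvent): void {
        switch (event.type) {
          case 'agent_start': {
            const run = new RunTree({
              name: 'agent_run',
              run_type: 'chain',
              inputs: { task: event.task },
              project_name: this.projectName,
              client: this.client,
            });
            this.track(run.postRun());
            this.runs.set(event.taskId, run);
            this.currentTaskId = event.taskId;
            break;
          }
          case 'tool_call': {
            // Record the tool call as a child run.
            // The child run is not closed here; doing so on tool_result is left out of this example.
            const parent = this.currentTaskId ? this.runs.get(this.currentTaskId) : undefined;
            if (parent) {
              const toolRun = parent.createChild({
                name: event.toolName,
                run_type: 'tool',
                inputs: event.input as Record<string, unknown>,
              });
              this.track(toolRun.postRun());
            }
            break;
          }
          case 'agent_end': {
            const run = this.runs.get(event.taskId);
            if (run) {
              this.track((async () => {
                await run.end({ success: event.success, turns: event.turns });
                await run.patchRun();
              })());
              this.runs.delete(event.taskId);
            }
            break;
          }
        }
      }

      async flush(): Promise<void> {
        const pending = this.pending;
        this.pending = [];
        await Promise.all(pending);
      }
    }

Braintrust Integration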

    import * as braintrust from 'braintrust';

    class BraintrustTracer implements Tracer {
      private experiment: braintrust.Experiment | null = null;
      private spans = new Map<string, braintrust.Span>();

      constructor(private projectName: string, private experimentName: string) {}

      // Must be awaited before the first trace() call; events traced earlier are dropped
      async init(): Promise<void> {
        this.experiment = await braintrust.init(this.projectName, {
          experiment: this.experimentName,
        });
      }

      trace(event: TraceEvent): void {
        if (!this.experiment) return;

        switch (event.type) {
          case 'agent_start': {
            const span = this.experiment.startSpan({ name: 'agent' });
            // Log the input on the span rather than passing it to startSpan
            span.log({ input: event.task });
            this.spans.set(event.taskId, span);
            break;
          }
          case 'agent_end': {
            const span = this.spans.get(event.taskId);
            if (span) {
              span.log({
                output: { success: event.success, turns: event.turns },
                scores: { success: event.success ? 1 : 0 },
                metrics: { tokens: event.totalTokens, cost: event.costUsd },
              });
              span.end();
              this.spans.delete(event.taskId);
            }
            break;
          }
        }
      }

      async flush(): Promise<void> {
        await this.experiment?.flush();
      }
    }

Integrating the Tracer into MainAgent

    import { randomUUID } from 'node:crypto';

    class InstrumentedMainAgent extends MainAgent {
      constructor(deps: AgentDependencies, private tracer: Tracer, spec?: SubAgentSpec) {
        super(deps, spec);
      }

      async run(task: string): Promise<AgentResult> {
        const taskId = randomUUID();
        this.tracer.trace({ type: 'agent_start', taskId, task, timestamp: new Date().toISOString() });

        try {
          const result = await super.run(task);
          this.tracer.trace({
            type: 'agent_end',
            taskId,
            success: result.success,
            turns: result.turns,
            totalTokens: result.totalTokens ?? 0,
            costUsd: result.costUsd ?? 0,
          });
          return result;
        } catch (error) {
          this.tracer.trace({
            type: 'error',
            turn: 0,
            errorType: error instanceof Error ? error.constructor.name : 'Unknown',
            message: String(error),
            retried: false,
          });
          // Also emit agent_end so tracers that keep per-task state can close the run.
          // Turn and token counts are unknown at this point, so they are reported as 0.
          this.tracer.trace({ type: 'agent_end', taskId, success: false, turns: 0, totalTokens: 0, costUsd: 0 });
          throw error;
        } finally {
          // Flush on both success and failure so buffered events are not lost
          await this.tracer.flush();
        }
      }
    }
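
One way to check this kind of instrumentation without a real agent or backend is an in-memory tracer. The sketch below is a minimal, self-contained approximation: `InMemoryTracer` and `runTraced` are hypothetical names, the event type is simplified, and `runTraced` only mirrors the try/catch/finally shape of an instrumented `run()`. Emitting `agent_end` on failure is a choice made in this sketch.

```typescript
// Simplified stand-ins for the section's TraceEvent and Tracer types.
type TraceEvent = { type: string; [key: string]: unknown };

interface Tracer {
  trace(event: TraceEvent): void;
  flush(): Promise<void>;
}

// Hypothetical test double: keeps events in memory instead of sending them anywhere.
class InMemoryTracer implements Tracer {
  readonly events: TraceEvent[] = [];
  flushCount = 0;

  trace(event: TraceEvent): void {
    this.events.push(event);
  }

  async flush(): Promise<void> {
    this.flushCount += 1;
  }
}

// Hypothetical helper with the same control flow as an instrumented run():
// start event, then either an end event or an error event, and always a flush.
async function runTraced<T>(
  tracer: Tracer,
  taskId: string,
  task: string,
  body: () => Promise<T>,
): Promise<T> {
  tracer.trace({ type: 'agent_start', taskId, task, timestamp: new Date().toISOString() });
  try {
    const result = await body();
    tracer.trace({ type: 'agent_end', taskId, success: true });
    return result;
  } catch (error) {
    tracer.trace({ type: 'error', taskId, message: String(error), retried: false });
    tracer.trace({ type: 'agent_end', taskId, success: false });
    throw error;
  } finally {
    // Runs on both paths, so buffered events are written even when the body throws
    await tracer.flush();
  }
}
```

A test can pass a body that throws and then inspect `tracer.events` and `tracer.flushCount` to confirm that the error path still records its events and flushes exactly once.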