Week 1 Day 5: Structured Output - Socratic Teaching
💡 Core idea in one sentence: the LLM should output not "a block of text" but "a contract" — structured data that downstream systems can validate, audit, and consume.
Learning Objectives
- Understand why Agent systems must use Structured Output
- Design the AgentAnswer struct and the semantics of its fields
- Implement validation rules (the logical relationships between constraints)
- Master JSON Schema + the Response Format API
- Learn prompt engineering that gets the LLM to emit valid JSON reliably
Part 1: Problem-Driven
🤔 Question 1: Why can't the LLM just return plain text?
Guiding questions:
- If the Agent's reply is plain text, how do downstream systems (ticketing, audit logs, monitoring) consume it?
- How do you distinguish "the Agent is confident" from "the Agent is guessing"?
- When a bad case appears in production, how do you trace it back: which tools did the LLM call? Which documents did it cite?
- If the product requires "low-confidence answers must go through human review", can your system implement that?
Answers revealed:
- Downstream consumption: the ticketing system needs to know "should a ticket be created" and "who owns it", not a paragraph of prose
- Validation: the system must decide "can this answer be sent to the user as-is"
- Audit: when something goes wrong, you need the full decision trail (reasoning + sources + tool_calls)
- Routing: low-confidence answers go to a human; high-confidence ones can be sent automatically
What you should take away:
Structured Output is not "for looks" — it is the contract between the Agent system and real business systems.
🤔 Question 2: What fields should a good AgentAnswer contain?
Think: if you were the consumer of a support Agent (say, the ticketing system), what would you need the Agent to tell you?
Required fields:

| Field | Type | Purpose |
|---|---|---|
| answer | string | Final answer shown to the user |
| confidence | enum (low/medium/high) | Drives downstream routing (human review or not) |
| sources | []Source | Cited documents, for provenance and showing the user "where this came from" |
| tool_calls | []ToolCall | Audit trail of which tools the Agent invoked (create a ticket? check permissions?) |
| needs_human_review | bool | Explicit signal: does this need a human |
| reasoning | string | The Agent's thought process, for debugging and audit |
Reflection: why is confidence an enum rather than a float (0-1)?
- Consistency: the difference between 0.873 and 0.874 from an LLM is meaningless; discretizing makes output more stable
- Routability: the downstream rule `if confidence == "low" then review` is clearer than `if confidence < 0.6`
- Testability: evals don't have to agonize over the difference between 0.6 and 0.61
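To make the routing argument concrete, here is a minimal sketch; the `route` function and its action strings are hypothetical, not part of the lesson's codebase:

```go
package main

import "fmt"

// Confidence mirrors the enum defined later in this lesson.
type Confidence string

const (
	ConfidenceLow    Confidence = "low"
	ConfidenceMedium Confidence = "medium"
	ConfidenceHigh   Confidence = "high"
)

// route decides what the downstream system does with an answer.
// An enum makes this a trivially testable switch; a float would
// force arbitrary threshold tuning (0.6? 0.61?).
func route(c Confidence) string {
	switch c {
	case ConfidenceHigh:
		return "auto_send"
	case ConfidenceMedium:
		return "auto_send_with_sources"
	default:
		// "low" and any unexpected value fail safe to a human.
		return "human_review"
	}
}

func main() {
	fmt.Println(route(ConfidenceLow))  // human_review
	fmt.Println(route(ConfidenceHigh)) // auto_send
}
```

Note the default branch: an unknown enum value routes to a human, which is exactly the fail-safe behavior a float threshold makes awkward to express.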
🤔 Question 3: What constraints hold between fields?
Scenario: the LLM returned
{
"answer": "According to our policy, you can apply for...",
"confidence": "high",
"sources": [],
"needs_human_review": false
}
Question: is this answer valid?
Not valid! sources is empty yet confidence=high — the Agent is making the answer up. That is a hallucination.
Rule set:
- sources empty → confidence must not be high
- tool_calls contains "create_ticket" → needs_human_review must be true
- confidence=low → needs_human_review must be true
- answer must not be an empty string
- every source must have doc_id and chunk_id
Key insight: the value of Structured Output is not just the field structure, but the logical constraints between fields. LLMs frequently violate these constraints, which is why we need a validation layer.
Part 2: Hands-On Implementation
✅ Version 1: Define the AgentAnswer struct
// internal/agent/answer.go
package agent
import "encoding/json"
// Confidence is the answer's confidence level (enum).
type Confidence string
const (
ConfidenceLow Confidence = "low"
ConfidenceMedium Confidence = "medium"
ConfidenceHigh Confidence = "high"
)
// Source is a cited document reference.
type Source struct {
DocID string `json:"doc_id"`
ChunkID string `json:"chunk_id"`
Title string `json:"title,omitempty"`
URL string `json:"url,omitempty"`
Snippet string `json:"snippet,omitempty"` // quoted excerpt from the original text
}
// ToolCall records a tool invocation made by the Agent.
type ToolCall struct {
Name string `json:"name"`
Arguments json.RawMessage `json:"arguments"`
Result json.RawMessage `json:"result,omitempty"`
Error string `json:"error,omitempty"`
}
// AgentAnswer is the Agent's complete structured answer.
type AgentAnswer struct {
Answer string `json:"answer"`
Confidence Confidence `json:"confidence"`
Sources []Source `json:"sources"`
ToolCalls []ToolCall `json:"tool_calls"`
NeedsHumanReview bool `json:"needs_human_review"`
Reasoning string `json:"reasoning"`
}
Reflection questions:
- Why is Sources a []Source rather than []string? → You need doc_id+chunk_id for precise provenance
- Why is ToolCall.Arguments a json.RawMessage? → Different tools take differently shaped arguments; defer parsing
- Why a Reasoning field? → For audit and debugging; never shown to the user, but always recorded
✅ Version 2: Validation rules
// internal/agent/validate.go
package agent
import (
"fmt"
"strings"
)
// Violation is a single rule violation.
type Violation struct {
Field string `json:"field"`
Rule string `json:"rule"`
Message string `json:"message"`
}
// Validate returns all violations; an empty slice means the answer is valid.
func (a *AgentAnswer) Validate() []Violation {
var violations []Violation
// Rule 1: answer must not be empty
if strings.TrimSpace(a.Answer) == "" {
violations = append(violations, Violation{
Field: "answer",
Rule: "required",
Message: "answer must not be empty",
})
}
// Rule 2: confidence must be a valid enum value
switch a.Confidence {
case ConfidenceLow, ConfidenceMedium, ConfidenceHigh:
default:
violations = append(violations, Violation{
Field: "confidence",
Rule: "enum",
Message: fmt.Sprintf("confidence must be low/medium/high, got %q", a.Confidence),
})
}
// Rule 3: empty sources → confidence must not be high
if len(a.Sources) == 0 && a.Confidence == ConfidenceHigh {
violations = append(violations, Violation{
Field: "confidence",
Rule: "consistency",
Message: "confidence cannot be high when sources is empty (no grounding)",
})
}
// Rule 4: creating a ticket → needs_human_review must be true
for _, tc := range a.ToolCalls {
if tc.Name == "create_ticket" && !a.NeedsHumanReview {
violations = append(violations, Violation{
Field: "needs_human_review",
Rule: "policy",
Message: "creating a ticket requires human review",
})
}
}
// Rule 5: confidence=low → needs_human_review must be true
if a.Confidence == ConfidenceLow && !a.NeedsHumanReview {
violations = append(violations, Violation{
Field: "needs_human_review",
Rule: "consistency",
Message: "low confidence answers must be marked for review",
})
}
// Rule 6: every source must have doc_id and chunk_id
for i, src := range a.Sources {
if src.DocID == "" || src.ChunkID == "" {
violations = append(violations, Violation{
Field: fmt.Sprintf("sources[%d]", i),
Rule: "required",
Message: "each source must have doc_id and chunk_id",
})
}
}
return violations
}
// IsValid is a convenience wrapper.
func (a *AgentAnswer) IsValid() bool {
return len(a.Validate()) == 0
}
Try it with a test:
func TestValidate_SourcesEmptyHighConfidence(t *testing.T) {
a := AgentAnswer{
Answer: "SSO passwords can be reset via the portal",
Confidence: ConfidenceHigh,
Sources: []Source{}, // empty!
}
vs := a.Validate()
if len(vs) == 0 {
t.Fatal("expected violations for empty sources with high confidence")
}
}
✅ Version 3: JSON Schema generation
Why a JSON Schema?
- OpenAI's response_format supports JSON Schema, forcing the LLM to emit valid JSON
- It doubles as documentation: a clear, explicit description of the contract
// internal/agent/schema.go
package agent
import "encoding/json"
// AnswerSchema returns the schema passed to OpenAI's response_format.
func AnswerSchema() json.RawMessage {
schema := map[string]interface{}{
"type": "object",
"properties": map[string]interface{}{
"answer": map[string]interface{}{
"type": "string",
"description": "Final answer shown to the user.",
},
"confidence": map[string]interface{}{
"type": "string",
"enum": []string{"low", "medium", "high"},
"description": "high=grounded in sources; medium=partial; low=guessing or no sources.",
},
"sources": map[string]interface{}{
"type": "array",
"items": map[string]interface{}{
"type": "object",
"properties": map[string]interface{}{
"doc_id": map[string]interface{}{"type": "string"},
"chunk_id": map[string]interface{}{"type": "string"},
"title": map[string]interface{}{"type": "string"},
"snippet": map[string]interface{}{"type": "string"},
},
"required": []string{"doc_id", "chunk_id"},
"additionalProperties": false,
},
},
"tool_calls": map[string]interface{}{
"type": "array",
"items": map[string]interface{}{
"type": "object",
"properties": map[string]interface{}{
"name": map[string]interface{}{"type": "string"},
"arguments": map[string]interface{}{"type": "object"},
},
"required": []string{"name", "arguments"},
},
},
"needs_human_review": map[string]interface{}{
"type": "boolean",
},
"reasoning": map[string]interface{}{
"type": "string",
"description": "Brief chain-of-thought, not shown to user.",
},
},
"required": []string{
"answer", "confidence", "sources",
"tool_calls", "needs_human_review", "reasoning",
},
"additionalProperties": false,
}
b, _ := json.Marshal(schema)
return b
}
✅ Version 4: Calling OpenAI with a Response Format
// internal/agent/generate.go
package agent
import (
"context"
"encoding/json"
"fmt"
"github.com/sashabaranov/go-openai"
)
type Generator struct {
client *openai.Client
model string
}
func NewGenerator(apiKey, model string) *Generator {
return &Generator{
client: openai.NewClient(apiKey),
model: model,
}
}
// Generate calls the LLM and parses the result into an AgentAnswer.
func (g *Generator) Generate(ctx context.Context, userQuery string, retrievedChunks []Source) (*AgentAnswer, error) {
systemPrompt := buildSystemPrompt(retrievedChunks)
resp, err := g.client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: g.model,
Messages: []openai.ChatCompletionMessage{
{Role: openai.ChatMessageRoleSystem, Content: systemPrompt},
{Role: openai.ChatMessageRoleUser, Content: userQuery},
},
ResponseFormat: &openai.ChatCompletionResponseFormat{
Type: openai.ChatCompletionResponseFormatTypeJSONObject,
},
Temperature: 0.2, // low temperature is recommended for structured output
})
if err != nil {
return nil, fmt.Errorf("openai call: %w", err)
}
if len(resp.Choices) == 0 {
return nil, fmt.Errorf("no choices returned")
}
var answer AgentAnswer
raw := resp.Choices[0].Message.Content
if err := json.Unmarshal([]byte(raw), &answer); err != nil {
return nil, fmt.Errorf("parse answer json: %w (raw=%s)", err, raw)
}
// Validate business constraints
if violations := answer.Validate(); len(violations) > 0 {
return &answer, &ValidationError{Violations: violations, Raw: raw}
}
return &answer, nil
}
type ValidationError struct {
Violations []Violation
Raw string
}
func (e *ValidationError) Error() string {
return fmt.Sprintf("validation failed: %d violations", len(e.Violations))
}
✅ Version 5: Prompt engineering - getting the LLM to emit valid JSON reliably
Problem: the LLM sometimes emits syntactically valid JSON that still violates business constraints (e.g. sources is empty yet confidence=high). What then?
// internal/agent/prompt.go
package agent
import (
"fmt"
"strings"
)
const systemPromptTemplate = `You are a customer support agent. You MUST respond in strict JSON following this schema:
{
"answer": "<final answer to user>",
"confidence": "low" | "medium" | "high",
"sources": [{"doc_id": "...", "chunk_id": "...", "title": "...", "snippet": "..."}],
"tool_calls": [{"name": "...", "arguments": {...}}],
"needs_human_review": true | false,
"reasoning": "<brief thought>"
}
## CRITICAL RULES (violating these causes your response to be REJECTED):
1. If "sources" is empty, "confidence" MUST be "low". Never guess.
2. If you use tool "create_ticket", "needs_human_review" MUST be true.
3. If "confidence" is "low", "needs_human_review" MUST be true.
4. Every source must include both "doc_id" and "chunk_id".
5. "answer" must be non-empty.
## Grounding
Answer ONLY based on the following retrieved context. If the context does not
contain the answer, say so honestly and set confidence=low.
## Retrieved Context
%s
## Examples
Good response (grounded):
{"answer":"To reset SSO, visit the portal...","confidence":"high","sources":[{"doc_id":"faq_001","chunk_id":"faq_001_chunk_2","snippet":"Click Forgot Password"}],"tool_calls":[],"needs_human_review":false,"reasoning":"Found direct answer in FAQ."}
Good response (no grounding):
{"answer":"I don't have information about this. Let me escalate.","confidence":"low","sources":[],"tool_calls":[],"needs_human_review":true,"reasoning":"No relevant docs retrieved."}
Now respond in strict JSON.`
func buildSystemPrompt(chunks []Source) string {
var sb strings.Builder
for i, c := range chunks {
sb.WriteString(fmt.Sprintf("[%d] doc_id=%s chunk_id=%s\n%s\n\n",
i+1, c.DocID, c.ChunkID, c.Snippet))
}
if sb.Len() == 0 {
sb.WriteString("(no relevant context retrieved)")
}
return fmt.Sprintf(systemPromptTemplate, sb.String())
}
Key techniques, summarized:
- Explicit schema: writing the full schema into the prompt raises compliance
- A CRITICAL RULES section: capitalization and emphasis make the LLM weight these rules more heavily
- Few-shot examples: one grounded example and one low-confidence example
- Temperature=0.2: lower temperature makes output more deterministic
- response_format=json_object: forces syntactically valid JSON
✅ Version 6: Repair loop (advanced)
Problem: the LLM's output violates a constraint. Can we ask it to fix its own output?
// internal/agent/repair.go
package agent
import (
"context"
"encoding/json"
"errors"
"fmt"
"strings"
"github.com/sashabaranov/go-openai"
)
// formatViolations renders violations as a bullet list for the repair prompt.
func formatViolations(vs []Violation) string {
var sb strings.Builder
for _, v := range vs {
sb.WriteString(fmt.Sprintf("- %s (%s): %s\n", v.Field, v.Rule, v.Message))
}
return sb.String()
}
// GenerateWithRepair retries once, feeding the violations back to the LLM.
func (g *Generator) GenerateWithRepair(ctx context.Context, query string, chunks []Source) (*AgentAnswer, error) {
answer, err := g.Generate(ctx, query, chunks)
var vErr *ValidationError
if !errors.As(err, &vErr) {
return answer, err // success, or a non-validation error
}
// Ask the LLM to repair its own output.
repairPrompt := fmt.Sprintf(
"Your previous response violated these rules:\n%s\nOriginal response:\n%s\n\nFix it and return strict JSON again.",
formatViolations(vErr.Violations), vErr.Raw,
)
resp, err := g.client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: g.model,
Messages: []openai.ChatCompletionMessage{
{Role: openai.ChatMessageRoleUser, Content: repairPrompt},
},
ResponseFormat: &openai.ChatCompletionResponseFormat{
Type: openai.ChatCompletionResponseFormatTypeJSONObject,
},
})
if err != nil {
return nil, err
}
if len(resp.Choices) == 0 {
return nil, fmt.Errorf("no choices returned")
}
var repaired AgentAnswer
raw := resp.Choices[0].Message.Content
if err := json.Unmarshal([]byte(raw), &repaired); err != nil {
return nil, err
}
// Re-validate: the repaired output may still violate the rules.
if vs := repaired.Validate(); len(vs) > 0 {
return &repaired, &ValidationError{Violations: vs, Raw: raw}
}
return &repaired, nil
}
Note: cap the repair loop at 1-2 attempts, or cost explodes. If repair keeps failing, log the bad case and fall back (return a human-escalation answer).
Part 3: Key Concepts
1. Contract First
Structured Output is the API contract between the Agent layer and the business layer. When defining the contract, ask:
- Who are the consumers? (ticketing system, audit log, frontend UI)
- What fields do consumers need to make their decisions?
- Which missing or wrong fields would cause a production incident?
2. Layered constraints

| Layer | Example | Checked by |
|---|---|---|
| Syntax | Is it valid JSON at all? | json.Unmarshal |
| Schema | Field types, enum values | JSON Schema validator |
| Business | Empty sources can't be high confidence | Custom Validate() |

All three layers are required; none can be skipped.
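The three layers compose into a single parse path. A sketch — here `schemaCheck` is a hand-rolled stand-in for a real JSON Schema validator library, and the `Answer` type is trimmed for brevity:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type Answer struct {
	Answer     string `json:"answer"`
	Confidence string `json:"confidence"`
}

// Layer 2 stand-in: the type/enum checks a JSON Schema validator
// would normally perform.
func schemaCheck(a Answer) error {
	switch a.Confidence {
	case "low", "medium", "high":
		return nil
	}
	return fmt.Errorf("schema: bad confidence %q", a.Confidence)
}

// Layer 3: business rules.
func businessCheck(a Answer) error {
	if strings.TrimSpace(a.Answer) == "" {
		return fmt.Errorf("business: empty answer")
	}
	return nil
}

func parse(raw string) (Answer, error) {
	var a Answer
	// Layer 1: syntax.
	if err := json.Unmarshal([]byte(raw), &a); err != nil {
		return a, fmt.Errorf("syntax: %w", err)
	}
	if err := schemaCheck(a); err != nil {
		return a, err
	}
	return a, businessCheck(a)
}

func main() {
	_, err := parse(`{"answer":"","confidence":"high"}`)
	fmt.Println(err) // passes syntax and schema, fails at the business layer
}
```

The ordering matters: each layer's error message tells you which kind of failure occurred, which is exactly what you need when triaging bad cases.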
3. Grounding and Hallucination
- Grounded answer: every sentence can be traced back to original text in sources
- Hallucinated answer: sources is empty, yet the answer is stated with full confidence
Structured Output makes grounding observable through the sources field.
4. Temperature vs. Structured Output
- Casual chat: temperature=0.7-1.0
- Structured output: temperature=0.0-0.3
- Avoid temperature=0 exactly; it can occasionally get stuck in repetition loops
Part 4: Self-Check List
Before running, ask yourself:
Part 5: Homework
Task 1: Implement AgentAnswer + Validate
Task 2: Wire up a real LLM
Task 3: Add the repair loop
Task 4: Thinking questions
- If you add a field suggested_actions []string, should it get validation rules?
- The downstream ticketing system requires a ticket_id, but the Agent cannot always produce one — how should the schema be designed?
Part 6: FAQ
Q1: Why not generate the JSON Schema directly from Go struct tags?
A: You can, but Go's struct tags don't natively express enums or inter-field constraints. Two options:
- A library such as github.com/invopop/jsonschema
- A hand-written schema (clearer when the field count is small)
For production, hand-writing is recommended: the schema is the contract and should be controlled explicitly rather than auto-generated.
Q2: Who decides confidence? Is LLM self-assessment reliable?
A: LLM self-assessment skews optimistic; this is a known problem. Mitigations:
- Give explicit criteria in the prompt ("high = can quote the source directly; medium = requires synthesizing multiple sources; low = speculation")
- During evals, compare LLM self-assessment against human labels and calibrate the prompt
- For critical scenarios (money, permissions), force human review instead of trusting self-reported confidence
Q3: How long should a source's snippet be?
A: A 50-200 character excerpt of the original text. Shorter loses provenance; longer bloats the response. The frontend can display it directly as the "source".
Q4: What if the LLM's JSON runs too long and gets truncated?
A:
- Set max_tokens with enough headroom
- Detect finish_reason=length; if hit, return an error and fall back
- Cap the Reasoning field (add "reasoning must be ≤100 characters" to the prompt)
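A minimal guard for the truncation case. The `checkFinish` helper is hypothetical; it takes the finish_reason string the API returns ("stop" and "length" are the documented values relevant here):

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTruncated signals the model hit max_tokens mid-output; the raw
// JSON is almost certainly unparseable and the caller should fall back.
var ErrTruncated = errors.New("response truncated (finish_reason=length)")

// checkFinish maps a finish_reason to an error. For structured
// output, "stop" is the only fully healthy outcome.
func checkFinish(reason string) error {
	switch reason {
	case "stop":
		return nil
	case "length":
		return ErrTruncated
	default:
		return fmt.Errorf("unexpected finish_reason %q", reason)
	}
}

func main() {
	fmt.Println(checkFinish("stop"))
	fmt.Println(checkFinish("length"))
}
```

In the Generate flow from Version 4, this check would run right after the empty-choices check and before json.Unmarshal, so a truncated payload never reaches the parser.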
Q5: Why does ToolCall record both Arguments and Result?
A: Audit and replay. When something goes wrong, you must be able to reconstruct the full chain: "the Agent called create_ticket(title=X) and got back ticket_id=123".
Companion Algorithm Problems
Today's algorithm theme: BFS/DFS on grids and graphs, mirroring the "retrieval-graph traversal" mindset in Agent systems.
Problem 1: Number of Islands (LeetCode 200) - Medium
// Given a 2D grid of '1' (land) and '0' (water), return the number of islands
func numIslands(grid [][]byte) int {
if len(grid) == 0 {
return 0
}
m, n := len(grid), len(grid[0])
count := 0
var dfs func(i, j int)
dfs = func(i, j int) {
if i < 0 || i >= m || j < 0 || j >= n || grid[i][j] != '1' {
return
}
grid[i][j] = '0' // mark visited
dfs(i+1, j)
dfs(i-1, j)
dfs(i, j+1)
dfs(i, j-1)
}
for i := 0; i < m; i++ {
for j := 0; j < n; j++ {
if grid[i][j] == '1' {
count++
dfs(i, j)
}
}
}
return count
}
Explanation: a template DFS problem. Setting grid[i][j]='0' marks cells in place to avoid revisits. Time O(mn); space O(mn) for the recursion stack.
Follow-up: what if you can't modify the grid? → Use a visited [][]bool.
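The follow-up answer as code — a sketch of the read-only variant, with an explicit visited matrix instead of sinking cells:

```go
package main

import "fmt"

// numIslandsReadOnly counts islands without mutating grid.
func numIslandsReadOnly(grid [][]byte) int {
	if len(grid) == 0 {
		return 0
	}
	m, n := len(grid), len(grid[0])
	visited := make([][]bool, m)
	for i := range visited {
		visited[i] = make([]bool, n)
	}
	var dfs func(i, j int)
	dfs = func(i, j int) {
		if i < 0 || i >= m || j < 0 || j >= n || visited[i][j] || grid[i][j] != '1' {
			return
		}
		visited[i][j] = true // record the visit instead of overwriting the cell
		dfs(i+1, j)
		dfs(i-1, j)
		dfs(i, j+1)
		dfs(i, j-1)
	}
	count := 0
	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			if grid[i][j] == '1' && !visited[i][j] {
				count++
				dfs(i, j)
			}
		}
	}
	return count
}

func main() {
	grid := [][]byte{
		[]byte("11000"),
		[]byte("11000"),
		[]byte("00100"),
		[]byte("00011"),
	}
	fmt.Println(numIslandsReadOnly(grid)) // 3
}
```

Same time complexity, but now O(mn) extra space for visited on top of the recursion stack — the usual trade-off for leaving the input immutable.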
Problem 2: Max Area of Island (LeetCode 695) - Medium
func maxAreaOfIsland(grid [][]int) int {
m, n := len(grid), len(grid[0])
maxArea := 0
var dfs func(i, j int) int
dfs = func(i, j int) int {
if i < 0 || i >= m || j < 0 || j >= n || grid[i][j] != 1 {
return 0
}
grid[i][j] = 0
return 1 + dfs(i+1, j) + dfs(i-1, j) + dfs(i, j+1) + dfs(i, j-1)
}
for i := 0; i < m; i++ {
for j := 0; j < n; j++ {
if grid[i][j] == 1 {
if a := dfs(i, j); a > maxArea {
maxArea = a
}
}
}
}
return maxArea
}
Explanation: compared with Number of Islands, the DFS now returns the size of the connected region it explored.
Problem 3: Clone Graph (LeetCode 133) - Medium
type Node struct {
Val int
Neighbors []*Node
}
func cloneGraph(node *Node) *Node {
if node == nil {
return nil
}
visited := map[*Node]*Node{}
var clone func(n *Node) *Node
clone = func(n *Node) *Node {
if cp, ok := visited[n]; ok {
return cp
}
cp := &Node{Val: n.Val}
visited[n] = cp // register before recursing, to break cycles
for _, nb := range n.Neighbors {
cp.Neighbors = append(cp.Neighbors, clone(nb))
}
return cp
}
return clone(node)
}
Explanation: a deep copy of a graph. The key is the map from "original node → clone", registered before recursing; otherwise a cycle recurses forever.
Agent connection: this mirrors deep-copying a ToolCall chain — the audit log needs a snapshot of the whole call chain that later mutation cannot affect.
Problem 4: Word Ladder (LeetCode 127) - Hard (optional)
BFS for the shortest path: changing one letter at a time, find the minimum number of steps from beginWord to endWord.
Hint: when building adjacency over wordList, use wildcard patterns as bridge nodes: hot → *ot, h*t, ho*. This avoids the O(n²) pairwise comparison.
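A BFS sketch using the wildcard-bucket trick from the hint; it returns the step count including beginWord, or 0 when endWord is unreachable:

```go
package main

import "fmt"

func ladderLength(beginWord, endWord string, wordList []string) int {
	wordSet := make(map[string]bool, len(wordList))
	for _, w := range wordList {
		wordSet[w] = true
	}
	if !wordSet[endWord] {
		return 0
	}
	// Bucket words by wildcard patterns: "hot" -> "*ot", "h*t", "ho*".
	// Two words are neighbors iff they share a pattern.
	buckets := map[string][]string{}
	addWord := func(w string) {
		for i := range w {
			p := w[:i] + "*" + w[i+1:]
			buckets[p] = append(buckets[p], w)
		}
	}
	for _, w := range wordList {
		addWord(w)
	}
	addWord(beginWord)

	visited := map[string]bool{beginWord: true}
	queue := []string{beginWord}
	steps := 1
	for len(queue) > 0 {
		var next []string
		for _, w := range queue {
			if w == endWord {
				return steps
			}
			for i := range w {
				p := w[:i] + "*" + w[i+1:]
				for _, nb := range buckets[p] {
					if !visited[nb] {
						visited[nb] = true
						next = append(next, nb)
					}
				}
			}
		}
		queue = next
		steps++
	}
	return 0
}

func main() {
	fmt.Println(ladderLength("hit", "cog",
		[]string{"hot", "dot", "dog", "lot", "log", "cog"})) // 5
}
```

With word length L and n words, building the buckets is O(nL) patterns of length L, so preprocessing costs about O(nL²) instead of the O(n²L) pairwise comparison.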
Next Up: Day 6 Preview
Tomorrow we will:
- Implement retry/backoff (classifying transient vs. non-retryable errors)
- Make exponential backoff interact correctly with ctx.Done()
- Use idempotency keys to guarantee tickets aren't created twice
- Use sync.Map / Redis as the dedup store
Questions to prepare:
- What is the difference between context.Canceled and context.DeadlineExceeded?
- What are "at-least-once" vs. "exactly-once" semantics? Which does idempotency belong to?
- If two goroutines submit the same request concurrently, will your system create two tickets?