Week 1 Day 5: Structured Output - Socratic Method

💡 One-sentence core idea: the LLM should output not "a blob of text" but "a contract" — structured data that downstream systems can validate, audit, and consume.

Learning Objectives

  • Understand why an Agent system must use Structured Output
  • Design the AgentAnswer struct and the semantics of each field
  • Implement validation rules (logical relationships between constraints)
  • Master JSON Schema + the Response Format API
  • Learn prompt engineering that makes the LLM reliably emit valid JSON

Part 1: Problem-Driven Motivation

🤔 Question 1: Why can't we let the LLM just return a blob of text?

Guiding questions:

  1. If the Agent's answer is plain text, how do downstream systems (ticketing, audit logs, monitoring) consume it?
  2. How do you distinguish "the Agent is confident" from "the Agent is guessing"?
  3. When a bad case shows up in production, how do you trace back: which tools did the LLM use? Which documents did it cite?
  4. If the product requires "low-confidence answers must go through human review", can your system implement that?

Answers:

  • Downstream consumption: the ticketing system needs to know "should a ticket be created" and "who owns it", not a paragraph of prose
  • Validation: the system must decide "can this answer be sent to the user as-is"
  • Auditing: when something goes wrong, you need the full decision trail (reasoning + sources + tool_calls)
  • Routing: low-confidence answers go to a human; high-confidence ones can be sent automatically

You should understand: Structured Output is not "for looks" — it is the contract between the Agent system and real business systems.


🤔 Question 2: What fields should a good AgentAnswer contain?

Think: If you were a consumer of the support Agent (say, the ticketing system), what would you need the Agent to tell you?

Required fields:

Field Type Purpose
answer string Final answer shown to the user
confidence enum (low/medium/high) Drives downstream routing (human review or not)
sources []Source Cited documents, for traceability and displaying provenance
tool_calls []ToolCall Audit trail of the tools the Agent invoked (create a ticket? check permissions?)
needs_human_review bool Explicit signal: should a human review this
reasoning string The Agent's thought process, for debugging and auditing

Reflection: Why is confidence an enum rather than a float (0-1)?

  • Consistency: an LLM emitting 0.873 vs 0.874 is meaningless; discrete values are more stable
  • Routability: a downstream rule like if confidence == "low" then review is clearer than if confidence < 0.6
  • Testability: evals don't have to agonize over the difference between 0.6 and 0.61
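To make the routability point concrete, here is a minimal sketch of how a hypothetical downstream router might branch on the enum. The route names (auto_send etc.) are made up for illustration; the point is that discrete values map cleanly to branches with no threshold tuning.

```go
package main

import "fmt"

// Confidence mirrors the enum defined later in answer.go.
type Confidence string

const (
	ConfidenceLow    Confidence = "low"
	ConfidenceMedium Confidence = "medium"
	ConfidenceHigh   Confidence = "high"
)

// route is a hypothetical downstream rule: each enum value maps to
// exactly one branch; anything unexpected falls back to a human.
func route(c Confidence) string {
	switch c {
	case ConfidenceHigh:
		return "auto_send"
	case ConfidenceMedium:
		return "auto_send_with_sources"
	default:
		return "human_review"
	}
}

func main() {
	fmt.Println(route(ConfidenceLow)) // human_review
}
```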

🤔 问题3:字段之间的约束关系是什么?

Scenario: the LLM returns

{
  "answer": "According to our policy, you may apply for...",
  "confidence": "high",
  "sources": [],
  "needs_human_review": false
}

Question: Is this answer valid?

No! sources is empty yet confidence=high — the Agent is making up the answer. This is a hallucination.

Rule set:

  1. sources is empty → confidence must not be high
  2. tool_calls contains "create_ticket" → needs_human_review must be true
  3. confidence=low → needs_human_review must be true
  4. answer must not be an empty string
  5. every source must have doc_id and chunk_id

Key insight: the value of Structured Output is not just the field structure but the logical constraints between fields. LLMs violate these constraints all the time, which is why we need a validation layer.


Part 2: Hands-On Implementation

✅ Version 1: Define the AgentAnswer struct

// internal/agent/answer.go
package agent

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Confidence is the confidence-level enum
type Confidence string

const (
	ConfidenceLow    Confidence = "low"
	ConfidenceMedium Confidence = "medium"
	ConfidenceHigh   Confidence = "high"
)

// Source is a cited document reference
type Source struct {
	DocID   string `json:"doc_id"`
	ChunkID string `json:"chunk_id"`
	Title   string `json:"title,omitempty"`
	URL     string `json:"url,omitempty"`
	Snippet string `json:"snippet,omitempty"` // quoted excerpt of the original text
}

// ToolCall records a tool invocation made by the Agent
type ToolCall struct {
	Name      string          `json:"name"`
	Arguments json.RawMessage `json:"arguments"`
	Result    json.RawMessage `json:"result,omitempty"`
	Error     string          `json:"error,omitempty"`
}

// AgentAnswer is the Agent's complete structured answer
type AgentAnswer struct {
	Answer            string     `json:"answer"`
	Confidence        Confidence `json:"confidence"`
	Sources           []Source   `json:"sources"`
	ToolCalls         []ToolCall `json:"tool_calls"`
	NeedsHumanReview  bool       `json:"needs_human_review"`
	Reasoning         string     `json:"reasoning"`
}

Reflection questions:

  • Why is Sources a []Source rather than a []string? → you need doc_id + chunk_id for precise traceability
  • Why is ToolCall.Arguments a json.RawMessage? → different tools have different argument shapes, so parse lazily
  • Why is there a Reasoning field? → auditing and debugging; not shown to the user, but recorded

✅ Version 2: Validation rules

// internal/agent/validate.go
package agent

import (
	"fmt"
	"strings"
)

// Violation is a single rule violation
type Violation struct {
	Field   string `json:"field"`
	Rule    string `json:"rule"`
	Message string `json:"message"`
}

// Validate returns all violations; an empty result means the answer is valid
func (a *AgentAnswer) Validate() []Violation {
	var violations []Violation

	// Rule 1: answer must not be empty
	if strings.TrimSpace(a.Answer) == "" {
		violations = append(violations, Violation{
			Field:   "answer",
			Rule:    "required",
			Message: "answer must not be empty",
		})
	}

	// Rule 2: confidence must be one of the enum values
	switch a.Confidence {
	case ConfidenceLow, ConfidenceMedium, ConfidenceHigh:
	default:
		violations = append(violations, Violation{
			Field:   "confidence",
			Rule:    "enum",
			Message: fmt.Sprintf("confidence must be low/medium/high, got %q", a.Confidence),
		})
	}

	// Rule 3: empty sources → confidence must not be high
	if len(a.Sources) == 0 && a.Confidence == ConfidenceHigh {
		violations = append(violations, Violation{
			Field:   "confidence",
			Rule:    "consistency",
			Message: "confidence cannot be high when sources is empty (no grounding)",
		})
	}

	// Rule 4: creating a ticket → needs_human_review must be true
	for _, tc := range a.ToolCalls {
		if tc.Name == "create_ticket" && !a.NeedsHumanReview {
			violations = append(violations, Violation{
				Field:   "needs_human_review",
				Rule:    "policy",
				Message: "creating a ticket requires human review",
			})
		}
	}

	// Rule 5: confidence=low → needs_human_review must be true
	if a.Confidence == ConfidenceLow && !a.NeedsHumanReview {
		violations = append(violations, Violation{
			Field:   "needs_human_review",
			Rule:    "consistency",
			Message: "low confidence answers must be marked for review",
		})
	}

	// Rule 6: every source must have doc_id and chunk_id
	for i, src := range a.Sources {
		if src.DocID == "" || src.ChunkID == "" {
			violations = append(violations, Violation{
				Field:   fmt.Sprintf("sources[%d]", i),
				Rule:    "required",
				Message: "each source must have doc_id and chunk_id",
			})
		}
	}

	return violations
}

// IsValid is a convenience helper
func (a *AgentAnswer) IsValid() bool {
	return len(a.Validate()) == 0
}

Try it:

func TestValidate_SourcesEmptyHighConfidence(t *testing.T) {
	a := AgentAnswer{
		Answer:     "You can reset your SSO password via the portal",
		Confidence: ConfidenceHigh,
		Sources:    []Source{}, // empty!
	}
	vs := a.Validate()
	if len(vs) == 0 {
		t.Fatal("expected violations for empty sources with high confidence")
	}
}
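The homework asks for table-driven tests covering every rule. Here is a self-contained miniature of the pattern, applied to a reduced stand-in struct so it runs on its own; the real tests should of course exercise the full AgentAnswer.Validate.

```go
package main

import "fmt"

// miniAnswer is a reduced stand-in for AgentAnswer, just enough to
// demonstrate the table-driven pattern on two cross-field rules.
type miniAnswer struct {
	Confidence       string
	Sources          []string
	NeedsHumanReview bool
}

// violations applies rule 3 (empty sources can't be high) and
// rule 5 (low confidence requires review) from the rule set above.
func violations(a miniAnswer) []string {
	var vs []string
	if len(a.Sources) == 0 && a.Confidence == "high" {
		vs = append(vs, "no-grounding-high")
	}
	if a.Confidence == "low" && !a.NeedsHumanReview {
		vs = append(vs, "low-needs-review")
	}
	return vs
}

func main() {
	cases := []struct {
		name string
		in   miniAnswer
		want int // expected number of violations
	}{
		{"grounded high", miniAnswer{Confidence: "high", Sources: []string{"doc1"}}, 0},
		{"hallucinated high", miniAnswer{Confidence: "high"}, 1},
		{"low without review", miniAnswer{Confidence: "low"}, 1},
		{"low with review", miniAnswer{Confidence: "low", NeedsHumanReview: true}, 0},
	}
	for _, c := range cases {
		if got := len(violations(c.in)); got != c.want {
			fmt.Printf("FAIL %s: got %d violations, want %d\n", c.name, got, c.want)
		}
	}
	fmt.Println("all cases checked")
}
```

Each table row names the scenario, so a failing rule reads like a sentence in the test output.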

✅ Version 3: Generating the JSON Schema

Why a JSON Schema?

  • OpenAI's response_format accepts a JSON Schema and forces the LLM to emit valid JSON
  • It doubles as documentation, clearly describing the contract

// internal/agent/schema.go
package agent

import "encoding/json"

// AnswerSchema returns the schema passed to OpenAI's response_format
func AnswerSchema() json.RawMessage {
	schema := map[string]interface{}{
		"type": "object",
		"properties": map[string]interface{}{
			"answer": map[string]interface{}{
				"type":        "string",
				"description": "Final answer shown to the user.",
			},
			"confidence": map[string]interface{}{
				"type": "string",
				"enum": []string{"low", "medium", "high"},
				"description": "high=grounded in sources; medium=partial; low=guessing or no sources.",
			},
			"sources": map[string]interface{}{
				"type": "array",
				"items": map[string]interface{}{
					"type": "object",
					"properties": map[string]interface{}{
						"doc_id":   map[string]interface{}{"type": "string"},
						"chunk_id": map[string]interface{}{"type": "string"},
						"title":    map[string]interface{}{"type": "string"},
						"snippet":  map[string]interface{}{"type": "string"},
					},
					"required":             []string{"doc_id", "chunk_id"},
					"additionalProperties": false,
				},
			},
			"tool_calls": map[string]interface{}{
				"type": "array",
				"items": map[string]interface{}{
					"type": "object",
					"properties": map[string]interface{}{
						"name":      map[string]interface{}{"type": "string"},
						"arguments": map[string]interface{}{"type": "object"},
					},
					"required": []string{"name", "arguments"},
				},
			},
			"needs_human_review": map[string]interface{}{
				"type": "boolean",
			},
			"reasoning": map[string]interface{}{
				"type":        "string",
				"description": "Brief chain-of-thought, not shown to user.",
			},
		},
		"required": []string{
			"answer", "confidence", "sources",
			"tool_calls", "needs_human_review", "reasoning",
		},
		"additionalProperties": false,
	}
	b, _ := json.Marshal(schema)
	return b
}

✅ Version 4: Calling OpenAI with a Response Format

// internal/agent/generate.go
package agent

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/sashabaranov/go-openai"
)

type Generator struct {
	client *openai.Client
	model  string
}

func NewGenerator(apiKey, model string) *Generator {
	return &Generator{
		client: openai.NewClient(apiKey),
		model:  model,
	}
}

// Generate calls the LLM and parses the response into an AgentAnswer
func (g *Generator) Generate(ctx context.Context, userQuery string, retrievedChunks []Source) (*AgentAnswer, error) {
	systemPrompt := buildSystemPrompt(retrievedChunks)

	resp, err := g.client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model: g.model,
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleSystem, Content: systemPrompt},
			{Role: openai.ChatMessageRoleUser, Content: userQuery},
		},
		ResponseFormat: &openai.ChatCompletionResponseFormat{
			Type: openai.ChatCompletionResponseFormatTypeJSONObject,
		},
		Temperature: 0.2, // low temperature recommended for structured output
	})
	if err != nil {
		return nil, fmt.Errorf("openai call: %w", err)
	}

	if len(resp.Choices) == 0 {
		return nil, fmt.Errorf("no choices returned")
	}

	var answer AgentAnswer
	raw := resp.Choices[0].Message.Content
	if err := json.Unmarshal([]byte(raw), &answer); err != nil {
		return nil, fmt.Errorf("parse answer json: %w (raw=%s)", err, raw)
	}

	// validate the business constraints
	if violations := answer.Validate(); len(violations) > 0 {
		return &answer, &ValidationError{Violations: violations, Raw: raw}
	}

	return &answer, nil
}

type ValidationError struct {
	Violations []Violation
	Raw        string
}

func (e *ValidationError) Error() string {
	return fmt.Sprintf("validation failed: %d violations", len(e.Violations))
}

✅ Version 5: Prompt Engineering - Getting the LLM to Emit Valid JSON Reliably

Problem: the LLM sometimes emits syntactically valid JSON that still violates business constraints (e.g. empty sources with confidence=high). What then?

// internal/agent/prompt.go
package agent

import (
	"fmt"
	"strings"
)

const systemPromptTemplate = `You are a customer support agent. You MUST respond in strict JSON following this schema:

{
  "answer": "<final answer to user>",
  "confidence": "low" | "medium" | "high",
  "sources": [{"doc_id": "...", "chunk_id": "...", "title": "...", "snippet": "..."}],
  "tool_calls": [{"name": "...", "arguments": {...}}],
  "needs_human_review": true | false,
  "reasoning": "<brief thought>"
}

## CRITICAL RULES (violating these causes your response to be REJECTED):

1. If "sources" is empty, "confidence" MUST be "low". Never guess.
2. If you use tool "create_ticket", "needs_human_review" MUST be true.
3. If "confidence" is "low", "needs_human_review" MUST be true.
4. Every source must include both "doc_id" and "chunk_id".
5. "answer" must be non-empty.

## Grounding

Answer ONLY based on the following retrieved context. If the context does not
contain the answer, say so honestly and set confidence=low.

## Retrieved Context

%s

## Examples

Good response (grounded):
{"answer":"To reset SSO, visit the portal...","confidence":"high","sources":[{"doc_id":"faq_001","chunk_id":"faq_001_chunk_2","snippet":"Click Forgot Password"}],"tool_calls":[],"needs_human_review":false,"reasoning":"Found direct answer in FAQ."}

Good response (no grounding):
{"answer":"I don't have information about this. Let me escalate.","confidence":"low","sources":[],"tool_calls":[],"needs_human_review":true,"reasoning":"No relevant docs retrieved."}

Now respond in strict JSON.`

func buildSystemPrompt(chunks []Source) string {
	var sb strings.Builder
	for i, c := range chunks {
		sb.WriteString(fmt.Sprintf("[%d] doc_id=%s chunk_id=%s\n%s\n\n",
			i+1, c.DocID, c.ChunkID, c.Snippet))
	}
	if sb.Len() == 0 {
		sb.WriteString("(no relevant context retrieved)")
	}
	return fmt.Sprintf(systemPromptTemplate, sb.String())
}

Key techniques, summarized:

  1. Explicit schema: spelling out the full schema in the prompt improves compliance
  2. A CRITICAL RULES section: capitals and emphatic wording make the LLM take the rules more seriously
  3. Few-shot examples: one grounded example and one low-confidence example
  4. Temperature=0.2: low temperature makes the output more deterministic
  5. response_format=json_object: forces syntactically valid JSON

✅ Version 6: Repair Loop (advanced)

Problem: the LLM's output violates a constraint — can we ask it to repair its own answer?

// internal/agent/repair.go
package agent

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"strings"

	"github.com/sashabaranov/go-openai"
)

// formatViolations renders violations as a bullet list for the repair prompt
func formatViolations(vs []Violation) string {
	var sb strings.Builder
	for _, v := range vs {
		sb.WriteString(fmt.Sprintf("- %s (%s): %s\n", v.Field, v.Rule, v.Message))
	}
	return sb.String()
}

func (g *Generator) GenerateWithRepair(ctx context.Context, query string, chunks []Source) (*AgentAnswer, error) {
	answer, err := g.Generate(ctx, query, chunks)
	var vErr *ValidationError
	if !errors.As(err, &vErr) {
		return answer, err // success, or a non-validation error
	}

	// ask the LLM to repair its own output
	repairPrompt := fmt.Sprintf(
		"Your previous response violated these rules:\n%s\n\nOriginal response:\n%s\n\nFix it and return strict JSON again.",
		formatViolations(vErr.Violations), vErr.Raw,
	)

	resp, err := g.client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model: g.model,
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, Content: repairPrompt},
		},
		ResponseFormat: &openai.ChatCompletionResponseFormat{
			Type: openai.ChatCompletionResponseFormatTypeJSONObject,
		},
	})
	if err != nil {
		return nil, err
	}
	if len(resp.Choices) == 0 {
		return nil, fmt.Errorf("no choices returned")
	}

	var repaired AgentAnswer
	if err := json.Unmarshal([]byte(resp.Choices[0].Message.Content), &repaired); err != nil {
		return nil, err
	}
	return &repaired, nil
}

Note: run the repair loop at most 1-2 times, or costs explode. If it keeps failing, log the bad case and fall back (return a canned answer that escalates to a human).


Part 3: Key Concepts

1. Contract First

Structured Output is the API contract between the Agent layer and the business layer. When defining the contract, ask:

  • Who are the consumers? (ticketing system, audit log, frontend UI)
  • What fields do the consumers need to make decisions?
  • Which missing or wrong fields would cause a production incident?

2. Layered Constraints

Layer Example Checked by
Syntax Is it valid JSON? json.Unmarshal
Schema Field types, enums JSON Schema validator
Business Empty sources can't be high Custom Validate()

All three layers are required; don't skip any.
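The three layers can be sketched as a single classifier. This is a toy version with a reduced struct, just to show where each layer catches its own class of failure:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// answer is a reduced shape for the demo.
type answer struct {
	Confidence string   `json:"confidence"`
	Sources    []string `json:"sources"`
}

// checkLayers reports the first layer at which the payload fails.
func checkLayers(raw string) string {
	var a answer
	// Layer 1: syntax — is it JSON at all?
	if err := json.Unmarshal([]byte(raw), &a); err != nil {
		return "syntax"
	}
	// Layer 2: schema — types and enum membership.
	switch a.Confidence {
	case "low", "medium", "high":
	default:
		return "schema"
	}
	// Layer 3: business — cross-field constraints.
	if len(a.Sources) == 0 && a.Confidence == "high" {
		return "business"
	}
	return "ok"
}

func main() {
	fmt.Println(checkLayers(`not json`))                           // syntax
	fmt.Println(checkLayers(`{"confidence":"very high"}`))         // schema
	fmt.Println(checkLayers(`{"confidence":"high","sources":[]}`)) // business
}
```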

3. Grounding and Hallucination

  • Grounded answer: every sentence can be traced back to the original text in sources
  • Hallucinated answer: sources is empty, yet the answer is stated with full confidence

Structured Output makes grounding observable through the sources field.

4. Temperature vs Structured Output

  • Casual chat: temperature=0.7-1.0
  • Structured output: temperature=0.0-0.3
  • Avoid temperature=0; it occasionally gets stuck in repetition loops

Part 4: Self-Check

Before running anything, ask yourself:

  • Can I say who consumes each field of AgentAnswer?
  • Why is confidence an enum rather than a float?
  • Write down 3 cross-field constraint rules
  • What is the difference between response_format and a JSON Schema?
  • Why can't the repair loop retry indefinitely?
  • If the LLM returns valid JSON that violates a business rule, should the error surface to the user or be swallowed?

Part 5: Homework

Task 1: Implement AgentAnswer + Validate

  • Complete internal/agent/answer.go and validate.go
  • Write at least 5 table-driven tests covering every rule
  • go test ./internal/agent/... all green

Task 2: Wire in a real LLM

  • Implement Generator.Generate using response_format=json_object
  • Manually test with 5 sample questions (2 of which should come back with low confidence)
  • Record 3 bad cases: which rule did the LLM violate?

Task 3: Add the Repair Loop

  • Implement GenerateWithRepair, with at most 1 repair attempt
  • Measure the repair trigger rate and success rate

Task 4: Open questions

  • If you add a field suggested_actions []string, should it get validation rules?
  • The downstream ticketing system requires a ticket_id, but the Agent can't always produce one — how should the schema handle that?

Part 6: FAQ

Q1: Why not generate the JSON Schema directly from Go struct tags?

A: You can, but native Go has no support for enums, cross-field constraints, and the like. Options:

  • github.com/invopop/jsonschema
  • A hand-written schema (clearer when the field count is small)

For production projects, hand-writing is recommended: the schema is a contract and should be controlled explicitly, not auto-generated.


Q2: Who decides confidence? Is LLM self-assessment reliable?

A: LLM self-assessment skews optimistic — a known problem. Improvements:

  1. Give explicit criteria in the prompt ("high = can quote the source directly; medium = synthesized from multiple sources; low = speculation")
  2. During evaluation, compare LLM self-assessment against human labels and calibrate the prompt
  3. For critical scenarios (money, permissions), force human review regardless of self-reported confidence

Q3: How long should a source snippet be?

A: A 50-200 character excerpt of the original text. Too short and it can't be traced; too long and it pollutes the response. The frontend can display it directly as provenance.
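If you enforce the length cap in code as well as in the prompt, truncate by runes rather than bytes, so multi-byte characters (e.g. Chinese text) are never split mid-character. A small sketch; the 200-rune cap matches the guideline above:

```go
package main

import "fmt"

// truncateSnippet keeps at most max runes (not bytes), appending an
// ellipsis when the snippet was cut.
func truncateSnippet(s string, max int) string {
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	return string(r[:max]) + "…"
}

func main() {
	fmt.Println(truncateSnippet("Click Forgot Password on the portal", 10)) // Click Forg…
}
```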


Q4: What if the LLM's JSON is truncated because it ran too long?

A:

  1. Set max_tokens with enough headroom
  2. Detect finish_reason=length; if hit, report an error and return the fallback
  3. Cap the reasoning field's length (add "reasoning must be ≤100 characters" to the prompt)

Q5: Why does ToolCall record both Arguments and Result?

A: Audit and replay. When something goes wrong you need to reconstruct the full chain: "the Agent called create_ticket(title=X) and it returned ticket_id=123".


Companion Algorithm Problems

Today's algorithm theme: grid/graph BFS-DFS — the same traversal mindset used when walking a retrieval graph in an Agent system.

Problem 1: Number of Islands (LeetCode 200) - Medium

// Given a 2D grid of '1' (land) and '0' (water), return the number of islands
func numIslands(grid [][]byte) int {
	if len(grid) == 0 {
		return 0
	}
	m, n := len(grid), len(grid[0])
	count := 0

	var dfs func(i, j int)
	dfs = func(i, j int) {
		if i < 0 || i >= m || j < 0 || j >= n || grid[i][j] != '1' {
			return
		}
		grid[i][j] = '0' // mark visited
		dfs(i+1, j)
		dfs(i-1, j)
		dfs(i, j+1)
		dfs(i, j-1)
	}

	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			if grid[i][j] == '1' {
				count++
				dfs(i, j)
			}
		}
	}
	return count
}

Walkthrough: the classic DFS template. Setting grid[i][j]='0' marks cells in place to avoid revisiting. Time O(mn), space O(mn) for the recursion stack.

Follow-up: What if you're not allowed to modify the grid? → use a visited [][]bool
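The follow-up can be answered with an iterative BFS plus a visited matrix, which leaves the grid untouched and also avoids deep recursion stacks:

```go
package main

import "fmt"

// numIslandsBFS counts islands without modifying the grid, using a
// visited matrix and an explicit queue.
func numIslandsBFS(grid [][]byte) int {
	if len(grid) == 0 {
		return 0
	}
	m, n := len(grid), len(grid[0])
	visited := make([][]bool, m)
	for i := range visited {
		visited[i] = make([]bool, n)
	}
	dirs := [4][2]int{{1, 0}, {-1, 0}, {0, 1}, {0, -1}}
	count := 0
	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			if grid[i][j] != '1' || visited[i][j] {
				continue
			}
			count++
			// flood the whole component from (i, j)
			queue := [][2]int{{i, j}}
			visited[i][j] = true
			for len(queue) > 0 {
				cur := queue[0]
				queue = queue[1:]
				for _, d := range dirs {
					x, y := cur[0]+d[0], cur[1]+d[1]
					if x >= 0 && x < m && y >= 0 && y < n && grid[x][y] == '1' && !visited[x][y] {
						visited[x][y] = true
						queue = append(queue, [2]int{x, y})
					}
				}
			}
		}
	}
	return count
}

func main() {
	grid := [][]byte{
		[]byte("11000"),
		[]byte("11000"),
		[]byte("00100"),
		[]byte("00011"),
	}
	fmt.Println(numIslandsBFS(grid)) // 3
}
```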


题2:Max Area of Island (LeetCode 695) - Medium

func maxAreaOfIsland(grid [][]int) int {
	if len(grid) == 0 {
		return 0
	}
	m, n := len(grid), len(grid[0])
	maxArea := 0

	var dfs func(i, j int) int
	dfs = func(i, j int) int {
		if i < 0 || i >= m || j < 0 || j >= n || grid[i][j] != 1 {
			return 0
		}
		grid[i][j] = 0
		return 1 + dfs(i+1, j) + dfs(i-1, j) + dfs(i, j+1) + dfs(i, j-1)
	}

	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			if grid[i][j] == 1 {
				if a := dfs(i, j); a > maxArea {
					maxArea = a
				}
			}
		}
	}
	return maxArea
}

Walkthrough: compared with Number of Islands, the DFS now returns the size of the current connected component.


Problem 3: Clone Graph (LeetCode 133) - Medium

type Node struct {
	Val       int
	Neighbors []*Node
}

func cloneGraph(node *Node) *Node {
	if node == nil {
		return nil
	}
	visited := map[*Node]*Node{}

	var clone func(n *Node) *Node
	clone = func(n *Node) *Node {
		if cp, ok := visited[n]; ok {
			return cp
		}
		cp := &Node{Val: n.Val}
		visited[n] = cp // register before recursing, or cycles loop forever
		for _, nb := range n.Neighbors {
			cp.Neighbors = append(cp.Neighbors, clone(nb))
		}
		return cp
	}
	return clone(node)
}

Walkthrough: deep copy of a graph. The key is a map from original node → clone, registered before recursing — otherwise a cycle would recurse forever.

Agent connection: this mirrors deep-copying a ToolCall chain — the audit log needs a snapshot of the whole call chain that later mutations cannot affect.


Problem 4: Word Ladder (LeetCode 127) - Hard (optional)

BFS shortest path: change one letter per step; find the minimum number of steps from beginWord to endWord.

Hint: when building adjacency over wordList, use wildcard patterns — hot → *ot, h*t, ho* — as bridge nodes to avoid the O(n²) pairwise comparison.
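A sketch of the wildcard-bucket BFS described in the hint. It returns the number of words in the shortest chain (including beginWord and endWord), or 0 if endWord is unreachable:

```go
package main

import "fmt"

// ladderLength buckets words by wildcard patterns (hot -> *ot, h*t, ho*)
// and runs a level-by-level BFS over those buckets.
func ladderLength(beginWord, endWord string, wordList []string) int {
	found := false
	buckets := map[string][]string{}
	addWord := func(w string) {
		for i := 0; i < len(w); i++ {
			p := w[:i] + "*" + w[i+1:]
			buckets[p] = append(buckets[p], w)
		}
	}
	for _, w := range wordList {
		if w == endWord {
			found = true
		}
		addWord(w)
	}
	if !found {
		return 0
	}
	addWord(beginWord)

	visited := map[string]bool{beginWord: true}
	queue := []string{beginWord}
	steps := 1
	for len(queue) > 0 {
		var next []string
		for _, w := range queue {
			if w == endWord {
				return steps
			}
			// neighbors share a wildcard pattern with w
			for i := 0; i < len(w); i++ {
				p := w[:i] + "*" + w[i+1:]
				for _, nb := range buckets[p] {
					if !visited[nb] {
						visited[nb] = true
						next = append(next, nb)
					}
				}
			}
		}
		queue = next
		steps++
	}
	return 0
}

func main() {
	fmt.Println(ladderLength("hit", "cog", []string{"hot", "dot", "dog", "lot", "log", "cog"})) // 5
}
```

Bucketing costs O(n·L) instead of the O(n²·L) pairwise word comparison, where L is the word length.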


Next: Day 6 Preview

Tomorrow we will:

  1. Implement retry/backoff (classifying transient vs non-retryable errors)
  2. Make exponential backoff interact correctly with ctx.Done()
  3. Use idempotency keys to guarantee a ticket is never created twice
  4. Use sync.Map / Redis as the dedup store

Questions to prepare:

  • What's the difference between context.Canceled and context.DeadlineExceeded?
  • What is "at-least-once" vs "exactly-once"? Which one does idempotency belong to?
  • If two goroutines submit the same request simultaneously, will your system create two tickets?