Week 1 Day 5: Structured Output - Socratic Method

💡 One-sentence core idea: the LLM should output not "a blob of text" but "a contract" — structured data that downstream systems can validate, audit, and consume.

Learning Objectives

  • Understand why an Agent system must use Structured Output
  • Design the AgentAnswer struct and the semantics of each field
  • Implement validation rules (logical relationships between constraints)
  • Master JSON Schema + the Response Format API
  • Learn prompt engineering that makes the LLM reliably emit valid JSON

Part 1: Problem-Driven Motivation

🤔 Question 1: Why can't we let the LLM just return a blob of text?

Guiding questions:

  1. If the Agent's answer is plain text, how do downstream systems (ticketing, audit logs, monitoring) consume it?
  2. How do you distinguish "the Agent is confident" from "the Agent is guessing"?
  3. When a bad case shows up in production, how do you trace back: which tools did the LLM use? Which documents did it cite?
  4. If the product requires "low-confidence answers must go through human review", can your system implement that?

Answers:

  • Downstream consumption: the ticketing system needs to know "should a ticket be created" and "who owns it", not a paragraph of prose
  • Validation: the system must decide "can this answer be sent to the user as-is"
  • Auditing: when something goes wrong, you need the full decision trail (reasoning + sources + tool_calls)
  • Routing: low-confidence answers go to a human; high-confidence ones can be sent automatically

You should understand: Structured Output is not "for looks" — it is the contract between the Agent system and real business systems.


🤔 Question 2: What fields should a good AgentAnswer contain?

Think: If you were a consumer of the support Agent (say, the ticketing system), what would you need the Agent to tell you?

Required fields:

Field Type Purpose
answer string Final answer shown to the user
confidence enum (low/medium/high) Drives downstream routing (human review or not)
sources []Source Cited documents, for traceability and displaying provenance
tool_calls []ToolCall Audit trail of the tools the Agent invoked (create a ticket? check permissions?)
needs_human_review bool Explicit signal: should a human review this
reasoning string The Agent's thought process, for debugging and auditing

Reflection: Why is confidence an enum rather than a float (0-1)?

  • Consistency: an LLM emitting 0.873 vs 0.874 is meaningless; discrete values are more stable
  • Routability: a downstream rule like if confidence == "low" then review is clearer than if confidence < 0.6
  • Testability: evals don't have to agonize over the difference between 0.6 and 0.61
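To make the routability point concrete, here is a minimal sketch of how a hypothetical downstream router might branch on the enum. The route names (auto_send etc.) are made up for illustration; the point is that discrete values map cleanly to branches with no threshold tuning.

```go
package main

import "fmt"

// Confidence mirrors the enum defined later in answer.go.
type Confidence string

const (
	ConfidenceLow    Confidence = "low"
	ConfidenceMedium Confidence = "medium"
	ConfidenceHigh   Confidence = "high"
)

// route is a hypothetical downstream rule: each enum value maps to
// exactly one branch; anything unexpected falls back to a human.
func route(c Confidence) string {
	switch c {
	case ConfidenceHigh:
		return "auto_send"
	case ConfidenceMedium:
		return "auto_send_with_sources"
	default:
		return "human_review"
	}
}

func main() {
	fmt.Println(route(ConfidenceLow)) // human_review
}
```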

🤔 问题3:字段之间的约束关系是什么?

Scenario: the LLM returns

{
  "answer": "According to our policy, you may apply for...",
  "confidence": "high",
  "sources": [],
  "needs_human_review": false
}

Question: Is this answer valid?

No! sources is empty yet confidence=high — the Agent is making up the answer. This is a hallucination.

Rule set:

  1. sources is empty → confidence must not be high
  2. tool_calls contains "create_ticket" → needs_human_review must be true
  3. confidence=low → needs_human_review must be true
  4. answer must not be an empty string
  5. every source must have doc_id and chunk_id

Key insight: the value of Structured Output is not just the field structure but the logical constraints between fields. LLMs violate these constraints all the time, which is why we need a validation layer.


Part 2: Hands-On Implementation

✅ Version 1: Define the AgentAnswer struct

// internal/agent/answer.go
package agent

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Confidence is the confidence-level enum
type Confidence string

const (
	ConfidenceLow    Confidence = "low"
	ConfidenceMedium Confidence = "medium"
	ConfidenceHigh   Confidence = "high"
)

// Source is a cited document reference
type Source struct {
	DocID   string `json:"doc_id"`
	ChunkID string `json:"chunk_id"`
	Title   string `json:"title,omitempty"`
	URL     string `json:"url,omitempty"`
	Snippet string `json:"snippet,omitempty"` // quoted excerpt of the original text
}

// ToolCall records a tool invocation made by the Agent
type ToolCall struct {
	Name      string          `json:"name"`
	Arguments json.RawMessage `json:"arguments"`
	Result    json.RawMessage `json:"result,omitempty"`
	Error     string          `json:"error,omitempty"`
}

// AgentAnswer is the Agent's complete structured answer
type AgentAnswer struct {
	Answer            string     `json:"answer"`
	Confidence        Confidence `json:"confidence"`
	Sources           []Source   `json:"sources"`
	ToolCalls         []ToolCall `json:"tool_calls"`
	NeedsHumanReview  bool       `json:"needs_human_review"`
	Reasoning         string     `json:"reasoning"`
}

Reflection questions:

  • Why is Sources a []Source rather than a []string? → you need doc_id + chunk_id for precise traceability
  • Why is ToolCall.Arguments a json.RawMessage? → different tools have different argument shapes, so parse lazily
  • Why is there a Reasoning field? → auditing and debugging; not shown to the user, but recorded

✅ Version 2: Validation rules

// internal/agent/validate.go
package agent

import (
	"fmt"
	"strings"
)

// Violation is a single rule violation
type Violation struct {
	Field   string `json:"field"`
	Rule    string `json:"rule"`
	Message string `json:"message"`
}

// Validate returns all violations; an empty result means the answer is valid
func (a *AgentAnswer) Validate() []Violation {
	var violations []Violation

	// Rule 1: answer must not be empty
	if strings.TrimSpace(a.Answer) == "" {
		violations = append(violations, Violation{
			Field:   "answer",
			Rule:    "required",
			Message: "answer must not be empty",
		})
	}

	// Rule 2: confidence must be one of the enum values
	switch a.Confidence {
	case ConfidenceLow, ConfidenceMedium, ConfidenceHigh:
	default:
		violations = append(violations, Violation{
			Field:   "confidence",
			Rule:    "enum",
			Message: fmt.Sprintf("confidence must be low/medium/high, got %q", a.Confidence),
		})
	}

	// Rule 3: empty sources → confidence must not be high
	if len(a.Sources) == 0 && a.Confidence == ConfidenceHigh {
		violations = append(violations, Violation{
			Field:   "confidence",
			Rule:    "consistency",
			Message: "confidence cannot be high when sources is empty (no grounding)",
		})
	}

	// Rule 4: creating a ticket → needs_human_review must be true
	for _, tc := range a.ToolCalls {
		if tc.Name == "create_ticket" && !a.NeedsHumanReview {
			violations = append(violations, Violation{
				Field:   "needs_human_review",
				Rule:    "policy",
				Message: "creating a ticket requires human review",
			})
		}
	}

	// Rule 5: confidence=low → needs_human_review must be true
	if a.Confidence == ConfidenceLow && !a.NeedsHumanReview {
		violations = append(violations, Violation{
			Field:   "needs_human_review",
			Rule:    "consistency",
			Message: "low confidence answers must be marked for review",
		})
	}

	// Rule 6: every source must have doc_id and chunk_id
	for i, src := range a.Sources {
		if src.DocID == "" || src.ChunkID == "" {
			violations = append(violations, Violation{
				Field:   fmt.Sprintf("sources[%d]", i),
				Rule:    "required",
				Message: "each source must have doc_id and chunk_id",
			})
		}
	}

	return violations
}

// IsValid is a convenience helper
func (a *AgentAnswer) IsValid() bool {
	return len(a.Validate()) == 0
}

Try it:

func TestValidate_SourcesEmptyHighConfidence(t *testing.T) {
	a := AgentAnswer{
		Answer:     "You can reset your SSO password via the portal",
		Confidence: ConfidenceHigh,
		Sources:    []Source{}, // empty!
	}
	vs := a.Validate()
	if len(vs) == 0 {
		t.Fatal("expected violations for empty sources with high confidence")
	}
}
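The homework asks for table-driven tests covering every rule. Here is a self-contained miniature of the pattern, applied to a reduced stand-in struct so it runs on its own; the real tests should of course exercise the full AgentAnswer.Validate.

```go
package main

import "fmt"

// miniAnswer is a reduced stand-in for AgentAnswer, just enough to
// demonstrate the table-driven pattern on two cross-field rules.
type miniAnswer struct {
	Confidence       string
	Sources          []string
	NeedsHumanReview bool
}

// violations applies rule 3 (empty sources can't be high) and
// rule 5 (low confidence requires review) from the rule set above.
func violations(a miniAnswer) []string {
	var vs []string
	if len(a.Sources) == 0 && a.Confidence == "high" {
		vs = append(vs, "no-grounding-high")
	}
	if a.Confidence == "low" && !a.NeedsHumanReview {
		vs = append(vs, "low-needs-review")
	}
	return vs
}

func main() {
	cases := []struct {
		name string
		in   miniAnswer
		want int // expected number of violations
	}{
		{"grounded high", miniAnswer{Confidence: "high", Sources: []string{"doc1"}}, 0},
		{"hallucinated high", miniAnswer{Confidence: "high"}, 1},
		{"low without review", miniAnswer{Confidence: "low"}, 1},
		{"low with review", miniAnswer{Confidence: "low", NeedsHumanReview: true}, 0},
	}
	for _, c := range cases {
		if got := len(violations(c.in)); got != c.want {
			fmt.Printf("FAIL %s: got %d violations, want %d\n", c.name, got, c.want)
		}
	}
	fmt.Println("all cases checked")
}
```

Each table row names the scenario, so a failing rule reads like a sentence in the test output.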

✅ Version 3: Generating the JSON Schema

Why a JSON Schema?

  • OpenAI's response_format accepts a JSON Schema and forces the LLM to emit valid JSON
  • It doubles as documentation, clearly describing the contract

// internal/agent/schema.go
package agent

import "encoding/json"

// AnswerSchema returns the schema passed to OpenAI's response_format
func AnswerSchema() json.RawMessage {
	schema := map[string]interface{}{
		"type": "object",
		"properties": map[string]interface{}{
			"answer": map[string]interface{}{
				"type":        "string",
				"description": "Final answer shown to the user.",
			},
			"confidence": map[string]interface{}{
				"type": "string",
				"enum": []string{"low", "medium", "high"},
				"description": "high=grounded in sources; medium=partial; low=guessing or no sources.",
			},
			"sources": map[string]interface{}{
				"type": "array",
				"items": map[string]interface{}{
					"type": "object",
					"properties": map[string]interface{}{
						"doc_id":   map[string]interface{}{"type": "string"},
						"chunk_id": map[string]interface{}{"type": "string"},
						"title":    map[string]interface{}{"type": "string"},
						"snippet":  map[string]interface{}{"type": "string"},
					},
					"required":             []string{"doc_id", "chunk_id"},
					"additionalProperties": false,
				},
			},
			"tool_calls": map[string]interface{}{
				"type": "array",
				"items": map[string]interface{}{
					"type": "object",
					"properties": map[string]interface{}{
						"name":      map[string]interface{}{"type": "string"},
						"arguments": map[string]interface{}{"type": "object"},
					},
					"required": []string{"name", "arguments"},
				},
			},
			"needs_human_review": map[string]interface{}{
				"type": "boolean",
			},
			"reasoning": map[string]interface{}{
				"type":        "string",
				"description": "Brief chain-of-thought, not shown to user.",
			},
		},
		"required": []string{
			"answer", "confidence", "sources",
			"tool_calls", "needs_human_review", "reasoning",
		},
		"additionalProperties": false,
	}
	b, _ := json.Marshal(schema)
	return b
}

✅ Version 4: Calling OpenAI with a Response Format

// internal/agent/generate.go
package agent

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/sashabaranov/go-openai"
)

type Generator struct {
	client *openai.Client
	model  string
}

func NewGenerator(apiKey, model string) *Generator {
	return &Generator{
		client: openai.NewClient(apiKey),
		model:  model,
	}
}

// Generate calls the LLM and parses the response into an AgentAnswer
func (g *Generator) Generate(ctx context.Context, userQuery string, retrievedChunks []Source) (*AgentAnswer, error) {
	systemPrompt := buildSystemPrompt(retrievedChunks)

	resp, err := g.client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model: g.model,
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleSystem, Content: systemPrompt},
			{Role: openai.ChatMessageRoleUser, Content: userQuery},
		},
		ResponseFormat: &openai.ChatCompletionResponseFormat{
			Type: openai.ChatCompletionResponseFormatTypeJSONObject,
		},
		Temperature: 0.2, // low temperature recommended for structured output
	})
	if err != nil {
		return nil, fmt.Errorf("openai call: %w", err)
	}

	if len(resp.Choices) == 0 {
		return nil, fmt.Errorf("no choices returned")
	}

	var answer AgentAnswer
	raw := resp.Choices[0].Message.Content
	if err := json.Unmarshal([]byte(raw), &answer); err != nil {
		return nil, fmt.Errorf("parse answer json: %w (raw=%s)", err, raw)
	}

	// validate the business constraints
	if violations := answer.Validate(); len(violations) > 0 {
		return &answer, &ValidationError{Violations: violations, Raw: raw}
	}

	return &answer, nil
}

type ValidationError struct {
	Violations []Violation
	Raw        string
}

func (e *ValidationError) Error() string {
	return fmt.Sprintf("validation failed: %d violations", len(e.Violations))
}

✅ Version 5: Prompt Engineering - Getting the LLM to Emit Valid JSON Reliably

Problem: the LLM sometimes emits syntactically valid JSON that still violates business constraints (e.g. empty sources with confidence=high). What then?

// internal/agent/prompt.go
package agent

import (
	"fmt"
	"strings"
)

const systemPromptTemplate = `You are a customer support agent. You MUST respond in strict JSON following this schema:

{
  "answer": "<final answer to user>",
  "confidence": "low" | "medium" | "high",
  "sources": [{"doc_id": "...", "chunk_id": "...", "title": "...", "snippet": "..."}],
  "tool_calls": [{"name": "...", "arguments": {...}}],
  "needs_human_review": true | false,
  "reasoning": "<brief thought>"
}

## CRITICAL RULES (violating these causes your response to be REJECTED):

1. If "sources" is empty, "confidence" MUST be "low". Never guess.
2. If you use tool "create_ticket", "needs_human_review" MUST be true.
3. If "confidence" is "low", "needs_human_review" MUST be true.
4. Every source must include both "doc_id" and "chunk_id".
5. "answer" must be non-empty.

## Grounding

Answer ONLY based on the following retrieved context. If the context does not
contain the answer, say so honestly and set confidence=low.

## Retrieved Context

%s

## Examples

Good response (grounded):
{"answer":"To reset SSO, visit the portal...","confidence":"high","sources":[{"doc_id":"faq_001","chunk_id":"faq_001_chunk_2","snippet":"Click Forgot Password"}],"tool_calls":[],"needs_human_review":false,"reasoning":"Found direct answer in FAQ."}

Good response (no grounding):
{"answer":"I don't have information about this. Let me escalate.","confidence":"low","sources":[],"tool_calls":[],"needs_human_review":true,"reasoning":"No relevant docs retrieved."}

Now respond in strict JSON.`

func buildSystemPrompt(chunks []Source) string {
	var sb strings.Builder
	for i, c := range chunks {
		sb.WriteString(fmt.Sprintf("[%d] doc_id=%s chunk_id=%s\n%s\n\n",
			i+1, c.DocID, c.ChunkID, c.Snippet))
	}
	if sb.Len() == 0 {
		sb.WriteString("(no relevant context retrieved)")
	}
	return fmt.Sprintf(systemPromptTemplate, sb.String())
}

Key techniques, summarized:

  1. Explicit schema: spelling out the full schema in the prompt improves compliance
  2. A CRITICAL RULES section: capitals and emphatic wording make the LLM take the rules more seriously
  3. Few-shot examples: one grounded example and one low-confidence example
  4. Temperature=0.2: low temperature makes the output more deterministic
  5. response_format=json_object: forces syntactically valid JSON

✅ Version 6: Repair Loop (advanced)

Problem: the LLM's output violates a constraint — can we ask it to repair its own answer?

// internal/agent/repair.go
package agent

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"strings"

	"github.com/sashabaranov/go-openai"
)

// formatViolations renders violations as a bullet list for the repair prompt
func formatViolations(vs []Violation) string {
	var sb strings.Builder
	for _, v := range vs {
		sb.WriteString(fmt.Sprintf("- %s (%s): %s\n", v.Field, v.Rule, v.Message))
	}
	return sb.String()
}

func (g *Generator) GenerateWithRepair(ctx context.Context, query string, chunks []Source) (*AgentAnswer, error) {
	answer, err := g.Generate(ctx, query, chunks)
	var vErr *ValidationError
	if !errors.As(err, &vErr) {
		return answer, err // success, or a non-validation error
	}

	// ask the LLM to repair its own output
	repairPrompt := fmt.Sprintf(
		"Your previous response violated these rules:\n%s\n\nOriginal response:\n%s\n\nFix it and return strict JSON again.",
		formatViolations(vErr.Violations), vErr.Raw,
	)

	resp, err := g.client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model: g.model,
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, Content: repairPrompt},
		},
		ResponseFormat: &openai.ChatCompletionResponseFormat{
			Type: openai.ChatCompletionResponseFormatTypeJSONObject,
		},
	})
	if err != nil {
		return nil, err
	}
	if len(resp.Choices) == 0 {
		return nil, fmt.Errorf("no choices returned")
	}

	var repaired AgentAnswer
	if err := json.Unmarshal([]byte(resp.Choices[0].Message.Content), &repaired); err != nil {
		return nil, err
	}
	return &repaired, nil
}

Note: run the repair loop at most 1-2 times, or costs explode. If it keeps failing, log the bad case and fall back (return a canned answer that escalates to a human).


Part 3: Key Concepts

1. Contract First

Structured Output is the API contract between the Agent layer and the business layer. When defining the contract, ask:

  • Who are the consumers? (ticketing system, audit log, frontend UI)
  • What fields do the consumers need to make decisions?
  • Which missing or wrong fields would cause a production incident?

2. Layered Constraints

Layer Example Checked by
Syntax Is it valid JSON? json.Unmarshal
Schema Field types, enums JSON Schema validator
Business Empty sources can't be high Custom Validate()

All three layers are required; don't skip any.
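The three layers can be sketched as a single classifier. This is a toy version with a reduced struct, just to show where each layer catches its own class of failure:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// answer is a reduced shape for the demo.
type answer struct {
	Confidence string   `json:"confidence"`
	Sources    []string `json:"sources"`
}

// checkLayers reports the first layer at which the payload fails.
func checkLayers(raw string) string {
	var a answer
	// Layer 1: syntax — is it JSON at all?
	if err := json.Unmarshal([]byte(raw), &a); err != nil {
		return "syntax"
	}
	// Layer 2: schema — types and enum membership.
	switch a.Confidence {
	case "low", "medium", "high":
	default:
		return "schema"
	}
	// Layer 3: business — cross-field constraints.
	if len(a.Sources) == 0 && a.Confidence == "high" {
		return "business"
	}
	return "ok"
}

func main() {
	fmt.Println(checkLayers(`not json`))                           // syntax
	fmt.Println(checkLayers(`{"confidence":"very high"}`))         // schema
	fmt.Println(checkLayers(`{"confidence":"high","sources":[]}`)) // business
}
```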

3. Grounding and Hallucination

  • Grounded answer: every sentence can be traced back to the original text in sources
  • Hallucinated answer: sources is empty, yet the answer is stated with full confidence

Structured Output makes grounding observable through the sources field.

4. Temperature vs Structured Output

  • Casual chat: temperature=0.7-1.0
  • Structured output: temperature=0.0-0.3
  • Avoid temperature=0; it occasionally gets stuck in repetition loops

Part 4: Self-Check

Before running anything, ask yourself:

  • Can I say who consumes each field of AgentAnswer?
  • Why is confidence an enum rather than a float?
  • Write down 3 cross-field constraint rules
  • What is the difference between response_format and a JSON Schema?
  • Why can't the repair loop retry indefinitely?
  • If the LLM returns valid JSON that violates a business rule, should the error surface to the user or be swallowed?

Part 5: Homework

Task 1: Implement AgentAnswer + Validate

  • Complete internal/agent/answer.go and validate.go
  • Write at least 5 table-driven tests covering every rule
  • go test ./internal/agent/... all green

Task 2: Wire in a real LLM

  • Implement Generator.Generate using response_format=json_object
  • Manually test with 5 sample questions (2 of which should come back with low confidence)
  • Record 3 bad cases: which rule did the LLM violate?

Task 3: Add the Repair Loop

  • Implement GenerateWithRepair, with at most 1 repair attempt
  • Measure the repair trigger rate and success rate

Task 4: Open questions

  • If you add a field suggested_actions []string, should it get validation rules?
  • The downstream ticketing system requires a ticket_id, but the Agent can't always produce one — how should the schema handle that?

Part 6: FAQ

Q1: Why not generate the JSON Schema directly from Go struct tags?

A: You can, but native Go has no support for enums, cross-field constraints, and the like. Options:

  • github.com/invopop/jsonschema
  • A hand-written schema (clearer when the field count is small)

For production projects, hand-writing is recommended: the schema is a contract and should be controlled explicitly, not auto-generated.


Q2: Who decides confidence? Is LLM self-assessment reliable?

A: LLM self-assessment skews optimistic — a known problem. Improvements:

  1. Give explicit criteria in the prompt ("high = can quote the source directly; medium = synthesized from multiple sources; low = speculation")
  2. During evaluation, compare LLM self-assessment against human labels and calibrate the prompt
  3. For critical scenarios (money, permissions), force human review regardless of self-reported confidence

Q3: How long should a source snippet be?

A: A 50-200 character excerpt of the original text. Too short and it can't be traced; too long and it pollutes the response. The frontend can display it directly as provenance.
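If you enforce the length cap in code as well as in the prompt, truncate by runes rather than bytes, so multi-byte characters (e.g. Chinese text) are never split mid-character. A small sketch; the 200-rune cap matches the guideline above:

```go
package main

import "fmt"

// truncateSnippet keeps at most max runes (not bytes), appending an
// ellipsis when the snippet was cut.
func truncateSnippet(s string, max int) string {
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	return string(r[:max]) + "…"
}

func main() {
	fmt.Println(truncateSnippet("Click Forgot Password on the portal", 10)) // Click Forg…
}
```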


Q4: What if the LLM's JSON is truncated because it ran too long?

A:

  1. Set max_tokens with enough headroom
  2. Detect finish_reason=length; if hit, report an error and return the fallback
  3. Cap the reasoning field's length (add "reasoning must be ≤100 characters" to the prompt)

Q5: Why does ToolCall record both Arguments and Result?

A: Audit and replay. When something goes wrong you need to reconstruct the full chain: "the Agent called create_ticket(title=X) and it returned ticket_id=123".


Companion Algorithm Problems

Today's algorithm theme: grid/graph BFS-DFS — the same traversal mindset used when walking a retrieval graph in an Agent system.

Problem 1: Number of Islands (LeetCode 200) - Medium

// Given a 2D grid of '1' (land) and '0' (water), return the number of islands
func numIslands(grid [][]byte) int {
	if len(grid) == 0 {
		return 0
	}
	m, n := len(grid), len(grid[0])
	count := 0

	var dfs func(i, j int)
	dfs = func(i, j int) {
		if i < 0 || i >= m || j < 0 || j >= n || grid[i][j] != '1' {
			return
		}
		grid[i][j] = '0' // mark visited
		dfs(i+1, j)
		dfs(i-1, j)
		dfs(i, j+1)
		dfs(i, j-1)
	}

	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			if grid[i][j] == '1' {
				count++
				dfs(i, j)
			}
		}
	}
	return count
}

Walkthrough: the classic DFS template. Setting grid[i][j]='0' marks cells in place to avoid revisiting. Time O(mn), space O(mn) for the recursion stack.

Follow-up: What if you're not allowed to modify the grid? → use a visited [][]bool
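The follow-up can be answered with an iterative BFS plus a visited matrix, which leaves the grid untouched and also avoids deep recursion stacks:

```go
package main

import "fmt"

// numIslandsBFS counts islands without modifying the grid, using a
// visited matrix and an explicit queue.
func numIslandsBFS(grid [][]byte) int {
	if len(grid) == 0 {
		return 0
	}
	m, n := len(grid), len(grid[0])
	visited := make([][]bool, m)
	for i := range visited {
		visited[i] = make([]bool, n)
	}
	dirs := [4][2]int{{1, 0}, {-1, 0}, {0, 1}, {0, -1}}
	count := 0
	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			if grid[i][j] != '1' || visited[i][j] {
				continue
			}
			count++
			// flood the whole component from (i, j)
			queue := [][2]int{{i, j}}
			visited[i][j] = true
			for len(queue) > 0 {
				cur := queue[0]
				queue = queue[1:]
				for _, d := range dirs {
					x, y := cur[0]+d[0], cur[1]+d[1]
					if x >= 0 && x < m && y >= 0 && y < n && grid[x][y] == '1' && !visited[x][y] {
						visited[x][y] = true
						queue = append(queue, [2]int{x, y})
					}
				}
			}
		}
	}
	return count
}

func main() {
	grid := [][]byte{
		[]byte("11000"),
		[]byte("11000"),
		[]byte("00100"),
		[]byte("00011"),
	}
	fmt.Println(numIslandsBFS(grid)) // 3
}
```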


题2:Max Area of Island (LeetCode 695) - Medium

func maxAreaOfIsland(grid [][]int) int {
	if len(grid) == 0 {
		return 0
	}
	m, n := len(grid), len(grid[0])
	maxArea := 0

	var dfs func(i, j int) int
	dfs = func(i, j int) int {
		if i < 0 || i >= m || j < 0 || j >= n || grid[i][j] != 1 {
			return 0
		}
		grid[i][j] = 0
		return 1 + dfs(i+1, j) + dfs(i-1, j) + dfs(i, j+1) + dfs(i, j-1)
	}

	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			if grid[i][j] == 1 {
				if a := dfs(i, j); a > maxArea {
					maxArea = a
				}
			}
		}
	}
	return maxArea
}

Walkthrough: compared with Number of Islands, the DFS now returns the size of the current connected component.


Problem 3: Clone Graph (LeetCode 133) - Medium

type Node struct {
	Val       int
	Neighbors []*Node
}

func cloneGraph(node *Node) *Node {
	if node == nil {
		return nil
	}
	visited := map[*Node]*Node{}

	var clone func(n *Node) *Node
	clone = func(n *Node) *Node {
		if cp, ok := visited[n]; ok {
			return cp
		}
		cp := &Node{Val: n.Val}
		visited[n] = cp // register before recursing, or cycles loop forever
		for _, nb := range n.Neighbors {
			cp.Neighbors = append(cp.Neighbors, clone(nb))
		}
		return cp
	}
	return clone(node)
}

Walkthrough: deep copy of a graph. The key is a map from original node → clone, registered before recursing — otherwise a cycle would recurse forever.

Agent connection: this mirrors deep-copying a ToolCall chain — the audit log needs a snapshot of the whole call chain that later mutations cannot affect.


Problem 4: Word Ladder (LeetCode 127) - Hard (optional)

BFS shortest path: change one letter per step; find the minimum number of steps from beginWord to endWord.

Hint: when building adjacency over wordList, use wildcard patterns — hot → *ot, h*t, ho* — as bridge nodes to avoid the O(n²) pairwise comparison.
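A sketch of the wildcard-bucket BFS described in the hint. It returns the number of words in the shortest chain (including beginWord and endWord), or 0 if endWord is unreachable:

```go
package main

import "fmt"

// ladderLength buckets words by wildcard patterns (hot -> *ot, h*t, ho*)
// and runs a level-by-level BFS over those buckets.
func ladderLength(beginWord, endWord string, wordList []string) int {
	found := false
	buckets := map[string][]string{}
	addWord := func(w string) {
		for i := 0; i < len(w); i++ {
			p := w[:i] + "*" + w[i+1:]
			buckets[p] = append(buckets[p], w)
		}
	}
	for _, w := range wordList {
		if w == endWord {
			found = true
		}
		addWord(w)
	}
	if !found {
		return 0
	}
	addWord(beginWord)

	visited := map[string]bool{beginWord: true}
	queue := []string{beginWord}
	steps := 1
	for len(queue) > 0 {
		var next []string
		for _, w := range queue {
			if w == endWord {
				return steps
			}
			// neighbors share a wildcard pattern with w
			for i := 0; i < len(w); i++ {
				p := w[:i] + "*" + w[i+1:]
				for _, nb := range buckets[p] {
					if !visited[nb] {
						visited[nb] = true
						next = append(next, nb)
					}
				}
			}
		}
		queue = next
		steps++
	}
	return 0
}

func main() {
	fmt.Println(ladderLength("hit", "cog", []string{"hot", "dot", "dog", "lot", "log", "cog"})) // 5
}
```

Bucketing costs O(n·L) instead of the O(n²·L) pairwise word comparison, where L is the word length.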


Next: Day 6 Preview

Tomorrow we will:

  1. Implement retry/backoff (classifying transient vs non-retryable errors)
  2. Make exponential backoff interact correctly with ctx.Done()
  3. Use idempotency keys to guarantee a ticket is never created twice
  4. Use sync.Map / Redis as the dedup store

Questions to prepare:

  • What's the difference between context.Canceled and context.DeadlineExceeded?
  • What is "at-least-once" vs "exactly-once"? Which one does idempotency belong to?
  • If two goroutines submit the same request simultaneously, will your system create two tickets?