Week 4, Day 24: Eval Runner + CI Integration - Keeping Agent Quality from Regressing ⚠️

💡 An Agent project without automated evals is a time bomb: changing one line of a prompt or swapping a model version can tank accuracy. Today we hook the Day 12 eval dataset into CI so every PR runs regression tests automatically.

Difficulty warning: today covers CI/CD, concurrency control, and statistical analysis - the hardest day of Week 4.

Part 1: Problem-Driven

🤔 Question 1: Why isn't manual testing enough?

Guiding questions:

  1. You change one line of the system prompt - how do you confirm none of your 50 cases regressed?
  2. GPT-4.1 or GPT-4.5 - which fits your use case? How do you compare them quantitatively?
  3. Three months from now someone changes the retrieval logic and accuracy quietly drops from 90% to 70%. How do you find out?

Answer:

Problems with manual testing:

  • Incomplete coverage: nobody reruns 50+ cases by hand every time
  • Subjective: "seems fine" today, "feels off" tomorrow
  • Not repeatable: results can't be trend-analyzed
  • No gate: bad code can slip into main

What automated evals buy you:

  • Regression protection: PRs run automatically; degradations fail the build
  • Quantitative comparison: precise A/B scoring of two versions
  • Visible trends: historical accuracy charts
  • Decision support: data to back model swaps and prompt changes

🤔 Question 2: What shape should the eval runner take?

Guiding questions:

  1. One run of 50 cases takes 5 minutes - what about CI timeouts?
  2. Should evals burn real OpenAI quota?
  3. The judge (LLM grading) is itself unstable - how do you make results reproducible?

Answer:

Design requirements:

  • CLI command: agent eval --dataset eval.jsonl --out report.json
  • Concurrent execution: a goroutine pool runs 50 cases in parallel
  • Threshold gate: --threshold 0.85; exit non-zero when below it
  • Report output: JSON + Markdown, so CI can post a comment
  • Cost control: --max-concurrency for rate limiting, --mock for local testing

🤔 Question 3: When should CI run evals?

Guiding questions:

  1. On every commit? Too expensive.
  2. Only on the main branch? Problems are found too late.
  3. Manual trigger only? People forget.

Answer:

Layered strategy:

  • commit push → unit tests (free, fast)
  • PR opened/updated → eval on 10 smoke cases (~$0.5)
  • PR labeled "full-eval" → eval on all 50 cases (~$3)
  • merge to main → full eval + write to history
  • nightly → full eval + compare against baseline

Part 2: Hands-On Implementation

✅ Version 1: Eval data format (reusing Day 12)

{"id":"c001","input":"如何重置SSO密码?","expected_answer":"访问SSO门户...","expected_tools":["search"],"tags":["faq","sso"]}
{"id":"c002","input":"帮我创建一个ticket,标题是'网络慢'","expected_action":"draft_ticket","required_approval":true,"tags":["write","approval"]}

Key fields:

  • id: stable ID
  • input: user message
  • expected_*: expected result (may be an answer, tool sequence, action, etc.)
  • tags: categories, for filtering and per-tag stats

✅ Version 2: Eval runner core logic

// cmd/evalrunner/main.go
package main

import (
    "context"
    "encoding/json"
    "flag"
    "fmt"
    "os"
    "sync"
    "time"

    "github.com/yourname/agent-runtime/internal/agent"
    "github.com/yourname/agent-runtime/internal/eval"
)

func main() {
    var (
        datasetPath   = flag.String("dataset", "evals/dataset.jsonl", "eval dataset path")
        outputPath    = flag.String("out", "report.json", "output report path")
        threshold     = flag.Float64("threshold", 0.85, "pass rate threshold")
        concurrency   = flag.Int("concurrency", 5, "concurrent cases")
        tagFilter     = flag.String("tags", "", "filter by tag (comma-separated)")
        mock          = flag.Bool("mock", false, "use mock LLM")
    )
    flag.Parse()

    // 1. Load the dataset
    cases, err := eval.LoadDataset(*datasetPath)
    if err != nil {
        fmt.Fprintf(os.Stderr, "load dataset: %v\n", err)
        os.Exit(1)
    }

    if *tagFilter != "" {
        cases = eval.FilterByTags(cases, *tagFilter)
    }
    fmt.Printf("Running %d cases with concurrency=%d\n", len(cases), *concurrency)

    // 2. Initialize the agent
    ag, err := agent.New(agent.Config{Mock: *mock})
    if err != nil {
        fmt.Fprintf(os.Stderr, "init agent: %v\n", err)
        os.Exit(1)
    }

    // 3. Run cases concurrently
    start := time.Now()
    results := runConcurrent(ag, cases, *concurrency)

    // 4. Aggregate into a report
    report := eval.Aggregate(results)
    report.Threshold = *threshold
    report.StartTime = start

    // 5. Write the report
    if err := writeReport(report, *outputPath); err != nil {
        fmt.Fprintf(os.Stderr, "write report: %v\n", err)
        os.Exit(1)
    }

    // 6. Exit code based on the threshold (for CI)
    fmt.Printf("\n===== Summary =====\n")
    fmt.Printf("Total:       %d\n", report.Total)
    fmt.Printf("Passed:      %d (%.1f%%)\n", report.Passed, report.PassRate*100)
    fmt.Printf("Failed:      %d\n", report.Failed)
    fmt.Printf("AvgLatency:  %.0fms\n", report.AvgLatencyMs)
    fmt.Printf("TotalCost:   $%.4f\n", report.TotalCostUSD)

    if report.PassRate < *threshold {
        fmt.Printf("\n❌ FAILED: pass rate %.2f < threshold %.2f\n", report.PassRate, *threshold)
        os.Exit(1)
    }
    fmt.Printf("\n✅ PASSED\n")
}

func runConcurrent(ag *agent.Agent, cases []eval.Case, concurrency int) []eval.Result {
    sem := make(chan struct{}, concurrency)
    results := make([]eval.Result, len(cases))
    var wg sync.WaitGroup

    for i, c := range cases {
        wg.Add(1)
        sem <- struct{}{}
        go func(idx int, cs eval.Case) {
            defer wg.Done()
            defer func() { <-sem }()

            start := time.Now()
            ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
            defer cancel()

            resp, err := ag.HandleRequest(ctx, agent.Request{
                UserID:  "eval",
                Message: cs.Input,
            })

            res := eval.Result{
                CaseID:    cs.ID,
                Input:     cs.Input,
                Expected:  cs.ExpectedAnswer,
                Actual:    resp.Answer,
                LatencyMs: time.Since(start).Milliseconds(),
                Tags:      cs.Tags,
            }

            if err != nil {
                res.Passed = false
                res.Error = err.Error()
            } else {
                // Score with the judge
                score, reason := eval.Judge(ctx, cs, resp)
                res.Score = score
                res.Reason = reason
                res.Passed = score >= 0.7 // per-case pass threshold
                res.CostUSD = resp.CostUSD
            }

            results[idx] = res
            fmt.Printf("[%d/%d] %s: %s (score=%.2f, %dms)\n",
                idx+1, len(results), cs.ID, passStatus(res.Passed), res.Score, res.LatencyMs)
        }(i, c)
    }

    wg.Wait()
    return results
}

func passStatus(p bool) string {
    if p { return "PASS" }
    return "FAIL"
}

✅ Version 3: Judge (LLM-as-Judge)

// internal/eval/judge.go
package eval

import (
    "context"
    "encoding/json"
    "fmt"

    "github.com/yourname/agent-runtime/internal/agent"
    "github.com/yourname/agent-runtime/internal/llm"
)

const judgePrompt = `You are an eval grader. Compare the expected and actual answers and score these dimensions:
- correctness (is the answer right): 0-1
- completeness (is the information complete): 0-1
- relevance (is it on topic): 0-1

Overall score = weighted average. Strict but fair.

Input: %s
Expected: %s
Actual: %s

Output JSON: {"score": 0.x, "reason": "..."}`

type JudgeOutput struct {
    Score  float64 `json:"score"`
    Reason string  `json:"reason"`
}

func Judge(ctx context.Context, cs Case, actual agent.Response) (float64, string) {
    judge := llm.NewClient("gpt-4o") // use a strong model for the judge

    prompt := fmt.Sprintf(judgePrompt, cs.Input, cs.ExpectedAnswer, actual.Answer)

    resp, err := judge.Chat(ctx, []llm.Message{
        {Role: "user", Content: prompt},
    }, llm.WithResponseFormat("json_object"))
    if err != nil {
        return 0, fmt.Sprintf("judge error: %v", err)
    }

    var out JudgeOutput
    if err := json.Unmarshal([]byte(resp.Content), &out); err != nil {
        return 0, "judge parse error"
    }

    return out.Score, out.Reason
}

Judge pitfalls:

  • High variance → use a deterministic prompt + temperature=0
  • Self-bias → the judge should not be the same model as the system under test
  • Long-answer bias → explicitly penalize verbosity in the prompt

✅ Version 4: Aggregated report

// internal/eval/report.go
package eval

import (
    "sort"
    "time"
)

type Result struct {
    CaseID    string   `json:"case_id"`
    Input     string   `json:"input"`
    Expected  string   `json:"expected"`
    Actual    string   `json:"actual"`
    Passed    bool     `json:"passed"`
    Score     float64  `json:"score"`
    Reason    string   `json:"reason"`
    LatencyMs int64    `json:"latency_ms"`
    CostUSD   float64  `json:"cost_usd"`
    Error     string   `json:"error,omitempty"`
    Tags      []string `json:"tags"`
}

type Report struct {
    StartTime     time.Time         `json:"start_time"`
    Total         int               `json:"total"`
    Passed        int               `json:"passed"`
    Failed        int               `json:"failed"`
    PassRate      float64           `json:"pass_rate"`
    Threshold     float64           `json:"threshold"`
    AvgScore      float64           `json:"avg_score"`
    AvgLatencyMs  float64           `json:"avg_latency_ms"`
    P95LatencyMs  int64             `json:"p95_latency_ms"`
    TotalCostUSD  float64           `json:"total_cost_usd"`
    ByTag         map[string]TagStat `json:"by_tag"`
    Results       []Result          `json:"results"`
}

type TagStat struct {
    Total   int     `json:"total"`
    Passed  int     `json:"passed"`
    PassRate float64 `json:"pass_rate"`
}

func Aggregate(results []Result) Report {
    r := Report{
        Total:   len(results),
        Results: results,
        ByTag:   make(map[string]TagStat),
    }

    var totalScore, totalLatency, totalCost float64
    latencies := make([]int64, 0, len(results))

    for _, res := range results {
        if res.Passed {
            r.Passed++
        }
        totalScore += res.Score
        totalLatency += float64(res.LatencyMs)
        totalCost += res.CostUSD
        latencies = append(latencies, res.LatencyMs)

        for _, tag := range res.Tags {
            stat := r.ByTag[tag]
            stat.Total++
            if res.Passed {
                stat.Passed++
            }
            r.ByTag[tag] = stat
        }
    }

    r.Failed = r.Total - r.Passed
    if r.Total > 0 {
        r.PassRate = float64(r.Passed) / float64(r.Total)
        r.AvgScore = totalScore / float64(r.Total)
        r.AvgLatencyMs = totalLatency / float64(r.Total)
    }
    r.TotalCostUSD = totalCost

    for tag, stat := range r.ByTag {
        if stat.Total > 0 {
            stat.PassRate = float64(stat.Passed) / float64(stat.Total)
        }
        r.ByTag[tag] = stat
    }

    r.P95LatencyMs = percentile(latencies, 0.95)
    return r
}

func percentile(nums []int64, p float64) int64 {
    if len(nums) == 0 {
        return 0
    }
    sorted := make([]int64, len(nums))
    copy(sorted, nums)
    sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
    idx := int(float64(len(sorted)) * p)
    if idx >= len(sorted) {
        idx = len(sorted) - 1
    }
    return sorted[idx]
}

✅ Version 5: Markdown report (for PR comments)

// internal/eval/markdown.go
package eval

import (
    "fmt"
    "strings"
)

func (r Report) ToMarkdown() string {
    var b strings.Builder

    status := "✅ PASS"
    if r.PassRate < r.Threshold {
        status = "❌ FAIL"
    }

    b.WriteString(fmt.Sprintf("## Eval Report %s\n\n", status))
    b.WriteString(fmt.Sprintf("**Pass rate**: %.1f%% (%d/%d), threshold %.1f%%\n\n",
        r.PassRate*100, r.Passed, r.Total, r.Threshold*100))

    b.WriteString("| Metric | Value |\n|---|---|\n")
    b.WriteString(fmt.Sprintf("| Avg score | %.2f |\n", r.AvgScore))
    b.WriteString(fmt.Sprintf("| Avg latency | %.0fms |\n", r.AvgLatencyMs))
    b.WriteString(fmt.Sprintf("| p95 latency | %dms |\n", r.P95LatencyMs))
    b.WriteString(fmt.Sprintf("| Total cost | $%.4f |\n\n", r.TotalCostUSD))

    // Per-tag breakdown
    if len(r.ByTag) > 0 {
        b.WriteString("### Breakdown by tag\n\n")
        b.WriteString("| Tag | Pass rate |\n|---|---|\n")
        for tag, stat := range r.ByTag {
            b.WriteString(fmt.Sprintf("| %s | %.1f%% (%d/%d) |\n",
                tag, stat.PassRate*100, stat.Passed, stat.Total))
        }
        b.WriteString("\n")
    }

    // Failed case details
    failedCount := 0
    for _, res := range r.Results {
        if !res.Passed {
            failedCount++
        }
    }

    if failedCount > 0 {
        b.WriteString(fmt.Sprintf("### Failed cases (%d)\n\n<details>\n\n", failedCount))
        for _, res := range r.Results {
            if !res.Passed {
                b.WriteString(fmt.Sprintf("**%s** (score=%.2f)\n", res.CaseID, res.Score))
                b.WriteString(fmt.Sprintf("- Input: %s\n", res.Input))
                b.WriteString(fmt.Sprintf("- Expected: %s\n", truncate(res.Expected, 200)))
                b.WriteString(fmt.Sprintf("- Actual: %s\n", truncate(res.Actual, 200)))
                b.WriteString(fmt.Sprintf("- Reason: %s\n\n", res.Reason))
            }
        }
        b.WriteString("</details>\n")
    }

    return b.String()
}

// truncate cuts by runes, not bytes, so multi-byte answers (e.g. Chinese)
// aren't split mid-character.
func truncate(s string, n int) string {
    r := []rune(s)
    if len(r) <= n {
        return s
    }
    return string(r[:n]) + "..."
}

✅ Version 6: GitHub Actions CI integration

# .github/workflows/eval.yml
name: Eval

on:
  pull_request:
    branches: [main]
  workflow_dispatch:

jobs:
  eval:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4

      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'

      - name: Cache Go modules
        uses: actions/cache@v4
        with:
          path: |
            ~/.cache/go-build
            ~/go/pkg/mod
          key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}

      - name: Build eval runner
        run: go build -o evalrunner ./cmd/evalrunner

      - name: Determine dataset
        id: dataset
        run: |
          if [ "${{ contains(github.event.pull_request.labels.*.name, 'full-eval') }}" = "true" ]; then
            echo "path=evals/dataset.jsonl" >> $GITHUB_OUTPUT
            echo "threshold=0.85" >> $GITHUB_OUTPUT
          else
            echo "path=evals/smoke.jsonl" >> $GITHUB_OUTPUT
            echo "threshold=0.80" >> $GITHUB_OUTPUT
          fi

      - name: Run eval
        id: eval
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          ./evalrunner \
            --dataset ${{ steps.dataset.outputs.path }} \
            --threshold ${{ steps.dataset.outputs.threshold }} \
            --out report.json \
            --concurrency 5
        continue-on-error: true

      - name: Generate markdown report
        if: always()
        run: |
          go run ./cmd/evalrunner/report --in report.json --format markdown > report.md

      - name: Comment PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const body = fs.readFileSync('report.md', 'utf8');
            
            // Find an existing eval comment; update it or create a new one
            const { data: comments } = await github.rest.issues.listComments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
            });
            const existing = comments.find(c => c.body.includes('## Eval Report'));
            
            if (existing) {
              await github.rest.issues.updateComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                comment_id: existing.id,
                body: body,
              });
            } else {
              await github.rest.issues.createComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: context.issue.number,
                body: body,
              });
            }

      - name: Upload report artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: eval-report
          path: |
            report.json
            report.md

      - name: Fail if eval failed
        if: steps.eval.outcome != 'success'
        run: exit 1

Key design points:

  • Label-based selection: the full-eval label triggers the full run
  • Comment dedup: find the existing eval comment and update it
  • Artifact upload: reports can be downloaded for later analysis
  • Comment on success and failure alike (if: always())

Part 3: Key Concepts

Why separate smoke vs full datasets?

  • smoke.jsonl: 10 hand-picked critical cases, run on every PR, ~$0.5/run
  • dataset.jsonl: 50+ full cases, run only with the full-eval label or on main, ~$3/run

It balances cost against coverage.

How do you set the threshold?

Bootstrapping:

  1. Run the current version once and record a baseline (say 85%)
  2. Set the threshold to baseline - 5% buffer = 80%
  3. Each time the baseline improves, raise the threshold accordingly

Alert lines:

  • PR below threshold → block merge
  • Main branch below baseline for 2 consecutive days → alert

Judge stability techniques

// 1. temperature=0
llm.WithTemperature(0)

// 2. Deterministic prompt (fixed JSON output schema)
// 3. Run multiple times and take the median
// 4. Strong model for the judge (gpt-4o); the system under test can be weaker (gpt-4o-mini)
// 5. Pairwise mode: judge only "is A better than B", more stable than absolute scoring
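Technique 3 (run multiple times, take the median) can be sketched as a wrapper around any scoring function; judgeOnce here is a stand-in for a single eval.Judge call:

```go
package main

import (
	"fmt"
	"sort"
)

// medianScore runs judgeOnce n times and returns the median score.
// n should be odd so the middle element is well defined.
func medianScore(judgeOnce func() float64, n int) float64 {
	scores := make([]float64, n)
	for i := range scores {
		scores[i] = judgeOnce()
	}
	sort.Float64s(scores)
	return scores[n/2]
}

func main() {
	i := 0
	flaky := func() float64 { // simulates a judge whose score fluctuates
		i++
		return []float64{0.9, 0.2, 0.85}[i%3]
	}
	fmt.Println(medianScore(flaky, 3)) // 0.85
}
```

The median absorbs a single outlier run, at the cost of n judge calls per case; reserve it for the cases that actually flake.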

Guarding against "eval gaming"

To prevent tuning prompts specifically to pass the eval:

  • Regularly add new cases to the dataset
  • Keep a hold-out set (30% never shown to the model)
  • Monitor the gap between production traffic and eval results
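The hold-out split should be deterministic, so a case never migrates between sets as the file grows; one way is to hash the stable case ID (isHoldout is a hypothetical helper):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// isHoldout buckets a case by hashing its stable ID: the same ID always
// lands in the same bucket, regardless of dataset file order or size.
func isHoldout(caseID string, holdoutPct uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(caseID))
	return h.Sum32()%100 < holdoutPct
}

func main() {
	for _, id := range []string{"c001", "c002", "c003"} {
		fmt.Println(id, isHoldout(id, 30))
	}
}
```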

Part 4: Self-Check List

  • What are the eval runner's exit-code rules?
  • What pattern controls concurrency? (semaphore)
  • What are the risks of LLM-as-Judge?
  • When do you use the smoke vs full dataset?
  • How does GitHub Actions use OPENAI_API_KEY securely?
  • How do you keep PR comments from piling up (dedup strategy)?

Part 5: Assignments

Task 1: CLI runner

  • Implement the evalrunner binary
  • Support --dataset --threshold --out --concurrency
  • ./evalrunner --dataset evals/dataset.jsonl runs locally end to end

Task 2: Report output

  • JSON-format report
  • Markdown-format report (including failed-case details)
  • Per-tag breakdown

Task 3: GitHub Actions

  • .github/workflows/eval.yml
  • Configure OPENAI_API_KEY in Secrets
  • PR triggers the smoke eval
  • Comment posted to the PR automatically

Task 4: Experiment

  • Deliberately break the prompt
  • Watch the eval fail
  • Revert and confirm it passes

Part 6: FAQ

Q: Evals are expensive - how do I save money?

A:

  1. PRs only run the smoke set (10 cases)
  2. Use gpt-4o-mini as the system under test
  3. gpt-4o-mini also works as the judge (slightly lower accuracy)
  4. Add a cache: don't re-run results for identical inputs
  5. Label-triggered full evals (developers decide when they're needed)
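Point 4 (caching) can be sketched as keying results by a hash of the prompt version plus the input, so unchanged cases skip the LLM entirely. This in-memory version is illustrative; a real runner would persist the map between CI runs:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type cachedResult struct {
	Score float64
}

// cache maps hash(promptVersion, input) → previous result.
var cache = map[string]cachedResult{}

func cacheKey(promptVersion, input string) string {
	sum := sha256.Sum256([]byte(promptVersion + "\x00" + input))
	return hex.EncodeToString(sum[:])
}

// runCase returns a cached result when available, otherwise calls run
// (the real LLM round-trip) and stores its result.
func runCase(promptVersion, input string, run func() cachedResult) cachedResult {
	k := cacheKey(promptVersion, input)
	if r, ok := cache[k]; ok {
		return r // cache hit: no LLM call
	}
	r := run()
	cache[k] = r
	return r
}

func main() {
	calls := 0
	expensive := func() cachedResult { calls++; return cachedResult{Score: 0.9} }
	runCase("v1", "hello", expensive)
	runCase("v1", "hello", expensive) // second call hits the cache
	fmt.Println("LLM calls:", calls)  // LLM calls: 1
}
```

Including the prompt version in the key matters: a prompt change must invalidate every cached result, otherwise the eval silently tests the old prompt.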

Q: What about flaky evals?

A:

  1. Diagnose: run several times to check whether it's truly random
  2. Stabilize: temperature=0, fixed judge prompt
  3. Tolerate: leave a buffer in the threshold
  4. Extreme cases: mark as @flaky and exclude from the pass rate

Q: GitHub Actions timing out?

A:

  1. Raise concurrency (--concurrency 10)
  2. Split workflows: smoke always runs, full runs async
  3. Self-hosted runner (more resources)

Q: How do I track historical trends?

A:

  • Append to evals/history.jsonl on every main merge
  • Build a simple dashboard with GitHub Pages
  • Or wire into Datadog/Grafana

Q: Do eval changes need review too?

A: Absolutely! The dataset is the "exam paper": changes to it should be reviewed in PRs separate from code changes, to avoid "rewriting the exam to pass it".


Companion Algorithm Problems

Problem 1: Serialize and Deserialize Binary Tree (Hard)

Problem: Design an algorithm that serializes a binary tree to a string and deserializes that string back into the original tree. The format is up to you, but it must round-trip.

Approach: Serialize with BFS (level-order traversal): enqueue level by level, record "#" for nil nodes and the value for non-nil nodes, comma-separated. Deserialize in the same level order: keep a queue of parents awaiting children and take tokens in turn to attach left/right subtrees.

package main

import (
    "fmt"
    "strconv"
    "strings"
)

type TreeNode struct {
    Val   int
    Left  *TreeNode
    Right *TreeNode
}

// Codec holds the serialize/deserialize methods
type Codec struct{}

// serialize converts the tree to a string via BFS level-order traversal
func (c *Codec) serialize(root *TreeNode) string {
    if root == nil {
        return ""
    }
    var sb strings.Builder
    queue := []*TreeNode{root}
    for len(queue) > 0 {
        node := queue[0]
        queue = queue[1:]
        if node == nil {
            sb.WriteString("#,")
        } else {
            sb.WriteString(strconv.Itoa(node.Val))
            sb.WriteByte(',')
            queue = append(queue, node.Left, node.Right)
        }
    }
    return sb.String()
}

// deserialize rebuilds the tree from the BFS-serialized string
func (c *Codec) deserialize(data string) *TreeNode {
    if data == "" {
        return nil
    }
    tokens := strings.Split(strings.TrimRight(data, ","), ",")
    if len(tokens) == 0 || tokens[0] == "#" {
        return nil
    }
    rootVal, _ := strconv.Atoi(tokens[0])
    root := &TreeNode{Val: rootVal}
    queue := []*TreeNode{root}
    i := 1
    for len(queue) > 0 && i < len(tokens) {
        node := queue[0]
        queue = queue[1:]
        // attach the left child
        if i < len(tokens) && tokens[i] != "#" {
            val, _ := strconv.Atoi(tokens[i])
            node.Left = &TreeNode{Val: val}
            queue = append(queue, node.Left)
        }
        i++
        // attach the right child
        if i < len(tokens) && tokens[i] != "#" {
            val, _ := strconv.Atoi(tokens[i])
            node.Right = &TreeNode{Val: val}
            queue = append(queue, node.Right)
        }
        i++
    }
    return root
}

func main() {
    c := Codec{}
    root := &TreeNode{
        Val:  1,
        Left: &TreeNode{Val: 2},
        Right: &TreeNode{
            Val:   3,
            Left:  &TreeNode{Val: 4},
            Right: &TreeNode{Val: 5},
        },
    }
    data := c.serialize(root)
    fmt.Println("Serialized:", data)
    restored := c.deserialize(data)
    fmt.Println("Root val:", restored.Val)           // 1
    fmt.Println("Right.Left val:", restored.Right.Left.Val) // 4
}

Complexity: O(n) time, O(n) space (the queue holds at most one level, worst case n/2 nodes).

Variants / follow-ups:

  1. Preorder (DFS) vs BFS serialization - trade-offs? (DFS code is more concise; BFS matches the intuitive level-by-level structure)
  2. If node values can be negative or contain special characters, how should the delimiter and null placeholder be designed?

Problem 2: Binary Tree Maximum Path Sum (Hard)

Problem: Given a binary tree, find the path with the maximum sum of node values. The path may start and end at any nodes and need not pass through the root, but each node appears at most once.

Approach: Post-order DFS. For each node, compute the maximum single-direction path contribution of its left and right subtrees (clamp negatives to 0, meaning "don't take that side"). Treating the current node as the "bend point", the best path sum = node.Val + left + right; track the maximum in an outer variable. The function's return value must be a single-direction path (node.Val + max(left, right)) for the parent to use.

package main

import (
    "fmt"
    "math"
)

type TreeNode struct {
    Val   int
    Left  *TreeNode
    Right *TreeNode
}

func maxPathSum(root *TreeNode) int {
    maxSum := math.MinInt32

    var dfs func(node *TreeNode) int
    dfs = func(node *TreeNode) int {
        if node == nil {
            return 0
        }
        // Max one-sided contribution from each subtree; clamp negatives to 0
        leftGain := dfs(node.Left)
        if leftGain < 0 {
            leftGain = 0
        }
        rightGain := dfs(node.Right)
        if rightGain < 0 {
            rightGain = 0
        }

        // Best path sum with the current node as the bend point
        pathThrough := node.Val + leftGain + rightGain
        if pathThrough > maxSum {
            maxSum = pathThrough
        }

        // Return the one-sided max (the path may extend through only one side)
        if leftGain > rightGain {
            return node.Val + leftGain
        }
        return node.Val + rightGain
    }

    dfs(root)
    return maxSum
}

func main() {
    // Tree: [-10, 9, 20, nil, nil, 15, 7]
    root := &TreeNode{
        Val:  -10,
        Left: &TreeNode{Val: 9},
        Right: &TreeNode{
            Val:   20,
            Left:  &TreeNode{Val: 15},
            Right: &TreeNode{Val: 7},
        },
    }
    fmt.Println(maxPathSum(root)) // 42 (15->20->7)

    // Single negative node
    fmt.Println(maxPathSum(&TreeNode{Val: -3})) // -3
}

Complexity: O(n) time, O(h) space (h = tree height, recursion stack).

Variants / follow-ups:

  1. How would you also return the node sequence of the best path (not just the value)? (Track paths inside dfs and merge the left and right paths at the bend point)
  2. If the tree is very deep (close to n levels) and the recursion stack overflows? (Iterative post-order traversal with an explicit stack)

Problem 3: Construct Binary Tree from Preorder and Inorder Traversal (Medium)

Problem: Given the preorder and inorder traversals, construct the original binary tree. Assume the tree has no duplicate values.

Approach: The first element of preorder is the root. Find the root's position in inorder: everything to its left is the left subtree's inorder sequence, everything to its right the right subtree's. Cache inorder indices in a hash map to avoid linear scans, dropping total complexity from O(n²) to O(n).

package main

import "fmt"

type TreeNode struct {
    Val   int
    Left  *TreeNode
    Right *TreeNode
}

func buildTree(preorder []int, inorder []int) *TreeNode {
    // Build a hash index: inorder value → position
    inorderIdx := make(map[int]int, len(inorder))
    for i, v := range inorder {
        inorderIdx[v] = i
    }

    var build func(preL, preR, inL, inR int) *TreeNode
    build = func(preL, preR, inL, inR int) *TreeNode {
        if preL > preR {
            return nil
        }
        // The first preorder element is the current subtree's root
        rootVal := preorder[preL]
        root := &TreeNode{Val: rootVal}

        // Locate the root in the inorder sequence
        mid := inorderIdx[rootVal]
        leftSize := mid - inL // number of nodes in the left subtree

        // Recursively build the left and right subtrees
        root.Left = build(preL+1, preL+leftSize, inL, mid-1)
        root.Right = build(preL+leftSize+1, preR, mid+1, inR)
        return root
    }

    n := len(preorder)
    return build(0, n-1, 0, n-1)
}

func main() {
    preorder := []int{3, 9, 20, 15, 7}
    inorder := []int{9, 3, 15, 20, 7}
    root := buildTree(preorder, inorder)
    fmt.Println(root.Val)        // 3
    fmt.Println(root.Left.Val)   // 9
    fmt.Println(root.Right.Val)  // 20
    fmt.Println(root.Right.Left.Val)  // 15
    fmt.Println(root.Right.Right.Val) // 7
}

Complexity: O(n) time, O(n) space (hash map + recursion stack).

Variants / follow-ups:

  1. Given postorder and inorder instead? (The last postorder element is the root; the approach is fully symmetric)
  2. Do preorder + postorder uniquely determine a binary tree? (No - multiple trees can satisfy them; only a full binary tree, where every node has 0 or 2 children, is unique)

Next Up: Day 25 Preview

Tomorrow we will cover:

  1. Cost analysis: tokens × unit price, monthly bill estimation
  2. Caching strategies: prompt-level, response-level, with TTL
  3. Model routing: small models for simple queries
  4. Fallback chains: switch to a backup when the primary model is down
  5. Latency optimization: concurrent tool calls, streaming responses

Questions to think about:

  • How many tokens does an average request use? What does it cost?
  • Which queries can be handled without GPT-4o?
  • Why does streaming feel faster?