Persona: You are a Go performance measurement engineer. You never draw conclusions from a single benchmark run; statistical rigor and controlled conditions are prerequisites for any optimization decision.

Thinking mode: Use for benchmark analysis, profile interpretation, and performance comparison tasks. Deep reasoning prevents misinterpreting profiling data and ensures statistically sound conclusions.

Go Benchmarking & Performance Measurement

Performance improvement does not exist without measurement: if you can measure it, you can improve it. This skill covers the full measurement workflo…

to skip unit tests during benchmark runs\"}\n ]\n },\n {\n \"id\": 4,\n \"name\": \"benchstat-output-interpretation\",\n \"description\": \"Tests understanding of benchstat output format including p-value, tilde symbol, and confidence intervals\",\n \"prompt\": \"I ran benchstat and got this output. What does it mean? Should I merge my optimization?\\n\\n```\\n │ old.txt │ new.txt │\\n │ sec/op │ sec/op vs base │\\nParse-16 4.592µ ± 8% 4.481µ ± 7% ~ (p=0.089 n=10)\\n```\",\n \"trap\": \"Without the skill, the model may interpret the lower number as an improvement and recommend merging\",\n \"assertions\": [\n {\"id\": \"4.1\", \"text\": \"Explains that the ~ symbol means no statistically significant difference was detected\"},\n {\"id\": \"4.2\", \"text\": \"States that p=0.089 is above the 0.05 significance threshold\"},\n {\"id\": \"4.3\", \"text\": \"Notes that the wide confidence intervals (±8%, ±7%) overlap, making the result untrustworthy\"},\n {\"id\": \"4.4\", \"text\": \"Advises NOT to claim improvement based on this result\"},\n {\"id\": \"4.5\", \"text\": \"Suggests increasing -count to 20+ or reducing noise sources as next steps\"}\n ]\n },\n {\n \"id\": 5,\n \"name\": \"p-hacking-awareness\",\n \"description\": \"Tests whether the model warns against rerunning benchmarks until significance appears\",\n \"prompt\": \"My benchstat comparison keeps showing `~` (no significant difference). 
Can I just rerun the benchmarks a few more times and pick the run that shows significance?\",\n \"trap\": \"Without the skill, the model may naively agree or suggest cherry-picking favorable runs\",\n \"assertions\": [\n {\"id\": \"5.1\", \"text\": \"Explicitly warns against 'retry until significant' as selection bias / p-hacking\"},\n {\"id\": \"5.2\", \"text\": \"Explains that rerunning until ~ disappears introduces bias\"},\n {\"id\": \"5.3\", \"text\": \"Recommends increasing -count ONCE and accepting the result\"},\n {\"id\": \"5.4\", \"text\": \"Mentions that at alpha=0.05, ~5% of benchmarks will randomly show significance (false positives)\"},\n {\"id\": \"5.5\", \"text\": \"Suggests the change may genuinely have no measurable effect\"}\n ]\n },\n {\n \"id\": 6,\n \"name\": \"interleaving-benchmark-runs\",\n \"description\": \"Tests knowledge of interleaving old/new benchmark runs to reduce systematic bias\",\n \"prompt\": \"I'm comparing performance before and after an optimization. I ran all 10 baseline iterations first, then all 10 optimized iterations. My colleague says this approach is flawed. 
Why?\",\n \"trap\": \"Without the skill, the model may not know about systematic bias from sequential runs or may not suggest pre-compilation with go test -c\",\n \"assertions\": [\n {\"id\": \"6.1\", \"text\": \"Identifies systematic bias: thermal throttling, background processes, CPU frequency scaling can differ between the first batch and second batch\"},\n {\"id\": \"6.2\", \"text\": \"Recommends interleaving runs (alternating old/new) to reduce this bias\"},\n {\"id\": \"6.3\", \"text\": \"Recommends pre-compiling both versions with `go test -c` to avoid measuring compilation time\"},\n {\"id\": \"6.4\", \"text\": \"Shows running the pre-compiled test binaries directly (e.g., ./old.test -test.bench=...)\"},\n {\"id\": \"6.5\", \"text\": \"Explains that without pre-compilation, each go test -bench invocation includes compilation overhead that varies\"}\n ]\n },\n {\n \"id\": 7,\n \"name\": \"alloc-objects-vs-inuse-space\",\n \"description\": \"Tests understanding of when to use alloc_objects vs inuse_space heap profile types\",\n \"prompt\": \"My Go service has high GC CPU overhead (30% of CPU in runtime.mallocgc according to pprof). I captured a heap profile. 
Should I look at alloc_objects, alloc_space, or inuse_space?\",\n \"trap\": \"Without the skill, the model may suggest inuse_space (which shows live objects, not allocation rate) or alloc_space (which shows bytes, not count)\",\n \"assertions\": [\n {\"id\": \"7.1\", \"text\": \"Recommends alloc_objects as the primary choice for GC pressure / high allocation rate\"},\n {\"id\": \"7.2\", \"text\": \"Explains that alloc_objects counts allocation events and helps find high-frequency object churn driving GC work\"},\n {\"id\": \"7.3\", \"text\": \"Explains that inuse_space shows currently live objects and is for leak detection, not GC churn\"},\n {\"id\": \"7.4\", \"text\": \"Distinguishes alloc_space (total bytes allocated) as useful for reducing peak memory, not GC frequency\"},\n {\"id\": \"7.5\", \"text\": \"Mentions that runtime.mallocgc dominating CPU profile indicates allocation rate is the bottleneck, not computation\"}\n ]\n },\n {\n \"id\": 8,\n \"name\": \"pprof-flat-vs-cum\",\n \"description\": \"Tests understanding of flat vs cumulative time in pprof and when to use top -cum\",\n \"prompt\": \"I ran `go tool pprof cpu.prof` and the top command shows runtime.mallocgc, runtime.memmove, and runtime.scanobject as the top functions. These are all runtime functions I can't modify. 
How do I find which of MY functions is causing this?\",\n \"trap\": \"Without the skill, the model may suggest trying to optimize runtime functions or be unsure how to trace back to application code\",\n \"assertions\": [\n {\"id\": \"8.1\", \"text\": \"Recommends using `top -cum` to find application functions with high cumulative time\"},\n {\"id\": \"8.2\", \"text\": \"Explains that runtime functions appearing in top (flat) are symptoms, not causes — they're called by application code\"},\n {\"id\": \"8.3\", \"text\": \"Explains the difference: flat = time in the function itself, cum = time in the function + everything it calls\"},\n {\"id\": \"8.4\", \"text\": \"Suggests using `list` or `peek` to drill into the application functions that delegate to these runtime calls\"},\n {\"id\": \"8.5\", \"text\": \"Explains the 'flat low + cum high' pattern: the function is a coordinator calling expensive things\"}\n ]\n },\n {\n \"id\": 9,\n \"name\": \"escape-analysis-interpretation\",\n \"description\": \"Tests ability to use and interpret escape analysis output\",\n \"prompt\": \"My benchmark shows unexpected heap allocations for a function that only uses local variables. 
How can I find out why Go is allocating on the heap instead of the stack?\",\n \"trap\": \"Without the skill, the model may suggest generic profiling without mentioning the specific compiler flag for escape analysis\",\n \"assertions\": [\n {\"id\": \"9.1\", \"text\": \"Recommends `go build -gcflags=\\\"-m\\\"` to show escape decisions\"},\n {\"id\": \"9.2\", \"text\": \"Mentions `-m -m` (double -m) for verbose output showing the escape chain / reason\"},\n {\"id\": \"9.3\", \"text\": \"Lists common escape causes like returning pointer to local, interface boxing, closure captures\"},\n {\"id\": \"9.4\", \"text\": \"Notes that this analysis is free (compile-time, no runtime overhead)\"},\n {\"id\": \"9.5\", \"text\": \"Advises to only investigate escapes in hot functions identified by pprof, not all functions\"}\n ]\n },\n {\n \"id\": 10,\n \"name\": \"inlining-budget-and-blockers\",\n \"description\": \"Tests knowledge of Go's inlining cost budget and common blockers\",\n \"prompt\": \"I have a small helper function in a hot path that I want the Go compiler to inline. 
How do I check if it's being inlined, and what might prevent it?\",\n \"trap\": \"Without the skill, the model may not know the specific budget number (80) or all the blockers like defer, recover, go statements\",\n \"assertions\": [\n {\"id\": \"10.1\", \"text\": \"Recommends `go build -gcflags=\\\"-m\\\"` and grepping for 'can inline' or 'cannot inline'\"},\n {\"id\": \"10.2\", \"text\": \"Mentions the inline cost budget of 80 (as of Go 1.22+)\"},\n {\"id\": \"10.3\", \"text\": \"Lists defer as an inlining blocker\"},\n {\"id\": \"10.4\", \"text\": \"Lists recover() as an inlining blocker\"},\n {\"id\": \"10.5\", \"text\": \"Mentions that splitting large functions into smaller ones can help the hot inner function inline\"}\n ]\n },\n {\n \"id\": 11,\n \"name\": \"trace-vs-pprof-selection\",\n \"description\": \"Tests judgment on when to use execution trace vs pprof\",\n \"prompt\": \"My Go HTTP service has high P99 latency (500ms) but pprof CPU profile shows only 15% CPU utilization. The top functions in the CPU profile are all fast. 
Where is the time going?\",\n \"trap\": \"Without the skill, the model may suggest more CPU profiling or generic optimization, missing that this is a scheduling/blocking problem requiring the execution tracer\",\n \"assertions\": [\n {\"id\": \"11.1\", \"text\": \"Identifies this as a case where pprof is insufficient — low CPU with high latency means goroutines are waiting, not working\"},\n {\"id\": \"11.2\", \"text\": \"Recommends `go tool trace` (execution tracer) to see scheduling delays and blocking\"},\n {\"id\": \"11.3\", \"text\": \"Explains that pprof only shows on-CPU time; trace shows off-CPU waiting states\"},\n {\"id\": \"11.4\", \"text\": \"Suggests looking at goroutine states: yellow/orange (runnable but waiting for P) or red/pink (blocked on I/O, channel, mutex)\"},\n {\"id\": \"11.5\", \"text\": \"Mentions the -pprof=sync or -pprof=net flag to extract blocking profiles from trace data\"}\n ]\n },\n {\n \"id\": 12,\n \"name\": \"benchstat-unit-normalization\",\n \"description\": \"Tests understanding of benchstat's automatic unit normalization\",\n \"prompt\": \"I'm confused by benchstat output. My benchmark reports `ns/op` but benchstat shows `sec/op` with a µ prefix. 
Is this an error?\",\n \"trap\": \"Without the skill, the model may think there's a bug or unit mismatch\",\n \"assertions\": [\n {\"id\": \"12.1\", \"text\": \"Explains that benchstat automatically normalizes units for display\"},\n {\"id\": \"12.2\", \"text\": \"States that ns/op is displayed as sec/op with µ (micro) prefix to avoid nonsensical 'µns/op'\"},\n {\"id\": \"12.3\", \"text\": \"Mentions that MB/s is similarly normalized to B/s with K, M, G prefixes\"},\n {\"id\": \"12.4\", \"text\": \"Confirms this is expected behavior, not an error\"}\n ]\n },\n {\n \"id\": 13,\n \"name\": \"benchstat-filter-syntax\",\n \"description\": \"Tests knowledge of benchstat's filter expression syntax for selecting specific benchmarks\",\n \"prompt\": \"I have benchmark output with many sub-benchmarks like BenchmarkParse/format=json, BenchmarkParse/format=gob, BenchmarkEncode/size=1k, etc. I only want to compare the json format benchmarks. How do I filter benchstat output?\",\n \"trap\": \"Without the skill, the model may suggest grepping the output file instead of using benchstat's built-in filter syntax\",\n \"assertions\": [\n {\"id\": \"13.1\", \"text\": \"Uses benchstat's -filter flag rather than pre-processing with grep\"},\n {\"id\": \"13.2\", \"text\": \"Shows the correct filter syntax: -filter '/format:json' or similar key:value pattern\"},\n {\"id\": \"13.3\", \"text\": \"Mentions regex support in filters (e.g., .name:/Parse/)\"},\n {\"id\": \"13.4\", \"text\": \"Mentions logical operators (AND, OR, negation with -) in filter expressions\"}\n ]\n },\n {\n \"id\": 14,\n \"name\": \"benchstat-projection-col-flag\",\n \"description\": \"Tests knowledge of benchstat's -col flag for comparing sub-benchmark parameters\",\n \"prompt\": \"I have a single benchmark file with results for BenchmarkEncode/format=json and BenchmarkEncode/format=gob. I want to compare json vs gob performance side by side in benchstat. 
How?\",\n \"trap\": \"Without the skill, the model may suggest splitting into two files and comparing, or not know about the -col flag\",\n \"assertions\": [\n {\"id\": \"14.1\", \"text\": \"Uses the -col /format flag to create columns from sub-benchmark parameter values\"},\n {\"id\": \"14.2\", \"text\": \"Shows the correct command: benchstat -col /format bench.txt\"},\n {\"id\": \"14.3\", \"text\": \"Mentions -row .name to simplify row names by stripping sub-benchmark config\"},\n {\"id\": \"14.4\", \"text\": \"Mentions the @() sort modifier for controlling column order (e.g., /format@(gob json))\"}\n ]\n },\n {\n \"id\": 15,\n \"name\": \"benchstat-assume-exact\",\n \"description\": \"Tests knowledge of benchstat's assume=exact unit metadata for non-varying metrics\",\n \"prompt\": \"I want to track binary size across commits using benchstat. The size doesn't vary between runs (it's deterministic). But benchstat complains about insufficient samples when I use -count=1. How do I handle deterministic metrics?\",\n \"trap\": \"Without the skill, the model may suggest using -count=10 anyway (wasteful for deterministic metrics) or abandoning benchstat\",\n \"assertions\": [\n {\"id\": \"15.1\", \"text\": \"Recommends the assume=exact unit metadata annotation\"},\n {\"id\": \"15.2\", \"text\": \"Shows the syntax: Unit \u003cmetric> assume=exact in benchmark output\"},\n {\"id\": \"15.3\", \"text\": \"Explains that assume=exact disables non-parametric statistics\"},\n {\"id\": \"15.4\", \"text\": \"Notes that benchstat will warn if values vary when assume=exact is set\"},\n {\"id\": \"15.5\", \"text\": \"States that single measurement (no -count needed) works with assume=exact\"}\n ]\n },\n {\n \"id\": 16,\n \"name\": \"ci-regression-tool-selection\",\n \"description\": \"Tests judgment on which CI regression detection tool to use\",\n \"prompt\": \"I want to add automated benchmark regression detection to my CI pipeline. I need something that:\\n1. 
Compares PR performance against the base branch\\n2. Uses statistical analysis (not just single-run comparison)\\n3. Integrates with our GitHub Actions workflow\\n\\nWhat tool should I use?\",\n \"trap\": \"Without the skill, the model may suggest writing a custom script or using only raw benchstat without knowing about benchdiff\",\n \"assertions\": [\n {\"id\": \"16.1\", \"text\": \"Recommends benchdiff as the primary tool for PR-to-base comparison with statistical rigor\"},\n {\"id\": \"16.2\", \"text\": \"Explains that benchdiff uses benchstat internally for statistical analysis\"},\n {\"id\": \"16.3\", \"text\": \"Mentions cob as a simpler alternative but notes it uses single-run comparison without benchstat-style statistics\"},\n {\"id\": \"16.4\", \"text\": \"Mentions gobenchdata for long-term trend tracking and visualization\"},\n {\"id\": \"16.5\", \"text\": \"Explains the tradeoff: benchdiff=high rigor, cob=quick+simple, gobenchdata=trends+dashboard\"}\n ]\n },\n {\n \"id\": 17,\n \"name\": \"cob-data-loss-warning\",\n \"description\": \"Tests awareness of cob's destructive behavior with uncommitted changes\",\n \"prompt\": \"I want to use `cob` for quick benchmark regression checking in my local development workflow. 
Any caveats?\",\n \"trap\": \"Without the skill, the model may recommend cob for local use without the critical safety warning\",\n \"assertions\": [\n {\"id\": \"17.1\", \"text\": \"Warns that cob uses `git reset` internally which can cause data loss with uncommitted changes\"},\n {\"id\": \"17.2\", \"text\": \"Recommends committing all work before running cob\"},\n {\"id\": \"17.3\", \"text\": \"Suggests running cob only in CI pipelines, not locally\"},\n {\"id\": \"17.4\", \"text\": \"Notes that cob compares single runs without benchstat-style statistics, making it susceptible to noise\"},\n {\"id\": \"17.5\", \"text\": \"Mentions [skip cob] commit message convention to bypass checks\"}\n ]\n },\n {\n \"id\": 18,\n \"name\": \"noisy-neighbor-mitigation\",\n \"description\": \"Tests knowledge of why CI benchmarks are noisy and mitigation strategies\",\n \"prompt\": \"Our CI benchmark results fluctuate wildly — sometimes showing 15% regression, sometimes 10% improvement, for the same code. We're using GitHub-hosted runners. 
How do we fix this?\",\n \"trap\": \"Without the skill, the model may suggest tightening thresholds (which causes more false positives) or simply retrying\",\n \"assertions\": [\n {\"id\": \"18.1\", \"text\": \"Explains that shared CI runners have 5-10% variance due to noisy neighbors\"},\n {\"id\": \"18.2\", \"text\": \"Recommends running both base and head benchmarks in the same CI job for relative comparison\"},\n {\"id\": \"18.3\", \"text\": \"Recommends using -count=10+ with benchstat to filter noise statistically\"},\n {\"id\": \"18.4\", \"text\": \"Suggests conservative thresholds (20%+) on shared runners rather than tight thresholds\"},\n {\"id\": \"18.5\", \"text\": \"Warns against 'retry until pass' as selection bias\"},\n {\"id\": \"18.6\", \"text\": \"Mentions dedicated/self-hosted runners as the definitive solution for critical benchmarks\"}\n ]\n },\n {\n \"id\": 19,\n \"name\": \"self-hosted-runner-tuning\",\n \"description\": \"Tests knowledge of system-level tuning for reproducible benchmarks on self-hosted runners\",\n \"prompt\": \"We have a dedicated self-hosted CI runner for benchmark tests. 
What system-level settings should we configure to minimize benchmark variance?\",\n \"trap\": \"Without the skill, the model may only suggest generic OS tuning without knowing the specific settings for benchmark stability\",\n \"assertions\": [\n {\"id\": \"19.1\", \"text\": \"Recommends disabling CPU frequency scaling by setting the 'performance' governor\"},\n {\"id\": \"19.2\", \"text\": \"Recommends disabling Turbo Boost (Intel no_turbo or AMD boost)\"},\n {\"id\": \"19.3\", \"text\": \"Recommends pinning benchmarks to specific CPU cores using taskset\"},\n {\"id\": \"19.4\", \"text\": \"Recommends disabling SMT/Hyper-Threading to avoid execution unit sharing\"},\n {\"id\": \"19.5\", \"text\": \"Warns that these settings should ONLY be applied to dedicated runners, never developer machines\"}\n ]\n },\n {\n \"id\": 20,\n \"name\": \"taskset-core-pinning-rationale\",\n \"description\": \"Tests understanding of WHY core pinning helps benchmarks\",\n \"prompt\": \"Why does pinning a Go benchmark to specific CPU cores with taskset help reduce variance? What's the underlying mechanism?\",\n \"trap\": \"Without the skill, the model may give a vague answer about CPU contention without explaining the cache thrashing mechanism\",\n \"assertions\": [\n {\"id\": \"20.1\", \"text\": \"Explains that without pinning, the OS migrates the process across cores\"},\n {\"id\": \"20.2\", \"text\": \"Explains that L1/L2 caches are per-core, so migration causes cache thrashing\"},\n {\"id\": \"20.3\", \"text\": \"Recommends leaving cores 0-1 for OS and other processes, using cores 2+ for benchmarks\"},\n {\"id\": \"20.4\", \"text\": \"Shows the taskset command syntax (e.g., taskset -c 2,3 go test ...)\"}\n ]\n },\n {\n \"id\": 21,\n \"name\": \"b-report-metric-custom\",\n \"description\": \"Tests knowledge of b.ReportMetric and b.Elapsed for custom benchmark metrics\",\n \"prompt\": \"I want my Go benchmark to report throughput in bytes/second in addition to the standard ns/op. 
How do I add custom metrics to benchmark output?\",\n \"trap\": \"Without the skill, the model may suggest manual calculation and fmt.Printf instead of the built-in b.ReportMetric\",\n \"assertions\": [\n {\"id\": \"21.1\", \"text\": \"Uses b.ReportMetric() to add custom metrics\"},\n {\"id\": \"21.2\", \"text\": \"Uses b.Elapsed() to get the total benchmark duration\"},\n {\"id\": \"21.3\", \"text\": \"Shows the correct pattern: b.ReportMetric(float64(bytes)/b.Elapsed().Seconds(), \\\"bytes/s\\\")\"},\n {\"id\": \"21.4\", \"text\": \"The custom metric integrates with standard benchmark output format (not separate print statements)\"}\n ]\n },\n {\n \"id\": 22,\n \"name\": \"alloc-space-cumulative-trap-for-leak\",\n \"description\": \"Tests that alloc_space is cumulative (includes freed objects) and therefore wrong for leak detection\",\n \"prompt\": \"My Go service's memory keeps growing. I captured a heap profile with:\\n\\ngo tool pprof -alloc_space http://localhost:6060/debug/pprof/heap\\n\\nThe top functions show my database layer allocating hundreds of MBs. But when I check the actual process RSS, most of that memory has already been freed. 
Am I reading this profile correctly?\",\n \"trap\": \"Without the skill, the model validates the use of alloc_space for leak detection, missing that alloc_space is cumulative since program start and includes objects already freed by GC — inuse_space is the correct choice for leak detection\",\n \"assertions\": [\n {\"id\": \"22.1\", \"text\": \"Explains that alloc_space is cumulative since program start — it counts ALL allocations including those already freed by GC\"},\n {\"id\": \"22.2\", \"text\": \"Identifies that alloc_space is the wrong profile type for leak detection because freed memory still appears\"},\n {\"id\": \"22.3\", \"text\": \"Recommends inuse_space instead — it shows only currently live heap objects, making leaked objects visible\"},\n {\"id\": \"22.4\", \"text\": \"Explains the correct leak detection workflow: take two inuse_space snapshots separated by time and compare with pprof -base\"},\n {\"id\": \"22.5\", \"text\": \"Mentions common leak causes: unbounded caches, maps that never shrink, goroutine leaks holding references\"}\n ]\n },\n {\n \"id\": 23,\n \"name\": \"mutex-block-profile-enablement\",\n \"description\": \"Tests awareness that mutex and block profiles must be explicitly enabled\",\n \"prompt\": \"I want to profile mutex contention in my Go service. I fetched the mutex profile from /debug/pprof/mutex but it's empty. 
What's wrong?\",\n \"trap\": \"Without the skill, the model may suggest the endpoint is broken or suggest different debugging approaches\",\n \"assertions\": [\n {\"id\": \"23.1\", \"text\": \"Identifies that mutex profiling is disabled by default and must be explicitly enabled\"},\n {\"id\": \"23.2\", \"text\": \"Shows runtime.SetMutexProfileFraction() as the enablement call\"},\n {\"id\": \"23.3\", \"text\": \"Explains the fraction parameter (e.g., 5 means 1 out of 5 events recorded)\"},\n {\"id\": \"23.4\", \"text\": \"Recommends disabling after investigation (SetMutexProfileFraction(0)) to eliminate overhead\"},\n {\"id\": \"23.5\", \"text\": \"Mentions runtime.SetBlockProfileRate() for the related block profile\"}\n ]\n },\n {\n \"id\": 24,\n \"name\": \"trace-custom-annotations\",\n \"description\": \"Tests knowledge of runtime/trace custom annotations (tasks, regions, logs)\",\n \"prompt\": \"I'm using go tool trace to analyze my HTTP handler but the trace timeline only shows generic goroutine activity. 
How can I add application-level context to see which request phases (validation, database query, serialization) are taking time?\",\n \"trap\": \"Without the skill, the model may suggest using log statements or pprof labels instead of trace-specific annotations\",\n \"assertions\": [\n {\"id\": \"24.1\", \"text\": \"Recommends trace.NewTask for logical operations that may span goroutines\"},\n {\"id\": \"24.2\", \"text\": \"Recommends trace.WithRegion for phases within a task or goroutine\"},\n {\"id\": \"24.3\", \"text\": \"Mentions trace.Log for point-in-time markers in the trace\"},\n {\"id\": \"24.4\", \"text\": \"Shows correct usage with context propagation (ctx parameter)\"},\n {\"id\": \"24.5\", \"text\": \"Notes that annotations add negligible overhead when tracing is disabled\"}\n ]\n },\n {\n \"id\": 25,\n \"name\": \"trace-gc-phase-interpretation\",\n \"description\": \"Tests ability to interpret GC phases in execution traces\",\n \"prompt\": \"I'm looking at an execution trace and I see large blocks of time where my goroutines show as 'GC assist' (blue). 
What does this mean and how do I fix it?\",\n \"trap\": \"Without the skill, the model may not explain the proportional allocation tax mechanism or the specific remediation\",\n \"assertions\": [\n {\"id\": \"25.1\", \"text\": \"Explains that GC mark assist means goroutines are being drafted by the GC to help scan the heap\"},\n {\"id\": \"25.2\", \"text\": \"Explains that the runtime forces goroutines to assist in proportion to their allocation rate — heavy allocators get taxed more\"},\n {\"id\": \"25.3\", \"text\": \"Identifies this as a symptom of too many allocations, not a GC configuration problem\"},\n {\"id\": \"25.4\", \"text\": \"Recommends reducing allocation rate (the root cause) rather than tuning GOGC\"},\n {\"id\": \"25.5\", \"text\": \"Distinguishes mark assist from STW (stop-the-world) phases which affect all goroutines equally\"}\n ]\n },\n {\n \"id\": 26,\n \"name\": \"trace-pprof-extraction\",\n \"description\": \"Tests knowledge of extracting pprof profiles from trace data\",\n \"prompt\": \"I have an execution trace file from a production service. I want to find which functions are responsible for the most network blocking time. 
Can I use the trace for this?\",\n \"trap\": \"Without the skill, the model may suggest capturing a separate pprof profile instead of extracting from the trace\",\n \"assertions\": [\n {\"id\": \"26.1\", \"text\": \"Uses `go tool trace -pprof=net trace.out > net.prof` to extract a network blocking profile\"},\n {\"id\": \"26.2\", \"text\": \"Then uses go tool pprof on the extracted profile for analysis (top, list, etc.)\"},\n {\"id\": \"26.3\", \"text\": \"Mentions other extractable profile types: sync, syscall, sched\"},\n {\"id\": \"26.4\", \"text\": \"Explains this bridges trace data (nanosecond events) with pprof analysis (statistical aggregation)\"}\n ]\n },\n {\n \"id\": 27,\n \"name\": \"fieldalignment-no-autofix\",\n \"description\": \"Tests that the model uses fieldalignment without the -fix flag\",\n \"prompt\": \"I suspect some of my Go structs have suboptimal field ordering causing padding waste. How do I check?\",\n \"trap\": \"Without the skill, the model may suggest using fieldalignment with -fix flag or structlayout without the diagnostic-first approach\",\n \"assertions\": [\n {\"id\": \"27.1\", \"text\": \"Recommends running `fieldalignment ./...` to detect padding waste\"},\n {\"id\": \"27.2\", \"text\": \"Does NOT use the -fix flag — the skill explicitly says to let the agent apply changes manually\"},\n {\"id\": \"27.3\", \"text\": \"Mentions using unsafe.Sizeof/Alignof/Offsetof to inspect struct layout before and after\"},\n {\"id\": \"27.4\", \"text\": \"Treats this as a diagnostic step, not an automatic fix\"}\n ]\n },\n {\n \"id\": 28,\n \"name\": \"godebug-gctrace-output-interpretation\",\n \"description\": \"Tests ability to read and interpret a GODEBUG=gctrace=1 output line\",\n \"prompt\": \"I ran my Go service with GODEBUG=gctrace=1 and see lines like:\\n\\ngc 14 @5.234s 2%: 0.13+1.4+0.21 ms clock, 0.53+0.43/1.1/0+0.85 ms cpu, 24->27->13 MB, 24 MB goal, 0 MB stacks, 0 MB globals, 4 P\\n\\nWhat does each field mean? 
Is this GC behavior healthy?\",\n \"trap\": \"Without the skill, the model knows GODEBUG=gctrace exists but cannot interpret the specific output format fields; it guesses or gives vague descriptions\",\n \"assertions\": [\n {\"id\": \"28.1\", \"text\": \"Identifies 'gc 14' as the GC cycle number\"},\n {\"id\": \"28.2\", \"text\": \"Identifies '@5.234s 2%' as time since program start and CPU percentage spent in GC\"},\n {\"id\": \"28.3\", \"text\": \"Identifies '24->27->13 MB' as heap size at GC start, heap size at GC end, and live heap after GC\"},\n {\"id\": \"28.4\", \"text\": \"Identifies '24 MB goal' as the goal heap size for this GC cycle, set by the GC pacing algorithm\"},\n {\"id\": \"28.5\", \"text\": \"Assesses health: 2% CPU in GC is acceptable (generally \u003c5% is fine); 13MB live vs 24MB goal means 46% overhead headroom\"}\n ]\n },\n {\n \"id\": 29,\n \"name\": \"runtime-scanobject-cpu-diagnosis\",\n \"description\": \"Tests interpretation of runtime.scanobject appearing high in CPU profile\",\n \"prompt\": \"My CPU profile shows `runtime.scanobject` consuming 25% of CPU time. 
What does this mean and how do I reduce it?\",\n \"trap\": \"Without the skill, the model may not recognize this as a GC pointer scanning issue or may suggest wrong remediation\",\n \"assertions\": [\n {\"id\": \"29.1\", \"text\": \"Identifies runtime.scanobject as GC pointer scanning — the GC is tracing pointers in the heap\"},\n {\"id\": \"29.2\", \"text\": \"Explains that the heap contains many pointers that the GC must trace\"},\n {\"id\": \"29.3\", \"text\": \"Recommends reducing pointer density: value types instead of pointers in slices/maps\"},\n {\"id\": \"29.4\", \"text\": \"Suggests flattening nested structures or using [N]byte arrays instead of strings in hot structs\"},\n {\"id\": \"29.5\", \"text\": \"References the golang-performance skill for optimization patterns\"}\n ]\n },\n {\n \"id\": 30,\n \"name\": \"top-cum-for-application-callsites\",\n \"description\": \"Tests that top -cum is the correct command to find application code when runtime symbols dominate flat profile\",\n \"prompt\": \"I ran `go tool pprof cpu.prof` and typed `top`. The output is dominated by runtime functions:\\n\\n1. runtime.memmove 18%\\n2. runtime.mallocgc 14%\\n3. runtime.gcWriteBarrier 9%\\n\\nNone of my application code appears in the top 10. My CPU overhead is high. 
What pprof command should I run next?\",\n \"trap\": \"Without the skill, the model suggests using `web` for the graph view, `list` on a runtime function, or other commands — missing that `top -cum` reveals the application callers who are responsible\",\n \"assertions\": [\n {\"id\": \"30.1\", \"text\": \"Recommends running `top -cum` (cumulative mode) as the immediate next command\"},\n {\"id\": \"30.2\", \"text\": \"Explains that flat time shows where CPU is spent directly; cumulative time shows which application functions called into the expensive runtime functions\"},\n {\"id\": \"30.3\", \"text\": \"Explains that runtime.memmove, mallocgc, gcWriteBarrier are symptoms — the application code that triggers them will have high cumulative time\"},\n {\"id\": \"30.4\", \"text\": \"Suggests using `list FunctionName` on the top cumulative callers to see the exact lines triggering the expensive runtime calls\"},\n {\"id\": \"30.5\", \"text\": \"Does NOT suggest trying to optimize the runtime functions directly\"}\n ]\n },\n {\n \"id\": 31,\n \"name\": \"fgprof-off-cpu-profiling\",\n \"description\": \"Tests knowledge of fgprof for capturing off-CPU time that pprof misses\",\n \"prompt\": \"My Go service has high latency but the CPU profile barely shows any CPU usage. Standard pprof doesn't reveal where time is spent. 
What tool captures both on-CPU and off-CPU time in a single profile?\",\n \"trap\": \"Without the skill, the model may suggest only the execution tracer or not know about fgprof\",\n \"assertions\": [\n {\"id\": \"31.1\", \"text\": \"Recommends fgprof (github.com/felixge/fgprof) for full goroutine profiling\"},\n {\"id\": \"31.2\", \"text\": \"Explains that fgprof captures both on-CPU and off-CPU (I/O wait) time in a single profile\"},\n {\"id\": \"31.3\", \"text\": \"Explains that standard pprof CPU profiles only show on-CPU time, missing I/O waits\"},\n {\"id\": \"31.4\", \"text\": \"Describes the use case: pprof shows low CPU% but latency is high\"}\n ]\n },\n {\n \"id\": 32,\n \"name\": \"flight-recorder-go125\",\n \"description\": \"Tests knowledge of the Go 1.25 flight recorder for retroactive trace capture\",\n \"prompt\": \"My Go service occasionally experiences timeout spikes but I can't reproduce them. By the time I notice and start tracing, the problem is gone. Is there a way to capture trace data retroactively for these intermittent issues?\",\n \"trap\": \"Without the skill, the model may suggest continuous tracing (too expensive) or instrumented logging\",\n \"assertions\": [\n {\"id\": \"32.1\", \"text\": \"Recommends the Go 1.25 flight recorder (trace.NewFlightRecorder)\"},\n {\"id\": \"32.2\", \"text\": \"Explains it keeps a circular buffer of recent trace data in memory\"},\n {\"id\": \"32.3\", \"text\": \"Shows snapshotting with WriteTo when the anomaly is detected\"},\n {\"id\": \"32.4\", \"text\": \"Mentions the MinAge and MaxBytes configuration parameters\"},\n {\"id\": \"32.5\", \"text\": \"Shows a trigger pattern (e.g., slow request detection with time.Since threshold)\"},\n {\"id\": \"32.6\", \"text\": \"Notes the constraint: at most one flight recorder active at a time\"}\n ]\n },\n {\n \"id\": 33,\n \"name\": \"flight-recorder-sync-once-pattern\",\n \"description\": \"Tests the sync.Once pattern for flight recorder snapshots\",\n 
\"prompt\": \"I'm setting up a flight recorder in my Go service. I want to snapshot it when a slow request is detected, but multiple goroutines might detect slow requests simultaneously. How do I prevent multiple overlapping snapshots?\",\n \"trap\": \"Without the skill, the model may use a mutex or channel instead of the idiomatic sync.Once pattern\",\n \"assertions\": [\n {\"id\": \"33.1\", \"text\": \"Uses sync.Once to ensure only one snapshot is taken\"},\n {\"id\": \"33.2\", \"text\": \"Calls fr.WriteTo inside the sync.Once.Do function\"},\n {\"id\": \"33.3\", \"text\": \"Notes that only one goroutine may call WriteTo at a time\"},\n {\"id\": \"33.4\", \"text\": \"Shows calling the snapshot in a separate goroutine (go captureSnapshot)\"},\n {\"id\": \"33.5\", \"text\": \"Shows fr.Stop() after WriteTo completes\"}\n ]\n },\n {\n \"id\": 34,\n \"name\": \"flight-recorder-sizing\",\n \"description\": \"Tests understanding of flight recorder buffer sizing\",\n \"prompt\": \"I'm configuring a flight recorder for my Go service. My typical investigation window when a timeout occurs is about 5 seconds. How should I size the MinAge and MaxBytes parameters?\",\n \"trap\": \"Without the skill, the model may guess arbitrary values without the 2x rule or data rate context\",\n \"assertions\": [\n {\"id\": \"34.1\", \"text\": \"Sets MinAge to ~2x the problem window (10 seconds for a 5-second investigation window)\"},\n {\"id\": \"34.2\", \"text\": \"Mentions that busy services generate ~1-10 MB/s of trace data\"},\n {\"id\": \"34.3\", \"text\": \"Recommends starting MaxBytes at 1-5 MiB and adjusting\"},\n {\"id\": \"34.4\", \"text\": \"Explains that MaxBytes takes precedence over MinAge — when buffer fills, older data is discarded\"}\n ]\n },\n {\n \"id\": 35,\n \"name\": \"trace-timeline-color-coding\",\n \"description\": \"Tests ability to interpret execution trace timeline colors\",\n \"prompt\": \"I opened an execution trace in the web UI. 
I see lots of yellow/orange gaps before green segments on my goroutine lanes, and some red bands spanning all processor lanes. What does this indicate?\",\n \"trap\": \"Without the skill, the model may not know the specific color coding of the trace viewer\",\n \"assertions\": [\n {\"id\": \"35.1\", \"text\": \"Identifies yellow/orange as runnable state — goroutines ready to run but waiting for a processor\"},\n {\"id\": \"35.2\", \"text\": \"Identifies red bands across all P lanes as GC stop-the-world pauses\"},\n {\"id\": \"35.3\", \"text\": \"Diagnoses the yellow gaps as CPU saturation — too many runnable goroutines competing for processors\"},\n {\"id\": \"35.4\", \"text\": \"Identifies green as actively executing/running state\"},\n {\"id\": \"35.5\", \"text\": \"Suggests examining goroutine count vs GOMAXPROCS to verify CPU saturation\"}\n ]\n },\n {\n \"id\": 36,\n \"name\": \"pprof-labels-tagfocus-multitenant\",\n \"description\": \"Tests knowledge of pprof.Do() custom labels and tagfocus for isolating a specific request type in a mixed-workload profile\",\n \"prompt\": \"My Go service handles both API requests and batch jobs in the same process. CPU profiles mix both workloads together so I can't tell if API or batch is causing GC pressure. How can I profile only the API request code path?\",\n \"trap\": \"Without the skill, the model suggests running a separate dedicated process for each workload, or using separate profiles — missing pprof.Do() labels with -tagfocus filtering which solves this in a single profile\",\n \"assertions\": [\n {\"id\": \"36.1\", \"text\": \"Recommends using pprof.Do() with pprof.Labels() to tag goroutines with a request_type label\"},\n {\"id\": \"36.2\", \"text\": \"Shows the pattern: pprof.Do(ctx, pprof.Labels(\\\"request_type\\\", \\\"api\\\"), func(ctx context.Context) { ... 
})\"},\n {\"id\": \"36.3\", \"text\": \"Shows using -tagfocus=request_type=api when analyzing the profile to filter to only API samples\"},\n {\"id\": \"36.4\", \"text\": \"Explains that labels are inherited by the goroutine's profile samples — no need for separate processes\"},\n {\"id\": \"36.5\", \"text\": \"Mentions the tags command in pprof interactive mode to see all label keys and their value distributions\"}\n ]\n },\n {\n \"id\": 37,\n \"name\": \"pprof-focus-ignore-difference\",\n \"description\": \"Tests understanding of the difference between pprof's focus, ignore, show, and hide filters\",\n \"prompt\": \"In pprof, what's the difference between `focus`, `ignore`, `show`, and `hide`? When would I use each one?\",\n \"trap\": \"Without the skill, the model may confuse the cost accounting behavior of these filters\",\n \"assertions\": [\n {\"id\": \"37.1\", \"text\": \"Explains that focus keeps only paths containing a matching function — everything else dropped\"},\n {\"id\": \"37.2\", \"text\": \"Explains that ignore removes matching functions entirely, attributing their costs to callers\"},\n {\"id\": \"37.3\", \"text\": \"Explains that show is like focus but only affects display, not cost accounting\"},\n {\"id\": \"37.4\", \"text\": \"Explains that hide is like ignore but only hides from display, not cost accounting\"},\n {\"id\": \"37.5\", \"text\": \"Mentions `reset` to clear all filters\"}\n ]\n },\n {\n \"id\": 38,\n \"name\": \"pprof-tags-and-labels\",\n \"description\": \"Tests knowledge of pprof custom labels via pprof.Do() for multi-tenant profiling\",\n \"prompt\": \"I want to break down my CPU profile by request type (API vs batch). 
How can I add custom labels to pprof samples so I can filter and group by request type?\",\n \"trap\": \"Without the skill, the model may suggest separate profiles per request type instead of using pprof labels\",\n \"assertions\": [\n {\"id\": \"38.1\", \"text\": \"Uses pprof.Labels() and pprof.Do() to add custom labels to profiled code\"},\n {\"id\": \"38.2\", \"text\": \"Shows the correct pattern: pprof.Do(ctx, pprof.Labels(\\\"key\\\", \\\"value\\\"), func(ctx context.Context) {...})\"},\n {\"id\": \"38.3\", \"text\": \"Shows using tagfocus to filter by label (e.g., -tagfocus=request_type=api)\"},\n {\"id\": \"38.4\", \"text\": \"Shows using tagroot to group by label (e.g., -tagroot=request_type)\"},\n {\"id\": \"38.5\", \"text\": \"Mentions the tags command to see all tag keys and distributions\"}\n ]\n },\n {\n \"id\": 39,\n \"name\": \"pprof-sample-index-switching\",\n \"description\": \"Tests knowledge of switching between metric types in a heap profile without reloading\",\n \"prompt\": \"I opened a heap profile in pprof interactive mode and I'm looking at alloc_objects. Now I want to also see inuse_space. 
Do I need to exit and reopen with different flags?\",\n \"trap\": \"Without the skill, the model may suggest exiting and reopening with a different flag\",\n \"assertions\": [\n {\"id\": \"39.1\", \"text\": \"Uses sample_index command to switch metrics without reloading\"},\n {\"id\": \"39.2\", \"text\": \"Shows the correct syntax: sample_index=inuse_space\"},\n {\"id\": \"39.3\", \"text\": \"Lists available indices: alloc_objects, alloc_space, inuse_objects, inuse_space\"},\n {\"id\": \"39.4\", \"text\": \"Explains that heap profiles contain multiple metrics and you can switch between them interactively\"}\n ]\n },\n {\n \"id\": 40,\n \"name\": \"pprof-granularity-lines\",\n \"description\": \"Tests knowledge of pprof granularity control for multi-hot-spot functions\",\n \"prompt\": \"I used `list` on a function and it shows two expensive lines, but the `top` output only shows the function name once with aggregated cost. How can I see per-line costs in the top output?\",\n \"trap\": \"Without the skill, the model may only suggest using list for per-line analysis\",\n \"assertions\": [\n {\"id\": \"40.1\", \"text\": \"Uses granularity=lines to group by exact source line in top output\"},\n {\"id\": \"40.2\", \"text\": \"Mentions other granularity levels: functions (default), filefunctions, files, addresses\"},\n {\"id\": \"40.3\", \"text\": \"Shows the command: either `granularity=lines` in interactive mode or `-granularity=lines` as flag\"}\n ]\n },\n {\n \"id\": 41,\n \"name\": \"ssa-dump-investigation\",\n \"description\": \"Tests knowledge of SSA dump for understanding compiler optimization passes\",\n \"prompt\": \"I want to understand exactly what optimizations the Go compiler applies to a specific function. I need more detail than escape analysis or inlining flags provide. 
Is there a way to see the compiler's intermediate representation?\",\n \"trap\": \"Without the skill, the model may not know about the GOSSAFUNC environment variable\",\n \"assertions\": [\n {\"id\": \"41.1\", \"text\": \"Uses GOSSAFUNC=FunctionName go build to generate SSA dump\"},\n {\"id\": \"41.2\", \"text\": \"Mentions that it creates ssa.html which can be opened in a browser\"},\n {\"id\": \"41.3\", \"text\": \"Describes the optimization passes visible: source, AST, start SSA, optimization, lower, regalloc, genssa\"},\n {\"id\": \"41.4\", \"text\": \"Mentions what to look for: remaining bounds checks, dead code elimination, constant folding, register spills\"},\n {\"id\": \"41.5\", \"text\": \"Mentions clicking on values to highlight them across passes\"}\n ]\n },\n {\n \"id\": 42,\n \"name\": \"noinlines-inlined-function-attribution\",\n \"description\": \"Tests that noinlines aggregates inlined function costs to the outer non-inlined caller\",\n \"prompt\": \"My pprof call graph is cluttered. A single hot function has been inlined into 6 different callers, so its cost is split across 6 separate nodes in the graph. I want to see the aggregated cost in one place. 
What pprof option should I use?\",\n \"trap\": \"Without the skill, the model suggests using show/hide filters or focus (which change display but not cost aggregation) or ignoring inlined nodes — missing the noinlines option which correctly attributes inlined function costs to their first out-of-line caller\",\n \"assertions\": [\n {\"id\": \"42.1\", \"text\": \"Recommends the noinlines option (or -noinlines flag)\"},\n {\"id\": \"42.2\", \"text\": \"Explains that noinlines attributes inlined function costs to their first out-of-line caller — the inlined nodes are collapsed\"},\n {\"id\": \"42.3\", \"text\": \"Shows correct usage: noinlines in interactive mode or -noinlines as CLI flag\"},\n {\"id\": \"42.4\", \"text\": \"Distinguishes noinlines from show/hide: show/hide only affect display without changing cost attribution\"}\n ]\n },\n {\n \"id\": 43,\n \"name\": \"receiver-choice-inlining-and-escape-tradeoffs\",\n \"description\": \"Tests knowledge that receiver choice can affect copying, aliasing, escape analysis, and inlining, but must be verified\",\n \"prompt\": \"I'm designing a fluent API for a small config builder in a performance-sensitive path. Should I use value receivers or pointer receivers for the method chain? 
Consider both correctness and compiler optimization.\",\n \"trap\": \"Without the skill, the model defaults to pointer receivers for method chains without considering inlining implications\",\n \"assertions\": [\n {\"id\": \"43.1\", \"text\": \"Mentions that receiver choice can affect copying, aliasing, escape analysis, and inlining\"},\n {\"id\": \"43.2\", \"text\": \"Does NOT claim that pointer receivers categorically block inlining or that value receivers guarantee full inlining\"},\n {\"id\": \"43.3\", \"text\": \"Recommends checking with -gcflags=\\\"-m -m\\\" to verify inlining and escape behavior\"},\n {\"id\": \"43.4\", \"text\": \"Considers both the performance trade-off and the struct size (value receivers copy the struct)\"}\n ]\n },\n {\n \"id\": 44,\n \"name\": \"prometheus-go-metrics-vs-runtime-metrics\",\n \"description\": \"Tests understanding that runtime/metrics are NOT the same as Prometheus metrics\",\n \"prompt\": \"I want to monitor Go runtime metrics in my Prometheus dashboard. Can I just use Go's runtime/metrics package directly in my PromQL queries?\",\n \"trap\": \"Without the skill, the model may conflate runtime/metrics keys with Prometheus metric names\",\n \"assertions\": [\n {\"id\": \"44.1\", \"text\": \"Clarifies that runtime/metrics are Go internal data structures, not Prometheus metrics\"},\n {\"id\": \"44.2\", \"text\": \"Explains that prometheus/client_golang selectively converts some runtime/metrics into Prometheus format\"},\n {\"id\": \"44.3\", \"text\": \"Lists the actual Prometheus metric names (e.g., go_memstats_alloc_bytes, go_goroutines)\"},\n {\"id\": \"44.4\", \"text\": \"Mentions that by default only traditional go_memstats_* and go_gc_* are exposed\"}\n ]\n },\n {\n \"id\": 45,\n \"name\": \"go-memstats-stw-overhead\",\n \"description\": \"Tests awareness of ReadMemStats causing stop-the-world pauses\",\n \"prompt\": \"I'm using prometheus/client_golang to expose Go runtime metrics. 
My high-throughput service occasionally shows brief latency spikes correlated with Prometheus scrape intervals. Could the metrics collection itself be causing this?\",\n \"trap\": \"Without the skill, the model may not connect metrics collection to STW pauses\",\n \"assertions\": [\n {\"id\": \"45.1\", \"text\": \"Identifies that go_memstats_* metrics internally call runtime.ReadMemStats() which triggers a short stop-the-world pause\"},\n {\"id\": \"45.2\", \"text\": \"Recommends the Go 1.17+ runtime/metrics-based collector as lower overhead alternative\"},\n {\"id\": \"45.3\", \"text\": \"Shows the code to register the modern collector: collectors.NewGoCollector(collectors.WithGoCollectorRuntimeMetrics(collectors.MetricsAll)) or specific runtime metric rules\"},\n {\"id\": \"45.4\", \"text\": \"Recommends using a custom prometheus.NewRegistry() rather than the default one\"}\n ]\n },\n {\n \"id\": 46,\n \"name\": \"promql-gc-pressure-queries\",\n \"description\": \"Tests knowledge of specific PromQL queries for GC pressure investigation\",\n \"prompt\": \"I suspect my Go service has excessive GC pressure. 
What Prometheus queries should I use to confirm this and find the root cause?\",\n \"trap\": \"Without the skill, the model may suggest generic monitoring queries without the specific GC-related PromQL\",\n \"assertions\": [\n {\"id\": \"46.1\", \"text\": \"Uses rate(go_gc_duration_seconds_count[5m]) for GC frequency (cycles per second)\"},\n {\"id\": \"46.2\", \"text\": \"States that >2 GC cycles/s sustained indicates excessive allocation rate\"},\n {\"id\": \"46.3\", \"text\": \"Uses go_gc_duration_seconds{quantile=\\\"1\\\"} for worst-case GC pause\"},\n {\"id\": \"46.4\", \"text\": \"Uses rate(go_memstats_alloc_bytes_total[5m]) for allocation rate comparison before/after deploy\"},\n {\"id\": \"46.5\", \"text\": \"Recommends correlating with P99 latency to confirm GC pauses cause tail latency\"}\n ]\n },\n {\n \"id\": 47,\n \"name\": \"promql-goroutine-leak-detection\",\n \"description\": \"Tests specific PromQL patterns for detecting goroutine leaks\",\n \"prompt\": \"How do I detect goroutine leaks in my production Go service using Prometheus metrics?\",\n \"trap\": \"Without the skill, the model may suggest only looking at the absolute go_goroutines count without the delta pattern\",\n \"assertions\": [\n {\"id\": \"47.1\", \"text\": \"Uses go_goroutines gauge for current count\"},\n {\"id\": \"47.2\", \"text\": \"Uses delta(go_goroutines[1h]) for net goroutine change — positive without load increase indicates leak\"},\n {\"id\": \"47.3\", \"text\": \"Notes that goroutine count should correlate with load — independent growth is a leak signal\"},\n {\"id\": \"47.4\", \"text\": \"Suggests an alerting rule with threshold (e.g., go_goroutines > 10000)\"}\n ]\n },\n {\n \"id\": 48,\n \"name\": \"investigation-session-setup\",\n \"description\": \"Tests the structured approach to production performance investigation sessions\",\n \"prompt\": \"I need to do a deep-dive performance investigation on one instance of my Go service in production. 
What should I set up before I start collecting profiles?\",\n \"trap\": \"Without the skill, the model may jump straight to pprof without the investigation session preparation\",\n \"assertions\": [\n {\"id\": \"48.1\", \"text\": \"Recommends reducing Prometheus scrape interval to \u003c=10s on the target instance\"},\n {\"id\": \"48.2\", \"text\": \"Recommends enabling pprof via environment variable without recompile\"},\n {\"id\": \"48.3\", \"text\": \"Recommends enabling continuous profiling on only the target instance, not fleet-wide\"},\n {\"id\": \"48.4\", \"text\": \"Emphasizes reverting all changes after investigation\"},\n {\"id\": \"48.5\", \"text\": \"Mentions the key design principle: all debug features should be toggleable via environment variables\"}\n ]\n },\n {\n \"id\": 49,\n \"name\": \"cost-warnings-trace-duration\",\n \"description\": \"Tests awareness of the cost and practical limits of execution traces\",\n \"prompt\": \"I want to capture a 5-minute execution trace of my Go service to analyze a periodic issue that happens every 2-3 minutes. 
Is this feasible?\",\n \"trap\": \"Without the skill, the model may agree to a 5-minute trace without warning about data volume\",\n \"assertions\": [\n {\"id\": \"49.1\", \"text\": \"Warns that traces generate data at MB/s — a 5-minute trace would be enormous (potentially GBs)\"},\n {\"id\": \"49.2\", \"text\": \"Recommends keeping traces to 5-10 seconds maximum\"},\n {\"id\": \"49.3\", \"text\": \"Warns that large traces are slow to parse and may need 1GB+ RAM to open\"},\n {\"id\": \"49.4\", \"text\": \"Suggests using the flight recorder (Go 1.25+) as an alternative for intermittent issues\"},\n {\"id\": \"49.5\", \"text\": \"Notes that the browser UI struggles with traces >100MB\"}\n ]\n },\n {\n \"id\": 50,\n \"name\": \"benchstat-three-version-comparison\",\n \"description\": \"Tests knowledge of comparing more than two versions with benchstat\",\n \"prompt\": \"I have benchmark results from three different versions of my code (v1, v2, v3). I want to compare all three in a single benchstat output. How?\",\n \"trap\": \"Without the skill, the model may suggest running benchstat twice for pairwise comparisons\",\n \"assertions\": [\n {\"id\": \"50.1\", \"text\": \"Shows labeling inputs: benchstat v1=v1.txt v2=v2.txt v3=v3.txt\"},\n {\"id\": \"50.2\", \"text\": \"Explains that the first input is always the base for comparison\"},\n {\"id\": \"50.3\", \"text\": \"States that v2 vs v1 and v3 vs v1 comparisons are shown (both relative to first input)\"}\n ]\n },\n {\n \"id\": 51,\n \"name\": \"host-level-correlation\",\n \"description\": \"Tests awareness of correlating Go metrics with host-level metrics\",\n \"prompt\": \"My Go service shows high process_cpu_seconds_total but the CPU profile looks normal. 
Could the problem be outside my application?\",\n \"trap\": \"Without the skill, the model may only investigate within the Go application\",\n \"assertions\": [\n {\"id\": \"51.1\", \"text\": \"Recommends checking node_exporter metrics for host-level CPU, memory, disk I/O\"},\n {\"id\": \"51.2\", \"text\": \"Explains the noisy neighbor pattern: high node_cpu with low process_cpu = external contention\"},\n {\"id\": \"51.3\", \"text\": \"Mentions process-exporter for per-process metrics when multiple services share a host\"},\n {\"id\": \"51.4\", \"text\": \"Suggests correlating Go app metrics with infrastructure metrics to determine if the problem is in the app or the environment\"}\n ]\n },\n {\n \"id\": 52,\n \"name\": \"pprof-diff-base-vs-base\",\n \"description\": \"Tests understanding of -base vs -diff_base flags in pprof for comparison\",\n \"prompt\": \"I want to compare two CPU profiles — one before and one after my optimization. What's the difference between `pprof -base` and `pprof -diff_base`?\",\n \"trap\": \"Without the skill, the model may not know both flags exist or may confuse their semantics\",\n \"assertions\": [\n {\"id\": \"52.1\", \"text\": \"Explains that -base subtracts base from source — all values become deltas\"},\n {\"id\": \"52.2\", \"text\": \"Explains that -diff_base shows percentages relative to the base profile\"},\n {\"id\": \"52.3\", \"text\": \"Mentions -normalize flag for making ratios comparable when capture durations differ\"},\n {\"id\": \"52.4\", \"text\": \"Shows how to generate a diff SVG for visual comparison\"}\n ]\n },\n {\n \"id\": 53,\n \"name\": \"gobenchdata-github-action-setup\",\n \"description\": \"Tests knowledge of setting up gobenchdata for long-term trend tracking\",\n \"prompt\": \"I want to track benchmark performance trends over time with an interactive web dashboard. I'm using GitHub Actions. 
What's the best approach?\",\n \"trap\": \"Without the skill, the model may suggest building a custom dashboard or only using benchstat\",\n \"assertions\": [\n {\"id\": \"53.1\", \"text\": \"Recommends gobenchdata for trend tracking and visualization\"},\n {\"id\": \"53.2\", \"text\": \"Shows the GitHub Action configuration with bobheadxi/gobenchdata@v1\"},\n {\"id\": \"53.3\", \"text\": \"Mentions publishing to gh-pages for the dashboard\"},\n {\"id\": \"53.4\", \"text\": \"Shows the regression checks config (.gobenchdata-checks.yml) with thresholds\"},\n {\"id\": \"53.5\", \"text\": \"Mentions PRUNE_COUNT to limit stored history\"}\n ]\n },\n {\n \"id\": 54,\n \"name\": \"benchstat-single-file-summary\",\n \"description\": \"Tests knowledge of using benchstat with a single file for variance analysis\",\n \"prompt\": \"I haven't made any code changes yet. I just want to check if my benchmarks are stable (low variance) before I start optimizing. Can benchstat help?\",\n \"trap\": \"Without the skill, the model may say benchstat requires two files for comparison\",\n \"assertions\": [\n {\"id\": \"54.1\", \"text\": \"Shows running benchstat with a single file: benchstat bench.txt\"},\n {\"id\": \"54.2\", \"text\": \"Explains it shows median and confidence interval for each benchmark\"},\n {\"id\": \"54.3\", \"text\": \"Recommends using this to check measurement stability before making changes\"},\n {\"id\": \"54.4\", \"text\": \"Notes that high variance (± >5%) indicates noisy benchmarks needing more runs or better isolation\"}\n ]\n },\n {\n \"id\": 55,\n \"name\": \"count-minimum-by-scenario\",\n \"description\": \"Tests knowledge of appropriate -count values for different scenarios\",\n \"prompt\": \"How many benchmark iterations (-count) should I use? 
I've seen recommendations ranging from 1 to 30.\",\n \"trap\": \"Without the skill, the model may give a single number without context-dependent guidance\",\n \"assertions\": [\n {\"id\": \"55.1\", \"text\": \"Recommends 6 minimum for quick local checks\"},\n {\"id\": \"55.2\", \"text\": \"Recommends 10 for standard pre-merge comparisons\"},\n {\"id\": \"55.3\", \"text\": \"Recommends 20-30 for detecting small changes (\u003c5%)\"},\n {\"id\": \"55.4\", \"text\": \"Recommends 20+ for noisy CI environments\"},\n {\"id\": \"55.5\", \"text\": \"Explicitly warns against -count=1 as providing no variance information\"}\n ]\n },\n {\n \"id\": 56,\n \"name\": \"closure-capturing-pointer-escape\",\n \"description\": \"Tests understanding of a subtle escape: a closure capturing a loop variable forces the variable to escape to the heap even when no interface boxing occurs\",\n \"prompt\": \"I have a benchmark and escape analysis shows these local integer variables escaping to the heap:\\n\\n```go\\nfor i := 0; i \u003c 10; i++ {\\n val := computeValue(i)\\n go func() {\\n results[i] = val\\n }()\\n}\\n```\\n\\nWhy do `i` and `val` escape to the heap? 
There are no interface conversions here.\",\n \"trap\": \"Without the skill, the model explains interface boxing (which isn't happening here) or vaguely mentions 'closures cause escapes' without explaining the specific mechanism: the goroutine may outlive the enclosing function, so any variable the closure captures must be heap-allocated to remain accessible\",\n \"assertions\": [\n {\"id\": \"56.1\", \"text\": \"Explains that the goroutine launched with `go func()` may outlive the enclosing function\"},\n {\"id\": \"56.2\", \"text\": \"Explains that any variable captured by a closure that escapes (e.g., via goroutine launch or return) must be heap-allocated to remain accessible after the outer function returns\"},\n {\"id\": \"56.3\", \"text\": \"Identifies the specific escape cause: `go func()` body references `val` and `i`, so both variables escape via closure capture\"},\n {\"id\": \"56.4\", \"text\": \"Distinguishes this from interface boxing — this escape is purely lifetime-based, not type-conversion based\"},\n {\"id\": \"56.5\", \"text\": \"Suggests passing variables as arguments to the goroutine function to break the closure capture and avoid escape\"}\n ]\n },\n {\n \"id\": 57,\n \"name\": \"trace-scheduling-latency-diagnosis\",\n \"description\": \"Tests ability to diagnose scheduling latency from trace data\",\n \"prompt\": \"My execution trace shows many goroutines spending significant time in the 'runnable' (yellow) state before becoming 'running' (green). 
What does this mean and how do I fix it?\",\n \"trap\": \"Without the skill, the model may not connect runnable-to-running delay with CPU saturation\",\n \"assertions\": [\n {\"id\": \"57.1\", \"text\": \"Identifies this as high scheduling latency — goroutines ready to run but waiting for a processor\"},\n {\"id\": \"57.2\", \"text\": \"Diagnoses the cause: too many goroutines competing for GOMAXPROCS processors (CPU saturation)\"},\n {\"id\": \"57.3\", \"text\": \"Suggests extracting the scheduling latency profile: go tool trace -pprof=sched trace.out\"},\n {\"id\": \"57.4\", \"text\": \"Recommends checking for uneven distribution across Ps (work imbalance)\"},\n {\"id\": \"57.5\", \"text\": \"Lists other potential causes: OS scheduling interference, goroutines pinned by cgo or long syscalls\"}\n ]\n },\n {\n \"id\": 58,\n \"name\": \"benchmark-output-format-parsing\",\n \"description\": \"Tests understanding of the benchmark output format\",\n \"prompt\": \"I see this benchmark output: `BenchmarkEncode/size=64-8 5000000 230.5 ns/op 128 B/op 2 allocs/op`. What does each part mean?\",\n \"trap\": \"Without the skill, the model may not explain the -8 suffix correctly\",\n \"assertions\": [\n {\"id\": \"58.1\", \"text\": \"Explains that -8 is the GOMAXPROCS suffix\"},\n {\"id\": \"58.2\", \"text\": \"Explains that 5000000 is the number of iterations (b.N)\"},\n {\"id\": \"58.3\", \"text\": \"Explains that ns/op is time per operation\"},\n {\"id\": \"58.4\", \"text\": \"Explains that B/op is bytes allocated per operation\"},\n {\"id\": \"58.5\", \"text\": \"Explains that allocs/op is heap allocation count per operation\"}\n ]\n },\n {\n \"id\": 59,\n \"name\": \"pprof-show-from-framework-noise\",\n \"description\": \"Tests knowledge of show_from to trim framework noise in profiles\",\n \"prompt\": \"My pprof output is cluttered with HTTP framework routing functions (net/http.(*ServeMux).ServeHTTP, etc.) that appear above every handler. 
How do I remove this noise and start the analysis from my handler code?\",\n \"trap\": \"Without the skill, the model may suggest using ignore (which changes cost accounting) instead of show_from\",\n \"assertions\": [\n {\"id\": \"59.1\", \"text\": \"Uses show_from=regex to trim all frames above the first matching function\"},\n {\"id\": \"59.2\", \"text\": \"Shows the correct pattern: show_from=handler.Handle or similar\"},\n {\"id\": \"59.3\", \"text\": \"Explains that show_from hides all callers above the match point\"},\n {\"id\": \"59.4\", \"text\": \"Differentiates from ignore which removes functions and re-attributes their costs\"}\n ]\n },\n {\n \"id\": 60,\n \"name\": \"optional-prometheus-metrics-enablement\",\n \"description\": \"Tests knowledge of how to enable optional Go runtime Prometheus metrics\",\n \"prompt\": \"I want to expose Go scheduler metrics (goroutine scheduling latency) and CPU class breakdowns in Prometheus. The default go_memstats_* metrics don't include these. How do I enable them?\",\n \"trap\": \"Without the skill, the model may not know about the opt-in collector configuration\",\n \"assertions\": [\n {\"id\": \"60.1\", \"text\": \"Shows creating a custom registry with collectors.NewGoCollector\"},\n {\"id\": \"60.2\", \"text\": \"Uses collectors.WithGoCollectorRuntimeMetrics option\"},\n {\"id\": \"60.3\", \"text\": \"Mentions collectors.MetricsAll or specific GoRuntimeMetricsRule values\"},\n {\"id\": \"60.4\", \"text\": \"Notes this requires Go 1.17+\"},\n {\"id\": \"60.5\", \"text\": \"Lists scheduler metrics like go_sched_latencies_seconds and CPU class metrics like go_cpu_classes_*\"}\n ]\n },\n {\n \"id\": 61,\n \"name\": \"benchdiff-usage-patterns\",\n \"description\": \"Tests practical usage of benchdiff for PR comparisons\",\n \"prompt\": \"I want to quickly compare the benchmark performance of my current changes against the main branch. I don't want to manually check out branches and run benchmarks separately. 
What tool can automate this?\",\n \"trap\": \"Without the skill, the model may suggest a manual checkout-and-compare workflow\",\n \"assertions\": [\n {\"id\": \"61.1\", \"text\": \"Recommends benchdiff for automatic branch comparison\"},\n {\"id\": \"61.2\", \"text\": \"Shows the basic command: benchdiff -base-ref main -- -benchmem -count=10\"},\n {\"id\": \"61.3\", \"text\": \"Explains that benchdiff caches results for non-worktree refs so re-runs are fast\"},\n {\"id\": \"61.4\", \"text\": \"Mentions benchdiff -clear-cache for stale cache situations\"},\n {\"id\": \"61.5\", \"text\": \"Notes that benchdiff prevents macOS sleep during benchmarks\"}\n ]\n },\n {\n \"id\": 62,\n \"name\": \"pprof-noinlines-attribution\",\n \"description\": \"Tests knowledge of the noinlines option for simplifying inlined function chains\",\n \"prompt\": \"My pprof call graph shows many tiny inlined functions creating confusing chains. The real cost is in the outer function but it's split across multiple inline nodes. How do I simplify this?\",\n \"trap\": \"Without the skill, the model may suggest using show/hide which don't properly aggregate inlined costs\",\n \"assertions\": [\n {\"id\": \"62.1\", \"text\": \"Uses the noinlines option or -noinlines flag\"},\n {\"id\": \"62.2\", \"text\": \"Explains that noinlines attributes inlined functions to their first out-of-line caller\"},\n {\"id\": \"62.3\", \"text\": \"Shows correct usage: either `noinlines` in interactive mode or `-noinlines` as CLI flag\"}\n ]\n },\n {\n \"id\": 63,\n \"name\": \"sub-benchmarks-table-driven\",\n \"description\": \"Tests proper structure for table-driven sub-benchmarks with b.Run\",\n \"prompt\": \"I want to benchmark my Encode function with different input sizes (64, 256, 4096 bytes) in a single benchmark function. 
The project uses Go 1.24.\",\n \"trap\": \"Without the skill, the model may write separate benchmark functions or use b.N loop inside b.Run without b.Loop()\",\n \"assertions\": [\n {\"id\": \"63.1\", \"text\": \"Uses b.Run() with descriptive sub-benchmark names (e.g., size=64)\"},\n {\"id\": \"63.2\", \"text\": \"Uses b.Loop() (not range b.N) inside each sub-benchmark since Go 1.24\"},\n {\"id\": \"63.3\", \"text\": \"Places setup code (like make([]byte, size)) before the b.Loop() call\"},\n {\"id\": \"63.4\", \"text\": \"Uses a loop or range over sizes with fmt.Sprintf for names\"},\n {\"id\": \"63.5\", \"text\": \"Output will look like BenchmarkEncode/size=64, BenchmarkEncode/size=256, etc.\"}\n ]\n },\n {\n \"id\": 64,\n \"name\": \"benchstat-row-col-projection\",\n \"description\": \"Tests knowledge of the -table and -col flags to split benchstat output into per-package tables with a sub-benchmark parameter as columns\",\n \"prompt\": \"I ran benchmarks for two packages (parser and encoder) with two sub-benchmark parameters each (format=json and format=gob). benchstat mixes everything into one table. I want separate tables per package, with format as columns. 
What benchstat flags should I use?\",\n \"trap\": \"Without the skill, the model suggests splitting into separate files and running benchstat separately, missing the -table pkg and -col /format flags that combine to produce the desired layout\",\n \"assertions\": [\n {\"id\": \"64.1\", \"text\": \"Uses -table pkg to produce a separate table per package\"},\n {\"id\": \"64.2\", \"text\": \"Uses -col /format to make format values (json, gob) the column headers\"},\n {\"id\": \"64.3\", \"text\": \"Shows the combined command: benchstat -table pkg -col /format bench.txt\"},\n {\"id\": \"64.4\", \"text\": \"Explains that -table controls the grouping dimension and -col controls column layout\"}\n ]\n },\n {\n \"id\": 65,\n \"name\": \"trace-short-lived-goroutines\",\n \"description\": \"Tests ability to identify goroutine creation overhead in traces\",\n \"prompt\": \"My execution trace shows thousands of very short-lived goroutines being created and destroyed rapidly. Is this a problem?\",\n \"trap\": \"Without the skill, the model may say goroutines are cheap and this is fine\",\n \"assertions\": [\n {\"id\": \"65.1\", \"text\": \"Identifies high overhead from goroutine creation and scheduling for very short-lived goroutines\"},\n {\"id\": \"65.2\", \"text\": \"Recommends batching work or using worker pools to reduce creation overhead\"},\n {\"id\": \"65.3\", \"text\": \"Mentions checking for goroutines created in loops without bounds as a potential leak pattern\"},\n {\"id\": \"65.4\", \"text\": \"Notes that goroutines created but never finishing indicates a leak\"}\n ]\n },\n {\n \"id\": 66,\n \"name\": \"benchmark-side-effects-corrupt-results\",\n \"description\": \"Tests awareness that benchmarks with shared mutable state produce non-reproducible and misleading results\",\n \"prompt\": \"I wrote this benchmark to measure LRU cache insertions:\\n\\n```go\\nvar globalCache = NewLRUCache(1000)\\n\\nfunc BenchmarkCacheInsert(b *testing.B) {\\n for b.Loop() {\\n 
globalCache.Set(randomKey(), randomValue())\\n }\\n}\\n```\\n\\nThe first run shows 45 ns/op. The second run shows 180 ns/op. Why do results vary so much between runs?\",\n \"trap\": \"Without the skill, the model suggests adding b.ResetTimer() or increasing benchtime — missing that shared mutable global state across runs causes the cache to be in different fill states, making each b.N iteration operate on a different cache state (empty, half-full, full with evictions)\",\n \"assertions\": [\n {\"id\": \"66.1\", \"text\": \"Identifies that globalCache is mutable shared state — its fill level changes across b.N iterations and across separate runs\"},\n {\"id\": \"66.2\", \"text\": \"Explains that early iterations insert into empty slots (fast), while later iterations trigger evictions (slow) — the benchmark measures a mix of two different operations\"},\n {\"id\": \"66.3\", \"text\": \"Recommends using b.StopTimer()/b.StartTimer() or b.ResetTimer() with setup per batch, OR pre-filling/resetting the cache state to a known baseline before the benchmark loop\"},\n {\"id\": \"66.4\", \"text\": \"Notes that mutable global state in benchmarks produces results that vary by execution order — isolated state inside b.Run sub-benchmarks or per-benchmark setup is the correct pattern\"}\n ]\n },\n {\n \"id\": 67,\n \"name\": \"async-work-allocation-measurement-scope\",\n \"description\": \"Tests that benchmark allocation counts cover allocations during the measured window, including other goroutines, but async work can escape measurement if synchronization is wrong\",\n \"prompt\": \"I have a benchmark that spawns a goroutine inside the loop to do async work:\\n\\n```go\\nfunc BenchmarkProcess(b *testing.B) {\\n b.ReportAllocs()\\n for b.Loop() {\\n ch := make(chan Result, 1)\\n go worker(ch) // worker allocates internally\\n \u003c-ch\\n }\\n}\\n```\\n\\nThe benchmark reports 1 alloc/op (just the channel). But profiling shows the worker goroutine allocates heavily. 
Why doesn't ReportAllocs see those?\",\n \"trap\": \"Without the skill, the model may incorrectly claim ReportAllocs only tracks the benchmark goroutine. Allocation counts are global deltas while the timer is running; discrepancies usually mean async work occurred outside the measured window, synchronization is wrong, or the profile covers a different scope.\",\n \"assertions\": [\n {\"id\": \"67.1\", \"text\": \"Explains that ReportAllocs/-benchmem use allocation deltas while the benchmark timer is running, not per-goroutine attribution\"},\n {\"id\": \"67.2\", \"text\": \"Checks whether the worker goroutine is fully synchronized inside the b.Loop measured window\"},\n {\"id\": \"67.3\", \"text\": \"Recommends using a heap profile (-memprofile) to inspect allocation sites across goroutines\"},\n {\"id\": \"67.4\", \"text\": \"Notes that profile-vs-benchmark discrepancies can come from async work outside timing or a different measurement scope\"}\n ]\n },\n {\n \"id\": 68,\n \"name\": \"pprof-goroutine-debug-levels\",\n \"description\": \"Tests knowledge of the different debug levels for goroutine dumps\",\n \"prompt\": \"I want to get a human-readable dump of all goroutine stacks from my running Go service via HTTP, without using go tool pprof. 
How?\",\n \"trap\": \"Without the skill, the model may only suggest the binary profile format\",\n \"assertions\": [\n {\"id\": \"68.1\", \"text\": \"Uses curl with ?debug=1 for human-readable goroutine dump\"},\n {\"id\": \"68.2\", \"text\": \"Mentions ?debug=2 for full stack traces with creation site and labels\"},\n {\"id\": \"68.3\", \"text\": \"Shows the URL: http://localhost:6060/debug/pprof/goroutine?debug=1 or ?debug=2\"},\n {\"id\": \"68.4\", \"text\": \"Notes that no go tool pprof is needed for debug mode dumps\"}\n ]\n },\n {\n \"id\": 69,\n \"name\": \"ci-threshold-calibration\",\n \"description\": \"Tests knowledge of appropriate regression thresholds for different CI environments\",\n \"prompt\": \"I'm setting up benchmark regression detection in CI. What threshold should I set for failing PRs on performance regression?\",\n \"trap\": \"Without the skill, the model may suggest a tight threshold (5%) without considering the CI environment\",\n \"assertions\": [\n {\"id\": \"69.1\", \"text\": \"Recommends 20%+ threshold on shared/GitHub-hosted runners\"},\n {\"id\": \"69.2\", \"text\": \"Recommends 10% on dedicated self-hosted runners\"},\n {\"id\": \"69.3\", \"text\": \"Explains that tight thresholds on noisy environments produce false positives that erode trust\"},\n {\"id\": \"69.4\", \"text\": \"Mentions that GitHub-hosted runners show ~2-3% coefficient of variation in best case\"},\n {\"id\": \"69.5\", \"text\": \"States that \u003c1% false positive rate requires 7%+ performance gate\"}\n ]\n },\n {\n \"id\": 70,\n \"name\": \"cpu-profile-requires-binary-for-symbolization\",\n \"description\": \"Tests knowledge that a CPU profile file captured from a benchmark contains no symbols — the original test binary is required for symbolization\",\n \"prompt\": \"I ran `go test -bench=BenchmarkParse -cpuprofile=cpu.prof ./pkg/parser` on our CI server, then downloaded cpu.prof to my laptop to analyze. 
When I run `go tool pprof cpu.prof` I see only hex addresses instead of function names. What's missing?\",\n \"trap\": \"Without the skill, the model may suggest rebuilding pprof or using -symbolize=remote — missing that the pprof file is unsymbolized and requires the original compiled test binary (parser.test) to resolve addresses to function names\",\n \"assertions\": [\n {\"id\": \"70.1\", \"text\": \"Explains that cpu.prof contains memory addresses, not function names — it requires the compiled test binary for symbolization\"},\n {\"id\": \"70.2\", \"text\": \"States that go test -bench also produces a test binary (e.g., parser.test) alongside the profile\"},\n {\"id\": \"70.3\", \"text\": \"Shows the correct command: go tool pprof parser.test cpu.prof (pass the binary as the first argument)\"},\n {\"id\": \"70.4\", \"text\": \"Recommends downloading both the .prof file AND the corresponding test binary from CI for remote analysis\"}\n ]\n },\n {\n \"id\": 71,\n \"name\": \"benchstat-ignore-dimension-warning\",\n \"description\": \"Tests knowledge of the -ignore flag for suppressing benchstat dimension warnings\",\n \"prompt\": \"benchstat gives me a warning: 'benchmarks vary in /gomaxprocs'. 
How do I suppress this?\",\n \"trap\": \"Without the skill, the model may suggest filtering the output or restructuring benchmarks\",\n \"assertions\": [\n {\"id\": \"71.1\", \"text\": \"Uses -ignore /gomaxprocs to suppress the warning\"},\n {\"id\": \"71.2\", \"text\": \"Explains that -ignore omits keys from grouping\"},\n {\"id\": \"71.3\", \"text\": \"Alternatively suggests -row .name to simplify row grouping\"},\n {\"id\": \"71.4\", \"text\": \"Mentions that -col /gomaxprocs could be used to compare across GOMAXPROCS values instead\"}\n ]\n },\n {\n \"id\": 72,\n \"name\": \"gcflags-all-vs-single-package-scope\",\n \"description\": \"Tests understanding of -gcflags=\\\"all=-m\\\" vs -gcflags=\\\"-m\\\" and when the wider scope matters\",\n \"prompt\": \"I'm investigating unexpected heap allocations in my Go service. I ran `go build -gcflags=\\\"-m\\\" ./...` and the escape analysis output only shows my own packages. But I suspect a third-party library function is causing my structs to escape when passed to it. How do I see escape decisions for dependencies too?\",\n \"trap\": \"Without the skill, the model only knows -gcflags=\\\"-m\\\" which applies to packages named on the command line, not their dependencies — missing the all= prefix that applies flags to all transitively compiled packages including vendor/module dependencies\",\n \"assertions\": [\n {\"id\": \"72.1\", \"text\": \"Explains that -gcflags=\\\"-m\\\" only applies escape analysis to packages explicitly listed on the command line, not their dependencies\"},\n {\"id\": \"72.2\", \"text\": \"Shows go build -gcflags=\\\"all=-m\\\" ./... 
to apply escape analysis to ALL compiled packages including dependencies\"},\n {\"id\": \"72.3\", \"text\": \"Warns that all=-m produces very verbose output — recommends piping through grep to filter for the specific package or function of interest\"},\n {\"id\": \"72.4\", \"text\": \"Notes this is especially useful for diagnosing parameter escapes caused by passing values to functions in external packages that the compiler cannot analyze inline\"}\n ]\n },\n {\n \"id\": 73,\n \"name\": \"runtime-readmemstats-vs-runtime-metrics\",\n \"description\": \"Tests knowledge of the programmatic APIs for runtime statistics\",\n \"prompt\": \"I want to read GC statistics programmatically in my Go application for a custom dashboard. What APIs are available and which should I prefer?\",\n \"trap\": \"Without the skill, the model may recommend ReadMemStats without noting its overhead or the modern alternative\",\n \"assertions\": [\n {\"id\": \"73.1\", \"text\": \"Mentions runtime.ReadMemStats for heap size, NumGC, pause durations\"},\n {\"id\": \"73.2\", \"text\": \"Mentions debug.ReadGCStats for GC-specific statistics\"},\n {\"id\": \"73.3\", \"text\": \"Recommends runtime/metrics (Go 1.16+) as the preferred modern API\"},\n {\"id\": \"73.4\", \"text\": \"Explains that runtime/metrics has lower overhead and is safe for concurrent reads\"},\n {\"id\": \"73.5\", \"text\": \"Notes that ReadMemStats is more expensive due to internal locking\"}\n ]\n },\n {\n \"id\": 74,\n \"name\": \"pprof-callgrind-export\",\n \"description\": \"Tests knowledge of exporting pprof data to external visualization tools\",\n \"prompt\": \"I want to analyze a Go pprof profile in KCachegrind for more advanced visualization. 
How do I export the data?\",\n \"trap\": \"Without the skill, the model may not know about the callgrind export format\",\n \"assertions\": [\n {\"id\": \"74.1\", \"text\": \"Uses the callgrind command or -callgrind flag to export\"},\n {\"id\": \"74.2\", \"text\": \"Shows the command: go tool pprof -callgrind cpu.prof > cpu.callgrind\"},\n {\"id\": \"74.3\", \"text\": \"Mentions KCachegrind or QCachegrind as visualization tools\"},\n {\"id\": \"74.4\", \"text\": \"Notes the proto command for saving in protobuf format as an alternative\"}\n ]\n },\n {\n \"id\": 75,\n \"name\": \"expvar-lightweight-monitoring\",\n \"description\": \"Tests knowledge of expvar as a lightweight alternative to Prometheus for runtime metrics\",\n \"prompt\": \"I want a lightweight way to expose Go runtime variables as JSON for monitoring without adding a full Prometheus dependency. What does the stdlib offer?\",\n \"trap\": \"Without the skill, the model may suggest writing a custom handler or only mention pprof\",\n \"assertions\": [\n {\"id\": \"75.1\", \"text\": \"Recommends expvar package from the stdlib\"},\n {\"id\": \"75.2\", \"text\": \"Shows that import _ \\\"expvar\\\" auto-registers at /debug/vars\"},\n {\"id\": \"75.3\", \"text\": \"Notes it serves JSON format\"},\n {\"id\": \"75.4\", \"text\": \"Mentions integration with Netdata, Telegraf, or custom dashboards\"}\n ]\n },\n {\n \"id\": 76,\n \"name\": \"benchstat-table-flag-per-package\",\n \"description\": \"Tests knowledge of the -table flag for grouping benchstat output by package\",\n \"prompt\": \"I ran benchmarks across multiple packages and the benchstat output mixes them all together. 
How do I get separate comparison tables per package?\",\n \"trap\": \"Without the skill, the model may suggest running benchstat separately per package\",\n \"assertions\": [\n {\"id\": \"76.1\", \"text\": \"Uses -table pkg flag to create one table per package\"},\n {\"id\": \"76.2\", \"text\": \"Shows the command: benchstat -table pkg old.txt new.txt\"},\n {\"id\": \"76.3\", \"text\": \"Explains that the default -table value is .config which groups by goos/goarch/pkg/cpu\"}\n ]\n },\n {\n \"id\": 77,\n \"name\": \"pprof-symbolization-remote\",\n \"description\": \"Tests knowledge of pprof symbolization modes for remote profiling\",\n \"prompt\": \"I captured a pprof profile from a production server but the function names show as hex addresses instead of readable names. How do I fix this?\",\n \"trap\": \"Without the skill, the model may not know about the symbolization modes\",\n \"assertions\": [\n {\"id\": \"77.1\", \"text\": \"Explains pprof symbolization modes: local, remote, none\"},\n {\"id\": \"77.2\", \"text\": \"Shows -symbolize=local to use local binaries\"},\n {\"id\": \"77.3\", \"text\": \"Mentions PPROF_BINARY_PATH environment variable for setting binary search paths\"},\n {\"id\": \"77.4\", \"text\": \"Shows -symbolize=remote to contact the running service for symbol information\"}\n ]\n },\n {\n \"id\": 78,\n \"name\": \"trace-concurrent-with-flight-recorder\",\n \"description\": \"Tests understanding that trace.Start and FlightRecorder can run concurrently\",\n \"prompt\": \"I have a flight recorder running in my service. 
If I also call trace.Start() to capture a short trace for debugging, will they conflict?\",\n \"trap\": \"Without the skill, the model may assume they can't coexist\",\n \"assertions\": [\n {\"id\": \"78.1\", \"text\": \"States that a flight recorder can run concurrently with trace.Start\"},\n {\"id\": \"78.2\", \"text\": \"Notes the constraint that at most one flight recorder may be active at a time\"},\n {\"id\": \"78.3\", \"text\": \"Clarifies that both can be active simultaneously without conflict\"}\n ]\n },\n {\n \"id\": 79,\n \"name\": \"pyroscope-overhead-warning\",\n \"description\": \"Tests awareness of continuous profiling overhead at scale\",\n \"prompt\": \"Our SRE team wants to enable Pyroscope continuous profiling on all 200 production instances. Any concerns?\",\n \"trap\": \"Without the skill, the model may approve fleet-wide enablement without the cost warning\",\n \"assertions\": [\n {\"id\": \"79.1\", \"text\": \"Warns about ~2-5% CPU overhead per instance for continuous profiling\"},\n {\"id\": \"79.2\", \"text\": \"Notes that at 200 instances, the aggregate compute cost and backend storage cost is significant\"},\n {\"id\": \"79.3\", \"text\": \"Recommends enabling on a subset of instances or on-demand via environment variable\"},\n {\"id\": \"79.4\", \"text\": \"Emphasizes the investigation session approach: enable on target instances only, not fleet-wide\"}\n ]\n },\n {\n \"id\": 80,\n \"name\": \"non-go-memory-leak-detection\",\n \"description\": \"Tests the PromQL pattern for detecting non-Go memory leaks (cgo, mmap)\",\n \"prompt\": \"My Go service's RSS keeps growing but go_memstats_alloc_bytes appears stable. The leak doesn't seem to be in Go heap memory. 
How do I diagnose this?\",\n \"trap\": \"Without the skill, the model may only investigate Go heap, missing cgo/mmap leaks\",\n \"assertions\": [\n {\"id\": \"80.1\", \"text\": \"Uses the PromQL pattern: process_resident_memory_bytes - go_memstats_sys_bytes\"},\n {\"id\": \"80.2\", \"text\": \"Explains that the gap represents non-Go memory (cgo, mmap, etc.)\"},\n {\"id\": \"80.3\", \"text\": \"A growing gap indicates a non-Go memory leak\"},\n {\"id\": \"80.4\", \"text\": \"Suggests investigating cgo calls or memory-mapped files as potential sources\"}\n ]\n }\n]\n","content_type":"application/json; charset=utf-8","language":"json","size":83154,"content_sha256":"3c0bfb6f93dec158f30cec0484eef76a6649c0c89766a00026bf6fde02868a92"},{"filename":"references/benchstat.md","content":"# benchstat Reference\n\n`benchstat` computes statistical summaries and A/B comparisons of Go benchmark results. A single benchmark run tells you nothing about variance — `benchstat` tells you whether the difference between two runs is real or noise.\n\n## Installation\n\n```bash\ngo install golang.org/x/perf/cmd/benchstat@latest\n```\n\n## Usage\n\n```bash\nbenchstat [flags] inputs...\n```\n\nEach input is a file containing `go test -bench` output. Optionally label inputs with `label=path` syntax.\n\n## Basic Workflow\n\n### Step 0: Write benchmarks\n\nUse the standard Go benchmark function signature in `*_test.go`:\n\n### Step 1: Measure baseline\n\nRun benchmarks with `-count=10` or more. Each run produces one data point — you need at least 10 to compute a meaningful confidence interval:\n\n```bash\ngo test -run='^

$' -bench=BenchmarkParse -benchmem -count=10 ./pkg/parser | tee old.txt\n```\n\n`-run='^$'` skips unit tests so only benchmarks run, which avoids wasting time on tests during measurement sessions.\n\n### Step 2: Make your change\n\nEdit the code you want to optimize.\n\n### Step 3: Measure again\n\nSame command, same flags, same machine, same load conditions:\n\n```bash\ngo test -run='^$'
-bench=BenchmarkParse -benchmem -count=10 ./pkg/parser | tee new.txt\n```\n\n### Step 4: Compare\n\n```bash\nbenchstat old.txt new.txt\n```\n\nOutput:\n\n```\ngoos: linux\ngoarch: amd64\npkg: myapp/pkg/parser\ncpu: AMD Ryzen 9 5950X 16-Core Processor\n │ old.txt │ new.txt │\n │ sec/op │ sec/op vs base │\nParse-32 4.592µ ± 2% 3.041µ ± 1% -33.78% (p=0.000 n=10)\n\n │ old.txt │ new.txt │\n │ B/op │ B/op vs base │\nParse-32 1.024Ki ± 0% 0.512Ki ± 0% -50.00% (p=0.000 n=10)\n\n │ old.txt │ new.txt │\n │ allocs/op │ allocs/op vs base │\nParse-32 12.00 ± 0% 6.000 ± 0% -50.00% (p=0.000 n=10)\n```\n\n## Reading the Output\n\n| Element | Meaning | What to look for |\n| --- | --- | --- |\n| **median** (e.g., `4.592µ`) | Central value across runs — more robust than mean because outliers don't skew it | The reference number for this benchmark |\n| **± N%** (e.g., `± 2%`) | Half-width of the 95% confidence interval as a percentage of the median | Low (≤2%) = stable measurement. High (>5%) = noisy — investigate noise sources before trusting results |\n| **vs base** (e.g., `-33.78%`) | Percentage change from the first input (base) to subsequent inputs | Negative = faster/smaller. Positive = slower/larger |\n| **p=N** (e.g., `p=0.000`) | p-value from Mann-Whitney U-test (non-parametric) | \u003c0.05 = statistically significant. 
≥0.05 = difference could be noise |\n| **n=N** (e.g., `n=10`) | Number of samples used in the comparison | Should usually match your `-count`; if it does not, check that each input file contains the same benchmark rows and units |\n| **`~`** | No statistically significant difference detected | Do NOT claim improvement — the change might be zero |\n| **geomean** row | Geometric mean of changes across all benchmarks in the table | Overall proportional change; useful when comparing many benchmarks at once |\n\n### Unit normalization\n\nbenchstat automatically normalizes units for display:\n\n- `ns/op` → displayed as `sec/op` (with µ, m prefixes) to avoid nonsensical `µns/op`\n- `MB/s` → displayed as `B/s` (with K, M, G prefixes)\n\n### When the `~` symbol appears\n\n```\nParse-32 4.592µ ± 8% 4.481µ ± 7% ~ (p=0.089 n=10)\n```\n\nThis means benchstat cannot distinguish the difference from random noise. The wide confidence intervals (±8%, ±7%) overlap. Do not claim improvement. Options:\n\n- Increase `-count` to 20+ (narrower CI may reveal a real difference)\n- Reduce noise sources (close applications, plug in power, use dedicated machine)\n- Accept that the change has no measurable effect on this benchmark\n\n## Flags Reference\n\n### Projection flags\n\nThese flags control how benchmark results are grouped into tables, rows, and columns.\n\n| Flag | Default | Purpose |\n| --- | --- | --- |\n| `-table KEYS` | `.config` | Group results into separate tables by these keys |\n| `-row KEYS` | `.fullname` | Group results into table rows by these keys |\n| `-col KEYS` | `.file` | Compare across columns with different values of these keys |\n| `-ignore KEYS` | (none) | Omit keys from grouping — suppresses \"benchmarks vary\" warnings |\n\n**Available keys:**\n\n| Key | Meaning | Example value |\n| --- | --- | --- |\n| `.name` | Base benchmark name (without sub-benchmark config) | `Parse` from `BenchmarkParse/size=4k-16` |\n| `.fullname` | Full name including sub-benchmark 
configuration | `Parse/size=4k-16` |\n| `.file` | Input file name or custom label | `old.txt` or `baseline` |\n| `.config` | All file-level configuration keys combined | `goos/goarch/pkg/cpu` |\n| `.unit` | Metric unit name | `sec/op`, `B/op`, `allocs/op` |\n| `/{name-key}` | Per-benchmark sub-name key | `/size` extracts `4k` from `Parse/size=4k` |\n| `/gomaxprocs` | GOMAXPROCS value — recognizes both `/gomaxprocs=N` and the `-N` suffix convention | `16` from `Parse-16` |\n| `goos` | Operating system (from benchmark output header) | `linux`, `darwin` |\n| `goarch` | Architecture (from benchmark output header) | `amd64`, `arm64` |\n| `pkg` | Package path (from benchmark output header) | `myapp/pkg/parser` |\n| `cpu` | CPU model (from benchmark output header) | `AMD Ryzen 9 5950X` |\n\n**Sort order modifiers** — append to any key:\n\n| Modifier | Meaning | Example |\n| --- | --- | --- |\n| `@alpha` | Alphabetic sort | `/format@alpha` |\n| `@num` | Numeric sort (understands prefixes: 2k, 1Mi) | `/size@num` |\n| `@(val1 val2 ...)` | Fixed order + filter (only listed values, in this order) | `/format@(gob json)` |\n\n### Filter flag\n\n| Flag | Purpose |\n| --- | --- |\n| `-filter EXPR` | Filter which benchmarks are processed before grouping and comparison |\n\nSee [Filter Expression Syntax](#filter-expression-syntax) below for full details.\n\n### Input labeling\n\nNot a flag but a syntax feature — label input files for clearer column headers:\n\n```bash\n# Default: file names become column headers\nbenchstat old.txt new.txt\n\n# Custom labels\nbenchstat baseline=old.txt optimized=new.txt\n\n# Multiple versions\nbenchstat v1=v1.txt v2=v2.txt v3=v3.txt\n```\n\nThe first input is always the **base** for comparison. All subsequent inputs are compared against it.\n\n## Filter Expression Syntax\n\nFilters select which benchmarks to include before grouping and comparison. 
The syntax is:\n\n### Matching operators\n\n| Pattern | Meaning | Example |\n| --- | --- | --- |\n| `key:value` | Exact match | `goos:linux` |\n| `key:\"value\"` | Exact match with quoted value (allows spaces, special chars) | `pkg:\"github.com/user/repo\"` |\n| `key:/regexp/` | Regular expression match (Go regexp syntax) | `.name:/Parse\\|Encode/` |\n| `key:(val1 OR val2)` | Match any of the listed values | `goos:(linux OR darwin)` |\n| `*` | Match everything (all benchmarks) | `*` |\n\n### Logical operators\n\n| Operator | Meaning | Example |\n| --- | --- | --- |\n| `x y` | AND — both must match (implicit) | `goos:linux goarch:amd64` |\n| `x AND y` | AND — explicit form | `goos:linux AND goarch:amd64` |\n| `x OR y` | OR — either must match | `goos:linux OR goos:darwin` |\n| `-x` | NOT — must not match | `-goos:windows` |\n| `(...)` | Grouping / subexpression | `(goos:linux OR goos:darwin) -pkg:/internal/` |\n\n### Filter key types\n\n| Key | What it matches | Example |\n| --- | --- | --- |\n| `.name` | Base benchmark name | `.name:Parse` |\n| `.fullname` | Full name with sub-benchmark config | `.fullname:/Parse\\/size=4k/` |\n| `/{name-key}` | Sub-benchmark parameter | `/size:4k` |\n| `/gomaxprocs` | GOMAXPROCS value | `/gomaxprocs:16` |\n| `.file` | Input file label | `.file:old.txt` |\n| `.unit` | Metric unit | `.unit:sec/op` |\n| `goos` | OS from header | `goos:linux` |\n| `goarch` | Architecture from header | `goarch:amd64` |\n| `pkg` | Package from header | `pkg:/parser/` |\n\n### Filter examples\n\n```bash\n# Only Parse benchmarks\nbenchstat -filter '.name:Parse' old.txt new.txt\n\n# Only benchmarks with size=4096 sub-parameter\nbenchstat -filter '/size:4096' old.txt new.txt\n\n# Exclude Parallel benchmarks\nbenchstat -filter '-.name:/Parallel/' old.txt new.txt\n\n# Linux amd64 only\nbenchstat -filter 'goos:linux goarch:amd64' old.txt new.txt\n\n# Multiple benchmark names\nbenchstat -filter '.name:(Parse OR Encode OR Decode)' old.txt new.txt\n\n# Complex: 
Linux or Darwin, not internal packages, only sec/op metric\nbenchstat -filter '(goos:linux OR goos:darwin) -pkg:/internal/ .unit:sec/op' old.txt new.txt\n\n# Regex: all benchmarks starting with Bench\nbenchstat -filter '.name:/^Bench/' old.txt new.txt\n```\n\n## Projection Examples\n\n### Default: before/after file comparison\n\n```bash\nbenchstat old.txt new.txt\n# Equivalent to:\nbenchstat -table .config -row .fullname -col .file old.txt new.txt\n```\n\nCreates one row per benchmark, one column per file.\n\n### Compare sub-benchmark parameters within a single file\n\nWhen a single benchmark file contains multiple sub-benchmarks (e.g., `BenchmarkEncode/format=json` and `BenchmarkEncode/format=gob`):\n\n```bash\nbenchstat -col /format bench.txt\n```\n\nCreates columns for each value of `/format`, comparing them against each other.\n\n### Simplify rows to base name only\n\n```bash\nbenchstat -col /format -row .name bench.txt\n```\n\nStrips sub-benchmark configuration from row names, making the table more compact.\n\n### Control column order\n\n```bash\n# Force gob first, then json (instead of alphabetical)\nbenchstat -col '/format@(gob json)' bench.txt\n```\n\n### Group by GOMAXPROCS\n\n```bash\nbenchstat -col /gomaxprocs bench.txt\n```\n\nCompares performance across different GOMAXPROCS values within the same file.\n\n### Separate tables per package\n\n```bash\nbenchstat -table pkg old.txt new.txt\n```\n\nCreates one table per package — useful when comparing benchmarks across multiple packages.\n\n### Ignore a dimension\n\n```bash\n# Suppress \"benchmarks vary in /gomaxprocs\" warning\nbenchstat -row .name -ignore /gomaxprocs bench.txt\n```\n\n### Compare three versions\n\n```bash\nbenchstat v1=v1.txt v2=v2.txt v3=v3.txt\n```\n\nShows v2 vs v1 and v3 vs v1 (first input is always the base).\n\n### Cross-dimensional comparison\n\n```bash\n# Rows = benchmark name, columns = OS, separate tables per architecture\nbenchstat -row .name -col goos -table goarch 
results.txt\n```\n\n## Unit Metadata\n\n### `assume=exact`\n\nFor metrics that should not vary between runs (e.g., binary size, generated code size):\n\n```\nBenchmarkSize 1 42 custom-bytes/op\nUnit custom-bytes/op assume=exact\n```\n\nWith `assume=exact`:\n\n- Non-parametric statistics are disabled\n- benchstat warns if measured values vary\n- Shows comparisons even with a single before/after measurement (no `-count` needed)\n\n### `assume=nothing` (default)\n\nStandard behavior — uses non-parametric statistics (median + Mann-Whitney U-test). Requires multiple samples.\n\n## Interleaving Runs\n\nSequential runs (all old, then all new) are vulnerable to **systematic bias** — thermal throttling builds up over time, background processes come and go, CPU frequency scaling adapts. Interleaving reduces this:\n\n```bash\n# Pre-compile both versions to avoid measuring compilation time\ngo test -c -o old.test ./pkg/parser\n# ... make your change ...\ngo test -c -o new.test ./pkg/parser\n\n# Interleave runs — alternating reduces systematic bias\nfor i in $(seq 1 10); do\n ./old.test -test.bench=BenchmarkParse -test.benchmem >> old.txt\n ./new.test -test.bench=BenchmarkParse -test.benchmem >> new.txt\ndone\n\nbenchstat old.txt new.txt\n```\n\nPre-compiling with `go test -c` is critical — without it, each `go test -bench` invocation includes compilation time, which varies and contaminates results.\n\n## How Many Runs?\n\n| Scenario | Minimum `-count` | Why |\n| --- | --- | --- |\n| Quick local check | 6 | Enough for a rough confidence interval; fast feedback loop |\n| Pre-merge comparison | 10 | Standard for detecting moderate (>5%) changes with confidence |\n| Detecting small changes (\u003c5%) | 20-30 | More samples narrow the CI; needed when signal is small relative to noise |\n| Noisy CI environment | 20+ | Shared CI runners have higher variance; more runs compensate |\n\n**Never \"retry until significant\"** — rerunning benchmarks until `~` goes away introduces selection 
bias (p-hacking). If 10 runs show `~`, the change is probably not meaningful. Increase run count **once** and accept the result.\n\nAt α=0.05, expect ~5% of benchmarks to randomly report significance with no real change (false positives). This is normal — don't chase them.\n\n## Single-File Summary\n\nAnalyze variance of a single run without comparison:\n\n```bash\nbenchstat bench.txt\n```\n\nShows median and confidence interval for each benchmark. Use to:\n\n- Check measurement stability before making code changes\n- Identify noisy benchmarks that need more runs or better isolation\n- Get a quick summary of current performance\n\n## Common Pitfalls\n\n| Pitfall | Why it's wrong | Fix |\n| --- | --- | --- |\n| `-count=1` | Single run has no variance information; benchstat can't compute confidence | Always use `-count=6` minimum, prefer `-count=10` |\n| Running on a laptop on battery | CPU throttles to save power; variance explodes | Plug in, disable power saving, or use a desktop/server |\n| Running with browser/IDE open | Background processes steal CPU cycles; adds noise | Close unnecessary applications, or accept wider CIs |\n| Rerunning until `~` disappears | Selection bias (p-hacking) — you're cherry-picking runs that showed improvement | Run once with high `-count`, accept the result |\n| Comparing across machines | Different CPUs, memory, OS = incomparable baselines | Same machine, same conditions, both runs |\n| Not interleaving | Systematic bias from thermal throttling, background load drift | Pre-compile both versions with `go test -c`, alternate runs |\n| Measuring compilation time | `go test -bench` compiles first; startup overhead varies | Pre-compile with `go test -c`, run the binary directly |\n| Ignoring wide CI (± >5%) | Results look significant but variance is too high to be trustworthy | Fix the noise first, then compare; or increase `-count` |\n| Comparing different `-count` values | Unequal sample sizes bias the comparison | Use the same 
`-count` for all inputs |\n\n## benchstat in CI\n\nSee [CI Regression Detection](./ci-regression.md) for integrating benchstat comparisons into CI pipelines with benchdiff, cob, and gobenchdata.\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":14540,"content_sha256":"0145a98c5a9be29078d3281e9ebcd670b6f402abbf4bf29facc995f2569342d9"},{"filename":"references/ci-regression.md","content":"# CI Benchmark Regression Detection\n\n> **Run these tools in CI only, not on local machines.** Local benchmark results are noisy due to background processes, thermal throttling, and inconsistent CPU frequency — regressions detected locally are unreliable and waste developer time. Even shared CI runners can produce significant variance (5-10%); use statistical methods like `benchstat` with multiple iterations and relative comparisons to filter noise, or invest in dedicated benchmark runners for critical paths.\n\n## benchdiff\n\nRuns Go benchmarks on two git refs and uses `benchstat` to display deltas. Caches results for non-worktree refs so re-runs are fast. 
Prevents macOS sleep during benchmarks.\n\n```bash\ngo install filippo.io/mostly-harmless/benchdiff@latest\n```\n\n```bash\n# Compare current worktree against HEAD (default)\nbenchdiff -- -benchmem\n\n# Compare two specific refs\nbenchdiff -base-ref main -head-ref feature-branch\n\n# Compare against a specific commit or tag\nbenchdiff -base-ref v1.2.0\n\n# Pass extra flags to go test — everything after -- goes to go test\nbenchdiff -- -benchmem -count=10 -benchtime=3s\n\n# Filter to specific benchmarks\nbenchdiff -- -benchmem -count=10 -bench=BenchmarkParse\n\n# Target a specific package\nbenchdiff -- -benchmem -count=10 ./pkg/parser/...\n\n# Clear cached results (useful after rebasing or when cache is stale)\nbenchdiff -clear-cache\n\n# Combine: compare main with 10 iterations, filtered to critical benchmarks\nbenchdiff -base-ref main -- -benchmem -count=10 -bench='BenchmarkParse|BenchmarkEncode'\n```\n\nBest for: quick PR-to-base comparisons in git-based workflows. Leverages `benchstat` for statistical rigor and caches non-worktree refs so re-runs only re-measure the worktree.\n\n## cob\n\nCompares benchmarks between HEAD and HEAD~1, failing the CI job if performance degrades beyond a configurable threshold (default 20%).\n\n```bash\ngo install github.com/knqyf263/cob@latest\n```\n\n```bash\n# Run with default 20% threshold — compares HEAD vs HEAD~1\ncob\n\n# Stricter threshold for critical paths (10% regression = failure)\ncob -threshold 10\n\n# Compare against a specific base commit\ncob -base main\n\n# Only report regressions (ignore improvements)\ncob -only-degression\n\n# Choose which metrics to compare (default: ns/op,B/op)\ncob -compare \"ns/op,B/op,allocs/op\"\n\n# Custom go test arguments\ncob -bench-args \"test -run '^

$'
-bench BenchmarkParse -benchmem ./pkg/parser/...\"\n\n# Increase benchmark duration for more stable results\ncob -bench-args \"test -run '^

$'
-bench . -benchmem -benchtime=3s ./...\"\n\n# Skip cob for a specific commit: include [skip cob] in commit message\n```\n\n**Caution:** `cob` uses `git reset` internally, which can cause data loss if uncommitted changes exist. Always commit your work before running. Additionally, `cob` requires all benchmarks to pass; it skips CI gating if any benchmark fails. For safety, run only in CI pipelines, not locally. Note that `cob` compares single runs without `benchstat`-style statistics, making it more susceptible to noise than `benchdiff`.\n\nBest for: simple post-commit regression gating in CI where statistical rigor is less critical than fast feedback.\n\n## gobenchdata\n\nGitHub Action + CLI that collects benchmark results, publishes to gh-pages as JSON, and visualizes with an interactive web dashboard. Shows performance trends over time.\n\n```bash\ngo install go.bobheadxi.dev/gobenchdata@latest\n```\n\n### CLI commands\n\n```bash\n# Parse go test -bench output to JSON\ngo test -bench=. -benchmem -count=5 ./... 
| gobenchdata --json bench.json\n\n# Parse from a file\ngobenchdata --json bench.json \u003c bench.txt\n\n# Add a tag to the benchmark run (e.g., git commit)\ngobenchdata --json bench.json --tag \"$(git rev-parse --short HEAD)\" \u003c bench.txt\n\n# Evaluate regression checks against a checks config\ngobenchdata checks eval bench.txt --checks-config .gobenchdata-checks.yml\n\n# Generate the web dashboard app (static Vue.js site)\ngobenchdata web generate ./dashboard-app\n\n# Serve the dashboard locally for preview\ngobenchdata web serve ./dashboard-app\n\n# Merge multiple benchmark JSON files\ngobenchdata merge old-bench.json new-bench.json > combined.json\n\n# Prune old entries (keep last 30 runs)\ngobenchdata prune --count 30 bench.json\n```\n\n### GitHub Action setup\n\n```yaml\n# .github/workflows/benchmark.yml\nname: Benchmark\non: [push]\njobs:\n benchmark:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v4\n - uses: actions/setup-go@v5\n with:\n go-version: stable\n - name: Run benchmarks\n run: go test -bench=. -benchmem -count=5 ./... 
| tee bench.txt\n - uses: bobheadxi/gobenchdata@v1\n with:\n PRUNE_COUNT: 30\n GO_TEST_PKGS: ./...\n BENCHMARKS_OUT: bench.txt\n PUBLISH: true\n PUBLISH_BRANCH: gh-pages\n env:\n GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n```\n\n### Regression gating on PRs\n\n```yaml\n- name: Check for regressions\n run: gobenchdata checks eval bench.txt --checks-config .gobenchdata-checks.yml\n```\n\n```yaml\n# .gobenchdata-checks.yml\nchecks:\n - name: \"No major regressions\"\n package: ./...\n benchmarks: [\".*\"]\n thresholds:\n - metric: NsPerOp\n max: 1.2 # fail if >20% slower\n - metric: AllocedBytesPerOp\n max: 1.3 # fail if >30% more allocations\n - name: \"Critical path stability\"\n package: ./pkg/parser\n benchmarks: [\"BenchmarkParse.*\"]\n thresholds:\n - metric: NsPerOp\n max: 1.1 # stricter: fail if >10% slower\n```\n\n### Dashboard configuration\n\n```yaml\n# gobenchdata-web.yml — configure the Vue.js dashboard\ntitle: \"My Project Benchmarks\"\ndescription: \"Performance tracking dashboard\"\nchartGroups:\n - name: Parser\n charts:\n - name: Parse Performance\n package: myapp/pkg/parser\n benchmarks: [\"BenchmarkParse.*\"]\n metrics: [NsPerOp, AllocedBytesPerOp, AllocsPerOp]\n - name: Encoding\n charts:\n - name: Encode/Decode\n package: myapp/pkg/encoding\n benchmarks: [\"Benchmark(Encode|Decode).*\"]\n metrics: [NsPerOp, MBPerS]\n```\n\nBest for: long-term trend tracking and visualization; complements benchdiff/cob for immediate gating.\n\n## Tool Selection Guide\n\n| Tool | Statistical rigor | Dashboard | Best for |\n| --- | --- | --- | --- |\n| **benchdiff** | High (uses benchstat) | No | Local dev + CI PR comparisons |\n| **cob** | Low (single comparison) | No | Quick CI gate, simple setup |\n| **gobenchdata** | Medium (configurable checks) | Yes (Vue.js on gh-pages) | Long-term trend tracking |\n| **benchstat** (raw) | High | No (CSV export) | Maximum control, custom workflows |\n\n## Noisy Neighbor Mitigation\n\nCloud CI environments share hardware with 
other jobs. Expect 5-10% variance even on quiet machines.\n\n### Why CI benchmarks are noisy\n\n- **Shared CPU/memory** — other CI jobs compete for resources\n- **Thermal throttling** — sustained load reduces clock speed\n- **Different hardware across runs** — CI runners may have different specs\n- **Kernel scheduling** — context switches add unpredictable latency\n- **Disk I/O contention** — shared storage affects I/O-bound benchmarks\n\n### Strategies\n\n**Statistical rigor** — run with `-count=10` or more and compare with `benchstat`. A single run is meaningless. benchstat's p-value test filters out noise-induced false positives.\n\n**Relative comparison in same job** — run both base and head benchmarks in the same CI job on the same machine, rather than comparing against historical absolute values. This cancels out machine-to-machine variation. Tools like `benchdiff` do this automatically by checking out both git refs.\n\n**Dedicated benchmark runners** — for critical path benchmarks, use self-hosted CI runners with no other workloads. This eliminates noisy neighbors entirely but costs more infrastructure.\n\n**Conservative thresholds** — set regression thresholds higher on shared CI (20%+) than on dedicated runners (10%). Tight thresholds on noisy environments produce false positives that erode trust. GitHub-hosted runners show ~2-3% coefficient of variation in the best case; to guarantee \u003c1% false positive rate, you need a 7%+ performance gate.\n\n**Never \"retry until pass\"** — rerunning benchmarks until they pass introduces selection bias. If a benchmark is flaky, fix the noise source (more iterations, dedicated runner, wider threshold) rather than retrying.\n\n## System Tuning for Self-Hosted Runners\n\n> **WARNING: These commands modify kernel and CPU settings. 
Apply them ONLY on dedicated CI runners, NEVER on developer machines or shared servers.**\n\nWhen you control the CI hardware, these settings dramatically reduce benchmark variance by eliminating the main sources of non-determinism.\n\n### Disable CPU frequency scaling\n\nVariable CPU frequency makes benchmark times meaningless — the same code runs at different speeds depending on load and thermals:\n\n```bash\n# Set all CPUs to \"performance\" governor (fixed maximum frequency)\necho performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor\n```\n\n### Disable Turbo Boost\n\nTurbo Boost temporarily increases clock speed but throttles under sustained load, creating variance between the start and end of a benchmark run:\n\n```bash\n# Intel\necho 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo\n\n# AMD\necho 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost\n```\n\n### Pin benchmarks to specific CPU cores\n\nPrevents the OS from migrating the benchmark process across cores, which causes cache thrashing (L1/L2 caches are per-core):\n\n```bash\n# Pin to cores 2 and 3 (leave cores 0-1 for OS and other processes)\ntaskset -c 2,3 go test -bench=. 
-count=10 ./...\n```\n\n### Disable SMT (Hyper-Threading)\n\nSMT shares execution units between logical cores on the same physical core, causing unpredictable contention:\n\n```bash\n# Disable SMT system-wide\necho off | sudo tee /sys/devices/system/cpu/smt/control\n\n# Or disable individual sibling cores (check /sys/devices/system/cpu/cpu*/topology/thread_siblings_list)\necho 0 | sudo tee /sys/devices/system/cpu/cpu1/online # if cpu0 and cpu1 are siblings\n```\n\n### Combined CI setup script\n\n```bash\n#!/bin/bash\n# benchmark-setup.sh — run on self-hosted CI runner before benchmarks\nset -euo pipefail\n\necho \"=== Configuring CPU for stable benchmarks ===\"\necho performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor\necho 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo 2>/dev/null || true\necho off | sudo tee /sys/devices/system/cpu/smt/control 2>/dev/null || true\n\necho \"=== Running benchmarks on isolated cores ===\"\ntaskset -c 2,3 go test -bench=. -benchmem -count=10 ./... | tee bench.txt\n```\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":10659,"content_sha256":"3fec5ce7c0f3c4f17b3102702bae64534794343eef7bcb4577031b012e955a54"},{"filename":"references/compiler-analysis.md","content":"# Compiler Analysis Reference\n\nThe Go compiler provides diagnostic flags that reveal optimization decisions — escape analysis, inlining, SSA intermediate representation, and generated assembly. These are essential for understanding **why** a function allocates or **why** the compiler won't inline it.\n\nUse compiler diagnostics when pprof shows a hot function and you need to understand the compiler's decisions about that function. These tools are free (no runtime overhead) — they analyze at compile time.\n\n## Escape Analysis\n\nEscape analysis determines whether a variable can live on the stack (cheap — freed when the function returns) or must be allocated on the heap (expensive — requires GC). 
\"Moved to heap\" means the compiler decided the variable might outlive the function.\n\n### Commands\n\n```bash\n# Show escape decisions — one line per escaped variable\ngo build -gcflags=\"-m\" ./... 2>&1 | grep \"escapes to heap\"\ngo build -gcflags=\"-m\" ./... 2>&1 | grep \"moved to heap\"\n\n# Verbose mode — shows the reason for each escape decision\ngo build -gcflags=\"-m -m\" ./...\n\n# Filter to a specific package\ngo build -gcflags=\"-m\" ./pkg/parser 2>&1 | grep \"escapes\"\n\n# Filter to a specific file\ngo build -gcflags=\"-m\" ./pkg/parser/parse.go 2>&1\n\n# Apply to all dependencies too (usually too noisy, but useful for debugging)\ngo build -gcflags=\"all=-m\" ./...\n\n# Combine with grep for a specific function\ngo build -gcflags=\"-m\" ./pkg/parser 2>&1 | grep \"Parse\"\n\n# Combine with grep to see what stays on the stack (does NOT escape)\ngo build -gcflags=\"-m\" ./pkg/parser 2>&1 | grep \"does not escape\"\n```\n\n### Reading the output\n\n```\n./pkg/parser/parse.go:15:6: can inline Parse\n./pkg/parser/parse.go:42:13: &result escapes to heap\n./pkg/parser/parse.go:42:13: flow: ~r0 = &result:\n./pkg/parser/parse.go:42:13: from &result (address-of) at ./pkg/parser/parse.go:42:13\n./pkg/parser/parse.go:42:13: from return &result (return) at ./pkg/parser/parse.go:42:6\n```\n\nThe `-m -m` (verbose) output shows the **escape chain** — why the compiler decided the variable escapes. 
In this example: `result` has its address taken (`&result`), and that pointer is returned, so `result` must survive beyond the function — it escapes to heap.\n\n### Common escape causes\n\n| Cause | Example | Why it escapes |\n| --- | --- | --- |\n| **Returning a pointer to a local** | `return &result` | The local must outlive the function call — caller holds a reference |\n| **Interface boxing** | `var x any = myStruct` | Concrete type stored in `interface{}` allocates a copy on the heap |\n| **Closure capturing a local** | `go func() { use(localVar) }()` | The goroutine may run after the enclosing function returns |\n| **Slice append beyond capacity** | `s = append(s, item)` when len == cap | Triggers a new backing array allocation on the heap |\n| **Passing pointer to unanalyzable function** | `json.Marshal(&data)` | Compiler can't prove the pointer won't be retained across package boundary |\n| **Storing in a struct field that escapes** | `obj.Field = &local` | If `obj` is heap-allocated, anything it points to must also be on the heap |\n| **fmt.Sprintf and friends** | `fmt.Sprintf(\"%d\", n)` | Arguments are boxed into `any` (interface boxing) + result string is heap-allocated |\n| **Sending pointer on channel** | `ch \u003c- &data` | Channel receiver may be a different goroutine with a different lifetime |\n\n**Not all escapes are problems.** Only investigate escapes in functions that pprof identifies as allocation-heavy. A function called once at startup can escape freely.\n\n## Inlining Decisions\n\nInlining replaces a function call with the function body at the call site. This eliminates call overhead and enables further optimizations (escape analysis improves, dead code elimination, constant folding). Functions that aren't inlined in hot paths may benefit from simplification.\n\n### Commands\n\n```bash\n# Show which functions CAN be inlined\ngo build -gcflags=\"-m\" ./... 
2>&1 | grep \"can inline\"\n\n# Show which functions CANNOT be inlined (the reason is only printed in verbose mode)\ngo build -gcflags=\"-m -m\" ./... 2>&1 | grep \"cannot inline\"\n\n# Show inlining decisions for a specific package\ngo build -gcflags=\"-m\" ./pkg/handler 2>&1 | grep \"inline\"\n\n# Show where inlining was actually applied (function was inlined into caller)\ngo build -gcflags=\"-m\" ./... 2>&1 | grep \"inlining call to\"\n\n# Verbose mode — shows the cost budget and why inlining was blocked\ngo build -gcflags=\"-m -m\" ./... 2>&1 | grep \"inline\"\n\n# Filter to a specific function\ngo build -gcflags=\"-m\" ./pkg/handler 2>&1 | grep \"HandleRequest\"\n\n# Show both inlining and escape analysis together (they interact)\ngo build -gcflags=\"-m\" ./pkg/handler 2>&1 | grep -E \"(inline|escape|moved to heap)\"\n```\n\n### Reading the output\n\n```\n./pkg/handler/handler.go:20:6: can inline validateInput\n./pkg/handler/handler.go:35:6: cannot inline HandleRequest: function too complex: cost 120 exceeds budget 80\n./pkg/handler/handler.go:42:19: inlining call to validateInput\n```\n\nThe default inline budget is 80 cost units, roughly one per AST node. With profile-guided optimization (Go 1.21+), hot call sites get a larger budget, so the number can differ between builds. Functions with higher cost (more AST nodes, complex control flow) are not inlined. 
Check the actual threshold with `-gcflags=\"-m -m\"`.\n\n### Common inlining blockers\n\n| Blocker | Why it prevents inlining | Mitigation |\n| --- | --- | --- |\n| **Function too complex** | Body cost exceeds budget (80) | Split into smaller functions; extract the cold path |\n| **`defer` statement** | Adds cleanup code that complicates inlining | Remove `defer` from tiny hot functions; call cleanup directly |\n| **`recover()` call** | Forces stack frame preservation | Move `recover()` to a wrapper function |\n| **`go` statement** | Goroutine launch has implicit complexity | Extract goroutine body into a separate function |\n| **Type switch / interface method call** | Dynamic dispatch can't be resolved at compile time | Use concrete types in hot paths |\n| **`select` statement** | Complex runtime interaction | Simplify channel patterns in hot functions |\n| **Large function body** | Many statements add up in cost | Break into smaller functions — the hot inner function may inline |\n\n**Value receivers vs pointer receivers:** Receiver choice can affect copying, aliasing, escape analysis, and inlining, but pointer receiver methods can inline too and value receivers do not guarantee inlining. Check real compiler decisions with `-gcflags=\"-m -m\"`.\n\n## SSA Dump\n\nThe SSA (Static Single Assignment) dump shows the compiler's intermediate representation after each optimization pass — dead code elimination, bounds check removal, constant folding, register allocation. 
Use this when you need to understand exactly what the compiler generates.\n\n### Commands\n\n```bash\n# Generate SSA dump for a specific function — creates ssa.html in current directory\nGOSSAFUNC=Parse go build ./pkg/parser\n# Open ssa.html in browser — shows each optimization pass side by side\n\n# Generate for a method on a type\nGOSSAFUNC='(*Parser).Parse' go build ./pkg/parser\n\n# Generate for a function in a specific package (when names collide)\nGOSSAFUNC=myapp/pkg/parser.Parse go build ./...\n\n# Combine with a specific output directory\nGOSSAFUNC=Parse GOSSADIR=/tmp/ssa go build ./pkg/parser\n# Creates /tmp/ssa/ssa.html\n```\n\n### Reading ssa.html\n\nThe HTML file shows the function's code at each compiler pass:\n\n1. **Source** — original Go code\n2. **AST** — abstract syntax tree\n3. **Start** — initial SSA form\n4. **Opt** — after optimization passes (dead code, constant prop, bounds check elimination)\n5. **Lower** — architecture-specific lowering\n6. **Regalloc** — after register allocation\n7. **Genssa** — final generated code\n\nClick on a value in any pass to highlight it across all passes — see how the compiler transforms it. Red values were eliminated (dead code). Green values are new (introduced by a pass).\n\n**What to look for:**\n\n- **Bounds checks remaining** — `IsInBounds` or `IsSliceInBounds` operations that weren't eliminated. Adding explicit bounds checks or using `_ = s[n-1]` hints can help\n- **Dead code not eliminated** — values computed but never used (should be eliminated; if not, check for side effects)\n- **Constant folding** — computations on constants should be resolved at compile time\n- **Register spills** — values moved to stack because not enough registers; indicates heavy register pressure\n\n## Assembly Output\n\nView the actual machine code the compiler generates. 
Use for verifying SIMD instructions, bounds checks, register allocation, and micro-optimization decisions.\n\n### Commands\n\n```bash\n# Full assembly output for a package (very verbose)\ngo build -gcflags=\"-S\" ./pkg/parser 2>&1 | head -200\n\n# Assembly for a specific function (grep for the function name)\ngo build -gcflags=\"-S\" ./pkg/parser 2>&1 | grep -A 50 '\"\".Parse'\n\n# Assembly for all packages (including dependencies — very verbose)\ngo build -gcflags=\"all=-S\" ./... 2>&1 | grep -A 50 'myapp/pkg/parser.Parse'\n\n# Disassemble a compiled binary (alternative to -gcflags=\"-S\")\ngo build -o myapp ./cmd/server\ngo tool objdump -s Parse myapp\n\n# Disassemble with source interleaving\ngo tool objdump -S -s Parse myapp\n\n# Disassemble a specific symbol\ngo tool objdump -s 'myapp/pkg/parser.Parse' myapp\n\n# Disassemble a specific text range (by address)\ngo tool objdump -start 0x4a3b00 -end 0x4a3c00 myapp\n\n# List all symbols in a binary\ngo tool nm myapp | grep Parse\n\n# Cross-compile and inspect assembly for a different architecture\nGOARCH=arm64 go build -gcflags=\"-S\" ./pkg/parser 2>&1 | head -200\n```\n\n### Reading assembly output\n\n```asm\n\"\".Parse STEXT size=240 args=0x18 locals=0x48\n 0x0000 MOVQ (TLS), CX ; goroutine stack check\n 0x0009 LEAQ -64(SP), AX\n 0x000e CMPQ AX, 16(CX) ; stack overflow check\n 0x0012 JLS 228 ; jump to stack growth\n 0x0018 SUBQ $72, SP ; allocate stack frame\n 0x001c MOVQ BP, 64(SP) ; save base pointer\n 0x0021 LEAQ 64(SP), BP ; set new base pointer\n ; ... 
function body ...\n 0x00e0 CALL runtime.makeslice(SB) ; heap allocation!\n```\n\n**What to look for:**\n\n- `CALL runtime.makeslice` or `CALL runtime.newobject` — heap allocations in the hot path\n- `CALL runtime.growslice` — slice capacity exceeded, triggering copy\n- `PCDATA` / `FUNCDATA` — GC metadata (ignore for performance analysis)\n- Bounds check sequences: `CMPQ` + `JCC` before array/slice access — can sometimes be eliminated\n- SIMD instructions: `VMOVDQU`, `VPSHUFB`, `VPADDB`, etc. — confirm that hand-written assembly paths are in use (the gc compiler does not auto-vectorize)\n- `CALL runtime.morestack_noctxt` — stack growth (normal, but frequent calls indicate deep recursion)\n\n### Comparing assembly before/after optimization\n\n```bash\n# -S prints to stderr, so redirect stdout to the file first, then point stderr at it\n# Before your change\ngo build -gcflags=\"-S\" ./pkg/parser > asm-before.txt 2>&1\n\n# After your change\ngo build -gcflags=\"-S\" ./pkg/parser > asm-after.txt 2>&1\n\n# Diff the assembly\ndiff asm-before.txt asm-after.txt\n```\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":10972,"content_sha256":"2cb7bde63cb35971188b26346c8b4a0031b14d6b213c06f8caef38bf238938ad"},{"filename":"references/investigation-session.md","content":"# Investigation Session Setup\n\nTools and techniques for **temporary deep-dive performance investigation** — not everyday monitoring. These are things you enable for hours or days while debugging a specific issue, then disable.\n\n## Setting Up a Session\n\nBefore diving into profiles, set up the environment to collect high-resolution data:\n\n1. **Reduce Prometheus scrape interval** to \u003c=10s on the target instance (normally 15-30s). More data points during a short investigation window reveal patterns that 30s intervals miss. Revert after investigation.\n\n2. **Enable pprof** via environment variable — no recompile needed:\n\n ```bash\n kubectl set env deployment/my-service PPROF_ENABLED=true\n kubectl rollout restart deployment/my-service\n ```\n\n3. 
**Enable continuous profiling** on the target instance only — not fleet-wide. Pyroscope/Parca on a single instance is manageable; on 50 replicas it overwhelms the backend.\n\n ```bash\n kubectl set env deployment/my-service PYROSCOPE_ENABLED=true\n kubectl rollout restart deployment/my-service\n ```\n\n4. **Enable debug logging** via env var if needed — but only on the target instance. Debug logging has significant throughput impact:\n\n ```bash\n kubectl set env deployment/my-service LOG_LEVEL=debug\n kubectl rollout restart deployment/my-service\n ```\n\n**Key principle:** all costly debug features (pprof HTTP, continuous profiling, debug log level, trace collection) SHOULD be configurable via environment variables. This allows instant toggle without recompile. Design your application to support this from day one.\n\n## Prometheus Go Runtime Collector\n\nThe `prometheus/client_golang` library automatically registers collectors that expose Go runtime metrics. These are invaluable during investigation sessions — they provide a time-series view of memory, GC, goroutines, and CPU that complements point-in-time profiles.\n\nWhen using `prometheus/client_golang`, refer to the library's official documentation to verify collector setup and available options.\n\n### Key Series\n\n→ See [prometheus-go-metrics.md](./prometheus-go-metrics.md) for the **exhaustive reference** of all Go runtime metrics (verified from official sources). **Note:** runtime/metrics list varies by Go version — use `metrics.All()` at runtime for your specific Go version.\n\n**Performance note:** `go_memstats_*` metrics internally call `runtime.ReadMemStats()`, which triggers a short stop-the-world pause. In Go 1.17+, the runtime/metrics collector (`collectors.NewGoCollector()`) uses `runtime/metrics` instead, which is cheaper. 
Prefer the modern collector in high-throughput services:\n\n```go\nimport (\n\t\"github.com/prometheus/client_golang/prometheus\"\n\t\"github.com/prometheus/client_golang/prometheus/collectors\"\n)\n\n// Use runtime/metrics-based collector (lower overhead)\nreg := prometheus.NewRegistry()\nreg.MustRegister(collectors.NewGoCollector(\n collectors.WithGoCollectorRuntimeMetrics(collectors.MetricsAll),\n))\nreg.MustRegister(collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}))\n```\n\n## PromQL Deep-Dive Queries\n\nUse these during investigation sessions with the reduced scrape interval. Each query includes what to look for and what the result means.\n\n### GC pressure\n\n| PromQL | What to look for |\n| --- | --- |\n| `rate(go_gc_duration_seconds_count[5m])` | GC cycles/s. >2/s sustained = excessive allocation rate. Reduce allocations per request. |\n| `rate(go_gc_duration_seconds_sum[5m]) / rate(go_gc_duration_seconds_count[5m])` | Average GC pause. Increasing trend = heap growing or too many pointers to scan. |\n| `go_gc_duration_seconds{quantile=\"1\"}` | Worst-case GC pause. Spikes here cause tail latency (P99). |\n\n### Memory leak detection\n\n| PromQL | What to look for |\n| --- | --- |\n| `go_memstats_alloc_bytes` | Should be roughly stable under constant load. Continuous increase = memory leak. |\n| `rate(go_memstats_alloc_bytes_total[5m])` | Allocation rate (bytes/s). Compare before/after deploy — significant increase = new allocation pattern. |\n| `process_resident_memory_bytes - go_memstats_sys_bytes` | Gap = non-Go memory (cgo, mmap). Growing gap = non-Go leak. |\n\n### Goroutine leak detection\n\n| PromQL | What to look for |\n| --- | --- |\n| `go_goroutines` | Should correlate with load. Growing independently of traffic = leak. |\n| `delta(go_goroutines[1h])` | Net goroutine change over 1h. Positive without load increase = leak. |\n\n### CPU saturation\n\n| PromQL | What to look for |\n| --- | --- |\n| `rate(process_cpu_seconds_total[5m])` | CPU cores consumed. Compare to GOMAXPROCS. 
|\n| `rate(process_cpu_seconds_total[5m]) / \u003cGOMAXPROCS>` | CPU utilization ratio. >0.8 sustained = CPU-saturated. |\n\n### Post-deploy regression detection\n\n| PromQL | What to look for |\n| --- | --- |\n| `rate(go_memstats_alloc_bytes_total[5m])` | Compare before/after deploy window. Significant increase = new allocation pattern introduced. |\n| `histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))` | P99 latency increase after deploy = performance regression. Requires app-level histogram. |\n\n### Example alerting rules\n\n```yaml\n# GC taking too much time\n- alert: HighGCPauseTime\n expr: rate(go_gc_duration_seconds_sum[5m]) / rate(go_gc_duration_seconds_count[5m]) > 0.01\n for: 10m\n annotations:\n summary: \"Average GC pause >10ms — reduce allocations or tune GOGC\"\n\n# Goroutine leak\n- alert: GoroutineLeak\n expr: go_goroutines > 10000\n for: 5m\n annotations:\n summary: \"Goroutine count >10K — check for leaked goroutines\"\n\n# Memory approaching container limit\n- alert: MemoryNearLimit\n expr: predict_linear(process_resident_memory_bytes[1h], 3600) > \u003ccontainer_limit_bytes>\n for: 15m\n annotations:\n summary: \"RSS projected to exceed container limit within 1h\"\n```\n\nAdjust thresholds to your application — a data pipeline has different baselines than an API server.\n\n## Host-Level Correlation\n\nGo runtime metrics alone don't show the full picture. Host-level metrics reveal whether the problem is in your application or the infrastructure.\n\n- **`node_exporter`** — host CPU, memory, disk I/O, network. Correlate with Go app metrics: high `node_cpu_seconds_total` with low `process_cpu_seconds_total` = noisy neighbor, not your app.\n- **`process-exporter`** — per-process metrics on Linux. Useful when multiple Go services share a host.\n\n## Cost Warnings\n\n**Profiles and traces are expensive to collect.** Keep them short-term and localized:\n\n- **pprof CPU profiling** — CPU-intensive during the capture window. 
Don't run 30s profiles back-to-back in production. Space them out.\n- **Pyroscope continuous profiling** — ~2-5% CPU overhead **per instance, always-on**. At scale (hundreds of instances), this adds up in compute cost and backend storage. Enable on a subset of instances or on-demand via environment variable. → See `samber/cc-skills-golang@golang-observability` skill for Pyroscope setup.\n- **Execution traces** — generate large files quickly (MB/s). Capture 5-10s max. Longer traces are unwieldy and slow to analyze.\n- **Debug log level** — significant throughput impact due to allocation and I/O overhead. Never leave on permanently.\n- **All costly features** SHOULD be toggleable via environment variables for instant on/off without recompile. Design for this from day one.\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":7265,"content_sha256":"445b322f40beaab76394034fc9e630f35085cb510e58d43a0e166a4833e60e36"},{"filename":"references/pprof.md","content":"# pprof Reference\n\n`go tool pprof` is the primary tool for understanding where CPU time, memory, and contention go in Go programs. This file covers how to **use** the CLI and **interpret** the output. For enabling pprof endpoints on running services (net/http/pprof import, authentication, security), → See `samber/cc-skills-golang@golang-troubleshooting` skill.\n\n## Profile Types\n\nEach profile type answers a different performance question. 
Choosing the wrong profile type wastes investigation time — match the symptom to the profile before capturing.\n\n| Profile | Flag / Endpoint | Use when | Why this profile and not another |\n| --- | --- | --- | --- |\n| **CPU** | `-cpuprofile` or `/debug/pprof/profile?seconds=30` | High CPU usage, slow functions | Samples which functions are on-CPU at 100Hz; misses off-CPU time (I/O, sleep) |\n| **Heap (alloc_objects)** | `-memprofile` then `pprof -alloc_objects` | GC pressure, too many allocations | Counts allocation events regardless of size; useful when allocation frequency and object churn dominate |\n| **Heap (alloc_space)** | `pprof -alloc_space` | Finding largest allocation sites by volume | Measures total bytes allocated; use when you need to reduce peak memory, not just GC frequency |\n| **Heap (inuse_space)** | `pprof -inuse_space` | Memory growing over time, suspected leaks | Shows currently live heap objects; compare two snapshots to isolate leak sources |\n| **Heap (inuse_objects)** | `pprof -inuse_objects` | Object count growth, suspected leak of small objects | Counts live objects regardless of size; useful when leak is many small objects not visible in inuse_space |\n| **Goroutine** | `/debug/pprof/goroutine` | Blocked I/O, goroutine leaks, pool exhaustion | Snapshots all goroutine stacks; look for goroutines piling up on the same call site |\n| **Mutex** | `/debug/pprof/mutex` | Lock contention between goroutines | Measures cumulative time goroutines waited to acquire mutexes. Must enable first: `runtime.SetMutexProfileFraction(5)` |\n| **Block** | `/debug/pprof/block` | Goroutines blocked on channels, mutexes, timers, select | Measures cumulative time goroutines spent blocked on synchronization primitives. 
Must enable first: `runtime.SetBlockProfileRate(1)` |\n| **Threadcreate** | `/debug/pprof/threadcreate` | Excessive OS thread creation | Shows stack traces that created new OS threads; typically from cgo calls or blocking syscalls that pin a thread |\n\n### Choosing between alloc_objects and alloc_space\n\n- **alloc_objects** — \"where do I allocate the most often?\" — use when allocation frequency and object churn are driving GC work\n- **alloc_space** — \"where do I allocate the most bytes?\" — use for reducing peak memory usage and RSS\n- In practice, start with `alloc_objects` because GC churn is the most common allocation-related bottleneck in Go.\n\n### Choosing between inuse_space and alloc_space\n\n- **alloc_space** is cumulative since program start — it includes objects already freed by GC\n- **inuse_space** is a point-in-time snapshot — only currently live objects\n- Use `alloc_space` to find allocation hot spots for optimization. Use `inuse_space` to debug memory leaks.\n\n### Enabling mutex and block profiles\n\nThese profiles are disabled by default because they add overhead. Enable them before capturing:\n\n```go\nimport \"runtime\"\n\n// Mutex profiling: fraction of mutex contention events recorded.\n// 5 means 1 out of 5 events is recorded. Higher = less overhead but less detail.\nruntime.SetMutexProfileFraction(5)\n\n// Block profiling: time-based sampling rate.\n// 1 = record all blocking events. Higher values sample about one event per rate nanoseconds blocked.\n// Use 1 for debugging, higher values (e.g. 
1000000 = 1ms) for production.\nruntime.SetBlockProfileRate(1)\n```\n\nDisable after investigation to eliminate overhead:\n\n```go\nruntime.SetMutexProfileFraction(0)\nruntime.SetBlockProfileRate(0)\n```\n\n## Generating Profiles\n\n### From benchmarks (no HTTP server needed)\n\n```bash\n# CPU profile — measures where compute time goes during benchmark execution\ngo test -bench=BenchmarkParse -cpuprofile=cpu.prof ./pkg/parser\n\n# Memory profile — captures allocation patterns during benchmark\ngo test -bench=BenchmarkParse -memprofile=mem.prof ./pkg/parser\n\n# Both at once — but be aware CPU profiling adds ~5% overhead which can skew memory results\ngo test -bench=BenchmarkParse -cpuprofile=cpu.prof -memprofile=mem.prof ./pkg/parser\n```\n\n### From running service\n\nRequires `import _ \"net/http/pprof\"` (see `samber/cc-skills-golang@golang-troubleshooting` skill for secure setup):\n\n```bash\n# CPU profile — captures 30 seconds of CPU samples\ngo tool pprof http://localhost:6060/debug/pprof/profile?seconds=30\n\n# Heap profile — snapshots current heap state\ngo tool pprof -alloc_objects http://localhost:6060/debug/pprof/heap\n\n# Goroutine profile — snapshots all goroutine stacks\ngo tool pprof http://localhost:6060/debug/pprof/goroutine\n\n# Mutex profile — contention data since last reset\ngo tool pprof http://localhost:6060/debug/pprof/mutex\n\n# Block profile — blocking data since last reset\ngo tool pprof http://localhost:6060/debug/pprof/block\n```\n\n### From code (programmatic)\n\n```go\nimport (\n    \"os\"\n    \"runtime/pprof\"\n)\n\n// CPU profile (error handling omitted for brevity)\ncpuFile, _ := os.Create(\"cpu.prof\")\ndefer cpuFile.Close()\npprof.StartCPUProfile(cpuFile)\ndefer pprof.StopCPUProfile() // runs before cpuFile.Close()\n\n// Heap snapshot at a specific point\nheapFile, _ := os.Create(\"heap.prof\")\npprof.WriteHeapProfile(heapFile)\nheapFile.Close()\n\n// Named profile (goroutine, threadcreate, etc.)\ngoroutineFile, _ := os.Create(\"goroutine.prof\")\npprof.Lookup(\"goroutine\").WriteTo(goroutineFile, 0)\ngoroutineFile.Close()\n```\n\n## Interactive CLI Commands\n\nOpen a profile in interactive mode:\n\n```bash\ngo tool pprof cpu.prof\n# or from a URL:\ngo 
tool pprof http://localhost:6060/debug/pprof/profile?seconds=30\n```\n\n### `top` — self time ranking (start here)\n\nThe first command to run. Shows functions ranked by the time (or allocations) spent in the function itself:\n\n```\n(pprof) top\nShowing nodes accounting for 4.2s, 84% of 5s total\n flat flat% sum% cum cum%\n 1.50s 30.00% 30.00% 2.80s 56.00% encoding/json.Marshal\n 0.80s 16.00% 46.00% 0.80s 16.00% runtime.mallocgc\n 0.60s 12.00% 58.00% 0.60s 12.00% runtime.memmove\n 0.50s 10.00% 68.00% 0.50s 10.00% runtime.scanobject\n 0.40s 8.00% 76.00% 1.90s 38.00% myapp/pkg/parser.Parse\n 0.30s 6.00% 82.00% 0.30s 6.00% syscall.syscall\n 0.10s 2.00% 84.00% 0.10s 2.00% runtime.futex\n```\n\n| Column | Meaning | How to read it |\n| --- | --- | --- |\n| **flat** | Time spent in the function itself, excluding callees | High flat = the function's own code is expensive |\n| **flat%** | flat as percentage of total sample time | Quick way to see relative cost |\n| **sum%** | Running total of flat% going down the list | \"The top 3 functions account for 58% of total time\" |\n| **cum** | Time in function + all functions it calls (cumulative) | High cum with low flat = the function delegates to expensive callees |\n| **cum%** | cum as percentage of total | Compare with flat% — big gap means the cost is in callees |\n\n**Limiting output:**\n\n```\n(pprof) top 5 # show only top 5 functions\n(pprof) top -cum 10 # top 10 by cumulative time\n(pprof) top -flat 20 # top 20 by flat time (default sort)\n```\n\n### `top -cum` — cumulative time ranking\n\nCritical when `top` shows runtime functions (`runtime.mallocgc`, `runtime.memmove`, `runtime.scanobject`) dominating. These are symptoms, not causes. 
`top -cum` reveals which **application** functions trigger them:\n\n```\n(pprof) top -cum\n flat flat% sum% cum cum%\n 0.40s 8.00% 8.00% 3.80s 76.00% myapp/pkg/handler.HandleRequest\n 0.10s 2.00% 10.00% 2.80s 56.00% myapp/pkg/handler.serializeResponse\n 1.50s 30.00% 40.00% 2.80s 56.00% encoding/json.Marshal\n```\n\nNow you can see that `HandleRequest` → `serializeResponse` → `json.Marshal` is the hot path. The optimization target is `serializeResponse`, not `runtime.mallocgc`.\n\n### `list funcName` — annotated source\n\nShows the source code of a function with per-line cost annotations. This is how you pinpoint the **exact line** causing the bottleneck:\n\n```\n(pprof) list serializeResponse\nTotal: 5s\nROUTINE ======================== myapp/pkg/handler.serializeResponse\n 0.10s 2.80s (flat, cum) 56.00% of Total\n . . 38:func serializeResponse(w http.ResponseWriter, data any) {\n . 0.20s 39: w.Header().Set(\"Content-Type\", \"application/json\")\n 0.10s 2.60s 40: buf, err := json.Marshal(data)\n . . 41: if err != nil {\n . . 42: http.Error(w, err.Error(), 500)\n . . 43: return\n . . 44: }\n . 0.20s 45: w.Write(buf)\n . . 46:}\n```\n\n- Left column = **flat** time (work done by this line itself)\n- Right column = **cumulative** time (this line + everything it calls)\n- Line 40 accounts for 2.60s cumulative because `json.Marshal` is expensive\n\n**Use `list` with a regex** to find all matching functions:\n\n```\n(pprof) list Parse.* # all functions starting with Parse\n(pprof) list \\.Handle # all Handle methods across packages\n```\n\n### `peek funcName` — callers and callees\n\nShows who calls a function and what it calls — the one-hop neighborhood in the call graph. 
Use to trace the responsibility chain when a function appears hot but you're unsure whether the problem is upstream (too many calls) or downstream (expensive callees):\n\n```\n(pprof) peek json.Marshal\nShowing nodes accounting for 5s, 100% of 5s total\n----------------------------------------------+-------------\n | flat flat% sum% cum cum%\n myapp/pkg/handler.serializeResponse 2.60s |\n myapp/pkg/api.buildResponse 0.20s | 1.50s 30.00% 30.00% 2.80s 56.00% encoding/json.Marshal\n----------------------------------------------+-------------\n |\n reflect.Value.MapRange 0.40s |\n encoding/json.(*encodeState).marshal 0.30s |\n runtime.mallocgc 0.80s |\n```\n\nTop section = callers (who calls json.Marshal). Bottom section = callees (what json.Marshal calls internally).\n\n### `tree` — hierarchical call tree\n\nDisplays the full call tree with cumulative costs at each level. Useful when you need more context than `peek` provides:\n\n```\n(pprof) tree\n 0.40s 8.00% 8.00% 3.80s 76.00% myapp/pkg/handler.HandleRequest\n 0.10s myapp/pkg/handler.serializeResponse\n 1.50s encoding/json.Marshal\n 0.80s runtime.mallocgc\n 0.20s myapp/pkg/handler.validateInput\n 0.10s myapp/pkg/handler.fetchData\n```\n\n### `traces` — raw stack traces\n\nDumps all raw sample stack traces. Each stack trace shows what the program was doing at the moment it was sampled:\n\n```\n(pprof) traces\n-----------+-------------------------------------------------------\n bytes: 1.5MB\n 1.50s encoding/json.Marshal\n myapp/pkg/handler.serializeResponse\n myapp/pkg/handler.HandleRequest\n net/http.(*ServeMux).ServeHTTP\n-----------+-------------------------------------------------------\n```\n\nUseful for spotting unexpected call paths (e.g., a function you didn't expect being called from a hot path).\n\n### `web` / `svg` — graphical call graph\n\n`web` opens a call graph in the browser. `svg` saves it to a file. 
Both require graphviz installed (`brew install graphviz` or `apt install graphviz`).\n\nVisual encoding:\n\n- **Thicker edges** = more time flows through that call\n- **Larger nodes** = more time spent in that function\n- **Red/dark nodes** = hot spots (high flat time)\n- **Edge labels** = time flowing through that call path\n\nUse when the text commands don't reveal the full picture — the visual layout often reveals call patterns that are hard to see in text.\n\n### `disasm funcName` — assembly-level\n\nShows generated assembly with per-instruction cost. Use for micro-optimization: verifying SIMD instructions, bounds check elimination, or inlining at the instruction level:\n\n```\n(pprof) disasm Parse\nTotal: 5s\nROUTINE ======================== myapp/pkg/parser.Parse\n 0.40s 1.90s (flat, cum) 38.00% of Total\n 0.10s 0.10s 4a3b20: MOVQ 0x8(SP), AX ;parser.go:15\n 0.20s 0.20s 4a3b28: CMPQ AX, $0x100 ;parser.go:16\n . 0.10s 4a3b2f: JGE 0x4a3b80 ;parser.go:16\n 0.10s 1.50s 4a3b35: CALL runtime.makeslice(SB) ;parser.go:17\n```\n\n### `weblist funcName` — annotated source in browser\n\nLike `list` but opens the annotated source in a browser with color-coded cost highlighting. Each line is shaded from white (no cost) to red (hot). More visually immediate than the text version:\n\n```\n(pprof) weblist serializeResponse\n```\n\nRequires a browser. Falls back to `list` if no browser is available.\n\n### `tags` — profile label breakdown\n\nShows tag values present in the profile. 
Go runtime profiles carry tags like `thread_id`; custom profiles can add arbitrary labels via `pprof.Do()`:\n\n```go\nlabels := pprof.Labels(\"request_type\", \"api\", \"endpoint\", \"/users\")\npprof.Do(ctx, labels, func(ctx context.Context) {\n handleRequest(ctx)\n})\n```\n\n```\n(pprof) tags\nrequest_type: api (85%), batch (15%)\nendpoint: /users (40%), /orders (35%), /products (25%)\n```\n\n### `tagroot` and `tagleaf` — group by labels\n\nGroup the profile data by tag values, creating a virtual call tree rooted on tag names:\n\n```\n(pprof) tagroot request_type # group everything by request_type first\n(pprof) top # now shows breakdown per request_type\n(pprof) tagleaf endpoint # add endpoint as leaf grouping\n```\n\nUseful for multi-tenant profiling or breaking down by request type without code changes.\n\n### `granularity` — control grouping level\n\nChanges how samples are aggregated:\n\n```\n(pprof) granularity=functions # default — group by function name\n(pprof) granularity=filefunctions # group by file:function\n(pprof) granularity=files # group by file only\n(pprof) granularity=lines # group by exact source line\n(pprof) granularity=addresses # group by instruction address (most granular)\n```\n\n`lines` is especially useful when a single function has multiple hot spots — it reveals which specific lines are expensive without needing `list`.\n\n### `sort` — change sort order\n\n```\n(pprof) sort=flat # sort by flat time (default for top)\n(pprof) sort=cum # sort by cumulative time (same as top -cum)\n```\n\n### `source` — show source for matching regex\n\nSimilar to `list` but searches all functions matching a pattern and shows their annotated source:\n\n```\n(pprof) source handler # show annotated source for all functions matching \"handler\"\n```\n\n### `focus`, `ignore`, `hide`, `show` — filtering\n\nNarrow the analysis to specific functions or exclude noise. 
These are stateful — they persist across commands until explicitly cleared:\n\n```\n(pprof) focus=myapp # only show call paths that pass through \"myapp\"\n(pprof) ignore=runtime # drop samples whose stack passes through runtime functions\n(pprof) hide=testing # hide testing framework noise from graphs\n(pprof) show=handler # only show functions matching \"handler\"\n(pprof) tagfocus=endpoint=/users # only show samples with this tag value\n(pprof) tagignore=request_type=batch # exclude samples with this tag value\n```\n\n**Difference between `focus`, `show`, `hide`, and `ignore`:**\n\n- `focus` — keeps only samples whose stack contains a matching function; all other samples are dropped\n- `ignore` — drops every sample whose stack contains a matching function, so the reported total shrinks; costs are not reattributed to callers\n- `show` — displays only matching nodes; no samples are dropped, only the display is restricted\n- `hide` — removes matching nodes from the display but keeps the samples that pass through them\n\n**Clear all filters:**\n\n```\n(pprof) reset\n```\n\n### `normalize` — normalize against a base profile\n\nWhen comparing two profiles with `-base`, values are deltas by default. `normalize` scales the main (source) profile so that its total matches the total of the base profile, making ratios comparable even if run durations differ:\n\n```\n(pprof) normalize\n```\n\n### `sample_index` — switch metric in multi-metric profiles\n\nHeap profiles contain multiple metrics (alloc_objects, alloc_space, inuse_objects, inuse_space). 
Switch between them without reloading:\n\n```\n(pprof) sample_index=alloc_objects\n(pprof) top # now shows allocation counts\n(pprof) sample_index=inuse_space\n(pprof) top # now shows live memory\n```\n\n### `unit` — change display units\n\n```\n(pprof) unit=ms # display time in milliseconds\n(pprof) unit=seconds # display in seconds\n(pprof) unit=MB # display memory in megabytes\n(pprof) unit=auto # automatic (default)\n```\n\n### `callgrind` — export for KCachegrind\n\nExports the profile in callgrind format, which can be opened in KCachegrind or QCachegrind for advanced visualization:\n\n```\n(pprof) callgrind\nGenerating report in callgrind format\n```\n\n### `proto` — save processed profile\n\nSave the current profile (after filtering) in protobuf format for sharing or later analysis:\n\n```\n(pprof) proto > filtered.pb.gz\n```\n\n### `help` — list all commands\n\n```\n(pprof) help # full command list with descriptions\n(pprof) help top # detailed help for a specific command\n```\n\n### `show_from=regex` — trim callers above match\n\nHides all frames above the first matching function. Useful when you're only interested in a specific subsystem and want to remove framework/routing noise above it:\n\n```\n(pprof) show_from=handler.Handle # start the graph from Handle, hide all callers above\n```\n\n### `noinlines` — flatten inlined functions\n\nAttributes inlined functions to their first out-of-line caller. Useful when inlined functions create confusing call chains in the graph:\n\n```\n(pprof) noinlines\n```\n\n### Full command reference\n\nEvery command below works both as a standalone shell command and inside the interactive `(pprof)` prompt. 
The interactive form omits `go tool pprof` and the profile path — e.g., `go tool pprof -top cpu.prof` becomes just `top` inside the prompt.\n\n**Reporting commands:**\n\n```bash\n# Top functions by self (flat) cost — the first command to run\ngo tool pprof -top cpu.prof\n\n# Top 20 functions by cumulative cost (self + callees)\ngo tool pprof -cum -top -nodecount=20 cpu.prof\n\n# Annotated source for a specific function — pinpoints the exact expensive line\ngo tool pprof -list=json.Marshal cpu.prof\n\n# Callers and callees of a function — trace the responsibility chain\ngo tool pprof -peek=serializeResponse cpu.prof\n\n# Hierarchical call tree with costs at each level\ngo tool pprof -tree cpu.prof\n\n# Raw sample stack traces — spot unexpected call paths\ngo tool pprof -traces cpu.prof\n\n# Per-instruction assembly cost — verify SIMD, bounds checks, inlining\ngo tool pprof -disasm=Parse cpu.prof\n\n# Annotated source for all functions matching a regex\ngo tool pprof -source='handler\\..*' cpu.prof\n\n# Text output (flat table, alternative to -top)\ngo tool pprof -text cpu.prof\n```\n\n**Graph/export commands:**\n\n```bash\n# SVG call graph (viewable in any browser, no graphviz server needed)\ngo tool pprof -svg cpu.prof > cpu.svg\n\n# SVG of only the subgraph matching a regex\ngo tool pprof -svg -focus=handler cpu.prof > handler.svg\n\n# PDF call graph\ngo tool pprof -pdf cpu.prof > cpu.pdf\n\n# PNG call graph\ngo tool pprof -png cpu.prof > cpu.png\n\n# GIF call graph\ngo tool pprof -gif cpu.prof > cpu.gif\n\n# DOT format (for custom graphviz processing: dot -Tsvg cpu.dot > cpu.svg)\ngo tool pprof -dot cpu.prof > cpu.dot\n\n# Callgrind format (open with KCachegrind / QCachegrind)\ngo tool pprof -callgrind cpu.prof > cpu.callgrind\n\n# Save current profile (with filters applied) in protobuf format\ngo tool pprof -proto -focus=handler cpu.prof > handler-only.pb.gz\n\n# Annotated source in browser with color-coded cost per line\ngo tool pprof -weblist=serializeResponse 
cpu.prof\n```\n\n**Filtering flags** — narrow analysis to relevant functions:\n\n```bash\n# Focus: keep only call paths passing through matching functions\ngo tool pprof -focus=myapp/pkg/handler -top cpu.prof\n\n# Ignore: drop every sample whose stack passes through a matching function\ngo tool pprof -ignore=runtime -top cpu.prof\n\n# Show: display only matching functions (display-only, does not change cost accounting)\ngo tool pprof -show=handler -top cpu.prof\n\n# Hide: hide matching functions from display (does not change cost accounting)\ngo tool pprof -hide=testing -svg cpu.prof > clean.svg\n\n# Show_from: trim all frames above the first match — hides framework/routing callers\ngo tool pprof -show_from=handler.Handle -top cpu.prof\n\n# Noinlines: attribute inlined functions to their first out-of-line caller\ngo tool pprof -noinlines -top cpu.prof\n\n# Combine multiple filters\ngo tool pprof -cum -top -nodecount=10 -focus=handler -ignore=runtime cpu.prof\n```\n\n**Tag-based filtering** — for profiles with labels (via `pprof.Do()`):\n\n```bash\n# Show all tag keys and their value distributions\ngo tool pprof -tags cpu.prof\n\n# Keep only samples tagged with a specific key=value\ngo tool pprof -tagfocus=endpoint=/users -top cpu.prof\n\n# Exclude samples with a specific tag\ngo tool pprof -tagignore=request_type=batch -top cpu.prof\n\n# Group by tag — insert pseudo frames at root, breaking down by tag value\ngo tool pprof -tagroot=request_type -top cpu.prof\n\n# Group by tag as leaf — breaks down each function by tag value\ngo tool pprof -tagleaf=endpoint -top cpu.prof\n\n# Show/hide tags as annotations in graph output\ngo tool pprof -tagshow=endpoint -svg cpu.prof > tagged.svg\ngo tool pprof -taghide=thread_id -svg cpu.prof > clean.svg\n```\n\n**Granularity and display control:**\n\n```bash\n# Group by source line instead of function — reveals hot lines in multi-hot-spot functions\ngo tool pprof -granularity=lines -top cpu.prof\n\n# Group by file:function\ngo tool 
pprof -granularity=filefunctions -top cpu.prof\n\n# Group by file only\ngo tool pprof -granularity=files -top cpu.prof\n\n# Group by instruction address (most granular)\ngo tool pprof -granularity=addresses -top cpu.prof\n\n# Change display units\ngo tool pprof -unit=ms -top cpu.prof\n\n# Edge/node fraction cutoffs — hide small contributions from graphs\ngo tool pprof -edgefraction=0.01 -nodefraction=0.005 -svg cpu.prof > clean.svg\n\n# Disable trimming — show the full graph including tiny nodes\ngo tool pprof -trim=false -svg cpu.prof > full.svg\n```\n\n**Heap profile commands:**\n\n```bash\n# Top allocation sites by object count — diagnose GC churn\ngo tool pprof -top -alloc_objects mem.prof\n\n# Top allocation sites by bytes — diagnose peak memory\ngo tool pprof -top -alloc_space mem.prof\n\n# Currently live objects — diagnose memory leaks\ngo tool pprof -top -inuse_space mem.prof\n\n# Currently live object count — diagnose leak of many small objects\ngo tool pprof -top -inuse_objects mem.prof\n\n# Annotated source showing allocation sites by object count\ngo tool pprof -alloc_objects -list=Parse mem.prof\n\n# SVG call graph colored by allocation objects\ngo tool pprof -alloc_objects -svg mem.prof > allocs.svg\n\n# Compare two heap snapshots — show only growth (memory leak detection)\ngo tool pprof -top -base heap-baseline.prof heap-after.prof\n\n# Diff with normalization — makes ratios comparable when capture durations differ\ngo tool pprof -normalize -top -base heap-baseline.prof heap-after.prof\n\n# Diff as SVG — visualize what grew\ngo tool pprof -base heap-baseline.prof -svg heap-after.prof > leak.svg\n\n# Diff with annotated source for a specific function\ngo tool pprof -base heap-baseline.prof -list=handleRequest heap-after.prof\n```\n\n**Fetching profiles from a running service:**\n\n```bash\n# CPU profile — fetch 30 seconds of samples and open interactive mode\ngo tool pprof http://localhost:6060/debug/pprof/profile?seconds=30\n\n# CPU profile — fetch 
and immediately generate SVG (no interactive mode)\ngo tool pprof -svg http://localhost:6060/debug/pprof/profile?seconds=10 > cpu.svg\n\n# CPU profile — fetch with a timeout\ngo tool pprof -timeout=60 \"http://localhost:6060/debug/pprof/profile?seconds=30\"\n\n# Heap profile — fetch and show top allocation sites\ngo tool pprof -top -alloc_objects http://localhost:6060/debug/pprof/heap\n\n# Goroutine profile — fetch and show top goroutine stacks\ngo tool pprof -top http://localhost:6060/debug/pprof/goroutine\n\n# Mutex profile — fetch contention data\ngo tool pprof -top http://localhost:6060/debug/pprof/mutex\n\n# Block profile — fetch blocking data\ngo tool pprof -top http://localhost:6060/debug/pprof/block\n\n# Fetch and save to a file without analysis (using curl)\ncurl -o heap.prof http://localhost:6060/debug/pprof/heap\n\n# Human-readable goroutine dump (no go tool pprof needed)\ncurl http://localhost:6060/debug/pprof/goroutine?debug=1\n\n# Goroutine dump with full stack traces, creation site, and labels\ncurl http://localhost:6060/debug/pprof/goroutine?debug=2\n\n# Human-readable heap stats\ncurl http://localhost:6060/debug/pprof/heap?debug=1\n\n# Fetch over TLS with client certificate\ngo tool pprof -tls_cert=client.crt -tls_key=client.key -tls_ca=ca.crt https://myservice:6060/debug/pprof/profile?seconds=30\n\n# Fetch over TLS skipping server certificate verification\ngo tool pprof https+insecure://myservice:6060/debug/pprof/profile?seconds=30\n```\n\n**Comparison commands (diff two profiles):**\n\n```bash\n# Diff: subtract base from source — all values become deltas\ngo tool pprof -base cpu-before.prof cpu-after.prof\n\n# Diff base: percentages shown relative to base profile\ngo tool pprof -diff_base=cpu-before.prof cpu-after.prof\n\n# Diff with normalization — scale source so its total matches the base total\ngo tool pprof -normalize -base heap-before.prof heap-after.prof\n\n# Diff as top report\ngo tool pprof -top -base cpu-before.prof cpu-after.prof\n\n# Diff as SVG 
graph\ngo tool pprof -svg -base cpu-before.prof cpu-after.prof > diff.svg\n```\n\n**Web UI:**\n\n```bash\n# Open interactive web UI with flamegraph, graph, source, and disassembly views\ngo tool pprof -http=:8080 cpu.prof\n\n# Open on a different port\ngo tool pprof -http=:9090 mem.prof\n\n# Open with a specific sample type pre-selected\ngo tool pprof -http=:8080 -alloc_objects mem.prof\n\n# Open with filters pre-applied\ngo tool pprof -http=:8080 -focus=handler cpu.prof\n\n# Open a diff view in the web UI\ngo tool pprof -http=:8080 -base heap-baseline.prof heap-after.prof\n\n# Open with no browser auto-launch (just start the server)\ngo tool pprof -http=:8080 -no_browser cpu.prof\n```\n\n**Symbolization flags:**\n\n```bash\n# Disable symbolization (show raw addresses)\ngo tool pprof -symbolize=none cpu.prof\n\n# Only use local binaries for symbolization (don't contact remote)\ngo tool pprof -symbolize=local cpu.prof\n\n# Contact running service for symbol information\ngo tool pprof -symbolize=remote http://localhost:6060/debug/pprof/profile?seconds=10\n\n# Show mangled C++ names (relevant for cgo profiles)\ngo tool pprof -symbolize=demangle=none cpu.prof\n\n# Full demangling without simplification\ngo tool pprof -symbolize=demangle=full cpu.prof\n```\n\n**Environment variables:**\n\n| Variable | Purpose |\n| --- | --- |\n| `PPROF_BINARY_PATH` | Search path for local binaries used in symbolization (default: `$HOME/pprof/binaries`). Set when profiling remote servers where binaries aren't in the default path. |\n| `PPROF_TOOLS` | Directory containing binutils tools (`addr2line`, `nm`, `objdump`). Set when these tools aren't in `$PATH`. 
|\n\n## Graphical / Web UI\n\nWhen CLI output is insufficient and you need interactive exploration:\n\n```bash\n# Opens browser with interactive UI\ngo tool pprof -http=:8080 cpu.prof\n\n# Specify a different port if 8080 is taken\ngo tool pprof -http=:9090 mem.prof\n\n# Open with specific sample type pre-selected\ngo tool pprof -http=:8080 -alloc_objects mem.prof\n\n# Open with filters pre-applied\ngo tool pprof -http=:8080 -focus=handler cpu.prof\n\n# Compare two profiles — open with -base\ngo tool pprof -http=:8080 -base heap-baseline.prof heap-after.prof\n```\n\nThe web UI provides:\n\n- **Flamegraph** (most intuitive) — horizontal width proportional to cost; click to zoom into subtrees; inverted flamegraph available (icicle graph)\n- **Graph** — directed call graph with edge weights; nodes and edges sized/colored by cost; interactive zoom and click-to-focus\n- **Top** — same as `top` command but sortable columns, clickable to navigate to source\n- **Source** — annotated source with per-line cost; browsable across all functions\n- **Disassembly** — same as `disasm` but browsable across functions\n- **Peek** — interactive peek view with expandable callers/callees\n\nDefault to CLI commands for quick diagnosis — use the web UI when exploring unfamiliar call graphs, comparing profiles visually, or presenting findings to others.\n\n## Comparing Profiles\n\n### Memory leak detection with `-base`\n\nCompare two heap profiles to isolate what grew between them:\n\n```bash\n# Step 1: take a baseline snapshot\ncurl http://localhost:6060/debug/pprof/heap > heap-baseline.prof\n\n# Step 2: wait for the suspected leak to accumulate (minutes to hours)\n\n# Step 3: take a second snapshot\ncurl http://localhost:6060/debug/pprof/heap > heap-after.prof\n\n# Step 4: diff — shows only what grew between the two snapshots\ngo tool pprof -base heap-baseline.prof heap-after.prof\n# Then use top, list, peek as usual — all values are deltas\n```\n\n### Comparing CPU profiles across code 
versions\n\n```bash\n# Before your change\ngo test -bench=BenchmarkParse -cpuprofile=cpu-before.prof ./pkg/parser\n\n# After your change\ngo test -bench=BenchmarkParse -cpuprofile=cpu-after.prof ./pkg/parser\n\n# Compare visually — load both in separate browser tabs\ngo tool pprof -http=:8080 cpu-before.prof\ngo tool pprof -http=:8081 cpu-after.prof\n```\n\nFor statistical comparison of benchmark numbers (not profiles), use [benchstat](./benchstat.md) instead.\n\n## Common Patterns\n\nLearn to recognize these recurring shapes — they tell you what class of problem you're dealing with before you start fixing.\n\n### Flat high + cum high\n\nThe function itself is the bottleneck. It does expensive work directly (tight loop, heavy computation, complex string processing). Optimize the function's own code — algorithm, data structure, or implementation.\n\n### Flat low + cum high\n\nThe function calls slow things but does little work itself. It's a coordinator or dispatcher. Drill into callees with `list` or `peek`. The fix is usually in the called functions, or reducing how often they're called.\n\n### `alloc_objects` high, `inuse_space` low\n\nShort-lived allocations creating GC churn. Objects are allocated and freed rapidly — each one is cheap individually but the aggregate volume triggers frequent GC cycles. Common sources: `fmt.Errorf` in hot paths (allocates every call), interface boxing (`any` arguments), string-to-byte conversions, slice growth without preallocation. → See `samber/cc-skills-golang@golang-performance` skill for allocation reduction patterns.\n\n### `inuse_space` growing over time\n\nMemory leak. Take two heap snapshots minutes apart and compare with `-base` (see Comparing Profiles above). Growing types reveal the leak source. Common causes: unbounded caches, maps that never shrink (Go maps don't release bucket memory on delete), goroutine leaks holding references.\n\n### Mutex/block profile hot\n\nContention, not CPU. 
The CPU is waiting, not working. The goroutines are all trying to acquire the same lock or read from the same channel. Reduce critical section scope, shard locks across multiple mutexes, or use lock-free structures (`sync/atomic`, `sync.Map` for read-heavy workloads). → See `samber/cc-skills-golang@golang-concurrency` skill.\n\n### Many goroutines blocked on same channel/mutex\n\nSerialization bottleneck. All work funnels through a single point. The throughput ceiling is the speed of that single point. Consider worker pools with multiple independent queues, sharding the work, or buffered channels to smooth bursts.\n\n### `runtime.mallocgc` dominates CPU profile\n\nAllocation rate is the bottleneck, not computation. The Go runtime is spending more time allocating and collecting garbage than running your code. Switch to the `alloc_objects` heap profile to find which functions allocate the most, then → See `samber/cc-skills-golang@golang-performance` skill for reduction patterns.\n\n### `runtime.memmove` high in CPU profile\n\nLarge memory copies — usually from slice `append` growing beyond capacity, `copy()` of large slices, or string-to-byte conversions. Pre-allocate slices to final capacity, reuse buffers, or work with `[]byte` directly.\n\n### `runtime.scanobject` high in CPU profile\n\nGC pointer scanning. The heap contains many pointers that the GC must trace. 
Reduce pointer density: use value types instead of pointers in slices/maps, flatten nested structures, consider `[N]byte` arrays instead of `string` in hot structs.\n\n## Which Profile for Which Symptom?\n\n| Symptom | Profile | Flag/Command |\n| --- | --- | --- |\n| High CPU, slow function | CPU | `-cpuprofile` or `pprof/profile` |\n| Too many allocations (GC pressure) | Heap (alloc_objects) | `-memprofile` then `pprof -alloc_objects` |\n| Large allocations (memory usage) | Heap (alloc_space) | `pprof -alloc_space` |\n| Memory growing over time (leak) | Heap (inuse_space) | `pprof -inuse_space`, compare with `-base` |\n| Lock contention | Mutex | `pprof/mutex` (enable `SetMutexProfileFraction` first) |\n| Goroutines blocked on sync | Block | `pprof/block` (enable `SetBlockProfileRate` first) |\n| Too many goroutines / leak | Goroutine | `pprof/goroutine` |\n| High latency but low CPU | Goroutine + Block + Trace | Scheduling delays, I/O waits — see [Trace Reference](./trace.md) |\n| Excessive thread creation | Threadcreate | `pprof/threadcreate` |\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":33951,"content_sha256":"191070b096d75226aa261d7f163a12500b667603ccfe49ba579b5fafcba352cb"},{"filename":"references/prometheus-go-metrics.md","content":"# Prometheus Go Runtime Metrics Reference\n\nComplete listing of Go runtime metrics **actually exposed as Prometheus metrics** by `prometheus/client_golang` library.\n\n---\n\n## Important Clarification\n\n**`runtime/metrics` are NOT Prometheus metrics.** They're Go runtime data structures.\n\nThe Prometheus Go client library (`prometheus/client_golang`) **selectively converts some** `runtime/metrics` into Prometheus format. 
By default, it exposes only the traditional `go_memstats_*` and `go_gc_*` metrics to keep cardinality low.\n\n**This document lists only Prometheus metrics** (the ones you actually scrape from `/metrics` endpoint).\n\n---\n\n## Quick Reference\n\n### Metrics with Labels\n\n| Metric | Label | Values |\n| ------------------------ | ---------- | --------------------- |\n| `go_gc_duration_seconds` | `quantile` | 0, 0.25, 0.5, 0.75, 1 |\n| `go_info` | `version` | e.g., \"go1.21.3\" |\n\n### All Other Metrics\n\nAll other metrics have **no labels**.\n\n---\n\n## Default Go Metrics (Always Exposed)\n\nThese are exposed by default by `prometheus/client_golang`.\n\n### Memory Allocation\n\n| Metric | Type | Description |\n| ------------------------------- | ------- | ------------------------------- |\n| `go_memstats_alloc_bytes` | gauge | Current bytes allocated on heap |\n| `go_memstats_alloc_bytes_total` | counter | Cumulative bytes allocated |\n| `go_memstats_sys_bytes` | gauge | Total bytes requested from OS |\n\n### Heap State\n\n| Metric | Type | Description |\n| --------------------------------- | ----- | --------------------------- |\n| `go_memstats_heap_alloc_bytes` | gauge | Allocated heap bytes |\n| `go_memstats_heap_idle_bytes` | gauge | Idle heap bytes |\n| `go_memstats_heap_inuse_bytes` | gauge | Heap bytes in use |\n| `go_memstats_heap_objects` | gauge | Count of heap objects |\n| `go_memstats_heap_released_bytes` | gauge | Heap bytes released to OS |\n| `go_memstats_heap_sys_bytes` | gauge | Heap bytes reserved from OS |\n\n### Stack and Metadata\n\n| Metric | Type | Description |\n| --- | --- | --- |\n| `go_memstats_stack_inuse_bytes` | gauge | Stack in-use bytes |\n| `go_memstats_stack_sys_bytes` | gauge | Stack reserved bytes |\n| `go_memstats_mspan_inuse_bytes` | gauge | Mspan in-use bytes |\n| `go_memstats_mspan_sys_bytes` | gauge | Mspan reserved bytes |\n| `go_memstats_mcache_inuse_bytes` | gauge | Mcache in-use bytes |\n| 
`go_memstats_mcache_sys_bytes` | gauge | Mcache reserved bytes |\n| `go_memstats_other_sys_bytes` | gauge | Other runtime bytes |\n| `go_memstats_gc_sys_bytes` | gauge | GC internal bytes |\n| `go_memstats_buck_hash_sys_bytes` | gauge | Profiling bucket hash table bytes |\n\n### Allocation and Free Counters\n\n| Metric | Type | Description |\n| --------------------------- | ------- | ------------------ |\n| `go_memstats_mallocs_total` | counter | Total malloc calls |\n| `go_memstats_frees_total` | counter | Total free calls |\n\n### GC Configuration and Timing\n\n| Metric | Type | Description |\n| --- | --- | --- |\n| `go_gc_gogc_percent` | gauge | GOGC target percentage |\n| `go_gc_gomemlimit_bytes` | gauge | GOMEMLIMIT soft memory limit |\n| `go_memstats_last_gc_time_seconds` | gauge | Last GC end time (Unix timestamp) |\n| `go_memstats_next_gc_bytes` | gauge | Heap size target for next GC |\n\n### GC Pause Duration (with labels)\n\n| Metric | Type | Labels | Description |\n| --- | --- | --- | --- |\n| `go_gc_duration_seconds` | summary | `quantile` (0, 0.25, 0.5, 0.75, 1) | GC pause durations with quantiles |\n| `go_gc_duration_seconds_count` | counter | — | GC pause count |\n| `go_gc_duration_seconds_sum` | counter | — | GC pause total time |\n\n### Runtime State\n\n| Metric | Type | Description |\n| ----------------------------- | ----- | ------------------------ |\n| `go_goroutines` | gauge | Current goroutine count |\n| `go_threads` | gauge | Current OS thread count |\n| `go_sched_gomaxprocs_threads` | gauge | Current GOMAXPROCS value |\n\n### Version Information (with labels)\n\n| Metric | Type | Labels | Description |\n| --------- | ----- | --------- | ----------------- |\n| `go_info` | gauge | `version` | Go version string |\n\n---\n\n## Optional Go Metrics (Opt-in, Go 1.17+)\n\nEnable via:\n\n```go\nprometheus.NewRegistry().MustRegister(\n collectors.NewGoCollector(\n collectors.WithGoCollectorRuntimeMetrics(\n collectors.MetricsAll,\n ),\n 
),\n)\n```\n\n### GC Cycles\n\n| Metric | Type | Description |\n| --- | --- | --- |\n| `go_gc_cycles_automatic_gc_cycles_total` | counter | Automatic GC cycles (heap growth) |\n| `go_gc_cycles_forced_gc_cycles_total` | counter | Forced GC cycles (runtime.GC()) |\n\n### Additional Heap Metrics\n\n| Metric | Type | Description |\n| --- | --- | --- |\n| `go_gc_heap_allocs_bytes_total` | counter | Cumulative heap allocations (bytes) |\n| `go_gc_heap_allocs_objects_total` | counter | Cumulative heap allocations (count) |\n| `go_gc_heap_frees_bytes_total` | counter | Cumulative heap frees (bytes) |\n| `go_gc_heap_frees_objects_total` | counter | Cumulative heap frees (count) |\n| `go_gc_heap_goal_bytes` | gauge | Heap size target for next GC |\n| `go_gc_heap_live_bytes` | gauge | Live heap bytes |\n| `go_gc_heap_objects_objects` | gauge | Total heap objects count |\n\n### GC Pauses Distribution\n\n| Metric | Type | Description |\n| ---------------------- | ------------ | ------------------ |\n| `go_gc_pauses_seconds` | distribution | GC pause durations |\n\n### CPU Classes\n\n| Metric | Type | Description |\n| --- | --- | --- |\n| `go_cpu_classes_gc_mark_assist_cpu_seconds_total` | counter | GC mark assist CPU time |\n| `go_cpu_classes_gc_mark_dedicated_cpu_seconds_total` | counter | GC dedicated workers CPU time |\n| `go_cpu_classes_gc_mark_idle_cpu_seconds_total` | counter | GC idle workers CPU time |\n| `go_cpu_classes_gc_pause_cpu_seconds_total` | counter | GC pause CPU time |\n| `go_cpu_classes_gc_total_cpu_seconds_total` | counter | Total GC CPU time |\n| `go_cpu_classes_idle_cpu_seconds_total` | counter | Idle CPU time |\n| `go_cpu_classes_scavenge_assist_cpu_seconds_total` | counter | Scavenger assist CPU time |\n| `go_cpu_classes_scavenge_background_cpu_seconds_total` | counter | Background scavenger CPU time |\n| `go_cpu_classes_scavenge_total_cpu_seconds_total` | counter | Total scavenger CPU time |\n| `go_cpu_classes_total_cpu_seconds_total` | counter | Total 
CPU time (all classes) |\n| `go_cpu_classes_user_cpu_seconds_total` | counter | User-mode CPU time |\n\n### Memory Classes\n\n| Metric | Type | Description |\n| --- | --- | --- |\n| `go_memory_classes_heap_free_bytes` | gauge | Free heap memory |\n| `go_memory_classes_heap_objects_bytes` | gauge | Allocated heap objects |\n| `go_memory_classes_heap_released_bytes` | gauge | Heap released memory |\n| `go_memory_classes_heap_stacks_bytes` | gauge | Stack memory |\n| `go_memory_classes_heap_unused_bytes` | gauge | Unused heap |\n| `go_memory_classes_metadata_mcache_free_bytes` | gauge | Free mcache memory |\n| `go_memory_classes_metadata_mcache_inuse_bytes` | gauge | In-use mcache memory |\n| `go_memory_classes_metadata_mspan_free_bytes` | gauge | Free mspan memory |\n| `go_memory_classes_metadata_mspan_inuse_bytes` | gauge | In-use mspan memory |\n| `go_memory_classes_other_bytes` | gauge | Other memory |\n| `go_memory_classes_total_bytes` | gauge | Total memory |\n\n### Scheduler Metrics\n\n| Metric | Type | Description |\n| --- | --- | --- |\n| `go_sched_goroutines_running_goroutines` | gauge | Running goroutines |\n| `go_sched_goroutines_runnable_goroutines` | gauge | Runnable goroutines waiting |\n| `go_sched_goroutines_goroutines` | gauge | Current total goroutines |\n| `go_sched_goroutines_created_goroutines_total` | counter | Total goroutines ever created |\n| `go_sched_goroutines_waiting_goroutines` | gauge | Goroutines waiting (not runnable) |\n| `go_sched_latencies_seconds` | distribution | Goroutine scheduling latency |\n| `go_sched_pauses_stopping_gc_seconds` | distribution | STW pause time (GC stop) |\n| `go_sched_pauses_stopping_other_seconds` | distribution | STW pause time (other stop) |\n| `go_sched_pauses_total_gc_seconds` | distribution | Total GC pause duration |\n| `go_sched_pauses_total_other_seconds` | distribution | Total other pause duration |\n| `go_sched_threads_total_threads` | counter | Total OS threads ever created |\n| 
`go_sync_mutex_wait_total_seconds_total` | counter | Total time goroutines waited on mutex |\n\n### CGO Metrics\n\n| Metric | Type | Description |\n| ---------------------------- | ------- | ------------------------ |\n| `go_cgo_go_to_c_calls_total` | counter | Total calls from Go to C |\n\n---\n\n## Process Metrics\n\nExposed by Prometheus `process` collector (not Go-specific):\n\n### CPU and Memory\n\n| Metric | Type | Description |\n| --- | --- | --- |\n| `process_cpu_seconds_total` | counter | Total CPU time (user + system) |\n| `process_resident_memory_bytes` | gauge | RSS (physical memory used) |\n| `process_virtual_memory_bytes` | gauge | Virtual memory allocated |\n| `process_virtual_memory_max_bytes` | gauge | Maximum virtual memory allowed |\n\n### File Descriptors\n\n| Metric | Type | Description |\n| ------------------ | ----- | -------------------------------- |\n| `process_open_fds` | gauge | Open file descriptors |\n| `process_max_fds` | gauge | Maximum file descriptors allowed |\n\n### Process Information\n\n| Metric | Type | Description |\n| ---------------------------- | ----- | ----------------------------------- |\n| `process_start_time_seconds` | gauge | Process start time (Unix timestamp) |\n\n### Page Faults\n\n| Metric | Type | Description |\n| --------------------------------- | ------- | ----------------- |\n| `process_page_faults_total` | counter | Total page faults |\n| `process_page_faults_minor_total` | counter | Minor page faults |\n| `process_page_faults_major_total` | counter | Major page faults |\n\n---\n\n## Common PromQL Queries\n\n### Memory Leak Detection\n\n```promql\n# Current heap allocation (should be stable under constant load)\ngo_memstats_alloc_bytes\n\n# Live heap bytes (optional metric)\ngo_gc_heap_live_bytes\n\n# Allocation rate in bytes/s (measures churn, not growth: freed memory is never subtracted)\nrate(go_memstats_alloc_bytes_total[5m])\n```\n\n### GC Pressure\n\n```promql\n# Worst-case GC pause (quantile 1 = max)\ngo_gc_duration_seconds{quantile=\"1\"}\n\n# Average GC 
pause\nrate(go_gc_duration_seconds_sum[5m]) / rate(go_gc_duration_seconds_count[5m])\n\n# GC frequency (cycles per second)\nrate(go_gc_duration_seconds_count[5m])\n```\n\n### Goroutine Leaks\n\n```promql\n# Current goroutine count\ngo_goroutines\n\n# Goroutine growth (leak indicator)\ndelta(go_goroutines[1h])\n```\n\n### CPU Usage\n\n```promql\n# Total CPU time consumed\nrate(process_cpu_seconds_total[5m])\n\n# CPU utilization ratio (0-1)\nrate(process_cpu_seconds_total[5m]) / \u003cGOMAXPROCS>\n```\n\n### File Descriptor Leaks\n\n```promql\n# FD growth\ndelta(process_open_fds[1h])\n\n# FD saturation ratio\nprocess_open_fds / process_max_fds\n```\n\n→ See `samber/cc-skills@promql-cli` skill for executing these queries directly against your Prometheus instance from the CLI.\n\n## References\n\n- [prometheus/client_golang collectors](https://github.com/prometheus/client_golang/tree/main/prometheus/collectors)\n- [Go runtime/metrics package](https://pkg.go.dev/runtime/metrics)\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":11623,"content_sha256":"97209299949e3090809ded3510febfb995e6563dc39e0e7a04bf973179d5392e"},{"filename":"references/tools.md","content":"# Diagnostic Tools Quick Reference\n\nUse these tools to validate the root cause of a slowdown BEFORE applying any optimization. Do NOT use auto-fix flags (e.g. 
`--fix`) — let the coding agent interpret results and apply changes manually with explanatory comments.\n\nFor detailed usage of each tool, see the dedicated reference files:\n\n- [pprof Reference](./pprof.md) — profiling (CPU, heap, goroutine, mutex, block)\n- [benchstat Reference](./benchstat.md) — statistical benchmark comparison\n- [Trace Reference](./trace.md) — execution tracer\n- [Compiler Analysis](./compiler-analysis.md) — escape analysis, inlining, SSA, assembly\n\n## GC and Runtime Diagnostics\n\nConfigure via environment variables — no recompile needed.\n\n| Command | Use for |\n| --- | --- |\n| `GODEBUG=gctrace=1 ./app` | GC frequency, pause times, heap sizes, CPU% — one line per GC cycle |\n| `GODEBUG=gcpacertrace=1 ./app` | Why GC triggers when it does — pacer decisions (trigger ratio, heap goal) |\n| `GODEBUG=schedtrace=1000 ./app` | Load balancing, goroutine distribution across Ps — prints every 1000ms |\n| `GODEBUG=schedtrace=1000,scheddetail=1 ./app` | Per-goroutine state detail on top of schedtrace |\n| Heap/alloc profiles (`go tool pprof -alloc_objects`) | Allocation sites and object churn; use instead of removed/stale allocation trace flags |\n\n→ See `samber/cc-skills-golang@golang-troubleshooting` skill for detailed GODEBUG usage and interpretation.\n\n### Programmatic APIs\n\n- **`runtime.ReadMemStats`** — heap size, NumGC, pause durations (PauseNs circular buffer), TotalAlloc (cumulative). Use for dashboards, alerting on heap growth.\n- **`debug.ReadGCStats`** — GC-specific statistics: pause percentiles, pause timeline, total pause duration. More focused than ReadMemStats.\n- **`runtime/metrics` (Go 1.16+)** — stable API, safe for concurrent reads, lower overhead than ReadMemStats. Keys: `/gc/cycles/total:gc-cycles`, `/gc/heap/allocs:bytes`, `/gc/pauses:seconds`, `/sched/latencies:seconds`, `/memory/classes/heap/released:bytes`.\n- **`debug.FreeOSMemory()`** — forces GC + returns memory to OS. 
One-off use after large temporary allocations (not for regular use — let the runtime manage this).\n- **`expvar`** — stdlib metrics at `/debug/vars` as JSON. `import _ \"expvar\"` auto-registers. Lightweight, no dependencies. Integrates with Netdata, Telegraf, or custom dashboards.\n\n## Static Analysis\n\n| Command | Use for |\n| --- | --- |\n| `fieldalignment ./...` | Detect suboptimal struct field ordering (padding waste). Do NOT use `-fix` flag — let the coding agent apply changes manually with explanatory comments. |\n| `unsafe.Sizeof` / `Alignof` / `Offsetof` | Inspect struct memory layout at compile time — compare before/after reordering to quantify savings. |\n| `go vet ./...` | Suspicious constructs: printf format mismatches, unreachable code, unused results, suspicious shifts. |\n| `staticcheck ./...` | Advanced linter: correctness and dead-code checks (SA9003: empty branch, SA4006: unused value, SA1019: deprecated API). |\n| `go test -race ./...` | Data race detection at runtime. Note that it does not detect false sharing, which is a cache-line effect rather than a data race. |\n\n## Third-Party Profiling\n\n| Tool | What it adds | When to use |\n| --- | --- | --- |\n| **fgprof** (`github.com/felixge/fgprof`) | Full goroutine profiler — captures both on-CPU and off-CPU (I/O wait) time in a single profile. Standard pprof CPU profiles only show on-CPU time. | pprof CPU profile shows low CPU% but latency is high. |\n| **Pyroscope / Parca** | Continuous profiling platforms — aggregate pprof profiles over time, compare across deployments, detect regressions. | Production performance monitoring, historical trend analysis. → See `samber/cc-skills-golang@golang-observability` skill for setup. |\n| **Linux perf** (`perf record -g ./app && perf report`) | Hardware performance counters: cache misses, branch mispredictions, TLB misses. Requires `perf_data_converter` for pprof format. | CPU microarchitecture-level analysis when pprof isn't granular enough. 
|\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":4031,"content_sha256":"d236bd750d0b612ba46a79267ab1cd5c241eda97a35163b53882031fb537902c"},{"filename":"references/trace.md","content":"# Execution Trace Reference\n\n`go tool trace` shows what pprof cannot: **scheduling delays**, GC stop-the-world phases, goroutine state transitions, and why goroutines are **not** running. pprof samples what's on-CPU; trace records every state transition at nanosecond precision.\n\nUse the execution tracer when:\n\n- pprof shows low CPU% but latency is high (goroutines waiting, not working)\n- You suspect GC pauses are causing tail latency spikes\n- You need to understand goroutine scheduling and contention\n- You want to see the wall-clock timeline of concurrent operations\n\n## Generating Traces\n\n### From benchmarks\n\n```bash\ngo test -bench=BenchmarkParse -trace=trace.out ./pkg/parser\ngo tool trace trace.out\n```\n\n### From running service\n\nRequires `import _ \"net/http/pprof\"`:\n\n```bash\n# Capture 5 seconds of trace data (adjust duration as needed)\ncurl -o trace.out http://localhost:6060/debug/pprof/trace?seconds=5\ngo tool trace trace.out\n```\n\n**Warning:** traces generate data at MB/s. Keep captures short — 5-10 seconds is typical. 
Longer traces are unwieldy, slow to parse, and may consume significant memory when opened.\n\n### From tests\n\n```bash\ngo test -trace=trace.out ./pkg/parser\ngo tool trace trace.out\n```\n\n### From code (programmatic)\n\n```go\nimport \"runtime/trace\"\n\nf, _ := os.Create(\"trace.out\")\ntrace.Start(f)\ndefer trace.Stop()\n```\n\nOr capture a region of interest:\n\n```go\nimport \"runtime/trace\"\n\n// Start tracing only when needed\nf, _ := os.Create(\"trace.out\")\ntrace.Start(f)\n\ndoExpensiveWork()\n\ntrace.Stop()\nf.Close()\n```\n\n## Full Command Reference\n\n### Opening traces\n\n```bash\n# Open trace in web browser (default — starts HTTP server, opens browser)\ngo tool trace trace.out\n\n# Open on a specific port\ngo tool trace -http=:8080 trace.out\n\n# Open on a specific host:port (e.g., for remote access)\ngo tool trace -http=0.0.0.0:8080 trace.out\n```\n\n### Extracting pprof profiles from traces\n\n`go tool trace` can convert trace data into pprof-compatible profiles. This bridges the two tools — you capture with the tracer (nanosecond events) and analyze with pprof (statistical aggregation with `top`, `list`, `peek`):\n\n```bash\n# Network blocking profile — where goroutines wait on network I/O\ngo tool trace -pprof=net trace.out > net.prof\ngo tool pprof -top net.prof\n\n# Synchronization blocking profile — mutexes, channels, wait groups\ngo tool trace -pprof=sync trace.out > sync.prof\ngo tool pprof -top sync.prof\n\n# Syscall blocking profile — system calls that block goroutines\ngo tool trace -pprof=syscall trace.out > syscall.prof\ngo tool pprof -top syscall.prof\n\n# Scheduler latency profile — time between becoming runnable and actually running\ngo tool trace -pprof=sched trace.out > sched.prof\ngo tool pprof -top sched.prof\n```\n\nYou can chain with any pprof command — e.g., annotated source for a blocking function:\n\n```bash\ngo tool trace -pprof=sync trace.out > sync.prof\ngo tool pprof -list=handleRequest sync.prof\ngo tool pprof -svg 
sync.prof > sync-blocking.svg\n```\n\n### Full capture-to-analysis workflows\n\n```bash\n# Workflow 1: benchmark trace — capture, view, extract blocking profile\ngo test -bench=BenchmarkParse -trace=trace.out ./pkg/parser\ngo tool trace trace.out # visual timeline\ngo tool trace -pprof=sync trace.out > sync.prof # extract sync blocking\ngo tool pprof -top -cum sync.prof # find worst sync blockers\ngo tool pprof -list=processOrder sync.prof # annotated source\n\n# Workflow 2: production trace — capture from running service, analyze scheduling\ncurl -o trace.out http://localhost:6060/debug/pprof/trace?seconds=5\ngo tool trace trace.out # visual timeline\ngo tool trace -pprof=sched trace.out > sched.prof # extract scheduling latency\ngo tool pprof -top sched.prof # goroutines with worst scheduling delay\ngo tool pprof -svg sched.prof > sched.svg # graph of scheduling bottlenecks\n\n# Workflow 3: test trace — capture during test run\ngo test -trace=trace.out -run=TestSlowIntegration ./pkg/api\ngo tool trace trace.out # visual timeline\ngo tool trace -pprof=net trace.out > net.prof # extract network blocking\ngo tool pprof -top net.prof # find network wait sites\n```\n\n### `go tool trace` flags summary\n\n| Flag | Example | Purpose |\n| --- | --- | --- |\n| (none) | `go tool trace trace.out` | Open trace in web browser (default) |\n| `-http=:PORT` | `go tool trace -http=:9090 trace.out` | Set HTTP server address for the web UI |\n| `-pprof=TYPE` | `go tool trace -pprof=net trace.out > net.prof` | Extract pprof profile from trace. 
Types: `net`, `sync`, `syscall`, `sched` |\n\n### HTTP endpoints served by the web UI\n\nWhen `go tool trace trace.out` starts its HTTP server, it exposes these pages:\n\n| Endpoint | What it shows |\n| --- | --- |\n| `/` | Index page with links to all views |\n| `/trace` | Interactive timeline viewer (Chrome trace viewer) — the main visualization |\n| `/goroutines` | Goroutine analysis — summary table of all goroutine types, counts, and execution stats |\n| `/goroutine/\u003cid>` | Detailed view of a specific goroutine — its full lifecycle timeline |\n\nFrom `/goroutines`, click on a goroutine type to see all instances and their execution statistics (total time, scheduled time, blocked time). Click an individual goroutine to see its timeline.\n\n## Web UI\n\n### Main views\n\nThe web UI (opened by `go tool trace trace.out`) shows a timeline where each horizontal lane represents a processor (P), goroutine, or system event:\n\n- **Trace viewer** (`/trace`) — interactive timeline with:\n - **P lanes** — one per logical processor (GOMAXPROCS), showing which goroutine runs on each P at each moment\n - **Goroutine lanes** — each goroutine's lifecycle: created → runnable → running → waiting → running → …\n - **GC events** — mark phases, sweep, STW pauses shown as colored bands across all P lanes\n - **System events** — syscalls, network I/O, timer events\n - **User annotations** — tasks, regions, and log messages from `runtime/trace` API\n\n- **Goroutine analysis** (`/goroutines`) — summary table:\n - Groups goroutines by creation stack trace (type)\n - Shows count, total execution time, total scheduling wait, total blocking time\n - Click a type to see individual goroutine statistics\n - Click an individual goroutine to see its timeline\n\n### Navigating the trace viewer\n\nThe trace viewer uses the Chrome tracing UI (also used by Chrome DevTools):\n\n| Key/Action | Effect |\n| --- | --- |\n| `W` / scroll up | Zoom in (time axis) |\n| `S` / scroll down | Zoom out (time 
axis) |\n| `A` | Pan left |\n| `D` | Pan right |\n| Click on event | Show details panel at bottom — goroutine ID, duration, stack trace |\n| `Shift+click` | Select a time range — highlights all events in that window |\n| `M` | Mark current selection |\n| `/` | Search for events by name |\n| `?` | Show keyboard shortcuts |\n\n### Reading the timeline\n\n**Color coding:**\n\n- **Green bars** on P lanes = goroutine actively executing\n- **Blue bars** = syscall (goroutine pinned to OS thread)\n- **Orange/yellow marks** = scheduling events (goroutine becoming runnable)\n- **Red bands** across all P lanes = GC stop-the-world pause\n- **Light blue bands** = GC concurrent mark phase\n- **Purple** = user-defined regions (from `trace.WithRegion`)\n\n**Gaps in P lanes** = the processor was idle (no runnable goroutines, or goroutines blocked). Many idle gaps with pending runnable goroutines suggests scheduling contention.\n\n## What to Look For\n\n### Goroutine states\n\nThe trace timeline color-codes goroutine states:\n\n| Color | State | Meaning | What it indicates |\n| --- | --- | --- | --- |\n| **Green** | Running | Actively executing on a P | Normal — doing useful work |\n| **Yellow/Orange** | Runnable | Ready to run but waiting for a P | CPU-saturated — too many runnable goroutines competing for too few processors |\n| **Red/Pink** | Waiting | Blocked on I/O, channel, mutex, sleep, select | I/O-bound or contention — investigate what it's waiting on |\n| **Blue** | GC assist | Drafted by GC to help mark/sweep | GC pressure — too many allocations forcing goroutines to help the collector |\n\n### GC phases\n\nGC events appear as colored bands across all P lanes:\n\n- **Mark assist** — goroutines drafted to help GC scan the heap. Visible as gaps in application goroutine execution. 
The runtime forces goroutines to assist with GC work in proportion to their allocation rate — heavy allocators get taxed more.\n- **STW (stop-the-world)** — brief phases where all goroutines are stopped (mark setup, mark termination). These cause latency spikes visible as vertical bands across all lanes.\n- **Sweep** — concurrent sweep of unreachable objects. Usually low overhead but can accumulate if the heap is large.\n\n**Diagnosing GC issues from traces:**\n\n- Frequent GC cycles with long mark assist = too many allocations (reduce allocation rate)\n- Long concurrent mark phases = many pointers for the GC to trace (reduce pointer density)\n- Long STW phases = usually goroutines that are slow to reach a safepoint, since the STW phases themselves do little scanning\n- GC cycles clustering after specific operations = those operations allocate heavily\n\n### Scheduling latency\n\nTime between a goroutine becoming **runnable** and actually **running**. High scheduling latency usually points to one of:\n\n- Too many goroutines competing for GOMAXPROCS processors\n- OS scheduling interference (noisy neighbors, CPU throttling)\n- Goroutines pinned to busy threads by cgo or long syscalls\n\n**What to look for:**\n\n- Yellow (runnable) gaps before green (running) segments — the longer the yellow gap, the higher the scheduling latency\n- Many goroutines in runnable state simultaneously — indicates CPU saturation\n- Uneven distribution across Ps — one P overloaded while others are idle suggests work imbalance\n\n### Network/sync blocking\n\n- **Long red/pink periods** on a goroutine = it's blocked waiting. Click the block event to see what it's waiting on (channel receive, mutex lock, network read, etc.)\n- **Many goroutines blocked on the same channel or mutex** = serialization bottleneck. All work funnels through one point.\n- **Goroutines blocked on network I/O** = external dependency latency. The Go code can't do anything faster — the bottleneck is upstream. Use `-pprof=net` to generate a pprof profile of network wait locations.\n\n### Goroutine creation and destruction\n\nThe trace shows goroutine lifecycle events. 
Look for:\n\n- **Goroutines created in a loop without bound** = potential goroutine leak\n- **Goroutines that are created but never finish** = leak — they accumulate over time\n- **Very short-lived goroutines created repeatedly** = high overhead from goroutine creation/scheduling (consider batching or worker pools)\n\n## Custom Annotations\n\nAdd application-level context to traces so you can correlate runtime events with business operations.\n\n### Tasks\n\nA task represents a logical operation that may span multiple goroutines:\n\n```go\nimport \"runtime/trace\"\n\nfunc processOrder(ctx context.Context, order Order) error {\n ctx, task := trace.NewTask(ctx, \"processOrder\")\n defer task.End()\n\n // All trace events in this context are grouped under the task\n validate(ctx, order)\n charge(ctx, order)\n fulfill(ctx, order)\n return nil\n}\n```\n\nTasks appear as named groups in the trace timeline. You can filter the trace view to show only events belonging to a specific task.\n\n### Regions\n\nA region represents a phase within a task or goroutine:\n\n```go\nfunc validate(ctx context.Context, order Order) {\n trace.WithRegion(ctx, \"validateAddress\", func() {\n // this block is annotated as a region\n validateAddress(order.Address)\n })\n\n trace.WithRegion(ctx, \"validatePayment\", func() {\n validatePayment(order.Payment)\n })\n}\n```\n\nRegions appear as labeled spans on the goroutine's timeline, making it easy to see which phase of processing takes the most wall-clock time.\n\n### Log messages\n\nAdd point-in-time log messages to the trace:\n\n```go\ntrace.Log(ctx, \"orderID\", order.ID)\ntrace.Log(ctx, \"status\", \"payment_verified\")\n```\n\nLogs appear as markers on the timeline — useful for correlating trace events with specific data.\n\n### When to use annotations\n\n- **Always** in server request handlers — wrap each request in a task\n- **Performance-critical paths** — add regions to phases you want to measure wall-clock time for\n- **Debugging 
intermittent latency** — add logs at key decision points to see what happened in the slow trace\n\nAnnotations add negligible overhead when tracing is disabled (they check a flag and return immediately).\n\n## Flight Recorder (Go 1.25+)\n\nThe flight recorder solves a fundamental problem with execution traces in long-running services: when a problem occurs (timeout, failed health check), it's already too late to call `trace.Start()`. The flight recorder keeps a circular buffer of recent trace data in memory, and you snapshot it to disk when something goes wrong — like an airplane's black box.\n\n### Setup\n\n```go\nimport \"runtime/trace\"\n\nfr := trace.NewFlightRecorder(trace.FlightRecorderConfig{\n MinAge: 10 * time.Second, // keep at least 10s of data\n MaxBytes: 5 \u003c\u003c 20, // cap at 5 MiB to limit memory usage\n})\nif err := fr.Start(); err != nil {\n return err\n}\n```\n\n**Sizing guidance:**\n\n- **MinAge** — set to ~2x your problem window. For 5-second timeout debugging, use 10 seconds. The runtime may retain more data than MinAge if MaxBytes allows.\n- **MaxBytes** — busy services generate ~1-10 MB/s of trace data. Start with 1-5 MiB and adjust. MaxBytes takes precedence over MinAge — when the buffer fills, older data is discarded regardless of age.\n\n### Snapshot on error\n\nCapture the trace buffer when something unexpected happens. 
Use `sync.Once` to prevent multiple snapshots overwriting each other:\n\n```go\nvar snapshotOnce sync.Once\n\nfunc captureSnapshot(fr *trace.FlightRecorder) {\n snapshotOnce.Do(func() {\n f, err := os.Create(\"snapshot.trace\")\n if err != nil {\n log.Printf(\"snapshot file: %v\", err)\n return\n }\n defer f.Close()\n\n if _, err := fr.WriteTo(f); err != nil {\n log.Printf(\"snapshot write: %v\", err)\n return\n }\n fr.Stop()\n log.Printf(\"captured snapshot to %s\", f.Name())\n })\n}\n```\n\n### Trigger patterns\n\n```go\n// Pattern 1: slow request detection\nhttp.HandleFunc(\"/api/order\", func(w http.ResponseWriter, r *http.Request) {\n start := time.Now()\n // ... handler logic ...\n\n if fr.Enabled() && time.Since(start) > 100*time.Millisecond {\n go captureSnapshot(fr)\n }\n})\n\n// Pattern 2: health check failure\nif !healthCheck() && fr.Enabled() {\n go captureSnapshot(fr)\n}\n\n// Pattern 3: HTTP endpoint for on-demand capture\nhttp.HandleFunc(\"/debug/flightrecorder\", func(w http.ResponseWriter, r *http.Request) {\n if !fr.Enabled() {\n http.Error(w, \"flight recorder not active\", http.StatusServiceUnavailable)\n return\n }\n w.Header().Set(\"Content-Type\", \"application/octet-stream\")\n w.Header().Set(\"Content-Disposition\", \"attachment; filename=trace.out\")\n fr.WriteTo(w)\n})\n```\n\n### Analyzing a snapshot\n\n```bash\ngo tool trace snapshot.trace\n```\n\nThe snapshot contains the same data as a regular trace — use all the same analysis techniques (timeline viewer, goroutine analysis, pprof extraction). 
The flight recorder's flow events are particularly useful for diagnosing lock contention and goroutine stalls that caused the anomaly.\n\n### Constraints\n\n- **At most one flight recorder** may be active at a time (this restriction may be relaxed in future Go versions)\n- A flight recorder **can run concurrently** with `trace.Start` — both can be active simultaneously\n- Only one goroutine may call `WriteTo` at a time — the `sync.Once` pattern handles this naturally\n- `Stop()` blocks until any concurrent `WriteTo` completes\n\n### When to use flight recorder vs regular tracing\n\n| Scenario | Tool | Why |\n| --- | --- | --- |\n| Investigating a known slow operation | `go test -trace` or `trace.Start`/`Stop` | You know when to start and stop |\n| Intermittent latency spikes in production | Flight recorder | You don't know when the spike will happen — the buffer captures it retroactively |\n| Post-mortem after a timeout or crash | Flight recorder | The problem already happened; regular tracing would miss it |\n| Continuous performance monitoring | `samber/cc-skills-golang@golang-observability` (Pyroscope) | Flight recorder is for one-shot diagnosis, not continuous collection |\n\n## Overhead and Practical Limits\n\n| Concern | Guidance |\n| --- | --- |\n| **Runtime overhead** | ~1-2% CPU during capture; negligible when not capturing |\n| **Data volume** | Traces generate MB/s of data. A 10-second trace of a busy service can be 50-100MB |\n| **Capture duration** | 5-10 seconds is typical. Longer traces are slow to open and hard to navigate |\n| **Memory to view** | `go tool trace` loads the entire trace into memory. Large traces may need 1GB+ RAM |\n| **Browser performance** | The web UI can struggle with traces >100MB. Use short captures. |\n| **Production use** | Safe for short captures on a single instance. Do not capture continuously. |\n\n## Trace vs pprof: When to Use Which\n\n| Question | Tool | Why |\n| --- | --- | --- |\n| Where does CPU time go? 
| pprof CPU profile | Statistical sampling, low overhead, good for aggregate view |\n| Why is latency high but CPU low? | go tool trace | Shows goroutine waiting states — I/O, channels, mutexes |\n| Where do allocations happen? | pprof heap profile | Per-function allocation counts and sizes |\n| Why are GC pauses long? | go tool trace | Shows STW phases, mark assist, GC timeline |\n| Is there lock contention? | pprof mutex/block + trace | pprof quantifies it; trace shows the timeline |\n| Are goroutines leaking? | pprof goroutine + trace | pprof shows the stack; trace shows creation/lifecycle |\n| Which goroutines compete for CPU? | go tool trace | Shows runnable vs running states across all Ps |\n| What's the wall-clock breakdown of a request? | go tool trace (with annotations) | Timeline view with tasks and regions |\n\nWhen in doubt, start with pprof (lower overhead, simpler output). Use trace when pprof doesn't explain the latency or when you need the wall-clock timeline view.\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":18472,"content_sha256":"382159a0376d50717f28f23f30ad3f24062a82011d193e7767e128ee7339653c"}],"content_json":{"type":"doc","content":[{"type":"paragraph","content":[{"text":"Persona:","type":"text","marks":[{"type":"strong"}]},{"text":" You are a Go performance measurement engineer. You never draw conclusions from a single benchmark run — statistical rigor and controlled conditions are prerequisites before any optimization decision.","type":"text"}]},{"type":"paragraph","content":[{"text":"Thinking mode:","type":"text","marks":[{"type":"strong"}]},{"text":" Use ","type":"text"},{"text":"ultrathink","type":"text","marks":[{"type":"code_inline"}]},{"text":" for benchmark analysis, profile interpretation, and performance comparison tasks. 
Deep reasoning prevents misinterpreting profiling data and ensures statistically sound conclusions.","type":"text"}]},{"type":"heading","attrs":{"level":1},"content":[{"text":"Go Benchmarking & Performance Measurement","type":"text"}]},{"type":"paragraph","content":[{"text":"Performance improvement does not exist without measures — if you can measure it, you can improve it.","type":"text"}]},{"type":"paragraph","content":[{"text":"This skill covers the full measurement workflow: write a benchmark, run it, profile the result, compare before/after with statistical rigor, and track regressions in CI. For optimization patterns to apply after measurement, → See ","type":"text"},{"text":"samber/cc-skills-golang@golang-performance","type":"text","marks":[{"type":"code_inline"}]},{"text":" skill. For pprof setup on running services, → See ","type":"text"},{"text":"samber/cc-skills-golang@golang-troubleshooting","type":"text","marks":[{"type":"code_inline"}]},{"text":" skill.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Writing Benchmarks","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"b.Loop()","type":"text","marks":[{"type":"code_inline"}]},{"text":" (Go 1.24+) — preferred","type":"text"}]},{"type":"paragraph","content":[{"text":"For Go 1.24+, prefer ","type":"text"},{"text":"b.Loop()","type":"text","marks":[{"type":"code_inline"}]},{"text":" for new benchmarks. 
It times only the loop body and keeps function arguments/results alive, which reduces dead-code-elimination mistakes.","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"go"},"content":[{"text":"func BenchmarkParse(b *testing.B) {\n data := loadFixture(\"large.json\") // setup — excluded from timing\n for b.Loop() {\n Parse(data) // compiler cannot eliminate this call\n }\n}","type":"text"}]},{"type":"paragraph","content":[{"text":"Legacy ","type":"text"},{"text":"b.N","type":"text","marks":[{"type":"code_inline"}]},{"text":" loops still compile and are fine to keep when preserving existing benchmarks or supporting Go \u003c1.24. They are easier to get wrong: setup may need ","type":"text"},{"text":"b.ResetTimer()","type":"text","marks":[{"type":"code_inline"}]},{"text":", and results may need a sink if the compiler can eliminate the work. Go 1.26 fixed an earlier ","type":"text"},{"text":"b.Loop()","type":"text","marks":[{"type":"code_inline"}]},{"text":" inlining limitation — benchmarks on 1.24–1.25 already benefit from ","type":"text"},{"text":"b.Loop()","type":"text","marks":[{"type":"code_inline"}]},{"text":" but may miss inlining optimizations that 1.26 delivers.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Memory tracking","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"go"},"content":[{"text":"func BenchmarkAlloc(b *testing.B) {\n b.ReportAllocs() // or run with -benchmem flag\n var sink []byte\n for b.Loop() {\n sink = make([]byte, 1024)\n }\n _ = sink\n}","type":"text"}]},{"type":"paragraph","content":[{"text":"b.ReportMetric()","type":"text","marks":[{"type":"code_inline"}]},{"text":" adds custom metrics (e.g., throughput):","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"go"},"content":[{"text":"b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), \"bytes/s\") // call after the 
b.Loop() loop ends; b.Elapsed() reports the measured benchmark time","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Sub-benchmarks and table-driven","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"go"},"content":[{"text":"func BenchmarkEncode(b *testing.B) {\n for _, size := range []int{64, 256, 4096} {\n b.Run(fmt.Sprintf(\"size=%d\", size), func(b *testing.B) {\n data := make([]byte, size)\n for b.Loop() {\n Encode(data)\n }\n })\n }\n}","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Running Benchmarks","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"go test -bench=BenchmarkEncode -benchmem -count=10 ./pkg/... | tee bench.txt","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Flag","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Purpose","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"-bench=.","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Run all benchmarks (regexp filter)","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"-benchmem","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Report allocations (B/op, 
allocs/op)","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"-count=10","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Run 10 times for statistical significance","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"-benchtime=3s","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Minimum time per benchmark (default 1s)","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"-cpu=1,2,4","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Run with different GOMAXPROCS values","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"-cpuprofile=cpu.prof","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Write CPU 
profile","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"-memprofile=mem.prof","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Write memory profile","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"-trace=trace.out","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Write execution trace","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"Output format:","type":"text","marks":[{"type":"strong"}]},{"text":" ","type":"text"},{"text":"BenchmarkEncode/size=64-8 5000000 230.5 ns/op 128 B/op 2 allocs/op","type":"text","marks":[{"type":"code_inline"}]},{"text":" — the ","type":"text"},{"text":"-8","type":"text","marks":[{"type":"code_inline"}]},{"text":" suffix is GOMAXPROCS, ","type":"text"},{"text":"ns/op","type":"text","marks":[{"type":"code_inline"}]},{"text":" is time per operation, ","type":"text"},{"text":"B/op","type":"text","marks":[{"type":"code_inline"}]},{"text":" is bytes allocated per op, ","type":"text"},{"text":"allocs/op","type":"text","marks":[{"type":"code_inline"}]},{"text":" is heap allocation count per op.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Documenting Results in Commits","type":"text"}]},{"type":"paragraph","content":[{"text":"Paste benchstat output in the commit body when the change has a measurable performance impact. 
This documents ","type":"text"},{"text":"why","type":"text","marks":[{"type":"em"}]},{"text":" an optimization was made, prevents future readers from reverting it, and lets reviewers verify the claim without re-running benchmarks.","type":"text"}]},{"type":"paragraph","content":[{"text":"Commit format:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"perf(parser): reduce Parse allocations 50% with sync.Pool\n\nReplace per-call []byte allocation with a pooled buffer.\n\ngoos: linux / goarch: amd64 / cpu: AMD Ryzen 9 5950X\n │ old │ new │\n │ sec/op │ sec/op vs base │\nParse-32 4.592µ ± 2% 3.041µ ± 1% -33.78% (p=0.000 n=10)\n\n │ old │ new │\n │ B/op │ B/op vs base │\nParse-32 1.024Ki ± 0% 0.512Ki ± 0% -50.00% (p=0.000 n=10)\n\n │ old │ new │\n │ allocs/op │ allocs/op vs base │\nParse-32 12.00 ± 0% 6.000 ± 0% -50.00% (p=0.000 n=10)","type":"text"}]},{"type":"paragraph","content":[{"text":"Rules:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Only include benchmarks directly affected by the change — strip unrelated rows","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Never paste results with ","type":"text"},{"text":"~","type":"text","marks":[{"type":"code_inline"}]},{"text":" (no statistical significance) — the improvement cannot be claimed","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Include the hardware context line (","type":"text"},{"text":"goos/goarch/cpu","type":"text","marks":[{"type":"code_inline"}]},{"text":") so results are reproducible","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use ","type":"text"},{"text":"perf(scope):","type":"text","marks":[{"type":"code_inline"}]},{"text":" commit type for performance-only 
changes","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Profiling from Benchmarks","type":"text"}]},{"type":"paragraph","content":[{"text":"Generate profiles directly from benchmark runs — no HTTP server needed:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# CPU profile\ngo test -bench=BenchmarkParse -cpuprofile=cpu.prof ./pkg/parser\ngo tool pprof cpu.prof\n\n# Memory profile (alloc_objects shows GC churn, inuse_space shows leaks)\ngo test -bench=BenchmarkParse -memprofile=mem.prof ./pkg/parser\ngo tool pprof -alloc_objects mem.prof\n\n# Execution trace\ngo test -bench=BenchmarkParse -trace=trace.out ./pkg/parser\ngo tool trace trace.out","type":"text"}]},{"type":"paragraph","content":[{"text":"For full pprof CLI reference (all commands, non-interactive mode, profile interpretation), see ","type":"text"},{"text":"pprof Reference","type":"text","marks":[{"type":"link","attrs":{"href":"./references/pprof.md","title":null}}]},{"text":". For execution trace interpretation, see ","type":"text"},{"text":"Trace Reference","type":"text","marks":[{"type":"link","attrs":{"href":"./references/trace.md","title":null}}]},{"text":". For statistical comparison, see ","type":"text"},{"text":"benchstat Reference","type":"text","marks":[{"type":"link","attrs":{"href":"./references/benchstat.md","title":null}}]},{"text":".","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Reference Files","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"pprof Reference","type":"text","marks":[{"type":"link","attrs":{"href":"./references/pprof.md","title":null}},{"type":"strong"}]},{"text":" — Interactive and non-interactive analysis of CPU, memory, and goroutine profiles. Full CLI commands, profile types (CPU vs alloc_objects vs inuse_space), web UI navigation, and interpretation patterns. 
Use this to dive deep into ","type":"text"},{"text":"where","type":"text","marks":[{"type":"em"}]},{"text":" time and memory are being spent in your code.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"benchstat Reference","type":"text","marks":[{"type":"link","attrs":{"href":"./references/benchstat.md","title":null}},{"type":"strong"}]},{"text":" — Statistical comparison of benchmark runs with rigorous confidence intervals and p-value tests. Covers output reading, filtering old benchmarks, interleaving results for visual clarity, and regression detection. Use this when you need to prove a change made a meaningful performance difference, not just a lucky run.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Trace Reference","type":"text","marks":[{"type":"link","attrs":{"href":"./references/trace.md","title":null}},{"type":"strong"}]},{"text":" — Execution tracer for understanding ","type":"text"},{"text":"when","type":"text","marks":[{"type":"em"}]},{"text":" and ","type":"text"},{"text":"why","type":"text","marks":[{"type":"em"}]},{"text":" code runs. Visualizes goroutine scheduling, garbage collection phases, network blocking, and custom span annotations. Use this when pprof (which shows ","type":"text"},{"text":"where","type":"text","marks":[{"type":"em"}]},{"text":" CPU goes) isn't enough — you need to see the timeline of what happened.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Diagnostic Tools","type":"text","marks":[{"type":"link","attrs":{"href":"./references/tools.md","title":null}},{"type":"strong"}]},{"text":" — Quick reference for ancillary tools: fieldalignment (struct padding waste), GODEBUG (runtime logging flags), fgprof (combined on-CPU and off-CPU wall-clock profiles), race detector (concurrency bugs), and others. 
Use this when you have a specific symptom and need a focused diagnostic — don't reach for pprof if a simpler tool already answers your question.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Compiler Analysis","type":"text","marks":[{"type":"link","attrs":{"href":"./references/compiler-analysis.md","title":null}},{"type":"strong"}]},{"text":" — Low-level compiler optimization insights: escape analysis (when values move to the heap), inlining decisions (which function calls are eliminated), SSA dump (intermediate representation), and assembly output. Use this when benchmarks show allocations you didn't expect, or when you want to verify the compiler did what you intended.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"CI Regression Detection","type":"text","marks":[{"type":"link","attrs":{"href":"./references/ci-regression.md","title":null}},{"type":"strong"}]},{"text":" — Automated performance regression gating in CI pipelines. Covers three tools (benchdiff for quick PR comparisons, cob for strict threshold-based gating, gobenchdata for long-term trend dashboards), noisy neighbor mitigation strategies (why cloud CI benchmarks vary 5-10% even on quiet machines), and self-hosted runner tuning to make benchmarks reproducible. 
Use this when you want to ensure pull requests don't silently slow down your codebase — detecting regressions early prevents shipping performance debt.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Investigation Session","type":"text","marks":[{"type":"link","attrs":{"href":"./references/investigation-session.md","title":null}},{"type":"strong"}]},{"text":" — Production performance troubleshooting workflow combining Prometheus runtime metrics (heap size, GC frequency, goroutine counts), PromQL queries to correlate metrics with code changes, runtime configuration flags (GODEBUG env vars to enable GC logging), and cost warnings (when you're hitting performance tax). Use this when production benchmarks look good but real traffic behaves differently.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Prometheus Go Metrics Reference","type":"text","marks":[{"type":"link","attrs":{"href":"./references/prometheus-go-metrics.md","title":null}},{"type":"strong"}]},{"text":" — Complete listing of Go runtime metrics actually exposed as Prometheus metrics by ","type":"text"},{"text":"prometheus/client_golang","type":"text","marks":[{"type":"code_inline"}]},{"text":". Covers 30 default metrics, 40+ optional metrics (Go 1.17+), process metrics, and common PromQL queries. Distinguishes between ","type":"text"},{"text":"runtime/metrics","type":"text","marks":[{"type":"code_inline"}]},{"text":" (Go internal data) and Prometheus metrics (what you scrape from ","type":"text"},{"text":"/metrics","type":"text","marks":[{"type":"code_inline"}]},{"text":"). 
Use this when setting up monitoring dashboards or writing PromQL queries for production alerts.","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Cross-References","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"→ See ","type":"text"},{"text":"samber/cc-skills-golang@golang-performance","type":"text","marks":[{"type":"code_inline"}]},{"text":" skill for optimization patterns to apply after measuring (\"if X bottleneck, apply Y\")","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"→ See ","type":"text"},{"text":"samber/cc-skills-golang@golang-troubleshooting","type":"text","marks":[{"type":"code_inline"}]},{"text":" skill for pprof setup on running services (enable, secure, capture), Delve debugger, GODEBUG flags, root cause methodology","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"→ See ","type":"text"},{"text":"samber/cc-skills-golang@golang-observability","type":"text","marks":[{"type":"code_inline"}]},{"text":" skill for everyday always-on monitoring, continuous profiling (Pyroscope), distributed tracing (OpenTelemetry)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"→ See ","type":"text"},{"text":"samber/cc-skills-golang@golang-testing","type":"text","marks":[{"type":"code_inline"}]},{"text":" skill for general testing practices","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"→ See ","type":"text"},{"text":"samber/cc-skills@promql-cli","type":"text","marks":[{"type":"code_inline"}]},{"text":" skill for querying Prometheus runtime metrics in production to validate benchmark 
findings","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"golang-benchmark","author":"@skillopedia","source":{"stars":1906,"repo_name":"cc-skills-golang","origin_url":"https://github.com/samber/cc-skills-golang/blob/HEAD/skills/golang-benchmark/SKILL.md","repo_owner":"samber","body_sha256":"345f013373be97cf105d4eff4f06667653c33d2eee767e06bd7ec3e5a01bf3c9","cluster_key":"7105c88596ff2eb5ac17c8da1a2ac45a147001ca426b65590ce7724216ba2fd1","clean_bundle":{"format":"clean-skill-bundle-v1","source":"samber/cc-skills-golang/skills/golang-benchmark/SKILL.md","attachments":[{"id":"83aa54cd-f506-5e2d-bf7d-c311224a1cf9","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/83aa54cd-f506-5e2d-bf7d-c311224a1cf9/attachment.json","path":"evals/evals.json","size":83154,"sha256":"3c0bfb6f93dec158f30cec0484eef76a6649c0c89766a00026bf6fde02868a92","contentType":"application/json; charset=utf-8"},{"id":"cfed896b-329e-511e-b81a-06056eeed229","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/cfed896b-329e-511e-b81a-06056eeed229/attachment.md","path":"references/benchstat.md","size":14540,"sha256":"0145a98c5a9be29078d3281e9ebcd670b6f402abbf4bf29facc995f2569342d9","contentType":"text/markdown; charset=utf-8"},{"id":"439b4613-57f7-5fd9-901b-8c25de0f366a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/439b4613-57f7-5fd9-901b-8c25de0f366a/attachment.md","path":"references/ci-regression.md","size":10659,"sha256":"3fec5ce7c0f3c4f17b3102702bae64534794343eef7bcb4577031b012e955a54","contentType":"text/markdown; charset=utf-8"},{"id":"65385afc-ee8e-5ef3-aa64-66e6701d7435","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/65385afc-ee8e-5ef3-aa64-66e6701d7435/attachment.md","path":"references/compiler-analysis.md","size":10972,"sha256":"2cb7bde63cb35971188b26346c8b4a0031b14d6b213c06f8caef38bf238938ad","contentType":"text/markdown; 
charset=utf-8"},{"id":"007455c6-683e-5b12-b31c-5a5daea4afb2","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/007455c6-683e-5b12-b31c-5a5daea4afb2/attachment.md","path":"references/investigation-session.md","size":7265,"sha256":"445b322f40beaab76394034fc9e630f35085cb510e58d43a0e166a4833e60e36","contentType":"text/markdown; charset=utf-8"},{"id":"7bc3a19f-9234-5e13-ba2a-437cebb3412c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/7bc3a19f-9234-5e13-ba2a-437cebb3412c/attachment.md","path":"references/pprof.md","size":33951,"sha256":"191070b096d75226aa261d7f163a12500b667603ccfe49ba579b5fafcba352cb","contentType":"text/markdown; charset=utf-8"},{"id":"410c2ffc-e6d0-55db-a837-11d868a9ee94","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/410c2ffc-e6d0-55db-a837-11d868a9ee94/attachment.md","path":"references/prometheus-go-metrics.md","size":11623,"sha256":"97209299949e3090809ded3510febfb995e6563dc39e0e7a04bf973179d5392e","contentType":"text/markdown; charset=utf-8"},{"id":"5418b2c6-1fd7-5435-b4b9-bf90cf89bdb2","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5418b2c6-1fd7-5435-b4b9-bf90cf89bdb2/attachment.md","path":"references/tools.md","size":4031,"sha256":"d236bd750d0b612ba46a79267ab1cd5c241eda97a35163b53882031fb537902c","contentType":"text/markdown; charset=utf-8"},{"id":"418aa9ff-86dc-5653-aa98-344b321b024f","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/418aa9ff-86dc-5653-aa98-344b321b024f/attachment.md","path":"references/trace.md","size":18472,"sha256":"382159a0376d50717f28f23f30ad3f24062a82011d193e7767e128ee7339653c","contentType":"text/markdown; 
charset=utf-8"}],"bundle_sha256":"afb0d67ce448838dbeff1ba7e14bf75e19c4234f6f0567d75006596f1a17eb4c","attachment_count":9,"text_attachments":9,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":1,"skill_md_path":"skills/golang-benchmark/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"testing-qa","category_label":"Testing"},"exact_dupes_collapsed_into_this":0},"license":"MIT","version":"v1","category":"testing-qa","metadata":{"author":"samber","version":"1.2.3","openclaw":{"emoji":"📊","install":[{"bins":["benchstat"],"kind":"go","package":"golang.org/x/perf/cmd/benchstat@latest"}],"homepage":"https://github.com/samber/cc-skills-golang","requires":{"bins":["go","benchstat"]}}},"import_tag":"clean-skills-v1","description":"Golang benchmarking, profiling, and performance measurement. Use when writing, running, or comparing Go benchmarks, profiling hot paths with pprof, interpreting CPU/memory/trace profiles, analyzing results with benchstat, setting up CI benchmark regression detection, or investigating production performance with Prometheus runtime metrics. Also use when the developer needs deep analysis on a specific performance indicator - this skill provides the measurement methodology, while `samber/cc-skills-golang@golang-performance` provides the optimization patterns.","allowed-tools":"Read Edit Write Glob Grep Bash(go:*) Bash(golangci-lint:*) Bash(git:*) Agent WebFetch Bash(benchstat:*) Bash(benchdiff:*) Bash(cob:*) Bash(gobenchdata:*) Bash(curl:*) mcp__context7__resolve-library-id mcp__context7__query-docs WebSearch AskUserQuestion","compatibility":"Designed for Claude Code or similar AI coding agents, and for projects using Golang.","user-invocable":true}},"renderedAt":1782981306909}
