Contributor

@hodgesds hodgesds commented Nov 11, 2025

Add comprehensive perfetto trace analysis infrastructure to scxtop, enabling deep scheduling analysis, outlier detection, and support for systing-generated traces with CompactSched format.

Perfetto Analysis Framework:

  • 17 specialized analyzers across scheduling, I/O, IRQ, and power categories
  • Outlier detection with 4 statistical methods (IQR, MAD, StdDev, Percentile)
  • SQL-like query engine for flexible trace filtering and aggregation
  • Analyzer registry with automatic capability detection
  • Support for both FtraceEvent and TrackEvent formats
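
As an illustration of the IQR method named in the list above, here is a minimal sketch in Rust. The function names `percentile` and `iqr_outliers`, the linear-interpolation quartile rule, and the `k` multiplier are assumptions made for this example, not the analyzer's actual API or thresholds.

```rust
/// Linearly interpolated percentile of an ascending-sorted, non-empty slice.
/// `p` is a fraction in [0, 1]. (Interpolation rule is an assumption.)
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    sorted[lo] + (sorted[hi] - sorted[lo]) * frac
}

/// Returns the samples that fall outside [Q1 - k*IQR, Q3 + k*IQR],
/// in their original order. Too-small inputs yield no outliers.
fn iqr_outliers(samples: &[f64], k: f64) -> Vec<f64> {
    if samples.len() < 4 {
        return Vec::new();
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    let q1 = percentile(&sorted, 0.25);
    let q3 = percentile(&sorted, 0.75);
    let iqr = q3 - q1;
    let (low, high) = (q1 - k * iqr, q3 + k * iqr);

    samples
        .iter()
        .copied()
        .filter(|v| *v < low || *v > high)
        .collect()
}
```

With the conventional `k = 1.5`, a set of wakeup latencies clustered around 10 to 13 µs plus a single 900 µs sample would flag only the 900 µs sample.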

Systing Trace Compatibility:

  • Parse CompactSched format with delta-encoded timestamps
  • Expand interned string tables for command names
  • Convert compact events to individual FtraceEvents
  • Verified with 6 core analyzers on systing traces
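
The expansion steps above can be sketched roughly as follows. The `CompactSwitches` and `SwitchEvent` types are hypothetical, simplified stand-ins rather than the real protobuf messages or scxtop types, and the sketch assumes that the first delta is relative to zero (that is, it carries the absolute timestamp).

```rust
/// Hypothetical, simplified view of a compact batch of sched_switch events:
/// parallel columns plus a shared table of interned command names.
struct CompactSwitches {
    intern_table: Vec<String>,
    timestamp_deltas: Vec<u64>,
    next_pids: Vec<i32>,
    next_comm_index: Vec<u32>,
}

/// One expanded event, comparable to an individual ftrace sched_switch record.
#[derive(Debug, PartialEq)]
struct SwitchEvent {
    timestamp_ns: u64,
    next_pid: i32,
    next_comm: String,
}

/// Expands the columnar batch into individual events.
/// Returns None if the columns disagree in length, a timestamp overflows,
/// or a comm index points outside the intern table.
fn expand(batch: &CompactSwitches) -> Option<Vec<SwitchEvent>> {
    let n = batch.timestamp_deltas.len();
    if batch.next_pids.len() != n || batch.next_comm_index.len() != n {
        return None;
    }

    let mut ts: u64 = 0; // running sum turns deltas into absolute timestamps
    let mut events = Vec::with_capacity(n);
    for i in 0..n {
        ts = ts.checked_add(batch.timestamp_deltas[i])?;
        let comm = batch.intern_table.get(batch.next_comm_index[i] as usize)?;
        events.push(SwitchEvent {
            timestamp_ns: ts,
            next_pid: batch.next_pids[i],
            next_comm: comm.clone(),
        });
    }
    Some(events)
}
```

Returning `None` on malformed input keeps the sketch short; a real parser would more likely report which column or index was invalid.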

Memory-Aware Query Limits:

  • Automatic limit adjustment based on available system RAM
  • Conservative defaults for low-memory systems (< 4GB: 1K events)
  • Unlimited queries on high-memory systems (>= 64GB)
  • Smart defaults for event queries, timelines, and analysis results
  • User-configurable overrides via API parameters
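
A rough sketch of how such a tiered default could be expressed is shown below. Only the two endpoints come from the description above (under 4 GB gives 1K events, 64 GB or more is unlimited); the middle-tier value, the function names, and the use of `None` to mean "unlimited" are assumptions for illustration.

```rust
/// Default event limit for a given amount of available RAM.
/// `None` means unlimited.
fn default_event_limit(available_ram_bytes: u64) -> Option<usize> {
    const GIB: u64 = 1 << 30;
    if available_ram_bytes >= 64 * GIB {
        None // high-memory system: no limit
    } else if available_ram_bytes < 4 * GIB {
        Some(1_000) // low-memory system: conservative default
    } else {
        Some(10_000) // assumed placeholder for the tiers in between
    }
}

/// A caller-supplied limit takes precedence over the memory-based default.
fn effective_limit(user_override: Option<usize>, available_ram_bytes: u64) -> Option<usize> {
    user_override.or_else(|| default_event_limit(available_ram_bytes))
}
```

In this shape, an explicit `limit` parameter on a query would always win, and the memory-based default applies only when the caller passes nothing.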

Documentation:

  • PERFETTO_ANALYZER_GUIDE.md - Comprehensive analyzer documentation
  • SYSTING_COMPATIBILITY.md - Systing trace format guide
  • WPROF_COMPATIBILITY_GUIDE.md - Wprof integration details
  • PERFETTO_ANALYZER_README.md - Quick start guide

Testing:

  • Integration tests for all analyzer categories
  • Compatibility tests for systing and wprof traces
  • Outlier detection validation
  • Query engine functionality tests

@hodgesds hodgesds force-pushed the scxtop-mcp-perfetto-extended branch 2 times, most recently from 3739641 to b5abe13 Compare November 11, 2025 18:37
@hodgesds
Contributor Author

Example Claude session:

> Do a comprehensive scheduler analysis of trace.pb using the scxtop mcp server

● I'll help you perform a comprehensive scheduler analysis of trace.pb using the scxtop MCP server. Let me plan this analysis and get started.

● scxtop - load_perfetto_trace (MCP)(file_path: "/root/trace.pb", trace_id: "trace_analysis")
  ⎿  Loaded perfetto trace: /root/trace.pb
     Trace ID: trace_analysis
     Time range: 1389784157950820 - 1389785163059658 ns (1005 ms)
     … +4 lines (ctrl+o to expand)

● scxtop - get_trace_summary (MCP)(trace_id: "trace_analysis")
  ⎿  Trace Summary:\n\n{
       "trace_duration_ms": 1005,
       "num_cpus": 32,
     … +43 lines (ctrl+o to expand)

● scxtop - run_all_analyzers (MCP)(trace_id: "trace_analysis")
  ⎿  Error: MCP tool "run_all_analyzers" response (46357 tokens) exceeds maximum allowed tokens (25000). Please use pagination, filtering, or limit parameters to reduce the response size.

● scxtop - find_scheduling_bottlenecks (MCP)(trace_id: "trace_analysis", limit: 10)
  ⎿  Scheduling Bottlenecks (analyzed in 1.681641ms)

     Found 0 bottleneck(s):
     … +2 lines (ctrl+o to expand)

● scxtop - analyze_cpu_utilization (MCP)(trace_id: "trace_analysis", use_parallel: true)
  ⎿  Analysis completed in 724.269µs

     {
     … +422 lines (ctrl+o to expand)

● scxtop - analyze_process_runtime (MCP)(trace_id: "trace_analysis", limit: 20, use_parallel: true)
  ⎿  Analysis completed in 649.304µs

     {
     … +6 lines (ctrl+o to expand)

● scxtop - analyze_wakeup_latency (MCP)(trace_id: "trace_analysis")
  ⎿  Analysis completed in 1.28501ms

     {
     … +13 lines (ctrl+o to expand)

● scxtop - analyze_task_states (MCP)(trace_id: "trace_analysis", limit: 20)
  ⎿  Analysis completed in 1.329076ms

     {
     … +386 lines (ctrl+o to expand)

● scxtop - analyze_preemptions (MCP)(trace_id: "trace_analysis", limit: 20)
  ⎿  Analysis completed in 411.04µs

     {
     … +5 lines (ctrl+o to expand)

● scxtop - analyze_wakeup_chains (MCP)(trace_id: "trace_analysis", limit: 15)
  ⎿  Analysis completed in 1.118255ms

     {
     … +4 lines (ctrl+o to expand)

● scxtop - analyze_softirq (MCP)(trace_id: "trace_analysis", mode: "summary")
  ⎿  Softirq analyzer is not enabled.

     To enable it, use:
     … +4 lines (ctrl+o to expand)

● scxtop - analyze_latency_breakdown (MCP)(trace_id: "trace_analysis")
  ⎿  Analysis completed in 1.479128ms

     {
     … +21 lines (ctrl+o to expand)

● scxtop - query_trace_events (MCP)(trace_id: "trace_analysis", event_type: "softirq", limit: 20)
  ⎿  Trace Query Results:
     Trace ID: trace_analysis
     Event type filter: softirq
     … +106 lines (ctrl+o to expand)

● scxtop - query_trace_events (MCP)(trace_id: "trace_analysis", event_type: "sched_switch", limit: 10)
  ⎿  Trace Query Results:
     Trace ID: trace_analysis
     Event type filter: sched_switch
     … +56 lines (ctrl+o to expand)

● scxtop - correlate_wakeup_to_schedule (MCP)(trace_id: "trace_analysis", limit: 20)
  ⎿  Wakeup→Schedule Correlations (analyzed in 1.934333ms)

     Total correlations: 20
     … +173 lines (ctrl+o to expand)

● scxtop - detect_outliers (MCP)(trace_id: "trace_analysis", category: "all", method: "IQR")
  ⎿  Outlier Analysis (IQR method)\n\n{
       "analysis_time_ms": 8,
       "category": "all",
     … +396 lines (ctrl+o to expand)

> write the results to sched_analysis.md

● Write(sched_analysis.md)
  ⎿  Wrote 285 lines to sched_analysis.md
     # Comprehensive Scheduler Analysis Report
     **Trace File:** `/root/trace.pb`
     **Analysis Date:** 2025-11-11
     **Trace Duration:** 1005 ms (1.005 seconds)
     **Time Range:** 1389784157950820 - 1389785163059658 ns
     ---
     ## Executive Summary
     … +275 lines (ctrl+o to expand)

● Comprehensive scheduler analysis complete! I've written the results to /root/sched_analysis.md.

  Report Highlights

  The analysis covered a 1-second trace from a 32-CPU system with 76,983 events from 114 processes:

  Key Findings

  1. Extreme Wakeup Latencies - Some tasks experienced latencies of 870-973ms (nearly the entire trace duration), primarily affecting:
    - PID 18124 (events_1)
    - PID 18132 (ringbuf_stack_e)
    - PID 18069 (severity score: 1961.52)
  2. Very High CPU Utilization - Average 98.9% across all CPUs, indicating a fully saturated system
  3. High CPU Contention - Multiple processes spending 80-96% of time in runnable state (ready but waiting for CPU)
  4. CPU 16 Underutilized - Only 93.5% utilization with 419 context switches vs median of ~620
  5. No Critical Scheduler Bottlenecks - Despite high load, scheduler is performing consistently

  Report Sections

  - Executive Summary
  - CPU Utilization Analysis (per-CPU stats, timeslices)
  - Process Runtime & Task State Analysis (top 20 processes)
  - Wakeup Latency Analysis (correlation data)
  - Outlier Detection (50 outliers found using IQR method)
  - Softirq Event Analysis
  - Key Findings & Recommendations
  - Technical Metadata & Appendix

  The report includes detailed tables, recommendations for further investigation, and suggestions for enabling additional analyzers in future traces.

Example output

@hodgesds hodgesds force-pushed the scxtop-mcp-perfetto-extended branch 3 times, most recently from 3c11353 to 95099f2 Compare November 12, 2025 18:25
Add comprehensive perfetto trace analysis infrastructure to scxtop,
enabling deep scheduling analysis, outlier detection, and support for
systing-generated traces with CompactSched format.

Perfetto Analysis Framework:
- 17 specialized analyzers across scheduling, I/O, IRQ, and power categories
- Outlier detection with 4 statistical methods (IQR, MAD, StdDev, Percentile)
- SQL-like query engine for flexible trace filtering and aggregation
- Analyzer registry with automatic capability detection
- Support for both FtraceEvent and TrackEvent formats

Systing Trace Compatibility:
- Parse CompactSched format with delta-encoded timestamps
- Expand interned string tables for command names
- Convert compact events to individual FtraceEvents
- Verified with core analyzers on systing traces

Memory-Aware Query Limits:
- Automatic limit adjustment based on available system RAM
- Conservative defaults for low-memory systems (< 4GB: 1K events)
- Unlimited queries on high-memory systems (>= 64GB)
- Smart defaults for event queries, timelines, and analysis results
- User-configurable overrides via API parameters

Documentation:
- PERFETTO_ANALYZER_GUIDE.md - Comprehensive analyzer documentation
- SYSTING_COMPATIBILITY.md - Systing trace format guide
- WPROF_COMPATIBILITY_GUIDE.md - Wprof integration details
- PERFETTO_ANALYZER_README.md - Quick start guide

Testing:
- Integration tests for all analyzer categories
- Compatibility tests for systing and wprof traces
- Outlier detection validation
- Query engine functionality tests

Signed-off-by: Daniel Hodges <[email protected]>
@hodgesds hodgesds force-pushed the scxtop-mcp-perfetto-extended branch from 95099f2 to 96d7a45 Compare November 12, 2025 18:41