Intel Processor Trace (PT) can be used on modern Intel CPUs to trace execution. This page contains references for learning about and using Intel PT.
- Intel Software Developer’s Manual Vol 3: low-level reference information on the Processor Trace format and registers (Chapter 36)
- Intel Processor Trace on Linux gives an overview of processor trace on Linux
- A tutorial web site for PT that contains many references
- Intel® Developer Forum 2015: Zoom-in on Your Code with Intel® Processor Trace and Supporting Tools (find SPCS012)
- Intel® Developer Forum 2014: Debug and Fine-grain Profiling with Intel® Processor Trace
- Efficient and large scale program flow tracing in Linux
- Adding processor trace to Linux describes the Linux perf Processor trace implementation.
- Reference documentation for PT on Linux perf
- simple-pt is an alternative reference PT implementation. It is implemented on Linux, but can also be used as a starting point to implement PT on other operating systems.
- The GNU debugger gdb supports PT on Linux for backward debugging
- Intel VTune Amplifier supports PT for performance analysis
- A Windows windbg processor trace plugin for debugging on Windows with PT.
- A reference Processor Trace decode library.
- A plugin for Linux crash to dump PT buffers (look for ptdump)
- The Lauterbach JTAG debugger supports PT
- Intel System Studio supports JTAG debugging with PT
- The SourcePoint for Intel debugger supports PT
- The honggfuzz fuzzer supports feedback fuzzing using PT
- Harnessing Intel Processor Trace on Windows for vulnerability discovery
Research papers using PT (subset):
- Failure Sketching: A technique for automated root cause analysis of in production failures
- Griffin: Guarding control flows using Intel Processor Trace
- Hardware-assisted instruction profiling and latency detection
- Inspector: Data Provenance using Intel Processor Trace
- Transparent and efficient CFI enforcement with Intel Processor Trace
Modern Intel Core CPUs (5th and 6th generation) have an Intel Processor Trace (PT) feature to trace branch execution with low overhead. This is useful for performance analysis and debugging.
simple-pt is a simple standalone driver and decoder tool to implement PT on Linux.
Starting with version 4.1, Linux already has an integrated PT implementation in perf (see https://lwn.net/Articles/648154/ ). simple-pt is an alternative implementation. It has many disadvantages compared to the perf PT implementation, such as:
- needs to run as root
- no long term tracing or sampling with interrupts
- no support for interactive debugging (use gdb 7.10 on perf for that)
- no support for histograms
- somewhat experimental
- not as well supported as perf
On the positive side simple-pt is:
- standalone. No kernel changes needed. Could be ported to older kernels or other operating systems
- easy to modify and experiment with
- a more ftrace-like decoding tool
- able to use kprobes-based triggers
- modular “unix style” design with simple tools that do only one thing each
- BSD licensed
% sptcmd -c tcall taskset -c 0 ./tcall
cpu 0 offset 1027688, 1003 KB, writing to ptout.0
...
Wrote sideband to ptout.sideband
% sptdecode --sideband ptout.sideband --pt ptout.0 | less
TIME       DELTA  INSNs    OPERATION
frequency 32
0          [+0]   [+  1]   _dl_aux_init+436
                  [+  6]   __libc_start_main+455 -> _dl_discover_osversion
...
                  [+ 13]   __libc_start_main+446 -> main
                  [+  9]   main+22 -> f1
                  [+  4]   f1+9 -> f2
                  [+  2]   f1+19 -> f2
                  [+  5]   main+22 -> f1
                  [+  4]   f1+9 -> f2
                  [+  2]   f1+19 -> f2
                  [+  5]   main+22 -> f1
...
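Because the decoder prints one branch per line as plain text, its output can be post-processed with standard unix tools. The following is a minimal sketch, not part of simple-pt: the function name count_edges is hypothetical, and it assumes the "caller+offset -> callee" line layout shown in the sample output above.

```shell
#!/bin/sh
# count_edges (hypothetical helper, not shipped with simple-pt):
# read sptdecode-style text on stdin and print how often each
# "caller -> callee" edge occurs, most frequent first.
# Assumes call lines contain the fields "caller+offset -> callee".
count_edges() {
    awk '{
        for (i = 2; i < NF; i++) {
            if ($i == "->") {
                src = $(i - 1)
                sub(/\+[0-9]+$/, "", src)   # drop the "+offset" suffix
                print src " -> " $(i + 1)
            }
        }
    }' | sort | uniq -c | sort -rn
}

# Example use (needs a trace recorded with sptcmd first):
#   sptdecode --sideband ptout.sideband --pt ptout.0 | count_edges | head
```

Lines without a "->" field, such as the header and the frequency line, are skipped.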
Available from https://github.com/andikleen/simple-pt
Processor Trace makes it possible to build very exact histograms of a program’s run time. Normal sampling has shadow effects, which can hide some details. Processor Trace records every branch, so it can be much more accurate than normal sampling.
You need an Intel Broadwell or Skylake CPU.
It must run a Linux kernel 4.1 or later, where perf supports PT.
You can verify that the kernel supports PT by checking that the intel_pt PMU is present.
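One way to run such a check is a small shell test. This is a sketch under an assumption: that the kernel registers the PT PMU as the intel_pt event source under /sys/bus/event_source/devices, the usual sysfs location for perf PMUs. The function name pt_supported and its optional argument are hypothetical.

```shell
#!/bin/sh
# pt_supported [SYSFS_ROOT] (hypothetical helper):
# succeed if the intel_pt PMU directory exists.
# Assumption: perf PMUs appear under <sysfs>/bus/event_source/devices.
# SYSFS_ROOT defaults to /sys and exists only so the check can be
# pointed at a scratch directory for testing.
pt_supported() {
    [ -d "${1:-/sys}/bus/event_source/devices/intel_pt" ]
}
```

It can be used as: pt_supported && echo "kernel exposes intel_pt"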
You need perf user tools built from https://github.com/virtuoso/linux-perf
(this should soon be fixed when the user tools code is merged into Linux mainline)
Build perf with PT support
# set up https_proxy as needed
git clone https://github.com/virtuoso/linux-perf
Copy the resulting perf binary to where you want to run it
Get the flamegraph code
git clone https://github.com/brendangregg/FlameGraph.git
Collect data from the workload. It is best not to collect very long traces, as they take much longer to process and may need too much disk space.
perf record -e intel_pt// workload (or -a sleep 1 to collect 1s globally)
Decode the data. This may take quite some time
perf script --itrace=i100usg | /path/to/FlameGraph/stackcollapse-perf.pl > workload.folded
The i100us means the trace decoder samples an instruction every 100us. This can be made more accurate (down to 1ns), at the cost of longer decoding time. The ‘g’ tells the decoder to add callgraphs.
Then generate the Flamegraph with
/path/to/FlameGraph/flamegraph.pl workload.folded > workload.svg
Then view the resulting SVG in an SVG viewer, such as Google Chrome
It is possible to click around.
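The record, decode and render steps above can be collected into one script. This is a rough sketch rather than a finished tool: the function names, the FLAMEGRAPH_DIR and PT_PERIOD variables, and the output file naming are all assumptions made for illustration. Only the perf and FlameGraph commands themselves come from the steps above, plus the perf -o and -i options to name the data file.

```shell
#!/bin/sh
# Hypothetical wrapper around the steps described above.
# FLAMEGRAPH_DIR: where the FlameGraph checkout lives (placeholder path).
FLAMEGRAPH_DIR=${FLAMEGRAPH_DIR:-/path/to/FlameGraph}

# itrace_opt [PERIOD]: build the --itrace value.
# "i<PERIOD>" samples an instruction every PERIOD (default 100us),
# and the trailing "g" asks the decoder to add callgraphs.
itrace_opt() {
    echo "i${1:-100us}g"
}

# pt_flamegraph NAME COMMAND...: trace COMMAND with PT, write NAME.svg.
# PT_PERIOD (optional) overrides the sampling period, e.g. PT_PERIOD=10us.
pt_flamegraph() {
    name=$1
    shift
    perf record -e intel_pt// -o "$name.data" "$@" || return 1
    perf script -i "$name.data" --itrace="$(itrace_opt "$PT_PERIOD")" |
        "$FLAMEGRAPH_DIR/stackcollapse-perf.pl" > "$name.folded" || return 1
    "$FLAMEGRAPH_DIR/flamegraph.pl" "$name.folded" > "$name.svg"
}
```

For example, pt_flamegraph workload ./workload would leave workload.data, workload.folded and workload.svg in the current directory. A smaller PT_PERIOD gives a more accurate graph at the cost of longer decoding.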
Here’s a larger SVG example from a gcc build (2.5MB). You may need Chrome or Firefox to view it.
In principle the trace also has support for more information not in normal sampling, such as determining the exact run time of individual functions from the trace. This is unfortunately not (yet?) supported by the Flame Graph tools.