{"id":329,"date":"2015-08-05T23:13:04","date_gmt":"2015-08-05T23:13:04","guid":{"rendered":"http:\/\/halobates.de\/blog\/?p=329"},"modified":"2015-08-05T23:13:04","modified_gmt":"2015-08-05T23:13:04","slug":"generating-flame-graphs-with-processor-trace","status":"publish","type":"post","link":"http:\/\/halobates.de\/blog\/p\/329","title":{"rendered":"Generating Flame graphs with Processor Trace"},"content":{"rendered":"<p>How to generate a <a href=\"http:\/\/www.brendangregg.com\/flamegraphs.html\">FlameGraph<\/a> with <a href=\"https:\/\/lwn.net\/Articles\/648154\/\">Processor Trace<\/a>. Everybody loves Flame Graphs. <\/p>\n<p><a href=\"http:\/\/halobates.de\/blog\/wp-content\/uploads\/2015\/08\/pt-flamegraph.png\"><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/halobates.de\/blog\/wp-content\/uploads\/2015\/08\/pt-flamegraph-300x162.png\" alt=\"\" title=\"pt-flamegraph\" width=\"300\" height=\"162\" class=\"alignnone size-medium wp-image-330\" srcset=\"http:\/\/halobates.de\/blog\/wp-content\/uploads\/2015\/08\/pt-flamegraph-300x162.png 300w, http:\/\/halobates.de\/blog\/wp-content\/uploads\/2015\/08\/pt-flamegraph.png 1024w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>Processor Trace makes it possible to build very exact histograms of a program&#8217;s run time. Normal sampling has shadow effects, which can hide some details. 
Processor Trace records every branch, so it can be much more accurate than normal sampling.<\/p>\n<p>You need an Intel Broadwell or Skylake CPU,<br \/>\nrunning a 4.1 or later Linux kernel where perf supports PT.<br \/>\nYou can verify that the kernel supports PT with<\/p>\n<p><code><br \/>\nls \/sys\/devices\/intel_pt<br \/>\n<\/code><\/p>\n<p>You need perf user tools built from https:\/\/github.com\/virtuoso\/linux-perf<br \/>\n(this requirement should go away once the user tools code is merged into Linux mainline)<\/p>\n<p>Build perf with PT support:<br \/>\n<code><br \/>\n# set up https_proxy as needed<br \/>\ngit clone https:\/\/github.com\/virtuoso\/linux-perf<br \/>\ncd linux-perf\/tools\/perf<br \/>\nmake<br \/>\n<\/code><br \/>\nCopy the resulting perf binary to where you want to run it.<\/p>\n<p>Get the FlameGraph code:<br \/>\n<code><br \/>\ngit clone https:\/\/github.com\/brendangregg\/FlameGraph.git<br \/>\n<\/code><br \/>\nCollect data from the workload. It is best not to collect overly long traces, as they take much longer to process and may need a lot of disk space.<\/p>\n<p><code><br \/>\nperf record -e intel_pt\/\/ workload<br \/>\nperf record -e intel_pt\/\/ -a sleep 1   # alternative: collect 1s globally<br \/>\n<\/code><\/p>\n<p>Decode the data. This may take quite some time.<br \/>\n<code><br \/>\nperf script --itrace=i100usg | \/path\/to\/FlameGraph\/stackcollapse-perf.pl > workload.folded<br \/>\n<\/code><\/p>\n<p>The i100us means the trace decoder samples an instruction every 100us. This can be made more accurate (down to 1ns), at the cost of longer decoding time. The &#8216;g&#8217; tells the decoder to add callgraphs.<\/p>\n<p>Then generate the Flame Graph with<\/p>\n<p><code><br \/>\n\/path\/to\/FlameGraph\/flamegraph.pl workload.folded > workload.svg<br \/>\n<\/code><\/p>\n<p>Then view the resulting SVG in an SVG viewer, such as Google Chrome:<\/p>\n<p><code><br \/>\ngoogle-chrome workload.svg<br \/>\n<\/code><\/p>\n<p>The SVG is interactive, so it is possible to click around. 
<\/p>\n<p>Here&#8217;s a larger <a href=\"http:\/\/halobates.de\/gcc-trent.svg\">SVG example<\/a> from a gcc build (2.5MB). You may need Chrome or Firefox to view it.<\/p>\n<p>In principle the trace also carries information that normal sampling does not provide, such as the exact run time of individual functions. This is unfortunately not (yet?) supported by the Flame Graph tools. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to generate a FlameGraph with Processor Trace. Everybody loves Flame Graphs. Processor Trace makes it possible to build very exact histograms of a program&#8217;s run time. Normal sampling has shadow effects, which can hide some details. Processor Trace records every branch, so it can be much more accurate than normal sampling. You need an Intel Broadwell [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[7,14,17,11],"tags":[],"_links":{"self":[{"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/posts\/329"}],"collection":[{"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/comments?post=329"}],"version-history":[{"count":13,"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/posts\/329\/revisions"}],"predecessor-version":[{"id":343,"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/posts\/329\/revisions\/343"}],"wp:attachment":[{"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/media?parent=329"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/categories?post=329"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/tags
?post=329"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}