TracerV + Flame Graphs: Profiling Software with Out-of-Band Flame Graph Generation
FireSim supports generating Flame Graphs out-of-band to visualize the performance of software running on simulated processors. This feature was introduced in our FirePerf paper at ASPLOS 2020.
Before proceeding, make sure you understand the Capturing RISC-V Instruction Traces with TracerV section.
What are Flame Graphs?
Flame Graphs are a type of histogram that shows where software is spending its time, broken down by components of the stack trace (e.g., function calls). The x-axis represents the portion of total runtime spent in a part of the stack trace, while the y-axis represents the stack depth at that point in time. Entries in the flame graph are labeled with and sorted by function name (not time).
Given this visualization, time-consuming routines can easily be identified: they are leaves (top-most horizontal bars) of the stacks in the flame graph and consume a significant proportion of overall runtime, represented by the width of the horizontal bars.
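As a rough illustration of the data behind such a graph, the sketch below collapses a handful of stack samples into the "folded" text form (semicolon-separated frames followed by a count) that the open-source FlameGraph tools consume, and computes how often each function is a leaf. The sample stacks and the helper names `fold_stacks` and `leaf_share` are made up for this example; they are not part of FireSim.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse root-first stack samples into folded lines: 'a;b;c count'."""
    counts = Counter(";".join(stack) for stack in samples)
    # Sorting by stack string mirrors the name-based (not time-based) ordering.
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


def leaf_share(samples):
    """Fraction of samples in which each function is the leaf (top of the stack)."""
    leaves = Counter(stack[-1] for stack in samples)
    total = len(samples)
    return {name: count / total for name, count in leaves.items()}


# Hypothetical samples, each listed from the root frame to the leaf frame.
samples = [
    ["main", "parse"],
    ["main", "compute", "multiply"],
    ["main", "compute", "multiply"],
    ["main", "compute"],
]

folded = fold_stacks(samples)
shares = leaf_share(samples)

for line in folded:
    print(line)
print(shares)
```

In this toy data set, `multiply` is the leaf in half of the samples, so it would appear as the widest top-most bar.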
Traditionally, data to produce Flame Graphs is collected using tools like perf, which sample stack traces on running systems in software. However, these tools are limited by the fact that they are ultimately running additional software on the system being profiled, which can change the behavior of the software that needs to be profiled. Furthermore, as sampling frequency is increased, this effect becomes worse.
In FireSim, we use the out-of-band trace collection provided by TracerV to collect these traces cycle-exactly and without perturbing running software. On the host-software side, TracerV unwinds the stack based on DWARF information about the running binary that you supply. This stack trace is then fed to the open-source FlameGraph stack trace visualizer to produce Flame Graphs.
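To give some intuition for how a cycle-exact trace can become stack data, here is a toy model that rebuilds a call stack from call and return events and attributes elapsed cycles to whichever stack was live. This is only a sketch under simplifying assumptions, not TracerV's actual implementation: the `(cycle, kind, name)` event format, the `attribute_cycles` helper, and the function names in the example are all hypothetical, and the real driver derives labels from the DWARF information rather than receiving them in the trace.

```python
from collections import Counter


def attribute_cycles(events, end_cycle):
    """Attribute cycles to call stacks.

    events: list of (cycle, kind, name) tuples sorted by cycle, where kind is
    "call" or "ret". This event format is a hypothetical simplification.
    Returns a Counter mapping folded stacks ('a;b;c') to cycle counts.
    """
    stack = []
    folded = Counter()
    last_cycle = None
    for cycle, kind, name in events:
        # Charge the cycles since the previous event to the stack that was live.
        if stack and last_cycle is not None:
            folded[";".join(stack)] += cycle - last_cycle
        if kind == "call":
            stack.append(name)
        elif kind == "ret" and stack:
            stack.pop()
        last_cycle = cycle
    # Charge any remaining cycles to the stack still live at the end of the trace.
    if stack and last_cycle is not None:
        folded[";".join(stack)] += end_cycle - last_cycle
    return folded


# Hypothetical trace: one function calls another and both return.
events = [
    (0, "call", "kernel_init"),
    (10, "call", "do_fork"),
    (25, "ret", "do_fork"),
    (40, "ret", "kernel_init"),
]

result = attribute_cycles(events, end_cycle=40)
for stack, cycles in sorted(result.items()):
    print(stack, cycles)
```

Because every cycle is accounted for rather than sampled, the resulting counts are exact for the traced interval in this model.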
Prerequisites
Make sure you understand the Capturing RISC-V Instruction Traces with TracerV section.
You must have a design that integrates the TracerV bridge. See the Building a Design with TracerV section.
Enabling Flame Graph generation in config_runtime.yaml
To enable Flame Graph generation for a simulation, you must set enable: yes and output_format: 2 in the tracing section of your config_runtime.yaml file, for example:
tracing:
    enable: yes
    # Trace output formats. Only enabled if "enable" is set to "yes" above
    # 0 = human readable; 1 = binary (compressed raw data); 2 = flamegraph (stack
    # unwinding -> Flame Graph)
    output_format: 2
    # Trigger selector.
    # 0 = no trigger; 1 = cycle count trigger; 2 = program counter trigger; 3 =
    # instruction trigger
    selector: 1
    start: 0
    end: -1
The trigger selector settings can be set as described in the Setting a TracerV Trigger section. In particular, when profiling the OS only when a desired application is running (e.g., iperf3 in our ASPLOS 2020 paper), instruction value triggering is extremely useful. See the Instruction value trigger section for more.
Producing DWARF information to supply to the TracerV driver
When running in FirePerf mode, the TracerV software driver expects a binary containing DWARF debugging information, which it will use to obtain labels for stack unwinding.
TracerV expects this file to be named exactly as your bootbinary, but suffixed with -dwarf. For example (and as we will see in the following section), if your bootbinary is named br-base-bin, TracerV will require you to provide a file named br-base-bin-dwarf.
If you are generating a Linux distribution with FireMarshal, this file containing debug information for the generated Linux kernel will automatically be provided (and named correctly) in the directory containing your images. For example, building the br-base.json workload will automatically produce br-base-bin, br-base-bin-dwarf (for TracerV flame graph generation), and br-base.img.
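The naming rule above can be checked before launching a simulation. The following is a minimal sketch; the helper names `dwarf_path_for` and `has_dwarf` are invented for illustration and are not part of FireSim or FireMarshal.

```python
from pathlib import Path


def dwarf_path_for(bootbinary):
    """Return the DWARF file path expected for a bootbinary: same name plus '-dwarf'."""
    path = Path(bootbinary)
    return path.with_name(path.name + "-dwarf")


def has_dwarf(bootbinary):
    """True if the expected DWARF file exists next to the bootbinary."""
    return dwarf_path_for(bootbinary).is_file()


expected = dwarf_path_for("br-base-bin")
print(expected)
```

A pre-flight script could call `has_dwarf` on the bootbinary path and stop early with a clear message if the file is missing.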
Modifying your workload description
Finally, we must make three modifications to the workload description to complete the flame graph flow. For general documentation on workload descriptions, see the Defining Custom Workloads section.
We must add the file containing our DWARF information as one of the simulation_inputs, so that it is automatically copied to the remote F1 instance running the simulation.
We must modify simulation_outputs to copy back the generated trace file.
We must set the post_run_hook to gen-all-flamegraphs-fireperf.sh (which FireSim puts on your path by default), which will produce flame graphs from the trace files.
To make this concrete, consider the default linux-uniform.json workload, which does not support Flame Graph generation:
{
    "benchmark_name" : "linux-uniform",
    "common_bootbinary" : "br-base-bin",
    "common_rootfs" : "br-base.img",
    "common_outputs" : ["/etc/os-release"],
    "common_simulation_outputs" : ["uartlog", "memory_stats*.csv"]
}
Below is the modified version of this workload, linux-uniform-flamegraph.json, which makes the aforementioned three changes:
{
    "benchmark_name" : "linux-uniform",
    "common_bootbinary" : "br-base-bin",
    "common_rootfs" : "br-base.img",
    "common_simulation_inputs" : ["br-base-bin-dwarf"],
    "common_outputs" : ["/etc/os-release"],
    "common_simulation_outputs" : ["uartlog", "memory_stats*.csv", "TRACEFILE*"],
    "post_run_hook" : "gen-all-flamegraphs-fireperf.sh"
}
Note that we are adding TRACEFILE* to common_simulation_outputs, which will copy back all generated trace files to your workload results directory. The gen-all-flamegraphs-fireperf.sh script will automatically produce a flame graph for each generated trace.
Lastly, if you have created a new workload definition, make sure you update your config_runtime.yaml to use this new workload definition.
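If you maintain several workload descriptions, the three changes can also be applied programmatically. The sketch below operates on a workload loaded as a Python dictionary and uses only the keys and values shown in the example above; the function name `add_flamegraph_support` is hypothetical, and this is not a FireSim-provided utility.

```python
import json


def add_flamegraph_support(workload):
    """Return a copy of a workload description with the three flame graph changes."""
    updated = dict(workload)

    # 1. Add the DWARF file (bootbinary name plus '-dwarf') as a simulation input.
    dwarf = updated["common_bootbinary"] + "-dwarf"
    inputs = list(updated.get("common_simulation_inputs", []))
    if dwarf not in inputs:
        inputs.append(dwarf)
    updated["common_simulation_inputs"] = inputs

    # 2. Copy back the generated trace files.
    outputs = list(updated.get("common_simulation_outputs", []))
    if "TRACEFILE*" not in outputs:
        outputs.append("TRACEFILE*")
    updated["common_simulation_outputs"] = outputs

    # 3. Generate flame graphs from the trace files after the run.
    updated["post_run_hook"] = "gen-all-flamegraphs-fireperf.sh"
    return updated


base = {
    "benchmark_name": "linux-uniform",
    "common_bootbinary": "br-base-bin",
    "common_rootfs": "br-base.img",
    "common_outputs": ["/etc/os-release"],
    "common_simulation_outputs": ["uartlog", "memory_stats*.csv"],
}

modified = add_flamegraph_support(base)
print(json.dumps(modified, indent=4))
```

The function copies the lists it modifies, so the original dictionary is left unchanged, and applying it twice yields the same result as applying it once.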
Running a simulation
At this point, you can follow the standard FireSim flow to run a workload. Once your workload completes, you will find trace files with stack traces (as opposed to instruction traces) and generated flame graph SVGs in your workload’s output directory.
Caveats
The current stack trace construction code does not distinguish between different userspace programs, instead consolidating them into one entry. Expanded support for userspace programs will be available in a future release.