TracerV + Flame Graphs: Profiling Software with Out-of-Band Flame Graph Generation

FireSim supports generating Flame Graphs out-of-band to visualize the performance of software running on simulated processors. This feature was introduced in our FirePerf paper at ASPLOS 2020.

Before proceeding, make sure you understand the Capturing RISC-V Instruction Traces with TracerV section.

What are Flame Graphs?

Example Flame Graph (from http://www.brendangregg.com/FlameGraphs/)

Flame Graphs are a type of histogram that shows where software is spending its time, broken down by components of the stack trace (e.g., function calls). The x-axis represents the portion of total runtime spent in a part of the stack trace, while the y-axis represents the stack depth at that point in time. Entries in the flame graph are labeled with and sorted by function name (not time).

Given this visualization, time-consuming routines can easily be identified: they are leaves (top-most horizontal bars) of the stacks in the flame graph and consume a significant proportion of overall runtime, represented by the width of the horizontal bars.
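As a rough illustration of what "leaves that consume a significant proportion of runtime" means, the sketch below takes a handful of stack samples and reports the share of samples in which each function was the innermost frame. The function name `self_time_shares` and the sample data are hypothetical, and the sketch aggregates by function name across call paths, whereas a flame graph draws one bar per distinct stack.

```python
from collections import Counter

def self_time_shares(samples):
    """Return the fraction of samples in which each function is the leaf.

    Each sample is a list of function names ordered from the outermost
    caller to the innermost callee (the leaf).
    """
    leaves = Counter(stack[-1] for stack in samples)
    total = sum(leaves.values())
    # most_common() orders the result from widest to narrowest leaf.
    return {fn: n / total for fn, n in leaves.most_common()}

# Hypothetical samples for illustration only, not real FireSim output.
samples = [
    ["main", "parse"],
    ["main", "compute", "multiply"],
    ["main", "compute", "multiply"],
    ["main", "compute"],
]

for fn, share in self_time_shares(samples).items():
    print(f"{fn}: {share:.0%}")
```

Here `multiply` is the leaf in half of the samples, so it would show up as the widest top-most bar.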

Traditionally, the data used to produce Flame Graphs is collected with tools like perf, which sample stack traces in software on the running system. However, these tools run additional software on the system being profiled, which can change the behavior of the software under test. This perturbation grows as the sampling frequency increases.

In FireSim, we use the out-of-band trace collection provided by TracerV to collect these traces cycle-exactly and without perturbing running software. On the host-software side, TracerV unwinds the stack based on DWARF information about the running binary that you supply. This stack trace is then fed to the open-source FlameGraph stack trace visualizer to produce Flame Graphs.
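To give a feel for the kind of input the FlameGraph visualizer works with, the sketch below collapses stacks into the "folded" text form accepted by FlameGraph's `flamegraph.pl`: one line per unique stack, frames joined by semicolons, followed by a count. This is only an illustration under assumptions: the function name `fold_stacks` and the sample data are hypothetical, and the exact contents of the trace files TracerV writes are not described here. For a cycle-exact trace, the count would represent cycles rather than samples.

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse stacks into folded lines of the form "a;b;c <count>".

    Each sample is a list of function names ordered from the outermost
    caller to the innermost callee.
    """
    counts = Counter(";".join(stack) for stack in samples)
    # Sort by stack string so the output is deterministic.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

# Hypothetical samples for illustration only, not real FireSim output.
samples = [
    ["main", "parse"],
    ["main", "compute", "multiply"],
    ["main", "compute", "multiply"],
    ["main", "compute"],
]

for line in fold_stacks(samples):
    print(line)
# main;compute 1
# main;compute;multiply 2
# main;parse 1
```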

Prerequisites

Make sure you understand the Capturing RISC-V Instruction Traces with TracerV section.
You must have a design that integrates the TracerV bridge. See the Building a Design with TracerV section.

Enabling Flame Graph generation in `config_runtime.ini`

To enable Flame Graph generation for a simulation, you must set enable=yes and output_format=2 in the [tracing] section of your config_runtime.ini file, for example:

[tracing]
enable=yes

# Trace output formats. Only enabled if "enable" is set to "yes" above
# 0 = human readable; 1 = binary (compressed raw data); 2 = flamegraph (stack
# unwinding -> Flame Graph)
output_format=2

# Trigger selector.
# 0 = no trigger; 1 = cycle count trigger; 2 = program counter trigger; 3 =
# instruction trigger
selector=1
start=0
end=-1

The trigger selector settings can be set as described in the Setting a TracerV Trigger section. In particular, instruction value triggering is extremely useful when you want to profile the OS only while a specific application is running (e.g., iperf3 in our ASPLOS 2020 paper). See the Instruction value trigger section for more.
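Before launching a run, it can be handy to confirm that the two required settings are actually present. The following is a minimal sketch that assumes the file can be read with Python's standard `configparser`; the helper name `flamegraph_tracing_enabled` is hypothetical, and this is not how FireSim itself parses the file.

```python
import configparser

def flamegraph_tracing_enabled(ini_text):
    """Return True if [tracing] has enable=yes and output_format=2."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section("tracing"):
        return False
    tracing = parser["tracing"]
    return (tracing.get("enable") == "yes"
            and tracing.get("output_format") == "2")

# A trimmed copy of the [tracing] section shown above.
sample = """
[tracing]
enable=yes
# 2 = flamegraph (stack unwinding -> Flame Graph)
output_format=2
selector=1
start=0
end=-1
"""

print(flamegraph_tracing_enabled(sample))
```

To check a real file, read its contents first, for example with `open("config_runtime.ini").read()`, and pass the text to the helper.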

Producing DWARF information to supply to the TracerV driver

When running in FirePerf mode, the TracerV software driver expects a binary containing DWARF debugging information, which it will use to obtain labels for stack unwinding.

TracerV expects this file to have exactly the same name as your bootbinary, suffixed with -dwarf. For example (and as we will see in the following section), if your bootbinary is named br-base-bin, TracerV will require you to provide a file named br-base-bin-dwarf.

If you are generating a Linux distribution with FireMarshal, this file containing debug information for the generated Linux kernel will automatically be provided (and named correctly) in the directory containing your images. For example, building the br-base.json workload will automatically produce br-base-bin, br-base-bin-dwarf (for TracerV flame graph generation), and br-base.img.
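If you build images some other way, the naming rule is easy to check by hand. The sketch below derives the expected DWARF file name from a bootbinary path and tests whether that file exists. The helper names `dwarf_path_for` and `has_dwarf_file` are hypothetical and are not part of FireSim or FireMarshal.

```python
from pathlib import Path

def dwarf_path_for(bootbinary):
    """Return the path TracerV would expect: the bootbinary name plus "-dwarf"."""
    path = Path(bootbinary)
    return path.with_name(path.name + "-dwarf")

def has_dwarf_file(bootbinary):
    """Return True if the expected DWARF file sits next to the bootbinary."""
    return dwarf_path_for(bootbinary).is_file()

print(dwarf_path_for("br-base-bin"))  # br-base-bin-dwarf
```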

Modifying your workload description

Finally, we must make three modifications to the workload description to complete the flame graph flow. For general documentation on workload descriptions, see the Defining Custom Workloads section.

We must add the file containing our DWARF information as one of the simulation_inputs, so that it is automatically copied to the remote F1 instance running the simulation.
We must modify simulation_outputs to copy back the generated trace file.
We must set the post_run_hook to gen-all-flamegraphs-fireperf.sh (which FireSim puts on your PATH by default), which will produce flame graphs from the trace files.

To make this concrete, consider the default linux-uniform.json workload, which does not support Flame Graph generation:

{
  "benchmark_name"            : "linux-uniform",
  "common_bootbinary"         : "br-base-bin",
  "common_rootfs"             : "br-base.img",
  "common_outputs"            : ["/etc/os-release"],
  "common_simulation_outputs" : ["uartlog", "memory_stats*.csv"]
}

Below is the modified version of this workload, linux-uniform-flamegraph.json, which makes the aforementioned three changes:

{
  "benchmark_name"            : "linux-uniform",
  "common_bootbinary"         : "br-base-bin",
  "common_rootfs"             : "br-base.img",
  "common_simulation_inputs"  : ["br-base-bin-dwarf"],
  "common_outputs"            : ["/etc/os-release"],
  "common_simulation_outputs" : ["uartlog", "memory_stats*.csv", "TRACEFILE*"],
  "post_run_hook"             : "gen-all-flamegraphs-fireperf.sh"
}

Note that we are adding TRACEFILE* to common_simulation_outputs, which will copy back all generated trace files to your workload results directory. The gen-all-flamegraphs-fireperf.sh script will automatically produce a flame graph for each generated trace.

Lastly, if you have created a new workload definition, make sure you update your config_runtime.ini to use this new workload definition.
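The three workload changes described in this section can also be applied programmatically. The following is a sketch only: `add_flamegraph_support` is a hypothetical helper, not a FireSim utility, and it assumes the workload uses the `common_*` keys shown in the examples above.

```python
import copy
import json

def add_flamegraph_support(workload):
    """Return a copy of a workload dict with the three flame graph changes."""
    updated = copy.deepcopy(workload)

    # 1. Ship the DWARF file alongside the simulation inputs.
    dwarf = updated["common_bootbinary"] + "-dwarf"
    inputs = updated.setdefault("common_simulation_inputs", [])
    if dwarf not in inputs:
        inputs.append(dwarf)

    # 2. Copy the generated trace files back after the run.
    outputs = updated.setdefault("common_simulation_outputs", [])
    if "TRACEFILE*" not in outputs:
        outputs.append("TRACEFILE*")

    # 3. Generate flame graphs from the traces once the run finishes.
    updated["post_run_hook"] = "gen-all-flamegraphs-fireperf.sh"
    return updated

# The default linux-uniform.json workload from above.
base = {
    "benchmark_name": "linux-uniform",
    "common_bootbinary": "br-base-bin",
    "common_rootfs": "br-base.img",
    "common_outputs": ["/etc/os-release"],
    "common_simulation_outputs": ["uartlog", "memory_stats*.csv"],
}

print(json.dumps(add_flamegraph_support(base), indent=2))
```

The helper copies the input rather than mutating it, and applying it twice gives the same result as applying it once.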

Running a simulation

At this point, you can follow the standard FireSim flow to run a workload. Once your workload completes, you will find trace files with stack traces (as opposed to instruction traces) and generated flame graph SVGs in your workload’s output directory.

Caveats

The current stack trace construction code does not distinguish between different userspace programs, instead consolidating them into one entry. Expanded support for userspace programs will be available in a future release.