3. Manager Tasks

This page outlines all of the tasks that the FireSim manager supports.

3.1. firesim managerinit

This is a setup command that does the following:

  • Run aws configure, prompt for credentials
  • Replace the default config files (config_runtime.ini, config_build.ini, config_build_recipes.ini, and config_hwdb.ini) with clean example versions.
  • Prompt the user for email address and subscribe them to notifications for their own builds.

You can re-run this whenever you want to get clean configuration files – you can just hit enter when prompted for aws configure credentials and your email address, and both will keep your previously specified values.

If you run this command by accident and didn’t mean to overwrite your configuration files, you’ll find backed-up versions in firesim/deploy/sample-backup-configs/backup*.

3.2. firesim buildafi

This command builds a FireSim AGFI (FPGA Image) from the Chisel RTL for the configurations that you specify. The process of defining configurations to build is explained in the documentation for config_build.ini and config_build_recipes.ini.

For each config, the build process entails:

  1. [Locally] Run the elaboration process for your hardware configuration
  2. [Locally] FAME-1 transform the design with MIDAS
  3. [Locally] Attach simulation models (I/O widgets, memory model, etc.)
  4. [Locally] Emit Verilog to run through the AWS FPGA Flow
  5. Launch an FPGA Dev AMI build instance for each configuration you want built.
  6. [Local/Remote] Prep build instances, copy generated verilog for hardware configuration to buidl instance.
  7. [Remote] Run Vivado Synthesis and P&R for the configuration
  8. [Local/Remote] Copy back all output generated by Vivado, including the final tar file
  9. [Local/AWS Infra] Submit the tar file to the AWS backend for conversion to an AFI
  10. [Local] Wait for the AFI to become available, then notify the user of completion by email.

This process happens in parallel for all of the builds you specify. The command will exit when all builds are completed (but you will get notified as INDIVIDUAL builds complete).

It is highly recommended that you either run this command in a ``screen`` or use ``mosh`` to access the build instance. Builds will not finish if the manager is killed due to disconnection to the instance.

When you run a build for a particular configuration, a directory named LAUNCHTIME-CONFIG_TRIPLET-BUILD_NAME is created in firesim/deploy/results-build/. This directory will contain:

  • AGFI_INFO: Describes the state of the AFI being built, while the manager is running. Upon build completion, this contains the AGFI/AFI that was produced, along with its metadata.
  • cl_firesim:: This directory is essentially the Vivado project that built the FPGA image, in the state it was in when the Vivado build process completed. This contains reports, stdout from the build, and the final tar file produced by Vivado.
  • cl_firesim_generated.sv: This is a copy of the generated verilog used to produce this build. You can also find a copy inside cl_firesim.

3.3. firesim shareagfi

This command allows you to share AGFIs that you have already built (that are listed in config_hwdb.ini) with other users. It will take the named hardware configurations that you list in the [agfistoshare] section of config_build.ini, grab the respective AGFIs for each from config_hwdb.ini, and share them across all F1 regions with the users listed in the [sharewithaccounts] section of config_build.ini.

You must own the AGFIs in order to do this – this will NOT let you share AGFIs that someone else owns and gave you access to.

3.4. firesim launchrunfarm

This command launches a Run Farm on which you run simulations. Run Farms consist of f1.16xlarge, f1.2xlarge, and m4.16xlarge instances. Before you run the command, you define the number of each that you want in config_runtime.ini.

A launched Run Farm is tagged with a runfarmtag from config_runtime.ini, which is used to disambiguate multiple parallel Run Farms; that is, you can have many Run Farms running, each running a different experiment at the same time, each with its own unique runfarmtag. One convenient feature to add to your AWS management panel is the column for fsimcluster, which contains the runfarmtag value. You can see how to do that in the Add the fsimcluster column to your AWS management console section.

The other options in the [runfarm] section, runinstancemarket, spotinterruptionbehavior, and spotmaxprice define how instances in the Run Farm are launched. See the documentation for config_runtime.ini for more details.

ERRATA: One current requirement is that you must define a target config in the [targetconfig] section of config_runtime.ini that does not require more resources than the Run Farm you are trying to launch. Thus, you should also setup your [targetconfig] parameters before trying to launch the corresponding Run Farm. This requirement will be removed in the future.

Once you setup your configuration and call firesim launchrunfarm, the command will launch the requested numbers and types of instances. If all succeeds, you will see the command print out instance IDs for the correct number/types of launched instances (you do not need to pay attention to these or record them). If an error occurs, it will be printed to console.

Once you run this command, your Run Farm will continue to run until you call ``firesim terminaterunfarm``. This means you will be charged for the running instances in your Run Farm until you call ``terminaterunfarm``. You are responsible for ensuring that instances are only running when you want them to be by checking the AWS EC2 Management Panel.

3.5. firesim terminaterunfarm

This command terminates some or all of the instances in the Run Farm defined in your config_runtime.ini file, depending on the command line arguments you supply. By default, running firesim terminaterunfarm will terminate ALL instances with the specified runfarmtag. When you run this command, it will prompt for confirmation that you want to terminate the listed instances. If you respond in the affirmative, it will move forward with the termination.

If you do not want to have to confirm the termination (e.g. you are using this command in a script), you can give the command the --forceterminate command line argument. For example, the following will TERMINATE ALL INSTANCES IN THE RUN FARM WITHOUT PROMPTING FOR CONFIRMATION:

firesim terminaterunfarm --forceterminate

There a few additional commandline arguments that let you terminate only some of the instances in a particular Run Farm: --terminatesomef116 INT, --terminatesomef12 INT, and --terminatesomem416 INT, which will terminate ONLY as many of each type of instance as you specify.

Here are some examples:

[ start with 2 f1.16xlarges, 2 f1.2xlarges, 2 m4.16xlarges ]

firesim terminaterunfarm --terminatesomef116 1 --forceterminate

[ now, we have: 1 f1.16xlarges, 2 f1.2xlarges, 2 m4.16xlarges ]
[ start with 2 f1.16xlarges, 2 f1.2xlarges, 2 m4.16xlarges ]

firesim terminaterunfarm --terminatesomef116 1 --terminatesomef12 2 --forceterminate

[ now, we have: 1 f1.16xlarges, 0 f1.2xlarges, 2 m4.16xlarges ]

Once you call ``launchrunfarm``, you will be charged for running instances in your Run Farm until you call ``terminaterunfarm``. You are responsible for ensuring that instances are only running when you want them to be by checking the AWS EC2 Management Panel.

3.6. firesim infrasetup

Once you have launched a Run Farm and setup all of your configuration options, the infrasetup command will build all components necessary to run the simulation and deploy those components to the machines in the Run Farm. Here is a rough outline of what the command does:

  • Constructs the internal representation of your simulation. This is a tree of components in the simulation (simulated server blades, switches)
  • For each type of server blade, query the AWS AFI API to get the build-triplet needed to construct the software simulation driver, then build each driver
  • For each type of switch in the simulation, generate the switch model binary
  • For each host instance in the Run Farm, collect information about all the resources necessary to run a simulation on that host instance, then copy files and flash FPGAs with the required AGFIs.

Details about setting up your simulation configuration can be found in config_runtime.ini.

Once you run a simulation, you should re-run ``firesim infrasetup`` before starting another one, even if it is the same exact simulation on the same Run Farm.

You can see detailed output from an example run of infrasetup in the Running a Single Node Simulation and Running a Cluster Simulation Tutorials.

3.7. firesim boot

Once you have run firesim infrasetup, this command will actually start simulations. It begins by launching all switches (if they exist in your simulation config), then launches all server blade simulations. This simply launches simulations and then exits – it does not perform any monitoring.

This command is useful if you want to launch a simulation, then plan to interact with the simulation by-hand (i.e. by directly interacting with the console).

3.8. firesim kill

Given a simulation configuration and simulations running on a Run Farm, this command force-terminates all components of the simulation. Importantly, this does not allow any outstanding changes to the filesystem in the simulated systems to be committed to the disk image.

3.9. firesim runworkload

This command is the standard tool that lets you launch simulations, monitor the progress of workloads running on them, and collect results automatically when the workloads complete. To call this command, you must have first called firesim infrasetup to setup all required simulation infrastructure on the remote nodes.

This command will first create a directory in firesim/deploy/results-workload/ named as LAUNCH_TIME-WORKLOADNAME, where results will be completed as simulations complete. This command will then automatically call firesim boot to start simulations. Then, it polls all the instances in the Run Farm every 10 seconds to determine the state of the simulated system. If it notices that a simulation has shutdown (i.e. the simulation disappears from the output of screen -ls), it will automatically copy back all results from the simulation, as defined in the workload configuration (see the Defining Custom Workloads section).

For non-networked simulations, it will wait for ALL simulations to complete (copying back results as each workload completes), then exit.

For globally-cycle-accurate networked simulations, the global simulation will stop when any single node powers off. Thus, for these simulations, runworkload will copy back results from all nodes and force them to terminate by calling kill when ANY SINGLE ONE of them shuts down cleanly.

A simulation shuts down cleanly when the workload running on the simulator calls poweroff.

3.10. firesim runcheck

This command is provided to let you debug configuration options without launching instances. In addition to the output produced at command line/in the log, you will find a pdf diagram of the topology you specify, annotated with information about the workloads, hardware configurations, and abstract host mappings for each simulation (and optionally, switch) in your design. These diagrams are located in firesim/deploy/generated-topology-diagrams/, named after your topology.

Here is an example of such a diagram (click to expand/zoom):

Example diagram from running ``firesim runcheck``

Example diagram for an 8-node cluster with one ToR switch