Combining modular SimBricks simulations with distributed tracing offers a powerful approach to understanding complex systems. By using distributed tracing techniques alongside SimBricks, users can gain visibility into the behavior of simulated systems, enabling detailed analysis of performance bottlenecks, causal relationships, and the propagation of data and control flow. This synergy allows for the identification of performance issues at various system levels, from hardware to software, providing valuable insights for optimization and design improvements.
...
...
@@ -29,12 +28,14 @@ Distributed tracing is a technique for tracking and correlating requests as they
Columbo combines distributed tracing techniques with SimBricks simulations to create low-level, end-to-end system traces. These traces let users analyze the performance of a system simulated through SimBricks with end-to-end visibility across host, NIC, and network, including the entire software stack. Because it builds on SimBricks simulations, Columbo can provide arbitrarily detailed information and flexible control over the level of detail a user observes in distributed traces. Columbo achieves this by configuring the simulators in a SimBricks simulation to collect detailed events as log files without affecting the simulated system. The simulation logs are transformed into standardized, traceable data and assembled into end-to-end system traces, which can be analyzed with existing distributed tracing tools that support visualization and exploration of complex system interactions.
Columbo introduces a systematic approach to handling the diverse log formats generated by different simulators. It standardizes the data format by establishing type-specific event streams, where each simulator adheres to a predefined set of events that depends on the simulator type (e.g. host, NIC, network). These event streams are processed by simulator-specific pipelines. Each pipeline is composed of producers that read and parse a simulator's log file to create a simulator-specific event stream, optional actors that filter or modify the events in that stream, and consumers, called SpanWeavers, that group the events into traditional distributed tracing spans and propagate trace context, thus creating causal relationships between spans (see [Figure 1](#figure1)). When simulators communicate across natural boundaries that also exist in real systems, such as PCIe or Ethernet (e.g. MMIO reads/writes, DMA accesses, or packets put on a wire), the spans created by different SpanWeavers must be causally connected. SpanWeavers achieve this by communicating with each other, linking spans that originate from different simulator log files.
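The producer, actor, and SpanWeaver stages described above can be illustrated with a minimal Python sketch. It is not Columbo's implementation: the `<timestamp> <kind>` log format, the `_entry`/`_exit` naming convention, and every class and function name here (`Event`, `Span`, `producer`, `drop_kinds`, `SpanWeaver`, `run_pipeline`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class Event:
    timestamp: int  # simulated time; the unit is an assumption
    kind: str       # hypothetical event name, e.g. "syscall_entry"
    source: str     # which simulator produced the event


@dataclass
class Span:
    name: str
    events: List[Event] = field(default_factory=list)

    @property
    def start(self) -> int:
        return self.events[0].timestamp

    @property
    def end(self) -> int:
        return self.events[-1].timestamp


def producer(lines: Iterable[str], source: str) -> Iterator[Event]:
    """Parse hypothetical '<timestamp> <kind>' log lines into an event stream."""
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip lines that do not match the assumed format
        yield Event(int(parts[0]), parts[1], source)


def drop_kinds(*kinds: str) -> Callable[[Iterator[Event]], Iterator[Event]]:
    """Build an actor that filters out events of the given kinds."""
    def actor(events: Iterator[Event]) -> Iterator[Event]:
        return (e for e in events if e.kind not in kinds)
    return actor


class SpanWeaver:
    """Consumer: groups events from '<name>_entry' up to the next '*_exit' into a span."""

    def __init__(self) -> None:
        self.completed: List[Span] = []
        self._open: Optional[Span] = None

    def consume(self, events: Iterable[Event]) -> None:
        for e in events:
            if e.kind.endswith("_entry"):
                self._open = Span(name=e.kind[: -len("_entry")], events=[e])
            elif self._open is not None:
                self._open.events.append(e)
                if e.kind.endswith("_exit"):
                    # The span is complete; a real system would export it here.
                    self.completed.append(self._open)
                    self._open = None


def run_pipeline(lines, source, actors, weaver) -> List[Span]:
    """Chain producer -> actors -> consumer for one simulator log."""
    stream = producer(lines, source)
    for actor in actors:
        stream = actor(stream)
    weaver.consume(stream)
    return weaver.completed


# Made-up host log used only to exercise the sketch.
HOST_LOG = [
    "100 syscall_entry",
    "110 debug_noise",
    "120 mmio_write",
    "150 syscall_exit",
    "malformed line here",
]
spans = run_pipeline(HOST_LOG, "host", [drop_kinds("debug_noise")], SpanWeaver())
```

In this sketch each simulator would get its own `run_pipeline` call with its own producer and weaver, mirroring the one-pipeline-per-simulator structure; connecting spans across pipelines is deliberately left out here.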
Once a span is complete, it is exported to an external tool for analysis and visualization. Multiple pipelines, one for each simulator, operate in conjunction to generate comprehensive distributed traces, providing a holistic understanding of the simulated system (see [Figure 2](#figure2)).
SpanWeavers use a form of implicit context propagation to establish causal relationships between events, which allows simulators supported by SimBricks to be used out of the box without further instrumentation.
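One way to picture implicit context propagation between SpanWeavers is a shared channel: the weaver on the sending side of a boundary deposits a trace context, and the weaver on the receiving side picks it up when it sees the matching event in its own log. The sketch below is only an assumption about how this could look, not Columbo's actual mechanism. `ContextChannel`, `HostWeaver`, `NicWeaver`, and the FIFO matching rule are hypothetical, and the sketch assumes the boundary preserves ordering and that the host-side event has already been processed.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count
from typing import Deque, Dict, Optional


@dataclass
class TraceContext:
    trace_id: int
    parent_span_id: int


@dataclass
class Span:
    span_id: int
    trace_id: int
    name: str
    parent_span_id: Optional[int] = None


class ContextChannel:
    """Hypothetical FIFO of trace contexts, one queue per boundary event kind."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[TraceContext]] = {}

    def push(self, boundary: str, ctx: TraceContext) -> None:
        self._queues.setdefault(boundary, deque()).append(ctx)

    def pop(self, boundary: str) -> Optional[TraceContext]:
        queue = self._queues.get(boundary)
        return queue.popleft() if queue else None


_span_ids = count(1)       # illustrative ID generators
_trace_ids = count(1000)


class HostWeaver:
    def __init__(self, channel: ContextChannel) -> None:
        self.channel = channel

    def on_mmio_write(self, trace_id: int) -> Span:
        # Create the host-side span and publish its context for the NIC side.
        span = Span(next(_span_ids), trace_id, "host_mmio_write")
        self.channel.push("mmio_write", TraceContext(trace_id, span.span_id))
        return span


class NicWeaver:
    def __init__(self, channel: ContextChannel) -> None:
        self.channel = channel

    def on_mmio_write(self) -> Span:
        # The NIC log carries no trace IDs; the context comes from the channel.
        ctx = self.channel.pop("mmio_write")
        if ctx is None:
            # No matching host event: start a new trace with a root span.
            return Span(next(_span_ids), next(_trace_ids), "nic_mmio_write")
        return Span(next(_span_ids), ctx.trace_id, "nic_mmio_write", ctx.parent_span_id)


channel = ContextChannel()
host, nic = HostWeaver(channel), NicWeaver(channel)
h1 = host.on_mmio_write(trace_id=7)
h2 = host.on_mmio_write(trace_id=8)
n1 = nic.on_mmio_write()  # linked to h1
n2 = nic.on_mmio_write()  # linked to h2
n3 = nic.on_mmio_write()  # no context available, becomes a root span
```

The point of the sketch is that neither simulator's log has to carry trace identifiers: the causal link is reconstructed after the fact from the order of events at the boundary, which is why the simulators need no extra instrumentation.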