Commit 4b8a1176 authored by Antoine Kaufmann
edit blog post

---
title: "Large-Scale Virtual Prototypes With Distributed SimBricks Simulations"
subtitle: |
  How does SimBricks enable users to virtually prototype
  systems with 1000s of components?
date: 2024-09-04
author: marvin
permalink: /blog/distributed-simulations.html
card_image: /assets/images/blog/distributed-simulations-card.png
---
SimBricks users can build and run smaller virtual prototypes comprising only a
handful of system components or use the same building blocks to assemble large
system prototypes from hundreds or thousands of components. However, simulating
larger systems on a single physical machine quickly results in impractically
long simulation times because of limited computational resources. To address
this, SimBricks supports distributing larger system simulations across
multiple physical machines. In this post, we present how SimBricks addresses
three key challenges: 1) minimizing communication overheads that increase
simulation time, 2) lowering the complexity of running distributed simulations
for users, and 3) avoiding implementation complexity in each component
simulator.
# Communication Challenges for Distributed Simulations
SimBricks realizes virtual prototypes by combining multiple simulator instances
for different components as [separate, parallel, and loosely coupled
processes](loosely-coupled-simulator-processes.html). Component simulators
communicate via [shared-memory message passing](shm-message-passing.html) along
natural component interfaces for data transfers and
[synchronization](accurate-efficient-scalable-synchronization.html). Because
the SimBricks adapters poll their shared memory queues, simulator processes
always appear busy, so the available CPU cores should not be oversubscribed;
the size of a full-system simulation on one host is therefore limited by its
physical cores. When distributing components across multiple physical machines,
some of this communication needs to be implemented over the network instead.
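To see why cores should not be oversubscribed, here is a minimal sketch (not
the SimBricks implementation; names are illustrative) of the busy-polling
pattern an adapter uses on its shared memory queue:

```python
# Minimal sketch (not SimBricks code) of busy-polling a shared-memory-style
# queue: the loop never blocks or sleeps, so the core always appears 100%
# busy even when no messages arrive.
import collections

def poll_adapter(rx_queue, budget):
    """Spin on the queue for `budget` iterations, processing what arrives."""
    processed = []
    for _ in range(budget):
        if rx_queue:                      # non-blocking check, no sleep
            processed.append(rx_queue.popleft())
        # else: spin again immediately -> core stays busy while waiting
    return processed

q = collections.deque(["msg0", "msg1"])
print(poll_adapter(q, budget=1000))  # ['msg0', 'msg1']
```

Because such a loop consumes its core whether or not messages are pending, each
simulator process effectively needs a dedicated physical core.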
# Scale Up By Using Separate Proxy Processes
While communication between two simulator processes on the same host is
implemented with shared memory queues, scaling out simulations to multiple
hosts can, in principle, be accomplished simply by replacing the shared memory
queues with network communication.
However, directly implementing network communication in individual component
simulators has two major drawbacks. First, it increases the complexity of
[integrating](integrating-simulators.html) and developing component simulators,
as each simulator adapter would need to implement an additional message
transport. Second, it increases communication overhead inside component
simulators: with our efficient shared-memory message passing, simulators spend
fewer than 100 cycles sending or receiving a message, while sending or
receiving a message over the network is typically 100x more expensive. Since
bottleneck simulators in particular incur this overhead on the critical path,
this leaves fewer processor cycles for the simulation itself and substantially
increases simulation time.
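A back-of-envelope calculation makes the gap concrete. Using the estimates
above (~100 cycles per shared-memory message, ~100x that over the network) and
assumed values for message rate and clock speed:

```python
# Back-of-envelope: critical-path cost if a bottleneck simulator did network
# sends itself. Cycle counts come from the text; the message rate and clock
# speed are illustrative assumptions, not measured SimBricks numbers.
SHM_CYCLES = 100               # cycles to send/receive via shared-memory queue
NET_CYCLES = 100 * SHM_CYCLES  # network send/receive, ~100x more expensive
msgs_per_sec = 1_000_000       # assumed rate of a busy bottleneck simulator
cpu_hz = 3_000_000_000         # assumed 3 GHz simulation host

shm_frac = SHM_CYCLES * msgs_per_sec / cpu_hz
net_frac = NET_CYCLES * msgs_per_sec / cpu_hz
print(f"core time spent on messaging: shm {shm_frac:.1%}, network {net_frac:.0%}")
```

Under these assumptions, shared-memory messaging costs a few percent of a core,
while doing the network processing in-line would more than saturate it, which
is exactly why the overhead belongs on a separate core.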
# Scale Out with Separate Proxy Processes
To avoid these drawbacks, we instead implement network communication separately
in proxies. SimBricks proxies convert between shared memory message passing and
other message transports, such as TCP or RDMA. Component simulators that connect
to a simulator on another host instead connect to a local proxy instance using
regular SimBricks shared memory message passing. The local proxy then forwards
messages over the network to a proxy on the remote host, which in turn converts
messages back to shared-memory message passing. Since the proxy on each host
runs as a separate process, it requires an additional processor core on each
host.
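The forwarding scheme can be sketched in a few lines. This is a toy
illustration, not the SimBricks proxy: in-process queues stand in for shared
memory channels, a socket pair stands in for the TCP connection, and the
message format is made up.

```python
# Toy sketch of a proxy pair: one side forwards messages from a local
# "shared-memory" queue over a stream socket, the peer converts them back
# into a local queue. Endpoints never see the transport change.
import pickle
import queue
import socket
import struct
import threading

def send_msg(sock, msg):
    """Length-prefix framing so discrete messages survive the byte stream."""
    data = pickle.dumps(msg)
    sock.sendall(struct.pack("!I", len(data)) + data)

def recv_msg(sock):
    (length,) = struct.unpack("!I", sock.recv(4, socket.MSG_WAITALL))
    return pickle.loads(sock.recv(length, socket.MSG_WAITALL))

def tx_proxy(local_q, sock, n):
    # Forward n messages from the local queue to the remote peer proxy.
    for _ in range(n):
        send_msg(sock, local_q.get())

def rx_proxy(sock, local_q, n):
    # Symmetric peer: convert network messages back into a local queue.
    for _ in range(n):
        local_q.put(recv_msg(sock))

# Simulate two hosts with a socket pair; queues stand in for shm channels.
a, b = socket.socketpair()
q_host1, q_host2 = queue.Queue(), queue.Queue()
msgs = [("dma_read", addr) for addr in (0x1000, 0x2000, 0x3000)]
t1 = threading.Thread(target=tx_proxy, args=(q_host1, a, len(msgs)))
t2 = threading.Thread(target=rx_proxy, args=(b, q_host2, len(msgs)))
t1.start(); t2.start()
for m in msgs:
    q_host1.put(m)
t1.join(); t2.join()
a.close(); b.close()
received = [q_host2.get() for _ in msgs]
print(received == msgs)  # True: the transport swap is invisible to endpoints
```

The component simulators on both ends only ever touch their local queues, which
is what makes the proxies transparent.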
![Example showing how to convert a single-host non-distributed SimBricks
simulation into a distributed simulation by assigning hosts to two machines and
inserting proxies for channels that cross machine boundaries.
](/assets/images/blog/distributed-simulations.svg)
However, relying on proxies has two key advantages by moving the implementation
of network communication into a separate process on a separate core. First, this
approach is fully transparent to component simulators and requires no changes in
individual simulators. Second, and more importantly, this moves network
processing overheads out of individual simulators, especially at bottlenecks,
and thereby avoids increasing simulation time in most cases.
SimBricks proxies also implement multiplexing, so that multiple connections of
component simulators between two machines can be handled by the same pair of
proxies. This reduces the number of proxies needed and therefore allows more CPU
cores to be used for simulators.
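The multiplexing idea reduces to tagging each message with a channel ID. The
sketch below is an assumed design for illustration, not SimBricks code, with
in-process queues standing in for shared memory channels:

```python
# Toy sketch: a proxy pair multiplexes several simulator channels over one
# connection by tagging each message with its channel ID, so a single proxy
# core serves all channels that cross the same machine boundary.
import queue

def mux(channel_queues):
    """Drain local per-channel queues into one tagged message stream."""
    wire = []
    for cid, q in channel_queues.items():
        while not q.empty():
            wire.append((cid, q.get()))
    return wire  # in reality: bytes sent over a single TCP/RDMA connection

def demux(wire, channel_queues):
    """Peer proxy: route tagged messages back to the right local channel."""
    for cid, msg in wire:
        channel_queues[cid].put(msg)

# Two channels (e.g. two host-NIC pairs) share one proxy pair.
tx = {0: queue.Queue(), 1: queue.Queue()}
rx = {0: queue.Queue(), 1: queue.Queue()}
tx[0].put("pcie-read"); tx[1].put("pcie-write"); tx[0].put("pcie-irq")
demux(mux(tx), rx)
print([rx[0].get(), rx[0].get(), rx[1].get()])
# -> ['pcie-read', 'pcie-irq', 'pcie-write']
```

Each extra channel costs only a tag on the wire rather than another proxy
process and core.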
At the moment, SimBricks provides two proxy implementations supporting two
protocols for network communication: TCP and RDMA. Surprisingly, we found that
for most synchronized simulations, the lower-latency RDMA proxies provided no
benefit over TCP, as message latency was not a bottleneck. However, additional
proxies can easily be added to support further communication protocols, e.g.
for simulations that benefit from HPC interconnects.
# Orchestrating Distributed Simulations
While this provides users with the necessary building blocks for assembling
even large distributed simulations, manually instantiating and configuring
proxies, assigning simulators to hosts, and so on is tedious and error-prone.
To make this easier, the SimBricks [orchestration
framework](orchestration_framework.html) offers support for automatically
configuring proxies and distributing full-system simulations across multiple
machines. The user first prepares a regular SimBricks simulation configuration,
just as with a non-distributed simulation. Next, the user either relies on
automatic partitioning and distribution to hosts or provides manual assignments
of components to physical hosts for more control. When starting the simulation,
the user supplies a JSON file with information about the available machines,
such as their IP addresses and working directories. From there, the
orchestration framework takes care of instantiating and configuring proxies and
of running all simulators and proxies on the respective machines, using SSH to
execute processes on remote hosts. Finally, the orchestration framework
collects outputs from simulators exactly as with non-distributed SimBricks
simulations.
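As a rough illustration of such a machine description, the snippet below writes
and reads back a small JSON file. The field names are assumptions for
illustration only, not the exact schema the SimBricks orchestration framework
expects:

```python
# Illustrative only: "host" and "workdir" are assumed field names, not the
# exact SimBricks schema. An orchestrator would read a file like this to know
# where to SSH and where each machine's working directory lives.
import json
import tempfile

machines = [
    {"host": "198.51.100.10", "workdir": "/scratch/simbricks/run0"},
    {"host": "198.51.100.11", "workdir": "/scratch/simbricks/run1"},
]

with tempfile.NamedTemporaryFile("w+", suffix=".json", delete=False) as f:
    json.dump(machines, f, indent=2)
    path = f.name

with open(path) as f:
    loaded = json.load(f)
print([m["host"] for m in loaded])  # ['198.51.100.10', '198.51.100.11']
```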
If you have questions or would like to learn more: