Commit 0e6aab1e authored by Jonas Kaufmann

blog: revise design-space-sweeps

parent 8cba6750
Pipeline #105902 passed
---
title: Realistic & Fast Design Space Sweeps
subtitle: How SimBricks Enables the Exploration of System- and Component-Level Design Choices When a Physical Testbed is Infeasible
date: 2024-08-29
date: 2024-08-26
author: jonas
permalink: /blog/design-space-sweeps.html
card_image: TODO
---
Designing tomorrow’s heterogeneous systems is anything but straight-forward. We
are faced with many system-level but also component-level design choices.
Experienced system architects can immediately dismiss a bunch of configurations.
However, as Marvin showed in [this prior blog
Designing tomorrow’s heterogeneous systems is anything but straightforward.
System architects face many system- and component-level design choices.
Experienced architects can immediately dismiss many of them.
However, as Marvin highlighted in [this prior blog
post](https://www.simbricks.io/blog/need-for-e2e-simulation.html), even then,
complex interactions between components often make end-to-end performance
impossible to predict and thorough evaluation is pretty much always required.
Unfortunately, realistic physical testbeds that have a similar scale than the
production system we are designing are probably also infeasible.
We built SimBricks for exactly this task. No matter where in the design process
you stand, we allow you to assemble your complete system in simulation and do
thorough, realistic end-to-end evaluation with your actual, unmodified
workloads. Figuring out the best configuration, i.e. performing design space
sweeps, is a first class citizen of our orchestration framework. We even support
running them in parallel if enough compute resources are available. Let me show
you what I mean.
# Working Example & Parameters to Play With
complex interactions between components make final end-to-end performance very
hard to predict. Thorough evaluation is therefore required early on and
throughout the design process to avoid expensive mistakes that are discovered
late. Unfortunately, building a physical testbed for evaluation costs money for
the necessary components and many engineering hours to integrate all the pieces,
only to throw everything away if the design doesn't work out. For large-scale
systems with many components, doing this is completely infeasible.
We built SimBricks to tackle exactly this problem. No matter where in the design
process system architects stand, SimBricks allows them to assemble their
complete system in simulation and do thorough end-to-end evaluation with their
actual workloads and software. Design space sweeps, i.e., figuring out the best
design choices, are a first-class citizen of our orchestration framework. We
even support running them in parallel if enough compute resources are available.
Let me show you what I mean.
# Even Simple Systems Have a Huge Design Space
![Figure showing a heterogeneous system with M clients connected to an external
network and N servers with X hardware accelerators each, which are connected to
an internal network on the other side. There's also a load balancer in the
internal network.](/assets/images/blog/2024-08-28-design-space-sweeps.svg)
This is the system we are going to use as our example. We have M clients
connected to some external network. These send requests to the load balancer in
the internal network. Their requests are then forwarded and served by one of N
servers with X hardware accelerators each. Here, M, N, and X are system-level
design parameters. The external network and clients are fixed but for the
internal network we can freely choose the topology, link speeds, etc.
We also have component-level choices like the number of cores and amount of
memory available at servers, or architectural parameters of our hardware
accelerator, for example clock-speed and the dimensions of the internal compute
array.
To capture more realism, we are also going to add background traffic to the
internal network. We parameterize it in its traffic volume as the percentage of
theoretical max throughput.
# Use the Full Python Machinery to Build SimBricks Experiments!
In a [prior blog post](https://www.simbricks.io/blog/orchestration_framework.html), Hejing illustrates how to easily cast a system design into an experiment in the SimBricks orchestration framework. TL;DR: You just instantiate a few classes. However, there’s no restriction here that forces you to just instantiate one experiment per Python module. Instead, we can construct one for every combination of parameters that we want to evaluate. Since this is Python and we are just instantiating classes, you can use your favorite Python constructs to do so! I decided to go for `itertools.product()` and and a few simple for-loops:
This is the system we are going to use as our working example. We have M clients
connected to an external network. Both are given by the customer and can't be
changed. The clients send requests to the load balancer in the internal network,
which then forwards them to one of N servers with X hardware accelerators each.
Here, N and X are system-level design parameters the system architect can play
with. They can also freely choose what the internal network looks like in terms
of topology, link speeds, etc. Further, we have component-level parameters like
the number of cores and amount of memory available on the servers, and
architectural choices for the hardware accelerators like clock speed and the
dimensions of their inner compute grid. Realistically, both networks are also
going to have background traffic.
Even for this rather simple system, we can already ask a bunch of questions that
need evaluation for reliable answers: Given that the customer wants to have M
clients, how many servers N do we need to achieve the service-level objectives,
for example a guaranteed maximum request latency? Can we reduce the number of
servers required by introducing hardware accelerators? How is all this
influenced by background traffic? Can we reduce costs for building the internal
network by prioritizing client-server traffic over background traffic with the
help of smart network switches?
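To get a feeling for how quickly such a design space grows, here is a minimal sketch that counts and enumerates configurations. The option values mirror the ones used in the sweep script of this post. The dictionary layout and the helper names `count_configurations` and `enumerate_configurations` are assumptions made for illustration and are not part of SimBricks.

```python
import itertools
import math

# Option lists mirroring the sweep script in this post.
design_space = {
    "num_clients": [4, 16, 128],
    "num_servers": [1, 2, 4, 8],
    "num_accel_per_server": [1, 2],
    "accel_clk_freq": [100, 400],
    "background_traffic": [0.5, 0.8],
}


def count_configurations(space):
    """Size of the full cross product: the product of all option counts."""
    return math.prod(len(options) for options in space.values())


def enumerate_configurations(space):
    """Yield one dict per parameter combination (hypothetical helper)."""
    names = list(space)
    for values in itertools.product(*space.values()):
        yield dict(zip(names, values))


print(count_configurations(design_space))  # 3 * 4 * 2 * 2 * 2 = 96

# Every additional parameter multiplies the total. Three hypothetical
# link-speed options would already triple it.
extended = {**design_space, "link_speed_gbps": [10, 40, 100]}
print(count_configurations(extended))  # 288
```

Even with only two to four options per parameter, this small example already yields 96 experiments, which is why automating and parallelizing sweeps matters.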
# Let's Do Some Evaluation with SimBricks!
To simulate the system we just saw with SimBricks, we need a simulator for each
component. You decide the level of detail you need here! For the hardware
accelerator, we can quickly write up a behavioral model in C++, which already
allows us to answer what-if questions. But most importantly, we are going to run
the actual software and workloads of our customer to measure the end-to-end
properties we care about.
For building the simulation, you write a Python script for the SimBricks
orchestration framework that describes the system you want to simulate and which
simulators to use. In this [prior blog
post](https://www.simbricks.io/blog/orchestration_framework.html), Hejing
illustrates such a script.
However, there’s no restriction here that forces us to just instantiate one
experiment per Python module. Instead, we can construct one for every
combination of parameters that we want to evaluate. Since this is Python and we
are just instantiating classes, feel free to use your favorite Python constructs
to do so! I decided to go for `itertools.product()` and a few simple for-loops:
```python
from simbricks.orchestration import experiments as exp
......@@ -63,24 +88,24 @@ num_clients_opts = [4, 16, 128]
num_servers_opts = [1, 2, 4, 8]
num_accel_per_server_opts = [1, 2]
accel_clk_freq_opts = [100, 400]
background_load_opts = [0.5, 0.8]
background_traffic_opts = [0.5, 0.8]
for (
    num_clients,
    num_servers,
    num_accelerators,
    accel_clk_freq,
    background_load,
    background_traffic,
) in itertools.product(
    num_clients_opts,
    num_servers_opts,
    num_accel_per_server_opts,
    accel_clk_freq_opts,
    background_load_opts,
    background_traffic_opts,
):
    experiment = exp.Experiment(
        f"<experiment_name>-{num_servers}s-{num_clients}c-"
        f"{num_accelerators}x-{accel_clk_freq}-{background_load}"
        f"{num_accelerators}x-{accel_clk_freq}-{background_traffic}"
    )
    # Instantiate external & internal network, add background traffic
......@@ -103,8 +128,17 @@ for (
# Parallel Design Space Sweeps
# Fast Design Space Sweeps by Running Experiments in Parallel
In SimBricks, individual experiments can be run independently and thereby in parallel to do fast design space sweeps. Our orchestration framework even automates this for you if you invoke `simbricks-run` with the `--parallel` flag. Parallelizing on the same machine isn’t always possible though. Due to how we establish communication between simulators in the form of shared memory queues, which use active polling for maximum efficiency (learn more about this [here](https://www.simbricks.io/blog/shm-message-passing.html)), no simulator can share a physical thread with another or else simulations become very slow.
To make design space sweeps faster, SimBricks allows you to run experiments in
parallel. Our orchestration framework even automates this if you invoke
`simbricks-run` with the `--parallel` flag. Parallelizing on the same machine
isn’t always possible though. Simulators communicate through shared memory
queues, which use polling for maximum efficiency (learn more about this
[here](https://www.simbricks.io/blog/shm-message-passing.html)). As a result,
simulators mustn't share physical threads, or else simulations become very
slow.
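The constraint above boils down to simple arithmetic. The following rough sketch estimates how many experiments fit on one machine at the same time. It assumes one simulator process per client, per server, and per accelerator, plus two more for the external and internal network. This counting, the helper names, and the core counts are illustrative assumptions, not how SimBricks schedules experiments.

```python
def simulators_per_experiment(
    num_clients, num_servers, num_accel_per_server, extra_simulators=2
):
    """Estimate the simulator processes of one experiment.

    Assumption: one process per client, per server, and per accelerator,
    plus `extra_simulators` for the external and internal network.
    """
    return num_clients + num_servers * (1 + num_accel_per_server) + extra_simulators


def max_parallel_experiments(available_cores, sims_per_experiment):
    """Experiments that fit at once if every simulator gets its own core."""
    return available_cores // sims_per_experiment


small = simulators_per_experiment(num_clients=4, num_servers=2, num_accel_per_server=1)
large = simulators_per_experiment(num_clients=128, num_servers=8, num_accel_per_server=2)

print(small, max_parallel_experiments(64, small))  # 10 6
print(large, max_parallel_experiments(64, large))  # 154 0
```

Under these assumptions, a hypothetical 64-core machine runs six of the small configurations side by side, while the largest configuration does not fit on it at all.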
But even in this case, our orchestration framework offers distributed simulations, where simulations are run on multiple machines in parallel. Stay tuned for more about this! Until then:
But even in this case, and to explore even more design choices in parallel, our
orchestration framework offers distributed simulations, where experiments are
run on multiple machines. Stay tuned for more on this! Until then: