title: "Challenges in Evaluating New Data Center Network Protocols"
    How can virtual prototyping improve
    the process of testing and evaluating
    new protocols for large-scale networks?
date: 2024-10-16
author: marvin
permalink: /blog/homa-evaluation-challenges.html
card_image: /assets/images/blog/homa-evaluation-challenges-banner.png
---

With the rapid development of computer systems and the ever-growing amounts of
data that modern applications process, the requirements for data centers are
also constantly evolving. The growing communication demands require operators to
rethink their network infrastructure, from network hardware components to
network protocols.

For example, the network protocol Homa takes a new approach and focuses on
connectionless, message-oriented remote procedure calls (RPCs), as opposed to
the widely used connection-oriented protocol TCP. Protocols that take new
approaches can outperform old, well-established ones, but adopting them is a
hurdle, since it requires extensive testing and intrusive changes in the data
center. In the following, we look at the challenges of testing and adopting a
new network protocol in a data center, using Homa as an example. We also touch
upon how SimBricks can help overcome these challenges by providing virtual
prototyping.


# The Homa Network Protocol

The authors of the Homa protocol recognized that despite the increasing amount
of data being exchanged in a data center, a non-negligible part of common
workloads consists of small messages. Further, they found that widely deployed
protocols like TCP focus on large messages rather than small ones, which
results in poor performance for small messages, especially with regard to
latency. With a connection-oriented protocol that is based on flows, small
messages can experience head-of-line blocking because they get stuck behind
larger messages in the flow. To improve tail latencies for small messages, Homa
approximates SRPT (shortest remaining processing time first) with a
receiver-driven approach, in which the receiver informs the sender which message
to prioritize when sending data. To that end, Homa is connectionless and
message-based, and makes use of priority queues provided by modern network
switches.
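
To make the effect of head-of-line blocking concrete, the following minimal
sketch compares first-in-first-out scheduling with ideal SRPT on a single link
that sends one byte per time unit. It is not Homa itself: the function name
`simulate`, the message sizes, and the arrival times are made up for
illustration, and the sketch ignores the receiver-driven mechanism and the
switch priority queues that Homa uses to approximate SRPT.

```python
import heapq


def simulate(messages, policy):
    """Return completion times on one link that sends 1 byte per time unit.

    messages: list of (arrival_time, size_in_bytes, name) tuples.
    policy:   "fifo" serves messages in arrival order without preemption,
              "srpt" always serves the message with the fewest bytes left.
    """
    if policy not in ("fifo", "srpt"):
        raise ValueError(f"unknown policy: {policy}")

    pending = sorted(messages)  # ordered by arrival time
    ready = []                  # heap of (key, seq, name, remaining_bytes)
    done = {}
    now, i, seq = 0, 0, 0

    while i < len(pending) or ready:
        if not ready:
            # Link is idle: jump to the next arrival.
            now = max(now, pending[i][0])
        # Admit everything that has arrived by now.
        while i < len(pending) and pending[i][0] <= now:
            _, size, name = pending[i]
            key = size if policy == "srpt" else seq
            heapq.heappush(ready, (key, seq, name, size))
            seq += 1
            i += 1

        key, order, name, remaining = heapq.heappop(ready)
        # Send until the message is finished or the next message arrives.
        next_arrival = pending[i][0] if i < len(pending) else float("inf")
        sent = min(remaining, next_arrival - now)
        now += sent
        remaining -= sent

        if remaining == 0:
            done[name] = now
        else:
            # Under SRPT the key shrinks with the remaining bytes, so a newly
            # arrived short message can overtake. Under FIFO the key is
            # unchanged, so the same message continues.
            new_key = remaining if policy == "srpt" else key
            heapq.heappush(ready, (new_key, order, name, remaining))

    return done


# A small message arrives just after a large one has started sending.
workload = [(0, 1000, "large"), (1, 10, "small")]
print(simulate(workload, "fifo"))
print(simulate(workload, "srpt"))
```

In this toy workload the small message finishes at time 1010 under FIFO but at
time 11 under SRPT, while the large message is delayed by only 10 time units.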


# Challenges Of Integrating Homa

Because Homa takes a different approach compared to widely used protocols and
makes use of special features of network switches, implementing it in a data
center requires extensive changes to both hardware and software. The
message-based nature of the protocol dictates a new API that focuses on the
request-response pattern typical for RPCs, so applications have to be adapted
before they can use Homa. Furthermore, Homa needs network switches that support
priority queues, and each switch in the data center that carries Homa traffic
has to be configured. Additionally, the Homa Linux kernel module has to be
installed and configured on each host. Consequently, deploying Homa in a data
center is complicated: it requires intrusive changes and possibly even buying
and installing new hardware.

![Overview of Homa evaluation
challenges](/assets/images/blog/homa-evaluation-challenges.svg)

# Testing Homa In a Data Center

Homa comes with a variety of configuration options that let the user tune the
protocol to the workload and to the specific setup, such as the available link
bandwidth. This gives the user a large configuration space to explore.
Additionally, the user has to test the integration of the application with
Homa, specifically with its API. Because integrating Homa in a data center is
so complex, it is difficult to test the protocol at scale. Even on a smaller
testbed, configuring all of the involved software and hardware can already be
time-consuming, especially when exploring many different configuration options.
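
To give a feeling for how quickly such a configuration space grows, the sketch
below enumerates every combination of a few options. The option names and
values in `SEARCH_SPACE` are hypothetical placeholders chosen for illustration;
they are not taken from Homa's documentation.

```python
from itertools import product

# Hypothetical options and values, for illustration only.
SEARCH_SPACE = {
    "link_gbps": [10, 25, 100],
    "priority_levels": [2, 4, 8],
    "hosts": [2, 8, 32],
}


def enumerate_configs(space):
    """Yield one dict per combination of option values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(enumerate_configs(SEARCH_SPACE))
print(len(configs), "configurations to test")
print(configs[0])
```

Three options with three values each already yield 27 setups, and every
additional option multiplies that number again.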


# Virtual Prototyping For Testing And Developing

Virtual prototyping helps to overcome these challenges by allowing data center
engineers to create a virtual replica of the data center. This replica serves
as a testbed for evaluating the performance of Homa before deploying it in the
actual network. The virtual replica makes it easy to change the various
configuration options for all components at once or only for a few selected
ones. Additionally, the data center engineer can scale the virtual prototype
from a small testbed to a large-scale system using [distributed
simulations](distributed-simulations.html). Finally, software developers can
use the virtual prototype to adapt and test applications with the new network
protocol, checking that the integration works and performs as expected before
anything is deployed.
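
The idea of changing options everywhere or only for selected components can be
sketched with plain dictionaries: global defaults plus per-component overrides.
This is a generic illustration and not the SimBricks API; the function
`resolve_config`, the component names, and the option names are all
hypothetical.

```python
def resolve_config(defaults, overrides, component):
    """Return the effective settings for one component.

    Per-component overrides take precedence over the global defaults.
    """
    return {**defaults, **overrides.get(component, {})}


# Hypothetical option and component names, for illustration only.
defaults = {"link_gbps": 25, "priority_levels": 8}
overrides = {"host3": {"link_gbps": 100}}

for component in ("host1", "host3"):
    print(component, resolve_config(defaults, overrides, component))
```

Editing `defaults` changes every component at once, while an entry in
`overrides` affects only the component it names.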


If you have any questions or would like to learn more, please do not hesitate to
reach out: