> Assuming zero network latency and a bandwidth of 1 Gbps, this would still take around 16 seconds
I’m not seeing how the new design affects the throughput needs, but I’ll say this:
Except for highly controlled environments (OS, NICs etc), you will run into perf issues with UDP-based protocols much sooner than with TCP, even if you’re just pushing zeroes. Packet switching is much more difficult to optimize.
If you only use sporadic messages without backpressure, and you’re willing and able to handle out-of-order messages and retransmission logic, by all means, use UDP. Like for realtime multiplayer games, it makes sense.
For high throughput on diverse platforms and hardware, the story is very different. Yes, even with QUIC. I learnt this the hard way.
All that said, I’m very curious what the results are. Is this design fully deployed, and if so, in what kind of environment and with what traffic patterns? Even better: benchmarks/stress tests would be fantastic.
Thanks for your insights! Yes, real-life behavior is indeed interesting to look at, and for this purpose, two testnets are running right now (https://www.gmonads.com).
RaptorCast uses erasure coding to break a block proposal into smaller pieces with plenty of redundancy to allow for omissions. This means that if you receive sufficiently many chunks, you can decode the block proposal (no matter which of the chunks you received). The redundancy factor can be tweaked, but it’ll likely be >2x, to allow for networking issues and faulty/malicious nodes. Furthermore, the blockchain can make progress as long as >2/3 of the validators receive the block proposal and are honest. This means that at least in theory, you should be able to tolerate a lot of packet losses.
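To make the "any sufficiently many chunks suffice" property concrete, here is a toy sketch using Reed–Solomon-style polynomial evaluation over a prime field. This is not RaptorCast's actual code (it uses Raptor/fountain codes, and the field, symbol sizes, and chunk layout here are invented for illustration), but it demonstrates the decoding guarantee: any k of the n coded chunks recover the original k data symbols.

```python
# Toy erasure-coding sketch: encode k data symbols into n coded chunks so that
# ANY k of them suffice to decode. Illustrative only; not RaptorCast's scheme.

P = 2**31 - 1  # a Mersenne prime; real codes work over small binary fields

def _lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial through the given (xi, yi) points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    """data: k field elements -> n coded chunks (x, y); redundancy factor is n/k."""
    # Treat data as evaluations at x = 0..k-1 and extend evaluation to x = 0..n-1.
    base = list(enumerate(data))
    return [(x, _lagrange_eval(base, x)) for x in range(n)]

def decode(chunks, k):
    """Recover the k data symbols from any >= k surviving (x, y) chunks."""
    pts = chunks[:k]
    return [_lagrange_eval(pts, x) for x in range(k)]

data = [11, 22, 33, 44]              # k = 4 data symbols
coded = encode(data, 12)             # n = 12 -> 3x redundancy
survivors = coded[5:9]               # any 4 of the 12 chunks...
assert decode(survivors, 4) == data  # ...recover the original
```

Here 4 data symbols are expanded into 12 coded chunks (3x redundancy), and it does not matter which 4 chunks survive, mirroring the "no matter which of the chunks you received" property above.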
Re throughput: Monad has 2 blocks/s, each 2 MB in size. So even with a redundancy factor of 3x, each validator only has to send 12 MB per second.
Re backpressure: Not really an option for blockchains. If you have 100 peers and one of them is too slow, what are you going to do? If you apply backpressure and slow down consensus, you slow down the entire blockchain even though most peers are fast. There’s a recent paper about this problem: https://arxiv.org/abs/2410.22080.
What’s important is that the amount of bandwidth required per validator remains constant in RaptorCast, no matter how many validators are part of the network. And you always need just one round-trip to broadcast a block proposal, as opposed to gossip protocols that may involve more steps and have higher latency.
> The redundancy factor can be tweaked, but it’ll likely be >2x
If your packet loss is due to your traffic overwhelming a queue at any intermediate hop, sending more redundant packets would be aggravating the problem instead of solving it.
Are you running this on top of something providing congestion control?
It's a shame, because the datagram paradigm is so much more elegant. In real-world cases you end up having to emulate it by putting length-prefixed data in TCP streams, reducing TCP timeouts, constantly reconnecting sockets (with the latency penalty), etc.
Really, the only thing that's missing from UDP is (optional) backpressure.
A lot of software can handle out-of-order datagrams with no performance penalty (like file uploads, etc.). This is especially annoying when you're operating in an environment with link aggregation where the interface insists on limiting your bandwidth to a single link.
This is one reason I'm still upset about the failure that SCTP ended up being. It really did try to create a new protocol dealing with exactly these issues, but support and ossification basically meant it's a non-starter. I'd have loved for it to be a mandatory part of IPv6 so that it'd eventually get useful support, but I'm pretty sure that would have made IPv6 adoption even worse.
Well, we have QUIC now, which layers over UDP and is functionally strictly superior to SCTP, as SCTP still suffered from head-of-line blocking due to bad acknowledgement design.
As long as you're fine with UDP encapsulation, you can definitely use SCTP today! WebRTC data channels do, for example.
> the only thing that's missing from UDP is (optional) backpressure.
The lack of congestion control seems significant too. Most message-oriented protocols layered on top of UDP end up adding that back at the application layer as a consequence.
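For example, a minimal sender-side pacer of the sort UDP-based protocols end up reimplementing might look like this token bucket (real stacks use AIMD- or BBR-style congestion control driven by acks and loss signals; this sketch only rate-limits outgoing datagrams):

```python
# Token-bucket pacer: a crude stand-in for the congestion control that
# application-layer UDP protocols have to add back themselves.
import time

class TokenBucket:
    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec   # steady-state send rate
        self.capacity = burst_bytes      # maximum burst size
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def try_send(self, nbytes):
        # Refill tokens for the time elapsed since the last check.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True   # caller may put the datagram on the wire
        return False      # caller must queue or drop: the backpressure point

bucket = TokenBucket(rate_bytes_per_sec=1_000_000, burst_bytes=64_000)
ok = bucket.try_send(1200)  # pace a 1200-byte datagram
```

The `try_send` failure branch is exactly where backpressure (or loss) has to be surfaced to the application, which is the work TCP otherwise does for free.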
Looks like there is no mention in the blog post of the paper (poster) [1] in which the two-level broadcast idea was proposed.
[1] https://dl.acm.org/doi/pdf/10.1145/3548606.3563494
The cited paper is indeed foundational, introducing not just two-level broadcast but also optimizations for validator selection and network topology that RaptorCast appears to build upon.
It seems very similar to an earlier Paxos optimization called 'PigPaxos': https://dl.acm.org/doi/10.1145/3448016.3452834
Monad uses RaptorCast to send out block proposals quickly and reliably to a global network of validators. At Category Labs, designing an effective messaging protocol to meet Monad’s high performance requirements was challenging and educational. Read more about the design in the link.