At bunny.net, we keep repeating that we're obsessed with user experience and are on a mission to build a faster internet. Yet our dashboard experience has not quite lived up to either of those goals. Every setting would take between 10 and 30 seconds to update on the CDN or DNS, or in some cases as long as a minute. While this is still faster than many legacy CDNs, where changes can sometimes take minutes or even hours, it just wasn't providing an experience we were happy with.
We use bunny.net extensively internally as well, and we simply were not happy waiting 10, 20, 30, or even 60 seconds for a configuration change to take effect. We wanted to change that. With a goal to push our user experience forward, we went ahead and completely reengineered how we propagate global configuration on every service, on every server around the world.
In 2020, Amazon CloudFront proudly announced that they were slashing configuration propagation times to 5 minutes on average. We've slashed ours to less than a second.
Every product, from CDN and DNS to Storage and Bunny Stream, is now able to apply configuration practically in real time. This means no more waiting and refreshing to check whether the configuration has been applied, just a much smoother user experience.
We're especially thrilled to see this in Bunny DNS. Due to caching at various points of the DNS resolution process, a non-resolved domain can linger in cache for multiple minutes, and there's really not much you can easily do to remove it. With the new instant propagation, any DNS change is instantly replicated everywhere, so you can focus on what you're doing instead of waiting for the change to apply.
Reengineering Configuration Management
Our mantra is to innovate quickly and improve continuously. This is the third and most exciting update to how bunny.net manages configuration across the thousands of servers that we operate.
The Original - Basic Configuration Files
When we started bunny.net quite a few years ago, the configuration process was simple. A process would be responsible for writing raw Nginx configuration files and triggering a configuration reload. It was built around a supervisor process with a polling mechanism that would periodically fetch and configure zones. While this was a great start, it scaled very poorly once we started reaching tens of thousands of configured zones.
The Next Step - Close To Real-Time Polling
Later, we migrated this into Lua, which opened a whole new world of extensibility in Nginx thanks to almost complete scriptability of every aspect of the reverse proxy. With the new system, we wrote the configuration files to disk, then parsed them in Lua and performed all of our logic dynamically based on those files using our own scripts. The parsed configuration would then remain cached in memory for up to 20 seconds, after which it would be reloaded during the next request.
This was a significant stride in scalability as well as configuration change propagation. Compared with legacy CDNs, which could take minutes or even hours to apply a change, we were able to apply changes worldwide in close to real time, but it wasn't perfect.
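To make the 20-second caching behavior concrete, here is a minimal sketch in Python. It is an illustration only, not bunny.net's actual Lua code: `TTLConfigCache`, `loader`, and `clock` are hypothetical names, and the loader stands in for reading and parsing a configuration file from disk.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLConfigCache:
    """Keeps parsed configuration in memory for a fixed time-to-live.

    Hypothetical illustration: `loader` stands in for reading and parsing
    a configuration file from disk.
    """

    def __init__(
        self,
        loader: Callable[[str], Any],
        ttl_seconds: float = 20.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        # zone name -> (time the entry was loaded, parsed configuration)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, zone: str) -> Any:
        """Return the cached configuration, reloading it once the TTL expires."""
        now = self._clock()
        entry = self._entries.get(zone)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Expired or never loaded: the next request pays for the reload.
        config = self._loader(zone)
        self._entries[zone] = (now, config)
        return config
```

With a design like this, a change written to disk becomes visible only after the cached entry expires, which is why propagation in a polling model is bounded by the cache lifetime rather than by network latency.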
Going Forward - Real-Time Global Push
Today, we're excited to introduce our third major change in how we read and replicate configuration that makes the whole process completely real-time. To achieve this, we switched off the periodic configuration polling and replaced it with a real-time message exchange system. For obvious reasons, we decided to go with RabbitMQ.
Now, a single configuration message is pushed from the core API to a fanout RabbitMQ exchange, which automatically relays it to every server in our system registered for that specific configuration line. This allows us to receive and write configuration files worldwide within 50-500 ms on average.
Every exchange is then separately encrypted so that only servers that need this specific configuration are able to read and decode the messages for their own set of requirements.
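The fanout behavior described above can be illustrated with a small in-memory model. This sketch does not use RabbitMQ or any client library, and it leaves out encryption entirely; `FanoutExchange`, `EdgeServer`, and the message fields `zone` and `settings` are hypothetical names chosen for the example. It only shows the core idea: one published message is copied to the queue of every bound server, and servers that are not bound never see it.

```python
import copy
from collections import deque
from typing import Any, Deque, Dict, List


class EdgeServer:
    """Hypothetical edge server with its own inbound message queue."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.queue: Deque[Dict[str, Any]] = deque()
        self.config: Dict[str, Any] = {}

    def drain(self) -> int:
        """Apply every queued message and return how many were applied."""
        applied = 0
        while self.queue:
            message = self.queue.popleft()
            self.config[message["zone"]] = message["settings"]
            applied += 1
        return applied


class FanoutExchange:
    """In-memory stand-in for a fanout exchange: every bound queue gets a copy."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._bound: List[Deque[Dict[str, Any]]] = []

    def bind(self, server: EdgeServer) -> None:
        self._bound.append(server.queue)

    def publish(self, message: Dict[str, Any]) -> int:
        """Deliver an independent copy to each bound queue; return the count."""
        for queue in self._bound:
            queue.append(copy.deepcopy(message))
        return len(self._bound)
```

In this model the publisher sends a single message regardless of how many servers are bound, which mirrors the property that makes a fanout exchange attractive for pushing one configuration change to many consumers.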
The next step was changing how we keep the configuration in memory. Instead of periodically expiring cache, both Nginx and BunnyProxy, our custom-built reverse proxy, now keep cache indefinitely. Once a message arrives from RabbitMQ, it's received by BunnyProxy, which then also writes the configuration files and signals Nginx to trigger a reload of this new configuration.
Thanks to persistent caching, this also increases the performance of our global network, as we no longer have to keep reloading files to check for changes. This noticeably reduced disk reads while accessing configuration and squeezed an extra little bit of performance out of our software stack, and within a busy system, every millisecond counts.
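As a rough illustration of why a push-updated cache reduces disk reads, consider the following Python sketch. It is an assumption-laden model rather than a description of BunnyProxy's internals: `PushUpdatedCache`, `loader`, and `on_message` are hypothetical names, and the loader again stands in for reading a configuration file from disk.

```python
from typing import Any, Callable, Dict


class PushUpdatedCache:
    """Holds configuration indefinitely and replaces it only when pushed.

    Hypothetical illustration: `loader` stands in for a disk read, and
    `on_message` stands in for handling a message from the message broker.
    """

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._entries: Dict[str, Any] = {}
        self.disk_reads = 0  # counts how often the loader had to run

    def get(self, zone: str) -> Any:
        """Serve from memory; touch the loader only on the very first access."""
        if zone not in self._entries:
            self._entries[zone] = self._loader(zone)
            self.disk_reads += 1
        return self._entries[zone]

    def on_message(self, zone: str, config: Any) -> None:
        """Replace the cached entry immediately when an update is pushed."""
        self._entries[zone] = config
```

Unlike a cache with a time-to-live, entries here never expire on their own, so repeated requests cost no additional reads, and an update becomes visible as soon as the pushed message is handled.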
Help us build a faster internet!
At bunny.net, we're obsessed with constantly moving the bar. We continue to try and find ways to make the customer journey just a little bit better every day and, in effect, help build a better internet experience for hundreds of millions of internet users.
If you enjoy what we're doing and would like to help us build a faster internet, make sure to check out our careers page. We're working on some incredible products that we hope will help shape the internet, and we would love to have you on the team. If your position isn't listed and you would like to be a part of what we're doing, make sure to reach out as well.