
How we sped up deploys, updates, and undeploys in Magic Containers

Posted by:

Anton Zvonko Gazvoda

May 28, 2026

Magic Containers became programmable with the introduction of the Public API. The next step was making it react in real time.

In practice, operations like deploys and updates could take tens of seconds to complete. Not because anything was failing, but because of how control loops work.

Each part of the system waits for the next reconciliation cycle, and those delays stack up across components.

As workloads become more short-lived and automated, lifecycle speed becomes an increasingly important part of the developer experience. Spinning up a container, scaling it, or tearing it down should feel immediate.

This is the problem we set out to solve.

The problem: control loops introduce latency

Magic Containers is built around a control loop architecture.

Each component continuously reconciles actual state toward desired state:

Application Provisioner → selects regions for deployment
Controller Manager → ensures the desired number of replicas
Scheduler → assigns pods to nodes
Local Container Manager (LCM) → ensures containers run on the assigned node

Each of these components runs independently in a loop:

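import time

while True:
    actual_state = observe_state()
    # Whatever is desired but not yet running (e.g. a set difference)
    diff = desired_state - actual_state
    if diff:
        reconcile(diff)
    # A change made right now is only noticed after this sleep
    time.sleep(interval)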

This model is extremely reliable and forms the backbone of many distributed systems, including Kubernetes.

But it comes with a tradeoff.

Latency compounds across loops.

Each loop runs on an interval of ~5 to 10 seconds.

That means a single operation doesn’t execute immediately. Instead, it waits for the next loop iteration.

Now chain multiple components together:

User action
    → waits for Provisioner loop (up to 10s)
    → waits for Controller loop (up to 10s)
    → waits for Scheduler loop (up to 10s)
    → waits for LCM loop (up to 10s)

In the worst case, four loops at up to 10 seconds each add up to roughly 40 seconds of waiting before a container is fully running.

Even under typical conditions, deploys and updates felt noticeably delayed.
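To make the compounding concrete, here is a small back-of-the-envelope sketch. The 10-second intervals come from the chain above; the "expected" figure rests on an assumption of ours, namely that a change lands at a uniformly random point within each interval, so the average wait per loop is half the interval. The type and function names are illustrative only.

```go
package main

import (
	"fmt"
	"time"
)

// stage is one control loop in the chain and its polling interval.
type stage struct {
	name     string
	interval time.Duration
}

// defaultStages mirrors the four loops described above, each at the
// upper bound of 10 seconds.
func defaultStages() []stage {
	return []stage{
		{"Provisioner", 10 * time.Second},
		{"Controller Manager", 10 * time.Second},
		{"Scheduler", 10 * time.Second},
		{"LCM", 10 * time.Second},
	}
}

// worstCase sums the full interval of every stage: the change arrives
// just after each loop has finished an iteration.
func worstCase(stages []stage) time.Duration {
	var total time.Duration
	for _, s := range stages {
		total += s.interval
	}
	return total
}

// expected assumes uniformly random arrival within each interval, so the
// mean wait per stage is interval/2. This is a modeling assumption.
func expected(stages []stage) time.Duration {
	return worstCase(stages) / 2
}

func main() {
	stages := defaultStages()
	fmt.Println("worst case:", worstCase(stages)) // worst case: 40s
	fmt.Println("expected:  ", expected(stages))  // expected:   20s
}
```

Under that assumption, even an average deploy spends around 20 seconds doing nothing but waiting for loops to wake up.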

Nothing was technically incorrect. The system always converged to the correct state, but the experience lagged behind what modern workflows expect.

What we considered (and rejected)

We explored several approaches before settling on the final solution.

1. Decreasing loop intervals

Reducing intervals from ~10 seconds to sub-second.

Why we rejected it:

Significant increase in CPU usage across all control plane components
Higher pressure on state storage and coordination systems
Still fundamentally polling-based, meaning latency never reaches zero

2. Removing loops entirely

Moving to a purely event-driven system.

Why we rejected it:

Loops are critical as a safety mechanism
They continuously verify and correct drift between desired and actual state
Without them, missed events could lead to permanent inconsistencies

3. Hybrid model (chosen)

Keep loops for correctness and safety, but introduce events for immediacy.

The solution: event-driven acceleration

We introduced a message broker with event queues between components.

The key idea: Loops ensure consistency. Events provide speed.

Instead of waiting for the next loop iteration, components now react immediately when something changes.

Before: loop-driven propagation

[User Action]
↓
(wait for Provisioner loop)
↓
(wait for Controller loop)
↓
(wait for Scheduler loop)
↓
(wait for LCM loop)
↓
[Container Running]

After: event-accelerated flow

[User Action]
↓
[Event: Application Created] → Queue
↓
[Provisioner triggered immediately]
↓
[Event: Provisioning Complete]
↓
[Controller Manager triggered]
↓
[Event: Pods Created]
↓
[Scheduler triggered]
↓
[Event: Pod Scheduled]
↓
[LCM triggered]
↓
[Container Running]

How it works

Each component now listens for specific events and reacts instantly.

Example event

{
    "type": "application.created",
    "app_id": "app_123",
    "regions": ["eu-central", "us-east"],
    "timestamp": 1713949200
}
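For readers who want to consume an event like this, a Go struct along the following lines would decode it. This is a sketch: the field names follow the JSON above, but the struct itself, the parseEvent helper, and the reading of timestamp as Unix seconds are our assumptions rather than a published schema.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Event mirrors the example payload. Timestamp is assumed to be Unix seconds.
type Event struct {
	Type      string   `json:"type"`
	AppID     string   `json:"app_id"`
	Regions   []string `json:"regions"`
	Timestamp int64    `json:"timestamp"`
}

// parseEvent decodes a raw message and rejects events that lack the
// fields a handler would need for routing.
func parseEvent(raw []byte) (Event, error) {
	var e Event
	if err := json.Unmarshal(raw, &e); err != nil {
		return Event{}, fmt.Errorf("decode event: %w", err)
	}
	if e.Type == "" || e.AppID == "" {
		return Event{}, errors.New("event is missing type or app_id")
	}
	return e, nil
}

func main() {
	raw := []byte(`{
		"type": "application.created",
		"app_id": "app_123",
		"regions": ["eu-central", "us-east"],
		"timestamp": 1713949200
	}`)
	e, err := parseEvent(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(e.Type, e.AppID, e.Regions)
	// application.created app_123 [eu-central us-east]
}
```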

Provisioner

func handleApplicationCreated(event Event) {
    regions := selectRegions(event)
    publish("provisioning.completed", regions)
}

Controller Manager

func handleProvisioningComplete(event Event) {
    // desiredReplicas: the replica count from the application's desired state
    createReplicas(event.AppID, desiredReplicas)
    publish("replicas.created", event.AppID)
}

Scheduler

func handleReplicasCreated(event Event) {
    node := selectNode(event)
    publish("pod.scheduled", node)
}

Local Container Manager (LCM)

func handlePodScheduled(event Event) {
    // Start the container on the node the Scheduler picked
    startContainer(event.Node, event.Pod)
}
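To see how the four handlers chain together, the following sketch wires them to a tiny in-memory broker. The Broker type, its synchronous dispatch, and the trace log are stand-ins we made up for illustration; a real message broker delivers asynchronously and the actual payloads carry more data than an app ID. The event names are the ones published by the handlers above.

```go
package main

import "fmt"

// Event is a simplified, hypothetical envelope.
type Event struct {
	Type  string
	AppID string
}

// Broker is an in-memory stand-in for the message broker. It dispatches
// synchronously, which keeps this example deterministic.
type Broker struct {
	handlers map[string][]func(Event)
}

func NewBroker() *Broker {
	return &Broker{handlers: make(map[string][]func(Event))}
}

// Subscribe registers a handler for one event type.
func (b *Broker) Subscribe(eventType string, h func(Event)) {
	b.handlers[eventType] = append(b.handlers[eventType], h)
}

// Publish calls every handler registered for the event's type.
func (b *Broker) Publish(e Event) {
	for _, h := range b.handlers[e.Type] {
		h(e)
	}
}

// runPipeline publishes application.created and returns the components
// that ran, in order. Each component records itself and then publishes
// the event that triggers the next one.
func runPipeline(appID string) []string {
	b := NewBroker()
	var trace []string
	step := func(component, next string) func(Event) {
		return func(e Event) {
			trace = append(trace, component)
			if next != "" {
				b.Publish(Event{Type: next, AppID: e.AppID})
			}
		}
	}
	b.Subscribe("application.created", step("provisioner", "provisioning.completed"))
	b.Subscribe("provisioning.completed", step("controller-manager", "replicas.created"))
	b.Subscribe("replicas.created", step("scheduler", "pod.scheduled"))
	b.Subscribe("pod.scheduled", step("lcm", ""))

	b.Publish(Event{Type: "application.created", AppID: appID})
	return trace
}

func main() {
	fmt.Println(runPipeline("app_123"))
	// [provisioner controller-manager scheduler lcm]
}
```

No component sleeps anywhere in this chain: each step begins as soon as the previous one publishes its result.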

Important: loops still exist

The control loops were not removed.

They still run continuously to:

Detect drift (e.g., crashed containers or missing replicas)
Reconcile inconsistencies
Act as a fallback if events are delayed or lost

// simplified reconciliation loop
for {
    diff := computeDiff(desiredState, actualState)
    if diff != nil {
        reconcile(diff)
    }
    time.Sleep(5 * time.Second)
}

This hybrid design gives us:

Fast reaction time (events)
Strong consistency guarantees (loops)
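One way to picture the hybrid design is a single goroutine that wakes on whichever comes first: an incoming event or the next tick of the loop. The sketch below is an assumption about the general shape, not the actual Magic Containers code; run, firstTrigger, and the string-based events are hypothetical names chosen for the example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// run reconciles immediately when an event arrives and, as a fallback,
// on every tick of the interval. It returns when ctx is cancelled.
func run(ctx context.Context, events <-chan string, interval time.Duration, reconcile func(trigger string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case ev := <-events:
			reconcile("event:" + ev) // fast path
		case <-ticker.C:
			reconcile("loop") // safety net for lost or delayed events
		}
	}
}

// firstTrigger starts a component, optionally sends it one event, and
// reports what caused the first reconciliation.
func firstTrigger(interval time.Duration, sendEvent bool) string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	events := make(chan string, 1)
	triggers := make(chan string, 1)
	go run(ctx, events, interval, func(t string) {
		select {
		case triggers <- t: // keep only the first trigger
		default:
		}
	})
	if sendEvent {
		events <- "application.created"
	}
	return <-triggers
}

func main() {
	// With an event, the component reacts long before the 5s tick.
	fmt.Println(firstTrigger(5*time.Second, true)) // event:application.created
	// Without one, the loop still reconciles on its own schedule.
	fmt.Println(firstTrigger(50*time.Millisecond, false)) // loop
}
```

Both paths call the same reconcile function, so a lost event costs one loop interval of latency and nothing more.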

These events don’t just drive internal components; they also power real-time updates across the platform, including the Dashboard.

From control plane to user experience

Reducing backend latency is only part of the story.

Before this change, even when operations completed, the Dashboard still relied on polling to fetch updates. This introduced an additional delay between something happening in the system and the user actually seeing it.

In practice, this meant:

Deploy finishes → UI updates a few seconds later
Scaling event happens → user sees it after the next refresh cycle

To solve this, we extended the same event-driven model all the way to the frontend.

Real-time updates via WebSockets

We introduced WebSocket-based event streaming between the control plane and the Dashboard.

Instead of polling for state changes, the UI now subscribes to live updates:

Client → opens WebSocket connection
→ subscribes to application events

Whenever something changes:

[Control Plane Event]
↓
[Message Broker]
↓
[WebSocket Gateway]
↓
[Dashboard UI updates instantly]
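The gateway step in this flow is essentially a fan-out: one broker event goes to every Dashboard connection subscribed to that application. The sketch below shows that idea with plain Go channels standing in for WebSocket connections. The transport is deliberately omitted, and Hub, Update, and the decision to skip slow clients are illustrative assumptions rather than a description of the real gateway.

```go
package main

import (
	"fmt"
	"sync"
)

// Update is a hypothetical message pushed to the Dashboard.
type Update struct {
	AppID string
	State string
}

// Hub fans events out to subscribers. Each subscriber channel stands in
// for one WebSocket connection.
type Hub struct {
	mu   sync.Mutex
	subs map[string]map[chan Update]struct{}
}

func NewHub() *Hub {
	return &Hub{subs: make(map[string]map[chan Update]struct{})}
}

// Subscribe registers interest in one application and returns the update
// channel plus a function that removes the subscription.
func (h *Hub) Subscribe(appID string) (<-chan Update, func()) {
	ch := make(chan Update, 16)
	h.mu.Lock()
	if h.subs[appID] == nil {
		h.subs[appID] = make(map[chan Update]struct{})
	}
	h.subs[appID][ch] = struct{}{}
	h.mu.Unlock()

	unsubscribe := func() {
		h.mu.Lock()
		delete(h.subs[appID], ch)
		h.mu.Unlock()
	}
	return ch, unsubscribe
}

// Broadcast delivers u to every subscriber of u.AppID and returns how
// many received it. A subscriber whose buffer is full is skipped so one
// slow client cannot block the others.
func (h *Hub) Broadcast(u Update) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for ch := range h.subs[u.AppID] {
		select {
		case ch <- u:
			delivered++
		default:
		}
	}
	return delivered
}

func main() {
	hub := NewHub()
	updates, unsubscribe := hub.Subscribe("app_123")
	defer unsubscribe()

	hub.Broadcast(Update{AppID: "app_123", State: "running"})
	fmt.Println((<-updates).State) // running
}
```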

What this changes

This removes the final layer of perceived latency.

Before

Backend finishes → UI polls → user sees update later

After

Backend finishes → event emitted → UI updates instantly

Result

Deploy progress updates feel real time
Scaling actions are visible immediately
State transitions (creating → running → scaling) feel continuous

Why this matters

Without this step, the platform would be technically fast but still feel slow.

By pushing events all the way to the UI, we aligned:

System speed
User perception

The impact

By eliminating waiting between steps, we removed the largest source of latency.

Before vs. after

Operation	Before (loop-driven)	After (event-driven)
Deploy	10–40s	< 5s
Update	10–40s	< 4s
Undeploy	~60s+	~60s (grace period)

What changed technically

Removed dependency on loop timing for forward progress
Reduced end-to-end deploy and update latency from 10–40s to under 5s
Maintained correctness via continuous reconciliation

Why this matters

This fundamentally changes how Magic Containers behaves.

CI/CD pipelines speed up: infrastructure is no longer the slowest step
Ephemeral workloads become practical: create → run → destroy flows now complete in seconds
Event-driven systems feel natural: infrastructure now reacts at the same speed as application logic

Tradeoffs and challenges

1. Event ordering

Ensuring correct sequencing across distributed components.

Solution:

Idempotent handlers
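A handler is idempotent when applying the same event twice leaves the system in the same state as applying it once. One common way to get there, sketched below, is to act on the gap between desired and actual state instead of on the event itself. The cluster type and ensureReplicas are hypothetical, and the sketch only scales up; it illustrates the property, not the real Controller Manager.

```go
package main

import "fmt"

// cluster is a hypothetical in-memory view of running replicas per app.
type cluster map[string]int

// ensureReplicas starts only the replicas that are missing and returns
// how many it started. Because it acts on the difference, a duplicate or
// late event finds nothing left to do.
func (c cluster) ensureReplicas(appID string, desired int) int {
	missing := desired - c[appID]
	if missing <= 0 {
		return 0
	}
	c[appID] += missing
	return missing
}

func main() {
	c := cluster{}
	fmt.Println(c.ensureReplicas("app_123", 3)) // 3
	fmt.Println(c.ensureReplicas("app_123", 3)) // 0
	fmt.Println(c["app_123"])                   // 3
}
```

The second call models a redelivered event: it is safe because the handler checks state first instead of blindly creating three more replicas.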

2. Reliability

Events can fail or be delayed.

Solution:

Retry mechanisms
Dead-letter queues
Control loops as fallback
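The first two mechanisms can be sketched in a few lines: try the handler a bounded number of times, and if it keeps failing, park the event in a dead-letter queue for inspection. Everything here is an illustrative assumption, including the deliver and flaky helpers, the slice used as a dead-letter queue, and the absence of backoff between attempts, which a production system would normally add.

```go
package main

import (
	"errors"
	"fmt"
)

type Event struct{ Type, AppID string }

// deliver calls handle up to maxAttempts times. If every attempt fails,
// the event is appended to deadLetters and the last error is returned.
func deliver(e Event, handle func(Event) error, maxAttempts int, deadLetters *[]Event) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = handle(e); err == nil {
			return attempt, nil
		}
	}
	*deadLetters = append(*deadLetters, e)
	return maxAttempts, fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

// flaky returns a handler that fails its first n calls, then succeeds.
func flaky(n int) func(Event) error {
	calls := 0
	return func(Event) error {
		calls++
		if calls <= n {
			return errors.New("temporary failure")
		}
		return nil
	}
}

func main() {
	var dlq []Event
	e := Event{Type: "pod.scheduled", AppID: "app_123"}

	attempts, err := deliver(e, flaky(2), 3, &dlq)
	fmt.Println(attempts, err, len(dlq)) // 3 <nil> 0

	_, err = deliver(e, flaky(5), 3, &dlq)
	fmt.Println(err, len(dlq)) // gave up after 3 attempts: temporary failure 1
}
```

Even a dead-lettered event is not fatal in this architecture: the next control loop iteration will notice the drift and reconcile it.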

3. Observability

Async systems are harder to debug.

Solution:

Correlation IDs
Event tracing across components
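A correlation ID only helps if every follow-up event inherits it from the event that caused it. The sketch below shows that propagation rule. The CorrelationID field, the next helper, and the log format are hypothetical, since the post does not describe the actual event envelope.

```go
package main

import "fmt"

// Event carries a hypothetical CorrelationID assigned when the user
// action first enters the system.
type Event struct {
	Type          string
	AppID         string
	CorrelationID string
}

// next derives a follow-up event that keeps the app and correlation ID,
// so every hop of one deploy can be found with a single search.
func (e Event) next(eventType string) Event {
	return Event{Type: eventType, AppID: e.AppID, CorrelationID: e.CorrelationID}
}

func main() {
	e := Event{Type: "application.created", AppID: "app_123", CorrelationID: "req-42"}
	fmt.Printf("correlation_id=%s type=%s\n", e.CorrelationID, e.Type)
	for _, t := range []string{"provisioning.completed", "replicas.created", "pod.scheduled"} {
		e = e.next(t)
		fmt.Printf("correlation_id=%s type=%s\n", e.CorrelationID, e.Type)
	}
}
```

Every line printed carries correlation_id=req-42, which is what lets a trace viewer or a log search stitch the four components back into one timeline.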

What’s next

We’re already exploring:

Optimizing image download times to reduce startup time
Automated builds and updates directly from your GitHub repository
Access to recent log history alongside live logs for easier troubleshooting

Final thoughts

Magic Containers started as a loop-driven control plane designed for correctness.

With the introduction of event-driven acceleration, it now reacts immediately to changes, without relying on the next reconciliation cycle.

The result is a system that converges just as reliably, but gets there much faster.
