The Disconnected Edge: How We Solved In-Flight Data Sync at 35,000 Feet
When most engineers think about deploying a modern application, they imagine a familiar setup: cloud infrastructure, global CDNs, auto-scaling services, and near-constant internet connectivity.
The architecture patterns are well understood. If something goes wrong, the application can usually reach another service, retry a request, or fetch fresh data from the cloud.
But what happens when your application spends most of its life completely disconnected from the internet?
That's the challenge we faced while building a next-generation in-flight entertainment (IFE) platform.
And it forced us to rethink some of our most fundamental assumptions about distributed systems.
The Reality of Running Software on an Aircraft
Our platform ran on embedded edge devices installed onboard commercial aircraft.
These devices powered the entire passenger experience:
Movies and TV shows
Games
Digital publications
E-commerce catalogs
Passenger analytics
Operational data collection
Unlike a typical web application, however, these systems couldn't rely on continuous connectivity.
For most of their operational life, they were completely isolated from the internet.
Yet every day brought new content, software updates, pricing changes, configuration updates, and analytics data that needed to move between a central backend and hundreds of aircraft spread around the world.
Building the application wasn't the hard part.
Keeping everything synchronized was.
When Cloud-Native Assumptions Stop Working
Modern software architecture often assumes the network is always available.
Applications are designed around:
Real-time APIs
Continuous synchronization
Cloud-hosted storage
Immediate access to backend services
Our environment broke every one of those assumptions.
Aircraft devices regularly experienced:
Extended periods with no connectivity
Short and unpredictable synchronization windows
Limited compute resources
Restricted power budgets
High reliability requirements
A failed synchronization could never be allowed to leave a device in a broken state. Passengers still expected the system to work regardless of whether it had connected to the backend recently.
That changed our design philosophy completely.
Designing for Offline First
Instead of treating connectivity loss as an exception, we treated it as the default operating mode.
Every aircraft device maintained its own local copy of everything it needed to operate:
Application assets
Media content
Configuration data
Transaction queues
Analytics events
The backend remained the source of truth, but the aircraft could continue operating independently for days or even weeks if necessary.
Connectivity became a bonus rather than a requirement.
This single mindset shift influenced almost every architectural decision that followed.
Synchronization Became the Product
In many systems, synchronization is a background task.
For us, synchronization became one of the most critical parts of the platform.
The workflow was intentionally simple:
The backend generated update manifests.
Devices checked for available updates whenever connectivity existed.
Only changed assets were downloaded.
Downloaded content was verified.
Updates were applied atomically.
Analytics and operational data were uploaded back to the backend.
The goal wasn't speed.
The goal was reliability.
If a network connection dropped halfway through a transfer, the device needed to recover gracefully and continue from where it left off.
No corruption. No inconsistent state. No manual intervention.
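One common way to get the "applied atomically" property on a POSIX-style filesystem is to stage each release in its own directory and then switch a single pointer file with one rename. The sketch below illustrates that idea only; it is not the platform's actual code, and the directory layout, the `current.json` pointer, and the function names `apply_update` and `active_version` are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, Optional


def apply_update(root: Path, version: str, files: Dict[str, bytes]) -> None:
    """Stage a release under root/releases/<version>, then switch to it atomically."""
    release_dir = root / "releases" / version
    release_dir.mkdir(parents=True, exist_ok=True)

    # 1. Stage every file. A crash here leaves the active release untouched.
    for name, data in files.items():
        (release_dir / name).write_bytes(data)

    # 2. Write the new pointer to a temp file in the same directory,
    #    force it to disk, then rename it over the old pointer.
    fd, tmp_path = tempfile.mkstemp(dir=root, prefix=".current-")
    with os.fdopen(fd, "w") as tmp:
        json.dump({"active": version}, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    # os.replace is atomic on POSIX when source and target share a filesystem.
    os.replace(tmp_path, root / "current.json")


def active_version(root: Path) -> Optional[str]:
    """Return the version the pointer file names, or None before any update."""
    pointer = root / "current.json"
    if not pointer.exists():
        return None
    return json.loads(pointer.read_text())["active"]
```

Because readers only ever follow `current.json`, an interrupted update leaves either the old pointer or the new one in place, never a half-written mix. The previous release directory stays on disk, so it can serve as the last known-good state.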
The Challenge of Moving Large Media Libraries
One of the biggest engineering challenges involved content distribution.
A typical update might contain:
Video files
Images
Application bundles
Configuration changes
Re-downloading everything every time simply wasn't realistic.
Bandwidth was limited, synchronization windows were unpredictable, and some assets were extremely large.
To make transfers practical, we relied heavily on:
Manifest-driven synchronization
Content hashing
Incremental downloads
Compression
Delta updates where possible
The principle was straightforward:
Transfer only what changed.
That simple rule dramatically reduced synchronization times and bandwidth usage across the fleet.
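As a rough illustration of "transfer only what changed", the sketch below compares a manifest against the files already on a device and works out what to download, keep, and remove. It assumes a manifest shaped as a mapping from relative path to SHA-256 hex digest; that format, and the names `sha256_of` and `plan_sync`, are assumptions made for this example rather than the platform's real schema.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_sync(manifest: Dict[str, str], local_root: Path) -> Dict[str, List[str]]:
    """Compare a manifest (relative path -> expected SHA-256) with local files."""
    download: List[str] = []
    keep: List[str] = []
    for rel_path, expected in sorted(manifest.items()):
        local = local_root / rel_path
        if local.is_file() and sha256_of(local) == expected:
            keep.append(rel_path)      # content already matches, skip the transfer
        else:
            download.append(rel_path)  # missing locally or content differs

    # Local files the manifest no longer lists can be removed after the update.
    present = {
        p.relative_to(local_root).as_posix()
        for p in local_root.rglob("*")
        if p.is_file()
    }
    remove = sorted(present - set(manifest))
    return {"download": download, "keep": keep, "remove": remove}
```

The same hashes do double duty: they decide what to fetch, and they can verify each file after it arrives. A real device would likely cache the digests instead of re-hashing a large media library on every sync.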
Building for Unreliable Networks
Connectivity quality varied enormously.
Sometimes devices synchronized over stable networks.
Other times they relied on slow, intermittent cellular connections.
The synchronization system had to assume failure could happen at any point.
To handle this, we built in:
Retry queues
Checkpointed downloads
Idempotent operations
Integrity verification
Automatic recovery mechanisms
One of the most important requirements was ensuring that a failed synchronization never damaged the existing system.
If an update couldn't be completed successfully, the device simply continued operating from its last known-good state.
Passengers never noticed the difference.
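Checkpointed downloads and retries can be sketched in a few lines. The version below is a simplified illustration under stated assumptions, not our production transfer code: `fetch_range(offset, length)` is a hypothetical callable standing in for whatever transport returns a byte range (for example an HTTP range request), and the `.part` suffix and parameter names are invented for the example.

```python
import time
from pathlib import Path
from typing import Callable


def resumable_download(
    fetch_range: Callable[[int, int], bytes],
    dest: Path,
    total_size: int,
    chunk_size: int = 64 * 1024,
    max_attempts: int = 5,
    base_delay: float = 0.5,
) -> None:
    """Download into <dest>.part, resuming from whatever is already on disk."""
    part = dest.with_name(dest.name + ".part")
    part.touch(exist_ok=True)
    attempts = 0
    while True:
        # The checkpoint is simply the size of the partial file.
        offset = part.stat().st_size
        if offset >= total_size:
            break
        try:
            data = fetch_range(offset, min(chunk_size, total_size - offset))
            if not data:
                raise ConnectionError("empty response")
            with part.open("ab") as f:
                f.write(data)
            attempts = 0  # progress was made, so reset the retry budget
        except (ConnectionError, TimeoutError):
            attempts += 1
            if attempts >= max_attempts:
                raise  # give up for this window; the .part file survives
            time.sleep(base_delay * 2 ** (attempts - 1))  # exponential backoff
    # Only a complete file ever appears under the final name.
    part.replace(dest)
```

Calling the function again after a failure picks up at the saved offset, so the call itself is safe to repeat. An integrity check, such as comparing the finished file's hash with the manifest, would normally sit just before the final rename.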
Observability Without Real-Time Access
One of the more interesting challenges wasn't deployment.
It was debugging.
Most modern observability platforms assume engineers can access telemetry in real time.
That wasn't possible for us.
Aircraft devices spent long periods disconnected, which made live monitoring impractical.
Instead, devices accumulated operational information locally:
Structured logs
Health metrics
Synchronization reports
Diagnostic data
Whenever connectivity became available, this information was uploaded and aggregated centrally.
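A store-and-forward buffer is one simple way to implement this pattern. The sketch below uses SQLite as the durable local queue purely as an assumption for illustration; the `TelemetryBuffer` class, its table layout, and the `upload` callable are hypothetical and do not describe the platform's real telemetry pipeline.

```python
import json
import sqlite3
import time
import uuid
from typing import Callable, List


class TelemetryBuffer:
    """Durable local queue: record events offline, flush them when a link is up."""

    def __init__(self, db_path: str) -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
            "id TEXT UNIQUE, recorded_at REAL, body TEXT)"
        )
        self.db.commit()

    def record(self, kind: str, **fields) -> str:
        """Persist one event locally and return its unique id."""
        event_id = str(uuid.uuid4())
        body = json.dumps({"kind": kind, **fields})
        self.db.execute(
            "INSERT INTO events (id, recorded_at, body) VALUES (?, ?, ?)",
            (event_id, time.time(), body),
        )
        self.db.commit()
        return event_id

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM events").fetchone()[0]

    def flush(self, upload: Callable[[List[dict]], None], batch_size: int = 100) -> int:
        """Upload oldest-first in batches; delete a batch only after upload returns."""
        sent = 0
        while True:
            rows = self.db.execute(
                "SELECT seq, id, recorded_at, body FROM events ORDER BY seq LIMIT ?",
                (batch_size,),
            ).fetchall()
            if not rows:
                return sent
            batch = [
                {"id": r[1], "recorded_at": r[2], **json.loads(r[3])} for r in rows
            ]
            upload(batch)  # may raise; the rows then stay queued for the next window
            self.db.execute("DELETE FROM events WHERE seq <= ?", (rows[-1][0],))
            self.db.commit()
            sent += len(rows)
```

If the link drops after the backend has accepted a batch but before the local delete commits, that batch is sent again on the next flush. The per-event `id` exists so the receiving side can discard such duplicates.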
The result wasn't real-time observability.
It was delayed observability.
And that required a different mindset when diagnosing production issues.
What We Learned
Building systems for disconnected environments teaches lessons that are easy to overlook in cloud-first architectures.
Connectivity Is a Feature, Not a Guarantee
Many applications assume the network will always be available.
We learned to assume the opposite.
Systems designed to function without connectivity are often more resilient overall.
Idempotency Is Non-Negotiable
When networks are unreliable, retries become routine.
Every operation must be safe to execute multiple times.
Without idempotency, synchronization quickly becomes a source of inconsistency and failure.
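A minimal sketch of the idea, assuming every queued operation carries a unique ID assigned on the device: the receiver remembers which IDs it has already applied and ignores repeats. The `IdempotentInbox` class and the purchase-style fields are hypothetical, and a real backend would keep the set of seen IDs in durable storage instead of in memory.

```python
from typing import Dict, Iterable, Set, Tuple


class IdempotentInbox:
    """Apply each uniquely identified operation at most once."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.totals: Dict[str, int] = {}

    def apply(self, op_id: str, item: str, quantity: int) -> bool:
        """Return True if the operation was applied, False if it was a repeat."""
        if op_id in self.seen:
            return False  # a retry of something already processed: do nothing
        self.totals[item] = self.totals.get(item, 0) + quantity
        self.seen.add(op_id)
        return True

    def apply_batch(self, ops: Iterable[Tuple[str, str, int]]) -> int:
        """Apply a batch of (op_id, item, quantity) and count the new operations."""
        return sum(self.apply(op_id, item, qty) for op_id, item, qty in ops)
```

With this in place, a device that is unsure whether an upload succeeded can simply send the batch again; the totals end up the same either way.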
Local-First Architectures Are Surprisingly Powerful
Giving edge devices autonomy reduces dependency on centralized infrastructure.
It improves user experience and limits the blast radius of failures.
In many cases, it produces a more robust system than one that depends heavily on constant connectivity.
Simplicity Wins
The most reliable parts of our platform weren't the most sophisticated.
They were the simplest.
Clear manifests, atomic updates, strong validation, and predictable workflows consistently outperformed more complex approaches.
Reliability often comes from reducing complexity rather than adding it.
Final Thoughts
A lot of modern engineering focuses on scaling cloud infrastructure.
But some of the most interesting distributed systems problems exist far from the cloud, at the edge, where connectivity is intermittent, resources are constrained, and reliability matters more than raw throughput.
Building software for aircraft forced us to rethink assumptions about deployment, networking, observability, and synchronization.
The result was a platform capable of operating independently for extended periods while staying synchronized with a centralized backend whenever connectivity became available.
And while the environment was unique, the lessons weren't.
Whether you're building for aircraft, ships, remote industrial sites, IoT devices, or any other disconnected environment, the same principle applies:
Design for a world where the network isn't guaranteed.
Everything else becomes easier after that.


