
Most Cloud Migrations Create the Downtime They Were Supposed to Fix

Cloud Migration Planning

Gartner puts the number at eighty-three percent. That’s the share of data migration projects that either fail outright or blow past their budgets and timelines. It’s a strange number to sit with when you consider that most of those projects were started specifically to make systems more reliable, and yet the migration itself became the thing that broke them.

The downtime costs behind that statistic aren’t theoretical either. ITIC’s most recent survey found that ninety-one percent of mid-sized businesses put the cost of a single hour of unplanned downtime at three hundred thousand dollars or more. Forty-one percent put it between one and five million. For smaller operations, the figure is lower in absolute terms, but the damage relative to revenue can be worse. A company doing half a million dollars a year that loses two thousand dollars for every hour of an outage feels that loss in a way that a Fortune 500 company absorbing five hundred thousand an hour doesn’t.
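
The relative-impact comparison in the paragraph above is simple division, sketched below in Python. The small-business figures come from the article; the article gives no revenue for the large company, so the ten-billion-dollar annual revenue used here is a hypothetical assumption chosen only to show the difference in scale.

```python
def hourly_loss_share(loss_per_hour: float, annual_revenue: float) -> float:
    """Return the fraction of annual revenue lost per hour of downtime."""
    if annual_revenue <= 0:
        raise ValueError("annual_revenue must be positive")
    return loss_per_hour / annual_revenue


# Small business: figures taken from the article.
small = hourly_loss_share(2_000, 500_000)

# Large company: the $500,000 hourly loss is from the article, but the
# $10 billion annual revenue is a hypothetical assumption for illustration.
large = hourly_loss_share(500_000, 10_000_000_000)

print(f"Small business: {small:.3%} of annual revenue per hour")
print(f"Large company:  {large:.3%} of annual revenue per hour")
print(f"Relative impact: about {small / large:.0f}x")
```

Under these assumptions, each hour of downtime costs the smaller company 0.4% of a year’s revenue, roughly eighty times the relative hit the larger company takes.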

So the question isn’t really whether to migrate off legacy systems. For most businesses, that decision is already made or overdue. The question is how to do it without becoming part of the eighty-three percent, and the answer is less about the technology than most people think.

What actually goes wrong in the migrations that fail

The assumption most companies walk in with is that cloud migration is a purely technical operation: you pick up what’s here, put it over there, and it works. Lift and shift is the term, and it’s the approach that creates most of the problems.

British bank TSB is probably the most expensive example in recent memory. When it migrated its banking platform in 2018, inadequate testing before the switchover led to system failures that locked customers out of their accounts for weeks. The final cost came to roughly three hundred million pounds in compensation, remediation, and lost business. A retail company that didn’t properly test its database migration went dark for seventy-two hours and lost about 1.2 million pounds in revenue during the outage. An e-commerce operation lost two million dollars during a single failed weekend migration attempt. A financial services firm’s trading platform went down during market hours, and the cost came to ten million.

These aren’t stories about bad technology. They’re stories about insufficient planning and testing. In each case the migration itself introduced the instability it was supposed to eliminate, which is the irony at the center of that eighty-three percent figure.
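
Much of the pre-cutover testing that catches these failures is unglamorous reconciliation: confirming that every record in the legacy system arrived in the new one unchanged. Here is a minimal sketch in Python, assuming both systems can export rows as dictionaries with a shared primary key. The function names and sample data are hypothetical, and a real migration would stream rows from each database rather than hold them all in memory.

```python
import hashlib
from typing import Any, Dict, Iterable, List, Mapping

Row = Mapping[str, Any]


def row_digest(row: Row) -> str:
    """Hash a row's contents; columns are sorted so column order is ignored."""
    canonical = "|".join(f"{col}={row[col]!r}" for col in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source: Iterable[Row], target: Iterable[Row], key: str) -> Dict[str, Any]:
    """Compare legacy (source) and migrated (target) rows by primary key."""
    src = {row[key]: row_digest(row) for row in source}
    tgt = {row[key]: row_digest(row) for row in target}
    missing: List[Any] = sorted(src.keys() - tgt.keys())     # lost in migration
    unexpected: List[Any] = sorted(tgt.keys() - src.keys())  # exist only in target
    mismatched: List[Any] = sorted(
        k for k in src.keys() & tgt.keys() if src[k] != tgt[k]  # content changed
    )
    return {
        "missing": missing,
        "unexpected": unexpected,
        "mismatched": mismatched,
        "ok": not (missing or unexpected or mismatched),
    }


# Hypothetical sample data: one row altered and one row dropped in transit.
legacy_rows = [
    {"id": 1, "balance": 100},
    {"id": 2, "balance": 250},
    {"id": 3, "balance": 75},
]
migrated_rows = [
    {"id": 1, "balance": 100},
    {"id": 2, "balance": 205},
]

report = reconcile(legacy_rows, migrated_rows, key="id")
print(report)
```

Because the digest uses each value’s `repr`, a silent type change (say, `100` becoming `100.0`) is also flagged as a mismatch, which is usually what you want to hear about before cutover.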

Even Netflix, which is now held up as one of the gold-standard cloud migration cases, only committed to the cloud after a three-day database corruption incident in its own data center, and even then it moved incrementally over several years before it got the approach right. And it had engineering resources that most companies can only dream about.

The other number worth knowing is that seventy-four percent of companies have moved at least one application back to on-premise after migrating it to the cloud. That’s not failure in every case; sometimes it’s a legitimate architectural decision, but in most cases, it’s a sign that the migration wasn’t planned well enough to work the first time.

The three things the successful migrations do differently

Parallel vs Big-Bang Migration

The migrations that actually work, the seventeen percent that come in on time and on budget and don’t create new downtime, tend to share a few specific practices that the failed ones skip.

The first is that they run parallel environments rather than cutting over all at once. The old system stays live while the new one is brought up alongside it, traffic gets shifted gradually, and the team validates performance at each step before moving more. This is slower and costs more in the short term because you’re running two environments, but the alternative is the big-bang switchover that either works perfectly or doesn’t work at all, and the statistics suggest that betting on “works perfectly” is not a great use of money.
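
The gradual shift can be sketched as a loop over traffic weights, where each step is promoted only if the new environment passes validation. This Python sketch assumes a single error-rate threshold as the validation gate and uses a simulated metrics function. Real rollouts would typically rely on a load balancer’s weighted routing and several health signals, and every name here is hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class StepResult:
    weight: float      # share of traffic sent to the new environment
    error_rate: float  # error rate observed at that weight
    promoted: bool     # did the step pass validation?


def route(weight_to_new: float, rng: random.Random) -> str:
    """Send one request to the new environment with probability weight_to_new."""
    return "new" if rng.random() < weight_to_new else "legacy"


def phased_cutover(
    weights: Sequence[float],
    observe_error_rate: Callable[[float], float],
    max_error_rate: float,
) -> List[StepResult]:
    """Raise the new environment's traffic share step by step.

    Stops at the first step that fails validation; the caller then rolls
    traffic back to the last promoted weight while the legacy system stays live.
    """
    history: List[StepResult] = []
    for weight in weights:
        error_rate = observe_error_rate(weight)
        promoted = error_rate <= max_error_rate
        history.append(StepResult(weight, error_rate, promoted))
        if not promoted:
            break
    return history


def simulated_error_rate(weight: float) -> float:
    # Hypothetical: the new environment degrades once it carries over half the load.
    return 0.002 if weight <= 0.5 else 0.03


steps = phased_cutover([0.05, 0.25, 0.5, 0.75, 1.0], simulated_error_rate, max_error_rate=0.01)
live_weight = max((s.weight for s in steps if s.promoted), default=0.0)

# Route a sample of requests at the last validated weight.
rng = random.Random(42)
sample = [route(live_weight, rng) for _ in range(10_000)]
print(f"Stopped after {len(steps)} steps; new environment carries {live_weight:.0%}")
print(f"Sampled share routed to new: {sample.count('new') / len(sample):.1%}")
```

In this simulated run the loop promotes 5%, 25%, and 50%, then halts at 75%, so traffic returns to a 50/50 split and the legacy system keeps serving the other half. That is the property a big-bang switchover lacks: a failed step costs one rollback instead of an outage.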

The second is that they treat the migration as a process redesign rather than a technology swap. Moving a badly architected system from on-premise to the cloud gives you a badly architected system in the cloud, which is more expensive and sometimes less stable than what you started with. The planning phase needs to include an honest assessment of what should be modernized, what should be consolidated, and what should be retired rather than migrated.
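
One way to keep that assessment honest is to make the triage rules explicit and apply them to every workload in the inventory. The Python sketch below is hypothetical throughout: the attributes, the workload names, and the order of the rules are illustrative assumptions, not a standard framework, and a real assessment would also weigh cost, risk, and dependencies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Disposition(Enum):
    RETIRE = "retire"
    CONSOLIDATE = "consolidate"
    MODERNIZE = "modernize"
    MIGRATE = "migrate"


@dataclass
class Workload:
    name: str
    monthly_active_users: int
    overlaps_with: Optional[str]  # another workload doing the same job, if any
    cloud_ready: bool             # honest verdict: would it run well in the cloud as-is?


def triage(w: Workload) -> Disposition:
    """Hypothetical rules, checked in order of how much migration work they avoid."""
    if w.monthly_active_users == 0:
        return Disposition.RETIRE       # nobody uses it: don't pay to move it
    if w.overlaps_with is not None:
        return Disposition.CONSOLIDATE  # fold it into the system it duplicates
    if not w.cloud_ready:
        return Disposition.MODERNIZE    # fix the architecture as part of the move
    return Disposition.MIGRATE          # sound as-is: move it


inventory: List[Workload] = [
    Workload("legacy-reporting", 0, None, True),
    Workload("regional-crm", 40, "central-crm", True),
    Workload("order-batch-jobs", 300, None, False),
    Workload("central-crm", 900, None, True),
]

plan = {w.name: triage(w) for w in inventory}
for name, disposition in plan.items():
    print(f"{name:18} -> {disposition.value}")
```

The point of writing the rules down is less the code than the discipline: every workload gets an explicit decision, and "migrate as-is" becomes the result of a check rather than the default.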

The third is that they bring in planning expertise before the project starts rather than troubleshooting help after it breaks. The difference between a cloud migration that’s been properly scoped, with security frameworks mapped out and workload optimization built into the plan, and one that’s being figured out on the fly is usually the difference between the seventeen percent and the eighty-three.

That planning phase is the part most companies underinvest in because it doesn’t feel like progress. Nobody’s moving servers or deploying code; it looks like meetings, documents, and assessments. But it’s where TSB-scale failures get prevented, because the problems that kill migrations are almost always problems that could have been identified before anyone touched a production system.

Why legacy systems create a compounding problem if you wait

There’s a tendency among businesses that have heard the migration horror stories to conclude that the safest option is to stay where they are. Keep the on-premise systems running, patch what breaks, and avoid the risk entirely.

The problem with that logic is that legacy infrastructure doesn’t stay stable; it degrades. Hardware ages, vendor support expires, security vulnerabilities accumulate, and the cost of keeping everything running goes up year over year, while the system’s ability to handle modern workloads goes down. ITIC’s data shows that forty-five percent of unplanned downtime incidents are caused by simple things like misconfigured systems and failed updates, and those incidents get more frequent as infrastructure ages.

The other compounding factor is opportunity cost. Legacy systems were designed for predictable workloads and scheduled operations. They don’t handle traffic spikes well, they don’t support remote access cleanly, and they can’t scale without hardware purchases that take weeks or months to provision. Every month a business runs on infrastructure that can’t flex with demand is a month in which that business is operationally slower than competitors who can.

So the risk of migrating is real; the eighty-three percent figure makes that clear. But the risk of not migrating is also real. It just shows up differently, as a slow accumulation of downtime events, rising maintenance costs, and missed capacity that the business needed and couldn’t get in time.

What the actual decision comes down to

The honest version of this decision isn’t “should you migrate to the cloud?” For most businesses running aging on-premise systems, the answer to that is yes and has been for a while. The actual decision is whether to invest properly in the planning phase or to treat migration as a technical task that the IT team handles alongside their regular work.

The companies in the eighty-three percent almost always chose the second option. The ones in the seventeen percent almost always chose the first. The pattern is consistent enough across case studies and industry data to be treated as something close to a rule.

One hour of downtime at three hundred thousand dollars is a bad day. Three hundred million pounds in migration remediation costs is a bad year. Both are avoidable with the same thing: planning thorough enough to catch the problems before they become production incidents. The technology is genuinely the easier part of the equation.
