How do I know if my backups actually work?

Almost every SMB we audit has 'nightly backups.' Almost none of them have ever actually been restored. The first time they get restored, under pressure on the worst day of the company, is when the founder finds out the backups have been silently failing for 11 months. The fix is automated weekly restore drills: every Sunday at 3am a script spins up a sandbox account, restores last night's snapshot, runs validation checksums, and tears down. You get a green Slack message — or a red one, before you need it. Untested backups do not count.
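The validation step at the end of that Sunday drill could look roughly like the sketch below. It assumes the backup job writes a manifest mapping each file to its SHA-256 hash, and that the result is sent through a Slack incoming webhook. The file names, the manifest shape, and the message wording are all made up for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_restore(restore_dir, manifest):
    """Compare restored files against a {relative_path: sha256} manifest."""
    failures = []
    for rel_path, expected in manifest.items():
        path = Path(restore_dir) / rel_path
        if not path.is_file():
            failures.append(f"{rel_path}: missing after restore")
        elif sha256_of(path) != expected:
            failures.append(f"{rel_path}: checksum mismatch")
    return failures


def slack_payload(failures):
    """Build the JSON body for a Slack incoming webhook (a {"text": ...} object)."""
    if failures:
        text = ":x: Restore drill FAILED\n" + "\n".join(failures)
    else:
        text = ":white_check_mark: Restore drill passed: all checksums match"
    return json.dumps({"text": text})


# A temporary directory stands in for the sandbox account's restore target.
with tempfile.TemporaryDirectory() as restore_dir:
    (Path(restore_dir) / "orders.sql").write_bytes(b"-- dump --")
    manifest = {
        "orders.sql": hashlib.sha256(b"-- dump --").hexdigest(),
        "customers.sql": "0" * 64,  # never restored, so the drill should go red
    }
    failures = validate_restore(restore_dir, manifest)
    print(slack_payload(failures))
```

The point of the design is that the drill only goes green when every file in the manifest is present and matches. A job that "succeeded" but restored nothing still produces a red message.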

What is an immutable backup and why does my small business need one?

An immutable backup is an object-locked weekly snapshot that ransomware cannot delete or overwrite until its retention period expires. We use S3 Object Lock in compliance mode or Azure Immutable Blob Storage with a locked retention policy; a lock that a privileged user can lift does not give the same protection. Even with a full admin breach on your production account, the locked weekly snapshot survives. It is the difference between a 4-hour incident and a 4-month rebuild. This is the single most important backup most SMBs are missing, and missing it is the reason 60% of small businesses that lose critical data close within six months.
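As a rough illustration on the S3 side, the sketch below builds the arguments for an upload that carries its own retention lock. `ObjectLockMode` and `ObjectLockRetainUntilDate` are real parameters of boto3's `put_object`. The bucket name, key, and 90-day retention are assumptions for the example, and the bucket must already have Object Lock enabled.

```python
from datetime import datetime, timedelta, timezone


def locked_upload_args(bucket, key, body, retain_days, now=None):
    """Build keyword arguments for S3 put_object with a compliance-mode lock."""
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # COMPLIANCE mode: the retention cannot be shortened or removed
        # by any user until the date below has passed.
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": now + timedelta(days=retain_days),
    }


# Hypothetical names and retention; adjust to your own policy.
args = locked_upload_args(
    bucket="example-dr-weekly",
    key="snapshots/2024-01-07.tar.gz",
    body=b"snapshot bytes go here",
    retain_days=90,
    now=datetime(2024, 1, 7, 3, 0, tzinfo=timezone.utc),
)
print(args["ObjectLockMode"], args["ObjectLockRetainUntilDate"].isoformat())

# With AWS credentials configured, the upload itself would be:
#   import boto3
#   boto3.client("s3").put_object(**args)
# S3 requires a checksum on uploads that set Object Lock parameters, so
# check that your SDK version sends one.
```

Choose the retention period carefully. In compliance mode nobody can delete the object early, including you, so you pay for that storage until the date passes.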

What are RTO and RPO and how do I set them?

RTO (Recovery Time Objective) is how long you can be down. RPO (Recovery Point Objective) is how much data loss is survivable, measured as time: an RPO of 1 hour means you can afford to lose at most the last hour of changes. You decide them in business terms (for example, 1 hour of data loss is survivable but 8 hours is not), and we design backups, replication, and the runbook to hit those targets. The point is to set them based on what your business can actually tolerate, not based on a vendor's marketing number. Then everything from backup frequency to recovery procedure is engineered backwards from those two numbers.
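Once the two numbers exist, checking them can be mechanical. The sketch below compares the age of the newest backup against the RPO, and the duration of the last restore against the RTO. The 1-hour RPO comes from the example above. The 4-hour RTO and the timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

RPO = timedelta(hours=1)  # from the example: 1 hour of data loss is survivable
RTO = timedelta(hours=4)  # hypothetical; pick your own figure


def check_targets(last_backup_at, restore_duration, now, rpo=RPO, rto=RTO):
    """Return a list of plain-English problems; an empty list means both targets hold."""
    problems = []
    backup_age = now - last_backup_at
    if backup_age > rpo:
        problems.append(
            f"RPO breached: newest backup is {backup_age} old, target is {rpo}"
        )
    if restore_duration > rto:
        problems.append(
            f"RTO breached: last restore took {restore_duration}, target is {rto}"
        )
    return problems


now = datetime(2024, 1, 7, 12, 0, tzinfo=timezone.utc)

# Newest backup is 8 hours old: not survivable under a 1-hour RPO.
stale = check_targets(
    last_backup_at=now - timedelta(hours=8),
    restore_duration=timedelta(hours=2),
    now=now,
)

# Newest backup is 30 minutes old and the last restore took 2 hours.
healthy = check_targets(
    last_backup_at=now - timedelta(minutes=30),
    restore_duration=timedelta(hours=2),
    now=now,
)

print(stale)
print(healthy)
```

Run on a schedule, a check like this catches a backup job that has quietly stopped within hours rather than months.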

What does a quarterly tabletop exercise actually look like?

Once a quarter we simulate a ransomware Friday at 5pm and watch the team walk through the runbook in real time. We time it. We find the gaps. We fix them. In one Q2 tabletop, the on-call engineer did not have the AWS root MFA token. Found in a drill — not in a real incident at 2am. The businesses that survive disasters are the ones that practiced before the disaster. The difference between a company that survives a ransomware Friday and one that does not is almost never the size of the attack — it is whether they treated backups as something to prove instead of something to assume.
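The "we time it, we find the gaps" part needs very little tooling. A minimal sketch of a drill log follows, assuming each runbook step is timed by hand and gaps are noted as free text. The step names, durations, and the 4-hour RTO are hypothetical; only the MFA-token gap comes from the example above.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class DrillStep:
    name: str
    duration: timedelta
    gaps: list = field(default_factory=list)


def summarize_drill(steps, rto):
    """Total the timed steps, compare against the RTO, and collect every gap found."""
    total = sum((step.duration for step in steps), timedelta())
    gaps = [f"{step.name}: {gap}" for step in steps for gap in step.gaps]
    return {"total": total, "within_rto": total <= rto, "gaps": gaps}


# Hypothetical timings from one simulated "ransomware Friday at 5pm".
drill = [
    DrillStep("Declare incident and page on-call", timedelta(minutes=10)),
    DrillStep(
        "Log in to AWS recovery account",
        timedelta(minutes=25),
        gaps=["on-call engineer did not have the AWS root MFA token"],
    ),
    DrillStep("Restore database from locked snapshot", timedelta(minutes=90)),
    DrillStep("Verify application and notify customers", timedelta(minutes=35)),
]

summary = summarize_drill(drill, rto=timedelta(hours=4))
print("Total:", summary["total"], "| within RTO:", summary["within_rto"])
for gap in summary["gaps"]:
    print("GAP:", gap)
```

A drill can finish inside the RTO and still surface gaps. Both results go into the fix list before the next quarter.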

Backups · Disaster Recovery · AWS · Azure

We Don't Actually Know If Our Backups Work (And Other Sentences That Have Killed Small Companies)

60% of small businesses that lose critical data close within six months. Most of them had "backups." Almost none had backups that were ever tested. Here's the 7-step framework we install for SMBs who want to find out before a real incident.

Get a free audit of whether your backups actually work

No commitment. No access required. Clear report in 48 hours.

40 to 60 percent of small businesses never reopen after a major data loss event. That's the figure the FEMA Ready.gov business preparedness program has been quoting for years, and it's the most uncomfortable number in this whole topic — because it doesn't measure businesses with no backups, it measures businesses that thought they had backups. Almost every SMB we audit has "nightly backups." Almost none of them have ever actually been restored. The first time they get restored — under pressure, on the worst day of the company — is when the founder finds out the backups have been silently failing for 11 months, or that the encryption key is on the laptop of an engineer who left, or that the backup is for the wrong database.

Disaster recovery isn't about buying a more expensive backup product. It's about a small set of boring, automated checks that run continuously, so the worst day of your business doesn't also become the day you discover you have no backups. The 7 steps below are the exact framework we install for clients on AWS, Azure, or both. It is heavily informed by the AWS whitepaper Disaster Recovery of Workloads on AWS, which lays out the four standard recovery patterns (backup and restore, pilot light, warm standby, and multi-site active/active) that we map every client to. The cousin framework for keeping your public site up, including its own restore-tested backups, is in our breakdown of why WordPress sites keep going down.

Worked example

Logistics company, 35 employees: hit by ransomware in Q3, discovered the "nightly backups" had been silently failing for 14 months, paid roughly $80,000 to recover what could be recovered. Each card below shows what that gate of the framework would do for that single situation, from inventory to quarterly tabletop.

Inventory What Would Actually Hurt to Lose

Most companies don't actually know what data is critical.

We mark your databases, file shares, configs, customer records, and anything regulated; everything else is recoverable from source.
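One way to make that marking concrete is a tag on every data store. The sketch below assumes a tag named `dr-tier` with the values `critical` and `rebuildable`. The tag name, its values, and the store names are our own illustration, not a cloud provider convention.

```python
from dataclasses import dataclass, field


@dataclass
class DataStore:
    name: str
    kind: str  # e.g. "database", "bucket", "file-share"
    tags: dict = field(default_factory=dict)


def classify(stores):
    """Split stores into critical, rebuildable, and untagged (needs a human decision)."""
    groups = {"critical": [], "rebuildable": [], "untagged": []}
    for store in stores:
        tier = store.tags.get("dr-tier")
        groups[tier if tier in ("critical", "rebuildable") else "untagged"].append(store)
    return groups


# Hypothetical inventory; the names are made up.
inventory = [
    DataStore("orders-db", "database", {"dr-tier": "critical"}),
    DataStore("customer-uploads", "bucket", {"dr-tier": "critical"}),
    DataStore("build-cache", "bucket", {"dr-tier": "rebuildable"}),
    DataStore("old-marketing-assets", "bucket"),
]

groups = classify(inventory)
critical = groups["critical"]
untagged = groups["untagged"]
print("Needs full DR:", [s.name for s in critical])
print("Recoverable from source:", [s.name for s in groups["rebuildable"]])
print("Nobody has decided yet:", [s.name for s in untagged])
```

Untagged stores get their own bucket on purpose. Treating "nobody decided" as "not critical" is how the wrong database ends up being the one that was backed up.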

Example

"We tag your 6 critical data stores. The other 137 buckets and folders don't need expensive DR — knowing the difference saves real money."

Set RTO and RPO in Plain English

How much data loss is survivable, and how long can you be down.

You decide in business terms; we design the system backwards from there.

Example

"You decide: 1 hour of data loss is survivable, 8 hours is not. We design backups, replication, and the runbook to hit that — not a vendor's marketing number."

Automated Cross-Region Backups

Backups land in a separate cloud account in another region. Encrypted at rest.

Daily, hourly, or continuous depending on your RPO.
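To show how the RPO drives the schedule, here is a minimal sketch that builds a plan document for AWS Backup. The field names follow the `BackupPlan` structure accepted by boto3's `create_backup_plan`. The plan name, vault names, account ID, and 35-day retention are placeholders, and the hourly/daily cut-off is our simplification. Continuous backup for sub-hour RPOs is left out.

```python
import json


def build_backup_plan(rpo_hours, dest_vault_arn, retain_days=35):
    """Build a BackupPlan document with a copy to a vault in another account and region."""
    if rpo_hours < 1:
        raise ValueError("this sketch only covers scheduled backups (RPO of 1 hour or more)")
    # AWS cron fields: minutes hours day-of-month month day-of-week year (UTC).
    schedule = "cron(0 * * * ? *)" if rpo_hours < 24 else "cron(0 5 * * ? *)"
    return {
        "BackupPlanName": "dr-cross-region",  # placeholder name
        "Rules": [
            {
                "RuleName": "scheduled-with-offsite-copy",
                "TargetBackupVaultName": "prod-vault",  # placeholder vault
                "ScheduleExpression": schedule,
                "Lifecycle": {"DeleteAfterDays": retain_days},
                "CopyActions": [
                    {
                        "DestinationBackupVaultArn": dest_vault_arn,
                        "Lifecycle": {"DeleteAfterDays": retain_days},
                    }
                ],
            }
        ],
    }


# 111122223333 is a documentation-style placeholder account ID.
dest_arn = "arn:aws:backup:us-west-2:111122223333:backup-vault:dr-vault"
plan = build_backup_plan(rpo_hours=1, dest_vault_arn=dest_arn)
print(json.dumps(plan, indent=2))

# With credentials for the production account, the plan would be created with:
#   import boto3
#   boto3.client("backup").create_backup_plan(BackupPlan=plan)
```

The copy action is what gets the snapshot out of the production account. The destination vault still needs its own access policy and lock configured separately.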

Example

"AWS Backup writes encrypted snapshots to a locked, second-account vault in us-west-2. Even with full IAM compromise on your prod account, attackers cannot delete them."

Immutable + Air-Gapped Copy

An object-locked weekly snapshot that ransomware cannot delete or overwrite before its retention period ends.

The single most important backup most SMBs are missing.

Example

"S3 Object Lock or Azure Immutable Blob Storage. Even with full admin breach, the locked weekly snapshot survives — it's the difference between a 4-hour incident and a 4-month rebuild."

Automated Restore Drills, Weekly

Once a week, a script restores the latest backup to a clean sandbox account, runs checksums, and posts a green check to Slack.

Untested backups don't count.

Example

"Every Sunday at 3am: spin up a sandbox account, restore last night's snapshot, run validation checksums, tear down. You get a green Slack message — or a red one, before you need it."

A Runbook a Non-Engineer Could Execute

Step-by-step, with screenshots and named contacts with phone numbers.

Because on the day you need this, your CTO might be on a flight.
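A runbook can also be checked for completeness before anyone needs it. The sketch below assumes each step is kept as structured data with a command, a named contact, and a description of what success looks like. The field names, the `./restore.sh` script, and the contact details are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunbookStep:
    title: str
    command: str
    success_looks_like: str
    contact_name: str
    contact_phone: str


def incomplete_steps(steps):
    """Return (step_number, missing_fields) for every step with a blank field."""
    problems = []
    for number, step in enumerate(steps, start=1):
        missing = [name for name, value in vars(step).items() if not value.strip()]
        if missing:
            problems.append((number, missing))
    return problems


# Hypothetical runbook; the script name and phone number are placeholders.
runbook = [
    RunbookStep(
        title="Start the database restore",
        command="./restore.sh orders-db",
        success_looks_like="Script prints 'restore complete' within 90 minutes",
        contact_name="Dana (database lead)",
        contact_phone="555-0100",
    ),
    RunbookStep(
        title="Point the application at the restored database",
        command="./switch-db.sh orders-db-restored",
        success_looks_like="Login page loads and last night's orders are visible",
        contact_name="Sam (on-call engineer)",
        contact_phone="",  # blank on purpose: the check should flag this step
    ),
]

for number, missing in incomplete_steps(runbook):
    print(f"Step {number} is missing: {', '.join(missing)}")
```

A check like this only proves the fields are filled in. Whether a non-engineer can actually follow the steps is what the tabletop is for.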

Example

"If your CTO is on a flight to London, your CFO can start the restore from page 2 of the doc — with the exact command, who to call, and what 'success' looks like at each step."

Quarterly Tabletop With Leadership

Once a quarter we simulate ransomware Friday at 5pm and watch the team walk through the runbook in real time.

We time it. We find the gaps. We fix them.

Example

"Q2 tabletop revealed the on-call engineer didn't have the AWS root MFA token. Found in a drill — not in a real incident at 2am."

How we know this works

Each gate is mechanical. The system can't tell you "your backups are fine" because nobody can; only a real restore can. Every Sunday a real restore happens, and you see the result before you ever need it.

Tested Restores

Immutable Off-Site

Runbook Anyone Can Run

When this becomes urgent

This is no longer optional if any of these are true today:

You haven't restored a backup in the last 90 days.
You don't know which person on your team has the encryption keys.
Your backup story is "the host does it" or "I think we have RAID."
You bought cyber insurance and ticked the "we have backups" box without testing. That is the same box that gets your claim denied when you cannot prove it.
You'd lose more than 24 hours of data if your primary database disappeared right now.

The framework above isn't theoretical — it's a checklist. Every gate has a defined output: an inventory document, a written RTO/RPO, a vault in another account, a locked weekly snapshot, a green Slack message every Sunday, a runbook a non-engineer can execute, and a quarterly drill on the calendar. None of it is exotic technology. It maps cleanly onto established standards — NIST SP 800-34, the federal contingency planning guide, and ISO 22301, the international business continuity standard — which is what most auditors and cyber insurers ultimately want to see evidence against.

The point isn't to make a disaster impossible — that's not on offer. The point is to make sure that when one happens, you already know your backups work, you already know who runs the restore, and you've already practiced. The U.S. Small Business Administration's emergency preparedness guidance says the same thing in plainer language: the businesses that survive disasters are the ones that practiced before the disaster. The difference between a company that survives a ransomware Friday and one that doesn't is almost never the size of the attack — it's whether they treated backups as something to prove instead of something to assume. And once you can prove it, the same numbers also drive your cloud cost optimization story: you stop paying to keep snapshots you'd never actually restore from.

Get a free 30-minute review of your backup story

We'll review what you back up, where it lands, who can delete it, and whether anyone has ever restored it. You'll get a clear report within 48 hours showing exactly which of the 7 gates above are missing — and what each would take to install.

Review my backup story (free)

No access required. No commitment. No pressure. Just a clear review.
