Backing up a database depends on how it is delivered

How you back up your database depends on three factors: how the database is delivered to you, the database backup logistics, and the recovery time objective (RTO) and recovery point objective (RPO) you are trying to achieve. This article explains how the database is delivered.

Databases can be delivered in three ways: as software on a server you own; as a platform as a service (PaaS); and as a serverless service. Let’s take a look at them.

Traditional database software

Until a few years ago, all databases were delivered by purchasing a license for a product and installing it on a server or virtual machine of your choice. You were responsible for everything, including server security and administration, storage, the application itself, and (of course) database backup.

This means you have a variety of choices to make, including how to back it up. Some approaches simply don’t work, because databases behave in ways that make them harder to back up with methods designed for unstructured data. The following three concepts apply to almost all traditionally delivered databases.

Moving target

Data in a database is usually stored in data files that you can see in the file system of the server or virtual machine that hosts it. These files change constantly as long as something is updating the database, which means you can’t just back them up like any other file. A copy taken while the files are changing would mix data from different moments, and the resulting backup would likely be unusable.
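
The problem can be illustrated with a minimal Python sketch. It is a toy model, not any real database engine: the `pages` list stands in for the data files, and `transfer` is a hypothetical transaction that touches two pages. A file-level copy that reads the pages one at a time, while a transfer lands in between, captures a state that never existed.

```python
# Toy model: each "page" of the data file holds one account balance.
pages = [100, 100]  # the total is always 200 when the database is consistent


def transfer(pages, src, dst, amount):
    """Hypothetical transaction: move `amount` from one page to another."""
    pages[src] -= amount
    pages[dst] += amount


def naive_file_copy(pages, concurrent_update):
    """Copy pages one at a time, as a file-level backup tool would.

    `concurrent_update` runs after the first page has been copied, standing
    in for the database continuing to change while the copy is in progress.
    """
    copy = []
    for index in range(len(pages)):
        copy.append(pages[index])
        if index == 0:
            concurrent_update()
    return copy


backup = naive_file_copy(pages, lambda: transfer(pages, src=1, dst=0, amount=50))

print(sum(pages))   # live database: 200, still consistent
print(sum(backup))  # backup: 150, a state the database was never in
```

The backup holds the old value of the first page and the new value of the second, so 50 units have vanished from it even though the live database never lost them.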

Backups and point-in-time restores

Most supported database backup methods create a copy of the database as it existed when the backup ran, such as every night at 10 p.m. This means that you will only be able to restore the database to that exact moment.

Move forward or backward from a point in time

To meet a tighter RPO, most databases keep a transaction log that can be replayed after a point-in-time restore to roll the database forward to a more recent point in time that you specify. This log can also be used to roll back incomplete transactions if the database crashes and is left in an inconsistent state.
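
Rolling forward from a backup can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not how any particular product implements its log: the `LogRecord` type, the integer timestamps, and the `restore_to_point_in_time` function are all hypothetical, and each log record simply sets one key to one value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    timestamp: int  # e.g. minutes since the nightly backup (illustrative)
    key: str
    value: int


def restore_to_point_in_time(backup, log, target_time):
    """Start from the backup, then replay log records up to target_time."""
    state = dict(backup)  # work on a copy; never mutate the backup itself
    for record in sorted(log, key=lambda r: r.timestamp):
        if record.timestamp > target_time:
            break  # everything after the target is deliberately left out
        state[record.key] = record.value
    return state


nightly_backup = {"alice": 100, "bob": 100}
transaction_log = [
    LogRecord(10, "alice", 80),
    LogRecord(20, "bob", 120),
    LogRecord(30, "alice", 0),  # the mistake we want to stop short of
]

restored = restore_to_point_in_time(nightly_backup, transaction_log, target_time=25)
print(restored)  # {'alice': 80, 'bob': 120}
```

Choosing `target_time=25` keeps the two good changes and stops before the bad one at minute 30, which is the effect a tighter RPO is after.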

These three general concepts underlie almost every database running on servers or virtual machines that you administer, although there are exceptions to every rule. Data files are sometimes block devices and not files at all, and sometimes they don’t change even if the database changes. The key to backing up any traditionally delivered database properly is understanding how the database addresses the above three challenges.

The most common method of backing up a traditionally delivered database is a nightly backup, which can be full or incremental, followed by continuous transaction log backups. The nightly backup lets you restore the entire database, and the logs then let you roll transactions forward to just before the point where things went wrong.
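
As a rough sketch of how such a restore is assembled, the Python below picks which backups to apply, and in what order, to reach a given moment. The `Backup` type and `build_restore_plan` function are hypothetical, and the sketch assumes that each incremental holds the changes since the previous full or incremental backup, and that each log backup holds the transactions since the previous log backup. Real products differ in these details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    kind: str      # "full", "incremental", or "log"
    taken_at: int  # hours since an arbitrary start, for illustration only


def build_restore_plan(backups, target_time):
    """Return the backups to apply, in order, to reach target_time."""
    ordered = sorted(backups, key=lambda b: b.taken_at)

    # 1. The most recent full backup at or before the target.
    fulls = [b for b in ordered if b.kind == "full" and b.taken_at <= target_time]
    if not fulls:
        raise ValueError("no full backup at or before the target time")
    base = fulls[-1]

    # 2. Every incremental between that full backup and the target.
    incrementals = [
        b for b in ordered
        if b.kind == "incremental" and base.taken_at < b.taken_at <= target_time
    ]

    # 3. Log backups after the last data backup, up to and including the
    #    first one taken at or after the target (it covers the target).
    logs_start = incrementals[-1].taken_at if incrementals else base.taken_at
    logs = []
    for b in ordered:
        if b.kind != "log" or b.taken_at <= logs_start:
            continue
        logs.append(b)
        if b.taken_at >= target_time:
            break

    return [base, *incrementals, *logs]


history = [
    Backup("full", 0), Backup("log", 6), Backup("log", 12), Backup("log", 18),
    Backup("incremental", 24), Backup("log", 30), Backup("log", 36), Backup("log", 42),
]

plan = build_restore_plan(history, target_time=33)
print([(b.kind, b.taken_at) for b in plan])
# [('full', 0), ('incremental', 24), ('log', 30), ('log', 36)]
```

The last log backup in the plan would be replayed only partway, stopping at hour 33 rather than at the end of that log.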

Platform as a service

A second way to deliver databases is the platform-as-a-service (PaaS) model where you only see the application and have limited, if any, access to the underlying infrastructure. Amazon Relational Database Service (RDS) is an example of a PaaS offering, and it can be configured to deliver Oracle, MySQL, PostgreSQL, MariaDB, and Aurora databases. Azure also offers SQL Server, MySQL, PostgreSQL and others in a PaaS setup.

Backup options for a PaaS database are usually quite straightforward. Each PaaS offering provides a mechanism that supports backup and restore. Some come with backups that automatically run daily and usually create a copy in that provider’s object storage. Others require you to configure backups to run. Therefore, do not assume that your PaaS database is automatically backed up.

In fact, you shouldn’t assume that any of your infrastructure is backed up. Examine each PaaS database you use and see what backup and recovery options it offers. Most of the default backup methods for PaaS databases store your backups in the same account and region the database is running in, so another thing to check is whether you can copy these backups to a different account and region. Doing so will protect you against events like the OVH fire that destroyed two data centers.
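
The check described above can be written down as a small audit over an inventory you maintain yourself. The Python below is a sketch only: it calls no cloud provider API, and the `Location` and `DatabaseRecord` types, the `audit` function, and every account, region, and database name in it are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Location:
    account: str
    region: str


@dataclass
class DatabaseRecord:
    name: str
    home: Location                                     # where the database runs
    backup_copies: list = field(default_factory=list)  # Locations holding backups


def audit(databases):
    """Return {database name: finding} for databases that need attention."""
    findings = {}
    for db in databases:
        if not db.backup_copies:
            findings[db.name] = "no backups configured"
            continue
        has_offsite_copy = any(
            copy.account != db.home.account and copy.region != db.home.region
            for copy in db.backup_copies
        )
        if not has_offsite_copy:
            findings[db.name] = "no copy in a different account and region"
    return findings


prod = Location("acct-prod", "region-a")
vault = Location("acct-backup", "region-b")

inventory = [
    DatabaseRecord("orders", prod, [prod]),          # default: same account and region
    DatabaseRecord("users", prod),                   # nothing configured at all
    DatabaseRecord("billing", prod, [prod, vault]),  # has a separate copy
]

for name, finding in audit(inventory).items():
    print(f"{name}: {finding}")
# orders: no copy in a different account and region
# users: no backups configured
```

In practice the inventory would be filled in from whatever each provider reports about its backup settings; the point of the sketch is that a missing backup and a backup stored beside the database are both findings worth flagging.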

Serverless databases

Serverless databases take PaaS one step further, removing even more administrative requirements from the customer and creating an easier-to-use experience. AWS DynamoDB, Aurora Serverless, and Azure Cosmos DB are all examples of such databases.

With a serverless database, you don’t have to configure anything. You literally start putting data in. Compute and storage resources, as well as database partitioning decisions, will be automatically decided and provisioned for you. It’s so “magical” that many people assume backups are handled automatically, but that’s not always the case.

As with PaaS databases, backup methods are dictated by the vendor offering the database. So the best practice is the same: research the best backup methods for the database you’re using and deploy them. Make sure you understand how to copy data to another region and account.

The cloud isn’t magic, but it has certainly made backing up and restoring databases easier. Creating a backup of an entire database, even a partitioned database residing on hundreds of nodes, is much easier than it is when you manage everything yourself in a traditional environment. Don’t let things get so easy that you start making assumptions. That is always a recipe for backup disaster.


Copyright © 2022 IDG Communications, Inc.

Maria H. Underwood