EBS Snapshots

Taking a look at what EBS snapshots are and how they work

Written by Matt Houser
Updated over a week ago

At a very high level, an EBS snapshot is a "copy" of the data on an EBS volume. But snapshots only contain written blocks from the volume, and they only contain blocks which have changed since the last snapshot of the same volume. Think of it as an incremental backup of the volume.

It is possible to create a new volume from a snapshot. When you create a volume from a snapshot, the volume will contain all data from all snapshots up to and including the snapshot you're making the volume from.

Example #1

On Monday morning you create a new volume that is 10 GB in size, and you fill it with 5 GB of data. On Monday evening, you take a snapshot of this volume. Let's call this snapshot #1. Snapshot #1 will contain 5 GB of data.

On Tuesday morning, you modify 1 GB of the data added on Monday, and you add 1 GB of additional data. On Tuesday evening, you take another snapshot of this volume. Let's call this snapshot #2. Snapshot #2 will contain 2 GB of data: the 1 GB you modified and the 1 GB you added. For everything that did not change, it relies on the data already stored in snapshot #1.

At this point in time, you have 7 GB of data in snapshots, but only 6 GB of data on your volume. This is because you still have 1 GB of the "old" data saved in your first snapshot.

If you create a volume from snapshot #1, your new volume will contain 5 GB of data. This includes data from only snapshot #1.

If you create a volume from snapshot #2, your new volume will contain 6 GB of data. This includes data from snapshots #1 and #2.
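The arithmetic in this example can be checked with a small toy model. This is only a sketch for illustration, not how EBS is implemented: each "block" stands for 1 GB, and the function names `take_snapshot` and `restore` are made up for this example.

```python
# Toy model of incremental snapshots. Each "block" stands for 1 GB,
# and the stored value is just a label for that block's contents.

def restore(snapshots):
    """Rebuild a volume by replaying snapshots oldest-first; later blocks win."""
    volume = {}
    for snap in snapshots:
        volume.update(snap)
    return volume

def take_snapshot(volume, previous_snapshots):
    """Keep only the blocks that differ from what earlier snapshots already hold."""
    known = restore(previous_snapshots)
    return {blk: data for blk, data in volume.items() if known.get(blk) != data}

# Monday: a 10 GB volume with 5 GB written (blocks 0 to 4).
volume = {blk: "monday" for blk in range(5)}
snap1 = take_snapshot(volume, [])

# Tuesday: modify 1 GB (block 0) and add 1 GB (block 5).
volume[0] = "tuesday"
volume[5] = "tuesday"
snap2 = take_snapshot(volume, [snap1])

print(len(snap1))                    # 5 GB stored in snapshot #1
print(len(snap2))                    # 2 GB stored in snapshot #2
print(len(snap1) + len(snap2))       # 7 GB stored across both snapshots
print(len(restore([snap1])))         # 5 GB on a volume created from snapshot #1
print(len(restore([snap1, snap2])))  # 6 GB on a volume created from snapshot #2
```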

If you delete the latest snapshot of a volume, you'll lose the data that was captured only in that snapshot, which is everything that changed after the previous snapshot (or all snapshot data if you delete the only snapshot of a volume).

If you delete a snapshot of a volume which is not the latest, then the data that later snapshots still depend on is not really deleted. Instead, that data is consolidated into the next (later) snapshot.

So it is safe to delete old snapshots. You just lose the ability to create new volumes based on that older point-in-time.

Example #1 continued

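Returning to our example, if we deleted snapshot #1, we would not lose the ability to create a volume from snapshot #2. The 4 GB of unchanged data from snapshot #1 would be consolidated into snapshot #2, so a new volume from snapshot #2 would still contain the same 6 GB as before. Only the 1 GB of "old" data that was overwritten on Tuesday would be gone.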
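Here is a toy sketch of that consolidation, again with one "block" per GB. The helper names `restore` and `delete_snapshot` are hypothetical and only model the behaviour described in this article; they are not an AWS API.

```python
# Toy model: deleting a snapshot that is not the latest folds the blocks
# that are still needed into the next (later) snapshot.

def restore(snapshots):
    """Rebuild a volume by replaying snapshots oldest-first; later blocks win."""
    volume = {}
    for snap in snapshots:
        volume.update(snap)
    return volume

def delete_snapshot(snapshots, index):
    """Return a new snapshot list with snapshots[index] removed."""
    remaining = list(snapshots)
    removed = remaining.pop(index)
    if index < len(remaining):
        # Not the latest: the later snapshot keeps its own blocks and
        # inherits only the blocks it did not overwrite.
        remaining[index] = {**removed, **remaining[index]}
    return remaining

snap1 = {blk: "monday" for blk in range(5)}  # 5 GB written on Monday
snap2 = {0: "tuesday", 5: "tuesday"}         # 1 GB modified, 1 GB added

before = restore([snap1, snap2])
after_delete = delete_snapshot([snap1, snap2], 0)

print(len(after_delete[0]))             # 6 GB now held by snapshot #2
print(restore(after_delete) == before)  # True: the restored volume is unchanged
```

The overwritten Monday copy of block 0 is the only data that does not survive, which matches the 1 GB of "old" data in the example.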

It is possible to create a snapshot of a volume which is attached to a running EC2 instance. However, there are risks involved with doing that.

When an EBS snapshot is being created, there exists an instantaneous point in time (we'll call it T0) where everything saved to the EBS volume before T0 is included in the snapshot, and everything saved to the EBS volume after T0 is not included in the snapshot.

If the instance is running, and some running program on the instance was writing to a file at T0, then the snapshot will only contain part of that file. If you created a new volume from that snapshot, that file could be corrupted because only part of the data is present. Depending on the type of file and the program that's using that file, that particular data may be recoverable, or it may not be.

Example #2

At 12:00pm, Microsoft Word was running on your EC2 instance and it started saving a list of your favourite things to MyFavouriteThings.docx.

We'll pretend that you have many favourite things, so this is a very long list. Saving does not complete until 12:05pm.

If a snapshot was created at 12:03pm, then MyFavouriteThings.docx would not be complete in your snapshot. On your already running instance, you'd be able to open the file without issue once saving finished. But if you created a new volume from your snapshot, opening MyFavouriteThings.docx would most likely fail because it would be a corrupted and incomplete file.
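The same situation can be imitated in a few lines. This is a simplified sketch: the "disk" is a plain dictionary, the file is written one chunk per minute, and the `save_file` helper is invented for the illustration.

```python
# Toy model of a snapshot taken while a file is still being saved.

def save_file(disk, name, chunks, snapshot_after):
    """Write chunks one at a time; copy the disk once `snapshot_after` chunks are written."""
    snapshot = None
    disk[name] = []
    for written, chunk in enumerate(chunks, start=1):
        disk[name].append(chunk)
        if written == snapshot_after:
            # T0: freeze a point-in-time copy of whatever is on disk right now.
            snapshot = {n: list(c) for n, c in disk.items()}
    return snapshot

# Five chunks, standing in for the five minutes from 12:00pm to 12:05pm.
chunks = ["header", "item 1", "item 2", "item 3", "footer"]
disk = {}
snapshot = save_file(disk, "MyFavouriteThings.docx", chunks, snapshot_after=3)

print(disk["MyFavouriteThings.docx"] == chunks)      # True: the live file is complete
print(snapshot["MyFavouriteThings.docx"] == chunks)  # False: the snapshot copy is not
print(snapshot["MyFavouriteThings.docx"])            # ['header', 'item 1', 'item 2']
```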

Things also get a little more complicated because of operating systems caching data in memory before actually committing the data to "disk".

For these reasons, it is recommended that you do one of the following before taking a snapshot:

  • Stop your running instance. This will ensure that all programs have completed and all data is flushed to the volume.

  • Detach the volume from your instance. This requires making sure all programs are done writing to the volume, then unmounting the drive associated with the volume from within the operating system.

However, when the above is not possible or not practical, taking a snapshot is arguably better than not taking a snapshot.
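As a rough sketch of the first option (stopping the instance before the snapshot), the steps could be scripted with the AWS SDK for Python. The function name `snapshot_with_instance_stopped` is hypothetical, and `ec2` is assumed to be a boto3 EC2 client such as the one returned by `boto3.client("ec2")`. Error handling is left out.

```python
def snapshot_with_instance_stopped(ec2, instance_id, volume_id):
    """Stop the instance, start a snapshot of the volume, then start the instance again.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the ID of the new snapshot.
    """
    # Stop the instance and wait until it has fully stopped, so that all
    # programs have exited and all data is flushed to the volume.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Start the snapshot. The call returns while the snapshot is still "pending".
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"Snapshot of {volume_id} taken while {instance_id} was stopped",
    )
    snapshot_id = response["SnapshotId"]

    # Assumption: the point in time (T0) is fixed once the snapshot has been
    # started, so the instance is restarted without waiting for completion.
    ec2.start_instances(InstanceIds=[instance_id])
    return snapshot_id
```

If you would rather wait for the snapshot to finish before restarting, boto3 also provides a `snapshot_completed` waiter that can be called with `SnapshotIds=[snapshot_id]`.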
