Where snapshots and backups fit in your data management strategy
When NetApp introduced snapshots in the ’90s, many ridiculed them as a gimmick that shouldn’t be considered in the enterprise. Now competitors are claiming “me too” and trying to make snapshots a checkbox item while avoiding the facts of how their solutions actually work. This is causing people across the industry to finally ask the right question: Where do snapshots fit in my environment, and where does traditional backup fit in my data management strategy?
This post is about where snapshots and traditional backups fit into data center data protection designs. It is a legitimate question that has had two extreme answers, depending on whom you talked to. In part one (this post), I’m going to try to describe some concepts that I normally scribble on a whiteboard or draw out in Visio for IT managers, so we’ll see how this goes. In part two, I will try to put all the concepts into their correct place in a data management scheme.
The basic concepts involved here are:
- Snapshots
- Backup (Off-host backup)
- Replication
- DR copies (Off-site backup)
Snapshots are simply a freeze frame of your data, or a “snapshot in time,” that a storage array can hold for hours, days, weeks, etc., depending on how the array is designed and the data change rate. These snapshots usually have low to no impact on production data when taken, but they allow an admin, or possibly a user (careful!), to restore files or entire volumes quickly. If the array is not designed optimally at its core, snapshots can start to impact production performance when the data changes significantly from what is in the snapshot (NetApp does not have this problem).
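To put rough numbers on “depending on the data change rate,” here is a back-of-the-envelope sketch in Python. It assumes a deliberately simple model that is mine, not any vendor’s: each daily snapshot pins about one day’s worth of changed blocks, and the changes don’t overlap from day to day. The function name, the 1,000 GB volume and the 20% reserve are all hypothetical.

```python
def snapshot_retention_days(reserve_gb: float, daily_change_gb: float) -> float:
    """Estimate how many days of daily snapshots fit in the snapshot reserve.

    Assumption (illustrative only): each daily snapshot holds roughly one
    day's worth of changed blocks, with no overlap between days.
    """
    if daily_change_gb <= 0:
        raise ValueError("daily_change_gb must be positive")
    return reserve_gb / daily_change_gb


# Hypothetical volume: 1,000 GB with 20% set aside for snapshots.
reserve = 1000 * 0.20  # about 200 GB of snapshot space

low = snapshot_retention_days(reserve, 10)   # 10 GB/day, a 1% daily change rate
high = snapshot_retention_days(reserve, 50)  # 50 GB/day, a 5% daily change rate

print(f"1% daily change: about {low:.0f} days of snapshots")
print(f"5% daily change: about {high:.0f} days of snapshots")
```

Real arrays track changed blocks in their own ways, so treat this as a way to reason about the trend, not as a sizing tool: the faster the data changes, the shorter the history the same space can hold.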
Backup (remember: off-host backup) is simply when you make a secondary copy of data on a different host and storage from the production copy. This is important because if the host or storage fails, you don’t want all your production and backup copies to fail together. This can happen with snapshot copies because they reside on the same storage systems as the production copy. Granted, full storage array failures are so rare that it can be difficult to compel an architect to plan for them. However, people are human and make mistakes, even if the storage array didn’t do anything wrong. We’ve all seen that in action… So a backup copy is important as a second line of defense. I won’t go into all the different ways to make and store a backup, but suffice it to say, a backup copy is important. (Yes, NetApp fanatics, I’m hinting at SnapVault.)
Replication is often confused with backup because it does indeed create a second copy of the data on a different storage array. The problem is that replication is really intended to be a long-range version of clustering. If one site goes down, the other comes up with the most recent version of data that has been continuously copied from the production data. It is not intended to be used as a backup copy and shouldn’t be. Think of replication as a Microsoft cluster for the entire data center instead of one server.
What can also cause confusion about replication is when snapshots are replicated with the volumes to the DR site. Some administrators and solutions architects try to say that the replicated copy of the snapshots is a backup because it is a stationary copy of the data and is off-host. The problem with using replicated snapshot copies as backup is that if something happens to the primary copy, such as corruption or an accidental deletion, the change will be replicated immediately to the secondary copy and could possibly blow away all your data completely. That’s not good enough.
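If it helps to see that failure mode in miniature, here is a toy Python sketch. It is not how any real array replicates; the `Volume` and `ReplicatedPair` classes and the file name are made up purely to show the difference between a copy that follows the primary and a copy that doesn’t.

```python
import copy


class Volume:
    """Toy volume: a dict of file name -> contents (illustrative only)."""

    def __init__(self):
        self.files = {}


class ReplicatedPair:
    """Hypothetical model of replication: every change made on the
    primary is applied to the secondary immediately."""

    def __init__(self):
        self.primary = Volume()
        self.secondary = Volume()

    def write(self, name, data):
        self.primary.files[name] = data
        self.secondary.files[name] = data  # replicated right away

    def delete(self, name):
        self.primary.files.pop(name, None)
        self.secondary.files.pop(name, None)  # the mistake is replicated too


pair = ReplicatedPair()
pair.write("payroll.db", "good data")

# An independent backup: a point-in-time copy that is not tied to the primary.
backup = copy.deepcopy(pair.primary.files)

pair.delete("payroll.db")  # someone makes a mistake on the primary

print("payroll.db" in pair.secondary.files)  # False: the replica followed the primary
print("payroll.db" in backup)                # True: the backup still has it
```

The replica did exactly what it was built to do, which is why it can’t double as your backup.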
DR copies (remember: “off-site backup”) are the last line of defense. If your production, snapshot and backup copies all reside in the same location or geographical proximity, it’s possible that something like a long-term power outage could take them all down at once and freakishly corrupt your replicated copies. Hey, it could happen. In order to protect the data, the DR copy must be a specified minimum distance (usually several miles) away from the primary data center and not be tied to the active production data. Think of it as the backup described above, but physically kept at that distance for access when needed. The data does not necessarily have to be in a form that is ready to turn on right away. It could be on disk, tape or punch card... OK, probably not punch card. What does need to be considered is the Service-Level Agreement (SLA) on restore times. If you store your DR copy on tape, having a one-hour SLA for DR restores just isn’t going to happen!
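The SLA point is easy to sanity-check with a little arithmetic. The Python sketch below is only an estimate under assumptions I’ve picked for illustration: the data size, the throughput figures and the time to get tapes back on site are hypothetical, and the function names are mine.

```python
def restore_hours(data_tb: float, throughput_mb_s: float,
                  retrieval_hours: float = 0.0) -> float:
    """Rough restore time: time to get the media plus time to copy the data.

    Uses decimal units (1 TB = 1,000,000 MB) and ignores everything else
    (cataloging, verification, application recovery), so it is optimistic.
    """
    data_mb = data_tb * 1_000_000
    transfer_hours = data_mb / throughput_mb_s / 3600
    return retrieval_hours + transfer_hours


def meets_sla(restore_h: float, sla_hours: float) -> bool:
    """True if the estimated restore time fits inside the SLA."""
    return restore_h <= sla_hours


# Hypothetical DR scenario: 10 TB to restore against a one-hour SLA.
sla = 1.0
tape = restore_hours(10, throughput_mb_s=300, retrieval_hours=4)  # tapes stored off site
disk = restore_hours(10, throughput_mb_s=3000)                    # disk copy at the DR site

print(f"Tape: about {tape:.1f} h, meets SLA: {meets_sla(tape, sla)}")
print(f"Disk: about {disk:.1f} h, meets SLA: {meets_sla(disk, sla)}")
```

Plug in your own numbers. The useful part is seeing that the media choice and the restore SLA have to be decided together.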