This blog post will cover the varying levels of data caching and data persistence provided by various operating systems and database systems.
Cache - A device placed in front of another storage device that is used for temporary storage of data. The use of a hardware cache is often transparent to the end user and is generally smaller, faster, and less persistent than the backed storage device. Rather than reading/writing from/to a slower storage device (i.e. a hard disk), we read/write from/to the cache (i.e. RAM, NVRAM, etc.).
Reading from a cache is simple. If the datum we requested is in the cache, we simply read it from the cache. If not, we need to retrieve the datum from the storage device and place it into the cache (often a much slower operation).
There are different policies when we write to the cache, though...
Write-Through Cache Policy - When using this policy, writing over the cache also writes over the data on the backed storage device... synchronously and simulataneously. In the event of a catastrophic power failure, for example, the data written to the cache is guaranteed to exist on the backed storage device, as well.
Write-Back Cache Policy - When using this policy, writing over the cache marks the cache block as "dirty." The backed storage device is not updated until the entire cache becomes full and new data needs to be retrieved from the disk into the cache. At this point, to make room for additional space in the cache, one of the dirty cache blocks are written to the backed storage device. In the event of a catastrophic power failure, the data might or might not have time to be transferred to the backed storage device. While the write-back cache policy often performs better than a write-through cache policy, it is also often a source of data integrity and data durability concerns.
What happens when you write to a file?
No matter what operating system you use, no matter what database you use, when you modify a record in a database, something like this happens:
Cache - A device placed in front of another storage device that is used for temporary storage of data. The use of a hardware cache is often transparent to the end user and is generally smaller, faster, and less persistent than the backed storage device. Rather than reading/writing from/to a slower storage device (i.e. a hard disk), we read/write from/to the cache (i.e. RAM, NVRAM, etc.).
Reading from a cache is simple. If the datum we requested is in the cache, we simply read it from the cache. If not, we need to retrieve the datum from the storage device and place it into the cache (often a much slower operation).
There are different policies when we write to the cache, though...
Write-Through Cache Policy - When using this policy, writing over the cache also writes over the data on the backed storage device... synchronously and simulataneously. In the event of a catastrophic power failure, for example, the data written to the cache is guaranteed to exist on the backed storage device, as well.
Write-Back Cache Policy - When using this policy, writing over the cache marks the cache block as "dirty." The backed storage device is not updated until the entire cache becomes full and new data needs to be retrieved from the disk into the cache. At this point, to make room for additional space in the cache, one of the dirty cache blocks are written to the backed storage device. In the event of a catastrophic power failure, the data might or might not have time to be transferred to the backed storage device. While the write-back cache policy often performs better than a write-through cache policy, it is also often a source of data integrity and data durability concerns.
What happens when you write to a file?
No matter what operating system you use, no matter what database you use, when you modify a record in a database, something like this happens:
- The database writes to some sort of cache in memory
- If data persistence is needed (and it normally is), the database "lazily" tells the operating system to write the data to the hard disk. The operating system then determines when the data is actually to be written to the disk.
- If the database determines that it needs to be less lazy and force the disk to flush, it will tell the disk controller to do so. Many database systems force a flush every so often (say once a second). It is also common to flush to the disk when transactions are complete or when sensitive data has just been written to the database. When I say "flushing to disk," I'm talking about calling fsync(). If you're running Windows, I'm talking about writing to a file that was opened with the FILE_FLAG_WRITE_THROUGH flag.
At this point, your data has been given to the disk controller to be written to the hard disk at its convenience. Some disk controllers have caches, some don't. And, many hard disks have internal caches.
Now, we all know that it's much more complicated than that. Databases often queue batches of changes to improve performance, and to ensure data integrity, they also write queued database changes to some sort of append-only log file or journal. Nevertheless, whether writing data to a file or to a database, you eventually want to write to the disk before telling the client that their data is "safe."
Unfortunately, because of varying levels of caching when flushing data to the disk (as we did in step #3 above), there is still a chance that our data will remain unsafe.
Possible Data Persistence Problems
Problem #1 - If your hard disk drive has a write-back cache, your data is at the mercy of the hard disk. Most hard disk caches are supported by no more than a small capacitor, and there is often not enough time to write the entire cache to disk in the event of a power failure; therefore, power failure to the hard disk could mean corruption/loss of data. Typical workstations often have hard disks with write-back caches. Using hard disks with write-through caches solve this problem, or on some disks, you can disable the cache entirely within the operating system.
Problem #2 - If your hard disk controller has a write-back cache, your data is at the mercy of the hard disk controller. This problem is similar to #1 above. Fortunately, some disk controllers have battery-backed write-back caches. So long as these batteries don't die, your data remains safe, and you get the added performance benefit of a write-back cache.
Problem #3 - If your database doesn't force a flush to disk (using fsync() or other OS-specific methods), then the data can be cached by the operating system and is susceptible to loss.
Problem #4 - None of this really matters if your hard disk drive takes a crap. In this situation, data loss/corruption is almost guaranteed unless your data is mirrored on other hard disks.
Solutions for each Problem:
- Ensure that your disk drives don't use write-back caching policies. Write-through caching or no caching at all is okay. Remember, many commodity disk drives use write-back caching.
- Ensure that your disk controllers use a write-through cache policy or a battery-backed write-back cache. Even in the event of a disk controller failure, a well-designed battery-backed write-back cache card can be placed into another disk controller.
- Force your database to flush to the disk as often as necessary. If data persistence is extraordinarily important, you should fsync() to the disk each time the database is modified. It should be noted that the performance penalty associated with constantly syncing is significant; yet, many databases like SQL Server, PostgreSQL, etc. will flush to the disk quite frequently. Databases like Redis allow you to tweak how frequently an fsync() should occur.
- Ensure that your data is mirrored across multiple hard disk drives to ensure that the failure of a few hard disk drives won't destroy your data.
Additional Resources:
http://www.postgresql.org/docs/8.3/static/wal-reliability.html - my favorite article on the topic; discusses when PostgreSQL flushes to disk
http://support.microsoft.com/kb/234656 - another good one
http://www.humboldt.co.uk/2009/03/fsync-across-platforms.html - compares fsync() and other flushing methods for various operating systems
http://support.microsoft.com/kb/78363 - when SQL Server flushes dirty pages to disk
https://forums.aws.amazon.com/thread.jspa?threadID=27590 - clarifies EBS durability
http://redis.io/topics/persistence - when Redis flushes to disk
http://www.slideshare.net/iammutex/what-every-data-programmer-needs-to-know-about-disks - A slideshow that provides some insight into this problem
http://stackoverflow.com/questions/2025694/improve-write-speed-for-high-speed-file-copy - Some benchmarks of FILE_FLAG_WRITE_THROUGH
Comments