In-memory databases

Posted on May 27, 2010 by GenieDB

There’s been a recent rise in interest in “in-memory databases”. The reasoning given is that the cost of syncing commits to disk is high, and that this is the bottleneck for write operations: ACID databases require that a commit be confirmed as written to disk, which often requires two or more disk writes, each with a seek penalty of a few milliseconds. As a result, on-disk databases struggle to commit more than a few hundred updates per second unless you invest in very large, expensive RAID stripe sets.
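To make that arithmetic concrete, here is a minimal Python sketch. The seek times and the figure of two synchronous writes per commit are illustrative assumptions rather than measurements, and the function (a hypothetical helper) models commits issued strictly one after another, with no batching.

```python
def max_commits_per_second(seek_ms: float, writes_per_commit: int) -> float:
    """Upper bound on commits/sec when each commit waits, one after another,
    for `writes_per_commit` synchronous disk writes that each pay one seek."""
    ms_per_commit = seek_ms * writes_per_commit
    return 1000.0 / ms_per_commit


# Illustrative assumptions: two synchronous writes per commit, seeks of 2, 4 and 8 ms.
for seek_ms in (2.0, 4.0, 8.0):
    rate = max_commits_per_second(seek_ms, writes_per_commit=2)
    print(f"{seek_ms:.0f} ms per seek -> about {rate:.0f} commits/s")
```

Even with optimistic seek times, the ceiling lands in the low hundreds of commits per second.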

Reads aren’t the issue, as every disk-based database caches data in memory. If your database is small enough to fit in memory, or access to it is mostly concentrated on a subset that’s small enough to fit in memory, reads are just as fast as in any in-memory database. It’s writes that are the issue, and an in-memory database can update records very quickly indeed.
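A small sketch shows why a warm cache makes reads as fast as an in-memory database. It assumes a hypothetical `CachedStore` with a least-recently-used cache in front of a slow backing store; a plain dict stands in for the disk, and `capacity` is an assumed number of records that fit in memory.

```python
from collections import OrderedDict


class CachedStore:
    """Hypothetical read-through LRU cache in front of slow "disk" storage."""

    def __init__(self, backing: dict, capacity: int):
        self.backing = backing          # stands in for on-disk storage
        self.capacity = capacity        # how many records fit in memory
        self.cache = OrderedDict()
        self.disk_reads = 0             # counts misses that had to go to "disk"

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.disk_reads += 1
        value = self.backing[key]
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used record
        return value


store = CachedStore({f"k{i}": i for i in range(1000)}, capacity=100)
# A hot working set of 50 keys, read ten times over.
for _ in range(10):
    for i in range(50):
        store.get(f"k{i}")
print(store.disk_reads)  # 50: only the first pass touched "disk"
```

Once the hot subset is cached, every further read is a memory lookup. Writes are another matter, since durability still requires each one to reach the disk.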

However, in-memory databases have a downside: if you reboot the server for any reason, everything is lost. They therefore often offer the facility to snapshot the state to disk periodically and to restart from a saved snapshot; in the event of a failure, only recent updates are lost. Some go further and also log the updates made since the last snapshot to a file, so that they can be replayed on top of the snapshot. People who cannot afford to lose an update, ever, can even request that the log file be synchronised to disk after each update is logged to it. That gives them the reliability of a disk-based database, and the same performance, because at that point they have a disk-based database after all.
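The snapshot-plus-log scheme can be sketched in a few dozen lines of Python. This is a minimal illustration, not how any particular product implements it: the class name, the JSON file formats and the `sync_each_update` flag are all hypothetical. With the flag off, a crash can lose updates still sitting in the write buffer; with it on, every `put` waits for `os.fsync`, which is exactly the disk-bound cost described above.

```python
import json
import os
import tempfile


class SnapshotLogStore:
    """Hypothetical in-memory key-value store with snapshots and an update log."""

    def __init__(self, directory: str, sync_each_update: bool = False):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "updates.log")
        self.sync_each_update = sync_each_update
        self.data = {}
        self._recover()
        self.log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self):
        # Restart path: load the last snapshot, then replay logged updates on top.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    try:
                        key, value = json.loads(line)
                    except json.JSONDecodeError:
                        break  # torn final record from a crash mid-write
                    self.data[key] = value

    def put(self, key: str, value):
        self.data[key] = value  # the in-memory update: fast
        self.log.write(json.dumps([key, value]) + "\n")
        if self.sync_each_update:
            self.log.flush()
            os.fsync(self.log.fileno())  # wait for the disk: slow, but nothing is lost

    def snapshot(self):
        # Write the full state to a temp file, swap it in atomically, then start a new log.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")  # truncates the old log

    def close(self):
        self.log.close()


directory = tempfile.mkdtemp()
store = SnapshotLogStore(directory, sync_each_update=True)
store.put("a", 1)
store.snapshot()
store.put("b", 2)
store.close()

restarted = SnapshotLogStore(directory)  # simulate a restart
print(restarted.data)  # {'a': 1, 'b': 2}
restarted.close()
```

Replaying the log is safe even if a crash hits between swapping in the snapshot and truncating the log, because each logged update simply reassigns a key.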

At the other end of the spectrum, many disk-based databases offer in-memory tables for non-critical data.

In other words, the two sides have converged until they are almost indistinguishable: disk-based databases sometimes offer in-memory tables, while in-memory databases sometimes offer fully durable updates. The technologies aren’t as different as some of the recent hype might suggest. It’s not quite right to think of database products as “in-memory” versus “disk-based”; the distinction really applies to individual tables. It’s not quite accurate to call Redis an in-memory database and MySQL a disk-based database…
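Treating durability as a per-table setting can be sketched as one database holding both kinds of table. Everything here is hypothetical and illustrative (the `Database` class, the `durable` flag, one JSON-lines log file per durable table); MySQL’s MEMORY storage engine is one real example of a memory-only table inside a disk-based database.

```python
import json
import os
import tempfile


class Database:
    """Hypothetical database whose tables choose durability individually."""

    def __init__(self, directory: str):
        self.directory = directory
        self.tables = {}   # table name -> {key: value}
        self.durable = {}  # table name -> True (persistent) / False (memory-only)
        self._recover()

    def create_table(self, name: str, durable: bool):
        self.tables.setdefault(name, {})
        self.durable[name] = durable

    def _log_path(self, name: str) -> str:
        return os.path.join(self.directory, name + ".log")

    def put(self, table: str, key: str, value):
        self.tables[table][key] = value
        if self.durable[table]:
            # Persistent-but-slow: append and fsync before acknowledging the write.
            with open(self._log_path(table), "a", encoding="utf-8") as f:
                f.write(json.dumps([key, value]) + "\n")
                f.flush()
                os.fsync(f.fileno())
        # Fast-but-lossy tables never touch the disk.

    def _recover(self):
        # Only durable tables leave logs behind; memory-only tables vanish on restart.
        for fname in os.listdir(self.directory):
            if fname.endswith(".log"):
                name = fname[: -len(".log")]
                self.create_table(name, durable=True)
                with open(os.path.join(self.directory, fname), encoding="utf-8") as f:
                    for line in f:
                        key, value = json.loads(line)
                        self.tables[name][key] = value


directory = tempfile.mkdtemp()
db = Database(directory)
db.create_table("orders", durable=True)
db.create_table("sessions", durable=False)
db.put("orders", "o1", 42)
db.put("sessions", "s1", "alice")

restarted = Database(directory)  # simulate a reboot
print(restarted.tables)  # {'orders': {'o1': 42}}
```

Both tables are read from memory at the same speed; they differ only in what each write costs and in what survives a restart.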

GenieDB provides a cloud MySQL database-as-a-service that improves database performance through increased availability and fast response times, and that scales elastically. Database administration tasks are automated, so you can focus on your core business and application development.

  • http://www.geniedb.com/ Alaric Snell-Pym

    Sure, in that not having to commit writes to disk means you don’t need to be limited by disk bandwidth and the time taken to sync.

    But my point is that it’s silly to make an entire database “in-memory” or not. The real issue is whether some of your data (e.g., a table) needs to be persistent. Turn off persistence, and writes can go faster. Turn on persistence, and writes are limited by the disks, but they are persistent. Databases like Redis also let you fine-tune that a little, choosing degrees of persistence. And it shouldn’t affect read performance at all, once the cache has warmed up.

    Why can’t we have a single database, with some tables (or even just some records?) being persistent-but-slow-to-write, and some being fast-to-write-but-lossy, as a tweaking option? Why do we need an entire separate database?

  • http://www.seunosewa.com/ Seun Osewa

    In-memory databases are much faster in most cases, though.