A Look at MemSQL’s Memory First Database Software
Eric Frenkiel, CEO and co-founder of MemSQL, contacted me to introduce himself and his company after reading some of my recent comments on big data, distributed SQL database products, and memory-based database products.
Where is the bottleneck?
Most of the time, the performance of database products is limited by the performance of the underlying storage and network infrastructure. Traditional RDBMS products from vendors such as Oracle, IBM, Microsoft, and others have done their best to address this bottleneck by caching frequently used data in memory.
Storage device vendors have attacked the same bottleneck from a different direction, developing flash storage appliances, in-system caches, storage server caches, and flash storage devices.
The MemSQL approach
Frenkiel believes a better approach is to create a new generation of database software designed from the ground up to use system memory across a cluster of systems. According to him, this is the best approach for extreme transactional systems, analytical systems, and even big data applications. Accordingly, MemSQL’s architecture is based on using system memory first and other storage mechanisms second.
MemSQL also supports tiered storage. Its tiered storage architecture allows data held in the in-memory row store to be sent, in columnar form, to flash or disk stores. The software automatically moves data from memory to flash to disk as needed, based on policies.
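To make the policy idea concrete, here is a minimal Python sketch of policy-driven demotion across three tiers. It is not MemSQL code: the `TieredStore` class, the per-tier row limits, and the least-recently-accessed demotion rule are all hypothetical, and the sketch moves whole rows between Python dictionaries rather than converting them into a columnar format.

```python
from dataclasses import dataclass, field

# Hypothetical tier order, fastest first.
TIERS = ["memory", "flash", "disk"]

@dataclass
class TieredStore:
    # Hypothetical row limits per tier; a tier missing from the dict (disk) is unbounded.
    capacity: dict
    tiers: dict = field(default_factory=lambda: {t: {} for t in TIERS})
    last_access: dict = field(default_factory=dict)
    clock: int = 0

    def _touch(self, key):
        # Record a logical access time used by the demotion policy.
        self.clock += 1
        self.last_access[key] = self.clock

    def put(self, key, value):
        # New or updated rows always land in memory first.
        for rows in self.tiers.values():
            rows.pop(key, None)
        self.tiers["memory"][key] = value
        self._touch(key)
        self._rebalance()

    def get(self, key):
        # Return (tier, value) so callers can see where the row lives.
        for tier in TIERS:
            if key in self.tiers[tier]:
                self._touch(key)
                return tier, self.tiers[tier][key]
        raise KeyError(key)

    def _rebalance(self):
        # Policy: while a tier is over its limit, demote its least recently
        # accessed row to the next tier down (memory -> flash -> disk).
        for upper, lower in zip(TIERS, TIERS[1:]):
            limit = self.capacity.get(upper)
            if limit is None:
                continue
            rows = self.tiers[upper]
            while len(rows) > limit:
                coldest = min(rows, key=lambda k: self.last_access[k])
                self.tiers[lower][coldest] = rows.pop(coldest)

store = TieredStore(capacity={"memory": 2, "flash": 1})
for k in "abcd":
    store.put(k, k.upper())
print(store.get("a"))  # ('disk', 'A')
```

With `capacity={"memory": 2, "flash": 1}`, writing four keys pushes the two oldest rows out of memory and the very oldest all the way down to disk. A real policy engine would also consider promoting hot data back to faster tiers, which this sketch omits.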
One of the things I was most interested in was MemSQL’s approach to parsing and executing SQL commands. Frenkiel said MemSQL uses patented code-generation technology to create query execution plans that eliminate the need for interpretation along hot code paths. In other words, a repeatedly executed SQL query is compiled once and then runs as machine code on subsequent executions.
The database engine was also designed from the ground up to live in a highly distributed cluster environment. MemSQL’s distributed query optimizer parses each query and then decomposes it into parts that run in parallel. It also uses multi-version concurrency control and lock-free data structures to enable highly concurrent data access without locking or sacrificing consistency.
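Multi-version concurrency control is easiest to see with a toy store in which writers append new versions and readers look up the newest version at or before their snapshot timestamp. The `MVCCStore` class below is a hypothetical, single-threaded Python sketch; it does not model MemSQL’s lock-free structures, distribution, or cleanup of old versions.

```python
from bisect import bisect_right

class MVCCStore:
    """Toy multi-version store: writes append versions, reads use snapshots."""

    def __init__(self):
        self._ts = 0  # logical commit clock
        self._versions: dict[str, list[tuple[int, object]]] = {}

    def write(self, key: str, value: object) -> int:
        # Each commit gets a higher timestamp and is appended as a new version;
        # older versions stay in place for readers with older snapshots.
        self._ts += 1
        self._versions.setdefault(key, []).append((self._ts, value))
        return self._ts

    def snapshot(self) -> int:
        # A snapshot is simply the latest committed timestamp.
        return self._ts

    def read(self, key: str, snapshot_ts: int) -> object:
        # Return the newest version committed at or before the snapshot.
        versions = self._versions.get(key, [])
        i = bisect_right([ts for ts, _ in versions], snapshot_ts)
        if i == 0:
            raise KeyError(key)
        return versions[i - 1][1]

store = MVCCStore()
store.write("balance", 100)
old = store.snapshot()
store.write("balance", 250)
print(store.read("balance", old))               # 100
print(store.read("balance", store.snapshot()))  # 250
```

A reader holding an older snapshot keeps seeing the value it started with even after a newer write commits, which is why readers and writers do not have to wait on each other.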
Most database companies I’ve spoken to recently have focused on taking established open-source database software, such as MySQL or PostgreSQL, and extending it with technology that lets it run efficiently across multiple systems. While this can dramatically improve overall performance and/or scalability, the architecture is still based on the assumption that data must reside on disk somewhere. The best of these products keep data in memory or in flash caches some of the time, but their design still demands that it eventually end up on disk.
MemSQL looked at the same problem as everyone else but approached it differently: why not automatically place copies of the same data on several different systems? That way, consistency and reliability can be maintained while delivering even higher levels of performance.
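Placing copies of the same data on several systems is essentially replication. The following Python sketch, assuming hash-based placement and synchronous writes to every replica, shows how a read can still succeed after one node goes down. `ReplicatedCluster` and its placement rule are hypothetical and are not a description of how MemSQL actually replicates data.

```python
import hashlib

class ReplicatedCluster:
    """Hypothetical cluster that stores every row on `replicas` distinct nodes."""

    def __init__(self, node_names, replicas=2):
        if replicas > len(node_names):
            raise ValueError("need at least as many nodes as replicas")
        self.nodes = {name: {} for name in node_names}  # node -> its local data
        self.order = list(node_names)
        self.replicas = replicas
        self.down = set()  # nodes currently treated as failed

    def _placement(self, key: str) -> list[str]:
        # Hash the key to pick a primary node, then place the remaining
        # copies on the following nodes in ring order.
        h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        start = h % len(self.order)
        return [self.order[(start + i) % len(self.order)] for i in range(self.replicas)]

    def write(self, key, value):
        # Synchronous replication: reject the write unless every replica can
        # accept it, so all copies stay identical.
        targets = self._placement(key)
        if any(n in self.down for n in targets):
            raise RuntimeError("replica unavailable; write rejected")
        for n in targets:
            self.nodes[n][key] = value

    def read(self, key):
        # Any live replica can serve the read.
        for n in self._placement(key):
            if n not in self.down:
                return self.nodes[n][key]
        raise RuntimeError("all replicas unavailable")

cluster = ReplicatedCluster(["n1", "n2", "n3"], replicas=2)
cluster.write("user:1", "alice")
cluster.down.add(cluster._placement("user:1")[0])  # simulate primary failure
print(cluster.read("user:1"))  # alice
```

Rejecting writes when any replica is unavailable keeps the copies identical at the cost of write availability; production systems typically fail over or re-replicate instead.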
I recommend that companies that need to develop high-performance transactional or analytical systems that rely on SQL databases consider MemSQL’s approach.