The search for a database that can do it all • The Register

Analysis Under the scorching Nevada summer sun, Snowflake promised last week that it would bring together two ways of working with data that mix as well as oil and water.

The data warehouse provider – well known for its stratospheric $120 billion post-IPO valuation – said it would support both analytical and transactional workloads in the same system.

Launched at the Snowflake Summit 2022 in Vegas, Unistore would be the “foundation for another wave of innovation in the Snowflake Data Cloud,” said Christian Kleinerman, senior vice president of product. “Similar to how we have redefined data lakes and data warehouses for our customers, Unistore is ushering in a renaissance of building and deploying a new generation of applications in the Data Cloud,” he said.

The problem with promises of tech innovation is that they can – much like Snowflake’s market capitalization – be deflated: the company is currently valued at $38 billion.

Snowflake’s row-based storage engine is meant to support analysis of transactional data. Available only in preview, Unistore would let developers create “a data pipeline to pull all that data into Snowflake [and then] it’s all in Snowflake and it’s easy to manage,” Carl Perry, director of product management, said on a media call. Or they can develop transactional applications directly on Snowflake’s platform, which includes the Snowpark development framework.

But for some, the promise of “innovation” rang hollow. Following the Snowflake announcement, Domenic Ravita, product marketing manager at database company SingleStore, took to Twitter to point out that his company had a patent on an approach that, at first glance, might look like Snowflake’s.

Speaking to The Register, he explained that in 2019 SingleStore launched the first version of its database to support both data structures – row store and column store – in a single type of table. “What matters is that you’ve just created a table and you automatically get the benefits of OLTP and OLAP with the data structure and tiered storage,” he said.

SingleStore counts Uber, Kellogg’s and engineering giant GE among its customers. The company was founded in 2011 as MemSQL by former Facebook and Microsoft engineers Adam Prout (CTO) and Nikita Shamgunov, who remains on the board but is also CEO of Neon, which supports serverless Postgres. The first product was an in-memory transactional database of the same name, released in 2013.

Ravita said that in 2014, SingleStore started working on an in-memory row store and an on-disk column store with tiered storage, “meaning that transactions hit memory first and then they are transferred to disk storage”.

Part of the reason was to regain control of the proliferation of database categories that have populated the modern stack as it cracks beneath the scale of global internet-based applications.

“We need a database only to search text: Elasticsearch. We need a database only to scale read volumes: we use Redis. We need a database only for documents, catalogs and tweets: we use MongoDB or Couchbase document [databases]. The problem with that is now, if you have a modern SaaS application, you have a complex collection of [databases]. Somehow you accidentally stumbled into creating your own distributed database from these other databases and now you’re a database designer by accident,” Ravita said.

SingleStore’s approach to supporting transactional and analytical workloads on a single data store is now called Universal Storage and was granted a US patent in July 2021.

Ravita said that regarding its patent, the company will take a wait-and-see approach to Snowflake’s Unistore.

“It’s not very clear and [Unistore] is not yet available. We are waiting to see what comes next. We’ve invested over eight years in our technology and our patent was granted last year, so we’ll see. But our first answer is: welcome to the party and may the best database win.”

Snowflake declined the opportunity to contribute to this article.

The value of using a single database for different workloads is not just in the simplicity of design and support. There’s also an economic driver, especially with the advent of cloud computing where users may end up paying for moving, storing and processing data, Ravita said.

The point is supported by GigaOM. The research firm’s field test showed that SingleStoreDB offered a 50% savings over three years compared to the Snowflake-MySQL stack and a 60% savings over the same period compared to the AWS Redshift-PostgreSQL stack. Meanwhile, its TPC-H workloads were 100% faster than Redshift.

Performance aside, SingleStore isn’t the only company claiming to run analytics from a transactional database. For example, MongoDB has announced column store indexing for its document database to help developers create analytical queries in their applications.

Oracle has its Heatwave for MySQL product, which, running on Oracle Cloud Infrastructure, helps customers run analytics on transactional applications without having to export data to a specialized analytics system such as Teradata, Snowflake, or AWS Redshift.

Meanwhile, SAP has been talking about real-time analytics since 2011 and bases its concept on its HANA in-memory database, which supports the latest iteration of SAP’s enterprise applications.

Ravita said SAP HANA moves data “under the covers” between data storage types within the database architecture. This movement, he said, is “in the path of hot transactions.”

“As far as we know, we are the only customer-used production database on the planet that unifies transactions and analytics into a single type of storage.”

A SAP spokesperson said: “We do not move data between ‘stores’, creating multiple copies of data. We have one source/copy of data that is stored optimally for transactional and analytical performance, whichever matters.”

Andy Pavlo, associate professor of databases at Carnegie Mellon University, said that while Snowflake’s claims about analyzing live transactional data may not be entirely valid, SingleStore is not alone in offering a combined database.

Snowflake is introducing Hybrid Tables, which it says “provide fast single-row operations and enable customers to build transactional business applications directly on Snowflake… [and] allow customers to perform rapid analysis on transactional data for immediate context.”

Pavlo said: “At a high level, Snowflake and SingleStore – and others – do the same thing. They use a row store for transactional updates and then a column store for data that targets analytical queries.”

“The fact that Snowflake calls Unistore tables ‘hybrid tables’ is telling. This means that they probably store data in row and column format. They probably add updates to Unistore-specific log-structured storage and then move batches to their column store,” he added.
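
The pattern Pavlo describes can be sketched in a few lines of Python. This is a toy illustration only, not Snowflake's or SingleStore's implementation: the `HybridTable` class, its `batch_size` parameter and its method names are all hypothetical. Writes land in a row-format buffer, batches are later moved into column-format storage, and an analytical query has to consult both.

```python
class HybridTable:
    """Toy hybrid table: a row-format write buffer in front of a column store.

    Hypothetical sketch; names and behavior are assumptions for illustration.
    """

    def __init__(self, columns, batch_size=3):
        self.columns = list(columns)
        self.batch_size = batch_size
        self.row_buffer = {}                                # key -> row dict (recent writes)
        self.column_store = {c: [] for c in self.columns}   # column name -> list of values
        self.positions = {}                                 # key -> index into each column list

    def upsert(self, key, row):
        """Transactional path: write one whole row into the buffer."""
        self.row_buffer[key] = {c: row[c] for c in self.columns}
        if len(self.row_buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Move the buffered batch into the column store."""
        for key, row in self.row_buffer.items():
            pos = self.positions.get(key)
            if pos is None:
                # New key: append one value to every column.
                self.positions[key] = len(self.column_store[self.columns[0]])
                for c in self.columns:
                    self.column_store[c].append(row[c])
            else:
                # Existing key: overwrite in place.
                for c in self.columns:
                    self.column_store[c][pos] = row[c]
        self.row_buffer.clear()

    def get(self, key):
        """Point lookup: the buffer holds the newest version, so check it first."""
        if key in self.row_buffer:
            return dict(self.row_buffer[key])
        pos = self.positions.get(key)
        if pos is None:
            return None
        return {c: self.column_store[c][pos] for c in self.columns}

    def column_sum(self, column):
        """Analytical path: scan one column, then overlay unflushed rows."""
        total = sum(
            self.column_store[column][pos]
            for key, pos in self.positions.items()
            if key not in self.row_buffer
        )
        return total + sum(row[column] for row in self.row_buffer.values())


table = HybridTable(["amount", "qty"], batch_size=3)
table.upsert(1, {"amount": 10, "qty": 1})
table.upsert(2, {"amount": 20, "qty": 2})
table.upsert(3, {"amount": 30, "qty": 3})   # third write triggers a flush
table.upsert(1, {"amount": 15, "qty": 1})   # update stays in the row buffer
print(table.column_sum("amount"))
```

The design choice to notice is that the analytical query cannot read the column store alone: until a batch is flushed, the freshest rows exist only in the row buffer.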

The approach is similar to Vertica’s write-optimized storage (WOS), which has been available since the late 2000s, while Google’s new Napa DBMS from 2021 does something similar, the academic said.

“From what I can tell, Snowflake’s Unistore doesn’t do what SingleStore describes as its ‘universal storage’ architecture, despite similar naming,” he said.

However, he cautioned SingleStore against a posture based on intellectual property rights. “Although I haven’t read SingleStore’s patent, if I were them, I wouldn’t run around in a menacing way like that. I don’t think their patent claims would withstand litigation, assuming it matches what they describe in their blog. A good lawyer should be able to get it invalidated. There is a lot of prior art on using a single storage representation of a database to support both transactions and analytics that predates the SingleStore patent and implementation. Notable academic implementations are TUM’s HyPer from 2015 and Saarland’s OctopusDB from 2010,” Pavlo said in an email.

He also pointed out that in the SingleStoreDB approach, the row store is log-structured, which could make reads more expensive. “It’s not unique to Unistore; all log-structured storage managers have this problem,” said Pavlo, who is also CEO of automated database tuning company OtterTune.
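
The read cost Pavlo refers to can be seen in a minimal sketch of log-structured storage. This is a generic illustration, assuming a simple append-only design with no index or compaction; the `LogStructuredStore` class and its `segment_size` parameter are hypothetical and do not describe any vendor's engine. Writes are cheap appends, but a read has to search backwards through the log until it finds the newest version of a key.

```python
class LogStructuredStore:
    """Toy append-only store split into fixed-size segments (hypothetical)."""

    def __init__(self, segment_size=2):
        self.segment_size = segment_size
        self.segments = [[]]   # oldest segment first; each holds (key, value) pairs

    def put(self, key, value):
        """Write path: always append, never update in place."""
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])
        self.segments[-1].append((key, value))

    def get(self, key):
        """Read path: scan newest to oldest.

        Returns (value, entries_examined) so the cost of the read is visible.
        """
        examined = 0
        for segment in reversed(self.segments):
            for k, v in reversed(segment):
                examined += 1
                if k == key:
                    return v, examined
        return None, examined


store = LogStructuredStore(segment_size=2)
store.put("a", 1)
store.put("b", 2)
store.put("c", 3)
store.put("a", 4)          # newer version of "a" is appended, not overwritten
print(store.get("a"))      # found quickly: newest entry
print(store.get("b"))      # found only after scanning past newer entries
```

Real systems reduce this cost with indexes, Bloom filters and compaction, but the underlying trade of cheap writes for more expensive reads is the same.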

Still, there were real benefits to running both workloads in a single store, he said, as long as analytical queries didn’t interfere with the operational transaction workload.

But the advantage only holds for the use case where the analytical data comes from a transactional system rather than a data warehouse combining data from multiple upstream databases, as is often the case.

Doug Henschen, vice president and principal analyst at Constellation Research, said one thing database vendors – including Oracle, MongoDB and Snowflake – had in common was a desire to expand their capabilities to prevent customers from turning to third-party products to meet their needs.

“The rise of ‘modern cloud-native applications’ has increased the demand for such versatility. However, customers will still choose their database/database service based on their main use case and needs. MongoDB, for example, will still primarily appeal to developers looking for an agile platform for application development. It now has a fairly compelling feature set for operational analytics, but it is not an SQL data warehouse/mart platform capable of serving as a foundation for BI and analytics.

“Conversely, Snowflake is adamant that it’s not targeting the traditional transactional database market, it’s literally calling them ‘native applications’ where there is a need for real-time transactional and analytical data.”

In that sense, vendors were talking to their own customers, rather than the broader market, Henschen said. “Every database vendor promises their customers that they will be able to do more with their product, but I don’t see this as a change to the fundamental use case or the original purchaser of the product, although the use cases and user populations expand quite a bit as these new capabilities mature.”

While this may be good news for Snowflake customers, it falls short of the promised “wave of innovation”. Maybe that’s what happens when you try to build a Snowpark in the desert. ®

Maria H. Underwood