Oracle Turns Up the Heat on MySQL HeatWave with In-Database Machine Learning
To further strengthen our commitment to providing cutting-edge data technology coverage, VentureBeat is pleased to welcome Andrew Brust and Tony Baer as regular contributors. Watch for their posts in the Data Pipeline.
Oracle has owned MySQL since acquiring Sun Microsystems over a decade ago. Under Oracle’s watch, MySQL has been run as a separate business. But unless you’re MariaDB, few people gave much thought to Oracle’s stewardship until a few years ago. And with each of the major cloud providers fielding its own managed MySQL service, Oracle gave customers relatively few reasons to come to it for MySQL.
That’s no longer the case. Fifteen months ago, Oracle introduced MySQL HeatWave, its own optimized implementation of MySQL running in Oracle Cloud Infrastructure (OCI, Oracle’s public cloud platform). The optimizations are transparent to the application. Now Oracle is releasing HeatWave 3.0, which increases node capacity in a way that should reduce costs for a number of workloads, and which brings machine learning into the database, a capability that stands to benefit from the higher-density data nodes.
HeatWave is not open source MySQL; it diverges through extensions that Oracle has developed (described below). That isn’t particularly unusual in open source: Amazon Aurora and Azure PostgreSQL Hyperscale, not to mention countless other PostgreSQL variants on the market, show that open source databases provide a clean slate for differentiation.
How Oracle Makes HeatWave (and MySQL) Different
Striving to become a strong contender in the MySQL space, Oracle has taken the database in a unique direction with HeatWave: it has been optimized for analytics in addition to transaction processing by leveraging MySQL’s support for pluggable storage engines. In this case, Oracle plugged in an in-memory column store engine that works side by side with the row store, incorporating optimizations tailored to analytical query processing.
Pairing a columnar storage engine with a row-oriented engine is not unusual; MariaDB has done it, and in fact Oracle took a similar path, with different technology, for its flagship database several years ago. But to date, Oracle is the only vendor to offer an analytical engine optimized for MySQL.
In the latest release, Oracle introduced enhancements to reduce computational costs and integrate machine learning into the database.
Let’s start with running costs. HeatWave 3.0 doubles the data density of each compute node without changing pricing, so you can now consume (and pay for) only half as many nodes to run the same workload. Oracle laid the groundwork for this in the previous release, HeatWave 2.0, which doubled the maximum size of HeatWave clusters to 64 nodes.
The combination of computational cost-effectiveness and scale should be useful now that machine learning models can be run in the database. Hold that thought.
Beyond data density, HeatWave 3.0 makes scaling more economical: you can add any number of nodes (up to the 64-node maximum) per increment. This is consistent with what Oracle introduced for its Autonomous Database cloud service, doing away with so-called standard “t-shirt sizes.” So elasticity with HeatWave doesn’t mean doubling the number of active nodes every time your workload’s compute demands spike. HeatWave also improves availability when resizing, with querying paused for at most a few microseconds.
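The cost effect of per-node increments versus fixed cluster doublings is easy to see with a little arithmetic. The sketch below is purely illustrative: the per-node capacity and workload size are made-up numbers, not Oracle’s published HeatWave figures.

```python
import math

# Hypothetical sizes, for illustration only; these are not Oracle's
# actual HeatWave node capacities.
NODE_CAPACITY_GB = 800

def nodes_needed(data_gb):
    """Smallest whole number of nodes that can hold the data."""
    return math.ceil(data_gb / NODE_CAPACITY_GB)

def next_power_of_two(n):
    """Cluster size if you could only double ('t-shirt' sizing)."""
    return 1 << (n - 1).bit_length()

exact = nodes_needed(9000)          # pay for exactly what the data needs
tshirt = next_power_of_two(exact)   # vs. rounding up to the next doubling
print(exact, tshirt)
```

For a hypothetical 9 TB workload, per-node increments provision 12 nodes, where doubling-only sizing would force you up to 16; the gap (and the wasted spend) grows with cluster size.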
HeatWave 3.0 adds a few tricks to further speed up processing. Like any columnar storage engine, HeatWave makes extensive use of data compression, and it applies common techniques such as Bloom filters, which reduce the amount of buffer memory required for query processing. Specifically, HeatWave implements blocked Bloom filters, which perform the necessary membership lookups with much less overhead, further cutting the buffer memory required.
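The idea behind a blocked Bloom filter can be sketched in a few lines: instead of scattering a key’s probe bits across the whole filter, one hash picks a cache-line-sized block and all probes land inside it, so a lookup touches a single cache line. This is a generic illustration of the technique with arbitrary sizes, not HeatWave’s implementation.

```python
import hashlib

BLOCK_BITS = 512   # one 64-byte cache line per block
NUM_BLOCKS = 1024
K = 4              # probe bits per key, all within one block

bits = bytearray(BLOCK_BITS * NUM_BLOCKS // 8)

def _probes(key):
    digest = hashlib.blake2b(key.encode()).digest()
    # First hash selects the block; remaining hashes select bits inside it,
    # so a membership test reads one cache line instead of K scattered ones.
    base = (int.from_bytes(digest[:8], "little") % NUM_BLOCKS) * BLOCK_BITS
    return [base + int.from_bytes(digest[8 + 2 * i: 10 + 2 * i], "little") % BLOCK_BITS
            for i in range(K)]

def add(key):
    for p in _probes(key):
        bits[p // 8] |= 1 << (p % 8)

def might_contain(key):
    # May return a false positive, never a false negative.
    return all(bits[p // 8] & (1 << (p % 8)) for p in _probes(key))
```

A query engine consults such a filter before a more expensive lookup; a “no” answer is definitive, which is what lets the filter skip work and shrink buffer requirements.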
These capabilities, in turn, pave the way for Oracle to introduce the ability to process machine learning models in the database, without the need for an external ETL engine or learning runtime environment. In doing so, Oracle follows a trend also seen at AWS (Amazon Redshift ML), Google (BigQuery ML), Microsoft (SQL Server with in-database R and Python functions), Snowflake (with Snowpark), and Teradata (via extended SQL). But comparing these approaches is apples and oranges, as each vendor takes a different path: some have models developed externally, some provide curated and deliberately limited choices for in-database ML execution, and others extend SQL itself.
HeatWave is taking the curated path. It’s an approach suited to business analysts or “citizen data scientists,” democratizing machine learning much as self-service visualization put BI in the hands of the average user. In contrast, the external track is for data scientists in organizations that compete on their ability to develop their own unique, highly sophisticated models.
An advantage of the curated approach is that it does not require external tools: the selection, configuration, training, and execution of ML models happen entirely inside the database. This eliminates the overhead and cost of moving data to ML tools or services running on separate nodes. Oracle also touts that keeping everything in the database shrinks the potential attack surface and therefore reduces security exposure.
Here’s how HeatWave’s AutoML approach works. The user chooses the table, the columns, and the type of algorithm (e.g., regression or classification), then specifies where the model artifacts should be stored. The system automatically determines the best algorithm, the appropriate features, and the optimal hyperparameters, and generates a fitted model.
How AutoML works in HeatWave
It streamlines key steps. For example, when testing a candidate model, it separates the individual tasks or steps the model performs, evaluating each with proxies or stubs that simulate the algorithm against a representative sample of hyperparameters. It then automatically documents the choices of data, algorithm, and hyperparameters to make the model explainable, as shown in the figure below.
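The sample-and-select idea behind this kind of automated search can be shown with a toy model search: fit each candidate (algorithm, hyperparameter) pair cheaply on a small sample, score it on held-out data, and keep the winner. Everything here, the two candidate “algorithms,” the single ridge hyperparameter, and the synthetic data, is invented for illustration and bears no relation to HeatWave’s actual internals.

```python
import random

random.seed(7)

# Synthetic regression data: y = 3x + noise.
data = [(x, 3 * x + random.gauss(0, 0.5)) for x in range(200)]
train, valid = data[:150], data[150:]

def fit_linear(points, ridge):
    # One-feature ridge regression, closed form: w = sum(xy) / (sum(x^2) + ridge)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / (sxx + ridge)

def fit_mean(points, _):
    return sum(y for _, y in points) / len(points)

def mse(kind, param, points):
    if kind == "linear":
        w = fit_linear(train[:30], param)      # proxy: fit on a small sample
        pred = lambda x: w * x
    else:
        m = fit_mean(train[:30], param)
        pred = lambda x: m
    return sum((pred(x) - y) ** 2 for x, y in points) / len(points)

# Score every (algorithm, hyperparameter) candidate on held-out data.
candidates = [("linear", r) for r in (0.0, 1.0, 100.0)] + [("mean", None)]
best = min(candidates, key=lambda c: mse(c[0], c[1], valid))
```

The point of the cheap sample fits is the same as HeatWave’s proxies and stubs: rank candidates without paying for a full training run on each one.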
The benefit of in-database ML processing is a flatter architecture and the elimination of the overhead of moving data. Although the downside of integrating any application processing into the database is greater processing overhead, there are several design features that make these issues moot.
The cloud-native architecture, which allows compute to scale as needed, eliminates contention for limited resources. Also, most cloud analytics platforms that support in-database ML streamline or limit the model libraries they support, to head off the AI equivalent of the workload from hell, especially for training runs, which tend to be the longest and most computationally intensive. Oracle has released ML benchmarks for HeatWave 3.0, available on GitHub for customers and prospects to run and verify themselves.
Oracle’s introduction of ML processing in HeatWave complements an ML-related feature from the previous release, version 2.0, last summer. That release included MySQL Autopilot, which uses machine learning internally to help customers operate the database, for example by suggesting how to provision and load it, while providing closed-loop automation for fault management/error recovery and query execution.
With version 3.0, MySQL HeatWave comes full circle, using ML to help run the database and support running ML models within it. This is another example of a prediction I made for this year, that machine learning will take center stage, both to optimize database operation and to provide customers with the ability to develop and/or run models in the database.
VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Learn more about membership.