Batching Database Commands in .NET 6

From the beginning, a core feature of .NET was the database access framework known as ADO.NET or System.Data. The purpose of this framework is to ensure that all databases are accessible in a consistent manner. However, there are occasional gaps that need to be filled.

One such gap is the lack of a uniform way to send multiple SQL statements in a single batch. Although some databases such as SQL Server allow statements to be separated by semicolons, this is not available for all databases. Shay Rojansky writes,

The problem with this approach is that most databases require separate protocol messages for each statement (e.g. PostgreSQL, MySQL), forcing the database’s ADO.NET provider to parse the SQL client-side and split on semicolons. This is both unreliable (parsing SQL is difficult) and bad for performance – ideally an ADO.NET provider should simply pass user-provided SQL to the database, without parsing or rewriting it.

And even SQL Server has limitations, such as not supporting multiple stored procedure calls in a single batch.

The .NET 6 solution to this is the new DbBatch class. The goals of the batch processing API proposal are:

  • Provide a structured way to execute multiple SQL statements in a single round trip, without the need for client-side SQL parsing.
  • Keep the API consistent with other ADO.NET APIs, and specifically close to DbCommand (both are “executable”), to reduce conceptual complexity for adoption.
  • Allow mixing of different types of statements in the same batch (insert, update, select). The current concatenation approach supports this, as does our reader API (multiple result sets).
  • Provide non-aggregated access to the number of rows affected for each individual command in the batch.

Direct use of a DbBatch

Under this model, you start by creating a database-specific DbBatch class. For example, if you are targeting PostgreSQL, you would create an instance of NpgsqlBatch. For MySQL, the class is called MySqlBatch. (For generic programming, you can also use a DbProviderFactory to create the appropriate DbBatch subclass for you.)

On the DbBatch, you set your connection and an optional transaction. Then you add your DbBatchCommand objects to its BatchCommands collection.

A DbBatchCommand is a lighter version of DbCommand. Basically, it’s little more than the SQL string and parameter collection, with everything else handled at the DbBatch level.
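
As a minimal sketch, assuming PostgreSQL via Npgsql, building a batch looks like this. The connection string and the customers table are placeholders:

using Npgsql;

var connectionString = "Host=localhost;Username=test;Password=test;Database=test"; // placeholder

await using var connection = new NpgsqlConnection(connectionString);
await connection.OpenAsync();

// The batch carries the connection (and, optionally, a transaction);
// each batch command carries only its SQL text and parameters.
await using var batch = new NpgsqlBatch(connection)
{
    BatchCommands =
    {
        new NpgsqlBatchCommand("INSERT INTO customers (name) VALUES ($1)")
        {
            Parameters = { new() { Value = "Alice" } }
        },
        new NpgsqlBatchCommand("SELECT id, name FROM customers")
    }
};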

At this point, you can call one of the execute methods on the DbBatch. The three basic options are:

  • ExecuteNonQuery: returns the total number of rows affected.
  • ExecuteScalar: returns the first column of the first row of the first result set.
  • ExecuteReader: returns all result sets in order.

When using ExecuteReader, only the first result set is visible until you call DbDataReader.NextResult. At that point, the first result set is lost and the next one becomes visible. Repeat until all result sets have been consumed and NextResult returns false.
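
Continuing the sketch above, consuming the results is the same loop you would write against a multi-statement DbCommand today:

// Result sets arrive in the order of the commands that produced them.
await using var reader = await batch.ExecuteReaderAsync();
do
{
    while (await reader.ReadAsync())
    {
        // Only the rows of the currently visible result set can be read here.
    }
}
while (await reader.NextResultAsync()); // returns false once every result set is consumed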

After executing a DbBatch, each DbBatchCommand will have its RecordsAffected property set. Exactly when this is populated is implementation dependent and may be delayed until all result sets are consumed.
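
Continuing the same sketch, once the reader has been drained and disposed, the per-command counts can be read back:

// Each command reports its own count rather than a single aggregated total.
foreach (var command in batch.BatchCommands)
{
    Console.WriteLine($"{command.CommandText}: {command.RecordsAffected} row(s) affected");
}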

DbBatch and ORMs

Most ORMs need to perform batch operations for efficiency, so in theory they could see a performance boost by switching to the DbBatch API. This would be done internally, with no visible change to the ORM's public API. It would essentially be a free performance boost for application developers.

PostgreSQL implementation

The official PostgreSQL driver for .NET, Npgsql, has been updated to support the batch processing API. With this change came the option of a raw SQL mode.

As mentioned above, the PostgreSQL driver normally needs to parse the SQL string for semicolons so it can split the batch into individual statements. This parsing can be quite expensive, as it must properly handle things like semicolons embedded in string literals.

The same parsing pass also handles the named parameter problem. PostgreSQL does not natively support named parameters, so the client-side parser must also rewrite named parameters into positional ones.
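
As an illustration, reusing the connection from the earlier sketch (the table and values are hypothetical), a command written in the classic style relies on both kinds of client-side work:

// Npgsql parses this text client-side, splits it on the semicolon, and rewrites
// the named @id placeholder into PostgreSQL's positional form ($1) before
// sending the statements to the server.
var legacyCmd = new NpgsqlCommand(
    "UPDATE customers SET active = TRUE WHERE id = @id; SELECT count(*) FROM customers",
    connection);
legacyCmd.Parameters.AddWithValue("@id", 42);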

Switching to raw SQL mode disables this parsing. That offers a potential performance improvement, but you lose the ability to batch statements through a single NpgsqlCommand, which can be an even bigger performance problem.

Once explicit batching became possible through NpgsqlBatch, raw SQL mode became a viable option. To use this mode, you simply don't assign a name to any of your parameters.
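
A minimal sketch of that positional style, again with a hypothetical table and reusing the earlier connection:

// Unnamed parameters bind by position to $1, $2, ... and the SQL is passed to
// PostgreSQL as-is, with no client-side parsing or rewriting.
await using var rawCmd = new NpgsqlCommand(
    "INSERT INTO customers (name, age) VALUES ($1, $2)", connection);
rawCmd.Parameters.Add(new() { Value = "Alice" });
rawCmd.Parameters.Add(new() { Value = 42 });
await rawCmd.ExecuteNonQueryAsync();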

If you don’t have any parameters, you have to use one of these options instead,

AppContext.SetSwitch("Npgsql.EnableSqlRewriting", false);

cmd.Parameters.Add(new() { Value = 0 });

The AppContext version changes the setting globally, while the dummy positional parameter opts in only the command it is added to.

MySQL implementation

The MySQL implementation is offered as part of MySqlConnector. This library is generally considered better than the official Oracle version and also supports MariaDB.

The MySqlBatch feature was actually completed in 2019, but it didn't get much attention because the corresponding framework API was cut from .NET Core 3. When the API finally shipped in .NET 6, the only significant change needed in MySqlConnector was to mark its classes as inheriting from the corresponding base classes.
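
As a rough sketch (the connection string and table are placeholders), usage mirrors the Npgsql example above:

using MySqlConnector;

await using var connection = new MySqlConnection("Server=localhost;User ID=test;Password=test;Database=test"); // placeholder
await connection.OpenAsync();

// MySqlBatch predates the .NET 6 API; on .NET 6 it also derives from DbBatch.
using var batch = new MySqlBatch(connection);
batch.BatchCommands.Add(new MySqlBatchCommand("INSERT INTO departments (name) VALUES (@name)")
{
    Parameters = { new MySqlParameter("@name", "Support") }
});
batch.BatchCommands.Add(new MySqlBatchCommand("SELECT COUNT(*) FROM departments"));

var departmentCount = await batch.ExecuteScalarAsync();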

SQL Server implementation

The SQL Server implementation of this feature lags behind the others. Although planning began in 2018, key decisions have yet to be made. The last update in December stated that this was a management issue,

The api is in net 6 so what we need is a net6 build target and a policy decision from the MS side on whether the api will be available in other versions where the base classes are not available.

As for the implementation, many of the basic pieces are already in place. A similar internal class known as SqlCommandSet has been in use for a very long time, but it can only be accessed indirectly through SqlDataAdapter, which few developers even know exists.
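
For comparison, that existing indirect route looks roughly like this. This is only a sketch; the connection string, the customers table, and the pre-populated newCustomers DataTable are hypothetical:

using System.Data;
using Microsoft.Data.SqlClient;

using var connection = new SqlConnection(connectionString); // connectionString is a placeholder
using var adapter = new SqlDataAdapter();

adapter.InsertCommand = new SqlCommand("INSERT INTO customers (name) VALUES (@name)", connection);
adapter.InsertCommand.Parameters.Add("@name", SqlDbType.NVarChar, 100, "name");
adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None; // required when batching

// Send up to 50 pending rows per round trip; internally SqlClient groups the
// commands with its SqlCommandSet class.
adapter.UpdateBatchSize = 50;

adapter.Update(newCustomers); // newCustomers: a DataTable containing the added rows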

You can track the progress of this in the ticket named SqlClient: Implementing the Optimized Version of the New ADO.NET Batch API.

Maria H. Underwood