Real-Time Data and Database Design: Why You Should Sweat the Small Stuff

Choosing the right database when there are so many available shouldn’t just be about using what was easy for your last project.

Businesses today rely on their data. Their apps create it, they analyze it to find more opportunities, and it powers the experiences customers want. No company today says, “I wish I had less data.”

Implementing the right database can make a huge difference in performance. This might involve knowing the right type of database for your use case, as well as how the details of any deployment will affect performance over time. Looking at the little things – from the version of drivers or application language you support to how you deploy – can have a significant impact on the number of clients your application supports concurrently. This can affect your approach to deploying real-time data.

Choosing the Right Database Platform

There are many options available today. According to the DB-Engines tracking service, there are 359 different databases that can be used in projects. Choosing the right database can make a huge difference to your performance over time, but configuring that implementation the right way can also have a huge impact.

In practice, this means getting the database design, the data types, and the indexes right. Choosing the right database when there are so many available shouldn’t just be about using what was easy for your last project. While it’s possible to hack most database setups to get what you want, the reality is that each database has different strengths and weaknesses that make it more or less viable for your current project.
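To make the point about indexes concrete, here is a minimal sketch of how adding an index changes the way a lookup is executed. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a database server; the orders table, its columns, and the idx_orders_customer index are hypothetical names chosen for illustration, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption: any SQL database
# with a query planner would show the same before/after pattern).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL,"   # integer key rather than free text
    " total_cents INTEGER NOT NULL)"   # money stored as integer cents
)
conn.executemany(
    "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
    [(i % 100, i * 10) for i in range(1000)],
)


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT id, total_cents FROM orders WHERE customer_id = ?"

plan_before = query_plan(conn, lookup, (42,))   # full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (42,))    # search via the new index

print("before:", plan_before)
print("after: ", plan_after)
```

Checking the plan before and after a schema change is a cheap way to confirm that an index is actually used by the query it was created for.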

For example, you may need the standard Atomicity, Consistency, Isolation, and Durability (ACID) characteristics for your transactions, which may point to relational databases such as MySQL or PostgreSQL. Alternatively, you might want to handle more data and scale, which would mean looking at NoSQL databases instead.
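As a rough illustration of what atomicity and consistency buy you, the sketch below attempts a transfer that would overdraw an account and shows that neither half of the transfer is kept. It again uses sqlite3 as a stand-in for a relational database such as MySQL or PostgreSQL; the accounts table and the transfer helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; apply both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit made just
        # before it has been rolled back as well.
        return False


ok = transfer(conn, "alice", "bob", 500)  # alice only has 100
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The credit to the second account is applied first on purpose: without a transaction it would survive the failed debit, which is exactly the kind of partial update ACID guarantees rule out.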

You might want to prioritize developer speed, or you might want to deal with specific data types like JSON, which would suggest MongoDB. In addition to databases designed for typical application workloads, there are also specialized databases for graph data and time series data that may also be better suited for your use cases. Understanding these areas is essential if you want to choose the right database.

Another database design consideration is the workload growth and access model your application will need. For some applications, you will be able to estimate your growth over time with good accuracy. Other apps will be less predictable as they will be based on customer usage and activity around a service – this may be lower than you originally thought or much higher. For customer-facing applications, the temptation is always there to over-specify just in case or over-complicate the design in an effort to future-proof the service. In response to this, the advice should be to keep things simple to start with and focus on what is needed today. You can always update the design to keep up with new query patterns or expanded usage when you need it.

Looking at the Details

After selecting your database, you also need to consider how you implement it and keep it updated. For example, you can select MySQL as your database of choice. However, how you configure your DB instance with an application driver can affect requests-per-second throughput and, therefore, your application’s performance.

Looking at MySQL and Python together, the version of the MySQL connector can affect performance. In testing, using MySQL with version 3.9.7 of the MySQL Python Connector performed significantly better than the newer 3.10 version. With version 3.10.0, the app delivered around 2,900 requests per second (RPS), while version 3.9.7 hit around 4,300 RPS. That is roughly 48% more throughput on the older version or, put the other way, a drop of about a third on the newer one. Both results were also lower than with mysqlclient as an alternate connector, which hit around 4,750 RPS on both versions.
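A comparison like this can be reproduced with a small harness that runs the same query through different drivers and reports requests per second. The sketch below is not the test setup behind the figures above; it is a minimal single-client loop that accepts any Python DB-API 2.0 connection factory. The measure_rps and relative_change helpers are hypothetical, and sqlite3 is used as a stand-in so the example runs without a MySQL server.

```python
import sqlite3
import time


def measure_rps(connect, query, params=(), duration_s=0.2):
    """Run `query` in a loop for about duration_s seconds on one connection
    and return completed requests per second.

    `connect` is any zero-argument callable returning a DB-API 2.0
    connection, so the same harness can be pointed at different drivers.
    Note that placeholder syntax differs by driver (sqlite3 uses ?,
    the MySQL drivers use %s).
    """
    conn = connect()
    try:
        cur = conn.cursor()
        completed = 0
        start = time.perf_counter()
        deadline = start + duration_s
        while time.perf_counter() < deadline:
            cur.execute(query, params)
            cur.fetchall()          # include result transfer in the timing
            completed += 1
        elapsed = time.perf_counter() - start
    finally:
        conn.close()
    return completed / elapsed


def relative_change(baseline_rps, candidate_rps):
    """Fractional change of candidate relative to baseline."""
    return (candidate_rps - baseline_rps) / baseline_rps


# Stand-in run against in-memory SQLite (assumption, not a MySQL result).
rps = measure_rps(lambda: sqlite3.connect(":memory:"), "SELECT 1")
print(f"stand-in throughput: {rps:.0f} RPS")

# The figures quoted in the text, expressed both ways round.
print(f"{relative_change(4300, 2900):+.0%}")  # newer vs. older: -33%
print(f"{relative_change(2900, 4300):+.0%}")  # older vs. newer: +48%
```

To compare real drivers, the factory would wrap mysql.connector.connect(...) from MySQL Connector/Python or MySQLdb.connect(...) from mysqlclient, with connection arguments for your own environment. A realistic test would also use many concurrent clients and a representative query mix rather than a single loop.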

What does this show? A small decision, such as which application driver and version to use, can measurably affect performance. In practice, many app developers don’t look into driver version specifics or track performance over time. Either they don’t think about these areas at all, or they use a cloud-hosted service or a database-as-a-service option where these decisions are effectively beyond their control.

So why is it important to know these kinds of details? Beyond raw application performance, they affect your ability to deliver the real-time experience customers want within the budget you have to achieve it.

Adding a fraction of a second to a user transaction may not be noticeable on its own under normal circumstances. However, as the load on the database increases, the time to complete each transaction increases too. To compensate, you would have to upgrade your instance or add another node, which costs more. If you are running your application in a cloud-native deployment, adding nodes is the standard way to scale your service. Adding nodes is cheap, but it’s not free. That direct cost is budget that could have gone to other, more pressing concerns, and the infrastructure bill keeps rising as the data set grows.
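The cost effect can be estimated with simple arithmetic. The sketch below takes the two throughput figures quoted earlier (around 2,900 and 4,300 RPS) and works out how many nodes each would need for a given target load. The 20,000 RPS target and the 70% headroom factor are hypothetical, and the calculation assumes throughput scales linearly with node count, which real deployments only approximate.

```python
import math


def nodes_needed(target_rps, per_node_rps, headroom=0.7):
    """Nodes required to serve target_rps when each node is only run at
    `headroom` (a fraction) of its measured peak throughput."""
    usable_rps = per_node_rps * headroom
    return math.ceil(target_rps / usable_rps)


TARGET_RPS = 20_000  # hypothetical peak load

slower = nodes_needed(TARGET_RPS, 2_900)  # -> 10 nodes
faster = nodes_needed(TARGET_RPS, 4_300)  # -> 7 nodes

print(f"slower driver: {slower} nodes")
print(f"faster driver: {faster} nodes")
print(f"extra nodes paid for: {slower - faster}")
```

Under these assumptions, the slower driver setup needs three more nodes for the same load, and that difference is paid for every month the service runs.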

For developers looking to build real-time applications, the temptation is to outsource everything to a third party and let them take care of the situation. However, this can lead to much higher costs and a lack of control over running your database of choice. Keeping that control may mean making choices around small things like language drivers, but those choices can have a huge impact on your application’s performance and the cost of the infrastructure involved. Moving to new instances and scaling by credit card in the cloud is a short-term solution that may be the right answer for now; however, it is possible to deliver better performance at lower cost by asking the right questions in the first place.

Application workloads evolve over time: they grow, shrink, and need to expand with new requests or new use cases. Modern application design and infrastructure approaches can meet these needs, scaling rapidly to meet demand. However, these systems do not look for hidden problems and bottlenecks. It remains our responsibility to look for potential gaps in what we expect from performance – we should always ask ourselves why something has changed and dig in to find the answers.

Just because we can use automation to mitigate our problems doesn’t mean they’re solved. This approach can store up problems and lead to much higher expenses over time. Instead, we need to consider these issues up front when dealing with real-time data. Little things can make a huge difference.

Maria H. Underwood