Software developers today have more options available to them. They have tools and services that can help them quickly create new applications, then launch those services to customers around the world, and then scale them to meet growing demand. Microservices architectures and agile development emphasize faster evolution and the creation of new services whenever customer needs and business needs need to be met.
This also applies to data. Developers must support the data created by their applications, which means implementing a database. Choosing the right design can make all the difference to the application; this guarantees that the application will be available, efficient and scalable over time. However, developers don’t want to have to implement and maintain databases themselves. That’s why the majority of enterprises (90%, according to IDC) are migrating their databases and data workloads to the cloud.
For these companies, several different options are available. These include managed services, cloud-based database installations, and database-as-a-service (DBaaS) offerings. These services all promise to ease the burden of data management and help developers achieve their goals of delivering new apps and app updates faster. Phrases like “schema-free” and “fully managed” can make databases look like they can be ported, lulling developers into complacency.
In reality, developers are just as responsible for cloud infrastructure as they are for traditional on-premises systems, especially when it comes to design choices and how to implement the database. It’s not just about believing that the default settings of DBaaS products are suitable for their applications.
Choose the right database
Application developers and architects should therefore consider the long-term future of their application projects and ensure that they understand the basic requirements that these projects will have. The first question is which database design to use for the project.
There are so many database options available today that the choices quickly become overwhelming. The DB-Engines ranking lists 359 different databases, for example, so there’s plenty of temptation to use a database you already know, or one that makes big promises about what it will deliver. If you’ve implemented MongoDB, for example, why not use that same database for your next project?
However, there is no guarantee that what worked for one application will work for another. Some databases and data management approaches are better suited for specific use cases, such as graph and time series databases, and others may be better suited depending on the programming language or software development resources that will be used. Although it is possible to force a database deployment that is unsuitable for a use case, the wrong choice can seriously reduce performance and increase costs.
Choosing the right database requires understanding how an application workload will behave over time, how it will grow, and how access patterns might change. As any database implementation grows, it will need to handle more queries and more stored data. Getting the right approach in place from the start can make it easier to handle more queries on that data. Ignoring this and relying on the database service to handle it on your behalf might work fine at first, but it can dramatically affect performance and cost later on. Spending time planning ahead can therefore lead to significant cost reductions in the longer term.
How to think about database design
Taking a schema-less approach appeals to many developers. After all, if you let the database service organize the data, you don’t have to. However, this is not really the case. All database vendors, even those offering “schemaless” approaches using JSON or the ability to add objects, encourage some form of schema validation. Schemaless databases store information as unstructured data, which has a significant performance and cost impact as the implementation grows.
Even the smallest decisions can have a big impact as databases evolve. Take data formats, for example. Imagine you have a form in your application that will accept data inputs, such as the country a person lives in. What format should you use?
Country names vary in length, so assume an average of 12 characters for input. Store this data in a character variable (
varchar) with a UTF character set will occupy three bytes per character, or 39 bytes in total for each entry. That doesn’t sound like a lot, but let’s compare that with usage
enum for this same field: A
int only requires four bytes in total for each input, while a
enum only takes one byte. Increase that up to 100 million data points, and the
varchar option would take 3700 megabytes (MB) of space, while option
enum the option would require 95MB, a reduction of 97.5%.
The amount of data you store has a bigger impact than increasing the disk space you use. When you have more data to work with, you normally increase the image of the machine you use to process that data in memory. If you take a less efficient approach to data, you will need to increase CPU and memory resources to process the data. Although the cost of storing terabytes of data on disk is relatively cheap, the cost of CPU and computing time is high, so you should try to take the most efficient approach possible.
Along with this, it is important to consider data access patterns. How you plan to search for data will affect how you design your database. If you plan to have common search queries for your application, you can create indexes that can improve performance. Likewise, you may find that your user behavior changes over time and certain queries become more popular. To handle this you need to revisit these models as the queries and indexes you have in place will not be what you need in the future.
An important element here is that the design of the database is potentially hard to think through. However, you can make your job easier if you keep your deployment as simple as possible rather than trying to accommodate potential edge cases or future requirements. It is always possible to extend your database schema or expand your deployment in the future, rather than focusing on future needs now.
Think before you build
What you decide before you start coding will have the biggest impact on your scalability and stability, compared to any decision you make during the lifetime of a project. So it’s important to give your data, and what you choose to use to manage that data, the respect it deserves.
Rather than handing over all the responsibility to a cloud service or third-party provider, understand what you want to achieve and how best to achieve that goal. However, you do not relinquish responsibility for this decision by choosing a service, and you trade flexibility for performance and cost. Simply adding more cloud resources is not an efficient approach to scaling. The database and design choices you choose will impact the success of your new app or service over time.
Matt Yonkovit is head of open source strategy at Percona.
The New Tech Forum provides a venue to explore and discuss emerging enterprise technologies with unprecedented depth and breadth. The selection is subjective, based on our selection of the technologies that we think are important and most interesting for InfoWorld readers. InfoWorld does not accept marketing materials for publication and reserves the right to edit all contributed content. Send all inquiries to [email protected]