Streaming database vs Traditional database
In today's digital age, businesses have more data at their disposal than ever before. This wealth of information can be leveraged to make better decisions, optimize operations, and improve customer experiences. To make the most of this data, businesses need a database that is capable of handling large volumes of data in real-time. These requirements brought us to the revolution of several types of databases: NoSQL databases, NewSQL databases, time-series databases, in-memory databases, or in-memory data grids.
A streaming database (event streaming database) is of them which is designed to handle data that is continuously generated and processed in real-time. However, it is sometimes confusing which type of databases to choose to handle, process, and analyze your data in real time.
This post explores the need for a Streaming database, compares it to a traditional SQL database, and walks you through the similarities, differences, and use cases.
Need for the streaming database
A streaming architecture also requires a modern querying paradigm. With traditional database systems, queries are run against bounded sets of existing data. In each query, a single set of data is returned at the current time, and that’s it. To see that query over time, you need to run the same query again – and again. To get updated results, you need to execute queries repeatedly.
With streaming systems, a single query is written based on the knowledge that data with a certain structure exists. That query sits in memory in the form of materialized views and waits for the data. As data appears on one or more incoming data streams, that query processes the incoming data with the help of incremental updates and outputs results continuously in a never-ending fashion.
A streaming database does almost all of its work at write time. When data flows into a streaming database, it’s processed and immediately used to update the results of any registered queries. When applications want to read query results, they can look at all the ways that it has changed over time.
Unlike Extract, Transform, and Load (ETL) systems and integration technologies of the past, where things were batch-job oriented, real-time stream-processing systems run continuously, 24/7, and the engine behind the processing in those systems is the continuous query. Every time new records appear on a data stream, the query outputs new results. End-to-end latencies can range from microseconds to seconds, depending on how much processing is required, as opposed to the hours or even days typical of batch ETL solutions.
It’s important to understand that continuous queries aren’t limited to simply reading from a data stream. They can read from in-memory caches, from in-memory reference data that might have been stored, or via time window functions. They can read from other – even persistent – storage such as AWS S3, event, and data sources, as well, depending on the architecture of the streaming system.
For example, RisingWave is one of the fastest-growing open-source streaming databases that can ingest data from Apache Kafka, Apache Pulsar, Amazon Kinesis, Redpanda, and databases via native Change data capture connections or using Debezium connectors to MySQL and PostgreSQL sources. Previously, I wrote a blog post about how to choose the right streaming database that discusses some key factors that you should consider.
Additionally, with streaming databases, you can filter, aggregate, transform, and expose the data in real-time applications (Visualizing dashboards, Data Lakes, or BI analytic platforms).
Here is a typical architecture diagram of where a streaming database fits.
Streaming database has similarities to a traditional database
Although the streaming database focus on streaming, it has many characteristics of a traditional SQL database. These similarities include the following:
A SQL interface
Like a traditional SQL database, the streaming database includes SQL grammar, a parser, and an execution engine. This means interacting with data in either type of system can be accomplished using a high-level declarative language: SQL. RisingWave’s SQL dialect contains the language constructs you would expect, including SELECT for projection, FROM for defining sources or materialized views, WHERE for filtering, JOIN for, well, joining, etc.
DDL and DML statements
DDL(Data Definition Language) and DML(Data Manipulation Language) are the two broad categories of statements supported in the streaming database. DDL statements are responsible for focused on creating and destroying database objects, while DML statements are used to read and manipulate data.
# DDL example: CREATE MATERIALIZED VIEW mv_avg_speed AS SELECT COUNT(id) as no_of_trips, SUM(distance) as total_distance, SUM(duration) as total_duration, SUM(distance) / SUM(duration) as avg_speed FROM taxi_trips; # DML example: SELECT * FROM *mv_avg_speed*;
Network service and clients for submitting queries
If you have worked with traditional SQL databases, you have likely come to expect two things: you can connect to the database over a network and there are default client implementations (e.g., a CLI) for submitting queries. The streaming database includes these features as well, you can use the PostgreSQL CLI tool to write queries.
The collections you interact with contain schema definitions, which include field names and types. Furthermore, like some of the more flexible database systems (e.g., Postgres), the streaming database supports user-defined data types as well.
To optimize for read performance in traditional databases, users sometimes create materialized views, which are named objects that contain the results of a query. In traditional systems, however, these views can be updated either lazily (the update to the view is queued for later or must be applied manually) or eagerly (whenever new data arrives). An eagerly maintained view is similar to how the streaming data represents data and updates to streams and tables are made immediately when new data becomes available.
Built-in functions and operators for data transformation
Like many traditional SQL databases, the streaming database includes a rich set of functions and operators for working with data. You can check all functions and operators in detail in “Functions and Operators” section of RisingWave.
Streaming database differences from traditional database
Effectively, there are some key differences between the in-memory continuous queries that happen in stream processing and the way people thought of queries in traditional databases.
First, streaming databases supports continuous queries working on a never-ending, infinite, and unbounded flow of data, as opposed to a bounded and known set of data that is resident in a table.
Second, in most traditional SQL databases, a query works in a “one and done,” fashion meaning that it issues a query against the current snapshot of data and terminates the query as soon as the request is fulfilled or errors out. In streaming databases, continuous in-memory queries continually produce new results as new data is presented on the incoming data streams.
Third, streaming databases can subscribe to changes in a SQL query or a view in a database that allows you to implement push-based event-driven applications.
Lastly, the streaming databases use their cloud-native storage system to persist materialized views and internal states in the stateful stream and you can replicate the data across different storages, and the storage layer can be scaled independently from the SQL engine.
Use Cases for a Streaming Database
There are many reasons why a lot of IT companies are adapting to streaming databases. At a high level, business teams see that streaming databases can enable them to:
Real-time data analysis alongside data generation.
Enable alert and security systems for their real-time applications.
Real-time data transfer from one built-in app to another built-in app for clients.
Update and monitor data as the interface of various machine-learning models.
Build an even-driven communication backbone for microservice architecture.
In conclusion, traditional databases and streaming databases are two distinct types of databases designed for different use cases. Streaming databases uses familiar SQL DB concepts to make powerful stream processing capabilities. Many applications require immediate access to data, such as financial trading applications or online gaming platforms. A streaming database can process and analyze data in real time, providing instant access to critical information. Understanding the differences between these two types of databases is crucial for selecting the right database for a specific use case.
How to choose the right streaming database
Issue SQL queries to manage your data
Query Real-Time Data in Kafka Using SQL
Is RisingWave the Right Streaming Database?
Rethinking Stream Processing and Streaming Databases
About the author
Visit my personal blog: www.iambobur.com