The distributed systems architecture has become increasingly popular due to its flexibility, scalability, and fault tolerance. However, querying data from multiple microservices in real-time can be challenging, as it may require complex and time-consuming data retrieval operations. Materialized views, in combination with the Command Query Responsibility Segregation (CQRS) pattern, can provide a solution to this challenge by enabling efficient and real-time querying of microservices data.
In this article, we will explore how materialized views in a streaming database can be used to query microservices in real-time with an example of an online store application.
Understanding the scenario and problems
Let's consider an example of an online store app that uses microservices architecture for handling various functions, such as product, catalog, customer, inventory, and order management. Each microservice is responsible for handling a specific domain or business function and storing data in its own data store. For example, Product service stores information about products in the Products database.
Without materialized views, querying data from multiple microservices in real-time can be challenging. For example, when a customer searches for a product on the website, the website may need to retrieve product information, catalog, pricing, and inventory status from different microservices. This can result in multiple round-trip queries to different microservices, leading to slow response times and increased database load.
Assume that you apply the API composition pattern for the above case to query data from multiple microservices in a single API call.
In addition to scalability and maintenance issues, it leads to tight coupling between microservices, querying microservice needs to know the structure returned by the microservices it queries and increased complexity in implementation as it requires additional logic to aggregate and combine data from multiple microservices. API composition can result in additional performance overhead, as the orchestrator microservice may need to wait for responses from multiple microservices before returning a result to the client. Most importantly, it may not provide real-time or near-real-time updates of data from multiple microservices. If the data in the queried microservices changes frequently, the API composition may not reflect the most up-to-date data, leading to potential inconsistencies or stale information in the queried results.
Querying with materialized views
Now, let's see how materialized views can be utilized to optimize the querying of microservices in real-time, and boost the performance of the online store. The below architectural diagram describes the idea:
Separate queries from commands
Firstly, we define the commands that will modify the data in the system and queries that will read data from the system. We can use CQRS to separate the responsibility for handling commands (write operations) from that queries (read operations). This allows for the optimization of each operating independently. For example, the products can be stored and updated in a traditional database such as SQL/NOSQL, while product listings, search results, order history, or user recommendations may be some common views that need to be optimized for performance can be stored in the streaming database in the form of materialized views that is updated in real-time based on the events in the stream.
Identify the queries, creating materialized views
Let’s focus on the query part of CQRS. One approach to implementing it is through the use of read-optimized materialized views.
Materialized views are precomputed views of data that are stored and updated independently from the main transactional database. Materialized views can be used to optimize read operations by reducing the complexity and latency associated with querying the main database.
We identify the queries that require real-time or near-real-time access to data in the online store application such as recently viewed products. The main goal here is to keep the data required by the query service in a denormalized format to enable fast retrieval. Materialized views are denormalized and optimized for the specific query or view on the UI, allowing for faster and more efficient data retrieval. Materialized views can be created using different technologies, such as relational databases, NoSQL databases, or specialized caching systems. But if you would like to get real-time updates and keep that data consistent by subscribing to the state changes of other microservices databases, you opt for using the streaming database.
A streaming database is a type of database that is designed to consume data from various sources and handle continuous data streams in real-time and making it possible to query this data from materialized views using SQL. You can read more about how a Streaming database differs from a Traditional database and how to choose the right streaming database in my other blog posts.
Keep materialized views in sync
Next, we should keep the materialized view up to date. The streaming database can ingest data in real-time from all databases as event streams using the Change Data Capture process and built-in connector. If you are using PostgreSQL for a specific microservice as a data source, you can utilize RisingWave’s PostgreSQL CDC connector to obtain direct real-time data from the change logs (without using message brokers like Kafka). RisingWave builds a materialized view by joining those streams together and keeps the view up to date as new events come (event-driven updates). For example, when a new product is added to the product catalog microservice, a corresponding event can be captured and used to update the materialized view for product listings, ensuring that the view is always up-to-date.
RisingWave is an open-source streaming database that has built-in fully-managed CDC source connectors for various databases, also it can collect data from other sources such Kafka, Pulsar, Kinesis, or Redpanda and it allows you to query real-time streams using SQL. You can get a materialized view that is always up-to-date.
Query materialized views
Finally, you add a new query service that serves queries (only read requests) coming from the UI. The query handler converts API requests to queries and fetches data from the streaming database materialized views. For example, when a user searches for a product, the search results can be retrieved from the materialized view for search listings, instead of making multiple API calls to different microservices.
Creating and querying materialized views with RisingWave
On this GitHub repository, you can find several use cases of materialized views with the RisingWave streaming database. See how it is straightforward to connect to MySQL as a source and define database tables to retrieve data in which we are interested and to get real-time updates in streaming databases.
create table order_events ( order_id int, order_date bigint, customer_name varchar, price decimal, product_id int, order_status smallint, PRIMARY KEY (order_id)) with ( connector = 'mysql-cdc', hostname = 'mysql', port = '3306', username = 'root', password = '123456', database.name = 'mydb', table.name = 'orders', server.id = '1');
Then we create a materialized view to count the total and average order prices:
CREATE MATERIALIZED VIEW order_details AS SELECT order_id, SUM(item_price) AS total_price, AVG(item_price) AS avg_price FROM order_events GROUP BYorder_id;
Benefits of materialized views with streaming database for querying
As we saw in the new architecture of the online store application, it introduces new components such as a streaming database, and query handler, and provides several benefits:
Simple queries: Materialized views can simplify the querying logic by aggregating data in advance, you don’t need to write complex queries with a bunch of joins.
Improved performance: Materialized views can be optimized for specific queries, reducing the latency associated with querying the main database.
Real-time Data Access: Materialized views in the streaming database can provide real-time data from multiple microservices.
Increased scalability: The streaming database can be deployed independently from the main database, allowing for horizontal scaling of the query side.
Better fault tolerance: The streaming database can be used as a fallback in case the main database goes down.
CQRS with incrementally updated materialized views in the streaming database is an effective solution to the challenge of querying data across multiple microservices. By separating the read and write operations and precomputing views of the data, this integration provides better performance, scalability, and fault tolerance.
How to choose the right streaming database
How Streaming database differs from a Traditional database?
Query Real-Time Data in Kafka Using SQL
Is RisingWave the Right Streaming Database?
Rethinking Stream Processing and Streaming Databases
About the author
Visit my personal blog: www.iambobur.com