Various Data Architectures
This is the traditional data warehouse architecture. A data warehouse is a system into which you load structured data from various production and operational environments for reporting and data analysis.
Here you can have data marts on top of a data warehouse. A data mart serves a specific business area such as finance, sales, or marketing. This arrangement is called the Inmon model: the data warehouse acts as the single data source for several specialized data marts. There is also the Kimball model, in which the data marts are built first and data from them is later funneled into a single data warehouse.
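To make the Inmon direction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `warehouse_facts`, the view names, and the sample rows are all made up for illustration; a real warehouse would use a dedicated engine and far richer schemas.

```python
import sqlite3

# Hypothetical warehouse table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_facts (department TEXT, metric TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("finance", "invoices", 1200.0),
        ("sales", "orders", 800.0),
        ("sales", "orders", 450.0),
        ("marketing", "ad_spend", 300.0),
    ],
)

# Inmon-style: every data mart is derived from the single warehouse.
# Each mart is modeled here as a view restricted to one business area.
for department in ("finance", "sales", "marketing"):
    conn.execute(
        f"CREATE VIEW {department}_mart AS "
        f"SELECT metric, amount FROM warehouse_facts "
        f"WHERE department = '{department}'"
    )

# A sales analyst queries only the sales mart, not the whole warehouse.
sales_total = conn.execute("SELECT SUM(amount) FROM sales_mart").fetchone()[0]
print(sales_total)
```

In a Kimball-style setup the arrows would point the other way: the per-department tables would exist first, and the warehouse would be assembled from them.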
In the approach below, you have OLAP cubes on top of the data marts. An OLAP cube stores pre-processed, aggregated data in a multi-dimensional array, with one axis per dimension.
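As a rough illustration of the idea, the sketch below builds a tiny three-dimensional cube (region × product × quarter) from made-up sales rows using plain nested lists. The dimension names, the data, and the helper functions `total_for_region` and `slice_by_quarter` are assumptions for this example, not part of any OLAP product.

```python
# Hypothetical sales rows: (region, product, quarter, amount).
rows = [
    ("EU", "widget", "Q1", 100.0),
    ("EU", "widget", "Q2", 150.0),
    ("EU", "gadget", "Q1", 80.0),
    ("US", "widget", "Q1", 200.0),
    ("US", "gadget", "Q2", 120.0),
]

# One sorted axis per dimension; positions on each axis index the cube.
regions = sorted({row[0] for row in rows})
products = sorted({row[1] for row in rows})
quarters = sorted({row[2] for row in rows})

# cube[i][j][k] holds the pre-aggregated amount for
# (regions[i], products[j], quarters[k]).
cube = [[[0.0 for _ in quarters] for _ in products] for _ in regions]
for region, product, quarter, amount in rows:
    i = regions.index(region)
    j = products.index(product)
    k = quarters.index(quarter)
    cube[i][j][k] += amount


def total_for_region(region):
    """Roll up: sum over the product and quarter axes for one region."""
    i = regions.index(region)
    return sum(sum(per_quarter) for per_quarter in cube[i])


def slice_by_quarter(quarter):
    """Slice: fix the quarter axis and return a (region, product) table."""
    k = quarters.index(quarter)
    return {
        (region, product): cube[i][j][k]
        for i, region in enumerate(regions)
        for j, product in enumerate(products)
    }


print(total_for_region("EU"))
print(slice_by_quarter("Q1"))
```

Because the aggregation happens once when the cube is built, later roll-ups and slices only read cells instead of rescanning the raw rows.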
This is the currently popular approach, in which you combine a data warehouse and a data lake to serve both analytical and machine learning workloads. A data lake is a system where you can store structured, semi-structured, and unstructured data in raw form, including blobs, text, and files.
Databricks has introduced an architecture called “Lakehouse”, in which SQL and data APIs are exposed directly on top of a data lake for the relevant clients to consume. Databricks claims that this will be the future.
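The core idea of SQL over lake files can be imitated at toy scale. The sketch below is not Databricks' implementation: it treats a temporary directory of CSV files as the "lake" and copies each file into an in-memory SQLite table before querying, whereas a real lakehouse engine would query the files in place. The directory layout, the `events.csv` file, and the `query_lake` helper are all hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# A temporary directory stands in for the data lake (hypothetical layout).
lake_dir = Path(tempfile.mkdtemp())
(lake_dir / "events.csv").write_text(
    "user_id,event,duration_s\n"
    "1,login,3\n"
    "1,search,12\n"
    "2,login,5\n"
)


def query_lake(lake, sql):
    """Expose every CSV file in the lake as a table, then run the SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        for path in sorted(lake.glob("*.csv")):
            with path.open(newline="") as handle:
                reader = csv.reader(handle)
                header = next(reader)
                columns = ", ".join(f'"{name}"' for name in header)
                placeholders = ", ".join("?" for _ in header)
                # The table is named after the file, e.g. events.csv -> events.
                conn.execute(f'CREATE TABLE "{path.stem}" ({columns})')
                conn.executemany(
                    f'INSERT INTO "{path.stem}" VALUES ({placeholders})', reader
                )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


login_rows = query_lake(
    lake_dir,
    "SELECT user_id, CAST(duration_s AS INTEGER) FROM events "
    "WHERE event = 'login' ORDER BY user_id",
)
print(login_rows)
```

The columns are created without types, so values arrive as text unless the query casts them, as the example does for `duration_s`.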
Dremio introduced the concept of a Data-as-a-Service platform, which is shown below. Dremio claims that a unified self-service data access tool lets you drop several components that a data architecture would otherwise need, and so simplify it.
Please leave a comment if you think any important data architectures are missing from this list.