Every time you start a new project, the question arises of which data platform and data storage to choose. There is a wide variety of storage options, each with different features, advantages, and disadvantages, and working through the trade-off analysis is really time-consuming.
The purpose of this article is to help you choose a data storage engine.
The approach proposed below is somewhat similar to the CAP theorem, according to which you need to choose two key properties out of three. That is why I call the approach the CBF theorem.
Each data storage has various…
The market for MPP engines is pretty broad, and the big cloud players have offerings that constantly evolve, so it’s worth getting a better understanding of their capabilities and how they perform.
Let’s review Azure Synapse Dedicated SQL Pool, an MPP database platform that evolved from Microsoft’s appliance-based PDW data warehouse and is intended to serve data warehouse needs.
The motivation for this benchmark is to find simple answers to simple questions: how to tweak the physical schema on Azure Dedicated SQL Pool built on top of a classic star schema with realistic enterprise-wide data…
Recently I came across the new book by Bill Inmon and Francesco Puppini called “Unified Star Schema” (I will refer to it as USS below). A new book in 2020 from the father of data warehousing definitely grabbed my attention, so I bought it and read it over the following 3 days with lots of enthusiasm. It turned out that the author of the concept is Francesco Puppini, and Bill Inmon is a supporter and promoter of it, but that doesn’t really diminish the value of a new approach to modeling and organizing data.
In the beginning, I was really skeptical…
Sometimes we come across cases where we need to model a hierarchy with different levels of complexity and are not really sure how to do that properly in the most efficient, reliable, and flexible way. Let’s review one of the data modeling patterns that gives us some answers.
Suppose we have a ragged hierarchy of variable depth, like the one in the picture below:
From a functional standpoint, we need to cover the following main cases:
Querying the hierarchy:
Time series use cases in general, and the IoT domain in particular, are growing fast, so it’s vital to select the right storage for each particular use case.
Nowadays, every other database engine or platform is marketed as time-series oriented, so let’s go deeper and find out which one best suits each particular need.
To formalize the engine selection, let’s clearly define the inputs and the success criteria. As an input, let’s consider a telemetry dataset.
As success criteria, we have:
Recently, a new concept (or paradigm) called data mesh was introduced in the area of data platform architectures. It claims to drive a new architectural approach to building analytics solutions, is often treated as a cutting-edge, fancy approach, and has already started to be adopted by some organizations.
Although Zhamak Dehghani brings a lot of great thoughts with decent reasoning behind them (the original article is here), I see serious concerns that prevent me from recommending it for the majority of data analytics platforms.
The Data Mesh Architecture is broad and covers different aspects of a…
This is a continuation of the previous part, which described a problem statement related to the analysis of IoT data, using fitness tracking activities as an example. It also described the reasoning behind the storage type selection and recommended having 2 types of storage:
The purpose of this story is to describe the recommended data model for the second type of storage: relational analytics storage. …
As many of you probably know, Cassandra is an AP big data storage. In other words, when a network partition happens, Cassandra remains available and relaxes the consistency property. It is usually described as eventually consistent; in other words, it will become consistent at some point in the future.
The important things to know, which are not really obvious, are:
There is no need to explain how fast IoT solutions are growing right now or the reasons behind that. Let’s take it as a fact and, from the data architecture perspective, consider the following two challenges:
Let’s consider a domain that should be easy for everyone to understand, since it is related to everyday life: fitness tracking.
In the area of fitness tracking, the following parties are involved: