First Level Trade-Off to select the right storage for your needs

Intro

Every time you start a new project, the question arises of which data platform and data storage to choose. There is a wide variety of storage options, each with different features, advantages, and disadvantages, and working through the trade-off analysis is time-consuming.

The purpose of this article is to help you choose a data storage engine.

The approach proposed below resembles the CAP theorem, according to which you can choose only two of three key properties. That is why I call it the CBF theorem.

Theorem Definition

Each data storage has various…


Making Sense of Big Data

Set of Best Practices (dis)proved by Benchmarking

Introduction

The market for MPP engines is broad, and the big cloud players have offerings that constantly evolve, so it is worth understanding their capabilities and how they perform.

Let’s review Azure’s Synapse Dedicated SQL Pool MPP database platform, which is the evolution of Microsoft’s PDW appliance-based data warehouse and is intended to serve data warehouse needs.

Motivation

The motivation for this benchmark is to find simple answers to simple questions: how to tweak the physical schema on Azure Dedicated SQL Pool built on top of a classic star schema with the realistic enterprise-wide data…


Opinion

Invented by F. Puppini and promoted by B. Inmon, it claims to be a revolution in Self-Service BI

Intro

Recently I came across the new book by Bill Inmon and Francesco Puppini called “Unified Star Schema” (referred to below as USS). A new book in 2020 from the father of data warehousing definitely grabbed my attention, so I bought it and read it over the following three days with lots of enthusiasm. It turned out that the author of the concept is Francesco Puppini, with Bill Inmon as its supporter and promoter, but that doesn’t diminish the value of a new approach to modeling and organizing data.

In the beginning, I was really skeptical…


Making Sense of Big Data, BigData Modeling Patterns

How to model hierarchies in NoSQL leveraging the best practices from the relational world

Intro

Sometimes we need to model a hierarchy of varying complexity and are not sure how to do it in the most efficient, reliable, and flexible way. Let’s review one of the data modeling patterns that gives us some answers.

Problem Statement

Suppose we have a ragged hierarchy of variable depth, as in the picture below:

Image by Author

From the functional standpoint, we need to cover the following main cases:

Querying the hierarchy:

  • Query a subtree, preferably up to a certain level of depth, such as direct subordinates as well as descendants up to a certain…


IoT Analytics Part 3: Comparison of Time Series Engines

Intro

Time series use cases in general, and the IoT domain in particular, are growing so fast that it’s vital to select the right storage for each particular use case.

Nowadays, every other database engine or platform is marketed as time-series oriented, so let’s go deeper and find out which one best suits each particular need.

Problem Statement

To formalize the engine selection, let’s clearly define the inputs and the success criteria. As an input, let’s consider a telemetry dataset.

The success criteria are:

  • Coverage of functional requirements related to data querying/analytics with different level…


Opinion

Why you should think twice before implementing Data Mesh

Intro

Recently, a new concept, or paradigm, called data mesh was introduced in the area of data platform architectures. It claims to drive a new architectural approach to building analytics solutions. It is often treated as a cutting-edge, fancy approach, and some organizations have already started to adopt it.

Although Zhamak Dehghani offers a lot of great thoughts with decent reasoning behind them (original article is here), I see serious concerns that prevent me from recommending it for the majority of data analytics platforms.

Data Mesh Concept Extracts

The Data Mesh Architecture is broad and covers different aspects of a…


Best Practices of DW Modelling applied to IoT data for the most flexible and efficient analytics

Intro

This is a continuation of the previous part, which described a problem statement for the analysis of IoT data, using fitness tracking activities as an example. It also described the reasoning behind the storage type selection and recommended two types of storage:

  1. Big data storage to store the Time-Series kind of data
  2. Relational analytics database giving maximum flexibility for data analytics. More details are here.

The purpose of this story is to describe the recommended data model for the second type of storage: relational analytics storage. …


Photo by Louis Hansel @shotsoflouis on Unsplash

Why Consistency Issues, or the C in the CAP theorem

As many of you probably know, Cassandra is an AP big data storage. In other words, when a network partition happens, Cassandra remains available and relaxes the Consistency property. It is usually described as eventually consistent, meaning that it will become consistent at some point in the future.
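One way to make the relaxed Consistency property concrete is the usual replica-overlap rule for tunable consistency: a read is guaranteed to touch at least one replica that acknowledged the latest write only when the number of write acknowledgements plus the number of read acknowledgements exceeds the replication factor. The sketch below is a minimal illustration of that arithmetic only. The function names and the example numbers are assumptions made for this illustration; they are not taken from the article or from any Cassandra API.

```python
# Illustrative arithmetic only: these helpers are hypothetical and do not
# call or mirror any Cassandra driver API.

def quorum(replication_factor: int) -> int:
    """Return the smallest majority of replicas."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def replicas_overlap(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """Return True when every read set shares a replica with every write set."""
    for acks in (write_acks, read_acks):
        if not 1 <= acks <= replication_factor:
            raise ValueError("acknowledgements must be between 1 and the replication factor")
    # If W + R > RF, the two sets cannot be disjoint (pigeonhole principle).
    return write_acks + read_acks > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor for the example
    print("one/one overlap:", replicas_overlap(rf, 1, 1))
    print("quorum/quorum overlap:", replicas_overlap(rf, quorum(rf), quorum(rf)))
```

With an assumed replication factor of 3, writing and reading with a single acknowledgement each leaves room for a read to miss the latest write, while quorum writes combined with quorum reads always overlap on at least one replica.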

The important things to know, which are not really obvious, are:

  • The cluster does become inconsistent pretty often. Sure, there are many things influencing the stability of the cluster, such as proper configuration, dedicated resources, production load, the professionalism of the ops team, etc., but the fact is…


Intro

There is no need to explain how fast IoT solutions are growing right now or the reasons behind that. Let’s take it as a fact and, from the data architecture perspective, consider the following two challenges:

  1. what kind of storage to select to store the data
  2. how to model the data so that it serves the data analytics needs in the best possible way

Problem Statement. Fitness Tracking Example

Let’s consider a domain that should be easy for everyone to understand, since it is related to everyday life: fitness tracking.

Business Requirements (Subject Area)

The following parties are involved in fitness tracking:

  • A human…

Andriy Zabavskyy

Big Data Architect & Data Warehouse Expert at SoftServe Inc.
