First Level Trade-Off to select the right storage for your needs

Intro

The purpose of this article is to help you choose a data storage engine.

The approach proposed below is somewhat similar to the CAP theorem, according to which you must choose two key properties out of three. That is why I call the approach the CBF theorem.

Theorem Definition


Making Sense of Big Data

Set of Best Practices (dis)proved by Benchmarking

Introduction

Let’s review Azure Synapse Dedicated SQL Pool, an MPP database platform that evolved from Microsoft’s appliance-based PDW data warehouse and is intended to serve data warehouse needs.

Motivation


Opinion

Invented by F. Puppini and promoted by B. Inmon, it claims to be a revolution in self-service BI

Intro

In the beginning, I was really skeptical…


Making Sense of Big Data, BigData Modeling Patterns

How to model hierarchies in NoSQL leveraging the best practices from the relational world

Intro

Problem Statement

From a functional standpoint, we need to cover the following main cases:

Querying the hierarchy:

  • Query a subtree, preferably up to a certain depth, such as direct subordinates as well as descendants up to a certain…


IoT Analytics Part 3: Comparison of Time Series Engines

Intro

Nowadays, every other database engine or platform is marketed as time-series oriented, so let’s dig deeper and find out which one best suits each particular need.

Problem Statement

The success criteria are:

  • Coverage of functional requirements related to data querying/analytics with different level…


Opinion

Why you should think twice before implementing Data Mesh

Intro

Although Zhamak Dehghani offers a lot of great thoughts with decent reasoning behind them (original article is here), I have serious concerns that prevent me from recommending Data Mesh for the majority of data analytics platforms.

Data Mesh Concept Extracts


Best Practices of DW Modelling applied to IoT data for the most flexible and efficient analytics

Intro

  1. Big data storage for time-series data
  2. A relational analytics database that gives maximum flexibility for data analytics. More details are here.

The purpose of this story is to describe the recommended data model for the second type of storage: relational analytics storage. …


Photo by Louis Hansel @shotsoflouis on Unsplash

Why Consistency Issues, or the C in the CAP Theorem

The important and not really obvious things to know are:

  • The cluster does become inconsistent pretty often. Sure, many things influence the stability of the cluster, such as proper configuration, dedicated resources, production load, the professionalism of the ops team, etc., but the fact is…


Intro

  1. which kind of storage to select for the data
  2. how to model the data so that it serves the data analytics needs in the best possible way

Problem Statement. Fitness Tracking Example

Business Requirements (Subject Area)

  • A human…

Andriy Zabavskyy

Big Data Architect & Data Warehouse Expert at SoftServe Inc.
