In today's fast-moving business world, companies are always looking for new ways to streamline their work and quickly extract important information from their data. When selecting a suitable solution, decision-makers particularly value cost efficiency and flexibility.

Today we discuss Microsoft Fabric, a powerful cloud platform that recently came out of preview. Before we delve into a real-world project undertaken for a leading organization in the German hardware industry, let's first discuss MS Fabric and its benefits at a general level.

Microsoft Fabric:

Microsoft Fabric covers the full analytics workflow: ETL processes, data science, real-time analytics, and BI. To do so, it combines new and existing functionality from Power BI, Azure Synapse, and Azure Data Factory in a single integrated environment.

To summarise: with Fabric you don’t have to stitch separate data tools together. Instead, you can start developing data solutions right away within a highly integrated end-to-end analytics environment.

The purpose of Fabric: simplicity. Within a single environment, business and data professionals can now focus on results rather than on technical details or the pricing policies of different platforms. This makes MS Fabric a serious contender in the market for analytics cloud platforms.

The Project:

Let's look at a real-world example of Microsoft Fabric at a leading German hardware company, referred to as Customer A.

Customer A has to deal with a large variety of data sources. In this project, data from its operational entities is combined for group-level reporting.

The project was designed to demonstrate Fabric's ability to cost-effectively ingest, transform and analyse data in a single environment. Let’s explore each step of the ETL process.

Data Ingestion

The first step is to bring the data into MS Fabric. Among the several available options, we selected Fabric's Eventstream feature to ingest the arriving data in real time and store the raw data in a Lakehouse. Lakehouse instances in Fabric serve as a multitool to store, manage, and analyse structured and unstructured data in a single location. An intuitive GUI lets the user work effectively with not only the Lakehouse component but also Azure Event Hubs, without any coding experience.

 

Data Transformation

In the Lakehouse, data can be transformed in three different ways: with Notebooks, Pipelines, or Dataflow Gen2. This makes it possible to tailor the analytics environment to specific needs. In the case of Customer A, we used PySpark Notebooks to predefine table schemas, data pipelines to control the transformation process, and Dataflow Gen2 for the specific transformation steps. There are a few things to consider when deciding on the toolbox, especially between Notebooks and Dataflow Gen2:

  1. Complexity: More complex analysis requires more flexibility -> Notebooks
  2. Usability: Easy-to-use interface -> Dataflow Gen2
  3. Data volume: Large data volumes and high computing capacity -> Notebooks
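
The three criteria above can be condensed into a small decision helper. This is only a sketch: the function name `suggest_tool` and its two boolean inputs are illustrative assumptions, not part of Fabric or of the project, and a real decision would also weigh factors such as team skills and maintenance effort.

```python
def suggest_tool(complex_logic: bool, large_volume: bool) -> str:
    """Suggest a Fabric transformation tool (hypothetical helper).

    complex_logic: the transformation needs custom code or advanced analysis.
    large_volume:  the data volume calls for distributed Spark compute.
    """
    # Complexity and data volume both point to Notebooks.
    if complex_logic or large_volume:
        return "Notebook"
    # Otherwise the easy-to-use, low-code interface is usually sufficient.
    return "Dataflow Gen2"


# Simple column renaming on a small table: the low-code tool is enough.
print(suggest_tool(complex_logic=False, large_volume=False))
# Heavy transformation on a large table: use a Notebook.
print(suggest_tool(complex_logic=True, large_volume=True))
```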
     

Data Load/Consumption Layer

Fabric is built on OneLake, a unified storage system. Lakehouses and Warehouses automatically store their data in Delta Parquet format within OneLake, but they provide different ways of accessing it.

The Synapse Data Warehouse supports full T-SQL read and write capabilities, whereas the Lakehouse offers read-only access via a SQL analytics endpoint for exploration purposes. On the other hand, the Lakehouse offers data modelling options similar to those of Power BI, which allows for seamless integration.

Together with the client, we decided to serve the data through the Warehouse, where it is used as a trusted base for multiple Power BI reports.

Costs

Fabric comes with two billing options: (1) pay-as-you-go and (2) monthly capacity reservation. Without going into further detail: for the project, we decided on the monthly reservation with a small capacity size, scaling up the capacity only while the data pipeline is running. This flexible approach helps to reduce costs without sacrificing fast data processing.
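
To illustrate why scaling up only during the pipeline run saves capacity, the following sketch compares capacity-unit (CU) hours for two setups. The SKU sizes and the one-hour daily pipeline window are hypothetical and not the project's actual figures. The sketch assumes only that the number in an F SKU name (F2, F16, ...) corresponds to its CU count. Prices are deliberately left out, since they depend on region and billing option.

```python
HOURS_PER_DAY = 24


def cu_hours(base_cu: int, peak_cu: int, peak_hours_per_day: int, days: int = 30) -> int:
    """CU-hours consumed when running at base_cu and scaling up to peak_cu
    for peak_hours_per_day hours on each of the given days."""
    peak_hours = peak_hours_per_day * days
    base_hours = HOURS_PER_DAY * days - peak_hours
    return base_cu * base_hours + peak_cu * peak_hours


# Hypothetical setup: small F2 capacity, scaled to F16 for a one-hour daily run.
scaled = cu_hours(base_cu=2, peak_cu=16, peak_hours_per_day=1)
# Same workload on a permanently large F16 capacity.
always_large = cu_hours(base_cu=16, peak_cu=16, peak_hours_per_day=0)

print(scaled, always_large)  # 1860 11520
```

In this hypothetical setup, the scaled approach consumes roughly a sixth of the CU-hours of a permanently large capacity. How that translates into actual cost depends on the reserved and pay-as-you-go rates.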

Conclusion:

Microsoft Fabric appears to be a valuable all-in-one platform to support the customer’s needs. However, there are still a few drawbacks: the tool is still very new, and not all features can be fully exploited yet (e.g. the API is in preview and incremental refresh is not yet possible). Nevertheless, its versatility (transformation, storage, analysis), multiple integration options, and the ability to choose the right analytics architecture have led Customer A to purchase multiple MS Fabric capacities for various internal projects.