By Ian Stuart, Principal Consultant
Power BI is not only a reporting and data visualisation platform but, rather, it comes equipped with data preparation and storage capabilities. The functionality of these elements has, like the rest of the product, moved on to the point where we should start to consider whether Power BI can serve as the only tool we need in order to provide data and analytics to our end-users. In other words, can it serve as an end-to-end data and analytics tool (or a “Data Platform”)?
So, the question I am going to address in this blog is: Can Power BI provide the Data Platform as well as being a powerful reporting tool? The high-level answer to this is; yes it can. At least in some cases and we will discuss this in more detail below.
The secondary question is: Why use Power BI as a Data Platform and what are the pros and cons of that approach? This question is also addressed below.
What is a Data Platform?
Before we can determine whether we can use Power BI as a Data Platform we need to understand what a data platform is and what are their features and benefits. Then we can compare Power BI’s capabilities
My colleague Sam Riggs, put together a great blog called Data Platforms in a Nutshell, where he succinctly describes the diagram below. The blog also contains a link to a four-minute video summary.
To focus on the salient points in Sam’s blog, a data platform must be able to:
- Extract data from a variety of sources
- Have powerful transformation capabilities
- Load the data into a presentation repository that is structured to support performant access by analytical tools
- Perform incremental loads and store history
- Store massive amounts of data
How does Power BI Measure Up?
Let us consider how Power BI scores against each of these requirements:
Requirement | Comments | |
Extract data from a variety of sources | ✓ | Power BI is able to connect to a wide variety of data sources including on premise and cloud databases, API’s, etc. |
Have powerful transformation capabilities | ✓ | Whether using Power Query from within Power BI Desktop, or Power BI Dataflows*, or Power BI Datamarts**, Power BI can perform any type of transformation. |
Load the data into a presentation repository that is structured to support performant access by analytical tools | ✓ | Power BI stores the data in an online SQL Server Analysis (SSAS) Database that can be accessed by Power BI reports and other analytical tools such as Excel and Tableau. |
Perform incremental loads and store history | ✓ | Incremental refresh is supported for Power BI Premium, Premium per user, Power BI Pro, and Power BI Embedded datasets. Getting the latest data in real time with Direct Query is only supported for Power BI Premium, Premium per user, and Power BI Embedded datasets. |
Store massive amounts of data | ✓ | Dependent upon licence type, Power BI can store up to 100TB of data and 400GB per model. |
* Power BI Dataflows can be created in the Power BI Service. They use the M language (like Power Query) and they can perform the same functions as Power Query.
** Power BI Datamarts allow us to bring data from multiple sources (including Power BI Dataflows) and create data models in the Power BI Service.
What licence types are required will depend upon the unique requirements of the organisation. Power BI Premium Capacity coupled with Power BI Pro licences for developers gives us the ultimate functionality and sharing capability.
Power BI meets all the requirements of a Data Platform, so do we still need a separate Data Platform?
Not necessarily. If we exclude unstructured data and “data lake” capability Power BI can provide our data platform for many use cases but there are some things to consider when comparing Power BI with more traditional data platforms:
Cost
Cost calculations for cloud services can be complex. To keep this blog short and salient, we will not try to compare Power BI costs to the myriad of other tools and combinations that are available to us.
What we can say is that Power BI will provide us with known costs that are only going to change when we make conscious decisions to (for example) increase the number of Pro licences. Having a clear understanding of costs is an important consideration for any Data Platform.
Just one Tool
There is much to be said for simplicity in architecture. Having many components leads to increased complexity, increased maintenance, and increased chances of something breaking. Power BI provides us with the ability to do everything we need through a single portal and licensing model.
Ease of Maintenance and Administration
As touched on above, having just one tool eases the administrative burden. There are built-in tools and reports for monitoring the Power BI infrastructure and we can also access the Power BI API’s in order to build our own tools.
One can potentially avoid the need to employ a separate team of database administrators and SQL specialists. A lot of integrations in Power BI can be managed with no or low code.
Model Sharing
Power BI provides us with the ability to share our data models so they can be re-used by report developers. Tables, columns, table relationships, measures, and other data model features such as field formats, data classifications and hierarchies are all included in Power BI Datasets that can be shared amongst developers. Additionally, Dataflows and Datamarts, that can each provide a subset of the data model, can be shared.
Power BI can limit access to datasets and to who can see what within the dataset through Row Level Security (RLS)
Performance
The Power BI Service (whether shared or dedicated) provides solid scalable performance and Power BI datasets utilise proven SQL Server Analysis Services (SSAS) technology to optimise dynamic querying of the data model. It is still possible to build poorly performing reports and datasets but adhering to best practice Power BI data modelling design principles will minimise this risk.
Prior to the introduction of Datamarts, the performance of the M language (used in Power Query or dataflow scripts) was sometimes less than ideal when the scripts contained many steps and, in particular, merge steps that cannot be folded. As Datamarts are housed on an Azure SQL Database, this performance should be more manageable as we can fold more queries than before. Additionally, we expect that DDL and DML capabilities will be introduced into the SQL element of Datamarts so we can use TSQL as part of our ETL framework. This will make transformations in Power BI as fast as using SQL Server.
Datamarts are currently a preview feature but, if the pace of development of the rest of the Power BI ecosystem is anything to go by, we might expect great things with Datamarts soon. In the interim, M generally performs acceptably for data refreshes, it is really only an occasional frustration for developers when editing queries.
Source Control
Power BI is comparatively weak when it comes to integrating with source control tools. There are parts of the Power BI toolkit (such as Deployment Pipelines) that can be utilised but a holistic approach to source control is currently challenging. Improved source control is a much-requested feature so we can expect this to improve over time.
Orchestration
Extract Transform & Load (ETL) is the process of moving data from sources, through integrations into production models structured for analytics. In the diagram below we show various Power BI components that can provide ETL capability in Power BI.
Most ETL tools allow various components to be strung together and run as a single job. This is known as orchestration and we can schedule these orchestrations to run unattended. Although individual Datasets, Dataflows and Datamarts can be scheduled to refresh, no combined orchestration functionality exists in Power BI at this time (although it has been requested as a feature). Therefore, in order to manage dependencies between different objects, we may have to write separate code in a script that we can run through an external scheduler. This is possible using the Power BI API and/ or Power Automate but it is not necessarily trivial.
Data Scale & Complexity
Power BI has limitations when it comes to storage capacity that may need to be considered when dealing with “big data”. Bringing data from a great number of data sources and combining them together is possible in Power BI but performance may be questionable. As stated earlier, we can expect improvements in this area.
Data Storage
Power BI stores a copy of our data in the cloud which potentially avoids the need for an on-premise data centre or expensive cloud hosted storage for reporting purposes. Power BI is not, however, a data lake that, for example, may offer optimised storage for unstructured data.
Summary
If you are thinking about a data platform and you are not sure what tools to use, I urge you to include Power BI in the mix. It may not provide all the building blocks required for a Data Platform for complex or large organisations, but it can certainly provide many of them and Power BI may be all that a small or medium sized organisation requires.
Additionally, what is right for one organisation may not be the right answer for another. It will depend upon your unique circumstances and your current state of data and analytics. We would be very happy to help you in your deliberations by looking into more detailed considerations than this blog allows.
Please feel free to get in touch to see how we can help you.