Sam Turner, Consultant, Altis Sydney
The Power BI REST API allows users to programmatically control their datasets in the Power BI service. This includes triggering a refresh of a dataset, monitoring the status of a currently refreshing dataset, and viewing the refresh history of a dataset.
As will be demonstrated, this API gives you much greater control over the refreshing of your datasets than the standard scheduled dataset refresh does.
The standard scheduled refresh configurable in the Power BI online service is limited in that it runs only on a fixed daily schedule. This can be an issue if your ETL runs longer than expected and a scheduled Power BI refresh starts in the middle of the ETL execution, leaving the refreshed dataset in a potentially unstable state.
In addition, there are issues if multiple datasets are scheduled to refresh in quick succession. If one of these datasets takes longer to refresh than expected, the refresh of a separate dataset may begin in parallel, leading to memory issues in the Power BI service.
Programmatically refreshing datasets allows you to trigger a refresh immediately after the execution of an ETL pipeline, and to more easily manage refreshing multiple datasets by performing the refreshes in a controlled sequential or parallel fashion. For example, you may allow only 2 or 3 datasets to refresh at a time to avoid memory issues.
This blog post will demonstrate how to programmatically refresh a Power BI dataset using the Power BI REST API. We will then explore how to use Azure Data Factory to trigger this refresh. Finally, I will demonstrate how you can easily refresh multiple datasets across multiple workspaces, sequentially or in parallel.
Altis has implemented the techniques discussed here with a client, and is currently using them to ensure that their datasets are consistently and reliably refreshed daily, allowing them to make informed decisions with their latest data.
Example code written in Python will be provided and annotated where required.
Data Refresh Limitations
It should be noted that with a Power BI Pro license you can only refresh a dataset 8 times per day. If your organisation has a Power BI Premium license, you can refresh a dataset up to 48 times per day.
Official Power BI REST API Documentation
The official Power BI REST API documentation can be found at the following link: https://docs.microsoft.com/en-us/rest/api/power-bi/.
Programmatically Refresh a Dataset
Registering an Application
Before you can start calling the Power BI API, you must first register an application and receive a Client ID, which you will use to call the Power BI API. This is done by completing the following steps:
- Navigate to https://dev.powerbi.com/apps
- Click Sign in with your existing account
- Choose an App Name of your liking
- Select Native app from the App Type dropdown
- The Redirect URL will not be used, so populate this address with https://app.powerbi.com
- When choosing the level of API access required, ensure both the Read All Datasets and Read and Write All Datasets checkboxes are ticked. This is important, as the Power BI REST API endpoints we will be using require this level of API access.
- Click Register App and note down the Client ID somewhere safe, as you will need to provide it when calling the Power BI APIs.

Obtaining Workspace ID and Dataset ID
When using the Power BI APIs, datasets are identified by their Workspace ID and Dataset ID, not by their name. To obtain the Workspace ID and Dataset ID of a dataset, complete the following steps:
- Navigate to https://app.powerbi.com/home
- Select the workspace that contains your dataset in the Workspaces dropdown on the left panel of the page
- Select the Datasets tab, click the “…” button next to the dataset and then select Settings
- Now refer to your browser URL:
- It will look something like the following: https://app.powerbi.com/groups/{workspace_id}/settings/datasets/{dataset_id}
- The sequence of numbers, letters and hyphens after the text groups is your Workspace ID, and the sequence of numbers, letters and hyphens after the text datasets is your Dataset ID
- In the given example {workspace_id} is the Workspace ID, and {dataset_id} is the Dataset ID
- Store the Workspace ID and Dataset ID somewhere safe, as you will need them in the next section
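If you would rather script this step, the two IDs can be pulled straight out of the settings URL. A throwaway sketch (the URL value and variable names are just for illustration):

```python
import re

# Example settings URL copied from the browser address bar
url = "https://app.powerbi.com/groups/{workspace_id}/settings/datasets/{dataset_id}"

# The Workspace ID follows "groups/" and the Dataset ID follows "datasets/"
match = re.search(r"groups/([^/]+)/settings/datasets/([^/?]+)", url)
workspace_id, dataset_id = match.groups()
print(workspace_id)
print(dataset_id)
```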
Creating the Script
We can now create the Python 2.7 script to programmatically refresh the dataset we have chosen:
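The original code listing appeared here; below is a minimal sketch of an equivalent script. It assumes the requests library is available, and it uses the legacy Azure AD password-grant flow that native apps (like the one registered earlier) relied on at the time of writing:

```python
import requests

# Values you must supply (see the list below)
power_bi_group = ""      # Workspace ID
power_bi_dataset = ""    # Dataset ID
power_bi_username = ""   # Power BI account, e.g. a dedicated service account
power_bi_password = ""
power_bi_client_id = ""  # Client ID of the application registered earlier

# Section 1: obtain an authorization token from Azure AD using the
# legacy password grant against the Power BI resource
token_response = requests.post(
    "https://login.microsoftonline.com/common/oauth2/token",
    data={
        "grant_type": "password",
        "resource": "https://analysis.windows.net/powerbi/api",
        "client_id": power_bi_client_id,
        "username": power_bi_username,
        "password": power_bi_password,
    },
)
access_token = token_response.json()["access_token"]

# Section 2: call the Power BI REST API to queue a refresh of the dataset;
# the service emails the dataset owner on completion (see the note below)
refresh_url = (
    "https://api.powerbi.com/v1.0/myorg/groups/{0}/datasets/{1}/refreshes"
    .format(power_bi_group, power_bi_dataset)
)
refresh_response = requests.post(
    refresh_url,
    headers={"Authorization": "Bearer " + access_token},
    json={"notifyOption": "MailOnCompletion"},
)

# 202 Accepted indicates the refresh has been queued successfully
print(refresh_response.status_code)
```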
To use the script, supply values for the following variables:
- power_bi_group
- power_bi_dataset
- power_bi_username
- power_bi_password
- power_bi_client_id
This code can be split into two main sections: the first obtains an authorization token used to call the Power BI API, and the second calls the Power BI API to refresh the dataset we specify, using the authorization token obtained in the first section.
It should be noted that when refreshing the dataset, we specify {"notifyOption": "MailOnCompletion"}. This will send an email reporting a successful or unsuccessful refresh to the owner of the dataset.
Trigger Data Refresh using Azure Data Factory
In addition to the Azure Data Factory service, we will also make use of the Azure Automation service.
Creating a runbook
Runbooks, in the context of Azure, are cloud-hosted serverless scripts. We will create a Python 2.7 runbook to host the Power BI refresh script we created in the previous section. This can be done by completing the following steps:
- Navigate to https://portal.azure.com
- Search Automation Account in the search bar at the top of the page, and click the result that appears under Services in the suggestion pane that pops up
- If you do not already have an Automation Account, create one by clicking Add
  a. Follow the steps provided to create an Automation Account, ensuring that Yes is selected for the Create Azure Run As account option
- Select the Automation Account you wish to use
- In the Automation Account menu, under Process Automation, select Runbooks
- Select Add a runbook
- Create a new runbook, ensuring the Runbook type selected is Python 2, specifying a Name and Description of your choosing, and then selecting Create
- Copy and paste the script created in the previous section into the code area that opens, then click Publish
- You may want to test that the script works correctly by hitting Start. You can check in Power BI itself whether the specified dataset starts refreshing.
Creating a webhook
In this step we will create a webhook that kicks off the execution of our runbook. Webhooks allow us to start the execution of a runbook by sending an HTTP POST request to a URL. To create a webhook that executes the runbook we just created, complete the following steps:
1. Navigate into the runbook you created in the previous step.
2. Under the Resources section in the pane on the left-hand side of the page, select Webhooks, and then Add Webhook.
3. Name your webhook, specify Yes for the Enabled option, and set the Expires date to sometime in the future (you can set this Expires date up to 10 years into the future).
4. Lastly, and most importantly, there is a URL field at the bottom of the creation page. Ensure you copy this and keep it somewhere safe; this is the URL to which you will send an HTTP POST to trigger the execution of the runbook (a quick way to test it is sketched below). Make sure you click OK and then Create.
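If you want to verify the webhook before wiring it into Data Factory, you can POST to it directly. A minimal sketch, assuming the requests library; the webhook_url value is a placeholder for the URL you copied in step 4:

```python
import requests

# Placeholder for the webhook URL copied in step 4 above (hypothetical value)
webhook_url = "https://<region>.azure-automation.net/webhooks?token=..."

# Azure Automation webhooks are triggered by an HTTP POST; an empty JSON body
# is sufficient here because our runbook takes no input parameters
response = requests.post(webhook_url, json={})

# 202 Accepted indicates the runbook job has been queued
print(response.status_code)
```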
Trigger the Webhook within a Data Factory Pipeline
To trigger this Webhook within a Data Factory Pipeline, for example just after the execution of a collection of stored procedures, complete the following steps:
- Search Data factories (not Data factory) in the search bar at the top of the page, and click the result that appears under Services
- If you do not have one already, create a Data Factory instance by clicking Add
  a. Fill out the Name, Subscription, Resource Group and Location, ensuring the Version specified is V2
- Select your Data Factory instance and click the Author & Monitor button in the middle of the page, which will open a new window.
- For demonstration purposes we will create a new pipeline from scratch. To do this select Create Pipeline in the window that opened in the previous step.
- Under the General tab on the left pane, drag a Wait and a Web component into the design area. The Wait component will simulate the execution of an ETL pipeline, while the Web component will be used to execute our webhook.
- Select the Wait component, then the Settings tab, and set Wait time in seconds to 15 seconds.
- Select the Web component, then the Settings tab. In the URL field, paste the URL of the webhook you created in the previous section. In the Method field select POST, and in the Body field specify an empty JSON object: {} (the resulting activity definition is sketched after these steps).
- Link the Wait component and the Web component by clicking and holding the green square on the right-hand side of the Wait component and dragging your mouse over the Web component.
- Test the pipeline by selecting the Debug button.
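For reference, the Web component's settings correspond roughly to the following fragment of the pipeline's underlying JSON definition. This is a hand-written sketch: the activity names are the portal defaults and the webhook URL is a placeholder:

```json
{
    "name": "Web1",
    "type": "WebActivity",
    "dependsOn": [
        { "activity": "Wait1", "dependencyConditions": ["Succeeded"] }
    ],
    "typeProperties": {
        "url": "<your webhook URL>",
        "method": "POST",
        "body": "{}"
    }
}
```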
Programmatically Refreshing Multiple Datasets Across Multiple Workspaces Sequentially or in Parallel
As your organisation's use of Power BI grows, you may find that you have multiple datasets spread across multiple workspaces that need to be refreshed at the same time.
If you use the script created in the previous section, you may find this hard to manage as you will need to keep track of multiple runbooks, multiple webhooks and multiple components within Data Factory.
In addition, if you have multiple large datasets and attempt to refresh them all at the same time, you may find that some, or all, of the datasets run into memory issues, as Power BI only allocates a finite amount of memory. To avoid this, you can refresh the datasets sequentially, waiting for the first dataset to finish refreshing before starting to refresh the second.
We can create an updated script to solve the above issues as follows. Go to the runbook you created earlier and click the Edit button. Update the script with the code below and specify the variables according to the following instructions, and then click Publish.
To use this script, a user specifies the Workspace IDs and Dataset IDs in a hierarchical JSON-like structure within the payload object. In addition, as in the previous version of the script, a user must specify values for the following variables:
- power_bi_username
- power_bi_password
- power_bi_client_id
The sequential refreshing is achieved by intermittently polling the status of the current refresh; once the refresh is in a completed or failed state, we continue and start the refresh of the next dataset.
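The updated code listing appeared here; the sketch below captures the same idea. The payload structure, helper names and the 30-second polling interval are assumptions for illustration, and token acquisition is unchanged from the first script:

```python
import time
import requests

power_bi_username = ""
power_bi_password = ""
power_bi_client_id = ""

# Workspace IDs mapped to the dataset IDs they contain, refreshed in the
# order given (structure assumed for illustration)
payload = {
    "workspace-id-1": ["dataset-id-1", "dataset-id-2"],
    "workspace-id-2": ["dataset-id-3"],
}

def get_access_token():
    # Same legacy password-grant flow as the single-dataset script
    response = requests.post(
        "https://login.microsoftonline.com/common/oauth2/token",
        data={
            "grant_type": "password",
            "resource": "https://analysis.windows.net/powerbi/api",
            "client_id": power_bi_client_id,
            "username": power_bi_username,
            "password": power_bi_password,
        },
    )
    return response.json()["access_token"]

def refresh_status(headers, group, dataset):
    # The most recent entry in the refresh history holds the current status:
    # "Unknown" while in progress, then "Completed" or "Failed"
    url = ("https://api.powerbi.com/v1.0/myorg/groups/{0}/datasets/{1}/"
           "refreshes?$top=1").format(group, dataset)
    return requests.get(url, headers=headers).json()["value"][0]["status"]

headers = {"Authorization": "Bearer " + get_access_token()}

for group, datasets in payload.items():
    for dataset in datasets:
        url = ("https://api.powerbi.com/v1.0/myorg/groups/{0}/datasets/{1}/"
               "refreshes").format(group, dataset)
        requests.post(url, headers=headers,
                      json={"notifyOption": "MailOnCompletion"})
        # Poll until this refresh completes or fails before moving on,
        # so that only one dataset refreshes at a time
        while refresh_status(headers, group, dataset) == "Unknown":
            time.sleep(30)
```

The sketch above refreshes strictly sequentially; a parallel variant could instead trigger a small batch of refreshes at once and poll them together before starting the next batch.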
Other examples of our dashboard solutions can be found here.
If you would like to know more about our offering on the Azure Platform, have a look at our Microsoft Azure Practice Page or contact us.
If you are interested in reading more about Power BI or Microsoft Azure, check out our other blogs.