Blog Posts, Data & Analytics
Spark
June 3, 2020

Azure Data Factory (ADF): How to extract JSON data from an API to Azure SQL Server

by Mik Panchal, Managing Consultant – Altis Sydney

This Blog is a step-by-step guide to build the integration pattern used to extract JSON data from an API using Azure Data Factory.

Using an API to extract data in a JSON format has become a common method for extracting data into Azure SQL Database or Azure Blob. As integration tools evolve, greater effort is placed on enabling the tools to extract data from various source systems in different formats, thus enabling Data Engineers to create integration patterns in the most efficient way possible.

WIIFM?

By using the ADF “REST dataset” I was able to successfully create a “copy data” task to directly map the JSON output to an Azure SQL server table, however, after inspecting the data I noticed NULL data returned from a nested JSON object while non-nested JSON object returned the right data.

Even though the ADF REST dataset can read the API metadata, the values in these nested JSON objects/arrays are returned as NULL when you use a Copy data task to directly load JSON to Azure SQL Server.

It seems that there is a bug with ADF (v2) when it comes to directly extract a nested JSON to Azure SQL Server using the REST dataset and Copy data task.

Example of nested JSON object

High-level data flow using Azure Data Factory

The process involves using ADF to extract data to Blob (.json) first, then copying data from Blob to Azure SQL Server. This additional step to Blob ensures the ADF dataset can be configured to traverse the nested JSON object/array.

Below is a step-by-step guide to extracting complex JSON data in your Azure platform using Azure Data Factory (ADF).

Follow the steps outlined below:

Set up API linked service, and create a REST dataset to API
Set up Azure Blob Storage dataset to Blob storage “DS_Source_Location”
1. Clear the Shema objects
2. Manually update the JSON of the dataset using JSON editor
3. Example of edited JSON
4. Manually updating this ensures nested json is mapped to the right columns.
5. Sample connection
Create a Azure SQL Database dataset “DS_Sink_Location” that points to the destination table. The destination table must be 1-1 to the source, ensure all columns match between blob file and the destination table.
In a new Pipeline, create a Copy data task to load Blob file to Azure SQL Server

a) Connect “DS_Source_Location” dataset to the Source tab

b) Connect “DS_Sink_Location” dataset to the Sink tab

c) Review Mapping tab, ensure each column is mapped between Blob file and SQL table

d) Specify the JSONPath of the nested JSON array for cross-apply

e) Execute pipeline

The process outlined above loads the data from a JSON object using an API connection to the Azure SQL Database. Please contact us if you would like a full end-to-end solution to copy data from an API using a delta load and pagination.

Please see our Azure Data Frameworks page or contact us if you would like to know more about our Azure offering. Below are links to more of our Azure Blogs:

Hyperscaler native vs best of breed services – what should you choose for your cloud data & analytics solution?

Blog Posts

Enhanced dbt Data Quality Observability at Speed

Blog Posts

Why you may need a dedicated stream to tackle source system onboarding as part of your data transformation program

Blog Posts

Modernising your legacy time-series solution using Cloud Data ecosystems

Connect with us

If you’d like to be kept in the loop on courses, events and other related topics, simply complete your details and we’ll add you to our list.

Consulting Services

Frameworks

Training

Azure Data Factory (ADF): How to extract JSON data from an API to Azure SQL Server

Example of nested JSON object

High-level data flow using Azure Data Factory

Follow the steps outlined below:

Share

Leave a Reply Cancel reply

Recent

Hyperscaler native vs best of breed services – what should you choose for your cloud data & analytics solution?

Enhanced dbt Data Quality Observability at Speed

Why you may need a dedicated stream to tackle source system onboarding as part of your data transformation program

Modernising your legacy time-series solution using Cloud Data ecosystems

Connect with us

We are Altis

Quick Links

Contact us

Connect with us