In this repository you will find an example of how to extract data from SAP Success Factors using Azure Data Factory.
SAP Successfactors has its own oData based API used for data extraction. Please refer to SAP SuccessFactors API Reference Guide (OData V2).
To be able to use the oData API you first need to authenticate using oAuth. This will provide you with an access token which can be passed on to the oData API's which will extract the data.
These are the high level steps which need to be executed :
- Register your client application in SAP SuccessFactors to obtain an API key.
- Obtain a SAML assertion
- Generate an oAuth token.
- Use the generated token to call oData APIs to extract data
In Azure Date Factory the overall pipeline looks as follows :
The GetAssertion and GetOAuthToken activities are used to authenticate against SAP SuccessFactors. The actual extraction is done in the Copy Data steps. Since we want to extract multiple entities, both source and sink datasets and also the mappings towards a SQL database are parameterized. These parameters are retrieved in the 'JsonTransformation' and 'Set filename output' activities.
Since we also want to extract multiple SAP SuccessFactor entities, this parameterid pipeline is capsulated in a For Each loop to iterate over the different entities.
The list of entities and mappings is stored in a SQL DB. The content of the DB is retrieved in the GetTableList activity which reads out the DB using SQL.
Note: In this blog we focus on the authentication and data extraction part. Details on how to implement the table, SQL Queries, mappings, ... are not explained in detail. They are however contained in the included sample pipeline, datasets and sql scripts.
Note: One might also consider to move to Authentication part out of the
for eachloop.
The following figure gives an overview of the authentication flow :
For more information on each individual step, please refer to SAP Successfactors - Authentication Using OAuth 2.0
In this step you need to register your client application, which is Azure Data Factory, in SAP SuccessFactors. This step is handled within SAP SuccessFactors itself.
This step will result in an API Key, client_id. You can also associate a user with Azure Data Factory.
For more information, please refer to SAP Successfactors - Registering your oAuth2 Client Application.
In this step you generate a SAML assertion document, which will be used to request the access token. You can obtain a SAML assertion either from your trusted IdP (recommended) or using the offline SAML generator provided by SAP SuccessFactors. For more information see : SAP SuccessFactors - Generating a SAML Assertion
Note : we used the SAP SuccessFactors built-in Identity Provided
/oauth/idp.
The HTTP Request looks as follows :
### Get SAML Assertion
# @name getSAMLAssertion
POST https://api012.successfactors.eu/oauth/idp
?client_id={{client_id}}
&user_id={{user_id}}
&token_url={{token_url}}
&private_key={{private_key}}
Content-Type: application/x-www-form-urlencoded
Note: Above sample code can be used in Visual Studio Code using the REST Client Extension. A complete http test script can be found here.
In Azure Data Factory we can obtain the SAML assertion using a Web activity:
- URL :
https://api012.successfactors.eu/oauth/idp(or other url provided by SAP SuccessFactors) - Method :
POST - Body :
client_id=@{pipeline().parameters.client_id}&user_id=@{pipeline().parameters.user_id}&token_url=@{pipeline().parameters.token_url}&private_key=@{pipeline().parameters.private_key} - Header:
Content-type = application/x-www-form-urlencoded
The parameters are defined at pipeline level.
Note : some of these parameters are also used to parameterize the sink and source data sets.
The generated SAML Assertion can then be used for oAuth authentication.
You can obtain the oAuth Access token as follows :
### Get OAuthToken
# @name getOAuthToken
@assertion = {{getSAMLAssertion.response.body.*}}
POST {{token_url}}
Content-Type: application/x-www-form-urlencoded
company_id={{comany_id}}&client_id={{client_id}}&company_id={{comany_id}}&grant_type=urn:ietf:params:oauth:grant-type:saml2-bearer&assertion={{assertion}}
The access token can be found in the response body :
{
"access_token": "*** your access token ***",
"token_type": "Bearer",
"expires_in": 45321
}In ADF this is also modeled as a WebActivity, following the GetAssertion.
- URL :
https://api012.successfactors.eu/oauth/token - Method :
POST - Body :
@concat('client_id=',pipeline().parameters.client_id,'&company_id=',pipeline().parameters.company_id,'&grant_type=urn:ietf:params:oauth:grant-type:saml2-bearer&assertion=',activity('GetAssertion').output.Response)
Note:
activity('GetAssertion')refers to the previousGetAssertionstep
The access token can then be used as Bearer authentication in the subsequent oData calls to extract data.
### Get UserSchema
# @name getUserSchema
@access_token = {{getOAuthToken.response.body.access_token}}
GET https://api012.successfactors.eu/odata/v2/Entity('User')?$format=json
Authorization: Bearer {{access_token}}
In Azure Data Factory the data extraction is modeled using the Copy Data activity :
This relates to the SAP SuccessFactors oData API. Since we want to extract multiple SAP SuccessFactor Entities, the source (and also the sink) data set are parameterized.
-
BaseURL = https://api012.successfactors.eu/or other relevant API access point provided by SAP -
RelativeURL = @concat('odata/v2/',dataset().TableName,'?$format=JSON'). Thedataset().TableNameis the entity set to be extracted, eg.User,EmpJob, ... . This information is retrieved from a SQL Table. -
Request Method = Get -
Additional Headers : here we need to add the
Authorizationheader with theBeareraccess token retrieved from the previous step.Authorization = @concat('Bearer ',activity('GetOauthToken').output.access_token)Note: The
activity('GetOauthToken')refers to the previous step.
- This is the json file to write the data to. The JSON filename was parameterized.
Since we want to extract multiple SAP SuccessFactors entities, the source and sink datasets were parameterized. These parameters are retrieved at the beginning of the pipeline using a Lookup activity. Here a select statement is used to retrieve a table containing the different parameters per entity. This parameter table is then used in a ForEach activity to loop over the different entities to be extracted.
Note: this is just a sample pipeline. Feel free to change and optimize.
Hint: In our case the Authorization steps are with theFor-Eachloop, but one might consider to move these steps outside of the loop.
To implement the example pipeline, you need to have access to an Azure Data Factory instance and a sql database (e.g. azure sql database).
It's highly recommended to deploy a new data factory for this example flow.
You can also use Synapse pipelines, the steps are similar.
-
Connect a freshly deployed Azure Data Factory to a fork of this repo. Once completed you should be able to see the pipelines, datasets and other necessary objects needed by the example flow.
-
In the repo under the [SQL Setup](\SQL setup) folder, execute the two sql scripts against your sql instance. These scripts will populate some meta data tables and some sample data for the example flow.
This example flow is explained in the following videos:
- Part1: https://youtu.be/lm2kqTaXatI
- Part2: https://youtu.be/Lt1FO8NsUkQ












