A Python ETL Bridge for Blue Prism Process Intelligence (BPPI) and ABBYY Timeline
Extract, Transform, and Load process data from multiple sources into BPPI repositories
- Overview
- Features
- Architecture
- Installation
- Quick Start
- Configuration
- Data Sources
- Usage Examples
- API Reference
- Testing
- Troubleshooting
- Contributing
- License
PIDG (Process Intelligence Data Gateway) is a Python-based ETL (Extract, Transform, Load) solution that bridges various data sources with Blue Prism Process Intelligence (BPPI) or ABBYY Timeline repositories. It enables organizations to:
- Extract process execution logs from Blue Prism (via direct database access or API)
- Import data from CSV files, Excel spreadsheets, ODBC databases, and XES files
- Transform and enrich data for process mining analysis
- Automatically upload processed data to BPPI repositories
- Execute BPPI ToDo workflows after data loading
Note: BPPI (Blue Prism Process Intelligence) is Blue Prism's process and task mining solution, an OEM version of ABBYY Timeline.
| Source | Status | Description |
|---|---|---|
| CSV Files | ✅ | Import data from CSV files with configurable separators |
| Excel | ✅ | Support for .xls, .xlsx, .xlsm, .xlsb, .odf, .ods, .odt |
| XES Files | ✅ | Standard process mining event log format |
| ODBC | ✅ | Connect to SQL Server, PostgreSQL, MySQL, and more |
| Blue Prism Repository | ✅ | Direct database access to BP session logs |
| Blue Prism API | ✅ | OAuth2 API connection for BP v7.x+ |
| SAP RFC | ✅ | Read tables via SAP RFC (requires pyrfc) |
- Modular Pipeline Architecture: Extensible ETL pipelines with pluggable extractors, transformers, and loaders
- Blue Prism Log Transformation: Parse XML attributes, filter stages, and create unique event identifiers
- Delta Loading: Incremental data extraction with automatic date tracking
- Batch Upload: Automatic chunking for large datasets (10,000 rows per batch)
- ToDo Automation: Execute BPPI workflows automatically after data loading
- Comprehensive Logging: Rotating log files with configurable levels
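Delta loading depends on the deltafile setting (for example delta.tag) and on the $delta placeholder in the Blue Prism SQL template, both described under Configuration. This README does not document the tag file's format or the exact condition PIDG generates. The following is therefore a hypothetical sketch: it assumes the tag file stores the timestamp of the last run, and it turns that timestamp into a SQL condition on LOG.startdatetime. The helper names are illustrative only and do not come from PIDG.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path

# Hypothetical sketch: the deltafile is ASSUMED to hold the last run's timestamp.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def read_last_run(tag_file: Path) -> datetime | None:
    """Return the timestamp stored in the tag file, or None on the first run."""
    if not tag_file.exists():
        return None
    text = tag_file.read_text(encoding="utf-8").strip()
    return datetime.strptime(text, DATE_FORMAT) if text else None


def delta_condition(tag_file: Path, column: str = "LOG.startdatetime") -> str:
    """Build a SQL condition (for $delta) keeping only rows since the last run."""
    last_run = read_last_run(tag_file)
    if last_run is None:
        return "1=1"  # first run: no filter, extract everything
    return f"{column} >= '{last_run.strftime(DATE_FORMAT)}'"


def save_run(tag_file: Path, run_time: datetime) -> None:
    """Record this run so the next extraction starts from it."""
    tag_file.write_text(run_time.strftime(DATE_FORMAT), encoding="utf-8")
```

A run would call delta_condition() before extracting and save_run() only after a successful load. An interrupted run is then simply repeated from the previous timestamp.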
```
                          PIDG Architecture

 ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
 │    EXTRACTORS    │    │   TRANSFORMERS   │    │     LOADERS      │
 ├──────────────────┤    ├──────────────────┤    ├──────────────────┤
 │ • CSV Extractor  │───▶│ • BP Logs        │───▶│ • BPPI Repository│
 │ • ODBC Extractor │    │   Transformer    │    │   API Wrapper    │
 │ • BP API Extract │    │ • Event Mapper   │    │ • ToDo Executor  │
 │ • BP Repo Extract│    │                  │    │                  │
 └──────────────────┘    └──────────────────┘    └──────────────────┘
          │                       │                       │
          ▼                       ▼                       ▼
 ┌──────────────────────────────────────────────────────────────────┐
 │                         PIPELINE FACTORY                         │
 │            Dynamic pipeline instantiation & execution            │
 └──────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
 ┌──────────────────────────────────────────────────────────────────┐
 │                    CONFIGURATION (INI/SQLite)                    │
 └──────────────────────────────────────────────────────────────────┘
```
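The pipeline factory instantiates the pipeline class named in the [pipeline] section. The source tree (pipelines/classes/bppiPLRCSVFile.py), together with path=pipelines.classes and classname=bppiPLRCSVFile, suggests one module per class with the module named after the class. The sketch below assumes exactly that layout and resolves the class with the standard importlib module. It is not PIDG's actual pipelineFactory code, the helper names are hypothetical, and it does not instantiate the class because its constructor arguments are not documented here.

```python
import importlib


def pipeline_module_name(path: str, classname: str) -> str:
    """ASSUMED layout: each class lives in a module of the same name,
    e.g. pipelines.classes.bppiPLRCSVFile for classname=bppiPLRCSVFile."""
    return f"{path}.{classname}"


def load_class(module_name: str, class_name: str) -> type:
    """Import a module by dotted name and return the class it defines."""
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name, None)
    if not isinstance(cls, type):
        raise ImportError(f"{class_name!r} is not a class in module {module_name!r}")
    return cls


def load_pipeline_class(path: str, classname: str) -> type:
    """Resolve the [pipeline] path/classname settings to a class object."""
    return load_class(pipeline_module_name(path, classname), classname)
```

For example, with src/ on sys.path, load_pipeline_class("pipelines.classes", "bppiPLRCSVFile") would return the CSV pipeline class under that assumption. With this approach, adding a new pipeline only requires dropping a module into classes/ and naming it in the INI file.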
```
PIDG/
├── src/
│   ├── pidg.py                   # Main entry point (INI config)
│   ├── pidgsq.py                 # Entry point for SQLite config
│   ├── pidg/
│   │   └── __init__.py           # Package initialization with main()
│   ├── config/
│   │   ├── appConfig.py          # Configuration management
│   │   └── cmdLineConfig.py      # Command-line argument parsing
│   ├── pipelines/
│   │   ├── pipeline.py           # Base pipeline class
│   │   ├── pidgPipeline.py       # PIDG-specific pipeline
│   │   ├── pipelineFactory.py    # Dynamic pipeline instantiation
│   │   ├── classes/              # Pipeline implementations
│   │   │   ├── bppiPLRCSVFile.py
│   │   │   ├── bppiPLRODBC.py
│   │   │   ├── bppiPLRBluePrismRepo.py
│   │   │   └── bppiPLRBluePrismApi.py
│   │   ├── extractors/           # Data extraction modules
│   │   │   ├── Extractor.py
│   │   │   ├── csvFileExtractor.py
│   │   │   ├── odbcExtractor.py
│   │   │   ├── bpAPIExtractor.py
│   │   │   └── builders/         # SQL query builders
│   │   ├── transformers/         # Data transformation modules
│   │   │   └── bplogsTransformer.py
│   │   └── loaders/              # BPPI API integration
│   │       └── bppi/
│   └── utils/
│       ├── constants.py          # Application constants
│       └── log.py                # Logging utilities
├── config-samples/               # Configuration templates
├── tests/                        # Unit tests
├── docs/                         # Documentation
├── vbo/                          # Blue Prism VBO objects
├── requirements.txt              # Python dependencies
└── pyproject.toml                # Package configuration
```
- Python 3.10 or higher
- ODBC Driver (for database connections)
- BPPI/Timeline account with API token
```bash
# Clone the repository
git clone https://github.com/datacorner/PIDG.git
cd PIDG

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Alternatively, install the published package from PyPI:

```bash
pip install pyBPPIBridge
```

Core dependencies:

| Package | Version | Purpose |
|---|---|---|
| pandas | 2.0.3 | Data manipulation |
| openpyxl | 3.1.2 | Excel file support |
| pyodbc | 4.0.39 | ODBC database connectivity |
| requests | 2.31.0 | HTTP API calls |
| xmltodict | 0.13.0 | XML parsing |
Copy the template and customize it:

```bash
cp config-samples/config.ini-template config.ini
```

Edit config.ini with your settings:

```ini
[source]
filename=data/events.csv
separator=,

[pipeline]
path=pipelines.classes
classname=bppiPLRCSVFile

[bppi]
url=https://your-bppi-server.com
token=your-api-token
table=my_event_table
todos=no
```

Run the pipeline:

```bash
python src/pidg.py -configfile config.ini
```

PIDG uses INI-format configuration files with the following sections:
```ini
[source]
# CSV separator (default: comma)
separator=,
# Source filename for file-based sources
filename=data/events.csv
# Excel sheet name (for Excel files)
sheet=Sheet1
# Folder path (for batch processing)
folder=/path/to/files
# Filename pattern to match in the folder
files=*.csv

[pipeline]
# Module path for pipeline classes
path=pipelines.classes
# Pipeline class name (case-sensitive)
# Options: bppiPLRBluePrismRepo, bppiPLRBluePrismApi, bppiPLRCSVFile, bppiPLRODBC
classname=bppiPLRCSVFile

[database]
# ODBC connection string
connectionstring=DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost\SQLEXPRESS;DATABASE=mydb;UID=user;PWD=pass;ENCRYPT=No
# SQL query file path
query=queries/extract.sql

[blueprism]
# Process name to extract logs from
processname=My Business Process
# Parameters to extract from XML attributes (comma-separated)
parameters=CustomerID,ProductCode,Amount
# Stage types to filter out (comma-separated IDs)
# 1=Internal, 4/65536=Decision, 8=Calculation, 128=Call Page, etc.
stagetypefilters=1,4,65536,8,536870912
# Include VBO logs (yes/no)
includevbo=yes
# Unicode logs (yes/no)
unicode=no
# Keep Start/End stages only for the main process page (yes/no)
startendfilter=yes
# Main process page name
mainprocesspage=Main Page
# Delta loading (yes/no)
delta=no
# Delta tracking file
deltafile=delta.tag

[blueprismapi]
# SSL certificate verification (yes/no)
ssl_verification=yes
# OAuth2 client credentials
client_id=your-client-id
client_secret=your-client-secret
# Authentication server URL
auth_url=https://authentication.blueprism.local
# API server URL
api_url=https://api.blueprism.local
# API page size (max 1000)
api_page_size=300

[bppi]
# BPPI server URL (without trailing slash)
url=https://your-bppi-server.com
# API token from BPPI repository
token=your-api-token
# Target table name in repository
table=process_events
# Execute ToDo lists after loading (yes/no)
todos=yes
# ToDo lists to execute (comma-separated)
todolist=TRANSFORM_DATA,LOAD_PROJECT

[other]
# Log folder (with trailing slash)
logfolder=/var/log/pidg/
# Log filename
logfilename=pidg.log
# Log level (DEBUG|INFO|WARNING|ERROR)
loglevel=INFO
# Log format (Python logging format)
logformat=%%(asctime)s|%%(name)s|%%(levelname)s|%%(message)s
```

CSV files: simple CSV file import with a configurable separator.
Configuration:

```ini
[source]
filename=data/events.csv
separator=;

[pipeline]
classname=bppiPLRCSVFile
```

ODBC databases: connect to any ODBC-compliant database.
Configuration:

```ini
[database]
connectionstring=DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;DATABASE=mydb;UID=user;PWD=pass
query=config/query.sql

[pipeline]
classname=bppiPLRODBC
```

SQL query file (query.sql):
```sql
SELECT
    EventID,
    CaseID,
    Activity,
    Timestamp,
    Resource
FROM ProcessEvents
WHERE Timestamp >= '2024-01-01'
```

Blue Prism repository: direct connection to the Blue Prism database for session log extraction.
Configuration:

```ini
[database]
connectionstring=DRIVER={ODBC Driver 18 for SQL Server};SERVER=bpserver;DATABASE=blueprism;UID=reader;PWD=pass
query=config-samples/bplogs.sql

[blueprism]
processname=Invoice Processing
parameters=InvoiceNumber,Vendor,Amount
stagetypefilters=1,4,65536,8
includevbo=no
unicode=no
startendfilter=yes
mainprocesspage=Main Page
delta=yes
deltafile=bp_delta.tag

[pipeline]
classname=bppiPLRBluePrismRepo
```

SQL template (bplogs.sql):
```sql
SELECT logId,
       LOG.sessionnumber AS SessionID,
       stageName,
       result,
       LOG.startdatetime AS resourceStartTime,
       BPAResource.name AS ResourceName,
       actionname,
       stageType,
       pagename,
       attributexml,
       IIF(processname IS NULL, 'VBO', 'PROC') AS OBJECT_TYPE,
       IIF(processname IS NULL, objectname, processname) AS OBJECT_NAME
FROM $tablelog AS LOG, BPASession, BPAResource
WHERE LOG.sessionnumber IN
      (SELECT DISTINCT sessionnumber
       FROM $tablelog
       WHERE processname = '$processname'
         AND $delta)
  AND LOG.sessionnumber = BPASession.sessionnumber
  AND BPAResource.resourceid = BPASession.runningresourceid
  AND stagetype NOT IN($stagetypefilters)
  AND $onlybpprocess
```

Blue Prism API: OAuth2 API connection for Blue Prism v7.x and later.
Configuration:

```ini
[blueprism]
processname=Invoice Processing

[blueprismapi]
ssl_verification=no
client_id=my-app-client-id
client_secret=my-app-secret
auth_url=https://auth.blueprism.local
api_url=https://api.blueprism.local
api_page_size=500

[pipeline]
classname=bppiPLRBluePrismApi
```

Import a CSV event log into a BPPI repository:
```ini
# config-csv.ini
[source]
filename=data/process_events.csv
separator=,

[pipeline]
path=pipelines.classes
classname=bppiPLRCSVFile

[bppi]
url=https://bppi.company.com
token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
table=imported_events
todos=no

[other]
logfolder=logs/
logfilename=csv_import.log
loglevel=INFO
```

Run:
```bash
python src/pidg.py -configfile config-csv.ini
```

Extract Blue Prism session logs incrementally:
```ini
# config-bprepo.ini
[database]
connectionstring=DRIVER={ODBC Driver 18 for SQL Server};SERVER=bp-db-server;DATABASE=BluePrism;UID=readonly;PWD=secure123;ENCRYPT=Yes
query=config-samples/bplogs.sql

[blueprism]
processname=Customer Onboarding
parameters=CustomerID,AccountType
stagetypefilters=1,4,65536,8,536870912
includevbo=no
unicode=no
startendfilter=yes
mainprocesspage=Main Page
delta=yes
deltafile=onboarding_delta.tag

[pipeline]
path=pipelines.classes
classname=bppiPLRBluePrismRepo

[bppi]
url=https://bppi.company.com
token=your-token-here
table=onboarding_events
todos=yes
todolist=CALCULATE_KPIs,UPDATE_DASHBOARD

[other]
logfolder=logs/
logfilename=bp_extraction.log
loglevel=DEBUG
```

Load data from a SQL Server database through ODBC:

```ini
# config-odbc.ini
[database]
connectionstring=DRIVER={ODBC Driver 18 for SQL Server};SERVER=sql-server;DATABASE=ProcessDB;Trusted_Connection=yes
query=queries/custom_extract.sql

[pipeline]
path=pipelines.classes
classname=bppiPLRODBC

[bppi]
url=https://bppi.company.com
token=your-token-here
table=sql_events
todos=no

[other]
loglevel=INFO
```

Run a pipeline programmatically from Python:

```python
# Run from src/ (or add src/ to sys.path) so the config and pipelines packages import
from config.cmdLineConfig import cmdLineConfig
from pipelines.pipelineFactory import pipelineFactory

# Load configuration from INI file
config = cmdLineConfig.emulate_readIni("config.ini")

# Initialize logger
log = pipelineFactory.getLogger(config)

# Create and execute pipeline; returns the row counts for each ETL stage
factory = pipelineFactory(config, log)
extracted, transformed, loaded = factory.process()
print(f"Results: Extracted={extracted}, Transformed={transformed}, Loaded={loaded}")
```

Available pipeline classes:

| Class | Description |
|---|---|
| bppiPLRCSVFile | CSV file extraction and BPPI loading |
| bppiPLRODBC | ODBC database extraction and BPPI loading |
| bppiPLRBluePrismRepo | Blue Prism repository extraction with log transformation |
| bppiPLRBluePrismApi | Blue Prism API v7+ extraction |
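Every pipeline class above is driven by the same INI file. This README does not say whether PIDG's appConfig module reads it with Python's standard configparser, but the sample conventions fit configparser's defaults. Yes/no flags parse with getboolean(), and comma-separated lists such as todolist can be split. The doubled %% in logformat is the escape that configparser's default interpolation turns back into a single %. Here is a minimal sketch with a hypothetical read_settings() helper:

```python
import configparser

SAMPLE_INI = """
[bppi]
todos=yes
todolist=TRANSFORM_DATA,LOAD_PROJECT

[other]
loglevel=INFO
logformat=%%(asctime)s|%%(name)s|%%(levelname)s|%%(message)s
"""


def read_settings(ini_text: str) -> dict:
    """Hypothetical helper: parse a PIDG-style INI snippet into Python values."""
    parser = configparser.ConfigParser()  # default BasicInterpolation
    parser.read_string(ini_text)
    todolist = parser.get("bppi", "todolist")
    return {
        # getboolean() accepts yes/no, true/false, on/off and 1/0
        "todos": parser.getboolean("bppi", "todos"),
        # Comma-separated values become a list of trimmed names
        "todolist": [item.strip() for item in todolist.split(",") if item.strip()],
        "loglevel": parser.get("other", "loglevel"),
        # Interpolation turns each %% back into a single %
        "logformat": parser.get("other", "logformat"),
    }
```

The resulting logformat string is a valid format for Python's logging.Formatter.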
Blue Prism stage type IDs (values for stagetypefilters):

| ID | Stage Type |
|---|---|
| 1 | Internal (always filtered) |
| 2 | Action |
| 4, 65536 | Decision |
| 8 | Calculation |
| 128 | Call Page |
| 1024, 262144 | Start |
| 2048 | End |
| 131072 | Writer |
| 4194304 | Wait |
| 16777216 | Alert |
| 33554432 | Exception |
| 536870912 | Multi Calculation |
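The stagetypefilters setting is a comma-separated list of the IDs above. The bplogs.sql template contains $-prefixed placeholders ($tablelog, $processname, $delta, $stagetypefilters, $onlybpprocess), which match Python's string.Template syntax. The sketch below assumes, without confirmation from the PIDG source, that the template is filled that way. It validates the IDs against the table and always includes 1 (Internal), as the table notes. It then fills only $stagetypefilters with safe_substitute(), leaving the other placeholders untouched. All helper names are hypothetical.

```python
from string import Template

# Stage type IDs from the table above (ID -> name)
STAGE_TYPES = {
    1: "Internal", 2: "Action", 4: "Decision", 8: "Calculation",
    128: "Call Page", 1024: "Start", 2048: "End", 65536: "Decision",
    131072: "Writer", 262144: "Start", 4194304: "Wait",
    16777216: "Alert", 33554432: "Exception", 536870912: "Multi Calculation",
}


def parse_stage_filters(setting: str) -> list[int]:
    """Turn a value like '1,4,65536,8' into validated integer IDs."""
    ids = [int(part) for part in setting.split(",") if part.strip()]
    unknown = [i for i in ids if i not in STAGE_TYPES]
    if unknown:
        raise ValueError(f"Unknown stage type IDs: {unknown}")
    if 1 not in ids:
        ids.insert(0, 1)  # Internal stages are always filtered
    return ids


def render_stage_filter(sql_template: str, setting: str) -> str:
    """Fill $stagetypefilters only; other $placeholders are left as-is."""
    ids = parse_stage_filters(setting)
    return Template(sql_template).safe_substitute(
        stagetypefilters=",".join(str(i) for i in ids)
    )
```

Converting the IDs to integers before substitution also keeps arbitrary text from the INI file out of the generated SQL.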
Run the test suite:

```bash
# Run all tests
python -m pytest tests/

# Run a specific test file
python -m pytest tests/test_Files.py -v

# Run with coverage (requires pytest-cov)
python -m pytest tests/ --cov=src
```

Create test configuration files in tests/config/:
```ini
# tests/config/config-test.ini
[source]
filename=tests/data/test.csv
separator=,

[pipeline]
path=pipelines.classes
classname=bppiPLRCSVFile

[bppi]
url=https://test-bppi.company.com
token=test-token
table=test_table
todos=no

[other]
logfolder=tests/logs/
logfilename=test.log
loglevel=DEBUG
```

Error: pyodbc.Error: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found...')
Solution: Install the appropriate ODBC driver:
- Windows: download the ODBC driver from Microsoft
- Linux: `apt install unixodbc-dev`, plus the driver package for your database
- macOS: `brew install unixodbc`
Error: Impossible to collect repository informations.
Solution:
- Verify your token is valid and not expired
- Check the BPPI URL (no trailing slash)
- Ensure network connectivity to BPPI server
Error: Unable to get the Blue Prism API Access Token
Solution:
- Verify client_id and client_secret
- Check auth_url is correct
- Ensure the OAuth2 client has proper permissions
For datasets larger than 10,000 rows, PIDG automatically splits the upload into batches. If timeouts occur:
- Reduce the batch size in code (C.API_BLOC_SIZE_LIMIT)
- Check network stability
- Verify BPPI server capacity
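Batching itself is plain slicing. The sketch below uses a 10,000-row block size as a stand-in for C.API_BLOC_SIZE_LIMIT. The constant's real module is not shown here, and BLOCK_SIZE and the helper names are hypothetical. A smaller block size means more, smaller upload requests, each of which is less likely to hit a server timeout.

```python
from collections.abc import Iterator, Sequence

# Hypothetical stand-in for C.API_BLOC_SIZE_LIMIT (10,000 rows per batch)
BLOCK_SIZE = 10_000


def chunk_rows(rows: Sequence, block_size: int = BLOCK_SIZE) -> Iterator[Sequence]:
    """Yield consecutive slices of at most block_size rows."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]


def batch_count(n_rows: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of upload calls needed for n_rows (ceiling division)."""
    return -(-n_rows // block_size)
```

For a pandas DataFrame, the same loop would slice positionally with df.iloc[start:start + block_size].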
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
```bash
# Clone and set up
git clone https://github.com/datacorner/PIDG.git
cd PIDG
python -m venv .venv
source .venv/bin/activate

# Install dev dependencies
pip install -r requirements.txt
pip install pytest pytest-cov black flake8

# Run tests
pytest tests/ -v
```

Coding guidelines:

- Follow PEP 8 guidelines
- Use type hints where possible
- Add docstrings to all public methods
- Write unit tests for new features
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation Wiki: https://exypro.org/docs/pybppibridge-documentation/
- PyPI Package: https://pypi.org/project/pyBPPIBridge/
- Issue Tracker: https://exypro.org/Discussions/forum/pybppibridge-solution/
- Blue Prism Documentation: https://bpdocs.blueprism.com/
- ABBYY Timeline: https://www.abbyy.com/timeline/
Made with ❤️ by Benoît Cayla
Copyright © 2023-2025 Benoît Cayla