DataWald Integration Framework

Introduction

DataWald, a framework powered by SilvaEngine, is designed to streamline system integration with a high degree of flexibility. Through configurable data mapping, it processes and adapts data to meet diverse requirements. Built on a modular microservices architecture, DataWald scales easily, making it straightforward to integrate a wide range of systems for seamless data flow and interoperability.

Dataflow

First Approach with AWS EventBridge

  1. EventBridge triggers the data synchronization process by invoking the retrieve_entities_from_source function via the silvaengine_agenttask AWS Lambda function.
  2. silvaengine_agenttask then calls silvaengine_microcore_src, a module built on the core abstract module datawald_agency and configured to interact with the designated source system. Within this structure, src_connector handles direct communication with the source system, while datawald_srcagency serves as the business logic layer, orchestrating data retrieval.
  3. silvaengine_microcore_src then initiates data synchronization by calling the insert_update_entities_to_target function through datawald_interface_engine, which moves the data toward the target system.
  4. datawald_interface_engine holds the synchronized data in a staging area and coordinates the synchronization task. It uses AWS SQS to send a message to silvaengine_task_queue, which triggers the insert_update_entities_to_target function; once the queued work completes, it dispatches the sync_task function to update the status of the synchronization task.
  5. Upon receiving the queued message, silvaengine_agenttask activates silvaengine_microcore_tgt, which processes and prepares the data for the target system. Once processing finishes, silvaengine_microcore_tgt updates the synchronization task status within datawald_interface_engine by calling sync_task.

This structured, layered workflow enables efficient and cohesive data integration and synchronization across source and target systems, maintaining data consistency and task tracking throughout the process.
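
For illustration, here is a minimal boto3 sketch of wiring such a trigger. The rule name, account ID, and input payload keys are hypothetical, since the exact event shape silvaengine_agenttask expects is defined by your deployment.

    import json

    import boto3

    events = boto3.client("events", region_name="us-west-2")
    awslambda = boto3.client("lambda", region_name="us-west-2")

    # Hypothetical scheduled rule that kicks off the sync every 15 minutes.
    rule_arn = events.put_rule(
        Name="datawald-retrieve-orders",  # hypothetical rule name
        ScheduleExpression="rate(15 minutes)",
        State="ENABLED",
    )["RuleArn"]

    # Allow EventBridge to invoke the silvaengine_agenttask function.
    awslambda.add_permission(
        FunctionName="silvaengine_agenttask",
        StatementId="allow-datawald-retrieve-orders",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # Target the Lambda with a static input payload. The payload keys are
    # illustrative only; consult your deployment for the exact event shape
    # silvaengine_agenttask expects.
    events.put_targets(
        Rule="datawald-retrieve-orders",
        Targets=[
            {
                "Id": "silvaengine-agenttask",
                "Arn": "arn:aws:lambda:us-west-2:123456789012:function:silvaengine_agenttask",
                "Input": json.dumps(
                    {
                        "endpoint_id": "ns",                       # source endpoint
                        "funct": "retrieve_entities_from_source",  # function to run
                        "tx_type": "order",                        # entity type
                    }
                ),
            }
        ],
    )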

Second Approach with AWS SQS

  1. The source system initiates data synchronization by invoking datawald_interface_engine with the data payload. The data is then sent to the AWS SQS datawald_input_queue, which automatically triggers the silvaengine_agenttask Lambda function.
  2. silvaengine_agenttask then calls silvaengine_microcore_sqs, a module built on the abstract base datawald_agency to interact with the specified source system. Within this framework, datawald_sqsagency serves as the business logic layer, processing and preparing data from the queue input.
  3. silvaengine_microcore_sqs then synchronizes the data by invoking the insert_update_entities_to_target function through datawald_interface_engine, staging the data for integration with the target system.
  4. datawald_interface_engine stores the synchronized data in a staging area and orchestrates the synchronization task. It then dispatches the insert_update_entities_to_target function via the AWS SQS silvaengine_task_queue; once the queued work completes, it triggers the sync_task function to update the task's synchronization status.
  5. Upon receiving the final queued message, silvaengine_agenttask initiates silvaengine_microcore_tgt, which processes and prepares the data for the target system. After processing, silvaengine_microcore_tgt updates the synchronization task status by calling the sync_task function within datawald_interface_engine.

This layered and modular workflow ensures seamless data integration and synchronization between source and target systems, enabling efficient task management, data consistency, and traceability throughout the process.
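
For illustration, here is a minimal boto3 sketch of how a source system might push a payload onto datawald_input_queue.fifo to start this flow. The MessageGroupId, MessageDeduplicationId, and body fields are hypothetical; the actual payload schema is defined by your datawald_sqsagency configuration.

    import json

    import boto3

    sqs = boto3.client("sqs", region_name="us-west-2")
    queue_url = sqs.get_queue_url(QueueName="datawald_input_queue.fifo")["QueueUrl"]

    # FIFO queues require a MessageGroupId, and a deduplication ID prevents
    # duplicate deliveries. The body fields shown here are illustrative only.
    sqs.send_message(
        QueueUrl=queue_url,
        MessageGroupId="order-sync",            # hypothetical group id
        MessageDeduplicationId="order-100001",  # hypothetical dedup id
        MessageBody=json.dumps(
            {
                "tx_type": "order",
                "source": "erp",
                "entities": [{"order_id": "100001", "status": "pending"}],
            }
        ),
    )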

Module Details

Core Modules

  • datawald_interface_engine: Serves as the central engine that orchestrates the entire data management framework.
  • datawald_agency: Provides an abstract layer for system-specific modules, enabling streamlined data integration across different platforms.
  • datawald_connector: Acts as a bridge between the datawald_interface_engine and external dataflows, facilitating seamless data communication.

NetSuite Integration

  • datawald_nsagency: Processes NetSuite data, applying tailored business logic to meet operational requirements.
  • suitetalk_connector: Communicates with NetSuite via SOAP and RESTful protocols to ensure effective data exchange.

Magento 2 Integration

  • datawald_mage2agency: Manages and processes data for Magento 2, embedding business logic to support e-commerce functions.
  • mage2_connector: Connects to Magento 2 to enable efficient data transactions and synchronization.

HubSpot Integration

  • datawald_hubspotagency: Processes and manages HubSpot data, integrating specific business logic for customer relationship workflows.
  • hubspot_connector: Facilitates communication with HubSpot, enabling seamless data integration and CRM functionality.

AWS DynamoDB Integration

  • datawald_dynamodbagency: Tailors and processes data with business-specific logic for DynamoDB, supporting database interactions.
  • dynamodb_connector: Connects with AWS DynamoDB to execute efficient data transactions within the framework.

AWS SQS Integration

  • datawald_sqsagency: Processes messages from AWS SQS, embedding business rules to handle message flow effectively.
  • sqs_connector: Manages connections with AWS SQS to enable message handling and integration within the framework.

AWS S3 Integration

  • datawald_s3agency: Applies business logic to process and manage data for storage and retrieval in AWS S3.
  • s3_connector: Connects with AWS S3 to facilitate file management and data storage operations within the DataWald ecosystem.

Installation and Configuration

Step 1: Clone Repositories

  1. Create a main project directory named silvaengine.
  2. Within this folder, clone the following repositories:

Step 2: Download and Set Up Docker

  1. Clone the silvaengine_docker project.

  2. At the root of the Docker Compose setup, create logs and projects directories inside the www directory and a .ssh directory inside the python directory, using the commands below:

    $ mkdir -p www/logs
    $ mkdir -p www/projects
    $ mkdir -p python/.ssh
  3. Place your SSH private and public key files in the python/.ssh directory (optional; for future customization).

  4. Set up a .env file in the root directory, using the provided .env.example for reference. Here’s a sample configuration:

    PIP_INDEX_URL=https://pypi.org/simple/ # Or use https://mirrors.aliyun.com/pypi/simple/ for users in China
    PROJECTS_FOLDER={path to your projects directory}
    PYTHON=python3.11 # Python version
    DEBUGPY=/var/www/projects/silvaengine_aws/deployment/cloudformation_stack.py # Debug Python file path

    Example Configuration:

    • PIP_INDEX_URL: https://pypi.org/simple/
    • PROJECTS_FOLDER: "C:/Users/developer/GitHubRepos/silvaengine"
    • DEBUGPY: /var/www/projects/silvaengine_aws/deployment/cloudformation_stack.py
  5. Build the Docker image:

    $ docker compose build
  6. Start the Docker container:

    $ docker compose up -d

Step 3: Setup and Deployment

  1. Create an S3 Bucket: Ensure versioning is enabled (e.g., xyz-silvaengine-aws); a boto3 sketch for creating the bucket appears after the example configuration below.

  2. Configure the .env File: Place this file inside the datawald_deployment folder with the following settings:

    #### Stack Deployment Settings
    root_path=../silvaengine_aws # Root path of the stack
    site_packages=/var/python3.11/silvaengine/env/lib/python3.11/site-packages # Python packages path
    
    #### CloudFormation Settings
    bucket=silvaengine-aws # S3 bucket for zip packages
    region_name=us-west-2 # AWS region
    aws_access_key_id=XXXXXXXXXXXXXXXXXXX # AWS Access Key ID
    aws_secret_access_key=XXXXXXXXXXXXXXXXXXX # AWS Secret Access Key
    iam_role_name=silvaengine_exec # IAM role for the SilvaEngine base (optional)
    microcore_iam_role_name=silvaengine_microcore_dw_exec # IAM role for the SilvaEngine microcore (optional)
    
    # AWS Lambda Function Variables
    REGIONNAME=us-west-2 # AWS region for resources
    EFSMOUNTPOINT=/mnt # EFS mount point (optional)
    PYTHONPACKAGESPATH=pypackages # Folder for large packages (optional)
    runtime=python3.11 # Lambda function runtime (optional)
    security_group_ids=sg-XXXXXXXXXXXXXXXXXXX # Security group IDs (optional)
    subnet_ids=subnet-XXXXXXXXXXXXXXXXXXX,subnet-XXXXXXXXXXXXXXXXXXX # Subnet IDs (optional)
    efs_access_point=fsap-XXXXXXXXXXXXXXXXXXX # EFS access point (optional)
    efs_local_mount_path=/mnt/pypackages # EFS local mount path (optional)
    {function name or layer name}_version=XXXXXXXXXXXXXXXXXXX # Function or layer version (optional)

    Example Configuration:

    #### Stack Deployment Settings
    root_path=../silvaengine_aws
    site_packages=/var/python3.11/silvaengine/env/lib/python3.11/site-packages
    
    #### CloudFormation Settings
    bucket=xyz-silvaengine-aws
    region_name=us-west-2
    aws_access_key_id=XXXXXXXXXXXXXXXXXXX
    aws_secret_access_key=XXXXXXXXXXXXXXXXXXX
    REGIONNAME=us-west-2
    runtime=python3.11
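
As referenced in step 1 of this section, here is a minimal boto3 sketch for creating the versioned deployment bucket. The bucket name and region follow the example configuration above; substitute your own values.

    import boto3

    s3 = boto3.client("s3", region_name="us-west-2")

    # Create the deployment bucket (the name must be globally unique).
    s3.create_bucket(
        Bucket="xyz-silvaengine-aws",
        CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
    )

    # Enable versioning, as required for the deployment bucket.
    s3.put_bucket_versioning(
        Bucket="xyz-silvaengine-aws",
        VersioningConfiguration={"Status": "Enabled"},
    )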

Step 4: Deploy SilvaEngine Base

  1. Run the following command to access the container:

    $ docker exec -it container-aws-suites-311 /bin/bash
  2. Activate the virtual environment:

    source /var/python3.11/silvaengine/env/bin/activate
  3. Navigate to the deployment directory and execute the CloudFormation stack:

    cd ./datawald_deployment
    python cloudformation_stack.py .env silvaengine

Step 5: Deploy DataWald Integration Framework

  1. Add entries to the se-endpoints DynamoDB table, using the endpoint_id from the lambda_config.json file located in the datawald_deployment directory. The format for each entry is as follows:

    {
        "endpoint_id": {endpoint_id},
        "code": 0,
        "special_connection": true
    }
  2. For each endpoint_id in the lambda_config.json file within datawald_deployment, insert two separate records into the se-connections DynamoDB table (a boto3 sketch of these inserts follows this step list):

    • One record using the static api_key value '#####':

      {
          "endpoint_id": {endpoint_id},
          "api_key": "#####",
          "functions": []
      }
    • Another record with the actual api_key associated with the deployed AWS API Gateway:

      {
          "endpoint_id": {endpoint_id},
          "api_key": {api_key},
          "functions": []
      }
  3. To access the container, execute the following command:

    $ docker exec -it container-aws-suites-311 /bin/bash
  4. Activate the Python virtual environment by running:

    source /var/python3.11/silvaengine/env/bin/activate
  5. Navigate to the datawald_deployment directory and execute the CloudFormation stack setup script:

    cd ./datawald_deployment
    sh dw_requirements.sh
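
For reference, here is a minimal boto3 sketch of the DynamoDB inserts from steps 1 and 2 above. It assumes se-endpoints is keyed on endpoint_id and se-connections on endpoint_id plus api_key (verify the key schema of your deployed tables); the endpoint_id and API key values are placeholders.

    import boto3

    dynamodb = boto3.resource("dynamodb", region_name="us-west-2")

    endpoint_id = "ns"      # taken from lambda_config.json
    api_key = "XXXXXXXXXX"  # the API key of the deployed AWS API Gateway

    # Step 1: register the endpoint in se-endpoints.
    dynamodb.Table("se-endpoints").put_item(
        Item={"endpoint_id": endpoint_id, "code": 0, "special_connection": True}
    )

    # Step 2: insert the two se-connections records, one with the static
    # '#####' api_key and one with the real API Gateway key.
    connections = dynamodb.Table("se-connections")
    for key in ("#####", api_key):
        connections.put_item(
            Item={"endpoint_id": endpoint_id, "api_key": key, "functions": []}
        )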

Step 6: Configuration

1. Initial Configuration Setup for the Foundation

To establish the base configuration, insert the following records into the se-configdata DynamoDB table:

[
    {
        "setting_id": "beta_core_dw",
        "variable": "area",
        "value": "core"
    },
    {
        "setting_id": "beta_core_dw",
        "variable": "user_source",
        "value": "0"
    },
    {
        "setting_id": "datawald_agency",
        "variable": "DW_API_KEY",
        "value": "XXXXXXXXXXXXXXXXXXX"
    },
    {
        "setting_id": "datawald_agency",
        "variable": "DW_API_URL",
        "value": "https://xxxxxxxxxx.execute-api.us-xxxxx-x.amazonaws.com/beta"
    },
    {
        "setting_id": "datawald_agency",
        "variable": "DW_AREA",
        "value": "core"
    },
    {
        "setting_id": "datawald_agency",
        "variable": "DW_ENDPOINT_ID",
        "value": "dw"
    },
    {
        "setting_id": "datawald_agency",
        "variable": "input_queue_name",
        "value": "datawald_input_queue.fifo"
    },
    {
        "setting_id": "datawald_agency",
        "variable": "task_queue_name",
        "value": "silvaengine_task_queue.fifo"
    },
    {
        "setting_id": "datawald_agency",
        "variable": "tx_type",
        "value": {
            "asset": [
                "product",
                "inventory",
                "inventorylot",
                "pricelevel",
                "inventory_data"
            ],
            "person": [
                "customer",
                "vendor",
                "company",
                "contact",
                "company_type",
                "factory"
            ],
            "transaction": [
                "order",
                "invoice",
                "purchaseorder",
                "itemreceipt",
                "itemfulfillment",
                "opportunity",
                "quote",
                "rma",
                "billcredit",
                "payment",
                "inventoryadjustment",
                "creditmemo",
                "inventorytransfer"
            ]
        }
    }
]
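
A minimal sketch for loading these records with boto3, assuming the JSON array above has been saved to a file named se_configdata_core.json (a hypothetical name) and that se-configdata is keyed on setting_id and variable:

    import json

    import boto3

    dynamodb = boto3.resource("dynamodb", region_name="us-west-2")
    table = dynamodb.Table("se-configdata")

    # se_configdata_core.json is a hypothetical file holding the records above.
    with open("se_configdata_core.json") as f:
        records = json.load(f)

    # batch_writer batches and retries the individual put requests.
    with table.batch_writer() as batch:
        for record in records:
            batch.put_item(Item=record)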

Configuration Details:

  • DW_API_KEY: API key for authentication.
  • DW_API_URL: Endpoint URL provided by SilvaEngine.
  • DW_AREA: Variable defining the area for the datawald_interface_engine core module.
  • DW_ENDPOINT_ID: Endpoint identifier for the core module.
  • input_queue_name: SQS queue for incoming messages.
  • task_queue_name: SQS queue for dispatching tasks.
  • tx_type: Data types categorized as assets, persons, and transactions.

2. Configure the Core Module datawald_interface_engine

Insert the following records into the se-configdata DynamoDB table:

[
    {
        "setting_id": "datawald_interface_engine",
        "variable": "default_cut_date",
        "value": "2024-05-24T02:21:00+00:00"
    },
    {
        "setting_id": "datawald_interface_engine",
        "variable": "input_queue_name",
        "value": "datawald_input_queue.fifo"
    },
    {
        "setting_id": "datawald_interface_engine",
        "variable": "max_entities_in_message_body",
        "value": "200"
    },
    {
        "setting_id": "datawald_interface_engine",
        "variable": "sync_task_notification",
        "value": {
            "<endpoint_id>": {
                "<data_type>": "<async_function>"
            }
        }
    },
    {
        "setting_id": "datawald_interface_engine",
        "variable": "task_queue_name",
        "value": "silvaengine_task_queue.fifo"
    }
]
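
These records can be loaded with the same batch_writer sketch shown in section 1, pointed at a file containing this JSON array.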

Configuration Details:

  • default_cut_date: Default cut-off date for data synchronization.
  • input_queue_name: SQS queue for receiving incoming messages.
  • max_entities_in_message_body: Maximum number of entities allowed per message body.
  • sync_task_notification: Asynchronous notification configuration.
  • task_queue_name: SQS queue for dispatching tasks.

3. Module Configuration for Each Application

  1. NSAgency for NetSuite Integration: Facilitates data exchange with NetSuite. See the DataWald NSAgency GitHub repository.
  2. DynamoDBAgency for Data Integration: Automates synchronization with DynamoDB. See the DataWald DynamoDBAgency GitHub repository.
  3. SQSAgency for AWS SQS Data: Integrates with AWS SQS for data processing. See the DataWald SQSAgency GitHub repository.

4. Configure the setting_id for Each Function

Each configuration below is an se-connections record with its functions list populated: each function name maps to the Lambda ARN that serves it and to the setting_id under which its settings are stored in se-configdata.

NetSuite Configuration:

{
    "endpoint_id": "ns",
    "api_key": "#####",
    "functions": [
        {
            "aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_ns",
            "function": "retrieve_entities_from_source",
            "setting": "datawald_nsagency"
        },
        {
            "aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_ns",
            "function": "insert_update_entities_to_target",
            "setting": "datawald_nsagency"
        },
        {
            "aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_ns",
            "function": "update_sync_task",
            "setting": "datawald_nsagency"
        },
        {
            "aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_ns",
            "function": "retry_sync_task",
            "setting": "datawald_nsagency"
        }
    ]
}

SQS Configuration:

{
    "endpoint_id": "sqs",
    "api_key": "#####",
    "functions": [
        {
            "aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_sqs",
            "function": "retrieve_entities_from_source",
            "setting": "datawald_sqsagency"
        },
        {
            "aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_sqs",
            "function": "insert_update_entities_to_target",
            "setting": "datawald_sqsagency"
        },
        {
            "aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_sqs",
            "function": "update_sync_task",
            "setting": "datawald_sqsagency"
        },
        {
            "aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_sqs",
            "function": "retry_sync_task",
            "setting": "datawald_sqsagency"
        }
    ]
}

DynamoDB Configuration:

{
    "endpoint_id": "datamart",
    "api_key": "#####",
    "functions": [
        {
            "aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_dynamodb",
            "function": "retrieve_entities_from_source",
            "setting": "datawald_dynamodbagency"
        },
        {
            "aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_dynamodb",
            "function": "insert_update_entities_to_target",
            "setting": "datawald_dynamodbagency"
        },
        {
            "aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_dynamodb",
            "function": "update_sync_task",
            "setting": "datawald_dynamodbagency"
        },
        {
            "aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_dynamodb",
            "function": "retry_sync_task",
            "setting": "datawald_dynamodbagency"
        }
    ]
}

For support with this application, feel free to open a GitHub issue or send us an email.
