DataWald, a framework powered by SilvaEngine, is designed to streamline system integration with a high degree of flexibility. Through configurable data mapping, it efficiently processes and adapts data to meet diverse requirements. Built on a modular microservices architecture, DataWald scales easily, making it straightforward to integrate and support a wide range of systems for seamless data flow and interoperability.
- EventBridge triggers the data synchronization process by invoking the `retrieve_entities_from_source` function via the `silvaengine_agenttask` AWS Lambda function.
- `silvaengine_agenttask` calls `silvaengine_microcore_src`, a module built on the core abstract module `datawald_agency` and configured to interact with the designated source system. Within this structure, `src_connector` manages direct communication with the source system, while `datawald_srcagency` operates as the business logic layer, orchestrating data retrieval processes.
- `silvaengine_microcore_src` then initiates data synchronization by calling the `insert_update_entities_to_target` function through the `datawald_interface_engine`, which facilitates the transition of data into the target system.
- `datawald_interface_engine` holds the synchronized data in a staging area and coordinates the entire synchronization task. It then uses AWS SQS to send a message to `silvaengine_task_queue`, which triggers the `insert_update_entities_to_target` function. Following this queue process, it dispatches the `sync_task` function to update the status of the synchronization task.
- Upon receiving the queued message, `silvaengine_agenttask` activates `silvaengine_microcore_tgt`, which processes and prepares the data for integration into the target system. Once the data is processed, `silvaengine_microcore_tgt` updates the synchronization task status within `datawald_interface_engine` by calling `sync_task`.
This structured, layered workflow enables efficient and cohesive data integration and synchronization across source and target systems, maintaining data consistency and task tracking throughout the process.
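To make the queue hand-off concrete, below is a minimal boto3 sketch of the kind of FIFO message `datawald_interface_engine` sends to `silvaengine_task_queue`. The payload field names here are illustrative assumptions, not the framework's actual message schema:

```python
import json
import uuid

import boto3

sqs = boto3.resource("sqs")

def dispatch_task(queue_name: str, endpoint_id: str, tx_type: str, entities: list) -> None:
    """Send one synchronization task message to a SilvaEngine FIFO task queue."""
    queue = sqs.get_queue_by_name(QueueName=queue_name)
    queue.send_message(
        MessageBody=json.dumps(
            {
                "endpoint_id": endpoint_id,
                "funct": "insert_update_entities_to_target",  # hypothetical field names
                "tx_type": tx_type,
                "entities": entities,
            }
        ),
        # FIFO queues require a message group; a deduplication id prevents double sends
        MessageGroupId=f"{endpoint_id}-{tx_type}",
        MessageDeduplicationId=str(uuid.uuid4()),
    )

dispatch_task("silvaengine_task_queue.fifo", "ns", "order", [{"key": "SO-1001"}])
```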
- The source system initiates data synchronization by invoking the `datawald_interface_engine` with the data payload. The data is then sent to the AWS SQS `datawald_input_queue`, which automatically triggers the `silvaengine_agenttask` Lambda function.
- `silvaengine_agenttask` subsequently calls `silvaengine_microcore_sqs`, a module built on the abstract base `datawald_agency` to interact with the specified source system. Within this framework, `datawald_sqsagency` operates as the business logic layer, managing data processing and preparation based on the queue input.
- `silvaengine_microcore_sqs` then synchronizes the data by invoking the `insert_update_entities_to_target` function through the `datawald_interface_engine`, setting up the data for integration with the target system.
- `datawald_interface_engine` stores the synchronized data in a staging area and orchestrates the synchronization task. It then dispatches the `insert_update_entities_to_target` function via the AWS SQS `silvaengine_task_queue`. Once this queue process completes, it triggers the `sync_task` function to update the task's synchronization status.
- Upon receiving the final queued message, `silvaengine_agenttask` initiates `silvaengine_microcore_tgt`, which processes and prepares the data for integration into the target system. After processing, `silvaengine_microcore_tgt` updates the synchronization task status by calling the `sync_task` function within `datawald_interface_engine`.
This layered and modular workflow ensures seamless data integration and synchronization between source and target systems, enabling efficient task management, data consistency, and traceability throughout the process.
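On the consuming side, the `silvaengine_agenttask` entry point is an SQS-triggered Lambda function. A rough sketch of what such a handler looks like, with assumed payload fields and a placeholder hand-off function:

```python
import json

def process_entities(endpoint_id: str, entities: list) -> None:
    # Placeholder for the datawald_sqsagency business-logic layer.
    print(f"{endpoint_id}: received {len(entities)} entities")

def handler(event, context):
    """SQS-triggered Lambda entry point (event shape is the standard AWS SQS event)."""
    for record in event["Records"]:
        payload = json.loads(record["body"])
        # 'endpoint_id' and 'entities' are assumed payload fields for illustration
        process_entities(payload["endpoint_id"], payload["entities"])
```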
Core Modules
- datawald_interface_engine: Serves as the central engine that orchestrates the entire data management framework.
- datawald_agency: Provides an abstract layer for system-specific modules, enabling streamlined data integration across different platforms.
- datawald_connector: Acts as a bridge between the datawald_interface_engine and external dataflows, facilitating seamless data communication.
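The function names used throughout this document imply a common interface on the agency layer. The sketch below shows only that implied shape; the actual base class is defined in the datawald_agency repository and may differ:

```python
from abc import ABC, abstractmethod

class Agency(ABC):
    """Implied shape of an agency: a connector handles transport,
    while the agency applies business logic on top of it."""

    def __init__(self, connector):
        self.connector = connector

    @abstractmethod
    def retrieve_entities_from_source(self, tx_type: str, cut_date: str) -> list:
        """Pull records newer than cut_date and map them to DataWald entities."""

    @abstractmethod
    def insert_update_entities_to_target(self, entities: list) -> None:
        """Write transformed entities into the target system via the connector."""
```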
NetSuite Integration
- datawald_nsagency: Processes NetSuite data, applying tailored business logic to meet operational requirements.
- suitetalk_connector: Communicates with NetSuite via SOAP and RESTful protocols to ensure effective data exchange.
Magento 2 Integration
- datawald_mage2agency: Manages and processes data for Magento 2, embedding business logic to support e-commerce functions.
- mage2_connector: Connects to Magento 2 to enable efficient data transactions and synchronization.
HubSpot Integration
- datawald_hubspotagency: Processes and manages HubSpot data, integrating specific business logic for customer relationship workflows.
- hubspot_connector: Facilitates communication with HubSpot, enabling seamless data integration and CRM functionality.
AWS DynamoDB Integration
- datawald_dynamodbagency: Tailors and processes data with business-specific logic for DynamoDB, supporting database interactions.
- dynamodb_connector: Connects with AWS DynamoDB to execute efficient data transactions within the framework.
AWS SQS Integration
- datawald_sqsagency: Processes messages from AWS SQS, embedding business rules to handle message flow effectively.
- sqs_connector: Manages connections with AWS SQS to enable message handling and integration within the framework.
AWS S3 Integration
- datawald_s3agency: Applies business logic to process and manage data for storage and retrieval in AWS S3.
- s3_connector: Connects with AWS S3 to facilitate file management and data storage operations within the DataWald ecosystem.
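All of these connectors follow the same pattern: a thin transport wrapper that the paired agency calls. As a hypothetical illustration in the style of s3_connector (the real module's interface is not documented here and may differ):

```python
import json

import boto3

class S3Connector:
    """Hypothetical connector: a thin transport layer over the AWS S3 client."""

    def __init__(self, bucket: str, region_name: str = "us-west-2"):
        self.bucket = bucket
        self.client = boto3.client("s3", region_name=region_name)

    def put_entity(self, key: str, entity: dict) -> None:
        """Store one entity as a JSON object."""
        self.client.put_object(
            Bucket=self.bucket,
            Key=key,
            Body=json.dumps(entity).encode("utf-8"),
            ContentType="application/json",
        )

    def get_entity(self, key: str) -> dict:
        """Load one entity back from its JSON object."""
        obj = self.client.get_object(Bucket=self.bucket, Key=key)
        return json.loads(obj["Body"].read())
```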
- Create a main project directory named `silvaengine`. Within this folder, clone the following repositories:
  - Clone the silvaengine_docker project.
- Create two directories, `logs` and `projects`, inside the `www` directory, and a `.ssh` directory inside the `python` directory, at the root of the Docker Compose setup. Use the commands below:

```bash
$ mkdir -p www/logs
$ mkdir -p www/projects
$ mkdir -p python/.ssh
```
- Place your SSH private and public key files in the `python/.ssh` directory (optional, for future customization).
- Set up a `.env` file in the root directory, using the provided `.env.example` for reference. Here's a sample configuration:

```ini
PIP_INDEX_URL=https://pypi.org/simple/  # Or use https://mirrors.aliyun.com/pypi/simple/ for users in China
PROJECTS_FOLDER={path to your projects directory}
PYTHON=python3.11  # Python version
DEBUGPY=/var/www/projects/silvaengine_aws/deployment/cloudformation_stack.py  # Debug Python file path
```

Example Configuration:

- `PIP_INDEX_URL`: https://pypi.org/simple/
- `PROJECTS_FOLDER`: "C:/Users/developer/GitHubRepos/silvaengine"
- `DEBUGPY`: /var/www/projects/silvaengine_aws/deployment/cloudformation_stack.py
- Build the Docker image:

```bash
$ docker compose build
```

- Start the Docker container:

```bash
$ docker compose up -d
```
- Create an S3 bucket with versioning enabled (e.g., `xyz-silvaengine-aws`).
- Configure the `.env` file: place this file inside the `datawald_deployment` folder with the following settings:

```ini
#### Stack Deployment Settings
root_path=../silvaengine_aws  # Root path of the stack
site_packages=/var/python3.11/silvaengine/env/lib/python3.11/site-packages  # Python packages path

#### CloudFormation Settings
bucket=silvaengine-aws  # S3 bucket for zip packages
region_name=us-west-2  # AWS region
aws_access_key_id=XXXXXXXXXXXXXXXXXXX  # AWS Access Key ID
aws_secret_access_key=XXXXXXXXXXXXXXXXXXX  # AWS Secret Access Key
iam_role_name=silvaengine_exec  # IAM role for SilvaEngine Base (optional)
microcore_iam_role_name=silvaengine_microcore_dw_exec  # IAM role for SilvaEngine microcore (optional)

# AWS Lambda Function Variables
REGIONNAME=us-west-2  # AWS region for resources
EFSMOUNTPOINT=/mnt  # EFS mount point (optional)
PYTHONPACKAGESPATH=pypackages  # Folder for large packages (optional)
runtime=python3.11  # Lambda function runtime (optional)
security_group_ids=sg-XXXXXXXXXXXXXXXXXXX  # Security group IDs (optional)
subnet_ids=subnet-XXXXXXXXXXXXXXXXXXX,subnet-XXXXXXXXXXXXXXXXXXX  # Subnet IDs (optional)
efs_access_point=fsap-XXXXXXXXXXXXXXXXXXX  # EFS access point (optional)
efs_local_mount_path=/mnt/pypackages  # EFS local mount path (optional)
{function name or layer name}_version=XXXXXXXXXXXXXXXXXXX  # Function or layer version (optional)
```

Example Configuration:

```ini
#### Stack Deployment Settings
root_path=../silvaengine_aws
site_packages=/var/python3.11/silvaengine/env/lib/python3.11/site-packages

#### CloudFormation Settings
bucket=xyz-silvaengine-aws
region_name=us-west-2
aws_access_key_id=XXXXXXXXXXXXXXXXXXX
aws_secret_access_key=XXXXXXXXXXXXXXXXXXX
REGIONNAME=us-west-2
runtime=python3.11
```
- Run the following command to access the container:

```bash
$ docker exec -it container-aws-suites-311 /bin/bash
```

- Activate the virtual environment:

```bash
source /var/python3.11/silvaengine/env/bin/activate
```

- Navigate to the deployment directory and execute the CloudFormation stack:

```bash
cd ./datawald_deployment
python cloudformation_stack.py .env silvaengine
```
- Add entries to the `se-endpoints` DynamoDB table, using the `endpoint_id` values from the `lambda_config.json` file located in the `datawald_deployment` directory. The format for each entry should be as follows:

```json
{
    "endpoint_id": {endpoint_id},
    "code": 0,
    "special_connection": true
}
```

- For each `endpoint_id` in the `lambda_config.json` file within `datawald_deployment`, insert two separate records into the `se-connections` DynamoDB table (a scripted example for all three records follows this list):
  - One record using the static `api_key` value `#####`:

    ```json
    {
        "endpoint_id": {endpoint_id},
        "api_key": "#####",
        "functions": []
    }
    ```

  - Another record with the actual `api_key` associated with the deployed AWS API Gateway:

    ```json
    {
        "endpoint_id": {endpoint_id},
        "api_key": {api_key},
        "functions": []
    }
    ```
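As referenced above, a minimal boto3 sketch that inserts all three records for one endpoint (assuming default AWS credentials and region; replace the placeholder API key with the real API Gateway key):

```python
import boto3

dynamodb = boto3.resource("dynamodb")

def register_endpoint(endpoint_id: str, api_key: str) -> None:
    """Insert the se-endpoints record and both se-connections records
    for one endpoint, following the record formats shown above."""
    dynamodb.Table("se-endpoints").put_item(
        Item={"endpoint_id": endpoint_id, "code": 0, "special_connection": True}
    )
    connections = dynamodb.Table("se-connections")
    for key in ("#####", api_key):  # the static key, then the API Gateway key
        connections.put_item(
            Item={"endpoint_id": endpoint_id, "api_key": key, "functions": []}
        )

register_endpoint("ns", "XXXXXXXXXXXXXXXXXXX")  # placeholder values
```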
- To access the container, execute the following command:

```bash
$ docker exec -it container-aws-suites-311 /bin/bash
```

- Activate the Python virtual environment by running:

```bash
source /var/python3.11/silvaengine/env/bin/activate
```

- Navigate to the `datawald_deployment` directory and run the requirements installation script:

```bash
cd ./datawald_deployment
sh dw_requirements.sh
```
To establish the base configuration, insert the following records into the se-configdata DynamoDB table:
```json
[
{
"setting_id": "beta_core_dw",
"variable": "area",
"value": "core"
},
{
"setting_id": "beta_core_dw",
"variable": "user_source",
"value": "0"
},
{
"setting_id": "datawald_agency",
"variable": "DW_API_KEY",
"value": "XXXXXXXXXXXXXXXXXXX"
},
{
"setting_id": "datawald_agency",
"variable": "DW_API_URL",
"value": "https://xxxxxxxxxx.execute-api.us-xxxxx-x.amazonaws.com/beta"
},
{
"setting_id": "datawald_agency",
"variable": "DW_AREA",
"value": "core"
},
{
"setting_id": "datawald_agency",
"variable": "DW_ENDPOINT_ID",
"value": "dw"
},
{
"setting_id": "datawald_agency",
"variable": "input_queue_name",
"value": "datawald_input_queue.fifo"
},
{
"setting_id": "datawald_agency",
"variable": "task_queue_name",
"value": "silvaengine_task_queue.fifo"
},
{
"setting_id": "datawald_agency",
"variable": "tx_type",
"value": {
"asset": [
"product",
"inventory",
"inventorylot",
"pricelevel",
"inventory_data"
],
"person": [
"customer",
"vendor",
"company",
"contact",
"company_type",
"factory"
],
"transaction": [
"order",
"invoice",
"purchaseorder",
"itemreceipt",
"itemfulfillment",
"opportunity",
"quote",
"rma",
"billcredit",
"payment",
"inventoryadjustment",
"creditmemo",
"inventorytransfer"
]
}
}
]
```

Configuration Details:
- DW_API_KEY: API key for authentication.
- DW_API_URL: Endpoint URL provided by SilvaEngine.
- DW_AREA: Variable defining the area for the `datawald_interface_engine` core module.
- DW_ENDPOINT_ID: Endpoint identifier for the core module.
- input_queue_name: SQS queue for incoming messages.
- task_queue_name: SQS queue for dispatching tasks.
- tx_type: Data types categorized as assets, persons, and transactions.
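Loading these records by hand through the console works, but a short boto3 script is less error-prone. A minimal sketch, assuming default AWS credentials and region (the list is abbreviated; include every record above, and the same approach applies to the records in the next section):

```python
import boto3

table = boto3.resource("dynamodb").Table("se-configdata")

records = [
    {"setting_id": "beta_core_dw", "variable": "area", "value": "core"},
    {"setting_id": "datawald_agency", "variable": "DW_ENDPOINT_ID", "value": "dw"},
    # ... add the remaining records from the list above
]

# batch_writer buffers put_item calls and flushes them in batches of 25
with table.batch_writer() as batch:
    for record in records:
        batch.put_item(Item=record)
```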
Insert the following records into the se-configdata DynamoDB table:
```json
[
{
"setting_id": "datawald_interface_engine",
"variable": "default_cut_date",
"value": "2024-05-24T02:21:00+00:00"
},
{
"setting_id": "datawald_interface_engine",
"variable": "input_queue_name",
"value": "datawald_input_queue.fifo"
},
{
"setting_id": "datawald_interface_engine",
"variable": "max_entities_in_message_body",
"value": "200"
},
{
"setting_id": "datawald_interface_engine",
"variable": "sync_task_notification",
"value": {
"<endpoint_id>": {
"<data_type>": "<async_function>"
}
}
},
{
"setting_id": "datawald_interface_engine",
"variable": "task_queue_name",
"value": "silvaengine_task_queue.fifo"
}
]
```

Configuration Details:
- default_cut_date: Default cut-off date for data synchronization.
- input_queue_name: SQS queue for receiving incoming messages.
- max_entities_in_message_body: Maximum number of entities allowed per message body.
- sync_task_notification: Asynchronous notification configuration.
- task_queue_name: SQS queue for dispatching tasks.
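For reference, reading a module's settings back out of se-configdata is a single query on `setting_id` (assumed here to be the table's partition key, consistent with the records above):

```python
import boto3
from boto3.dynamodb.conditions import Key

def load_setting(setting_id: str) -> dict:
    """Return all variables for one setting_id as a {variable: value} dict."""
    table = boto3.resource("dynamodb").Table("se-configdata")
    response = table.query(KeyConditionExpression=Key("setting_id").eq(setting_id))
    return {item["variable"]: item["value"] for item in response["Items"]}

config = load_setting("datawald_interface_engine")
print(config["task_queue_name"])  # "silvaengine_task_queue.fifo"
```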
- NSAgency for NetSuite Integration: Facilitates data exchange with NetSuite. See the DataWald NSAgency GitHub repository.
- DynamoDBAgency for Data Integration: Automates synchronization with DynamoDB. See the DataWald DynamoDBAgency GitHub repository.
- SQSAgency for AWS SQS Data: Integrates with AWS SQS for data processing. See the DataWald SQSAgency GitHub repository.
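Each `functions` entry in the connection records below binds a function name to a Lambda ARN for one endpoint. Conceptually, dispatch is a lookup followed by an invocation, as in this hedged sketch (the real SilvaEngine routing logic lives in the framework and may differ):

```python
import json

import boto3

lambda_client = boto3.client("lambda")

def invoke_endpoint_function(connection: dict, function_name: str, payload: dict) -> None:
    """Resolve the Lambda ARN for a function name from an se-connections
    record and invoke it asynchronously."""
    entry = next(f for f in connection["functions"] if f["function"] == function_name)
    lambda_client.invoke(
        FunctionName=entry["aws_lambda_arn"],
        InvocationType="Event",  # asynchronous, fire-and-forget
        Payload=json.dumps(payload).encode("utf-8"),
    )
```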
NetSuite Configuration:
```json
{
"endpoint_id": "ns",
"api_key": "#####",
"functions": [
{
"aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_ns",
"function": "retrieve_entities_from_source",
"setting": "datawald_nsagency"
},
{
"aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_ns",
"function": "insert_update_entities_to_target",
"setting": "datawald_nsagency"
},
{
"aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_ns",
"function": "update_sync_task",
"setting": "datawald_nsagency"
},
{
"aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_ns",
"function": "retry_sync_task",
"setting": "datawald_nsagency"
}
]
}
```

SQS Configuration:

```json
{
"endpoint_id": "sqs",
"api_key": "#####",
"functions": [
{
"aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_sqs",
"function": "retrieve_entities_from_source",
"setting": "datawald_sqsagency"
},
{
"aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_sqs",
"function": "insert_update_entities_to_target",
"setting": "datawald_sqsagency"
},
{
"aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_sqs",
"function": "update_sync_task",
"setting": "datawald_sqsagency"
},
{
"aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_sqs",
"function": "retry_sync_task",
"setting": "datawald_sqsagency"
}
]
}
```

DynamoDB Configuration:

```json
{
"endpoint_id": "datamart",
"api_key": "#####",
"functions": [
{
"aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_dynamodb",
"function": "retrieve_entities_from_source",
"setting": "datawald_dynamodbagency"
},
{
"aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_dynamodb",
"function": "insert_update_entities_to_target",
"setting": "datawald_dynamodbagency"
},
{
"aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_dynamodb",
"function": "update_sync_task",
"setting": "datawald_dynamodbagency"
},
{
"aws_lambda_arn": "arn:aws:lambda:us-xxxx-x:xxxxxxxxxxxx:function:silvaengine_microcore_dynamodb",
"function": "retry_sync_task",
"setting": "datawald_dynamodbagency"
}
]
}
```

Feel free to create a GitHub issue or send us an email for support regarding this application.

