I’m a Data Engineer with experience building data pipelines, streaming systems, and ML-driven analytics using Python and AWS.
I come from a traditional engineering background and transitioned into data engineering by building real systems end to end — not just notebooks or tutorials. I focus on making data usable, scalable, and reliable, especially in environments where resources, time, or cloud budgets are limited.
I design and implement systems that:
- Ingest and process data reliably
- Support analytics and machine learning use cases
- Follow cloud-native and event-driven architecture principles
- Are understandable and maintainable by teams
My work sits at the intersection of data engineering, backend systems, and applied machine learning.
Real-time, event-driven architecture
- Built a streaming data platform using Kafka
- Event-driven ingestion and processing
- Event routing concepts inspired by AWS Route 53 routing policies
- Observability with Grafana
- Focus on data flow, reliability, and monitoring
Tech: Python, Kafka, Event-driven architecture, Grafana
→ Repository: ecommerce-streaming-data-platform
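To give a flavor of the ingestion side, here is a minimal Python sketch of the pattern (the topic name, broker address, and event schema are illustrative, not the repository's actual ones):

```python
import json
from kafka import KafkaProducer

# Illustrative broker and serialization settings; the real platform's differ.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for full acknowledgement, favoring reliability
)

order_event = {
    "event_type": "order_created",
    "order_id": "o-1001",
    "customer_id": "c-42",
    "total": 59.90,
}

# Key by customer so events for one customer stay ordered within a partition.
producer.send("orders", key=order_event["customer_id"], value=order_event)
producer.flush()
```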
Batch ingestion and analytics
- End-to-end data ingestion and processing pipeline
- Structured data lake layout
- Designed for analytics and reporting use cases
- Emphasis on automation and data quality checks
Tech: Python, Data pipelines
→ Repository: datalake-analytics-pipeline
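A minimal sketch of the core pattern, validating a batch and then landing it in a Hive-style partitioned Parquet layout (paths, columns, and checks are placeholders; assumes pandas with pyarrow installed):

```python
import pandas as pd

def quality_check(df: pd.DataFrame) -> pd.DataFrame:
    """Basic data-quality gates before anything lands in the lake."""
    assert df["order_id"].is_unique, "duplicate order_id in batch"
    assert df["amount"].ge(0).all(), "negative amounts in batch"
    return df.dropna(subset=["order_id", "order_date"])

raw = pd.read_csv("raw/orders_2024-06-01.csv", parse_dates=["order_date"])
clean = quality_check(raw)
clean["year"] = clean["order_date"].dt.year
clean["month"] = clean["order_date"].dt.month

# Hive-style layout: lake/orders/year=2024/month=6/...
clean.to_parquet("lake/orders", partition_cols=["year", "month"], engine="pyarrow")
```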
Machine learning on AWS
- Customer churn prediction workflow
- Training and evaluation using AWS SageMaker
- End-to-end ML lifecycle: data prep → training → evaluation
- Focus on deployable, reproducible ML workflows
Tech: Python, AWS SageMaker, Machine Learning
→ Repository: customer-churn-prediction
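The shape of the training step, sketched with the SageMaker Python SDK (the IAM role, S3 paths, and entry-point script are placeholders, not the repository's actual names):

```python
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# train.py would hold the actual scikit-learn training and evaluation code.
estimator = SKLearn(
    entry_point="train.py",
    role=role,
    instance_type="ml.m5.large",
    instance_count=1,
    framework_version="1.2-1",
    py_version="py3",
    sagemaker_session=session,
)

estimator.fit({
    "train": "s3://my-bucket/churn/train/",
    "validation": "s3://my-bucket/churn/validation/",
})
```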
Predictive analytics for inventory management
- Time-series forecasting for demand prediction
- Feature engineering and model training pipeline
- Designed to support business decision-making
Tech: Python, Forecasting models
→ Repository: demand-forecasting-system
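A compact sketch of the lag-feature approach this kind of pipeline relies on (column names, lags, and the holdout window are illustrative):

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Daily demand; real data would come from the lake or a warehouse.
df = pd.read_csv("demand_daily.csv", parse_dates=["date"]).sort_values("date")

# Feature engineering: lagged demand and a rolling mean as predictors.
for lag in (1, 7, 28):
    df[f"lag_{lag}"] = df["units_sold"].shift(lag)
df["rolling_7"] = df["units_sold"].shift(1).rolling(7).mean()
df = df.dropna()

features = ["lag_1", "lag_7", "lag_28", "rolling_7"]
train, test = df.iloc[:-28], df.iloc[-28:]  # hold out the last four weeks

model = GradientBoostingRegressor().fit(train[features], train["units_sold"])
forecast = model.predict(test[features])
```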
Distributed systems & event-driven backend (AWS-style simulation)
- Microservices-based backend system for a radio station
- Event-driven communication using Kafka
- Service coordination with ZooKeeper
- Built with Java, Spring Boot, and Spring Cloud
- Architecture designed to simulate AWS-managed services locally for learning and cost efficiency
Focus: Distributed systems design, messaging, service discovery
Tech: Java, Spring Boot, Spring Cloud, Kafka, ZooKeeper
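The project itself is Java/Spring, but the coordination pattern at its core fits in a few lines of Python with kazoo: each service instance registers an ephemeral znode that disappears if the instance dies, and clients discover live instances by listing children (paths and addresses are illustrative):

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# Register this instance: the ephemeral node vanishes if the service dies,
# so the registry only ever lists live instances.
zk.ensure_path("/services/stream-service")
zk.create(
    "/services/stream-service/instance-",
    b"10.0.0.5:8080",
    ephemeral=True,
    sequence=True,
)

# Discovery: list currently registered instances.
instances = zk.get_children("/services/stream-service")
print(instances)

zk.stop()
```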
Data engineering:
- Python, SQL
- Kafka
- Batch & streaming pipelines
- Data modeling and data flow design

Cloud & infrastructure:
- AWS (S3, Lambda, DynamoDB, SageMaker, API Gateway)
- Infrastructure as Code: Terraform (hands-on learning and application)
- IAM & least-privilege design

Machine learning:
- scikit-learn
- Time-series forecasting
- ML pipelines and experimentation
- SageMaker workflows

Backend & distributed systems:
- Java
- Spring Boot, Spring Cloud
- Microservices architecture
- Event-driven systems

Currently focusing on:
- Designing AWS-native architectures
- Infrastructure as Code with Terraform
- Improving observability and system design
- Preparing for Data Engineer / Data Platform Engineer roles
- 💼 LinkedIn: https://www.linkedin.com/in/rociobaigorria/
- 📧 Email: rociomnbaigorria@gmail.com
- 🌍 Location: Argentina (GMT-3) — open to remote opportunities
“Making data accessible to people who actually need to use it.”