I’m Cyprien Kelma, a Cloud Data Engineer based in Lille, France.
I love designing, implementing, and maintaining data processing systems that turn corporate data into actionable insights and measurable return on investment.
After a successful four-month Data Engineering internship at Decathlon in Brussels, I landed a one-year work-study (apprenticeship) program at Decathlon in France. Here I'm furthering my technical skills while completing the final year of my software engineering degree at ISEN Lille.
Rather than rushing headlong into things, I always prioritize system designs that best meet the requirements and follow good practice, while ensuring solutions remain optimized and robust over the long term.
While I advocate for using the right tech for the right task (rather than giving in to shiny object syndrome 👀), I mostly leverage the following technologies:
- The timeless classics: SQL and Python (with dbt, PySpark, and pandas)
- Spark for distributed computing
- Airflow and Prefect for batch pipeline orchestration
- Relational databases (PostgreSQL, SQLite) and NoSQL stores (MongoDB, Cassandra, Redis)
- Java (and Spring Boot for backend development)
- Databricks and Delta Lake, which I'm currently studying to pass the Databricks Data Engineer Professional certification this year
- DevOps/DataOps concepts and CI/CD with GitHub Actions, especially to manage artifact publication and project deployment across dev/preprod/prod environments
- Docker and Kubernetes (on-premises or with cloud services)
- Infrastructure as Code (IaC) and remote backend state with Terraform
I love working with cloud solutions 🙂. And even though I believe in deep understanding of concepts more than debates about which tool to use, I have a personal preference for building cloud systems with Google Cloud Platform products, especially these: BigQuery, Cloud Storage, Cloud Run, GKE, Dataform, and Dataflow.
I'm also actively working toward the GCP Professional Data Engineer certification this year. That said, I'm far from disliking AWS, especially MWAA, S3, EC2, Lambda, and ECR/EKS.
I'm currently working on this Cloud Data Engineering project.
- It's a complete ELT pipeline architecture template that can be reused by anyone. The goal is to pre-build a fully working data storage and processing system that cover everything from infrastructure to orchestration and configuration, so that it can be ready to use in less than 20 minutes.
- Perfect for startup or small company that want to start getting insight from their raw data without spending to much time and energy on infrastructure and pipeline creation.
- Stack : GCP (Cloud Storage Bucket, BigQuery, Cloud Run), Prefect Cloud, dbt, Power BI (other choices is possible)
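To illustrate the core idea behind ELT (load raw data first, then transform it inside the warehouse), here is a minimal, library-free sketch. It uses `sqlite3` as a stand-in for BigQuery and a hand-written SQL transform as a stand-in for a dbt model; all names are illustrative, not taken from the actual project.

```python
import sqlite3

# "Extract + Load": raw rows land as-is (untyped strings) in a staging table.
raw_rows = [("1", "19.90"), ("2", "5.00")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT)")
con.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw_rows)

# "Transform": typing and modeling happen inside the warehouse,
# which is exactly what a dbt model would do in the real stack.
con.execute("""
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(amount AS REAL)      AS amount
    FROM raw_orders
""")

total = con.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

The point of this ordering is that raw data stays queryable from day one, and transformations can be versioned and re-run without re-ingesting anything.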
Apart from this one, I've got a bunch of interesting public pinned projects. For example, you can check out this scalable, distributed data system architecture.
I am always open to discussing questions or freelance opportunities alongside my main job :)
You can message me on LinkedIn: Cyprien Kelma



