ID: data-guard | data
Data-Guard: Resilient Sales Pipeline
Engineered a defensive end-to-end pipeline to process and validate Tunisian retail sales events.
- Hybrid Lambda-Medallion architecture using Spark Structured Streaming and Kafka
- Silver-Guard validation engine enforcing 7 business guardrails and quarantine logic
- DuckDB and Parquet serving layer for sub-second Power BI reporting
- Fully containerized ecosystem with Docker for reproducible deployments
Python, Kafka, Spark, DuckDB, Power BI, Docker
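The guardrail-and-quarantine idea can be sketched in plain Python. This is a minimal illustration only, not the project's Silver-Guard code: the seven actual guardrails are not listed here, so the three rules, the field names (`store_id`, `quantity`, `unit_price`), and the `validate` helper are all assumptions made for the example.

```python
# Hypothetical guardrails; the project's seven real rules are not shown in this card.
def has_positive_quantity(event):
    quantity = event.get("quantity")
    return isinstance(quantity, int) and quantity > 0


def has_non_negative_price(event):
    price = event.get("unit_price")
    return isinstance(price, (int, float)) and price >= 0


def has_store_id(event):
    return bool(event.get("store_id"))


GUARDRAILS = {
    "positive_quantity": has_positive_quantity,
    "non_negative_price": has_non_negative_price,
    "store_id_present": has_store_id,
}


def validate(events):
    """Split events into (clean, quarantined).

    Quarantined entries keep the original event plus the names of the
    rules it failed, so bad rows can be inspected instead of dropped.
    """
    clean, quarantined = [], []
    for event in events:
        failed = [name for name, rule in GUARDRAILS.items() if not rule(event)]
        if failed:
            quarantined.append({"event": event, "failed_rules": failed})
        else:
            clean.append(event)
    return clean, quarantined


# Made-up sample events for illustration.
sample = [
    {"store_id": "TUN-01", "quantity": 2, "unit_price": 14.5},
    {"store_id": "", "quantity": -1, "unit_price": 3.0},
]
clean, quarantined = validate(sample)
print(len(clean), "clean;", len(quarantined), "quarantined")
```

Keeping the failed rule names next to each quarantined event is one possible design; it makes it easy to report which guardrail rejects the most rows.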
ID: realtime-gaming-pipeline | data
Real-Time Data Streaming Pipeline
Streamed hundreds of live game events per minute and processed player scores and logins in near real time.
- Kafka ingestion for real-time event capture
- Spark processing for low-latency enrichment
- ELK Stack (Elasticsearch, Kibana) for live metrics dashboards
- Dockerized infrastructure for easy deployment
Python, Kafka, Spark, Elasticsearch, Kibana, Docker
ID: ai-job-matching | data
AI Job-Matching Pipeline
Developed a real-time ingestion pipeline for semantic resume-to-job matching.
- all-MiniLM-L6-v2 embeddings for semantic similarity
- JobSpy adaptation for localized Tunisian job indexing
- SQLite persistence for search history, duplicate detection, and observability logging
Python, SentenceTransformers, Streamlit, SQLite, PyMuPDF, Machine Learning
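Semantic matching of this kind usually comes down to ranking job vectors by cosine similarity to the resume vector. The sketch below shows only that ranking step with tiny hand-written vectors; in the real pipeline the vectors would come from all-MiniLM-L6-v2 (which produces 384-dimensional embeddings). The `rank_jobs` helper and the job names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_jobs(resume_vec, job_vecs):
    """Return (job_id, score) pairs sorted from best to worst match."""
    scored = [
        (job_id, cosine_similarity(resume_vec, vec))
        for job_id, vec in job_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional stand-ins for real sentence embeddings.
resume = [0.9, 0.1, 0.0]
jobs = {
    "data-engineer": [0.8, 0.2, 0.1],
    "graphic-designer": [0.0, 0.1, 0.9],
}
ranking = rank_jobs(resume, jobs)
for job_id, score in ranking:
    print(f"{job_id}: {score:.3f}")
```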
ID: airflow-weather | data
Airflow Weather Pipeline
Fetched weather data from the OpenWeatherMap API and trained regression models.
- Automated ETL with Apache Airflow
- JSON to CSV transformation
- LinearRegression, DecisionTree, and RandomForest models achieving ~79% accuracy
Python, Airflow, Pandas, scikit-learn, Docker
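The JSON to CSV step can be illustrated with the standard library alone. This is a sketch under assumptions, not the project's Airflow task: the chosen columns and the `flatten` and `json_to_csv` helpers are hypothetical, and the sample record only imitates the nested shape (`main`, `wind`) of an OpenWeatherMap current-weather response.

```python
import csv
import io
import json

# Assumed output columns; the project's actual schema is not shown in this card.
FIELDS = ["city", "timestamp", "temp", "humidity", "wind_speed"]


def flatten(record):
    """Pull selected nested fields into one flat row; missing keys become None."""
    main = record.get("main", {})
    wind = record.get("wind", {})
    return {
        "city": record.get("name"),
        "timestamp": record.get("dt"),
        "temp": main.get("temp"),
        "humidity": main.get("humidity"),
        "wind_speed": wind.get("speed"),
    }


def json_to_csv(raw_json):
    """Convert a JSON array of weather records into CSV text."""
    records = json.loads(raw_json)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))
    return buffer.getvalue()


# Made-up sample record for illustration.
raw = json.dumps([
    {
        "name": "Tunis",
        "dt": 1700000000,
        "main": {"temp": 291.15, "humidity": 60},
        "wind": {"speed": 3.6},
    }
])
csv_text = json_to_csv(raw)
print(csv_text)
```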
ID: gitlab-cicd | data
GitLab CI/CD Pipeline
Created a CI/CD pipeline for Python data processing.
- GitLab project, runner, and SSH authentication setup
- Automated build, test, and deployment
- Kubernetes (MicroK8s) orchestration
Python, GitLab CI/CD, Kubernetes, MicroK8s
ID: mongodb-project | data
MongoDB Data Engineering
Imported ~200k JSON documents and performed advanced analytics.
- MQL queries and aggregation pipelines
- Data modeling for retrieval optimization
- Large-scale document processing
MongoDB, Python, MQL
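An aggregation pipeline is an ordered list of stage documents, so it can be built and inspected in Python before it is sent to the server. The sketch below only constructs such a pipeline; the field names (`category`, `price`) and the `top_categories_pipeline` helper are hypothetical, since the dataset's schema is not described here.

```python
def top_categories_pipeline(min_price, limit):
    """Build a pipeline: filter, group by category, sort by count, keep the top N."""
    return [
        {"$match": {"price": {"$gte": min_price}}},
        {
            "$group": {
                "_id": "$category",
                "count": {"$sum": 1},
                "avg_price": {"$avg": "$price"},
            }
        },
        {"$sort": {"count": -1}},
        {"$limit": limit},
    ]


pipeline = top_categories_pipeline(min_price=10, limit=5)

# With pymongo and a running MongoDB instance (neither is assumed here),
# the pipeline would be executed as:
#   results = list(collection.aggregate(pipeline))
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)
```

Placing `$match` first is a common choice because it reduces the number of documents the later stages have to process.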