Forward-thinking Data Engineer evolving into a Data Architect, skilled in building reliable, scalable, and secure data foundations that turn raw data into actionable business insights. Combines 3+ years of business & compliance experience with 2+ years of hands-on engineering and programming experience within production environments.
"My architectural focus is on treating data as a product."
- Quality-First Engineering: I prioritize data quality and apply rigorous software engineering best practices to distributed systems so that downstream analysts and data scientists can derive value without friction.
- Declarative Logic: I strongly advocate for declarative, easily readable logic, such as modular SQL in dbt or well-documented PySpark transformations.
- Production Stability: A robust pipeline is not just about moving data from Point A to Point B; it is about testing, governing, and securing it along the way—especially when navigating strict compliance frameworks like DORA and the EU AI Act.
To me, data engineering is the ultimate intersection of security, data management, DataOps, data architecture, orchestration, and software engineering. I am highly communicative, endlessly curious, and deeply passionate about continuous learning.
Furthermore, I am an AI-empowered practitioner. I actively leverage modern AI tooling in my daily workflows to significantly boost my productivity, accelerate my learning curve, and deliver faster, higher-quality results for the business.
| 💻 Languages | |
| 🗄️ Databases | |
| ☁️ Cloud | |
| ⚙️ Infrastructure | |
| 🔧 Data Engineering | |
| 📊 Visualization | |
| ⚙️ Webpages | |
A production-ready metadata ingestion engine and local lakehouse orchestrator for European power grid data.
This repository implements a scalable data ingestion layer pulling electrical transmission metadata from the ENTSO-E platforms. It features a layered, fully testable I/O structure that separates raw client fetches from cloud storage uploads.
- Layered I/O & Emulated Lakehouse: Orchestrates dynamically configured sync routines (using standard `pytest` mocks) and syncs data to a local SeaweedFS S3 instance running Apache Iceberg tables on Spark 4.1.1.
- Centralized Path SSOT: All directory layouts are declaratively configured in `paths.yml` (Single Source of Truth), dynamically populated as Python `Path` objects, and audited via AST-based quality gates to prevent hardcoding.
- Configurable Ingestion Scopes: Developers can declaratively select specific power grid domains (Load, Generation, Transmission) to ingest via YAML configurations.
- Strict Observability: Uses structured logging, custom domain exceptions, and localized limits config to prevent API rate-limiting issues.
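
To illustrate the idea of an AST-based quality gate against hardcoded paths, here is a minimal sketch. It is not the repository's actual check: the function name `find_hardcoded_paths` and the "contains a slash, no whitespace" heuristic are assumptions made for this example, and a real gate would likely need an allowlist for URLs and similar literals.

```python
import ast


def find_hardcoded_paths(source: str) -> list[tuple[int, str]]:
    """Return (line number, literal) pairs for string constants that look like paths.

    Heuristic (an assumption for this sketch): a literal is path-like if it
    contains a slash or backslash and has no whitespace, which skips most
    docstrings and log messages.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            text = node.value
            has_separator = "/" in text or "\\" in text
            has_whitespace = any(ch.isspace() for ch in text)
            if has_separator and not has_whitespace:
                hits.append((node.lineno, text))
    return sorted(hits)


# Hypothetical module source: one hardcoded path and one harmless literal.
sample = 'from pathlib import Path\nRAW = Path("data/raw/entsoe")\nNAME = "load"\n'
violations = find_hardcoded_paths(sample)
```

A CI step could fail the build whenever `violations` is non-empty, which pushes every path back into the central configuration file.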
An industrial-grade real-time MLOps pipeline estimating engine Remaining Useful Life (RUL) via Bayesian Inference.
This project features a real-time streaming telemetry pipeline and a complete MLOps lifecycle. The core Bayesian Variational Inference (Flipout) model is trained on Google Cloud Platform (GCP) using high-performance compute nodes, then packaged and cryptographically signed before deployment for inference.
- High-Performance GCP Training Loop: Orchestrates automated training runs on ephemeral GCP Compute Engine instances (AMD Milan-based C2D High-Performance instances) provisioned via Terraform. Preprocesses massive NASA HDF5 telemetry datasets in parallel across 32 cores, computing global Z-score statistics before training.
- Keyless Attestation & Secure MLOps: Hardens model distribution by exporting weights as SafeTensors, generating keyless cryptographic signatures via Sigstore / Cosign on the GCS-integrated worker, and publishing build metadata (`provenance.json`) to Google Artifact Registry.
- Bayesian VI & Flight Class Analysis: Solves short-haul vs long-haul mission estimation drift using Bayesian CNNs to output both RUL predictions and a real-time confidence/uncertainty (Sigma) threshold.
- Hardware Isolation Shim: Employs an adaptive runtime shim layer that dynamically intercepts research-grade execution parameters (via metaclass hooks) to force CPU execution and prevent CUDA runtime crashes on edge serving nodes.
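
The "global Z-score statistics" step can be sketched as follows: each worker reduces its chunk to partial sums, and the partial sums are merged into one global mean and standard deviation. This is an illustrative sketch, not the project's code. The function names are hypothetical, the data is a plain list of floats rather than HDF5, and a thread pool stands in for the multi-core process pool a CPU-bound job would normally use. The sum-of-squares formula is simple but can lose precision on very large inputs.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk: list[float]) -> tuple[int, float, float]:
    """Reduce one chunk to (count, sum, sum of squares)."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def global_mean_std(chunks: list[list[float]], workers: int = 4) -> tuple[float, float]:
    """Merge per-chunk partial sums into a global mean and population std."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_stats, chunks))
    n = sum(p[0] for p in parts)
    total = sum(p[1] for p in parts)
    total_sq = sum(p[2] for p in parts)
    mean = total / n
    # Clamp at zero to guard against tiny negative values from rounding.
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)


def zscore(x: float, mean: float, std: float) -> float:
    """Normalize a single value with the global statistics."""
    return (x - mean) / std


mean, std = global_mean_std([[1.0, 2.0], [3.0, 4.0]])
```

Computing the statistics once over the whole dataset, rather than per chunk, keeps every training sample on the same scale.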
An enterprise-grade, real-time data engineering pipeline streaming aircraft engine telemetry.
This project simulates a fleet of aircraft engines generating high-frequency telemetry in real-time, ingests the massive event stream using a modern distributed stack, and delivers analytical health insights through a cloud data warehouse.
- High-Throughput Edge Simulator: Features a custom Golang simulator acting as an edge device, streaming millions of sensor telemetry records directly to Redpanda (Kafka).
- Structured Streaming & Lakehouse: Consumes event streams via PySpark 4.1.1 and flushes them to Google Cloud Storage (GCS) as Parquet files using Hive partitioning to minimize downstream scan costs.
- Zero-Copy BigQuery DWH: Integrates BigQuery External Tables to automatically discover GCS partitions, enabling analytics without data duplication.
- Analytics Engineering & BI: Implements staging and metrics mart layers in dbt (calculating running averages of exhaust temperature margins) and visualizes engine degradation in a Streamlit dashboard.
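
As a rough illustration of Hive-style partitioning, the sketch below builds the `key=value` directory layout that partition-aware readers use to skip irrelevant files. The partition keys (`event_date`, `hour`), the helper name, and the bucket path are assumptions for this example. The project's actual layout is not stated here.

```python
from datetime import datetime, timezone


def hive_partition_path(base: str, event_time: datetime) -> str:
    """Build a Hive-style partition directory for an event timestamp (in UTC)."""
    t = event_time.astimezone(timezone.utc)
    return f"{base}/event_date={t.date().isoformat()}/hour={t.hour:02d}"


# Hypothetical bucket and timestamp, used only to show the resulting layout.
example_path = hive_partition_path(
    "gs://example-bucket/telemetry",
    datetime(2024, 3, 5, 12, 30, tzinfo=timezone.utc),
)
```

A query filtered on `event_date` then only needs to list and read the matching directories, which is where the scan-cost saving comes from.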
A modern, data-driven static website for a children's theatre studio in Saint Petersburg.
Designed as a responsive, content-rich web platform for a real children's theatre studio. Originally built on vanilla HTML/JS, the site was refactored into a modular, high-performance static site using Astro and JSON-driven content schemas.
- Astro Architecture Migration: Refactored the codebase from a monolithic layout into a multi-page static site utilizing reusable Astro components and layouts.
- JSON-Driven Content Modeling: Decoupled structural data (repertoire, scheduling, FAQs) from HTML, storing them as clean JSON collections rendered dynamically inside Astro templates.
- Advanced Styling with Sass: Reorganized custom styles using modular SCSS variables, nested rules, and structured mixins compiled via Vite.
- Third-Party Afisha Integration: Seamlessly embeds the Yandex.Afisha widget, letting users browse shows and securely purchase tickets inline.
- SEO & Performance Optimization: Generated zero-JS static HTML outputs, ensuring sub-second load times, excellent web vitals, and search engine indexability.
Feel free to reach out for collaborations, data engineering discussions, or just to say hello!
- 💬 Telegram: @stan_buren
- 🐦 X (Twitter): @stan_buren
- 📧 Email: stan-buren-dev@proton.me



