Stan Büren stan-buren

Hi there 👋

I'm Stan Büren

Forward-thinking Data Engineer evolving into a Data Architect, skilled in building reliable, scalable, and secure data foundations that turn raw data into actionable business insights. Combines 3+ years of business & compliance experience with 2+ years of hands-on engineering and programming experience within production environments.

🎯 Core Philosophy

"My architectural focus is on treating data as a product."

Quality-First Engineering: I prioritize data quality and apply rigorous software engineering best practices to distributed systems so that downstream analysts and data scientists can derive value without friction.
Declarative Logic: I strongly advocate for declarative, easily readable logic, such as modular SQL in dbt or well-documented PySpark transformations.
Production Stability: A robust pipeline is not just about moving data from Point A to Point B; it is about testing, governing, and securing it along the way, especially under strict compliance frameworks like DORA and the EU AI Act.

💡 The Engineering Intersection

To me, data engineering is the ultimate intersection of security, data management, DataOps, data architecture, orchestration, and software engineering. I am highly communicative, endlessly curious, and deeply passionate about continuous learning.

Furthermore, I am an AI-empowered practitioner. I actively leverage modern AI tooling in my daily workflows to significantly boost my productivity, accelerate my learning curve, and deliver faster, higher-quality results for the business.

🛠 Tech Stack

💻 Languages
🗄️ Databases
☁️ Cloud
⚙️ Infrastructure
🔧 Data Engineering
📊 Visualization
⚙️ Webpages

Certificates

📁 Projects

⚡ ENTSO-E Data Pipeline & Lakehouse

A production-ready metadata ingestion engine and local lakehouse orchestrator for European power grid data.

This repository implements a scalable data ingestion layer pulling electrical transmission metadata from the ENTSO-E platforms. It features a layered, fully testable I/O structure that separates raw client fetches from cloud storage uploads.

Layered I/O & Emulated Lakehouse: Orchestrates dynamically configured sync routines (unit-tested with standard pytest mocks) and syncs data to a local SeaweedFS S3-compatible store holding Apache Iceberg tables, processed with Spark 4.1.1.
Centralized Path SSOT: All directory layouts are declaratively configured in paths.yml (Single Source of Truth), dynamically populated as Python Path objects, and audited via AST-based quality gates to prevent hardcoding.
Configurable Ingestion Scopes: Developers can declaratively select specific power grid domains (Load, Generation, Transmission) to ingest via YAML configurations.
Strict Observability: Uses structured logging, custom domain exceptions, and localized limits config to prevent API rate-limiting issues.
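
As an illustration of the AST-based quality gate idea, here is a minimal sketch in Python. It is not the repository's actual implementation: the function names `looks_like_path` and `find_hardcoded_paths`, the suffix list, and the heuristic itself are assumptions chosen for this example. The sketch parses a source file and reports string literals that resemble filesystem paths, which a CI step could treat as violations of the paths.yml single source of truth.

```python
import ast

# Hypothetical heuristic: what counts as "path-like" is an assumption of this sketch.
PATH_SEPARATORS = ("/", "\\")
PATH_SUFFIXES = (".yml", ".yaml", ".json", ".parquet", ".csv")


def looks_like_path(value: str) -> bool:
    """Return True if a string literal resembles a hardcoded path."""
    if " " in value or "://" in value:
        # Skip prose (docstrings, messages) and URLs.
        return False
    return any(sep in value for sep in PATH_SEPARATORS) or value.endswith(PATH_SUFFIXES)


def find_hardcoded_paths(source: str) -> list[tuple[int, str]]:
    """Return (line number, literal) for every path-like string in the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if looks_like_path(node.value):
                findings.append((node.lineno, node.value))
    return sorted(findings)


sample = '''
from pathlib import Path
RAW = Path("data/raw/entsoe")
DOMAIN = "load"
'''
print(find_hardcoded_paths(sample))  # prints [(3, 'data/raw/entsoe')]
```

A gate like this would normally exempt the module that loads paths.yml and fail the build when the findings list is non-empty.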

🛡️ N-CMAPSS Telemetry Factory: Predictive Maintenance Digital Twin

An industrial-grade real-time MLOps pipeline estimating engine Remaining Useful Life (RUL) via Bayesian Inference.

This project features a real-time streaming telemetry pipeline and a complete MLOps lifecycle. The core Bayesian Variational Inference (Flipout) model is trained on Google Cloud Platform (GCP) using high-performance compute nodes, then packaged and cryptographically signed before deployment for inference.

High-Performance GCP Training Loop: Orchestrates automated training runs on ephemeral GCP Compute Engine instances (AMD Milan-based C2D High-Performance instances) provisioned via Terraform. Preprocesses massive NASA HDF5 telemetry datasets in parallel across 32 cores, computing global Z-score statistics before training.
Keyless Attestation & Secure MLOps: Hardens model distribution by exporting weights as SafeTensors, generating keyless cryptographic signatures via Sigstore / Cosign on the GCS-integrated worker, and publishing build metadata (provenance.json) to Google Artifact Registry.
Bayesian VI & Flight Class Analysis: Solves short-haul vs long-haul mission estimation drift using Bayesian CNNs to output both RUL predictions and a real-time confidence/uncertainty (Sigma) threshold.
Hardware Isolation Shim: Employs an adaptive runtime shim layer that dynamically intercepts research-grade execution parameters (via metaclass hooks) to force CPU execution and prevent CUDA runtime crashes on edge serving nodes.
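
To make the "global Z-score statistics computed in parallel" step concrete, the following is a minimal sketch, assuming each worker summarises its own chunk and the partial results are merged afterwards. It uses the standard pairwise merge formula for count, mean, and sum of squared deviations; the names `PartialStats`, `chunk_stats`, `merge`, and `z_score` are hypothetical, and the project's real preprocessing code may differ.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class PartialStats:
    count: int
    mean: float
    m2: float  # sum of squared deviations from the mean

    @property
    def std(self) -> float:
        """Population standard deviation (0.0 for an empty summary)."""
        return math.sqrt(self.m2 / self.count) if self.count else 0.0


def chunk_stats(values: list[float]) -> PartialStats:
    """Summarise one chunk; in a parallel run each worker would call this."""
    if not values:
        return PartialStats(0, 0.0, 0.0)
    mean = sum(values) / len(values)
    m2 = sum((v - mean) ** 2 for v in values)
    return PartialStats(len(values), mean, m2)


def merge(a: PartialStats, b: PartialStats) -> PartialStats:
    """Combine two summaries without revisiting the raw values."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta ** 2 * a.count * b.count / n
    return PartialStats(n, mean, m2)


def z_score(value: float, stats: PartialStats) -> float:
    """Normalise a value with the global statistics."""
    return (value - stats.mean) / stats.std if stats.std else 0.0


# Toy data standing in for per-core chunks of one sensor channel.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
global_stats = reduce(merge, map(chunk_stats, chunks))
print(global_stats.count, global_stats.mean)  # prints 6 3.5
```

Because `merge` is associative, the per-chunk summaries can be produced on separate cores and combined in any order, so no single process has to hold the full dataset.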

✈️ C-MAPSS Factory 4.0: Scalable Engine Telemetry Pipeline

An enterprise-grade, real-time data engineering pipeline streaming aircraft engine telemetry.

This project simulates a fleet of aircraft engines generating high-frequency telemetry in real-time, ingests the massive event stream using a modern distributed stack, and delivers analytical health insights through a cloud data warehouse.

High-Throughput Edge Simulator: Features a custom Golang simulator acting as an edge device, streaming millions of sensor telemetry records directly to Redpanda (Kafka).
Structured Streaming & Lakehouse: Consumes event streams via PySpark 4.1.1 and flushes them to Google Cloud Storage (GCS) as Parquet files using Hive partitioning to minimize downstream scan costs.
Zero-Copy BigQuery DWH: Integrates BigQuery External Tables to automatically discover GCS partitions, enabling analytics without data duplication.
Analytics Engineering & BI: Implements staging and metrics mart layers in dbt (calculating running averages of exhaust temperature margins) and visualizes engine degradation in a Streamlit dashboard.
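
The cost benefit of Hive partitioning can be sketched in a few lines of Python. This is an illustration only: the bucket name, the partition columns `event_date` and `engine_id`, and the helper names are assumptions, not details taken from the project. The idea is that partition values are encoded in the object path as `key=value` segments, so a reader can discard whole directories before opening any Parquet file.

```python
from datetime import date

# Hypothetical bucket and prefix, used only for this example.
BASE = "gs://example-bucket/telemetry"


def partition_path(base: str, event_date: date, engine_id: int) -> str:
    """Build a Hive-style directory path from partition values."""
    return f"{base}/event_date={event_date.isoformat()}/engine_id={engine_id}"


def parse_partitions(path: str) -> dict[str, str]:
    """Recover partition keys and values from the key=value path segments."""
    return dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)


def prune(paths: list[str], **filters: str) -> list[str]:
    """Keep only paths whose partition values match every filter."""
    return [
        p for p in paths
        if all(parse_partitions(p).get(k) == v for k, v in filters.items())
    ]


paths = [
    partition_path(BASE, date(2024, 1, 1), 7) + "/part-0000.parquet",
    partition_path(BASE, date(2024, 1, 2), 7) + "/part-0000.parquet",
    partition_path(BASE, date(2024, 1, 2), 8) + "/part-0000.parquet",
]
print(prune(paths, event_date="2024-01-02", engine_id="8"))
```

A query engine that understands this layout applies the same pruning from the `WHERE` clause, which is why filtering on partition columns reduces the amount of data scanned.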

🎭 Emozika Theatre Website (Live Site)

A modern, data-driven static website for a children's theatre studio in Saint Petersburg.

Designed as a responsive, content-rich web platform for a real children's theatre studio. Originally built on vanilla HTML/JS, the site was refactored into a modular, high-performance static site using Astro and JSON-driven content schemas.

Astro Architecture Migration: Refactored the codebase from a monolithic layout into a multi-page static site utilizing reusable Astro components and layouts.
JSON-Driven Content Modeling: Decoupled structural data (repertoire, scheduling, FAQs) from HTML, storing them as clean JSON collections rendered dynamically inside Astro templates.
Advanced Styling with Sass: Reorganized custom styles into modular SCSS with shared variables, nested rules, and structured mixins, compiled via Vite.
Third-Party Afisha Integration: Seamlessly embeds the Yandex.Afisha widget, letting users browse shows and securely purchase tickets inline.
SEO & Performance Optimization: Generates zero-JS static HTML output, supporting sub-second load times, strong Core Web Vitals scores, and search engine indexability.

📬 Let's Connect

Feel free to reach out for collaborations, data engineering discussions, or just to say hello!

💬 Telegram: @stan_buren
🐦 X (Twitter): @stan_buren
📧 Email: stan-buren-dev@proton.me
