Fitting 100 Statistical Distributions at Scale: 1000x Memory Reduction with PySpark

How spark-bestfit 2.0 fits distributions across Spark, Ray, and local backends with a class-based API and 1000x memory reduction

spark, python, data engineering, data science, statistics, optimization, ray, distributed computing

Building Production-Ready Spark Pipelines with Configuration-Driven Architecture

How I built a configuration-driven Spark pipeline framework with structured observability and reflection-based component instantiation

spark, scala, data engineering, open source, observability

Delivery Hero 2023 January Layoffs

My experience of Delivery Hero's January 2023 layoffs.

layoffs, tech layoffs, delivery hero

Capital Budgeting with Monte Carlo Simulations in Python

How to use Monte Carlo simulations in Python to make better capital investment decisions, with a practical example of evaluating cloud migration costs.

python, finance, monte carlo, capital budgeting, data science

Configuration Files in Python Using Dataclasses

How to use the dataconf library to parse HOCON, JSON, YAML, and properties files directly into Python dataclasses with full type safety.

python, dataclasses, configuration, dataconf, type safety

Data Optimization for Compacted Partitions: Achieving 77% Storage Reduction

How data optimization with linear ordering and Z-ordering achieved a 77% storage reduction and a 90% runtime improvement on petabyte-scale data lakes.

apache spark, data engineering, big data, optimization, parquet, orc