Fitting 100 Statistical Distributions at Scale: 1000x Memory Reduction with PySpark
How spark-bestfit 2.0 fits distributions across Spark, Ray, and local backends with a class-based API and 1000x memory reduction
Technical articles on data engineering, Apache Spark, Python, MLOps, and site reliability engineering by Dustin Smith.
How I built a configuration-driven Spark pipeline framework with structured observability and reflection-based component instantiation
My experience from Delivery Hero's 2023 layoffs.
How to use Monte Carlo simulations in Python to make better capital investment decisions, with a practical example of evaluating cloud migration costs.
How to use the dataconf library to parse HOCON, JSON, YAML, and properties files directly into Python dataclasses with full type safety.
How intelligent data optimization with linear ordering and Z-ordering achieved 77% storage reduction and 90% runtime improvements on petabyte-scale data lakes.