Home
  • About
  • Blog
  • Resume

Optimization

Dec 23, 2025

Fitting 100 Statistical Distributions at Scale: 1000x Memory Reduction with PySpark

How spark-bestfit 2.0 fits distributions across Spark, Ray, and local backends with a class-based API and 1000x memory reduction

spark, python, data engineering, data science, statistics, optimization, ray, distributed computing
Jul 28, 2021

Data Optimization for Compacted Partitions: Achieving 77% Storage Reduction

How intelligent data optimization with linear ordering and Z-ordering achieved 77% storage reduction and 90% runtime improvements on petabyte-scale data lakes.

apache spark, data engineering, big data, optimization, parquet, orc


Contact

Dustin.William.Smith@gmail.com

Location

Ha Noi, Viet Nam
Creative Commons CC-BY
2026 Dustin Smith