Optimization

Dec 23, 2025

Fitting 100 Statistical Distributions at Scale: 1000x Memory Reduction with PySpark

How spark-bestfit 3.0 fits distributions across Spark, Ray, and local backends with survival analysis, mixture models, and multivariate support

spark, python, data engineering, data science, statistics, optimization, ray, distributed computing, survival analysis
Jul 28, 2021

Data Optimization for Compacted Partitions: Achieving 77% Storage Reduction

How intelligent data optimization with linear ordering and Z-ordering achieved 77% storage reduction and 90% runtime improvements on petabyte-scale data lakes.

apache spark, data engineering, big data, optimization, parquet, orc

Contact

Dustin.William.Smith@gmail.com

Location

Ha Noi, Viet Nam
Creative Commons CC-BY
2026 Dustin Smith