Project Fuse

Worked with our primary data scientist to migrate Excel/VBS-scripted analytics to a modern, Python-based containerised solution.

Outdated Excel/VBS

  • Slow: 2 hours to finish the daily jobs
  • Costly: 200 EC2 servers plus Windows/Excel licence fees every day
  • Poor code: duplicated code, unnecessary loops

Rewrite in Python

Rewrote the analysis logic in a modern Python stack, including:
  • Pandas and Polars
  • NumPy
  • Matplotlib and Seaborn
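
To illustrate the kind of change the rewrite involves, here is a minimal sketch of replacing a row-by-row loop with a column-wise Pandas operation. The data, column names and function names are hypothetical and are not taken from the project.

```python
import pandas as pd

# Hypothetical sample data: one row per site per day (column names are illustrative).
df = pd.DataFrame({
    "site": ["A", "A", "B", "B"],
    "units": [10, 12, 7, 9],
    "price": [2.0, 2.0, 3.5, 3.5],
})


def revenue_loop(frame):
    """Row-by-row style, similar to a cell-by-cell spreadsheet macro."""
    totals = {}
    for _, row in frame.iterrows():
        totals[row["site"]] = totals.get(row["site"], 0.0) + row["units"] * row["price"]
    return totals


def revenue_vectorised(frame):
    """Same totals with one column-wise multiply and one group-by."""
    revenue = frame["units"] * frame["price"]
    return revenue.groupby(frame["site"]).sum().to_dict()
```

Both functions return the same per-site totals; the vectorised version avoids the explicit Python loop, which is usually where most of the speed-up comes from on large frames.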

Serverless Architecture

  • First on AWS Lambda, then moved to AWS Batch
  • DataFrame choice: Pandas vs Polars
  • Lambda performance/cost tuning
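
Lambda bills per request and per GB-second of execution time, so tuning usually means measuring duration at several memory settings and comparing the resulting cost. The sketch below shows that arithmetic only; the function names are hypothetical, and the prices and durations must be supplied by the caller (none of the numbers are from the project).

```python
def lambda_cost(invocations, avg_duration_s, memory_mb,
                price_per_gb_second, price_per_request):
    """Estimated cost: GB-seconds of compute plus a per-request charge."""
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    return gb_seconds * price_per_gb_second + invocations * price_per_request


def cheapest_setting(measurements, invocations,
                     price_per_gb_second, price_per_request):
    """Pick the memory size (MB) with the lowest estimated cost.

    `measurements` maps memory size in MB to the measured average
    duration in seconds at that setting.
    """
    return min(
        measurements,
        key=lambda mb: lambda_cost(
            invocations, measurements[mb], mb,
            price_per_gb_second, price_per_request,
        ),
    )
```

More memory also means more CPU on Lambda, so a larger setting can finish sooner and still cost less overall; this is why the comparison is done on measured durations rather than on memory size alone.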

CI/CD and Monitoring

  • Automated test/build/deployment
  • Built Lambda layers and AMIs for the runtime
  • Set up custom metrics and traces on CloudWatch
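
One way to publish custom metrics from Lambda is the CloudWatch Embedded Metric Format (EMF): a structured JSON log line from which CloudWatch extracts metrics. The sketch below is an assumption about how this could be done, not a description of the project's setup; the namespace, metric and dimension names are hypothetical.

```python
import json
import time


def emf_record(namespace, metric_name, value, unit="Milliseconds", dimensions=None):
    """Build one EMF log line as a JSON string."""
    dimensions = dimensions or {}
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # one set of dimension names
                "Metrics": [{"Name": metric_name, "Unit": unit}],
            }],
        },
        metric_name: value,  # the metric value lives at the top level
    }
    record.update(dimensions)  # dimension values also live at the top level
    return json.dumps(record)


# Hypothetical usage: in Lambda, printing the line sends it to CloudWatch Logs.
print(emf_record("ProjectFuse", "JobDuration", 30000, dimensions={"Stage": "daily"}))
```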

Modularisation

  • Configuration extracted from the code
  • TDD applied
  • Data readers and writers for local files, S3 and databases
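
A minimal sketch of how interchangeable readers and writers can be structured, so the analysis code does not care where the data lives. All class and function names here are hypothetical; an S3 or database backend would expose the same two methods, and the in-memory backend is what makes the logic easy to test.

```python
import csv
import io
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    """Interface shared by every backend (local, S3, DB, ...)."""

    def read_text(self, key: str) -> str: ...
    def write_text(self, key: str, text: str) -> None: ...


class LocalStorage:
    """Reads and writes files under a root directory."""

    def __init__(self, root):
        self.root = Path(root)

    def read_text(self, key):
        return (self.root / key).read_text(encoding="utf-8")

    def write_text(self, key, text):
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")


class MemoryStorage:
    """In-memory stand-in, handy for unit tests."""

    def __init__(self):
        self.blobs = {}

    def read_text(self, key):
        return self.blobs[key]

    def write_text(self, key, text):
        self.blobs[key] = text


def double_values(storage: Storage, src: str, dst: str) -> None:
    """Toy analysis step: read a CSV, double the 'value' column, write it back."""
    rows = csv.DictReader(io.StringIO(storage.read_text(src)))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["site", "value"], lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({"site": row["site"], "value": int(row["value"]) * 2})
    storage.write_text(dst, out.getvalue())
```

Because `double_values` depends only on the `Storage` interface, the same function can run against `MemoryStorage` in a test and against a real backend in production.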

Great Result

  • 30 seconds to process the daily analysis for 20k sites worldwide
  • 10x faster and lower cost