Project Fuse
Worked with our primary data scientist to migrate Excel/VBS scripted analytics to a modern, containerised Python solution.
Outdated Excel/VBS
- Slow: daily jobs took 2 hours to finish
- Costly: 200 EC2 servers plus Windows/Excel licence fees every day
- Poor code: duplicated code and unnecessary loops
Rewrite in Python
Rewrote the analysis logic in a modern Python stack, including:
- Pandas and Polars
- Numpy
- Matplotlib and Seaborn
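
As an illustration of the kind of loop that a DataFrame library can replace, here is a minimal sketch. The `readings` data, the column names `site_id` and `value`, and both function names are hypothetical and are not taken from the project code.

```python
import pandas as pd

# Hypothetical sample data: one row per site reading.
readings = pd.DataFrame(
    {
        "site_id": ["A", "A", "B", "B", "B"],
        "value": [10.0, 14.0, 3.0, 5.0, 7.0],
    }
)


def site_means_loop(df: pd.DataFrame) -> dict:
    """Row-by-row version, similar in spirit to a cell-by-cell macro."""
    totals: dict = {}
    counts: dict = {}
    for _, row in df.iterrows():
        site = row["site_id"]
        totals[site] = totals.get(site, 0.0) + row["value"]
        counts[site] = counts.get(site, 0) + 1
    return {site: totals[site] / counts[site] for site in totals}


def site_means_vectorised(df: pd.DataFrame) -> dict:
    """Same result with a single grouped aggregation and no explicit loop."""
    return df.groupby("site_id")["value"].mean().to_dict()
```

Both functions return the same per-site means; the grouped version pushes the iteration into Pandas, which is usually where the speed-up over row-by-row code comes from.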
Serverless Architecture
- First on AWS Lambda, then moved to AWS Batch
- DataFrame choice: Pandas vs Polars
- Lambda performance/cost tuning
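
Lambda cost tuning mostly comes down to trading memory size against run time, because compute is billed per GB-second. The sketch below shows that arithmetic. The default price is an assumption (an x86 on-demand rate that should be checked against current AWS pricing for your region), the measured durations are made up, and the per-request charge is ignored.

```python
# Assumed price per GB-second; verify against current AWS Lambda pricing.
ASSUMED_PRICE_PER_GB_SECOND = 0.0000166667


def lambda_compute_cost(
    memory_mb: int,
    avg_duration_s: float,
    invocations: int,
    price_per_gb_second: float = ASSUMED_PRICE_PER_GB_SECOND,
) -> float:
    """Compute charge only: GB-seconds consumed times the unit price."""
    gb_seconds = (memory_mb / 1024) * avg_duration_s * invocations
    return gb_seconds * price_per_gb_second


def cheapest_memory_setting(durations_by_memory_mb: dict, invocations: int = 1) -> int:
    """Return the memory size (MB) with the lowest compute cost."""
    return min(
        durations_by_memory_mb,
        key=lambda mb: lambda_compute_cost(mb, durations_by_memory_mb[mb], invocations),
    )


# Hypothetical measurements: memory size in MB -> average duration in seconds.
measured = {512: 8.0, 1024: 3.5, 2048: 2.0}
best = cheapest_memory_setting(measured)
```

With these made-up numbers the middle setting wins: more memory shortens the run, but only while the duration drops faster than the memory grows.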
CI/CD and Monitoring
- Automated test/build/deployment
- Built a Lambda layer and an AMI for the runtime
- Set up custom metrics and traces on CloudWatch
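
One way to publish a custom metric from Lambda is the CloudWatch Embedded Metric Format: a JSON log line with an `_aws` block that CloudWatch turns into a metric. The sketch below builds such a line with the standard library only. Whether the project used this format or another route is not stated in the source; the namespace, metric name, and dimension used here are hypothetical.

```python
import json
import time
from typing import Optional


def emf_record(
    namespace: str,
    metric_name: str,
    value: float,
    unit: str = "Seconds",
    dimensions: Optional[dict] = None,
    timestamp_ms: Optional[int] = None,
) -> str:
    """Build one Embedded Metric Format log line as a JSON string."""
    dimensions = dimensions or {}
    record = {
        "_aws": {
            # Timestamp is in milliseconds since the epoch.
            "Timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [list(dimensions.keys())],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        # The metric value and the dimension values sit at the top level.
        metric_name: value,
    }
    record.update(dimensions)
    return json.dumps(record)


# Hypothetical usage: in Lambda, printing the line sends it to CloudWatch Logs.
line = emf_record("ProjectFuse", "JobDuration", 30.0, dimensions={"Job": "daily"}, timestamp_ms=0)
print(line)
```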
Modularisation
- Configuration extracted from code
- TDD applied
- Data readers and writers for local files, S3 and DB
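
The points above can be sketched as one small interface with interchangeable back ends, plus a config object passed in rather than hard-coded. This is a minimal illustration, not the project's actual design: `DataStore`, `LocalStore`, `MemoryStore`, `JobConfig`, and `run_job` are hypothetical names, and the S3 and DB back ends are left out (they would implement the same two methods).

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


class DataStore(Protocol):
    """Anything that can read and write bytes by key."""

    def read(self, key: str) -> bytes: ...

    def write(self, key: str, data: bytes) -> None: ...


class LocalStore:
    """Back end for the local file system."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def read(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

    def write(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)


class MemoryStore:
    """In-memory back end, handy for test-driven development."""

    def __init__(self) -> None:
        self.items: dict = {}

    def read(self, key: str) -> bytes:
        return self.items[key]

    def write(self, key: str, data: bytes) -> None:
        self.items[key] = data


@dataclass(frozen=True)
class JobConfig:
    """Settings kept outside the analysis code."""

    input_key: str
    output_key: str


def run_job(store: DataStore, config: JobConfig) -> bytes:
    raw = store.read(config.input_key)
    result = raw.upper()  # placeholder standing in for the real analysis
    store.write(config.output_key, result)
    return result
```

Because `run_job` only depends on the two-method interface, a test can run it against `MemoryStore` while production code passes a different back end with the same config shape.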
Great Result
- 30 seconds to process the daily analysis for 20k sites worldwide
- 10x faster and lower cost