Our client produces portable ultrasound imaging devices to make expensive medical imaging available and accessible on the move. Efficiently engineered tools for ultrasound imaging coupled with ML help medical practitioners in diagnosis and quick responses to medical cases.
The client used ultrasound images to train their hundreds of ML/Deep Learning models on Amazon Web Services (AWS). They wanted to improve the training of their Deep Learning models on three parameters – speed of training, the price per run, and ease of use. Hence, they decided to evaluate Google Cloud Platform against AWS through Deep Learning model training experiments.
They chose Persistent Systems as their digital consulting partner to carry out the extensive evaluation process aimed at benchmarking GCP against AWS for Deep Learning model trainings.
The client provided a representative model out of their model gallery. It was a multi-layered convolutional neural network built on Tensorflow. Also, training data and containerized training code were shared with our team. The team set up separate projects on GCP and AWS environments, ran the training experiments on three kinds of GPUs – Tesla K80s, P4, and V100, and experimented with different machine types.
The training routines ran in multiple iterations. Training time and cost performance on GCP were improved after each iteration by evaluating the impact of preemptible VMs, different machine types, different hardware accelerators (for example, GPUs/TPUs). A report recorded the time and cost performance benchmarks of moving the Deep Learning workloads to GCP.
The experiment results against the three parameters of ease of use, cost, and speed were as follows:
Improvements:
- Ease of experiment management with one-click deployment of Kubeflow Pipelines
- Since requesting a preemptible VM in GCP did not have to go through the bidding process as in AWS, the client could easily deploy preemptible VMs at a more predictable pricing
Cost advantage:
- GCP’s resource-based pricing for Custom Machine Types allowed selecting only as much vCPU and RAM as required in contrast to AWS’s Fixed Machine Types
- Overall, the experiments saw 30% cost savings on similar hardware configurations moved to GCP. There was a price difference between AWS and GCP, and GCP also applied Custom Machine Types, sustained usage discount, and committed usage discounts
- As much as 75% savings were observed with preemptible VMs on GCP as compared to AWS
Speed advantage:
- In many cases, the time taken for each experiment was reduced at an average of 18% on GCP as compared to AWS
At the end of the engagement, we successfully trained the Deep Learning model on GCP with improved speed and performance. The project on GCP highlighted Google’s capabilities in automatic scaling of ML/Deep Learning training.