GCP Dataproc Serverless for Spark vs. AWS Glue and Databricks

‘Do more with less’ has been a top item on the CIO agenda as businesses contend with tightening IT budgets, a widening skills gap, and ever-mounting pressure to innovate. Traditional cloud offerings moved the needle on all three fronts, but over time the complexity, cost, and effort required to execute workloads in the cloud started to outweigh the business returns. Outsourcing infrastructure to the cloud still required high-level configuration and management of resources such as clusters and orchestrators. This below-the-line infrastructure management continues to take up developer bandwidth, diverting effort into activities unrelated to developers’ core skills.

Google Cloud Platform (GCP) introduced the industry’s first autoscaling Spark offering with Dataproc Serverless. GCP Dataproc Serverless abstracts underlying infrastructure management, letting developers focus on the core job.

Read: Five ways enterprises can fire up innovation with Dataproc Serverless for running Spark workloads

But how does GCP’s Dataproc Serverless for Spark compare to similar offerings such as AWS Glue or Databricks? Why should CIOs choose GCP Serverless Spark over other offerings?

Here’s why:

Spark Fidelity: Databricks Serverless supports only Spark SQL workloads, while AWS Glue is geared toward enterprise-grade Spark jobs and offers minimal support for lower-volume Spark workloads. GCP’s Dataproc Serverless, by contrast, supports all popular Spark flavors.
Truly NoOps: Dataproc Serverless for Spark enables a NoOps model by eliminating the need for developers to interface with operations or admin teams. It boosts developer productivity by returning the focus to coding, without the distraction of cluster management or performance testing of pipelines. Databricks and AWS Glue, in comparison, still require some infrastructure management and so do not enable NoOps.
Truly Serverless: Databricks requires a developer to set up a SQL endpoint, which is akin to creating an auto-scaling cluster. GCP’s Dataproc Serverless automates infrastructure creation, so developers do not have to define clusters at all. Because the infrastructure is Google-managed, developers can focus solely on writing code for the business need.
Lower Wait Time: GCP Dataproc Serverless takes less than 45 seconds to set up servers, compared to the 7 minutes that Databricks takes. The shorter spin-up time lets developers get on the job within a minute. Google plans to further reduce the start-up time for Dataproc Serverless to 5 seconds.
Reduced Cost: Dataproc Serverless auto-scales Spark resources to match the job characteristics, and GCP charges enterprises only for the duration of the job run, leading to significantly reduced IT costs. With similar offerings, in contrast, enterprises pay for the entire time the clusters are up.
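To make the cost point concrete, here is a minimal sketch of the arithmetic behind per-job billing versus cluster-uptime billing. The flat per-minute rate and the durations are hypothetical numbers chosen for illustration, not published prices; real bills depend on the resources a job consumes and on each provider's rate card.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    job_minutes: float      # time the Spark job actually runs
    idle_minutes: float     # time a cluster stays up around the job (startup, idle, teardown)
    rate_per_minute: float  # hypothetical flat compute rate, in any currency


def per_job_cost(w: Workload) -> float:
    """Cost when billing covers only the duration of the job run."""
    return w.job_minutes * w.rate_per_minute


def cluster_uptime_cost(w: Workload) -> float:
    """Cost when billing covers the entire time the cluster is up."""
    return (w.job_minutes + w.idle_minutes) * w.rate_per_minute


# Hypothetical workload: a 20-minute job on a cluster that is otherwise idle for 40 minutes.
workload = Workload(job_minutes=20, idle_minutes=40, rate_per_minute=0.5)
savings = cluster_uptime_cost(workload) - per_job_cost(workload)
print(f"per-job: {per_job_cost(workload)}, uptime: {cluster_uptime_cost(workload)}, saved: {savings}")
```

In this simplified model the gap between the two bills is exactly the idle time multiplied by the rate, so the benefit grows with how long a cluster would otherwise sit unused.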

Why GCP Dataproc Serverless makes sense for you

GCP Dataproc Serverless is an excellent fit if you run Spark workloads with Spark version 3.2+. Infrastructure maintenance is practically free, your Spark jobs are secure by design, and you get Google’s scalability and speed to back all of this up.
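As a rough illustration of what running a job without defining a cluster looks like, the sketch below builds the JSON body for a Dataproc Serverless batch request as a plain dictionary. The project, region, bucket path, and batch ID are hypothetical placeholders, and the sketch only constructs the request; it does not authenticate or send it.

```python
import json
from typing import Optional, Sequence


def build_pyspark_batch(main_file_uri: str, args: Optional[Sequence[str]] = None) -> dict:
    """Build a batch body for a PySpark workload. Note that no cluster is described."""
    batch = {"pysparkBatch": {"mainPythonFileUri": main_file_uri}}
    if args:
        batch["pysparkBatch"]["args"] = list(args)
    return batch


def batches_url(project: str, region: str, batch_id: str) -> str:
    """URL a POST request would target to create the batch."""
    return (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/batches?batchId={batch_id}"
    )


# Hypothetical names, for illustration only.
body = build_pyspark_batch("gs://example-bucket/jobs/daily_report.py", args=["--date", "2024-01-01"])
url = batches_url("example-project", "us-central1", "daily-report-001")
print(url)
print(json.dumps(body, indent=2))
```

The request names only the code to run and its arguments; sizing and provisioning of the underlying infrastructure is left to the service.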

Start your GCP Dataproc Serverless journey today! Get in touch with us here.

Author

Ambesh Singh

Architect
