Spotify Open-Sources Terraform Module for Running Kubeflow ML Pipelines

Spotify has open-sourced its Terraform module for creating Google Kubernetes Engine (GKE) clusters that run Kubeflow machine learning (ML) pipelines. According to the company, this setup lets its teams run 7x more experiments when developing end-to-end machine learning solutions and bring those solutions to market faster than before.

What Is Spotify’s Paved Road Concept?

Spotify’s Discover Weekly service uses machine learning to provide music recommendations to users. The company initially built its ML workflows with Scala and a range of frameworks, most of which Spotify itself had open-sourced. However, these tools and frameworks did not scale well, which created several drawbacks for the music streaming company.

Spotify introduced its “Paved Road” concept, a standardized set of recommended tools and data formats, to address the data interface problems in its ML workflows. As part of this effort, its teams adopted the TFRecord and tf.Example data formats used by Google’s TensorFlow Extended (TFX), along with TensorFlow Data Validation (TFDV). However, these components and tools lacked an orchestration framework to run them as connected pipeline steps. Spotify therefore moved to Kubeflow Pipelines, running it on GKE clusters configured with the Terraform module. Today, the company uses Kubeflow Pipelines to manage its entire machine learning lifecycle.
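To make “orchestration” concrete: an orchestrator such as Kubeflow Pipelines treats a pipeline as a directed acyclic graph (DAG) of steps and runs each step only after the steps it depends on have finished. The toy sketch below illustrates that idea using Python’s standard `graphlib` module (Python 3.9+). It is not the Kubeflow Pipelines API, and the step names are hypothetical, loosely modeled on TFX-style components.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable

# Hypothetical pipeline: each step maps to the set of steps it depends on.
# Names are illustrative only, loosely modeled on TFX-style components.
PIPELINE: dict[str, set[str]] = {
    "ingest_examples": set(),
    "compute_statistics": {"ingest_examples"},
    "infer_schema": {"compute_statistics"},
    "validate_examples": {"compute_statistics", "infer_schema"},
    "train_model": {"ingest_examples", "validate_examples"},
    "evaluate_model": {"train_model"},
}


def run_pipeline(dag: dict[str, set[str]], run_step: Callable[[str], None]) -> list[str]:
    """Run every step exactly once, after all of its dependencies.

    The full execution order is computed up front, so an invalid graph
    (one containing a cycle) is rejected before any step runs.
    Returns the order in which the steps were executed.
    """
    try:
        order = list(TopologicalSorter(dag).static_order())
    except CycleError as err:
        raise ValueError("pipeline has a dependency cycle") from err

    for step in order:
        run_step(step)  # a real orchestrator would launch a container here
    return order


if __name__ == "__main__":
    run_pipeline(PIPELINE, lambda name: print(f"running {name}"))
```

Computing the whole order before executing anything mirrors, in a very simplified way, how a pipeline definition is validated before it is scheduled; a production orchestrator additionally runs independent steps in parallel, retries failures, and tracks artifacts between steps.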

Benefits of Open-Sourcing the Terraform Module

By open-sourcing the Terraform module, Spotify aims to develop and release machine learning-based services with a faster time to market, and other organizations can reuse the same module to provision GKE clusters for Kubeflow Pipelines.

