Image obtained from Udacity’s Data Science Nanodegree Course

Data Science, Music Streaming, and Big Data. Behold Sparkify.

Motivation

Introduction

Code

Imports and Setup

ETL — Extract, Transform, and Load

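# Load the event log into a Spark DataFrame
sparkify = spark.read.json('mini_sparkify_event_data.json')
# Drop events with an empty userId, which cannot be attributed to a user
sparkify = sparkify.where("userId != ''")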

EDA — Exploratory Data Analysis

# Pivot transformation
page_pivot = make_pivot(sparkify, 'page', fill_na=True)
# Rename the column to define churn
page_pivot = page_pivot.withColumnRenamed('Cancellation Confirmation', 'Churn')
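
To make the pivot step concrete, here is a minimal plain-Python sketch of the idea. It is not the PySpark `make_pivot` helper, whose implementation is not shown here. It assumes the helper counts events per user and page and fills missing combinations with 0. The function name `pivot_pages` and the sample events are hypothetical, made up for illustration only.

```python
from collections import Counter, defaultdict


def pivot_pages(events, fill_value=0):
    """Count events per (userId, page) and return one row per user.

    Hypothetical stand-in for the pivot step: every page seen in the
    data becomes a column, and missing combinations get fill_value.
    """
    counts = defaultdict(Counter)
    for event in events:
        counts[event["userId"]][event["page"]] += 1

    # Every distinct page becomes a column in the pivoted rows
    pages = sorted({event["page"] for event in events})
    return {
        user: {page: per_user.get(page, fill_value) for page in pages}
        for user, per_user in counts.items()
    }


# Made-up sample events, not taken from the Sparkify dataset
events = [
    {"userId": "1", "page": "NextSong"},
    {"userId": "1", "page": "NextSong"},
    {"userId": "1", "page": "Cancellation Confirmation"},
    {"userId": "2", "page": "NextSong"},
]

rows = pivot_pages(events)

# Rename the cancellation column to Churn, mirroring withColumnRenamed
for row in rows.values():
    row["Churn"] = row.pop("Cancellation Confirmation")
```

Under these assumptions, a user who reached the Cancellation Confirmation page ends up with a non-zero Churn value, while every other user gets 0, which is what lets the renamed column act as the churn label.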

Complete Machine Learning Pipeline

Conclusion

References

Mechanical Engineering Undergraduate who loves Python and Data Science.
