dgadiraju
11/5/2017 - 12:34 AM

This snippet lists the distinct dates in the crime data so that we can understand how the dates are formatted.

/*
hadoop fs -ls -h /public/crime/csv
spark-shell --master yarn \
  --conf spark.ui.port=12345 \
  --num-executors 6 \
  --executor-cores 2 \
  --executor-memory 2G
*/

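// Solution using the Core (RDD) API
// Read the CSV files as an RDD of lines
val crimeData = sc.textFile("/public/crime/csv")
// Take the first line as the header and drop every line equal to it
val header = crimeData.first
val crimeDataWithoutHeader = crimeData.filter(criminalRecord => criminalRecord != header)

// Look at one record to see which field holds the date
val rec = crimeDataWithoutHeader.first
// Take the third comma-separated field and keep the part before the first space
val distinctDates = crimeDataWithoutHeader.
  map(criminalRecord => criminalRecord.split(",")(2).split(" ")(0)).
  distinct.
  collect.
  sorted // sorts the values as plain strings, not as dates
distinctDates.foreach(println)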
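
Because `sorted` compares the values as plain strings, the printed order may not be chronological. The following is a minimal sketch of sorting such values by actual date. It assumes the dates look like `12/31/2007` (MM/dd/yyyy), which should be confirmed against the output above. `sampleDates` is a hypothetical stand-in for `distinctDates`.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Assumption: dates are formatted as MM/dd/yyyy; adjust the pattern if the output differs
val inputFormat = DateTimeFormatter.ofPattern("MM/dd/yyyy")

// Hypothetical sample values standing in for distinctDates
val sampleDates = Array("12/31/2007", "01/15/2008", "02/01/2007")

// String sort: compares character by character, so the month decides the order
val lexicographic = sampleDates.sorted

// Date sort: parse each value and order by the number of days since the epoch
val chronological = sampleDates.
  sortBy(d => LocalDate.parse(d, inputFormat).toEpochDay)

lexicographic.foreach(println)
chronological.foreach(println)
```

With the sample values, the string sort puts `01/15/2008` first, while the date sort puts `02/01/2007` first.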