Big Twitter Clustering & Classification

  • Streamed tweets into AWS S3 with Kinesis Firehose and combined them with a larger dataset of 55 million tweets (not covered in this repo)
  • Used PySpark in Databricks to build custom transformers, label sentiment with Spark NLP/VADER, explore Spark ML Random Forest and Logistic Regression classifiers, and perform Latent Dirichlet Allocation (LDA) topic modeling
  • Visualized results in AWS QuickSight through an Athena pipeline
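
The sentiment-labeling step can be illustrated with a minimal sketch, assuming a VADER compound score in [-1, 1] has already been computed for each tweet. The function name `label_from_compound`, the example scores, and the ±0.05 cutoffs (the thresholds commonly suggested for VADER) are assumptions for illustration; the repo may bucket scores differently.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label.

    The default threshold of 0.05 is an assumed, conventional cutoff,
    not a value taken from the project.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical compound scores, for illustration only (not project data).
example_scores = [0.62, -0.41, 0.0]
example_labels = [label_from_compound(score) for score in example_scores]
```

In a PySpark pipeline, a function like this could be wrapped in a UDF and applied to a score column to produce the labels that the classifiers train on.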

GitHub