Using PySpark to work with the text of Hamlet: importing, cleaning, and transforming data, EDA, NLP, and sentiment analysis
TODO:
- Update README
- Import Data
- Clean/Transform Data
- EDA
- NLP/Sentiment Analysis
Resources:
- PySpark Udemy course
- Project Gutenberg Hamlet text
- Folger's Online Shakespeare Hamlet with line numbers
- Wolfram Hamlet Data
- SO: SparkFiles for data from HTTP
- SO: id column
- SO: slicing a PySpark DataFrame
- SO: PySpark column with conditional values
- SO: create column using regular expression matching
- SO: replace column value conditionally
- Mode Analytics: Window Functions
- SO: PySpark window functions
- Apache Drill: SQL Window Functions
- SO: Tracking previous row's value, consecutive zeros
- SO: Comparing to SQL NULL is undefined
- Blog article about SQL NULL
- SO: Why != is not working in PySpark