The chanalysis Python scripts in this repository offer a convenient way to map trends, shifts, and other characteristics in 4chan data, or in any data with a similar structure. The scripts can help researchers move beyond treating the anonymous imageboard as an un-researchable, homogeneous blob.
Note that these scripts are a work in progress and may contain bugs.
The current chanalysis scripts include:
createHistogram.py - Frequency histograms: Visualise the occurrences of a particular word over time.
getReplies.py - Identifying popular posts: Show which posts are most replied to and thus garnered attention.
createTokens.py - Tokenization: Tokenise the text (lemmatization and stemming).
createLongString.py - Creating a text file: Takes the text in a csv column and outputs it as one long text in a .txt file. Useful in tandem with jasondavies.com/wordtree/.
getImages.py - Downloading images: Download images from a set of posts.
createImageWall.py - Making an image wall: Use downloaded images to create an image wall.
getTfidf.py - Get popular terms: (work in progress) Outputs popular words in the dataset via tf-idf.
More scripts will be added later.
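As a rough illustration of the kind of processing involved, the sketch below mimics what createLongString.py is described as doing: reading one text column from a csv and joining it into a single long string. It is not the script's actual code. The function name csv_column_to_text and the column name comment are assumptions made up for this example.

```python
import csv
import io


def csv_column_to_text(csv_file, column="comment", separator=" "):
    """Join every non-empty value of one csv column into a single string.

    `column` is a hypothetical column name; use whatever your csv calls it.
    """
    reader = csv.DictReader(csv_file)
    values = []
    for row in reader:
        # Missing or empty cells are skipped rather than joined as blanks.
        text = (row.get(column) or "").strip()
        if text:
            values.append(text)
    return separator.join(values)


# Small in-memory csv standing in for a real input file.
sample = io.StringIO("id,comment\n1,may the force\n2,\n3,be with you\n")
long_text = csv_column_to_text(sample)
print(long_text)
```

The resulting string could then be written to a .txt file and pasted into a tool such as the word tree mentioned above.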
The scripts require Python 3.
Once you have installed Python 3 on your computer, clone or download this repository.
Go to the folder containing the scripts in a terminal and install the requirements (python -m pip install -r requirements.txt). Running a script without parameters (e.g. python createLongString.py) shows its functions and the available options. To run a script on a file, pass the parameters like so: python createLongString.py --source=input/star-wars.csv.
All the resulting data files are saved in the output/ folder.
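The command-line behaviour described above (help text when no parameters are given, a --source parameter, results under output/) could be sketched with argparse roughly as follows. This is an assumption about how such a script might be structured, not the repository's code: the helper names and the rule for naming the output file are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser():
    # Hypothetical parser; the real scripts may define more options.
    parser = argparse.ArgumentParser(description="Sketch of a chanalysis-style script.")
    parser.add_argument("--source", help="path to the input csv file")
    return parser


def output_path_for(source, suffix=".txt", output_dir="output"):
    """Return <output_dir>/<input file name without extension><suffix> (assumed naming rule)."""
    return Path(output_dir) / (Path(source).stem + suffix)


def main(argv):
    parser = build_parser()
    if not argv:
        # No parameters given: show the available options and stop.
        parser.print_help()
        return None
    args = parser.parse_args(argv)
    return output_path_for(args.source)


result = main(["--source=input/star-wars.csv"])
print(result)
```

In a real script, main would typically be called with sys.argv[1:] and would go on to read the csv and write the result to the returned path.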