jadelintott edited this page Oct 16, 2020 · 9 revisions

Winnow: Generate relevant subcorpora from your oral history collections for easier analysis.

Winnow is a tool that generates relevant subcorpora based on metadata and lists of researcher-generated keywords. Winnow provides researchers with smaller subsets of large collections of texts that they can then analyze via traditional close-reading methods or with out-of-the-box textual analysis tools or custom scripts.

Background

We are the Stanford Oral History Text Analysis Project (OHTAP). Our goal is to explore new methodologies for digital analysis of oral history transcripts; specifically, we aim to combine qualitative and quantitative methodologies in order to take advantage of both.

In the early stages of the project, one of our main pain points was that existing technologies were not scalable, nor could they home in on the small portion of our very large corpus that was relevant to our research question. Specifically, we wanted a user-friendly tool that could generate relevant subsets of documents, called "subcorpora," that we would then analyze using qualitative methodologies. At first, it was easy enough to read through transcripts and flag the ones we were interested in. However, as our corpus grew, this technique became infeasible.

We then wrote Python scripts to create subcorpora of transcripts that contained keywords we had generated. These scripts were run from the terminal. For many oral historians, running scripts this way is neither easy nor intuitive. Thus, Winnow was born.
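To give a sense of the kind of keyword filtering described above, here is a minimal sketch. It is not our actual script or Winnow's code: the function name `build_subcorpus`, the dictionary of transcripts, and the whole-word, case-insensitive matching rule are all assumptions made for illustration.

```python
import re


def build_subcorpus(transcripts, keywords):
    """Return the IDs of transcripts that contain at least one keyword.

    Hypothetical helper for illustration only.
    `transcripts` maps a transcript ID to its full text.
    """
    if not keywords:
        # No keywords means nothing can match.
        return []
    # Assumed matching rule: whole words, case-insensitive.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [tid for tid, text in transcripts.items() if pattern.search(text)]


# Made-up example data, not drawn from any real collection.
transcripts = {
    "interview_01": "She joined the union in 1972 and organized a strike.",
    "interview_02": "We moved to the farm when I was a child.",
}
subcorpus = build_subcorpus(transcripts, ["union", "strike"])
```

Here `subcorpus` holds only the IDs of the transcripts that mention a keyword, which a researcher could then read closely.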

Current developmental stage

Winnow does not yet work with arbitrary sets of texts and metadata (we are working on making it so!). Please see the pages Running the application and Functionality for information on how to run Winnow and its use cases.

Team

Our team is:
