Home
Winnow is a tool that generates relevant subcorpora based on metadata and lists of researcher-generated keywords. Winnow provides researchers with smaller subsets of large collections of texts that they can then analyze via traditional close-reading methods or with out-of-the-box textual analysis tools or custom scripts.
We are the Stanford Oral History Text Analysis Project (OHTAP). Our goal is to explore new methodologies for digital analysis of oral history transcripts; specifically, we aim to combine qualitative and quantitative methodologies to take advantage of both.
In the early stages of the project, one of our main pain points was that existing technologies were not scalable and lacked the functionality to home in on the small portion of our very large corpus that was relevant to our research question. Specifically, we wanted a user-friendly tool that could generate relevant subsets of documents, called "subcorpora," that we would then analyze using qualitative methodologies. At first, it was easy enough to read through transcripts and flag the ones we were interested in. However, as our corpus grew, this technique became infeasible.
We then wrote Python scripts to create subcorpora of transcripts that contained keywords we had generated. These scripts were run from the terminal, which for many oral historians is neither easy nor intuitive. Thus, Winnow was born.
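To illustrate the general idea of keyword-based subcorpus creation, here is a minimal sketch. It is not Winnow's actual code: the function name `build_subcorpus`, the transcript dictionary keys (`id`, `text`, `metadata`), and the sample data are all hypothetical, and matching is assumed to be case-insensitive on whole words.

```python
import re


def build_subcorpus(transcripts, keywords, metadata_filter=None):
    """Return the transcripts that mention at least one keyword.

    transcripts: list of dicts with hypothetical keys "id", "text", "metadata".
    keywords: list of researcher-generated words or phrases.
    metadata_filter: optional dict; every key/value pair must match the
        transcript's metadata for the transcript to be considered.
    """
    if not keywords:
        return []
    # Whole-word, case-insensitive match for any of the keywords.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    metadata_filter = metadata_filter or {}
    subcorpus = []
    for transcript in transcripts:
        meta = transcript.get("metadata", {})
        # Skip transcripts whose metadata does not satisfy the filter.
        if any(meta.get(key) != value for key, value in metadata_filter.items()):
            continue
        if pattern.search(transcript["text"]):
            subcorpus.append(transcript)
    return subcorpus


# Made-up sample data for illustration only.
transcripts = [
    {"id": "a1", "text": "She joined the union in 1972.", "metadata": {"collection": "X"}},
    {"id": "b2", "text": "We talked about the farm.", "metadata": {"collection": "X"}},
    {"id": "c3", "text": "Union meetings were weekly.", "metadata": {"collection": "Y"}},
]
hits = build_subcorpus(transcripts, ["union", "strike"], {"collection": "X"})
print([t["id"] for t in hits])  # ['a1']
```

Only `a1` is returned here: `b2` contains no keyword, and `c3` is excluded by the metadata filter even though it mentions "Union".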
Winnow does not yet work with arbitrary sets of texts and metadata, though we are working on making it so. Please see the pages Running the application and Functionality for information on how to run Winnow and on its use cases.
Our team is:
- Professor Estelle Freedman, Edgar E. Robinson Professor in U.S. History at Stanford University
- Dr. Natalie Jean Marine-Street, Oral History Program Manager at the Stanford Historical Society
- Dr. Katherine McDonough, Senior Research Associate at The Alan Turing Institute
- Cheng-Hau Kee, Stanford University '19
- Hilary Sun, Stanford University B.S. '18, M.S. '19
- Preston Carlson, Stanford University '21
- Jade Lintott, Stanford University '21
This code belongs to the Stanford Oral History Text Analysis Project and is licensed under The MIT License.