How to Use

Marcel Heinz edited this page Jun 3, 2019 · 5 revisions

Reproduce the Dataset

  • Required technology:

  • How to reproduce results:

    1. The file src/data/init.py serves as the core configuration. Enter the depth level and the root categories there.
    2. Run src/mine/pipeline.py. The process creates an annotated dictionary keyed by article title. Most of the data is mined from DBpedia.
    3. Run src/check/seed.py to annotate whether an article is a seed.
    4. src/classify/decision_tree.py configures the decision tree classifier.
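
Steps 2 and 3 above can be chained in a small driver script. The following is a minimal sketch, not part of the repository: it assumes the scripts are invoked with the current Python interpreter from the repository root, and the names `STEPS` and `run_steps` are hypothetical. Steps 1 and 4 are left out because they are described as configuration rather than as scripts to run.

```python
import subprocess
import sys

# Script paths as named in the steps above. Running them from the repository
# root with the current interpreter is an assumption, not documented behavior.
STEPS = [
    "src/mine/pipeline.py",  # step 2: build the annotated article dictionary
    "src/check/seed.py",     # step 3: annotate seed articles
]


def run_steps(dry_run=True):
    """Build one command per step; execute them in order unless dry_run is set."""
    commands = [[sys.executable, script] for script in STEPS]
    if not dry_run:
        for cmd in commands:
            # check=True stops the chain as soon as one step fails.
            subprocess.run(cmd, check=True)
    return commands


# Dry run: only print the commands that would be executed.
for cmd in run_steps():
    print(" ".join(cmd))
```

Calling `run_steps(dry_run=False)` would execute the scripts. The dry run is the default so that the mining step is not started by accident.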

Be careful when inspecting the other scripts: many of them explore indication directly, in an active-learning manner.

Querying the Dataset

Because article titles serve as the keys of the article dictionary, the dataset can be queried conveniently in the Python Console of PyCharm. For example:

from data import load_articledict
ad = load_articledict()
# Get all articles with 'language' as the retrieved hypernym:
[a for a in ad if "COPHypernym" in ad[a] and "language" in ad[a]["COPHypernym"]]
# Get all articles classified as relevant for software languages.
[l for l,ld in ad.items() if ld["Class"]=='1']
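
The same queries can be wrapped in small helper functions. The sketch below is illustrative only: the sample dictionary is made up so that the block runs without the mined data, the helpers `with_hypernym` and `with_class` are hypothetical names, and storing "COPHypernym" as a list of strings is an assumption. Only the keys "COPHypernym" and "Class" are taken from the queries above.

```python
# Hypothetical stand-in for the result of load_articledict(); the real
# dictionary is produced by the mining pipeline.
ad = {
    "Python_(programming_language)": {"COPHypernym": ["language"], "Class": "1"},
    "Eclipse_(software)": {"COPHypernym": ["environment"], "Class": "0"},
    "Haskell_(programming_language)": {"Class": "1"},
}


def with_hypernym(articles, hypernym):
    """Return the titles whose 'COPHypernym' entry contains the hypernym."""
    return [
        title
        for title, data in articles.items()
        if hypernym in data.get("COPHypernym", [])
    ]


def with_class(articles, label):
    """Return the titles whose 'Class' annotation equals the label."""
    return [title for title, data in articles.items() if data.get("Class") == label]


print(with_hypernym(ad, "language"))
print(with_class(ad, "1"))
```

Using `dict.get` means that articles lacking an annotation are skipped instead of raising a `KeyError`.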
