How to Use

Reproduce the Dataset

Required technology:
- The full list of dependencies is too long to give here; inspect the files referenced below.
- Download Stanford CoreNLP from https://stanfordnlp.github.io/CoreNLP/index.html#download and start a CoreNLP server. The server must be running while you reproduce the results.
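
Because the scripts depend on a running CoreNLP server, it can save time to confirm that the server is reachable before starting. The following is a minimal sketch, not part of the repository. The helper name corenlp_server_reachable is hypothetical, and the host and port are assumptions (9000 is the CoreNLP server's default port; adjust both if you started the server differently).

```python
import socket


def corenlp_server_reachable(host="localhost", port=9000, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: it only checks that the port is open, not that
    the process listening there is actually a CoreNLP server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or unresolvable host.
        return False
```

Calling corenlp_server_reachable() in the Python console before running the scripts gives a quick yes/no answer without sending any annotation request.
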
How to reproduce results:
1. The file src/data/init.py serves as the core configuration. Enter the depth level and the root categories there.
2. Run src/mine/pipeline.py. The process creates an annotated dictionary of article titles. Most data is mined from DBpedia.
3. Run src/check/seed.py to annotate whether an article is a seed.
4. src/classify/decision_tree.py configures the decision tree classifier.
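
To illustrate what the depth level and root categories in step 1 plausibly control, here is a sketch of a depth-limited breadth-first walk over a category graph. It is an assumption about the general idea, not the code in src/mine/pipeline.py: the real pipeline mines DBpedia, whereas this example uses a small made-up in-memory graph, and the function name collect_categories is hypothetical.

```python
from collections import deque


def collect_categories(subcategories, roots, max_depth):
    """Walk breadth-first from the root categories, at most max_depth levels deep.

    subcategories maps a category name to a list of its direct subcategories.
    Returns a dict mapping each reached category to its depth.
    """
    depth_of = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        category = queue.popleft()
        depth = depth_of[category]
        if depth == max_depth:
            continue  # do not expand below the configured depth level
        for child in subcategories.get(category, []):
            if child not in depth_of:  # skip categories already reached
                depth_of[child] = depth + 1
                queue.append(child)
    return depth_of


# Made-up toy graph, for illustration only.
toy_graph = {
    "Computer languages": ["Programming languages", "Markup languages"],
    "Programming languages": ["Functional languages"],
    "Functional languages": ["Haskell dialects"],
}
reached = collect_categories(toy_graph, ["Computer languages"], max_depth=2)
```

With max_depth=2, the walk reaches "Functional languages" but stops before "Haskell dialects". A larger depth level widens the set of mined articles in the same way.
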

Be careful when inspecting the other scripts: many of them explore indicators directly, in an active-learning manner.

Querying the Dataset

The article dictionary is keyed by article title, which makes it convenient to query from the Python Console in PyCharm. For example:

from data import load_articledict
ad = load_articledict()
# Get all articles with 'language' as the retrieved hypernym:
[a for a in ad if "COPHypernym" in ad[a] and "language" in ad[a]["COPHypernym"]]
# Get all articles classified as relevant for software languages
# (.get avoids a KeyError for entries without a "Class" annotation):
[l for l, ld in ad.items() if ld.get("Class") == '1']
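
If you run such queries often, they can be wrapped in small helper functions. The sketch below is self-contained and uses a made-up toy dictionary instead of load_articledict(). The helper names are hypothetical, and the toy entries only assume the two keys shown above ("COPHypernym" and "Class"); the real dictionary may hold more annotations per article.

```python
def titles_with_hypernym(articledict, hypernym, key="COPHypernym"):
    """Return the titles whose entry lists the given hypernym under key."""
    return [title for title, entry in articledict.items()
            if hypernym in entry.get(key, [])]


def titles_in_class(articledict, label="1"):
    """Return the titles whose "Class" annotation equals label."""
    return [title for title, entry in articledict.items()
            if entry.get("Class") == label]


# Made-up toy data, for illustration only.
toy = {
    "Haskell": {"COPHypernym": ["language"], "Class": "1"},
    "Alan Turing": {"COPHypernym": ["mathematician"], "Class": "0"},
    "Untitled stub": {},  # entry without any annotations
}

language_titles = titles_with_hypernym(toy, "language")
relevant_titles = titles_in_class(toy)
```

Both helpers use .get, so entries that lack an annotation are skipped instead of raising a KeyError.
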
