Testing

Marcel Heinz edited this page Aug 1, 2018 · 3 revisions

Evaluating against Infoboxes

The indicator based on infobox template references is the most reliable one. We build on this observation by partitioning infobox template names into three sets:

  • The positive set contains all template names that clearly indicate a software language knowledge trace.
  • The negative set contains all template names that never appear in an article with software language knowledge.
  • The maybe set contains the template names that give no clear indication, because they might appear both in articles with software language knowledge and in articles without it.

Positive Infobox Template References

The positive infobox template references are:

  • programming language
  • file format
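
The three-way split can be sketched as a simple lookup. This is a minimal illustration, not the project's actual code: the function name `classify_template` and the way the maybe set is passed in are assumptions, and only the two positive template names come from the list above.

```python
# Positive template names are taken from the list above. Everything else in
# this sketch (function name, parameters, normalization) is hypothetical.
POSITIVE = {"programming language", "file format"}


def classify_template(name, maybe, positive=POSITIVE):
    """Return 'positive', 'maybe' or 'negative' for an infobox template name.

    `maybe` is the set of template names that were manually found to appear
    in articles both with and without software language knowledge.
    """
    key = name.strip().lower()  # assumed normalization of template names
    if key in positive:
        return "positive"
    if key in maybe:
        return "maybe"
    # Any remaining non-positive template is treated as negative here.
    return "negative"
```

For example, `classify_template("Programming language", maybe={"software"})` falls into the positive set, while a name found in neither set is treated as negative. The maybe set used in this call is only illustrative.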

Distinguishing between Maybe Negative and Definite Negative

The set of non-positive infobox template references is much larger. We explored all articles up to a depth of 8 under our chosen root categories and found 684 infobox template references that are not positive. We save each template reference, together with the names of the articles in which it appears, at https://github.com/softlang/wikionto/blob/master/data/seed_neg_pre.json .

For each non-positive infobox template reference, we pick a single random article and manually classify it as providing software language knowledge or not. This manual annotation is persisted at https://github.com/softlang/wikionto/blob/master/data/seed_neg.json
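
The sampling step can be sketched as follows. The layout of seed_neg_pre.json is not described on this page, so the sketch assumes an in-memory mapping from template name to a list of article names; the function name `pick_review_articles` and the `seed` parameter are hypothetical as well.

```python
import random


def pick_review_articles(template_to_articles, seed=None):
    """Pick one random article per non-positive infobox template.

    `template_to_articles` is assumed to map each template name to the
    names of the articles that reference it (hypothetical layout).
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return {
        template: rng.choice(sorted(articles))
        for template, articles in template_to_articles.items()
        if articles  # skip templates without any recorded article
    }
```

The selected articles are then the ones to annotate by hand. Passing a fixed `seed` is a design choice that lets the same sample be drawn again when the annotation is revisited.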

We obtain the following infobox template references that only sometimes appear in software language articles. In most cases, the article describes an entity that could be interpreted as a language, but this interpretation is not obvious and has to be reviewed by many more experts to reach consensus.

From the exploration of the seed set, we already knew the following references that appear in software language articles.

  • software
  • technology standard
  • software license

Other Indicator vs Indicator

We systematically compare the results of single indicators.
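
One possible way to run such a comparison is a pairwise overlap count. This is only a sketch under assumptions: each indicator's result is taken to be a set of article names it flags, and the function name `compare_indicators` and the report keys are hypothetical.

```python
from itertools import combinations


def compare_indicators(results):
    """Compare every pair of indicators by the articles they flag.

    `results` is assumed to map an indicator name to the set of article
    names that the indicator marks as relevant (hypothetical layout).
    """
    report = {}
    for first, second in combinations(sorted(results), 2):
        hits_first, hits_second = results[first], results[second]
        report[(first, second)] = {
            "both": len(hits_first & hits_second),
            "only_first": len(hits_first - hits_second),
            "only_second": len(hits_second - hits_first),
        }
    return report
```

Pairs with a large "only" count on either side point to articles where the two indicators disagree and that may deserve a manual look.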
