Code Review for SemanticLayerTools #1

@jdamerow

Requester: Malte Vogl (Max Planck Institute for the History of Science)
GitHub repo: https://github.com/maltevogl/SemanticLayerTools
Programming Language: Python
Authors and contributors: Malte Vogl (main author), Ira Kokoshko and Robert Egel (contributors)
Field: History of Science

Supporting papers:
The clustering of cocitation data is done using the Leiden algorithm: Traag, V.A., Waltman, L., Van Eck, N.-J. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(1), 5233. doi:10.1038/s41598-019-41695-z

Standards/Required Domain Knowledge:
To judge the usability of the sub-module for clustering cocitation data across time, some familiarity with scientometrics can be useful.

Requirements:
Package installation was tested on Linux; on Mac, some basic packages such as cmake need to be installed beforehand. Using the package for large corpora (~10 million publications) can be very memory intensive (~250 GB). Corpora on the order of 30,000 publications or more should run on a standard laptop.

Anything else:
Due to the large size of the full package, I suggest reviewing one module, which creates cocitation networks from a corpus (https://github.com/maltevogl/SemanticLayerTools/blob/main/src/semanticlayertools/linkage/cocitation.py), or, if time allows, a pipeline that involves three subpackages (create cocitation, cluster, write reports) (https://github.com/maltevogl/SemanticLayerTools/blob/main/src/semanticlayertools/pipelines/cocitetimeclusters.py).
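
For reviewers less familiar with scientometrics, the idea behind a cocitation network can be sketched in a few lines. This is only an illustration and not the code in `cocitation.py`: the function name `cocitation_counts` and the toy corpus are assumptions made up for this example. Two references are cocited when they appear together in the reference list of the same publication, and the edge weight is the number of publications in which that happens.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited together.

    `reference_lists` is an iterable with one list of reference
    identifiers per publication (hypothetical input format).
    """
    counts = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Toy corpus: three publications and the references each one cites.
corpus = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C", "D"],
]
edges = cocitation_counts(corpus)
for (left, right), weight in sorted(edges.items()):
    print(left, right, weight)
```

The resulting weighted pairs are the edges of the cocitation network; a clustering step such as the Leiden algorithm would then operate on that graph.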
