This folder contains generic scripts for manipulating, processing and validating individual ELTeC repositories:
- REFRESH updates your local copy of each repo by doing "git pull", and writes a new version of the file driver.tei, which is used by subsequent processes. Run python3 Scripts/refresh.py (or bash scriptrefresh).
- SUMMARIZE processes the driver.tei file made by REFRESH to create a new summary of the whole collection of repos as an index.html file stored in your local copy of the distantreading.github.io repository. Run python3 Scripts/summarize.py (or bash scriptsummarize).
- REPORT processes the driver.tei file made by REFRESH to create a new index.html file listing titles etc. for each repository, stored in your local copy of the distantreading.github.io repository. Run python3 Scripts/report.py (or bash scriptreport).
- EXPOSE processes the driver.tei file made by REFRESH to create HTML link files for each title in each repository, stored in your local copy of the distantreading.github.io repository. These link files transform and display the source XML files directly from the main repository, using CSS and JavaScript files stored in the distantreading.github.io repository. Run python3 Scripts/expose.py.
- CHECKREPO uses the XSLT script checkUp.xsl to validate each text in the level0 and level1 folders of the specified repo and prepare it for release. See below for details of the checks and modifications it carries out. A new version of each file is written to a folder called Out in the repo. Run python3 Scripts/checkRepo.py xxx to check/update the repository for language code xxx.
- REFRESHREPO does the equivalent of REFRESH, REPORT and EXPOSE for a single repository only. Run python3 Scripts/refreshRepo.py xxx to update the repository for language code xxx. Note that the index page produced by SUMMARIZE is not updated by this script.
N extremely B if you run any of these except the first, don't forget to commit and push the changes if you want them to be visible at the website https://distantreading.github.io/ELTeC !!
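The "git pull" step that REFRESH performs across all repos can be pictured with the Python sketch below. It is not refresh.py and does not write driver.tei; the clone location ELTEC_ROOT and the ELTeC-<lang> folder naming are assumptions about your local setup. By default it only prints the commands (dry run).

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical location of your local clones; edit to match your installation.
ELTEC_ROOT = Path.home() / "ELTeC"


def pull_commands(root: Path, langs: list[str]) -> list[list[str]]:
    """Build one 'git pull' command per language repository.

    Assumes each repo is cloned as <root>/ELTeC-<lang> (hypothetical layout).
    """
    return [["git", "-C", str(root / f"ELTeC-{lang}"), "pull"] for lang in langs]


def refresh(root: Path, langs: list[str], dry_run: bool = True) -> list[list[str]]:
    """Print (dry run) or execute the pull for every repo, in order."""
    cmds = pull_commands(root, langs)
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError at the first failing pull
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    refresh(ELTEC_ROOT, ["eng", "fra"])
```

Using git -C lets the sketch pull each clone without changing the working directory of the calling process.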
These scripts assume you can run saxon from the command line, and that you have a local installation of Rscript.
You will need to edit these scripts:
- to specify path names for your local installation
- if you add a new language repository
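Before running anything, you can check that the two external tools are reachable. The sketch below assumes Saxon is installed as an executable called saxon on your PATH; if you launch it some other way (for example through a Java jar), adjust REQUIRED_TOOLS accordingly.

```python
import shutil

# Command-line tools these scripts expect to find on your PATH.
# "saxon" as an executable name is an assumption about your installation.
REQUIRED_TOOLS = ("saxon", "Rscript")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the names of required tools that shutil.which cannot find."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("saxon and Rscript are both available")
```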
The XSLT script checkUp.xsl checks for some common problems in the way ELTeC texts are encoded, applies fixes wherever possible, and produces a new version of the text with a modified publicationStmt (inter alia) ready for Zenodo. It can be run against a single file for testing purposes, but it is meant to be used on a whole repository. To run it on the zzz language repository, use a command line like this:

    python Scripts/checkRepo.py zzz 2>zzzLog.txt

This saves the output from the script (it's quite chatty) in a file called zzzLog.txt, which you can then scan through looking for surprises.
For details of the checks and fixes applied, see the XSLT source code.
Note that this script only validates against the RELAX NG schema; the additional Schematron checks defined by our ODD are only applied when a text is opened in oXygen or Atom.
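The per-file work that CHECKREPO does can be pictured with the sketch below. It is an illustration under assumptions, not checkRepo.py: it assumes a saxon executable that accepts Saxon's standard -s:, -xsl: and -o: options, writes each result to Out/<file name>, and sends Saxon's messages to a log file, much as 2>zzzLog.txt does for the real script.

```python
import subprocess
from pathlib import Path


def checkup_commands(repo: Path, xsl: Path) -> list[list[str]]:
    """Build one Saxon command per XML file in level0/ and level1/.

    Output path <repo>/Out/<file name> is an assumption based on this README.
    """
    out_dir = repo / "Out"
    cmds = []
    for level in ("level0", "level1"):
        folder = repo / level
        if not folder.is_dir():
            continue  # a repo may not have both levels
        for src in sorted(folder.glob("*.xml")):
            cmds.append([
                "saxon",
                f"-s:{src}",
                f"-xsl:{xsl}",
                f"-o:{out_dir / src.name}",
            ])
    return cmds


def run_checkup(repo: Path, xsl: Path, log_path: Path) -> None:
    """Run every command, appending Saxon's stderr messages to one log."""
    (repo / "Out").mkdir(exist_ok=True)
    with open(log_path, "w", encoding="utf-8") as log:
        for cmd in checkup_commands(repo, xsl):
            # check=False: keep going so the log covers every file
            subprocess.run(cmd, stderr=log, check=False)
```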
Makefile
- copy this into the root of your local copy of an ELTeC repo
- edit LOCAL to point to the path of your local copy of the repo
- edit LANG to match the language of your repo (e.g. eng for English)
- edit PREFIX to match the prefix of your text files (e.g. ENG)
- run make driver to generate a driver file which will process all available text files
- run make validate to check the validity of each text file individually
- run make report to generate a balance report using the reporter.xsl stylesheet
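To see how LOCAL and PREFIX together select the "available text files", here is a small Python sketch. It is not the Makefile itself and does not write a driver file; the level1 folder and the .xml suffix are assumptions about the repo layout.

```python
from pathlib import Path

# Example values mirroring the Makefile variables (LOCAL is hypothetical).
LOCAL = Path("/path/to/ELTeC-eng")
PREFIX = "ENG"


def text_files(local: Path, prefix: str, folder: str = "level1") -> list[Path]:
    """Return <local>/<folder>/<prefix>*.xml, sorted by name."""
    return sorted((local / folder).glob(f"{prefix}*.xml"))


if __name__ == "__main__":
    files = text_files(LOCAL, PREFIX)
    print(f"{len(files)} text files would be processed")
```

If the count differs from what you expect, check PREFIX first: a wrong prefix silently matches nothing.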
To issue a complete new update, run the scripts in this order:

    python refresh.py
    python report.py
    python summarize.py
    python expose.py
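If you prefer a single command, the sequence can be wrapped in a short Python driver like the one below. It is a sketch, not part of the repository: it assumes the scripts live in a Scripts folder relative to where you run it, and it uses the current interpreter (sys.executable) rather than a bare python.

```python
import shlex
import subprocess
import sys

# Order given in this README for a complete update.
UPDATE_SEQUENCE = ["refresh.py", "report.py", "summarize.py", "expose.py"]


def run_update(scripts_dir: str = "Scripts", dry_run: bool = False) -> list[list[str]]:
    """Run (or, with dry_run, just print) each script in order."""
    cmds = [[sys.executable, f"{scripts_dir}/{name}"] for name in UPDATE_SEQUENCE]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops the sequence at the first failure, so later
            # scripts never run against a stale or missing driver.tei
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    run_update(dry_run="--dry-run" in sys.argv)
```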
The scripts listed above will update your local copy of the distantreading.github.io pages. You still need to commit and push them to the repo to see the changes on the web.
If you have added any new texts since the last update, you will also need to add the new files to the distantreading.github.io/ELTeC/xxx language repo.
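The publishing step can be sketched as three git commands, shown below. This is an assumption-based illustration, not a script from this folder: site_repo is the path to your local distantreading.github.io clone, git add picks up new files under ELTeC/<lang>, and git commit -a also stages tracked files elsewhere that the scripts modified (such as index.html). It prints the commands by default.

```python
import shlex
import subprocess
from pathlib import Path


def publish_commands(site_repo: Path, lang: str, message: str) -> list[list[str]]:
    """git commands to stage new files, commit all changes and push."""
    repo = str(site_repo)
    return [
        # stage new (untracked) files for this language
        ["git", "-C", repo, "add", f"ELTeC/{lang}"],
        # -a also stages modified/deleted tracked files across the repo
        ["git", "-C", repo, "commit", "-a", "-m", message],
        ["git", "-C", repo, "push"],
    ]


def publish(site_repo: Path, lang: str, message: str, dry_run: bool = True) -> list[list[str]]:
    cmds = publish_commands(site_repo, lang, message)
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # git commit exits non-zero when there is nothing to commit
            subprocess.run(cmd, check=True)
    return cmds
```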