Pre-commit hook for mirroring Word (docx) files into plain text files (using Pandoc).
This pre-commit hook provides a solution for organizations that manage Word (.docx) documents with Git and GitHub. With this hook, whenever a Word document is committed or updated in a Git repository, a plain text version is also created. You can use this plain-text mirror to facilitate GitHub Pull Request reviews.
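As a rough illustration of the conversion step, here is a minimal sketch of a Pandoc command line that turns a Word file into plain text. The helper name build_pandoc_command is hypothetical, and the hook itself may call Pandoc differently (for example, through a wrapper library or with extra options). Only the pandoc flags --from, --to, and --output are standard.

```python
from pathlib import Path
from typing import List


def build_pandoc_command(docx_path: Path, text_path: Path) -> List[str]:
    """Build a Pandoc command that converts a Word file to plain text.

    Hypothetical helper: the real hook may invoke Pandoc another way.
    """
    return [
        "pandoc",
        "--from", "docx",    # input format: Word
        "--to", "plain",     # output format: plain text
        "--output", str(text_path),
        str(docx_path),
    ]


command = build_pandoc_command(Path("document.docx"), Path("document.txt"))
print(" ".join(command))
# pandoc --from docx --to plain --output document.txt document.docx
```

With Pandoc installed, the list can be passed to subprocess.run(command, check=True) to perform the conversion.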
At the root of your document's Git repository, add a file named .pre-commit-config.yaml with the following contents:
repos:
  - repo: https://github.com/jsickcodes/pre-commit-docx-plain
    rev: 0.3.0
    hooks:
      - id: docxplain
Next, you'll need to install pre-commit (if you haven't already):
pip install -U pre-commit
Initialize the pre-commit hooks in the repository itself:
pre-commit install
If the repository has an existing Word document, it is a good idea to create the mirrored plain text file now:
pre-commit run --all-files
Commit the plain text (.txt) file that is generated.
If you are contributing to a repository using pre-commit-docx-plain, you will also need to install pre-commit itself and install the pre-commit hooks in your local clone of the repository:
pip install -U pre-commit
pre-commit install
Now, when you update and commit changes to the Word file in your repository, pre-commit runs the pre-commit-docx-plain hook and generates a new or updated plain text mirror of the file.
Because the hook created or modified a file, pre-commit stops this first commit attempt.
Use git add to stage the plain text file and try your git commit again.
On this second try, the plain text mirror file should be in sync with the Word file, and the commit can go ahead.
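The two-attempt commit flow above follows from how file-modifying hooks generally behave: the hook rewrites the mirror only when its content differs, and signals a failure when it did so. The sketch below shows that pattern under stated assumptions. The names write_if_changed and exit_code are hypothetical and are not taken from the hook's source.

```python
from pathlib import Path


def write_if_changed(mirror_path: Path, new_text: str) -> bool:
    """Write new_text to mirror_path and report whether the file changed."""
    if mirror_path.exists():
        if mirror_path.read_text(encoding="utf-8") == new_text:
            # The mirror is already in sync with the Word file.
            return False
    mirror_path.write_text(new_text, encoding="utf-8")
    return True


def exit_code(changed: bool) -> int:
    """Map the result to a process exit status (assumed convention)."""
    # A non-zero status makes pre-commit stop the commit, so the
    # contributor can stage the updated mirror and commit again.
    return 1 if changed else 0
```

On the second commit attempt the freshly generated text matches the staged mirror, so write_if_changed returns False and the commit proceeds.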
You can run pre-commit-docx-plain in GitHub Actions to ensure that the plain-text mirror file is always up-to-date.
If the repository does not already have a GitHub Actions workflow, create a file with the path .github/workflows/ci.yaml with the following contents:
name: CI
'on':
  pull_request:
  push:
    branches: [main]
jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
      - name: Install pandoc
        run: brew install pandoc
      - name: Run pre-commit hooks
        uses: pre-commit/action@v2.0.3
This workflow reports a build failure if the plain-text mirror file is out of date with the Word file in the repository, as might happen if a contributor did not install pre-commit locally.
To avoid complexities related to installing pre-commit, the GitHub Actions workflow can be configured to automatically generate, commit, and push updates to the plain text mirror.
The .github/workflows/ci.yaml file:
name: CI
'on':
  pull_request:
  push:
    branches: [main]
jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v2
      - name: Install pandoc
        run: brew install pandoc
      - name: Run pre-commit hooks
        uses: pre-commit/action@v2.0.3
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
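Note that this workflow only works for pull requests from branches within the repository itself, which in practice means private repositories.
For pull requests from public forks, the GITHUB_TOKEN secret does not grant write access, so the workflow cannot push the updated plain text file.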
When using this workflow, contributors need to either pull down the plain text file update to their local branch, or be prepared to use a forced push (git push --force) because their branch is "behind" the GitHub origin.
This pre-commit hook works out of the box, but does allow for some customization.
By default, if the Word file is named document.docx, the plain text mirror file is named document.txt.
However, you can customize the suffix of the file name by setting a --suffix command-line option:
repos:
  - repo: https://github.com/jsickcodes/pre-commit-docx-plain
    rev: 0.3.0
    hooks:
      - id: docxplain
        args:
          - "--suffix"
          - ".extracted.txt"
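To make the naming rule concrete, here is a minimal sketch of how a mirror file name could be derived from the Word file name and the suffix. It assumes the suffix simply replaces the .docx extension, which matches the document.docx to document.txt default described above. The function mirror_path is hypothetical, not the hook's actual code.

```python
from pathlib import Path


def mirror_path(docx_path: Path, suffix: str = ".txt") -> Path:
    """Return the plain text mirror path for a Word file.

    Assumes the suffix replaces the ".docx" extension.
    """
    # Path.stem drops the final extension: "document.docx" -> "document"
    return docx_path.with_name(docx_path.stem + suffix)


print(mirror_path(Path("document.docx")))                    # document.txt
print(mirror_path(Path("document.docx"), ".extracted.txt"))  # document.extracted.txt
```

Under this assumption, the configuration above would produce document.extracted.txt next to document.docx.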
You can add a header to the plain text file's content by setting the --header command-line option.
This is useful for explaining that the file is autogenerated:
repos:
  - repo: https://github.com/jsickcodes/pre-commit-docx-plain
    rev: 0.3.0
    hooks:
      - id: docxplain
        args:
          - "--header"
          - "THIS FILE IS AUTOGENERATED"
You can also insert the name of the source docx file using Python format string syntax and the docx template variable:
repos:
  - repo: https://github.com/jsickcodes/pre-commit-docx-plain
    rev: 0.3.0
    hooks:
      - id: docxplain
        args:
          - "--header"
          - "This file is autogenerated from {docx}. Do not edit."
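The {docx} placeholder follows Python's str.format conventions, so the substitution can be pictured with the short sketch below. The helpers render_header and add_header are hypothetical, and the blank line between header and body is an assumption about layout, not documented behavior of the hook.

```python
def render_header(template: str, docx_name: str) -> str:
    """Fill the {docx} template variable with the source file name."""
    return template.format(docx=docx_name)


def add_header(header: str, body: str) -> str:
    """Prepend the header to the text (blank-line separator is assumed)."""
    return f"{header}\n\n{body}"


header = render_header(
    "This file is autogenerated from {docx}. Do not edit.",
    "document.docx",
)
print(header)
# This file is autogenerated from document.docx. Do not edit.
```

One consequence of format string syntax is that a literal brace in the header text has to be doubled ({{ or }}), otherwise str.format treats it as the start of a template variable.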
From the pull request:
- Update the changelog.
- Update the version numbers in the .pre-commit-config.yaml code samples in the README.
- Update the version in setup.cfg.
Next, merge the PR to the main branch once checks pass.
Finally, create a Release using the GitHub Release UI from the main branch. The tag name should be the semantic version set in the first step.
pre-commit-docx-plain is developed and maintained by J.Sick Codes Inc.