Skip to content

LDFLK/doctracer

Repository files navigation

doctracer

Prerequisites

mamba create -n doctracer_env python=3.9

To install required libraries run:

pip install -e .

Also add your OpenAI API key:

export OPENAI_API_KEY=`openai_key`

Setup Neo4j

Environment Variables

Before using the Neo4jInterface, ensure the following environment variables are set:

  • NEO4J_URI: The URI of your Neo4j database.
  • NEO4J_USER: The username for your Neo4j database.
  • NEO4J_PASSWORD: The password for your Neo4j database.

You can set these variables in your shell like this:

export NEO4J_URI=bolt://localhost:7687
export NEO4J_USER=neo4j_username
export NEO4J_PASSWORD=your_password
docker build --build-arg NEO4J_USER=$NEO4J_USER --build-arg NEO4J_PASSWORD=$NEO4J_PASSWORD -t doctracer_neo4j .

Ensure you have a .env file in your project directory with the following content:

NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password

Running the Docker Container

To run the Docker container with the environment variables from the .env file, use the following command:

docker run -p 7474:7474 -p 7687:7687 --name doctracer_neo4j_server \
    --env-file .env \
    -v neo4j_data:/data doctracer_neo4j:latest
docker exec -it doctracer_neo4j_server bash
neo4j-admin set-initial-password test

Running Doctracer

$ doctracer --help
Usage: doctracer [OPTIONS] COMMAND [ARGS]...

  Doctracer command line interface.

Options:
  --help  Show this message and exit.

Commands:
  extract  Extract information from gazette PDFs.

To test extragazette amendment extraction try:

doctracer extract --type extragazette_amendment --input data/testdata/sample_gazette.pdf --output output.json

To test extragazette table extraction try:

doctracer extract --type extragazette_table --input data/gzt_images --output output.txt
{
  "metadata": {
    "Gazette ID": "2382/35",
    "Gazette Published Date": "2024-05-03",
    "Gazette Published by": "Authority"
  },
  "changes": {
    "RENAME": [],
    "MERGE": [],
    "MOVE": [],
    "ADD": [
      {
        "Parent Name": "Minister of Defence",
        "Child Name": "National Hydrographic Act No. 7 of 2024",
        "Type": "legislation",
        "Date": "2024-05-03"
      },
      {
        "Parent Name": "Minister of Education",
        "Child Name": "Formulation and implementation of a national policy for pre-schools",
        "Type": "policy",
        "Date": "2024-05-03"
      }
    ],
    "TERMINATE": [
      {
        "Type": "policy",
        "Date": "2024-05-03"
      }
    ]
  }
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •