
bash create_datasets_from_start.sh throws a 404 error when downloading the Wikipedia dataset #16

@ghost

Description

Running bash create_datasets_from_start.sh fails while downloading the datasets whenever https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2 is being updated, because the URL returns a 404 error during that time.
Moreover, you cannot fix this by changing the URL in WikiDownloader.py alone: it is hardcoded deeper, in /opt/conda/lib/python3.8/site-packages/lddl/download/wikipedia.py. I had to edit that file to point to another available dump.
Dump used instead: https://dumps.wikimedia.your.org/enwiki/20220820/enwiki-20220820-pages-articles.xml.bz2
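The workaround above can be sketched in Python as follows. This is a minimal sketch that assumes nothing about lddl's internals: the helpers `latest_dump_url`, `dated_dump_url`, `candidate_urls`, `url_is_available`, `first_available` and `installed_module_file` are hypothetical names, not part of lddl or the repository's scripts. The URL layout is copied from the two dump URLs quoted above. `installed_module_file` only reports where a module such as `lddl.download.wikipedia` is installed, so you can find the file that hardcodes the URL. It does not modify anything.

```python
import http.client
import importlib.util
import urllib.request
from typing import Callable, Iterable, List, Optional

# Official dumps site; the mirror quoted above uses the same path layout.
DEFAULT_BASE = "https://dumps.wikimedia.org"


def latest_dump_url(base: str = DEFAULT_BASE) -> str:
    """URL of the rolling 'latest' English Wikipedia articles dump."""
    return f"{base}/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"


def dated_dump_url(date: str, base: str = DEFAULT_BASE) -> str:
    """URL of a dated dump; `date` is YYYYMMDD, e.g. '20220820'."""
    return f"{base}/enwiki/{date}/enwiki-{date}-pages-articles.xml.bz2"


def candidate_urls(dates: Iterable[str], base: str = DEFAULT_BASE) -> List[str]:
    """The 'latest' dump first, then each dated dump in the order given."""
    return [latest_dump_url(base)] + [dated_dump_url(d, base) for d in dates]


def url_is_available(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` succeeds with a 2xx status."""
    # The User-Agent value is a hypothetical placeholder.
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "dump-availability-check/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # HTTPError (e.g. a 404), URLError and timeouts are OSError subclasses.
        return False


def first_available(
    urls: Iterable[str], is_available: Callable[[str], bool] = url_is_available
) -> Optional[str]:
    """Return the first URL that passes the availability check, or None."""
    for url in urls:
        if is_available(url):
            return url
    return None


def installed_module_file(module_name: str) -> Optional[str]:
    """Return the file an installed module is loaded from, or None if absent."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package is not installed.
        return None
    return spec.origin if spec is not None else None
```

For example, `first_available(candidate_urls(["20220820"]))` returns the "latest" URL when it responds, falls back to the 20220820 dump otherwise, and returns `None` if neither answers. The `is_available` parameter is injectable so the selection logic can be checked without network access. Whichever URL is chosen still has to be written into the file lddl actually reads it from, and `installed_module_file("lddl.download.wikipedia")` prints that file's path.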
