This script efficiently downloads all dataset metadata from Brazil's open data portal, dados.gov.br.
The metadata has plenty of goodies, such as direct links to the dataset downloads, file formats, tags, full descriptions, etc.
It works around the API's 9999-item pagination limit by sequentially scraping smaller categories based on license type (cc-by, cc-zero, etc.). This recovers (almost) all available metadata: 11600 out of 14666 total datasets at the time of writing this README.
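As a rough illustration of that workaround, here is a minimal sketch of per-license pagination. It assumes a CKAN-style `package_search` endpoint and plain synchronous `requests` calls; the endpoint URL, parameter names, and license IDs are illustrative assumptions, not the scraper's actual implementation (which runs requests concurrently, see the options below).

```python
import requests

# Illustrative only: the endpoint, query parameters, and license IDs are
# assumptions (standard CKAN-style package_search), not taken from the scraper.
API_URL = "https://dados.gov.br/api/3/action/package_search"
LICENSE_IDS = ["cc-by", "cc-zero", "odc-by", "notspecified"]  # hypothetical subset
PAGE_SIZE = 500

def fetch_license(license_id: str) -> list[dict]:
    """Page through a single license facet so no query ever hits the ~9999-item cap."""
    results: list[dict] = []
    start = 0
    while True:
        resp = requests.get(
            API_URL,
            params={"fq": f"license_id:{license_id}", "rows": PAGE_SIZE, "start": start},
            timeout=90,
        )
        resp.raise_for_status()
        batch = resp.json()["result"]["results"]
        if not batch:  # facet exhausted
            return results
        results.extend(batch)
        start += PAGE_SIZE

if __name__ == "__main__":
    all_metadata = [pkg for lic in LICENSE_IDS for pkg in fetch_license(lic)]
    print(f"Fetched metadata for {len(all_metadata)} datasets")
```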
- Prerequisites:
  - Python 3.11+
  - uv
- Installation: Clone this repository and use `uv sync` to create the venv and install the necessary packages:

  ```bash
  git clone https://github.com/pedrolabonia/dadosabertos-scraper.git
  cd dadosabertos-scraper
  uv sync
  ```
- Execution: Run the scraper using the `scrape` command. All files will be saved to a single output directory.
  - Run with defaults (recommended):

    ```bash
    uv run scrape
    ```

  - Run with custom arguments:

    ```bash
    uv run scrape --page_size 500 --concurrency 20 --output_dir ./my_data
    ```

  - See all options:

    ```bash
    uv run scrape --help
    ```
A 90-second timeout is recommended, since the API can take a while to respond.
| Argument | Default | Description |
|---|---|---|
| `--page_size` | `500` | Records to fetch per API request. |
| `--concurrency` | `10` | Max number of parallel download requests. |
| `--timeout` | `90` | Timeout in seconds for each HTTP request. |
| `--output_dir` | `scraped_data` | Directory to save the output `.json` files. |
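Once a run finishes, the metadata can be consumed straight from the output directory. Below is a small, hedged sketch of loading the results; it assumes each file in `scraped_data` is a standalone JSON document, which may differ from the scraper's exact file layout.

```python
import json
from pathlib import Path

def load_metadata(output_dir: str = "scraped_data") -> list[dict]:
    """Load every .json file from the output directory.

    The one-document-per-file layout is an assumption; adjust to whatever
    the scraper actually writes.
    """
    records = []
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            records.append(json.load(f))
    return records

if __name__ == "__main__":
    datasets = load_metadata()
    print(f"Loaded {len(datasets)} metadata files")
```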