Moses07/CONAB-Agricultural-Inputs_Data-Extract

CONAB Agricultural Inputs Data Extractor

A Python tool to extract agricultural inputs pricing data from CONAB (Companhia Nacional de Abastecimento), the Brazilian National Supply Company.

Description

This extractor downloads and processes agricultural inputs data from CONAB's consultation portal, including:

  • Agrochemicals (Agrotóxicos): Acaricides, Spreaders/Adjuvants, Fungicides, Herbicides, Insecticides, Growth Stimulants/Regulators
  • Fertilizers (Fertilizantes): Inoculants, Organic, Chemical
  • Propagation Materials (Material Propagativo): Seeds

Data is extracted by state, year, and month. The default configuration covers ten Brazilian states from 2019 through the current year.

Features

  • Parallel Processing: Multi-threaded downloads for efficient data extraction
  • Caching: Local cache system to avoid re-downloading existing data
  • Rate Limiting: Intelligent rate limiting and retry mechanisms to handle API restrictions
  • Robust Error Handling: Automatic retries with exponential backoff for failed requests
  • Session Management: Dynamic session refresh to handle access restrictions
  • Comprehensive Logging: Detailed logging of extraction progress and statistics

Installation

Prerequisites

  • Python 3.7 or higher
  • pip package manager

Required Dependencies

Install the required Python packages:

pip install pandas numpy requests backoff openpyxl xlrd pycalamine

Or, if the repository provides a requirements file, install from it:

pip install -r requirements.txt

Usage

Basic Usage

Run the extractor from the command line:

python br_conab_inputs.py

This will:

  1. Extract data for all configured groups, subgroups, states, and years
  2. Save the output to BR_CONAB_INPUTS.csv in the current directory
  3. Cache intermediate results in the temp/cache/ folder

Programmatic Usage

from br_conab_inputs import ConabInputsExtractor

# Initialize the extractor
extractor = ConabInputsExtractor()

# Extract data
data = extractor.extract()

# Save to CSV
extractor.save(data)

# Or work with the DataFrame directly
print(data.head())
print(data.info())

Output

Output File

The extractor generates a CSV file (BR_CONAB_INPUTS.csv) with the following columns:

  • Year: Year of the data
  • Month: Month number (1-12)
  • State: Brazilian state code (e.g., 'SP', 'MG', 'PR')
  • Group: Input group (AGROTOXICO, FERTILIZANTES, MATERIAL PROPAGATIVO)
  • SubGroup: Input subgroup
  • Product: Product name
  • Unit: Unit of measurement
  • Value: Price/value
  • Date: Datetime column combining year and month
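
As a rough illustration of how these columns fit together, the sketch below parses rows with this layout using only the standard library and rebuilds the Date value from Year and Month. The sample rows are invented for illustration (the product name, unit, and price are not real CONAB data), and using the first day of the month for Date is an assumption, not something the extractor is confirmed to do.

```python
import csv
import io
from datetime import date

# Invented sample rows that follow the column layout described above.
SAMPLE = """Year,Month,State,Group,SubGroup,Product,Unit,Value,Date
2021,3,SP,FERTILIZANTES,Quimico,EXAMPLE PRODUCT,t,1234.5,2021-03-01
2021,4,SP,FERTILIZANTES,Quimico,EXAMPLE PRODUCT,t,1300.0,2021-04-01
"""


def load_rows(handle):
    """Parse CSV rows, converting numeric fields and rebuilding the month date."""
    rows = []
    for raw in csv.DictReader(handle):
        year, month = int(raw["Year"]), int(raw["Month"])
        rows.append({
            **raw,
            "Year": year,
            "Month": month,
            "Value": float(raw["Value"]),
            # Assumption: Date is the first day of the given month.
            "Date": date(year, month, 1),
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE))
```

To read the real output, pass an open file handle for BR_CONAB_INPUTS.csv instead of the in-memory sample.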

Temporary Files

The extractor creates a temp/ folder structure:

  • temp/cache/: Contains cached CSV files for each request (group/subgroup/state/year combination)

You can safely delete the temp/ folder to clear the cache if needed.
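
The one-file-per-request layout could be sketched as follows. This is a minimal illustration, not the project's actual naming scheme: the function names `cache_path` and `is_cached`, the file-name pattern, and the "non-empty file counts as cached" rule are all assumptions.

```python
from pathlib import Path


def cache_path(cache_dir, group, subgroup, state, year):
    """Build a filesystem-safe cache file path for one request combination."""
    parts = [group, subgroup, state, str(year)]
    # Replace characters that are awkward in file names (spaces, slashes)
    # with hyphens, then join the four parts with underscores.
    safe = "_".join(
        "".join(ch if ch.isalnum() else "-" for ch in part) for part in parts
    )
    return Path(cache_dir) / f"{safe}.csv"


def is_cached(path):
    """Treat a request as cached only if the file exists and is non-empty."""
    return path.is_file() and path.stat().st_size > 0
```

A download loop would call `is_cached(cache_path(...))` first and skip the request when it returns True.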

Configuration

States

The extractor is configured to extract data for the following Brazilian states:

  • SP, MG, PR, RS, SC, MT, MS, GO, BA, MA

To modify the states, edit the self.states list in the ConabInputsExtractor.__init__() method.

Years

Default configuration:

  • Start Year: 2019
  • End Year: Current year (dynamically set)

To modify the year range, edit self.start_year and self.end_year in the constructor.

Groups and Subgroups

The extractor supports the following groups and subgroups:

Agrochemicals (AGROTOXICO)

  • Acaricida
  • Espalhante / Adjuvante
  • Fungicida
  • Herbicida
  • Inseticida
  • Estimulante/Regulador de Crescimento

Fertilizers (FERTILIZANTES)

  • Inoculante
  • Organico
  • Quimico

Propagation Materials (MATERIAL PROPAGATIVO)

  • Sementes
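
To get a feel for how many requests a full run involves, the sketch below enumerates every group/subgroup/state/year combination from the lists in this section. The names `GROUPS`, `STATES`, and `build_tasks` are hypothetical; the extractor's internal structure may differ.

```python
from datetime import date
from itertools import product

# Groups and subgroups as listed in this README.
GROUPS = {
    "AGROTOXICO": [
        "Acaricida", "Espalhante / Adjuvante", "Fungicida",
        "Herbicida", "Inseticida", "Estimulante/Regulador de Crescimento",
    ],
    "FERTILIZANTES": ["Inoculante", "Organico", "Quimico"],
    "MATERIAL PROPAGATIVO": ["Sementes"],
}
STATES = ["SP", "MG", "PR", "RS", "SC", "MT", "MS", "GO", "BA", "MA"]


def build_tasks(start_year=2019, end_year=None):
    """Return one (group, subgroup, state, year) tuple per request."""
    # Default end year follows the README: the current year.
    end_year = date.today().year if end_year is None else end_year
    years = range(start_year, end_year + 1)
    pairs = [(g, s) for g, subs in GROUPS.items() for s in subs]
    return [(g, s, st, y) for (g, s), st, y in product(pairs, STATES, years)]


tasks = build_tasks(end_year=2024)
```

With the end year fixed at 2024, this yields 10 subgroups × 10 states × 6 years = 600 combinations, which is why caching and rate limiting matter for a full extraction.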

Performance

  • Uses parallel processing with CPU-optimized thread pooling
  • Implements intelligent caching to avoid redundant downloads
  • Includes rate limiting to respect server constraints
  • Provides progress logging and time estimates
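
A thread pool of this kind might look like the sketch below. It is an illustration under assumptions: `fetch` is a placeholder rather than the extractor's download function, and the pool-size rule (twice the CPU count, capped at 8) is a guess at what "CPU-optimized" could mean, not the project's actual setting.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(task):
    """Placeholder for one download; a real version would issue an HTTP request."""
    return {"task": task, "rows": 0}


def run_parallel(tasks, worker=fetch, max_workers=None):
    """Run worker over all tasks in a thread pool, collecting results and failures."""
    # Assumption: size the pool from the CPU count, capped to limit server load.
    if max_workers is None:
        max_workers = min(8, (os.cpu_count() or 1) * 2)
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, t): t for t in tasks}
        for fut in as_completed(futures):
            try:
                results.append(fut.result())
            except Exception as exc:
                # Keep going: one failed request should not abort the batch.
                failures.append((futures[fut], exc))
    return results, failures
```

Threads suit this workload because the time is spent waiting on the network rather than on computation.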

Error Handling

The extractor includes comprehensive error handling:

  • HTTP Errors: Automatic retry with exponential backoff for 5xx errors
  • 403 Forbidden: Session refresh and extended backoff for access restrictions
  • Empty Data: Graceful handling of empty responses
  • Excel Parsing: Multiple engine fallbacks (openpyxl, xlrd, calamine)
  • Rate Limiting: Random jitter to avoid pattern detection
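
The retry behavior described above can be sketched with the standard library alone. The dependency list includes the backoff package, which presumably handles this in the actual script; the version below only makes the mechanism visible. `TransientError`, `retry_with_backoff`, and all default values are hypothetical.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as an HTTP 5xx response."""


def retry_with_backoff(func, max_tries=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call func until it succeeds, waiting longer after each failure."""
    for attempt in range(max_tries):
        try:
            return func()
        except TransientError:
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: sleep a random amount up to the ceiling so that
            # parallel workers do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists so the function can be exercised without real waiting, for example by passing a function that records the requested delays.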

Logging

The extractor provides detailed logging including:

  • Extraction progress by batch
  • HTTP response statistics
  • Data completeness verification
  • Error messages with context
  • Performance metrics

Limitations

  • Extraction time depends on data availability and server response times
  • Some requests may fail due to server-side restrictions (handled gracefully)
  • Large extractions may take considerable time and bandwidth

Troubleshooting

Common Issues

403 Forbidden Errors

  • The script automatically refreshes sessions after consecutive 403 errors
  • If the errors persist, lower the request rate or wait before running the script again
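
The session-refresh idea can be sketched as a small counter. This is an illustration only: the class name `SessionGuard`, the threshold of three consecutive 403 responses, and the `make_session` factory are assumptions. With the requests library, the factory could be `requests.Session`.

```python
class SessionGuard:
    """Replace the session after a run of consecutive 403 responses."""

    def __init__(self, make_session, threshold=3):
        self._make_session = make_session
        self._threshold = threshold  # hypothetical value
        self._consecutive_403 = 0
        self.session = make_session()
        self.refreshes = 0

    def record(self, status_code):
        """Note a response status and return the session to use next."""
        if status_code == 403:
            self._consecutive_403 += 1
            if self._consecutive_403 >= self._threshold:
                # Too many refusals in a row: start over with a fresh session.
                self.session = self._make_session()
                self.refreshes += 1
                self._consecutive_403 = 0
        else:
            # Any other status breaks the streak.
            self._consecutive_403 = 0
        return self.session
```

Counting only consecutive 403 responses avoids discarding a working session because of an isolated refusal.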

Empty Data Files

  • Some combinations may not have data available
  • The extractor logs warnings for missing data and continues processing

Memory Issues

  • Data is processed in batches to manage memory
  • Cache files can be deleted if disk space is limited

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Disclaimer

This tool is for educational and research purposes. Please respect CONAB's terms of service and use responsibly. The tool includes rate limiting to minimize server load.

Author

Created for extracting and analyzing CONAB agricultural inputs data.

Acknowledgments

  • CONAB (Companhia Nacional de Abastecimento) for providing the data
  • Open source community for the libraries used in this project
