A Python tool to extract agricultural inputs pricing data from CONAB (Companhia Nacional de Abastecimento), the Brazilian National Supply Company.
This extractor downloads and processes agricultural inputs data from CONAB's consultation portal, including:
- Agrochemicals (Agrotóxicos): Acaricides, Spreaders/Adjuvants, Fungicides, Herbicides, Insecticides, Growth Stimulants/Regulators
- Fertilizers (Fertilizantes): Inoculants, Organic, Chemical
- Propagation Materials (Material Propagativo): Seeds
Data is extracted by state, year, and month, covering ten Brazilian states from 2019 to the current year.
- Parallel Processing: Multi-threaded downloads for efficient data extraction
- Caching: Local cache system to avoid re-downloading existing data
- Rate Limiting: Request throttling with random jitter, plus retry mechanisms, to cope with server-side access restrictions
- Robust Error Handling: Automatic retries with exponential backoff for failed requests
- Session Management: Dynamic session refresh to handle access restrictions
- Comprehensive Logging: Detailed logging of extraction progress and statistics
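The rate-limiting feature can be pictured as a minimum gap between requests with a little random jitter added. The sketch below is hypothetical: the `RateLimiter` class, its parameters, and the injectable clock and sleep functions are illustrative and not the extractor's actual code. It is thread-safe so several download threads can share it.

```python
import random
import threading
import time
from typing import Callable


class RateLimiter:
    """Hypothetical limiter: enforce a minimum gap between requests, plus random jitter."""

    def __init__(self, min_interval: float, max_jitter: float = 0.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep,
                 rng: Callable[[], float] = random.random) -> None:
        self.min_interval = min_interval
        self.max_jitter = max_jitter
        self._clock = clock
        self._sleep = sleep
        self._rng = rng
        self._lock = threading.Lock()
        self._next_allowed = float("-inf")  # first request never waits

    def wait(self) -> float:
        """Block until the next request may go out; return the time slept."""
        with self._lock:  # holding the lock serializes callers across threads
            now = self._clock()
            delay = max(0.0, self._next_allowed - now)
            if delay > 0:
                self._sleep(delay)
            # Jitter makes the request rhythm less regular.
            gap = self.min_interval + self.max_jitter * self._rng()
            self._next_allowed = now + delay + gap
            return delay
```

A worker thread would call `limiter.wait()` right before each HTTP request, e.g. with `RateLimiter(1.0, max_jitter=0.5)`; both values here are placeholders, not the extractor's settings.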
- Python 3.7 or higher
- pip package manager
Install the required Python packages:
pip install pandas numpy requests backoff openpyxl xlrd pycalamine
Or install from the requirements file (if provided):
pip install -r requirements.txt
Run the extractor from the command line:
python br_conab_inputs.py
This will:
- Extract data for all configured groups, subgroups, states, and years
- Save the output to BR_CONAB_INPUTS.csv in the current directory
- Cache intermediate results in the temp/cache/ folder
To use the extractor as a Python module:
from br_conab_inputs import ConabInputsExtractor
# Initialize the extractor
extractor = ConabInputsExtractor()
# Extract data
data = extractor.extract()
# Save to CSV
extractor.save(data)
# Or work with the DataFrame directly
print(data.head())
data.info()  # info() prints its summary itself and returns None
The extractor generates a CSV file (BR_CONAB_INPUTS.csv) with the following columns:
- Year: Year of the data
- Month: Month number (1-12)
- State: Brazilian state code (e.g., 'SP', 'MG', 'PR')
- Group: Input group (AGROTOXICO, FERTILIZANTES, MATERIAL PROPAGATIVO)
- SubGroup: Input subgroup
- Product: Product name
- Unit: Unit of measurement
- Value: Price/value
- Date: Datetime column combining year and month
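Once the CSV exists, it can be analyzed without re-running the extraction. This is a minimal sketch that assumes the column layout listed above, a numeric `Value` column, and a `Date` column pandas can parse; the helper names `load_conab_inputs` and `monthly_mean_price` are hypothetical.

```python
import io
from typing import Union

import pandas as pd


def load_conab_inputs(source: Union[str, io.StringIO] = "BR_CONAB_INPUTS.csv") -> pd.DataFrame:
    """Load the extractor's CSV output, parsing the Date column as datetimes."""
    return pd.read_csv(source, parse_dates=["Date"])


def monthly_mean_price(df: pd.DataFrame, subgroup: str, state: str) -> pd.Series:
    """Average Value per Date for one subgroup in one state, sorted by date."""
    mask = (df["SubGroup"] == subgroup) & (df["State"] == state)
    return df.loc[mask].groupby("Date")["Value"].mean().sort_index()
```

For example, `monthly_mean_price(load_conab_inputs(), "Herbicida", "PR")` would give a monthly herbicide price series for Paraná, assuming the SubGroup values in the file match the subgroup names listed in this README.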
The extractor creates a temp/ folder structure:
temp/cache/: Contains cached CSV files for each request (group/subgroup/state/year combination)
You can safely delete the temp/ folder to clear the cache if needed.
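The cache works at the level of one file per group/subgroup/state/year combination. A load-or-fetch pattern along those lines might look like the sketch below; the file-naming scheme, `cache_path`, `load_or_fetch`, and the `fetch` callable are all hypothetical stand-ins for whatever the extractor actually does.

```python
from pathlib import Path
from typing import Callable

import pandas as pd

CACHE_DIR = Path("temp") / "cache"


def _safe(name: str) -> str:
    """Replace characters such as '/' and spaces so the name is a valid filename."""
    return "".join(c if c.isalnum() else "_" for c in name)


def cache_path(group: str, subgroup: str, state: str, year: int,
               cache_dir: Path = CACHE_DIR) -> Path:
    """One CSV per group/subgroup/state/year combination (hypothetical naming)."""
    return cache_dir / f"{_safe(group)}_{_safe(subgroup)}_{state}_{year}.csv"


def load_or_fetch(group: str, subgroup: str, state: str, year: int,
                  fetch: Callable[[str, str, str, int], pd.DataFrame],
                  cache_dir: Path = CACHE_DIR) -> pd.DataFrame:
    """Return cached data if present; otherwise call fetch() and cache the result."""
    path = cache_path(group, subgroup, state, year, cache_dir)
    if path.exists():
        return pd.read_csv(path)
    df = fetch(group, subgroup, state, year)
    if not df.empty:  # empty results are not cached, so they are retried next run
        path.parent.mkdir(parents=True, exist_ok=True)
        df.to_csv(path, index=False)
    return df
```

Because the cache is keyed only by the combination, deleting `temp/` simply forces every combination to be fetched again, which matches the advice above.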
The extractor is configured to extract data for the following Brazilian states:
- SP, MG, PR, RS, SC, MT, MS, GO, BA, MA
To modify the states, edit the self.states list in the ConabInputsExtractor.__init__() method.
Default configuration:
- Start Year: 2019
- End Year: Current year (dynamically set)
To modify the year range, edit self.start_year and self.end_year in the constructor.
The extractor supports the following groups and subgroups:
Agrochemicals (AGROTOXICO)
- Acaricida
- Espalhante / Adjuvante
- Fungicida
- Herbicida
- Inseticida
- Estimulante/Regulador de Crescimento
Fertilizers (FERTILIZANTES)
- Inoculante
- Organico
- Quimico
Propagation Materials (MATERIAL PROPAGATIVO)
- Sementes
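Assuming one request per group/subgroup/state/year combination, as the cache layout suggests, the configuration above determines how many requests a full run makes. The sketch below enumerates them; the exact subgroup spellings the portal expects are an assumption, and `request_combinations` is a hypothetical helper.

```python
from datetime import date
from itertools import product
from typing import Dict, Iterator, List, Optional, Tuple

# Group -> subgroups as listed above (exact portal spellings are an assumption).
SUBGROUPS: Dict[str, List[str]] = {
    "AGROTOXICO": ["Acaricida", "Espalhante / Adjuvante", "Fungicida", "Herbicida",
                   "Inseticida", "Estimulante/Regulador de Crescimento"],
    "FERTILIZANTES": ["Inoculante", "Organico", "Quimico"],
    "MATERIAL PROPAGATIVO": ["Sementes"],
}
STATES = ["SP", "MG", "PR", "RS", "SC", "MT", "MS", "GO", "BA", "MA"]


def request_combinations(start_year: int = 2019,
                         end_year: Optional[int] = None) -> Iterator[Tuple[str, str, str, int]]:
    """Yield every (group, subgroup, state, year) combination, years inclusive."""
    end_year = end_year or date.today().year  # default: current year
    pairs = [(g, sg) for g, subs in SUBGROUPS.items() for sg in subs]
    for (group, subgroup), state, year in product(pairs, STATES,
                                                  range(start_year, end_year + 1)):
        yield group, subgroup, state, year
```

With 10 subgroups and 10 states, each additional year adds 100 combinations, so 2019 through 2024 alone is 600.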
- Uses parallel processing with CPU-optimized thread pooling
- Caches each request's result under temp/cache/ to avoid redundant downloads
- Includes rate limiting to respect server constraints
- Provides progress logging and time estimates
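Parallel downloads with progress logging and a time estimate can be sketched with the standard library's thread pool. Everything here is illustrative: `run_parallel`, `eta_seconds`, and the `worker` callable are hypothetical, and the pool size simply mirrors Python's own default for `ThreadPoolExecutor` rather than the extractor's actual setting.

```python
import logging
import os
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Hashable, List, TypeVar

T = TypeVar("T")
log = logging.getLogger("conab_sketch")


def eta_seconds(done: int, total: int, elapsed: float) -> float:
    """Linear estimate of remaining time from the average time per finished task."""
    if done == 0:
        return float("inf")
    return elapsed / done * (total - done)


def run_parallel(tasks: List[Hashable], worker: Callable[[Hashable], T],
                 max_workers: int = 0) -> Dict[Hashable, T]:
    """Run worker(task) on a thread pool, logging progress and an ETA."""
    # Threads suit I/O-bound downloads; this sizing mirrors Python's default.
    workers = max_workers or min(32, (os.cpu_count() or 1) + 4)
    results: Dict[Hashable, T] = {}
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(worker, t): t for t in tasks}
        for done, fut in enumerate(as_completed(futures), start=1):
            task = futures[fut]
            try:
                results[task] = fut.result()
            except Exception as exc:  # one failed combination should not stop the run
                log.warning("task %r failed: %s", task, exc)
            elapsed = time.monotonic() - start
            log.info("%d/%d done, ~%.0fs remaining", done, len(tasks),
                     eta_seconds(done, len(tasks), elapsed))
    return results
```

Failed tasks are logged and left out of the result dict, so a caller can compare `results` against the task list to find combinations worth retrying.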
The extractor includes comprehensive error handling:
- HTTP Errors: Automatic retry with exponential backoff for 5xx errors
- 403 Forbidden: Session refresh and extended backoff for access restrictions
- Empty Data: Graceful handling of empty responses
- Excel Parsing: Multiple engine fallbacks (openpyxl, xlrd, calamine)
- Rate Limiting: Random jitter to avoid pattern detection
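The retry rules above (exponential backoff for 5xx, a longer backoff for 403, random jitter) can be condensed into a single delay function. This is a sketch under stated assumptions: `retry_delay`, the base delay, the cap, the 403 multiplier, and the jitter range are hypothetical values, not the extractor's configuration.

```python
import random
from typing import Callable, Optional


def retry_delay(attempt: int, status_code: int, base: float = 1.0, cap: float = 60.0,
                forbidden_factor: float = 4.0, max_jitter: float = 1.0,
                rng: Callable[[], float] = random.random) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to give up.

    5xx -> exponential backoff; 403 -> the same curve stretched by forbidden_factor;
    any other status is treated as non-retryable. All constants are illustrative.
    """
    if 500 <= status_code < 600:
        delay = base * 2 ** attempt
    elif status_code == 403:
        delay = base * forbidden_factor * 2 ** attempt
    else:
        return None
    # Cap the exponential part, then add jitter so retries avoid a fixed rhythm.
    return min(cap, delay) + max_jitter * rng()
```

A download loop would sleep for `retry_delay(attempt, response.status_code)` seconds between attempts and stop once it returns `None` or a maximum attempt count is reached.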
The extractor provides detailed logging including:
- Extraction progress by batch
- HTTP response statistics
- Data completeness verification
- Error messages with context
- Performance metrics
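Data completeness verification can be approached by comparing the (State, Year, Month) combinations present in the output against the expected grid. The sketch below assumes the output columns listed earlier; `missing_periods` is a hypothetical helper. For the current year it will also report months that have not happened yet, so those should be filtered out when interpreting the result.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

import pandas as pd


def missing_periods(df: pd.DataFrame, states: Iterable[str],
                    years: Iterable[int]) -> List[Tuple[str, int, int]]:
    """List (State, Year, Month) combinations that have no rows in the output."""
    present: Set[Tuple[str, int, int]] = set(
        zip(df["State"], df["Year"].astype(int), df["Month"].astype(int))
    )
    expected = product(states, years, range(1, 13))
    return [combo for combo in expected if combo not in present]
```

The same idea extends to a finer grid by adding Group and SubGroup to both the present set and the expected product.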
- Extraction time depends on data availability and server response times
- Some requests may fail due to server-side restrictions (handled gracefully)
- Large extractions may take considerable time and bandwidth
403 Forbidden Errors
- The script automatically refreshes sessions after consecutive 403 errors
- If persistent, you may need to adjust rate limiting or wait between runs
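Refreshing the session after consecutive 403 responses only needs a small counter. In this sketch, `SessionGuard`, its `refresh` callback (which might, for instance, build a new `requests.Session()`), and the threshold of 3 are all hypothetical.

```python
from typing import Callable


class SessionGuard:
    """Count consecutive 403 responses and trigger a session refresh at a threshold."""

    def __init__(self, refresh: Callable[[], None], threshold: int = 3) -> None:
        self.refresh = refresh
        self.threshold = threshold
        self.consecutive_403 = 0

    def observe(self, status_code: int) -> bool:
        """Record one response status; return True if a refresh was triggered."""
        if status_code != 403:
            self.consecutive_403 = 0  # any non-403 response breaks the streak
            return False
        self.consecutive_403 += 1
        if self.consecutive_403 >= self.threshold:
            self.refresh()
            self.consecutive_403 = 0  # start counting again with the new session
            return True
        return False
```

Counting only consecutive failures means an occasional 403 among successful responses does not cause needless session churn.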
Empty Data Files
- Some combinations may not have data available
- The extractor logs warnings for missing data and continues processing
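Handling empty downloads together with the Excel engine fallback might look like the sketch below. `parse_excel` and its injectable `reader` parameter are hypothetical; the engine names are ones `pandas.read_excel` accepts, though `"calamine"` requires a recent pandas version plus the calamine package.

```python
import io
import logging
from typing import Callable, Sequence

import pandas as pd

log = logging.getLogger("conab_sketch")


def parse_excel(content: bytes,
                engines: Sequence[str] = ("openpyxl", "xlrd", "calamine"),
                reader: Callable[..., pd.DataFrame] = pd.read_excel) -> pd.DataFrame:
    """Try each Excel engine in turn; return an empty DataFrame if nothing usable."""
    if not content:
        log.warning("empty response body, skipping")
        return pd.DataFrame()
    for engine in engines:
        try:
            df = reader(io.BytesIO(content), engine=engine)
        except Exception as exc:  # wrong format for this engine, or engine missing
            log.debug("engine %s failed: %s", engine, exc)
            continue
        if df.empty:
            log.warning("file parsed with %s but has no rows", engine)
        return df
    log.warning("no Excel engine could parse the file")
    return pd.DataFrame()
```

Returning an empty DataFrame instead of raising lets the caller log the gap and move on to the next combination.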
Memory Issues
- Data is processed in batches to manage memory
- Cache files can be deleted if disk space is limited
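Batch processing keeps memory bounded by handling a fixed number of combinations at a time and writing each batch out before starting the next. The sketch below uses only the standard library; `batched` and `append_rows` are hypothetical helpers, and the batch size is up to the caller.

```python
import csv
from itertools import islice
from pathlib import Path
from typing import Any, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items (itertools.batched exists only in 3.12+)."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def append_rows(path: Path, header: Sequence[str], rows: Iterable[Sequence[Any]]) -> None:
    """Append rows to a CSV file, writing the header only when the file is new."""
    new_file = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if new_file:
            writer.writerow(header)
        writer.writerows(rows)
```

Each batch's rows can be discarded after `append_rows` returns, so peak memory depends on the batch size rather than on the total number of combinations.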
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
This tool is for educational and research purposes. Please respect CONAB's terms of service and use responsibly. The tool includes rate limiting to minimize server load.
Created for extracting and analyzing CONAB agricultural inputs data.
- CONAB (Companhia Nacional de Abastecimento) for providing the data
- Open source community for the libraries used in this project