A tool for generating Brazilian Portuguese lexical units for FrameNet frames using large language models. The project provides both command-line scripts and a web API for producing contextually appropriate lexical units.
- Multiple LLM Backends: Support for Sabiá-7B, Bode-7B, and Llama 3.1-8B-Instruct models
- Web API: Flask-based REST API for integration with web applications
- Docker Support: Containerized deployment with GPU acceleration
- Portuguese Focus: Specialized for Brazilian Portuguese lexical units
- FrameNet Integration: Follows FrameNet Brasil standards and conventions
- Python 3.8+
- CUDA-compatible GPU (recommended)
- 8GB+ RAM (16GB+ recommended)
- Docker and Docker Compose (for containerized deployment)
```bash
pip install llama-cpp-python[cublas] huggingface_hub flask numpy tqdm requests
```

```bash
git clone https://github.com/FrameNetBrasil/lusuggestion.git
cd lusuggestion
pip install -r requirements.txt
```

For GPU Systems:

```bash
git clone https://github.com/FrameNetBrasil/lusuggestion.git
cd lusuggestion
docker-compose up --build
```

For CPU-Only Systems:

```bash
git clone https://github.com/FrameNetBrasil/lusuggestion.git
cd lusuggestion
docker-compose -f docker-compose.cpu.yml up --build
```

```bash
# With automatic model download (recommended)
python framenet_lu_generator_llama.py parsed_prompt.txt --download
# Using specific model path (if already downloaded)
python framenet_lu_generator_llama.py parsed_prompt.txt --model-path "./models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"
```

```bash
# With custom parameters
python framenet_lu_generator_llama.py parsed_prompt.txt --download \
--temperature 0.1 \
--max-tokens 2048 \
--output custom_results.json
# Different model quantization
python framenet_lu_generator_llama.py parsed_prompt.txt --download \
--model-file Meta-Llama-3.1-8B-Instruct-Q8_0.gguf
```

| Option | Description | Default |
|---|---|---|
| `--model-path PATH` | Path to local GGUF model file | Auto-detected |
| `--model-file FILENAME` | Specific GGUF model file | `Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf` |
| `--download` | Download model from HuggingFace Hub | False |
| `--output FILENAME` | Output JSON file path | Auto-generated |
| `--temperature FLOAT` | Sampling temperature (0.0-1.0) | 0.1 |
| `--max-tokens INT` | Maximum tokens to generate | 2048 |
| `--cache-dir PATH` | Model cache directory | `./models` |
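If the generator needs to be driven from another Python program rather than from the shell, the documented flags can simply be passed through `subprocess`. A minimal sketch; the prompt file and output path are placeholders:

```python
import subprocess

# Run the CLI generator with the documented flags (file names are placeholders).
subprocess.run(
    [
        "python", "framenet_lu_generator_llama.py", "parsed_prompt.txt",
        "--download",
        "--temperature", "0.1",
        "--max-tokens", "2048",
        "--output", "results.json",
    ],
    check=True,  # raise CalledProcessError on a non-zero exit code
)
print("Generation finished; results written to results.json")
```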
```bash
# Local development
python app.py --model-path /path/to/model.gguf --port 5000
# With Docker
docker-compose up
# With custom configuration
python app.py --model-path /path/to/model.gguf --port 8080 --host 0.0.0.0
```

A simple web interface is available for testing the API.
`GET /health` returns the health status of the API and model.
Response:
```json
{
"status": "healthy",
"model_status": "loaded",
"model_path": "/path/to/model.gguf"
}
```

`POST /generate` is the main endpoint for generating lexical units.
Request:
```json
{
"frame": "Artefato",
"frame_definition": "Um artefato feito ou modificado por uma entidade inteligente para ser destinado a um determinado tipo de Uso.",
"target_count": 10,
"exclusion_list": ["anilha", "arame", "aspirador"],
"temperature": 0.1,
"max_tokens": 2048
}
```

Response:

```json
{
"frame": "Artefato",
"total": 10,
"items": [
{
"lemma": "ferramenta",
"pos": "NOUN",
"mwe": false,
"gloss_pt": "objeto usado para realizar trabalho especรญfico",
"example_pt": "A ferramenta foi criada para facilitar o trabalho.",
"confidence": 0.95
}
]
}
```

```bash
# Generate lexical units
curl -X POST http://localhost:5000/generate \
-H "Content-Type: application/json" \
-d '{
"frame": "Artefato",
"frame_definition": "Um artefato feito ou modificado por uma entidade inteligente.",
"target_count": 5,
"exclusion_list": ["anilha", "arame"],
"temperature": 0.1
}'
# Health check
curl http://localhost:5000/health
```

```python
import requests
# Generate lexical units
response = requests.post('http://localhost:5000/generate', json={
"frame": "Artefato",
"frame_definition": "Um artefato feito ou modificado por uma entidade inteligente.",
"target_count": 10,
"exclusion_list": ["anilha", "arame", "aspirador"],
"temperature": 0.1
})
result = response.json()
print(f"Generated {result['total']} lexical units")# Build and run
```bash
# Build and run
docker-compose up --build
# Run in background
docker-compose up -d
# View logs
docker-compose logs -f
```

```bash
# With Nginx reverse proxy
docker-compose --profile production up -d
# Custom configuration
MODEL_PATH=/custom/path/model.gguf PORT=8080 docker-compose up
```

| Variable | Description | Default |
|---|---|---|
| `MODEL_PATH` | Path to model file in container | `/srv/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf` |
| `PORT` | API server port | 5000 |
| `HOST` | API server host | 0.0.0.0 |
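Loading the model can take a while after the container starts, so it helps to wait for `/health` to report the model as loaded before sending generation requests. A minimal sketch; the polling interval and timeout are arbitrary choices:

```python
import time

import requests

def wait_until_ready(base_url="http://localhost:5000", timeout=600, interval=10):
    """Poll /health until the model reports as loaded, or give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            health = requests.get(f"{base_url}/health", timeout=5).json()
            if health.get("status") == "healthy" and health.get("model_status") == "loaded":
                return True
        except requests.RequestException:
            pass  # server not up yet; keep polling
        time.sleep(interval)
    return False

if wait_until_ready():
    print("API is ready for /generate requests")
```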
| Model | Size | Quality | Speed | VRAM | Description |
|---|---|---|---|---|---|
| Llama 3.1-8B-Instruct | 4.9-8.5GB | Best | Fast-Slow | 6-10GB | Recommended, multilingual |
| Bode-7B | 3.8-7GB | Good | Fast | 5-8GB | Portuguese-specific |
| Sabiá-7B | 3.8-7GB | Fair | Fast | 5-8GB | Portuguese base model |
| Quantization | File Size | Quality | Speed | Memory |
|---|---|---|---|---|
| Q4_K_M | ~5GB | Good | Fast | ~6GB |
| Q5_K_M | ~6GB | Better | Medium | ~7GB |
| Q8_0 | ~8GB | Best | Slower | ~10GB |
The API returns appropriate HTTP status codes and error messages:
- `200 OK` - Successful generation
- `400 Bad Request` - Invalid input parameters
- `404 Not Found` - Endpoint not found
- `500 Internal Server Error` - Model or generation error
Example error response:
```json
{
"error": "Temperature must be between 0.0 and 1.0"
}
```

Models are automatically downloaded from HuggingFace Hub when using the `--download` flag. The default model repository is `bartowski/Meta-Llama-3.1-8B-Instruct-GGUF`.
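To fetch the model ahead of time (for example, on a machine that is offline at run time), the same file can be pulled manually with `huggingface_hub`, which is already a listed dependency. A minimal sketch using the default repository and quantization; `./models` mirrors the default `--cache-dir`:

```python
from huggingface_hub import hf_hub_download

# Download the default GGUF quantization into the default cache directory.
model_path = hf_hub_download(
    repo_id="bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
    filename="Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
    local_dir="./models",
)
print(f"Model saved to {model_path}")
```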
- Temperature: Controls randomness (0.0 = deterministic, 1.0 = very random)
- Max Tokens: Maximum number of tokens to generate
- Target Count: Number of lexical units to generate
- Exclusion List: Existing lexical units to avoid
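These parameters map one-to-one onto the `/generate` request body. A sketch of how a payload might be assembled when the existing lexical units live in a local file (the file name is hypothetical):

```python
import json

# Hypothetical inventory file with one existing lexical unit per line.
with open("existing_lus_artefato.txt", encoding="utf-8") as f:
    exclusion_list = [line.strip() for line in f if line.strip()]

payload = {
    "frame": "Artefato",
    "frame_definition": "Um artefato feito ou modificado por uma entidade inteligente.",
    "target_count": 10,                # number of lexical units to generate
    "exclusion_list": exclusion_list,  # existing lexical units to avoid
    "temperature": 0.1,                # 0.0 = deterministic, 1.0 = very random
    "max_tokens": 2048,                # upper bound on generated tokens
}
print(json.dumps(payload, ensure_ascii=False, indent=2))
```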
The application automatically uses GPU acceleration when available:
- CUDA-compatible GPU recommended
- Install `llama-cpp-python[cublas]` for GPU support
- Docker containers include GPU support via nvidia-docker
The application also runs on CPU-only systems, falling back to CPU automatically when no compatible GPU is available:
Recommended Configuration:
```bash
# Use CPU-optimized Docker compose
docker-compose -f docker-compose.cpu.yml up --build
# Or run locally
python app.py --download --model-file Meta-Llama-3.1-8B-Instruct-Q4_0.gguf
```

CPU Performance Tips:
- Use Q4_K_M or Q4_0 quantization for best CPU performance
- Ensure 8GB+ RAM available (16GB+ recommended)
- Expect 2-5 minutes per generation (vs 30-60 seconds on GPU)
- Consider reducing `target_count` to 5-10 for faster responses
For GPU Systems:
- Use appropriate quantization levels based on available VRAM
- Q4_K_M recommended for most use cases (good quality/speed balance)
- Q8_0 for maximum quality with sufficient VRAM
For CPU Systems:
- Q4_K_M or Q4_0 recommended for best speed/quality balance
- Ensure sufficient system RAM (1.5x model size + 2GB overhead)
- Monitor system memory usage during generation
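For example, the ~5GB Q4_K_M file works out to roughly 1.5 × 5 GB + 2 GB ≈ 9.5 GB of system RAM.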
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Commit changes: `git commit -am 'Add feature'`
- Push to branch: `git push origin feature-name`
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- FrameNet Brasil team for linguistic expertise
- Meta for Llama 3.1 model
- bartowski for GGUF model quantizations
- Hugging Face for model hosting and tools
For questions and support:
- GitHub Issues: lusuggestion/issues
- FrameNet Brasil: framenetbr@gmail.com
FrameNet Brasil | Advancing Portuguese computational linguistics through AI-powered lexical unit generation.