**Note:** This project is not yet functional; it is currently published for version control only.
This project is a playground for experimenting with large language models (LLMs). It supports loading models in several formats, quantizing them, converting between formats, running them locally, and chatting with them. The project is written in Python and builds on ONNX, llama.cpp, PyTorch, and Streamlit.

## Features
- Load any LLM model in various formats (GGUF, ONNX, SafeTensors)
- Support for local models and API-based models via Ollama (see the HTTP sketch after the supported-models list)
- Perform quantization to reduce model size and increase inference speed
- Convert models between different formats
- Run models on CPU or GPU depending on device availability
- Interactive chat interface with conversation history
- Support for multiple model backends (llama.cpp, ONNX, PyTorch); a minimal chat sketch using the llama.cpp backend follows below
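As a rough illustration of the chat loop behind these features, here is a minimal sketch using the llama-cpp-python bindings. It is a hedged example, not the project's actual implementation: the model path is a placeholder, and the real app wires this through its Streamlit UI.

```python
from llama_cpp import Llama

# Load a local GGUF model (placeholder path for illustration).
llm = Llama(model_path="models/llama-3.2-1b-instruct.Q4_K_M.gguf", n_ctx=2048)

# Conversation history is kept as a growing list of chat messages.
history = [{"role": "system", "content": "You are a helpful assistant."}]

for user_input in ["Hello!", "What formats can you load?"]:
    history.append({"role": "user", "content": user_input})
    reply = llm.create_chat_completion(messages=history, max_tokens=256)
    answer = reply["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    print(answer)
```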
## Supported Models

- Llama 3.2 (1B and other variants)
- Phi-3.5 and Phi-4 models (including ONNX optimized versions)
- Any model compatible with llama.cpp, ONNX Runtime, or PyTorch
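For API-based models served through Ollama, the same kind of chat can run over HTTP against a local Ollama server. This is a hedged sketch assuming Ollama's default port and an already-pulled `llama3.2:1b` tag (both illustrative, not the project's configuration):

```python
import requests

# Assumes a local Ollama server on its default port with the
# llama3.2:1b model already pulled (illustrative tag).
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2:1b",
        "messages": [{"role": "user", "content": "Summarize what GGUF is."}],
        "stream": False,
    },
    timeout=120,
)
print(response.json()["message"]["content"])
```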
## Requirements

- Python 3.8 or later
- pip (Python package manager)
- For Apple Silicon Macs, llama.cpp with GGUF format is recommended
## Installation

1. Clone this repository:

   ```bash
   git clone <repository-url>
   ```

2. Navigate to the project directory:

   ```bash
   cd AI-Playground
   ```

3. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

Run the Streamlit app:

```bash
python start_app.py
```

or

```bash
streamlit run app.py
```

## Model Conversion

The app includes a model conversion page that lets you convert between different model formats (a hedged export sketch follows the list below):
- ONNX format for compatibility with ONNX Runtime
- GGUF format for use with llama.cpp
- SafeTensors format for PyTorch
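For a rough idea of what such a conversion involves under the hood, here is a hedged sketch that exports a PyTorch/SafeTensors checkpoint to ONNX using Hugging Face Optimum. The model ID and output directory are placeholders, and the project's own utilities in `src/conversion` may take a different approach:

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Placeholder model ID; any causal-LM checkpoint Optimum supports works similarly.
model_id = "microsoft/Phi-3.5-mini-instruct"

# export=True converts the PyTorch weights to an ONNX graph on load.
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save the ONNX model and tokenizer files for use with ONNX Runtime.
model.save_pretrained("models/phi-3.5-onnx")
tokenizer.save_pretrained("models/phi-3.5-onnx")
```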
## Project Structure

- `models/` - Local model files and their metadata
  - `api/` - API configuration for remote models (such as Ollama)
- `src/` - Source code for the application
  - `conversion/` - Model conversion utilities
  - `llm/` - LLM implementation and interfaces
  - `quantization/` - Quantization tools
  - `ui/` - User interface components
  - `utils/` - Helper functions and utilities
## Contributing

Feel free to fork this repository and submit pull requests. All contributions are welcome!