EdgeGPT is a chat application that runs inference against local LLMs. It combines a React frontend with a FastAPI Python backend to serve local AI models.
- Chat with local Mistral-7B and CodeLlama-7B models
- Switch between text and code generation models seamlessly
- Enhanced sequential thinking for step-by-step problem solving
- Responsive UI built with React, TypeScript, and Tailwind CSS
- Streaming responses for real-time interaction
- Conversation history management and persistence
The project is structured as follows:
```
EdgeGPT/
├── src/                          # Frontend React code
│   ├── components/               # UI components
│   ├── utils/                    # Utility functions
│   ├── hooks/                    # React hooks
│   ├── lib/                      # Library code
│   ├── providers/                # React context providers
│   ├── pages/                    # App pages
│   ├── App.tsx                   # Main React application component
│   ├── ChatInterface.tsx         # Chat interface component
│   ├── main.tsx                  # Main entry point for React
│   ├── sequentialThinkingServer.ts  # Sequential thinking server implementation
│   └── index.css                 # Global CSS styles
├── public/                       # Static assets
├── models/                       # Local model files (not included in repo)
├── chat_history/                 # Saved chat conversations
├── mcp/                          # Model Context Protocol related resources
│   └── thinking/                 # Sequential thinking implementation resources
├── run.py                        # FastAPI backend server for model inference
├── server.js                     # Node.js server for additional features
├── modelService.ts               # TypeScript service for model interaction
├── start-servers.ps1             # PowerShell script to start all servers on Windows
├── start-servers.sh              # Shell script to start all servers on macOS/Linux
├── package.json                  # Node.js dependencies
├── tailwind.config.ts            # Tailwind CSS configuration
├── vite.config.ts                # Vite build configuration
└── README.md                     # Project documentation
```
The codebase has been cleaned up to:
- Remove duplicate files (ChatInterface.tsx, modelService.ts)
- Remove Python cache files (`__pycache__/`)
- Remove temporary installation files (get-pip.py)
- Update .gitignore to exclude appropriate files
- Organize the codebase structure for better maintainability
- Node.js 16+
- npm or yarn
- TypeScript
- Vite
- Python 3.9+
- FastAPI
- uvicorn
- llama-cpp-python
- Install dependencies: `npm install`
- Start the development server: `npm run dev`
- Navigate to the models directory: `cd models`
- Create a virtual environment (optional but recommended): `python -m venv venv`
- Activate the virtual environment:
  - Windows: `venv\Scripts\activate`
  - macOS/Linux: `source venv/bin/activate`
- Install required packages: `pip install fastapi uvicorn pydantic psutil`
- Install llama-cpp-python:
  - Windows (with prebuilt wheel): `pip install llama_cpp_python-0.2.90-cp312-cp312-win_amd64.whl`
  - macOS/Linux: `pip install llama-cpp-python`
- Download the required model files:
  - Create a `models` directory in the project root if it doesn't exist
  - Download the following models from Hugging Face or other trusted sources:
    - Mistral-7B-Instruct: `mistral-7b-instruct-v0.1.Q4_K_M.gguf`
    - CodeLlama-7B-Instruct: `codellama-7b-instruct.Q4_K_M.gguf`
  - Place the downloaded model files in the `models` directory

  Example download commands (replace with actual URLs):

  ```bash
  # For Mistral-7B
  wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf -O models/mistral-7b-instruct-v0.1.Q4_K_M.gguf

  # For CodeLlama-7B
  wget https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q4_K_M.gguf -O models/codellama-7b-instruct.Q4_K_M.gguf
  ```
- Verify model paths in `run.py`:
  - Open `run.py` in an editor
  - Ensure the model paths match your downloaded model filenames:

  ```python
  # Example paths in run.py
  TEXT_MODEL_PATH = "models/mistral-7b-instruct-v0.1.Q4_K_M.gguf"
  CODE_MODEL_PATH = "models/codellama-7b-instruct.Q4_K_M.gguf"
  ```
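Before starting the full backend, you can confirm that a downloaded model file actually loads by using `llama-cpp-python` directly. The snippet below is a hedged sketch, not part of the project: the model path is taken from the example paths above, and the context size and prompt wording are illustrative assumptions independent of whatever `run.py` does.

```python
# smoke_test_model.py - minimal sketch: load a GGUF model with llama-cpp-python
# and generate a few tokens. Assumes the Mistral file listed above exists under
# models/; adjust MODEL_PATH if your filename differs.
from pathlib import Path

from llama_cpp import Llama

MODEL_PATH = Path("models/mistral-7b-instruct-v0.1.Q4_K_M.gguf")  # assumed filename

if not MODEL_PATH.exists():
    raise SystemExit(f"Model file not found: {MODEL_PATH}")

# A small context window keeps the check quick; run.py may use other settings.
llm = Llama(model_path=str(MODEL_PATH), n_ctx=512, verbose=False)

out = llm("[INST] Say hello in one short sentence. [/INST]", max_tokens=32)
print(out["choices"][0]["text"].strip())
```

If this prints a short greeting, the model file and `llama-cpp-python` installation are working.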
You can start all servers at once using the provided scripts:
- Windows: `./start-servers.ps1`
- macOS/Linux: `./start-servers.sh`
If you prefer to start each component individually:
- Start the Local Models API Server: `python run.py`
  The server will run on http://localhost:8000
- Start the Sequential Thinking Server: `npm run thinking-server`
  The thinking server will run on http://localhost:8001
- Start the Frontend: `npm run dev`
  The frontend will be available at http://localhost:5173
- Access the Application: open your browser and navigate to http://localhost:5173
Common commands at a glance:

```bash
npm run dev
cd models; python run.py
python -m uvicorn run:app --host 0.0.0.0 --port 8000 --reload
npm run thinking-server
.\start-AgenTick-server.ps1
```
By default, the application connects to the following endpoints:
- Model API: http://localhost:8000
- Thinking Server: http://localhost:8001
If you need to change these, you can modify the connection settings in:
- `src/lib/api.ts` for API endpoints
- `src/sequentialThinkingServer.ts` for thinking server configuration
- Model Loading Errors:
  - Verify model files exist in the `models/` directory
  - Check that file paths in `run.py` match your actual model filenames
  - Ensure you have enough RAM (at least 8GB recommended for 7B models)
- Connection Errors:
  - Verify all servers are running
  - Check console outputs for error messages
  - Ensure ports 8000, 8001, and 5173 are not in use by other applications (see the port-check sketch after this list)
- Performance Issues:
  - Close unnecessary applications to free up memory
  - Consider using smaller model quantizations if responses are too slow
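If you suspect a port conflict, one quick check is to probe the three default ports mentioned above. This helper is a hedged sketch and not part of the repository; the port numbers come from the defaults documented in this README.

```python
# check_ports.py - minimal sketch: report whether the default EdgeGPT ports
# (8000: model API, 8001: thinking server, 5173: Vite dev server) are in use.
import socket

PORTS = {8000: "model API", 8001: "thinking server", 5173: "frontend (Vite)"}

for port, name in PORTS.items():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # connect_ex returns 0 if something is already listening on the port
        in_use = sock.connect_ex(("127.0.0.1", port)) == 0
    print(f"Port {port} ({name}): {'in use' if in_use else 'free'}")
```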
Once you have all components running, perform these checks to verify everything is working correctly:
- API Server Test: `curl http://localhost:8000/status`
  You should receive a JSON response with server status information.
- Thinking Server Test: `curl http://localhost:8001/health`
  You should receive a response indicating the thinking server is healthy.
- End-to-End Test:
  - Open the web interface at http://localhost:5173
  - Start a new conversation
  - Type a simple query like "Hello, how are you?"
  - Verify you receive a response from the model
  - Try enabling sequential thinking for a more complex query
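The first two checks can also be scripted. The sketch below is a convenience, not part of the repository: it relies only on the `/status` and `/health` endpoints documented here and prints whatever each server returns.

```python
# check_servers.py - minimal sketch: ping the model API and thinking server
# endpoints documented in this README and print their responses.
import requests  # pip install requests

CHECKS = [
    ("Model API", "http://localhost:8000/status"),
    ("Thinking server", "http://localhost:8001/health"),
]

for name, url in CHECKS:
    try:
        resp = requests.get(url, timeout=5)
        print(f"{name}: HTTP {resp.status_code} -> {resp.text[:200]}")
    except requests.RequestException as exc:
        print(f"{name}: not reachable ({exc})")
```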
If all components are working correctly, you should be able to:
- Switch between text and code models
- Send messages and receive responses
- Use sequential thinking for complex queries
- Save and load conversations
The application uses two quantized models:
- Mistral-7B-Instruct (Text generation)
  - Used for general text responses
  - Optimized for instruction following
  - File: `mistral-7b-instruct-v0.1.Q4_K_M.gguf`
- CodeLlama-7B-Instruct (Code generation)
  - Specialized for generating code samples
  - Responds to programming-related queries
  - File: `codellama-7b-instruct.Q4_K_M.gguf`
You can switch between the text and code models using the dropdown in the chat header. The application will automatically use the appropriate model based on your selection.
The FastAPI server provides the following endpoints:
- `GET /` - Check server status and current model
- `GET /status` - Get server status information including memory usage and uptime
- `POST /generate` - Generate text from the current model
- `POST /generate_stream` - Stream text generation token by token
- `GET /conversations` - Get all saved conversations
- `GET /conversation/{conversation_id}` - Get a specific conversation
- `POST /conversation` - Create a new conversation
- `DELETE /conversation/{conversation_id}` - Delete a conversation
- `POST /switch-model` - Switch between text and code models
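As an illustration of how a script might call these endpoints, here is a hedged Python sketch. The endpoint paths come from the list above, but this README does not document the request or response schemas, so the JSON field names used below (`prompt`, `max_tokens`, `model_type`) are assumptions; check `run.py` and `src/lib/api.ts` for the actual payloads.

```python
# api_example.py - hedged sketch of calling the EdgeGPT backend from Python.
# Endpoint paths are from the README; the JSON field names are guesses and may
# not match the actual schemas in run.py.
import requests  # pip install requests

BASE_URL = "http://localhost:8000"

# Check server status and current model
print(requests.get(f"{BASE_URL}/status", timeout=5).json())

# Switch to the code model (field name "model_type" is an assumption)
requests.post(f"{BASE_URL}/switch-model", json={"model_type": "code"}, timeout=30)

# Generate a completion (field names "prompt" / "max_tokens" are assumptions)
resp = requests.post(
    f"{BASE_URL}/generate",
    json={"prompt": "Write a Python function that reverses a string.", "max_tokens": 256},
    timeout=120,
)
print(resp.json())

# Stream a completion token by token; the exact wire format is not documented
# here, so this simply prints raw chunks as they arrive.
with requests.post(
    f"{BASE_URL}/generate_stream",
    json={"prompt": "Explain recursion in one paragraph.", "max_tokens": 128},
    stream=True,
    timeout=120,
) as stream:
    for chunk in stream.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
print()
```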
This project includes an optimized implementation of the Model Context Protocol (MCP) sequential thinking server, allowing for step-by-step reasoning without requiring internet access.
- High-performance implementation of sequential thinking capability
- Works with existing local models
- Provides a visual representation of the thinking process
- Supports branching and revisions in the thinking process
- Automatic timeout handling and performance optimization
The sequential thinking feature allows the AI to break down complex problems into discrete steps, showing its reasoning process before providing a final answer. This is especially useful for:
- Mathematical or logical problems
- Algorithm design and analysis
- Multi-step reasoning tasks
- Complex decision-making scenarios
- Start a new conversation
- Click the sparkle (✨) icon next to the send button
- Check the "Sequential thinking" checkbox
- Type your question and send it
For optimal results when using sequential thinking:
- Ask clear, well-defined questions
- For complex problems, provide all necessary information in one message
- Be patient as the system works through the thinking steps
- Limit to 3-5 thoughts for best performance
- Use shorter prompts for faster processing
If sequential thinking is not working properly:
- Ensure all servers are running (thinking server, model server, and web server)
- Check the server console output for any errors
- Try resetting the thinking server using: `curl -X POST http://localhost:8001/reset`
- If thoughts seem repetitive, you can force completion with: `curl -X POST http://localhost:8001/force-complete`
- Verify server status with: `curl http://localhost:8001/health`
The sequential thinking feature is optimized for performance but may still require more processing time than standard responses. Each thought typically takes 2-5 seconds to generate, so a complete sequence might take 10-30 seconds.
The sequential thinking feature allows the AI to break down complex problems into steps. When enabled:
- The AI first processes the prompt through a series of thought steps
- Each thought builds on previous ones
- The AI can revise or branch from previous thoughts
- After completing the thinking process, it provides a final answer
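To make this flow concrete, here is a deliberately simplified Python sketch of the general "think in steps, then answer" pattern. It is not the project's implementation (which lives in `src/sequentialThinkingServer.ts` and the `mcp/thinking/` resources) and omits branching and revision; the prompt wording, step count, and direct use of `llama_cpp` are illustrative assumptions.

```python
# sequential_thinking_sketch.py - illustrative only: a generic step-by-step
# reasoning loop against a local GGUF model. Not the EdgeGPT implementation.
from llama_cpp import Llama

llm = Llama(model_path="models/mistral-7b-instruct-v0.1.Q4_K_M.gguf", n_ctx=2048, verbose=False)

def ask(prompt: str, max_tokens: int = 160) -> str:
    out = llm(f"[INST] {prompt} [/INST]", max_tokens=max_tokens)
    return out["choices"][0]["text"].strip()

def solve(question: str, max_thoughts: int = 4) -> str:
    thoughts: list[str] = []
    for step in range(1, max_thoughts + 1):
        context = "\n".join(f"Thought {i + 1}: {t}" for i, t in enumerate(thoughts))
        # Each new thought is generated with all previous thoughts in context,
        # so it builds on the earlier reasoning.
        thought = ask(
            f"Problem: {question}\n{context}\n"
            f"Write thought {step}: one short reasoning step that builds on the previous thoughts."
        )
        thoughts.append(thought)
    # After the thinking phase, ask for a final answer grounded in the thoughts.
    return ask(
        f"Problem: {question}\n"
        + "\n".join(f"Thought {i + 1}: {t}" for i, t in enumerate(thoughts))
        + "\nUsing these thoughts, give the final answer."
    )

if __name__ == "__main__":
    print(solve("A train travels 120 km in 1.5 hours. What is its average speed?"))
```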
MIT