This is a study project in Natural Language Processing (NLP) with Large Language Models (LLMs), built around sentence similarity search over a vector store. Its functionality is exposed through a RESTful API developed with Django and secured with JWT authentication.
The main LLM used in the project is Google Gemini (gemini-1.5-flash), which performs two functions:
- Intent Router: the model decides whether the initial message is a greeting or whether the sentence should be looked up in the vector store;
- Final Answer Generator: from the 3 answers most similar to the user's sentence, the model generates a final answer to the initial question.
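The two-step flow above can be sketched as follows. This is a hedged, offline illustration: the `llm` callable stands in for Gemini (in the real project it would be a `langchain-google-genai` chat model), and `search_top3` stands in for the vector-store lookup, so both are injected rather than real API calls.

```python
# Hypothetical sketch of the intent-router + final-answer flow.
# `llm` and `search_top3` are injected stand-ins for Gemini and the
# vector-store search; prompts and labels below are illustrative only.

def answer(message: str, llm, search_top3):
    """Route a message: greet directly or answer from the vector store."""
    # Step 1 - Intent Router: ask the model to classify the message.
    intent = llm(f"Classify as GREETING or QUESTION: {message}").strip()
    if intent == "GREETING":
        return "Hello! How can I help you?"
    # Step 2 - Final Answer Generator: retrieve the 3 most similar
    # answers and let the model compose the final reply from them.
    context = "\n".join(search_top3(message))
    return llm(
        f"Answer the question using this context:\n{context}\n\n"
        f"Question: {message}"
    )
```

With real LangChain components, `llm` would wrap `ChatGoogleGenerativeAI` and `search_top3` the FAISS retriever; the control flow stays the same.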
The second model used in the project was sentence-transformers/all-MiniLM-L6-v2 from the Sentence Transformers library on HuggingFace, released under the Apache License 2.0. It generates the embeddings (vectors) for the sentences present in the database. FAISS serves as the vector store: it holds the sentence vectors together with the reference (id) of each embedding's row in the PostgreSQL database.
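Conceptually, the retrieval step stores one vector per sentence alongside its PostgreSQL row id and returns the ids of the closest vectors at query time. The sketch below uses plain NumPy instead of faiss-cpu so it is self-contained; in the real project the vectors would be 384-dimensional MiniLM embeddings and the index a FAISS one.

```python
import numpy as np

# Hedged NumPy stand-in for what FAISS does here: store normalized
# sentence vectors plus their database ids, then return the ids of the
# k most similar vectors via cosine similarity.

def build_index(vectors, db_ids):
    """Normalize vectors so a dot product equals cosine similarity."""
    mat = np.asarray(vectors, dtype=np.float32)
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)
    return mat, list(db_ids)

def search(index, query_vec, k=3):
    """Return the db ids of the k stored sentences closest to the query."""
    mat, db_ids = index
    q = np.asarray(query_vec, dtype=np.float32)
    q /= np.linalg.norm(q)
    scores = mat @ q                      # cosine similarity per row
    top = np.argsort(-scores)[:k]         # indices of the best matches
    return [db_ids[i] for i in top]
```

The returned ids are then used to fetch the full answer texts from PostgreSQL, exactly as the FAISS-to-database reference described above.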
The dataset used was habedi/stack-exchange-dataset, made available on HuggingFace under a Creative Commons license; it contains questions asked in a Stack Exchange (SE) question-answering community. The data was downloaded and saved in a PostgreSQL database.
The API exposes the following endpoint groups:

- Auth:
  - login
  - logout
  - token refresh
- User:
  - list users
  - create
  - read
  - update
  - delete
- Model:
  - train model
  - train monitor
- Search:
  - search information
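As a hedged illustration of how a client uses these endpoints, the sketch below builds (without sending) the JWT login and authenticated search requests. The paths `/api/token/` and `/api/search/` are assumptions (SimpleJWT's default token path is `/api/token/`); check the project's urls.py for the real routes.

```python
import json
import urllib.request

# Hypothetical client-side sketch. BASE and the endpoint paths are
# assumptions, not confirmed routes from this project.
BASE = "http://localhost:8000"

def login_request(username, password):
    """Build the POST request that obtains the JWT access/refresh pair."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{BASE}/api/token/", data=body,
        headers={"Content-Type": "application/json"}, method="POST")

def search_request(access_token, sentence):
    """Build an authenticated request against the search endpoint."""
    body = json.dumps({"question": sentence}).encode()
    return urllib.request.Request(
        f"{BASE}/api/search/", data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {access_token}"},
        method="POST")
```

Sending either request with `urllib.request.urlopen` (or the `requests` library) against the running server completes the flow.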
For this project, the following tools were needed:
- Python - Powerful, readable programming language used for project development.
- django - Python framework that enables rapid development of web applications with a pragmatic design.
- djangorestframework - Flexible toolkit for building web APIs with Django.
- djangorestframework-simplejwt - Authentication with JWT tokens.
- psycopg2-binary - Adapter connecting Python to PostgreSQL databases.
- Swagger - Framework for testing and documenting RESTful APIs.
- PostgreSQL - Open-source object-relational database used for data storage.
- redis - In-memory data store, used as the message broker for Celery.
- celery - Library for running distributed asynchronous tasks.
- python-decouple - Separates configuration from code, making it easier to manage parameters without code changes.
- pandas - Data manipulation in tabular form.
- beautifulsoup4 - Information extraction from HTML.
- torch - Deep learning library used for applications such as computer vision and natural language processing.
- Sentence Transformers - Module to access, use and train text and image embedding models.
- langchain - Framework for building LLM applications with a modular architecture.
- langchain-community - Extra community connectors and integrations for LangChain.
- langchain-huggingface - Integration of HuggingFace models into LangChain.
- langchain-google-genai - Integration of Google models into LangChain.
- faiss-cpu - Library for efficient similarity search over large collections of vectors.
- streamlit - Creates interactive web interfaces in a simple way.
- streamlit-autorefresh - Automatically refreshes the Streamlit interface at set intervals.
The project setup uses Ubuntu as the operating system.
Install PostgreSQL:

```shell
sudo apt install postgresql postgresql-contrib
```

Check execution status:

```shell
sudo systemctl status postgresql
```

If it is not running, use the following command:

```shell
sudo systemctl start postgresql
```
Install Redis:

```shell
sudo apt update
sudo apt install redis-server
```

Check execution status:

```shell
sudo systemctl status redis
```

If it is not running, use the following command:

```shell
sudo systemctl start redis
```

To verify that Redis is running correctly, use the command:

```shell
redis-cli ping
```

The return in the Terminal must be:

```
PONG
```
Create a .env file in the root of the project and add the environment variables as shown in the .env.example file. The variables required for this project are:

- DB_NAME: name of the database used in the project.
- DB_USER: user that will connect to the database; it must have the necessary read and write permissions.
- DB_PASSWORD: password of the database user, used for authentication.
- DB_HOST: address of the database server; use localhost if it is local.
- DB_PORT: port used by the database connection.
- GOOGLE_API_KEY: API key generated in Google AI Studio.
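As an illustration, a filled-in .env might look like the fragment below; every value is a placeholder, not a real credential or a value shipped with the project.

```
DB_NAME=chatbot
DB_USER=chatbot_user
DB_PASSWORD=change-me
DB_HOST=localhost
DB_PORT=5432
GOOGLE_API_KEY=your-google-ai-studio-key
```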
Create the Python virtual environment (venv) and install the dependencies listed in requirements.txt. Run the following commands in the project root:

```shell
python3 -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt
```
With PostgreSQL running and the correct connection settings in your .env file, you can run the Django migrations to create the necessary tables in your database:

```shell
python manage.py makemigrations
python manage.py migrate
```
The superuser is a special type of user that has full administrative permissions. To create one, simply issue the following command:

```shell
python manage.py createsuperuser
```
Open three Terminal tabs in the project's root directory.

- In the first tab, run the Django server:

```shell
source ./venv/bin/activate
cd chat_bot_api/
python manage.py runserver
```

Through the link provided, you can access the documentation and test the endpoints with Swagger.

- In the second tab, run Celery:

```shell
source ./venv/bin/activate
cd chat_bot_api/
celery -A chat_bot_api worker --loglevel=debug --concurrency=4 --max-tasks-per-child=500
```

Asynchronous procedures will be shown here.

- In the third tab, run Streamlit:

```shell
source ./venv/bin/activate
streamlit run ui/main.py
```

A page will open automatically in the browser to test the project through the graphical interface.
The Django apps used here follow the standard layout, with files such as admin.py, apps.py, models.py, tests.py, urls.py and views.py. Files beyond these are described below.
- assets/: folder containing files used in README.md;
- chat_bot_api/: folder containing the entire implementation of the API in Django;
  - app_auth/: folder containing the Django app for login, logout and token refresh;
  - app_model/: folder containing the app for uploading zip files, sending data to the database, creating embeddings, testing questions or sentences and monitoring execution status;
    - services.py: file containing all the processing for requests made to the model endpoints;
    - tasks.py: file containing the asynchronous task run on Celery, as well as the status update for each step;
  - chat_bot_api/: folder containing the main configuration of the project's API;
  - core/: folder containing the app with the files needed to run Swagger, as well as implementations shared by all other apps;
  - user/: folder containing the app responsible for creating, reading, listing, updating and deleting users;
    - serializers.py: file containing the validations required by the user endpoints;
  - manage.py: file created by Django with the commands to start and manage the API;
- data/: folder containing the project's input data sent via endpoint;
  - raw/: folder that should contain the project's initial data, received via API endpoint;
  - processed/: folder containing the initial files processed into vectors;
    - faiss_index/: folder containing the saved FAISS vector index;
- ui/: folder containing the Frontend implementation;
  - pages/: folder containing the interface pages;
    - chatbot.py: implementation of the chat page;
    - login.py: implementation of the login screen and user creation;
  - main.py: main page that calls the home page;
- venv/: folder containing the Python virtual environment and the packages needed to run the project, created in the venv step above;
- .env: untracked file containing the settings for connecting to the database;
- .env.example: example file listing the environment variables needed for the project;
- .gitignore: file with instructions for which files Git should not track;
- LICENSE.md: current software license for this project;
- requirements.txt: file listing the libraries needed to create the virtual environment.
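The per-step status updates performed by tasks.py in app_model can be sketched as below. This is a hedged, plain-Python illustration: the real function would carry Celery's `@shared_task` decorator and persist status to the database, while here a dict and injected callables play those roles, and the step names are invented for the example.

```python
# Hedged stand-in for the Celery training task: run each step and
# record progress after every one. Step names are illustrative only.

STEPS = ("extracting_zip", "loading_database", "building_embeddings")

def run_training(status, extract, load, embed):
    """Run each training step, updating the shared status record."""
    for step, action in zip(STEPS, (extract, load, embed)):
        status["current_step"] = step   # what the monitor endpoint reads
        action()
    status["current_step"] = "done"
    return status
```

The train-monitor endpoint then only has to read `current_step` to report how far the asynchronous job has progressed.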
