This is my model, which is based on my original LLM that I created months ago, along with a more recently developed tokenizer. The model is split across multiple files so that preprocessing can be run separately from model training, and so the trained model can be run without retraining. The model should run right out of the box as long as generate.py, model.py, vocabulary_aid.py, model.pt, and merges.json are present and working correctly.
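For reference, an out-of-the-box run might look like this (a minimal sketch, assuming generate.py takes no required command-line arguments; check the script for any prompt or sampling options it defines):

    # All five files should sit in the same directory.
    ls generate.py model.py vocabulary_aid.py model.pt merges.json
    # Run generation with the pretrained checkpoint.
    python generate.py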
The model requires external libraries to run and/or train. The only one necessary to run the model is PyTorch (www.pytorch.org). If you wish to train the model, NumPy is also required, and if you want to adjust the pre-training process, specifically the data collection, BeautifulSoup is needed. In addition, the repository uses Git Large File Storage (LFS), so make sure Git LFS is installed before cloning or pulling. A typical setup is sketched below.
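This sketch uses the standard PyPI package names; adjust it for your own environment or package manager:

    # PyTorch is required just to run the model.
    pip install torch
    # NumPy is only needed if you plan to train.
    pip install numpy
    # BeautifulSoup is only needed if you plan to adjust or rerun data collection.
    pip install beautifulsoup4
    # Set up Git LFS so model.pt and other large files are downloaded as real
    # files rather than pointer stubs.
    git lfs install
    git lfs pull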
If you are training the model completely from scratch, here is the order in which the files need to be run. First, run trainVocab.py; this builds the vocabulary used for encoding and decoding. Then run dataCollect.py, which gathers text from the selected sites into a single, pre-encoded file; currently, all data comes from www.gutenberg.org. With those steps complete, run train.py to train the model on the data. Finally, once training finishes, run generate.py to test the model. The full sequence is sketched below.
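Put together, a from-scratch run looks roughly like this (a sketch assuming each script runs with no extra arguments; check each file for configurable paths or hyperparameters):

    python trainVocab.py   # build the vocabulary/merges used for encoding and decoding
    python dataCollect.py  # scrape and pre-encode the training data (currently from www.gutenberg.org)
    python train.py        # train the model on the collected data
    python generate.py     # sample from the newly trained model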