
Information retrieval search engine based on tf-idf, written in Python. The corpus is the movie_reviews dataset from NLTK.

darcy3000/tf-idf_vectorizer

*********************INFORMATION RETRIEVAL*****************
VECTOR SPACE MODEL

MODULES AND PACKAGES REQUIRED:
	NLTK
	KIVY
	PICKLE
	HEAPQ_MAX
	TIME
	STRING
	COLLECTIONS

Download the corpora from NLTK, for example by calling nltk.download('movie_reviews'). We use movie_reviews as our corpus, a collection of 2000 documents that others commonly use for sentiment analysis. Run the Python file with: python <filename>.py
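
As an illustration of the vector space model this project is based on, here is a minimal tf-idf ranking sketch. It is not the repository's code: the function names (tokenize, build_index, search), the log-scaled term weighting, and the three toy documents are all assumptions made for this example, and the real program would read its documents from the movie_reviews corpus instead.

```python
import heapq
import math
import string
from collections import Counter


def tokenize(text):
    """Lowercase the text, strip ASCII punctuation, split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def build_index(docs):
    """Return (idf, vectors) for a dict mapping doc_id -> raw text."""
    n_docs = len(docs)
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}

    # Assumed weighting scheme: (1 + log tf) * idf.
    vectors = {
        doc_id: {t: (1 + math.log(tf)) * idf[t] for t, tf in counts.items()}
        for doc_id, counts in term_counts.items()
    }
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, idf, vectors, k=3):
    """Return up to k (doc_id, score) pairs with a positive score, best first."""
    counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: (1 + math.log(tf)) * idf[t] for t, tf in counts.items()}
    scored = ((cosine(q_vec, vec), doc_id) for doc_id, vec in vectors.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored) if score > 0]


# Toy documents standing in for the movie_reviews corpus (hypothetical data).
docs = {
    "d1": "A gripping thriller with a clever plot.",
    "d2": "The plot was dull and the acting was worse.",
    "d3": "Wonderful acting and a touching story.",
}
idf, vectors = build_index(docs)
results = search("clever plot", idf, vectors)
print(results)  # d1 ranks first: it is the only document with both query terms
```

The standard library's heapq.nlargest is used here to pick the top k results; the project itself lists HEAPQ_MAX among its requirements, so its own ranking code may differ.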

The classifier has been saved to pickle files, so you can run the program directly, without training the classifier, by loading it from those files.
Uncomment the appropriate lines to train the classifier again.
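
The load-from-pickle-or-retrain pattern described above can be sketched as follows. This is a hedged illustration using only the standard library: the helper name load_or_build, the file name index.pickle, and the stand-in build function are hypothetical and do not come from the repository.

```python
import os
import pickle
import tempfile


def load_or_build(path, build_fn):
    """Load a pickled object from path; if the file is missing, build and cache it."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    obj = build_fn()
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)
    return obj


calls = []


def build_index():
    """Stand-in for the expensive training step (hypothetical)."""
    calls.append(1)
    return {"plot": 0.405, "clever": 1.099}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.pickle")
    first = load_or_build(path, build_index)   # builds and writes the pickle
    second = load_or_build(path, build_index)  # reads the pickle, no rebuild
```

Deleting the pickle file forces a rebuild on the next run, which plays the same role as uncommenting the training lines. Only load pickle files you trust, since unpickling can execute arbitrary code.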

Keep the .kv files in the same folder as the Python file; the GUI-based program needs them to run.

A GUI window will open. Follow the instructions. Click on the links to view the entire document.
