SparkSearchEngine

Description

A small-scale, Spark-based search engine that searches a collection of documents to find the ones matching a user's query.

Key Features

Inverted Index Construction

The project reads the content of multiple documents and generates an inverted index that records each unique word, the count of documents containing that word, and a sorted list of those documents.
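The construction step can be sketched as follows. This is a minimal illustration written with plain Scala collections so it runs without a Spark cluster; in a Spark job the same flatMap, group, and sort steps would run on an RDD of (document, text) pairs, for example one loaded with `SparkContext.wholeTextFiles`. The object name `InvertedIndexSketch`, the tokenizer (lowercase runs of letters and digits), and the use of document names as IDs are assumptions, not details taken from InvertedIndex.scala.

```scala
// Hypothetical sketch (not the project's code): builds an inverted index from
// (docId, text) pairs. Each entry is (word, number of documents, sorted doc IDs).
object InvertedIndexSketch {
  // Assumed tokenizer: lowercase, split on anything that is not a letter or digit.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toSeq

  def build(docs: Seq[(String, String)]): Seq[(String, Int, Seq[String])] = {
    // Emit one (word, docId) pair per distinct word in each document.
    val pairs = for {
      (docId, text) <- docs
      word <- tokenize(text).distinct
    } yield (word, docId)

    pairs
      .groupBy(_._1)                        // word -> all (word, docId) pairs
      .toSeq
      .map { case (word, ps) =>
        val docIds = ps.map(_._2).distinct.sorted // document list in ascending order
        (word, docIds.size, docIds)
      }
      .sortBy(_._1)                         // words in alphabetical order
  }
}
```

For example, indexing `Seq("doc2" -> "Spark and Scala", "doc1" -> "spark search")` yields the words `and`, `scala`, `search`, `spark`, with `spark` mapped to a count of 2 and the documents `doc1, doc2`. Because the IDs are compared as strings here, `doc10` would sort before `doc2`; numeric IDs would need a numeric ordering.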

Data Storage

The resulting inverted index is saved in a file (wholeInvertedIndex.txt) and subsequently stored in a MongoDB collection named "dictionary" for efficient retrieval.
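Moving the index from the text file into MongoDB amounts to formatting each entry as a line and parsing it back into fields. The README does not specify the line layout of wholeInvertedIndex.txt, so this sketch assumes a hypothetical `word,count,doc1 doc2 ...` format and an illustrative `IndexLineFormat` object. The parsed fields (word, count, document list) are what would then be written as one document per word into the "dictionary" collection with a MongoDB driver; the insert itself is omitted here.

```scala
// Hypothetical sketch: round-trips one inverted-index entry through an assumed
// text line format "word,count,doc1 doc2 ...". Not the project's actual format.
object IndexLineFormat {
  final case class Entry(word: String, count: Int, docs: Seq[String])

  // Serialize an entry as a single line of the index file.
  def format(e: Entry): String = s"${e.word},${e.count},${e.docs.mkString(" ")}"

  // Parse a line back into an Entry; returns None for malformed lines.
  def parse(line: String): Option[Entry] =
    line.split(",", 3) match {
      case Array(word, count, docs) if count.nonEmpty && count.forall(_.isDigit) =>
        Some(Entry(word, count.toInt, docs.split(" ").filter(_.nonEmpty).toSeq))
      case _ => None
    }
}
```

Returning `Option` lets a loader skip or report bad lines instead of failing the whole import.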

Query Processing

Users can enter queries to search for specific words or phrases. The system retrieves and displays the identifiers of the documents that match the query.
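A query lookup against the index might look like the sketch below. It assumes the index has already been loaded (for example from the "dictionary" collection) into a `Map` from word to document IDs, and it assumes AND semantics: a multi-word query returns only the documents that contain every word. Because the index stores document lists but no word positions, a phrase is treated here as a set of words rather than an exact sequence. The name `QuerySketch` is hypothetical, and SearchTask.scala may combine words differently.

```scala
// Hypothetical sketch: answers a query from an in-memory index (word -> doc IDs),
// returning the documents that contain ALL query words (assumed AND semantics).
object QuerySketch {
  def search(index: Map[String, Seq[String]], query: String): Seq[String] = {
    // Tokenize the query the same way the documents were tokenized (assumption).
    val words = query.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).distinct.toSeq
    if (words.isEmpty) Seq.empty
    else
      words
        .map(w => index.getOrElse(w, Seq.empty[String]).toSet) // docs per word
        .reduce(_ intersect _)                                  // docs with every word
        .toSeq
        .sorted
  }
}
```

With an index where `spark` maps to `doc1, doc2` and `search` maps to `doc1`, the query `Spark` returns both documents, while `spark search` returns only `doc1`.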

Sorting and Organization

The words in the index are sorted alphabetically, and each word's document list is sorted in ascending order, which keeps the index easy to read and search.

Conclusion

This project combines big data processing techniques with information retrieval principles, providing a practical application of Spark and MongoDB in building scalable search solutions.

Roaajadaa/SparkSearchEngine

Folders and files

Latest commit

History

Repository files navigation

SparkSearchEngine

Description

Key Features

Inverted Index Construction

Data Storage

Query Processing

Sorting and Organization

Conclusion

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages