
# Named Entity Recognition (NER) Project

## AIM
To develop a system that identifies and classifies named entities (such as persons, organizations, locations, and dates) in text using Named Entity Recognition (NER) with spaCy.

## DATASET LINK
N/A. This project does not use a fixed dataset; it analyzes text entered by the user at run time.

## NOTEBOOK LINK
[Notebook link](https://colab.research.google.com/drive/1pBIEFA4a9LzyZKUFQMCypQ22M6bDbXM3?usp=sharing)

## LIBRARIES NEEDED
- spaCy (with the `en_core_web_sm` English model)


## DESCRIPTION

!!! info "What is the requirement of the project?"
    - Named Entity Recognition (NER) automatically extracts and classifies key entities in text, such as persons, organizations, and locations.
    - Structuring text this way makes data easier to analyze and organize, and supports NLP applications such as document analysis and information retrieval.

??? info "Why is it necessary?"
    - NER turns unstructured text into structured data, a step used widely in industries such as healthcare, finance, and e-commerce.
    - It lets users extract actionable insights from large volumes of text.

??? info "How is it beneficial and used?"
    - NER plays a key role in tasks such as document summarization and information retrieval.
    - Automating entity extraction reduces manual effort and improves efficiency.

??? info "How did you start approaching this project? (Initial thoughts and planning)"
    - The project uses spaCy's pre-trained NER models, so text can be analyzed without training a custom model.

### Additional resources used
- spaCy documentation: [Named Entities](https://spacy.io/usage/linguistic-features#named-entities)
- Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper

## EXPLANATION

### DETAILS OF THE DIFFERENT ENTITY TYPES

spaCy's English models predict 18 entity labels; this project focuses on the following:

| Entity Type | Description |
|-------------|-------------|
| PERSON | Names of people (e.g., "Anuska") |
| ORG | Organizations (e.g., "Google", "Tesla") |
| LOC | Non-GPE locations such as mountains and bodies of water (e.g., "Mount Everest", "the Pacific Ocean") |
| DATE | Dates (e.g., "January 1st, 2025") |
| GPE | Geopolitical entities: countries, cities, and states (e.g., "India", "New York", "California") |
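To make the shape of the output concrete, here is a minimal plain-Python sketch (it does not call spaCy) that stores each entity as text, label, and character offsets, mirroring spaCy's `ent.text`, `ent.label_`, `ent.start_char`, and `ent.end_char`, and groups the results by the labels in the table. The `Entity` class, `group_by_label` helper, and the hand-written entities are hypothetical stand-ins for real model output.

```python
from collections import defaultdict
from dataclasses import dataclass

# Labels from the table above (a subset of spaCy's English NER labels)
TABLE_LABELS = {"PERSON", "ORG", "LOC", "DATE", "GPE"}


@dataclass(frozen=True)
class Entity:
    """Hypothetical record mirroring a spaCy span's text, label and offsets."""
    text: str
    label: str
    start_char: int
    end_char: int


def group_by_label(entities, keep=TABLE_LABELS):
    """Group entity texts by label, dropping labels outside `keep`."""
    groups = defaultdict(list)
    for ent in entities:
        if ent.label in keep:
            groups[ent.label].append(ent.text)
    return dict(groups)


text = "Anuska joined Google in New York on January 1st, 2025."
# Hand-written entities standing in for model output
entities = [
    Entity("Anuska", "PERSON", 0, 6),
    Entity("Google", "ORG", 14, 20),
    Entity("New York", "GPE", 24, 32),
    Entity("January 1st, 2025", "DATE", 36, 53),
]
print(group_by_label(entities))
```

Keeping character offsets alongside the text is what makes highlighting and evaluation possible later, since the same surface string can appear more than once in a document.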

## WHAT I HAVE DONE

### Step 1: Data collection and preparation
- Gathered sample text for analysis (provided by users in the app).
- Explored the text structure and identified entity types.
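Because the input is free-form user text, a light cleanup step before NER can help. This is a minimal sketch, not the notebook's code: `prepare_input` and the `MAX_CHARS` limit are hypothetical, and collapsing all whitespace is an assumption that suits short inputs (it merges paragraphs into one line).

```python
import re
import unicodedata

MAX_CHARS = 10_000  # hypothetical cap to keep real-time analysis responsive


def prepare_input(raw: str, max_chars: int = MAX_CHARS) -> str:
    """Normalize user-supplied text before running NER on it."""
    # NFC normalization gives accented characters one consistent encoding
    text = unicodedata.normalize("NFC", raw)
    # Collapse runs of spaces, tabs and newlines into single spaces
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        raise ValueError("input text is empty")
    if len(text) > max_chars:
        raise ValueError(f"input longer than {max_chars} characters")
    return text
```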

### Step 2: NER model implementation
- Integrated spaCy's pre-trained NER model (`en_core_web_sm`).
- Extracted named entities and visualized them with labels and color coding.
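spaCy ships displaCy for this kind of highlighting (`displacy.render(doc, style="ent")`). The dependency-free sketch below only approximates the idea and is not the notebook's code: it wraps each `(start_char, end_char, label)` span in a colored HTML `<mark>` tag. The `render_entities` helper and the color values are hypothetical and chosen arbitrarily.

```python
import html

# Hypothetical color scheme, one color per label in the entity table
COLORS = {
    "PERSON": "#aa9cfc",
    "ORG": "#7aecec",
    "LOC": "#ff9561",
    "DATE": "#bfe1d9",
    "GPE": "#feca74",
}


def render_entities(text, spans, colors=COLORS, default="#dddddd"):
    """Wrap each (start_char, end_char, label) span in a colored <mark> tag.

    Spans must not overlap; they are rendered in order of start offset.
    All text is HTML-escaped so user input cannot inject markup.
    """
    parts = []
    cursor = 0
    for start, end, label in sorted(spans):
        parts.append(html.escape(text[cursor:start]))  # plain text before the span
        color = colors.get(label, default)
        parts.append(
            f'<mark style="background: {color}">'
            f"{html.escape(text[start:end])} <b>{html.escape(label)}</b></mark>"
        )
        cursor = end
    parts.append(html.escape(text[cursor:]))  # trailing plain text
    return "".join(parts)
```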

### Step 3: Testing and validation
- Checked the extracted entities and their labels against multiple test cases.
- Let users enter custom text for NER analysis in real time.
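One way to make "entity accuracy" measurable is exact-match precision, recall, and F1 over `(start, end, label)` spans, where a prediction counts only if both its offsets and its label match a hand-annotated gold span. This is a minimal sketch of that metric, assuming hand-labeled test cases exist; `entity_prf` and the example spans are hypothetical.

```python
def entity_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)  # spans matching in offsets and label
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


text = "Anuska works at Google in New York"
gold = [(0, 6, "PERSON"), (16, 22, "ORG"), (26, 34, "GPE")]
# Hypothetical model output that mislabels "New York" as LOC
predicted = [(0, 6, "PERSON"), (16, 22, "ORG"), (26, 34, "LOC")]
precision, recall, f1 = entity_prf(gold, predicted)
```

Exact matching is strict: a correct span with the wrong label, or the right label on a slightly shifted span, both count as errors.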

## PROJECT TRADE-OFFS AND SOLUTIONS

### Trade Off 1: Pre-trained model vs. custom model
- **Pre-trained models** provide quick results but may lack accuracy for domain-specific entities.
- **Custom models** can improve accuracy but require additional data and training time.
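The extra data a custom model needs is usually annotated as character offsets; spaCy accepts annotations of the form `{"entities": [(start_char, end_char, label)]}` (for example via `Example.from_dict`) and has its own converters such as `offsets_to_biluo_tags` in `spacy.training`. The sketch below is a simplified stand-in, not spaCy's implementation: it splits on whitespace, emits BIO rather than BILUO tags, ignores overlapping spans, and flags spans that do not line up with token boundaries, a common annotation mistake. `offsets_to_bio` is a hypothetical name.

```python
import re


def offsets_to_bio(text, entities):
    """Convert (start_char, end_char, label) annotations to per-token BIO tags.

    Tokens are whitespace-separated runs, a simplification of real tokenizers.
    Raises ValueError if an entity does not start and end on token boundaries.
    """
    tokens = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        # Indices of tokens lying entirely inside the entity span
        covered = [i for i, (s, e, _) in enumerate(tokens) if s >= start and e <= end]
        aligned = (
            covered
            and tokens[covered[0]][0] == start
            and tokens[covered[-1]][1] == end
        )
        if not aligned:
            raise ValueError(f"span {start}-{end} ({label}) is misaligned with tokens")
        tags[covered[0]] = f"B-{label}"
        for i in covered[1:]:
            tags[i] = f"I-{label}"
    return [tok for _, _, tok in tokens], tags
```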

### Trade Off 2: Real-time analysis vs. batch processing
- **Real-time analysis** in a web app enhances user interaction but might slow down with large text inputs.
- **Batch processing** could be more efficient for larger datasets.
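With spaCy, batch processing is typically done with `nlp.pipe(texts, batch_size=...)`, which streams texts through the pipeline in batches. The dependency-free sketch below shows only the underlying batching pattern; `extract` is a hypothetical stand-in for whatever function runs NER on a list of texts, and the default batch size is an arbitrary assumption.

```python
from itertools import islice


def batched(texts, batch_size):
    """Yield lists of up to `batch_size` texts from any iterable, lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(texts)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def process_in_batches(texts, extract, batch_size=64):
    """Apply a hypothetical per-batch `extract` function and flatten the results."""
    for batch in batched(texts, batch_size):
        yield from extract(batch)
```

Because both functions are generators, a large corpus never has to be held in memory at once. Python 3.12 and later also provide `itertools.batched`, which yields tuples instead of lists.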

## SCREENSHOTS

### NER Example
``` mermaid
graph LR
A[Start] --> B[Text Input];
B --> C[NER Analysis];
C --> D{Entities Extracted};
D -->|Person| E[Anuska];
D -->|GPE| F[New York];
D -->|Organization| G[Google];
D -->|Date| H[January 1st, 2025];
```