Multi-modal Context-based Sarcasm Detection Using AI

Project Overview

This project explores the potential of AI to detect sarcasm, a complex form of verbal irony in which statements convey the opposite of their literal meaning, typically for humor or passive aggression. The research introduces a framework that combines ensemble methods with early and late fusion techniques to detect sarcasm from multi-modal inputs: text, audio, and visual data.

Objectives

  • To develop a framework that can accurately detect sarcasm using multi-modal data.
  • To enhance the detection accuracy by employing ensemble techniques combined with advanced feature extraction models.
  • To investigate whether sarcasm is a speaker-dependent or speaker-independent phenomenon.

Key Contributions

  • VGGish Transfer Learning for Audio Data: Utilizes advanced audio feature extraction, leveraging the VGGish model to generate audio embeddings that significantly outperform traditional methods (see the audio-embedding sketch after this list).
  • State-of-the-Art Facial Expression Recognition: Employs a top-performing model trained on the Facial Expression Recognition (FER) dataset to extract subtle emotional cues from visual data.
  • Multimodal Fusion Ensemble Framework (MFEF): A novel architecture that integrates ensemble models with fusion techniques to improve sarcasm detection.
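
The repository does not spell out its audio pipeline, so the following is a minimal sketch of VGGish embedding extraction using the community torchvggish port loaded via torch.hub; the file name and the mean-pooling step are illustrative assumptions.

    # Minimal sketch: extracting VGGish audio embeddings.
    # Assumption: the torchvggish port (harritaylor/torchvggish) stands in
    # here for illustration; the original VGGish is a TensorFlow model.
    import torch

    # Load the pretrained VGGish model (weights download on first run).
    vggish = torch.hub.load('harritaylor/torchvggish', 'vggish')
    vggish.eval()

    # torchvggish accepts a WAV file path and returns one 128-d embedding
    # per ~0.96 s frame of audio, shape (num_frames, 128).
    with torch.no_grad():
        embeddings = vggish.forward('utterance.wav')  # hypothetical file name

    # Mean-pool the frame embeddings into one utterance-level vector.
    utterance_embedding = embeddings.mean(dim=0)  # shape (128,)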

Framework

The proposed architecture integrates textual, audio, and visual inputs to detect sarcasm in conversational data. The framework includes the following components (a minimal sketch of the fusion and ensemble steps appears after the list):

  1. Textual Model: Utilizes a case-sensitive BERT model to generate text embeddings, which capture contextual information and speaker nuances.
  2. Audio Model: Uses the VGGish model to extract robust audio features, capturing the tonal complexities associated with sarcasm.
  3. Visual Model: Extracts facial embeddings using a VGG-FER model, focusing on expressions that typically accompany sarcastic utterances.
  4. Early Fusion Model: Concatenates embeddings from the textual, audio, and visual models to create a comprehensive feature set for sarcasm detection.
  5. Late Fusion Model: Combines predictions from individual modalities using a logistic regression model.
  6. Ensemble Model: Averages probabilities from the Early and Late Fusion models, enhancing the overall performance.
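
To make these steps concrete, the sketch below traces the pipeline under stated assumptions; it is an illustration, not the repository's code. The audio and visual embeddings (items 2-3) are assumed precomputed, bert-base-cased stands in for the case-sensitive BERT, and scikit-learn logistic regressions stand in for the per-modality classifiers.

    # Minimal sketch of the MFEF pipeline (illustrative assumptions noted
    # in comments; not the repository's exact code).
    import numpy as np
    import torch
    from sklearn.linear_model import LogisticRegression
    from transformers import BertModel, BertTokenizer

    # 1. Textual model: case-sensitive BERT, one [CLS] vector per utterance.
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    bert = BertModel.from_pretrained('bert-base-cased').eval()

    def text_embedding(utterance):
        inputs = tokenizer(utterance, return_tensors='pt', truncation=True)
        with torch.no_grad():
            out = bert(**inputs)
        return out.last_hidden_state[0, 0].numpy()  # 768-d [CLS] embedding

    # 4. Early fusion: concatenate modality embeddings, train one classifier.
    def early_fusion_probs(train_parts, y, test_parts):
        X_train = np.concatenate(train_parts, axis=1)
        X_test = np.concatenate(test_parts, axis=1)
        clf = LogisticRegression(max_iter=1000).fit(X_train, y)
        return clf.predict_proba(X_test)[:, 1]

    # 5. Late fusion: one classifier per modality, then a logistic-regression
    #    meta-model over the stacked per-modality sarcasm probabilities.
    def late_fusion_probs(train_parts, y, test_parts):
        train_probs, test_probs = [], []
        for X_tr, X_te in zip(train_parts, test_parts):
            clf = LogisticRegression(max_iter=1000).fit(X_tr, y)
            train_probs.append(clf.predict_proba(X_tr)[:, 1])
            test_probs.append(clf.predict_proba(X_te)[:, 1])
        meta = LogisticRegression().fit(np.column_stack(train_probs), y)
        return meta.predict_proba(np.column_stack(test_probs))[:, 1]

    # 6. Ensemble: average early- and late-fusion sarcasm probabilities,
    #    then threshold at 0.5.
    # p_ensemble = (p_early + p_late) / 2

In a real run the late-fusion meta-model would be fit on held-out or cross-validated modality probabilities rather than the in-sample ones used above for brevity.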

Experimental Results

The framework was tested under two experimental setups: speaker-dependent and speaker-independent sarcasm detection. The Ensemble model achieved the highest performance, demonstrating a 10% improvement over existing methods in sarcasm detection.

| Model        | Speaker-Dependent F1 Score | Speaker-Independent F1 Score |
|--------------|----------------------------|------------------------------|
| Textual      | 0.81                       | 0.75                         |
| Audio        | 0.73                       | 0.70                         |
| Visual       | 0.57                       | 0.58                         |
| Early Fusion | 0.87                       | 0.85                         |
| Late Fusion  | 0.83                       | 0.75                         |
| Ensemble     | 0.89                       | 0.85                         |
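
Assuming the scores above are standard binary F1 on the sarcastic class (the table does not state the averaging scheme), they can be computed with scikit-learn:

    # Binary F1 on the sarcastic class (assumed metric variant).
    from sklearn.metrics import f1_score

    y_true = [1, 0, 1, 1, 0, 0]  # hypothetical gold labels (1 = sarcastic)
    y_pred = [1, 0, 1, 0, 0, 1]  # hypothetical model predictions
    print(f1_score(y_true, y_pred))  # F1 = 2PR / (P + R)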

Installation

  1. Clone the repository:

    git clone https://github.com/Vijeth-Rai/multimodal-sarcasm-detection.git
    cd multimodal-sarcasm-detection
  2. Create a virtual environment and activate it:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r requirements.txt

Usage

  1. Data Preparation: Place your text, audio, and video data in the appropriate directories as specified in the configuration file.

  2. Training the Models: Run the training scripts for each modality:

    python train_text_model.py
    python train_audio_model.py
    python train_visual_model.py
  3. Fusion and Ensemble: Combine the modality outputs and run the ensemble model:

    python run_ensemble.py
  4. Evaluation: Use the evaluation scripts to measure the performance:

    python evaluate.py

Datasets

This research utilizes the MUStARD++ dataset, a multimodal dataset designed for sarcasm detection. The dataset includes text, audio, and visual data, providing a comprehensive foundation for training and evaluating the models.

Future Work

  • Expanding the dataset to reduce the margin of error and improve model robustness.
  • Exploring real-time sarcasm detection capabilities.
  • Investigating new multi-modal transfer learning models trained on larger classification datasets.

Citation

If you use this code or dataset, please cite:

@article{rai2024multimodal,
  title={When words are not enough: Multi-modal context-based sarcasm detection using AI},
  author={Vijeth Rai},
  journal={MSc Data Science and Artificial Intelligence},
  year={2024}
}

Contact

For questions or feedback, please contact Vijeth Rai at vijethrai98@gmail.com.


This README provides a comprehensive overview of the project, installation instructions, and usage guidelines to help you get started with multi-modal sarcasm detection using AI.
