Natural Language Processing Course Project of Group 19 at IIT Kharagpur (Autumn'21)
Group Members:
- Abhikhya Tripathy (19EC10085)
- Debaditya Mukhopadhyay (19IE10036)
- Aditya Basu (19IE10002)
- Angana Mondal (19IE10039)
Joint-event-extraction is a significant emerging application of NLP techniques which involves extracting structural information (i.e., event triggers, arguments of the event) from unstructured real-world corpora. Existing methods for this task rely on complicated pipelines prone to error propagation.
An encoder-decoder based architecture for joint entity-relation extraction was proposed by Tapas Nayak et al., and we further develop the architecture to deploy it to predicting trigger, argument and relation tuple (including the classes of the trigger and argument). We also utilise pretrained BERT embeddings to preprocess our data.
The data is available at: https://drive.google.com/drive/u/1/folders/1fYP9PUQYRV0JWBa-N3CwuGkOCeBielT9
To obtain the Word2Vec embeddings and BERT embeddings, download 'w2v.txt' and 'BERT_embeddings.txt' from the aforementioned link.
- Python 3.5 +
- Pytorch 1.1.0
- CUDA 8.0
-
Python3:
python3 Joint_Event_Extraction.py gpu_id random_seed source_data_dir target_data_dir train/test w2v/bert -
IPython: Run individual cells of
NLP_Proj6_Grp_19.ipynb-
set the data folder and the output saving folder:
src_data_folder = pathtrg_data_folder = path + 'Model' -
for switching between training and testing phases, set the following parameters under
if __name__ == "__main__":,job_mode = 'train'orjob_mode = 'test' -
for switching between Word2Vec and BERT embeddings, set the following parameters under
if __name__ == "__main__":,embedding_type = 'w2v'orembedding_type = 'bert'
-
Command Line Arguments
Source_data_dir: Path to source data directoryTarget_data_dir: Path to target data directorytrain/test: Job mode (Chooseonly oneof the two modes at once)w2v/bert: Embedding type
Default Command Line Arguments for Google Colaboratory
- os.environ[‘CUDA_VISIBLE_DEVICES’] (=
gpu_id) = ‘0’ random_seed= 42
