by Matt Burke & Marios Tsekitsidis
Our solution to the anaphora challenge uses state-of-the-art methodology from the Kenton Lee et al. (2018) paper and its implementation, e2e-coref, both discussed in detail in Section 1.2. The most notable advantages of our implementation are described below.
The model we implemented is derived from the paper “Higher Order Coreference Resolution with Coarse-to-Fine Inference”, published at NAACL 2018 by Kenton Lee, Luheng He, and Luke Zettlemoyer. To our knowledge, the F1 score this paper reports on coreference resolution is the highest published to date. The authors also graciously open-sourced their code for this model, allowing us to replicate their results and modify the model to suit our needs. We include the option to use the pretrained weights of the model the authors trained on various datasets, as well as the option to train the model on the WikiCoref dataset or other datasets in the same format.
This model is the culmination of several recent incremental improvements in neural models for coreference resolution. The superior effectiveness of neural models for this task was first demonstrated by Clark and Manning in their 2016 papers “Deep reinforcement learning for mention-ranking coreference models” and “Improving coreference resolution by learning entity-level distributed representations”. Their models were integrated into several NLP pipelines, including Stanford’s CoreNLP and HuggingFace’s neuralcoref. These models are relatively computationally efficient, but we prioritized achieving the highest F1 score on our task over computational efficiency.
The first major improvement in neural coreference models came in 2017 with Lee et al.’s paper “End-to-end Neural Coreference Resolution”, which achieved an F1 score of 67.2, a 1.5-point improvement over Clark and Manning’s model. The authors also open-sourced their code, allowing others to build on their results. The model was then improved by Peters et al. in 2018 by using ELMo word embeddings as the model’s input in place of the previously used GloVe embeddings. This end-to-end model with ELMo embeddings is the backbone of our model.
Our model applies the end-to-end neural model iteratively in order to detect second-order references. Clusters are detected by the end-to-end model on a first pass, and these clusters are then fed into the model once more. This allows clusters that refer to the same entity, but were too far apart to be grouped together on the first pass, to be correctly merged into one cluster. The approach achieves an average F1 score of 73.0, as reported in Table 1 of the “Higher Order Coreference Resolution with Coarse-to-Fine Inference” paper.
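As a toy numeric illustration of this second pass (a sketch only; the paper's actual mechanism uses a learned gate and attention over antecedent distributions, and all names below are illustrative), each span representation can be interpolated with its expected antecedent representation:

```python
def refine_spans(reps, antecedent_probs, gate=0.5):
    """One (toy) iteration of higher-order refinement: each span's
    representation is mixed with the expected representation of its
    antecedents under the model's antecedent distribution."""
    refined = []
    for i, probs in enumerate(antecedent_probs):
        # Expected antecedent representation: sum_j P(j) * rep_j.
        expected = sum(p * reps[j] for j, p in probs.items())
        refined.append(gate * reps[i] + (1 - gate) * expected)
    return refined

# Two spans; span 1's antecedent distribution points mostly at span 0,
# so its representation is pulled toward span 0's.
reps = [1.0, 5.0]
probs = [{0: 1.0}, {0: 0.8, 1: 0.2}]
print(refine_spans(reps, probs))  # → [1.0, 3.4]
```

After refinement, the antecedent scores are recomputed with the updated representations, which is what lets distant mentions of the same entity find each other on the second pass.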
One of the major challenges of implementing our model was adapting it to the WikiCoref dataset. Rather than changing our model away from the CoNLL format that Lee et al. trained on, we converted the WikiCoref documents into the CoNLL format and trained on those. Our model takes in a list of words, makes predictions in the CoNLL format, and then transforms those predictions into the OntoNotes format used by the WikiCoref dataset. The functions that transform CoNLL output into the OntoNotes format are provided in helpers.py.
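As a minimal sketch of the CoNLL side of this conversion (illustrative only, not the actual code in helpers.py), coreference clusters over token spans map onto the CoNLL-2012 coreference column like so:

```python
def conll_coref_column(num_tokens, clusters):
    """clusters: {cluster_id: [(start, end), ...]} with inclusive token
    indices. Returns one CoNLL-2012 coref label per token: '(0' opens
    cluster 0, '0)' closes it, '(0)' marks a one-token mention, and '-'
    means the token is in no mention."""
    labels = ["-"] * num_tokens

    def attach(i, part):
        # Multiple mentions on one token are joined with '|'.
        labels[i] = part if labels[i] == "-" else labels[i] + "|" + part

    for cid, spans in clusters.items():
        for start, end in spans:
            if start == end:
                attach(start, "({0})".format(cid))
            else:
                attach(start, "({0}".format(cid))
                attach(end, "{0})".format(cid))
    return labels

# "Obama said he won": 'Obama' and 'he' corefer (cluster 0).
print(conll_coref_column(4, {0: [(0, 0), (2, 2)]}))
# → ['(0)', '-', '(0)', '-']
```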
This script contains method(s) that perform coreference prediction.
Creates a model with predetermined hyperparameters. If the pretrained boolean is true, the weights of a model trained by the authors are stored in logs and can be used to make predictions. Loading word embeddings to set up the model can take some time.
Predicts the coref clusters in a document, given as a list of words, and writes these predictions to a file in a modified OntoNotes scheme matching the WikiCoref dataset's scheme.
Simply turns a list of words into a continuous block of text.
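That utility amounts to a whitespace join (a sketch; the function name here is illustrative, not necessarily the one in our script):

```python
def words_to_text(words):
    # Join tokens with single spaces into one continuous block of text.
    return " ".join(words)

print(words_to_text(["Barack", "Obama", "was", "president", "."]))
# → Barack Obama was president .
```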
This script contains methods for training a coreference resolution model.
The trainer must be initialized with the AnaphoraModel you wish to train.
Overloads train_model_conll.
Replaces the WikiCoref format documents with corresponding CoNLL format documents, and trains the AnaphoraModel on these documents.
Writes the average F1 score, precision, and recall of the trained model to the console.
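For reference, the reported F1 score is the harmonic mean of precision and recall:

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.75, 0.71), 3))  # → 0.729
```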
This script contains auxiliary method(s) for our own testing.
Reads the xml document (filename) of an article’s words (e.g. Barack Obama_words.xml) and returns a python list of them. Used in testing our anaphora model predictor.
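A hedged sketch of such a reader, assuming the MMAX2-style layout used by WikiCoref in which each token is the text of a `<word>` element (the element and attribute names are assumptions, and this version parses a string rather than a filename for self-containment):

```python
import xml.etree.ElementTree as ET

def read_words(xml_text):
    """Parse an MMAX2-style words document and return its tokens in order."""
    root = ET.fromstring(xml_text)
    return [w.text for w in root.iter("word")]

sample = """<words>
  <word id="word_1">Barack</word>
  <word id="word_2">Obama</word>
</words>"""
print(read_words(sample))  # → ['Barack', 'Obama']
```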
This script contains helper methods for key operations such as detecting Freebase and/or Wikidata topics and outputting predictions in the WikiCoref format.
Compares the counts for an attribute within a JSON vs. a corresponding xml file.
Counts how many elements an attribute (/field) in a JSON contains. Used to count how many spans and how many clusters our predictor detected.
Counts how many elements an attribute (/field) in an xml contains. Used to count how many spans and how many clusters our predictor detected.
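A minimal sketch of these counting helpers (the field and tag names below are placeholders, not necessarily the ones in our files):

```python
import json
import xml.etree.ElementTree as ET

def count_in_json(json_text, field):
    # Number of elements stored under `field` in a JSON object.
    return len(json.loads(json_text).get(field, []))

def count_in_xml(xml_text, tag):
    # Number of elements with the given tag anywhere in an XML document.
    return sum(1 for _ in ET.fromstring(xml_text).iter(tag))

json_doc = '{"clusters": [[[0, 1]], [[3, 3], [5, 5]]]}'
xml_doc = '<doc><cluster/><cluster/></doc>'
# The comparison checks that both files describe the same number of clusters.
print(count_in_json(json_doc, "clusters") == count_in_xml(xml_doc, "cluster"))
# → True
```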
Returns a dictionary whose keys are spans (represented as strings) and values are the clusters that spans belong to (also represented as strings).
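Assuming the predictor's JSON stores each cluster as a list of [start, end] spans (an assumption about the exact output shape, as is the span key format), the mapping could be built as:

```python
def span_to_cluster(predicted_clusters):
    """Map each span (as a string key) to its cluster id (as a string)."""
    mapping = {}
    for cluster_id, cluster in enumerate(predicted_clusters):
        for start, end in cluster:
            mapping["{0}-{1}".format(start, end)] = str(cluster_id)
    return mapping

clusters = [[[0, 1], [4, 4]], [[7, 8]]]
print(span_to_cluster(clusters))
# → {'0-1': '0', '4-4': '0', '7-8': '1'}
```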
Reads the predictor's output from a JSON file and outputs an xml with the predicted coreference information, as well as predicted Wikidata and Freebase topic information, in the WikiCoref format. (Pre-condition: the predictor must have run and its results must have been stored in a JSON file).
Returns a set with all values that show up for a tag in an xml.
Searches for the string str in the Wikidata database. Once the corresponding entity is found in Wikidata, looks for the Freebase ID property among the entity's properties. Returns an array [wikidata_uri, freebase_uri] containing the Wikidata URI and Freebase URI of the searched string's topic.
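On Wikidata, the Freebase ID is property P646. Below is a hedged, offline sketch of just the extraction step: the live version would first query the Wikidata search API (omitted here), and the Freebase URI scheme shown is an assumption.

```python
def freebase_uri_from_entity(entity):
    """Extract the Freebase ID (Wikidata property P646) from a
    wbgetentities-style entity dict and build both URIs."""
    wikidata_uri = "https://www.wikidata.org/entity/" + entity["id"]
    claims = entity.get("claims", {}).get("P646", [])
    freebase_uri = None
    if claims:
        # Freebase IDs look like '/m/02mjmr'.
        mid = claims[0]["mainsnak"]["datavalue"]["value"]
        freebase_uri = "https://freebase.com" + mid
    return [wikidata_uri, freebase_uri]

# Toy entity dict shaped like a Wikidata API response for Barack Obama.
entity = {
    "id": "Q76",
    "claims": {"P646": [{"mainsnak": {"datavalue": {"value": "/m/02mjmr"}}}]},
}
print(freebase_uri_from_entity(entity))
# → ['https://www.wikidata.org/entity/Q76', 'https://freebase.com/m/02mjmr']
```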
Writes an .mmax file with the appropriate content to enable importing to MMAX2.
Training our model on a personal laptop can be very time consuming, so we tested running and training our model on the publicly available TPUs in Google Colaboratory. A copy of a Colab notebook showing the exact steps to train the model on Colab is included in our submission. You need to either clone our repository from within the notebook or upload our submission files to it. The sequence of commands to run is:
!git clone https://github.com/mattjburke/CorefChallenge.git
%cd CorefChallenge
!./setup_colab_reqs.sh
import anaphora_model, anaphora_model_trainer
Then the anaphora_model and anaphora_model_trainer modules can be used normally.
Unfortunately, we encountered a bug in this setup that we have not been able to resolve (specifically, coref_kernels.so is not created when coref_kernels.cc is compiled). Because of this, we also included a Colab notebook in our submission that only runs the e2e-coref code and does not use the competition-specified interface.
We are thankful for the implementation of the Kenton Lee et al. (2018) paper: https://github.com/kentonl/e2e-coref.
We gratefully acknowledge the source of our CoNLL-formatted WikiCoref documents: https://github.com/victoriasovereigne/WikiCoref-CoNLL.