Anaphora Challenge Resolution

by Matt Burke & Marios Tsekitsidis

1. Introduction

1.1. Summary

Our solution to the anaphora challenge uses state-of-the-art methodology from the Lee et al. (2018) paper and its implementation, e2e-coref. The model and the most notable aspects of our implementation are discussed in detail in Section 1.2.

1.2. Model Overview

The model we implemented is derived from the paper “Higher-Order Coreference Resolution with Coarse-to-Fine Inference”, published at NAACL 2018 by Kenton Lee, Luheng He, and Luke Zettlemoyer. To our knowledge, the F1 score this paper reports on coreference resolution is the highest published to date. The authors also graciously open-sourced their code for this model, allowing us to replicate their results and modify it to suit our needs. We include the option to use the pretrained weights of the model trained by the authors on various datasets, as well as the option to train the model on the WikiCoref dataset or other datasets in the same format.

This model is the culmination of several recent incremental improvements in neural-network models for coreference resolution. The superior effectiveness of neural models for this task was first demonstrated by Clark and Manning in their 2016 papers “Deep reinforcement learning for mention-ranking coreference models” and “Improving coreference resolution by learning entity-level distributed representations”. These models were integrated into several NLP pipelines, including Stanford’s CoreNLP and HuggingFace’s neuralcoref. They are relatively computationally efficient, but we prioritized achieving the highest F1 scores on our task over computational efficiency.

The first major improvement in neural coreference models came in 2017 with Lee et al.'s paper "End-to-end Neural Coreference Resolution", which reported an F1 score of 67.2, a 1.5-point improvement over Clark and Manning’s model. That paper also open-sourced its code, allowing others to improve upon its results. The model was then improved by Peters et al. in 2018 by using ELMo word embeddings as the input to the model instead of the previously used GloVe embeddings. This end-to-end model with ELMo embeddings is the backbone of our model.

Our model applies the end-to-end neural model iteratively in order to resolve higher-order (e.g. second-order) references. Clusters detected by the end-to-end model on a first pass are fed into the model once more, which allows clusters that refer to the same entity, but were too far apart to be grouped together on the first pass, to be correctly combined into one cluster. This achieves an average F1 score of 73.0, as reported in Table 1 of the “Higher-Order Coreference Resolution with Coarse-to-Fine Inference” paper.
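
To illustrate the idea, here is a minimal NumPy sketch of the gated, iterative refinement described in the paper. The score function and the scalar gate are stand-ins for the learned components of the real model; this is not the authors' code:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def refine_spans(g, score_fn, n_iters=2, gate=0.5):
    # g: (num_spans, dim) span representations.
    # score_fn(g): (num_spans, num_spans) pairwise antecedent scores.
    for _ in range(n_iters):
        p = softmax(score_fn(g))        # per-span antecedent distribution
        a = p @ g                       # expected antecedent representation
        g = gate * g + (1 - gate) * a   # gated update; the real gate is learned
    return g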

One of the major challenges of implementing our model was adapting it to the WikiCoref dataset. Rather than changing our model parameters away from the CoNLL format that Lee et al. trained their model on, we converted the WikiCoref documents into the CoNLL format and trained on those. Our model takes in a list of words, makes predictions in the CoNLL format, and then transforms the CoNLL predictions into the OntoNotes format used by the WikiCoref dataset. The functions that transform CoNLL output into the OntoNotes format are provided in helpers.py.

2. Documentation

Anaphora Model

This script contains method(s) that perform coreference prediction.

__init__(self, pretrained)
Arguments:
pretrained (bool): Uses the pretrained model if set to True.
Description:

Creates a model with predetermined hyperparameters. If the pretrained boolean is True, the weights of a model trained by the authors are stored in logs and can be used to make predictions. Loading the word embeddings to set up the model can take some time.
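
A hypothetical usage sketch (the import path assumes the anaphora_model module used in Section 3):

from anaphora_model import AnaphoraModel

# Load the authors' pretrained weights; expect a delay while embeddings load.
model = AnaphoraModel(pretrained=True)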

predict_example(self, words: list, destination: str)
Arguments:
words (list): A list of words on which to perform coreference prediction.
destination (str): The path where the .xml document with the predicted annotations will be output, along with the corresponding .mmax file that allows importing into the MMAX2 GUI.
Description:

Predicts the coreference clusters in a document, given as a list of words, and writes these predictions to a file in the modified OntoNotes scheme used by the WikiCoref dataset.
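
A hypothetical usage sketch, combining this method with doc2words (documented under Extras; the extras module name is an assumption):

from anaphora_model import AnaphoraModel
from extras import doc2words

model = AnaphoraModel(pretrained=True)
words = doc2words('Barack Obama_words.xml')
model.predict_example(words, 'output/')  # writes the .xml and .mmax files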

wordlist_to_block(list)
Arguments:
list (list): A list of words to be concatenated.
Description:

Simply turns a list of words into a continuous block of text.
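
In effect, something like the following sketch:

def wordlist_to_block(words):
    # Concatenate tokens into one whitespace-separated block of text.
    return ' '.join(words)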

Anaphora Model Trainer

This script contains methods for training a coreference resolution model.

__init__(self, model: AnaphoraModel)
Arguments:
model (AnaphoraModel): A model of type AnaphoraModel.
Description:

The trainer must be initialized with the AnaphoraModel you wish to train.

train_model(self, paths: list)
Arguments:
paths (list): A list of paths with the documents to be used for training.
Description:

Delegates to train_model_conll.

train_model_conll(self, paths: list)
Arguments:
paths (list): A list of paths with the documents to be used for training.
Description:

Converts the WikiCoref-format documents into corresponding CoNLL-format documents, and trains the AnaphoraModel on the converted documents.

evaluate_trained_model(self)
Description:

Writes the average F1 score, precision, and recall of the trained model to the console.
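
A hypothetical usage sketch (the trainer class name AnaphoraModelTrainer is assumed from the script name):

from anaphora_model import AnaphoraModel
from anaphora_model_trainer import AnaphoraModelTrainer

model = AnaphoraModel(pretrained=False)
trainer = AnaphoraModelTrainer(model)
trainer.train_model(['data/doc1', 'data/doc2'])  # hypothetical document paths
trainer.evaluate_trained_model()                 # prints average F1, precision, recall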

Extras

This script contains auxiliary method(s) for our own testing.

doc2words(filename)
Arguments:
filename (str): The name of the document from which to extract the list of words.
Description:

Reads the xml document (filename) of an article’s words (e.g. Barack Obama_words.xml) and returns a Python list of them. Used in testing our anaphora model predictor.
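
A minimal sketch of what this might look like, assuming the MMAX2 words.xml layout in which each <word> element holds one token as its text:

import xml.etree.ElementTree as ET

def doc2words(filename):
    root = ET.parse(filename).getroot()
    # Collect the text of every <word> element, in document order.
    return [w.text for w in root.iter('word')]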

Helpers

This script contains helper methods for key operations such as detecting Freebase and/or Wikidata topics and outputting predictions in the WikiCoref format.

compare_json2xml(attribute, path2predictedJSON, path2trueXML)
Arguments:
attribute (str): the attribute whose number of elements to count. Could be 'clusters' or 'top_spans'.
path2predictedJSON (str): the path to the JSON file.
path2trueXML (str): the path to the xml file.
Description:

Compares the counts for an attribute within a JSON vs. a corresponding xml file.

count_in_json(attribute, path2json)
Arguments:
attribute (str): the attribute whose number of elements to count. Could be 'clusters' or 'top_spans'.
path2json (str): the path to the JSON file.
Description:

Counts how many elements an attribute (/field) in a JSON contains. Used to count how many spans and how many clusters our predictor detected.

count_in_xml(attribute, path2xml)
Arguments:
attribute (str): the attribute whose number of elements to count. Could be 'coref_class' or 'span'.
path2xml (str): the path to the xml file.
Description:

Counts how many elements an attribute (/field) in an xml contains. Used to count how many spans and how many clusters our predictor detected.
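
Minimal sketches of the two counters, assuming the e2e-coref JSON output layout and an xml layout in which the attribute appears on markable elements:

import json
import xml.etree.ElementTree as ET

def count_in_json(attribute, path2json):
    # Count the elements stored under `attribute` (e.g. 'clusters').
    with open(path2json) as f:
        return len(json.load(f).get(attribute, []))

def count_in_xml(attribute, path2xml):
    # Count the distinct values of `attribute` (e.g. 'coref_class') in the file.
    root = ET.parse(path2xml).getroot()
    return len({el.get(attribute) for el in root.iter() if el.get(attribute)})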

spans_w_coref(path2json)
Arguments:
path2json (str): Path to the JSON file containing the prediction.
Description:

Returns a dictionary whose keys are spans (represented as strings) and whose values are the clusters those spans belong to (also represented as strings).
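
A sketch under the assumption that the prediction JSON stores 'clusters' as a list of clusters, each a list of [start, end] token spans (the e2e-coref output format):

import json

def spans_w_coref(path2json):
    with open(path2json) as f:
        clusters = json.load(f).get('clusters', [])
    # Map each span (as a string) to the index of the cluster it belongs to.
    return {str(span): str(i)
            for i, cluster in enumerate(clusters)
            for span in cluster}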

export2xml(path2json, destination)
Arguments:
path2json (str): Path to the JSON file containing the prediction.
destination (str): Path to the directory where the coreference annotations will be output as xml.
Description:

Reads the predictor's output from a JSON file and outputs an xml with the predicted coreference information, as well as predicted Wikidata and Freebase topic information, in the WikiCoref format. (Pre-condition: the predictor must have run and its results must have been stored in a JSON file).

xml_tag_values2set(tag, path2xml)
Arguments:
tag (str): The tag we are interested in exporting information about.
path2xml (str): Path to the xml from which to read.
Description:

Returns a set with all values that show up for a tag in an xml.

str2wikidata_freebase_uri(str)
Arguments:
str (str): The string to search for on Wikidata.
Description:

Searches for string str in the Wikidata database. Once the corresponding entity is found in Wikidata, looks for its Freebase ID property among the entity's properties. Returns an array [wikidata_uri, freebase_uri] containing the Wikidata URI and Freebase URI of the searched string's topic.
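
A minimal sketch using the public Wikidata web API (P646 is Wikidata's Freebase ID property; the exact URI formatting our helper produces may differ):

import requests

WIKIDATA_API = 'https://www.wikidata.org/w/api.php'

def str2wikidata_freebase_uri(s):
    # Look the string up on Wikidata.
    hits = requests.get(WIKIDATA_API, params={
        'action': 'wbsearchentities', 'search': s,
        'language': 'en', 'format': 'json'}).json().get('search', [])
    if not hits:
        return [None, None]
    qid = hits[0]['id']
    wikidata_uri = 'https://www.wikidata.org/entity/' + qid
    # Fetch the entity's Freebase ID (property P646), if it has one.
    claims = requests.get(WIKIDATA_API, params={
        'action': 'wbgetclaims', 'entity': qid,
        'property': 'P646', 'format': 'json'}).json().get('claims', {})
    p646 = claims.get('P646', [])
    freebase_id = p646[0]['mainsnak']['datavalue']['value'] if p646 else None
    return [wikidata_uri, freebase_id]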

createMMAXfile(words_xml_filename)
Arguments:
words_xml_filename (str): The name of the article's words.xml file.
Description:

Writes an .mmax file with the appropriate content to enable importing to MMAX2.

3. Running our code

Training our model on a personal laptop can be very time-consuming, so we tested running and training our model on publicly available TPUs in Google Colaboratory. A copy of a Colab notebook showing the exact steps to train the model on Colab is included in our submission. You need to either clone our repository from the notebook or upload our submission files to the notebook. The sequence of commands to run is:

!git clone https://github.com/mattjburke/CorefChallenge.git
%cd CorefChallenge
!./setup_colab_reqs.sh
import anaphora_model, anaphora_model_trainer

Then the classes in anaphora_model and anaphora_model_trainer can be used normally.
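
For instance, a hypothetical continuation inside the notebook, using the class names documented in Section 2:

model = anaphora_model.AnaphoraModel(pretrained=True)
words = ['Barack', 'Obama', 'was', 'born', 'in', 'Hawaii', '.']
model.predict_example(words, 'output/')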

Unfortunately, we encountered a bug in this setup that we have not been able to resolve (specifically, coref_kernels.so is not created when coref_kernels.cc is compiled). Because of this, we also included a Colab notebook in our submission that only runs the e2e-coref code and does not use the competition-specified interface.

Acknowledgements

We are thankful to Kenton Lee et al. for open-sourcing the implementation of their (2018) paper: https://github.com/kentonl/e2e-coref.

We gratefully acknowledge the source of our CoNLL-formatted WikiCoref documents: https://github.com/victoriasovereigne/WikiCoref-CoNLL.