Matthew Burke

Machine Learning Scientist

Projects

Most of the projects listed below can also be found on my GitHub page. However, some collaborative projects are hosted by a collaborator and may not be maintained or have links to code repositories. Some of my earlier projects also lack links because the organization and readability of that code do not reflect my current standard of quality. The GitHub icon at the end of each project description links to the project's repository. Projects without a link have a bar in the same location instead.



Crop Yield Prediction

When harvesting crops, combines need to adjust their speed depending on the thickness (yield) of the crops to keep the inflow of crops constant. A predicted yield map of the field enables the combine speed to be adjusted autonomously. These maps were created by leveraging machine learning, satellite imagery, and machine data from millions of fields. A pending patent contains more details. My team of 5 was recognized with the 2023 Innovation Award for this project.

TensorFlow, TFIO, AWS EC2 and EBS, PySpark, MLflow

2022


Crop Planting and Harvest Date Predictions

This machine learning model uses a series of satellite images to predict when a crop was planted and when it will be harvested. It was later developed into a product that helps farmers prioritize which fields to harvest first, which is often an important logistical decision.

TensorFlow, TFIO, AWS EC2 and EBS, PySpark, MLflow

2022


Tableau Dashboards

Created dozens of Tableau dashboards that provide insights to stakeholders throughout John Deere, with scheduled queries refreshing the data periodically.

SQL, PySpark, Databricks, Tableau

2021



Stacked GANs for Learning Additional Features of Image Segmentation Maps

It has been shown that image segmentation models can be improved with an adversarial loss. Additionally, previous analysis of adversarial examples in image classification has shown that image datasets contain features that are not easily recognized by humans. This work investigates the effect of using a second adversarial loss to further improve image segmentation. The proposed model stacks two generative adversarial networks: the first generator takes an image as input and generates a segmentation map, and the second generator takes this predicted segmentation map as input and predicts its errors relative to the ground truth segmentation map. If these errors contained additional features that are not easily recognized by humans, a discriminator could possibly learn them. The proposed model did not consistently show significant improvement over a single generative adversarial model, casting doubt on the existence of such features.

Python, PyTorch

2020

Coreference Resolution

Winning submission for the Kingland Machine Learning Competition 2019 on coreference resolution. Marios Tsekitsidis and I competed as a 2-person team and won first place out of 49 student teams. Coreference resolution is the task of determining which entity a pronoun refers to (e.g. what "it" or "he" refers to). We adapted a state-of-the-art end-to-end model for use on the WikiCoref dataset. We also briefly explain the theory behind this model and the incremental improvements to coreference models over the years that culminated in it.

Python, TensorFlow

2019

Kallisto RNA Sequence Quantification

Two classmates and I reimplemented the main functions of the popular C++ RNA sequence quantification program kallisto in C. RNA sequence quantification is an important tool for determining the functions of cells. Given a list of RNA sequences to quantify (transcripts), and a larger set of RNA sequences from a cell (reads), the goal is to quantify the relative abundance of each transcript in the reads. We tested our implementation using simulated RNA sequence data in the FASTA file format.

C, FASTA files

2019

Multiple Sequence Alignment

A demonstration of some dynamic programming algorithms to find the optimal multiple sequence alignment of several DNA sequences.

Java

2019
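As a rough illustration of the dynamic programming idea behind the alignment project above, here is a minimal Python sketch of Needleman-Wunsch global alignment for two sequences. It is not the project's Java code. Exact multiple sequence alignment extends the same recurrence to a table with one dimension per sequence. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not taken from the project.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell is the best of: substitute, gap in b, gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                aligned_a.append(a[i - 1])
                aligned_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` aligns the shorter sequence with a single gap. The table grows exponentially with the number of sequences, which is why exact multiple alignment is practical only for a few short sequences.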


Protein Structure Prediction

For this project I reimplemented a paper that used a ResNet architecture for predicting protein distograms, and compared the effectiveness of using an Attention Augmented CNN for the same purpose.

Python, PyTorch

2019


Examining Proteins on Protein Data Bank

A simple tutorial on some basic manipulations of PDB files.

Python, PDB files

2019


Graph Convolutional Networks

A demonstration of various approaches to creating node embeddings and whole-graph embeddings for downstream prediction tasks on graph-structured data, including predicting the properties of small molecules for drug discovery.

Python, PyTorch, PyTorch Geometric

2019


Scaling Up Convolutional Neural Networks

Primarily inspired by EfficientNet, we formulated an alternative method for optimally scaling up CNNs. We first measured the accuracy of a number of small networks with differing hyperparameters, and then used a performance scaling prediction algorithm to determine the optimal scaling of the hyperparameters. This work was done during the International Research Experience Program (IREP) at Technische Universität Darmstadt in Darmstadt, Germany.

Python, TensorFlow

2019


Stock Prediction Improvement with News Articles

In this class project with two other students, we created a model to predict whether the S&P 500 index price would increase or decrease each day. Our model took typical time series analysis inputs such as the simple moving average, the exponential moving average, and high-low price differences, as well as the predicted sentiment of daily news articles related to the stock market. The sentiment of the articles was predicted using a naive Bayes classifier, and including it slightly improved the model's performance.

Python, scikit-learn, Jupyter Notebook

2019
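The time series inputs named in the stock prediction project above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the project's code. The function names, the window and span parameters, and the choice to seed the exponential moving average with the first price are all assumptions.

```python
def simple_moving_average(prices, window):
    """Return the mean of each full trailing window of `window` prices."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


def exponential_moving_average(prices, span):
    """Return the EMA of prices, seeded with the first price (an assumption).

    Uses the common smoothing factor alpha = 2 / (span + 1).
    """
    if not prices:
        return []
    alpha = 2.0 / (span + 1)
    ema = [float(prices[0])]
    for price in prices[1:]:
        # New value blends today's price with yesterday's average.
        ema.append(alpha * price + (1 - alpha) * ema[-1])
    return ema


def high_low_range(highs, lows):
    """Return the daily high-low price difference."""
    return [high - low for high, low in zip(highs, lows)]
```

A classifier would then receive these values for each day alongside that day's predicted news sentiment.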


Checkers Agent

A checkers-playing agent that will play against a user. The agent uses alpha-beta search to determine the optimal move to make.

Java

2018

Naive Bayes News Article Classification

This is an implementation of the naive Bayes algorithm from scratch. The naive Bayes model is used to classify news articles from the 20 Newsgroups dataset into 20 groups.

Java

2018
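A from-scratch naive Bayes text classifier like the one described above can be sketched compactly. The version below is a Python illustration, not the project's Java implementation. It assumes a multinomial model with Laplace smoothing and pre-tokenized documents. The class name, the smoothing default, and the toy data in the usage note are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over lists of word tokens (illustrative sketch)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing  # Laplace smoothing constant (assumed value)

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for words, label in zip(documents, labels):
            self.word_counts[label].update(words)
            self.vocabulary.update(words)
        self.total_docs = len(labels)
        return self

    def predict(self, words):
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior, then add per-word log likelihoods.
            score = math.log(doc_count / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            denominator = total_words + self.smoothing * vocab_size
            for word in words:
                if word not in self.vocabulary:
                    continue  # Ignore words never seen during training.
                count = self.word_counts[label][word]
                score += math.log((count + self.smoothing) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

With toy training documents such as `["goal", "team", "match"]` labeled `"sports"` and `["cpu", "gpu", "memory"]` labeled `"tech"`, calling `predict(["gpu", "memory"])` returns `"tech"`. Working in log space avoids numerical underflow when documents contain many words.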

Ensemble Classifier

This was one of my very first machine learning projects. This classifier uses an ensemble of 7 simple scikit-learn classifiers to achieve higher prediction accuracy than any of the individual classifiers. The 7 classifiers are a random forest model, an AdaBoost classifier, a multilayer perceptron, a k-nearest-neighbors model, a logistic regression model, a naive Bayes model, and a decision tree.

Python, scikit-learn

2018

Android Application

This is one of the first large projects that I completed, working with 3 other students over the course of a semester. Our Android application allowed users to share their location when they were at participating bars or restaurants, and to receive special promotional deals for inviting friends to the same location or for visiting particular venues a certain number of times per week. Users' information was stored in an SQL database, and the app accessed and updated this information via a PHP script hosted on a server.

Java, XML, SQL, PHP

2017