Matriculation Prediction

This project, built with a team of graduate students for our NLP capstone, aimed to predict college matriculation from student application essays. I led the effort, designing and implementing a deep hierarchical transformer model using DistilBERT with chunked input to process long-form text.
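
As a rough illustration of chunked input, the sketch below splits a tokenized essay into overlapping windows that fit DistilBERT's 512-token limit. The function name `chunk_tokens`, the 510-token window (leaving room for two special tokens), and the 255-token stride are assumptions made for this example, not the project's actual settings.

```python
from typing import List


def chunk_tokens(token_ids: List[int], max_len: int = 510, stride: int = 255) -> List[List[int]]:
    """Split a token sequence into overlapping windows of at most max_len tokens.

    max_len and stride are illustrative defaults (assumptions), not values
    taken from the project.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if not token_ids:
        return []

    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_len])
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(token_ids):
            break
        start += stride
    return chunks
```

Each window would then be encoded separately, and the per-chunk representations combined by a second-level model to give one prediction per essay.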
The transformer's output probabilities were combined with those of a traditional Random Forest model and passed to a meta-level neural network (an MLP) that served as the ensemble, producing a final classifier with precision, recall, and F1 scores all exceeding 0.94. Adding confidence-based features from each model helped the ensemble reach 98.9% accuracy on the final holdout set.
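
A minimal sketch of how meta-level inputs for such an ensemble could be assembled is shown below. The specific confidence features (distance from the 0.5 decision boundary, and disagreement between the two models) and the names `confidence` and `build_meta_features` are hypothetical; the features actually used in the project are not described here.

```python
from typing import List, Sequence


def confidence(p: float) -> float:
    """Distance of a probability from the 0.5 decision boundary, scaled to [0, 1]."""
    return abs(p - 0.5) * 2.0


def build_meta_features(
    p_transformer: Sequence[float],
    p_forest: Sequence[float],
) -> List[List[float]]:
    """Build one feature row per sample for a meta-level classifier.

    Row layout (an assumption for illustration):
    [transformer prob, forest prob, transformer confidence,
     forest confidence, disagreement between the two models]
    """
    if len(p_transformer) != len(p_forest):
        raise ValueError("both models must score the same number of samples")

    rows: List[List[float]] = []
    for pt, pf in zip(p_transformer, p_forest):
        rows.append([pt, pf, confidence(pt), confidence(pf), abs(pt - pf)])
    return rows
```

Rows like these would be the training input for the meta-level MLP. To avoid leakage, the base-model probabilities should come from out-of-fold predictions rather than from data the base models were trained on.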
I also implemented stratified cross-validation, class weighting, and extensive evaluation metrics.
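
The sketch below shows the general idea behind both techniques in plain Python: class weights inversely proportional to class frequency, and folds that preserve the class ratio. The function names and the round-robin fold assignment are assumptions for illustration. The weighting formula matches the common "balanced" heuristic, n_samples / (n_classes * class_count), and is not necessarily what the project used.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def stratified_folds(labels: Sequence[Hashable], n_splits: int = 5, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to n_splits folds, keeping class ratios similar in each."""
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices out round-robin.
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    return folds
```

Each fold serves once as the validation set while the remaining folds form the training set, and the class weights are passed to the loss function or classifier so that the minority class is not ignored.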
Here is a link to our GitHub repository for the project.