Problem + Motivation
According to the American Society for the Prevention of Cruelty to Animals (ASPCA), roughly 920,000 shelter animals are euthanized each year. I wanted to address this disheartening statistic with the skills gained from my machine learning and data analysis course. Using Petfinder.com’s API, I collected data on adoptable dogs and prepared it through multiple iterations of cleaning and filtering. Using modeling techniques including logistic regression and decision trees, I sought to identify which attributes of a dog’s profile were most strongly associated with a successful adoption. With this information, shelters could focus on the factors that lead to adoptions, easing overcrowding and, in turn, lowering euthanasia rates.

Data Collection + Cleaning
Petfinder.com’s API was used to collect information on adoptable dogs. The data sets provided columns describing each dog’s attributes (e.g., breed, color, state where the dog was found, and age). Cleaning also uncovered duplicate and faulty values, such as values recorded in the wrong column. All remaining NaN values were filled with “False” or “unknown” so that the models could be trained on complete data.
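The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the sample rows, the column names (`id`, `breed`, `color`, `house_trained`), and the helper `clean_dogs` are all hypothetical, and the choice of which columns receive “False” versus “unknown” is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; these column names are illustrative and are
# not taken from the real Petfinder schema.
raw = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "breed": ["Beagle", "Beagle", np.nan, "Poodle"],
    "color": ["Brown", "Brown", "Black", np.nan],
    "house_trained": [True, True, np.nan, False],
})


def clean_dogs(df, bool_cols, text_cols):
    """Drop exact duplicate rows, then fill the remaining missing values."""
    out = df.drop_duplicates().copy()
    for col in bool_cols:
        # Assumption: a missing yes/no attribute is treated as False.
        out[col] = out[col].map(lambda v: False if pd.isna(v) else bool(v))
    for col in text_cols:
        # Assumption: a missing text attribute becomes the label "unknown".
        out[col] = out[col].fillna("unknown")
    return out.reset_index(drop=True)


clean = clean_dogs(raw, bool_cols=["house_trained"], text_cols=["breed", "color"])
print(clean)
```

Filling with an explicit “unknown” label, rather than dropping rows, keeps every profile in the data set and lets a model treat missingness as its own category.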
Data Prior to Cleaning
Data Post Cleaning
Modeling
Baseline Model Accuracy
Cross Validated Decision Tree Accuracy
Decision Tree Accuracy
Logistic Regression Accuracy
Results
After fitting the four models (baseline, decision tree, cross-validated decision tree, and logistic regression), we found that the cross-validated decision tree (CART) had the highest accuracy, at 0.78.
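A comparison of this kind could be set up along the following lines with scikit-learn. This is a sketch under stated assumptions, not the project's code: the data is synthetic, the feature names are hypothetical, the baseline is assumed to be a majority-class predictor, and "cross-validated" is assumed to mean choosing the tree depth by 5-fold cross-validation. The accuracies it prints are for the synthetic data only and say nothing about the 0.78 reported here.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned data; feature names are hypothetical.
rng = np.random.default_rng(0)
n = 400
dogs = pd.DataFrame({
    "breed": rng.choice(["Beagle", "Poodle", "Mixed"], size=n),
    "shots_current": rng.integers(0, 2, size=n).astype(bool),
    "house_trained": rng.integers(0, 2, size=n).astype(bool),
})
# Synthetic label: adoption is made more likely by shots plus house training.
adopted = (dogs["shots_current"] & dogs["house_trained"]) | (rng.random(n) < 0.2)

# One-hot encode the categorical column so every model receives numeric input.
X = pd.get_dummies(dogs, columns=["breed"]).astype(int)
y = adopted.astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    # Assumed baseline: always predict the most frequent class.
    "baseline": DummyClassifier(strategy="most_frequent"),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    # Assumed meaning of "cross-validated": tune max_depth with 5-fold CV.
    "cv_decision_tree": GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid={"max_depth": [2, 3, 5, None]},
        cv=5,
        scoring="accuracy",
    ),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Fit each model on the training split and score accuracy on the held-out split.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

Scoring every model on the same held-out split keeps the comparison fair, and the majority-class baseline shows how much of each accuracy figure comes from class imbalance alone.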
Conclusion + Implications
Based on the results from our models, we conclude that a dog’s breed and whether it is up to date on shots, house trained, good with children, and fixed are all strong predictors of adoption success. We hope these results encourage shelters to keep up with shots, house training, and the other factors that are critical to successful adoptions.
One area for improvement is adding a sentiment score derived from the description column of each dog’s adoption profile. A sentiment score is an NLP-based feature that captures the tone and nuance of the language used. We hypothesize that most profiles would have a positive sentiment score, but we believe it would be productive to see whether there is a range of scores in which dogs are more likely to be adopted.
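To make the idea concrete, here is a toy sketch of a description-based score and of grouping adoption outcomes by score range. Everything in it is hypothetical: the word lists, the example profiles, and the bucket thresholds are invented for illustration, and a real version would more likely rely on an established sentiment library than on a hand-written lexicon.

```python
import re
from collections import defaultdict

# Hypothetical word lists; a real sentiment lexicon would be far larger.
POSITIVE = {"loving", "friendly", "playful", "sweet", "gentle"}
NEGATIVE = {"shy", "anxious", "aggressive", "fearful"}


def sentiment_score(description):
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    words = re.findall(r"[a-z']+", description.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def bucket(score):
    """Map a score to a coarse range (thresholds are arbitrary assumptions)."""
    if score > 0.25:
        return "positive"
    if score < -0.25:
        return "negative"
    return "neutral"


# Invented (description, adopted) pairs, for illustration only.
profiles = [
    ("Sweet, playful pup who is friendly with kids.", True),
    ("Shy and anxious at first, but gentle once settled.", False),
    ("Needs a home.", False),
]

# Adoption rate within each score range.
outcomes = defaultdict(list)
for text, adopted in profiles:
    outcomes[bucket(sentiment_score(text))].append(adopted)
rates = {name: sum(vals) / len(vals) for name, vals in outcomes.items()}
print(rates)
```

With real profiles, the same grouping would show whether adoption rates differ across score ranges, which is the question raised above.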
It is also important to note that our results might have been more definitive if the dog profiles had fewer missing values and if dog breeds were more evenly distributed in our data. In future work, we will pay closer attention to how complete and balanced the data is.