Predicting Access to Healthful Food Retailers with Machine Learning

Primary Author: Modhurima Amin

Faculty Sponsor: Jill McCluskey


Primary College/Unit: Agricultural, Human and Natural Resource Sciences

Category: Business, Communication, and Political Sciences

Campus: Pullman




Many U.S. households lack access to healthful food and rely on inexpensive, processed food with low nutritional value. Surveying access to healthful food is costly, and identifying the factors that affect access remains difficult owing to the multidimensional nature of socioeconomic variables. We use machine learning with census tract data to predict the modified Retail Food Environment Index (mRFEI), which is the percentage of food retailers in a tract that are healthful, and two binary measures: food deserts, where no healthful food retailers exist, and food swamps, where healthful food retailers are considerably outnumbered by less healthful retailers. Our model identifies ten demographic variables that detect food deserts and swamps with 100% prediction accuracy in sample and 78% out of sample. We find that food deserts and swamps are intrinsically different and require separate policy attention. Food deserts are mainly large, sparsely populated rural tracts with low ethnic diversity. Commercial supercenters might find it unprofitable to operate there owing to low population density. In contrast, food swamps are predominantly small, densely populated urban tracts with more non-Caucasian residents who lack vehicle access. Therefore, to improve access to healthful food, community-supported agriculture might work better for food deserts, whereas limiting unhealthy retailers might work better for food swamps. Overall access to healthful food retailers is mainly explained by population density, the presence of Caucasian residents, and income. We also show that our model can be used to obtain a sensible prediction of access to healthful food retailers for any U.S. census tract.
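
As an illustration of the three outcome measures described above, the following minimal Python sketch computes an mRFEI-style percentage from retailer counts and derives the two binary labels. The function names, the handling of tracts with no retailers, and especially the 10% swamp cutoff are assumptions made for illustration; the abstract does not state the threshold the authors used for "considerably outnumbered".

```python
from typing import Optional


def mrfei(healthy: int, less_healthy: int) -> Optional[float]:
    """Percentage of food retailers in a tract that are healthful.

    Returns None when the tract has no food retailers at all, since the
    percentage is undefined in that case (an assumption of this sketch).
    """
    total = healthy + less_healthy
    if total == 0:
        return None
    return 100.0 * healthy / total


def classify_tract(healthy: int, less_healthy: int,
                   swamp_threshold: float = 10.0) -> str:
    """Label a tract as a food desert, a food swamp, or neither.

    swamp_threshold is a hypothetical cutoff (in percent), not a value
    taken from the abstract.
    """
    if healthy == 0:
        # No healthful retailers exist in the tract.
        return "food desert"
    score = mrfei(healthy, less_healthy)
    if score is not None and score <= swamp_threshold:
        # Healthful retailers exist but are heavily outnumbered.
        return "food swamp"
    return "neither"


if __name__ == "__main__":
    # Hypothetical retailer counts for three tracts.
    for counts in [(0, 5), (1, 19), (5, 5)]:
        print(counts, mrfei(*counts), classify_tract(*counts))
```

In this sketch a tract with zero healthful retailers is always labeled a food desert, so the swamp label applies only to tracts with at least one healthful retailer.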