Data Driven Weight Loss

Years ago, I worked as a ramp rat for a major airline. In a given hour, I would throw a literal ton of bags on and off an airplane. The physical nature of the job, combined with my 20-year-old metabolism, allowed me to eat everything and still maintain (or even lose) weight.

Since then, I've hung up my orange safety vest for a career sitting in front of computers. The pay is better, but I miss my travel benefits and superhuman nomming powers. The weight has slowly crept on, and it stubbornly remains. In the midst of life's stresses and overbooked schedules, it's hard to find the time for (and interest in) eating healthy food and working out.

A year or so ago, I hacked out Fourhealth with a good friend. It was our attempt to combine geolocation data from Foursquare with various health data. We were finalists for the Twilio prize, but we never really polished all of the bugs out of our app.

So, over the weekend, I revisited this concept with a more data science/quantified self perspective. For me specifically, what kind of places do I visit when I'm gaining or losing weight?

Data Preprocessing and Association Rule Mining

This problem lends itself well to association rule mining. The classic example of this data mining method involves looking for patterns of associations among items in supermarket transactions. An association rule is an implication, taking the form of an itemset (e.g., {milk, diapers}) that implies an additional itemset (e.g., {milk, diapers} => {beer}). This example can be interpreted as: customers who purchase both milk and diapers are also likely to purchase beer. This forms the basis for how some supermarkets run loss leader promotions and targeted coupons.
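To make the supermarket example concrete, here is a small Python sketch (not the R code I used for the analysis) of the three numbers usually reported for a rule: support, confidence, and lift. The baskets are made up purely for illustration, and the function names are my own.

```python
from fractions import Fraction

# Hypothetical toy baskets, invented for illustration only.
baskets = [
    {"milk", "diapers", "beer"},
    {"milk", "diapers", "beer", "bread"},
    {"milk", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs, transactions):
    """Of the transactions containing `lhs`, the fraction that also contain `rhs`."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Confidence relative to how common `rhs` is overall (above 1 means positive association)."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


lhs, rhs = {"milk", "diapers"}, {"beer"}
rule_support = support(lhs | rhs, baskets)       # 2 of 5 baskets
rule_confidence = confidence(lhs, rhs, baskets)  # 2 of the 3 milk+diapers baskets
rule_lift = lift(lhs, rhs, baskets)              # (2/3) / (3/5)

print(rule_support, rule_confidence, rule_lift)
```

Support tells you how often the whole pattern shows up, confidence tells you how reliable the implication is, and lift tells you whether the rule beats simply guessing the right-hand side.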

In my situation, I'm looking for itemsets of Foursquare checkins which imply weight loss or weight gain. The first trick is obtaining data. I wrote an R function to iteratively pull my Foursquare checkins, and then I manually downloaded my weight data from Withings.

I calculated the week-to-week change in my average weekly weight. If I gained or lost more than 0.3 pounds in a week, I coded that week as weight gain or loss, respectively. I dichotomized checkins at Foursquare venues and venue categories by week (visited or not visited), and used the resulting matrices to run the apriori algorithm. I attempted to preserve the temporality of the data by associating each week's checkins with the subsequent week's weight change.
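The preprocessing can be sketched roughly as follows. This is a Python illustration rather than my actual R code, and it bakes in a few assumptions of its own: weeks are ISO calendar weeks, each week is compared to the previous week that has any weigh-ins, weeks inside the 0.3-pound band get a "stable" label, and a checkin is shifted forward exactly seven days to find the week whose weight change it is paired with. The dates, weights, and checkins are invented.

```python
from collections import defaultdict
from datetime import date, timedelta

THRESHOLD_LB = 0.3  # weekly change beyond this counts as gain or loss


def week_key(d):
    """ISO (year, week) pair for a date; ISO weeks are an assumption here."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


def weekly_means(weigh_ins):
    """Average all weigh-ins that fall in the same week."""
    buckets = defaultdict(list)
    for day, pounds in weigh_ins:
        buckets[week_key(day)].append(pounds)
    return {wk: sum(vals) / len(vals) for wk, vals in buckets.items()}


def code_changes(means):
    """Label each week by its change from the previous week that has data."""
    labels = {}
    weeks = sorted(means)
    for prev, cur in zip(weeks, weeks[1:]):
        delta = means[cur] - means[prev]
        if delta > THRESHOLD_LB:
            labels[cur] = "gain"
        elif delta < -THRESHOLD_LB:
            labels[cur] = "loss"
        else:
            labels[cur] = "stable"  # hypothetical name for the in-between weeks
    return labels


def build_transactions(checkins, labels):
    """One itemset per week: venues visited the week before, plus that week's label."""
    venues_by_target = defaultdict(set)  # a set dichotomizes: visited or not
    for day, venue in checkins:
        target_week = week_key(day + timedelta(days=7))
        venues_by_target[target_week].add(venue)
    return {
        wk: venues | {"weight=" + labels[wk]}
        for wk, venues in venues_by_target.items()
        if wk in labels
    }


# Made-up data for illustration only.
weigh_ins = [
    (date(2013, 1, 7), 200.0), (date(2013, 1, 9), 201.0),
    (date(2013, 1, 14), 201.5), (date(2013, 1, 16), 201.5),
    (date(2013, 1, 21), 201.4),
]
checkins = [
    (date(2013, 1, 8), "Chipotle"), (date(2013, 1, 10), "Chipotle"),
    (date(2013, 1, 15), "Life Time Fitness"),
]

labels = code_changes(weekly_means(weigh_ins))
transactions = build_transactions(checkins, labels)
print(transactions)
```

Each resulting itemset mixes venue names with a single weight label, which is the shape the rule mining step needs.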


There were 51 weeks where sufficient data existed for analysis. Of those weeks, I lost weight in 17 of them, and gained weight in 22 of them.

The following association rules were discovered:

These are really awkward to interpret. If, in a given week, I went to the casino AND the gym, then it was likely that I gained weight. However, if, in a given week, I checked in at home and at a Thai restaurant, then I was likely to lose weight.

So, let's try this again, only this time looking for association rules identifying specific venues. Note that the minimum confidence and support are a little lower, as it's harder to generate sufficient data to discover venue-specific rules (as opposed to category-specific rules):
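For anyone curious how the minimum support and confidence thresholds come into play, here is a bare-bones, level-wise Apriori sketch in Python. It is not the implementation I ran (that was in R); the weekly itemsets, the thresholds, and the "weight=" label prefix are all made up for the example, and only rules with a weight label on the right-hand side are kept.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: grow itemsets one item at a time, pruning infrequent ones."""
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    singles = {frozenset([item]) for t in transactions for item in t}
    current = {}
    for s in singles:
        value = supp(s)
        if value >= min_support:
            current[s] = value

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            value = supp(c)
            if value >= min_support:
                current[c] = value
    return frequent


def weight_rules(transactions, min_support, min_confidence):
    """Rules of the form {venues} => weight label, sorted by confidence."""
    frequent = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, s in frequent.items():
        targets = [i for i in itemset if i.startswith("weight=")]
        if len(targets) != 1 or len(itemset) < 2:
            continue
        lhs = itemset - {targets[0]}
        conf = s / frequent[lhs]  # lhs is frequent because its superset is
        if conf >= min_confidence:
            rules.append((sorted(lhs), targets[0], s, conf))
    return sorted(rules, key=lambda r: -r[3])


# Made-up weekly itemsets for illustration only.
weeks = [
    {"Chipotle", "Gym", "weight=gain"},
    {"Chipotle", "Coffee Shop", "weight=gain"},
    {"Chipotle", "Gym", "weight=gain"},
    {"Gym", "weight=loss"},
    {"Coffee Shop", "weight=loss"},
    {"Chipotle", "weight=loss"},
]

rules = weight_rules(weeks, min_support=0.3, min_confidence=0.7)
for lhs, target, s, conf in rules:
    print(lhs, "=>", target, "support=%.2f" % s, "confidence=%.2f" % conf)
```

Lowering the thresholds lets rarer patterns through, which is why venue-specific rules need lower values than category-level ones: any single venue shows up in fewer weeks than its whole category does.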

So, this makes a bit more sense, and is much more actionable. It seems that a few culprits have been identified as being associated with my weight gain: The Chatterbox Pub, Spyhouse Coffee Shop, Chipotle, and The Bad Waitress. This sounds about right... these are restaurants/coffee shops that I frequent, and the food I consume there is less than healthy.

Additionally, what's interesting is that going to the gym (Life Time Fitness) doesn't help me lose weight when I continue going to some of these restaurants. Also, I guess this analysis would suggest that nothing really helps me lose weight (that's a little depressing). Rules associated with other venues (GingerHaven/my house, U of Mn) have pretty low confidence, so they're probably a bit more of a fluke than anything else.

My action items

Now I have some actionable findings that are associated with my weight gain. So, if I want to reverse these trends, it might stand to reason that I should avoid doing things that are associated with the trend.

So, for the next month or so, I'm going to try to avoid:

  • The Chatterbox
  • Spyhouse Coffee Shop
  • Chipotle
  • Bad Waitress

I would also avoid going to school, but I don't think my advisor or professors would like that. However, it is worth revisiting how I cope with academic-related stress, but that's a much harder topic. Of course, in general, conducting this analysis is orders of magnitude easier than actually implementing the recommended changes. Changing habits is hard, but hopefully I'll make it happen.

To be clear, these rules don't prove causality. The concept generally seems sound: the places that you frequent impact your health. However, it is possible to make healthy choices almost everywhere. So, your mileage may vary. But these results do generally jibe with my understanding of myself and my habits.

Try this out at home

Hopefully this sounds kinda cool to you (particularly if you've made it this far), and maybe you've considered following me on Twitter by now ;)

If you want to keep playing with this, I've released the following related code:

  • RPI: R Programming Interface, the start of some R functions to obtain data from common APIs.
  • QS-Weight-Loss: My R code for this analysis.

If you try this out on your own data, let me know what you find! I'd love to further refine this method in the future.