I recorded every step I took for over three years

Last year I published an analysis of the steps I took over the previous two years.  This year I decided to update it with the newest data from 2017.  My daily steps decreased by 17% to 8,024.  My daily goal was 10k steps, but I only reached it on 83 days (23% of the time).

Temperature

Last year I pointed out how the temperature has an effect on the number of steps I take.

I’m 123% more likely to take 15k+ steps when it’s warmer.

I’m 143% more likely to take fewer than 2,000 steps when it’s cold.

Location

In June I visited London and Europe to see the French Open.  I tried to pack as much into the trip as possible, which meant I spent most of my time walking.

Back in America, I noticed that living near a subway station in Brooklyn has drastically decreased the steps I take compared to my old apartment in Manhattan’s East Village, which was 1/2 mile from the closest subway.  I now average 7,978 daily steps when I start my day in Brooklyn.

Year over year, my daily steps decreased 17%, which I attribute to the decrease in 10k+ step days.  In Brooklyn, 24% of my days were 10k steps or more.  In Manhattan, that share was double, with half of my days hitting 10k.

Week Days

What’s my LTV?

Lifetime value is an important marketing concept which describes the amount of money a business can expect to earn from a customer.  For this post, I’ve decided to calculate how much I was worth to each business over the last 12 months, using all of my spending as tracked in Mint.com.

 

Business Analysis

I was really interested in the businesses where I spend the most money.  For this analysis, I split my spending into essential expenses (rent, public transport, phone, groceries) and non-essential expenses (alcohol, coffee, fast food).  I’ll ignore the essential expenses because nobody really cares that I spend the same amount every month with the MTA or Verizon.  Below is a handpicked list of businesses where I spent lots of money last year.

BLACK STAR COFFEE, $1,025: The business that surprised me the most was Black Star Coffee.  My typical order is a croissant and a double cortado on weekdays, and a sriracha egg sandwich and iced coffee on weekends.  Before looking at the data, I expected my monthly spending to be around $40.  Using Mint, I saw that over the last 12 months I spent over $1,000 on 136 purchases (average cost of $7.54).  Clearly I was way off.  I was blindly handing over my credit card without doing the math on how often I frequented this coffee place.

I like their croissants and breakfast sandwiches a lot, and I highly encourage people to try them.  But I can’t justify spending, at a conservative estimate, $400 a year on coffee from one place.  I could buy a coffee machine for $40 and pay an average of $0.40 for a freshly brewed cup of coffee, saving myself around $3.50 per cup.
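Here is a minimal sketch of how per-business totals, purchase counts and average costs like the ones above could be computed from a list of transactions.  The (merchant, amount) layout and the sample rows are assumptions for illustration; they are not Mint's actual export format or my real purchases.

```python
from collections import defaultdict


def summarize_by_merchant(transactions):
    """Return {merchant: (total, count, average)} for (merchant, amount) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for merchant, amount in transactions:
        totals[merchant] += amount
        counts[merchant] += 1
    return {
        m: (round(totals[m], 2), counts[m], round(totals[m] / counts[m], 2))
        for m in totals
    }


# Made-up sample rows, only to show the shape of the input.
sample = [
    ("BLACK STAR COFFEE", 7.50),
    ("BLACK STAR COFFEE", 8.00),
    ("SEAMLESS", 19.85),
]
summary = summarize_by_merchant(sample)
print(summary["BLACK STAR COFFEE"])  # (total, count, average)
```

Sorting the result by total would give a ranked list of the businesses where the most money went.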

TRADER JOES WINE SHOP, $409: Over the last year, I’ve really gotten into Trader Joe’s “two buck chuck”, officially known as Charles Shaw wine, which sells for $3 a bottle.  From the data, I make a trip to the Trader Joe’s wine shop about twice a month and pick up two bottles: the first is usually an $8 Malbec, and the second is either a bottle of two buck chuck or a second $8 Malbec.  That leaves me with an average basket of $16.35 (incl. 10% NYC tax).

I expected my spending at Trader Joe’s Wine to be much higher; I was under the impression that I visited more often and spent more per visit.  In terms of cost reduction, there’s not much I can do except stop drinking.  In my opinion the wine is a great value: it doesn’t taste like it costs two dollars, and it’s better than most beer.

Seamless, $714: I was really worried that Seamless would be over $1,000.  I estimated that my average basket was between $18 and $21 (actually $19.85), and that I ordered around 50 times (actually 36 times).  I’m fairly loyal: 2 restaurants accounted for 44% of all orders.  Half of all orders came in the final three months of 2016, while the other half came over the next nine months.  It’s good to know that I learned self-control.

One interesting thing to note: in 2016 I spent $979 on 52 orders.  I’m not proud of it, but it was so convenient.

Category Analysis:


Fast Food $2,606: This is pretty embarrassing for two reasons.  First, these costs are mostly avoidable if I bring lunch or stop using Seamless.  Bringing lunch twice a week could save $1,000.  Second, I spent less on groceries than on fast food.  Every time I buy lunch, I spend an average of $10.47, but cooking at home could cost me around $2 – $3, which is roughly a 70-80% decrease in costs.

Sports Tickets $1,804: I’m a huge sports fan.  Over the past 12 months I went to the French Open, the US Open (tennis), NHL Rangers games, and several soccer games.  Over the next 12 months I expect this to stay the same or increase slightly because I’d like to attend more NHL games and an NFL game.  It’s not cheap: I spend around $150 on each event (excl. soccer).

Television and Streaming $549: I use Sling TV, HBO Now, and Netflix Streaming for all of my TV and streaming.

Does this satisfy all my viewing needs?  Technically yes.  The only things I’m missing are certain sports networks like YES, FS1 and MSG.  The channels that I do watch live are available on Sling and HBO Now (ESPN, Comedy Central, Adult Swim, HBO), and I can watch certain sports using a $15 (one-time cost) digital TV antenna.

For comparison, with a cable provider like Spectrum I could pay around $130/month, putting my total spend at $1,560 a year, plus $203 for Netflix.

 

Next Steps

Some areas were eye-opening.  I spend way too much on greasy Seamless food and lunch.  If I do spend money, it should be on nicer restaurants that I visit with friends.  Coffee was another unnecessary expense that I regret.  It was usually ordered with a snack or sandwich, which often doubled the cost.  For the next 12 months I should focus on brewing my own coffee, eating out less for dinner and bringing my lunch, which could easily save me $1,000.  Two areas that I don’t expect to change are sports tickets and alcohol.  I really enjoy seeing sports live and I don’t expect to change that anytime soon.  I would like to spend less than $2,000 over the next year, but realistically, with dates, happy hours and NYC’s obsession with cocktail bars, it will be hard.  I think it makes sense to track these expenses more closely in Mint.  I’ve set up budgets which I’m trying to follow a little more closely, and I’ll try to do a quarterly post-mortem of expenses to see if I followed through on the cost reductions.

 

Source: Mint, Excel used for Viz

Analyzing 99 Million Taxi Trips Using Chicago Open Data

The City of Chicago released a dataset containing 100M trips over four years, and it’s a huge win for the Open Data community. In this post, we examine the dataset, which tells us everything about a passenger’s journey through Chicago, and dive into the data to see how the industry is beginning to decline as competition from “ride shares” enters the market.

What does the average Chicagoan tip their taxi driver?

Typically, 21% and that’s been stable since 2013.  Riders do seem to be more generous in December, with 2015 and 2016 having an average of 22% or higher.

Average Monthly Tip % from 2013-2016

What do people normally tip?

In statistics, a “normal distribution” is a symmetric bell curve in which the mean = median = mode.  In normal terms, that means the average = the middle number in a dataset = the number appearing the most.  From the graph above, we see that the tip data doesn’t have the smoothness of a bell curve, but instead has sharp spikes around the values of 0%, 20%, 25% and 30%.

From my experience with New York Yellow Taxi Cabs, I assume that the payment system presents passengers with a predefined tip amount when paying.  Based on my analysis, 39% of all passengers use a predefined amount, with non-tippers making up 7% of all rides, and tippers (using the 20/25/30 amount) making up 32% of rides.

Note: Tip amounts in the dataset were only available for passengers who used a credit card.
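As a rough sketch, the share of rides landing on a preset tip could be measured along these lines.  The function name, the rounding to whole percents, and the sample tips are my own assumptions for illustration, not the queries behind the figures above.

```python
def preset_tip_share(tip_percents, presets=(0, 20, 25, 30)):
    """Return the fraction of rides whose rounded tip % matches each preset."""
    total = len(tip_percents)
    hits = {p: 0 for p in presets}
    for tip in tip_percents:
        rounded = round(tip)  # treat e.g. 19.6% as a 20% tip
        if rounded in hits:
            hits[rounded] += 1
    return {p: n / total for p, n in hits.items()}


# Made-up tip percentages for ten credit card rides.
tips = [0, 20, 20, 25, 30, 18.2, 22.7, 15.1, 20, 12.4]
shares = preset_tip_share(tips)
print(shares)
```

Summing the values gives the overall share of riders who used a preset amount, including non-tippers.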

How are passengers paying for their ride?

There has been a steady increase in the share of taxi rides paid for with a credit card.  From January 2013 to December 2016, the share of trips using a credit card increased from 30% to 47%.

Fewer people are using taxis.

2016 was the worst year for the Chicago taxi industry, with only 19.8M rides, the lowest in four years and 26% lower than 2013.  Interestingly, this didn’t have as large an effect on total fares: while trips were down 26%, fares dropped 18%.  Similarly, while trips were down 10% from 2015, fares only dipped 1%.

Putting on my economics hat to investigate this decrease, one possible reason is that riders are substituting ride-sharing apps like Uber or Lyft for taxis, since they provide the same service at an equal or lower price.  Or perhaps it’s the January 2016 fare increase of 15% that has driven consumers away.  In fairness, a 15% increase in price and a 10% decrease in quantity would suggest that demand is slightly inelastic, but I digress. Other less likely reasons could be changes in public transportation, bikeshare programs, or more walking.
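For anyone who wants to check the elasticity remark, here is a small sketch of the arithmetic.  It uses the simple ratio of percentage changes and assumes the whole 10% drop in trips is a response to the 15% fare increase, which is a simplification since ride sharing grew at the same time.

```python
def price_elasticity(pct_change_quantity, pct_change_price):
    """Simple (non-midpoint) price elasticity of demand."""
    return pct_change_quantity / pct_change_price


# Figures from this post: trips down 10%, fares raised 15%.
elasticity = price_elasticity(-0.10, 0.15)
is_inelastic = abs(elasticity) < 1  # |elasticity| below 1 means inelastic demand

print(round(elasticity, 2))  # -0.67
print(is_inelastic)          # True
```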

How fast does a taxi travel?

How long does a passenger spend in a taxi?

I found an interesting seasonal trend in average trip duration: the winter months tend to have shorter trips.  With a brutal winter, many passengers likely opt to take a taxi for shorter distances than they would in summer months.

Average time a taxi was in service

Average Monthly Fare

Like the average trip duration, the average fare follows a seasonal trend.  Winter months have a lower average fare than the summer months.  January seems to be consistently 10% lower than May for each year in the dataset.

 

Conclusion

The Chicago taxi business is in decline and has seen 10%+ decreases for two consecutive years.  The introduction of competition from ride share apps like Uber and Lyft has surely eaten into their business, and those apps will continue to gain market share as they expand.  From a data perspective we found interesting stats on tip percentage, speed, and ride duration while also witnessing the effects of weather on how the city commutes.  This was an interesting dataset, and I want to look closer at how neighborhoods affect pickups and dropoffs, but first I have to learn about Chicago neighborhoods (or do they have boroughs like NYC?).  My next step will be to compare NYC to Chicago to see how the two cities’ taxi industries compare.

Sources:

Big thanks to the City of Chicago (and more specifically Freedom of Information laws), which released taxi ridership data into the public domain, and an even bigger thanks to Google for adding it to BigQuery’s public datasets and allowing users to query the information for free.  Chicago, NYC and several other datasets can be found there.

Data Sources: Taxi Data via Google BigQuery

Code and queries: I need to set up a Github link with code used to generate these queries.

Visualizations Sources: Graphs created using Microsoft Excel.

There are 10,665 people in America named Shaq

During Super Bowl XLI, you may have seen a commercial about “Super Bowl Babies”, which are babies conceived immediately following a city’s Super Bowl win.  While I believe the link between conception and championship is difficult to prove, that doesn’t mean sports can’t impact a parent’s decision making.  This commercial did remind me of a Clemson football player named Shaq Lawson, who is only the second person named Shaq I had ever heard of.

Using the Social Security Administration’s data and Google Big Query, I decided to take Hall of Fame players with unique names and look at the number of babies born with their name each year.  I chose five athletes: Shaquille O’Neal, Michael Jordan, Tiger Woods, Kobe Bryant, and LeBron James.
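To give an idea of the approach, here is a minimal sketch of finding a name's peak years once the yearly counts have been pulled out of the data.  The (name, year, count) row layout and the helper name are assumptions of mine; the four sample counts are the ones quoted later in this post, and a real run would load every year for each name.

```python
def top_years(rows, name, n=3):
    """Return the n (year, count) pairs with the most babies given `name`."""
    matches = [(year, count) for row_name, year, count in rows if row_name == name]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)[:n]


# Only the counts quoted in this post; not the full SSA data.
rows = [
    ("Kobe", 1996, 307),
    ("Kobe", 1998, 1093),
    ("Jordan", 1977, 660),
    ("Jordan", 1990, 22080),
]
print(top_years(rows, "Kobe", n=1))  # [(1998, 1093)]
```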

 

When did each name peak?

 

Shaq (’92, ’93, ’94)

Before 1973, Shaq (and Shaquille) didn’t show up in the SSA’s published data, but twenty years later the name peaked at 2,422.

The most popular years for the name Shaq began after Shaquille O’Neal was drafted out of LSU with the first pick in the 1992 NBA Draft.  During his first three years with the Orlando Magic he won Rookie of the Year, appeared on the cover of Sports Illustrated, and finished fourth in MVP voting.  ESPN listed Shaq as the fourth best center in NBA history.

Kobe (’00, ’01, ’02, ’03)

The first name Kobe went from 307 in 1996, to 1,093 in 1998 which coincided with Bryant’s second season with the Lakers.  From ’01 – ’03 Bryant finished in the top ten in MVP voting each year, and won three NBA championships during that time.

 

Jordan (’90, ’91, ’97, ’98)

Jordan was next.  There were 660 babies named Jordan in 1977 but 22,080 in 1990, making it by far the most popular of the five names analyzed.

The top years for the name Jordan coincided with Michael Jordan’s peak with the Chicago Bulls.  During his tenure, Jordan won six NBA championships, six Finals MVPs, and five league MVP awards.  Jordan is widely considered to be the best player in NBA history.

 

Tiger (’97, ’98, ’10)

Tiger won’t go down as the most popular name, but it is a unique first name that first appeared in 1997 after Tiger Woods won the 1997 Masters Tournament.  Interestingly enough, the most popular year for the name Tiger was 2010 when Tiger was the center of an infidelity scandal.

Lebron (’07, ’10)

I was surprised to see that Lebron hasn’t been a popular name despite James’s championship wins with the Heat and major endorsement deals with Nike.  Only two years have had more than 50 babies named Lebron: 2007 and 2010.

I recorded every step I took for over two years

I used my iPhone to record every step I took from November 2014 to December 2016.  I found some interesting trends after looking at average daily temperature, day of week, and location.

Some like it hot

After downloading the data from my phone, I compared my steps to the daily average temperature (source: NOAA) to see how temperature affected my activity.  On ‘hot’ days (temperature higher than 83 degrees), I averaged more than 11k steps.  This is because I like to take advantage of the warm weather and run 3 – 6 miles along East River Park on weekends, and squeeze in a couple of 2 – 4 mile runs on weekdays.  What’s interesting is that there seems to be a baseline of about 8,700 steps: no matter the temperature, I always needed to walk at least that much.  I cover this later, but this was likely because of my apartment’s distance from the subway.
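The comparison above boils down to grouping days by temperature and averaging the steps in each group.  Here is a minimal sketch, assuming each day has already been joined to its average temperature; the function name and the sample days are made up, and only the 83 degree cutoff comes from this post.

```python
from statistics import mean

HOT_THRESHOLD_F = 83  # the 'hot' cutoff used in this post


def average_steps_by_weather(days, threshold=HOT_THRESHOLD_F):
    """Average daily steps for hot days (temp > threshold) and all other days."""
    hot = [steps for temp, steps in days if temp > threshold]
    other = [steps for temp, steps in days if temp <= threshold]
    return {
        "hot": mean(hot) if hot else None,
        "other": mean(other) if other else None,
    }


# Made-up (average temperature in °F, steps) pairs.
days = [(90, 12000), (85, 11000), (60, 9000), (30, 8500)]
averages = average_steps_by_weather(days)
print(averages)
```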

The graph below is probably my favorite representation of temperature and frequency.  I found this graph type when I was looking at a DC bike share dataset on Kaggle.com.  The x-axis is the temperature, the y-axis is my steps, the colors represent the type of weather (cold vs hot) and each marker represents a day over the last 2+ years.  As in the previous image, you can see a slight increase in daily steps as the temperature rises.

NYC Real Estate

From 2014-2016 I lived in three different places: Connecticut, Manhattan, and Brooklyn.  My most active days were when I lived in the East Village, where the closest subway was almost a mile from my apartment.  Just getting from my apartment to work meant that I walked almost two miles round trip.  Things changed when I moved into a new apartment in Brooklyn.  This apartment was much closer to a subway, and my steps reflect it.  Keeping all else equal (steps taken on the way to get lunch, running/gym), we can assume that I take 2,000 fewer steps a day.

 

I took a look at the average number of steps I took each month.  During the spring/summer months, I averaged over 10,000 steps a day!

What hours did I walk the most in 2016?  To no one’s surprise, I walked the most during my weekday commute, at 9am and 6-8pm.  As mentioned earlier, I walked almost 1 mile to the nearest subway, which would account for the 800+ steps I took during those times.

 

How do you track every step you take?

First you’ll need an iPhone with the Health app installed, which allows your phone to track every step using movement, similar to a pedometer.

Next, you’ll need to download the QS Access app, which will allow you to export a CSV of your daily steps to Google Drive.
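Once you have the CSV, something like the sketch below can read it and tell you how often you hit a step goal.  The column names in the sample (including "Steps (count)") are an assumption and may not match what the app actually exports, so pass in whatever header your file uses; the rows are made up.

```python
import csv
import io


def share_of_goal_days(csv_text, steps_column, goal=10_000):
    """Return the fraction of rows whose step count meets the goal."""
    reader = csv.DictReader(io.StringIO(csv_text))
    steps = [float(row[steps_column]) for row in reader]
    hits = sum(1 for s in steps if s >= goal)
    return hits / len(steps)


# Made-up export; header names are hypothetical.
sample_csv = """Start,Finish,Steps (count)
01-Jan-2016 00:00,02-Jan-2016 00:00,12034
02-Jan-2016 00:00,03-Jan-2016 00:00,7411
03-Jan-2016 00:00,04-Jan-2016 00:00,10250
04-Jan-2016 00:00,05-Jan-2016 00:00,4980
"""
share = share_of_goal_days(sample_csv, "Steps (count)")
print(share)  # 0.5
```

For a real file, read it with open(path).read() and pass the text in the same way.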

How to run 300 miles

In January 2016, I set a goal for myself to run 300 miles in 2016.  This would be a 50 mile increase from 2015 – a year where I ran more than in any previous year – and an 80 mile increase from 2014.  The rules were simple: only running counted toward the 300; miles had to be recorded in the MapMyRun app; warmups and cool-downs didn’t count towards the mileage; and treadmill running didn’t count unless it was over 1.5 miles.

Setting Quarterly Goals

I knew from the start that I was unlikely to run the same amount every week: temperature, daylight, and likelihood of injury all factor into my running decisions.  With this in mind, I set the following quarterly mile goals: 50, 75, 125, and 50 miles.  My rationale was simple: I have less opportunity to run in Q1 and Q4 because of weather and early sunsets.  Injury is a major concern too; if I go out too fast in Q1, I could injure myself and be out for a month.

My July – September goal was justified by the same factors as my Q1 goal.  From May to August, the sun sets after 7:30pm, which allows me enough time to leave work and run 3+ miles.  This takes the burden off of weekends, which means that I’m not dependent on 2 out of 7 days for my ten mile weekly goal.  Also, longer days mean I don’t need to use the gym treadmills, where I run 2.5 miles before getting too bored to finish.
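To see what the quarterly goals mean week to week, here is a quick sketch that converts each goal into a weekly pace.  The goals are the ones listed above; treating every quarter as exactly 13 weeks is my simplifying assumption.

```python
QUARTERLY_GOALS = {"Q1": 50, "Q2": 75, "Q3": 125, "Q4": 50}  # miles, from this post
WEEKS_PER_QUARTER = 13  # assumption: every quarter is 13 weeks


def weekly_pace(goals, weeks=WEEKS_PER_QUARTER):
    """Miles per week needed to hit each quarterly goal."""
    return {quarter: round(miles / weeks, 1) for quarter, miles in goals.items()}


pace = weekly_pace(QUARTERLY_GOALS)
print(sum(QUARTERLY_GOALS.values()))  # 300
print(pace["Q3"])                     # 9.6
```

Q3 works out to nearly ten miles a week, about two and a half times the Q1 pace.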

Reality vs Goals

Part of this post is to be honest with myself.  Even though I modeled this down to the week, I still fell well short of my goal.  To avoid failing again, we need to learn from my mistakes.

We can see that my actual running in Q1 fell short of what I thought was possible.  This is the result of two factors: weather and work.  Jan and Feb were pretty brutal, and there was a “blizzard” the first week of February which prevented me from running for most of Feb.

Q3 was the biggest reason I missed my goal.  In January I set a personal goal of 125 miles over a 13 week span.  In theory this was easy to achieve: I would need to run about 10 miles a week, or 3.33 miles three times a week (probably Wednesday, Saturday and another day).  In reality I wasn’t able to meet this goal, for reasons I’m still trying to understand.  Part of it may be related to my personal life: I went on more dates on weekdays and weekends.  Another reason may be the 3 mile races I competed in every Wednesday.  These races were tough, and I felt like I had to rest for several days to recover from them.

Next Steps

Since I didn’t run 300 miles last year, I’ve decided to fulfill this goal in 2017.  I’ll need to learn from the mistakes of 2016 and be more proactive in Q3, when the goal is higher.  I’ll also need to determine if I’ll accomplish this with a lot of short runs (1.5 – 3 miles) or longer runs (4 miles).  I’ve entertained the idea of training for a half-marathon to accomplish this with fewer runs, but I prefer to keep my training (and races) under 6 miles per run.

How often do I use Netflix DVD?

I recently canceled my Netflix DVD subscription, which I wasn’t using as much as when I signed up in 2010.  Among my friends and coworkers, I’m one of the last people to use the service.  Before I canceled the subscription, I pulled three years’ worth of Netflix send/return emails from my Gmail and cleaned the data in Excel.

So was Netflix DVD a good value?  In 2011-2012, it definitely was.  There were nights where I’d watch a DVD the same day I received it in the mail, and mail it back the next morning.  But as time went on and I worked longer hours, I had less time to commit to movies.  All of a sudden, I didn’t want to commit 2-3 hours of my evening to a movie that demands my full attention.  My indifference to the DVD service started in August 2016.  In the graph below, we can see I stopped returning DVDs almost entirely.

So which DVDs spent the most time at home?

173 days: The Hateful Eight (June 10 – November 29, 2017)

114 days: Her (July 30, 2014 – November 20, 2014)

97 days: The Usual Suspects (October 10, 2013 – Jan 14, 2014)

93 days: Inside Llewyn Davis (November 20, 2014 – Feb. 20, 2015)

91 days: Paul (Sep 8, 2015 – Dec 7, 2015)

91 days: Foxcatcher (Sep 8, 2015 – Dec 7, 2015)

80 days: The Godfather (Dec 23, 2015 – Mar 11, 2016)

71 days: The Imitation Game (Aug 4, 2016 – Oct 13, 2016)

70 days: Nightcrawler (April 15, 2015 – June 23, 2015)
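The day counts in this list can be reproduced from the send and return dates with a couple of lines.  This is a sketch rather than the Excel formula I used; counting both the first and last day (the + 1) is an assumption, chosen because it matches the totals in the list.

```python
from datetime import date


def days_at_home(received, returned):
    """Days a disc was out, counting both the first and the last day."""
    return (returned - received).days + 1


# Dates for "Her" from the list above.
print(days_at_home(date(2014, 7, 30), date(2014, 11, 20)))  # 114
```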

 

 

NYC Crime Data Analysis of 2015

One of my goals over the next year is to analyze more of the data freely available from New York City public agencies like the NYPD, 311, TLC, and MTA.

Today, I decided to analyze the NYPD’s Seven Major Felonies dataset for 2015 (or technically the first nine months of 2015).  I’m a big fan of the COMPSTAT releases available on the NYPD’s website, which break out crime by category, timeframe, location, and shooting incidents, with a comparison at the bottom of each PDF of the current year against previous years (2016, 2001, 1998, 1993, 1990).  Today’s dataset focuses on the NYPD’s ‘seven major felonies’: rape, murder, burglary, robbery, grand larceny, felony assault, and grand larceny of a motor vehicle.

 

The data can be found here.

Interesting Findings

Felony assaults are twice as likely to occur between midnight and 4am on Saturday and Sunday.

Staten Island has zero murders on Wednesday or Friday.

Bronx murders peak at 3pm, 8pm, and 1am.

Summer between midnight and 4am is when most robberies occur.

8% of grand larcenies occur at noon

 

Murder (257)

This is a category I find interesting because of the drop over the last 25 years.  In 1990, NYC averaged over 6 murders per day and ended the year with 2,245.  This number has plummeted to (only) 352 in 2015.  It’s difficult to say what caused this drop from a dataset perspective, but we are able to see trends in the 2015 data by layering on time, temperature and location.

One of my favorite quotes regarding the correlation between the murder rate and temperature came from former Queens resident 50 Cent.  On his album ‘Get Rich or Die Tryin’, he postulated, “They say summer time is the killing season, it’s hot out in this bitch, that’s a good enough reason”.  Fifty does seem to have a point: looking at the average number of murders that occur at a given temperature, murders peak at 45 when it’s 75-80 degrees (Fahrenheit).

It’s difficult to say what causes the spike using only data, but we can make some guesses.  One guess is that people may be more likely to go outside when there’s more daylight and warmer weather.  This could lead to more confrontations with friends/enemies/strangers.  Another guess would be that the summer vacation schedule causes more teenagers to get into trouble.

Burglary (10,945)

The distribution of burglary seems to be dependent on time of day.  From 2pm-4am, each hour of the day accounts for about 5% of reported burglaries, but between 5am and 1pm, that number falls to 2-3%.  I don’t have information on the location of the burglary (i.e., residential vs commercial), which would be useful for knowing who is affected and whether the location affects the reporting.  My guess is that it’s hard to determine whether the time in the dataset is when the burglary occurred or when the NYPD received the report.
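The hourly percentages come from counting incidents per hour and dividing by the total.  Here is a minimal sketch of that calculation; the function name and the sample hours are made up, and it assumes the hour (0-23) has already been extracted from each record's timestamp.

```python
from collections import Counter


def hourly_share(hours):
    """Return {hour: fraction of incidents} for a list of 0-23 hour values."""
    counts = Counter(hours)
    total = len(hours)
    return {hour: counts[hour] / total for hour in range(24)}


# Made-up incident hours, only to show the calculation.
incident_hours = [14, 14, 15, 23, 2, 9, 14, 18, 23, 3]
share = hourly_share(incident_hours)
print(share[14])  # 0.3
```

The same function works for any of the felonies; only the input list changes.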

 

Rape (1,080)

This felony seems to be concentrated within several days and hours, with several noticeable spikes.  13% of rapes occur at midnight.  In terms of weekday, Saturday and Sunday make up 18% and 17% respectively.  There are small spikes in the data at 4am, 8am, and noon.

Grand Larceny of a Motor Vehicle (5,515)

I’ve only briefly looked at this felony, but it appears that half occur between 6pm and midnight.  Breaking it out by month, 27% were committed in August and September, while only 17% occurred in January and February.