Jim Hogan - Blog

I recorded every step I took for over three years

Last year I published an analysis of the steps I took over the previous two years. This year I decided to update it with the newest data from 2017. My daily steps decreased by 17% to 8,024. My daily goal was 10k steps, but I only reached it on 83 days (23% of the time).
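The headline numbers are simple arithmetic. Here is a quick Python sketch that reproduces them from the figures quoted above. The prior-year average isn't stated in this post, so the value computed here is only what a 17% drop to 8,024 would imply.

```python
# Figures quoted in the post.
days_in_year = 365
days_at_goal = 83          # days with 10k+ steps in 2017
avg_2017 = 8024            # average daily steps in 2017
yoy_decrease = 0.17        # 17% drop versus the prior year

# Share of days that hit the 10k goal.
goal_share = days_at_goal / days_in_year

# Prior-year average implied by the 17% drop (an inference, not a reported number).
implied_prior_avg = avg_2017 / (1 - yoy_decrease)

print(f"Goal hit on {goal_share:.0%} of days")
print(f"Implied prior-year average: {implied_prior_avg:,.0f} steps")
```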

Temperature

Last year I pointed out how the temperature has an effect on the number of steps I take.

123% more likely to take +15k steps when it’s warmer.

143% more likely to take less than 2,000 steps when it’s cold.
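For anyone wondering how a "more likely" figure like these can be worked out, here is a minimal sketch. The day counts are made up for illustration and are not the real data behind the numbers above, and `pct_more_likely` is just a hypothetical helper name.

```python
def pct_more_likely(hits_a, days_a, hits_b, days_b):
    """Percent by which the rate in group A exceeds the rate in group B."""
    rate_a = hits_a / days_a
    rate_b = hits_b / days_b
    return (rate_a / rate_b - 1) * 100

# Hypothetical counts: 15k+ step days out of all warm days vs. all cold days.
warm_hits, warm_days = 29, 200
cold_hits, cold_days = 10, 165

result = pct_more_likely(warm_hits, warm_days, cold_hits, cold_days)
print(f"{result:.0f}% more likely on warm days")
```

A result of 100 means the rate is double, so anything above 100 means more than twice as likely.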

Location

In June I visited London and Europe to see the French Open. I tried to pack as much into the trip as possible, which meant I spent most of my time walking.

Back in America, I noticed that living near a subway station in Brooklyn has drastically decreased the steps I take compared to my old apartment in Manhattan’s East Village which was 1/2 mile from the closest subway. I now average 7,978 daily steps when I start my day in Brooklyn.

Year over year, my daily steps decreased 17%, which I attribute to having fewer 10k+ step days. In Brooklyn, 24% of my days were 10k steps or more; in Manhattan, that share was double, with half of my days hitting 10k.

Week Days

Build maps with the US Census data and R

Before you start

Install the main packages: tidycensus, tidyverse, and leaflet. The example below will install the packages if you don’t have them. Get a Census Key here.

library_list <- c("leaflet","stringr","sf","tidyverse","tidycensus","purrr","knitr","scales")
for(pkg in library_list){
  # install the package only if it can't be loaded, then load it
  if(!require(pkg, character.only = TRUE)){
    install.packages(pkg, dependencies = TRUE)
    require(pkg, character.only = TRUE)
  }
}

census_api_key(key = "KEY_GOES_HERE", install = TRUE,overwrite = TRUE)
readRenviron("~/.Renviron") 
options(tigris_use_cache = TRUE)

Median Income in Brooklyn

Census data consists of the decennial dataset, which you know as the survey Americans fill out every ten years, and the American Community Survey (ACS), which is completed every year. Every section of the United States has a 12-digit code maintained by the US Census.

The taxonomy format: State (2) – County (3) – Tract (6) – Block Group (1). The 11-digit code 36047016500 stops at the tract level and would be interpreted as New York (36), Kings County aka Brooklyn (047), Tract (016500).
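To make the layout concrete, here is a small Python sketch that slices a code into its parts. `parse_geoid` is a hypothetical helper written for this post, not part of tidycensus or any Census library, and it assumes only the 2-3-6-1 digit layout described above.

```python
def parse_geoid(geoid):
    """Split a Census code into its parts, based on the 2-3-6-1 layout."""
    if not geoid.isdigit() or len(geoid) not in (2, 5, 11, 12):
        raise ValueError(f"unexpected GEOID: {geoid!r}")
    parts = {"state": geoid[:2]}
    if len(geoid) >= 5:
        parts["county"] = geoid[2:5]
    if len(geoid) >= 11:
        parts["tract"] = geoid[5:11]
    if len(geoid) == 12:
        parts["block_group"] = geoid[11]
    return parts

# The Brooklyn tract from the example above.
print(parse_geoid("36047016500"))
```

The codes are kept as strings on purpose so that leading zeros (like the 0 in 047) are not lost.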

The ACS tracks over twenty-five thousand statistics for things like income, educational attainment, travel time, age and household size. For this example, use the median income code, B19013_001. The data is available in granularities like State, County and Tract, which you can see in the get_acs function as “geography”.

To see every variable tracked by the ACS, run this command.

v17 <- load_variables(2019, "acs5", cache = TRUE)
View(v17)

ny_median<-  get_acs(
    geography = "tract"
    , variables = "B19013_001"
    ,state = "NY"
    ,geometry = TRUE
    ,year=2019)

####Important Areas in New York
# 047 Brooklyn
# 081 Queens
# 061 Manhattan
# 005 Bronx

ny_median<-ny_median %>% filter(
   str_detect( GEOID,'^36047')  # ^ anchors the match to the start of the GEOID
  )

Interactive Map with Leaflet

ny_df<-st_as_sf(ny_median)

pal <- colorNumeric(palette = "RdYlBu",domain = ny_df$estimate)

m<-ny_df %>%
  st_transform(crs = 4326) %>%
  leaflet(width = "100%") %>%
  addProviderTiles(provider = "CartoDB.Positron") %>%
  addPolygons(popup = ~ str_extract(estimate, "^([^,]*)"),
              stroke = FALSE,
              smoothFactor = 0,
              fillOpacity = 0.9
              ,color = ~ pal(estimate)
              ) %>%
  addLegend("bottomright",
            pal = pal,
            values = ~ estimate,
            title = "Median Income",
            labFormat = labelFormat(prefix = " "),
            opacity = 1
            )
m

From Time Magazine to Gizmodo, here are the publishers that were put up for sale in 2018

2018 was a transitional year for publishers who saw reorganizations, sales, and acquisitions. I took a look at over 75 publishers to see how they performed.

Moving from left to right, Meredith is working on the divestiture of its Time Inc. assets. Time was the most notable, going to Marc Benioff for $190 million in September, quickly followed by Fortune in November for $150 million.

Gothamist had a turbulent 2018. In November 2017, it was shut down after the editorial team voted to unionize, but it was quickly revived by public radio station WNYC in February 2018.

Picking the best Python graphs for beginners – Plotly, Seaborn, Matplotlib, Chartify

Are you new to Python and trying to make a beautiful graph? I’ve reviewed four of the most popular graphing libraries and picked the best option for beginners. For the cells below, I used Jupyter Notebook with these modules, which can be installed via pip (pandas, numpy, plotly, cufflinks, seaborn, chartify).

On a normal day, I’ll open my Jupyter Notebook and import a CSV that I created using SQL/Hive.

Remember, this doesn't go in Jupyter Notebook; it goes in your terminal (the thing with a black screen that sort of looks like that thing from The Matrix).

pip install plotly
pip install cufflinks
pip install chartify
pip install seaborn

import pandas as pd
import numpy as np

%matplotlib inline



%cd -q Downloads 
#%cd this changes my directory to the Downloads folder

df1=pd.read_csv('blog_example.csv')
#this uses pandas (pd) to read the csv in the Downloads folder
#this example data mimics Google Ad Manager data, but for this exercise, it's full of random numbers

df2=df1.pivot_table(values='imps',index='day',columns='subset',aggfunc='sum')
#I now have two dataframes: df1, df2. This will be used later, depending on the graph

df2.head()
#.head() will show the first five rows of df2

Download example data here.
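If pivot tables are new to you, here is roughly what the pivot_table call above does, spelled out in plain Python so it runs without the CSV. The rows are made-up stand-ins for blog_example.csv (only the column names day, subset and imps come from the code above), and `pivot_sum` is a hypothetical helper, not a pandas function.

```python
from collections import defaultdict

# Made-up rows in the same "flat" shape as df1: one row per record.
rows = [
    {"day": "2018-01-01", "subset": "a", "imps": 100},
    {"day": "2018-01-01", "subset": "b", "imps": 250},
    {"day": "2018-01-01", "subset": "a", "imps": 50},
    {"day": "2018-01-02", "subset": "a", "imps": 75},
    {"day": "2018-01-02", "subset": "b", "imps": 300},
]

def pivot_sum(rows, index, columns, values):
    """Sum `values` into a nested dict keyed by index, then by column."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[index]][row[columns]] += row[values]
    return {key: dict(cols) for key, cols in table.items()}

# One entry per day, one key per subset: the "wide" shape that df2 has.
wide = pivot_sum(rows, index="day", columns="subset", values="imps")
for day, cols in wide.items():
    print(day, cols)
```

In practice pandas does this in one line, which is why the pivoted df2 is handy for libraries that want one column per line on the chart.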

Plot.ly


Learning Curve: Low, my pick for best graphing module for beginners.

What I like: Interactive, easiest library to use for beginners, pretty themes out of the box, other features (export, save as png), easy to understand documentation for new users.

What I don’t like: version 2.x is slow. If you don’t use cufflinks, this becomes one of the most difficult graphing libraries. Requires additional code to run in offline mode.

import plotly
import cufflinks as cf

cf.go_offline() 
#cf.go_offline() allows you to use plotly in jupyter

df2.iplot()

Chartify


What I like: Easy to write, built by Spotify Data Science team.

What I don’t like: Requires an additional exe to run (from Google).

import chartify
df=df1.groupby(['day','subset'],as_index=False).sum() 
#chartify can handle a flat table, no need to pivot it

%cd -q
#%cd was needed to change the active directory to 'python', earlier in this lesson I moved it to the Downloads folder. 

ch = chartify.Chart(blank_labels=True, x_axis_type='datetime')

ch.plot.line(
    data_frame=df,
    x_column='day'
    ,y_column='imps'
    ,color_column='subset'
)
ch.show()

Seaborn


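import seaborn as sns
#seaborn was installed with pip earlier, so it only needs to be imported here
sns.set()
#sns.set is optional, but I like the formatting
#ci=None turns off the confidence band (newer seaborn versions use errorbar=None instead)
sns.lineplot(x='day',y='imps',hue='subset',data=df1,ci=None);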

What I like: Pretty visualizations out of the box, great at heatmaps.

What I don’t like: I’ve personally had trouble writing and remembering the formatting of the plotting functions.

Matplotlib


pip install matplotlib

df1.plot()

Learning Curve:

What I like: Customizable, lots of documentation on Stack Overflow.

What I don’t like: Difficult to remember all the features. Learning curve is prohibitive to new users.

I before E except after C is a lie

It’s embarrassing, but I’ve had a lot of trouble spelling the word “receipt”. I keep spelling it “reciept”, which could be avoided if I remembered the simple mnemonic rhyme, “I before E, except after C”. In this analysis, we see that the rhyme is almost never true.

First, I downloaded a list of every English word from this GitHub repo. Then, working in Excel, I isolated the words containing the letters “IE” or “EI”. I found 504 words.

75% of words did not follow the maxim of I before E.
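The same check can be sketched in Python instead of Excel. This is only an illustration under assumptions: the word list is a tiny made-up sample rather than the full list from the repo, and `follows_rule` is one plausible reading of the rhyme ("ie" is fine unless it follows a C, "ei" is fine only after a C), so its results won't match the percentage above.

```python
import re

def follows_rule(word):
    """True if every 'ie'/'ei' pair in the word obeys 'I before E, except after C'."""
    word = word.lower()
    for match in re.finditer(r"ie|ei", word):
        after_c = match.start() > 0 and word[match.start() - 1] == "c"
        if match.group() == "ie" and after_c:
            return False   # 'cie' breaks the rule
        if match.group() == "ei" and not after_c:
            return False   # 'ei' without a leading 'c' breaks the rule
    return True

# Tiny made-up sample; the real analysis used a full word list.
words = ["receipt", "believe", "science", "weird", "friend", "ceiling", "their", "apple"]

# Keep only words that contain an IE or EI pair, then find the rule breakers.
with_pair = [w for w in words if re.search(r"ie|ei", w.lower())]
breakers = [w for w in with_pair if not follows_rule(w)]
print(f"{len(breakers)} of {len(with_pair)} words with IE/EI break the rule: {breakers}")
```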

Did the first letter in a word impact the I & E order? It doesn’t seem so. Only in words beginning with B, J, S and V did the I always come before the E.

Length of the word didn’t have a significant impact on the order of the I and the E. I & E combinations occurred more in words that had a length between 10 and 12 letters.

Williamsburg in seven years

I moved to Williamsburg from the East Village in 2016 because I wanted to pay less rent, shorten my commute and be around a lot of bars and restaurants. After a year of living in Williamsburg, I’ve heard more than my fair share of jokes about hipsters and gentrification. What is interesting about the area is its sense of newness. Looking around, some blocks are nothing but apartment buildings with the same metal and glass facade.

Building a time machine with Google

I didn’t live in NYC in 2007, but I am lucky enough to have the next best thing: Google, and specifically Google Maps’ time machine function. When in Street View, move your cursor to the top left of the screen until you hover over the grey box (in the picture below it says 250 Bedford Ave). In the bottom part of the box, I can see a clock labeled “Street View – August 2007”. This opens a timeline of every time that the Google Street View car has passed by your location.

Note: My goal was to use jQuery to make a before/after effect of the image, but jQuery and WordPress don’t play well together (it caused my entire site to stop loading), so I published this in a format similar to Business Insider (one big list with next to no insight).

Bedford Avenue

Bedford Avenue is now the heart of Williamsburg, but before that it was full of decaying buildings. In the ten years since that photo was taken, an Apple Store, Equinox, Whole Foods and Duane Reade were built in this exact location.

McCarren Park Area

This area would be unrecognizable if it weren’t for the houses on the right-hand side of the screen. In eight years, three huge apartment buildings were built in the empty lots and warehouses of East Williamsburg.

West Williamsburg

No major architectural changes here, but we can see the city’s move to make NYC streets more pedestrian-friendly.

Central Williamsburg

This was an amazing picture. Before most of the major development, we could see the Manhattan skyline between the old buildings covered in graffiti. In the following eight years, new buildings went up on every block, all the way to the East River.

Conclusion

I wanted this post to show people how much this area changed in a short timespan. What I would love to look at next is the effect on real estate prices, rent and GDP of the area.