Saturday, January 3, 2015

Melting Columns to Rows in R

Pivoting data is a common problem when dealing with Excel or CSV input files.  Excel users generally like "dirty" yet presentable data.  Data dumps commonly place multiple tables' worth of data into a single massive multi-column table.  This wide, multi-column layout lends itself to problems when you try to analyze results using tools such as pivot tables.

For example, 12 monthly columns should be converted to 2 columns (month name, value), taking the data from wide to long format so that it can be pivoted, filtered, and sorted efficiently.

In Excel, you can unpivot data with a macro or with an add-in such as PrepYourData or Microsoft Power Query.
 
R provides this kind of functionality in 3 lines of code. In this case, subject and score columns are melted into rows.
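As a rough sketch of what those few lines can look like: the example below assumes the reshape2 package and its melt() function (the post does not name a package), and the data frame, student names, and scores are made up purely for illustration.

```r
library(reshape2)  # assumed package; provides melt()

# Hypothetical wide table: one row per student, one column per subject
wide <- data.frame(
  student = c("Ann", "Bob"),
  math    = c(90, 75),
  science = c(85, 80)
)

# Melt the subject columns into (subject, score) rows,
# keeping 'student' as the identifier column
long <- melt(wide, id.vars = "student",
             variable.name = "subject", value.name = "score")
print(long)
```

Every column not listed in id.vars is treated as a measured variable, so the same call would work unchanged for 12 monthly columns.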

http://stackoverflow.com/questions/18446668/using-r-to-reformat-data-from-cross-tab-to-one-datum-per-line-format

http://seananderson.ca/2013/10/19/reshape.html

Tidy Data

Monday, November 10, 2014

Lifecycle of Data Science Project

The Lifecycle Summarized

1. Identify & Define the problem
2. Define and document data sources
3. Statistical data profiling
4. Implementation
5. Sharing and collaboration
6. Maintenance & Support

http://www.datasciencecentral.com/profiles/blogs/life-cycle-of-data-science-projects

Plus M&Ms, Jackknife (Swiss Army Style?) logistic and linear regression.
http://www.datasciencecentral.com/profiles/blogs/jackknife-logistic-and-linear-regression

Plus jackknifing your results in about 4 lines of R code.
http://ryouready.wordpress.com/2008/12/19/r-jackknife-the-coefficients-of-a-linear-regression-model/
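For a feel of how short this can be, here is a minimal leave-one-out sketch. It is not the code from the linked post; the built-in mtcars data set and the mpg ~ wt model are just stand-ins chosen for illustration.

```r
# Illustrative model on a built-in data set (an assumption, not from the post)
fit_formula <- mpg ~ wt
n <- nrow(mtcars)

# Refit the model n times, leaving out one row each time;
# the result has one row of coefficients per left-out observation
jack_coefs <- t(sapply(seq_len(n), function(i) {
  coef(lm(fit_formula, data = mtcars[-i, ]))
}))

# Jackknife mean and standard error of each coefficient
jack_mean <- colMeans(jack_coefs)
jack_se <- sqrt((n - 1) / n * colSums(sweep(jack_coefs, 2, jack_mean)^2))
print(jack_se)
```

Comparing jack_se with the standard errors reported by summary(lm(...)) is a quick way to see how sensitive the fit is to individual rows.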

Random Forests in Tableau with R
http://boraberan.wordpress.com/2014/02/07/decision-trees-in-tableau-using-r/

Finally, using a jackknife to cut down some Hidden Decision Trees.
http://www.datasciencecentral.com/profiles/blogs/hidden-decision-trees-revisited

Monday, May 26, 2014

FRED Add-In for Excel and some Torontoist Centric Economic Data

The Federal Reserve Bank of St. Louis Economic Data (FRED) Add-In is free software that will significantly reduce the amount of time spent collecting and organizing macroeconomic data. The FRED add-in provides free access to over 210,000 data series from various sources (e.g., BEA, BLS, Census, and OECD) directly through Microsoft Excel.

Get it here:
http://research.stlouisfed.org/fred-addin/

Are you looking for GDP, CPI, or microeconomic data from the US Fed?  Stats Canada?

Here are some interesting visualizations and interpretations of this type of data.

How much you make vs. how much it really feels like per US city with the supporting data released April 2014.

Canadian Cities Where An Average Income Will No Longer Buy You a House 

Numbeo, Cost of Living In Canada

It will cost you about 6% more to eat at McDonald's in Barrie vs. Toronto.
It costs 90% more to buy a bag of potatoes in Oshawa than Toronto.

There must be a potato famine in Oshawa... or people there really like french fries.

Friday, August 10, 2012

Speedometer Design: Why It Works | DATA + DESIGN by Paul Van Slembrouck

Gauge controls may feel comfortably familiar to certain business users (think oil & gas or automotive/transportation).

However, do they convey business information in a quick, actionable way?

Saturday, February 4, 2012

2011年の日本の地震 分布図 Japan earthquakes 2011 Visualization map (2012-01-01) - YouTube

It isn’t so much the visualization as it is the sound that will blow you away. From the constant frame of reference provided by the clock / calendar in the top left, to the scrolling detail data, this has all the elements of a perfect visualization.  It would be really cool to see Twitter feeds and news headlines on this too.