Data Scientist Enablement

1. Week 1

1.1. 20.1. - 25.1.2014

1.2. Reading / Learning

1.2.1. An Introduction to Data Science by Jeffrey Stanton

1.2.1.1. Read chapters 1 - 3

1.2.1.2. http://ischool.syr.edu/media/documents/2012/3/DataScienceBook1_1.pdf

1.2.2. Big Data [sorry] & Data Science: What Does a Data Scientist Do?

1.2.2.1. http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do

1.3. Activities

1.3.1. Installing R and RStudio

1.3.1.1. http://www.rstudio.com/

1.3.2. Fun with Math

1.3.2.1. Using the Data Graphs widget

1.3.2.1.1. http://www.mathsisfun.com/data/data-graph.php

1.3.2.1.2. Create a bar chart quickly with 10 random values

1.3.2.1.3. Change graph to Pie Chart

1.3.2.1.4. Display percentages only, not the original values.

1.3.2.2. practice
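
The chart steps above (10 random values, then percentages instead of raw values) can be checked away from the widget. What follows is a minimal Python sketch, not part of the original roadmap; the helper name `to_percentages`, the 1-100 value range, and the fixed seed are assumptions made for illustration. The same arithmetic can be ported to R once it is installed.

```python
import random


def to_percentages(values):
    """Convert raw values to percentage shares of their total."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [100.0 * v / total for v in values]


random.seed(42)  # arbitrary fixed seed so the run is repeatable
values = [random.randint(1, 100) for _ in range(10)]  # 10 random values
shares = to_percentages(values)

# A pie chart that shows percentages only would display the right-hand column.
for value, share in zip(values, shares):
    print(f"{value:3d} -> {share:5.1f}%")
```

The shares always add up to 100, which is a quick sanity check on any pie chart the widget produces.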

1.3.3. Playing with ML Datasets

1.3.3.1. Visit UCI Machine Learning Repository

1.3.3.1.1. http://archive.ics.uci.edu/ml/datasets.html

1.3.3.1.2. Familiarize yourself with various datasets at this site

1.3.3.1.3. For week 1, our focus is only on the “Housing” dataset.

1.3.4. Research on Data Visualization tools

1.3.4.1. Create a presentation

1.3.4.1.1. Data Visualization Tools - A Comparative Study

1.3.4.1.2. Incorporate your unique ideas

1.3.4.1.3. Right evaluation methodology

1.3.4.1.4. Justify your choices

1.3.4.1.5. Build this presentation over 4 weeks

1.3.4.1.6. Present it during the 5th week

1.4. Assignments

1.4.1. Download Housing dataset

1.4.1.1. UCI Machine Learning Repository

1.4.1.2. http://archive.ics.uci.edu/ml/datasets.html

1.4.2. Import this dataset into your R environment

1.4.3. Display this dataset

1.5. Social Engagement on SONO

1.5.1. Discussion 1

1.5.1.1. Welcome to DSE program

1.5.1.2. required

1.5.2. Discussion 2

1.5.2.1. Programming languages

1.5.2.1.1. you are familiar with?

1.5.2.1.2. do you use on a day-to-day basis?

1.5.2.2. R Language

1.5.2.2.1. any experience?

1.5.2.3. Analytics tools

1.5.2.3.1. if any

1.5.2.3.2. you have used before?

1.5.2.4. required

1.5.3. Discussion 3

1.5.3.1. Optional Q&A

1.5.3.2. optional

1.6. Submission

1.6.1. Submit the screenshots of your R workspace

1.6.1.1. [email protected]

1.6.1.1.1. Image into body

1.6.1.1.2. PDF format attached

1.6.1.1.3. No links

1.6.1.2. showing the Housing dataset

1.6.2. Deadline

1.6.2.1. Saturday Jan 25

1.6.2.2. 11:59 PM

1.6.2.3. your local time

1.7. Roadmap

1.7.1. http://bit.ly/1hC5wAV

2. Week 2

2.1. 26.1.2014 - 1.2.2014

2.2. Reading / Learning

2.2.1. An Introduction to Data Science by Jeffrey Stanton

2.2.1.1. Read chapters 4 - 7

2.2.1.2. http://ischool.syr.edu/media/documents/2012/3/DataScienceBook1_1.pdf

2.2.2. R for Machine Learning by Allison Chang

2.2.2.1. Sections 1 to 3.4, pages 1-5

2.2.2.2. http://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/lecture-notes/MIT15_097S12_lec02.pdf

2.2.3. Introduction to Probability and Statistics Using R

2.2.3.1. Chapters 1-3

2.2.3.2. http://cran.r-project.org/web/packages/IPSUR/vignettes/IPSUR.pdf

2.2.3.3. optional

2.2.4. Think Stats: Probability and Statistics for Programmers

2.2.4.1. Chapters 2-7

2.2.4.2. http://www.greenteapress.com/thinkstats/thinkstats.pdf

2.2.4.3. optional

2.2.5. Statistics Playlist

2.2.5.1. Khan Academy

2.2.5.2. http://www.youtube.com/playlist?list=PL86E177E40B006419

2.2.5.3. If you are unfamiliar with basic Statistical concepts

2.2.5.4. If you need a quick refresher on this topic

2.2.5.5. optional

2.3. Activities

2.3.1. Play with spreadsheets

2.3.1.1. Given the following dataset

2.3.1.1.1. find

2.3.1.1.2. { 3, 15, 17, 18, 20, 20, 12, 20, 20, 16, 17, 12, 4, 7, 15, 20, 12, 6, 1, 20 }

2.3.1.2. practice
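
The outline does not say which quantities to find for this dataset. Assuming the usual descriptive statistics are intended (count, minimum, maximum, mean, median, mode, range), the following Python sketch gives values to compare against your spreadsheet formulas. It uses only the standard `statistics` module; the choice of quantities is an assumption, not something the roadmap specifies.

```python
import statistics

# The 20 values listed in the activity above.
data = [3, 15, 17, 18, 20, 20, 12, 20, 20, 16,
        17, 12, 4, 7, 15, 20, 12, 6, 1, 20]

summary = {
    "count": len(data),
    "min": min(data),
    "max": max(data),
    "range": max(data) - min(data),
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

In a spreadsheet the corresponding functions are typically named along the lines of AVERAGE, MEDIAN, and MODE; check the function list of the tool you use.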

2.3.2. Research on Data Visualization tools

2.3.2.1. continue

2.3.3. Fun with Math

2.3.3.1. Relative Frequency Distribution

2.3.3.1.1. http://www.mathsisfun.com/data/relative-frequency.html

2.3.3.1.2. learn

2.3.3.2. practice
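
A relative frequency is a value's count divided by the total number of observations. As a small Python sketch, the helper below (the name `relative_frequencies` is hypothetical) computes that table. Reusing the 20-value dataset from the spreadsheet activity is an assumption; the roadmap does not say which data to practice on.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each distinct value to count / total, in ascending value order."""
    counts = Counter(values)
    n = len(values)
    return {v: counts[v] / n for v in sorted(counts)}


data = [3, 15, 17, 18, 20, 20, 12, 20, 20, 16,
        17, 12, 4, 7, 15, 20, 12, 6, 1, 20]

rel = relative_frequencies(data)
for value, freq in rel.items():
    print(f"{value:2d}: {freq:.2f}")
```

The relative frequencies sum to 1, so multiplying each by 100 gives the percentage view used for pie charts.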

2.3.4. Connect with local groups

2.3.4.1. R

2.3.4.2. Data Scientist

2.3.4.3. Big Data

2.3.4.4. check

2.3.4.4.1. projects

2.3.4.4.2. talks

2.3.4.4.3. seminars

2.3.4.5. discuss

2.3.4.5.1. engagement

2.3.4.5.2. help

2.3.4.6. optional

2.3.5. Download PSPP for Windows

2.3.5.1. pspp4windows

2.3.5.2. http://sourceforge.net/projects/pspp4windows/

2.3.5.3. play with it

2.3.5.4. optional

2.3.6. Housing dataset

2.3.6.1. Import this dataset into your R environment

2.3.6.2. Describe this dataset statistically

2.3.6.3. pastecs package

2.3.6.3.1. stat.desc() function

2.3.6.4. optional

2.3.7. Big Data in Motion

2.3.7.1. http://www.elabs3.com/ct.html?ufl=a&rtr=on&s=j0w,1oekc,62ly,76ou,hoyr,f1qz,g4iu

2.3.7.2. online

2.3.7.3. free

2.3.7.4. webinar

2.3.7.5. Jan 30, 2014

2.3.7.6. 1 PM EST

2.3.7.7. optional

2.4. Assignments

2.4.1. Download Haberman dataset

2.4.1.1. UCI Machine Learning Repository

2.4.1.2. http://archive.ics.uci.edu/ml/datasets.html

2.4.2. Import this dataset into your R environment

2.4.3. Visually describe this dataset

2.4.3.1. Generate three graphic representations

2.4.3.1.1. Histogram

2.4.3.1.2. Scatter Plot

2.4.3.1.3. Box Plot

2.5. Social Engagement on SONO

2.5.1. Discussion 1

2.5.1.1. Big fuss about Big Data

2.5.1.2. required

2.5.2. Discussion 2

2.5.2.1. Statistical sampling etc.

2.5.2.2. required

2.5.3. Discussion 3

2.5.3.1. Optional Q&A

2.5.3.2. optional

2.6. Submission

2.6.1. Submit the screenshots of your R workspace

2.6.1.1. [email protected]

2.6.1.1.1. PDF format attached

2.6.1.2. showing the Haberman dataset

2.6.2. Deadline

2.6.2.1. Saturday Feb 1

2.6.2.2. 11:59 PM

2.6.2.3. your local time

2.7. Roadmap

2.7.1. http://bit.ly/1dVHJwO

3. Week 3

3.1. 2.2.2014 - 8.2.2014

3.2. Reading / Learning

3.2.1. R for Machine Learning by Allison Chang

3.2.1.1. Sections 4.1 - 4.5, page 7

3.2.1.2. http://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/lecture-notes/MIT15_097S12_lec02.pdf

3.2.2. Machine Learning: The Basics by Ron Bekkerman

3.2.2.1. http://www.youtube.com/watch?v=wjTJVhmu1JM

3.2.2.2. optional

3.2.3. Introduction to R for Data Mining by Joseph Rickert

3.2.3.1. http://www.youtube.com/watch?v=6jT6Rit_5EQ

3.2.3.2. optional

3.2.4. Machine Learning With R

3.2.4.1. http://www.slideshare.net/ChiuYW/machine-learning-using-r#

3.2.5. Machine Learning Guidance For Beginners

3.2.5.1. http://mlthirst.wordpress.com/2013/01/08/machine-learning-guidance-for-beginners/

3.2.6. Hilary Mason - Machine Learning for Hackers

3.2.6.1. http://vimeo.com/43547079

3.3. Activities

3.3.1. Research on Data Visualization tools

3.3.1.1. continue

3.3.2. Visit Data Science Central

3.3.2.1. http://www.datasciencecentral.com/

3.3.2.2. Examine “Visualization of the day”

3.3.2.3. practice

3.3.3. Explore Amazon

3.3.3.1. Using publicly available resources

3.3.3.2. Investigate what algorithmic techniques Amazon employs

3.3.3.3. To recommend related products when you search for one

3.3.3.4. Do not employ private intellectual capital (iCap)

3.3.3.5. practice

3.3.4. Survey ML in your industry

3.3.4.1. List top 10 of these with the use cases

3.3.4.2. Briefly discuss the outcomes

3.3.4.3. Do not access or disclose any iCap

3.3.4.4. practice

3.3.5. Explore The State of the World's Children 2014 in Numbers

3.3.5.1. http://www.unicef.org/sowc2014/numbers/

3.3.5.2. Where do the poorest children live?

3.3.5.3. What is being done to improve their lives?

3.3.5.4. What are systemic problems that still need to be solved?

3.3.5.5. optional

3.3.6. Apply for Schmidt Fellowship

3.3.6.1. Check out

3.3.6.1.1. http://www.dssg.uchicago.edu/

3.3.6.2. If interested apply

3.3.6.3. optional

3.3.7. Research on Prof. Sen’s methodologies

3.3.7.1. http://scholar.harvard.edu/sen

3.3.7.2. Examine what data he employs

3.3.7.3. optional

3.3.8. Mushroom dataset

3.3.8.1. Perform Statistical Analysis

3.3.8.2. Reveal hidden patterns and correlation between its attributes

3.3.8.3. optional

3.4. Assignments

3.4.1. Download Mushroom dataset

3.4.1.1. UCI Machine Learning Repository

3.4.1.2. http://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/datasets/

3.4.2. Import this dataset into your R environment

3.4.3. Apply apriori() algorithm

3.4.3.1. arules package
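
Before running apriori() from arules, it may help to see the level-wise idea behind the algorithm: count the support of candidate itemsets, keep those that meet a minimum support, and build larger candidates only from itemsets that survived. The Python sketch below is a simplified illustration of that idea, not the arules implementation. The function name `frequent_itemsets` and the four toy transactions are hypothetical and have nothing to do with the Mushroom data.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # size-1 candidates
    result = {}
    k = 1
    while current:
        # Keep the candidates whose support reaches the threshold.
        level = {}
        for candidate in current:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return result


# Hypothetical toy transactions, for illustration only.
toy = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = frequent_itemsets(toy, min_support=0.5)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With a minimum support of 0.5, the pair of milk and butter is dropped because it appears in only one of four transactions, so no three-item candidate survives pruning.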

3.5. Social Engagement on SONO

3.5.1. Discussion 1

3.5.1.1. Big Data In 2014: 6 Bold Predictions

3.5.1.2. article

3.5.1.2.1. http://www.informationweek.com/big-data/big-data-analytics/big-data-in-2014-6-bold-predictions/d/d-id/1113091

3.5.1.3. required

3.5.2. Discussion 2

3.5.2.1. Netflix $1M Prize - BellKor Solution

3.5.2.2. forum

3.5.2.2.1. http://www.netflixprize.com/community/viewtopic.php?id=1537

3.5.2.3. required

3.5.3. Discussion 3

3.5.3.1. Optional Q&A

3.5.3.2. optional

3.6. Submission

3.6.1. Submit the screenshots of your R workspace

3.6.1.1. [email protected]

3.6.1.1.1. PDF format attached

3.6.1.2. showing the Mushroom dataset

3.6.2. Deadline

3.6.2.1. Saturday Feb 8

3.6.2.2. 11:59 PM

3.6.2.3. your local time

3.7. Roadmap

3.7.1. http://bit.ly/1dILgbT

4. Week 4

4.1. 9.2.2014 - 15.2.2014

4.2. Reading / Learning

4.2.1. R for Machine Learning by Allison Chang

4.2.1.1. http://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/lecture-notes/MIT15_097S12_lec02.pdf

4.2.2. Introduction to Machine Learning by Lars Marius Garshol

4.2.2.1. http://www.slideshare.net/larsga/introduction-to-big-datamachine-learning

4.2.3. Machine Learning: The Basics by Ron Bekkerman

4.2.3.1. http://www.youtube.com/watch?v=wjTJVhmu1JM

4.2.3.2. optional

4.2.4. Introduction to R for Data Mining by Joseph Rickert

4.2.4.1. http://www.youtube.com/watch?v=6jT6Rit_5EQ

4.2.4.2. optional

4.2.5. Top 10 Algorithms in Data Mining by Wu et al.

4.2.5.1. http://www.cs.uvm.edu/~icdm/algorithms/10Algorithms-08.pdf

4.2.5.2. optional

4.3. Activities

4.3.1. Research on Data Visualization tools

4.3.1.1. continue

4.3.2. Visualize 2010 Winter Olympic Medals

4.3.2.1. Visualize this data in a spreadsheet, showing the geographic distribution of the medals.

4.3.2.2. You can use Google Spreadsheet

4.3.2.3. Repeat this exercise for 2014 Winter Olympics

4.3.2.4. practice

4.3.3. Write a user-defined function in R

4.3.3.1. input

4.3.3.1.1. N

4.3.3.1.2. integer

4.3.3.2. return

4.3.3.2.1. sum

4.3.3.3. verify

4.3.3.3.1. sum of the first N odd numbers

4.3.3.3.2. is given by the formula

4.3.3.4. practice
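
The outline does not spell out the formula; the standard identity is 1 + 3 + ... + (2N - 1) = N^2. Reading the task as "sum the first N odd positive integers" is an assumption. The Python sketch below shows the logic and the verification step so that the R version can be checked against it; the function name `sum_first_n_odds` is hypothetical.

```python
def sum_first_n_odds(n):
    """Return 1 + 3 + ... + (2n - 1), the sum of the first n odd numbers."""
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    return sum(2 * k - 1 for k in range(1, n + 1))


# Verify the closed form N^2 for a range of inputs.
for n in range(0, 11):
    assert sum_first_n_odds(n) == n * n

print(sum_first_n_odds(5))  # 1 + 3 + 5 + 7 + 9
```

In R the same shape applies: a function taking N, a summation over the sequence of odd numbers, and a comparison against N^2.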

4.3.4. Write an R function that implements Sieve of Eratosthenes

4.3.4.1. Sieve of Eratosthenes

4.3.4.1.1. algorithm

4.3.4.1.2. http://primes.utm.edu/glossary/page.php?sort=SieveOfEratosthenes

4.3.4.2. practice
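
As a reference for the logic before writing the R version, here is a minimal Python sketch of the sieve: mark every multiple of each prime as composite, starting from the prime's square, and collect whatever remains unmarked. The function name `sieve` and the choice to return all primes up to and including `limit` are assumptions about the desired interface.

```python
def sieve(limit):
    """Return the list of primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Smaller multiples of p were already marked by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


print(sieve(30))
```

When porting to R, keep in mind that R vectors are indexed from 1, so the mapping between vector position and the number it represents needs an offset that this 0-indexed sketch does not.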

4.3.5. Build a personal Career Advancement Roadmap

4.3.5.1. 5-10 year horizon

4.3.5.2. inventory

4.3.5.2.1. strengths

4.3.5.2.2. capabilities

4.3.5.3. reflects

4.3.5.3.1. career ambitions

4.3.5.4. use

4.3.5.4.1. DSE Roadmap to enhance your capabilities

4.3.5.4.2. open knowledge repositories

4.3.5.5. practice

4.4. Assignments

4.4.1. write

4.4.1.1. Survey Paper

4.4.1.1.1. 2-5 pages

4.4.1.2. Topic

4.4.1.2.1. How Big Data is being used in your industry

4.4.2. inspiration

4.4.2.1. The 'big data' revolution in healthcare - McKinsey & Company

4.4.2.2. http://www.mckinsey.com/insights/health_systems/~/media/7764A72F70184C8EA88D805092D72D58.ashx

4.5. Social Engagement on SONO

4.5.1. Discussion 1

4.5.1.1. Top 8 Big Data Posts from December 2013

4.5.1.2. article

4.5.1.2.1. http://ensighten.com/blog/top-8-big-data-posts-december-2013

4.5.1.2.2. Pick a post that interests you most.

4.5.1.2.3. Comment on what you like most about it and how these insights can be applied.

4.5.1.3. required

4.5.2. Discussion 2

4.5.2.1. Evolving Darwin - Genetic Algorithm

4.5.2.2. video

4.5.2.2.1. http://www.youtube.com/watch?v=dO05XcXLxGs

4.5.2.2.2. Does it sound like a valid machine learning approach?

4.5.2.2.3. What are its strengths and weaknesses, if any?

4.5.2.2.4. How would you improve it?

4.5.2.3. required

4.5.3. Discussion 3

4.5.3.1. Optional Q&A

4.5.3.2. optional

4.6. Submission

4.6.1. Submit single document

4.6.1.1. [email protected]

4.6.1.1.1. PDF format attached

4.6.1.2. showing your Big Data Survey

4.6.2. Deadline

4.6.2.1. Saturday Feb 15

4.6.2.2. 11:59 PM

4.6.2.3. your local time

4.7. Roadmap

4.7.1. http://bit.ly/1g8tMKM

5. Week 5

5.1. 16.2.2014 - 22.2.2014

5.2. Reading / Learning

5.2.1. Why data visualization matters

5.2.1.1. http://strata.oreilly.com/2012/02/why-data-visualization-matters.html

5.2.2. Quick Introduction to Time Series Graphs

5.2.2.1. http://www.uri.edu/artsci/newecn/Classes/Art/INT1/Eco/D_A/timeseries.html

5.2.3. Four Pillars of Effective Visualization by Noah Iliinsky

5.2.3.1. http://cdn.oreillystatic.com/en/assets/1/event/91/Four%20Pillars%20of%20Effective%20Visualizations%20Presentation.pdf

5.2.4. Designing Data Visualizations

5.2.4.1. http://www.youtube.com/watch?v=lTAeMU2XI4U

5.2.4.2. optional

5.2.5. Introduction to Data Visualization with R and ggplot2

5.2.5.1. http://www.youtube.com/watch?v=efmuwtFNlME

5.2.5.2. optional

5.3. Activities

5.3.1. Examine the Axiis visualization

5.3.1.1. showing Browser Market Share

5.3.1.2. http://www.axiis.org/examples/BrowserMarketShare.html

5.3.1.3. Figure out how Chrome did in July 2009

5.3.1.4. Did it fare better than Safari and Opera combined?

5.3.1.5. practice

5.3.2. Visit Data Science Central

5.3.2.1. http://www.datasciencecentral.com/

5.3.2.2. Examine “Visualization of the day”

5.3.2.3. Could you have presented this in a better way?

5.3.2.4. Explore the alternative ways of representing this.

5.3.2.5. practice

5.3.3. Build Time Series Graph for the dataset

5.3.3.1. Use your favorite Spreadsheet tool

5.3.3.2. data

5.3.3.2.1. P = (23,58,74,45,87,99,30,64,79,82,11,49)

5.3.3.2.2. Year = (1900,1901,1902,1903,1904,1905,1906,1907,1908,1909,1910,1911)

5.3.3.2.3. #P is the number of patents registered by a fictional corporation Avion Zenith Unlimited.

5.3.3.3. practice
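
Before charting, it can help to pair each year with its patent count and look at a few basic facts about the series. The Python sketch below does that with the values given above and prints a rough text chart. It is only a preparation aid under the assumption that the two lists align position by position; the actual graph is still meant to be built in your spreadsheet tool.

```python
years = [1900, 1901, 1902, 1903, 1904, 1905,
         1906, 1907, 1908, 1909, 1910, 1911]
patents = [23, 58, 74, 45, 87, 99, 30, 64, 79, 82, 11, 49]

# Pair each year with its patent count (assumes the lists align by position).
series = list(zip(years, patents))

# Year with the highest and the lowest count.
peak_year, peak_value = max(series, key=lambda point: point[1])
low_year, low_value = min(series, key=lambda point: point[1])

# Year-over-year change, labelled with the later year.
changes = [(y2, p2 - p1) for (_, p1), (y2, p2) in zip(series, series[1:])]

# Rough text chart: one '#' per 5 patents.
for year, count in series:
    print(f"{year} {'#' * (count // 5)} {count}")

print("peak:", peak_year, peak_value)
print("low:", low_year, low_value)
```

Put Year on the horizontal axis and P on the vertical axis; a line chart shows the year-over-year swings more clearly than a bar chart.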

5.3.4. Upload the above dataset into the cloud-based Visualization tool, IBM Many Eyes

5.3.4.1. http://www-958.ibm.com/

5.3.4.2. practice

5.3.4.3. optional

5.3.5. Download the Data Visualization tool Tableau Public and play with the above Avion Zenith dataset

5.3.5.1. http://www.tableausoftware.com/public/download

5.3.5.2. practice

5.3.5.3. optional

5.3.6. Download and practice R visualization examples illustrated by Winston Chang in his webcast

5.3.6.1. http://www.youtube.com/watch?v=efmuwtFNlME

5.3.6.2. practice

5.3.6.3. optional

5.3.7. Review the current/recent issue of R-Journal

5.3.7.1. http://journal.r-project.org/

5.3.7.2. Pick an article that interests you

5.3.7.3. Write a short synopsis in a readable format

5.3.7.4. Share it with the rest of the group

5.3.7.5. research

5.3.7.6. optional

5.4. Assignments

5.4.1. Complete Comparative Study of Data Visualization Tools

5.4.1.1. submit

5.4.1.1.1. individual assignment

5.4.1.1.2. or collaborate with others

5.4.1.2. format

5.4.1.2.1. presentation

5.4.1.2.2. or survey paper

5.4.1.3. Summarize your recommendation with relevant numbers and ratings

5.5. Social Engagement on SONO

5.5.1. Discussion 1

5.5.1.1. quote

5.5.1.1.1. “Data is not Information. Information is not Knowledge. Knowledge is not Understanding. Understanding is not Wisdom.”

5.5.1.1.2. Tim Berners Lee

5.5.1.1.3. What are your thoughts on this key insight?

5.5.2. Discussion 2

5.5.2.1. Optional Q&A

5.5.2.2. optional

5.6. Submission

5.6.1. Submit single document

5.6.1.1. [email protected]

5.6.1.1.1. PDF format attached

5.6.1.2. showing your Visualization Tools survey paper or presentation

5.6.2. Deadline

5.6.2.1. Saturday Feb 22

5.6.2.2. 11:59 PM

5.6.2.3. your local time

5.6.2.4. no penalty for late submissions

5.7. Roadmap

5.7.1. http://bit.ly/1gnqHXN