Data science map


1. data ingest/output

1.1. Excel

1.1.1. Readxl: get data out of Excel and into R http://readxl.tidyverse.org

1.1.2. Excelgesis: look inside an Excel file (jennybc/excelgesis)

1.2. SQL

1.2.1. Using PostgreSQL in R: A quick how-to

1.2.2. https://openml.github.io/articles/slides/useR2017_tutorial/slides_tutorial.html?utm_content=buffer2efd1&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer#1

1.2.3. modeldb

1.3. Webscraping

1.3.1. Pull data from wikipedia tables: Scraping wikipedia tables

1.3.2. Extracting data from the web: http://bit.ly/2pvy8Hz

1.3.3. Scraping JS: Scraping Javascript websites in R · Brooke Watson

1.3.4. Data pasta

1.3.4.1. Twitter

1.3.5. Twitter

1.3.5.1. https://rud.is/books/21-recipes/?utm_content=buffer7a82b&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

1.4. Google sheets

1.4.1. Connect to Google Sheets in Power BI using R

1.5. pdfs

1.5.1. One View of the Impact of the New Immigration Ban (+ freeing PDF data with tabulizer)

1.6. APIs

1.6.1. Riingo: An R interface to the Tiingo stock price API

2. package development + testing

2.1. R packages: Welcome · R packages

2.2. Code testing

2.2.1. Assert an item is unchanged on execution: Assert condition over a code block

2.2.2. checkr: https://www.theoj.org/joss-papers/joss.00624/10.21105.joss.00624.pdf

2.3. Licenses

2.3.1. Open Source Licenses Explained

3. data wrangling and data structures

3.1. basic data structures in R: R for Excel users - Rex Analytics

3.2. R For data Science: R for Data Science

3.3. Advanced R: Welcome · Advanced R.

3.4. dplyr: manipulate data (tidyverse/dplyr)

3.4.1. Data Wrangling Part 4: Summarizing and slicing your data

3.5. tidyr: clean up data (tidyverse/tidyr)

3.6. explore some data structures: go build a blockchain! Simple Blockchain Example in R

3.7. data manipulation in R: Data Manipulation in R (R Fundamentals Book 2), Stephanie Locke

3.8. data wrangling cheat sheet http://bit.ly/1LaYWBd

3.9. Factors

3.9.1. Manipulating factors with forcats – R-atique (in French)

3.10. regex

3.10.1. https://bit.ly/2GeeWV2

3.10.2. R Regex Tester; R: Regular Expressions as used in R

3.10.3. gadenbuie/regexplain

3.11. purrr

3.11.1. https://t.co/1Jw1ZWeDRb

4. Getting started with R or data science

4.1. Where to start

4.1.1. becoming a data scientist: Becoming A Data Scientist | Documenting my path from "SQL Data Analyst pursuing an Engineering Master's Degree" to "Data Scientist"

4.1.2. Don't forget

4.1.2.1. Three things every new data scientist should know - Rex Analytics

4.1.2.2. Yes, you can: learn data science - Rex Analytics

4.2. troubleshooting installation: R Package Install Troubleshooting

4.3. The thing is, learning R and learning an IDE are two different things: Learning to program is getting harder

4.4. R Syntax cheat sheet: http://www.science.smith.edu/~amcnamara/Syntax-cheatsheet.pdf

4.5. two minute R videos series: http://bit.ly/2pwzyRu

4.6. Where do things live in R? R for Excel Users - Rex Analytics

4.7. errors

4.7.1. decoding error messages: Decoding error messages in R - Rex Analytics

4.7.2. object not found: Object not found: R - Rex Analytics

4.8. basic skills resources: Some notes

4.8.1. Probability

4.8.1.1. basic probability: A Primer on Basic Probability - Rex Analytics

4.8.1.2. What is a probability distribution? What is a distribution?

4.8.1.3. Probability cheat sheet http://bit.ly/21kATFL

4.8.1.4. Conditional probability http://bit.ly/2pvGL48

4.8.2. Statistics

4.8.2.1. Explore correlation and regression: Exploring Correlation and the Simple Linear Regression Model - Rex Analytics

4.8.2.2. describe simple statistics: Describing simple statistics - Rex Analytics

4.8.2.3. statistical significance, explained http://bit.ly/2G8Sksz

4.8.2.4. Asymptotics - the engine behind a lot of statistical results.

4.8.2.4.1. Law of large numbers vs Central Limit Theorem, in GIFs: Law of Large Numbers vs the Central Limit Theorem: in GIFs - Rex Analytics

4.8.2.4.2. The law of large numbers: The Law of Large Numbers: It's Not the Central Limit Theorem - Rex Analytics

4.8.2.4.3. The central limit theorem http://bit.ly/2HWmGvg

4.8.2.5. Bayesian statistics: they're good people, I promise

4.8.2.5.1. Bayesian Statistics Explained in Simple English For Beginners

4.8.2.5.2. What is bayesian updating actually doing? http://bit.ly/2DLJImb

4.8.3. Experimental/sample design

4.8.3.1. Correlation vs causation: Correlation vs Causation - Rex Analytics

4.8.4. explain histograms: Thomas Lin Pedersen on Twitter

4.8.4.1. code: An example of animating the build up of a histogram with dropping balls using tweenr, gganimate and ggplot2

4.9. Linear algebra

4.9.1. WTF are eigenvalues? Eigenvectors and Eigenvalues explained visually

4.9.2. Shiny app for calculating them and everything http://bit.ly/2IKlCMf

5. data hazmat (ok cleaning)

5.1. Janitor: http://sfirke.github.io/janitor/

5.2. vtreat package for a relatively automated approach: R Tip: Use the vtreat Package For Data Preparation

6. troubleshooting

6.1. make a reprex! http://bit.ly/2ptJU4u

7. functional and object-oriented programming

7.1. functions http://bit.ly/2FWWJza

7.2. closures: Closures in R - Rex Analytics

7.3. tidy evaluation

7.3.1. What is tidy evaluation anyway? https://www.youtube.com/watch?v=nERXS3ssntw&feature=youtu.be

7.3.2. tidy evaluation used to make dplyr-type verbs https://bit.ly/2G6tVAC

7.3.3. Tidyeval meets PDF table hell

7.4. other languages

7.4.1. C++

7.4.1.1. A new RStudio addin to facilitate inserting tables in Rmarkdown documents | Lorenzo Busetto Website & Blog

7.5. need for speed

7.5.1. Making R Code Faster : A Case Study

8. graph/network methods

8.1. intro to ggraph http://bit.ly/2IIZDFQ

8.2. social network analysis, how-to guide: Social Network Analysis - RDataMining.com: R and Data Mining

9. Utilities

9.1. automation in R

9.1.1. Automating summary of surveys with R Markdown: Automating Summary of Surveys with RMarkdown

9.1.2. Automate processes: Automate R processes

9.1.3. Scheduling R scripts and processes on Windows and Unix/Linux

9.2. vectorising

9.2.1. Jenny Bryan's tutorials: Vectors and lists

9.2.2. Functional Programming with Purrr: Thomas Mock http://bit.ly/2FTn80W

9.2.3. Mara Averick's fantastic collection: purrr-ty posts

9.3. lists

9.3.1. zeallot package - an assignment operator for unpacking lists and vectors nteetor/presentations

9.4. learning unix makes life easier: The Unix Workbench

9.4.1. what is sudo? (Linux): Migrating to Linux: Using Sudo

9.5. Cloud services

9.5.1. AWS

9.5.1.1. set up RStudio on AWS: RStudio in the Cloud I: Amazon Web Services

9.5.2. google drive

9.5.2.1. tidyverse/googledrive

9.5.2.2. An Interface to Google Drive • googledrive

9.6. git/github

9.6.1. GitHub, it's worth it: Happy Git and GitHub for the useR

9.6.2. Allen Downey's book on git: amgit

9.6.3. edwindj/daff

9.7. https://github.com/goldingn/default?utm_content=buffer0d8b6&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

9.8. containers

9.8.1. docker

9.8.1.1. http://ropenscilabs.github.io/r-docker-tutorial/?utm_content=buffer66d72&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

10. Work flow

10.1. Hadley Wickham's Data science workflow: R for Data Science

10.2. mapping analytics objects: Mapping analytics objects - Rex Analytics

10.3. Consultant's workflow: A consultant's workflow - Rex Analytics

10.4. Hadley Wickham's Alfred Workflow Project workflow

10.5. managing code

10.5.1. good practices to follow: Engineering Data Science at Automattic

10.5.2. cookie cutter data science template: Cookiecutter Data Science

10.5.3. Drake: ropensci/drake

10.6. Good stat management

10.6.1. Ten Simple Rules for Effective Statistical Practice

10.6.2. sharing data: jtleek/datasharing

10.6.3. It's not all about the p-values: https://www.nature.com/news/statistics-p-values-are-just-the-tip-of-the-iceberg-1.17412

10.7. Agile data science

11. accessibility

11.1. vision accessibility

11.1.1. On less-than-perfect-vision: Posters & Talks: Can you read me now?

11.1.2. Colourblindness: Data Visualisation: Hex Codes, Pantone Colours and Accessibility - Rex Analytics

11.2. Thinking about non native English speakers: Philip Guo - Selected Publications

11.3. inclusive design: Inclusive - Microsoft Design

11.4. Accessibility in your software | iOS & VoiceOver

12. Modelling

12.1. Standard classical

12.1.1. Lm/Regression

12.1.1.1. An Introduction to Statistical and Data Sciences via R

12.1.1.2. Regression in R: Regression Analysis using R

12.1.1.3. Regression essentials: http://bit.ly/2u9PezE

12.1.2. marginal effects: Easy peasy STATA-like marginal effects with R

12.1.3. Machine learning

12.1.3.1. Caret

12.1.3.1.1. Walkthrough by Zev Ross: Predictive modeling and machine learning in R with the caret package

12.1.3.2. averaging/bagging

12.1.3.2.1. Averaging for Prediction in Econometrics and ML

12.1.3.3. interpretable machine learning

12.1.3.3.1. LIME framework http://bit.ly/2ubmnL9

12.1.3.4. Cross validation

12.1.3.4.1. An applied experiment using brain decoders - interesting way of looking at CV critically in practice. Assessing and tuning brain decoders: cross-validation, caveats, and guidelines

12.1.3.5. Introductory material

12.1.3.5.1. Lectures - 15 hours worth! In-depth introduction to machine learning in 15 hours of expert videos

12.1.3.6. Tree based methods

12.1.3.6.1. Examples:

12.1.3.7. ML cheat sheet: Machine Learning Modelling in R: Cheat Sheet

12.2. High dimensional

12.2.1. High-dimensional statistical and econometric methods for estimation and inference. http://bit.ly/2DJ35wl

12.2.2. Statistical model selection with “Big Data”: Doornik & Hendry's New Paper - Rex Analytics

12.2.3. Variable selection, big data http://bit.ly/2pvyjmd

12.2.4. curse of dimensionality: Typical Sets and the Curse of Dimensionality

12.3. ordinal

12.3.1. likert scale

12.3.1.1. Assessment of health surveys: fitting a multidimensional graded response model

12.4. forecasting

12.4.1. Principles and practice: https://otexts.org/fpp2/

12.4.2. Hierarchical

12.4.3. high dimensional forecasting

12.4.3.1. feature spaces: Analysing Large Collections of Time Series

12.4.3.2. Anomaly Detection in Streaming Nonstationary Temporal Data

12.4.4. time series R structures

12.4.4.1. tsibble and tibbletime: tsibble? or tibbletime? · Earo Wang

12.4.5. Exploring the sources of uncertainty: Why does bagging for time series forecasting work?

12.5. Neural Nets/AI

12.5.1. F.X. Diebold's write up: Neural Nets, ML and AI

12.5.2. Capsule networks

12.5.2.1. overview: Understanding Hinton’s Capsule Networks. Part I: Intuition.

12.5.3. Deep learning

12.5.3.1. Deep quantile regression: https://towardsdatascience.com/deep-quantile-regression-c85481548b5a

12.6. Model selection

12.6.1. information criteria

12.6.1.1. Hannan Quinn: Hannan Quinn Information Criteria - Rex Analytics

12.6.2. Machine learning or traditional econometrics? Machine Learning vs Econometric Modelling: Which One? - Rex Analytics

12.6.3. feature selection algorithms

12.6.3.1. xgboost

12.6.3.1.1. boost-a-roota (python): chasedehan/BoostARoota

12.7. Algorithms

12.7.1. What's the difference between a model, an estimator and an algorithm? Models, Estimators and Algorithms - Rex Analytics

12.7.2. Gradient Descent vs Stochastic Gradient Descent: Some Observations of Behaviour - Rex Analytics

12.8. model interpretation

12.8.1. Interpreting Models: Coefficients, Marginal Effects or Elasticities? - Rex Analytics

12.8.2. DALEX, black-box model interpretation: DALEX: which variables are really important? Ask your black box model! | SmarterPoland.pl

12.9. Continuous, Censored and Truncated Data: what are the differences and do you need to care? - Rex Analytics

12.10. Classification

12.10.1. Unbalanced classes

12.10.1.1. SMOTE

12.10.1.1.1. Sales analytics: Sales Analytics: How to Use Machine Learning to Predict and Optimize Product Backorders

12.11. survival

12.11.1. https://towardsdatascience.com/survival-analysis-in-python-a-model-for-customer-churn-e737c5242822

13. testing - statistical

13.1. hypothesis testing theory

13.1.1. not significant does not mean not important https://economics.mit.edu/files/14851

13.1.2. Does it matter in practice? Normal vs t distribution - Rex Analytics

13.1.3. explain a confound to me like I don't have a PhD in stats: The Great Minds Journal Club discusses Westfall & Yarkoni (2016)

13.1.4. p-values http://bit.ly/2IHJujN

13.1.4.1. sometimes you just have to laugh: Introducing the p-hacker app: Train your expert p-hacking skills

13.2. A/B testing

13.2.1. it's hard! Optimizely’s decision to ditch its free plan suggests A/B website testing is dead

13.2.2. power and effect sizes: Dan Quintana on Twitter

13.2.3. Size matters: Size Matters - Rex Analytics

13.2.4. Optimal error rate

13.2.4.1. Setting an Optimal α That Minimizes Errors in Null Hypothesis Significance Tests

13.2.4.2. Optimal Error Rate Calculator

13.2.5. LR

13.2.5.1. https://osf.io/preprints/bitss/g3j2k/

13.3. causal inference

13.3.1. Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies | American Journal of Epidemiology | Oxford Academic

13.4. ANOVA

13.4.1. singmann/afex

14. analytics outputs

14.1. visualisation

14.1.1. Data Visualization

14.1.2. visual inference

14.1.2.1. Di Cook's 2017 Ihaka lecture

14.1.2.1.1. Di Cook's slides: Myth busting and apophenia in data visualisation: is what you see really there?

14.1.2.1.2. Sketch notes by Jacquie Tran: Jacquie Tran on Twitter

14.1.2.2. Charting Temporal Trends in Alteryx using sugrrants R package – Saqib Ali

14.1.3. visual vocabulary: ideas, ideas, ideas (ft-interactive/chart-doctor)

14.1.4. mechanics of great visualisation

14.1.4.1. custom corporate palettes in ggplot2: Creating corporate colour palettes for ggplot2 • blogR

14.1.4.2. fuzzy/jagged charts? Set the dpi: Thomas Mock on Twitter

14.1.4.3. neo4J Graphs http://bit.ly/2HVoCo9

14.1.4.4. ggplot2 book: ggplot2

14.1.4.4.1. ggplot2 cheat sheet https://www.rstudio.com/wp-content/uploads/2015/08/ggplot2-cheatsheet.pdf

14.1.4.5. multiple plots on a page http://bit.ly/2u9MAKe

14.1.4.6. 5 tips for decoding visualisation http://bit.ly/2FZLbY2

14.1.4.7. tree mapping: TreeMap with data.tree

14.1.4.8. data viz checklist http://bit.ly/2pvaYAc

14.1.4.9. raincloud plots: Introducing Raincloud Plots!

14.1.4.10. https://github.com/yixuan/showtext?utm_content=bufferb8b22&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

14.1.4.11. Take a Sad Plot & Make It Better - Alison Presmanes Hill

14.1.5. visual literacy http://bit.ly/2HS6mMo

14.1.6. process map visualisations http://bit.ly/2G4vsqw

14.1.7. revisualisation Hint Events – Medium_fm/design-and-redesign-4ab77206cf9

14.1.8. animation

14.1.8.1. tweenr: Pipe Tweenr

14.1.9. Fundamentals of Data Visualization

14.2. dashboards

14.2.1. Starting Out with R and Shiny

14.2.2. Shiny cheatsheet https://shiny.rstudio.com/images/shiny-cheatsheet.pdf

14.2.3. https://antoineguillot.wordpress.com/2017/02/21/three-r-shiny-tricks-to-make-your-shiny-app-shines-23-semi-collapsible-sidebar/?utm_content=buffer88135&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

14.3. writing/presentations

14.3.1. writing business reports, basic guide: Writing business reports - Rex Analytics

14.3.2. R markdown

14.3.2.1. Writing equations? Reyn Yoshioka on Twitter

14.3.2.2. R markdown cheatsheet https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

14.3.2.3. A new RStudio addin to facilitate inserting tables in Rmarkdown documents | Lorenzo Busetto Website & Blog

14.3.2.4. Templating

14.3.2.4.1. http://rapport-package.info/?utm_content=bufferb649b&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

14.3.3. Get your data into a word document from R https://www.youtube.com/watch?v=XzGabWGo6NE&feature=youtu.be

14.3.4. Great presentations

14.3.4.1. Making Slides

14.3.5. Blogs

14.3.5.1. Blogdown tute: Building your blog using blogdown · Data and the Anthropologist

14.4. web

14.4.1. Getting Hugo running on Netlify

15. unstructured data

15.1. text mining/NLP

15.1.1. very basic text mining/introduction, word clouds: Text Mining: Word Clouds - Rex Analytics

15.1.1.1. Extending the word cloud: Words in Politics: Some extensions of the word cloud « Fells Stats

15.1.2. Text Mining with R

15.1.3. automated text analysis using network methods: cbail/textnets

15.1.4. create a text document matrix: Text Mining - RDataMining.com: R and Data Mining

15.1.5. term frequency, tf-idf matrix: Term Frequency and tf-idf Using Tidy Data Principles

15.1.6. dimensionality reduction and clustering, text data http://bit.ly/2Gbgl22

15.1.7. tokenizers: New release: tokenizers v0.2.0

15.1.8. advanced:

15.1.8.1. latent variable model in RNNs https://yobibyte.github.io/files/paper_notes/A_Recurrent_Latent_Variable_Model_for_Sequential_Data__Chung_et_al___2016.pdf

15.1.9. hrbrmstr/misinfo

15.2. deep learning

15.2.1. a critical appraisal https://arxiv.org/pdf/1801.00631.pdf

16. data analysis

16.1. How to:

16.1.1. First contact to final decision example: Ratesetter: data analysis from first contact to final interrogation - Rex Analytics

16.1.2. Starting out, first questions to ask: Data Analysis: Questions to Ask the First Time - Rex Analytics

16.1.3. More questions to ask: Data Analysis: More Questions - Rex Analytics

16.1.4. Enough with the questions! Data Analysis: Enough with the Questions Already - Rex Analytics

16.1.5. Reading a data analysis: Chris Riederererer on Twitter

16.1.6. Data exploration cheat sheet, R: http://bit.ly/2HWYCZd

16.1.7. Useful packages: https://bit.ly/2DSrkby

16.2. missing data

16.2.1. Naniar

16.2.1.1. Package home: Data Structures, Summaries, and Visualisations for Missing Data • naniar

16.2.1.2. Package on GitHub: njtierney/naniar

16.2.1.3. Gallery

16.2.1.4. Witch hunt example: Witch hunting in Europe: a discovery of missingness - Rex Analytics

16.3. Data Science Live Book

17. GIS