Data science map

1. data ingest/output

1.1. Excel

1.1.1. Readxl: get data out of Excel and into R http://readxl.tidyverse.org

1.1.2. Excelgesis: look inside an excel file jennybc/excelgesis

1.2. SQL

1.2.1. Using PostgreSQL in R: A quick how-to

1.2.2. https://openml.github.io/articles/slides/useR2017_tutorial/slides_tutorial.html?utm_content=buffer2efd1&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer#1

1.2.3. modeldb

1.3. Webscraping

1.3.1. Pull data from wikipedia tables: Scraping wikipedia tables

1.3.2. Extracting data from the web: http://bit.ly/2pvy8Hz

1.3.3. Scraping JS Scraping Javascript websites in R · Brooke Watson

1.3.4. Data pasta

1.3.4.1. Twitter

1.3.5. Twitter

1.3.5.1. https://rud.is/books/21-recipes/?utm_content=buffer7a82b&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

1.4. Google sheets

1.4.1. Connect to Google Sheets in Power BI using R

1.5. pdfs

1.5.1. One View of the Impact of the New Immigration Ban (+ freeing PDF data with tabulizer)

1.6. APIs

1.6.1. Riingo An R interface to the Tiingo stock price API

2. Modelling

2.1. Standard classical

2.1.1. Lm/Regression

2.1.1.1. An Introduction to Statistical and Data Sciences via R

2.1.1.2. Regression in R Regression Analysis using R

2.1.1.3. Regression essentials: http://bit.ly/2u9PezE

2.1.2. marginal effects Easy peasy STATA-like marginal effects with R

2.1.3. Machine learning

2.1.3.1. Caret

2.1.3.1.1. Walkthrough by Zev Ross Predictive modeling and machine learning in R with the caret package

2.1.3.2. averaging/bagging

2.1.3.2.1. Averaging for Prediction in Econometrics and ML

2.1.3.3. interpretable machine learning

2.1.3.3.1. LIME framework http://bit.ly/2ubmnL9

2.1.3.4. Cross validation

2.1.3.4.1. An applied experiment using brain decoders - interesting way of looking at CV critically in practice. Assessing and tuning brain decoders: cross-validation, caveats, and guidelines

2.1.3.5. Introductory material

2.1.3.5.1. Lectures - 15 hours worth! In-depth introduction to machine learning in 15 hours of expert videos

2.1.3.6. Tree based methods

2.1.3.6.1. Examples:

2.1.3.7. ML cheat sheet: Machine Learning Modelling in R: Cheat Sheet

2.2. High dimensional

2.2.1. High-dimensional statistical and econometric methods for estimation and inference. http://bit.ly/2DJ35wl

2.2.2. Statistical model selection with “Big Data”: Doornik & Hendry's New Paper - Rex Analytics

2.2.3. Variable selection, big data http://bit.ly/2pvyjmd

2.2.4. curse of dimensionality: Typical Sets and the Curse of Dimensionality

2.3. ordinal

2.3.1. likert scale

2.3.1.1. Assessment of health surveys: fitting a multidimensional graded response model

2.4. forecasting

2.4.1. Principles and practice: https://otexts.org/fpp2/

2.4.2. Hierarchical

2.4.3. high dimensional forecasting

2.4.3.1. feature spaces Analysing Large Collections of Time Series

2.4.3.2. Anomaly Detection in Streaming Nonstationary Temporal Data

2.4.4. time series R structures

2.4.4.1. tsibble and tibbletime tsibble? or tibbletime? · Earo Wang

2.4.5. Exploring the sources of uncertainty: Why does bagging for time series forecasting work?

2.5. Neural Nets/AI

2.5.1. F.X. Diebold's write up: Neural Nets, ML and AI

2.5.2. Capsule networks

2.5.2.1. overview: Understanding Hinton’s Capsule Networks. Part I: Intuition.

2.5.3. Deep learning

2.5.3.1. Deep quantile regression: https://towardsdatascience.com/deep-quantile-regression-c85481548b5a

2.6. Model selection

2.6.1. information criteria

2.6.1.1. Hannan Quinn Hannan Quinn Information Criteria - Rex Analytics

2.6.2. Machine learning or traditional econometrics? Machine Learning vs Econometric Modelling: Which One? - Rex Analytics

2.6.3. feature selection algorithms

2.6.3.1. xgboost

2.6.3.1.1. boost-a-roota (python) chasedehan/BoostARoota

2.7. Algorithms

2.7.1. What's the difference between model, estimator algorithm? Models, Estimators and Algorithms - Rex Analytics

2.7.2. Gradient Descent vs Stochastic Gradient Descent: Some Observations of Behaviour - Rex Analytics

2.8. model interpretation

2.8.1. Interpreting Models: Coefficients, Marginal Effects or Elasticities? - Rex Analytics

2.8.2. Dalex: blackbox model interpretation DALEX: which variables are really important? Ask your black box model! | SmarterPoland.pl

2.9. Continuous, Censored and Truncated Data: what are the differences and do you need to care? - Rex Analytics

2.10. Classification

2.10.1. Unbalanced classes

2.10.1.1. SMOTE

2.10.1.1.1. Sales analytics: Sales Analytics: How to Use Machine Learning to Predict and Optimize Product Backorders

2.11. survival

2.11.1. https://towardsdatascience.com/survival-analysis-in-python-a-model-for-customer-churn-e737c5242822

3. testing - statistical

3.1. hypothesis testing theory

3.1.1. not significant does not mean not important https://economics.mit.edu/files/14851

3.1.2. Does it matter in practice? Normal vs t distribution - Rex Analytics

3.1.3. explain a confound to me like I don't have a PhD in stats The Great Minds Journal Club discusses Westfall & Yarkoni (2016)

3.1.4. p-values http://bit.ly/2IHJujN

3.1.4.1. sometimes you just have to laugh Introducing the p-hacker app: Train your expert p-hacking skills

3.2. A/B testing

3.2.1. it's hard! Optimizely’s decision to ditch its free plan suggests A/B website testing is dead

3.2.2. power and effect sizes Dan Quintana on Twitter

3.2.3. Size matters: Size Matters - Rex Analytics

3.2.4. Optimal error rate

3.2.4.1. Setting an Optimal α That Minimizes Errors in Null Hypothesis Significance Tests

3.2.4.2. Optimal Error Rate Calculator

3.2.5. LR

3.2.5.1. https://osf.io/preprints/bitss/g3j2k/

3.3. causal inference

3.3.1. Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies | American Journal of Epidemiology | Oxford Academic

3.4. ANOVA

3.4.1. singmann/afex

4. package development + testing

4.1. R packages Welcome · R packages

4.2. Code testing

4.2.1. Assert an item is unchanged on execution: Assert condition over a code block

4.2.2. checkr: https://www.theoj.org/joss-papers/joss.00624/10.21105.joss.00624.pdf

4.3. Licenses

4.3.1. Open Source Licenses Explained

5. data wrangling and data structures

5.1. basic data structures in R R for Excel users - Rex Analytics

5.2. R For data Science: R for Data Science

5.3. Advanced R: Welcome · Advanced R.

5.4. dplyr: manipulate data tidyverse/dplyr

5.4.1. Data Wrangling Part 4: Summarizing and slicing your data

5.5. tidyr: clean up data tidyverse/tidyr

5.6. explore some data structures: go build a blockchain! Simple Blockchain Example in R

5.7. data manipulation in R: Data Manipulation in R (R Fundamentals Book 2), Stephanie Locke

5.8. data wrangling cheat sheet http://bit.ly/1LaYWBd

5.9. Factors

5.9.1. Factor manipulation with forcats (in French): Manipulation de facteurs avec forcats – R-atique

5.10. regex

5.10.1. https://bit.ly/2GeeWV2

5.10.2. R Regex Tester. R: Regular Expressions as used in R

5.10.3. gadenbuie/regexplain

5.11. purrr

5.11.1. https://t.co/1Jw1ZWeDRb

6. analytics outputs

6.1. visualisation

6.1.1. Data Visualization

6.1.2. visual inference

6.1.2.1. Di Cook's 2017 Ihaka lecture

6.1.2.1.1. Di Cook's slides: Myth busting and apophenia in data visualisation: is what you see really there?

6.1.2.1.2. Sketch notes by Jacquie Tran: Jacquie Tran on Twitter

6.1.2.2. Charting Temporal Trends in Alteryx using sugrrants R package – Saqib Ali

6.1.3. visual vocabulary: ideas, ideas, ideas ft-interactive/chart-doctor

6.1.4. mechanics of great visualisation

6.1.4.1. custom corporate palettes in ggplot2 Creating corporate colour palettes for ggplot2 • blogR

6.1.4.2. fuzzy/jagged charts? Set the dpi: Thomas Mock on Twitter

6.1.4.3. neo4J Graphs http://bit.ly/2HVoCo9

6.1.4.4. ggplot2 book ggplot2

6.1.4.4.1. ggplot2 cheat sheet https://www.rstudio.com/wp-content/uploads/2015/08/ggplot2-cheatsheet.pdf

6.1.4.5. multiple plots on a page http://bit.ly/2u9MAKe

6.1.4.6. 5 tips for decoding visualisation http://bit.ly/2FZLbY2

6.1.4.7. tree mapping TreeMap with data.tree

6.1.4.8. data viz checklist http://bit.ly/2pvaYAc

6.1.4.9. raincloud plots Introducing Raincloud Plots!

6.1.4.10. https://github.com/yixuan/showtext?utm_content=bufferb8b22&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

6.1.4.11. Take a Sad Plot & Make It Better - Alison Presmanes Hill

6.1.5. visual literacy http://bit.ly/2HS6mMo

6.1.6. process map visualisations http://bit.ly/2G4vsqw

6.1.7. revisualisation Hint Events – Medium_fm/design-and-redesign-4ab77206cf9

6.1.8. animation

6.1.8.1. tweenr Pipe Tweenr

6.1.9. Fundamentals of Data Visualization

6.2. dashboards

6.2.1. Starting Out with R and Shiny

6.2.2. Shiny cheatsheet https://shiny.rstudio.com/images/shiny-cheatsheet.pdf

6.2.3. https://antoineguillot.wordpress.com/2017/02/21/three-r-shiny-tricks-to-make-your-shiny-app-shines-23-semi-collapsible-sidebar/?utm_content=buffer88135&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

6.3. writing/presentations

6.3.1. writing business reports, basic guide Writing business reports - Rex Analytics

6.3.2. R markdown

6.3.2.1. Writing equations? Reyn Yoshioka on Twitter

6.3.2.2. R markdown cheatsheet https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

6.3.2.3. A new RStudio addin to facilitate inserting tables in Rmarkdown documents | Lorenzo Busetto Website & Blog

6.3.2.4. Templating

6.3.2.4.1. http://rapport-package.info/?utm_content=bufferb649b&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

6.3.3. Get your data into a word document from R https://www.youtube.com/watch?v=XzGabWGo6NE&feature=youtu.be

6.3.4. Great presentations

6.3.4.1. Making Slides

6.3.5. Blogs

6.3.5.1. Blogdown tute: Building your blog using blogdown · Data and the Anthropologist

6.4. web

6.4.1. Getting Hugo running on Netlify

7. Getting started with R or data science

7.1. Where to start

7.1.1. becoming a data scientist Becoming A Data Scientist | Documenting my path from "SQL Data Analyst pursuing an Engineering Master's Degree" to "Data Scientist"

7.1.2. Don't forget

7.1.2.1. Three things every new data scientist should know - Rex Analytics

7.1.2.2. Yes, you can: learn data science - Rex Analytics

7.2. troubleshooting installation R Package Install Troubleshooting

7.3. The thing is, learning R and learning an IDE are two different things: Learning to program is getting harder

7.4. R Syntax cheat sheet: http://www.science.smith.edu/~amcnamara/Syntax-cheatsheet.pdf

7.5. two minute R videos series: http://bit.ly/2pwzyRu

7.6. Where do things live in R? R for Excel Users - Rex Analytics

7.7. errors

7.7.1. decoding error messages Decoding error messages in R - Rex Analytics

7.7.2. object not found Object not found: R - Rex Analytics

7.8. basic skills resources: Some notes

7.8.1. Probability

7.8.1.1. basic probability A Primer on Basic Probability - Rex Analytics

7.8.1.2. What is a probability distribution? What is a distribution?

7.8.1.3. Probability cheat sheet http://bit.ly/21kATFL

7.8.1.4. Conditional probability http://bit.ly/2pvGL48

7.8.2. Statistics

7.8.2.1. Explore correlation and regression Exploring Correlation and the Simple Linear Regression Model - Rex Analytics

7.8.2.2. describe simple statistics Describing simple statistics - Rex Analytics

7.8.2.3. statistical significance, explained http://bit.ly/2G8Sksz

7.8.2.4. Asymptotics - the engine behind a lot of statistical results.

7.8.2.4.1. Law of large numbers vs Central Limit Theorem. In gifs Law of Large Numbers vs the Central Limit Theorem: in GIFs - Rex Analytics

7.8.2.4.2. The law of large numbers The Law of Large Numbers: It's Not the Central Limit Theorem - Rex Analytics

7.8.2.4.3. The central limit theorem http://bit.ly/2HWmGvg

7.8.2.5. Bayesian statistics: they're good people, I promise

7.8.2.5.1. Bayesian Statistics Explained in Simple English For Beginners

7.8.2.5.2. What is bayesian updating actually doing? http://bit.ly/2DLJImb

7.8.3. Experimental/sample design

7.8.3.1. Correlation vs causation Correlation vs Causation - Rex Analytics

7.8.4. explain histograms Thomas Lin Pedersen on Twitter

7.8.4.1. code An example of animating the build up of a histogram with dropping balls using tweenr, gganimate and ggplot2

7.9. Linear algebra

7.9.1. WTF are eigenvalues? Eigenvectors and Eigenvalues explained visually

7.9.2. Shiny app for calculating them and everything http://bit.ly/2IKlCMf

8. data hazmat (ok cleaning)

8.1. Janitor: http://sfirke.github.io/janitor/

8.2. vtreat package for a relatively automated approach R Tip: Use the vtreat Package For Data Preparation

9. troubleshooting

9.1. make a reprex! http://bit.ly/2ptJU4u

10. functional and object-oriented programming

10.1. functions http://bit.ly/2FWWJza

10.2. closures Closures in R - Rex Analytics

10.3. tidy evaluation

10.3.1. What is tidy evaluation anyway? https://www.youtube.com/watch?v=nERXS3ssntw&feature=youtu.be

10.3.2. tidy evaluation used to make dplyr-type verbs https://bit.ly/2G6tVAC

10.3.3. Tidyeval meets PDF table hell

10.4. other languages

10.4.1. C++

10.4.1.1. A new RStudio addin to facilitate inserting tables in Rmarkdown documents | Lorenzo Busetto Website & Blog

10.5. need for speed

10.5.1. Making R Code Faster : A Case Study

11. unstructured data

11.1. text mining/NLP

11.1.1. very basic text mining/introduction, word clouds Text Mining: Word Clouds - Rex Analytics

11.1.1.1. Extending the word cloud Words in Politics: Some extensions of the word cloud « Fells Stats

11.1.2. Text Mining with R

11.1.3. automated text analysis using network methods cbail/textnets

11.1.4. create a text document matrix Text Mining - RDataMining.com: R and Data Mining

11.1.5. term frequency, tf-idf matrix Term Frequency and tf-idf Using Tidy Data Principles

11.1.6. dimensionality reduction and clustering, text data http://bit.ly/2Gbgl22

11.1.7. tokenizers: New release: tokenizers v0.2.0

11.1.8. advanced:

11.1.8.1. latent variable model in RNNs https://yobibyte.github.io/files/paper_notes/A_Recurrent_Latent_Variable_Model_for_Sequential_Data__Chung_et_al___2016.pdf

11.1.9. hrbrmstr/misinfo

11.2. deep learning

11.2.1. a critical appraisal https://arxiv.org/pdf/1801.00631.pdf

12. data analysis

12.1. How to:

12.1.1. First contact to final decision example Ratesetter: data analysis from first contact to final interrogation - Rex Analytics

12.1.2. Starting out, first questions to ask Data Analysis: Questions to Ask the First Time - Rex Analytics

12.1.3. More questions to ask Data Analysis: More Questions - Rex Analytics

12.1.4. Enough with the questions! Data Analysis: Enough with the Questions Already - Rex Analytics

12.1.5. Reading a data analysis: Chris Riederererer on Twitter

12.1.6. Data exploration cheat sheet, R: http://bit.ly/2HWYCZd

12.1.7. Useful packages: https://bit.ly/2DSrkby

12.2. missing data

12.2.1. Naniar

12.2.1.1. Package home Data Structures, Summaries, and Visualisations for Missing Data • naniar

12.2.1.2. Package on Github njtierney/naniar

12.2.1.3. Gallery

12.2.1.4. Witch hunt example: Witch hunting in Europe: a discovery of missingness - Rex Analytics

12.3. Data Science Live Book

13. graph/network methods

13.1. intro to ggraph http://bit.ly/2IIZDFQ

13.2. social network analysis, how-to guide Social Network Analysis - RDataMining.com: R and Data Mining

14. Utilities

14.1. automation in R

14.1.1. Automating summary of surveys with R Markdown: Automating Summary of Surveys with RMarkdown

14.1.2. Automate processes Automate R processes

14.1.3. Scheduling R scripts and processes on Windows and Unix/Linux

14.2. vectorising

14.2.1. Jenny Bryan's tutorials: Vectors and lists

14.2.2. Functional Programming with Purrr: Thomas Mock http://bit.ly/2FTn80W

14.2.3. Mara Averick's fantastic collection: purrr-ty posts

14.3. lists

14.3.1. zeallot package - an assignment operator for unpacking lists and vectors nteetor/presentations

14.4. learning unix makes life easier: The Unix Workbench

14.4.1. what is sudo? - linux Migrating to Linux: Using Sudo

14.5. Cloud services

14.5.1. AWS

14.5.1.1. set up RStudio on AWS RStudio in the Cloud I: Amazon Web Services

14.5.2. google drive

14.5.2.1. tidyverse/googledrive

14.5.2.2. An Interface to Google Drive • googledrive

14.6. git/github

14.6.1. Github: it's worth it Happy Git and GitHub for the useR

14.6.2. Allen Downey's book on git: amgit

14.6.3. edwindj/daff

14.7. https://github.com/goldingn/default?utm_content=buffer0d8b6&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

14.8. containers

14.8.1. docker

14.8.1.1. http://ropenscilabs.github.io/r-docker-tutorial/?utm_content=buffer66d72&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

15. Work flow

15.1. Hadley Wickham's Data science workflow: R for Data Science

15.2. mapping analytics objects Mapping analytics objects - Rex Analytics

15.3. Consultant's workflow A consultant's workflow - Rex Analytics

15.4. Hadley Wickham's Alfred Workflow Project workflow

15.5. managing code

15.5.1. good practices to follow: Engineering Data Science at Automattic

15.5.2. cookie cutter data science template: Cookiecutter Data Science

15.5.3. Drake: ropensci/drake

15.6. Good stat management

15.6.1. ten simple rules for effective statistical practice Ten Simple Rules for Effective Statistical Practice

15.6.2. sharing data jtleek/datasharing

15.6.3. It's not all about the p-values: https://www.nature.com/news/statistics-p-values-are-just-the-tip-of-the-iceberg-1.17412

15.7. Agile data science

16. GIS

17. accessibility

17.1. vision accessibility

17.1.1. On less-than-perfect-vision: Posters & Talks: Can you read me now?

17.1.2. Colourblindness Data Visualisation: Hex Codes, Pantone Colours and Accessibility - Rex Analytics

17.2. Thinking about non native English speakers: Philip Guo - Selected Publications

17.3. inclusive design: Inclusive - Microsoft Design

17.4. Accessibility in your software | iOS & VoiceOver