
1. data ingest/output
1.1. Excel
1.1.1. Readxl: get data out of Excel and into R http://readxl.tidyverse.org
1.1.2. Excelgesis: look inside an excel file jennybc/excelgesis
1.2. SQL
1.2.1. Using PostgreSQL in R: A quick how-to
1.2.2. https://openml.github.io/articles/slides/useR2017_tutorial/slides_tutorial.html#1
1.2.3. modeldb
1.3. Webscraping
1.3.1. Pull data from wikipedia tables: Scraping wikipedia tables
1.3.2. Extracting data from the web: http://bit.ly/2pvy8Hz
1.3.3. Scraping JS: Scraping Javascript websites in R · Brooke Watson
1.3.4. Data pasta
1.3.4.1. Twitter
1.3.5. Twitter
1.3.5.1. https://rud.is/books/21-recipes/
1.4. Google sheets
1.4.1. Connect to Google Sheets in Power BI using R
1.5. pdfs
1.5.1. One View of the Impact of the New Immigration Ban (+ freeing PDF data with tabulizer)
1.6. APIs
1.6.1. Riingo: An R interface to the Tiingo stock price API
2. Modelling
2.1. Standard classical
2.1.1. Lm/Regression
2.1.1.1. An Introduction to Statistical and Data Sciences via R
2.1.1.2. Regression in R Regression Analysis using R
2.1.1.3. Regression essentials: http://bit.ly/2u9PezE
2.1.2. marginal effects Easy peasy STATA-like marginal effects with R
2.1.3. Machine learning
2.1.3.1. Caret
2.1.3.1.1. Walkthrough by Zev Ross Predictive modeling and machine learning in R with the caret package
2.1.3.2. averaging/bagging
2.1.3.2.1. Averaging for Prediction in Econometrics and ML
2.1.3.3. interpretable machine learning
2.1.3.3.1. LIME framework http://bit.ly/2ubmnL9
2.1.3.4. Cross validation
2.1.3.4.1. An applied experiment using brain decoders - interesting way of looking at CV critically in practice. Assessing and tuning brain decoders: cross-validation, caveats, and guidelines
2.1.3.5. Introductory material
2.1.3.5.1. Lectures - 15 hours worth! In-depth introduction to machine learning in 15 hours of expert videos
2.1.3.6. Tree based methods
2.1.3.6.1. Examples:
2.1.3.7. ML cheat sheet Machine Learning Modelling in R : : Cheat Sheet
2.2. High dimensional
2.2.1. High-dimensional statistical and econometric methods for estimation and inference. http://bit.ly/2DJ35wl
2.2.2. Statistical model selection with “Big Data”: Doornik & Hendry's New Paper - Rex Analytics
2.2.3. Variable selection, big data http://bit.ly/2pvyjmd
2.2.4. curse of dimensionality: Typical Sets and the Curse of Dimensionality
2.3. ordinal
2.3.1. likert scale
2.3.1.1. Assessment of health surveys: fitting a multidimensional graded response model
2.4. forecasting
2.4.1. Principles and practice: https://otexts.org/fpp2/
2.4.2. Hierarchical
2.4.3. high dimensional forecasting
2.4.3.1. feature spaces Analysing Large Collections of Time Series
2.4.3.2. Anomaly Detection in Streaming Nonstationary Temporal Data
2.4.4. time series R structures
2.4.4.1. tsibble and tibbletime tsibble? or tibbletime? · Earo Wang
2.4.5. Exploring the sources of uncertainty: Why does bagging for time series forecasting work?
2.5. Neural Nets/AI
2.5.1. F.X. Diebold's write up: Neural Nets, ML and AI
2.5.2. Capsule networks
2.5.2.1. overview: Understanding Hinton’s Capsule Networks. Part I: Intuition.
2.5.3. Deep learning
2.5.3.1. Deep quantile regression: https://towardsdatascience.com/deep-quantile-regression-c85481548b5a
2.6. Model selection
2.6.1. information criteria
2.6.1.1. Hannan Quinn Hannan Quinn Information Criteria - Rex Analytics
2.6.2. Machine learning or traditional econometrics? Machine Learning vs Econometric Modelling: Which One? - Rex Analytics
2.6.3. feature selection algorithms
2.6.3.1. xgboost
2.6.3.1.1. boost-a-roota (python) chasedehan/BoostARoota
2.7. Algorithms
2.7.1. What's the difference between a model, an estimator and an algorithm? Models, Estimators and Algorithms - Rex Analytics
2.7.2. Gradient Descent vs Stochastic Gradient Descent: Some Observations of Behaviour - Rex Analytics
2.8. model interpretation
2.8.1. Interpreting Models: Coefficients, Marginal Effects or Elasticities? - Rex Analytics
2.8.2. Dalex: blackbox model interpretation DALEX: which variables are really important? Ask your black box model! | SmarterPoland.pl
2.9. Continuous, Censored and Truncated Data: what are the differences and do you need to care? - Rex Analytics
2.10. Classification
2.10.1. Unbalanced classes
2.10.1.1. SMOTE
2.10.1.1.1. Sales analytics: Sales Analytics: How to Use Machine Learning to Predict and Optimize Product Backorders
2.11. survival
2.11.1. https://towardsdatascience.com/survival-analysis-in-python-a-model-for-customer-churn-e737c5242822
3. testing - statistical
3.1. hypothesis testing theory
3.1.1. not significant does not mean not important https://economics.mit.edu/files/14851
3.1.2. Does it matter in practice? Normal vs t distribution - Rex Analytics
3.1.3. explain a confound to me like I don't have a PhD in stats The Great Minds Journal Club discusses Westfall & Yarkoni (2016)
3.1.4. p-values http://bit.ly/2IHJujN
3.1.4.1. sometimes you just have to laugh Introducing the p-hacker app: Train your expert p-hacking skills
3.2. A/B testing
3.2.1. it's hard! Optimizely’s decision to ditch its free plan suggests A/B website testing is dead
3.2.2. power and effect sizes Dan Quintana on Twitter
3.2.3. Size matters: Size Matters - Rex Analytics
3.2.4. Optimal error rate
3.2.4.1. Setting an Optimal α That Minimizes Errors in Null Hypothesis Significance Tests
3.2.4.2. Optimal Error Rate Calculator
3.2.5. LR
3.2.5.1. https://osf.io/preprints/bitss/g3j2k/
3.3. causal inference
3.3.1. Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies | American Journal of Epidemiology | Oxford Academic
3.4. ANOVA
3.4.1. singmann/afex
4. package development + testing
4.1. R packages Welcome · R packages
4.2. Code testing
4.2.1. Assert an item is unchanged on execution: Assert condition over a code block
4.2.2. checkr: https://www.theoj.org/joss-papers/joss.00624/10.21105.joss.00624.pdf
4.3. Licenses
4.3.1. Open Source Licenses Explained
5. data wrangling and data structures
5.1. basic data structures in R: R for Excel users - Rex Analytics
5.2. R For data Science: R for Data Science
5.3. Advanced R: Welcome · Advanced R.
5.4. dplyr: manipulate data tidyverse/dplyr
5.4.1. Data Wrangling Part 4: Summarizing and slicing your data
5.5. tidyr: clean up data tidyverse/tidyr
5.6. explore some data structures: go build a blockchain! Simple Blockchain Example in R
5.7. data manipulation in R: Data Manipulation in R (R Fundamentals Book 2), Stephanie Locke
5.8. data wrangling cheat sheet http://bit.ly/1LaYWBd
5.9. Factors
5.9.1. Manipulation de facteurs avec forcats – R-atique en Francais
5.10. regex
5.10.1. https://bit.ly/2GeeWV2
5.10.2. R Regex Tester; R: Regular Expressions as used in R
5.10.3. gadenbuie/regexplain
5.11. purrr
5.11.1. https://t.co/1Jw1ZWeDRb
6. analytics outputs
6.1. visualisation
6.1.1. Data Visualization
6.1.2. visual inference
6.1.2.1. Di Cook's 2017 Ihaka lecture
6.1.2.1.1. Di Cook's slides: Myth busting and apophenia in data visualisation: is what you see really there?
6.1.2.1.2. Sketch notes by Jacquie Tran: Jacquie Tran on Twitter
6.1.2.2. Charting Temporal Trends in Alteryx using sugrrants R package – Saqib Ali
6.1.3. visual vocabulary: ideas, ideas, ideas ft-interactive/chart-doctor
6.1.4. mechanics of great visualisation
6.1.4.1. custom corporate palettes in ggplot2 Creating corporate colour palettes for ggplot2 • blogR
6.1.4.2. fuzzy/jagged charts? Set the dpi: Thomas Mock on Twitter
6.1.4.3. Neo4j graphs http://bit.ly/2HVoCo9
6.1.4.4. ggplot2 book ggplot2
6.1.4.4.1. ggplot2 cheat sheet https://www.rstudio.com/wp-content/uploads/2015/08/ggplot2-cheatsheet.pdf
6.1.4.5. multiple plots on a page http://bit.ly/2u9MAKe
6.1.4.6. 5 tips for decoding visualisation http://bit.ly/2FZLbY2
6.1.4.7. tree mapping TreeMap with data.tree
6.1.4.8. data viz checklist http://bit.ly/2pvaYAc
6.1.4.9. raincloud plots Introducing Raincloud Plots!
6.1.4.10. https://github.com/yixuan/showtext
6.1.4.11. Take a Sad Plot & Make It Better - Alison Presmanes Hill
6.1.5. visual literacy http://bit.ly/2HS6mMo
6.1.6. process map visualisations http://bit.ly/2G4vsqw
6.1.7. revisualisation Hint Events – Medium_fm/design-and-redesign-4ab77206cf9
6.1.8. animation
6.1.8.1. tweenr Pipe Tweenr
6.1.9. Fundamentals of Data Visualization
6.2. dashboards
6.2.1. starting out with R and Shiny Starting Out with R and Shiny
6.2.2. Shiny cheatsheet https://shiny.rstudio.com/images/shiny-cheatsheet.pdf
6.2.3. https://antoineguillot.wordpress.com/2017/02/21/three-r-shiny-tricks-to-make-your-shiny-app-shines-23-semi-collapsible-sidebar/
6.3. writing/presentations
6.3.1. writing business reports, basic guide Writing business reports - Rex Analytics
6.3.2. R markdown
6.3.2.1. Writing equations? Reyn Yoshioka on Twitter
6.3.2.2. R markdown cheatsheet https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
6.3.2.3. A new RStudio addin to facilitate inserting tables in Rmarkdown documents | Lorenzo Busetto Website & Blog
6.3.2.4. Templating
6.3.2.4.1. http://rapport-package.info/
6.3.3. Get your data into a Word document from R https://www.youtube.com/watch?v=XzGabWGo6NE
6.3.4. Great presentations
6.3.4.1. Making Slides
6.3.5. Blogs
6.3.5.1. Blogdown tute: Building your blog using blogdown · Data and the Anthropologist
6.4. web
6.4.1. Getting Hugo running on Netlify
7. Getting started with R or data science
7.1. Where to start
7.1.1. becoming a data scientist Becoming A Data Scientist | Documenting my path from "SQL Data Analyst pursuing an Engineering Master's Degree" to "Data Scientist"
7.1.2. Don't forget
7.1.2.1. Three things every new data scientist should know - Rex Analytics
7.1.2.2. Yes, you can: learn data science - Rex Analytics
7.2. troubleshooting installation R Package Install Troubleshooting
7.3. The thing is, learning R and learning an IDE are two different things: Learning to program is getting harder
7.4. R Syntax cheat sheet: http://www.science.smith.edu/~amcnamara/Syntax-cheatsheet.pdf
7.5. two minute R videos series: http://bit.ly/2pwzyRu
7.6. Where do things live in R? Where do things live in R? R for Excel Users - Rex Analytics
7.7. errors
7.7.1. decoding error messages Decoding error messages in R - Rex Analytics
7.7.2. object not found Object not found: R - Rex Analytics
7.8. basic skills resources: Some notes
7.8.1. Probability
7.8.1.1. basic probability A Primer on Basic Probability - Rex Analytics
7.8.1.2. What is a probability distribution? What is a distribution?
7.8.1.3. Probability cheat sheet http://bit.ly/21kATFL
7.8.1.4. Conditional probability http://bit.ly/2pvGL48
7.8.2. Statistics
7.8.2.1. Explore correlation and regression Exploring Correlation and the Simple Linear Regression Model - Rex Analytics
7.8.2.2. describe simple statistics Describing simple statistics - Rex Analytics
7.8.2.3. statistical significance, explained http://bit.ly/2G8Sksz
7.8.2.4. Asymptotics - the engine behind a lot of statistical results.
7.8.2.4.1. Law of large numbers vs Central Limit Theorem. In gifs Law of Large Numbers vs the Central Limit Theorem: in GIFs - Rex Analytics
7.8.2.4.2. The law of large numbers The Law of Large Numbers: It's Not the Central Limit Theorem - Rex Analytics
7.8.2.4.3. The central limit theorem http://bit.ly/2HWmGvg
7.8.2.5. Bayesian statistics: they're good people, I promise
7.8.2.5.1. Bayesian Statistics Explained in Simple English For Beginners
7.8.2.5.2. What is bayesian updating actually doing? http://bit.ly/2DLJImb
7.8.3. Experimental/sample design
7.8.3.1. Correlation vs causation Correlation vs Causation - Rex Analytics
7.8.4. explain histograms Thomas Lin Pedersen on Twitter
7.8.4.1. code An example of animating the build up of a histogram with dropping balls using tweenr, gganimate and ggplot2
7.9. Linear algebra
7.9.1. WTF are eigenvalues? Eigenvectors and Eigenvalues explained visually
7.9.2. Shiny app for calculating them and everything http://bit.ly/2IKlCMf
8. data hazmat (ok cleaning)
8.1. Janitor: http://sfirke.github.io/janitor/
8.2. vtreat package for a relatively automated approach R Tip: Use the vtreat Package For Data Preparation
9. troubleshooting
9.1. make a reprex! http://bit.ly/2ptJU4u
10. functional and object-oriented programming
10.1. functions http://bit.ly/2FWWJza
10.2. closures Closures in R - Rex Analytics
10.3. tidy evaluation
10.3.1. What is tidy evaluation anyway? https://www.youtube.com/watch?v=nERXS3ssntw
10.3.2. tidy evaluation used to make dplyr-type verbs https://bit.ly/2G6tVAC
10.3.3. Tidyeval meets PDF table hell
10.4. other languages
10.4.1. C++
10.5. need for speed
10.5.1. Making R Code Faster : A Case Study
11. unstructured data
11.1. text mining/NLP
11.1.1. very basic text mining/introduction, word clouds Text Mining: Word Clouds - Rex Analytics
11.1.1.1. Extending the word cloud Words in Politics: Some extensions of the word cloud « Fells Stats
11.1.2. Text Mining with R
11.1.3. automated text analysis using network methods cbail/textnets
11.1.4. create a text document matrix Text Mining - RDataMining.com: R and Data Mining
11.1.5. term frequency, tf-idf matrix Term Frequency and tf-idf Using Tidy Data Principles
11.1.6. dimensionality reduction and clustering, text data http://bit.ly/2Gbgl22
11.1.7. tokenizers: New release: tokenizers v0.2.0
11.1.8. advanced:
11.1.8.1. latent variable model in RNNs https://yobibyte.github.io/files/paper_notes/A_Recurrent_Latent_Variable_Model_for_Sequential_Data__Chung_et_al___2016.pdf
11.1.9. hrbrmstr/misinfo
11.2. deep learning
11.2.1. a critical appraisal https://arxiv.org/pdf/1801.00631.pdf
12. data analysis
12.1. How to:
12.1.1. First contact to final decision example Ratesetter: data analysis from first contact to final interrogation - Rex Analytics
12.1.2. Starting out, first questions to ask Data Analysis: Questions to Ask the First Time - Rex Analytics
12.1.3. More questions to ask Data Analysis: More Questions - Rex Analytics
12.1.4. Enough with the questions! Data Analysis: Enough with the Questions Already - Rex Analytics
12.1.5. Reading a data analysis: Chris Riederererer on Twitter
12.1.6. Data exploration cheat sheet, R: http://bit.ly/2HWYCZd
12.1.7. Useful packages: https://bit.ly/2DSrkby
12.2. missing data
12.2.1. Naniar
12.2.1.1. Package home Data Structures, Summaries, and Visualisations for Missing Data • naniar
12.2.1.2. Package on Github njtierney/naniar
12.2.1.3. Gallery
12.2.1.4. Witch hunt example Witch hunting in Europe: a discovery of missingness - Rex Analytics
12.3. Data Science Live Book
13. graph/network methods
13.1. intro to ggraph http://bit.ly/2IIZDFQ
13.2. social network analysis, how-to guide Social Network Analysis - RDataMining.com: R and Data Mining
14. Utilities
14.1. automation in R
14.1.1. Automating summary of surveys with R Markdown: Automating Summary of Surveys with RMarkdown
14.1.2. Automate processes Automate R processes
14.1.3. Scheduling R scripts and processes on Windows and Unix/Linux
14.2. vectorising
14.2.1. Jenny Bryan's tutorials: Vectors and lists
14.2.2. Functional Programming with Purrr: Thomas Mock http://bit.ly/2FTn80W
14.2.3. Mara Averick's fantastic collection: purrr-ty posts
14.3. lists
14.3.1. zeallot package - an assignment operator for unpacking lists and vectors nteetor/presentations
14.4. learning unix makes life easier: The Unix Workbench
14.4.1. what is sudo? - linux Migrating to Linux: Using Sudo
14.5. Cloud services
14.5.1. AWS
14.5.1.1. set up RStudio on AWS RStudio in the Cloud I: Amazon Web Services
14.5.2. google drive
14.5.2.1. tidyverse/googledrive
14.5.2.2. An Interface to Google Drive • googledrive
14.6. git/github
14.6.1. Github: it's worth it Happy Git and GitHub for the useR
14.6.2. Allen Downey's book on git: amgit
14.6.3. edwindj/daff
14.7. https://github.com/goldingn/default
14.8. containers
14.8.1. docker
14.8.1.1. http://ropenscilabs.github.io/r-docker-tutorial/
15. Work flow
15.1. Hadley Wickham's Data science workflow: R for Data Science
15.2. mapping analytics objects Mapping analytics objects - Rex Analytics
15.3. Consultant's workflow A consultant's workflow - Rex Analytics
15.4. Hadley Wickham's Alfred Workflow Project workflow
15.5. managing code
15.5.1. good practices to follow: Engineering Data Science at Automattic
15.5.2. cookie cutter data science template: Cookiecutter Data Science
15.5.3. Drake: ropensci/drake
15.6. Good stat management
15.6.1. ten simple rules for effective statistical practice Ten Simple Rules for Effective Statistical Practice
15.6.2. sharing data jtleek/datasharing
15.6.3. It's not all about the values: https://www.nature.com/news/statistics-p-values-are-just-the-tip-of-the-iceberg-1.17412
15.7. Agile data science
16. GIS
17. accessibility
17.1. vision accessibility
17.1.1. On less-than-perfect-vision: Posters & Talks: Can you read me now?
17.1.2. Colourblindness Data Visualisation: Hex Codes, Pantone Colours and Accessibility - Rex Analytics