
1. data ingest/output
1.1. Excel
1.1.1. Readxl: get data out of Excel and into R http://readxl.tidyverse.org
1.1.2. Excelgesis: look inside an excel file jennybc/excelgesis
1.2. SQL
1.2.1. Using PostgreSQL in R: A quick how-to
1.2.2. https://openml.github.io/articles/slides/useR2017_tutorial/slides_tutorial.html?utm_content=buffer2efd1&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer#1
1.2.3. modeldb
1.3. Webscraping
1.3.1. Pull data from wikipedia tables: Scraping wikipedia tables
1.3.2. Extracting data from the web: http://bit.ly/2pvy8Hz
1.3.3. Scraping JS Scraping Javascript websites in R · Brooke Watson
1.3.4. Data pasta
1.3.4.1. Twitter
1.3.5. Twitter
1.3.5.1. https://rud.is/books/21-recipes/?utm_content=buffer7a82b&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
1.4. Google sheets
1.4.1. Connect to Google Sheets in Power BI using R
1.5. pdfs
1.5.1. One View of the Impact of the New Immigration Ban (+ freeing PDF data with tabulizer)
1.6. APIs
1.6.1. Riingo An R interface to the Tiingo stock price API
2. package development + testing
2.1. R packages Welcome · R packages
2.2. Code testing
2.2.1. Assert an item is unchanged on execution: Assert condition over a code block
2.2.2. checkr: https://www.theoj.org/joss-papers/joss.00624/10.21105.joss.00624.pdf
2.3. Licenses
2.3.1. Open Source Licenses Explained
3. data wrangling and data structures
3.1. basic data structures in R R for Excel users - Rex Analytics
3.2. R For data Science: R for Data Science
3.3. Advanced R: Welcome · Advanced R.
3.4. dplyr: manipulate data tidyverse/dplyr
3.4.1. Data Wrangling Part 4: Summarizing and slicing your data
3.5. tidyr: clean up data tidyverse/tidyr
3.6. explore some data structures: go build a blockchain! Simple Blockchain Example in R
3.7. data manipulation in R: Amazon.com: Data Manipulation in R (R Fundamentals Book 2) eBook: Stephanie Locke: Kindle Store
3.8. data wrangling cheat sheet http://bit.ly/1LaYWBd
3.9. Factors
3.9.1. Manipulation de facteurs avec forcats – R-atique en Francais
3.10. regex
3.10.1. https://bit.ly/2GeeWV2
3.10.2. R Regex Tester R: Regular Expressions as used in R
3.10.3. gadenbuie/regexplain
3.11. purrr
3.11.1. https://t.co/1Jw1ZWeDRb
4. Getting started with R or data science
4.1. Where to start
4.1.1. becoming a data scientist Becoming A Data Scientist | Documenting my path from "SQL Data Analyst pursuing an Engineering Master's Degree" to "Data Scientist"
4.1.2. Don't forget
4.1.2.1. Three things every new data scientist should know - Rex Analytics
4.1.2.2. Yes, you can: learn data science - Rex Analytics
4.2. troubleshooting installation R Package Install Troubleshooting
4.3. The thing is, learning R and learning an IDE are two different things: Learning to program is getting harder
4.4. R Syntax cheat sheet: http://www.science.smith.edu/~amcnamara/Syntax-cheatsheet.pdf
4.5. two minute R videos series: http://bit.ly/2pwzyRu
4.6. Where do things live in R? Where do things live in R? R for Excel Users - Rex Analytics
4.7. errors
4.7.1. decoding error messages Decoding error messages in R - Rex Analytics
4.7.2. object not found Object not found: R - Rex Analytics
4.8. basic skills resources: Some notes
4.8.1. Probability
4.8.1.1. basic probability A Primer on Basic Probability - Rex Analytics
4.8.1.2. What is a probability distribution? What is a distribution?
4.8.1.3. Probability cheat sheet http://bit.ly/21kATFL
4.8.1.4. Conditional probability http://bit.ly/2pvGL48
4.8.2. Statistics
4.8.2.1. Explore correlation and regression Exploring Correlation and the Simple Linear Regression Model - Rex Analytics
4.8.2.2. describe simple statistics Describing simple statistics - Rex Analytics
4.8.2.3. statistical significance, explained http://bit.ly/2G8Sksz
4.8.2.4. Asymptotics - the engine behind a lot of statistical results.
4.8.2.4.1. Law of large numbers vs Central Limit Theorem. In gifs Law of Large Numbers vs the Central Limit Theorem: in GIFs - Rex Analytics
4.8.2.4.2. The law of large numbers The Law of Large Numbers: It's Not the Central Limit Theorem - Rex Analytics
4.8.2.4.3. The central limit theorem http://bit.ly/2HWmGvg
4.8.2.5. Bayesian statistics: they're good people, I promise
4.8.2.5.1. Bayesian Statistics Explained in Simple English For Beginners
4.8.2.5.2. What is bayesian updating actually doing? http://bit.ly/2DLJImb
4.8.3. Experimental/sample design
4.8.3.1. Correlation vs causation Correlation vs Causation - Rex Analytics
4.8.4. explain histograms Thomas Lin Pedersen on Twitter
4.8.4.1. code An example of animating the build up of a histogram with dropping balls using tweenr, gganimate and ggplot2
4.9. Linear algebra
4.9.1. WTF are eigenvalues? Eigenvectors and Eigenvalues explained visually
4.9.2. Shiny app for calculating them and everything http://bit.ly/2IKlCMf
5. data hazmat (ok cleaning)
5.1. Janitor: http://sfirke.github.io/janitor/
5.2. vtreat package for a relatively automated approach R Tip: Use the vtreat Package For Data Preparation
6. troubleshooting
6.1. make a reprex! http://bit.ly/2ptJU4u
7. functional and object-oriented programming
7.1. functions http://bit.ly/2FWWJza
7.2. closures Closures in R - Rex Analytics
7.3. tidy evaluation
7.3.1. What is tidy evaluation anyway? https://www.youtube.com/watch?v=nERXS3ssntw&feature=youtu.be
7.3.2. tidy evaluation used to make dplyr-type verbs https://bit.ly/2G6tVAC
7.3.3. Tidyeval meets PDF table hell
7.4. other languages
7.4.1. C++
7.4.1.1. A new RStudio addin to facilitate inserting tables in Rmarkdown documents | Lorenzo Busetto Website & Blog
7.5. need for speed
7.5.1. Making R Code Faster : A Case Study
8. graph/network methods
8.1. intro to ggraph http://bit.ly/2IIZDFQ
8.2. social network analysis, how-to guide Social Network Analysis - RDataMining.com: R and Data Mining
9. Utilities
9.1. automation in R
9.1.1. Automating summary of surveys with R Markdown: Automating Summary of Surveys with RMarkdown
9.1.2. Automate processes Automate R processes
9.1.3. Scheduling R scripts and processes on Windows and Unix/Linux
9.2. vectorising
9.2.1. Jenny Bryan's tutorials: Vectors and lists
9.2.2. Functional Programming with Purrr: Thomas Mock http://bit.ly/2FTn80W
9.2.3. Mara Averick's fantastic collection: purrr-ty posts
9.3. lists
9.3.1. zeallot package - an assignment operator for unpacking lists and vectors nteetor/presentations
9.4. learning unix makes life easier: The Unix Workbench
9.4.1. what is sudo? - linux Migrating to Linux: Using Sudo
9.5. Cloud services
9.5.1. AWS
9.5.1.1. set up RStudio on AWS RStudio in the Cloud I: Amazon Web Services
9.5.2. google drive
9.5.2.1. tidyverse/googledrive
9.5.2.2. An Interface to Google Drive • googledrive
9.6. git/github
9.6.1. Github: it's worth it Happy Git and GitHub for the useR
9.6.2. Allen Downey's book on git: amgit
9.6.3. edwindj/daff
9.7. https://github.com/goldingn/default?utm_content=buffer0d8b6&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
9.8. containers
9.8.1. docker
9.8.1.1. http://ropenscilabs.github.io/r-docker-tutorial/?utm_content=buffer66d72&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
10. Work flow
10.1. Hadley Wickham's Data science workflow: R for Data Science
10.2. mapping analytics objects Mapping analytics objects - Rex Analytics
10.3. Consultant's workflow A consultant's workflow - Rex Analytics
10.4. Hadley Wickham's Alfred Workflow Project workflow
10.5. managing code
10.5.1. good practices to follow: Engineering Data Science at Automattic
10.5.2. cookie cutter data science template: Cookiecutter Data Science
10.5.3. Drake: ropensci/drake
10.6. Good stat management
10.6.1. ten simple rules for effective statistical practice Ten Simple Rules for Effective Statistical Practice
10.6.2. sharing data jtleek/datasharing
10.6.3. It's not all about the p-values: https://www.nature.com/news/statistics-p-values-are-just-the-tip-of-the-iceberg-1.17412
10.7. Agile data science
11. accessibility
11.1. vision accessibility
11.1.1. On less-than-perfect-vision: Posters & Talks: Can you read me now?
11.1.2. Colourblindness Data Visualisation: Hex Codes, Pantone Colours and Accessibility - Rex Analytics
11.2. Thinking about non native English speakers: Philip Guo - Selected Publications
11.3. inclusive design: Inclusive - Microsoft Design
11.4. Accessibility in your software | iOS & VoiceOver
12. Modelling
12.1. Standard classical
12.1.1. Lm/Regression
12.1.1.1. An Introduction to Statistical and Data Sciences via R
12.1.1.2. Regression in R Regression Analysis using R
12.1.1.3. Regression essentials: http://bit.ly/2u9PezE
12.1.2. marginal effects Easy peasy STATA-like marginal effects with R
12.1.3. Machine learning
12.1.3.1. Caret
12.1.3.1.1. Walkthrough by Zev Ross Predictive modeling and machine learning in R with the caret package
12.1.3.2. averaging/bagging
12.1.3.2.1. Averaging for Prediction in Econometrics and ML
12.1.3.3. interpretable machine learning
12.1.3.3.1. LIME framework http://bit.ly/2ubmnL9
12.1.3.4. Cross validation
12.1.3.4.1. An applied experiment using brain decoders - interesting way of looking at CV critically in practice. Assessing and tuning brain decoders: cross-validation, caveats, and guidelines
12.1.3.5. Introductory material
12.1.3.5.1. Lectures - 15 hours' worth! In-depth introduction to machine learning in 15 hours of expert videos
12.1.3.6. Tree based methods
12.1.3.6.1. Examples:
12.1.3.7. ML cheat sheet Machine Learning Modelling in R : : Cheat Sheet
12.2. High dimensional
12.2.1. High-dimensional statistical and econometric methods for estimation and inference. http://bit.ly/2DJ35wl
12.2.2. Statistical model selection with “Big Data”: Doornik & Hendry's New Paper - Rex Analytics
12.2.3. Variable selection, big data http://bit.ly/2pvyjmd
12.2.4. curse of dimensionality: Typical Sets and the Curse of Dimensionality
12.3. ordinal
12.3.1. Likert scale
12.3.1.1. Assessment of health surveys: fitting a multidimensional graded response model
12.4. forecasting
12.4.1. Principles and practice: https://otexts.org/fpp2/
12.4.2. Hierarchical
12.4.3. high dimensional forecasting
12.4.3.1. feature spaces Analysing Large Collections of Time Series
12.4.3.2. Anomaly Detection in Streaming Nonstationary Temporal Data (PDF Download Available)
12.4.4. time series R structures
12.4.4.1. tsibble and tibbletime tsibble? or tibbletime? · Earo Wang
12.4.5. Exploring the sources of uncertainty: Why does bagging for time series forecasting work?
12.5. Neural Nets/AI
12.5.1. F.X. Diebold's write up: Neural Nets, ML and AI
12.5.2. Capsule networks
12.5.2.1. overview: Understanding Hinton’s Capsule Networks. Part I: Intuition.
12.5.3. Deep learning
12.5.3.1. Deep quantile regression: https://towardsdatascience.com/deep-quantile-regression-c85481548b5a
12.6. Model selection
12.6.1. information criteria
12.6.1.1. Hannan Quinn Hannan Quinn Information Criteria - Rex Analytics
12.6.2. Machine learning or traditional econometrics? Machine Learning vs Econometric Modelling: Which One? - Rex Analytics
12.6.3. feature selection algorithms
12.6.3.1. xgboost
12.6.3.1.1. boost-a-roota (python) chasedehan/BoostARoota
12.7. Algorithms
12.7.1. What's the difference between a model, an estimator and an algorithm? Models, Estimators and Algorithms - Rex Analytics
12.7.2. Gradient Descent vs Stochastic Gradient Descent: Some Observations of Behaviour - Rex Analytics
12.8. model interpretation
12.8.1. Interpreting Models: Coefficients, Marginal Effects or Elasticities? - Rex Analytics
12.8.2. Dalex: blackbox model interpretation DALEX: which variables are really important? Ask your black box model! | SmarterPoland.pl
12.9. Continuous, Censored and Truncated Data: what are the differences and do you need to care? - Rex Analytics
12.10. Classification
12.10.1. Unbalanced classes
12.10.1.1. SMOTE
12.10.1.1.1. Sales analytics: Sales Analytics: How to Use Machine Learning to Predict and Optimize Product Backorders
12.11. survival
12.11.1. https://towardsdatascience.com/survival-analysis-in-python-a-model-for-customer-churn-e737c5242822
13. testing - statistical
13.1. hypothesis testing theory
13.1.1. not significant does not mean not important https://economics.mit.edu/files/14851
13.1.2. Does it matter in practice? Normal vs t distribution - Rex Analytics
13.1.3. explain a confound to me like I don't have a PhD in stats The Great Minds Journal Club discusses Westfall & Yarkoni (2016)
13.1.4. p-values http://bit.ly/2IHJujN
13.1.4.1. sometimes you just have to laugh Introducing the p-hacker app: Train your expert p-hacking skills
13.2. A/B testing
13.2.1. it's hard! Optimizely’s decision to ditch its free plan suggests A/B website testing is dead
13.2.2. power and effect sizes Dan Quintana on Twitter
13.2.3. Size matters: Size Matters - Rex Analytics
13.2.4. Optimal error rate
13.2.4.1. Setting an Optimal α That Minimizes Errors in Null Hypothesis Significance Tests
13.2.4.2. Optimal Error Rate Calculator
13.2.5. LR
13.2.5.1. https://osf.io/preprints/bitss/g3j2k/
13.3. causal inference
13.3.1. Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies | American Journal of Epidemiology | Oxford Academic
13.4. ANOVA
13.4.1. singmann/afex
14. analytics outputs
14.1. visualisation
14.1.1. Data Visualization
14.1.2. visual inference
14.1.2.1. Di Cook's 2017 Ihaka lecture
14.1.2.1.1. Di Cook's slides: Myth busting and apophenia in data visualisation: is what you see really there?
14.1.2.1.2. Sketch notes by Jacquie Tran: Jacquie Tran on Twitter
14.1.2.2. Charting Temporal Trends in Alteryx using sugrrants R package – Saqib Ali
14.1.3. visual vocabulary: ideas, ideas, ideas ft-interactive/chart-doctor
14.1.4. mechanics of great visualisation
14.1.4.1. custom corporate palettes in ggplot2 Creating corporate colour palettes for ggplot2 • blogR
14.1.4.2. fuzzy/jagged charts? Set the dpi: Thomas Mock on Twitter
14.1.4.3. Neo4j Graphs http://bit.ly/2HVoCo9
14.1.4.4. ggplot2 book ggplot2
14.1.4.4.1. ggplot2 cheat sheet https://www.rstudio.com/wp-content/uploads/2015/08/ggplot2-cheatsheet.pdf
14.1.4.5. multiple plots on a page http://bit.ly/2u9MAKe
14.1.4.6. 5 tips for decoding visualisation http://bit.ly/2FZLbY2
14.1.4.7. tree mapping TreeMap with data.tree
14.1.4.8. data viz checklist http://bit.ly/2pvaYAc
14.1.4.9. raincloud plots Introducing Raincloud Plots!
14.1.4.10. https://github.com/yixuan/showtext?utm_content=bufferb8b22&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
14.1.4.11. Take a Sad Plot & Make It Better - Alison Presmanes Hill
14.1.5. visual literacy http://bit.ly/2HS6mMo
14.1.6. process map visualisations http://bit.ly/2G4vsqw
14.1.7. revisualisation Hint Events – Medium_fm/design-and-redesign-4ab77206cf9
14.1.8. animation
14.1.8.1. tweenr Pipe Tweenr
14.1.9. Fundamentals of Data Visualization
14.2. dashboards
14.2.1. starting out with R and Shiny Starting Out with R and Shiny
14.2.2. Shiny cheatsheet https://shiny.rstudio.com/images/shiny-cheatsheet.pdf
14.2.3. https://antoineguillot.wordpress.com/2017/02/21/three-r-shiny-tricks-to-make-your-shiny-app-shines-23-semi-collapsible-sidebar/?utm_content=buffer88135&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
14.3. writing/presentations
14.3.1. writing business reports, basic guide Writing business reports - Rex Analytics
14.3.2. R markdown
14.3.2.1. Writing equations? Reyn Yoshioka on Twitter
14.3.2.2. R markdown cheatsheet https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
14.3.2.3. A new RStudio addin to facilitate inserting tables in Rmarkdown documents | Lorenzo Busetto Website & Blog
14.3.2.4. Templating
14.3.2.4.1. http://rapport-package.info/?utm_content=bufferb649b&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
14.3.3. Get your data into a word document from R https://www.youtube.com/watch?v=XzGabWGo6NE&feature=youtu.be
14.3.4. Great presentations
14.3.4.1. Making Slides
14.3.5. Blogs
14.3.5.1. Blogdown tute: Building your blog using blogdown · Data and the Anthropologist
14.4. web
14.4.1. Getting Hugo running on Netlify
15. unstructured data
15.1. text mining/NLP
15.1.1. very basic text mining/introduction, word clouds Text Mining: Word Clouds - Rex Analytics
15.1.1.1. Extending the word cloud Words in Politics: Some extensions of the word cloud « Fells Stats
15.1.2. Text Mining with R
15.1.3. automated text analysis using network methods cbail/textnets
15.1.4. create a text document matrix Text Mining - RDataMining.com: R and Data Mining
15.1.5. term frequency, tf-idf matrix Term Frequency and tf-idf Using Tidy Data Principles
15.1.6. dimensionality reduction and clustering, text data http://bit.ly/2Gbgl22
15.1.7. tokenizers: New release: tokenizers v0.2.0
15.1.8. advanced:
15.1.8.1. latent variable model in RNNs https://yobibyte.github.io/files/paper_notes/A_Recurrent_Latent_Variable_Model_for_Sequential_Data__Chung_et_al___2016.pdf
15.1.9. hrbrmstr/misinfo
15.2. deep learning
15.2.1. a critical appraisal https://arxiv.org/pdf/1801.00631.pdf
16. data analysis
16.1. How to:
16.1.1. First contact to final decision example Ratesetter: data analysis from first contact to final interrogation - Rex Analytics
16.1.2. Starting out, first questions to ask Data Analysis: Questions to Ask the First Time - Rex Analytics
16.1.3. More questions to ask Data Analysis: More Questions - Rex Analytics
16.1.4. Enough with the questions! Data Analysis: Enough with the Questions Already - Rex Analytics
16.1.5. Reading a data analysis: Chris Riederererer on Twitter
16.1.6. Data exploration cheat sheet, R: http://bit.ly/2HWYCZd
16.1.7. Useful packages: https://bit.ly/2DSrkby
16.2. missing data
16.2.1. Naniar
16.2.1.1. Package home Data Structures, Summaries, and Visualisations for Missing Data • naniar
16.2.1.2. Package on Github njtierney/naniar
16.2.1.3. Gallery
16.2.1.4. Witch hunt example Witch hunting in Europe: a discovery of missingness - Rex Analytics