Mind Map: Week 1

1. Concepts and ideas

1.1. Replication: different people should be able to reproduce your results

1.1.1. Some studies cannot be replicated due to:

1.1.1.1. No time

1.1.1.2. No money

1.1.1.3. Unique

1.1.2. Make code available to everyone

1.2. Validation of data analysis

1.3. Research pipeline

1.3.1. Article

1.3.1.1. Author goes left to right

1.3.1.2. Reader goes right to left

1.4. What is needed

1.4.1. Data should be available

1.4.2. Code should be available

1.4.3. Documentation of code and data

1.4.4. Standard ways of distribution

1.5. Players

1.5.1. Author

1.5.2. Readers

1.6. Literate Programming

1.6.1. Article

1.6.1.1. Text

1.6.1.2. Code

1.6.2. Presentation code

1.6.3. General concept

1.6.3.1. Documentation language

1.6.3.2. Programming language

1.6.4. Types

1.6.4.1. Sweave

1.6.4.1.1. Uses LaTeX

1.6.4.1.2. Lacks features: caching, multiple plots

1.6.4.1.3. Not frequently updated

1.6.4.2. knitr

1.6.4.2.1. uses R
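
The literate programming idea above (one source file that interleaves a documentation language with a programming language) can be sketched as a toy "weave" step. This is only an illustration, not how Sweave or knitr are implemented: the `weave` function is hypothetical, it runs Python rather than R chunks, and it assumes noweb-style chunk markers (`<<>>=` to open a chunk, `@` to close it).

```python
import contextlib
import io


def weave(source: str) -> str:
    """Toy weave: copy text lines, run code chunks, and append their output."""
    out_lines = []
    chunk = None      # None while reading text, a list while inside a chunk
    namespace = {}    # shared by all chunks, so later chunks see earlier names
    for line in source.splitlines():
        if chunk is None and line.strip() == "<<>>=":
            chunk = []                                   # chunk start marker
        elif chunk is not None and line.strip() == "@":
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec("\n".join(chunk), namespace)        # run chunk, capture prints
            out_lines.extend("    " + c for c in chunk)  # echo the code, indented
            out_lines.extend("    ## " + o for o in buffer.getvalue().splitlines())
            chunk = None                                 # back to text mode
        elif chunk is not None:
            chunk.append(line)                           # collect code lines
        else:
            out_lines.append(line)                       # plain text passes through
    return "\n".join(out_lines)


document = """The mean of the measurements is computed below.
<<>>=
values = [2, 4, 6]
print(sum(values) / len(values))
@
The text and the number above come from one source file."""

woven = weave(document)
print(woven)
```

Because the printed number is produced when the document is woven, the reader can trace every reported value back to the code that generated it.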

2. Scripting your analysis

2.1. Script everything

3. Structure of data analysis

3.1. Steps

3.1.1. Define a question

3.1.1.1. Narrow as much as possible

3.1.1.2. This helps remove the noise of other data

3.1.2. Define ideal data set

3.1.2.1. May depend on your goal

3.1.2.1.1. Descriptive

3.1.2.1.2. Exploratory

3.1.2.1.3. Inferential

3.1.2.1.4. Predictive

3.1.2.1.5. Causal

3.1.2.1.6. Mechanistic

3.1.3. What data you can access

3.1.3.1. Free on the web

3.1.3.2. Buy data

3.1.3.3. Might need to generate it

3.1.4. Obtain data

3.1.4.1. Try to get raw data

3.1.4.2. If obtained from the web, record the URL and the time accessed

3.1.5. Clean data

3.1.5.1. If it is already preprocessed, understand how

3.1.5.2. Understand the source of the data

3.1.5.3. Determine if the data is good enough; if not:

3.1.5.3.1. Quit

3.1.5.3.2. Change data

3.1.6. Exploratory data analysis

3.1.7. Statistical prediction/modeling

3.1.7.1. Report measures of uncertainty

3.1.8. Interpret results

3.1.8.1. Use appropriate language

3.1.8.2. Give explanation

3.1.8.3. Interpret the results

3.1.9. Challenge results

3.1.9.1. All steps

3.1.9.2. Measures of uncertainty

3.1.9.3. Think of potential alternatives

3.1.10. Synthesize/write up results

3.1.10.1. Lead with questions

3.1.10.2. Don't include every analysis; keep only what the story needs

3.1.10.3. Include "pretty" figures

3.1.11. Create reproducible code
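
The "Obtain data" step above asks for the URL and access time of anything taken from the web. A minimal sketch of such a provenance record follows. The function name `provenance_record`, the example URL, the sample bytes, and the fixed timestamp are all placeholders, and the SHA-256 checksum is an extra not mentioned in the notes, added here on the assumption that it helps detect a changed source file later.

```python
import hashlib
from datetime import datetime, timezone


def provenance_record(url, content, accessed=None):
    """Describe where and when raw data was obtained (hypothetical helper)."""
    if accessed is None:
        accessed = datetime.now(timezone.utc)   # default to the current UTC time
    return {
        "url": url,
        "accessed_utc": accessed.isoformat(timespec="seconds"),
        "n_bytes": len(content),
        # Assumption: a checksum is stored so later downloads can be compared.
        "sha256": hashlib.sha256(content).hexdigest(),
    }


# Placeholder bytes standing in for a downloaded file; no network access here.
raw = b"id,value\n1,3.5\n2,4.1\n"
record = provenance_record(
    "https://example.org/data.csv",             # placeholder URL
    raw,
    accessed=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),  # placeholder time
)

for key, value in record.items():
    print(f"{key}: {value}")
```

The printed lines could be pasted into the README next to the raw data, so the record travels with the files it describes.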

4. Organizing analysis

4.1. Data

4.1.1. Raw data

4.1.1.1. If downloaded from the web, add the access date to the README

4.1.2. Processed data

4.1.2.1. Name files so that it is clear which script generated them

4.2. Figures

4.2.1. Exploratory figures

4.2.2. Final figures

4.3. R code

4.3.1. Raw / unused scripts

4.3.2. Final scripts

4.3.3. R markdown files

4.4. Text

4.4.1. README

4.4.1.1. Should contain step-by-step instructions for analysis

4.4.2. Article

4.4.2.1. Title

4.4.2.2. Intro

4.4.2.3. Methods

4.4.2.4. Results

4.4.2.5. Conclusions
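
The organization described in section 4 could be set up with a short script. This is a sketch under assumptions: the folder names in `LAYOUT` are one possible mapping of the outline onto directories, not names prescribed by the notes, and `create_skeleton` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Assumed folder names mirroring the outline: data, figures, code, text.
LAYOUT = [
    "data/raw",
    "data/processed",
    "figures/exploratory",
    "figures/final",
    "code/raw",
    "code/final",
    "code/rmarkdown",
    "text",
]


def create_skeleton(root):
    """Create the project folders and a README stub; return the folders found."""
    root = Path(root)
    for relative in LAYOUT:
        (root / relative).mkdir(parents=True, exist_ok=True)
    readme = root / "text" / "README"
    if not readme.exists():                      # never overwrite an existing README
        readme.write_text("Step-by-step instructions for the analysis go here.\n")
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )


# Demonstrated in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = create_skeleton(tmp)

print("\n".join(created))
```

Running the script twice is harmless, because existing folders and an existing README are left untouched.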