Genome sequencing

1. applications

1.1. somatic sequencing

1.1.1. mutated vs normal genome sequencing

1.1.2. idea

1.1.2.1. take sample

1.1.2.2. sequence

1.1.2.2.1. obtain two genomes (tumor and normal)

1.1.2.3. look for tumor-specific variation

1.1.2.3.1. identify SNV

1.1.3. "first" publication

1.1.3.1. 2008

1.1.3.2. cancer cells

1.1.3.3. Venn diagram comparing against the Venter and Watson genomes

1.1.3.4. SNV

1.1.3.4.1. all SNVs

1.1.3.4.2. minus SNVs also found in normal tissue

1.1.3.4.3. minus known SNVs (dbSNP), leaving novel ones

1.1.3.4.4. located in a gene

1.1.3.4.5. synonymous / non-synonymous
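The filtering cascade in 1.1.3.4 can be sketched as set operations, followed by a codon-level check for synonymous versus non-synonymous changes. This is a minimal illustration, not the published pipeline: the `SNV` record, the `(chrom, pos)` set standing in for dbSNP, and the `(chrom, start, end)` gene intervals are hypothetical simplifications.

```python
from typing import NamedTuple

# Standard genetic code; codons enumerated in TCAG order, '*' = stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


class SNV(NamedTuple):
    # Hypothetical minimal record for a single-nucleotide variant.
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_somatic(tumor, normal, known_sites, gene_intervals):
    """Return (novel somatic SNVs, the subset that falls inside a gene)."""
    somatic = set(tumor) - set(normal)  # drop variants also seen in normal tissue
    # drop sites already catalogued (a stand-in for a dbSNP lookup)
    novel = {s for s in somatic if (s.chrom, s.pos) not in known_sites}
    in_gene = {s for s in novel
               if any(c == s.chrom and start <= s.pos <= end
                      for c, start, end in gene_intervals)}
    return novel, in_gene


def classify_coding_change(codon, offset, alt):
    """Compare amino acids before/after placing `alt` at `offset` (0-2) of `codon`."""
    mutated = codon[:offset] + alt + codon[offset + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "non-synonymous"
```

For example, `classify_coding_change("CTG", 0, "T")` turns CTG into TTG, both leucine, so the change is synonymous.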

1.2. "counting methods"

1.2.1. ChIP-seq

1.2.1.1. regions of the genome a protein binds to

1.2.1.2. counting experiment

1.2.1.2.1. peaks
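Because ChIP-seq is a counting experiment, the basic idea of a peak can be shown by counting aligned read positions in fixed-width bins and keeping bins above a threshold. This is a toy sketch: `bin_size` and `min_reads` are arbitrary assumptions, and real peak callers model the background statistically (often against a control sample) instead of using a fixed cut-off.

```python
from collections import Counter


def bin_counts(read_positions, bin_size):
    """Count aligned read positions per fixed-width genomic bin."""
    return Counter(pos // bin_size for pos in read_positions)


def call_peaks(read_positions, bin_size=100, min_reads=5):
    """Return (bin_start, bin_end, read_count) for bins with enough reads."""
    counts = bin_counts(read_positions, bin_size)
    return sorted((b * bin_size, (b + 1) * bin_size, n)
                  for b, n in counts.items() if n >= min_reads)
```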

1.2.2. RNA-seq

1.2.2.1. quantification of gene expression

1.2.2.2. by

1.2.2.2.1. sequencing RNA

1.2.2.3. of a particular cell
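RNA-seq quantification starts from read counts per gene, which have to be normalised for gene length and sequencing depth before genes or samples can be compared. One common choice is transcripts per million (TPM), sketched below; the gene names, counts and lengths in the usage example are made up, and TPM is only one of several normalisations in use.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and gene lengths in bp."""
    # Reads per kilobase: longer genes collect more reads, so divide by length.
    rpk = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    # Rescale so all values sum to one million, removing sequencing-depth effects.
    scale = sum(rpk.values()) / 1e6
    return {g: v / scale for g, v in rpk.items()}
```

With equal counts for a 1 kb and a 2 kb gene, the shorter gene receives twice the TPM.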

1.3. exome sequencing

1.3.1. why?

1.3.1.1. 1/6 of costs

1.3.1.2. 1/15 of data

1.3.1.3. of whole genome sequencing

1.4. de novo assembly

1.4.1. Sequencing genomes without a reference

1.4.2. Closes gaps in previous knowledge

1.5. re-sequencing

1.5.1. goal

1.5.1.1. study

1.5.1.1.1. genetic variation

1.5.2. how

1.5.2.1. analysing many genomes of the same or similar species

2. history

2.1. progress through

2.1.1. automating/refining dideoxy sequencing

2.1.1.1. popular machine

2.1.1.1.1. ABI 3730xl

3. techniques

3.1. generation

3.1.1. first generation

3.1.1.1. Sanger/Dideoxy

3.1.1.1.1. examples

3.1.1.1.2. length

3.1.1.1.3. idea

3.1.2. second generation

3.1.2.1. Next-generation (next-gen)

3.1.2.1.1. idea

3.1.2.2. true single molecule sequencing

3.1.2.2.1. no amplification

3.1.2.3. single molecule real time sequencing

3.1.2.4. Ion torrent

3.1.2.4.1. semiconductor-chip-based

3.1.2.4.2. idea

3.1.3. third generation

3.2. in detail

3.2.1. Chain termination methods

3.2.1.1. Dye terminator method
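The dye-terminator form of chain termination can be simulated: each synthesis product stops at a dye-labelled ddNTP, capillary electrophoresis orders the products by length, and the dye on each product's last base gives the sequence. To keep orientation simple, this sketch assumes the template is written 3'->5', so the synthesized strand is its base-by-base complement; `dye_terminator_read` is a hypothetical helper, not an instrument API.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def dye_terminator_read(template_3to5: str, seed: int = 0) -> str:
    # The new strand (5'->3') pairs base-by-base with the 3'->5' template.
    synthesized = template_3to5.translate(COMPLEMENT)
    # One terminated product per position: synthesis stopped where a
    # dye-labelled ddNTP was incorporated, so the last base carries the dye.
    fragments = [synthesized[: i + 1] for i in range(len(synthesized))]
    random.Random(seed).shuffle(fragments)  # products are mixed in one reaction
    # Electrophoresis separates products by length; the detector then reads
    # the dye colour (i.e. the terminal base) of each product in turn.
    fragments.sort(key=len)
    return "".join(f[-1] for f in fragments)
```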

3.2.2. pyrosequencing

3.2.2.1. incorporation of bases

3.2.2.2. light emission
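In pyrosequencing, nucleotides are flowed one type at a time; when the flowed base is incorporated, light is emitted with an intensity roughly proportional to the number of bases added, so a homopolymer run gives a proportionally larger signal. The sketch below models an idealised, noise-free flowgram; the flow order `TACG` and the function names are assumptions for illustration.

```python
def flowgram(seq, flow_order="TACG"):
    """Idealised signals: one (base, homopolymer length) pair per nucleotide flow."""
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases not in the flow order")
    signals, i = [], 0
    while i < len(seq):
        for base in flow_order:  # one cycle flows each nucleotide once
            run = 0
            # a whole homopolymer run is extended during a single flow
            while i < len(seq) and seq[i] == base:
                run += 1
                i += 1
            signals.append((base, run))  # light intensity ~ run length
    return signals


def base_call(signals):
    """Decode by rounding each (possibly noisy) signal to an integer run length."""
    return "".join(base * round(signal) for base, signal in signals)
```

For "TTAGGG" the first cycle already yields T:2, A:1, C:0, G:3, showing why homopolymer length must be inferred from signal strength.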

3.2.3. Library/template preparation methods

3.2.3.1. take DNA

3.2.3.2. Shearing into smaller chunks (hundreds of bases)

3.2.3.3. Add adapters

3.2.3.4. Select for adapter-ligated fragments

3.2.3.5. Attach to surface

3.2.3.6. Creating large-insert paired-end libraries

3.2.3.6.1. goal

3.2.3.6.2. how

3.2.3.7. indexing libraries
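Indexing (barcoding) lets several libraries share one sequencing run; afterwards, reads are sorted back to their samples by index sequence. A minimal demultiplexing sketch, assuming each read arrives as an `(index_sequence, read_sequence)` pair and tolerating one mismatch in the index; the sample names and barcodes are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_barcodes, max_mismatches=1):
    """Assign each read to the single sample whose barcode it matches."""
    bins = {name: [] for name in sample_barcodes}
    bins["undetermined"] = []
    for index_seq, read_seq in reads:
        hits = [name for name, bc in sample_barcodes.items()
                if len(bc) == len(index_seq)
                and hamming(bc, index_seq) <= max_mismatches]
        # no match, or an ambiguous match to several samples, is not assigned
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(read_seq)
    return bins
```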

3.2.4. High-throughput sequencing

3.2.4.1. Companies

3.2.4.1.1. Roche 454

3.2.4.1.2. Illumina (Solexa)

3.2.4.1.3. Life Technologies (Applied Biosystems)

3.2.4.1.4. Polonator

3.2.4.1.5. Helicos

3.2.4.1.6. Pacific Biosciences

3.2.4.1.7. Intelligent Bio Systems

3.2.4.2. techniques

3.2.4.2.1. Polony sequencing

3.2.4.2.2. Roche 454 pyrosequencing

3.2.4.2.3. Illumina (Solexa) sequencing

3.2.4.2.4. SOLiD sequencing

3.2.4.2.5. Ion semiconductor sequencing

3.2.4.2.6. DNA nanoball sequencing

3.2.4.2.7. Lynx Therapeutics' massively parallel signature sequencing (MPSS)

3.2.4.3. rely on

3.2.4.3.1. interplay

3.2.4.4. comparison

4. resources

4.1. videos

4.1.1. next-gen sequencing overview

4.1.1.1. http://www.youtube.com/watch?v=g0vGrNjpyA8

4.1.1.2. http://www.youtube.com/watch?v=FGp8bOi2RVA&feature=related

4.1.2. How a genome is sequenced

4.2. initiatives

4.2.1. 1000 Genomes Project

4.2.2. International HapMap Project

4.2.3. READDNA

5. definitions

5.1. whole-genome sequencing

5.1.1. 30x coverage

5.1.2. mapping to a reference genome

5.1.3. no de novo assembly (unlike the first human genome, which was assembled)
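Coverage is the average number of reads covering each base: C = N * L / G for N reads of length L on a genome of size G. Under the idealised Lander-Waterman (Poisson) model, the expected fraction of bases covered by no read is e^(-C). The sketch assumes 100 bp reads and a genome of roughly 3 Gb; real data cover the genome less evenly, for example in repeats or GC-extreme regions.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """C = N * L / G."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest N with N * L / G >= target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def fraction_uncovered(coverage):
    """Poisson (Lander-Waterman) approximation: P(depth = 0) = e^(-C)."""
    return math.exp(-coverage)
```

Under these assumptions, 30x coverage of a 3 Gb genome with 100 bp reads needs 900 million reads.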

5.2. paired-end

5.2.1. aka

5.2.1.1. mate pairs

5.3. Multi-reads (reads that map to more than one location)

5.3.1. 20-30% of reads with the human genome
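Multi-reads arise because a read can match several places in the reference, typically inside repeats. The sketch below counts exact matches with Python's `str.find`; real aligners allow mismatches and use indexed search, and the toy reference with a duplicated segment in the usage example is invented for illustration.

```python
def exact_hits(read, reference):
    """All start positions where `read` occurs exactly (overlaps included)."""
    hits, start = [], reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


def classify_reads(reads, reference):
    """Label each read as unmapped, unique, or multi by its number of hits."""
    labels = {}
    for read in reads:
        n = len(exact_hits(read, reference))
        labels[read] = "unmapped" if n == 0 else "unique" if n == 1 else "multi"
    return labels
```

With the reference "GATTACA" + "CCCC" + "GATTACA", the read "GATTACA" is a multi-read because it falls entirely inside the duplicated segment.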

5.4. Copy number variation

5.4.1. long stretches

5.5. single nucleotide polymorphism

5.5.1. most common type of genetic variation

5.6. missing terms

5.6.1. Amplified DNA fragments

6. Challenges

6.1. repetitive DNA

6.1.1. occurrence

6.1.1.1. all kingdoms

6.1.1.1.1. plants

6.1.1.1.2. about half of the human genome

6.1.2. function

6.1.2.1. some are non-functional

6.1.2.2. played a part in human evolution

6.1.3. types

6.1.3.1. interspersed repeats

6.1.3.1.1. longer interspersed repeats

6.1.3.1.2. best example

6.1.3.2. tandem repeats

6.1.3.2.1. short tandem repeats

6.1.3.3. nested repeats

6.1.4. computational problems

6.1.4.1. create ambiguities

6.1.4.1.1. can lead to false inference of

6.1.4.1.2. solution

6.1.4.2. definition

6.1.4.2.1. > 100 bp

6.1.4.2.2. > 2-3 times in genome

6.1.4.2.3. > 97% identity
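The working definition above (longer than 100 bp, present more than 2-3 times, over 97% identity) can be written as a simple check. This sketch assumes the copies have already been extracted and are gap-free and equal in length, so identity is a position-by-position comparison; taking three copies as the minimum is one reading of "2-3 times", and the function names are hypothetical.

```python
def percent_identity(a, b):
    """Share of identical positions between two gap-free, equal-length copies."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes gap-free, equal-length copies")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def meets_repeat_definition(copies, min_length=100, min_copies=3, min_identity=97.0):
    """True if every copy is > min_length bp, there are enough copies,
    and each copy is > min_identity % identical to the first one."""
    if len(copies) < min_copies:
        return False
    if any(len(c) <= min_length for c in copies):
        return False
    first = copies[0]
    return all(percent_identity(first, c) > min_identity for c in copies[1:])
```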

7. Computational analysis

7.1. existing tools

7.1.1. SV/CNV detection

7.1.2. SNP detection

7.1.2.1. GATK

7.1.2.2. MAQ

7.1.2.3. SAMtools

7.1.2.4. SOAPsnp

7.1.2.5. VarScan

7.1.3. Short-read alignment

7.1.3.1. tools

7.1.3.1.1. Bowtie

7.1.3.1.2. TopHat

7.1.3.1.3. Cufflinks

7.1.4. De novo assembly

7.1.4.1. Memory-efficient

7.1.4.1.1. Data structures

7.1.4.1.2. Algorithms

7.1.4.2. Assembler

7.1.4.2.1. String Graph Assembler
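Overlap-based assembly can be illustrated by greedily merging reads on their longest suffix-prefix overlap. This is a deliberately simpler technique than the String Graph Assembler, which builds a string graph using an FM-index; greedy merging is shown only because it fits in a few lines, and it can misassemble repeats.

```python
def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0."""
    for n in range(min(len(a), len(b)), max(min_len, 1) - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def drop_contained(reads):
    """Remove duplicates and reads lying entirely inside another read."""
    unique = list(dict.fromkeys(reads))
    return [r for r in unique if not any(r != s and r in s for s in unique)]


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair with the longest overlap until none is left."""
    contigs = drop_contained(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = suffix_prefix_overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps remain: keep separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])
    return contigs
```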

7.2. workflow

7.2.1. mapping reads

7.2.1.1. reference genome

7.2.2. call SNPs
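The two workflow steps, mapping reads to a reference genome and calling SNPs, can be sketched end to end. Assumptions: reads are mapped by brute-force ungapped comparison (aligners such as Bowtie use an index instead), reads without a unique best hit are discarded, and a SNP is called when depth and the fraction of a non-reference base pass the hypothetical `min_depth` and `min_alt_fraction` thresholds. GATK, SAMtools and the other callers listed above use probabilistic models rather than this majority vote.

```python
from collections import Counter, defaultdict


def map_read(read, reference, max_mismatches=2):
    """Brute-force ungapped alignment; return the unique best start, else None."""
    best_starts, best_mm = [], None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mm = sum(a != b for a, b in zip(read, window))
        if mm > max_mismatches:
            continue
        if best_mm is None or mm < best_mm:
            best_starts, best_mm = [start], mm
        elif mm == best_mm:
            best_starts.append(start)
    return best_starts[0] if len(best_starts) == 1 else None


def call_snps(reads, reference, min_depth=3, min_alt_fraction=0.8):
    """Pile up mapped reads; report (pos, ref_base, alt_base, depth)."""
    pileup = defaultdict(Counter)  # position -> counts of observed bases
    for read in reads:
        start = map_read(read, reference)
        if start is None:
            continue  # unmapped or ambiguously mapped (multi-read): skip
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    snps = []
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        base, n = pileup[pos].most_common(1)[0]
        if depth >= min_depth and base != reference[pos] and n / depth >= min_alt_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps
```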

8. Genome annotation

8.1. identifying elements/structure in the genome

8.1.1. Gene prediction

8.2. Attaching biological information to these elements

8.3. Structural annotation

8.3.1. ORFs

8.3.2. gene structure

8.3.3. coding regions

8.3.4. location of regulatory motifs
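Finding open reading frames (ORFs) is a common first step in structural annotation. This sketch scans the three forward reading frames for ATG-to-stop stretches under the standard genetic code; it ignores the reverse strand, alternative start codons and introns, which real gene predictors handle, and `min_codons` is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) half-open intervals of forward-strand ORFs,
    each running from ATG through an in-frame stop codon."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:  # count includes the stop
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```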

8.4. Functional annotation

8.4.1. biochemical function

8.4.2. biological function

8.4.3. involvement in regulation and interactions