Genome sequencing


1. applications

1.1. somatic sequencing

1.1.1. mutated vs normal genome sequencing

1.1.2. idea: take sample; sequence; have two genomes (tumor and normal); look for tumor-specific variation; identify SNVs

1.1.3. "first" publication (2008): cancer cells; Venn diagram with Venter/Watson SNVs; filter cascade: all SNVs, minus SNVs in normal tissue, novel (not in dbSNP), in a gene, synonymous/non-synonymous
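
The filter cascade in 1.1.3 can be sketched as a chain of set operations. This is a minimal illustration, not the pipeline of the 2008 publication: the `SNV` record, the variant coordinates, the `known_dbsnp` set, and the `gene_regions` table are all made-up toy data. The final synonymous/non-synonymous step is left out because it needs a codon table and gene models.

```python
from typing import NamedTuple

class SNV(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str

# Toy inputs (hypothetical values, for illustration only).
tumor = {
    SNV("chr1", 100, "A", "G"),
    SNV("chr1", 250, "C", "T"),
    SNV("chr2", 40, "G", "A"),
    SNV("chr2", 900, "T", "C"),
}
normal = {SNV("chr1", 100, "A", "G")}        # also present in normal tissue
known_dbsnp = {SNV("chr2", 40, "G", "A")}    # already catalogued, so not novel
gene_regions = {"chr1": [(200, 300)], "chr2": [(1, 50)]}  # inclusive (start, end)

def in_gene(variant):
    """True if the variant position lies inside any annotated gene region."""
    regions = gene_regions.get(variant.chrom, [])
    return any(start <= variant.pos <= end for start, end in regions)

tumor_specific = tumor - normal          # drop SNVs seen in normal tissue
novel = tumor_specific - known_dbsnp     # drop SNVs already in the database
genic = {v for v in novel if in_gene(v)} # keep SNVs that fall in a gene

for label, variants in [("all", tumor), ("tumor-specific", tumor_specific),
                        ("novel", novel), ("in a gene", genic)]:
    print(label, len(variants))
```

Each step can only shrink the candidate set, which is why the cascade is naturally drawn as a Venn diagram.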

1.2. "counting methods"

1.2.1. ChIP-seq: the parts of the genome a protein binds to; counting experiment; peaks

1.2.2. RNA-seq: quantification of gene expression by sequencing the RNA of a particular cell
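
Both counting methods reduce to tallying mapped reads per region. The sketch below is a toy version under stated assumptions: reads are represented only by their start coordinate, and the `features` table, bin width, and read positions are invented for illustration. Real analyses use dedicated peak callers and expression quantifiers.

```python
from collections import Counter

def count_per_feature(read_starts, features):
    """RNA-seq style: count reads whose start falls in each named [start, end) feature."""
    counts = {name: 0 for name in features}
    for pos in read_starts:
        for name, (start, end) in features.items():
            if start <= pos < end:
                counts[name] += 1
    return counts

def count_per_bin(read_starts, bin_width):
    """ChIP-seq style: count reads per fixed-width genomic bin."""
    return Counter(pos // bin_width for pos in read_starts)

# Hypothetical read start positions on one chromosome.
reads = [5, 12, 18, 110, 115, 118, 119, 300]
features = {"geneA": (0, 100), "geneB": (100, 200)}

print(count_per_feature(reads, features))

bins = count_per_bin(reads, bin_width=50)
peak_bin, peak_count = bins.most_common(1)[0]
print("highest bin starts at", peak_bin * 50, "with", peak_count, "reads")
```

Here the bin with the most reads plays the role of a peak; a real peak caller would also compare against a background or control sample.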

1.3. exome sequencing

1.3.1. why? about 1/6 of the cost and 1/15 of the data of whole-genome sequencing

1.4. de novo assembly

1.4.1. Genomes sequenced without a reference

1.4.2. Close previous knowledge gaps

1.5. re-sequencing

1.5.1. goal: study genetic variation

1.5.2. how: analysing many genomes from the same or similar species

2. history

2.1. progress as

2.1.1. automating/refining dideoxy sequencing; popular machine: ABI 3730xl

3. techniques

3.1. generation

3.1.1. 1. Sanger/Dideoxy: examples; length; idea

3.1.2. 2. Nextgen: idea; true single-molecule sequencing (no amplification); single-molecule real-time sequencing; Ion Torrent; computer-based idea

3.1.3. 3.

3.2. in detail

3.2.1. Chain-termination methods: dye-terminator method

3.2.2. Pyrosequencing: incorporation of bases; light emission

3.2.3. Library/template preparation methods: taking DNA; shearing into smaller chunks (hundreds of bases); add adapters; select for adapters; attach to surface. Creating large-insert paired-end libraries: goal; how. Indexing libraries

3.2.4. High-throughput sequencing. Companies: Roche 454; Illumina (Solexa); Life Technologies (Applied Biosystems); Polonator; Helicos; Pacific Biosciences; Intelligent Bio Systems. Techniques: polony sequencing; Roche 454 pyrosequencing; Illumina (Solexa) sequencing; SOLiD sequencing; ion semiconductor sequencing; DNA nanoball sequencing; Lynx Therapeutics' massively parallel signature sequencing (MPSS). Rely on interplay; comparison
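
The pyrosequencing idea in 3.2.2 (each base incorporation is detected as light) can be illustrated with a toy flowgram. Nucleotides are flowed over the template one type at a time, and the light signal is roughly proportional to how many identical bases are incorporated in a row. The sketch assumes an ideal, noise-free signal and a cyclic flow order of `TACG`; both the flow order and the function names are assumptions for illustration.

```python
def flowgram(sequence, flow_order="TACG"):
    """Return a list of (flowed base, run length) pairs for an ideal, noise-free run."""
    unknown = set(sequence) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    position = 0
    flow = 0
    while position < len(sequence):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer run is incorporated in a single flow, giving a stronger signal.
        while position < len(sequence) and sequence[position] == base:
            run += 1
            position += 1
        signals.append((base, run))
        flow += 1
    return signals

def decode(signals):
    """Rebuild the sequence from the flowgram (signal 0 means nothing was incorporated)."""
    return "".join(base * run for base, run in signals)

print(flowgram("TTACCG"))
print(decode(flowgram("GATTACA")))
```

In practice the signal is not perfectly proportional for long homopolymer runs, which is a known source of insertion and deletion errors in this chemistry.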

4. resources

4.1. videos

4.1.1. next-gen sequencing overview

4.1.2. How a genome is sequenced

4.2. initiatives

4.2.1. 1000 human genomes

4.2.2. Human HapMap

4.2.3. READNA

5. definition

5.1. whole-genome sequencing

5.1.1. 30x coverage

5.1.2. mapping to reference genome

5.1.3. not assembled de novo (unlike the first human genome)
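
The 30x figure in 5.1.1 is a mean depth: coverage = number of reads × read length / genome size. The small calculation below shows what that implies; the genome size (3.1 Gb) and read length (100 bp) are assumed round values, not taken from this map.

```python
import math

def mean_coverage(num_reads, read_length, genome_size):
    """Average number of reads covering each base."""
    return num_reads * read_length / genome_size

def reads_needed(coverage, read_length, genome_size):
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(coverage * genome_size / read_length)

GENOME_SIZE = 3_100_000_000  # assumed human genome size in bases
READ_LENGTH = 100            # assumed read length in bases

n = reads_needed(30, READ_LENGTH, GENOME_SIZE)
print(n)
print(mean_coverage(n, READ_LENGTH, GENOME_SIZE))
```

Under these assumptions, 30x coverage needs on the order of a billion reads. Actual depth varies from position to position, so the mean is only a planning figure.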

5.2. paired-end

5.2.1. aka mate pairs

5.3. Multi-reads

5.3.1. 20-30% of reads with the human genome

5.4. Copy number variation

5.4.1. long stretches

5.5. single nucleotide polymorphism

5.5.1. most common type

5.6. missing terms

5.6.1. Amplified DNA fragments

6. Challenges

6.1. repetitive DNA

6.1.1. occurrence: all kingdoms; plants; about half of the human genome

6.1.2. function: some non-functional; played a part in human evolution

6.1.3. types: interspersed repeats; longer interspersed repeats; best example; tandem repeats; short tandem repeats; nested repeats

6.1.4. computational problems: create ambiguities; can lead to false inference of solution. Definition: > 100 bp; > 2-3 times in genome; > 97% identity
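
The ambiguity in 6.1.4 is easy to reproduce: a read drawn from a repeat matches the reference in more than one place, so its true origin cannot be decided from the sequence alone. The sketch below uses exact string matching on a made-up reference containing two copies of a short repeat. Real repeats are far longer than this toy example, and real aligners tolerate mismatches.

```python
def find_all(reference, read):
    """Return every start position where the read matches the reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits

# Hypothetical reference: unique sequence with two copies of the repeat "GATTACA".
repeat = "GATTACA"
reference = "ACGTTTGACCA" + repeat + "CCGGTA" + repeat + "TTGC"

print(find_all(reference, "GATTACA"))  # read from the repeat: several candidate positions
print(find_all(reference, "CCGGTA"))   # read from unique sequence: one position
```

A read with more than one hit is what 5.3 calls a multi-read.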

7. Computational analysis

7.1. existing tools

7.1.1. SV/CNV detection

7.1.2. SNP detection: GATK; MAQ; SAMtools; SOAPsnp; VarScan

7.1.3. Short-read alignment tools: Bowtie; TopHat; Cufflinks

7.1.4. De novo assembly: memory-efficient data structures and algorithms; assembler: String Graph Assembler

7.2. workflow

7.2.1. map reads to the reference genome

7.2.2. call SNPs
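
The two workflow steps in 7.2 can be sketched end to end on toy data. This is a simplified stand-in for the tools listed in 7.1, not how any of them works internally: mapping is a brute-force search for the offset with the fewest mismatches, and a SNP is called where the majority base in the pileup differs from the reference. The sequences, the mismatch limit, and the depth threshold are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

def map_read(reference, read, max_mismatches=1):
    """Return the first offset with the fewest mismatches, or None if above the limit."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (offset, mismatches)
    return None if best is None else best[0]

def call_snps(reference, reads, min_depth=2):
    """Pile up mapped reads and report (position, ref base, alt base) for majority mismatches."""
    pileup = defaultdict(Counter)
    for read in reads:
        offset = map_read(reference, read)
        if offset is None:
            continue  # unmapped read
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1
    snps = []
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        base, count = pileup[pos].most_common(1)[0]
        if depth >= min_depth and count > depth / 2 and base != reference[pos]:
            snps.append((pos, reference[pos], base))
    return snps

reference = "ACGTACGGTCAATGCC"
# Hypothetical reads from a sample carrying T>A at position 8 (0-based).
reads = ["GTACGGAC", "CGGACAAT", "GACAATGC"]
print(call_snps(reference, reads))
```

Requiring a minimum depth is the toy counterpart of the coverage requirement in 5.1.1: a single read showing a different base is more likely a sequencing error than a variant.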

8. Genome annotation

8.1. identifying elements/structure in genome

8.1.1. Gene prediction

8.2. Attaching biological information to these

8.3. Structural annotation

8.3.1. ORFs

8.3.2. gene structure

8.3.3. coding regions

8.3.4. location of regulatory motifs
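
Finding ORFs (8.3.1) is the simplest structural annotation step to sketch. The example below scans the three forward reading frames for a start codon followed in frame by a stop codon. It is a minimal illustration under stated assumptions: only the forward strand is scanned, `ATG` is the only start codon, and the `min_codons` threshold and test sequence are invented. Real gene prediction also has to handle introns, the reverse strand, and statistical models of coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) half-open coordinates of forward-strand ORFs.

    An ORF runs from the first ATG in a frame to the next in-frame stop codon;
    min_codons counts both the start and the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

dna = "CCATGAAATGGTAACC"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

The second `ATG` in the test sequence lies in a different frame and has no in-frame stop codon, so it is not reported.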

8.4. Functional annotation

8.4.1. biochemical function

8.4.2. biological function

8.4.3. regulation and interactions involved