Assembly pipeline (Overview)

1. Assembly of raw reads using a genome assembler.

1.1. Software used: MIRA, currently the only assembler that performs hybrid assemblies across different sequencing technologies, e.g. a mix of 454 and Illumina reads.

1.1.1. De novo assemblies

1.1.1.1. Reads are assembled using algorithms based on sequence quality, paired-end distances and average depth of coverage. The coverage check helps prevent misassembly of heavily repeated regions.
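
To illustrate why average depth of coverage helps with repeats, here is a minimal sketch, not MIRA's actual algorithm. It assumes that per-contig coverage values are already available and flags contigs whose coverage is well above the median, which is what a collapsed repeat tends to look like. The function name, the contig names and the 1.8 threshold are all hypothetical.

```python
from statistics import median

def flag_possible_repeats(contig_coverage, factor=1.8):
    """Return names of contigs whose coverage is at least `factor` times the median.

    `contig_coverage` maps a contig name to its average depth of coverage.
    The threshold is an illustrative assumption, not a recommended value.
    """
    typical = median(contig_coverage.values())
    return sorted(
        name for name, cov in contig_coverage.items() if cov >= factor * typical
    )

# Hypothetical coverage values for four contigs.
coverage = {"contig_1": 31.0, "contig_2": 29.5, "contig_3": 64.0, "contig_4": 30.2}
print(flag_possible_repeats(coverage))  # ['contig_3']
```

A contig at roughly twice the typical depth is a candidate for a repeat that was collapsed into one copy and is worth inspecting during proofreading.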

1.1.2. Scaffolded assemblies

1.1.2.1. Used for all genomes with a close relative. Select an appropriate scaffold, usually the closest relative, ideally identified using phylogenetic software.
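
As a rough stand-in for the scaffold selection described above, the sketch below ranks candidate references by shared k-mer content (Jaccard similarity). This is a crude similarity measure and not a phylogenetic method, so it only illustrates the idea of picking the closest relative. The sequences, names and the choice of k=4 are hypothetical.

```python
def kmers(seq, k=4):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b, k=4):
    """Jaccard similarity of the k-mer sets of two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0

def closest_reference(query, candidates, k=4):
    """Return the name of the candidate sharing the most k-mer content with query."""
    return max(candidates, key=lambda name: jaccard(query, candidates[name], k))

# Toy sequences; real inputs would be marker genes or whole genomes.
query = "ACGTACGTTGCA"
candidates = {"relative_A": "ACGTACGTTGCC", "relative_B": "TTTTGGGGCCCC"}
print(closest_reference(query, candidates))  # relative_A
```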

1.2. OUTPUT: draft assembly

2. Obtain raw reads for the assembler: raw data is essential because, unlike the automatically generated sequence data, it contains sequence quality data for each base.

2.1. 454 Sequencing

2.1.1. Flowgrams (SFF files)

2.2. Illumina sequencing

2.2.1. Very large files (around 2 GB each) in FASTQ format, supplied in pairs.
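
The per-base quality data mentioned above is stored in FASTQ as one character per base. The sketch below reads a pair of FASTQ streams record by record, so that large files are never loaded into memory at once, and decodes the quality characters. It assumes the common four-line record layout and Phred+33 encoding, which older Illumina pipelines did not always use. The read names and sequences are made up.

```python
import io
from itertools import islice

def read_fastq(handle):
    """Yield (name, sequence, qualities) from a four-line-per-record FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of file or truncated final record
        header, seq, _plus, qual = record
        # Assumption: Phred+33 encoding (quality = ASCII code minus 33).
        yield header[1:].split()[0], seq, [ord(c) - 33 for c in qual]

# In-memory stand-ins for the forward and reverse files of one pair.
forward = io.StringIO("@read1/1\nACGT\n+\nIIII\n")
reverse = io.StringIO("@read1/2\nTTGC\n+\nII#I\n")

# zip keeps the two files in step, one read pair at a time.
for (name1, seq1, qual1), (name2, seq2, qual2) in zip(
    read_fastq(forward), read_fastq(reverse)
):
    print(name1, seq1, min(qual1), "|", name2, seq2, min(qual2))
```

With real data, the two `io.StringIO` objects would be replaced by open file handles for the forward and reverse files.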

2.3. Sanger sequencing

2.3.1. AB1 files

3. Post-processing of reads

3.1. It is important to see what the genome assembly looks like. Software used: Tablet

3.2. Proofreading: All assemblers make mistakes, so all sequences are proofread by humans. Software used: gap5

3.3. Single nucleotide polymorphisms (SNPs) are important for discovering markers for detection. Software used: gigaBayes, which is usually used for mammalian data but which LFZ has adapted for calling bacterial SNPs.
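
To make the idea of SNP calling concrete, here is a deliberately naive sketch. It is not gigaBayes's Bayesian method and not LFZ's modified version. It assumes reads are already aligned to a reference without gaps, counts the bases seen at each position, and reports positions where a non-reference base dominates, which suits a haploid bacterial genome. The thresholds and toy data are hypothetical.

```python
from collections import Counter

def call_snps(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Return (position, ref_base, alt_base, depth) for simple majority SNPs.

    `aligned_reads` is a list of (start, sequence) with 0-based, ungapped alignments.
    """
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps

reference = "ACGTACGTAC"
reads = [(0, "ACGTA"), (2, "GTATGT"), (3, "TATGTA"), (4, "ATGTAC")]
print(call_snps(reference, reads))  # [(5, 'C', 'T', 3)]
```

A real caller would also weigh base qualities and mapping qualities, which is one reason the raw reads with their quality data matter.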

3.4. OUTPUT: assembled, proof-read genome

4. Upload assembly data to portal

4.1. LFZ has a web page under construction for viewing, managing and analysing whole genome sequence data: http://bgph.dyndns.org/

5. Annotation: The assembled genome is a sequence of G, A, T and C nucleotides. Genes must now be assigned to the genome.

5.1. The genome is automatically annotated by computer, and each annotation is curated by a human to check accuracy.

5.2. Software used: myRAST (for automated annotation), in-house software (for manual curation), and Artemis for genome viewing.
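
Automated annotation starts by finding candidate genes in the nucleotide sequence. The sketch below shows only that first idea: it scans the forward strand for open reading frames that run from an ATG start codon to a stop codon. It is a toy example and does not reflect how myRAST works, since real annotation tools also handle the reverse strand, alternative start codons and functional assignment. The function name, minimum length and example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs; end is exclusive.

    An ORF here runs from the first ATG in a frame to the next in-frame stop
    codon, and its length in codons includes both the start and the stop.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

genome = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(genome):
    print(start, end, frame, genome[start:end])  # 2 14 2 ATGAAATTTTAG
```

Each candidate found this way would still need the human curation step described in 5.1.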

5.3. OUTPUT: Annotated genome
