Assembly pipeline (Overview) by Mind Map: Assembly pipeline

Assembly pipeline (Overview)

Assembly of raw reads using a genome assembler.

Software used: MIRA, which at the time of writing was the only assembler that would perform hybrid assemblies from different sequencing technologies, e.g. a mix of 454 and Illumina reads.

De novo assemblies: reads are assembled using algorithms that take into account sequence quality, paired-end distances and average depth of coverage. Depth of coverage helps prevent misassembly of highly repetitive regions.
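
MIRA's own algorithm is not reproduced here. As a minimal sketch of the coverage idea only: in a haploid bacterial genome, a region present in n copies attracts roughly n times the typical read depth, so contigs with unusually high depth are candidate collapsed repeats. The function name, the input layout (a dict of per-contig mean depths), the use of the median rather than the mean as the "typical" depth, and the 1.8-fold threshold are all assumptions.

```python
from statistics import median


def flag_repeat_contigs(contig_depths, fold=1.8):
    """Flag contigs whose mean read depth is well above the typical depth.

    contig_depths: dict of contig name -> mean read depth (hypothetical input,
    e.g. parsed from an assembler's coverage report).
    fold: assumed threshold; a region present twice in a haploid genome should
    attract roughly twice the typical depth.
    """
    # The median is less distorted by the repeat contigs themselves than the mean.
    typical = median(contig_depths.values())
    flagged = {}
    for name, depth in contig_depths.items():
        copies = depth / typical  # rough copy-number estimate
        if copies >= fold:
            flagged[name] = round(copies, 1)
    return typical, flagged


if __name__ == "__main__":
    depths = {"ctg1": 40, "ctg2": 42, "ctg3": 85, "ctg4": 38}
    print(flag_repeat_contigs(depths))  # (41.0, {'ctg3': 2.1})
```

A flagged contig is only a prompt to look more closely; uneven library coverage can also raise depth locally.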

Scaffolded assemblies: used for all genomes with a close relative. Select an appropriate scaffold, usually the closest relative, ideally identified using phylogenetic software.
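
The recommended route is phylogenetic software. As a much cruder stand-in for shortlisting candidate scaffolds, a minimal sketch can rank reference genomes by the Jaccard similarity of their canonical k-mer sets (the idea that tools such as Mash estimate with MinHash sketches; here the full sets are compared directly). All names and the default k=21 are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Set of k-mers, each stored as the smaller of itself and its reverse
    complement, so both strands of the same sequence give the same set."""
    seq = seq.upper()
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:
            continue  # skip ambiguous bases
        out.add(min(kmer, revcomp(kmer)))
    return out


def jaccard(a, b):
    """|A intersect B| / |A union B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_reference(draft, references, k=21):
    """Return the name of the most similar reference plus all scores.

    references: dict of hypothetical reference name -> genome sequence.
    """
    draft_kmers = canonical_kmers(draft, k)
    scores = {
        name: jaccard(draft_kmers, canonical_kmers(seq, k))
        for name, seq in references.items()
    }
    return max(scores, key=scores.get), scores
```

K-mer similarity says nothing about gene order or rearrangements, so it can narrow the choice but does not replace a phylogeny.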

OUTPUT: draft assembly

Obtain raw reads for the assembler: raw data is essential because, unlike the automatically generated sequence data, it contains quality data for each base.

454 Sequencing

Flowgrams (SFF files)

Illumina sequencing

Very large FASTQ files (2 GB each), delivered in pairs (one file per mate of each read pair).

Sanger sequencing

AB1 files
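
SFF flowgrams and AB1 traces are binary and are usually read with existing libraries (for example Biopython's SeqIO, which supports the "sff" and "abi" formats). Illumina FASTQ is plain text, so a minimal sketch can show where the per-base quality lives. It assumes four-line records and the common Phred+33 quality encoding (older Illumina pipelines used Phred+64); read_pairs is a hypothetical helper that walks the two files of a pair in lockstep.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a four-line-per-record FASTQ file."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip()
        plus = handle.readline().rstrip()
        qual = handle.readline().rstrip()
        fields = header[1:].split()
        if not header.startswith("@") or not fields or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # The read ID is the header up to the first whitespace.
        yield fields[0], seq, qual


def phred33(qual):
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(c) - 33 for c in qual]


def error_probability(q):
    """A Phred score Q means an estimated base-call error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def read_pairs(handle1, handle2):
    """Walk the two files of a paired-end run in lockstep, checking that mates match."""
    for r1, r2 in zip_longest(read_fastq(handle1), read_fastq(handle2)):
        if r1 is None or r2 is None:
            raise ValueError("paired files contain different numbers of reads")
        # Older Illumina IDs end in /1 and /2; newer ones put the mate number after a space.
        if r1[0].removesuffix("/1") != r2[0].removesuffix("/2"):
            raise ValueError(f"mates out of sync: {r1[0]} vs {r2[0]}")
        yield r1, r2


if __name__ == "__main__":
    mate1 = io.StringIO("@read1/1\nACGT\n+\nII#5\n")
    mate2 = io.StringIO("@read1/2\nTTGA\n+\nIIII\n")
    for (rid, seq, qual), _mate in read_pairs(mate1, mate2):
        print(rid, seq, phred33(qual))  # read1/1 ACGT [40, 40, 2, 20]
```

Streaming record by record keeps memory use flat, which matters for files of this size.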

Post-processing of reads

It is important to inspect the genome assembly visually. Software used: Tablet

Proofreading: all assemblers make mistakes, so every sequence is proofread by a human. Software used: gap5
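
Proofreading in gap5 is manual, and nothing below describes how gap5 works. This hypothetical helper only illustrates one way a reviewer's attention could be directed: build a majority consensus column by column and list positions with low depth or low agreement between reads. The input layout (reads already placed in alignment coordinates, with a space for uncovered columns) and both thresholds are assumptions.

```python
from collections import Counter


def columns_to_review(aligned_reads, min_agreement=0.8, min_depth=3):
    """Majority consensus plus a list of (column, reason) pairs a human should check.

    aligned_reads: strings already placed in alignment coordinates; a space
    marks columns a read does not cover (hypothetical input layout).
    """
    width = max(len(r) for r in aligned_reads)
    consensus, to_review = [], []
    for pos in range(width):
        calls = [r[pos] for r in aligned_reads if pos < len(r) and r[pos] != " "]
        if not calls:
            consensus.append("N")
            to_review.append((pos, "no coverage"))
            continue
        base, count = Counter(calls).most_common(1)[0]
        consensus.append(base)
        if len(calls) < min_depth:
            to_review.append((pos, "low depth"))
        elif count / len(calls) < min_agreement:
            to_review.append((pos, f"{count / len(calls):.0%} agreement"))
    return "".join(consensus), to_review
```

For example, with reads "ACGTA", "ACGTA", "ACCTA", "ACGTA" and "  CT ", the consensus is "ACGTA" and only column 2 (3 of 5 reads agree) is listed for review.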

Single nucleotide polymorphisms (SNPs) are important for discovering markers for detection. Software used: gigaBayes, which is usually applied to mammalian data but has been adapted by LFZ for calling bacterial SNPs.
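
gigaBayes's own model and LFZ's modifications are not described here, so this is only a sketch of the general Bayesian idea behind such callers, simplified for a haploid bacterial genome: each candidate allele gets a likelihood from the read bases and their Phred qualities, which is combined with a prior and normalised to a posterior. The function names, the prior of 1e-3, the 0.99 calling threshold and the error model (a miscall is equally likely to be any of the other three bases) are assumptions.

```python
import math

BASES = "ACGT"


def haploid_allele_posterior(ref_base, pileup, prior_snp=1e-3):
    """Posterior probability of each allele at one position of a haploid genome.

    pileup: list of (read_base, phred_quality) pairs covering the position.
    prior_snp: assumed prior probability that the sample differs from the reference here.
    """
    log_post = {}
    for allele in BASES:
        # Prior: the reference allele gets 1 - prior_snp; the three alternatives share prior_snp.
        lp = math.log(1 - prior_snp if allele == ref_base else prior_snp / 3)
        for base, q in pileup:
            # Phred Q -> error probability, capped at 0.75 (a uniformly random base).
            e = min(10 ** (-q / 10), 0.75)
            lp += math.log(1 - e) if base == allele else math.log(e / 3)
        log_post[allele] = lp
    # Normalise in log space to avoid underflow at high depth.
    top = max(log_post.values())
    weights = {a: math.exp(lp - top) for a, lp in log_post.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def call_snp(ref_base, pileup, min_posterior=0.99):
    """Return (alt_allele, posterior) if a non-reference allele is confidently best, else None."""
    post = haploid_allele_posterior(ref_base, pileup)
    best = max(post, key=post.get)
    if best != ref_base and post[best] >= min_posterior:
        return best, post[best]
    return None
```

Working with one allele rather than a genotype pair is what distinguishes this from diploid (e.g. mammalian) calling: there are no heterozygous states to consider.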

OUTPUT: assembled, proof-read genome

Upload assembly data to portal

LFZ has a web page under construction for viewing, managing and performing analyses on whole-genome sequence data:

Annotation: the assembled genome is simply a sequence of A, C, G and T nucleotides. Genes must now be identified and located on the genome.

The genome is first annotated automatically, and each annotation is then curated by a human to check its accuracy.

Software used: myRAST (for automated annotation), in-house software (for manual curation) and Artemis (for genome viewing).
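
myRAST's gene calling is not reproduced here. As a minimal sketch of where locating genes starts, this scans both strands for open reading frames (an ATG start to the next in-frame stop codon). Real bacterial gene callers also accept alternative start codons such as GTG and TTG and use statistical models of coding sequence; the function names and the 100-codon minimum are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(genome, min_codons=100):
    """Return (strand, start, end) tuples, 0-based half-open in forward-strand
    coordinates, for ATG...stop ORFs of at least min_codons (stop included)."""
    genome = genome.upper()
    n = len(genome)
    orfs = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((strand, start, i + 3))
                        else:
                            # Map reverse-strand interval back to forward coordinates.
                            orfs.append((strand, n - (i + 3), n - start))
                    start = None  # look for the next ORF in this frame
    return sorted(orfs, key=lambda orf: orf[1])
```

Candidate ORFs like these are the raw material that automated annotation then assigns functions to and that a curator checks.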

OUTPUT: Annotated genome

