Author: kat

Bacterial genomics researcher in Melbourne, Australia

Highlights from SMBE satellite workshop in Kyoto

I’m currently on holiday in Japan after attending the excellent 2nd SMBE Satellite Workshop on Genome Evolution in Pathogen Transmission and Disease in Kyoto.

Before all the memories fade away with the magic of Japanese onsen, sake, ramen etc I wanted to record some of the highlights.

For me the greatest thing was seeing several invited speakers sharing their talk slots, tag-team style, with members of their lab. Perhaps this wouldn’t work in a more formal conference setting but in this workshop style event it was truly fantastic, and I hope to see more of it (and I look forward to doing it myself too – I didn’t on this occasion because I wasn’t presenting).

Here are some other highlights, in no particular order…

  • Young scientists whose PhD work was discussed at the first workshop (2.5 years ago) by their mentors, now attending the workshop themselves and presenting their latest work as postdocs.
  • 3 minute lightning talks – open to all, in place of poster sessions – at the end of every session. This was a great way to give everyone a chance to participate in the workshop.
  • Gaps in the afternoons to get out and see the city where we have all travelled so far to meet. Best for me was an afternoon visit to Arashiyama to see the bamboo forest and monkey park (thanks enormously to Koji Yahara for guiding us!)
  • A speaker presenting content from a recently published paper, sharing their frustrations about Reviewer #3 and hoping they were here so they could have it out in person… and Reviewer #3 declaring themselves later on in the conference. (You know who you are.)
  • Hearing Ed Feil present the first data from the SpARK project, looking at Klebsiella isolated from intensive human, animal and environmental samples around an Italian city. I’ve had multiple knock-backs from Australian funding apps proposing to do something similar locally, so it was cool to see this work is getting done. I know others are generating similar data in other settings (and we will keep trying in Australia too!) so we should slowly start to gain some much-needed clarity on Klebs ecology.
  • Hearing from Martha Clokie that Burkholderia pseudomallei phages switch between lytic and temperate lifestyle depending on temperature… remaining dormant in the genome at low temperatures and switching to lytic at 37C. So the phages may be our friend in this scenario, holding back the pathogen once it enters humans. Also explains why no prophages have been identified integrated into B. pseudomallei genomes – because these genomes all come from clinical isolates, which presumably were only able to proliferate and cause disease because they lacked the phage.
  • Catching up with past lab members Claire Gorrie and Danielle Ingle, in a foreign land… especially sitting with a group of 7 others in a tiny local bar with 8 seats, drinking sake into the night, thanks to Claire’s excellent Japanese 😉

Let’s hope we get to come back to Japan for a third workshop soon…


AMR distribution in intestinal E. coli from children in Asia and Africa

 

Today I’m pleased to see the final version of our paper on antimicrobial resistance in intestinal E. coli from Asian & African children published in Nature Microbiology. This is the last piece of the puzzle from Danielle Ingle’s PhD research, a tremendous effort centred around the analysis of a collection of ~200 atypical enteropathogenic E. coli (aEPEC) isolated from cases and controls in seven countries during the Global Enterics Multicentre Study (GEMS).

The first analysis of the genome data from this collection was reported in this 2016 paper, also in Nature Microbiology. It focused on understanding the population structure of the pathotype, including establishing a framework for looking at variation in the primary virulence locus (the LEE pathogenicity island; see blog post here).

aEPEC_distribution

Danielle then looked at serotype diversity in the collection, and used the experience to tackle the problem of O and H serotype prediction from genome data. That work is detailed in this Microbial Genomics paper, which utilises the phenotypes and genome data from the GEMS aEPEC collection to assess the reliability of predictions.

Finally we turned our attention to antimicrobial resistance (AMR) in the isolate collection – characterising resistance phenotypes, looking at known genetic determinants of AMR in the genome data, and also examining data on prescribing of antimicrobials for treatment of diarrhoeal disease in children at each study site.

So what did we find?

Firstly, whether we consider AMR phenotypes or genotypes we see that AMR was rampant, with most strains either multidrug resistant (65%; resistant to ≥ 3 drug classes) or susceptible to all drugs tested (19%):

AMR_classes
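As a rough illustration of how those categories can be derived, the sketch below counts the drug classes each isolate is resistant to and applies the ≥ 3 classes rule for multidrug resistance. The isolate names and resistance profiles are invented for the example and are not taken from the GEMS data.

```python
def classify(resistant_classes):
    """Label an isolate by the number of drug classes it is resistant to."""
    n = len(set(resistant_classes))
    if n == 0:
        return "susceptible"
    if n >= 3:
        return "MDR"  # multidrug resistant: >= 3 drug classes
    return "resistant (1-2 classes)"

# Hypothetical isolates -> drug classes with a resistant phenotype
profiles = {
    "iso1": ["penicillins", "sulfonamides", "trimethoprim", "tetracyclines"],
    "iso2": [],
    "iso3": ["penicillins"],
}

labels = {name: classify(classes) for name, classes in profiles.items()}
print(labels)
```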

We found >40 different acquired AMR genes in the genomes, and also point mutations that are known to be associated with resistance to fluoroquinolones (in gyrA, parC) or nitrofurantoin (nfsA). Notably there was no difference between AMR rates in cases and controls, even at the level of individual genes:

genewise case control

We found that many of these AMR genes co-occurred together in known mobile genetic elements:

Screen Shot 2018-08-21 at 12.59.41 pm.png

Quite often the structures of these elements were not totally resolvable from the genome assemblies, which were based on short Illumina reads only (no long reads for this data set unfortunately!)… but nevertheless, Danielle could often resolve co-localisation of these genes from the assembly graphs using Bandage:

Screen Shot 2018-08-21 at 1.00.28 pm

We had seen in the first paper that the isolates were highly diverse, comprising dozens of distinct clones… this tree is inferred from a core gene alignment of the study isolates together with some other genomes for context (GEMS study isolates are indicated as dark blue in the outer ring). The ten shaded clades indicate dominant clonal groups in the study population.

aEPEC_tree

Back to the AMR study. We did a discriminant analysis of principal components (DAPC) to see whether the variation in the distribution of genetic determinants amongst the genomes could be used to discriminate between the clonal groups, and saw that AMR was not associated with individual clones:

clone DAPC

Instead we found that variation in AMR gene complement could discriminate isolates from different geographical regions, suggesting that AMR genes more often reflect horizontal acquisition from distinct local gene pools in different parts of the world, rather than fixed features of their host bacterium that travel the world with their host strain (clone):

region DAPC

In particular, we saw that fluoroquinolone resistance associated mutations in gyrA were associated with Asian sites; while sites in East vs West Africa could be discriminated by the presence of different dihydrofolate reductase (dfr) genes responsible for trimethoprim resistance, with dfrA8 being more common in West Africa and dfrA5 being present in East Africa.

region gene prevalence

The data we have showed regional differences in AMR phenotypes, and in antibiotic usage for treatment of paediatric diarrhoea at the GEMS sites.

drug res and usage

a) Resistance phenotypes. b) Frequency of antimicrobials prescribed to children with watery diarrhoea. c) Frequency of antimicrobials prescribed to children with dysentery.

However the prevalence of acquired resistance genes amongst E. coli isolated from each site was not associated with local frequencies of drug usage. The exception was fluoroquinolones: point mutations in gyrA and parC (which raise the MIC of ciprofloxacin) were more common at the Asian sites, where ciprofloxacin was used much more often to treat diarrheal disease than in African sites.

cipR gyrA parC

There are many possible reasons for the lack of association between local prescribing for diarrheal disease and the presence of AMR genes in local diarrheal pathogens. We expect that most antimicrobial exposure in human gut bugs like E. coli probably is not associated with attempts to treat E. coli infection at all, but with exposure to drugs given to treat other infections, drugs used in food animals which are a reservoir for E. coli, or even environmental contamination with antibiotics. Also because the horizontally acquired genes tend to travel together as a group in mobile genetic elements, exposure to one drug can co-select for resistance to many. This may be one reason that the association was more evident for ciprofloxacin use and gyrA/parC mutations, which are not in linkage with acquired AMR genes.

Finally, the data provided an opportunity to explore how well we can predict AMR phenotypes based on identifying known genetic determinants of AMR in E. coli genomes. The results were pretty good, indicating low rates of “very major errors” (where we predict a strain to be susceptible, but really it is resistant) for most drug classes. These results are comparable to those done independently in other collections of E. coli and also other bacteria, summarised here. But clearly there is room for improvement, and probably a few new mechanisms floating around out there… notably we didn’t aim to assess changes in expression of intrinsic E. coli genes, such as efflux pumps and beta-lactamases, which can contribute to drug resistance but are not so easy to find in genome data.

geno_pheno
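To make the "very major error" idea concrete, here is a minimal sketch that compares genotype-based predictions with observed phenotypes for one drug. It assumes one common convention: very major errors are counted as a fraction of phenotypically resistant isolates, and "major errors" (predicted resistant, actually susceptible) as a fraction of susceptible isolates. The counts are invented and are not results from the paper.

```python
def error_rates(pairs):
    """pairs: iterable of (predicted_resistant, observed_resistant) booleans."""
    pairs = list(pairs)
    n_resistant = sum(1 for _, obs in pairs if obs)
    n_susceptible = len(pairs) - n_resistant
    # Very major error: predicted susceptible, but the isolate is resistant
    very_major = sum(1 for pred, obs in pairs if obs and not pred)
    # Major error: predicted resistant, but the isolate is susceptible
    major = sum(1 for pred, obs in pairs if pred and not obs)
    return {
        "very_major_error_rate": very_major / n_resistant if n_resistant else None,
        "major_error_rate": major / n_susceptible if n_susceptible else None,
    }

# Hypothetical calls for 20 isolates and a single drug
pairs = (
    [(True, True)] * 8      # correctly predicted resistant
    + [(False, True)] * 2   # very major errors
    + [(False, False)] * 9  # correctly predicted susceptible
    + [(True, False)] * 1   # major error
)
rates = error_rates(pairs)
print(rates)
```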

 

Update to Comparative Bacterial Genomics tutorial

by David Edwards

In 2013, Kat and I wrote what turned out to be a very popular Beginner’s guide for comparative bacterial genome analysis. After four years and 120,000+ downloads of the guide, we thought it might be time to update the hands-on tutorial that was included. 

As with any science, there have been advances in this time. We don’t have time to update all aspects, but felt it was important to update the recommended assembler from Velvet to SPAdes. The latter has become the ‘go-to’ assembler with our lab and many others over the last few years. Unfortunately, SPAdes does not work with Windows, but Windows users can use the original Velvet assembler if they wish to attempt their own assembly.

Also, Ryan Wick in our lab has developed a way to visualise the assembly graphs produced by SPAdes and other assemblers, in the form of a software program called Bandage. This allows us to examine and compare the properties of assembly graphs, useful if you are trying to assemble the same set of reads with different methods or parameter settings. 

The other changes in version 2 are mainly to fix broken links to the E. coli sequences that have now been archived by NCBI, kindly pointed out to us by Michael Hall and others via email.

We continue to recommend Artemis and ACT for visualising and comparing annotated bacterial genome sequences, and both tools are still actively maintained at the Sanger Institute. While BRIG is no longer actively maintained, we continue to recommend it as it appears to be stable across newer versions of Java and BLAST, and it remains incredibly useful.

Hands-on tutorial v2 (6 Mb PDF): ComparativeGenomicsTutorialV2

Original article: Beginner’s guide to comparative bacterial genome analysis using next-generation sequence data

Global genomic framework for typhoid

It’s been over a year since we published the first global whole-genome snapshot of nearly 2000 genomes of the typhoid bacterium, Salmonella Typhi in Nature Genetics.

That paper focused on the emergence and global dissemination of what we’ve been calling for years the “H58” clone (see this blog post). This clone accounted for nearly half of all the isolates sequenced, and is a big deal because it tends to be multidrug resistant (MDR), carrying a suite of resistance genes that render all the cheap, first-line drugs like chloramphenicol, ampicillin, and trimethoprim-sulfamethoxazole useless for treatment. Detailed genomic epi studies show the local impact of the arrival of MDR H58 in countries as widespread as Malawi and Cambodia; and the emergence of fluoroquinolone resistant H58 sublineage in India and Nepal recently stopped a treatment trial because the current standard of care – ciprofloxacin – was resulting in frequent treatment failure.

While H58 is important, the global Typhi population contains a lot of genomic diversity outside the H58 clone, and we’ve turned our attention to the rest of the population now in a new paper in Nature Communications: “An extended genotyping framework for Salmonella enterica serovar Typhi, the cause of human typhoid”.

First, we decided that we needed to revisit the haplotyping scheme of Roumagnac et al (from which H58 gets its name), which was based on just ~80 genes, using the whole genome phylogeny. Here is the tree inferred from core genome SNPs in 1832 Typhi strains, with the old haplotypes indicated by the coloured ring around the outside. It’s pretty easy to see that some haplotypes (like H52 and H1) actually comprise multiple distinct phylogenetic lineages (low resolution), while others subdivide lineages (excessive resolution).

FigS1_GlobalTreeColouredByHgroups

Whole genome SNP tree for 1832 strains, outer ring indicates haplotypes based on mutations in 80 genes as defined in Roumagnac et al, Science 2006.

We used BAPS to define genetic clusters at various levels (thanks to Tom Connor for running this). We settled on 3 levels of hierarchical clustering, indicated in the tree below:

• 4 nested primary clusters (inner-most ring; yellow, green, blue, red). These have 100% bootstrap support and are each characterised by >20 SNPs

• Clusters are further divided into 16 clades (middle ring and labels). The median pairwise distance between isolates in the same clade is 109 SNPs, while the inter-clade SNP distance averages 243 SNPs.

• Clades are further divided into 49 subclades, indicated by alternating background shading colours. The median pairwise distance between isolates in the same subclade is 25 SNPs.

Fig1_noH58_clade_colours_subcladebg_190515_labelled_ED.png

Tree indicating new phylo-informed genotypes. Primary clusters 1-4 are indicated in the inner ring. Branch colours indicate clades, which are also labelled on the outside and coloured in the outer ring. Subclades are indicated by alternating background shading.
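The within-group distances quoted above are medians over all pairs of isolates in a group. A minimal sketch of that calculation, assuming each isolate is represented by its alleles at the same set of variable sites (the sequences here are toy data):

```python
from itertools import combinations
from statistics import median

def snp_distance(a, b):
    """Number of sites at which two equal-length allele strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def median_pairwise(alleles_by_isolate):
    """Median SNP distance over all pairs of isolates in one group."""
    dists = [snp_distance(a, b)
             for a, b in combinations(alleles_by_isolate.values(), 2)]
    return median(dists)

# Toy group of three isolates typed at six variable sites
clade = {"s1": "ACGTAC", "s2": "ACGTAA", "s3": "ACCTAA"}
print(median_pairwise(clade))
```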

subcladerect

One of the key reasons we wanted to define the phylogenetic lineages in this way is to make them easier to identify and talk about. I’ve always been a fan of MLST for this reason, since it’s much easier to talk about K. pneumoniae ST258, ST11, ST15 etc than ‘that lineage that has reference strain X in it’. So we introduce a hierarchical nomenclature system, similar to the one currently in use for Mycobacterium tuberculosis, where the 4 primary clusters (1, 2, 3, 4) are subdivided into 16 clades (1.1, 1.2; 2.1, 2.2, etc) which in turn are subdivided into 49 subclades (1.1.1, 1.1.2, etc). This has the advantage of conveying hierarchical relationships between groups – e.g. 2.2.1 and 2.2.2 are sister subclades within clade 2.2, which is a sister clade of 2.1.

The subclades are easier to distinguish in the collapsed rectangular tree on the right, where each subclade is represented by just one strain.

Some BAPS clusters were polyphyletic and consisted of isolates belonging to rare phylogenetic lineages whose common ancestor in the tree coincided with the common ancestor of an entire clade (n=9) or primary cluster (n=2). These groups contain isolates that, given increased numbers, may emerge as distinct clusters that form sister taxa within the parent clade (or primary cluster), and were given the suffix ‘.0’ rather than a defined cluster number (e.g. 3.0 or 3.1.0) to indicate non-equivalence with the properly differentiated sister clades (n=16) or subclades (n=49). As more genomes are added, these are expected to be more clearly differentiated into distinct groups and given proper clade/subclade designations.
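Because the labels encode the hierarchy, they are easy to work with programmatically. The sketch below is only an illustration of the naming logic described here, with helper names of my own choosing: it splits a label into primary cluster, clade and subclade, treats a ‘.0’ component as "not differentiated below this level", and checks whether two clades or subclades are sisters.

```python
def parse_genotype(label):
    """Split a label such as '2.2.1' into its nested groups."""
    names = ["primary_cluster", "clade", "subclade"]
    result = dict.fromkeys(names)
    parts = label.split(".")
    for depth, name in enumerate(names):
        if depth >= len(parts) or parts[depth] == "0":
            break  # '.0' suffix: not differentiated below this level
        result[name] = ".".join(parts[: depth + 1])
    return result

def are_sisters(a, b):
    """True if two different clade or subclade labels share the same parent."""
    pa, pb = a.split("."), b.split(".")
    return a != b and len(pa) == len(pb) >= 2 and pa[:-1] == pb[:-1]

print(parse_genotype("2.2.1"))
print(parse_genotype("3.1.0"))
print(are_sisters("2.2.1", "2.2.2"), are_sisters("2.1", "2.2"))
```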

Next we defined a set of 68 SNPs that can be used to genotype isolates into these groups. We chose one SNP for each primary cluster, clade and subclade (preferentially choosing intragenic SNPs in well-conserved core genes). The SNPs are detailed in a supplementary spreadsheet, and we provide a script to assign strains to genotypes based on an input BAM or VCF file generated by mapping to the reference genome for Typhi strain CT18.

An isolate that belongs to a differentiated subclade such as 2.1.4 will be hierarchically identified by carrying the derived allele for primary cluster 2 (but not the nested clusters 3 and 4); the derived allele for clade 2.1 (but no other clades) and the derived allele for subclade 2.1.4 (but no other subclades). It is possible for an isolate to carry derived alleles for a primary cluster and clade with no further differentiation into subclade.
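The hierarchical logic above can be sketched in a few lines. This is not the script provided with the paper: the marker positions and alleles below are invented, and the rule of taking the highest-numbered primary cluster carried is my simplifying assumption based on the clusters being nested.

```python
# Hypothetical markers: reference position -> (derived allele, group it defines)
MARKERS = {
    1001: ("T", "2"),
    1002: ("A", "3"),
    2001: ("G", "2.1"),
    2002: ("C", "2.2"),
    3001: ("A", "2.1.4"),
}

def assign_genotype(calls):
    """calls: position -> observed base. Returns the most specific supported label."""
    carried = {group for pos, (allele, group) in MARKERS.items()
               if calls.get(pos) == allele}
    # Primary clusters are nested, so take the highest-numbered one carried.
    clusters = sorted((g for g in carried if "." not in g), key=int)
    if not clusters:
        return None
    genotype = clusters[-1]
    # Descend to clade, then subclade, only while exactly one child group
    # of the current label has its derived allele present.
    for _ in range(2):
        children = [g for g in carried
                    if g.startswith(genotype + ".")
                    and g.count(".") == genotype.count(".") + 1]
        if len(children) != 1:
            break
        genotype = children[0]
    return genotype

print(assign_genotype({1001: "T", 1002: "G", 2001: "G", 2002: "T", 3001: "A"}))
print(assign_genotype({1001: "T", 2001: "G"}))
```

The second call shows the case mentioned above, where an isolate carries the derived alleles for a primary cluster and clade but is not differentiated into a subclade.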

The clone formerly known as H58
Under the new scheme, the infamous H58 clone is named subclade 4.3.1, which so far has no sister clades. I suspect those of us familiar with Typhi population genomics will keep referring to it informally as H58 for some time, since that name is now well known… but I will try to re-train myself to call it 4.3.1 (H58).

Now the fun part: exploring the geographical distribution of these lineages.

Fig1c_worldmap_pies_subclades

Figure 1c from the paper. Pie colours indicate clades found in each WHO region in the global data set (key is in the tree figure above).

In the paper we go on to show that:
• clades are widely geographically distributed, while subclades are geographically constrained (see heatmap below);
• genotyping can be used to predict the geographical origin of travel-associated typhoid in patients in London;
• even better predictions can be obtained based on genome-wide SNP distances to our reference panel of >1800 isolates… but of course that involves a lot more computationally intensive comparisons than a quick screen of a new isolate’s BAM file.

Screen Shot 2016-08-29 at 11.47.15 pm.png

Figure 2 of the paper, showing the geographical distribution of subclades, which shows most subclades are restricted to a single region. For this analysis, the effect of local outbreaks has been minimised by replacing groups of strains that share the same subclade and year and country of isolation with a single representative strain.
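The outbreak-collapsing step described in that caption amounts to keeping one representative per combination of subclade, year and country. A minimal sketch, assuming each strain is a simple record with those three fields (the records are made up, and keeping the first strain seen is an arbitrary choice of representative):

```python
def collapse_outbreaks(strains):
    """Keep the first strain seen for each (subclade, year, country) combination."""
    representatives = {}
    for strain in strains:
        key = (strain["subclade"], strain["year"], strain["country"])
        representatives.setdefault(key, strain)
    return list(representatives.values())

# Hypothetical strain records
strains = [
    {"id": "A", "subclade": "4.3.1", "year": 2010, "country": "Nepal"},
    {"id": "B", "subclade": "4.3.1", "year": 2010, "country": "Nepal"},
    {"id": "C", "subclade": "4.3.1", "year": 2011, "country": "Nepal"},
    {"id": "D", "subclade": "3.1.1", "year": 2010, "country": "Nigeria"},
]
kept = collapse_outbreaks(strains)
print([s["id"] for s in kept])
```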

You can read the full details in the paper, but here I just want to highlight that you can now explore the global genomic framework for Typhi – including genotype designations as well as temporal and geographic data – interactively in MicroReact.

tree.png

 

How does genotyping help with studying local populations?

We have already begun using the new genotyping scheme in local typhoid studies. I find this a really helpful way to describe/summarise the local populations, and place them in the context of the global population without resorting to large trees.

For example in this recent Nigerian study, we described the population like this: “The majority of isolates (84/128, 66%) belonged to genotype 3.1.1, which is relatively common across Africa, predominantly western and central countries. In the wider African collection genotype 3.1.1 was represented by isolates from neighbouring Cameroon and across West Africa (Benin, Togo, Ivory Coast, Burkina Faso, Mali, Guinea and Mauritania) suggesting long-term inter-country exchange within the region. Most of the remaining isolates belonged to four other genotypes (4.1, 2.2, 2.3.1 and 0.0.3).”

Of course genotype assignment is not the end of the story – we still want to build whole-genome trees to explore the relationships of local isolates with those from other countries. Importantly, working with genotypes means that we can achieve this without needing to build a megatree of all isolates in the local + global collections (n>2000). Instead, we can use the genotypes to identify which strains from the global collection are relatives of the Nigerian isolates, and build a much smaller tree that still captures all of the information about transmission/transfer between Nigeria and other countries:

journal-pntd-0004781-g001

The tree and map were made using MicroReact; you can recreate them here: http://microreact.org/project/styphi_nigeria To get this colour scheme just click on the eye icon (bottom left) and select ‘country’; and to get the fan style tree, click the settings button (top right) and click the fan shape.

Another example is in our recent paper on isolates collected in Thailand before and after the introduction of their national vaccination program (pre-print here):

  • Genotype 3.2.1 was the most common (n=14, 32%), followed by genotype 2.1.7 (n=10, 23%)
  • Genotypes 2.0 (n=1, 2%) and 4.1 (n=3, 7%) were observed only in 1973 (pre-vaccine period)
  • Genotypes 2.1.7 (n=10, 23%), 2.3.4 (n=1, 2%), 3.4.0 (n=2, 5%), 3.0.0 (n=3, 7%), 3.1.2 (n=2, 5%), were observed only after 1981 (post-vaccine period)
  • Genotypes 3.2.1 and 2.4.0 were observed amongst both pre- and post-vaccine isolates, but the subclade phylogenies show that these are more likely to represent re-introduction of strains from neighbouring countries than persistence within Thailand throughout the immunisation program.

Elizabethkingia anophelis

DATA: raw Illumina reads (CDC) under SRP072035

Assemblies & analyses: https://github.com/katholt/elizabethkingia

SpeciesTree_OutbreakTree2

Left – core SNP tree, created from assembled genomes using Parsnp. Right – core SNP tree, created by mapping all outbreak genomes to our 9-contig assembly for SRR3240412, using our RedDog pipeline. Details below; all assemblies and mapping outputs are here.


March 19, 2016: I saw on twitter today that there was an outbreak of a weird bacteria I’ve never heard of before (Elizabethkingia anophelis) in Wisconsin, which had infected >50 people and killed almost 20.

The Wisconsin Health Services department has posted some information here (click the “For Health Professionals” tab to get info on the bacteria and antibiotic resistance).

CDC has deposited Illumina reads from 18 outbreak strains into SRA under project SRP072035 so I pulled the data and had a look. I managed to download the readsets in a few minutes (using bionode-ncbi) but it took a really long time to unpack these into fastq files using sra-toolkit.

As I have no idea about this species, I thought I’d start by looking for antibiotic resistance and plasmids in the first 6 read sets using our SRST2 software, while waiting for the rest of the reads to unpack… this ran quickly and showed me the same results for all 6 strains: GOB-10 (1 SNP from the closest allele in the ARG database) and B-2 (3 SNPs), at depths of 35-65x. For example:

SRR3240397 ARGannot.r1 GOB-1_Bla GOB-10_821 100.0 47.59 1snp 0.115 873 0.043 188 821 no;yes;GOB-10;Bla;AY647247;1-873;873
SRR3240397 ARGannot.r1 B-1_Bla B-2_1160 100.0 59.891 3snp 0.4 750 0.03 314 1160 no;no;B-2;Bla;AF189300;1-750;750
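For anyone wanting to pull these hits into a script, here is a minimal sketch that parses the two lines above as they appear in this post (whitespace-separated). The field names are my reading of the values rather than something taken from the SRST2 documentation, and the real report is tab-delimited and may contain extra, empty columns. As a sanity check on that reading, 1 SNP in 873 bp is 0.115% divergence and 3 SNPs in 750 bp is 0.4%, matching the values shown.

```python
# Assumed field names for the whitespace-separated lines shown in this post
FIELDS = ["sample", "db", "gene", "allele", "coverage", "depth", "diffs",
          "divergence", "length", "max_maf", "cluster_id", "seq_id", "annotation"]

def parse_hit(line):
    """Turn one whitespace-separated result line into a dict."""
    hit = dict(zip(FIELDS, line.split()))
    for key in ("coverage", "depth", "divergence", "max_maf"):
        hit[key] = float(hit[key])
    hit["length"] = int(hit["length"])
    return hit

lines = [
    "SRR3240397 ARGannot.r1 GOB-1_Bla GOB-10_821 100.0 47.59 1snp 0.115 873 "
    "0.043 188 821 no;yes;GOB-10;Bla;AY647247;1-873;873",
    "SRR3240397 ARGannot.r1 B-1_Bla B-2_1160 100.0 59.891 3snp 0.4 750 "
    "0.03 314 1160 no;no;B-2;Bla;AF189300;1-750;750",
]
hits = [parse_hit(line) for line in lines]
for hit in hits:
    print(hit["sample"], hit["allele"], hit["diffs"], hit["depth"])
```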

Because the matches were not identical, I pulled the consensus sequences (based on read mapping) using the --report_all_consensus option in SRST2:

>314__B-1_Bla__B-2__1160 no;no;B-2;Bla;AF189300;1-750;750
ATGTTGAAAAAAATAAAAATAAGCTTGATTCTTGCTCTTGGGCTTACCAGTCTGCAGGCA
TTTGGACAGGAGAATCCTGACGTTAAAATTGATAAGCTAAAAGATAATCTGTATGTATAC
ACAACCTACAATACATTTAACGGGACTAAATATGCCGCTAATGCAGTATATCTGGTAACT
GATAAGGGTGTTGTGGTTATAGACTGTCCGTGGGGAGAAGACAAATTTAAAAGCTTTACG
GACGAGATTTATAAAAAACACGGAAAGAAAGTTATTATGAATATTGCAACACATTCTCAT
GATGATCGTGCCGGAGGTCTTGAATATTTTGGTAAAATAGGTGCAAAAACTTATTCTACT
AAAATGACAGATTCTATTTTAGCAAAAGAGAATAAGCCAAGAGCACAATATACTTTTGAC
AATAATAAATCTTTCAAAGTAGGAAAATCCGAGTTTCAGGTTTACTATCCCGGAAAAGGA
CATACAGCAGATAATGTGGTGGTATGGTTTCCAAAAGAAAAAGTATTGGTTGGAGGTTGT
ATTATAAAAAGCGCTGATTCAAAAGACCTGGGGTATATTGGAGAAGCATATGTAAACGAC
TGGACGCAGTCTGTACACAATATTCAACAAAAGTTTTCCGGTGCTCAGTACGTTGTTGCA
GGGCATGATGATTGGAAAGATCAAAGATCAATACAACGTACACTAGACTTAATCAATGAA
TATCAACAAAAACAAAAGGCTTCAAATTAA
>188__GOB-1_Bla__GOB-10__821 no;yes;GOB-10;Bla;AY647247;1-873;873
ATGAGAAATTTTGTTATACTGTTTTTCATGTTCATTTGCTTGGGCTTGAATGCTCAGGTA
GTAAAAGAACCTGAAAATATGCCCAAAGAATGGAACCAGACTTATGAACCCTTCAGAATT
GCAGGTAATTTATATTACGTAGGAACCTATGATTTGGCTTCTTACCTTATTGTGACAGAC
AAAGGCAATATTCTCATTAATACAGGAACGGCAGAATCGCTTCCAATAATAAAAGCAAAT
ATCCAAAAGCTCGGGTTTAATTATAAAGACATTAAGATCTTGCTGCTTACTCAGGCTCAC
TACGACCATACAGGTGCATTACAAGATCTTAAAACAGAAACCGGTGCAAAATTCTATGCC
GATAAAGAAGATGCTGATGTCCTGAGAACAGGGGGGAAGTCCGATTATGAAATGGGAAAA
TATGGGGTGACATTTAAACCTGTTACTCCGGATAAAACATTGAAAGATCAGGATAAAATA
ACACTGGGAAATACAATCCTGACTTTGCTTCATCATCCCGGACATACAAAAGGTTCCTGT
AGTTTTATTTTTGAAACAAAAGACGAGAAGAGAAAATATAGAGTTTTGATAGCTAATATG
CCCTCCGTTATTGTTGATAAGAAATTTTCTGAAGTTACCGCATATCCAAATATTCAGTCC
GATTATGCATATACTTTCAAAGCAATGAAGAATCTGGATTTTGATATTTGGGTGGCCTCC
CATGCAAGTCAGTTCGATCTCCATGAAAAACGTAAAGAAGGAGATCCGTACAATCCGCAA
TTGTTTATGGATAAGCAAAGCTATTTCCAAAACCTTAATGATTTGGAAAAAAGCTATCTC
GACAAAATAAAAAAAGATTCCCAAGATAAATAA

I had a quick look at the accessions for these genes (they are hiding in the fasta headers above) and found that they are carbapenemase genes reported from Elizabethkingia meningoseptica (previously called Chryseobacterium meningosepticum), reported in these papers: Bellais 2000, Antimicrob. Agents Chemother and Yum 2010, J Microbiol. These genes confer resistance to carbapenems like meropenem and imipenem, which probably contributes to these bacteria causing hospital-acquired infections as they will be selected for by carbapenem exposure.

SRST2 didn’t find any other acquired antibiotic resistance genes (from the ARG-Annot database) or known plasmid replicons (at least those in the PlasmidFinder database), which is consistent with the Wisconsin health services reports that these strains are susceptible to lots of readily accessible drugs including fluoroquinolones, rifampin and trimethoprim/sulfamethoxazole.

Running a quick NCBI BLAST search of the carbapenemase gene sequences shows that these new sequences, which are from outbreak strains identified definitively as Elizabethkingia anophelis by CDC, are closest to sequences annotated in NCBI as originating from Elizabethkingia meningoseptica. (The trees below are just straight out NCBI BLAST, obtained by clicking “Distance tree of results” and then downloading the newick tree files to view in FigTree.)

Screen Shot 2016-03-19 at 4.21.25 pm

The outbreak strain’s GOB-10 gene had 1 synonymous SNP compared to the reference sequence, while the B-2 gene had 1 synonymous and 2 non-synonymous SNPs (affecting codons 31 & 34, which is outside the beta-lactamase domain).
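Classifying SNPs this way just means translating each changed codon in both sequences and comparing the amino acids. The sketch below does that for two aligned, in-frame coding sequences of equal length, using the standard genetic code (the codon-to-amino-acid assignments are the same in the bacterial table). It is illustrated on a short toy sequence rather than the genes above, and it assumes upper-case A/C/G/T only.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_snps(ref, alt):
    """Return (codon number, ref codon, alt codon, ref aa, alt aa, type) per changed codon."""
    changes = []
    for i in range(0, len(ref) - len(ref) % 3, 3):
        r, a = ref[i:i + 3], alt[i:i + 3]
        if r != a:
            r_aa, a_aa = CODON_TABLE[r], CODON_TABLE[a]
            kind = "synonymous" if r_aa == a_aa else "non-synonymous"
            changes.append((i // 3 + 1, r, a, r_aa, a_aa, kind))
    return changes

# Toy example: codon 2 changes AAA -> AAG (both Lys), codon 3 GGT -> GCT (Gly -> Ala)
changes = classify_snps("ATGAAAGGT", "ATGAAGGCT")
for change in changes:
    print(change)
```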

I am guessing that species assignations are pretty tricky for this genus, as few labs will have access to definitive tests to discern them, so we shouldn’t read much into this. However if it is true that the outbreak strains are Elizabethkingia anophelis and the close-matching genes in NCBI did come from Elizabethkingia meningoseptica, this would suggest that there is horizontal gene transfer between these species.

Note 1: while writing this, the fastqs finished extracting and I ran SRST2 and found the same antibiotic resistance gene results on all 18. I’m now running some SPAdes assemblies which I’ll post here later, to save others the trouble…

Note 2: the assemblies (SPAdes fasta and fastg; plus Prokka annotated in GenBank format), and various analyses including trees created using Parsnp (from assemblies) and our RedDog pipeline (mapping of reads to reference genome strain NUHP1 =CP007547) are here in github: https://github.com/katholt/elizabethkingia

The assemblies are a bit variable, but mostly ~3.9 Mbp (the reference is 4,369,828 bp). The best one was for SRR3240413 – 32 contigs with 3,911,053 bp total. Viewing the SPAdes assembly graph in our Bandage program shows that 3,910,660 bp are in a single linked graph, which corresponds to the chromosome. (The other little bits do not look like plasmids, just leftover bits of sequence and probably adapters, that SPAdes spit out in teeny bits of a few hundred bp each.)

SRR3240413_bandage_graph
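Contig counts and total sizes like the ones quoted above are simple to compute from an assembly fasta. A minimal sketch, shown on a tiny inline example rather than one of the real assemblies:

```python
def assembly_stats(fasta_text):
    """Count contigs and bases in FASTA-formatted text."""
    lengths = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)           # start a new contig
        elif line and lengths:
            lengths[-1] += len(line)    # add sequence to the current contig
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest_bp": max(lengths, default=0),
    }

# Toy assembly with two contigs
toy = """>contig_1
ACGTACGTAC
ACGT
>contig_2
GGGCCC
"""
stats = assembly_stats(toy)
print(stats)
```

For a real file, the same function can be applied to the text read from the fasta, e.g. `assembly_stats(open(path).read())`, where `path` is wherever the assembly was saved.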

 

The genomes look pretty similar at first glance, but interestingly 4 of them share a deletion of ~80 genes. That’s a great little epidemiological marker for the investigations.

Screen Shot 2016-03-19 at 7.24.21 pm

This was detected by our mapping pipeline RedDog, which I used to map the reads to reference genome NUHP1 CP007547 (this may not be the best reference, I just picked one randomly). The assemblies confirm it: genes BD94_0888 to BD94_0962, and the end of BD94_0963, are missing in these 4 strains (although reads do map to BD94_0948, because this is present in a second copy elsewhere in the genome).
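The general idea of spotting a deletion like this from mapping data can be sketched as a scan for runs of consecutive genes with little or no read coverage. This is not how RedDog is implemented; the gene names, coverage values and thresholds below are all hypothetical. Note how a gene inside the deleted region that still attracts reads (for example because a second copy exists elsewhere) splits one deletion into two runs.

```python
def low_coverage_runs(gene_coverage, threshold=0.1, min_genes=3):
    """gene_coverage: ordered list of (gene_id, fraction of the gene covered by reads).

    Returns (first_gene, last_gene, n_genes) for each run of at least
    min_genes consecutive genes whose coverage is below threshold.
    """
    runs, current = [], []
    for gene, cov in gene_coverage:
        if cov < threshold:
            current.append(gene)
            continue
        if len(current) >= min_genes:
            runs.append((current[0], current[-1], len(current)))
        current = []
    if len(current) >= min_genes:
        runs.append((current[0], current[-1], len(current)))
    return runs

# Hypothetical genes in reference order; g05 is covered despite sitting in the gap
coverage = [("g01", 1.0), ("g02", 0.0), ("g03", 0.0), ("g04", 0.02),
            ("g05", 0.95), ("g06", 0.0), ("g07", 0.0), ("g08", 0.0),
            ("g09", 1.0)]
runs = low_coverage_runs(coverage)
print(runs)
```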

Here’s that tree with a bit more detail (red = # core SNPs). The tree was made using FastTree, with NUHP1 reference genome as an outgroup (the outbreak strains are >40,000 SNPs away from this reference).

Screen Shot 2016-03-19 at 8.26.50 pm


UPDATE 20/3

David Edwards has been playing with Hybrid StriDe + SPAdes assembly recently, and tried this with SRR3240412. The SPAdes assembly (here) was 3,913,666 bp in 41 contigs. The hybrid assembly (here) is 3,917,367 bp in 9 contigs. This is what the graph looks like… I’m showing it coloured by BLAST matches to the NUHP1 reference genome so you can see what the likely path through the graph is (rainbow, red -> purple, indicates matches from start -> end coordinates of the reference genome).

scaffolds_resolved_blast2.png

March 22: Hybrid assemblies for all 18 outbreak genomes (9-15 contigs each; ranging 3,830,044 – 3,912,928 bp) are now in GitHub (thanks to David Edwards for this).

David re-ran the RedDog mapping pipeline using this genome assembly as the reference, and got a very similar tree (files in GitHub):

SRR3240412_RedDogTree

And here is a core genome SNP tree (made from genome assemblies using Harvest), which shows the outbreak strains are a novel lineage of E. anophelis, compared to currently available data (tree file in GitHub):

species_tree_with_NCTC10588

Note: Sylvain Brisse has shown the same thing using his core genome MLST scheme (see BioRxiv preprint posted March 19). I have used his nomenclature here (lineage A, lineage B). Note that Lineage B strains were originally identified as E. meningoseptica, but belong to E. anophelis. I have also included here the genome of E. meningoseptica NCTC10588, which was sequenced as part of the Sanger/PHE/PacBio type strain project, in this tree… it clusters within lineage A and is clearly E. anophelis.

UPDATE: This tweet-fest led to a collaborative project between myself, Sylvain Brisse (Institut Pasteur) and CDC, which was eventually published in Nature Communications a year after this blog post: “Evolutionary dynamics and genomic features of the Elizabethkingia anophelis 2015 to 2016 Wisconsin outbreak strain”