e. coli

AMR distribution in intestinal E. coli from children in Asia and Africa


Today I’m pleased to see the final version of our paper on antimicrobial resistance in intestinal E. coli from Asian & African children published in Nature Microbiology. This is last piece of the puzzle from Danielle Ingle’s PhD research, a tremendous effort centred around the analysis of a collection of ~200 atypical enteropathogenic E. coli (aEPEC) isolated from cases and controls in seven countries during the Global Enterics Multicentre Study (GEMS).

The first analysis of the genome data from this collection was reported in this 2016 paper, also in Nature Microbiology. It focused on understanding the population structure of the pathotype, including establishing a framework for looking at variation in the primary virulence locus (the LEE pathogenicity island; see blog post here).


Danielle then looked at serotype diversity in the collection, and used the experience to tackle the problem of O and H serotype prediction from genome data. That work is detailed in this Microbial Genomics paper, which utilises the phenotypes and genome data from the GEMS aEPEC collection to assess the reliability of predictions.

Finally we turned our attention to antimicrobial resistance (AMR) in the isolate collection – characterising resistance phenotypes, looking at known genetic determinants of AMR in the genome data, and also examining data on prescribing of antimicrobials for treatment of diarrhoeal disease in children at each study site.

So what did we find?

Firstly, whether we consider AMR phenotypes or genotypes we see that AMR was rampant, with most strains either multidrug resistant (65%; resistan to ≥ 3 drug classes) or susceptible to all drugs tested (19%):


We found >40 different acquired AMR genes in the genomes, and also point mutations that are known to be associated with resistance to fluoroquinolones (in gyrA, parC) or nitrofurantoin (nfsA). Notably there was no difference between AMR rates in cases and controls, even at the level of individual genes:

genewise case control

We found that many of these AMR genes co-occured together in known mobile genetic elements:

Screen Shot 2018-08-21 at 12.59.41 pm.png

Quite often the structures of these elements were not totally resolvable from the genome assemblies, which were based on short Illumina reads only (no long reads for this data set unfortunately!)… but nevertheless, Danielle could often resolve co-localisation of these genes from the assembly graphs using Bandage:

Screen Shot 2018-08-21 at 1.00.28 pm

We had seen in the first paper that the isolates were highly diverse, comprising dozens of distinct clones… this tree is inferred from a core gene alignment of the study isolates together with some other genomes for context (GEMS study isolates are indicated as dark blue in the outer ring). The ten shaded clades indicate dominant clonal groups in the study population.


Back to the AMR study. We did a discriminant analysis of principle components (DAPC) to see whether the variation in the distribution of genetic determinants amongst the genomes could be used to discriminate between the clonal groups, and saw that AMR was not associated with individual clones:

clone DAPC

Instead we found that variation in AMR gene complement could discriminate isolates from different geographical region, suggesting that AMR genes more often reflect horizontal acquisition from distinct local gene pools in different parts of the world, rather than fixed features of their host bacterium that travel the world with their host strain (clone):

region DAPC

In particular, we saw that fluoroquinolone resistance associated mutations in gyrA were associated with Asian sites; while sites in East vs West Africa could be discriminated by the presence of different dihydrofolate reductase (dfr) genes responsible for trimethoprim resistance, with dfrA8 being more common in West Africa and dfrA5 being present in East Africa.

region gene prevalence

The data we have showed regional differences in AMR phenotypes, and in antibiotic usage for treatment of paediatric diarrhoea at the GEMS sites.

drug res and usage

a) Resistance phenotypes. b) Frequency of antimicrobials prescribed to children with watery diarrhoea. c) Frequency of antimicrobials prescribed to children with dysentery.

However the prevalence of acquired resistance genes amongst E. coli isolated from each site was not associated with local frequencies of drug usage. The exception was fluoroquinolones: point mutations in gyrA and parC (which reduce MIC to ciprofloxacin) were more common at the Asian sites, where ciprofloxacin was used much more often to treat diarrheal disease than in African sites.

cipR gyrA parC

There are many possible reasons for the lack of association between local prescribing for diarrheal disease and the presence of AMR genes in local diarrheal pathogens. We expect that most antimicrobial exposure in human gut bugs like E. coli probably is not associated with attempts to treat E. coli infection at all, but with exposure to drugs given to treat other infections, drugs used in food animals which are a reservoir for E. coli, or even environmental contamination with antibiotics. Also because the horizontally acquired genes tend to travel together as a group in mobile genetic elements, exposure to one drug can co-select for resistance to many. This may be one reason that the association was more evident for ciprofloxacin use and gyrA/parC mutations, which are not in linkage with acquired AMR genes.

Finally, the data provided an opportunity to explore how well we can predict AMR phenotypes based on identifying known genetic determinants of AMR in E. coli genomes. The results were pretty good, indicating low rates of “very major errors” (where we predict a strain to be susceptible, but really it is resistant) for most drug classes. These results are comparable to those done independently in other collections of E. coli and also other bacteria, summarised here. But clearly there is room for improvement, and probably a few new mechanisms floating around out there… notably we didn’t aim to assess changes in expression of intrinsic E. coli genes, such as efflux pumps and beta-lactamases, which can contribute to drug resistance but are not so easy to find in genome data.




Genomics of atypical enteropathogenic E. coli

Well our paper on “atypical” enteropathogenic E. coli is finally out, in the new journal Nature Microbiology. The hard work on this paper was done by Danielle Ingle, whom I co-supervise together with Roy Robins-Browne and will soon become the first of my students to complete a PhD. Unfortunately the paper is paywalled at the moment, so I’ll try to summarise the key elements here.

For those who aren’t familiar with the wacky world of E. coli pathotypes, the LEE genomic island is the darling of E. coli pathogenicity genetics. It encodes a type three secretion system, essentially a syringe structure that enables the bacteria to inject its own proteins (known as secreted effector proteins) directly into human cells and – ahem – mess them up (for more technical details see the impressive body of work from Melbourne University’s very own Jacqueline Pearson and Liz Hartland). E. coli that possess shiga toxin as well as the LEE cause nasty bloody diarrhoea and are defined as enterohaemorrhagic E. coli (EHEC), while those with LEE + the bfp adhesion factor cause diarrhoea and are defined as enteropathogenic E. coli (EPEC). Strains carrying the LEE but lacking both shiga toxin and bfp can also cause diarrhoea, but are also sometimes isolated from healthy individuals with no symptoms of diarrhoaea, and it is this big group of LEE + ??? strains that are defined as atypical EPEC (aEPEC), while those with bfp are dubbed typical EPEC.

No one has been able to find a common virulence factor that differentiates diarrhoeagenic aEPEC from asymptomatic strains… which is not that surprising considering that aEPEC is really just an umbrella term for a diverse group of organisms that happen to carry the LEE genomic island.

In this paper, we set out to define the population structure of aEPEC strains using genomics, and also to investigate variation in the LEE island itself. Our strain set is ~200 newly sequenced aEPEC that were isolated from children with diarrhoea, and from asymptomatic age-matched controls, as part of the Global Enteric Multicenter Study (GEMS) – a massive study into the etiology of childhood diarrhoea across seven sites in Africa and Asia, funded by the Gates Foundation. We also included lots of publicly available genomes from NCBI, which included ~60 additional aEPEC.

So what did we find? Well from a population structure point of view, we confirmed what we suspected from the beginning – that the ~200 aEPEC strains actually represent dozens of distinct lineages or clonal groups within the E. coli whole genome phylogeny. We tried making the core genome tree in a few different ways, including mapping reads to a reference genome vs using Roary to generate core gene alignments from assemblies; with and without removing recombinant regions identified using ClonalFrameML. The alternative trees all tell the same population structure story, as you can see below. An interactive version of the mapping-based, recombination-filtered tree (which appears in Figure 1 of the paper, panel a below) is available to play with in MicroReact.


Supplementary Figure 1

The 10 clades highlighted in the tree are those containing >5 aEPEC in our collection, which represent the most common aEPEC lineages. The figures below show that these 10 aEPEC groups are present across the Asian and African GEMS sites; most also appear in non-GEMS collections from Europe as well as North and South America, indicating they are globally distributed.


Next we looked in depth at the LEE genomic island itself. ClonalFrame analysis identified lots of recombination across the locus, particularly affecting the proteins that make direct contact with host cells:

Screen Shot 2016-01-19 at 9.46.45 am

Danielle also did a lot of work to examine selection pressures on the various LEE genes, and found evidence for co-evolutionary constraints on groups of LEE genes (see the paper for details).

We used the recombination-free ClonalFrame tree, which represents the underlying clonal or vertical evolutionary relationships between LEE sequences, to define LEE lineages and subtypes. We also made a MLST-style scheme for the LEE, to make it easy to identify types from sequence data without having to compare to reference trees. The sequence files are included in the SRST2 distribution and can be downloaded from https://github.com/katholt/srst2.

Screen Shot 2016-01-19 at 9.50.41 am

There were three clearly distinct LEE lineages, separated from one other by 4-7% divergence (similar to genus-level differences between bacteria) and with different preferences for chromosomal insertion site. One of the LEE lineages (lineage 1) was entirely novel and we found it only in aEPEC isolates from the GEMS study. Functionality of the LEE was confirmed in all GEMS strains, including these novel lineage 1 examples.

Importantly, different aEPEC lineages had distinct LEE variants, often integrated at different chromosomal sites, confirming that aEPEC lineages have evolved numerous times independently via distinct LEE acquisition events. Also, some aEPEC lineages contained strains carrying different LEE lineages and/or subtypes, sometimes integrated at different sites, indicating ongoing evolution within some aEPEC clades.


Unsurprisingly, some of the “aEPEC” lineages also include examples of EHEC and/or typical EPEC isolates, i.e. subclades within the LEE-containing lineage that have subsequently acquired shiga toxin (via phage) or a bfp plasmid. We also found extensive variation in the complement of known LEE-secreted effectors (which are scattered around the genome and known as non-LEE encoded effectors or nle genes), both within and between lineages.

There’s lots more work to do to unravel the genetic basis for pathogenicity in E. coli, including identifying more of the colonisation factors, regulators and effectors that are important for causing disease in humans. But I think this analysis is a great step towards clarifying the population structure of EPEC and the LEE more generally, which opens the door for much more informative diagnostics and epidemiological studies. This work should also help immensely in guiding future pathogenicity research… most research on EPEC or EHEC pathogenicity has so far focused on a handful of strains, but our genomic analysis shows there is a much wider diversity of strain backgrounds, LEE variants and effector gene content that also needs to be considered.