Whole genome sequencing of Mycobacterium tuberculosis from Cambodia

MTB strain characteristics
Between October 2017 and January 2018, 100 sputum specimens were collected from new bacteriologically confirmed cases of pulmonary tuberculosis in three operational districts of Cambodia: Daun-Keo in Takeo province, Dangkao and Pou Senchey in the capital Phnom Penh ( Fig. 1). Eighty-three mycobacterial isolates were cultured and sequenced successfully. The sequence read from 80 samples successfully mapped to the H37Rv reference genome with an average read depth of 251X and an average genome coverage of 98.9%. Based on Kraken analysis, the three mismapped samples were identified as being from nontuberculous mycobacteria (NTM) species of M. sinense and Mr. terrae and of Mycobacterium abscessus species and were therefore excluded from further analyzes (Fig. 1).
Map and flowchart illustrating the Phnom Penh and Takeo sampling collections in Cambodia and the number of samples collected. The map was created with QGIS software version 3.8.0 released under the Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA) license (https://www.qgis.org/en/site/forusers/download.html) using shapefiles generated from OpenStreetMap data licensed under Open Database 1.0 (www.openstreetmap.org).
Phenotypic and genotypic antimicrobial resistance
The majority of isolates (73/80, 91.25%) were phenotypically susceptible to all anti-TB drugs tested, while a handful (7/80, 8.75%) were resistant to at least one anti-TB drug (isoniazid or streptomycin) , including 3 isolates from Phnom Penh that were resistant to both drugs (Table 1). High accuracies were obtained for the genotypic-phenotypic correlation of resistance predictions (rifampicin = 80/80, 100%; isoniazid = 77/80, 96.25%; ethambutol = 80/80, 100%; streptomycin = 78/ 80, 97.5%), where known genotypic mutations associated with isoniazid resistance were found in catG (S315T) and the fabG1-inhA promoter region (C-15T); while mutations associated with streptomycin resistance have been found in the rpsL (K88R and K43R) and rrs (517c > t) genes (Table 1). In all drug-susceptible phenotypic isolates, we did not identify the presence of any known resistance-associated genotypic mutations in the genes assessed. Thus, the specificity of the genotypic prediction of resistance for each of the antituberculosis drugs tested was 100%. However, the sensitivities of genotypic resistance predictions to isoniazid and streptomycin were less than 40% and 80% respectively.
For phenotypic resistant isolates with no known resistance genetic mutations identified, we assessed non-synonymous mutations identified with the “Catalogue of Mutations in Mycobacterium tuberculosis complex and their association with drug resistance” from the WHO11. For streptomycin, the two mutations: gid_K163E and whiB6_Q94E have been categorized as Group 3: Uncertain significance in the WHO catalog, where the variant gid_K163E was observed once in susceptible strains, and none in resistant strains, whereas whiB6_Q94E was observed in 73 susceptible strains and twice in resistant strains. For isoniazid, the variant katG_R463L was observed in the 3 discordant isolates, but this variant was categorized as Group 5: Not associated with resistance in the WHO catalogue. Another variant katG_R128Q identified was listed in Group 3: Uncertain Significance, where the variant was observed once in resistant strains and none in susceptible strains. The two remaining variants: ndh_V18G and ahpC_H93R have not been evaluated in the WHO catalog although the variant ndh_V18A was found in group 5: Not associated with resistance in the WHO catalogue.
Structure of the mountain bike population
Lineages inferred from MTB isolates were broadly concordant in the 4 genotyping methods of (1) Coll et al. Barcode 62-SNP12; (2) Netikul et al. prediction of subline 113; (3) SpoTyping spoligotypes and (4) RD-Analyzer based deletion regions (Supplementary Table 1). The majority of isolates belonged to lineage 1 (Indo-Oceanic) (69/80, 86.25%), followed by lineage 2 (East Asian) (10/80, 12.5%) and lineage 4 (Euro-American) (1/80, 1.25%). The inferred phylogeny of whole genome SNPs supported the classification of lineages and clades with the genotyping schemes (Fig. 2).

Maximum likelihood phylogenetic tree for the 80 MTB isolates in this study. The tree was constructed based on 12,353 SNPs extracted from the 80 MTB isolates, excluding SNPs in the PE/PPE and repeat regions. The tree is annotated (left to right) to illustrate where the isolate was collected (circle and triangle representing Phnom Penh and Takeo respectively), and whether the isolate was inferred to be clustered (colors representing different clusters while gray indicates non-cluster isolates), sample identifier which is color coded to represent SNP lineages (lineage 1 is based on Netikul et al. scheme while lineage 2 is based on the 62-SNP Coll et al.), and two colored columns indicating phenotypic susceptibility to isoniazid (INH) and streptomycin (STM) (green indicates susceptibility, while red indicates resistance).
Most lineage 2 isolates belonged to the L2.2.1 (modern Beijing) sublineage (9/10, 90%) and originated from Phnom Penh (7/10, 70%), while only one isolate belonged to the L2.1 sublineage (proto-Beijing). The lineage-family inference of this isolate from lineage L2.1 by its spoligotype was however unknown (SIT 246) according to a search in the SITVIT2 database.14.
In this study, we identified isolates in only two of the three major lineage 1 clades12,13,15, namely L1.1 and L1.2, with a dominance of L1.1 (60/69, 87.0%). L1.3 was widespread in Africa, India and Thailand (specific subline of L1.3.2), but not found in Vietnam13.
Within L1.1, there are three main sub-lineages: L1.1.1-L1.1.3. As with the isolates from Vietnam analyzed in Netikul et al., we did not identify any isolates in this study belonging to L1.1.2 and L1.1.3. Among the L1.1.1 isolates, 22 isolates can be grouped into L1.1.1.1 (6/22; 27.3%), L1.1.1.3 (4/22; 18.2%), L1.1.1.5 (7/22; 31.8%) and L1.1.1.10 (5/22; 22.7%). The distribution of sublineages for L1.1 was therefore found to correlate with the distributions identified for Thailand and Vietnam, as reported in Netikul et al.13including a new clade of L1.1.1.10 found only in samples from Thailand and Vietnam.
Using clade nomenclature as reported in Netikul et al., L1.2 can be delineated into 2 clades, where L1.2.1 mainly includes isolates collected from European countries and L1.2.2 which is prevalent in East Asia. East and Southeast Asia. In this study, we identified no L1.2.1 isolates and found nine L1.2.2 isolates (9/69, 13.0%) of which four can be further delineated in the L1.2.2 sublineage. 3. Isolates of sublineage 1.2.2 were mainly from Takeo province (8/9, 88.9%) (Fig. 2).
Suspected active disease transmission through clustered isolates
Eight clusters were identified, each comprising two to five individuals (Fig. 2), where SNP differences between isolates ranged from zero to eight (Supplementary Table S2). All 28 individuals pooled resided in Takeo province and represented half (28/56, 50%) of all individuals sampled in the province. None of the pooled isolates proved resistant to any of the anti-tuberculosis drugs tested. Additionally, all clustered isolates belonged to lineage 1, where the majority were from sublineage L1.1.1 (24/28, 85.7%). Two clusters can be delimited into subline L1.1.1.5 (cluster 1) and subline L1.1.1.3 (cluster 8). There are seven isolates in this study that have been identified as belonging to the L1.1.1.5 sublineage, all sampled from Takeo, and six of them are in cluster 1. Similarly, three isolates were identified as belonging to the L1.1.1.5 sublineage. lineage L1.1.1.3 where the two isolates sampled in Takeo are in cluster 8, while the remaining third isolate sampled in Phnom Penh differs by 58 SNPs with the clustered isolates. The four isolates belonging to the L1.2.2.3 sublineage in this study grouped together in group 3.