Whole genome sequencing of Mycobacterium tuberculosis from Cambodia
MTB strain characteristics
Between October 2017 and January 2018, 100 sputum specimens were collected from new bacteriologically confirmed cases of pulmonary tuberculosis in three operational districts of Cambodia: Daun-Keo in Takeo province, Dangkao and Pou Senchey in the capital Phnom Penh ( Fig. 1). Eighty-three mycobacterial isolates were cultured and sequenced successfully. The sequence read from 80 samples successfully mapped to the H37Rv reference genome with an average read depth of 251X and an average genome coverage of 98.9%. Based on Kraken analysis, the three mismapped samples were identified as being from nontuberculous mycobacteria (NTM) species of M. sinense and Mr. terrae and of Mycobacterium abscessus species and were therefore excluded from further analyzes (Fig. 1).
Phenotypic and genotypic antimicrobial resistance
The majority of isolates (73/80, 91.25%) were phenotypically susceptible to all anti-TB drugs tested, while a handful (7/80, 8.75%) were resistant to at least one anti-TB drug (isoniazid or streptomycin) , including 3 isolates from Phnom Penh that were resistant to both drugs (Table 1). High accuracies were obtained for the genotypic-phenotypic correlation of resistance predictions (rifampicin = 80/80, 100%; isoniazid = 77/80, 96.25%; ethambutol = 80/80, 100%; streptomycin = 78/ 80, 97.5%), where known genotypic mutations associated with isoniazid resistance were found in catG (S315T) and the fabG1-inhA promoter region (C-15T); while mutations associated with streptomycin resistance have been found in the rpsL (K88R and K43R) and rrs (517c > t) genes (Table 1). In all drug-susceptible phenotypic isolates, we did not identify the presence of any known resistance-associated genotypic mutations in the genes assessed. Thus, the specificity of the genotypic prediction of resistance for each of the antituberculosis drugs tested was 100%. However, the sensitivities of genotypic resistance predictions to isoniazid and streptomycin were less than 40% and 80% respectively.
For phenotypic resistant isolates with no known resistance genetic mutations identified, we assessed non-synonymous mutations identified with the “Catalogue of Mutations in Mycobacterium tuberculosis complex and their association with drug resistance” from the WHO11. For streptomycin, the two mutations: gid_K163E and whiB6_Q94E have been categorized as Group 3: Uncertain significance in the WHO catalog, where the variant gid_K163E was observed once in susceptible strains, and none in resistant strains, whereas whiB6_Q94E was observed in 73 susceptible strains and twice in resistant strains. For isoniazid, the variant katG_R463L was observed in the 3 discordant isolates, but this variant was categorized as Group 5: Not associated with resistance in the WHO catalogue. Another variant katG_R128Q identified was listed in Group 3: Uncertain Significance, where the variant was observed once in resistant strains and none in susceptible strains. The two remaining variants: ndh_V18G and ahpC_H93R have not been evaluated in the WHO catalog although the variant ndh_V18A was found in group 5: Not associated with resistance in the WHO catalogue.
Structure of the mountain bike population
Lineages inferred from MTB isolates were broadly concordant in the 4 genotyping methods of (1) Coll et al. Barcode 62-SNP12; (2) Netikul et al. prediction of subline 113; (3) SpoTyping spoligotypes and (4) RD-Analyzer based deletion regions (Supplementary Table 1). The majority of isolates belonged to lineage 1 (Indo-Oceanic) (69/80, 86.25%), followed by lineage 2 (East Asian) (10/80, 12.5%) and lineage 4 (Euro-American) (1/80, 1.25%). The inferred phylogeny of whole genome SNPs supported the classification of lineages and clades with the genotyping schemes (Fig. 2).
Most lineage 2 isolates belonged to the L2.2.1 (modern Beijing) sublineage (9/10, 90%) and originated from Phnom Penh (7/10, 70%), while only one isolate belonged to the L2.1 sublineage (proto-Beijing). The lineage-family inference of this isolate from lineage L2.1 by its spoligotype was however unknown (SIT 246) according to a search in the SITVIT2 database.14.
In this study, we identified isolates in only two of the three major lineage 1 clades12,13,15, namely L1.1 and L1.2, with a dominance of L1.1 (60/69, 87.0%). L1.3 was widespread in Africa, India and Thailand (specific subline of L1.3.2), but not found in Vietnam13.
Within L1.1, there are three main sub-lineages: L1.1.1-L1.1.3. As with the isolates from Vietnam analyzed in Netikul et al., we did not identify any isolates in this study belonging to L1.1.2 and L1.1.3. Among the L1.1.1 isolates, 22 isolates can be grouped into L126.96.36.199 (6/22; 27.3%), L188.8.131.52 (4/22; 18.2%), L184.108.40.206 (7/22; 31.8%) and L220.127.116.11 (5/22; 22.7%). The distribution of sublineages for L1.1 was therefore found to correlate with the distributions identified for Thailand and Vietnam, as reported in Netikul et al.13including a new clade of L18.104.22.168 found only in samples from Thailand and Vietnam.
Using clade nomenclature as reported in Netikul et al., L1.2 can be delineated into 2 clades, where L1.2.1 mainly includes isolates collected from European countries and L1.2.2 which is prevalent in East Asia. East and Southeast Asia. In this study, we identified no L1.2.1 isolates and found nine L1.2.2 isolates (9/69, 13.0%) of which four can be further delineated in the L1.2.2 sublineage. 3. Isolates of sublineage 1.2.2 were mainly from Takeo province (8/9, 88.9%) (Fig. 2).
Suspected active disease transmission through clustered isolates
Eight clusters were identified, each comprising two to five individuals (Fig. 2), where SNP differences between isolates ranged from zero to eight (Supplementary Table S2). All 28 individuals pooled resided in Takeo province and represented half (28/56, 50%) of all individuals sampled in the province. None of the pooled isolates proved resistant to any of the anti-tuberculosis drugs tested. Additionally, all clustered isolates belonged to lineage 1, where the majority were from sublineage L1.1.1 (24/28, 85.7%). Two clusters can be delimited into subline L22.214.171.124 (cluster 1) and subline L126.96.36.199 (cluster 8). There are seven isolates in this study that have been identified as belonging to the L188.8.131.52 sublineage, all sampled from Takeo, and six of them are in cluster 1. Similarly, three isolates were identified as belonging to the L184.108.40.206 sublineage. lineage L220.127.116.11 where the two isolates sampled in Takeo are in cluster 8, while the remaining third isolate sampled in Phnom Penh differs by 58 SNPs with the clustered isolates. The four isolates belonging to the L18.104.22.168 sublineage in this study grouped together in group 3.