Long-read sequencing for the identification of insertion sites in large libraries of transposon mutants
DNA was prepared for nucleotide sequencing using both the new LoRTIS method and the previously described TraDIS-xpress protocol. Each protocol was performed in duplicate, from DNA extraction through to the generation of nucleotide sequence reads. This allowed reproducibility to be assessed within each method and LoRTIS data to be compared with those generated using TraDIS-xpress. For the first LoRTIS replicate, 8.7 million nucleotide sequence reads were generated, of which 4.2 million (48%) included transposon-specific sequences, while for the second replicate, 7.6 million of the 14.2 million reads (54%) had transposon-specific sequences. These data demonstrate that the new method successfully enriched for transposon-genome junctions. Read lengths ranged from 300 bp to over 13,000 bp, with an average of over 1,200 bp (Supplementary Table 1).
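The enrichment figures above are the share of reads that carry a transposon-specific sequence. A minimal Python sketch of that calculation follows. The tag sequence, the toy reads and the function names are hypothetical, and exact string matching is a simplification: it is not necessarily how the reads in this study were screened, and error-prone long reads would normally need a tolerant match.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def has_transposon_tag(read: str, tag: str) -> bool:
    """True if the read contains the tag on either strand (exact match only)."""
    read = read.upper()
    tag = tag.upper()
    return tag in read or reverse_complement(tag) in read


def enrichment_percent(reads, tag: str) -> float:
    """Percentage of reads that carry the transposon tag."""
    reads = list(reads)
    if not reads:
        return 0.0
    tagged = sum(has_transposon_tag(read, tag) for read in reads)
    return 100.0 * tagged / len(reads)


# Hypothetical tag and toy reads, for illustration only.
TAG = "GATTACAGG"
toy_reads = [
    "AAAGATTACAGGTTT",  # tag on the forward strand
    "AAACCTGTAATCTTT",  # tag on the reverse strand
    "ACGTACGTACGTACG",  # no tag
    "TTTTTTTTTTTTTTT",  # no tag
]
print(enrichment_percent(toy_reads, TAG))

# Read counts reported in the text (in millions), rounded to whole percent.
print(round(100 * 4.2 / 8.7), round(100 * 7.6 / 14.2))
```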
Reproducibility of LoRTIS and comparison with TraDIS-xpress
The number of nucleotide sequence reads mapped to each gene was determined, and a scatterplot comparing these values for replicates 1 and 2 demonstrated the reproducibility of the LoRTIS method (Fig. 2). Comparison of reads per gene generated by LoRTIS with those from TraDIS-xpress highlights the similarity of the results obtained with the two methods (Fig. 2). Spearman’s correlation coefficient between the LoRTIS and TraDIS-xpress data sets was 0.93. The distribution of mapped sequence reads was also similar in position and number between the two methods, indicating accurate calling of transposon insertion sites by LoRTIS (Fig. 3).
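Spearman's correlation coefficient is the Pearson correlation of the ranks of the two sets of per-gene read counts. The following Python sketch illustrates the calculation. The read counts are invented toy values (only the gene names come from the text), and the helper names are hypothetical rather than taken from the analysis pipeline used here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy reads-per-gene values (assumed, not measured data).
lortis_counts = {"dcuA": 120, "fxsA": 80, "yjeH": 200, "yjeJ": 150, "groL": 0}
tradis_counts = {"dcuA": 300, "fxsA": 310, "yjeH": 500, "yjeJ": 400, "groL": 0}

genes = sorted(lortis_counts)
rho = spearman([lortis_counts[g] for g in genes],
               [tradis_counts[g] for g in genes])
print(round(rho, 2))
```

In practice a library routine such as `scipy.stats.spearmanr` would normally be used; the explicit version is shown only to make the rank-based definition visible.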
Identification of candidate essential genes
During transposon mutagenesis for TIS experiments, mutants with transposon insertions in essential genes do not grow. Therefore, provided that enough transposon mutants and nucleotide sequence reads are generated to avoid stochastic regions of low coverage, the TIS data should include very few sequence reads that map to essential genes. However, if too few sequence reads are generated, some non-essential genes will have no mapped reads by chance and will appear essential when they are not; accurate calling of essential genes therefore requires sufficient data. Thus, an ideal quality control for TIS data is a clear demonstration that mapped reads are distributed across the genome and that enough data have been generated to reveal the few or no reads mapped to known essential genes.
The LoRTIS data presented here not only produced sequence reads that mapped across the genome, but also showed an absence of mapped reads in many putative essential genes identified using TraDIS-xpress. As an example, the similar distributions of LoRTIS and TraDIS-xpress mapped reads over a short section of the genome are shown in Fig. 3. No sequence reads mapped to the candidate essential genes groS and groL, while there was an abundance of reads that mapped to the dcuA, fxsA, yjeH and yjeJ genes using both LoRTIS and TraDIS-xpress, confirming that LoRTIS was at least equal to TraDIS-xpress in this regard.
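The idea that genes with few or no mapped reads are candidate essential genes can be sketched as a read-density filter. The Python example below is an illustration under stated assumptions, not the calling procedure used in this study: the threshold, the gene lengths and the read counts are all hypothetical.

```python
def candidate_essential(read_counts, gene_lengths, max_reads_per_kb=1.0):
    """Return genes whose read density (reads per kb of gene) is at or
    below max_reads_per_kb. Genes missing from read_counts count as 0 reads."""
    candidates = []
    for gene, length_bp in gene_lengths.items():
        density = read_counts.get(gene, 0) / (length_bp / 1000)
        if density <= max_reads_per_kb:
            candidates.append(gene)
    return sorted(candidates)


# Illustrative values only: lengths and counts are assumed, not measured.
toy_lengths = {"dcuA": 1300, "fxsA": 480, "yjeH": 1250, "groL": 1650}
toy_counts = {"dcuA": 900, "fxsA": 250, "yjeH": 700}  # no reads for groL

print(candidate_essential(toy_counts, toy_lengths))
```

Normalising by gene length keeps short genes, which receive few insertions by chance, from being flagged simply because they are small. A fixed cutoff remains a simplification, and it is only meaningful when sequencing depth is sufficient, as discussed above.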
A list of putative essential genes generated from our LoRTIS data was also compared to lists derived from TraDIS-xpress data and from conventional TraDIS data from another group6,7. These reference datasets were selected for comparison because they were generated from the same strain of E. coli (BW25113). TraDIS-xpress and TraDIS data were produced using the Illumina platform for sequence generation, while LoRTIS used nanopore sequencing. Comparison of the putative essential genes showed that 311 were common to all three methods (Fig. 4; Supplementary Table 2). Figure 4 illustrates the putative essential genes identified using each method and their relative distribution. Of the 398 putative essential genes identified from our TraDIS-xpress data, 340 (85%) were also identified by LoRTIS.
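The comparison of gene lists amounts to set operations. A minimal Python sketch follows, using placeholder gene names and hypothetical function names; the real lists are in Supplementary Table 2.

```python
def overlap_summary(lortis, tradis_xpress, tradis):
    """Split three gene lists into the shared core and method-specific genes."""
    a, b, c = set(lortis), set(tradis_xpress), set(tradis)
    return {
        "all_three": a & b & c,
        "lortis_only": a - b - c,
        "tradis_xpress_only": b - a - c,
        "tradis_only": c - a - b,
    }


def percent_recovered(reference, query) -> float:
    """Percentage of genes in `reference` that also appear in `query`."""
    reference = set(reference)
    if not reference:
        return 0.0
    return 100.0 * len(reference & set(query)) / len(reference)


# Placeholder gene names, for illustration only.
toy_lortis = ["geneA", "geneB", "geneC", "geneD"]
toy_xpress = ["geneA", "geneB", "geneC", "geneE"]
toy_tradis = ["geneA", "geneB", "geneE"]

summary = overlap_summary(toy_lortis, toy_xpress, toy_tradis)
print(sorted(summary["all_three"]), sorted(summary["lortis_only"]))
print(percent_recovered(toy_xpress, toy_lortis))

# Counts reported in the text: 340 of 398 TraDIS-xpress genes found by LoRTIS.
print(round(100 * 340 / 398))
```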
Advantages of long sequence reads in mapping transposon insertion sites in regions of repeated nucleotide sequences
Long reads are particularly useful for mapping to unique sites in the genome when the genome is large or contains repeated elements. LoRTIS can produce long reads that span repeated elements and extend into unique regions of the genome, allowing transposon insertions within repeats to be identified. In E. coli BW25113, there are seven ribosomal RNA operons; each is over 5 kb in size and contains two highly conserved ribosomal RNA genes. Reads generated by TraDIS-xpress could not be uniquely mapped to these operons, while the reads generated by LoRTIS could. Although most of the reads generated in this study were between 0.3 and 2 kb in length, they were uniquely mapped because they either spanned polymorphic regions within the repeat elements or extended into unique flanking nucleotide sequences (Fig. 5).
Another set of repeated elements in E. coli is the ins loci (insA, insB, insH), which have multiple gene copies spread across the genome16. Transposon insertions have been reported in these ins loci, but again it was not possible to assign them to a particular copy with certainty using short reads. In our LoRTIS data, there were over 47,000 sequence reads that matched ins loci, of which ~22,000 uniquely mapped (47%), while in the TraDIS-xpress data generated by the Illumina short-read platform, of 28,000 reads that mapped to ins loci, only ~6,500 uniquely mapped (17%) (Supplementary Table 3). These data demonstrate that LoRTIS long reads can be uniquely mapped to repeated elements more efficiently than TraDIS-xpress reads.
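One reason a long read can be placed uniquely is that it overlaps a repeat and still extends into unique flanking sequence. The Python sketch below expresses that condition on alignment coordinates. The coordinates, the 50 bp minimum flank and the function name are assumptions for illustration. It does not model the second route to unique placement described above (polymorphisms within the repeat), and it is not how a read mapper actually scores alignments.

```python
def anchors_in_flank(read_start, read_end, repeat_start, repeat_end,
                     min_flank=50):
    """True if the read overlaps the repeat and extends at least
    `min_flank` bp beyond one side of it. Intervals are half-open
    [start, end) in reference coordinates."""
    overlaps = read_start < repeat_end and read_end > repeat_start
    if not overlaps:
        return False
    left_overhang = max(0, repeat_start - read_start)
    right_overhang = max(0, read_end - repeat_end)
    return max(left_overhang, right_overhang) >= min_flank


# Hypothetical 5 kb repeat at positions 10,000-15,000.
REPEAT = (10_000, 15_000)

print(anchors_in_flank(12_000, 12_150, *REPEAT))  # short read inside repeat
print(anchors_in_flank(9_500, 10_700, *REPEAT))   # 1.2 kb read entering repeat
print(anchors_in_flank(14_000, 15_020, *REPEAT))  # only 20 bp of flank
```

A short read that lies entirely inside the repeat fails the test, while a 1.2 kb read that starts in the flank passes, which mirrors the contrast between short and long reads described in the text.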
Multiplexing of LoRTIS experiments
A unique sequence identifier (barcode) can be added to the DNA fragments of a sample during sequencing library preparation, allowing different samples to be combined and sequenced on a single flow cell (multiplexed). After sequencing, reads from each sample can be separated from the pool based on the barcode (demultiplexed). Oxford Nanopore uses 24 bp sequences to assign a unique identifier to each sample; these are called Native Barcodes (NBD), and 96 NBDs are available. We used four of these NBDs to multiplex our LoRTIS DNA fragment preparations. Of the sequence reads from our LoRTIS experiment, 94% and 84% were demultiplexed into these unique NBDs in replicates 1 and 2, respectively. Although each NBD produced a different number of reads, no bias was observed when using a particular NBD (Fig. 6). This confirms that multiple experimental samples can be successfully multiplexed on a single MinION flow cell using LoRTIS.
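Demultiplexing can be pictured as assigning each read to the barcode that best matches its start. The Python sketch below is illustrative only. The four 24 bp barcode sequences are invented and are not Oxford Nanopore NBD sequences, and the mismatch-counting rule is a simplification of what production demultiplexers do (they align barcodes together with their flanking adapter sequence).

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read: str, barcodes: dict, max_mismatches: int = 3):
    """Return the name of the single best-matching barcode at the read start,
    or None if no barcode is within max_mismatches or the best match is tied."""
    best_name, best_dist, tied = None, max_mismatches + 1, False
    for name, seq in barcodes.items():
        if len(read) < len(seq):
            continue
        dist = hamming(read[:len(seq)], seq)
        if dist < best_dist:
            best_name, best_dist, tied = name, dist, False
        elif dist == best_dist and dist <= max_mismatches:
            tied = True
    return None if tied else best_name


def demultiplex(reads, barcodes, max_mismatches: int = 3):
    """Group reads by assigned barcode; unassigned reads go to 'unclassified'."""
    bins = {name: [] for name in barcodes}
    bins["unclassified"] = []
    for read in reads:
        name = assign_barcode(read, barcodes, max_mismatches)
        bins[name if name is not None else "unclassified"].append(read)
    return bins


# Invented 24 bp barcodes (NOT real NBD sequences).
BARCODES = {
    "BC_A": "ACGT" * 6,
    "BC_B": "TGCA" * 6,
    "BC_C": "GGAATTCC" * 3,
    "BC_D": "CATG" * 6,
}

toy_reads = [
    "ACGT" * 6 + "TTTTGGGG",               # exact BC_A
    "AGCATCCATGCATGCATGCATGCA" + "AACC",   # BC_B with two mismatches
    "A" * 28,                              # matches no barcode
]
bins = demultiplex(toy_reads, BARCODES)
print({name: len(group) for name, group in bins.items()})
```

The share of reads outside the `unclassified` bin corresponds to the demultiplexed percentages reported above.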