Settings for PacBio data



NGSengine handles two types of PacBio data:

  • Subreads (also called raw reads)
  • Consensus reads: these are pre-analyzed by the PacBio Long Amplicon Analysis tool (LAA) and contain only a small number of reads (one or two reads per gene)


Subreads:

Go to File > Preferences > Locus default settings. 

For each locus separately, select the following settings:

  • Phasing algorithm: Cluster (1)
  • Quality trimming: select the default settings for PacBio (2)



NGSengine recognizes whether these files contain reads longer than 1250 bp; such files are analyzed automatically in PacBio mode.
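The length-based detection can be illustrated with a minimal Python sketch. This is not NGSengine's implementation, and its exact rule is not described here. The sketch assumes that a single read longer than 1250 bp is enough to flag a file, and that the input is a plain, uncompressed FASTQ file with strict 4-line records (no wrapped sequence lines). The function names are hypothetical.

```python
from typing import Iterable, Iterator

# Threshold taken from the text; the "any read above it" rule is an assumption.
PACBIO_MIN_READ_LENGTH = 1250


def fastq_read_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in a strict 4-line FASTQ stream."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            yield len(line.strip())


def looks_like_pacbio(lines: Iterable[str],
                      threshold: int = PACBIO_MIN_READ_LENGTH) -> bool:
    """Return True if any read is longer than `threshold` bp (assumed rule)."""
    return any(length > threshold for length in fastq_read_lengths(lines))


def file_looks_like_pacbio(path: str) -> bool:
    """Apply looks_like_pacbio to an uncompressed FASTQ file on disk."""
    with open(path) as handle:
        return looks_like_pacbio(handle)
```

For example, a file containing only a 4 bp read is not flagged, while adding a 1300 bp read flags it:

```python
short = ["@r1\n", "ACGT\n", "+\n", "IIII\n"]
long_ = ["@r2\n", "A" * 1300 + "\n", "+\n", "I" * 1300 + "\n"]
looks_like_pacbio(short)          # False
looks_like_pacbio(short + long_)  # True
```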

Typical results with subreads:

  • Noise levels ~10%
  • Lower mappability per gene (5-30%) caused by higher noise levels, sequence artefacts and very long reads
  • A relatively high number of single-base insertions and deletions and homopolymer errors, which can hamper phasing
  • Single-base-pair artefacts and homopolymer issues can be avoided by excluding the affected positions from the analysis
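To decide which positions to exclude, it can help to locate homopolymer runs in the reference sequence of the region of interest. The Python sketch below is an illustration only: the exclusion itself is configured in NGSengine, and the minimum run length of 5 and the function names are hypothetical choices, not NGSengine settings. Positions are 0-based.

```python
def homopolymer_runs(seq: str, min_run: int = 5) -> list[tuple[int, int, str]]:
    """Return (start, end_exclusive, base) for single-base runs of length >= min_run."""
    seq = seq.upper()
    runs = []
    start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the end of the sequence or when the base changes.
        if i == len(seq) or seq[i] != seq[start]:
            if i - start >= min_run:
                runs.append((start, i, seq[start]))
            start = i
    return runs


def positions_to_exclude(seq: str, min_run: int = 5) -> set[int]:
    """Collect every 0-based position covered by a qualifying homopolymer run."""
    return {pos
            for start, end, _ in homopolymer_runs(seq, min_run)
            for pos in range(start, end)}
```

For `"ACGTTTTTTGCAAAAAC"`, `homopolymer_runs` returns `[(3, 9, 'T'), (11, 16, 'A')]`, so positions 3 to 8 and 11 to 15 would be candidates for exclusion.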


Consensus reads:

For each locus separately, check that the Cluster phaser is applied (see picture above).

Switch on PacBio consensus mode: select the sample(s), click the right mouse button and choose Sample settings:



In the window that opens, select PacBio consensus and click OK:



The sample name turns blue:




NGSengine then applies a different classifier, aligner and phaser, which are optimized for a low number of long reads.

Typical results with consensus reads:

  • Mappability is 100% in most cases