Settings for PacBio data

NGSengine handles two types of PacBio data:

Subreads (also called raw reads)
Consensus reads: these are pre-analyzed by the PacBio Long Amplicon Analysis (LAA) tool, which leaves only a small number of reads (one or two per gene)

Subreads:

Go to File > Preferences > Locus default settings.

For each locus separately, select the following settings:

NGSengine detects whether a file contains reads longer than 1250 bp. Such files are analyzed automatically in PacBio mode.
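
As a rough illustration of the length check described above, the following Python sketch tests whether a FASTQ file contains at least one read longer than 1250 bp. It is not NGSengine's own implementation: the function names are hypothetical, and the sketch assumes an uncompressed FASTQ file with exactly four lines per record.

```python
import io

# Threshold taken from the text: reads longer than this trigger PacBio mode.
PACBIO_MIN_LENGTH = 1250


def read_lengths(fastq_handle):
    """Yield the length of each read in a four-line-per-record FASTQ stream."""
    for line_index, line in enumerate(fastq_handle):
        # The sequence is the second line of every four-line record.
        if line_index % 4 == 1:
            yield len(line.strip())


def contains_long_reads(fastq_handle, threshold=PACBIO_MIN_LENGTH):
    """Return True if any read is strictly longer than the threshold."""
    return any(length > threshold for length in read_lengths(fastq_handle))


# Usage with an in-memory example: one short read and one 1300 bp read.
example = io.StringIO(
    "@short\n" + "A" * 200 + "\n+\n" + "I" * 200 + "\n"
    "@long\n" + "C" * 1300 + "\n+\n" + "I" * 1300 + "\n"
)
would_use_pacbio_mode = contains_long_reads(example)
```

Running such a check before import can help confirm that a file will be treated as PacBio data.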

Typical results with subreads:

Noise levels ~10%
Lower mappability per gene (5-30%) caused by higher noise levels, sequence artefacts and very long reads
Relatively many single-base insertions and deletions and homopolymer issues, which can hamper phasing
Single basepair artefacts and homopolymer issues can be avoided by excluding these positions from analysis
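
To decide which positions might be worth excluding, it can help to list the homopolymer stretches in a reference sequence first. The sketch below is an illustration only and is independent of NGSengine: the function name is hypothetical, and the minimum run length of 4 bases is an assumed cut-off, not a value from the software.

```python
import re


def homopolymer_runs(sequence, min_length=4):
    """Return (start, end, base) for each homopolymer run.

    Positions are 1-based and inclusive. min_length is an assumed cut-off.
    """
    # One base followed by at least (min_length - 1) repeats of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (match.start() + 1, match.end(), match.group(1))
        for match in pattern.finditer(sequence.upper())
    ]


# Usage: two runs are found in this toy sequence.
runs = homopolymer_runs("ACGTTTTTACGGGGA")
# runs == [(4, 8, "T"), (11, 14, "G")]
```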

Consensus reads:

Check that the cluster phaser is applied (see picture above), for each locus separately.

Switch on PacBio consensus mode by selecting the sample(s) > right-click > Sample settings:

A window opens > select PacBio consensus > click OK:

The sample name turns blue:

A different classifier, aligner and phaser are then applied, which are optimized for a small number of long reads.

Typical results with consensus reads: