The CEQ™ series DNA sequencing software has built-in tools to help the user
determine the quality of the sequence data generated, which is essential for
sequence alignment and assembly. Because these capabilities are built
directly into the software, a confidence estimate is generated for every
base without exporting the data to third-party software.
The quality of called bases is assessed using two calibrations, both of
which essentially estimate the probability of error. Depending on how they
are viewed, they can represent either the probability of error or, after
transformation, the probability of correctness.
- Call Scores: Derived from the estimated probability of error and
displayed on a linear scale from 0.00 to 1.00, where higher is better
(0.99 represents 1 error in 100). An example of the display is illustrated
in Figure 1.
- Quality Values: Provide a transformed estimate of the probability
of correctness (1 - probability of error) and are presented on a logarithmic
scale from 0 to 50. Generally speaking, quality values of 30 to 50 are
considered to represent high-quality sequence.
These error probabilities are assigned by a method that does not require
prior knowledge of the true DNA sequence. By contrast, the error rate is the
actual number of errors in a section of DNA divided by the number of bases
in that section, so it can be measured only when the true sequence is known.
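The distinction between estimated error probabilities and the observed error rate can be illustrated with a short sketch. It is a minimal illustration, not the CEQ software's method: the function names are hypothetical, and the observed rate assumes the called and reference sequences are already aligned base-for-base with no insertions or deletions. The expected error count uses only per-base error probabilities, so no reference is needed.

```python
from typing import Sequence


def observed_error_rate(called: str, reference: str) -> float:
    """Observed error rate: mismatched bases divided by bases compared.

    Hypothetical helper. Assumes the sequences are aligned base-for-base
    with no insertions or deletions (a simplification for illustration).
    """
    if len(called) != len(reference):
        raise ValueError("sequences must be the same length")
    if not called:
        raise ValueError("sequences must not be empty")
    errors = sum(1 for c, r in zip(called, reference) if c != r)
    return errors / len(called)


def expected_error_count(error_probs: Sequence[float]) -> float:
    """Expected number of errors implied by per-base error probabilities.

    By linearity of expectation this is the sum of the probabilities;
    it requires no knowledge of the true sequence.
    """
    return sum(error_probs)


if __name__ == "__main__":
    # One mismatch (position 4: A vs T) in ten bases.
    print(observed_error_rate("ACGTACGTAC", "ACGTTCGTAC"))  # 0.1
    print(expected_error_count([0.01, 0.001, 0.05, 0.001]))
```

Comparing the two numbers over a region where the true sequence is known is one way to check whether the estimated probabilities are well calibrated.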
Base-calling errors usually result from problems in the region of the trace
surrounding a peak, rather than in the peak itself. Examining the
characteristics of the peaks near a suspect peak therefore reveals the
indicators of error. The most effective parameters for detecting
base-calling errors consider a window of the trace that includes several
peaks flanking the one whose base call is being assessed.
The parameters used by the CEQ series base caller include:
- Peak spacing: the ratio of the largest peak-to-peak spacing to the
smallest peak-to-peak spacing in a window of seven peaks.
- Peak resolution: the number of bases between the current base and
the nearest unresolved base, multiplied by -1.
- Uncalled/called height ratio in seven peaks: the ratio of the amplitude
of the largest uncalled peak to that of the smallest called peak in a
window of seven peaks.
- Uncalled/called height ratio in three peaks: the ratio of the amplitude
of the largest uncalled peak to that of the smallest called peak in a
window of three peaks.
- Event pair score: events are detected elongated peaks and interpolated
bases.
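Two of the window-based parameters described above can be sketched in code. This is a hypothetical model, not the CEQ base caller's implementation: the `Peak` record, the index `i` (which counts called peaks only), and the rule that an uncalled peak belongs to a window if it lies between the window's first and last called peaks are all assumptions. Peak resolution and the event pair score are omitted because the text does not define "unresolved" bases or "events" precisely enough to code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peak:
    position: float  # trace position of the peak apex (assumed field)
    height: float    # peak amplitude
    called: bool     # True if a base was called for this peak


def _window(called: List[Peak], i: int, size: int) -> List[Peak]:
    """Up to `size` called peaks centred on called peak i, clipped at the ends."""
    half = size // 2
    return called[max(0, i - half): i + half + 1]


def peak_spacing_ratio(peaks: List[Peak], i: int, size: int = 7) -> float:
    """Largest / smallest spacing between adjacent called peaks in the window.

    Assumes called peaks are in increasing position order, the window holds
    at least two of them, and no two share a position.
    """
    called = [p for p in peaks if p.called]
    win = _window(called, i, size)
    spacings = [b.position - a.position for a, b in zip(win, win[1:])]
    return max(spacings) / min(spacings)


def uncalled_called_ratio(peaks: List[Peak], i: int, size: int) -> float:
    """Largest uncalled height / smallest called height in the window.

    Returns 0.0 when no uncalled peak falls within the window (assumption).
    """
    called = [p for p in peaks if p.called]
    win = _window(called, i, size)
    start, end = win[0].position, win[-1].position
    uncalled = [p.height for p in peaks
                if not p.called and start <= p.position <= end]
    if not uncalled:
        return 0.0
    return max(uncalled) / min(p.height for p in win)
```

For example, seven evenly spaced called peaks of height 100 with one uncalled peak of height 20 between the second and third give a spacing ratio of 1.0 and a seven-peak uncalled/called ratio of 0.2. A larger ratio in either parameter signals a less trustworthy call.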
Call scores use the first three of these parameters. Call scores below
about 0.97 are generally considered poor.
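Applying that rule of thumb to a read is a simple filter. The sketch below is illustrative only: `flag_poor_calls` is a hypothetical helper, and the 0.97 cutoff is the approximate threshold given in the text, exposed as a parameter because it is a guideline rather than a fixed limit.

```python
from typing import List, Sequence

# Approximate threshold from the text: call scores below ~0.97 are poor.
POOR_CALL_SCORE = 0.97


def flag_poor_calls(call_scores: Sequence[float],
                    threshold: float = POOR_CALL_SCORE) -> List[int]:
    """Return 0-based positions of bases whose call score is below threshold."""
    return [i for i, s in enumerate(call_scores) if s < threshold]


if __name__ == "__main__":
    scores = [0.99, 0.98, 0.95, 0.99, 0.90, 1.00]
    print(flag_poor_calls(scores))  # [2, 4]
```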
Quality values use all five parameters and provide finer detail about the
local peak environment. Quality values use a logarithmic scale from
0 to 50. Using a log-transformed error probability makes it easier to work
with error rates in the range of most importance (very close to 0). The
quality value q assigned to a base call is defined by the following equation:
q = -10 × log10(p)
where p is the estimated error probability for that base call. For example:
- An error probability of 1/100 (a 99% probability of being correct) is
assigned a quality value of 20
- An error probability of 1/1000 (a 99.9% probability of being correct) is
assigned a quality value of 30
- An error probability of 1/10,000 (a 99.99% probability of being correct)
is assigned a quality value of 40
High quality values correspond to low error probabilities. Called bases with quality values of 30 to 40 are considered very high-quality sequence.
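The equation q = -10 × log10(p) and its inverse can be checked against the examples above with a few lines of code. The function names are hypothetical; the inverse, p = 10^(-q/10), follows directly from the equation.

```python
import math


def quality_from_error_prob(p: float) -> float:
    """q = -10 * log10(p) for an estimated error probability 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


def error_prob_from_quality(q: float) -> float:
    """Inverse transform: p = 10 ** (-q / 10)."""
    return 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # Reproduces the worked examples: q = 20, 30 and 40.
    for p in (1 / 100, 1 / 1000, 1 / 10_000):
        print(f"p = {p:g} -> q = {quality_from_error_prob(p):.0f}")
```

Because each factor of 10 in error probability adds 10 to q, the log scale spreads out the very small error probabilities that distinguish good sequence from excellent sequence.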