LoFreq* (i.e. LoFreq version 2) is a fast
and sensitive variant-caller for inferring SNVs and indels from
next-generation sequencing data. It makes full use of base-call
qualities and other sources of errors inherent in sequencing
(e.g. mapping or base/indel alignment uncertainty),
which are usually ignored by other methods or only used for filtering.
LoFreq* can run on almost any type of aligned sequencing data
(e.g. Illumina, IonTorrent or PacBio) since no machine- or
sequencing-technology dependent thresholds are used. It automatically
adapts to changes in coverage and sequencing quality and can therefore
be applied to a variety of data-sets,
e.g. viral/quasispecies, bacterial or metagenomics data.
LoFreq* is very sensitive; most notably, it is able to predict
variants below the average base-call quality (i.e. sequencing error
rate). Each variant call is assigned a p-value which allows for
rigorous false positive control. Even though it uses no approximations
or heuristics, it is very efficient due to several runtime
optimizations and also provides a (pseudo-)parallel implementation.
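To illustrate how a p-value can be derived from base-call qualities, here is a minimal sketch. It assumes each base at a position is an independent Bernoulli trial whose error probability comes from its Phred quality, so the number of errors follows a Poisson-binomial distribution. The function names are hypothetical and this is not LoFreq's actual code: it ignores mapping and alignment qualities, strand information and multiple-testing correction.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def poisson_binomial_tail(error_probs, k):
    """Return P(X >= k), where X is the number of errors among
    independent bases with the given per-base error probabilities.

    Exact dynamic programming; no approximation is used.
    """
    if k <= 0:
        return 1.0
    # dist[j] = P(exactly j errors so far) for j < k,
    # dist[k] = P(at least k errors so far) (absorbing state).
    dist = [1.0] + [0.0] * k
    for p in error_probs:
        # Update from high to low so each step reads old values.
        dist[k] += dist[k - 1] * p
        for j in range(k - 1, 0, -1):
            dist[j] = dist[j] * (1 - p) + dist[j - 1] * p
        dist[0] *= 1 - p
    return dist[k]


if __name__ == "__main__":
    # Hypothetical column: 2000 bases at Q30, 10 of them mismatching.
    probs = [phred_to_error_prob(30)] * 2000
    print(poisson_binomial_tail(probs, 10))
```

A small p-value means the observed number of non-reference bases is unlikely to be explained by sequencing error alone under this simplified model.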
LoFreq* is generic and fast enough to be applied to high-coverage data
and large genomes. On a single processor it takes a minute to analyze
Dengue genome sequencing data with nearly 4000X coverage, roughly one
hour to call SNVs on a 600X coverage E. coli genome and also roughly
an hour to run on a 100X coverage human exome dataset.
For more details on the original version of LoFreq see
Wilm et al. (2012).
Latest Blog Post:
We have recently been asked a lot why LoFreq’s VCF output has no FORMAT and
SAMPLE columns. The reason for their absence is that they represent
genotyping information and current LoFreq versions don’t call
genotypes. This shouldn’t stop you from using LoFreq, as genotype
information is often not even needed (depending on your
analysis). Some downstream tools might require it (e.g. vcf_melt),
even though these columns are optional according to the
VCF specification (see e.g. section 1.3).
As a workaround, you can just add fake columns in cases where you know
that the information is actually not required and for somatic
samples you can use
, which comes with LoFreq (note the pysam dependency). The next version of LoFreq (2.2) will be
able to call genotypes.
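The fake-column workaround can be sketched in a few lines of plain Python. This is only an illustration, not the script shipped with LoFreq: the function name `add_fake_gt`, the sample name and the placeholder genotype `0/1` are all assumptions, and the input is assumed to have exactly the eight fixed VCF columns.

```python
def add_fake_gt(vcf_lines, sample="SAMPLE", gt="0/1"):
    """Append placeholder FORMAT and sample columns to VCF lines.

    The genotype is fake: use this only when downstream tools need
    the columns to exist but do not rely on their content.
    """
    out = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("#CHROM"):
            # Declare the GT field, then extend the column header.
            out.append('##FORMAT=<ID=GT,Number=1,Type=String,'
                       'Description="Genotype (placeholder)">')
            out.append(line + "\tFORMAT\t" + sample)
        elif line.startswith("#"):
            out.append(line)  # other meta lines pass through unchanged
        else:
            out.append(line + "\tGT\t" + gt)
    return out


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.0",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t50\tPASS\tDP=1000;AF=0.01",
    ]
    print("\n".join(add_fake_gt(example)))
```

Because the genotype is a constant placeholder, any analysis that interprets the GT field would be meaningless on such a file.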