GATK recalibration problems: low coverage and few SNVs

24 Apr 2013

We’ve had reports of very strange results, mostly on viral genomes: very few SNVs are reported, with odd frequencies and very low coverage. We traced this back to GATK’s base-call recalibration step, in which, for as yet unknown reasons, GATK sets most base-call qualities to the lowest possible value. We are investigating this and ask users to run LoFreq on the uncalibrated data in such cases (keeping in mind that spurious SNV calls are then possible).

LoFreq version 0.6.0 released

27 Mar 2013

Changes:

New option --cons-as-ref for cases where you want to call a consensus base per position and then call SNVs against it, instead of calling SNVs against the given reference
Default Bonferroni factor reverted to auto instead of auto-ign-zero-cov
Removed Biopython dependency
Added CONSVAR INFO field to VCF output for denoting majority changes with regard to the reference base
User-supplied CFLAGS and LDFLAGS are now passed down to Python’s extension build as well
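
The idea of calling SNVs against a per-position consensus, mentioned in the change list above, can be illustrated with a minimal sketch. This is not LoFreq's actual implementation: the function name `consensus_base`, the restriction to A/C/G/T, and the fallback to the reference base for empty columns are all assumptions made for illustration.

```python
from collections import Counter

VALID_BASES = set("ACGT")  # assumption: ignore N, gaps and other symbols


def consensus_base(column_bases, ref_base):
    """Return the majority base of one pileup column (hypothetical helper).

    column_bases: iterable of single-character base calls at this position
    ref_base:     reference base, used only if the column has no valid bases
    """
    counts = Counter(b.upper() for b in column_bases if b.upper() in VALID_BASES)
    if not counts:
        # Assumption: with zero coverage, fall back to the given reference base
        return ref_base.upper()
    # Ties are broken arbitrarily (first base seen among the most common)
    return counts.most_common(1)[0][0]
```

With a column such as "AAAG" and reference base G, the consensus is A, so G (not A) would be the candidate variant base when calling against the consensus.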

Version 0.5.0 released

25 Jan 2013

Changes:

We are now also using read mapping qualities. This is achieved by joining the base-call error probability (P_bq) and the mapping error probability (P_mq): P_joined = P_mq + (1 - P_mq) * P_bq
Now using GNU autotools for compilation and installation (./configure && make install)
Now including a modified version of samtools mpileup
Now including helper scripts to create a stringent, recalibrated mapping (bwa_unique.sh and base_qual_calib_wrapper.sh)
Fixed mix-up between the --bonf options auto-ign-zero-cov and auto. The former is now the default
Added script lofreq_alnoffset.py which makes comparison of SNV calls made on different coordinate systems / against different reference sequences easier
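
The joining of base-call and mapping qualities described in the change list above can be sketched in a few lines. This is an illustration only, not LoFreq's code: the function names are hypothetical, and it assumes both qualities are Phred-scaled (Q = -10 * log10(P)), as is conventional in SAM/BAM files.

```python
def phred_to_prob(qual):
    """Convert a Phred-scaled quality to an error probability."""
    return 10.0 ** (-qual / 10.0)


def joined_error_prob(base_qual, map_qual):
    """Join base-call and mapping error probabilities (hypothetical helper).

    Implements P_joined = P_mq + (1 - P_mq) * P_bq: the base is wrong if the
    read is mismapped, or if it is mapped correctly but the base call is wrong.
    """
    p_bq = phred_to_prob(base_qual)
    p_mq = phred_to_prob(map_qual)
    return p_mq + (1.0 - p_mq) * p_bq
```

For example, a base quality of 20 and a mapping quality of 20 both correspond to an error probability of 0.01, giving a joined probability of 0.01 + 0.99 * 0.01 = 0.0199. A mapping quality of 0 gives a joined probability of 1 regardless of the base quality.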