GATK recalibration problems: low coverage and few SNVs

We’ve had reports of very strange results, mostly on viral genomes. In such cases only a few SNVs are reported, with odd frequencies and very low coverage. We traced this back to GATK’s base quality recalibration step, in which, for reasons not yet known, GATK sets most base call qualities to the lowest possible value. We are investigating this. In the meantime, we ask users in this situation to run LoFreq on the uncalibrated data, keeping in mind that spurious SNV calls are then possible.
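If you suspect your data are affected, one quick sanity check is to compare the base-quality distribution before and after recalibration. The minimal Python sketch below is illustrative only: the helper names `phred_histogram` and `fraction_at_minimum` are hypothetical, and it assumes Phred+33 encoded QUAL strings as found in column 11 of SAM records.

```python
from collections import Counter


def phred_histogram(qual_strings, offset=33):
    """Count Phred base qualities across SAM QUAL strings (Phred+33 by default)."""
    counts = Counter()
    for qual in qual_strings:
        if qual == "*":  # SAM uses '*' when no base qualities are stored
            continue
        counts.update(ord(ch) - offset for ch in qual)
    return counts


def fraction_at_minimum(counts):
    """Fraction of all base calls that carry the lowest observed quality."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    lowest = min(counts)
    return counts[lowest] / total
```

You could feed it, for example, the QUAL column from `samtools view recal.bam | cut -f 11` and from the same command on the uncalibrated BAM. A fraction near 1 after recalibration, but not before, would be consistent with the problem described above.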

LoFreq version 0.6.0 released

Changes:

  • New option --cons-as-ref: call a consensus base at each position and then call SNVs against this consensus instead of against the given reference
  • Default Bonferroni factor (--bonf) reverted to auto (was auto-ign-zero-cov)
  • Removed Biopython dependency
  • Added CONSVAR INFO field to the VCF output, marking positions where the majority base differs from the reference base
  • User-supplied CFLAGS and LDFLAGS are now also passed down to the build of the Python extension
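The idea behind --cons-as-ref and the CONSVAR flag can be illustrated with a minimal Python sketch: take the majority base at a position, then compare it with the reference base. The function names and the alphabetical tie-breaking rule are hypothetical choices for illustration; LoFreq’s actual implementation may treat ties, ambiguous bases and base qualities differently.

```python
from collections import Counter


def consensus_base(bases):
    """Return the majority base among the read bases observed at one position.

    Only A/C/G/T are counted (case-insensitive). Ties are broken alphabetically
    so the result is deterministic; this rule is an assumption for illustration.
    """
    counts = Counter(b.upper() for b in bases if b.upper() in "ACGT")
    if not counts:
        return None  # no informative coverage at this position
    best = max(counts.values())
    return min(b for b, n in counts.items() if n == best)


def consensus_differs(ref_base, bases):
    """True if the majority base differs from the reference (a CONSVAR-like flag)."""
    cons = consensus_base(bases)
    return cons is not None and cons != ref_base.upper()
```

With --cons-as-ref, SNVs would then be called against `consensus_base(...)` rather than against the reference base at each position.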

Version 0.5.0 released

Changes:

  • We are now also using read mapping qualities. This is achieved by joining the base-call and mapping qualities, with P_bq and P_mq denoting the respective error probabilities: P_joined = P_mq + (1 - P_mq) * P_bq
  • Now using GNU autotools for compilation and installation (./configure && make install)
  • Now including a modified version of samtools mpileup
  • Now including helper scripts to create a stringent, recalibrated mapping (bwa_unique.sh and base_qual_calib_wrapper.sh)
  • Fixed mix-up between the --bonf options auto-ign-zero-cov and auto. The former is now the default
  • Added script lofreq_alnoffset.py which makes comparison of SNV calls made on different coordinate systems / against different reference sequences easier
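The joined quality from version 0.5.0 can be worked through with a short Python sketch. It assumes the usual Phred conversion P = 10^(-Q/10) for both base and mapping qualities; the function names are hypothetical. The formula reads as: a base is wrong if the read is mismapped, or if it is correctly mapped but the base call itself is wrong.

```python
import math


def phred_to_prob(q):
    """Convert a Phred-scaled quality to an error probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10.0)


def prob_to_phred(p):
    """Convert an error probability back to a Phred-scaled quality."""
    return -10.0 * math.log10(p)


def joined_error_prob(base_q, map_q):
    """Join base-call and mapping error probabilities.

    P_joined = P_mq + (1 - P_mq) * P_bq
    """
    p_bq = phred_to_prob(base_q)
    p_mq = phred_to_prob(map_q)
    return p_mq + (1.0 - p_mq) * p_bq
```

For a Q30 base on a read with mapping quality 20, P_joined = 0.01 + 0.99 * 0.001 = 0.01099, i.e. roughly Q19.6. Because P_joined is never smaller than either input probability, the joined quality can never exceed the lower of the two qualities. Note that in SAM a mapping quality of 255 means "not available", so real input would need special handling for that value.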