We used GATK UnifiedGenotyper (UG) v1.6 for SNV discovery and for calculating genotype likelihoods. We ran GATK UG jointly on all samples, in non-overlapping 5 Mb bins. We called SNVs with the GATK UG multi-allelic model to prevent wrongly calling multi-allelic si
odel is built using sites that overlap known SNVs, under the assumption that these are true polymorphic sites. The second model is built on the 3% least confident sites, which are used as true negatives. We used the following datasets and features to train t
ered the following two metrics and then applied VQSR to the data:
a) Sensitivity: the proportion of sites present in both GoNL and the training sets that remain unfiltered.
b) Specificity: the transition/transversion (Ti/Tv) ratio across the entire dataset.
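Running GATK UG jointly on all samples in non-overlapping 5 Mb bins amounts to partitioning each contig into fixed-size windows and processing each window as a separate job. The sketch below shows one way to generate such bins; it is illustrative only, and the helper names (`make_bins`, `to_interval_string`) and the contig lengths are hypothetical. The resulting strings use the `contig:start-end` form that GATK's `-L` interval argument accepts.

```python
# Hypothetical sketch: split each contig into non-overlapping 5 Mb bins,
# each of which could be genotyped as an independent job.
# Contig names and lengths are illustrative assumptions.

BIN_SIZE = 5_000_000  # 5 Mb


def make_bins(contig_lengths, bin_size=BIN_SIZE):
    """Return (contig, start, end) tuples, 1-based and inclusive."""
    bins = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            # The last bin on a contig is truncated at the contig end.
            end = min(start + bin_size - 1, length)
            bins.append((contig, start, end))
            start = end + 1
    return bins


def to_interval_string(b):
    """Format a bin as a 'contig:start-end' interval string."""
    contig, start, end = b
    return f"{contig}:{start}-{end}"


if __name__ == "__main__":
    # Illustrative contig length only, not a real chromosome size.
    for b in make_bins({"20": 12_000_000}):
        print(to_interval_string(b))
```

Because the bins do not overlap, every position is genotyped exactly once, and per-bin results can be concatenated without deduplication.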
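The two-model training logic of VQSR (a positive model built on sites overlapping known SNVs, and a negative model built on the 3% least confident sites) can be illustrated with a deliberately simplified sketch. GATK's VQSR fits multivariate Gaussian mixture models over several site annotations; here, as an assumption made purely for illustration, each model is a single 1-D Gaussian over one hypothetical annotation value per site, and "least confident" is taken to mean lowest likelihood under the positive model. All function names are hypothetical and not part of GATK.

```python
import math
import statistics

# Simplified, hypothetical illustration of VQSR-style training.
# Real VQSR uses Gaussian mixture models over several annotations;
# here each model is a single 1-D Gaussian on one value per site.


def fit_gaussian(values):
    """Fit a 1-D Gaussian (mean, standard deviation) to annotation values."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values) or 1e-6  # avoid zero variance
    return mu, sigma


def log_pdf(x, params):
    """Log-density of x under a 1-D Gaussian."""
    mu, sigma = params
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def vqsr_like_scores(sites, known, bad_fraction=0.03):
    """Return a log-odds score per site (higher = more likely a true variant).

    sites: dict of site id -> annotation value
    known: set of site ids overlapping known SNVs (positive training set)
    bad_fraction: share of least confident sites used as negatives
    """
    # Positive model: sites overlapping known SNVs, assumed truly polymorphic.
    positive = fit_gaussian([v for s, v in sites.items() if s in known])
    pos_ll = {s: log_pdf(v, positive) for s, v in sites.items()}

    # Negative model: the least confident sites under the positive model.
    n_bad = max(1, int(len(sites) * bad_fraction))
    worst = sorted(sites, key=pos_ll.get)[:n_bad]
    negative = fit_gaussian([sites[s] for s in worst])

    # Log-odds of the positive versus the negative model.
    return {s: pos_ll[s] - log_pdf(v, negative) for s, v in sites.items()}


if __name__ == "__main__":
    # Toy data: 97 well-behaved sites and 3 outliers; 50 sites are "known".
    sites = {f"g{i}": i * 0.01 for i in range(97)}
    sites.update({f"b{i}": 10.0 + i for i in range(3)})
    known = {f"g{i}" for i in range(50)}
    scores = vqsr_like_scores(sites, known)
    print(sorted(scores, key=scores.get)[:3])
```

A filtering threshold would then be chosen on this score; the sensitivity and Ti/Tv metrics are one way to judge where to place it.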
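The sensitivity and Ti/Tv metrics defined in a) and b) are straightforward to compute. The sketch below is a minimal illustration, assuming sites are hashable identifiers such as `(chromosome, position)` tuples and SNVs are `(ref, alt)` pairs of single bases drawn from A, C, G and T; the helper names are hypothetical and not part of GATK. Transitions are purine-to-purine (A<->G) or pyrimidine-to-pyrimidine (C<->T) substitutions; every other substitution is a transversion.

```python
# Hypothetical helpers for the two call-set metrics; not part of GATK.
# Assumes alleles are single bases from A, C, G, T.

TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine/purine, pyrimidine/pyrimidine


def is_transition(ref, alt):
    """True if the substitution ref->alt is a transition (A<->G or C<->T)."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt alleles must differ")
    return frozenset((ref, alt)) in TRANSITIONS


def ti_tv_ratio(snvs):
    """Transition/transversion ratio over (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


def sensitivity(callset_sites, training_sites, passing_sites):
    """Fraction of sites in both the call set and the training set that remain unfiltered."""
    shared = set(callset_sites) & set(training_sites)
    if not shared:
        return float("nan")
    return len(shared & set(passing_sites)) / len(shared)
```

For example, `sensitivity(calls, training, passing)` with sets of `(chromosome, position)` tuples reports how many training-set sites found in the call set survive filtering, while `ti_tv_ratio` over the passing SNVs tracks specificity: a ratio that drops as the filter is relaxed suggests that more false-positive calls, which tend toward a Ti/Tv near that of random substitutions, are being admitted.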