
FILTER line is malformed #1

@dridk


Issue from jamescasbon#337

Background:
In the FILTER column, multiple filters should be separated by semicolons. The widely used, but not actively maintained, VarScan2 genomic variant caller uses commas instead. Moreover, VarScan2 does not add `##FILTER` metadata for most of its filters. Picard FixVcfHeader can be used to add the missing FILTER metadata. A "fixed" metadata row looks like this:
`##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: this FILTER line was added by Picard's FixVCFHeader">`

Error:
PyVCF fails with:
```
Traceback (most recent call last):
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 236, in <module>
    main()
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 232, in main
    run(parser.parse_args())
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 166, in run
    df_1 = vcf_to_dataframe(args.vcf_1)
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 74, in vcf_to_dataframe
    vcf_reader = vcf.Reader(open(vcf_file, "r"))
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 300, in __init__
    self._parse_metainfo()
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 326, in _parse_metainfo
    key, val = parser.read_filter(line)
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 142, in read_filter
    raise SyntaxError(
SyntaxError: One of the FILTER lines is malformed: ##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: this FILTER line was added by Picard's FixVCFHeader">
```

Issue:
It might be more robust for PyVCF to treat a filter name containing commas as a single filter name, as Picard FixVcfHeader does.
Instead of raising an exception, PyVCF could accept a FILTER metadata line whose ID is enclosed in double quotes and contains commas, e.g., `ID="RefAvgRL,VarAvgRL"`.
Similarly, in the data records, a FILTER value such as `RefAvgRL,VarAvgRL` would be treated as a single filter. I think this is consistent with the VCF 4.2 spec for the FILTER field: "String, no white-space or semi-colons permitted".
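A minimal sketch of the proposed handling, assuming two hypothetical helpers that are not part of PyVCF: `split_filter_column` splits a FILTER value only on semicolons, so a VarScan2-style `RefAvgRL,VarAvgRL` stays one filter name, and `normalize_filter_id` strips the surrounding double quotes from a header ID so it lines up with the unquoted name used in data records.

```python
from typing import List, Optional


def split_filter_column(value: str) -> Optional[List[str]]:
    """Hypothetical helper: parse the FILTER column of a VCF data record.

    Returns None for the missing value '.', an empty list for PASS, and
    otherwise the semicolon-separated filter names. Commas are NOT
    separators, so 'RefAvgRL,VarAvgRL' is kept as one name.
    """
    if value == ".":
        return None  # filters have not been applied
    if value == "PASS":
        return []  # the site passed all filters
    return value.split(";")


def normalize_filter_id(raw_id: str) -> str:
    """Hypothetical helper: drop surrounding double quotes from a header ID,
    e.g. '"RefAvgRL,VarAvgRL"' -> 'RefAvgRL,VarAvgRL'."""
    if len(raw_id) >= 2 and raw_id[0] == raw_id[-1] == '"':
        return raw_id[1:-1]
    return raw_id


print(split_filter_column("RefAvgRL,VarAvgRL"))    # ['RefAvgRL,VarAvgRL']
print(split_filter_column("q10;s50"))              # ['q10', 's50']
print(normalize_filter_id('"RefAvgRL,VarAvgRL"'))  # RefAvgRL,VarAvgRL
```

With both helpers, the header key and the data-record value for the VarScan2 filter compare equal, which is what treating the name "as a single entity" requires.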

Possible pull request:
This hack (changing `[^,]+` to `.+`) worked to get me through an urgent analysis, but it may not be the best solution. At parser.py line 142:

```python
self.filter_pattern = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>.+),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)
```
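To see why the hack helps, here is a small self-contained check using only Python's `re` module. The patterns are reconstructed from the snippet above, and the group names `id`/`desc` are assumed. `QUOTED_OR_PLAIN` is a hypothetical, narrower alternative to `.+` (not PyVCF code): it lets commas into the ID only when the ID is wrapped in double quotes, as in the Picard-generated line.

```python
import re

# Original-style pattern: the ID may not contain commas.
# Note: '#' starts a comment under re.VERBOSE, so it must be escaped.
STRICT = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>[^,]+),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)

# The hack: '.+' is greedy, so it grabs the rest of the line and then
# backtracks until ',Description="...">' can match after it.
RELAXED = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>.+),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)

# Hypothetical alternative: a double-quoted ID may contain commas;
# an unquoted ID keeps the original no-comma rule.
QUOTED_OR_PLAIN = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>"[^"]*"|[^,]+),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)

picard_line = ('##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: '
               'this FILTER line was added by Picard\'s FixVCFHeader">')
plain_line = '##FILTER=<ID=q10,Description="Quality below 10">'

print(STRICT.match(picard_line))                       # None
print(RELAXED.match(picard_line).group("id"))          # "RefAvgRL,VarAvgRL"
print(QUOTED_OR_PLAIN.match(picard_line).group("id"))  # "RefAvgRL,VarAvgRL"
print(QUOTED_OR_PLAIN.match(plain_line).group("id"))   # q10
```

Either relaxed pattern keeps the double quotes in the captured ID, so they would still need stripping before the ID is compared with FILTER values in data records.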

=======

I have the same problem. Is there any update on this issue?

I had hoped that switching to PyVCF3 (cf. jamescasbon#335) would solve the issue, but apparently it does not.

My mistake: in my case, the problem came from a `Source` tag in a FILTER header line:

`##FILTER=<ID=xxx,Description="yyy",Source="zzz">`

According to the spec at https://samtools.github.io/hts-specs/, `Source` is an INFO field tag, not a FILTER field tag.
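For anyone who would rather have the parser tolerate such lines than reject them, here is a rough sketch using Python's `re` module. `STRICT` mirrors the original `[^,]+` FILTER pattern discussed above (group names `id`/`desc` assumed); `TOLERANT` is a hypothetical variant, not PyVCF code, that accepts and ignores any extra `key="value"` pairs after `Description`.

```python
import re

# Strict FILTER header pattern: exactly ID and Description, nothing else.
STRICT = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>[^,]+),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)

# Hypothetical tolerant variant: after Description, allow any number of
# extra key="value" pairs (e.g. Source="zzz"); they are matched but not captured.
TOLERANT = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>[^,]+),\s*
    Description="(?P<desc>[^"]*)"
    (?:,\s*\w+="[^"]*")*
    >''', re.VERBOSE)

line = '##FILTER=<ID=xxx,Description="yyy",Source="zzz">'

print(STRICT.match(line))                  # None
print(TOLERANT.match(line).group("id"))    # xxx
print(TOLERANT.match(line).group("desc"))  # yyy
```

Silently discarding unknown keys hides malformed headers, so logging a warning alongside the tolerant match may be the safer design.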
