What happened:
For too many Debian packages, the copyright information is parsed poorly.
The problem is even acknowledged in a comment in at least one test case:
// note: this should not capture #, Permission, This, see ... however it's not clear how to fix this (this is probably good enough)
What you expected to happen:
Only parse debian/copyright files according to the machine-readable format if they claim to be in that format.
https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ specifies that there is a mandatory field called Format: that contains a (http or https) link to the spec.
Only files that contain this field should be parsed as machine-readable. All other files should instead have their entire content put into the .text.content of a license object.
Steps to reproduce the issue:
It's part of a test case in the repo.
syft/syft/pkg/cataloger/debian/parse_copyright_test.go
Line 35 in d71b747