Possible startxrefPreg extension #756

@xBambey

Some PDFs in a project could not be read by the parser.

On closer examination of the binary data, I noticed that there is often a space between the line break after startxref and the byte offset that follows it.

After a brief search on the Internet, I could not find any information as to whether this space may be included or not. Perhaps someone here who is more familiar with the subject knows more.

Allowing optional whitespace ([\s]*) in the regex at this point lets the parser recognize these PDFs again.

The regex would then look like this:

'/(?<=[\r\n])startxref[\s]*[\r\n]+[\s]*([0-9]+)[\s]*[\r\n]+%%EOF/i'

The current code, for comparison:

// Find all startxref tables from this $offset forward
$startxrefPreg = preg_match_all(
    '/(?<=[\r\n])startxref[\s]*[\r\n]+([0-9]+)[\s]*[\r\n]+%%EOF/i',
    $pdfData,
    $startxrefMatches,
    \PREG_SET_ORDER,
    $offset
);
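To make the difference easier to check, here is a minimal sketch that runs both patterns against two trailer fragments. The helper name `findStartxrefOffsets` and the sample strings are hypothetical and only serve as illustration. They are not taken from the parser or from one of the affected PDFs.

```php
<?php
// Pattern currently used (copied from the code above).
$currentPattern = '/(?<=[\r\n])startxref[\s]*[\r\n]+([0-9]+)[\s]*[\r\n]+%%EOF/i';

// Proposed pattern: additionally allows whitespace directly before the offset.
$proposedPattern = '/(?<=[\r\n])startxref[\s]*[\r\n]+[\s]*([0-9]+)[\s]*[\r\n]+%%EOF/i';

// Hypothetical trailer fragments (assumption: made up for this example).
$strictTrailer = "trailer\n<< /Size 22 >>\nstartxref\n18799\n%%EOF\n";
$spacedTrailer = "trailer\n<< /Size 22 >>\nstartxref\n 18799\n%%EOF\n"; // space before the offset

/**
 * Hypothetical helper: returns every startxref byte offset found by $pattern.
 */
function findStartxrefOffsets(string $pattern, string $pdfData, int $offset = 0): array
{
    $count = preg_match_all($pattern, $pdfData, $matches, PREG_SET_ORDER, $offset);
    if ($count === false || $count === 0) {
        return [];
    }

    $offsets = [];
    foreach ($matches as $match) {
        $offsets[] = (int) $match[1]; // capture group 1 holds the digits
    }

    return $offsets;
}

var_dump(findStartxrefOffsets($currentPattern, $strictTrailer));  // finds 18799
var_dump(findStartxrefOffsets($currentPattern, $spacedTrailer));  // finds nothing
var_dump(findStartxrefOffsets($proposedPattern, $spacedTrailer)); // finds 18799
```

Note that `[\s]*` accepts any whitespace at that position, including tabs and additional line breaks, not just a single space. If that turns out to be too permissive, something narrower such as `[ \t]*` might be an option, but I have not tested that against real files.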
