
Embedded PNG image cannot be parsed #749

@cyril-fsm

  • PHP Version: 7.4.11
  • PDFParser Version: 2.11

Description:

Hello, I am trying to parse a PDF that contains PNG images, but I can't extract them. The image content seems to be corrupted, which makes zlib_decode() fail.
Thanks for your help!
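One way to narrow down the zlib_decode() failure is to check whether the bytes returned by getContent() still start with a zlib header (RFC 1950). If they do not, the stream may already have been decompressed during parsing (an assumption worth verifying against the PdfParser version in use), or it may not be Flate data at all. The sketch below is only a heuristic; looks_like_zlib() and maybe_inflate() are hypothetical helpers, not part of PdfParser.

```php
<?php
/**
 * Heuristic (RFC 1950): does $data start with a valid 2-byte zlib header?
 * CMF low nibble must be 8 (deflate), CMF high nibble (window size) <= 7,
 * and CMF * 256 + FLG must be a multiple of 31.
 * Random bytes pass by chance only rarely, so a false result suggests the
 * data is not (or is no longer) zlib-compressed.
 */
function looks_like_zlib(string $data): bool
{
    if (strlen($data) < 2) {
        return false;
    }
    $cmf = ord($data[0]);
    $flg = ord($data[1]);

    $methodIsDeflate = ($cmf & 0x0F) === 8;
    $windowOk        = ($cmf >> 4) <= 7;
    $checksumOk      = ((($cmf << 8) | $flg) % 31) === 0;

    return $methodIsDeflate && $windowOk && $checksumOk;
}

/**
 * Inflate only when the header looks like zlib; otherwise return the bytes
 * unchanged (assumed to be already decoded). Returns null if zlib_decode()
 * still fails on data that looked like zlib.
 */
function maybe_inflate(string $data): ?string
{
    if (!looks_like_zlib($data)) {
        return $data;
    }
    $out = @zlib_decode($data); // suppress the warning, check the result instead
    return $out === false ? null : $out;
}
```

Returning the input unchanged when no header is found is a design choice for this sketch; a stricter version could treat that case as an error instead.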

PDF input

https://telechargements.soludedia.fr/divers/LF6451.pdf

Expected output & actual output

Code

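if ($f = file_get_contents('file.pdf')) {
	require_once "./inc/pdfparser-2024/alt_autoload.php";

	$parser   = new \Smalot\PdfParser\Parser();
	$document = $parser->parseContent($f);
	$pages    = $document->getPages();
	$content  = $pages[0]->getText(); // text of the first page

	$images = $document->getObjectsByType('XObject', 'Image');
	foreach ($images as $index => $image) {
		$img_content = $image->getContent();
		$elements    = $image->getHeader()->getElements();
		$details     = $image->getHeader()->getDetails(); // image dictionary entries

		// Only handle images whose filter is FlateDecode (avoids an undefined index when there is no Filter).
		if (isset($elements['Filter']) && 'FlateDecode' === $elements['Filter']->getContent()) {
			// This is the call that fails: zlib_decode() emits a warning and returns false.
			$decoded = zlib_decode($img_content);
			if (false === $decoded) {
				echo "Image $index: zlib_decode() failed\n";
				continue;
			}
			echo '<img src="data:image/png;base64,' . base64_encode($decoded) . '" />';
		}
	}
}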
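Even when decompression succeeds, the result is unlikely to be a PNG file: a FlateDecode image XObject stores raw pixel samples, and the dimensions and color layout live in the image dictionary (Width, Height, BitsPerComponent, ColorSpace). Base64-encoding the inflated bytes as data:image/png therefore will not produce a viewable image. Below is a minimal sketch that wraps such samples in a valid PNG container, assuming 8-bit DeviceGray or DeviceRGB samples with no alpha (SMask) and no Decode array; raw_samples_to_png() is a hypothetical helper, not a PdfParser API.

```php
<?php
/**
 * Wrap raw 8-bit samples (already decompressed) in a minimal PNG file.
 *
 * Assumptions: BitsPerComponent = 8, 1 channel (DeviceGray) or
 * 3 channels (DeviceRGB), no alpha. If the PDF used
 * /DecodeParms << /Predictor 10..15 >> with Columns equal to Width, every row
 * already starts with a PNG filter-type byte: pass $rowsHaveFilterByte = true.
 */
function raw_samples_to_png(
    string $samples,
    int $width,
    int $height,
    int $channels,
    bool $rowsHaveFilterByte = false
): string {
    $colorTypes = [1 => 0, 3 => 2]; // PNG color type: 0 = grayscale, 2 = RGB
    if (!isset($colorTypes[$channels])) {
        throw new InvalidArgumentException("Unsupported channel count: $channels");
    }
    $rowBytes = $width * $channels;
    $stride   = $rowBytes + ($rowsHaveFilterByte ? 1 : 0);
    if (strlen($samples) !== $stride * $height) {
        throw new InvalidArgumentException('Sample data does not match width/height/channels');
    }

    // PNG scanline = one filter-type byte (0 = None) followed by the row bytes.
    if ($rowsHaveFilterByte) {
        $scanlines = $samples;
    } else {
        $scanlines = '';
        for ($y = 0; $y < $height; $y++) {
            $scanlines .= "\x00" . substr($samples, $y * $rowBytes, $rowBytes);
        }
    }

    // Chunk layout: 4-byte length, 4-byte type, data, CRC-32 over type . data.
    $chunk = static function (string $type, string $data): string {
        return pack('N', strlen($data)) . $type . $data . pack('N', crc32($type . $data));
    };

    // IHDR: width, height, bit depth 8, color type, compression, filter, interlace.
    $ihdr = pack('NNCCCCC', $width, $height, 8, $colorTypes[$channels], 0, 0, 0);

    return "\x89PNG\r\n\x1a\n"
        . $chunk('IHDR', $ihdr)
        . $chunk('IDAT', gzcompress($scanlines)) // IDAT holds a zlib stream
        . $chunk('IEND', '');
}
```

Inside the loop from the report, a call could look like `raw_samples_to_png($img_content, (int) $details['Width'], (int) $details['Height'], 3)`, assuming getDetails() exposes Width and Height and the image is DeviceRGB. Indexed, CMYK, 16-bit or masked images would need extra handling.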
