
Unable to get all images from a page #2714

Answered by JorjMcKie
mahdyshabeeb asked this question in Q&A

There may be several reasons for this:

  • The page contains "inline" images. They have no xref and exist only inside the page appearance source (/Contents objects). You can still extract them via page.get_text("dict") (or "rawdict").
  • The page contains vector graphics. The more complex ones may look like images, but they aren't, and they also have no xref. You can extract those via page.get_drawings().
  • Some PDF creators use an image instead of a fill color for annotations (e.g. buttons). These images do not appear in page.get_images(), although they do have an xref. We don't yet support locating them.
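
To see which of these cases applies to a given page, it can help to count what each of the three calls reports. The following is only a minimal sketch, assuming PyMuPDF is installed and importable as fitz; the helper names image_blocks, summarize_page and summarize_pdf are made up for illustration and are not part of PyMuPDF.

```python
def image_blocks(page_dict):
    """Return the image blocks from a page.get_text("dict") result.

    Hypothetical helper: in the dict output, text blocks have
    type 0 and image blocks have type 1.
    """
    return [b for b in page_dict.get("blocks", []) if b.get("type") == 1]


def summarize_page(page):
    """Count the graphics each extraction call reports for one page.

    Hypothetical helper; `page` is expected to be a PyMuPDF Page.
    """
    xref_images = page.get_images(full=True)           # images that have an xref
    shown_images = image_blocks(page.get_text("dict"))  # images shown on the page, inline ones included
    drawings = page.get_drawings()                      # vector graphics paths
    return {
        "xref_images": len(xref_images),
        "image_blocks": len(shown_images),
        "drawings": len(drawings),
    }


def summarize_pdf(path):
    """Summarize every page of the PDF at `path` (hypothetical helper)."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    with fitz.open(path) as doc:
        return [summarize_page(page) for page in doc]
```

For example, summarize_pdf("input.pdf") (the file name is a placeholder) returns one dict per page. If image_blocks is larger than xref_images, the page probably contains inline images. A page that looks like it has pictures but reports only drawings is showing vector graphics. Each image block also carries the binary data under "image" and the file extension under "ext", so it can be written to disk directly.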

Answer selected by mahdyshabeeb

This discussion was converted from issue #2712 on October 03, 2023 20:41.