Image in Table Preventing Table Extraction #3585
Hello, on page 7 of the attached PDF, you will notice that two small images with links appear in the table whose first column is "Last Raced". When I run the extraction, I get an error message indicating that a table doesn't exist. I don't have this issue with other pages of the same document (using a slightly different clip) that do not contain a small image in the table. Any thoughts on how to extract the table with the first column titled "Last Raced"? I tried creating a new PDF with the link removed, but PyMuPDF still did not grab the table. Thanks!
Replies: 1 comment 6 replies
Sorry, this is unfortunately one of the cases where PyMuPDF's table recognition fails: there are no grid lines whatsoever, the word positioning information is unclear, etc.
Certainly possible, but more complex. It would be easier if you knew a list of text alternatives and could check for them.
But you can extract all bold text spans and fight your way through this jungle.
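A minimal sketch of the bold-span idea, in case it helps. It assumes the page has already been converted with `page.get_text("dict")` and that bold text is marked by bit 4 of each span's `flags` value. The helper names `bold_spans` and `bold_spans_on_page` are made up for this illustration and are not part of PyMuPDF.

```python
BOLD_FLAG = 1 << 4  # bit 4 of a span's "flags" marks bold text in PyMuPDF


def bold_spans(page_dict):
    """Return (text, bbox) for every non-empty bold span in a page dict.

    `page_dict` is expected to have the shape returned by
    page.get_text("dict"): blocks -> lines -> spans.
    """
    found = []
    for block in page_dict.get("blocks", []):
        # Image blocks carry no "lines" key, so they are skipped here.
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                text = span.get("text", "").strip()
                if text and span.get("flags", 0) & BOLD_FLAG:
                    found.append((text, tuple(span.get("bbox", ()))))
    return found


def bold_spans_on_page(pdf_path, page_index):
    """Hypothetical convenience wrapper: open a PDF and scan one page."""
    import fitz  # PyMuPDF; imported lazily so bold_spans() works without it

    with fitz.open(pdf_path) as doc:
        return bold_spans(doc[page_index].get_text("dict"))
```

For page 7 of a file, the call would be something like `bold_spans_on_page("chart.pdf", 6)`, where `chart.pdf` is a placeholder name and the page index is zero-based. The returned bounding boxes can then be sorted by their coordinates to guess at rows and columns. Note that the bold flag depends on the font information in the PDF, so some files may need a fallback such as checking whether the span's `font` name contains "Bold".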