I am unable to extract any type of text from my document #3858
LSUCDS asked this question in Looking for help
Answered by JorjMcKie, Sep 11, 2024
If page.get_text() returns no (or only white) text, then all you have is a number of heuristics to determine the situation, for example: page.first_annot is None.

Just saw that you determined an image is covering your page. To OCR a page, do this:

    page = doc[pno]  # load page with 0-based number pno
    tp = page.get_textpage_ocr(full=True, dpi=150)  # execute OCR, store results in the textpage
    # now start extracting text, but *ALWAYS* refer to the textpage!!!
    text = page.get_text(textpage=tp)  # for example
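
The two steps in the answer (check whether ordinary extraction yields anything, then fall back to OCR) could be combined roughly as follows. This is only a sketch, not the answerer's code: the helper names `needs_ocr`, `extract_text` and `extract_document` are made up for illustration, the empty-text check is just one possible heuristic, and OCR is assumed to have a working Tesseract installation behind it. Older PyMuPDF releases are imported as `fitz` rather than `pymupdf`.

```python
def needs_ocr(page) -> bool:
    """Heuristic (an assumption, not the only possible test):
    True if normal extraction yields nothing but whitespace."""
    return not page.get_text().strip()


def extract_text(page, dpi: int = 150) -> str:
    """Return the page text, running OCR only when the text layer is empty."""
    if not needs_ocr(page):
        return page.get_text()
    # Execute OCR on the full page and keep the results in a textpage.
    tp = page.get_textpage_ocr(full=True, dpi=dpi)
    # Always hand the OCR textpage to the extraction call.
    return page.get_text(textpage=tp)


def extract_document(path: str) -> list[str]:
    """Extract text from every page of the file at `path` (hypothetical helper)."""
    import pymupdf  # PyMuPDF; imported here so the helpers above work without it

    with pymupdf.open(path) as doc:
        return [extract_text(page) for page in doc]
```

Calling `extract_document("scan.pdf")` (the file name is a placeholder) would return one string per page. OCR is slow, so running it only on pages that yield no text keeps ordinary documents fast.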
Answer selected by LSUCDS