page.find_tables how to extract words in table cell #3768
Unanswered
wangqiangJN
asked this question in
Looking for help
Replies: 1 comment
This feature is already there: A minor issue may arise when the table has very narrow cell borders. Then the table finder might identify cell content that technically is not completely inside a cell. The general text extraction is very strict and will discard everything not completely inside the rectangle.
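A minimal sketch of one way to get per-word boxes for each cell, not necessarily the snippet the reply refers to. It extracts the page's words once with `page.get_text("words")` and assigns each word to a cell by its center point, which sidesteps the strict "completely inside" rule when borders are narrow. The helper names `words_in_cell` and `cell_words`, the `tolerance` value, and the file name are made up for illustration. The `delimiters` argument (to split `price:520` at the colon) is assumed to be available, which should hold for PyMuPDF 1.23.5 and later; please check it against your installed version.

```python
def words_in_cell(words, cell, tolerance=1.0):
    """Return (text, bbox) for every word whose center lies inside `cell`.

    `words` are tuples shaped like the output of page.get_text("words"):
    (x0, y0, x1, y1, text, block_no, line_no, word_no).
    `cell` is an (x0, y0, x1, y1) tuple; `tolerance` (hypothetical default)
    grows the cell slightly on every side.
    """
    x0, y0, x1, y1 = cell
    hits = []
    for w in words:
        cx = (w[0] + w[2]) / 2  # horizontal center of the word box
        cy = (w[1] + w[3]) / 2  # vertical center of the word box
        if (x0 - tolerance <= cx <= x1 + tolerance
                and y0 - tolerance <= cy <= y1 + tolerance):
            hits.append((w[4], tuple(w[:4])))
    return hits


def cell_words(page, delimiters=":"):
    """Yield (table_index, cell_bbox, [(text, bbox), ...]) for each table cell.

    `page` is a PyMuPDF page. `delimiters` is passed to get_text so that
    the given characters act as extra word separators (assumption: the
    installed PyMuPDF version supports this argument).
    """
    words = page.get_text("words", delimiters=delimiters)
    for t_idx, table in enumerate(page.find_tables().tables):
        for cell in table.cells:
            if cell is None:  # skip entries without a bounding box
                continue
            yield t_idx, cell, words_in_cell(words, cell)


# Usage sketch (requires PyMuPDF; "input.pdf" is a placeholder file name):
# import pymupdf
# doc = pymupdf.open("input.pdf")
# for t_idx, cell, hits in cell_words(doc[0]):
#     for text, bbox in hits:
#         print(t_idx, cell, text, bbox)
```

Matching on the word center instead of full containment is a design choice: a word that pokes slightly past a thin border still lands in the cell it visually belongs to.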
Is your feature request related to a problem? Please describe.
PyMuPDF version 1.24.9
1. I want to parse my PDF, which contains tables and other text.
2. When I use page.find_tables, it extracts the text in each cell, but when a cell contains multiple words they come back as a single string. For example, given the cell
price:520 people:bob
the expected result, word by word, is:
price
520
people
bob
but the current result is:
price:520 people:bob
3. So I want to split the table cell content into words and get the bbox of each word. Is there a function or method for this?