Header names seem to be trimmed #31

Closed
rdifalco opened this issue Aug 22, 2017 · 8 comments

@rdifalco

If I have a CSV with a header row, the header names seem to be trimmed. For example:

"Padded ,Unpadded\nOne,Two"

The above will produce a map with the key "Padded" instead of "Padded ".

@cowtowncoder
Member

Yes, header names are trimmed using the same rules as values. To retain spaces, it would be necessary to either double-quote the contents or use an escape mechanism (if enabled via CsvSchema).
My understanding is that this is what the CSV specification (loose as it is) suggests.

@rdifalco
Author

That's a bummer. AWS writes its detailed billing files with Tags as headers. Tags can have leading or trailing spaces, so " Header" != "Header " != "Header". The Apache CSV parser handles this correctly (as do most Python ones). I love the Jackson parser, but there seems to be no way to make it work for this case. I did code up a crazy solution: read String[] rows, have the first row create a header Map<String,Index>, and then fake it into thinking it was iterating Map<String,String> instead of String[]. But it's a lot of code just to work around this issue.

I really believe that your interpretation of the CSV spec is correct. The problem is that no one else's is. Also a flag just to remove that hard coded trim would be pretty awesome.
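
For readers who want to picture the workaround described above, here is a minimal sketch. It is not the author's actual code: the class and method names are hypothetical, and it assumes the rows have already been parsed into String[] by some reader that does not trim anything. The first row becomes a header-to-index map, and each later row is presented as a Map<String,String> keyed by the untrimmed header.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Jackson: turns pre-parsed String[] rows
// into Map records while keeping header names exactly as they were read.
public class UntrimmedHeaderRows {

    // Maps each header name (untrimmed) to its column index.
    static Map<String, Integer> headerIndex(String[] headerRow) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (int i = 0; i < headerRow.length; i++) {
            index.put(headerRow[i], i);
        }
        return index;
    }

    // Treats the first row as the header and every later row as a record.
    // Columns missing from a short row are mapped to null.
    static List<Map<String, String>> toMaps(List<String[]> rows) {
        List<Map<String, String>> result = new ArrayList<>();
        if (rows.isEmpty()) {
            return result;
        }
        Map<String, Integer> index = headerIndex(rows.get(0));
        for (String[] row : rows.subList(1, rows.size())) {
            Map<String, String> record = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> e : index.entrySet()) {
                int col = e.getValue();
                record.put(e.getKey(), col < row.length ? row[col] : null);
            }
            result.add(record);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Padded ", "Unpadded"});
        rows.add(new String[] {"One", "Two"});
        System.out.println(toMaps(rows));
    }
}
```

With this shape, " Header", "Header " and "Header" stay distinct keys, which is what the billing-file case needs.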

@cowtowncoder
Member

There is CsvParser.Feature.TRIM_SPACES, which you can disable, although it affects headers and values the same way.

Would it also make sense to file an issue against AWS? (Regardless of whether Jackson can work with this.)
White-space that is not enclosed in quotes (or escaped) is really not handled properly when writing CSV, so they are producing something that does not inter-operate well.
This is assuming I understand your issue correctly.
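
To illustrate the point about writers, here is a rough sketch of how a CSV writer could protect leading and trailing whitespace by quoting such fields. This is plain Java with hypothetical names, not Jackson's or AWS's code, and it only covers the common quoting convention (wrap in double quotes, double any embedded quote).

```java
// Hypothetical writer-side helper: quotes fields whose edge whitespace
// would otherwise be at risk of being trimmed by a reader.
public class CsvFieldQuoting {

    // Quotes a field when it has leading/trailing whitespace or contains
    // a comma, a double quote, or a line break; embedded quotes are doubled.
    static String quoteIfNeeded(String field) {
        boolean edgeSpace = !field.isEmpty()
                && (Character.isWhitespace(field.charAt(0))
                    || Character.isWhitespace(field.charAt(field.length() - 1)));
        boolean special = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!edgeSpace && !special) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the fields of one row with commas, quoting where needed.
    static String writeRow(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quoteIfNeeded(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeRow("Padded ", "Unpadded"));
    }
}
```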

@rdifalco
Author

I could file an issue with AWS but I doubt they'll fix it.

As far as I can tell, disabling CsvParser.Feature.TRIM_SPACES will only fix values. Headers are always trimmed in CsvParser. It's hard-coded.

@rdifalco
Author

@rdifalco
Author

Actually, it looks like the above line will trim it even if it is enclosed in quotes.

@cowtowncoder
Member

Hmmh. Ok, that seems wrong then. I do think enclosed spaces should be retained.

cowtowncoder added 2.10 and removed 2.9 labels Oct 5, 2019
cowtowncoder added 2.13 and removed 2.10 labels Nov 10, 2020
cowtowncoder removed the 2.13 label Apr 22, 2023
cowtowncoder added a commit that referenced this issue Jan 5, 2025
cowtowncoder added a commit that referenced this issue Jan 5, 2025
cowtowncoder added this to the 2.19.0 milestone Jan 5, 2025
cowtowncoder changed the title from "Headers seems to be trimmed" to "Header names seem to be trimmed" Jan 5, 2025
@cowtowncoder
Member

@rdifalco Ok, this took forever, but I finally came back and implemented it with the addition of CsvParser.Feature.TRIM_HEADER_SPACES (default: true); disabling it leaves spaces in place.
This is probably way too late for your benefit but thank you for suggesting it!
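
For anyone landing here later, a minimal sketch of how the new feature might be used. It assumes jackson-dataformat-csv 2.19 or later on the classpath; the class name and the `read` helper are made up for illustration, and the input is the string from the original report.

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;

// Illustrative only: requires jackson-dataformat-csv 2.19+ (assumption).
public class KeepHeaderSpaces {

    // Reads CSV with a header row into maps, with header trimming disabled.
    static List<Map<String, String>> read(String csv) {
        CsvMapper mapper = CsvMapper.builder()
                .disable(CsvParser.Feature.TRIM_HEADER_SPACES)
                .build();
        // Take column names from the first row of the input.
        CsvSchema schema = CsvSchema.emptySchema().withHeader();
        try (MappingIterator<Map<String, String>> it =
                     mapper.readerFor(Map.class).with(schema).readValues(csv)) {
            return it.readAll();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Per the comment above, the first key should keep its trailing space.
        System.out.println(read("Padded ,Unpadded\nOne,Two"));
    }
}
```

Since the feature defaults to true, existing code keeps the old trimming behavior unless it opts out like this.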
