Handling white spaces in journal names. #119
Comments
I haven't looked at the code for this specifically, but yeah, some sort of solution is needed. I forget how journal names are identified (I think a regex?). In general, it's easier to tweak our journal/statute/citation-specific regex than it is to do things like whitespace stripping (which tends to be less granular).
Maybe I could go through the regex and replace " L. Rev." with "\s?L\.\s*Rev\.". This would allow a match whether or not there is a space before the "L.", and whenever "L." and "Rev." are separated by nothing or only white space. It would not help journal names with missing spaces unless they contain "L. Rev." (e.g. "Admin. L.J. Am. U." would still match, but "Admin.L.J.Am.U." would not), but it would at least be a step in the right direction.
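To make the proposed substitution concrete, here is a minimal sketch in Python. It does not use Eyecite's actual journal data or regex-building code; the patterns `HARV_L_REV` and `ADMIN_LJ` are hypothetical stand-ins for however journal names end up in the regex. The dots are escaped so that each one matches only a literal period.

```python
import re

# Whitespace-tolerant stand-in for the literal " L. Rev."; the dots are escaped
# so they match only a period, not any character.
L_REV = r"\s?L\.\s*Rev\."

# Hypothetical journal patterns, for illustration only.
HARV_L_REV = re.compile(r"Harv\." + L_REV)
ADMIN_LJ = re.compile(r"Admin\. L\.J\. Am\. U\.")  # no "L. Rev.", so left as-is


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


# Every spacing variant of "Harv. L. Rev." is found by the loosened pattern.
harv_variants = [
    "93 Harv. L. Rev. 752",
    "93 Harv.L.Rev. 752",
    "93 Harv. L.Rev. 752",
    "93 Harv.L. Rev. 752",
]
harv_results = [matches(HARV_L_REV, s) for s in harv_variants]

# Journals without "L. Rev." gain nothing from this change.
admin_spaced = matches(ADMIN_LJ, "Admin. L.J. Am. U.")
admin_unspaced = matches(ADMIN_LJ, "Admin.L.J.Am.U.")
```

As the last two checks suggest, the loosened fragment only helps names that contain "L. Rev."; other journals would still need their own handling.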
@flooie Can you take over review on this one, please? (Sorry @bbernicker I just know he'll have better opinions on this codebase.)
While testing Eyecite today, I noticed that some citations to law reviews in my dataset are missing a space within "L.Rev.", between the name of the law review and "L.Rev." or "L. Rev.", or both. For example, Strickland v. Washington cites 58 N.Y.U.L.Rev. 299; 83 Colum.L.Rev. 1544; 93 Harv.L.Rev. 752; and 50 U.Chi.L.Rev. 138.
I was curious whether ignoring white space in the names of journals (and maybe reporters and laws, for that matter) would help improve detection (especially with OCR'd files). Alternatively, does it make sense to specify alternative versions of journal names without some or all of their white space? Or else to change "L.Rev." to "L. Rev." in Eyecite's clean module?
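For the last option, a cleaning step along these lines might work. This is only a sketch: `normalize_l_rev` is a hypothetical helper, not an existing Eyecite function, and it assumes the cleaning stage can run an arbitrary string-to-string function before citations are extracted.

```python
import re

# Matches "L.Rev." or "L. Rev." that directly follows an abbreviation period,
# with or without white space in between.
_L_REV = re.compile(r"(?<=\.)\s*L\.\s*Rev\.")


def normalize_l_rev(text: str) -> str:
    """Rewrite every spacing variant of 'L.Rev.' to the canonical ' L. Rev.'."""
    return _L_REV.sub(" L. Rev.", text)


# The citations from Strickland v. Washington mentioned above.
cleaned = [
    normalize_l_rev(c)
    for c in (
        "58 N.Y.U.L.Rev. 299",
        "83 Colum.L.Rev. 1544",
        "93 Harv.L.Rev. 752",
        "50 U.Chi.L.Rev. 138",
    )
]
```

Note that this only restores the spaces around "L. Rev."; a name such as "U.Chi." would still be missing its internal space, so it would match only if the journal list also tolerates that form.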