Fix court string matching with whitespace #144

mattdahl · 2023-02-23T22:56:31Z

As discussed in #135 (comment), there is presently a bug where court strings without whitespace are not properly matched. 3b2fe09 implements a failing test for this bug. 94b1e2f implements a simple fix.
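A minimal sketch of the mismatch described here, assuming matching trims punctuation from the ends of each string and then compares prefixes. The helper names and the sample strings "2d Cir." and "2dCir." are illustrative assumptions, not taken from the project's code or tests.

```python
import string

EDGE_CHARS = string.whitespace + string.punctuation


def trim_edges(text: str) -> str:
    """Hypothetical helper: strip punctuation/whitespace from both ends only."""
    return text.strip(EDGE_CHARS)


def normalize(text: str) -> str:
    """Hypothetical helper: trim the ends, then drop interior spaces."""
    return trim_edges(text).replace(" ", "")


stored = "2d Cir."  # assumed form of a stored citation string
cited = "2dCir."    # the same court written without whitespace

# Comparing after trimming only the ends fails because of the interior space.
matches_before = trim_edges(stored).startswith(trim_edges(cited))

# Removing spaces from both sides makes the two strings comparable.
matches_after = normalize(stored).startswith(normalize(cited))

print(matches_before, matches_after)
```

Normalizing both sides the same way is what matters in this sketch; stripping spaces from only one of the two strings would leave the comparison failing.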

This PR is also related to the changes proposed in #129, but I think that that proposal has been made obsolete with the removal of all the duplicate citation strings by @flooie (#135 (comment)). In any case, this PR addresses a different problem re whitespace.

Note that this PR is based off of #143 (needed to update black to make GitHub Actions happy), so that should be merged first.

mlissner

LGTM.

My understanding is that in #135 (comment), @flooie eliminated duplicate abbreviations from courts-db, so now it's safe to do things like this.

@flooie, will you give this a quick review too? The only changes you need to look at are in resolve.py; the rest is updating Black.

Assuming this is good, we need to return to:

Court name issues #129 (might be obsolete)
Court name errors #128 (might be fixed)
Getting full citation span #135 (don't we already support this, regardless of the code here?)

mlissner · 2023-07-06T12:58:58Z

@flooie looks like this one never got your approval. Mind taking a look, please?

mattdahl · 2023-07-06T17:13:34Z

Want me to rebase this?

mlissner · 2023-07-06T17:46:16Z

That'd be great, thanks @mattdahl

mattdahl · 2023-09-22T19:53:16Z

Thanks for merging that other PR, @flooie. I just rebased this one as well.

N.B., I previously suggested that #129 had been made obsolete by intervening changes. This is false. More notes over there.

flooie · 2023-09-22T19:53:59Z

Thanks. I'll take a look soon.

quevon24 · 2024-12-07T02:08:18Z

I added a few lines because, if you test it with "Commonwealth v. Muniz, 164 A.3d 1189 (Pa. 2017)" as mentioned in the issue, it now returns "njcirctpassaic" instead of "pa". This happens because the first court in the list returned by courts-db that starts with "Pa" is "Passaic Cty. Cir. Ct., N.J."

I added a check that looks for an exact match before trying the startswith approach. For this specific case, this approach works.
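The behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `SAMPLE_COURTS` is a hypothetical two-entry stand-in for the courts-db list, the "Pa." citation string for the `pa` court is assumed, and `normalize`, `first_prefix_match`, and `exact_then_prefix` are made-up names.

```python
import string
from typing import Optional

# Hypothetical stand-in for the courts-db list, in the problematic order.
SAMPLE_COURTS = [
    {"id": "njcirctpassaic", "citation_string": "Passaic Cty. Cir. Ct., N.J."},
    {"id": "pa", "citation_string": "Pa."},  # assumed citation string
]


def normalize(text: str) -> str:
    """Assumed normalization: trim edge punctuation, then drop spaces."""
    return text.strip(string.whitespace + string.punctuation).replace(" ", "")


def first_prefix_match(court_str: str) -> Optional[str]:
    """Return the first court whose string starts with the query."""
    query = normalize(court_str)
    if not query:
        return None
    for court in SAMPLE_COURTS:
        if normalize(court["citation_string"]).startswith(query):
            return court["id"]
    return None


def exact_then_prefix(court_str: str) -> Optional[str]:
    """Look for an exact match first; fall back to the prefix search."""
    query = normalize(court_str)
    if not query:
        return None
    for court in SAMPLE_COURTS:
        if normalize(court["citation_string"]) == query:
            return court["id"]
    return first_prefix_match(court_str)


print(first_prefix_match("Pa."))  # prefix-only lookup
print(exact_then_prefix("Pa."))   # exact match takes priority
```

In this sketch the prefix-only lookup stops at the Passaic entry because it comes first in the list, while the exact-match pass reaches the "Pa." entry regardless of ordering.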

mattdahl · 2024-12-08T16:59:07Z

Thanks, I agree that looking for an exact match first is good (cf. #129 (comment)).

But can we refactor this so we only iterate over courts once?

Something like:

def get_court_by_paren(paren_string: str) -> Optional[str]:
    """Takes the citation string, usually something like "2d Cir", and maps
    that back to the court code.

    Does not work on SCOTUS, since that court lacks parentheticals, and
    needs to be handled after disambiguation has been completed.
    """
    court_str = strip_punct(paren_string)
    court_str = court_str.replace(" ", "")

    court_code = None
    if court_str:
        for court in courts:
            s = strip_punct(court["citation_string"]).replace(" ", "")

            # Check for an exact match first
            if s == court_str:
                return court["id"]

            # If no exact match, try to record a startswith match for a
            # possible eventual return
            if s.startswith(court_str):
                court_code = court["id"]

        return court_code
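One property of the single-pass version is worth spelling out: because the loop does not stop at a startswith match, the fallback ends up being the last prefix match in the list rather than the first. The sketch below exercises a copy of the suggested function (with the final return moved to function level, which does not change behavior) against stand-ins. The `strip_punct` body and the three-entry `courts` list are assumptions for illustration only, not the library's implementation or data.

```python
import string
from typing import Optional

# Hypothetical stand-in for the courts-db list; ids and strings are assumed.
courts = [
    {"id": "njcirctpassaic", "citation_string": "Passaic Cty. Cir. Ct., N.J."},
    {"id": "pa", "citation_string": "Pa."},
    {"id": "pasuperct", "citation_string": "Pa. Super. Ct."},
]


def strip_punct(text: str) -> str:
    """Assumed stand-in: trim punctuation and whitespace from both ends."""
    return text.strip(string.whitespace + string.punctuation)


def get_court_by_paren(paren_string: str) -> Optional[str]:
    court_str = strip_punct(paren_string).replace(" ", "")

    court_code = None
    if court_str:
        for court in courts:
            s = strip_punct(court["citation_string"]).replace(" ", "")

            # An exact match returns immediately.
            if s == court_str:
                return court["id"]

            # A prefix match is only recorded; later matches overwrite it.
            if s.startswith(court_str):
                court_code = court["id"]

    return court_code


print(get_court_by_paren("Pa."))      # exact match on the second entry
print(get_court_by_paren("Pa. Sup"))  # single prefix candidate
print(get_court_by_paren("P"))        # several prefix candidates
```

With this sample list, the query "P" matches all three entries by prefix and the function returns the last of them. If returning the first prefix match is preferred, the assignment could be guarded with `if court_code is None`.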

quevon24 · 2024-12-11T17:32:56Z


Great, thanks. It looks more readable; I'll push this change.

quevon24 · 2024-12-12T02:18:21Z

There is a new PR for all checks to pass correctly: #188

mlissner approved these changes Feb 24, 2023

flooie self-assigned this Jul 6, 2023

mattdahl force-pushed the issue-135-fix-court-string-matching branch from 94b1e2f to 96dcc84 Compare July 6, 2023 17:56

mattdahl added 2 commits September 22, 2023 14:57

test(find): Adds failing test for court string without space (freelaw…

1f408cc

…project#135).

fix(find): Strips whitespace from court strings for matching.

c9b4d78

mattdahl force-pushed the issue-135-fix-court-string-matching branch 2 times, most recently from 2d0f7cb to c9b4d78 Compare September 22, 2023 19:49

mattdahl mentioned this pull request Sep 22, 2023

Court name issues #129

Open

Merge branch 'main' into issue-135-fix-court-string-matching

37f237f

flooie closed this Jul 18, 2024

flooie reopened this Jul 18, 2024

fix(get_court_by_paren): try with an exact match

9c6b387

add test

quevon24 added 4 commits December 11, 2024 11:33

fix(get_court_by_paren): try with an exact match

deac1a7

fix(get_court_by_paren): try with an exact match

33c997c

fix(get_court_by_paren): fix comment line length

4943d05

fix(get_court_by_paren): cast to str to avoid mypy warning

d11736f
