Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix court string matching with whitespace #144

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

mattdahl
Copy link
Contributor

As discussed in #135 (comment), there is presently a bug where court strings without whitespace are not properly matched. 3b2fe09 implements a failing test for this bug. 94b1e2f implements a simple fix.

This PR is also related to the changes proposed in #129, but I think that that proposal has been made obsolete with the removal of all the duplicate citation strings by @flooie (#135 (comment)). In any case, this PR addresses a different problem re whitespace.

Note that this PR is based off of #143 (needed to update black to make GitHub Actions happy), so that should be merged first.

Copy link
Member

@mlissner mlissner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

My understanding is that in this comment #135 (comment), @flooie eliminated duplicate abbreviations from courts DB, so now it's safe to do things like this.

@flooie, wil you give this a quick review too? The only changes you need to look at are in resolve.py, the rest is updating Black.

Assuming this is good, we need to return to:

@mlissner
Copy link
Member

mlissner commented Jul 6, 2023

@flooie looks like this one never got your approval. Mind taking a look, please?

@mattdahl
Copy link
Contributor Author

mattdahl commented Jul 6, 2023

Want me to rebase this?

@flooie flooie self-assigned this Jul 6, 2023
@mlissner
Copy link
Member

mlissner commented Jul 6, 2023

That'd be great, thanks @mattdahl

@mattdahl mattdahl force-pushed the issue-135-fix-court-string-matching branch from 94b1e2f to 96dcc84 Compare July 6, 2023 17:56
@mattdahl mattdahl force-pushed the issue-135-fix-court-string-matching branch 2 times, most recently from 2d0f7cb to c9b4d78 Compare September 22, 2023 19:49
@mattdahl
Copy link
Contributor Author

Thanks for merging that other PR, @flooie. I just rebased this one as well.

N.B., I previously suggested that #129 had been made obsolete by intervening changes. This is false. More notes over there.

@flooie
Copy link
Contributor

flooie commented Sep 22, 2023

Thanks. I'll take a look soon.

@mattdahl mattdahl mentioned this pull request Sep 22, 2023
@flooie flooie closed this Jul 18, 2024
@flooie flooie reopened this Jul 18, 2024
@quevon24
Copy link
Member

quevon24 commented Dec 7, 2024

I added a few lines because if you test it with "Commonwealth v. Muniz, 164 A.3d 1189 (Pa. 2017)" as mentioned in the issue, it now returns "njcirctpassaic" instead of "pa", this happens because the first court in the list returned by courts-db that starts with "Pa" is "Passaic Cty. Cir. Ct., N.J."

I added a check to look for an exact match before try the startswith approach, for this specific case this approach works.

@mattdahl
Copy link
Contributor Author

mattdahl commented Dec 8, 2024

Thanks, I agree that looking for an exact match first is good (cf. #129 (comment)).

But can we refactor this so we only iterate over courts once?

Something like:

def get_court_by_paren(paren_string: str) -> Optional[str]:
    """Takes the citation string, usually something like "2d Cir", and maps
    that back to the court code.

    Does not work on SCOTUS, since that court lacks parentheticals, and
    needs to be handled after disambiguation has been completed.
    """
    court_str = strip_punct(paren_string)
    court_str = court_str.replace(" ", "")

    court_code = None
    if court_str:
        for court in courts:
            s = strip_punct(court["citation_string"]).replace(" ", "")
    
            # Check for an exact match first
            if s == court_str:
                return court["id"]
    
            # If no exact match, try to record a startswith match for possible eventual return
            if s.startswith(court_str):
                court_code = court["id"]
    
        return court_code

@quevon24
Copy link
Member

Thanks, I agree that looking for an exact match first is good (cf. #129 (comment)).

But can we refactor this so we only iterate over courts once?

Something like:

def get_court_by_paren(paren_string: str) -> Optional[str]:
    """Takes the citation string, usually something like "2d Cir", and maps
    that back to the court code.

    Does not work on SCOTUS, since that court lacks parentheticals, and
    needs to be handled after disambiguation has been completed.
    """
    court_str = strip_punct(paren_string)
    court_str = court_str.replace(" ", "")

    court_code = None
    if court_str:
        for court in courts:
            s = strip_punct(court["citation_string"]).replace(" ", "")
    
            # Check for an exact match first
            if s == court_str:
                return court["id"]
    
            # If no exact match, try to record a startswith match for possible eventual return
            if s.startswith(court_str):
                court_code = court["id"]
    
        return court_code

great, thanks, it looks more readable, I'll push this change

@quevon24
Copy link
Member

There is a new PR for all checks to pass correctly: #188

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🏗 In progress
Status: General Backlog
Development

Successfully merging this pull request may close these issues.

4 participants