Replies: 4 comments 7 replies
-
This is great, thanks Kevin. Could we generate and export a script each month that has these commands in it? I'm guessing that wouldn't be too hard, and then people could just run |
Beta Was this translation helpful? Give feedback.
-
Hi - This is incredibly helpful. I was able to follow these instructions up to "originating-court-information." For some reason I'm not seeing that file listed on the bulk data page for court.listener: Am I looking in the wrong place or was that not exported? |
Beta Was this translation helpful? Give feedback.
-
some sql commands need to be updated as the files have extra columns, e.g. 'courts' has a new 'parent_court_id' column |
Beta Was this translation helpful? Give feedback.
-
I 'd be happy to but I don't know what PR is |
Beta Was this translation helpful? Give feedback.
-
In this discussion i will explain in more detail how to load the bulk data and the problems you may face while loading them.
The first thing you need to do is to load the schema, this will automatically create the required database and tables. To do that simply run:
psql -f /opt/bulk/schema-2023-08-31.sql -p 5432 -U postgres
In my case my user is postgres and it is running on port 5432
You may see some errors like:
These errors are normal because some data is not dumped in the schema(functions, triggers, roles, etc), we don't pass any additional argument to psql to ignore any errors that may have occurred during the restoration process.
Now to load bulk data, first you need to load some data, this data is required to import the dockets, clusters, opinions and financial disclosures, people needs to be loaded in the order presented:
Courts:
COPY public.search_court (id, pacer_court_id, pacer_has_rss_feed, pacer_rss_entry_types, fjc_court_id, date_modified, in_use, has_opinion_scraper, has_oral_argument_scraper, position, citation_string, short_name, full_name, url, start_date, end_date, jurisdiction, notes) FROM '/opt/bulk/courts-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER);
people-db-people:
COPY public.people_db_person (id, date_created, date_modified, date_completed, fjc_id, slug, name_first, name_middle, name_last, name_suffix, date_dob, date_granularity_dob, date_dod, date_granularity_dod, dob_city, dob_state, dob_country, dod_city, dod_state, dod_country, gender, religion, ftm_total_received, ftm_eid, has_photo, is_alias_of_id) FROM '/opt/bulk/people-db-people-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER);
people-db-school
COPY public.people_db_school (id, date_created, date_modified, name, ein, is_alias_of_id) FROM '/opt/bulk/people-db-schools-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER);
people-db-positions
COPY public.people_db_position (id, date_created, date_modified, position_type, job_title,sector, organization_name, location_city, location_state,date_nominated, date_elected, date_recess_appointment,date_referred_to_judicial_committee, date_judicial_committee_action,judicial_committee_action, date_hearing, date_confirmation, date_start,date_granularity_start, date_termination, termination_reason,date_granularity_termination, date_retirement, nomination_process, vote_type,voice_vote, votes_yes, votes_no, votes_yes_percent, votes_no_percent, how_selected, has_inferred_values, appointer_id, court_id, person_id, predecessor_id, school_id, supervisor_id) FROM '/opt/bulk/people-db-positions-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER);
originating-court-information
COPY search_originatingcourtinformation (id, date_created, date_modified, docket_number, assigned_to_str, ordering_judge_str, court_reporter, date_disposed, date_filed, date_judgment, date_judgment_eod, date_filed_noa, date_received_coa, assigned_to_id, ordering_judge_id) FROM '/opt/bulk/originating-court-information-2023-09-22.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)
fjc-integrated-database
COPY public.recap_fjcintegrateddatabase (id, date_created, date_modified, dataset_source, office, docket_number, origin, date_filed, jurisdiction, nature_of_suit, title, section, subsection, diversity_of_residence, class_action, monetary_demand, county_of_residence, arbitration_at_filing, arbitration_at_termination, multidistrict_litigation_docket_number, plaintiff, defendant, date_transfer, transfer_office, transfer_docket_number, transfer_origin, date_terminated, termination_class_action_status, procedural_progress, disposition, nature_of_judgement, amount_received, judgment, pro_se,year_of_tape, nature_of_offense, version, circuit_id, district_id) FROM '/opt/bulk/fjc-integrated-database-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER);
Now you can start loading dockets, clusters, oral arguments, opinions and financial disclosures.
Docket:
COPY public.search_docket (id, date_created, date_modified, source, appeal_from_str, assigned_to_str, referred_to_str, panel_str, date_last_index, date_cert_granted, date_cert_denied, date_argued, date_reargued, date_reargument_denied, date_filed, date_terminated, date_last_filing, case_name_short, case_name, case_name_full, slug, docket_number, docket_number_core, pacer_case_id, cause, nature_of_suit, jury_demand, jurisdiction_type, appellate_fee_status, appellate_case_type_information, mdl_status, filepath_local, filepath_ia, filepath_ia_json, ia_upload_failure_count, ia_needs_upload, ia_date_first_change, view_count, date_blocked, blocked, appeal_from_id, assigned_to_id, court_id, idb_data_id, originating_court_information_id, referred_to_id) FROM '/opt/bulk/dockets-2023-08-31_0.csv' WITH (FORMAT csv, ENCODING utf8, HEADER);
Clusters: (Requires the dockets to already be loaded)
COPY public.search_opinioncluster (id, date_created, date_modified, judges, date_filed, date_filed_is_approximate, slug, case_name_short, case_name, case_name_full, scdb_id, scdb_decision_direction, scdb_votes_majority, scdb_votes_minority, source, procedural_history, attorneys, nature_of_suit, posture, syllabus, headnotes, summary, disposition, history, other_dates, cross_reference, correction, citation_count, precedential_status, date_blocked, blocked, filepath_json_harvard, docket_id, arguments, headmatter) FROM '/opt/bulk/opinion-clusters-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)
oral-arguments (Requires the dockets to already be loaded)
COPY public.audio_audio (id, date_created, date_modified, source, case_name_short, case_name, case_name_full, judges, sha1, download_url, local_path_mp3, local_path_original_file, filepath_ia, ia_upload_failure_count, duration, processing_complete, date_blocked, blocked, stt_status, stt_google_response, docket_id) FROM '/opt/bulk/oral-arguments-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)'
Opinions: (Requires the clusters to already be loaded)
COPY public.search_opinion (id, date_created, date_modified, author_str, per_curiam, joined_by_str, type, sha1, page_count, download_url, local_path, plain_text, html, html_lawbox, html_columbia, html_anon_2020, xml_harvard, html_with_citations, extracted_by_ocr, author_id, cluster_id) FROM '/opt/bulk/opinions-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER);
Financial disclosures
COPY public.disclosures_financialdisclosure (id, date_created, date_modified, year, download_filepath, filepath, thumbnail, thumbnail_status, page_count, sha1, report_type, is_amended, addendum_content_raw, addendum_redacted, has_been_extracted, person_id) FROM '/opt/bulk/financial-disclosures-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)
Now from here we can load the remaining data:
Requires clusters to already be loaded:
COPY public.search_citation (id, volume, reporter, page, type, cluster_id) FROM '/opt/bulk/citations-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)'
Requires opinions to already be loaded:
COPY public.search_opinionscited (id, depth, cited_opinion_id, citing_opinion_id) FROM '/opt/bulk/citation-map-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)'
COPY public.search_parenthetical (id, text, score, described_opinion_id, describing_opinion_id, group_id) FROM '/opt/bulk/parentheticals-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)'
Requires people to already be loaded:
COPY public.people_db_retentionevent (id, date_created, date_modified, retention_type, date_retention, votes_yes, votes_no, votes_yes_percent, votes_no_percent, unopposed, won, position_id) FROM '/opt/bulk/people-db-retention-events-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)'
COPY people_db_education (id, date_created, date_modified, degree_level, degree_detail, degree_year, person_id, school_id) FROM '/opt/bulk/people-db-educations-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)
Requires financial disclosures to already be loaded:
COPY public.disclosures_investment (id, date_created, date_modified, page_number, description, redacted, income_during_reporting_period_code, income_during_reporting_period_type, gross_value_code, gross_value_method, transaction_during_reporting_period, transaction_date_raw, transaction_date, transaction_value_code, transaction_gain_code, transaction_partner, has_inferred_values, financial_disclosure_id) FROM '/opt/bulk/financial-disclosure-investments-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)'
COPY public.disclosures_position (id, date_created, date_modified, position, organization_name,redacted, financial_disclosure_id) FROM '/opt/bulk/financial-disclosure-positions-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)
COPY public.disclosures_agreement (id, date_created, date_modified, date_raw, parties_and_terms,redacted, financial_disclosure_id) FROM '/opt/bulk/financial-disclosure-agreements-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)
COPY public.disclosures_noninvestmentincome (id, date_created, date_modified, date_raw, source_type,income_amount, redacted, financial_disclosure_id) FROM '/opt/bulk/financial-disclosure-non-investment-income-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)
COPY public.disclosures_spouseincome (id, date_created, date_modified, source_type, date_raw, redacted, financial_disclosure_id) FROM '/opt/bulk/financial-disclosure-spousal-income-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)
COPY public.disclosures_reimbursement (id, date_created, date_modified, source, date_raw, location, purpose, items_paid_or_provided, redacted, financial_disclosure_id) FROM '/opt/bulk/financial-disclosure-reimbursements-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)
COPY public.disclosures_gift (id, date_created, date_modified, source, description, value, redacted, financial_disclosure_id) FROM '/opt/bulk/financial-disclosure-gifts-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)
COPY public.disclosures_debt (id, date_created, date_modified, creditor_name, description,value_code, redacted, financial_disclosure_id) FROM '/opt/bulk/financial-disclosure-debts-2023-08-31.csv' WITH (FORMAT csv, ENCODING utf8, HEADER)
NOTES
\copy
is not used to load the data is described in this issue: Error importing opinions.csv into Postgres: unterminated CSV quoted field #3152Any comments are welcome 😄
Beta Was this translation helpful? Give feedback.
All reactions