Legal Corpora

This repository contains:

  1. A Jupyter notebook (leg_extract.ipynb) that extracts sections and sub-sections from XML-tagged legislation, as available from http://www.legislation.gov.uk/

  2. A Python script (LIscrape.py) implementing a Scrapy spider that extracts contractual clauses from material contracts filed with the SEC, as available from https://www.lawinsider.com/

  3. A sample dataset of extracted legislation (Leg_data160718.csv)

  4. A sample dataset of extracted contract clauses (LIdata160718.csv) and a list of scraped URLs (before addition of suffixes) (TopDomains.txt)

Please see https://richardbatstone.github.io/ for a discussion and further background.
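
As a rough illustration of the kind of extraction described in item 1, the sketch below walks an XML document and flattens each sub-section into one row. It is not the code in leg_extract.ipynb: the tag names (`Act`, `Section`, `SubSection`, `Title`), the `id` attributes, the sample text, and the `extract_rows` helper are all hypothetical, and the real legislation.gov.uk markup is namespaced and more deeply nested.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for illustration only; it does not
# reproduce the actual legislation.gov.uk schema.
SAMPLE_XML = """
<Act>
  <Section id="1">
    <Title>Short title</Title>
    <SubSection id="1.1">This Act may be cited as the Example Act.</SubSection>
    <SubSection id="1.2">It extends to England and Wales.</SubSection>
  </Section>
  <Section id="2">
    <Title>Commencement</Title>
    <SubSection id="2.1">This Act comes into force on Royal Assent.</SubSection>
  </Section>
</Act>
"""


def extract_rows(xml_text):
    """Return one dict per sub-section, carrying its parent section's id and title."""
    root = ET.fromstring(xml_text)
    rows = []
    for section in root.iter("Section"):
        title = section.findtext("Title", default="").strip()
        for sub in section.findall("SubSection"):
            rows.append(
                {
                    "section": section.get("id"),
                    "title": title,
                    "subsection": sub.get("id"),
                    "text": (sub.text or "").strip(),
                }
            )
    return rows


rows = extract_rows(SAMPLE_XML)
for row in rows:
    print(row["subsection"], row["title"], row["text"], sep=" | ")
```

Rows in this shape could then be written out with the standard `csv` module; the column layout of Leg_data160718.csv is not assumed here.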