Legal Corpora

This repository contains:

A Jupyter notebook (leg_extract.ipynb) for extracting sections and sub-sections from XML-tagged legislation, as available from http://www.legislation.gov.uk/
A Python script (LIscrape.py) implementing a Scrapy spider for extracting contractual clauses from material contracts filed with the SEC, as available from https://www.lawinsider.com/
A sample dataset of extracted legislation (Leg_data160718.csv)
A sample dataset of extracted contract clauses (LIdata160718.csv) and a list of scraped URLs (before addition of suffixes) (TopDomains.txt)
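
As a rough illustration of the kind of extraction the notebook performs, the sketch below pulls sub-sections out of a small XML fragment using only the Python standard library. It is not the code in leg_extract.ipynb: the function name extract_subsections, the element names (P1, P2, Pnumber, Text) and the inline sample are assumptions made for this example, and real legislation XML is namespaced and considerably richer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample. The tag names are assumptions
# for illustration and may not match what the notebook actually targets.
SAMPLE_XML = """
<Legislation>
  <P1>
    <Pnumber>1</Pnumber>
    <P2><Pnumber>1</Pnumber><Text>First sub-section text.</Text></P2>
    <P2><Pnumber>2</Pnumber><Text>Second sub-section text.</Text></P2>
  </P1>
  <P1>
    <Pnumber>2</Pnumber>
    <P2><Pnumber>1</Pnumber><Text>Another sub-section.</Text></P2>
  </P1>
</Legislation>
"""


def extract_subsections(xml_text):
    """Return (section number, sub-section number, text) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    # Treat each P1 element as a section and each nested P2 as a sub-section.
    for section in root.iter("P1"):
        section_no = section.findtext("Pnumber", default="").strip()
        for sub in section.iter("P2"):
            sub_no = sub.findtext("Pnumber", default="").strip()
            text = sub.findtext("Text", default="").strip()
            rows.append((section_no, sub_no, text))
    return rows


if __name__ == "__main__":
    for row in extract_subsections(SAMPLE_XML):
        print(row)
```

Rows in this shape could then be written out with the csv module, although the column layout of Leg_data160718.csv is not assumed here.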

Please see https://richardbatstone.github.io/ for a discussion and further background.
