Legal Corpora

This repository contains:

A Jupyter notebook (leg_extract.ipynb) that extracts sections and sub-sections from XML-tagged legislation, as available from http://www.legislation.gov.uk/
A Python script (LIscrape.py) implementing a Scrapy spider that extracts contractual clauses from material contracts filed with the SEC, as available from https://www.lawinsider.com/
A sample dataset of extracted legislation (Leg_data160718.csv)
A sample dataset of extracted contract clauses (LIdata160718.csv) and a list of scraped URLs (before addition of suffixes) (TopDomains.txt)
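
As a rough illustration of the kind of section and sub-section extraction the notebook performs, here is a minimal sketch using only the Python standard library. It is not the code in leg_extract.ipynb: the sample XML, the namespace URI, the element names (`P1`, `P2`, `Pnumber`, `Text`) and the function `extract_subsections` are all assumptions made for this example, and should be checked against the actual legislation files before use.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample; real legislation XML is far richer.
SAMPLE_XML = """<Legislation xmlns="http://example.org/legislation">
  <P1>
    <Pnumber>1</Pnumber>
    <P2>
      <Pnumber>1</Pnumber>
      <Text>First sub-section text.</Text>
    </P2>
    <P2>
      <Pnumber>2</Pnumber>
      <Text>Second sub-section text.</Text>
    </P2>
  </P1>
</Legislation>"""

# Placeholder namespace (an assumption), mapped to a short prefix for lookups.
NS = {"leg": "http://example.org/legislation"}


def extract_subsections(xml_string):
    """Return (section number, sub-section number, text) tuples."""
    root = ET.fromstring(xml_string)
    rows = []
    # Each assumed P1 element is treated as a section.
    for section in root.iterfind(".//leg:P1", NS):
        section_no = section.findtext("leg:Pnumber", default="", namespaces=NS)
        # Each assumed P2 child is treated as a sub-section of that section.
        for sub in section.iterfind("leg:P2", NS):
            sub_no = sub.findtext("leg:Pnumber", default="", namespaces=NS)
            # Join the text of every Text element found inside the sub-section.
            text = " ".join(
                t.text.strip() for t in sub.iterfind(".//leg:Text", NS) if t.text
            )
            rows.append((section_no, sub_no, text))
    return rows


rows = extract_subsections(SAMPLE_XML)
for row in rows:
    print(row)
```

Returning flat tuples keeps the result easy to write out as CSV rows, which is one plausible way to arrive at a file such as Leg_data160718.csv, though the real column layout is not described here.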

Please see https://richardbatstone.github.io/ for a discussion and further background.
