Work on the Statutes Project in February 2017:
0: Numerous corrections to the OCR of the Pickering and Ruffhead editions of the Statutes At Large, uploaded to GitHub. Still a long way from readable, but getting there.
1: A new series OCR’d, or at least half a series. The Statutes of the Realm was the most academic, comprehensive and careful collection of acts, with the text generally taken from the statute rolls themselves. Consequently, it is a typographical nightmare, and the OCR is worse than for the – admittedly less reliable – series of the Statutes At Large. I have put on GitHub the text for two volumes (numbers 3 and 5) found on Google, and, thanks to the University of Southampton waiving their No Derivatives license, the text for volumes 6 to 11 from the British Parliamentary Publications set, digitized by Soton, on archive.org.
2: I have also started extracting the tables of acts from the OCR’d volumes, and uploading them to GitHub. The idea is to create a reliable list of legislation enacted, with the long title of each act. Given the length of the titles, this will constitute a corpus of sufficient size for text mining and distant reading (I hope). It also constitutes the first step in creating metadata for this project.
3: Laws collected from around the web:
The 1918 Parliament (Qualification of Women) Act: Allowing women to sit in Parliament, and the shortest statute at a mere 27 words, when preamble and short title clause are put aside.
4: And also: a short post on James I’s laws on sanctuary, over at my Alsatia blog.
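For anyone curious what extracting a table of acts (item 2 above) might look like in practice, here is a minimal sketch in Python. It is not the script used for the project. It assumes, purely for illustration, that each entry in the OCR'd table begins with a chapter number followed by a full stop, and that a long title may wrap over several lines. The names `ActEntry`, `parse_table` and `SAMPLE` are hypothetical, and the sample titles are invented rather than taken from any volume.

```python
import re
from typing import Iterable, List, NamedTuple


class ActEntry(NamedTuple):
    chapter: int  # chapter number as printed in the table
    title: str    # long title, re-joined onto a single line


# Assumed entry format: "12. An Act for ...". Real tables vary by edition.
ENTRY_START = re.compile(r"^(\d+)\.\s+(.*)$")


def parse_table(lines: Iterable[str]) -> List[ActEntry]:
    """Group wrapped OCR lines into one (chapter, long title) record per act."""
    entries: List[ActEntry] = []
    chapter = None
    parts: List[str] = []

    def flush() -> None:
        # Store the entry collected so far, collapsing runs of whitespace.
        if chapter is not None:
            title = " ".join(" ".join(parts).split())
            entries.append(ActEntry(chapter, title))

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines left behind by the OCR
        match = ENTRY_START.match(line)
        if match:
            flush()
            chapter = int(match.group(1))
            parts = [match.group(2)]
        elif chapter is not None:
            parts.append(line)  # continuation of the current long title
    flush()
    return entries


# Invented sample text, for illustration only.
SAMPLE = """\
1. An Act for the better repairing of the highways
   in the county of Kent.

2. An Act to continue an Act for the relief of
   the poor.
"""

acts = parse_table(SAMPLE.splitlines())
```

A continuation line that happens to begin with a number and a full stop (a year, say) would be mistaken for a new entry, so real OCR output would still need checking by hand.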
Planned for March: More acts collected from round the web, and more tables of statutes.