November Updates

Work on the Statutes Project in November 2016:

0: The big news is that the OCRing of the digitized volumes of statutes is now complete. That’s a total of 137 separate volumes. Quite how many words that is I haven’t checked yet, but the Danby Pickering series alone contains around 13 million words. There should be a more or less complete set of public acts from 1761 to 1875: 115 years’ worth of legislation, in the volumes published contemporaneously. Before 1761, the statutes are incomplete, as many acts that had either been repealed or had simply expired were not included in the collections. The number missing is yet to be ascertained.

The raw OCR is available via github: https://github.com/Anterotesis/statutes
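For anyone wanting a rough word count of the raw OCR, something along these lines might do. It is only a sketch: it assumes the repository has been cloned locally and that the volumes are plain `.txt` files encoded as UTF-8, neither of which is confirmed here, and the function names are made up for illustration. Raw OCR is noisy, so any figure it gives is approximate.

```python
import re
from pathlib import Path

# A "word" here is a run of letters, optionally joined by apostrophes
# (so "King's" counts once). Digits and punctuation are ignored.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def count_words(text):
    """Return the number of word-like tokens in a string."""
    return len(WORD_RE.findall(text))


def count_words_in_dir(root, pattern="*.txt"):
    """Count words in every file under root matching pattern.

    The directory layout and the .txt extension are assumptions,
    not a description of the actual repository.
    """
    counts = {}
    for path in sorted(Path(root).rglob(pattern)):
        # errors="replace" keeps going if a file has stray bad bytes
        text = path.read_text(encoding="utf-8", errors="replace")
        counts[str(path)] = count_words(text)
    return counts
```

Calling `count_words_in_dir` on the cloned folder returns a count per file, and summing the values gives a total for the whole set.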

This stage complete, I now need to consider how best to correct the OCR and organize the texts. News on this next month.

1: Added the Riot Act of 1714 and the Cruelty to Animals Act 1876 to the collection of miscellaneous statutes.

October Updates

Work on the Statutes Project in October 2016:

0: Further OCRing of volumes of statutes. Current status is that the complete set of Danby Pickering’s Statutes At Large, from Magna Carta to 1806, has been put through the machine, as has what I’m calling the Butterworths series, spanning 1807 to 1839. Investigation of the available digitized volumes of statutes suggests that there is a continuous series up until 1875, whereupon coverage gets very patchy. So for all intents and purposes, 1875 is the cut-off point for this project. Just another 35 years’ worth of statutes left to OCR!

1: Added two laws collected from around the internet: The Witchcraft Act, 1735 and The Poor Law, 1601.

2: Added some volumes to the list of Scottish statutory resources, and started a page for Acts etc. for Burma / Myanmar.

3: A visit to the V & A to see a perpetual motion machine that was not in motion. The promised post on the statute organizing its auction will be coming this month. Promise.

4: Discussion of getting the statutes on to Wikidata, with Andrew Gray.

5: What I’ve been reading (about law). Two short papers recommended by Law & History:

On the agenda for next month: the end, I hope, at least for the moment, of the OCRing of the statutes, then some organizing and error correction of the resultant texts.

September Updates

A regular report of updates to the Statutes Project.

0: Launched this site.

1: Links to digitized volumes of the Laws of Grenada, 1763 to 1875, added.

2: Volumes 31 and 32 of Danby Pickering’s edition of the Statutes uploaded to the Github repository.

3: All 6 volumes of Pickering’s edition so far OCR’d have been run through Ted Underwood’s OCR Normalizer. The text is still poor, but nevertheless considerably improved over what came straight out of ABBYY FineReader. Again, all to be found on Github.
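To give a flavour of what rule-based OCR cleanup involves, here is a minimal sketch. It is not Ted Underwood’s normalizer and makes no claim about how that tool works. It simply illustrates two common fixes for eighteenth-century print: replacing the long s and typographic ligatures, and rejoining words hyphenated across line breaks. The function name and the particular rules are my own assumptions.

```python
import re

# Long s and typographic ligatures mapped to plain letters.
REPLACEMENTS = {
    "ſ": "s",
    "ﬁ": "fi",
    "ﬂ": "fl",
    "ﬀ": "ff",
    "ﬃ": "ffi",
    "ﬄ": "ffl",
    "ﬅ": "st",
    "ﬆ": "st",
}

# A letter, a hyphen, a line break, then another letter.
LINE_BREAK_HYPHEN = re.compile(r"([^\W\d_])-[ \t]*\n[ \t]*([^\W\d_])")


def normalize_ocr(text):
    """Apply a few illustrative cleanup rules to raw OCR text."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    # Rejoin words split across lines: "Parlia-\nment" -> "Parliament".
    # Caveat: this also joins genuine hyphenated compounds that happen
    # to break at the end of a line.
    text = LINE_BREAK_HYPHEN.sub(r"\1\2", text)
    # Collapse runs of spaces and tabs, leaving line breaks alone.
    text = re.sub(r"[ \t]+", " ", text)
    return text
```

Rules like these only catch systematic errors. Misread characters within a word still need a dictionary or hand correction.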

4: Began posting the specific volumes I am using as sources for British legislation. Note that the particular scans I am using as sources are important, due to individual blemishes and stamps on the original volume, and technical distortions – not to mention stray fingers – of the digitization used.

Planned for October: more OCRing, more normalizing, some thinking about the titles of the statutes, and a very odd lottery.

Introducing The Statutes Project

The aim of the Statutes Project is quite simple: to put the majority of historic English legislation online in accessible, useful formats, readable by humans and machines alike, with accompanying metadata, without any financial, technical or legal obstacles to use or adaptation.

The simplicity of this statement masks the many difficulties: finding the laws, digitizing them, turning page images into clean, correct text, and so on. And doing so without having an entire life devoured by spell checking and hand correction.

The many volumes of statutes compiled through the last three centuries, coupled with mass digitization projects such as those run by Google Books and the Internet Archive, along with optical character recognition and text correction tools, do at least allow for the hope that useable – but not perfect – texts can be produced with a minimum of effort.

The focus will be on the late seventeenth, eighteenth and early nineteenth centuries, the ‘long eighteenth century’ that is central to my own historical studies. Expect a concentration on matters relating to debt and debtors; that is the subject of my PhD.

This blog is more a notebook than a full archive of legislation, although that is the long-term hope. It will cover the technical side more than the theoretical, although that won’t be absent. When there’s a sufficient corpus, quantitatively and qualitatively, there will be some preliminary attempts at analysis, little games aiming to investigate the possibilities.

Future posts will discuss the project in more detail, covering the source volumes, the software, textual analysis, dissemination, and undoubtedly the many trials and tribulations produced by a simple idea rashly executed.

Launch Date, Redux

The Statutes Project launches properly on September 1st. Honest guv, not ‘aving you on or anything. Got lots of lovely OCRed statutes, just need a bit of tender care and they’ll be as right as rain.

Launch date

The Statutes Project will properly open on May 1st 2016.