Category Archives: Project updates

Standardizing Statutes

I have just added the 1689 act ‘Absence of King William’ to the statutes text section.

I took the text from Wikisource, which in turn transcribed it from the Statutes of the Realm collection, volume 6. It is also available from British History Online, which has transcribed three volumes of that series.

Statutes of the Realm is the most complete collection of pre-Union legislation available; it was commissioned to collect all the laws up to the union with Scotland, without regard to whether an act was in force or not. The act is not included in either Pickering’s or Ruffhead’s ‘Statutes At Large’ series, presumably because it had long since expired at the time those were published, and those collections were more pragmatically focused.

The text I’ve posted differs from the other transcriptions in that I have standardized it. The Statutes of the Realm sought fidelity to the original manuscripts, reconciling the originals and the inrolled copies, noting their differences, omissions, and discrepancies, and strictly following original spellings. This makes for difficult, interrupted reading for humans; similarly, it is an obstacle to ‘distant reading’, that is, the digital analysis of large volumes of text.

Consequently, with the help of a simple line of code and a short, hand-compiled list of obsolete spellings, the version I publish is readable for both people and machines.

All the changes to the text are quite minor: replacing antiquated and inconsistent spellings with regular, modern ones, often just removing a superfluous last letter (Regal for Regall, public for publick, etc.). The list of standardization couples is available on GitHub. It’s short, just 52 pairs, but it’s a start. I haven’t uploaded a script to utilise them yet, mainly because just one line is adequate:

while read n k; do sed -i.bak "s/\b$n\b/$k/g" target/*.txt; done < word-standardization-couples.txt

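This should produce corrected versions of the texts in the folder called target (insert your own path). Note that sed rewrites the *.txt.bak backups on every pass of the loop, so once it finishes they hold the text as it stood before the last substitution rather than the untouched originals; keep a separate copy of the originals before running it.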

Note this has been tested on Lubuntu 18.04 and Mac OS High Sierra; other operating systems are available.
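
The same idea can be sketched as a small function that turns the whole list into one sed script and applies it in a single pass, so each file is backed up exactly once. This is only a sketch under assumptions: it expects GNU sed (for the \b word boundary), a list file with one "old new" pair per line, and the function name standardize is my own invention rather than anything in the repository.

```shell
#!/usr/bin/env bash
# Sketch: apply every spelling pair in ONE sed invocation.
# Assumptions: GNU sed (for \b); pairs file has "old new" on each line;
# the name "standardize" is hypothetical, not part of the project.

standardize() {
  local pairs_file="$1"
  shift
  local script
  # Turn each "old new" line into a sed command, e.g. s/\bRegall\b/Regal/g
  script=$(while read -r old new || [ -n "$old" ]; do
    if [ -n "$old" ] && [ -n "$new" ]; then
      printf 's/\\b%s\\b/%s/g\n' "$old" "$new"
    fi
  done < "$pairs_file")
  # A single pass over the files, so each .bak is the untouched original.
  sed -i.bak -e "$script" "$@"
}

# Example call (paths are placeholders):
#   standardize word-standardization-couples.txt target/*.txt
```

Words containing characters that are special to sed, such as / or &, would need escaping first; the sketch assumes the list holds plain words only.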

There is obviously a great deal more to say about manipulating texts in this way, covering matters ethical, academic, technical, and typographical. For the moment I leave all that aside, but it is worth noting these issues.

A Chronological Bibliography

Following an exchange on Twitter with the Victorian Commons project, I have rejigged part of my first listing of volumes of statutes, and published a chronological bibliography of nineteenth-century law.

This will make it easier to locate the texts of laws in the editions held by Google Books and the Internet Archive, as long as you know the correct calendar and regnal years for an act.

At the moment, this bibliography covers the years 1806 to 1908, but many later nineteenth century volumes are missing. These will be added as they are located, and when I have time.


Updates, October to December 2018.

Work on the Statutes project for the last three months of 2018:

The big news is that I now have a complete set of volumes of statutes for the nineteenth century, courtesy of the Institute of Historical Research allowing me to photograph their copies. The OCR’d text, messy but undergoing correction, can be found on Github.

There is also now a complete set of tables of public acts for the Parliament of the United Kingdom of Great Britain and Ireland, 1801 to 1921. Again, find them on Github.

Laws added: the utterances of an oaf required the addition of the Statute of Praemunire; Hallowe’en led me to add some witchcraft acts from 1541, 1563, and 1604; and Bonfire Night was marked with James I’s diktat for the Observance of November the 5th. Topical stuff, eh?

Also added: 1739 County Rates Act and 1838 Public Records Act.

A new section has been created for private, local and personal acts; the first text in it is the Lancashire Sessions Act of 1798.

And the usual round of automated OCR corrections.


Digitization of the missing late c19th volumes

Although there are many digitized collections of statutes available online, and indeed many digitizations of the same publication, I have not found a number of volumes from the last two decades of the nineteenth century.

Happily, I have now been able to digitize these volumes myself, courtesy of the Institute of Historical Research, who very kindly allowed me to photograph their copies.

I copied them using an iPhone and a selfie stick designed by Sussex University Humanities Lab. Although SHL are developing a whole workflow for DIY scanning and OCRing documents through a modern smartphone, I simply took pictures, and later ran them through Abbyy Finereader, as I have been doing with the digital volumes downloaded from Google Books and Internet Archive.

The whole procedure took a full work day, which I think quite quick given the size and number of the volumes; once I got into the rhythm and the apparatus held firm, I averaged about one volume an hour, photographing two pages at a time.

The text of these volumes can be found on github; some automated correcting has been carried out, but it is still all pretty raw, especially the tables. No doubt there will be pages I have inadvertently photographed twice, photographed poorly, or accidentally omitted, but by and large I think the quality is as good as can be expected. As with all the other volumes I have OCRd, the text is public domain.

Once again, my thanks to the IHR for access to their books and a desk at which to copy them, and to Sussex Humanities Lab for the selfie sticks. Without such help, ‘unofficial’, grassroots, lone scholar projects such as this one would not be able to develop their potential.

Tables of Statutes of the United Kingdom, 1801 to 1921.

I have now completed tables of the full, long titles of public statutes passed by the parliament of the United Kingdom of Great Britain and Ireland, from the Act of Union in 1801 up to 1921, when Ireland was divided and the south achieved independence. They can be found on github. All these tables are public domain, and can be reused for any purpose and in any way one wishes.

I am currently working on generating tables of abbreviated titles of private and local acts for this period, using the annotated lists of local acts and private acts produced by

This will be quicker than working through the full titles in the volumes of statutes for this period, although at the cost of less detail. (Tables giving full titles will be produced eventually as I work on correcting the OCR of the scanned volumes, but this will take some time.)

Once the private and local tables have been created, I will produce a more convenient package of these lists, easy to download and suitable for searching and text mining.

Updates: August and September 2018

Work on the Statutes Project done over the last two months:

A blog post: on a satirical law against make-up and adornments, sometimes taken as real, that I’ve dated back to 1785.

New tables: There is now a complete run of tables of public acts spanning 1807 to 1912, hosted on Github.

New acts added: three on the preservation of historical monuments from 1882, 1892 and 1900; the Corruption of Blood Act, 1814; and the Transportation Act, 1718.

And the usual round of automatic corrections to the OCR’d text of the collections of statutes. Whilst still very messy, the text is readable for those volumes in a modern font, and approaching readability for those in old, ‘long-s’ typefaces. Find them on Github.

Updates: June and July 2018.

Work on the Statutes Project done in the last two months:

New Volumes: I have scanned, and uploaded to Github, 13 volumes of the series ‘The Law Reports: The Public General Statutes’, each covering one session of parliament held between 1880 and 1898. Note that this is raw, uncorrected OCR; as yet I haven’t run my corrections list over these files. And note that this isn’t a complete, or even consecutive, run; there are still 8 years of the nineteenth century missing.

New Tables: there is now a complete run of tables of legislation spanning 1814 to 1900, hosted on Github.

One new act: for the Relief of Insolvent Debtors, 1760.

New bibliographies, of the English Statutes of the Realm series, and – of a mere three items – for legislation of Antigua and the Leeward Islands. Also updated are the Jamaican and Indian lists.

And the usual round of automated OCR corrections, and ‘garbage removal’, of random characters and symbols.

Updates, April and May 2018

These last two months have mainly been spent making automated corrections to the OCR’d volumes of the Statutes. Alongside correcting specific mistranscriptions, I’ve been working on correcting word endings, such as ‘fiire’ for shire and ‘mcnt’ for ment; and I have also been cleaning up random junk and stray marks generally interpreted by Abbyy Finereader as punctuation or symbols.
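
To give a flavour of this kind of correction, here is a minimal sketch, not the actual corrections list. It assumes GNU sed, uses only the two word endings mentioned above, and the set of "junk" symbols is an illustrative guess; the function name fix_ocr is likewise hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: repair two mis-read word endings and drop stray symbol tokens.
# Assumptions: GNU sed; the junk character set is a guess for illustration;
# "fix_ocr" is a hypothetical name, not the project's own script.

fix_ocr() {
  sed -E \
    -e ':junk' \
    -e 's/ [|~^*_<>]+ / /' \
    -e 't junk' \
    -e 's/fiire\b/shire/g' \
    -e 's/mcnt\b/ment/g' \
    "$@"
}

# Example call (reads stdin, or any files given as arguments):
#   fix_ocr volume.txt > volume-corrected.txt
```

The loop around the first substitution repeats until no isolated run of symbols remains between two spaces, so consecutive stray marks are all removed. Symbols attached to real words are left alone, which errs on the side of keeping genuine punctuation.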

In other work, many new tables have been added, mainly for the nineteenth century. There is now a complete run from 1851 to 1880.  Find them on Github.

Added to this site: New bibliographies for the various collections of British statutes. See the blog post about them, and the bibliographies themselves.

Also added, just a couple of new laws: 1857 Time in Royal Marines, 1858 Abolition of Franchise Prisons.

Updates, November and December 2017

Work carried out in the last two months of 2017:

Transcribed two missing pages, listing statutes of the reign of Edward the Third, of volume 2 of Pickering’s Statutes at Large.

Started removing Latin and French text from the earlier volumes of Pickering’s Statutes. English translations are given in the books, and the foreign versions serve to complicate the OCR correction process.

Github reorganization: I have split the Butterworths volumes into two groups on the basis of whether they used the long s or not.

Tables of statutes added to Github: just one, for 1756.

Many individual statutes added to this site.

Uploaded the first item to a new folder of miscellaneous items, namely a collection of statutes relating to Kingston Upon Hull.

And of course, automatic correction of many common mis-transcriptions in the Pickering, Ruffhead and Butterworths ‘long-s’ volumes.

To do in 2018, given I have other pressing commitments: to concentrate on the legislation of the parliament of Great Britain, from union with Scotland (1707) to union with Ireland (1800); to produce a full set of tables for these years; and to experiment with visualizing these tables.