Author Archives: johnl

Tables of Statutes of the United Kingdom, 1801 to 1921.

I have now completed tables of the full, long titles of public statutes passed by the parliament of the United Kingdom of Great Britain and Ireland, from the Act of Union in 1801 up to 1921, when Ireland was divided and the south achieved independence. They can be found on github. <url> All these tables are public domain, and can be reused for any purpose and in any way one wishes.

I am currently working on generating tables of abbreviated titles of private and local acts for this period, using the annotated lists of local acts and private acts produced by Legislation.gov.uk <url>

This will be quicker than working through the full titles in the volumes of statutes for this period, although at the cost of less detail. (Tables giving full titles will be produced eventually as I work on correcting the OCR of the scanned volumes, but this will take some time.)

Once the private and local tables have been created, I will produce a more convenient package of these lists, easy to download and suitable for searching and text mining.

Updates: August and September 2018

Work on the Statutes Project done over the last two months:

A blog post: on a satirical law against make-up and adornments, sometimes taken as real, that I’ve dated back to 1785.

New tables: There is now a complete run of tables of public acts spanning 1807 to 1912, hosted on Github.

New acts added: three on the preservation of historical monuments from 1882, 1892 and 1900; the Corruption of Blood Act, 1814; and the Transportation Act, 1718.

And the usual round of automatic corrections to the OCR’d text of the collections of statutes. Whilst still very messy, the text is readable for those volumes in a modern font, and approaching readability for those in old, ‘long-s’ typefaces. Find them on Github.

The act to protect men from false adornments

One reason I started this ridiculously ambitious project was that I found myself hearing some unbelievable tales about laws wild and wacky, none of which could actually cite the statute in question. An example of such is the following, sometimes entitled ‘The act to protect men from false adornments’:

That all women, of whatever age, rank, profession, or degree, whether virgins, maids, or widows, that shall, from and after such act, impose upon, seduce, and betray into matrimony any of his Majesty’s male subjects, by the scents, paints, cosmetic washes, artificial teeth, false hair, Spanish wool, iron stays, hoops, high-heeled shoes, &c., shall incur the penalty of the law now in force against witchcraft and like misdemeanors and the marriage, upon conviction, shall stand null and void.

You can find a great many volumes citing this ‘act’, courtesy of Google Books, variously dating it to 1670, 1700, 1720 and 1770.

It is, however, a jest. Searching through all the volumes I have turned into plain text, absolutely nothing comes close to these words. Nor is there any trace of it in the tabulation of rejected bills, ‘Failed Legislation’ (Hambledon Press, 1997).

Searching through the Burney collection of historic newspapers and the British Newspaper Archive turns up a clutch of newspapers printing this squib in August 1785. The original publisher is the Public Advertiser of Tuesday, August 23, 1785 (No. 15989), which uniquely gives a second clause:

And that such an act might be productive to the State, it might be further enacted, “that all men, boys, bachelors, widowers, or others, that shall have been so imposed upon, deceived, and seduced into matrimony, shall, upon the divorce taking place, forfeit unto our Sovereign Lord the King, one half or moiety of any sum or sums of money, lands, tenements, &c. that he or they shall have received as a marriage portion, with his or their said wife or wives; and, if no such portion or dowry as received, then shall they forfeit one hundred pounds of lawful money of Great Britain, as a penalty in recognizance of their extreme weakness, blindness, and imprudence, in being so deceived.

The first part is copied and pasted in quick succession by the Whitehall Evening Post of August 25, 1785 (No. 5973), the Chelmsford Chronicle of August 26, 1785 (BNA link – behind paywall) and the Hampshire Chronicle of August 29 (BNA link – behind paywall).

Curiously, this item was passed off as fact in an execrable book of 1859, ‘Manners and Customs of the English Nation’, dating it to 1770. As the Gentleman’s Magazine wrote, in an excoriating review, “We regret to say no authority is given”; it nevertheless found ironic praise for the author’s “air of originality.”

The spoof law is a genre in itself, and certainly this one can be cited as an example of late eighteenth century fears of the feminine. But it is not an example of statutory sexism.

Updates: June and July 2018

Work on the Statutes Project done in the last two months:

New Volumes: I have scanned, and uploaded to Github, 13 volumes of the series ‘The Law Reports: The Public General Statutes’, each covering one session of parliament held between 1880 and 1898. Note that this is raw, uncorrected OCR; as yet I haven’t run my corrections list over these files. And note that this isn’t a complete, or even consecutive, run; there are still 8 years of the nineteenth century missing.

New Tables: there is now a complete run of tables of legislation spanning 1814 to 1900, hosted on Github.

One new act: for the Relief of Insolvent Debtors, 1760.

New bibliographies, of the English Statutes of the Realm series, and – of a mere three items – for legislation of Antigua and the Leeward Islands. Also updated are the Jamaican and Indian lists.

And the usual round of automated OCR corrections, and ‘garbage removal’, of random characters and symbols.

Updates, April and May 2018

These last two months have mainly been spent making automated corrections to the OCR’d volumes of the Statutes. Alongside correcting specific mistranscriptions, I’ve been working on correcting word endings, such as ‘fiire’ for shire and ‘mcnt’ for ment; and I have also been cleaning up random junk and stray marks generally interpreted by ABBYY FineReader as punctuation or symbols.

In other work, many new tables have been added, mainly for the nineteenth century. There is now a complete run from 1851 to 1880. Find them on Github.

Added to this site: New bibliographies for the various collections of British statutes. See the blog post about them, and the bibliographies themselves.

Also added, just a couple of new laws: 1857 Time in Royal Marines, 1858 Abolition of Franchise Prisons.

Bibliographies of Collections of British Statutes

I have now finished compiling the bibliographies of the several collections of British legislation I have used for this project. Each entry, due to the magic of Zotero, should have a link to the digitized version of the book, and each bibliography a link to the OCR’d text I am currently correcting, hosted on Github.

These bibliographies are not complete, both in that there are other collections I have not made lists for, and that those I have do not list all the volumes. I have concentrated on books that are freely available online and that I have used to generate the OCR’d texts I am correcting. Given time, I may well expand this, but for the moment it provides at least one volume covering the period from Magna Carta until 1878. After that date, far fewer volumes are freely available, and so for all intents and purposes, the project stops there. But note that legislation for the twentieth and twenty-first centuries is available via Matthew Williams’ marvelous datasets.

Not every law is to be found in full in these volumes. Some are abbreviated, giving the preamble, perhaps a few clauses, and a summary. Some are omitted entirely. Very few private, personal and local acts are given. And very annoyingly, volume 43 part 2 of Pickering’s Statutes at Large is the sole missing part of that long and useful series.

All this notwithstanding, I think these bibliographies will be of great utility to anyone wanting to track down historic laws.

Go to the Index Page.

Updates, January and February 2018

Over the past two months I have taken a look at the volumes of statutes published from 1820 on, that is, with a modern typeface and without the long s that OCR software interprets in a multitude of ways.

Overall, the standard of text generated from the digitized PDFs is good to very good. Part of this may simply be due to the books not being as old, and therefore printed better, on better paper and being less worn and torn, than older volumes. But the typeface is certainly more amenable to being OCR’d, and the raw text is generally quite readable. The major problem is the recognition of the page layout, which with the statutes means that the side annotations get integrated into the body of the text. Certainly, the speed with which I have corrected some of the lists of legislation is far greater than for the pre-1820 texts.

Consequently, I’m considering concentrating on these volumes, although the eighteenth century is where most of my interests lie. But apart from sorting the tables, this is a decision I shall put off.

Also this month:

The usual run of automatic corrections; find improved text on Github.

Added tables of statutes for 1703, 1713, 1790, and 1866 to 1878. Again, find them on Github.

New acts: the famous 1918 Representation of the People Act, in honour of its centenary; the notorious Buggery Act of Henry VIII; the 1706 Escape from Prisons Act; and the Repeal of the South Sea Bubble Act.

Added a bibliography of volumes of statutes in the series ‘A Collection of Public General Statutes’, with links to the relevant Google Books page, for 1837 to 1869.

And finally, a blog post on ways of checking and correcting OCR’d text.

There will be a pause until after Easter, whilst work and PhD take priority. This is very much a one-person side project, without any funding, and as such has to take second (and third) place to other demands.

On automatic correction of OCR output

Although this project began because I found many historical questions led to statutory source material, it has taken a technical turn into creating reliable and useful texts of the laws. Whilst I wasn’t surprised to find that the raw OCR of the eighteenth and early nineteenth century publications was foul, I had hoped it could be knocked into reasonable shape simply by correcting obvious, predictable errors, such as the long s being interpreted as an f.

This turned out to be true to a certain extent. I’m running a fairly simple bash script that takes a list of errors and their corrections, and one by one works through each word of the OCR’d text of circa 90 volumes published before 1820, and the results are promising. The errors are much more diverse than I presumed, but are still fairly uniform. For example, the combination of long s followed by h, as in parish, is often read as lh, lii, jh, and so on.
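As a rough illustration of the word-by-word approach described above, here is a minimal sketch in Python rather than bash (the script itself has not been published, so this is not its actual code). The function name `correct_words` and the handful of pairs are hypothetical; the misreadings of parish follow the lh, lii and jh patterns mentioned above.

```python
import re

# Hypothetical correction pairs (misreading -> intended word).
# A real list would hold many thousands of entries.
CORRECTIONS = {
    "parilh": "parish",
    "parilii": "parish",
    "parijh": "parish",
    "fhall": "shall",
}


def correct_words(text, corrections):
    """Replace each whole word that appears in `corrections`.

    Words not in the list, and all punctuation and spacing,
    are left exactly as they were.
    """
    def fix(match):
        word = match.group(0)
        return corrections.get(word, word)

    # Treat every run of letters as one word and look it up.
    return re.sub(r"[A-Za-z]+", fix, text)


sample = "every parilh and parijh fhall pay"
print(correct_words(sample, CORRECTIONS))
```

Looking each word up in a dictionary means the text is scanned once however long the list grows, whereas applying the pairs one at a time rescans the text for every pair.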

A bigger problem is when the s interpreted as f produces another English word, such as ‘lame’ or ‘fame’ for same. For this I have used the same script to check for phrases. Day makes sense preceded by same, so correcting nonsense phrases like lame day and fame day is quite safe. And as the statutes are quite formulaic, with many repeated phrases, this approach is quite suited to them. Even better, as more words are corrected, the more these phrases are made apparent. With the word ‘act’ corrected from the very many misreadings, one can start correcting the phrase ‘act parted’ into ‘act passed.’
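The phrase check can be sketched in the same way. This is a hedged Python illustration, not the script actually used; `correct_phrases` and the three pairs are hypothetical names and data built from the examples in the paragraph above.

```python
import re

# Hypothetical phrase pairs: each misreading is a real English word on
# its own, so it is only corrected inside a phrase that makes no sense.
PHRASE_CORRECTIONS = {
    "lame day": "same day",
    "fame day": "same day",
    "act parted": "act passed",
}


def correct_phrases(text, phrase_corrections):
    """Replace whole phrases, leaving the individual words alone elsewhere."""
    for wrong, right in phrase_corrections.items():
        # Allow any whitespace between the words, so a phrase broken
        # across a line ending is still found. Note that the line break
        # inside a matched phrase is replaced by a single space.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in wrong.split()) + r"\b"
        text = re.sub(pattern, lambda m, r=right: r, text)
    return text


print(correct_phrases("on the lame day the act parted", PHRASE_CORRECTIONS))
print(correct_phrases("a lame horse of great fame", PHRASE_CORRECTIONS))
```

The second call shows the point of working with phrases: ‘lame’ and ‘fame’ standing alone are left untouched.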

Another approach is to think in terms of parts of words. Given that the verb ‘establish’, often rendered as eftablifh, has a number of derivatives – established, establishing, disestablishment and so on – it makes sense to correct the stem of the word, rather than check for each variant.
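A minimal sketch of stem correction, again in Python and again only an illustration under assumptions: `correct_stems` is a hypothetical name, and the single pair comes from the example above.

```python
# Hypothetical stem pairs: the misread stem is replaced wherever it
# occurs inside a longer word, so one entry covers every derivative.
STEM_CORRECTIONS = {
    "eftablifh": "establish",
}


def correct_stems(text, stem_corrections):
    """Replace each misread stem as a substring, whatever surrounds it."""
    for wrong, right in stem_corrections.items():
        text = text.replace(wrong, right)
    return text


print(correct_stems("eftablifhed and eftablifhing", STEM_CORRECTIONS))
```

Only the stem itself is repaired. A misread prefix, such as ‘dif’ for dis in disestablishment, would still need a correction of its own, and a stem short enough to occur inside unrelated words risks false positives.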

All to the good, but this is a big body of text. There’s something like 14 million words in Pickering’s collection of the statutes alone. And that means there’s going to be a lot of mistakes, and more importantly, a lot of types of mistakes. The long s alone has at least 3 types of common misreading, as f, j, and l, and even more when it gets taken in conjunction with its following letter.

Working out how to tackle this has been gratifyingly interesting. There are all sorts of technical ways of doing this: by looking at the texts as individual words, as stems or lemmas of words, as a collection of phrases, or as strings of characters. There are also some deeper, mathematical ways of thinking about this, which would avoid having to compile a near-infinite list of possible errors that do not run afoul of false positives for any eighteenth century text. For example, the lame king is not to be found in the statutes, but no doubt turns up in some novel of the time.

It should, for example, be possible to search the statutes for every string close to, but not identical with, the phrase ‘the authority aforesaid’ and correct it, without having to produce a list of every possible variant. Such a more subtle process should be quicker than the ‘brute force’ method I am currently using.
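One way such a search might look is sketched below, using the similarity ratio from Python's standard `difflib` module. This is an assumption about how the idea could be realised, not a description of anything already built: the function name `fuzzy_correct`, the sliding window of whole words, and the threshold of 0.85 are all illustrative choices.

```python
from difflib import SequenceMatcher


def fuzzy_correct(text, target, threshold=0.85):
    """Replace any run of words that is close to `target` with `target`.

    A window as many words long as the target slides over the text; a
    window whose similarity ratio reaches `threshold` is replaced.
    Limitations of this sketch: whitespace is normalised to single
    spaces, and punctuation attached to a word counts as part of it.
    """
    words = text.split()
    n = len(target.split())
    out = []
    i = 0
    while i < len(words):
        chunk = words[i:i + n]
        if (len(chunk) == n
                and SequenceMatcher(None, " ".join(chunk), target).ratio() >= threshold):
            out.append(target)   # near (or exact) match: emit the target
            i += n               # and skip past the whole window
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


print(fuzzy_correct("by the authority aforefaid that", "the authority aforesaid"))
```

The threshold is the delicate part: set too low, it will rewrite phrases that merely resemble the target; set too high, it will miss the worse misreadings. It would need tuning against a sample of corrected text.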

This is leaving aside the other causes of errors: those caused by the quality of the digitization, the quality of the printing and the markings of readers in the volumes digitized, and most problematic for this project, the mis-recognition of the layout of the pages. The convention of annotating laws with marginal notes – and these notes are not part of the statute itself – complicates the page design, and the raw OCR often integrates the comments into the main body of the text. On reflection, I should have taken more care of that when putting the books through the OCR machine, but that comes with a considerable cost in time. There may be ways of automating the detection of such errors.

Work on error correction continues, with the pleasant collateral that it is a fascinating problem, and not mere drudgery. In the meantime, I have a growing set of lists of automatic correction pairs on Github. These have been split into certain categories: place names, Latin, phrases, as well as English words. Depending on the text being corrected, some will be relevant and others not. Note that because of the script I am using (which I hope to publish soon), spaces in phrases and split words are escaped with a backslash, as in ‘authority\ aforesaid’.
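For anyone wanting to read such lists from another language, the escaping can be undone along these lines. This Python sketch assumes a layout of one pair per line, with the error and its correction separated by a single unescaped space; that layout is a guess for illustration only, and the files themselves should be checked. `parse_pair` is a hypothetical helper.

```python
import re


def parse_pair(line):
    """Split one line into (error, correction).

    Assumed layout: the two halves are separated by the first space
    that is NOT preceded by a backslash; escaped spaces belong to the
    phrase and are unescaped afterwards.
    """
    parts = re.split(r"(?<!\\) ", line.strip(), maxsplit=1)
    if len(parts) != 2:
        raise ValueError("expected an error and a correction: %r" % line)
    wrong, right = (part.replace("\\ ", " ") for part in parts)
    return wrong, right


print(parse_pair(r"authority\ aforefaid authority\ aforesaid"))
print(parse_pair("fhall shall"))
```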

 

Updates, November and December 2017

Work carried out in the last two months of 2017:

Transcribed two missing pages, listing statutes of the reign of Edward the Third, of volume 2 of Pickering’s Statutes at Large.

Started removing Latin and French text from the earlier volumes of Pickering’s Statutes. English translations are given in the books, and the foreign versions serve to complicate the OCR correction process.

Github reorganization: I have split the Butterworths volumes into two groups on the basis of whether they used the long s or not.

Tables of statutes added to Github: just one, for 1756.

Many individual statutes added to this site.

Uploaded the first item to a new folder of miscellaneous items, namely a collection of statutes relating to Kingston Upon Hull.

And of course, automatic correction of many common mis-transcriptions in the Pickering, Ruffhead and Butterworths ‘long-s’ volumes.

To do in 2018, given I have other pressing commitments: to concentrate on the legislation of the parliament of Great Britain, from union with Scotland (1707) to union with Ireland (1800); to produce a full set of tables for these years; and to experiment with visualizing these tables.

Updates, August and September 2017

The last two months have seen: continuing automated correction of the OCR-generated text of Pickering’s Statutes At Large, and some of the Butterworths-published volumes (1807 to 1819, in other words those using the ‘long s’). The bash script I have written for this is improving, and I hope to release it soon on Github (under a free license of course).

A side effect of hunting down erroneous OCR is the production of lists of such mistranscriptions. I have started to put those on Github; used with the forthcoming script this will constitute an easy way of improving raw OCR of eighteenth century books.

I have started a page collecting volumes of historic American state legislation, mainly colonial, but with some post-revolutionary laws.

SSL has been enabled for the site, courtesy of a free certificate via my hosts Evohosting and Let’s Encrypt! I will be making all URLs secure by default at some point in the future; this should not break any pages you have bookmarked. Until then, simply starting any of them with ‘https://’ will call up the secure version.

New laws added to the site, including: the 1807 Abolition of the Slave Trade Act; from 1740, encouragement of mariners; and Hogarth’s act for protecting copyright in engravings of 1735.

There will now be a hiatus until November, whilst I concentrate upon writing my PhD thesis.