
Chronological Bibliography, from Magna Carta to 1970.

The Chronological Bibliography of British and U.K. public statutes now runs up to 1970, and so, for copyright reasons, can be regarded as complete. Together with the Statutes of the Realm series, this means that the majority of public legislation made over some 755 years, from Magna Carta to 1970, should be fairly easy to access.

(The earliest statute in the Realm series is that of Merton, 1235-6, 20 Henry 3; but it is preceded by a variety of charters dating back to 1101; the starting point for Pickering and Ruffhead is Magna Carta of 1215. Many of the earlier acts are given in Latin or French, sometimes without translation.)

I managed to get Google to release the later volumes by citing section 164 of the Copyright, Designs and Patents Act 1988, which states that Crown copyright “subsists in the case of an Act or a Measure of the General Synod of the Church of England, until the end of the period of 50 years from the end of the calendar year in which Royal Assent was given.”

This 50-year term makes 1970 the limit for this bibliography for now, although it should be possible to extend it by a year annually, provided the later volumes have actually been digitized. From 1988 on, a complete set of U.K. public general acts is already available.
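The arithmetic behind the cut-off is easy to sketch. The sketch assumes only the 50-year, calendar-year rule quoted above, and the function names are purely illustrative:

```python
def first_free_year(assent_year: int, term: int = 50) -> int:
    """First calendar year in which an Act's text is out of copyright.

    Copyright runs to the end of the calendar year `term` years after
    the year of Royal Assent, so the Act is free from 1 January of the
    following year.
    """
    return assent_year + term + 1


def latest_free_assent_year(current_year: int, term: int = 50) -> int:
    """Latest year of Royal Assent whose Acts are out of copyright in current_year."""
    return current_year - term - 1


if __name__ == "__main__":
    for year in (2021, 2022, 2023):
        print(f"In {year}, Acts up to {latest_free_assent_year(year)} are out of copyright")
```

Run for 2021, this gives 1970, matching the limit above. Each January moves the boundary on by one year, which is where the annual extensions come from.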

These volumes contain the vast majority of laws, especially from the early nineteenth century. But they do not contain every public act passed, and the eighteenth century is much abbreviated. In some cases, the scanned copies of the books are severely marked and worn; and fonts introduce ambiguities. Inevitably, there are also flaws and occlusions in the digital images.

Nevertheless, this collation is both useful and useable. If you find any errors, or locate better-quality scans, please leave a comment.

I am now scanning these volumes and uploading the OCR’d text to GitHub. I aim to start publishing the complete, corrected texts en masse, in an easy-to-navigate archival format, next year.

The English Reports, updated.

Further to my last post, I have located another 9 freely available volumes of the English Reports. The bibliography is now missing volumes 42, 68, 74, 83, 101, 112, 117, 127 and 165, because Google, Hathi Trust and Internet Archive do not have digital copies of them. Volumes 170 to 176 are also missing: although Google have copies, the full text is not being made available, presumably because of copyright. In all, just 16 volumes, around 9%, are absent from the full set of 178 books.

I have also adapted the table of the reports archived in the Wayback Machine, to include links to the individual volumes. This provides a convenient way of locating material by original publication, and by the abbreviations generally used to refer to them.

Update, 11 May 2021: As pointed out in the comments, CommonLII holds a full set of the English Reports, broken down into single PDFs. I don’t find their database particularly easy to search or browse, though. Some reports appear misfiled under the wrong date, and the PDFs often include fragments of other cases, so searches often throw up false positives. The OCR, as ever, also leaves something to be desired. Nevertheless, it is the most convenient, complete and openly accessible set available.

The English Reports

Although this project is focused on the acts passed by the British parliament, the law made by judges, in the courts, is a constituent part of the common law system. It is also just as rich as historical source material, in both quantity and quality. Similarly, it requires much the same sifting and organization as the statutes do. Thankfully, much of the heavy lifting was done in the early twentieth century, when the many historic series of law reports were consolidated as the English Reports.

To this end, I have now added to the bibliographies of British legal materials one listing all the freely available volumes of the English Reports. Of the 178 volumes published between 1901 and 1930, I have found 153 accessible digitizations online, held variously by Google Books, Hathi Trust, and Internet Archive. In all, I estimate they contain approximately 200,000 pages of text, covering cases from 1220 to 1865. (In 1866, the ICLR began publication of their own series of Law Reports.)

These Reports are very different from the Proceedings of the Old Bailey. The Proceedings are the records of the regular business of London’s main courthouse, whilst the Reports are selected, important precedents set out in the high courts. They are not intended to gather the legally mundane, or to record charges, facts and judgements, but to collate those decisions and opinions that interpreted and clarified the statute law. But notwithstanding their specific legal purpose, there is considerable wider historical interest in these volumes. For example, Somerset v Stewart, 1772, is the case that led to the end of slavery in Britain, and the King v Thames Ditton the case that showed how ambiguous and precarious that ending was. The Reports are also international and far-reaching: there are cases concerning India and the Caribbean, and cases relating to international and maritime law.

But such is the volume of material – most volumes are well over one thousand pages – that it is difficult to know what exactly is within them, or get any sense of how matters are distributed across time and space.

Happily, the two index volumes are freely available, which goes some way to making this vast compilation useable. There is also a table of the books comprising each volume available via the Wayback Machine. Some volumes of the Reports – it is unclear which – can be found on CommonLII in searchable PDF format. I am looking for other reference material, and considering the best way to present it digitally. The ideal solution would be to OCR it all and have it as plain text, but that is too much work for one person to do on top of producing a standardized collection of the statutes.

Bibliographies of Collections of British Statutes

I have now finished compiling the bibliographies of the several collections of British legislation I have used for this project. Thanks to the magic of Zotero, each entry should have a link to the digitized version of the book, and each bibliography has a link to the OCR’d text I am currently correcting, hosted on GitHub.

These bibliographies are not complete. There are other collections I have not made lists for, and the lists I have made do not include every volume. I have concentrated on books that are freely available online and that I have used to generate the OCR’d texts I am correcting. Given time, I may well expand this, but for the moment the bibliographies provide at least one volume covering the period from Magna Carta until 1878. After that date, far fewer volumes are freely available, so for all intents and purposes the project stops there. But note that legislation for the twentieth and twenty-first centuries is available via Matthew Williams’ marvelous datasets.

Not every law is to be found in full in these volumes. Some are abbreviated, giving the preamble, perhaps a few clauses, and a summary. Some are omitted entirely. Very few private, personal and local acts are given. And very annoyingly, volume 43 part 2 of Pickering’s Statutes at Large is the sole missing part of that long and useful series.

All this notwithstanding, I think these bibliographies will be of great help to anyone wanting to track down historic laws.


Updates, January and February 2018

Over the past two months I have taken a look at the volumes of statutes published from 1820 on, that is, with a modern typeface and without the long s that OCR software interprets in a multitude of ways.

Overall, the standard of text generated from the digitized PDFs is good to very good. Part of this may simply be due to the books not being as old, and therefore printed better, on better paper and being less worn and torn, than older volumes. But the typeface is certainly more amenable to being OCR’d, and the raw text is generally quite readable. The major problem is the recognition of the page layout, which with the statutes means that the side annotations get integrated into the body of the text. Certainly, the speed with which I have corrected some of the lists of legislation is far greater than for the pre-1820 texts.

Consequently, I’m considering concentrating on these volumes, although the eighteenth century is where most of my interests lie. But apart from sorting the tables, this is a decision I shall put off.

Also this month:

The usual run of automatic corrections; find improved text on GitHub.

Added tables of statutes for 1703, 1713, 1790, and 1866 to 1878. Again, find them on GitHub.

New acts: the famous 1918 Representation of the People Act, in honour of its centenary; the notorious Buggery Act of Henry VIII; the 1706 Escape from Prisons Act; and the Repeal of the South Sea Bubble Act.

Added a bibliography of volumes of statutes in the series ‘A Collection of Public General Statutes’, with links to the relevant Google Books page, for 1837 to 1869;

And finally, a blog post on ways of checking and correcting OCR’d text.

There will be a pause until after Easter, whilst work and PhD take priority. This is very much a one-person side project, without any funding, and as such has to take second (and third) place to other demands.

Automatic correction of OCR

A milestone: I have begun automatically correcting the OCR errors in the 46 volumes of Danby Pickering’s ‘Statutes At Large’, and have uploaded the improved text to GitHub.

Given the quantity of text I’m dealing with – the Pickering series alone amounts to over fifteen million words – correcting each volume ‘by hand’ is obviously impractical. Bulk ‘find and replace’ is an improvement, but still not fast enough to be practical.

Such repetitive tasks are grist to the digital mill. So, using this list of common OCR errors, augmented with others I’ve found, and a one-line bash script, I have started improving the texts automatically.
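The bash script itself isn’t reproduced in this post. The following is a rough Python analogue of the same space-separated idea. It assumes the error list is held as a mapping from misread word to correction. The function name and the sample pairs (long s misread as f) are illustrative only:

```python
from __future__ import annotations


def correct_by_spaces(text: str, corrections: dict[str, str]) -> str:
    """Swap any space-separated token that exactly matches a known OCR error.

    The text is split on spaces only, so a token with punctuation
    attached (e.g. 'perfon,') does not match the entry for 'perfon'.
    """
    return " ".join(corrections.get(token, token) for token in text.split(" "))


if __name__ == "__main__":
    errors = {"fuch": "such", "fhall": "shall", "perfon": "person"}
    line = "every perfon fhall forfeit fuch goods, and fuch perfon,"
    print(correct_by_spaces(line, errors))
    # every person shall forfeit such goods, and such perfon,
```

Note that the final ‘perfon,’ survives untouched because of its trailing comma.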

The results are obviously an improvement. Nevertheless, the texts still aren’t great, and there are still many spelling errors. Because I used spaces as separators, words with punctuation attached are left uncorrected. And the many problems arising from the page layout are still to be faced.
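One way round the punctuation problem is to match errors as whole words with a regular expression, rather than splitting on spaces. Below is a minimal Python sketch. It assumes the error list contains plain words only, and the function name and sample data are illustrative:

```python
from __future__ import annotations

import re


def correct_whole_words(text: str, corrections: dict[str, str]) -> str:
    """Replace known OCR errors wherever they stand as whole words.

    The \\b word boundaries let a match succeed next to punctuation or a
    line break, so 'perfon,' is corrected as well as 'perfon'. Matching
    is case-sensitive, so 'Fuch' would need its own entry.
    """
    if not corrections:
        # An empty alternation would match the empty string everywhere.
        return text
    alternatives = "|".join(re.escape(word) for word in corrections)
    pattern = re.compile(rf"\b(?:{alternatives})\b")
    return pattern.sub(lambda match: corrections[match.group(0)], text)


if __name__ == "__main__":
    errors = {"fuch": "such", "fhall": "shall", "perfon": "person"}
    line = "every perfon fhall forfeit fuch goods, and fuch perfon,"
    print(correct_whole_words(line, errors))
    # every person shall forfeit such goods, and such person,
```

Building one pattern from the whole list means each text is scanned in a single pass, rather than once per error. That should help at the scale of millions of words.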

But this is an important step forward.

Introducing The Statutes Project

The aim of the Statutes project is quite simple: to put the majority of historic English legislation online in accessible, useful formats, readable by humans and machines alike, with accompanying metadata, without any financial, technical or legal obstacles to use or adaption.

The simplicity of this statement masks the many difficulties: finding the laws, digitizing them, turning page images into clean, correct text, and so on. And doing so without having an entire life devoured by spell checking and hand correction.

The many volumes of statutes compiled over the last three centuries, coupled with mass digitization projects such as those run by Google Books and the Internet Archive, along with optical character recognition and text correction tools, do at least allow for the hope that useable – but not perfect – texts can be produced with a minimum of effort.
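No schema is fixed here. The following is a purely hypothetical illustration of what “readable by humans and machines alike, with accompanying metadata” might mean for a single act: a small Python record serialised to JSON. The class name, the field names and the choice of JSON are all assumptions. The sample values are those of the Statute of Merton, 1235-6, 20 Henry 3:

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StatuteRecord:
    """Hypothetical minimal record: one act's text plus descriptive metadata."""

    title: str
    regnal_citation: str            # session citation as printed, e.g. "20 Henry 3"
    date: str                       # year(s) as printed, e.g. "1235-6"
    chapter: Optional[int] = None   # chapter number within the session, if known
    source: str = ""                # printed collection the text was taken from
    text: str = ""                  # corrected OCR text of the act


def to_json(record: StatuteRecord) -> str:
    """Serialise a record as JSON, keeping non-ASCII characters readable."""
    return json.dumps(asdict(record), ensure_ascii=False, indent=2)


if __name__ == "__main__":
    merton = StatuteRecord(
        title="Statute of Merton",
        regnal_citation="20 Henry 3",
        date="1235-6",
        source="Statutes of the Realm",
    )
    print(to_json(merton))
```

Keeping the date as printed, rather than forcing it into a single integer year, avoids having to resolve Old Style and New Style dating at this stage.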

The focus will be on the late seventeenth, eighteenth and early nineteenth centuries, the ‘long eighteenth century’ that is central to my own historical studies. Expect a concentration on matters relating to debt and debtors; that is the subject of my PhD.

This blog is more a notebook than a full archive of legislation, although that is the long-term hope. It will cover the technical side more than the theoretical, although that won’t be absent. When there’s a sufficient corpus, quantitatively and qualitatively, there will be some preliminary attempts at analysis, little games aiming to investigate the possibilities.

Future posts will discuss the project in more detail, covering the source volumes, the software, textual analysis, dissemination, and undoubtedly the many trials and tribulations produced by a simple idea rashly executed.