We have had Big Science, Big Data, now should be the big time for History

“Ossum iacta est”

Apart from biology, few fields have advanced as far and as fast in my lifetime as history and archaeology have.

Actual historians might have very different ideas than I do of how to make their field big, but I think three or four rather large books should be written, and we can start writing them now. These are:

  1. The Book of People
  2. The Book of Dwellings
  3. The Book of Stories
  4. The Book of Eras

The Book of People

This Domesday Book of the 21st century would be rather less concerned about taxes, but more about completeness.

The Book of People should list every person who has ever lived (but doesn’t do it anymore)

One published estimate claims that 108 billion people have lived in the last 50,000 years.

For privacy reasons the book might exclude the living, their parents, and the recently departed. Even so, the number would be close to 100 billion.

This is a fairly large undertaking. Wikipedia is big, yet it has only 52 million articles in 309 languages (6 million of them in English).
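To get a feel for the gap, the figures above can be put side by side. This is only back-of-the-envelope arithmetic on the numbers quoted in the text; the constant and function names are made up for the illustration.

```python
# Figures quoted in the text; treat them as rough, dated estimates.
PEOPLE_TO_LIST = 100_000_000_000   # "close to 100 billion"
WIKIPEDIA_ARTICLES = 52_000_000    # all language editions together
ENGLISH_ARTICLES = 6_000_000       # English edition only


def scale_factor(target: int, existing: int) -> float:
    """How many times larger the target collection is than an existing one."""
    return target / existing


# One entry per person would be roughly 1,900 times all of Wikipedia.
print(round(scale_factor(PEOPLE_TO_LIST, WIKIPEDIA_ARTICLES)))  # 1923
# Compared with the English edition alone, the factor is roughly 16,700.
print(round(scale_factor(PEOPLE_TO_LIST, ENGLISH_ARTICLES)))    # 16667
```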

If we rather arbitrarily define a person as a human with a name, most of these 100 billion would have lost their personhood by now. What’s more, for most of them we cannot even infer that they existed.

We can trace people by bones, records, and relatives. Archaeology has advanced greatly. In the coming decades we will likely cover all ground, over and under water and under receding ice, and find many more bone fragments and much more DNA than we have so far. These finds will place many of our predecessors in time and place. Still, this is, and will remain, the smallest dataset.

Records came with writing (and vice versa), and for most of the time, for most people, there were no records. Still, genealogists have over generations built up a massive map of our more recent and populous past.

Finally, we have the records in our genes, not just of us, but of all the ancestors we can reconstruct, particularly the matrilineal and patrilineal ones. Still, even in the best case this will lose those who died in childhood (whom we assume to be more numerous than those who grew up), those who died childless or whose descendants did, and the victims of genocide or natural causes.

As a wild guesstimate, we are then left with 30 billion, either as people or as lineages, most of them from fairly recent times.

For each person we would want:

  • links to their parents and children (biological and social)
  • links to other people they had relationships with
  • their name
  • place and time of birth and death
  • cause of death
  • place and time of anything else recorded
  • occupations
  • how much tax they paid and owed
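The list above could map onto a record along these lines. This is a minimal sketch, not a proposed standard: the class names, field names, and example identifiers are all hypothetical, and nearly every field is optional because for most people most of this will never be known.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Event:
    """Anything recorded about a person: a birth, a death, a court case."""
    kind: str
    place: Optional[str] = None
    time: Optional[str] = None  # free text, since many dates are uncertain


@dataclass
class Person:
    uri: str
    name: Optional[str] = None
    parents: Dict[str, str] = field(default_factory=dict)        # uri -> "biological" or "social"
    children: List[str] = field(default_factory=list)            # uris
    relationships: Dict[str, str] = field(default_factory=dict)  # uri -> kind of relationship
    events: List[Event] = field(default_factory=list)            # birth, death, anything else
    cause_of_death: Optional[str] = None
    occupations: List[str] = field(default_factory=list)
    taxes_paid: Optional[str] = None
    taxes_owed: Optional[str] = None


def link_parent(parent: Person, child: Person, kind: str = "biological") -> None:
    """Record the parent-child link on both records; safe to call twice."""
    child.parents[parent.uri] = kind
    if child.uri not in parent.children:
        parent.children.append(child.uri)


# Hypothetical entries: one with a name, one known only as a link.
parent = Person(uri="person:example-parent", name="Example Parent")
child = Person(uri="person:example-child")
link_parent(parent, child)
child.events.append(Event(kind="birth", place="somewhere around here"))
```

Links are stored as identifiers rather than nested objects, so a record can point at a person who has no entry of their own yet.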

The Book of Dwellings

Our predecessors didn’t just live in their time, they lived in dwellings. Living out in the open air was very much the exception. Many of these dwellings were temporary and are lost in time. Many more were permanent and are lost all the same. But some stayed up, at least long enough for us to get a record of them.

The Book of Dwellings should list every construction with a name, an address, or with remains

This book will be significantly smaller than the Book of People: fewer dwellings remain, and each had a larger number of people living in it.

For those buildings the Book of Dwellings would contain what’s known of:

  • Where and when it existed
  • Which buildings were replaced by or replaced this building
  • How it was made
  • Who were involved in its construction, maintenance, and reconstruction
  • Who were the owners
  • Who were the residents
  • Who were the visitors
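The "replaced by or replaced" links turn the buildings on a site into a chain that can be walked back in time. The sketch below assumes, for simplicity, that each building replaced at most one earlier building; the class, the field names, and the example identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dwelling:
    uri: str
    place: Optional[str] = None
    existed_from: Optional[int] = None   # year, if known
    existed_until: Optional[int] = None  # year, if known
    replaced: Optional[str] = None       # uri of the building this one replaced
    construction: Optional[str] = None   # how it was made
    builders: List[str] = field(default_factory=list)   # person uris
    owners: List[str] = field(default_factory=list)
    residents: List[str] = field(default_factory=list)
    visitors: List[str] = field(default_factory=list)


def site_history(book: Dict[str, Dwelling], uri: str) -> List[str]:
    """Follow 'replaced' links from a building back to the oldest known one."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = uri
    while current is not None and current in book and current not in seen:
        seen.add(current)  # guards against circular data
        chain.append(current)
        current = book[current].replaced
    return chain


# Hypothetical site with three successive buildings.
book = {
    "building:hut": Dwelling(uri="building:hut"),
    "building:farmhouse": Dwelling(uri="building:farmhouse", replaced="building:hut"),
    "building:flats": Dwelling(uri="building:flats", replaced="building:farmhouse"),
}
print(site_history(book, "building:flats"))
# ['building:flats', 'building:farmhouse', 'building:hut']
```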

The Book of Stories

What really separates us from all the other species we know of is that we organise our world in stories we tell each other. If something isn’t a story, it might as well not exist as far as we are concerned.

The number of recorded stories is relatively modest, in the order of millions, not billions, as we tend to tell each other the same stories over and over again. Also, our storytelling tradition has mostly been oral. When all the storytellers died, so did the story. Even more than people and dwellings, our stories are predominantly recent.

There are already schemes and databases to organise stories and recordings of stories. For example, ISBN is a scheme to identify published books, and Amazon’s IMDb is a database of many movies and TV series. There are many more.

Delineation and classification will be a bigger issue with stories. We generally agree on who are or were human, and could agree on what is a dwelling. We don’t as readily agree on what is a story, and when a story is different from another story. Predictably people will disagree. That notwithstanding, the brief is:

The Book of Stories should list all stories in any published media that we have ever told each other

“Published media” is an attempted constraint that is less of a constraint after the arrival of the Internet. It reflects the Wikipedia concept of “notability”: not all stories are equal. Where the threshold will lie is a matter of capacity. It would make sense to start with the most notable stories and proceed from there.

This book will list:

  • Names of the story, and its identifiers in existing schemes
  • Variant stories
  • Stories this story is inspired by, and stories inspired by it
  • Collections the story is a part of
  • The creators and publishers
  • Time and place where the story is set, and where it was made
  • People the story involves
  • Dwellings the story involves
  • A link, direct or indirect, to the story
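Only one direction of the "inspired by" relation needs to be stored; the other can be derived. The sketch below shows that, with a record shaped after the list above. All names, including the `Story` class and the example identifiers, are hypothetical, and the question of when two stories count as different is left untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Story:
    uri: str
    names: List[str] = field(default_factory=list)
    identifiers: Dict[str, str] = field(default_factory=dict)  # scheme -> id in that scheme
    variants: List[str] = field(default_factory=list)          # story uris
    inspired_by: List[str] = field(default_factory=list)       # story uris
    collections: List[str] = field(default_factory=list)
    creators: List[str] = field(default_factory=list)          # person uris
    publishers: List[str] = field(default_factory=list)
    setting: Optional[str] = None   # time and place where the story is set
    made: Optional[str] = None      # time and place where it was made
    people: List[str] = field(default_factory=list)            # person uris
    dwellings: List[str] = field(default_factory=list)         # building uris
    link: Optional[str] = None


def stories_inspired_by(book: Dict[str, Story], uri: str) -> List[str]:
    """Derive the reverse of 'inspired_by': which stories build on this one."""
    return sorted(s.uri for s in book.values() if uri in s.inspired_by)


# Hypothetical entries: an old tale and two retellings of it.
stories = {
    "story:old-tale": Story(uri="story:old-tale", names=["The Old Tale"]),
    "story:novel": Story(uri="story:novel", inspired_by=["story:old-tale"]),
    "story:film": Story(uri="story:film", inspired_by=["story:old-tale", "story:novel"]),
}
print(stories_inspired_by(stories, "story:old-tale"))
# ['story:film', 'story:novel']
```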

The Book of Eras

The book of history. History is the story we tell about our past, so strictly speaking this book is redundant. But it would be useful to have a book to make sense of the other books.

It is easy to imagine other derivative books, like The Book of Ideas, The Book of Products, The Book of Conflicts, but for our purpose those would just be special cases of The Book of Eras.

Time, space, context and uncertainty

Linear time works well for organising, but there are several ways to measure linear time. According to archaeologists and geologists, who count years “before present” from 1950, we are now 70 years and a couple of months after the present. Trickier than establishing a scale and setting an origin is that time gets fuzzier and more uncertain the further back we go.
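Converting between scales is the easy part. The archaeologists’ “before present” (BP) scale has its origin at 1 January 1950, and a year count on it can be moved to the common calendar as sketched below. The function names are made up for the illustration, and the sketch ignores calibration of radiocarbon dates, which is a separate problem.

```python
BP_EPOCH_YEAR = 1950  # "present" on the before-present scale


def bp_to_astronomical_year(years_bp: int) -> int:
    """Convert years BP to an astronomical year, where year 0 is 1 BCE."""
    return BP_EPOCH_YEAR - years_bp


def format_year(astronomical_year: int) -> str:
    """Render an astronomical year in the BCE/CE calendar, which has no year 0."""
    if astronomical_year > 0:
        return f"{astronomical_year} CE"
    return f"{1 - astronomical_year} BCE"


print(format_year(bp_to_astronomical_year(3500)))  # 1551 BCE
print(format_year(bp_to_astronomical_year(-70)))   # 2020 CE, "70 years after the present"
```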

Establishing where can be even harder than establishing when. In the words of Heraclitus, panta rhei. Apart from the poles, there are no fixed points on this planet. Everything is moving: land, sea and underground. So where exactly is “here, only 3500 years ago”? For recent history we can use the natural geography of coordinate systems, longitude and latitude, as well as the cultural geography of addresses. For the longer past we might have to operate with overlapping coordinates rather than absolute ones, and addresses change constantly.

In these more or less precise coordinates of time and space we can locate the items of these books, and The Book of Eras can add further context. When a town moves, is it still the same town?

Data can be right, wrong (to be fixed when/if better data comes along), or imprecise/uncertain (“somewhere around here”, “more or less at this time”, “probably Luke’s father”). The books should be able to handle all the alternatives.
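One way to handle all the alternatives is to store every statement as a claim with a confidence attached, keep competing claims side by side, and mark wrong ones as retracted instead of deleting them. The sketch below assumes that approach; the `Claim` class, the confidence numbers, and the identifiers are invented for the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Claim:
    subject: str      # uri of the person, dwelling or story
    predicate: str    # what is being claimed, e.g. "father"
    value: str
    confidence: float         # 0.0 (a guess) to 1.0 (certain)
    retracted: bool = False   # wrong data is kept, but flagged


def best_claim(claims: List[Claim], subject: str, predicate: str) -> Optional[Claim]:
    """Return the most confident claim that has not been retracted, if any."""
    candidates = [
        c for c in claims
        if c.subject == subject and c.predicate == predicate and not c.retracted
    ]
    return max(candidates, key=lambda c: c.confidence, default=None)


# "Probably Luke's father": two live alternatives and one retracted claim.
claims = [
    Claim("person:luke", "father", "person:candidate-a", confidence=0.7),
    Claim("person:luke", "father", "person:candidate-b", confidence=0.2),
    Claim("person:luke", "father", "person:candidate-c", confidence=0.9, retracted=True),
]
print(best_claim(claims, "person:luke", "father").value)  # person:candidate-a
```

The less likely alternative stays in the data, so better evidence can later raise it above the current favourite.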

How to begin?

With something, somewhere, sometime, and with a framework that can eventually accommodate everything that is to be in these books.

It makes more sense to start with the relatively recent past, where the data quality is better, the data machine readable and available, and the issues fewer.

Everything should have its place, and that means everything with an identity should have a URI.

So what should be the URI of Charles Darwin? Of his grandparents? What of Robin Hood? There are two main approaches, the meaningful and the meaningless. The meaningful URI will imply something about the thing or person in question; the meaningless will not. However, meaning changes over time, so we would either have to assign new URIs every time it does, or opt for meaningless ones. Thus,

person:some-random-string-of-letters-and-numbers
building:some-random-string-of-letters-and-numbers
story:some-random-string-of-letters-and-numbers

Each could then link to other existing or future schemes (e.g. ISBN). Expanding existing data collections should be the best way to start with least effort and quickest returns.
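A meaningless identifier is easy to produce. The sketch below uses a random UUID from Python’s standard library as the "random string of letters and numbers"; that choice, the helper names, and the placeholder identifier are assumptions for the illustration, not part of any existing scheme.

```python
import uuid
from typing import Dict

KINDS = ("person", "building", "story")  # the prefixes used in the examples above


def mint_uri(kind: str) -> str:
    """Create a new identifier that says nothing about what it identifies."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    return f"{kind}:{uuid.uuid4()}"


def link_external(links: Dict[str, Dict[str, str]], uri: str,
                  scheme: str, identifier: str) -> None:
    """Attach an identifier from another scheme (e.g. ISBN) to one of ours."""
    links.setdefault(uri, {})[scheme] = identifier


links: Dict[str, Dict[str, str]] = {}
story_uri = mint_uri("story")
link_external(links, story_uri, "isbn", "placeholder-isbn")  # not a real ISBN
print(story_uri)  # story: followed by a random UUID, different on every run
```

Everything meaningful, such as names, dates, and identifiers in other schemes, lives in the record behind the URI, where it can change without the URI changing.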

On Twitter: jaxroam
