Introduction to the Data

The data consists of three different things: objects, the things we are trying to describe; events, points in time where something happened to these objects; and attributes, properties which changed during an event. The following describes each of these things in detail.

Objects

Each object represents one particular real-world thing that is described in the project.

An object's type defines what kind of thing it is: a railway line, a station, or an organization, for example. While technically you could use whatever you want as the type of your object, it makes sense to use one of the well-defined types that are conventionally used in the project. We can easily include additional data in the project by agreeing on new types.

Each object is uniquely identified by an alphanumeric identifier called the object's key. We use hierarchical keys where each level is separated by a dot. The first part is normally the type of the object, followed by the (present-day) country the object relates to. However, this is just a convention to avoid collisions between keys.
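As a rough illustration of the key convention, the following Python sketch splits a key into its levels and reads the first two as type and country. The key `station.de.example` and the names `ParsedKey` and `parse_key` are made up for this example; since the layout is only a convention, the sketch leaves the second level empty when a key has just one level.

```python
from typing import NamedTuple, Optional, Tuple


class ParsedKey(NamedTuple):
    object_type: str         # first level: by convention the object's type
    country: Optional[str]   # second level: by convention the present-day country
    rest: Tuple[str, ...]    # any further levels


def parse_key(key: str) -> ParsedKey:
    """Split a dot-separated key into its levels."""
    levels = key.split(".")
    if any(not level for level in levels):
        raise ValueError(f"empty level in key: {key!r}")
    country = levels[1] if len(levels) > 1 else None
    return ParsedKey(levels[0], country, tuple(levels[2:]))


# Hypothetical key, for illustration only.
parsed = parse_key("station.de.example")
print(parsed.object_type, parsed.country, parsed.rest)
```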

Events

In a way, history is about tracking change happening over time (proper history asks about the why of the change; we are content with just listing these changes). Each point in time where a change happened to an object is stored as an event in our database.

Most objects contain a list of events. Some objects don't, such as source objects that describe a book or similar work that was used as a data source. These never change and so don't need events.

With each event, we store the time at which it happened, i.e., the date. (It appears to be sufficient to limit the accuracy to days.) The date is expressed in what is known as the ISO date format. You give the year with four digits, followed by a hyphen, followed by the number of the month with two digits, followed by another hyphen and the day of the month, again with two digits. If you don't know the day, you can leave it out. If you only know the year, you can leave out both month and day. In both cases, you also drop the hyphens that are no longer needed.

Two extensions to the ISO date are in use: if you prefix the date with a small c (short for circa), you say that this is roughly the date. If you suffix a question mark, this means that you are not quite sure.

The circa c is usually used if the editor doesn't really have a clue but guesses that it should have happened around that time. So, "c1970" can really mean any date between 1950 and 2000 but the editor believes that it happened in the 1960s or 1970s.

As an illustration, some valid dates:

  • 1884-12-01
  • 1884-12
  • 1884
  • 1884-12-01?
  • c1880
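The date syntax described above can be checked with a short regular expression. The following Python sketch is one possible reading of the rules, not the project's actual parser: the names `EventDate` and `parse_date` are invented here, and allowing the circa prefix and the question mark on the same date is an assumption, since the text does not say whether they can be combined.

```python
import datetime
import re
from typing import NamedTuple, Optional

# Optional "c" prefix, four-digit year, optional month, optional day
# (only if a month is present), optional "?" suffix.
DATE_RE = re.compile(
    r"(?P<circa>c)?"
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})(?:-(?P<day>\d{2}))?)?"
    r"(?P<doubt>\?)?"
)


class EventDate(NamedTuple):
    year: int
    month: Optional[int]
    day: Optional[int]
    circa: bool   # date is only approximate
    doubt: bool   # editor is not quite sure


def parse_date(text: str) -> EventDate:
    """Parse a date string; raise ValueError if it is malformed."""
    match = DATE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid date: {text!r}")
    year = int(match.group("year"))
    month = int(match.group("month")) if match.group("month") else None
    day = int(match.group("day")) if match.group("day") else None
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"month out of range: {text!r}")
    if day is not None:
        # Let the standard library reject impossible days such as 1884-02-30.
        datetime.date(year, month, day)
    return EventDate(year, month, day,
                     circa=match.group("circa") is not None,
                     doubt=match.group("doubt") is not None)


for text in ["1884-12-01", "1884-12", "1884", "1884-12-01?", "c1880"]:
    print(text, parse_date(text))
```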

In addition to all these, there is a "placeholder date", given by either leaving out the date or explicitly giving a None value. It is used whenever no specific date is known and can be used for an event that collects the earliest known state of an object. Eventually, all placeholder dates should be changed into more meaningful dates, but keeping them around is preferable to assuming dates that are not well known.

For instance, if the oldest source for the name of a station opened in 1883 is from 1914, you cannot safely attach that name to an event dated 1883 as the station may very well have had a different name earlier. Thus you add the name to an event with the placeholder date and your source attached to it. Then everyone knows that we only have the name from 1914 confirmed.
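The station example could be written down roughly as follows. This is only a sketch using plain Python dictionaries: the field names `date`, `source` and `attributes`, the station name, the source key and the `status` value `"open"` are all made up for illustration and do not reflect the project's actual file layout.

```python
def is_placeholder(event: dict) -> bool:
    """True if the event has no date, either omitted or explicitly None."""
    return event.get("date") is None


# Hypothetical events of a station opened in 1883 whose name is
# only confirmed by a source from 1914.
events = [
    {
        # No "date" entry: this is the placeholder date.
        "source": "source.de.example-1914",
        "attributes": {"name": "Example Station"},
    },
    {
        "date": "1883",
        "attributes": {"status": "open"},
    },
]

for event in events:
    label = "placeholder" if is_placeholder(event) else event["date"]
    print(label, event["attributes"])
```

Treating a missing entry and an explicit None the same way mirrors the two ways of giving a placeholder date.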

Attributes

Now that we know when an object changed, it would also be nice to know what exactly happened. This is where attributes come in. You could describe change by saying that a certain property or attribute of an object has changed. For instance, if a station was renamed, the name attribute of the station has changed. If you are creative enough, you can describe all change by such attributes. If a line was opened, the operational status attribute has changed.

This is what attributes do. For each event you can give any number of attributes and how they changed or what they changed into (this is a bit of a subtlety and depends on the attribute in question).

In computing, such attributes usually have a symbolic name (like `status` for the operational status) which allows you to look up the concrete meaning in the documentation.

For each object type, a certain number of such names is defined in the documentation including the syntax and semantics for the value. These values can be quite complex.

For most types, the attributes are attached to an event. For types that do not have events, they are attached to the object itself.
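To make the two attachment points concrete, here is a minimal Python sketch of one way the pieces could fit together. The class names `Event` and `DataObject`, the helper `attribute_names`, the keys, the type name `source`, the `title` attribute and the value `"open"` are all assumptions made for this example; only the `status` name comes from the text above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class Event:
    date: Optional[str] = None  # None stands for the placeholder date
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class DataObject:
    key: str
    type: str
    # Most types keep their attributes inside events ...
    events: List[Event] = field(default_factory=list)
    # ... while types without events keep them on the object itself.
    attributes: Dict[str, Any] = field(default_factory=dict)


def attribute_names(obj: DataObject) -> Set[str]:
    """Collect attribute names wherever they are attached."""
    names = set(obj.attributes)
    for event in obj.events:
        names.update(event.attributes)
    return names


# Hypothetical objects, for illustration only.
line = DataObject(
    key="line.de.example",
    type="line",
    events=[Event(date="1884-12-01", attributes={"status": "open"})],
)
book = DataObject(
    key="source.de.example-book",
    type="source",
    attributes={"title": "An Example Book"},
)

print(attribute_names(line))
print(attribute_names(book))
```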