Europeana and crowd-sourcing: a short reflection

The ‘Europeana 1914-1918’ portal[1] is part of the larger Europeana project and aims to collect digitised source material on the First World War. In this reflection, I would like to give my personal opinion on what sets it apart from other digitisation projects, namely its combination of ‘professional’ content from cultural heritage institutions with private, ‘crowd-sourced’ content.

Earlier in this course, we discussed the ‘European History Primary Sources’ project[2], which was designed as an index of the growing number of repositories for digitised primary sources. The overwhelming majority of these were set up and filled with content by cultural heritage institutions (museums, libraries, etc.). This is why I will comment only briefly on the ‘professional’ side of the ‘Europeana 1914-1918’ project here. A look at the total number of items available through the search function clearly shows that most of the material has been provided by contributing institutions: the site currently offers 374,544 items, of which 362,481 come from institutional partners[3]. The sheer amount of material is certainly remarkable. Furthermore, these items are always linked directly to the website of the providing institution, so the available metadata can usually be found quite easily. Rather than discussing at length the various types of sources and providers on this ‘professional’ side of the project, I would now like to focus on the crowd-sourced material.

As mentioned before, the website currently holds 374,544 items. Of these, 12,063 have been provided by the public, i.e. by private contributors[4]. Setting aside the enormous number of items hosted by institutions for a moment, the latter figure is, at least in my opinion, quite impressive. For a private person to upload anything (this might be stating the obvious), he or she first has to know that the website exists. Then one must create an account and provide all the information required for the source material to be published. The amount of work and time this involves arguably puts the number of private contributions into better context than a comparison with the vast amount of material provided by institutions. A noteworthy strategy the makers of ‘Europeana’ employ to gain access to private sources is their so-called ‘collection days’. These are held throughout Europe and allow members of the public to bring their source material and have it digitised ‘on-scene’. Such events minimise the effort private contributors have to make to share their material, while also making the process much more personal and interactive than simply uploading data to a server themselves. As the ‘Collections’ tab in the search engine shows, 2,517 items were uploaded to the collection online, whereas the remaining roughly 9,500 were collected at local events. Although these events are certainly not cheap to organise and host (they have to be advertised, and experts have to travel and invest what I would assume to be many man-hours into digitising material), they do seem to fulfil their goal of extending the overall collection.

Finally, I would like to take a closer look at the crowd-sourced material available on the website. Using nothing but the search filters on the right-hand side of the screen, one can make a number of interesting discoveries. First of all, the ‘Theatres’ filter reveals a clear predominance of items linked to the Western Front (6,453 as opposed to 1,235 for the Eastern Front). Finding out the reasons behind this would be an extensive research task in itself, so I do not wish to speculate here. Instead, I would like to point out another imbalance in the corpus of crowd-sourced material, namely its bias towards photographs and postcards. These two categories contain 468 and 461 items respectively; by comparison, there are only 79 diaries, 23 drawings and 4 paintings. A historian searching the collection of private sources has to keep this in mind in order to properly assess the results generated by the search engine. Because certain types of sources are over-represented, one risks granting them more importance than they ideally deserve, at least if one wants to produce a ‘balanced’, academic historical narrative. Of course, this goes much further than the somewhat eclectic figures I have put forward here: critically assessing the available source corpora is a major task for any historian, whether his or her research is conducted online, offline, or both. On a positive note, though, I would like to point out how easy it is to generate statistics like these in digital repositories in general and in ‘Europeana 1914-1918’ in particular.
Together with the overall variety of material (both ‘professional’ and private), the generally extensive metadata and the links to the original websites (in the case of the ‘professional’ content), the excellent search function makes ‘Europeana 1914-1918’ a real asset for research on the First World War. Like any digital repository, and indeed any archive, it is of course not without its drawbacks, but in my personal research so far, at least, it has been tremendously helpful.

