Operation War Diary has been running for over two years now. Together, we have placed hundreds of thousands of tags, made similar numbers of comments, and followed the journeys of hundreds of units through the conflict on the Western Front.
And, like all things, we have evolved over that time.
When we began, we followed in the footsteps of other great crowd-sourced digital humanities projects like Old Weather. But the content we are dealing with at Operation War Diary is unique in its depth, breadth and richness, and that meant we had to make certain assumptions when we started out.
Mainly, these concerned what should and shouldn't be tagged, which in turn rested on what we thought the data we would produce might look like and how it would be used. In part, we were led by the transcription mantra: only what is there should be written down. However, tagging is a very different activity from transcription, with a quite different set of applications.
Under our initial guidance, volunteers tagged only what was explicitly mentioned on a diary page, and we also told them not to tag certain everyday activities for units like ammunition columns, mobile veterinary sections and engineers – the movements, collections and checking of infrastructure which might be considered the bread and butter of the units in question.
In part, this was to make the process less onerous for our taggers. We have 1.5 million pages to get through, after all! But, as I said before, it was also partly because we hadn’t quite left the transcription mindset behind.
However, we now have our first real use of Operation War Diary data to refer to, courtesy of Professor Richard Grayson, and it makes for very interesting reading. If you haven’t read the article already, you can find it here.
To some extent, the quality and richness of the data which can be used to support studies like this is limited by what was included in a diary in the first place – some are much sparser than others. However, by following the transcription-oriented method of only tagging what we can see, are we also unnecessarily reducing the coverage of the data we produce?
What about the case of a unit which we know to be in the line, because the author tells us so on one day, but over the next four or five days' worth of entries that fact isn't explicitly mentioned again? Very often, it's clear that the unit is still in the line, but that information is lost because there's nothing for us to drop a tag on.
Or take the Mobile Veterinary Section that spends a week travelling from place to place, picking up sick horses to take back to the depot. Under our starting assumptions, that detail would also have been lost, because we felt it wasn't necessary to tag activities we already knew certain units spent much of their time doing.
That's fine from the standpoint of our knowledge and common-sense understanding of these units and the functions they carried out during the war. But if we shift the perspective to one of providing evidence, meaning quantitative facts we can use to illustrate that understanding, then by not tagging things we know to be true, we aren't realising the full potential of Operation War Diary.
Of course, there's a line between filling in the blanks by inference and simply making things up. But as our understanding of the project evolves, so too do the knowledge and experience of our long-term taggers. Many of them started off knowing very little about the war diaries, but have now read and tagged hundreds, if not thousands, of pages. They are very well placed to see patterns in the information and to extrapolate from what is written down to what is only implied.
That will mean making judgement calls at times, but the Talk forums provide a great environment for testing out any inferences before we press the ‘Finish’ button. The whole concept of Operation War Diary is that it is built on consensus, so why not extend that to these situations too?
There are practical issues to overcome: where to place a tag for an inferred activity, for example, or which tag to use. For the former, I would suggest dropping inferred tags close to the date to which they should be linked; our clustering algorithm will then group them together and ensure the information is recorded in the way it was intended. For the latter, we may have to resort more often to the unsatisfactory 'Other' option for activities which do not fit neatly into the standard list. That will at least still allow us to build up a comprehensive timeline for each unit, and will clearly indicate what they were not doing, even if we can't provide specifics beyond that.
With our first published use of Operation War Diary's data, I believe we now have a clear and compelling case for tagging as much information as we can, as accurately as we can. And that is the beauty of Operation War Diary: we can evolve and improve what we do and, in doing so, tell the stories of the Western Front as effectively as we know how.