Operation War Diary has been running for over two years now. Together, we have placed hundreds of thousands of tags, made similar numbers of comments, and followed the journeys of hundreds of units through the conflict at the Western Front.
And, like all things, we have evolved over that time.
When we began, we followed in the footsteps of other great crowd-sourced digital humanities projects like Old Weather. But the content we are dealing with at Operation War Diary is unique in its depth, breadth and richness. It meant we had to make certain assumptions when we started out.
Mainly, these concerned what should and shouldn’t be tagged, which in turn was based on what we thought the data we produced might look like and how it would be used. In part, we were led by the transcription mantra, which is that only what is there should be written down. However, tagging is a very different activity to transcription, with a quite different set of applications.
Under our initial guidance, volunteers tagged only what was explicitly mentioned on a diary page, and we also told them not to tag certain everyday activities for units like ammunition columns, mobile veterinary sections and engineers – the movements, collections and checking of infrastructure which might be considered the bread and butter of the units in question.
In part, this was to make the process less onerous for our taggers. We have 1.5 million pages to get through, after all! But, as I said before, it was also partly because we hadn’t quite left the transcription mindset behind.
However, we now have our first real use of Operation War Diary data to refer to, courtesy of Professor Richard Grayson, and it makes for very interesting reading. If you haven’t read the article already, you can find it here.
To some extent, the quality and richness of the data which can be used to support studies like this is limited by what was included in a diary in the first place – some are much sparser than others. However, by following the transcription-oriented method of only tagging what we can see, are we also unnecessarily reducing the coverage of the data we produce?
What about the case of a unit which we know to be in the line, because the author tells us so on one day, but over the course of the next four or five days’ worth of entries, that fact isn’t explicitly mentioned again? Very often, it’s clear that the unit is still in the line, but that information is then lost because there’s nothing for us to drop a tag on.
Or the Mobile Veterinary Section who spend a week travelling from place to place, picking up sick horses to take back to the depot? Again, under our starting assumptions, that detail would also have been lost, because we felt it wasn’t necessary to tag activities we already knew certain units spent much of their time doing.
That’s fine from the standpoint of our knowledge and common-sense understanding of these units and the functions they carried out during the war. But if we shift the perspective to one of providing evidence, quantitative facts which we can use to illustrate our understanding, then by not tagging certain things we know to be true, we aren’t realising the full potential of Operation War Diary.
Of course, there’s a line between inferring what to fill the blanks with and making things up. But as our understanding of the project evolves, so too do the knowledge and experience of our long-term taggers. They may have started off knowing very little about the war diaries, but they have now read and tagged hundreds, if not thousands, of pages and are very well placed to see patterns in the information and extrapolate from what is written down to what is only implied.
That will mean making judgement calls at times, but the Talk forums provide a great environment for testing out any inferences before we press the ‘Finish’ button. The whole concept of Operation War Diary is that it is built on consensus, so why not extend that to these situations too?
There are practical issues to overcome – where to place a tag for an inferred activity, for example, or which tag to use. For the former, I would suggest dropping inferred tags close to the date to which they should be linked – our clustering algorithm will then group them together and ensure the information is recorded in the way it was intended. For the latter, we may have to resort more frequently to the unsatisfactory ‘Other’ option for activities which do not fit neatly into the standard list, but that at least will still allow us to build up a comprehensive timeline for each unit and will clearly indicate what they were not doing, even if we can’t provide specifics beyond that.
With our first published use of Operation War Diary’s data, I believe we now have a clear and compelling case for tagging as much information as we can as accurately as we can. And that is the beauty of Operation War Diary – we can evolve and improve what we do and, in doing so, can tell the stories of the Western Front in the most effective way we know how.
The war diaries are never more emotionally engaging than when they show the effect the First World War had on the men who fought in it. Whether on individuals, or on the units as a whole and their attitude to the conflict, the weight of the long years of toil and sacrifice can be felt in the tone of the words, the brief glimpses into their innermost thoughts that some diary authors allow us.
Like all stories, the First World War has a beginning, a middle and an end. How closely these intersected with the personal beginnings, middles and ends of the men we read about was often down to blind luck. For some, the end came all too quickly. Others saw out the whole story.
In narrative (as well as actual) terms, mobilisation was where it all began. Whether in India or the UK, the diaries are crisp, efficient. You get a sense of the great military machine grinding into action – reservists arriving at their depots, kit being issued, travel arrangements made. Then there’s almost a drawing of breath, a moment’s respite before we get to the middle of the story. For some it’s a day or two, a boat ride across the Channel and a train trip to the battle area. For others it’s a crossing from another continent. Either way, the destination is the same. Barely suppressed excitement leaches from the pages of some diaries, trepidation from others.
The middle part is the longest, of course. The four years of fighting, mobile at first, becoming bogged down in the trenches later on. This is where the changes are most noticeable – the switch from intrepid expeditionary force to hardbitten veterans of a war that must have seemed endless, life after life eaten up in the giant mincing machine. Each diary author deals with it in their own way. Some produce dry accounts of death and loss – casualty lists, terse descriptions of the circumstances in which their comrades died. Others turn to humour, describing the blackest of days in wry tones. Sometimes the official veneer slips – the army record becomes a more personal narrative, a snapshot into the mind of a man caught in the midst of hell.
Some of the most moving accounts appear in the diary of the 15th Ludhiana Sikhs, one of the Indian Army battalions which made the crossing to France. After moving back from the line, the author writes of how the
numerous green fields with hedges and trees bursting into bud make a most welcome change to the desolation left behind.
He goes on to describe the physical deterioration of the troops after six months of trench warfare.
Those who have gone right through it…march with a shuffle, bent knees and backs beat with the weight of the…constant fatigues.
You can see the original diary page here: http://talk.operationwardiary.org/#/subjects/AWD0002xfr
Finally, for those who made it through the long, gruelling middle, the story came to an end: demobilisation and the return home. In many of the diaries, this was preceded by a period which the adjutants seemed almost bored by – a fighting army becoming a garrison, days filled with drill and training and lectures. After what had gone before, just imagine what a blessed relief that boredom must have been!
Stay tuned to the blog – in future posts we’ll have a look at the German Army of the First World War and try to build up a picture of the enemy the authors of our diaries were facing. The National Archives will also be doing a series of posts on their ongoing digitisation of the war diaries. We hope to have some pictures of the original diaries to show you; it’s pretty incredible to see the original documents we’re all tagging!
Here’s to a successful week, Citizen Historians! Keep tagging! (Or join us at http://www.operationwardiary.org/ if you’re new!)
When Operation War Diary launched earlier this year, we aimed to produce a structured data set covering the daily activities of all the diverse units which operated on the Western Front. Three hundred and twenty-nine diaries in, the project is not just fulfilling this initial aim but is also building up a rich resource of hashtags, covering areas from the condition and treatment of horses to the emergence of aerial warfare over the trenches.
One thing that hasn’t changed from the project’s inception is the importance of names. Names are central to Operation War Diary – they are what makes all the other information we’re collecting real, the visual reminder that it relates to the daily experiences of people just like us.
So far, we’ve identified over 50,000 unique names. Many of these belong to officers, but there are a great number of Other Ranks too, many of them only ever mentioned once in all the millions of pages we have to tag. That’s what makes the work of our Citizen Historians so important – if that person isn’t tagged, we may never find the reference to them again, yet by tagging it we can make it visible and accessible to others who come after us. We can ensure their legacy is preserved.
Tagging names can be extremely time-consuming, especially when we encounter long lists of them, and yet there’s nothing more important. If you find you can’t tag the names for any reason, please use our #nominalroll hashtag to mark the page, to ensure that we can find it again later. We’ll keep track of all these pages and, if necessary, we’ll re-open them for tagging later.
It’s an opportune moment to pause and look back at how many names we’ve tagged – the 11th of November is fast approaching, the anniversary of the Armistice and the UK and Commonwealth’s Remembrance Day. This year, the Imperial War Museum is encouraging everybody to take an active part in Remembrance through its Let’s Remember Together campaign.
In partnership with the National Archives and the Lives of the First World War community of over 44,000 people, IWM would like to work with you to share the Life Stories that are your connection to the First World War. Your connection could be a relative who served, someone who shares your surname or a person listed on your local war memorial. In the case of Operation War Diary Citizen Historians, it might also be a name you have uncovered in one of the war diaries.
Whoever they are, we encourage you to share their story on Lives of the First World War, the permanent digital memorial to over 8 million men and women from across Britain and the Commonwealth who made a contribution during the First World War. Here at Operation War Diary headquarters, we’ll be blogging again about the connections we’ve uncovered.
With all our Citizen Historians gallantly tagging away, we thought it was high time we explained how all that hard work is being used to produce the data sets for the project.
While we really appreciate all the effort each and every individual is putting in on the diaries, we know that errors can arise for one reason or another. For that reason, we generate what’s known as consensus data, using an algorithm built for the purpose.
To begin with, each of the diary pages is tagged by at least five Citizen Historians. Five different people who might each look at that page in a slightly different way. Once that tagging is complete, the diary page is closed and put in the queue for processing.
The system starts this by identifying tags of the same type relating to the same entity (a place, a person, an action etc.). It has to take a best guess at this, clustering tags together based on a percentage of the image size for each scanned diary page. Trial and error has shown that this percentage is best set at 3% vertical and 10% horizontal. There must be a minimum of two tags for a particular entity if it is to make it into the final consensus data set. So, if two of the five Citizen Historians who have tagged a diary page have both identified a place in the same position on that page, that place makes it in.
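For the technically curious, the clustering step might be sketched as follows. This is only an illustration, not the project's actual code: the `Tag` class, the `cluster_tags` function and the greedy "join the first nearby cluster" strategy are all assumptions. Only the 3% and 10% tolerances and the two-tag minimum come from the description above.

```python
from dataclasses import dataclass

# Tolerances quoted in the post, as fractions of the page image size.
V_TOL = 0.03  # 3% of the image height
H_TOL = 0.10  # 10% of the image width
MIN_TAGS = 2  # a cluster needs at least two tags to survive


@dataclass
class Tag:
    kind: str   # hypothetical field: "place", "person", "activity", ...
    x: float    # horizontal position on the page image, in pixels
    y: float    # vertical position on the page image, in pixels
    value: str  # what the volunteer entered


def cluster_tags(tags, width, height):
    """Greedy clustering (an assumed strategy): a tag joins the first
    cluster of the same kind whose first member is within tolerance."""
    clusters = []
    for tag in tags:
        for cluster in clusters:
            seed = cluster[0]
            if (tag.kind == seed.kind
                    and abs(tag.x - seed.x) <= H_TOL * width
                    and abs(tag.y - seed.y) <= V_TOL * height):
                cluster.append(tag)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([tag])
    # Drop clusters with fewer than MIN_TAGS tags.
    return [c for c in clusters if len(c) >= MIN_TAGS]


if __name__ == "__main__":
    page = [Tag("place", 100, 100, "Ypres"), Tag("place", 150, 130, "Ypres"),
            Tag("place", 500, 900, "Arras")]
    for cluster in cluster_tags(page, width=1000, height=2000):
        print([t.value for t in cluster])
```

In this sketch a lone tag, such as the single "Arras" above, never reaches the consensus set, which mirrors the two-tag rule.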
Image © IWM (Q 5700)
The consensus tag generated from this tag cluster is then placed at the average location of its constituent tags.
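As a rough illustration of that averaging, assuming each tag position is a simple (x, y) pair in pixels, the calculation could look like this. The function name `consensus_position` is made up for the example.

```python
from statistics import mean


def consensus_position(points):
    """Return the average (x, y) of a cluster of tag positions."""
    xs, ys = zip(*points)  # split the pairs into x values and y values
    return mean(xs), mean(ys)


if __name__ == "__main__":
    # Three volunteers tagged the same place name in slightly different spots.
    print(consensus_position([(100.0, 100.0), (150.0, 130.0), (110.0, 115.0)]))
```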
Next the system has to determine exactly what information should be attached to each tag. This is relatively straightforward when the original tags came from a fixed list (e.g. Activities tags, which can be of only a certain number of types). Where tags contain free text (e.g. person or place), fuzzy text matching is used to determine their attached information (e.g. Slater-Booth, Sclater-Booth and similar variants would be grouped together). Where a majority of these free text tags have the same value, that value becomes the consensus value. However, if there is no clear majority value, then the consensus tag will be formed of the leading variants.
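Here is one way the free-text step could work, offered as a sketch under stated assumptions rather than a description of our real code. It uses Python's standard `difflib.SequenceMatcher` for the fuzzy matching. The 0.8 similarity threshold, the function names and the reading of "leading variants" as the values tied for the highest count are all assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher

SIMILARITY = 0.8  # assumed threshold; the post does not state one


def similar(a, b):
    """True if two free-text values are close enough to be variants."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= SIMILARITY


def group_variants(values):
    """Greedily group values that resemble the first member of a group."""
    groups = []
    for value in values:
        for group in groups:
            if similar(value, group[0]):
                group.append(value)
                break
        else:
            groups.append([value])
    return groups


def consensus_value(group):
    """Majority value if one exists, otherwise the leading variants joined."""
    counts = Counter(group)
    top_value, top_count = counts.most_common(1)[0]
    if top_count > len(group) / 2:
        return top_value
    # No clear majority: keep every variant tied for the highest count.
    leaders = [v for v, n in counts.most_common() if n == top_count]
    return " / ".join(leaders)


if __name__ == "__main__":
    names = ["Sclater-Booth", "Sclater-Booth", "Slater-Booth", "Smith"]
    for group in group_variants(names):
        print(group, "->", consensus_value(group))
```

Comparing each value only with the first member of a group keeps the sketch short. A production system would probably compare against every member, or against a running consensus.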
The algorithm is also designed to create serialised data. In essence, this means that each consensus tag is associated with a date, which allows the data generated to then be ordered by date. When Citizen Historians tag dates on a diary page, they essentially segment that page, and it’s that segmentation which allows the system to determine which consensus tags should lie inside which date area.
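To make the idea of segmentation concrete, here is a minimal sketch. It assumes that each date tag opens a segment running down the page until the next date tag, and that a consensus tag belongs to the nearest date at or above it. The function name `assign_dates` and the tuple layout are hypothetical.

```python
from bisect import bisect_right


def assign_dates(date_tags, tags):
    """Attach a date to each tag based on its vertical position.

    date_tags: list of (y, date_string) pairs, one per tagged date.
    tags:      list of (y, label) pairs for the consensus tags.
    A tag above the first date on the page gets None.
    """
    date_tags = sorted(date_tags, key=lambda d: d[0])
    boundaries = [y for y, _ in date_tags]
    dated = []
    for y, label in tags:
        # Index of the last date tag at or above this tag.
        i = bisect_right(boundaries, y) - 1
        date = date_tags[i][1] if i >= 0 else None
        dated.append((date, label))
    return dated


if __name__ == "__main__":
    dates = [(100, "1915-03-01"), (600, "1915-03-02")]
    tags = [(150, "In the line"), (700, "Relieved"), (50, "Page header")]
    for date, label in assign_dates(dates, tags):
        print(date, label)
```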
Once these operations have been carried out for one page of a diary, the next page will be processed and so on until the diary is complete.
Don’t worry about us losing all the tags you’ve generated, though – our databases hold everything that every single one of our Citizen Historians has added to Operation War Diary, be it individual tags, hashtags or text comments. We know just how valuable a resource that’s going to be for anybody wanting to investigate the diaries beyond the standard, structured tags we’ve defined.
Why not check out our first batch of consensus data here: http://wd3.herokuapp.com/public