This tutorial presents a free and easy-to-use digital workflow for annotating historical sources, with a particular focus on mapping the geography of texts. If you are interested in tracing itineraries, investigating migration or trading routes, or comparing accounts of the same places between different authors and media and/or across different times and cultures, Recogito-in-a-box could be for you.1 In this tutorial, you will learn about the value of digital gazetteers, semantic annotation, map-based visualisations,2 and digital editions using Recogito, an open-source, free and online semantic annotation tool developed by the Pelagios Network.3
To help you see how Recogito works in practice, we will use a specific case study – the Historia de la Conquista del Río de la Plata, best known as La Argentina Manuscrita – an early seventeenth-century chronicle written by a Spanish-Guarani officer, Ruy Díaz de Guzmán (del Rio Riande et al. 2019a). It is in La Argentina Manuscrita that, among other things, we find the first description of the Río de la Plata region in Spanish.4
The workflow demonstrated here includes technologies such as TEI-XML, the standard mark-up for text in the Humanities developed by the Text Encoding Initiative (TEI) Consortium5 and Linked Open Data (LOD)6 – but the good news is that you don’t need to have any prior computing expertise related to these technologies to use Recogito.7
Step 1: Create an account
First of all, create an account on Recogito (Figure 1).8 It’s completely free, the user interface is available in several languages, including English, Spanish, German and Italian, and it works with most modern browsers (Firefox, Chrome, Safari). Once you have your account, you can start annotating straight away. This global version of Recogito doesn’t require any installation. However, since Recogito is open-source software, you can also download it and run a local version that you can customise, for example, by adding an additional gazetteer specific to your needs.
You can also annotate a document that another user has shared with you. We will discuss this possibility when talking about sharing options (see Step 8).
Step 2: Upload a document
With Recogito you can annotate an array of digital documents (including image formats), but in this tutorial we just focus on text documents. To upload a text document to Recogito, we recommend using the .txt format (Figure 2). If your document is in another text format (e.g. .doc), you first need to convert it into plain text encoded in UTF-8 (a Unicode encoding). You can do this in any of the most popular word processors, such as Word, Writer or Google Docs, simply by using the Save as option.
When you export the text file as .txt, please check that the text you are uploading is the final version: as Recogito is not a text editor, you won’t be able to make changes to the text once it is uploaded. Although the optimum format for working on text documents in Recogito is currently .txt, Recogito now also has the capacity to enable the annotation of TEI-XML. We talk about this option later in the tutorial (see Step 7).
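If you already have a plain-text file that was saved in an older encoding, the conversion to UTF-8 can also be scripted. The following is a minimal sketch, not part of Recogito: it assumes the file is encoded in Windows-1252 (`cp1252`), which you should check for your own material, and the function names are our own. It does not handle binary formats such as .doc, which still need to be exported from a word processor.

```python
from pathlib import Path


def reencode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode as UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in source_encoding,
    which is a useful hint that the assumption about the encoding is wrong.
    """
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Read a plain-text file, re-encode it, and write the UTF-8 copy to dst."""
    data = Path(src).read_bytes()
    Path(dst).write_bytes(reencode_to_utf8(data, source_encoding))
```

For example, `convert_file("argentina_cp1252.txt", "argentina_utf8.txt")` (both file names are hypothetical) would produce a copy ready for upload, leaving the original untouched.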
If you upload more than one text document at the same time, Recogito will collate the files to create a metadocument that brings them together. This function is particularly useful if you want to compare different chapters of the same book or if you wish to analyse accounts of the same trip or place in different authors.
Step 3: Add your metadata
When you first upload a document, it is recommended that you fill in as much metadata about it as you can – information such as authorship, title and date of the text, and the provenance of its digital format, may all be important, particularly if you want to share the document. By default, all your documents will be visible to you only; if you want to share them with others (see Step 8), please be sure that you have the appropriate permissions and that you have recorded them in the metadata. You will only be able to share a document if you own the copyright to it or if it is under a Creative Commons license.9
Step 4: Create annotations
Creating annotations in Recogito is simple and enjoyable. You simply select the word or words in the text that you wish to annotate. This action will bring up a small annotation pop-up window, which asks you to assign a category to your annotation. You can choose between three different categories: Place, Person and Event.
If you click on Place, Recogito will try to help you disambiguate your annotation by matching it to related entries from one or more global authority records for places through its gazetteers (Figure 3).10 Recogito currently uses seven historical gazetteers – Pleiades (gazetteer of the Ancient World), CHGIS (China Historical GIS), DPP Places (Places from the Digitizing Patterns of Power Project), DARE (Digital Atlas of the Roman Empire), MoEML (Map of Early Modern London), HGIS de las Indias (Historical-Geographic Information System for Spanish America, 1701–1808), Kima (historical gazetteer with place names in Hebrew script) – as well as a contemporary one (a subset of Geonames). It remains up to you to choose the place record that you think best fits the place mentioned in the text that you are annotating. There is an added bonus of aligning your place annotation to a global place record: because gazetteers also provide other information (such as coordinates), Recogito will automatically visualise your place annotations on a map.
We will say a little more about visualisation options in Step 6. You are also able to mark your annotations as Person or Event, though currently – given the lack of global authority records on these entities – you won’t be able to disambiguate them using unique identifiers as you can do with places. You can even use tags and free text comments to further refine your annotations (see Figure 4), for example, by manually adding external identifiers that point to Wikidata or museum catalogue records.
Recogito also offers the option of creating semi-automatic annotations using a Named Entity Recognition (NER) algorithm. NER algorithms are language specific, so you should select the most appropriate from those available in Recogito. At the moment, it offers NER in English, French, Spanish and German. More experimental NER algorithms are also available in Hebrew and Latin (Figure 5). If none of the available algorithms match the language of your text, you may try one that is linguistically close (for example, the Spanish algorithm for an Italian text). The algorithm parses the text and tries to identify all words that can be place names or person names. When the NER recognises a word as a possible place name, it will also try to match it automatically with an entry in one of Recogito’s global gazetteers. These annotations will appear in grey highlight to reflect their automatic matching – to turn them green, a human user needs to confirm that: (i) the word is indeed a place; and (ii) it matches the particular place in the gazetteer (Figure 6).
Although, as noted above, there are several gazetteers in Recogito, the specific gazetteer most useful to the text we are using here as case study – La Argentina Manuscrita – is likely to be Indias, based on the HGIS de las Indias developed by Werner Stangl.11 To narrow down the results from Recogito’s automatic place matching, go to Annotation Preferences under Document Settings, and uncheck all those gazetteers that you don’t think will be helpful.
Now it is time to work with our text, searching for places. In the example shown in Figure 7, we looked for Río de la Plata (the River Plate). Recogito couldn’t find the river or the region but it did find San Salvador, the first fort that Sebastián Gaboto founded in 1527.
A user might not know offhand that San Salvador is a fort located on the River Plate, but if the (human) annotator checks this information beforehand, this automated annotation can be very helpful. Here are some other examples from La Argentina Manuscrita to give you a sense of georeferencing in action, and to help guide your own annotation practice.
- 1) ‘que los de Buenos Aires descubrieron por tierra el año de 605’ (del Rio Riande et al. 2019a, Chapter 2).12 Buenos Aires is in the Indias gazetteer, so we find the place easily.
However, our document doesn’t refer to the city of Buenos Aires, but to the port of Buenos Aires. This is a detail that we can add in the free text comment section (see Figure 8).
- 2) ‘la boca de este gran Río de la Plata, a quien los naturales llaman Paraná Guazú, que quiere decir río como mar’ (del Rio Riande et al. 2019a, Chapter 1).13 This example includes a name in both Spanish and in its Guaraní indigenous form, Río de la Plata and Paraná Guazú respectively. Recogito is unable to find the Guaraní name, since it wasn’t used in the historical-geographical dictionaries on which our specific gazetteer is based. (It doesn’t even appear in Geonames.) We know from the text (‘la boca’) that the author is referring to the mouth of the river. Thus, after this intellectual step, we can make the match to San Salvador (as in San Salvador del Río de la Plata) (Figure 9).
- 3) In some cases, the place has changed its name sometime over the last century. This is the case for Cabo de Santa María, which is nowadays known as La Paloma (Figure 10).
Again, the human annotator must know this information beforehand in order to make a decision. This information that relates the old and modern names can be added in the comments.
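The three cases above can be mimicked with a toy lookup, which may help clarify what matching a quote to a gazetteer record involves. This is only an illustrative sketch and not how Recogito is implemented: the record structure, the `example.org` identifiers and the function names are all hypothetical, and the coordinates are rough approximations included purely for illustration.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceRecord:
    uri: str                 # hypothetical identifier, not a real gazetteer URI
    names: tuple[str, ...]   # all name variants known for the place
    lat: float               # approximate, for illustration only
    lon: float


def normalise(name: str) -> str:
    """Strip diacritics and case so that 'Río' and 'rio' compare equal."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().strip()


def candidates(quote: str, gazetteer: list[PlaceRecord]) -> list[PlaceRecord]:
    """Return every record that lists the quoted name among its variants."""
    wanted = normalise(quote)
    return [rec for rec in gazetteer
            if any(normalise(n) == wanted for n in rec.names)]


SAMPLE_GAZETTEER = [
    PlaceRecord("https://example.org/place/1", ("Buenos Aires",), -34.6, -58.4),
    # Old and modern names recorded as variants of one and the same place.
    PlaceRecord("https://example.org/place/2",
                ("Cabo de Santa María", "La Paloma"), -34.7, -54.2),
]
```

With this sample data, `candidates("buenos aires", SAMPLE_GAZETTEER)` returns the first record, both `"Cabo de Santa María"` and `"La Paloma"` resolve to the second, and `candidates("Paraná Guazú", SAMPLE_GAZETTEER)` returns an empty list. That last case is where the human annotator has to step in and choose a match on the basis of their own reading of the text.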
If you want to know more about semantic annotation in Recogito, have a look at the video tutorial on YouTube, ‘Annotate texts in Recogito/Anotar textos en Recogito’ (del Rio Riande et al. 2019b) (Figure 11).
Step 5: Relations
There is one other kind of annotation that you can perform in Recogito. This is known as relational tagging, which lets you create a connection (a relation) between two existing annotations.
To mark relations between entities, switch Recogito’s annotation mode to Relations, and then simply click on the first annotated entity and drag the pointer to the second. A dotted line will appear connecting the two annotations, along with a text box: you can fill this in to describe (or tag) the relationship. The line also has an arrow, which indicates the direction of the relationship. This is crucial for relationships that are hierarchical, as in, for example, isPartOf or isDaughterOf.
The relations created in Recogito can be exported in two formats: a basic CSV, and two separate tables for nodes and edges.14 Both can be visualised in network analysis software such as Gephi.15 If you are using the simplest option, just remember to rename the column from_quote to source and the column to_quote to target, and then upload the spreadsheet as an edge table (selecting the option create missing nodes). If you want to use the nodes and edges format instead, to have more control over your network visualisation, please bear in mind that, when downloaded in this format, each relation receives a different ID – and so the data will need consolidation before being processed in Gephi.16
In the example shown in Figure 12, we are marking the relationship between different South American ethnic groups: the Guayanás and the Guaraníes, also known (by Ruy Díaz) as Arachanes, and their enemies, the Charrúas.
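The column renaming for the basic CSV can be done by hand in a spreadsheet, or with a few lines of code. Below is a minimal sketch using only the Python standard library. The column names from_quote and to_quote come from the export described above. The third column in the sample (`relation`) and its value are hypothetical, and any other columns in your file are simply passed through unchanged.

```python
import csv
import io

# Columns to rename so that the file can be loaded as an edge table.
RENAMES = {"from_quote": "source", "to_quote": "target"}


def to_edge_table(csv_text: str) -> str:
    """Return the CSV with from_quote/to_quote renamed to source/target."""
    reader = csv.DictReader(io.StringIO(csv_text))
    original = reader.fieldnames
    if original is None:
        raise ValueError("The CSV is empty")
    missing = sorted(set(RENAMES) - set(original))
    if missing:
        raise ValueError(f"Expected columns not found: {missing}")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header: renamed where needed, otherwise kept as it was.
    writer.writerow([RENAMES.get(name, name) for name in original])
    for row in reader:
        writer.writerow([row[name] for name in original])
    return out.getvalue()


# Hypothetical sample row; the 'relation' column name is an assumption.
SAMPLE = "from_quote,to_quote,relation\nGuaraníes,Charrúas,isEnemyOf\n"
```

Calling `to_edge_table(SAMPLE)` gives back the same row under the header `source,target,relation`. To process a downloaded file, read it with `open(path, encoding="utf-8", newline="")`, pass its contents to the function, and write the result to a new file.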
When you have finished annotating entities (such as places or people) in your text document, and you have also marked the relations between them, your Recogito annotation screen will probably look something like this (Figure 13).
Step 6: Visualise the geographic annotations
Annotations can be visualised and read as a continuum in the Map View option (upper menu). Just click on the arrow and move from site to site (Figure 14).
Step 7: Download the text with the annotations
While you can do all your annotation in Recogito, the platform has the great advantage of enabling you to download your annotated texts and/or your annotation data in a variety of different formats. This means that you can explore your annotations or do further editing of your text in other applications.
In the example below, we are downloading our annotated text in TEI-XML, the standard mark-up language for digital scholarly editions in Digital Humanities (Figure 15).
As well as UTF-8 plain text, Recogito now allows you to annotate TEI-XML documents. However, this functionality is still very much in beta mode. As a result, when you download your text, you will need to revise and rework the TEI (e.g. annotations currently overlap) (Figure 16).17 Nevertheless, since you can annotate entities (such as places and people) so easily in Recogito, and given the fact that there is also some basic encoding support for the text header and body, it is a great platform from which to start working on a digital edition and/or learning the basics of TEI encoding.
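When revising a downloaded TEI file, a quick first check is to list the places it contains. The sketch below uses Python's standard `xml.etree.ElementTree` module. It assumes that place annotations are encoded as TEI `placeName` elements carrying the gazetteer URI in a `ref` attribute, which you should verify against your own export. The sample document and its `example.org` URI are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the standard TEI namespace


def list_place_names(tei_xml: str) -> list[tuple[str, str | None]]:
    """Return (text, ref) pairs for every placeName element in the document."""
    root = ET.fromstring(tei_xml)
    results = []
    for element in root.iter(f"{{{TEI_NS}}}placeName"):
        # Join nested text nodes and collapse line breaks and extra spaces.
        text = " ".join("".join(element.itertext()).split())
        results.append((text, element.get("ref")))
    return results


# Hypothetical fragment; a real export will have a fuller header and body.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>que los de <placeName ref="https://example.org/place/1">Buenos
    Aires</placeName> descubrieron por tierra el año de 605</p>
  </body></text>
</TEI>"""
```

Here `list_place_names(SAMPLE_TEI)` yields a single pair, the name "Buenos Aires" and its `ref` value. Entries whose `ref` is `None` are place names that have not been linked to a gazetteer record.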
Step 8: Share the annotations (and credit)
Recogito allows you to work on your own or in a team. You can modify your sharing options, add collaborators and share your annotations in the tool section (upper menu). You can also follow up, back up your work or delete it (Figure 19).
Remember that if you choose to leave your document open on the Web, search engines such as Google will be able to index it and it will be publicly searchable. If you are doing teamwork, remember to use a proper name and email that give credit to all your collaborators. You will also be able to see your collaborators in the map view (Figure 20).
And finally, don’t forget you can also follow all the annotations in the Stats section (Figure 21).