Despite the prevalence of written language in the digital age, forensic authorship analysis remains an underestimated tool in forensic investigations. It can help investigators profile the authors of questioned texts and identify who wrote them.

Imagine law enforcement is faced with a ransom note in a kidnapping case. One of the sentences in the note reads ‘Put it in the green trash kan on the devil strip at corner of 18th and Carlson.’ You might notice that the author misspelt ‘kan’, or that they correctly used ‘18th’ and capitalised ‘Carlson’. This type of evidence could help you infer information about the author, although this can be tricky: it might seem that the author has a low level of education, given the misspelling, but they spell other, more difficult words correctly and may be trying to disguise their identity. Indeed, this is what was found to have happened in this case. The feature that ultimately broke the case, however, was the phrase ‘devil strip’. This phrase is highly regionally bound and used primarily in the city of Akron, Ohio. This information was then used to narrow down the list of suspects.

This type of linguistic analysis is an application of forensic linguistics, specifically forensic authorship analysis. In general, authorship analysis is concerned with inferring information about the author of a document of questioned authorship. The aim could be (a) to determine whether different texts were authored by the same individual, called authorship verification, (b) to assess who is the most likely author of a text given a set of potential authors, called authorship attribution, or (c) to infer characteristics of the author from their language use, called authorship profiling. For example, authorship analysis has been used to assess whether a suspect had actually authored their police statements and to determine whether messages sent from a victim’s phone were written by their suspected murderer. Authorship analysis is limited when data are sparse, when the genre constrains language use, or when texts were written by multiple authors. But what features help determine the authorship of a text?

Analysing authorship

Even though, theoretically, every individual can use language in any way they please so long as they follow linguistic protocols (e.g., “grey green talk dog” is not a sentence that easily conveys meaning), people have preferences in how they use language. This means there is a degree of linguistic individuality, such as a tendency to use certain words together with certain other words. Based on this assumption, authorship analysis can generally assess whether texts were authored by the same individual. For example, in the Starbuck murder case, the use of semicolons in a series of questioned emails was pivotal for showing that the emails were written by Jamie Starbuck, who had murdered his wife, Debbie Starbuck, and then assumed her identity online.

The linguistic analysis found that Jamie was impersonating Debbie, but the evidence from semicolon usage in the disputed emails was less clear at first. In their undisputed emails, Jamie used relatively few semicolons, while Debbie used them with great frequency. In the disputed emails, semicolons were used far more frequently than had been observed even in Debbie’s writing. Further examination, however, revealed that the semicolons in the disputed texts followed the same grammatical pattern as in Jamie’s writing, not Debbie’s. It was therefore concluded that Jamie had purposely increased his rate of semicolon usage to impersonate Debbie, but had not appreciated the grammatical pattern that characterised Debbie’s usage, thereby revealing himself.
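To make the idea of a ‘rate of semicolon usage’ concrete, here is a minimal Python sketch that counts semicolons per 1,000 words. The function name and the sample sentences are invented for illustration and are not taken from the case. As the Starbuck analysis shows, a raw rate like this is only a starting point: it was the grammatical pattern of use, which this sketch does not capture, that proved decisive.

```python
def semicolons_per_1000_words(text: str) -> float:
    """Return the number of semicolons per 1,000 whitespace-separated words."""
    words = text.split()
    if not words:
        return 0.0  # avoid dividing by zero for empty input
    return 1000 * text.count(";") / len(words)


# Invented sample texts, purely for illustration (not case material).
sample_sparse = "I went to the shop and then I came home and had tea."
sample_dense = "I went to the shop; it was closed; I came home; I had tea."

for label, text in [("sparse", sample_sparse), ("dense", sample_dense)]:
    print(label, round(semicolons_per_1000_words(text), 1))
```

Comparing rates across undisputed and disputed texts shows how often a feature occurs, but not how it is used, so an analyst would still need to inspect each occurrence in context.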

Regional profiling

When there is no comparison material, authorship analysis can still provide important insights into the author of a text. Authorship profiling focuses on the linguistic features that let us predict the social characteristics of an author, for example, age or gender. This type of analysis is rooted in sociolinguistics, the study of language and its relationship to society. In dialectology, for example, sociolinguists research the regional distribution of language variation. This research can then be applied to forensic authorship questions and used to profile the regional background of an unknown author, which is an exciting area of current research.

Profiling the regional background of an author can be done through careful, manual analysis, which requires the analyst to know about regional dialect variation, as illustrated in the ransom note example above. This task, which is referred to as geolinguistic profiling, can also be based on statistical or computational methods, for example, by comparing the language used in an unknown text to patterns of regional variation observed in large collections of social media data.

We are currently working on this topic, developing a method for automatically profiling the regional background of the author of a questioned document through the quantitative analysis of large corpora of English and German social media data. Specifically, our approach involves creating a map for each word in a questioned document showing its regional distribution. These maps can be combined into one map, weighting each word map by its regional strength. An aggregated map like this shows how language used on social media would predict the location of the analysed text and could aid law enforcement in their investigations.
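The general idea of combining weighted word maps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method itself: the region names and counts are invented, and ‘regional strength’ is approximated here as one minus the normalised entropy of a word’s regional distribution, which is an assumption made only for this example.

```python
import math


def word_map(counts, regions):
    """Convert raw per-region counts for one word into proportions summing to 1."""
    total = sum(counts.get(r, 0) for r in regions)
    return {r: counts.get(r, 0) / total for r in regions}


def regional_strength(proportions):
    """Assumed weighting: 0 if the word is spread evenly, 1 if it occurs in one region only."""
    k = len(proportions)
    if k < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in proportions.values() if p > 0)
    return max(0.0, 1.0 - entropy / math.log(k))


def aggregate_map(words, corpus_counts, regions):
    """Sum the word maps, each weighted by its regional strength, then normalise."""
    combined = {r: 0.0 for r in regions}
    for word in words:
        counts = corpus_counts.get(word, {})
        if sum(counts.get(r, 0) for r in regions) == 0:
            continue  # word not attested in the reference corpus
        proportions = word_map(counts, regions)
        weight = regional_strength(proportions)
        for r in regions:
            combined[r] += weight * proportions[r]
    total = sum(combined.values())
    return {r: v / total for r, v in combined.items()} if total > 0 else combined


# Hypothetical reference counts per region (invented numbers).
REGIONS = ["region_a", "region_b", "region_c"]
CORPUS_COUNTS = {
    "devil_strip": {"region_a": 90, "region_b": 5, "region_c": 5},
    "trash": {"region_a": 40, "region_b": 30, "region_c": 30},
    "the": {"region_a": 1000, "region_b": 1000, "region_c": 1000},
}

result = aggregate_map(["the", "trash", "devil_strip", "kan"], CORPUS_COUNTS, REGIONS)
print(max(result, key=result.get))
```

In this sketch, an evenly distributed word such as ‘the’ receives a weight of about zero and so barely affects the combined map, while a strongly regional item dominates it. Words missing from the reference data are simply skipped.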


Interested practitioners can find more information on forensic linguistics and contact details of forensic linguists through the global forensic linguistics mailing list and the International Association for Forensic and Legal Linguistics.

Read more

Coulthard, M., Johnson, A., & Wright, D. (2016). An Introduction to Forensic Linguistics: Language in Evidence (2nd ed.). Routledge. https://www.routledge.com/An-Introduction-to-Forensic-Linguistics-Language-in-Evidence/Coulthard-Johnson-Wright/p/book/9781138641716

Grant, T., & Grieve, J. (2022). The Starbuck case: Methods for addressing confirmation bias in forensic authorship analysis. In I. Picornell, R. Perkins, & M. Coulthard (Eds.), Methodologies and Challenges in Forensic Linguistic Casework (1st ed., pp. 13–28). Wiley Blackwell. https://research.aston.ac.uk/en/publications/the-starbuck-case-methods-for-addressing-confirmation-bias-in-for

Nini, A. (2018). Developing forensic authorship profiling. Language and Law / Linguagem e Direito, 5(2), 38–58. https://ojs.letras.up.pt/index.php/LLLD/article/view/6116/5758

Nini, A. (2023). A Theory of Linguistic Individuality for Authorship Analysis (1st ed.). Cambridge University Press. https://doi.org/10.1017/9781108974851

Shuy, R. W. (2001). DARE’s role in linguistic profiling. Dictionary of American Regional English Newsletter, 4(3), Article 3. https://dare.wisc.edu/wp-content/uploads/sites/1051/2008/03/DARENEWS-43.pdf

Svartvik, J. (1968). The Evans statements: A case for forensic linguistics. University of Goteborg. https://www.thetext.co.uk/Evans%20Statements%20Part%202.pdf