Psychological Profiling and Event Forecasting Using Computational Language Analysis

The ability to ‘get inside a person’s head’ by analysing their language patterns from a distance has tremendous appeal and several practical applications, ranging from the patently obvious to the surprisingly nuanced.

Psychologists have long believed that we can discern what makes a person tick by analysing their language. The modern study of language has become a highly sophisticated area of research that leverages computational modelling, objective measures of language, and extensive empirical rigor.

The links between a person’s mental processes and the words that they say or write have been extensively studied, validated, and applied to fields as diverse as computer science, medicine, sociology, and anthropology, to name just a few.

Substance Versus Style

In research on the psychology of language, most scientists have traditionally focused on the substance or content of language – words that have an explicit meaning (e.g., house, friend, bomb, etc.). Unsurprisingly, there are several direct links between what a person talks about and what they are thinking. Extroverts tend to use more words related to social processes and use more positive language. Neurotic people tend to use more words indicative of anxiety, and so on.

These (perhaps obvious) patterns are grounded in psychological theory and can therefore be extrapolated to a broader understanding of the individual. The content of a person’s language is reliably diagnostic of their intelligence, political orientation, personality characteristics, and even how long they live.

The things that occupy a person’s mind are not merely diagnostic of their thoughts – they are indicative of deeply ingrained patterns in their life.

Perhaps more interesting, however, is the style of a person’s language. An abundance of research in recent years has found that the small, throw-away words in language, such as articles, pronouns, and conjunctions, are deeply revealing of lower-level psychological processes. People whose language is highly self-focused (e.g., high rates of words like ‘I’, ‘me’, and ‘my’) tend to be relatively insecure and depressed. People who use more articles (‘the’, ‘a’, ‘an’) and prepositions (‘in’, ‘by’, ‘across’) in their writing tend to be more analytic in their thinking. Factors such as a person’s social status and authenticity also tend to be reflected more in their linguistic style than in its content.
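To make the idea of stylistic rates concrete, the following minimal Python sketch counts how often a few function-word categories occur per 100 words. The word lists, the tokeniser, and the names `CATEGORIES` and `style_rates` are illustrative assumptions rather than a validated psychological dictionary, and real research tools use far larger, carefully tested lexicons.

```python
import re

# Illustrative (hypothetical) word lists; NOT a validated dictionary.
CATEGORIES = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "articles": {"a", "an", "the"},
    "prepositions": {"in", "by", "across", "on", "at", "with", "of", "from"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def style_rates(text):
    """Return each category's rate per 100 words of the text."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: 100.0 * sum(token in words for token in tokens) / total
        for name, words in CATEGORIES.items()
    }


if __name__ == "__main__":
    print(style_rates("I put my keys in the bag by the door"))
```

Rates are normalised by text length so that short and long texts can be compared; on their own, such numbers say nothing about an individual until they are compared against a suitable reference sample.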

By pairing the analysis of what a person says with how they say it, we can often paint a remarkably detailed picture of a person’s mental and social universe. Such analyses can be performed extremely quickly and objectively using computational tools, and many psychological phenomena can be reliably estimated using relatively simple statistical models.

Psychological Profiling

Much of the work in computational psychological profiling is founded on research demonstrating that linguistic patterns, particularly the stylistic components of language, are relatively stable across time and contexts. The quality of language-derived psychological profiles can range from speculative to very strong, sometimes allowing us to identify an author with near-perfect accuracy using only their language. A language-driven approach to profiling allows us to understand the person behind a given text rather than just the text itself.
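As a rough illustration of how stable stylistic patterns could support author identification, the sketch below compares function-word profiles using cosine similarity and picks the closest known author. This is a deliberately simple stand-in and not the method used in the published research; the word list and the names `profile`, `cosine`, and `closest_author` are assumptions made for the example.

```python
import math
import re

# Small illustrative set of function words (an assumption for this sketch).
FUNCTION_WORDS = ["i", "me", "my", "the", "a", "an", "in", "by",
                  "of", "and", "but", "we", "you"]


def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [tokens.count(word) / total for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def closest_author(unknown_text, known_texts):
    """Return the name whose known text has the most similar style profile."""
    target = profile(unknown_text)
    return max(known_texts,
               key=lambda name: cosine(target, profile(known_texts[name])))


if __name__ == "__main__":
    known = {
        "A": "I think my plan and my idea are mine but I worry",
        "B": "The report of the committee in the city by the river",
    }
    print(closest_author("I hope my work and my effort matter but I doubt", known))
```

With texts this short the result is only a toy demonstration; dependable attribution requires much larger samples per author and many more features.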

Rather than simply taking a threat of violence at face value, we can computationally evaluate the speaker’s language for deeper clues. Are they at risk of a future depressive or schizophrenic episode? Are they obsessive-compulsive, or perhaps prone to conspiratorial thinking? Statistical estimation of these types of psychological vulnerabilities can help to highlight critical intervention strategies.

Language-based psychological profiles can also be applied at the group level, revealing fundamental differences in how group members think and engage with the world. Recent research found that Islamic State, as a group, shows greater authoritarianism and religious fervour in their psychological profile (revealed by markers such as low rates of present-focused and tentative language, plus high rates of religious language) relative to al-Qaeda. Moreover, study participants who scored high on authoritarianism and religiosity reported more favourable attitudes towards the language of Islamic State compared to the language of al-Qaeda. Understanding such group differences can provide insights into how a group functions, as well as what types of people might find these groups appealing.

Psychological profiles can also be built for broader communities and monitored over time. The psychological health of a community can easily be tracked following a tragedy using various data sources, such as newspapers or social media. Research from several labs has investigated the psychological impact of calamitous events ranging from the 9/11 attacks to mass shootings, finding unique patterns of coping as they unfold in response to major upheavals.

Behavioural Forecasting

In a vacuum of information about an author, the statistical analysis of language can give us important clues about a person’s future behaviours. Often, this approach relies on the relationship between language and general behavioural patterns. For instance, we find that language representative of someone’s core values (e.g., family, work ethic, empathy) is strongly related to their regular behaviours, such as attending religious functions, donating time/money to a cause, or even playing games online.

Most of the recent work in language-based behavioural forecasting focuses on interpersonal behaviours. Messages written prior to events like suicide or spree killings show distinct psycholinguistic fingerprints. Among individuals who solicit sex from minors online in sting operations, those who exhibit high certainty and planning markers in their language are at high risk of reoffending in the same crime categories (e.g., acquisition of child pornography, future attempts to solicit minors).

Similarly, research on group processes finds that linguistic cues related to planning decrease immediately before the betrayal of an ally (along with an increase in positivity and politeness). A failure to linguistically adapt to a changing group membership tends to precede members exiting a group, and changes in interpersonal linguistic coordination can foreshadow the initiation, stability, and dissolution of a relationship.
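One simple way to quantify linguistic coordination between two people is a style-matching score: for each function-word category, compare the two speakers’ usage rates and average the per-category agreement. The sketch below follows the general form of such scores; it is not claimed to be the measure used in the studies mentioned above, and the word lists and function names are assumptions for illustration.

```python
import re

# Illustrative (hypothetical) function-word categories.
CATEGORIES = {
    "pronouns": {"i", "me", "my", "you", "your", "we", "us", "our"},
    "articles": {"a", "an", "the"},
    "conjunctions": {"and", "but", "or", "so", "because"},
}


def rates(text):
    """Proportion of tokens in the text that fall into each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {
        name: sum(token in words for token in tokens) / total
        for name, words in CATEGORIES.items()
    }


def style_matching(text_a, text_b):
    """Average per-category similarity: near 1 is matched, near 0 is not."""
    rates_a, rates_b = rates(text_a), rates(text_b)
    scores = [
        # The small constant prevents division by zero when both rates are 0.
        1.0 - abs(rates_a[c] - rates_b[c]) / (rates_a[c] + rates_b[c] + 1e-4)
        for c in CATEGORIES
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(style_matching("We took the train and you met us",
                         "I saw the film but we left early"))
```

Tracking a score like this across successive exchanges, rather than at a single point, is what would allow changes in coordination to be observed over time.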

Where Do We Go From Here?

The implications of language-based profiling and behavioural forecasting are far-reaching and can represent a double-edged sword. The same language data can be leveraged for multiple purposes, and care must be taken to protect the words of vulnerable or high-profile individuals. Individuals working in the security sector who are psychologically vulnerable can be identified from their language patterns, resulting in a non-negligible risk for targeted exploitation.

Sources who have insight into future plans require particularly high discretion. A person who knows of impending policy changes or upcoming events may show extremely subtle changes in their language patterns. Such changes are often not discernible to an untrained observer yet can still be detected using modern computational techniques. Compartmentalisation of knowledge may serve as insulation from detection, yet this approach may not be feasible (or desirable) in many situations.

Future research will continue to discover still-unknown links between language and psychology, meaning that language data from any period can be revisited extensively and mined for new insights. Language data is one piece of the puzzle, and recent work that integrates language-derived profiles with other known factors (such as age, political affiliation, and images) shows significant promise for advancing the field. Reliable obfuscation techniques remain to be developed and will likely be reactive (rather than proactive) as new methods for language analysis continue to emerge.


Dr Ryan L. Boyd is a computational social scientist and behavioural scientist at the University of Texas at Austin. His research involves the inference of motives and psychological patterns from verbal behaviour.

Paul Kapoor is a Senior Principal Systems Engineer at the Northrop Grumman Corporation and a former US Navy Civil Servant.

This article appeared in issue 9 of CREST Security Review.

Read more

Michael L Birnbaum, Sindhu Ernala, Asra Rizvi, Munmun De Choudhury, John Kane. 2017. A collaborative approach to identifying social media markers of schizophrenia by employing machine learning and clinical appraisals. J Med Internet Res, 19(8): e289. Available at: https://doi.org/10.2196/jmir.7956

Ryan L. Boyd. 2017. Psychological text analysis in the digital humanities. In S. Hai-Jew (Ed.), Data analytics in the digital humanities (pp. 161–189). New York: Springer International Publishing.

Ryan L. Boyd, James W. Pennebaker. 2015. Did Shakespeare write double falsehood? Identifying individuals by creating psychological signatures with text analysis. Psychological Science, 26(5): 570–582. Available at: http://bit.ly/2OuJtmA

Ryan L. Boyd, Steven R. Wilson, James W. Pennebaker, Michal Kosinski, David J. Stillwell, Rada Mihalcea. 2015. Values in words: Using language to evaluate and understand personal values. In Proceedings of the Ninth International AAAI Conference on Web and Social Media (pp. 31–40). Available at: http://bit.ly/2JLNVi7

Shuki Cohen, Arie Kruglanski, Michele Gelfand, David Webber, Rohan Gunaratna. 2018. Al-Qaeda’s propaganda decoded: A psycholinguistic system for detecting variations in terrorism ideology. Terrorism and Political Violence, 30(1): 142–171. Available at: http://bit.ly/2CHMjQu

Kimberly Glasgow, Clayton Fink, Jordan Boyd-Graber. 2014. “Our grief is unspeakable”: Automatically measuring the community impact of a tragedy. In Eighth International AAAI Conference on Weblogs and Social Media. Available at: http://bit.ly/2FLc6YX

Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber, Cristian Danescu-Niculescu-Mizil. 2015. Linguistic harbingers of betrayal: A case study on an online strategy game. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 1650–1659). Beijing, China: Association for Computational Linguistics. Available at: http://www.aclweb.org/anthology/P15-1159

James Pennebaker. 2013. The secret life of pronouns: What our words say about us (Reprint edition). New York: Bloomsbury Press.

Matteo Vergani, Ana-Maria Bliuc. 2018. The language of new terrorism: Differences in psychological dimensions of communication in Dabiq and Inspire. Journal of Language and Social Psychology. Available at: https://psyarxiv.com/xg4f3/

Laura Wendlandt, Rada Mihalcea, Ryan L. Boyd, James Pennebaker. 2017. Multimodal analysis and prediction of latent user dimensions. In G. L. Ciampaglia, A. Mashhadi, & T. Yasseri (Eds.), Social Informatics: 9th International Conference, SocInfo 2017, Oxford, UK, September 13-15, 2017, Proceedings, Part I (Vol. 10539 LNCS, pp. 323–340). Cham: Springer International Publishing. Available at: http://bit.ly/2TYR9U2


As part of CREST’s commitment to open access research, this article is available under a Creative Commons BY-NC-SA 4.0 licence.
