In this CREST Debate podcast, Professors Aldert Vrij and Don Grubin go head-to-head on one of the most controversial tools in risk assessment: the polygraph.

The Polygraph: Useful Tool or Means to Fool?

In this episode, two global experts, Professor Don Grubin and Professor Aldert Vrij, debate one of the most contested tools in risk and credibility assessment: the polygraph. Bringing together decades of research and practice, they debate the evidence behind polygraph testing, challenge common myths, and offer sharply contrasting perspectives on its value in managing risk.

Recorded in front of a live audience, this candid conversation cuts through the stereotypes, moving beyond TV portrayals to explore what actually happens in real-world testing. From cooperation and disclosures to question design and countermeasures, the debate digs into the science, the practice, and the unanswered questions that matter for practitioners and policymakers alike.

Listeners can jump straight to any of the big questions using timestamps.

Episode Highlights

  • Is the polygraph the holy grail — or just another tool? A deep dive into whether it truly improves decision-making or whether practitioners risk over-trusting it.

  • Cooperation, rapport, and real practice: why pre-test interviewing matters far more than the “strap them in and see the needle jump” stereotype.

  • Not all polygraph tests are created equal: some formats are backed by decades of research, others haven't had rigorous testing.

  • Accuracy depends entirely on context: specific-incident tests, screening tests, and sex-offender tests operate on different scientific foundations.
  • A rare area of agreement: the technology isn’t the problem; test design and examiner skill are what drive accuracy.

  • Repeated testing: is a double-edged sword as it may help supervision, but does it also help people learn how to beat the test?
  • AI, commercialisation and reality TV: why polygraphs on entertainment shows have little to do with real methodology.

Read transcript

Welcome & Introductions

CREST Podcast Intro

Simon

Welcome to our second podcast debate: The Polygraph – A Useful Tool or a Means to Fool? These sessions bring together experts with different perspectives, so practitioners can hear the arguments, the evidence, and the challenges for themselves. Today we’re looking at the polygraph — does it genuinely help, or does it risk misleading us?

Joining us are two leading experts: Professor Don Grubin, a practitioner with extensive experience using polygraphs, and Professor Aldert Vrij, a researcher who has studied their limits and reliability. After their opening remarks, we’ll open the floor for your questions and discussion.

Let’s get started. Aldert, would you like to introduce yourself — tell us a bit about your background and why this topic interests you?

 

Aldert

ALDERT’S BACKGROUND

I am a professor of applied social psychology, which I have been since June 2000, so more than 25 years now. When I saw Simon earlier, he didn’t ask me how I was. Instead, he asked me when I was going to retire. Maybe in a couple of years, I said, but he seemed to think it should be much sooner! I’m old, but not that old.

Now, onto the polygraph. It's a heated debate — and I saw that firsthand back in 2006 at a polygraph conference in Maastricht, in the Netherlands. That was quite a meeting, because both Charles Honts and Gershon Ben-Shakhar were there. They are the two leading figures in the polygraph world and represent very different approaches to polygraph testing. Getting them in the same room was a huge achievement. To show just what an achievement that was: we arranged to have dinner with both of them that evening. We knew Charles from previous conferences; Gershon we met for the first time at the conference, and he too agreed to join us. But when we all met in the lobby and Gershon saw that Charles was there, he left. It is difficult to get them together!

Another person at that meeting was Udo Undeutsch — a pioneer in verbal lie detection who started to work in the 1950s on what is now known as Statement Validity Analysis. But he was also a polygraph examiner belonging to the Honts camp, so to speak. At the end of the polygraph conference, he shook Ben-Shakhar’s hand and said, “I disagree with every single word you said.” Gershon just laughed. That, in a nutshell, is the world of the polygraph.

So, is the polygraph useful, or is it a tool to fool? That is the topic of today's seminar. I think it's both. Its accuracy and usefulness depend entirely on what kind of test you're talking about. And that's my problem with the way polygraph examiners often present their work: they don't always make clear how much the accuracy depends on the type of test being used. People not familiar with polygraph testing therefore come to think that the machine produces the same results whatever test you use. That is not the case.

Some polygraph tests are much better than others. The Concealed Information Test, or CIT (developed by David Lykken in the 1950s and heavily researched by Gershon Ben-Shakhar after that), is the best polygraph test available. It is still one of the best lie detection methods ever invented, even outside the polygraph world. It's brilliant! In essence, it works on recognition: you ask about details only a guilty person would know. Lykken gave the O.J. Simpson case as an example in his book. Simpson denied having visited the crime scene where his wife Nicole was found murdered, so you could ask: "Where was Nicole found? In the living room? In the bathroom? In the garden? In the hallway? In the kitchen? Next to the swimming pool?" A guilty examinee is likely to react strongly to the correct option, due to the orienting reflex: the correct option is familiar information to the guilty examinee and creates a brain wave that is also linked to a physiological response that the polygraph measures.

Typically, you have six answer options to a question. The chance that an innocent person responds by chance to the correct option is 1 out of 6. To protect innocent examinees further, you ask a second question. For example, you could ask, "How was Nicole murdered?", again with six answer options. The chance of an innocent person reacting by chance to the correct option on both questions is very small, less than 5%. The CIT is a brilliant test, and I've always said so. People sometimes say that I am 'anti-polygraph'. I am not. I am in favour of this polygraph test; it is other tests I am not in favour of.
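The false-positive arithmetic behind that protection can be sketched in a few lines (assuming, as Vrij's example does, six equally likely options per question and independent questions):

```python
# Chance that an innocent examinee reacts most strongly to the
# correct option purely by chance, with six equally likely options.
p_one_question = 1 / 6

# With two independent questions, both must hit by chance.
p_both_questions = p_one_question ** 2  # 1/36

print(f"One question:  {p_one_question:.1%}")    # 16.7%
print(f"Two questions: {p_both_questions:.1%}")  # 2.8%
```

That 1/36 (about 2.8%) is where the "less than 5%" figure comes from; a third question would push the chance level down to 1/216, well under 1%.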

But then you move down the polygraph test ladder and arrive at the Comparison Question Test (CQT). That's the Honts test, widely used in practice. Here, you ask O.J. Simpson a direct question like, "Did you murder your wife Nicole?", but that produces a strong reaction in both guilty and innocent examinees. So examiners add "control" questions to the test, such as: "Have you ever hurt someone in your life before?" These control questions are not easy to phrase. They need to be somehow related to the relevant question (so in a murder case you can ask about hurting someone earlier in life as a control question), but the question is only useful if the examinee has indeed hurt someone in his life before. How can the examiner know that? But suppose the examiner does know it. Now the examiner must make sure the examinee does not simply admit it by saying yes; the aim is that the examinee denies it and therefore lies. And how is that done? By making the examinee believe that admitting to having hurt someone before would make him the main suspect for the murder. As a result, every examinee will now say 'no' to the control question about hurting someone before, and therefore lie. The innocent examinee is thought to respond more strongly to the control question (to which he lies) than to the relevant question about the murder (to which he answers truthfully), whereas the guilty examinee is thought to respond more strongly to the relevant question than to the control questions. He lies to both, but the relevant question is the more immediate threat.

The problem is there’s no solid theory behind this test. The National Research Council said that in 2003. Charles Honts said the same more recently. But Honts’ view is: “It works, so who cares if we can’t explain why?” Personally, I think that’s a problem. If you don’t know why something works, it’s hard to trust it fully. Take airports. Suppose we don’t know how the person and luggage detectors at airports work and we thus don’t know which dangerous materials are detected and which ones are not. That would make flying irresponsible, and airports would shut down. Knowing how something works is important.

And then you go further down the polygraph ladder to non-specific tests, like those used with sex offenders that Grubin supports. Here there's no clear incident to ask about. Instead, you get vague questions such as: "Is there anything you've done in the last three months that might concern me?" That vagueness makes the test far less accurate. An examinee might pass the test because he does not think about a specific wrongdoing when answering the question. Making the questions more specific ("Did you in the last three months carry out activity X?") does not solve the problem either. Since you do not know what the person may have done, the specific question could be about something he has not done and can therefore honestly deny. Another activity he may have carried out (activity Z), which should concern examiners, remains undetected because the specific questions are not about that other wrongdoing. And this problem becomes even more serious because sex offenders are tested multiple times. By the third time they will know which activities are raised in the test and which are not, so they can feel free to carry out all kinds of wrongdoing that the questions do not ask about. In general, examinees who are tested multiple times learn over time how a polygraph test works. That's why experts like Charles Honts say you should never do repeated testing. Yet in sex offender management, people are polygraphed again and again, every few months. That reduces accuracy even further.

Examiners say polygraph tests lead to admissions. And sometimes they do. But how do we know whether those admissions are really accurate? If you fail a polygraph test, are you going to confess to something that will send you straight back to prison? No — you will offer a smaller explanation that satisfies the examiner without incriminating yourself too badly. You just don’t know whether these disclosures are truthful.

So my point is this: there are huge differences between different types of polygraph test. The CIT (the Lykken/Ben-Shakhar test) is excellent. The CQT about a specific incident (the Honts test) is less convincing. The non-specific-incident CQT, like sex offender testing, is weaker still. And yet, too often, practitioners blur those distinctions. They cite 80–90% accuracy rates that come from specific-incident CQT tests and imply the same accuracy rates apply to much vaguer contexts. That is misleading.

It’s very different in verbal lie detection, my area of expertise: some detection methods are stronger than others, and we are always clear about which method leads to which accuracy. We do not purposefully blur things to make it all sound impressive. So my message is this: be honest about what the polygraph can and cannot do. Not all the tests are the same and they are not equally accurate. Some tests are genuinely useful; others far less so. And with sex offender polygraph testing you are at the bottom of this scale.

 

Simon

Thank you, Aldert. Now Don, over to you — could you introduce yourself and share a little about your background and perspective on the polygraph?

 

Don

DON’S BACKGROUND

Before I respond, a quick note in relation to something Aldert said: O.J. Simpson actually did have two polygraph tests, both comparison question tests, and he failed them badly. The examiner told me it was one of the worst failures of his career. Those results never saw the light of day, not surprisingly, because the lawyers buried them, but in that case the CQT did seem to work.

Now by way of background, I'm a forensic psychiatrist, not a polygraph examiner. I became interested in polygraph in the late 1990s after attending a sex-offender conference in the United States, where a probation officer told me: "If they took polygraphs out of our programme, I'd quit." Why? When I asked, she said, "Well, because of the information that they gain", which is an important theme that I'll be coming back to, I think, a number of times today.

Polygraph yields two things, and that's often overlooked. First, there's the test outcome: was the examinee truthful or deceptive, or was it inconclusive? Second, there are disclosures during the test: additional information that comes from the test which wasn't previously known. I usually refer to that as 'information gain'. Both of those are important, and they're complementary to each other.

Now I was initially sceptical about all this. In the early 2000s the Home Office funded a small pilot with around 32 participants (we approached over 100, but only 32 agreed to take part). Probation officers warned that we'd find nothing new because these were the 'stars' in their programme, the star cases. Having said that, 33 of 35 made significant new disclosures, about a third of which were risk-relevant (we sometimes refer to them as "panic now" type disclosures: things you have to act on immediately). Examples included active grooming of young children and unsupervised approaches to children that services simply hadn't picked up. So on the back of that we ran a much larger comparison study, in which some probation areas were using polygraph tests and others weren't. We saw roughly a five-fold increase in new disclosures relevant to the management of these individuals in the polygraph group. In addition, a clean test, one with no disclosures where the individual appeared truthful, told the managers they weren't missing something obvious. Now the problem was that these were volunteers, and the criticism was, "Well, if you made it mandatory, you wouldn't get the disclosures and you wouldn't get that useful information."

But based on the results of that study, the Offender Management Act was passed in 2007, which mandated a proper trial of polygraph testing in offender management in which the tests were mandatory. There was really strong resistance within the Ministry of Justice to that; they had their own agenda that they wanted to follow. What they did was separate the implementation from the evaluation (I think they thought I had been cooking the books up 'til then). I felt it important to lead the implementation because I thought a good study with poor implementation would tell us very little.

The University of Kent led the evaluation, but that was all overseen by the Ministry of Justice. And the findings again were very supportive of the utility of polygraph testing, and because of that mandatory polygraph testing for sex offenders was rolled out nationally in 2014.

Along the way I noticed what I call the Nine M's that always dog discussions of polygraph testing. I'll try to remember what they all were: misunderstanding (of what polygraph is), misconceptions (of how it's used), misinterpretations (of the evidence), plain mistakes, occasional misrepresentation, general misinformation, sometimes misleading claims, money (in other words, commercial interests that interfere with clear thinking about this), and a fixed mindset, an almost visceral belief that it "can't work". I've met clinicians who said that no amount of evidence would ever change their minds.

Having said all that, I realise I’m going on a little bit, I just wanted to make a few clarifications before we carry on. Firstly, the polygraph isn’t a ‘lie detector’. In Comparison Question Test formats we’re not measuring a ‘lie response’ in any way. What we’re recording are physiological responses that are linked to cognitive processing and autobiographical memory. Salient, personally experienced material tends to produce stronger responses than mere accusations.

Regarding the theory: there is a theoretical basis, around cognitive load and salience, for the mode of action of polygraph tests, regardless of what Aldert might have to say on that.

In terms of the National Research Council study [2003]: for single-issue tests they reported accuracy in the range of 80–90%, and what they noted was that nothing else had been shown to be more accurate at that time. That remains the case: nothing has been shown to be more accurate than single-issue polygraph testing.

In terms of practice: it's not either/or with other types of interviewing. Polygraph testing is complementary to other interview techniques and formats. Despite prior interviews, however, what we continually see is that we get additional disclosures from the polygraph tests.

Many criticisms of polygraph testing are in fact just criticisms of poor practice.

We, in the UK programme, addressed this through training, through quality control, and through very close ongoing supervision. Our procedures are all standardised. This addresses many of the objections that the British Psychological Society has raised over time.

In terms of the courts: in the UK, we don't use polygraph results as evidence in court, nor do I think we should. Polygraph testing provides information that could be useful in investigation and in case management, but it's not a standalone determiner of guilt or innocence, and I think there are a lot of problems when you try to introduce it into the courts.

Where it can be used very successfully: beyond sex-offender management, in the UK we also use it to manage post-conviction domestic abuse offenders and terrorist offenders who are on licence and thought to be high risk. But it's not about predicting whether they're going to offend or re-offend; it's about spotting risk-indicative behaviours early enough to intervene: for example, in domestic abuse cases, undisclosed new relationships.

And finally, I agree with Aldert that some formats — for example the relevant/irrelevant test format — are very poor and shouldn’t be used. But well-run programmes that use standardised methods, that have good supervision, that have good Quality Control, can deliver useful information gain.

Now I’ll end my introduction which I know is going on a bit, with a quote from that well-known philosopher, Richard Nixon, who said: “I don’t know anything about polygraphs. I don’t know how accurate they are, but I know that they scare the hell out of people.”

 

Simon

Brilliant — I’d like to bring Aldert back in to respond, and then give Don a chance to come back as well.

We’re all adults here, and I think the practitioners in the room will benefit from a bit of back-and-forth between them. Not too long (we want to make sure there’s plenty of time for your questions too) but since I saw Aldert making notes while Don was speaking, and I know Don will have his own thoughts, let’s give it a go.

And just to be clear — this is an academic exchange, for practitioners to learn from. No bad language, and definitely no fisticuffs!

 

Aldert

I’ll go through my comments in the order I wrote them down, rather than by importance.

First, O.J. Simpson failed the polygraph test twice. That does not surprise me at all. The CQT, the test O.J. Simpson was given, is highly subjective. It relies on how the examiner phrases the control questions and relevant questions and how strongly he stresses their importance. Research shows that if the examiner believes someone is guilty, he will emphasise the importance of the relevant questions, and the examinee is more likely to fail the test. If the examiner believes the person is innocent, he will emphasise the importance of the control questions, and the examinee is more likely to pass.

In the O.J. Simpson case, most people assumed he was guilty. So if examiners polygraph him thinking he is guilty, they will get a guilty outcome, because their bias prior to the test shapes the result. That's why I'm not surprised. Actually, if a polygraph examiner had concluded that O.J. Simpson was innocent, it would have been professional suicide; he might as well have changed jobs, because nobody would have thought he was any good.

I do agree that polygraph testing of sex offenders leads to more disclosures. The research is clear on that. But three things are important. First: what really matters is re-offending. And there is no effect whatsoever on re-offending when polygraph tests are used. Re-offending is the key outcome, not disclosure.

Second: we need to ask, more disclosures compared to what? We already know there are highly effective interview techniques that can also produce disclosures — without using a polygraph.

Third, they define more disclosures in terms of new disclosures compared to what the sex offender said earlier in the investigation. However, the polygraph test is always the last examination. Whatever you do last, it is likely to lead to new disclosures. If you start with a polygraph test followed by an investigative interview, the investigative interview will lead to new disclosures.

The problem with using the polygraph test is that it creates misplaced trust. If someone passes, the examiner may believe there’s nothing to worry about. But that examinee may still be guilty. That’s dangerous, because it gives false reassurance.

The British Psychological Society opposed polygraph testing because it involves deception. You’re essentially lying to your examinees about the importance of the control questions. In psychological research, lying to participants is not permitted and that’s why the BPS took that position.

Public opinion, though, is positive. The press portrays polygraphs positively. The public believes they work, and politicians want to please the public, so they are keen to implement them. But that political pressure doesn’t make the test any more valid.

Another danger: if people are repeatedly tested, such as sex offenders, they may learn countermeasures and eventually beat the test. That’s exactly why Charles Honts and others warn against repeated testing — it undermines validity and increases risk. The support for sex offender polygraph testing is very weak amongst scientists.

Now, on the theory side: we talk about cognitive processing. Yes, lie detection involves cognition. In verbal lie detection, for example, truth-tellers and lie-tellers use opposite strategies. Truth-tellers are forthcoming and willing to provide detail. Lie-tellers try to keep it simple. In interviews, those opposite strategies result in clear differences. But with CQT polygraph testing, both truth-tellers and lie-tellers may experience similar physiological processes. There isn’t the same clear distinction as with verbal lie detection.

To use Concealed Information Test theory to explain the CQT, as Don Grubin did, is problematic. CIT is widely accepted as the better approach, even among CQT researchers. But Don is essentially borrowing CIT theory to justify a CQT, which is not how scientists treat it. Charles Honts, for example, is much more honest about this — he admits we simply don’t know why the CQT works, rather than mixing theories. The claim that CIT theory can explain the CQT test results is a Grubin assumption that will never appear in any decent scientific article.

Finally, the National Research Council did report 80–90% accuracy, but that was only for specific-incident tests. For non-specific tests, such as sex offender polygraph testing, they explicitly say accuracy is lower. Don doesn't acknowledge that distinction. And that's misleading.

 

Simon

Thank you Aldert. And now Don, please, with your response.

 

Don

Well, I’ve made a note of everything I disagree with, and I’ve got pages of it, but I’ll keep this brief and I’ll just pick on the more important ones.

First, examiner bias: It can be a problem, absolutely, no question about that. But that’s not really about polygraph itself, it’s about bad polygraph practice — something that I referred to earlier. In our programme, we very closely monitor quality control and supervision to reduce the potential for examiner bias. We make sure there isn’t an emphasis on one question or another just because of what an examiner might believe about the offender. You can’t stop these things from happening, but if you pick it up, you can correct the examiner, and again, over time, the idea is that you don’t have an issue with examiner bias.

Regarding reoffending: Aldert said that there’s no effect, and I don’t entirely agree or disagree with that. The fact is, there’s just no evidence either way. We simply don’t have that kind of study. What we do know is that polygraph contributes to offender management. Probation officers and the police managing these individuals tell us it helps — it gives them more information with which they can make decisions. It’s not something you can just pull out in isolation. Does it reduce reoffending long-term? Well we just don’t know. If someone wants to run a ten-year study, I’ll gladly take the money to carry it out! But for now, what we do know is that it improves offender management.

As for existing interview techniques leading to disclosure: absolutely! They're very effective. And as I've said before, polygraph testing is complementary to that. When we run polygraph tests after those interviews, however, we still get more information, and that tends to be what convinces the offender managers and police officers who are managing offenders that this technique is useful. These individuals have gone through interviews and we still get more information. So clearly, interviews alone, no matter how good they are, aren't getting it all. It's not either/or; it's complementary.

On losing trust: well, in our studies we asked offender managers whether polygraph testing damaged their relationships with offenders. Ninety-five percent said no. In fact, many said it improved their relationship, because when an offender manager suspects something and the offender keeps denying it, they just go round in circles. With the polygraph test, you can get to the bottom of this sort of thing and stop going around in endless circles.

Some offenders even find it helpful, because it lets them show they’re not engaging in risk behaviours. Do we miss things? Of course we do. But we were already missing things. If we used to miss a hundred pieces of information, now maybe we miss ten, which is an improvement. As I said, it’s information gain.

On the BPS's concern about lying: well, that comes from a misunderstanding of the way the test works. Lying to comparison questions isn't the point. What you're trying to create is cognitive work. Indeed, you can use what are called 'directed lie' questions, where the examinee is told to lie, for example, about ever having committed a traffic violation. There's nothing unethical in that.

There was one study which used the comparison question, "Do you think there's something odd about the colour blue?" The important point was to make the examinee think the question was important, rather than whether or not they were lying in answer to it.

Countermeasures: well, there's no evidence that our sex offenders are using them to any great extent, but we do monitor this closely. Our quality control staff used to work for the U.S. government; they have decades of experience of testing spies and are international experts in countermeasures, and as part of quality control they look closely at whether offenders are using them or not.

And one thing that often gets overlooked is cost-effectiveness. We did a study with the Metropolitan Police which found that if we use polygraph testing to help align risk assessment with offender management, for example by moving low-risk individuals off the sex offender register, the savings can be substantial. We estimated that introducing polygraph testing across the force would save about £1.5 million a year.

Simon

Thank you Don. OK, now before we open the floor, I want to highlight something important. Notice not only where Don and Aldert disagree, but also where there is actual agreement; that's just as valuable.

From my perspective, this is about practitioners. We want those of you here — and those listening afterwards — to take away insights you can use. Whether you work directly with polygraphs, or you’re advising on policy and strategy, the point is to understand both the benefits and the limitations.

So as we go into questions, think about the practical side: how might this help in your work? Where could the polygraph be useful, and where might it not be? With that in mind, let’s open the floor to your questions.

Question 1: Why are some lie-detection contexts easier than others?

Aldert, a question for you. You’ve argued that when people hear figures of 80–90% accuracy, that’s referring to specific-issue tests, not necessarily sex offender polygraph testing. Some lie detection situations are easier than others, and that has an impact on accuracy. Could you explain why?

 

Aldert

Some lie detection scenarios are easier to resolve than others.

If you have physical evidence the person denies knowing about, you can use a CIT, and that's an easier form of lie detection. Sometimes you can't use a CIT. To get back to the O.J. Simpson case: if Simpson had said he found his wife dead at home but didn't murder her, you couldn't use a CIT, because there is no knowledge he is hiding. But you do have a specific incident (the murder of Nicole), so you can use a specific-incident CQT. That, however, is a more difficult lie detection scenario than the CIT scenario.

And it becomes even more difficult if there’s no specific incident at all, like with sex offenders. What questions to ask? That difference in difficulty directly affects accuracy. The harder the situation, the lower the accuracy. That’s exactly what the National Research Council says in their report.

We don’t know the accuracy of sex offender testing. There is no research about it. What we do know, from verbal lie detection, is that the following pattern holds true: the more difficult the deception scenario, the lower the accuracy rate.

 

Simon

Question 2: Can polygraph accuracy improve on the 54% human baseline?

A follow-up question for you Aldert: Most research puts untrained human judgments sitting around 54% accuracy. So part of the argument here is that the polygraph or verbal credibility techniques lift accuracy beyond that baseline, even if it’s not up to 90%, does that still improve professional decision making?

 

Aldert

Fifty-four percent is a very specific figure. That comes from studies with lay people, when you give them just a single statement and ask, "Is this true or false?"

But that doesn’t mean that using specific interview styles leads to 54%. Research has shown that some interview techniques lead to 75% accuracy or even higher.

Now, with sex offenders, we don't know the accuracy. There's simply no research on it. And why do I think it's lower than the often-cited 80–90%? Because the deception scenario is more difficult, resulting in a questioning protocol that is more problematic than the one used in a specific-incident CQT.

With a specific-incident test, you ask about one incident, and the examinee knows exactly what you’re referring to. But if you ask something like, “Is there anything that may concern me?” — nobody knows what’s going through the examinee’s mind. It could be anything, relevant or not. And that kind of vagueness will result in lower accuracy. More specific questions are not the solution either, because they may refer to issues the examinee is genuinely innocent of, while at the same time he can hide serious wrongdoings simply because the specific questions don’t ask about them.

And of course, there’s also the issue of repeated testing. Ask Charles Honts — he’ll say, “Never do repeated testing.” People come to know how the test works and can use countermeasures to beat it. But in sex offender practice, they are tested repeatedly. So you can’t take the CQT accuracy rates from specific-incident cases and apply them to sex offender testing.

 

Simon

Question 3: Specific-issue versus screening tests: what is the difference?

When we’re talking about research, how should we understand the difference between specific-issue tests and screening tests like those used with sex offenders or in vetting?

 

Don

I pretty much agree with everything that’s just been said, and I want to come back to your point. It’s not just about sex offenders. What we’re really talking about is the difference between screening tests and specific-issue tests.

And you’re absolutely right Aldert — the research the National Research Council quotes is about specific-issue tests. Those are much easier to study. You can run mock-crime experiments: somebody steals a laptop, somebody doesn’t, and you compare. That’s pretty straightforward research. But how do you do research on a screening test? That’s far more difficult to get right.

 

Aldert

Yes, exactly, I agree with that.

 

Don

So that being said, there are a couple of screening studies in the American Polygraph Association’s meta-analyses, but the reason screening tests are less accurate isn’t because the questions are vague. They’re not. The questions can be very very specific: Have you entered your exclusion zone? Have you had unsupervised contact with a child? Have you handed secrets to the Russians? Whatever. These are very concrete questions.

The problem is about the amount of data you have to work with. In a typical single issue test, you might have three relevant questions, each asked three times — that’s nine repetitions when you’re asking about that single issue. If you have three screening questions, that’s just three repetitions of each question. The less data you have, the more error can creep in. That’s why in screening tests decisions are made regarding the entire test, not specific questions. Although a consistent response to one question does suggest that the examinee is probably concerned about it.

But also, the more separate questions you have, the higher the chance of a false positive — someone failing the test who should be passing it. It’s not false negatives that are the concern in screening tests; it’s the false positives.
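Don’s point about false positives compounding can be sketched with a toy calculation. The 5% per-question error rate and the independence assumption below are purely illustrative, not figures cited in the debate: if each question independently carries a small chance of a false positive for a truthful examinee, the chance of failing at least one question grows quickly with the number of questions.

```python
# Toy illustration: how the chance of at least one false positive
# grows with the number of screening questions asked.
# The 5% per-question rate and independence are hypothetical assumptions.

def overall_false_positive_rate(per_question_fp: float, n_questions: int) -> float:
    """Probability a truthful examinee fails at least one of n questions,
    assuming each question errs independently at the given rate."""
    return 1 - (1 - per_question_fp) ** n_questions

for n in (1, 3, 5, 10):
    rate = overall_false_positive_rate(0.05, n)
    print(f"{n} question(s): {rate:.1%} chance of a false positive")
```

Under these assumptions, ten questions push the chance of a false positive past 40%, which illustrates why narrowing the focus, as in the single-issue screening test Don describes, reduces the problem.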

There are ways to address this. One is to narrow the focus. In our program we’ve developed a single-issue screening test, which mirrors the specific-issue format that the National Research Council looked at. That way, you get around the reduced data problem.

Finally, in terms of evidence: we’ve collected over ten years of data now, over lots and lots and lots of tests. Among sex offenders who are deceptive on the tests, about two-thirds go on to provide post-test disclosures to explain why. In other words, confirming that the outcome, of those tests at least, isn’t a false positive.

 

Simon

Question 4: What do we actually know about screening accuracy?

Another member of the audience asks, “When we move from 80–90% accuracy in specific-issue tests into screening contexts, like with sex offenders — what do we actually know about accuracy?” Don, you’re saying it’s still high, but we don’t have the research to prove that. Is that right?

 

Don

Yeah, I would agree with that. The evidence we do have suggests it’s more on the 80% side rather than the 90% side, but still within that window.

But it’s important to remember that the polygraph is not a lie detector, which I think also goes to the screening issue. What it’s picking up is how salient that question is to the individual, which is, of course, a different issue, and is especially relevant in screening contexts and when you’re managing individuals post-conviction.

 

Aldert

But you need to cite peer-reviewed research to back up that 80% figure. And you can’t — because it doesn’t exist. You can say, “we think it’s 80%,” but that’s not evidence.

What makes you say that sex offender screening tests reach anything close to 80%? There isn’t any evidence. If you can show me valid, peer-reviewed articles in solid journals, with proper data analysis, then yes — I’ll accept it. But until then, no.

 

Don

So if we can show you evidence that it is high, you’ll change your view?

 

Aldert

If it is in a peer-reviewed journal, with solid data, then yes. One study alone is not enough — but if there’s strong evidence emerging from multiple studies, of course I would reconsider. We NEED to get money to get that research done, we NEED to do the field studies, and get the ground truth.

 

Don

Does anybody in the audience have any money for a study?

(Audience laughter)

 

Simon

If anybody in this room has funding, this is where it matters! Practitioners need to take this debate back to their bosses. And particularly in screening, which is a big push in personnel security right now, the question is crucial. If I were a boss with funding, listening to this, I’d want to support proper studies to really work out the accuracy of screening tests. Why? Because sooner or later it’s going to be my job — or my successor’s job — to explain why, out of a hundred people, ten slipped through who shouldn’t have. Alright, so the challenge on the table is clear: show peer-reviewed evidence that screening accuracy is at 80%. That’s the heart of the debate here — practice suggesting one thing, research demanding stronger proof.

 

Simon

Question 5: Is the polygraph the "holy grail," or just another tool?

One audience member put it quite directly: Is the polygraph the holy grail, or should we just be using other tests? Practitioners say they want to use it, because they hear it works. But because of all the debate about accuracy, they don’t have the confidence. So my question to both of you is: does the polygraph genuinely add value to the process — whether that’s in screening, security vetting, or investigations?

 

Don

I’ve already said this a couple of times, but let me make it clear: screening tests are less accurate than single-issue tests. I’m not arguing about that. It’s true. My view, though, is that they still fall within that 80–90% accuracy window, although accuracy decreases the more questions you ask.

But accuracy isn’t the whole story. You also get disclosures. That’s the other benefit, and as I said before, the two are complementary.

We know from American studies in pre-employment screening, and our own studies here, that huge amounts of previously unknown and often unexpected information come out — people admitting to crimes, drug use, even involvement in serious crimes. Why they disclose, we don’t exactly know — there are some theories — but they do.

So no, the polygraph isn’t the “holy grail.” It’s not perfect. But it does add value. For example, in vetting: if someone has passed every stage of the process, and then fails the polygraph, that doesn’t mean you reject them immediately. What it does mean is you go back and take a closer look to see if there’s something that you’ve missed. Sometimes you might decide it’s a false positive and let them through. But are you confident enough to just let them sail through without checking? And that’s the point — the polygraph gives you extra information that lets you focus your attention where it’s needed. It aids decision making. It doesn’t stand alone though — what does? There is no ‘Holy Grail’ in this area anyway.

 

Aldert

I think it is okay to use sex offender polygraph testing, but you should not rely too much on it. The danger is this: if you assume accuracy is close to 80%, you start treating it as highly reliable. But in reality, we have no evidence on accuracy in those kinds of tests. It should be one tool in the toolbox, not the only one.

The problem is, too often examiners rely heavily on the outcome. Don said earlier, “if somebody passes the test, there’s nothing we’ve missed.” But how do you know that? You don’t. That’s exactly the problem.

And using polygraph tests for disclosures — there are better tools than a polygraph for that. Fantastic interview techniques have been developed that are very successful at getting interviewees to talk and report sensitive information. So yes, you can use sex offender polygraph testing — but don’t pretend it’s 80% accurate, don’t think it is the best tool we have, and don’t place too much trust in it.

 

Don

Well, I don’t know of any better tool for disclosures, but regardless, it’s not just about accuracy, it’s also about utility. Its value is in information gain and the complementary nature of outcome and disclosure.

And since you’ve said you’re more comfortable with single-issue testing, let me ask: in police investigation, where there’s a clear incident — say an informant claims, “There’s a bomb placed at Victoria Station. True or false?”, well that’s a specific-incident test. Would you accept that kind of use for the polygraph?

 

Aldert

Informants are a special case, because the relationship between source and handler is so important. I don’t think the polygraph would be good for that relationship.

 

Don

But we don’t have evidence it would harm that relationship. People often say that, but there’s just no evidence, and as I said earlier, we have evidence that it doesn’t.

And in terms of testing the information itself — in a specific incident — that’s exactly where the polygraph can be most useful.

 

Simon

Question 6: Does cooperation matter in making the test work?

Ok following on from informants, another audience member asked whether you both agree that cooperation is essential in verbal credibility assessment or in the polygraph? In other words, the person has to be engaged and willing to take part. I see Aldert nodding yes – you need cooperation. Don – how does that actually play out in practice, especially when someone like a sex offender or a domestic violence offender may not walk into the room ready to cooperate?

 

Don

That’s a very fair point, and it’s true that when a sex offender or domestic violence offender comes into the room, they’re not always in the frame of mind to cooperate.

The point you’re making is important, because a lot of people think it’s just a matter of walking in, sitting down in the chair, getting strapped up, and having questions fired at you. That’s the version you see on television.

In reality, there’s a pre-interview process that can take a couple of hours or more. During that time, the aim is to build rapport, to get the person focused on the questions they’ll be asked, and to give them confidence that if they’re telling the truth, they’ll get through the polygraph. That’s a big part of it.

And this is something examiners are specifically trained in—interviewing techniques, establishing rapport, managing the session before the test even begins.

There’s a great example from The Simpsons that I often think of in this respect. Has anyone seen Moe being polygraph tested? He sits down with this contraption around his head, and there’s a light that goes red or green. They ask, “Do you hold a grudge against Montgomery Burns?” He says, “No,” and the red light flashes. Then he admits, “Alright, I had a grudge, but I didn’t kill him!” The light goes green. “Okay, that checks out, he can go.” And Moe says, “Great, because I’ve got a hot date tonight.” The red light goes off. “A date.” Red light. “Dinner with friends.” Red light. And eventually he says, “I’m going to sit at home and Google the Sears catalogue.” And finally, the green light comes on.

Well that’s how people imagine polygraph works—strap someone in, ask questions, and watch the lights go red or green. But of course, that’s not how it works in practice.

 

Simon

Question 7: Do repeated tests reduce disclosures – or help offenders beat the polygraph?

Don, a question for you. You quoted a large study showing that disclosures from offenders reduced over time with repeated polygraph testing, and you suggested that was because individuals had revealed everything they were going to disclose. But couldn’t another explanation be that they were actually learning how to beat the polygraph, rather than becoming more open?

 

Don

I’m not sure exactly which of the studies you’re referring to, but we keep detailed stats from the mandatory testing, which has been running now for over ten years.

What we’ve found is that there isn’t, in fact, a significant reduction in disclosures over that time. On the first test, it’s about 66%, around two thirds. On a retest, it’s about 60%. So, there’s no massive reduction. What we have found, however, is a reduction in post-test disclosures after a deceptive outcome. And you have to remember as well, that they’re being asked different things as the tests go on over time.

With sex offenders, we don’t find evidence of countermeasures in any meaningful way. To do that effectively, you’d need a lot of practice. We pick up around 5% where it looks like they may be using countermeasures.

It’s different with terrorist offenders, where attempted countermeasures appear to be much higher —and there are probably a number of reasons for that.

But the important point is, we believe we’re largely picking them up. Are we missing some? Probably. But remember, without the test those people were beating you anyway; they’re just continuing to do so. What the test does is substantially cut down the number of times they beat you.

 

Aldert

Research has convincingly shown that people can beat polygraph testing if they know how these tests work. Those who have invented the CQT polygraph test made it very clear never to use repeated testing as it diminishes accuracy. Don is isolated in his claim that it doesn’t matter.

 

Simon

Question 8: Does the technology itself matter?

Ok, so moving on, this audience question is for both of you. You’ve both focused a lot on good practice, accuracy, and test methods. But how much does the technology itself matter?

 

Aldert

To a far lesser extent. The technology itself isn’t really the problem. It depends on which test you are using.

With a polygraph, the machine is actually quite accurate in measuring what it measures: arousal. That’s not the issue. The issue is the questions you ask and the type of test you use: a CIT, a specific-incident CQT, or a non-specific-incident CQT (the sex offender test). That’s what determines accuracy.

Now, if you move to something like a voice stress analyser, that’s different. Those measurements are noisy, unreliable. They really don’t work. But the polygraph machine itself? It’s accurate. The real issue lies in the test you use, not the technology.

 

Don

Yeah, I’d agree with that. The technology for polygraphing is pretty good. It’s all digital now, and I don’t think I’ve ever seen serious criticism of the equipment itself. That’s not where the problem lies.

 

Simon

Great, it sounds like we’ve found a point of agreement — the technology itself isn’t the issue, it’s how the tests are designed and used.

Question 9: AI and commercialisation: reality TV versus reality

Ok here’s a lighter question from another audience member that I think would be of interest to the public: I’m not a practitioner or an academic, so I know very little about this, which is why I was excited to come to this debate. I was recently watching a reality TV show called The Honesty Box, where contestants were given polygraph tests to judge whether they were being truthful in their relationships. They said it was “AI-led polygraphy.”

So I suppose I have two questions. First: could AI in the future be capable of running some form of polygraph? And second: what are your views on the commercialisation of polygraph, especially now it’s showing up in mainstream entertainment?

 

Don

Well, polygraph has unfortunately always been part of daytime television. I must admit, I missed that episode of The Honesty Box, but I’ve seen plenty of clips online — and they don’t give a good portrayal at all.

These shows tend to trivialise polygraph testing and give a false impression of it. And they often claim it’s 100% accurate, which is, of course, simply not true.

In the private, unregulated world of polygraph testing, most money seems to come from so-called ‘fidelity tests’ — people bringing in their boyfriends or girlfriends to check if they’ve been faithful. We don’t allow our examiners to do those kinds of tests.

As for AI: well the big challenge is data. AI needs huge datasets — millions of examples. Even with 10,000 tests, which we might potentially get, that’s only a tiny amount of training for AI. So, while people are looking into it, I don’t see how they’ll get enough data to make it work reliably. There’s a lot of talk that AI will take the place of things that humans do. To test that out every year, I asked AI to tell me a polygraph joke, and so far the jokes have been pretty poor! So, until they can tell a good joke, I think that we're safe from AI taking over.

 

Aldert 

I completely agree with Don. I don’t know how AI polygraph is supposed to work, but what you see on TV has nothing to do with how professionals use the polygraph in their practice.

Real practice is far better than what is shown on TV as entertainment. So don’t assume what you see on TV, or read in magazines, reflects reality — it does not.

 

Simon

And that’s exactly why debates like this are important. Decision-makers often only have five minutes, and if all they’ve seen is something on TV — like The Honesty Box or Jeremy Kyle — that’s what shapes their views.

I’m not criticising the media, but if that’s where people are getting their knowledge, then experts like you need space to explain what’s really going on, based on evidence rather than television narratives.

Question 10: Polygraph and neurodiversity: how do these differences in brain function impact how people respond to the test?

This is a more serious question. An audience member asked: “I’m interested in the intersection between polygraph testing and EDI — particularly with people who are neurodiverse or have mental health difficulties. You’ve talked a lot about polygraph working through cognitive load and autobiographical memory. But for someone with PTSD, memory might be limited. For people with autism or ADHD, their brains work differently. There hasn’t been much research into how this affects polygraph accuracy. But based on your experience, how do these differences in brain function impact how people respond to the test?”

 

Don

We do have a lot of experience with that. In some of our cohorts there’s a high incidence of neurodiversity and mental health conditions. And remember, I’m coming at this as a psychiatrist, so our examiners get a lot of training in interviewing people with PTSD, neurodiverse backgrounds, and so on.

But you’re right — there’s no specific research on these groups. In theory, though, there’s no reason to believe the polygraph wouldn’t work in the same way, if we’re talking about cognitive load and salience.

The one exception is people on the autistic spectrum. We know that sometimes they can get ‘stuck’ — again, without going too deep into the technique, they may focus on one set of questions in a way that biases the results.

So our position is that we need to be more cautious with these groups, because the evidence just isn’t there. But from what we do know about the underlying theory, there’s no strong reason to think that the polygraph would work in any fundamentally different way. It’s interview technique that requires more attention, not the test itself.

 

Simon

Aldert, can I bring you in — but from the lens of credibility assessment more broadly, rather than just the polygraph itself?

 

Aldert

There’s no research on this, so we simply don’t know. To make predictions, you need a theory about why lie-tellers and truth-tellers respond the way they do. We know that truth tellers and lie tellers have different aims in interviews: The aim of truth tellers is that the interviewer comes to know all they (the truth tellers) know. The aim of lie tellers is that the interviewer does NOT come to know what they (the lie tellers) know. These contrasting aims lead to contrasting strategies: truth-tellers are forthcoming and willing to share all details they know, whereas lie-tellers try to keep things simple.

As long as truth tellers and lie tellers have these contrasting aims and use those contrasting strategies, credibility assessment works. But if they don’t — for example, very young children, or people with memory impairments, or people with trauma that makes it hard to remember what happened — then credibility assessment could be difficult because truth tellers can’t provide much detail. That’s why theory is so vital: it lets you predict when a method will work, and when it won’t work. And for CQT polygraph testing, including sex offender polygraph testing, that theoretical basis does not exist.

 

Simon

And that links to culture too. I’m glad this point came up, because it’s not about criticism — it’s about understanding limitations. If you’re going to use a method, whether it’s polygraph or verbal credibility assessment, you need to recognise how it can be applied, and what skills are required to interpret it.

 

Don

Can I just add a quick point? Aldert keeps referring to Charles Honts as if he’s the authority on this, and yes, he’s influential. But just because Honts says “there’s no theory” doesn’t mean there isn’t a theory. We do have a theory of how polygraph testing works. It’s about cognitive load and salience — I’ve referred to it before, but I can go into more detail if people want. But there is a theoretical basis to polygraph testing.

 

Simon

Question 11: What would a proper research study on screening look like?

And finally, our last question from the audience asks: What would an experimental study to validate the use of the polygraph in screening actually look like?

 

Aldert

I think we need to take a step back and start in the lab.

Everyone wants real-life data, but that is the end point — we can’t begin there because there’s no reliable ground truth in the field. Ground truth means that we know with 100% certainty who was lying and who was telling the truth. We cannot say how effective lie detection tools are if we do not know who lied and who actually told the truth.

So you work with practitioners to identify the key issues, the kinds of questions they want answered, and then design controlled experiments in the lab around them. That’s how we have carried out our verbal lie detection research for many decades now. It’s not one study — it’s many studies that flow logically from one another, and that takes years to complete. Science is slow, and it costs money, but that’s the only way to build a proper evidence-based lie detection tool.

And here’s the problem with field data: the feedback is biased. If someone passes the test, you assume it’s correct and don’t check further. The guilty examinee will get away with it unnoticed. If someone fails, you push for a disclosure. To satisfy the examiner, the examinee may give one, but you don’t know how accurate those disclosures are and may never find that out. However, each disclosure is seen as evidence that the polygraph result was correct. In other words, errors again remain unnoticed.

The result of errors staying unnoticed is that examiners become convinced that the test is more accurate than it really is. That’s why lab studies are essential — they give you proper feedback about your decisions and a real measure of accuracy.
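Aldert’s biased-feedback argument can be sketched as a small simulation. Every number below (the true accuracy, the base rate of deception, the disclosure rate) is a hypothetical assumption chosen only to show the mechanism: when passes are never verified and failures are “confirmed” by disclosures, the accuracy an examiner perceives exceeds the test’s true accuracy.

```python
import random

def simulate_perceived_accuracy(n_tests=100_000, true_accuracy=0.75,
                                deception_rate=0.3, disclosure_rate=0.66,
                                seed=42):
    """Toy model of selective verification in field polygraph data.
    All parameters are hypothetical. Passes are assumed correct and never
    checked; a failure counts as correct whenever a disclosure follows,
    even if the examinee only disclosed to satisfy the examiner."""
    rng = random.Random(seed)
    seen_as_correct = 0
    for _ in range(n_tests):
        lying = rng.random() < deception_rate
        test_correct = rng.random() < true_accuracy
        # The examinee fails if (lying and detected) or (truthful and misjudged).
        failed = lying if test_correct else not lying
        if failed:
            # A failure is "confirmed" only when a disclosure follows.
            if rng.random() < disclosure_rate:
                seen_as_correct += 1
        else:
            # A pass is assumed correct and never verified.
            seen_as_correct += 1
    return seen_as_correct / n_tests

print(f"true accuracy of the modelled test: 75.0%")
print(f"accuracy as perceived by examiners: {simulate_perceived_accuracy():.1%}")
```

Under these assumptions the examiner would perceive roughly 86% accuracy from a test modelled as 75% accurate, because the errors hidden among the passes never surface. This is only a sketch of the feedback bias, not an estimate of any real programme’s figures.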

 

Don

I agree that lab studies are important, but I don’t think we should dismiss field data.

We now have results from over 10,000 tests in our programme. While we don’t have perfect ground truth in screening, we do in single-issue testing — and that gives us a baseline. So we can ask: do the patterns we’re seeing in the field line up with what we’d expect based on lab studies?

Post-test disclosures are one piece of that puzzle. They’re not perfect, but when a failure leads to a disclosure, that’s a strong indication the test picked up something real.

So yes, we need controlled lab experiments. One could, for example, set up a mock pre-employment screening experiment, where the individual is asked things known to be true about their background. But we should also use the large field datasets already available — otherwise, we’re ignoring valuable evidence. And ideally, the two approaches would run in parallel: lab studies building theory, and field studies showing how it plays out in practice.

 

Aldert

Yes, I can agree with that. Ideally, you would run the two approaches: carefully designed lab studies to build the evidence base, and field studies to see how well that holds up in practice. But the key is to be cautious about how you interpret the field data, because without proper ground truth, it can be misleading.

And there are still open questions we need to test. For example, does the polygraph itself produce more disclosures, or is it the interview process around it that matters most? Also, if you change the order — interview first, then polygraph (which is the current practice) versus polygraph first, then interview — would the effect be different?

And would accuracy differ if you discuss only one incident or topic per polygraph test (which is what the tests are designed for) versus multiple incidents or topics in one test (which is current practice in sex offender polygraph testing)? These are empirical questions, and we shouldn’t assume we already know the answers, or that such questions don’t matter because the accuracy will stay the same. That’s just not the case.

That’s why collaboration between practitioners and researchers is vital: to identify the gaps in knowledge and then to design research around it to address these gaps.

 

Simon

Thank you both — and I think Becky’s hand-waving means it’s time for us to wrap up.

I just want to say how much I’ve enjoyed this session. For me, it’s been a real highlight. This was a tricky subject, with strong opinions on both sides, but I think we’ve managed to show the nuance in the challenging questions, and our presenters gave us a thoughtful, constructive debate. That’s how the field moves forward — well, it will move forward if we can find some funding to finally do the studies and settle the debate once and for all!

So, a huge thank you to all of you for your contributions, and especially to Professor Aldert Vrij and Professor Don Grubin for sharing their time and expertise. A special thanks as well to the CREST team — in particular, to Alex Brown for organising today’s session, and to Becky Stevens for editing what I know was quite a difficult podcast to pull together.

And on that note, there are still three scones left and a couple of pastries at the back — so please do help yourselves. And to our listeners, thank you for tuning in, and don’t hesitate to email us with your thoughts at crest@lancaster.ac.uk

Thank you all very much.