
How Emotional AI Claims to Read Your Feelings — and Why It Probably Can’t


The Promise of Reading Emotions Algorithmically

A significant frontier in AI development is affective computing—systems designed to recognise, interpret, and respond to human emotions. The narrative is compelling: AI systems that can detect your emotional state from facial expressions, vocal tone, body language, or physiological signals. Such systems could theoretically improve customer service (by detecting frustration and routing to human agents), enhance mental health support (by identifying depression or anxiety), or improve workplace safety (by detecting stress before it leads to mistakes). In education, emotion-detecting systems could theoretically identify when students are confused and adjust instruction accordingly. The potential applications are vast.

The technology companies developing these systems have significant financial incentive to promote the narrative that emotion can be reliably detected and classified algorithmically. Facial expression recognition systems are being integrated into recruitment tools to assess candidate authenticity and trustworthiness. Emotion recognition algorithms are being deployed in schools to monitor student engagement. Call centre systems analyse vocal tone to detect and escalate frustrated customers. Surveillance systems claim to detect suspicious behaviour by analysing emotional states. Automotive systems monitor driver state to detect fatigue. The deployment is advancing rapidly, with significant sums invested in affective computing companies.

Yet there’s a fundamental problem at the heart of emotional AI: the scientific basis for emotion recognition algorithms is far shakier than the marketing narratives suggest. The assumption underlying these systems—that emotions have universal facial expressions, vocal markers, and physiological signatures that algorithms can detect—is increasingly questioned by neuroscience and psychology research. Understanding why this assumption is problematic is essential for evaluating emotional AI claims and recognising the risks of deploying these systems in consequential contexts.

The Neuroscience of Emotion: Beyond Universality

For decades, the scientific consensus was that emotions were discrete categories—fear, anger, sadness, joy, disgust, surprise—each with universal facial expressions. This framework came from influential research by Paul Ekman in the 1970s, which suggested that facial expressions for basic emotions were recognised consistently across cultures and were therefore biologically universal. This universality claim became foundational to emotion recognition systems: if emotions map reliably to facial expressions, then algorithms trained to recognise those expressions could reliably detect emotions.

This framework is increasingly challenged by contemporary neuroscience and psychology. Researchers including Lisa Feldman Barrett have conducted extensive research demonstrating that emotions are not discrete categories with universal expressions, but rather constructed states varying across cultures, individuals, and contexts. Barrett’s work on constructed emotion argues that emotional experience is generated in the moment through a process that brings together bodily sensations, conceptual knowledge, and context. The same facial expression might represent different emotions in different contexts, or might not represent an emotion at all.

Cross-cultural research has further challenged universality claims. Facial expressions recognised as reflecting particular emotions in Western cultures are interpreted differently in non-Western cultures. There’s also substantial individual variation: the same person might express the same emotion differently in different contexts, or express different emotions with similar facial expressions. Additionally, many emotions don’t have distinctive facial expressions at all—guilt, embarrassment, and pride, for instance, don’t have reliable facial signatures. The discrete emotion framework breaks down when examined empirically.

Lisa Feldman Barrett’s Research and the Constructed Emotion Theory

Lisa Feldman Barrett, a prominent neuroscientist at Northeastern University, has become a key voice challenging the validity of emotion recognition systems. Her research, synthesised in books like ‘How Emotions Are Made’ and in numerous scientific publications, presents compelling evidence that emotions are not read from the world but constructed by the brain using past experience and prediction. This constructed emotion theory has significant implications for whether algorithmic emotion recognition is even possible in principle.

The theory works as follows: your brain is constantly making predictions about incoming sensory information based on patterns it’s learned. When you encounter a situation, your brain predicts what you’re feeling based on what you’ve learned in similar situations. You then perceive emotion based on that prediction, adjusted for your current context and physiology. Different people, with different experiences, might construct entirely different emotions in response to the same situation. Additionally, your brain is constructing emotion for yourself, not broadcasting it in your facial expression for others to read. A facial expression is something you produce—often unconsciously—but it’s not a direct window into your emotion.

Barrett’s work specifically challenges the concept that emotions are written on the face in ways that algorithms can reliably read. If emotions are constructed in the moment based on individual history and prediction, and if the same facial expression can represent different emotions in different contexts, then algorithmic systems trained to map facial expressions to discrete emotion categories are fundamentally mismatched to the actual nature of emotion. The algorithms might achieve modest accuracy in controlled laboratory settings, but this accuracy reflects pattern-matching to training data rather than genuine emotion detection.

The Technical Challenge: Training Data and Labelling

Practically, emotion recognition algorithms face a crucial technical problem: they require labelled training data where emotional expressions are categorised. Someone—a researcher or annotation worker—must watch footage of people and assign emotion labels: ‘this is fear’, ‘this is anger’, ‘this is joy’. But if emotions aren’t discrete categories with universal expressions, this labelling process is inherently unreliable. Different people labelling the same facial expression might apply different emotion categories. The same expression from the same person in different videos might be labelled differently depending on context. The training data itself is contaminated by the ambiguity and subjectivity of emotion.
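
To make the labelling problem concrete, here is a minimal Python sketch, using entirely hypothetical annotations, of how inter-annotator agreement is typically quantified. Cohen’s kappa measures agreement beyond chance; when two annotators watching the same clips produce a kappa well below 1.0, the ‘ground truth’ the model trains on is itself contested.

```python
# Two hypothetical annotators assign discrete emotion categories to the
# same 10 video clips. Cohen's kappa measures agreement beyond chance;
# values well below 1.0 mean the "ground truth" labels fed to a model
# are themselves contested.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["anger", "fear", "joy", "anger", "neutral",
               "sadness", "fear", "joy", "anger", "neutral"]
annotator_b = ["disgust", "surprise", "joy", "neutral", "neutral",
               "fear", "fear", "joy", "anger", "sadness"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # substantially below 1.0

# Any classifier trained on either annotator's labels inherits this
# disagreement as irreducible noise in its notion of "correct".
```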

Additionally, emotion recognition systems are typically trained on datasets with particular demographic characteristics. They’re trained mostly on actors making exaggerated expressions in laboratory conditions, not on genuine emotional expression in real-world contexts. This creates a significant generalisation problem: the expressions in training data are stylised and artificial compared to genuine emotional expression. An algorithm trained on these datasets will perform differently in real-world conditions, particularly on people from different demographic backgrounds, with different expression norms, in different contexts.

The demographic bias in emotion recognition is well-documented. Like other computer vision systems, emotion recognition algorithms perform less accurately on people of colour, women, people from non-Western backgrounds, and people with facial differences. This isn’t just an accuracy problem; it’s a fairness problem. If emotion recognition systems are deployed in consequential contexts—hiring decisions, medical evaluation, law enforcement—demographic bias means that certain groups are misclassified more frequently than others, creating disparate impact.

What Emotion Recognition Systems Actually Measure

A crucial distinction that’s often blurred in affective computing marketing is between measuring facial expression and measuring emotion. Emotion recognition systems are technically measuring facial expressions—the configuration of facial muscles, head position, eye gaze. They’re not measuring emotion; they’re making inferences about emotion from expression. These inferences are based on assumptions about the relationship between expression and emotion, assumptions that neuroscience increasingly suggests are wrong.

What the algorithms actually do is create correlations between facial configurations and emotion labels in training data. These correlations might exist—certain expressions might be statistically more likely to co-occur with certain emotions—but correlation is not causation or reliable indication. A facial expression might indicate emotion, or might indicate something completely unrelated: an itch, a tic, an attempt to manipulate the system, cultural expression norms, or simply how that person’s face looks. The algorithm can’t distinguish these possibilities; it makes probabilistic predictions based on correlations.
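
A minimal sketch, assuming hypothetical facial action unit (AU) intensities as input features and annotator-assigned labels as targets, shows what such a system actually computes: a probability distribution over labels learned from correlations in training data. Nothing in the pipeline touches emotion itself.

```python
# Minimal sketch of what an "emotion recognition" model computes: a
# mapping from facial measurements to label probabilities learned from
# annotated data. Features here are hypothetical facial action unit
# (AU) intensities; the model never observes emotion itself.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.random((200, 4))     # e.g. AU4 (brow lowerer), AU12 (lip corner puller), ...
y_train = rng.integers(0, 3, 200)  # annotator labels: 0=neutral, 1=joy, 2=anger

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

face = np.array([[0.1, 0.9, 0.2, 0.3]])  # one new face's AU intensities
probs = model.predict_proba(face)[0]
print(dict(zip(["neutral", "joy", "anger"], probs.round(2))))

# The output is confidence over label correlations, not a measurement of
# what the person feels: an itch, a cultural display norm, or a resting
# expression can produce the same AU pattern.
```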

This matters critically for applications where emotion detection informs consequential decisions. If a hiring system rejects a candidate because it detected ‘lack of engagement’ from facial expression, it’s rejecting them based on inferred emotion from expression, not on actual emotion or capability. If a school system flags a student as ‘disengaged’ or ‘stressed’ based on emotion recognition, it’s making inferences about psychological state that might be entirely wrong. The confidence the system displays in its classifications—often expressed as probability scores—is based on training data correlations, not on validity of the underlying construct being measured.

Affective Computing in Practice: Recruitment and Education

One of the most consequential deployments of emotion recognition systems is in recruitment. Various companies have developed or claim to develop systems that analyse video recordings of job candidates, assessing emotional expressions, facial movements, vocal characteristics, and other features to generate scores on traits like authenticity, engagement, personality, and trustworthiness. The pitch is that these systems enable objective assessment of candidate suitability beyond what interviews traditionally measure.

The problems with this application are profound. First, the systems are measuring facial expressions and vocal characteristics, not the traits they claim to measure. An algorithm can correlate facial expressions with engagement in training data, but correlation with engagement in training data doesn’t mean the expression reliably indicates engagement in novel contexts. Second, candidates aware they’re being analysed for emotion might alter their expressions—trying to appear more engaged, more likeable, more authentic—making measurements of genuine emotional state impossible. Third, demographic bias means that expressions normal in some cultures might be classified as ‘lack of engagement’ by the algorithm, disadvantaging candidates from non-dominant cultural backgrounds.

The impact is discrimination through the appearance of objectivity. A human recruiter’s bias is visible and potentially correctable; it can be challenged, discussed, and adjusted. An algorithm’s bias is hidden in training data and model weights, presented as objective measurement, and difficult to contest. A candidate rejected because a human recruiter had a bad impression has recourse to point out that recruiters can be biased. A candidate rejected because an emotion recognition system detected ‘low authenticity’ has minimal recourse—how do you contest an algorithmic measurement of authenticity?

Emotion Recognition in Schools and Student Monitoring

Some school districts have piloted emotion recognition systems to monitor student engagement and emotional state during learning. The proposed benefit is that systems could identify students who are struggling—confused, anxious, disengaged—and flag them for additional support. The risk is pervasive surveillance of children’s emotional states coupled with inferences likely to be inaccurate and biased. A system flagging a child as ‘disengaged’ based on facial expression might be wrong; the child might actually be focused, or might be naturally less expressive, or might be from a cultural background with different expression norms.

The psychological impact of continuous emotion monitoring is also concerning. Children knowing they’re being monitored for emotional state might modify their expressions, creating artificial data. They might become anxious about being judged for their emotions, affecting genuine engagement. Teachers knowing that students’ emotional states are being monitored and evaluated might become defensive about classroom dynamics. The systems assume that emotion can be inferred from expression, enabling intervention, but the inference is unreliable and the intervention might target false positives.

Additionally, there are developmental and equity concerns. Emotion recognition systems show higher error rates for students from minority backgrounds. Students with facial differences, tics, or non-dominant expression styles will be systematically misclassified. The systems amplify and automate classroom biases—if teachers tend to perceive certain students as more engaged based on expression, emotion recognition systems will learn and reinforce those perceptions. Using these systems to guide educational intervention could worsen educational equity.

Workplace Surveillance and Emotional Monitoring

Employers are increasingly adopting emotion recognition systems ostensibly to monitor employee wellbeing, detect stress before burnout occurs, or improve customer service interactions. Call centre workers’ calls are analysed for emotional tone. Remote workers are monitored through webcams and emotion recognition software. Employees’ stress levels are assessed through facial analysis and heart rate monitoring. The framing is benevolent: we’re trying to support employee wellbeing.

Yet the practical effect is workplace surveillance that extends beyond behaviour to inferred emotional state. An employee can modify their behaviour to comply with workplace norms, but monitoring inferred emotion suggests something more invasive: assessment of internal psychological states without consent or understanding. Additionally, the inference is unreliable. An employee might have a naturally less expressive face, might be from a cultural background with different expression norms, might have facial differences that the system misinterprets as emotional markers. Emotion recognition monitoring creates the possibility of being disciplined or evaluated based on inaccurate inferences about your emotional state.

There’s also a power dynamic issue. Employers monitoring emotion can use that information to manipulate workers—identifying who’s stressed and intervening with additional pressure, or identifying who’s satisfied and reducing support. Workers can’t easily defend against inferences about their emotional state because they don’t necessarily agree with them and can’t always control their expressions. This creates a work environment where workers must manage not just their behaviour but the appearance of managing their emotions, creating additional emotional labour.

The EU AI Act and Restrictions on Emotion Recognition

Recognition of the risks of emotion recognition systems has prompted regulatory response. The EU AI Act, politically agreed in late 2023 and formally adopted in 2024, explicitly restricts the use of emotion recognition AI. The Act prohibits emotion recognition systems in workplaces and educational institutions (except for medical or safety purposes), and classifies their use in areas such as law enforcement, border control, and migration as high-risk, subject to strict requirements. These restrictions acknowledge that emotion recognition lacks sufficient scientific validity to reliably support consequential decisions about people’s rights or safety.

The EU’s approach reflects recognition that the scientific basis for emotion recognition is uncertain, the potential for misuse is significant, and the impact on fundamental rights is serious. An emotion recognition system incorrectly inferring a traveller’s emotional state could affect border crossing decisions; one inferring a suspect’s emotions could influence police investigations. For monitoring in workplaces and schools, the EU concluded that these uses should not be permitted even if the systems achieved some technical accuracy, because the lack of scientific validity combined with the consequentiality of decisions creates unacceptable risk.

Beyond these direct prohibitions, the EU AI Act requires that emotion recognition systems be identified as such when deployed, and requires transparency about their limitations. This is significant because it denies emotion recognition systems the appearance of objective measurement. Systems using emotion recognition must disclose this, enabling informed choice about whether to interact with systems that are making inferences about your emotional state. In the UK, which has left the EU, equivalent restrictions have not yet been implemented, though there’s discussion of similar safeguards.

The Science: Why Emotion Can’t Be Objectively Classified

A fundamental issue with emotion recognition is that emotion is not a natural kind with objective boundaries that can be discovered through measurement. Emotions are categories created by humans, varying across cultures, and experienced differently by different people. Wittgenstein’s concept of family resemblance applies well to emotion: emotions in a category share some features with each other but might not all share a common essential feature. Anger, for instance, can involve different physiological states, facial expressions, cognitive patterns, and behaviours depending on the person and context.

This categorical ambiguity makes objective emotion classification impossible. An algorithm might achieve statistical correlations between facial expressions and self-reported emotion in training data, but this reflects pattern recognition in the data, not discovery of objective emotion categories. The categories remain social constructs. This is different from measuring something like temperature or height, which are continuous variables with objective measurement. Emotion classification is more like classifying art into genres—you can find patterns, but the classification is ultimately subjective.

Additionally, self-reported emotion—what people say they’re feeling—is not a reliable ground truth for training data. People might misidentify their own emotions, might report what they think they should feel rather than what they do feel, might interpret the same emotional experience differently depending on context. Using self-reported emotion as the labels for training emotion recognition systems builds in this unreliability. The algorithms are trained to match facial expressions to people’s stated emotions, but people’s stated emotions are themselves uncertain.

Voice Analysis and the Broader Problem

Emotion recognition isn’t limited to facial expression; vocal emotion recognition systems attempt to detect emotion from voice characteristics such as pitch, speed, intensity, and vocal quality. These systems share the problems of their facial counterparts. Vocal characteristics correlate with emotion in training data, but the relationship is neither universal nor reliable. The same person might express the same emotion with different vocal characteristics in different contexts; different people might express different emotions with similar vocal characteristics. Cultural differences in vocal expression norms mean the systems perform differently across demographic groups.

Vocal emotion recognition is used in call centre analysis, customer service routing, and security applications like lie detection. The same problems apply: the systems measure vocal characteristics, not emotion, and make inferences about emotional state that might be wrong. A person calling customer service might sound frustrated because they’re actually frustrated, or because they have a naturally harsh voice, or because they’re speaking on a poor connection, or because they’re from a cultural background with different vocal norms. The system can’t distinguish these possibilities.
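
For illustration, here is a short sketch using the open-source librosa audio library to extract the kinds of acoustic quantities these systems actually measure. The file path is a placeholder, and the features shown (pitch and short-term energy) are a simplified subset of what commercial systems use.

```python
# Sketch of what vocal "emotion" systems actually measure: acoustic
# features such as pitch and energy. librosa is an open-source audio
# library; "call.wav" is a placeholder path.
import librosa
import numpy as np

y, sr = librosa.load("call.wav", sr=16000)

# Fundamental frequency (pitch) track via probabilistic YIN.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr)

# Short-term energy, a rough loudness proxy.
rms = librosa.feature.rms(y=y)[0]

print(f"mean pitch: {np.nanmean(f0):.1f} Hz, mean RMS energy: {rms.mean():.4f}")

# Raised pitch and energy correlate with frustration in some training
# sets, but the same acoustics can reflect a harsh voice, a poor phone
# connection, or cultural speaking norms. The features are real; the
# emotional inference layered on top is not.
```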

When used for lie detection—assessing truthfulness through voice analysis—vocal emotion recognition systems are particularly problematic. The assumption underlying voice-based lie detection is that lying produces detectable emotional responses, but this is disputed by research. Some liars experience emotion (fear of detection, for instance) but some don’t. Some truth-tellers experience emotion (anxiety about being disbelieved) even though they’re being truthful. Emotion and truthfulness are not closely related. Using vocal emotion recognition for lie detection is scientifically invalid and creates risk of false accusation.

Physiological Monitoring and the Broader Surveillance Landscape

Beyond facial and vocal analysis, companies are developing systems that infer emotional state from physiological measures: heart rate, skin conductance, pupil dilation, even blood oxygen levels captured through computer vision. The pitch is that physiological measures are more objective than behavioural ones—you can’t control your heart rate like you might control your facial expression. This sounds scientific and objective, but it’s equally problematic.

Physiological measures respond to many conditions besides emotion: physical exertion, caffeine, temperature, medical conditions, medications. A person’s elevated heart rate might reflect fear, or excitement, or recent exercise, or cardiovascular issues, or caffeine consumption, or many other factors. The same physiological signature might accompany different emotions in different contexts. Using physiological measures to infer specific emotions is scientifically invalid. Additionally, collecting physiological data through surveillance—without explicit knowledge or consent—raises serious privacy and bodily autonomy concerns. Employers monitoring employees’ heart rates are collecting biometric data that reveals health information.

What’s emerging is a comprehensive surveillance infrastructure attempting to infer internal psychological states through multiple data streams: facial expression, voice, physiology, text analysis, and more. Even if each individual system had decent accuracy—which they don’t—combining multiple noisy inferences compounds uncertainty. The surveillance is becoming more invasive, more totalising, and more inaccurate, yet being presented as increasingly sophisticated and reliable.
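
A toy simulation can illustrate the point. In this hypothetical setup, three ‘modalities’ each track a shared confound (individual expressiveness) rather than the target emotional state; fusing them makes the scores look more consistent without making them any more valid.

```python
# Toy simulation: three "modalities" (face, voice, physiology) each track
# a shared confound (individual expressiveness) rather than the target
# emotion. Fusing them sharpens apparent confidence, not validity.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
emotion = rng.integers(0, 2, n)        # true hidden state: 0=calm, 1=stressed
expressiveness = rng.normal(0, 1, n)   # confound, independent of emotion

# Each modality's score reflects expressiveness plus noise, not emotion.
scores = [expressiveness + rng.normal(0, 0.5, n) for _ in range(3)]

single = (scores[0] > 0).astype(int)
fused = (np.mean(scores, axis=0) > 0).astype(int)

print(f"single-modality accuracy: {(single == emotion).mean():.2f}")  # ~0.50
print(f"fused accuracy:           {(fused == emotion).mean():.2f}")   # still ~0.50

# Averaging reduces noise in the *confound* estimate, so fused scores
# look more stable, but accuracy on the actual emotion stays at chance.
```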

The Problem of Scope Creep and Normalisation

A significant risk with emotion recognition systems is scope creep: systems deployed for limited purposes eventually expanding to broader uses. A system deployed to identify customer frustration in call centres might be expanded to monitor employee stress. A system deployed to identify disengaged students might be expanded to monitor for emotional instability. A system deployed to identify deception in security screening might be expanded to general law enforcement. Each expansion happens incrementally, often without explicit policy decision or public debate, but collectively they create a society where emotional states are continuously monitored and inferred by algorithmic systems.

Additionally, as emotion recognition systems become more deployed, they become normalised. People become accustomed to emotion monitoring and view it as standard rather than exceptional. This normalisation makes it harder to subsequently restrict or remove systems because both users and institutions have adapted to their presence. The systems become embedded in infrastructure, making change difficult even after risks become apparent.

What’s required to prevent this scope creep is clear restrictions on use cases, transparency about where systems are deployed, and ongoing evaluation of whether they’re actually solving the problems they claim to address. Currently, these safeguards are often absent. Systems are deployed with minimal public notice, restrictions evolve informally, and evaluation is typically done by vendors rather than independent researchers.

Why the Hype Persists Despite Weak Evidence

Given the weakness of the scientific basis for emotion recognition, why are these systems being so enthusiastically developed and deployed? Several factors are at play. First, there’s significant financial incentive. Companies developing emotion recognition systems have strong motivation to promote narratives that their systems work reliably. Venture capital has invested in these companies based on promises of large markets. Existing companies have invested in deploying systems and are reluctant to acknowledge that the systems might not work.

Second, there’s a halo effect from AI more broadly. AI systems have achieved impressive capabilities in image recognition, language processing, and other domains, creating a generalised sense that AI can do almost anything. This halo effect extends to emotion recognition even though emotion is a domain where AI’s limitations are particularly significant. The reputational success of AI in other domains makes people less sceptical of emotion recognition claims.

Third, emotion recognition systems appear to deliver what organisations want. A hiring system that claims to detect authenticity from facial expression appeals to hiring managers who want better tools for assessment. A customer service system that detects frustration from voice appeals to companies seeking to improve satisfaction. A student monitoring system that detects disengagement appeals to schools wanting to improve outcomes. That the systems are actually unreliable is less visible than their superficial appeal.

The Ethical Dimension: Autonomy and Authenticity

Beyond the scientific critique, emotion recognition raises profound ethical questions about autonomy and authenticity. When you know your emotional expressions are being analysed, you can’t authentically express emotion—you’re performing emotion, aware that you’re being assessed. This creates a paradox: systems designed to detect authentic emotional expression actually create conditions where authentic expression becomes impossible. People being monitored for emotion adjust their expressions to appear the way the monitor expects them to appear, contaminating the very data being analysed.

There’s also an autonomy question: who has the right to infer and interpret your emotional state? Even if emotion recognition were scientifically valid—which it’s not—there’s an ethical question about whether employers, schools, or government agencies should be making inferences about your internal emotional state without consent and without your ability to contest those inferences. This goes beyond privacy; it’s about who gets to make claims about your inner life and whether you have say in how those claims are used.

Throughout my work on mental health and neurodiversity, I’ve come to value authenticity and self-determination. People with mental health conditions or neurodivergence often face systems that claim to measure their mental state and prescribe interventions based on those measurements. When those measurements are algorithmic and unreliable, the harm is amplified. Someone being told they’re stressed or anxious or disengaged based on emotion recognition might internalise that message even if it’s wrong, affecting their self-understanding and wellbeing.

What Should Be Done: Restrictions and Transparency

Given the weak scientific basis for emotion recognition combined with its potential for harm, several regulatory and policy approaches seem warranted. First, emotion recognition should be restricted in high-stakes contexts: hiring, criminal justice, education, healthcare. In these domains, the stakes are too high to rely on scientifically questionable inference systems. If these contexts demand assessment of emotional or psychological state, that assessment should be done by qualified professionals, not algorithms.

Second, where emotion recognition is permitted, it should be used only with informed consent and clear disclosure. Users should know they’re interacting with emotion recognition systems, should understand what inferences are being made about them, and should have the ability to opt out without facing disadvantage. Currently, much emotion recognition is invisible to users, operating in the background of systems they’re using. Transparency is essential for enabling informed choice.

Third, there should be restrictions on the use of inferences from emotion recognition systems to make decisions about people. A system might detect something it labels as ‘low engagement’ in a student; that observation should not automatically trigger intervention. Systems should be decision aids at most, with human judgement and deliberation required before action. And critically, people should have the ability to contest inferences made about them—to provide alternative explanations, to present evidence that the inference is wrong, to have their perspective on their own emotional state valued.

The Research Imperative: More Transparency and Scrutiny

What’s needed urgently is more research on emotion recognition’s actual reliability and utility in real-world settings. Most research comes from vendors or academic labs with financial interest in promoting the technology. Independent research evaluating whether emotion recognition systems actually achieve claimed benefits, whether they improve decision-making, and whether harms outweigh benefits is limited. Academic funding for critical evaluation of these systems should be substantially increased.

Additionally, emotion recognition systems should be subject to more rigorous scientific evaluation before deployment. The standards for proving that a system reliably measures what it claims to measure should be high, comparable to standards for medical devices or pharmaceuticals. Systems claiming to measure emotional state should provide evidence of validity across diverse populations, in real-world conditions, against alternative explanations. Currently, many systems are deployed with minimal validation.

Interdisciplinary collaboration between computer scientists, psychologists, neuroscientists, and ethicists is essential. The problems with emotion recognition don’t reduce to technical challenges; they’re grounded in fundamental questions about what emotion is and whether it can be objectively measured. Technologists alone cannot solve these questions.

Conclusion: Healthy Scepticism Toward Emotional AI

Emotion recognition AI represents a case study in how technological capability gets confused with scientific validity. Systems can technically analyse facial expressions and correlate them with emotion labels in training data. But this technical capability doesn’t mean the systems reliably measure emotion, that the inferences they make are valid, or that their deployment is ethically acceptable. Contemporary neuroscience increasingly suggests that emotions are not discrete categories with universal expressions, but constructed states varying across individuals and contexts. If this is true—and the evidence increasingly supports it—then algorithmic emotion recognition is fundamentally misconceived.

The risks of widespread emotion recognition deployment are significant: discrimination through the appearance of objectivity, surveillance extending to inference of internal psychological states, disruption of authentic human expression, and decision-making based on scientifically questionable inferences. Against these risks, the benefits are often modest: marginal improvements in customer service routing, or detection of disengagement that could be determined through simpler means.

The path forward requires healthy scepticism toward emotional AI marketing, demands for evidence before deployment, restrictions in high-stakes contexts, and transparency enabling informed choice. It requires recognising that just because technology can measure something doesn’t mean that measurement is valid or that the technology should be used. And it requires remembering that emotions are fundamentally personal—they can’t be read off the face or voice by an algorithm. They exist in the complex interior life of the person experiencing them, knowable reliably only by that person themselves.

Related reading: Developing Ethical Frameworks for AI Implementation, Improving Diagnostic Accuracy with AI Technologies, and Legal Tech: How AI is Reshaping the Legal Industry.


Written by
Scott Dylan