Human-Robot Interaction
Systematic review on making robots funny through AI and performative techniques
Can robots be funny? This systematic review synthesized six empirical studies on comedy robots performing for human audiences. The review found that the most effective comedy robots combine natural language processing (NLP) for joke generation, reinforcement learning (RL) for audience adaptation, and human performative techniques like gaze coordination and gestural timing.
As social robots increasingly enter public-facing roles, humor becomes a critical social skill. But what makes a robot actually funny to a human audience? Existing comedy robots range from pre-scripted joke-tellers to AI-driven performers, with inconsistent results. This review aimed to identify what combination of technical and performative strategies produces the most positive audience response.
6: Studies meeting inclusion criteria
197+: Total participants across studies
p<0.05: Significant effects in all 6 studies
Search Strategy
Systematic search using UBC Library Summon across peer-reviewed HCI conferences and journals (IEEE, ACM). Boolean keywords: (comedy OR humour OR joke) AND audience AND robot.
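A minimal sketch of how the keyword string could be applied as a filter, assuming the three humour terms are alternatives and that `audience` and `robot` are both required. The function name and sample titles are hypothetical, and Summon's actual matching behaviour (stemming, field weighting) is not modeled.

```python
import re

HUMOUR_TERMS = {"comedy", "humour", "joke"}
REQUIRED_TERMS = {"audience", "robot"}

def matches_query(text: str) -> bool:
    """True if text satisfies (comedy OR humour OR joke) AND audience AND robot.

    Exact-word matching only: plural forms such as "robots" are not matched.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & HUMOUR_TERMS) and REQUIRED_TERMS <= words

# Hypothetical titles for illustration only, not actual search results
titles = [
    "A comedy robot that reads its audience",
    "Robot navigation in crowded spaces",
    "Joke generation for chatbots",
]
selected = [t for t in titles if matches_query(t)]
```

Only the first title passes, because the other two lack at least one required term.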
Screening
Inclusion criteria: English-language papers published 2015-2021; empirical quantitative or qualitative studies with human participants evaluating comedy robot performance.
Data Extraction
Extracted methodology, robot platform, audience size, comedy type, AI techniques used, and statistical significance of findings from each study.
Synthesis
Compared approaches across studies, categorizing them by joke generation method, adaptation strategy, and performative technique.
Critical Analysis
Identified gaps in existing research including sample size limitations, venue constraints, and unexplored modalities like voice tone.
Performative gaze and pointing gestures significantly affect audience response (p<0.01): robots that look at and gesture toward the audience draw a better response
Manzai-style robots using NLP can generate original jokes from web news articles, producing routines rated as interesting and understandable
Reinforcement learning with social adaptation (gaze, prosody, smile detection) outperforms static or table-based approaches
Time adaptivity is critical: robots that adjust their timing based on audience reactions are rated significantly funnier (p=0.001)
Street-style public performances elicit more natural audience reactions than lab settings, but audiences often respond out of politeness
Monotonous robot voice remains the biggest gap: none of the six reviewed studies successfully addressed vocal expressiveness in comedy delivery
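The adaptation findings above can be illustrated with an epsilon-greedy multi-armed bandit, one of the simplest forms of reinforcement learning. This is a sketch under stated assumptions, not the method of any reviewed study: the class name, the joke categories, and the reward (an audience response score between 0 and 1, for example derived from smile detection) are all hypothetical.

```python
import random

class JokeSelector:
    """Epsilon-greedy bandit over joke categories (illustrative only)."""

    def __init__(self, categories, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {c: 0 for c in categories}
        self.values = {c: 0.0 for c in categories}  # running mean reward

    def choose(self):
        # Explore with probability epsilon, otherwise pick the best estimate
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def update(self, category, reward):
        # Incremental mean: new = old + (reward - old) / n
        self.counts[category] += 1
        n = self.counts[category]
        self.values[category] += (reward - self.values[category]) / n

# Hypothetical usage: epsilon=0.0 disables exploration for a repeatable demo
selector = JokeSelector(["pun", "observational", "self-deprecating"], epsilon=0.0)
selector.update("pun", 0.2)            # weak audience response (assumed score)
selector.update("observational", 0.8)  # strong audience response (assumed score)
best = selector.choose()               # category with the highest mean so far
```

A static joke list corresponds to never calling `update`; the adaptive approaches in the reviewed studies differ in how they sense and weight the audience signal, which this sketch does not attempt to reproduce.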
Katevas et al. (2015): 50 participants, performative gaze + gestures
Umetani et al. (2016): 11 participants, NLP joke generation (Manzai)
Ritschel (2020): 30 participants, RL with social adaptation
Weber et al. (2018): 24 participants, real-time RL + social signals
Vilk & Fitter (2020): 10-20 participants, timing adaptivity
Swaminathan et al. (2021): 72 participants, street-style comedy
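The headline participant figure can be checked against the per-study sample sizes listed above. A small sketch, assuming the lower bound of 10 for the study that reports 10-20 participants (which is why the total is stated as 197+):

```python
sample_sizes = {
    "Katevas et al. (2015)": 50,
    "Umetani et al. (2016)": 11,
    "Ritschel (2020)": 30,
    "Weber et al. (2018)": 24,
    "Vilk & Fitter (2020)": 10,  # reported as 10-20; lower bound assumed
    "Swaminathan et al. (2021)": 72,
}

total = sum(sample_sizes.values())
under_30 = [study for study, n in sample_sizes.items() if n < 30]

print(total)          # 197
print(len(under_30))  # 3
```

Using the upper bound of 20 instead would raise the total to 207 without changing the count of studies below 30.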
The ideal comedy robot
Combines NLP-based joke generation, reinforcement learning for real-time audience adaptation, performative gaze and gestures, and time-adaptive delivery. No single study achieved all four.
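As one illustration of time-adaptive delivery, the sketch below holds the next line until audience noise has died down after a punchline. It is a hypothetical sketch, not the approach of any reviewed study: the function name, the threshold values, and the per-frame loudness readings are all assumptions.

```python
def pause_frames(loudness, threshold=0.3, quiet_needed=3, max_wait=50):
    """Return how many frames to wait before delivering the next line.

    loudness: per-frame audience loudness in [0, 1], starting at the punchline.
    Waits until `quiet_needed` consecutive frames fall below `threshold`,
    or gives up after `max_wait` frames (or when the readings run out).
    """
    quiet = 0
    for i, level in enumerate(loudness[:max_wait]):
        quiet = quiet + 1 if level < threshold else 0
        if quiet >= quiet_needed:
            return i + 1
    return min(len(loudness), max_wait)

# Assumed readings for illustration only
big_laugh = [0.9, 0.8, 0.7, 0.5, 0.2, 0.1, 0.1, 0.0]
no_laugh = [0.1, 0.0, 0.1, 0.0]

print(pause_frames(big_laugh))  # 7
print(pause_frames(no_laugh))   # 3
```

The design choice here is that a strong reaction lengthens the pause while a flat reaction lets the robot move on quickly; a fixed pause would either talk over laughter or leave dead air.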
Lab vs. real world
Lab settings provide controlled conditions but may inflate positive responses. Street-style studies reveal that audiences often respond out of social politeness rather than genuine amusement.
The voice gap
Across all six studies, monotonous robot voice was identified as a major limitation. Future research should prioritize vocal expressiveness and prosody variation.
Small sample sizes
Four of the six studies had 30 or fewer participants, limiting generalizability. Larger, more diverse audience samples are needed to validate these findings.