Title: Mutual Theory of Mind for Human-AI Communication in AI-Mediated Social Interaction

 

Date: September 12th, 2024

Time: 12:00-2:30 pm ET

Location (Virtual): https://gatech.zoom.us/j/92885239983 

 

Qiaosi Wang (Chelsea)

Ph.D. Candidate in Human-Centered Computing

School of Interactive Computing

Georgia Institute of Technology

qswang@gatech.edu | http://qiaosiwang.me 

 

Committee:

Dr. Ashok K. Goel (Advisor), School of Interactive Computing, Georgia Institute of Technology

Dr. Munmun De Choudhury, School of Interactive Computing, Georgia Institute of Technology

Dr. Elizabeth N. DiSalvo, School of Interactive Computing, Georgia Institute of Technology

Dr. Q. Vera Liao, FATE team, Microsoft Research

Dr. Lauren G. Wilcox, Responsible AI, eBay & Georgia Institute of Technology

 

Abstract

AI systems are increasingly equipped with human-like social capabilities while serving different social roles as our assistants and partners. Some recent AI systems are said to possess a Theory of Mind (ToM)-like capability that advances their social adeptness. ToM is a basic social-cognitive human capability: attributing mental states such as beliefs, emotions, knowledge, plans, and goals to oneself and others based on behavioral or verbal cues. As AI systems exhibit such advanced social capabilities, humans grow increasingly uncertain about how to perceive these systems' social roles and capabilities. Managing and accounting for human perceptions of AI systems operating in various social capacities thus becomes crucial to improving user experience and mitigating harms in human-AI communication.

 

Inspired by how people use their ToM capability in human-human communication to constantly recognize, monitor, and respond to others’ perceptions of them, this thesis posits the Mutual Theory of Mind (MToM) framework to enhance human-AI communication. The MToM framework aims to guide the design of human-AI communication by breaking this iterative communication process into three analyzable stages: (1) ToM construction, where the AI constructs the human’s interpretation of it; (2) ToM recognition, where the human recognizes the AI’s interpretation of them; and (3) ToM revision, where the AI revises its interpretation. Each MToM stage represents a ToM process in which one party’s communication feedback shapes the other’s interpretation of how they are perceived. Following the MToM framework, this thesis reports on a series of empirical studies that provide design guidelines for AI systems that can account for human perceptions of AI during communication. These studies were conducted in the context of AI-mediated social interaction in large-scale learning environments, where AI systems already leverage ToM-like capabilities to provide personalized social recommendations to socially isolated adult learners based on information inferred from their digital footprints.

 

This thesis begins by qualitatively examining the design requirements for AI’s social roles and capabilities in AI-mediated social interaction, grounded in adult learners’ current practices, challenges, and preferences in remote social interaction. The rest of the thesis empirically explores students’ perceptions of AI in AI-mediated social interaction at each stage of the MToM framework. At the ToM construction stage, I conducted a longitudinal survey study that established the feasibility of AI agents constructing students’ evolving perceptions of the AI by leveraging social cues embedded in students’ utterances to the AI. At the ToM recognition stage, I conducted a mixed-methods study and found that students continuously draw on the AI’s (mis)interpretations of them to shape their perceptions of the AI, and that these perceptions can be inaccurate and harmful. At the ToM revision stage, I conducted a mixed-factorial vignette experiment and found that the AI’s revision and communication of its misinterpretations can effectively mitigate students’ negative perceptions of the AI after they encounter such misinterpretations.

 

Overall, this dissertation makes theoretical, design, and empirical contributions to the fields of human-AI interaction, computer-supported cooperative work, and responsible AI. This work provides theoretical guidance, rich empirical descriptions, and actionable design implications for the next generation of AI systems that can continuously construct, recognize, and respond to human perceptions of AI in human-AI communication.