A computerized scale for monitoring levels of agreement during a conversation

Back channels and acknowledgment tokens are among the most salient and frequent words uttered by speakers of American English. The degree to which a person agrees with or acknowledges the utterances of their interlocutors is manifested in an elaborate feedback system of vocalizations. These vocalizations, when studied systematically, can shed light on the way in which speakers negotiate the expression of their opinions within the constraints of politeness and speech acts. In psychotherapy conversations, the demand characteristics of the situation emphasize honesty over politeness, and therefore the pattern of agreement and disagreement in the patient's speech could be related more reliably to their interpersonal characteristics (and problems thereof) and to their personality in general.

In the current study, an attempt was made to construct a computerized measure that could evaluate tokens of consent and acknowledgment and calculate the level of agreement that the speaker has verbalized in a particular turn. Dictionaries of back-channel and acknowledgment tokens have been compiled by several research groups, but very little attention has been given to the relative strength or impact that these tokens have on the level of agreement that the speaker conveys. A back-channel token such as "exactly" confers more strength or commitment to one's agreement than "I guess", and yet both are usually taken to denote agreement.

To test empirically the relationship between the various acknowledgment tokens and their contribution to the level of expressed agreement, a corpus of over 4000 conversation turns, taken from transcripts of psychotherapy hours, was constructed. The corpus consisted of approximately 50 hours of conversation, spoken by 14 patient-therapist dyads (all patients and therapists were native speakers of American English). The basic data unit of the corpus was a couplet containing the therapist's utterance and the patient's subsequent response. All turns were randomly ordered to prevent halo or carry-over effects, and the corpus was divided into two equal parts for consistency and reliability checks.

A group of 4 graduate students, all native speakers of American English, rated each turn on a Likert scale ranging from 0 (utter disagreement) through 3 (indifference or lack of direct response to the therapist) to 5 (unqualified agreement). The relatively high number of raters was needed to control for the natural variability in human understanding of the level of agreement in an utterance. Despite this variability, inter-rater reliability was high (kappa=0.76).

Text analysis was then performed on the corpus, based on a list of known agreement markers and a frequency analysis of recurring words. The task of the raters was to assess the degree to which the patient agreed with the therapist, to the best of their knowledge as speakers of American English.
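The marker-counting step of such a text analysis can be sketched as follows. This is a minimal illustration in Python, not the software used in the study: the marker list contains only the tokens quoted in this abstract, and the names MARKERS and count_markers are hypothetical.

```python
import re

# Illustrative marker list: only the tokens quoted in this abstract.
# The study's actual dictionary is not reproduced here.
MARKERS = ["exactly", "yeah", "mm-hm", "mm", "i guess", "no", "actually"]


def _compile(marker):
    # The lookarounds reject matches that touch a letter, digit or hyphen,
    # so "mm" is not counted inside "mm-hm" and "no" is not counted in "know".
    return re.compile(r"(?<![\w-])" + re.escape(marker) + r"(?![\w-])")


PATTERNS = {marker: _compile(marker) for marker in MARKERS}


def count_markers(turn):
    """Return a dict mapping each marker to its number of occurrences in a turn."""
    text = turn.lower()
    return {marker: len(pattern.findall(text)) for marker, pattern in PATTERNS.items()}


counts = count_markers("Yeah, I guess. Mm-hm, I know.")
```

Each turn is thereby reduced to a vector of marker counts, which is the form a regression model needs as input.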

A regression-based analysis revealed a highly significant, systematic and consistent pattern of agreement and disagreement markers, in which each marker contributed differentially to the overall agreement level. The overall agreement level was operationalized as the average of the human ratings, and the "strength" or impact of each token on the agreement level was operationalized as the calculated regression coefficient (standardized beta) of the token in the model. Thus, tokens like "yeah" and "mm-hm" were strongly correlated with the human ratings of agreement, and hence contributed significantly to the agreement level. Less committal tokens, such as "mm" or the phrase "I guess", contributed positively to the agreement level, albeit less significantly. Words like "no" or "actually" were negatively correlated with the agreement ratings, and their impact on the agreement level was highly significant and negative, as expected.
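One way to obtain such standardized betas is an ordinary least-squares fit of the mean human rating on the per-turn marker counts. The sketch below assumes a plain multiple linear regression (the abstract does not specify the exact model), uses made-up toy data rather than the study's corpus, and introduces the hypothetical helpers fit_ols and standardized_betas.

```python
from statistics import pstdev


def fit_ols(X, y):
    """Least-squares fit via the normal equations; returns [intercept, b1, ..., bk]."""
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are constant or collinear")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def standardized_betas(X, y):
    """Rescale each raw slope by sd(predictor) / sd(outcome)."""
    slopes = fit_ols(X, y)[1:]
    sd_y = pstdev(y)
    return [b * pstdev(col) / sd_y for b, col in zip(slopes, zip(*X))]


# Toy data (invented): columns are counts of "yeah" and "no" in six turns,
# and y is the mean rating of each turn on the 0-5 scale.
X = [[1, 0], [0, 1], [2, 0], [0, 0], [1, 1], [0, 2]]
y = [4, 2, 5, 3, 3, 1]
betas = standardized_betas(X, y)
```

In this toy example the beta for "yeah" comes out positive and the beta for "no" negative, mirroring the direction of the effects reported above; the magnitudes carry no empirical meaning.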

The predicted agreement, calculated by the computer from the regression coefficients of the tokens, correlated highly with the human ratings of the overall agreement level (r=0.83). A correlation of this magnitude attests to the ability of the computerized model to assess reliably the level of agreement and acknowledgment in a transcript of a natural conversation, and opens the way to a larger-scale search for agreement patterns in corpora of conversational data.
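The scoring and validation steps can be sketched in a few lines. The code below assumes the predicted agreement of a turn is an intercept plus the weighted sum of its marker counts, and that agreement with the human ratings is measured by Pearson's r. The weights, the intercept and the function names are illustrative placeholders, not the coefficients estimated in the study.

```python
from math import sqrt


def predict_agreement(counts, intercept, weights):
    """Intercept plus weighted sum of marker counts; unknown markers weigh 0."""
    return intercept + sum(weights.get(tok, 0.0) * n for tok, n in counts.items())


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Placeholder weights (invented for illustration only).
WEIGHTS = {"yeah": 0.5, "i guess": 0.25, "no": -1.0}
INTERCEPT = 3.0

turn_counts = [{"yeah": 2}, {"i guess": 1}, {"no": 1}, {"yeah": 1, "no": 1}]
predicted = [predict_agreement(c, INTERCEPT, WEIGHTS) for c in turn_counts]
human_mean = [4.5, 3.0, 1.5, 2.5]  # invented mean ratings for the four turns
r = pearson_r(predicted, human_mean)
```

Computing r on a held-out half of the corpus, rather than on the turns used to fit the weights, would give a less optimistic estimate of how well the scale generalizes.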

The paper will also discuss the implications of this scale for the empirical study of discourse processes. Emphasis will be placed on the statistical analysis of patterns of agreement and their vicissitudes, particularly across conversation topics. The scale is currently employed in a research paradigm that studies the correspondence between discourse style and the interpersonal characteristics of patients in psychotherapy.