Communicating by voice in Second Life
For my final PhD study I’m looking at how the introduction of voice into Second Life in 2007 has affected people’s experience of this virtual world. I’ve taken a broadly ethnographic approach, becoming a resident to understand the place and the people. I am also gathering data more purposively, for example by interviewing residents, either in Second Life or face-to-face depending on circumstances.
The situation in SL is more complex than in game worlds. Massively multiplayer online role-playing games (MMORPGs) are socially complex, but broadly speaking everyone is there for the same purpose. SL has no defined goals, and different kinds of people are doing very different things with it. So it’s hard to make a simple statement like “communicating using medium X has the following pros and cons for Second Life users”, because a medium’s effects will differ across different types of use. I’ve come to the conclusion that to do this analysis I have to divide SL users into categories, then analyze the usefulness of each medium with respect to each category. That would produce a matrix with two dimensions: medium and user type.
Step one is to decide on an appropriate set of user types.
A popular belief in the SL community is that there are two kinds of SL users: immersionists and augmentationists. I think there is some validity to this, but the two might more usefully be seen as attitudes, or kinds of use, rather than kinds of users, so that an individual could switch between them according to the situation. (See Henrik Bennetsen’s essay on these types.)
Immersionists correspond roughly to the role-players in MMORPGs. They seek to maintain an online identity that is independent of their real-life (RL) identity. For these people, the ability of virtual worlds to support identity play is an attractive feature of the technology. Sherry Turkle documented this practice more than a decade ago in her book “Life on the Screen”.
Augmentationists, on the other hand, see virtual worlds as communication media. They project their real-life identity into SL in order to engage in mediated interaction with other RL identities. For example, they might use SL to teach a class, or hold a business meeting. They are not playing a character (any more than in real life), but are being themselves.
In the debate over the introduction of voice into SL, it has been argued that immersionists should prefer the relative anonymity of communicating by text, while augmentationists would embrace the efficiency and immediacy of voice. (See Tom Boellstorff, “Coming of Age in Second Life”, pp. 112–16, for a discussion of immersionist attitudes to voice.)
I’m going to assert that the immersionist-augmentationist distinction divides the large number of virtual-world use cases into two high-level categories. (To test this, we could list the purposes to which SL is put and see whether they fit the categories.) A user’s purpose influences their style of use and the relative utility of different communication media. There are many kinds of SL users doing many kinds of things. For example, I have interviewed academics who teach RL university classes in SL, business people who sell virtual goods, workers who use SL for job-related collaboration, artists who make virtual sculptures, people who use it as a place for cyber-sex or to meet people for RL sex, people who like to chat with RL friends or SL friends, people who like to participate in large discussion groups, people who like building, and people who like looking at other people’s buildings. These examples only scratch the surface: a quick glance through Second Life’s lists of groups and events hints at the variety of uses of SL. Can all these examples be straightforwardly contained within the broader imm/aug categories? Boellstorff suggests that “immersionist” and “augmentationist” might more usefully be thought of as ideal types at the ends of a spectrum, along which real users might be placed, or even move, according to mood and situation. I agree. I am also aware that these are contested categories. But even if the imm/aug dimension is dynamic and gradual rather than clear-cut and static, the popularity of the categorization suggests that it captures something significant about the different uses people make of Second Life.
The other dimension of my analysis is communication medium. In virtual worlds currently, the two points on this axis are text and voice (though more are on the way). Potentially, then, I might have a simple 2×2 analysis matrix, with two styles of use (aug, imm) and two media (text, voice). This might not be much of a contribution: could anything be said other than that immersionists prefer text while augmentationists prefer voice? So instead I reformulate the media dimension.
Communication media themselves differ in a number of ways. In my literature review I synthesized previous media studies into eight dimensions along which media vary in ways that affect the communication achieved through them (see my ‘Voice vs Text’ post). These properties affect users’ choice of one medium over another in a given situation, and shape the communication subsequently carried out.
So I’m trying to figure out where immersionist and augmentationist users sit on each of these eight dimensions. Here’s what I think:

Placing the immersionists and augmentationists along the first three dimensions is relatively straightforward. Immersionists want their SL identities to be separate from their RL identities. To achieve this, they should choose a communication medium that maximizes the social distance between themselves and other users, transmits the least information about the people behind the avatars, and obscures, rather than transmits, real-life status cues. Augmentationists, on the other hand, want the mediated experience to be as much like “being there” as possible, so they should prefer the opposite on all three counts. These three properties seem to define what it is to be an immersionist or an augmentationist.
Preferences regarding the other media properties are less obvious.
I don’t predict that either user type should have a preference for either synchronous or asynchronous communication. The utility of asynchronicity depends on whether one’s correspondents are online or not, and this should be equally problematic for immersionists and augmentationists. Does the act of storing a message for later transmission remind an immersionist that there is an offline world, and that one’s correspondents are not permanently in-world? Boellstorff, in his ethnography, discusses SL residents’ understanding that each avatar is driven by a person who is frequently away from keyboard, or just busy performing a task such as building and unwilling to be disturbed. If immersionists are comfortable with the occasionally asynchronous nature of virtual-world communication, I think it’s reasonable to assume that augmentationists have no particular problem with it either.
By similar reasoning, both types of users should have about the same desire for the ability to store and search the messages they have received, although this ability may be especially useful in augmentationist settings such as teaching and business meetings.
Immersionists should be broadly neutral with regard to a medium’s ability to support the coordination of group action, since coordination is important for some immersionist activities and not for others. In role-play, for example, real-time coordination is useful; in a fast-paced game like World of Warcraft (WoW) it is often essential, and is probably the main reason voice is popular there. Augmentationists, whose primary goal in using a virtual world is to communicate in situations such as business meetings, education, and social events, would be expected to value group coordination and appreciate media such as voice that support it.
Neither type of user should want a medium whose effectiveness depends on whether the people communicating already know each other; this sensitivity is a drawback of voice. Both groups can be expected to want a medium that supports simultaneous conversations.
Argent said,
December 3, 2008 @ 6:02 am
I find that text supports group coordination MUCH better than voice, because with voice so much of the work of group coordination gets wasted on making sure that people aren’t talking over each other, or aren’t being indefinitely deferred by more vocal people. With text, you naturally output at a lower rate… but your output can be easily organized and integrated into a document that is read at a much higher bandwidth than you can hear. So each person generates less output than they can accept as input… which is exactly what you want in a group situation.