Panel: The High-Tech Mediation of Social Interaction

Ron Cole, Oregon Graduate Institute, Portland, OR

Rusel DeMaria and Alex Uttermann, DeMaria Studios, Boulder Creek, CA

Sam Tucker, WebActive, a project of Progressive Networks, Seattle, WA

Rolland Waters, CTO, RTime Inc., Seattle, WA

 

Ron Cole: Cyberspeech: Passport to Cyberspace

Ron Cole, Stephen Sutton, Mark Fanty, Ed Kaiser, Johan Schalkwyk,

Jacques de Villiers, Andrew Cronk, Don Colton

Center for Spoken Language Understanding

Oregon Graduate Institute of Science & Technology

P.O.Box 91000, Portland, OR 97291

 

Cyberspace: An Exclusive Community

 

Of the 80 million households in America, about 97% have televisions, about 96% have telephones, and about 11% have access to the Internet. What barriers are preventing 70 million households from participating in cyberspace, and how can these barriers be overcome?

This article discusses how advances in human language technology can help overcome some of the barriers that prevent community participation in cyberspace. Human language technology refers to the set of technologies, such as speech recognition and speech synthesis, that are used to create spoken language systems--systems that allow people to communicate with machines using speech.

A significant advantage of using speech as an interface modality is that it can be transmitted by existing communications networks using common and inexpensive devices such as telephones and televisions. Today, use of the Internet is limited to people with access to computers and the skills to use them. These requirements exclude a great many Americans: computers are too expensive for many of us to own, and about one third of our citizens are functionally illiterate (National Center for Education Statistics, 1993). In the future, computers are unlikely to be the major appliance for accessing the National Information Infrastructure; telephones, cellular phones, televisions connected to cable networks and inexpensive information appliances are likely to become the preferred means of access.

To be sure, speech technology cannot by itself satisfy all needs. For instance, many people are unable to speak and/or hear. Some kinds of information, such as images, are not in a form that can be readily conveyed using speech. Nevertheless, the vast majority of people in the United States speak and understand a language, and speech is an obvious means for them to access information. As spoken language technologies mature, we can imagine spoken language systems performing as cooperative agents, not unlike helpful human operators, to support a wide variety of transactions.

Center for Spoken Language Understanding

Our goal at the Center for Spoken Language Understanding is to develop new and vastly improved tools for human-computer interaction that will enable people to learn, create, publish and communicate more effectively using their natural, available communication skills. It is our hope that the continued development and widespread distribution of free tools for creating and using more "human-centered" systems will help ordinary citizens to participate in an information society.

To achieve our goals, we have developed the CSLU Toolkit, an integrated set of software tools and technologies that represents the state of the art in tools for research, development and learning about spoken language systems. Through partnerships created during the past year, we are extending the capabilities of the toolkit to allow creation of more powerful "human-centered" systems that incorporate research advances in spoken language understanding, text-to-speech synthesis, dialogue modeling, and the production of synthetic speech by realistic talking faces. In addition to developing new technologies and incorporating them into the toolkit, we are working with our partners to determine how to design and apply human-centered systems to diverse populations of learners.

If our work is successful, we will (a) put tools that support easy development and use of human-centered systems into the hands of millions of users; (b) bring the study of human language technology and spoken language systems into high schools, undergraduate schools and graduate programs throughout the United States; and (as a result of these efforts), (c) help provide universal access to the National Information Infrastructure for ordinary citizens using speech and other available natural communication skills.

Tools for Change

The CSLU toolkit for spoken language systems is many things. It is a collection of first-rate spoken language technology which is constantly improving. It is a software environment which provides a consistent, powerful interface to this technology, thereby creating power tools for researchers and developers. It is a learning environment for these technologies, with manuals, on-line man pages, tutorial examples and short-course curriculum materials. It is an end-user environment for non-experts with intuitive, graphical system-building tools built on top of the underlying power tools, allowing naive users to create useful applications with little training. It is freely available for non-commercial use (including source code) via the CSLU Web site. It is a commercial tool with no per-use license for member companies of CSLU. The toolkit, then, provides a means for a community of users to learn, research and develop, to create and transfer technology and to share applications (Sutton, et al., 1996).

The toolkit includes a set of library modules that implement common and essential algorithms, including the components for neural network and continuous HMM speech recognition, multi-pass decoding with N-best and word lattice generation, text-to-speech synthesis and telephone/microphone control. Pre-built recognizers are provided for general English, digits and alphabet (including a directory search that can handle hundreds of thousands of names). The graphical dialogue design tool has a built-in repair mechanism for handling miscommunications.

The toolkit has been in constant use for research and development of spoken language systems at CSLU since January, 1996. It is in daily use in industry (e.g., SW Bell) to develop and test spoken language interfaces for real-world applications. It is being translated into Spanish and Portuguese through our collaborations with universities in Mexico and Brazil. The toolkit is also being ported to Windows 95 and Windows NT. This will enable telephone access to the Internet from PC platforms without any special-purpose hardware (since speech input/output will be handled using a standard data/fax/voice modem).

More details on the toolkit can be found at http://www.cse.ogi.edu/CSLU/toolkit/toolkit.html.

Tools for Learning

A major focus of our work is to provide more effective tools for learning within the CSLU Toolkit. We have received grants from the National Science Foundation and Intel to design and offer courses in which people use the toolkit to create and learn new things. Initial indications are that high school students and teachers are widely enthusiastic about the things they can do with the toolkit (Colton, et al., 1996).

In October, 1996, we offered a three-day short course to students and teachers from ten local high schools in Oregon. During the first half of the course, students learned the basic functions of the toolkit's authoring tools. The assignments walked participants through building systems of increasing complexity, such as pizza ordering, telephone polling and voice-controlled web browsing. During this phase of the course, the participants gained insights into a range of technical processes, including speech recognition fundamentals and dialogue design issues such as how to phrase prompts, how to select and model the recognition vocabulary, how to build grammars to constrain the recognition process, and how to detect and recover from miscommunications.
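The dialogue design issues covered in the course (phrasing prompts, constraining the recognition vocabulary, and recovering from miscommunications) can be illustrated with a minimal sketch in Python. This is not the CSLU Toolkit's interface: the names Recognition, ask, order_pizza and scripted_recognizer, the confidence threshold, and the scripted recognizer that stands in for a real speech recognizer are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Recognition:
    """One (hypothetical) recognizer result: the best word and its confidence."""
    word: str
    confidence: float


def ask(prompt: str,
        vocabulary: List[str],
        recognize: Callable[[str], Recognition],
        threshold: float = 0.6,
        max_attempts: int = 3) -> Optional[str]:
    """Prompt until an in-vocabulary word is recognized with enough confidence.

    Returns the accepted word, or None if every attempt failed.
    """
    for attempt in range(max_attempts):
        if attempt == 0:
            text = prompt
        else:
            # Simple repair strategy: apologize and list the allowed words.
            text = ("Sorry, I didn't catch that. Please say one of: "
                    + ", ".join(vocabulary) + ".")
        result = recognize(text)
        # Accept only words inside the constrained vocabulary and above threshold.
        if result.word in vocabulary and result.confidence >= threshold:
            return result.word
    return None


def order_pizza(recognize: Callable[[str], Recognition]) -> Optional[Dict[str, str]]:
    """A two-question pizza-ordering dialogue; None means the dialogue gave up."""
    size = ask("What size pizza would you like?",
               ["small", "medium", "large"], recognize)
    if size is None:
        return None
    topping = ask("Which topping would you like?",
                  ["cheese", "pepperoni", "mushroom"], recognize)
    if topping is None:
        return None
    return {"size": size, "topping": topping}


def scripted_recognizer(responses: List[Recognition]) -> Callable[[str], Recognition]:
    """Stand-in for a speech recognizer: prints the prompt, replays canned results."""
    remaining = iter(responses)

    def recognize(prompt_text: str) -> Recognition:
        print(prompt_text)
        return next(remaining)

    return recognize
```

In this sketch, a low-confidence or out-of-vocabulary result triggers a reprompt that lists the allowed words, and the dialogue gives up after three failed attempts. A real system would replace scripted_recognizer with calls to an actual recognizer and would likely use richer repair strategies.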

During the final part of the course, students had the opportunity to work on projects of their own choice, ending with demonstrations of their working spoken language systems. This was an opportunity for students to apply what they'd learned in a creative manner and to use speech in a way that might enhance the performance of ordinary tasks. An array of prototype telephone-based applications were built and successfully demonstrated, including the following systems:

 

The students and teachers who took the course were unanimous in their enthusiasm and praise. One student called it the best educational experience of his life. The students became so excited about working with the tools that they only allowed the teachers to use the computers when they left on breaks.

What is impressive about these projects is not just their diversity, but also the fact that they were accomplished in a short amount of time (typically less than eight hours) and with only a limited amount of training (typically as part of a three-day course).

Based on the success of this course, a computer running the toolkit, with telephone and Internet access, is being provided to each of the ten high school districts in Washington County, Oregon. Each high school will allow students to build spoken language systems to meet required educational goals (e.g., instead of a written report).

Future Courses

In the coming months we plan to run more courses, including:

Partners for Change

To truly serve the community, it is necessary to extend the capabilities of the toolkit to serve diverse populations of learners and users. To this end, we have developed partnerships with researchers and educators to improve the technologies and incorporate new ones (like talking faces for language training), and to engage diverse populations of learners in a participatory design exercise that will guide the toolkit's development.

These partnering activities are being supported in part by a planning grant from the National Science Foundation that will result in a proposal to the NSF to support a Center for Learning and Language to achieve the goals outlined in this article. We list here some of the partners who have agreed to work together to help create a cyber community that includes all of us.

NWRESD. In her role as manager of research and development for the NorthWest Regional Educational Service District, serving 159 K-12 schools with over 79,000 students, Dr. Dvenna Carlson has provided schools with over 1,000 donated computers. She has been key in providing access, expertise and vision in promoting the introduction and use of human-centered systems into classrooms and special education activities.

NWRESD will play a leading role in working with our partners to design and conduct basic research with a variety of populations on the role of spoken language systems in learning and creating. The results of this research will be used to design applied research projects which integrate the toolkit into the daily curriculum of the schools. Such applied research projects may include using interactive presentations as alternative assessment instruments to demonstrate knowledge of social studies and science, studying the effectiveness of the toolkit as a vehicle for students to become active designers of curriculum (rather than passive receivers), designing language training systems with improved visual displays of the articulators to aid speech therapists in helping students with severe hearing impairments produce understandable speech, and making the design of spoken language interfaces using the toolkit possible by voice alone.

UC Santa Cruz, Perceptual Science Laboratory. Professor Dominic Massaro is a leading theorist and researcher on how information is combined and understood. In order to achieve precise experimental control of facial movements and expressions during speech, Dom and his colleague Dr. Michael Cohen developed "Baldi," probably the world's most accurate talking face (licensed by AT&T and others). Dr. Massaro's laboratory will work to make Baldi even more accurate and expressive, to integrate Baldi into the toolkit, and to conduct research to discover how, when and where to use talking faces in human-centered systems for language training and other applications.

Saturday Academy. Dr. Gail Whitney is director of Saturday Academy, a mentoring and outreach program recently identified by an NSF team as one of the top mentoring programs in the United States (for which Dr. Whitney received an award at the White House). Dr. Whitney has been an innovative leader in developing programs in which middle school and high school students interact with and learn from leading researchers and admirable role models. Saturday Academy will administer and participate in all of the educational activities of the Center.

The Carnegie Mellon Children's School. Dr. Sharon Carver, director of the CMU Children's School and a leading researcher in instructional technology, will collaborate on research to determine how spoken language systems can be used to support learning with three- to five-year-old children. We believe that most educational software available today for young children can be improved dramatically with spoken language systems.

Center for Children and Technology. Dr. Babette Moeller of CCT will collaborate with the technology developers and educators to design and evaluate programs that use human-centered systems at various educational sites already working with CCT, including inner-city schools in New York City, a school for the deaf, and Native American programs. Dr. Moeller has extensive experience in educational research applying principles of participatory design to the introduction and evaluation of instructional technology in diverse learning populations.

Seattle Community Network. Dr. Yvonne Chen, Manager of the Central Library, and Ms. Aki Namioka of the Seattle Community Network (also President of Computer Professionals for Social Responsibility) will collaborate to provide spoken language interfaces for various applications in the Seattle Community Network (e.g., allowing users to create voice-enabled Web sites) and will work with us to train users on-site to develop their own systems using the CSLU Toolkit.

Tucker Maxon School for the Deaf. The Tucker Maxon school works with about 75 profoundly deaf children, of whom about one third have cochlear implants. (Tucker Maxon was the first school in the U.S. to work with a child with a cochlear implant.) We will work with the school's teachers, speech pathologist and students to use spoken language systems in various language training exercises, including using Baldi to display the desired positions and movements of the articulators.

University of Arkansas at Little Rock. Dr. Philipos Loizou will work with the toolkit to develop and evaluate language training procedures and materials for profoundly deaf children, including groups of congenitally deaf children who have received cochlear implants. Dr. Loizou will also work to place the toolkit into high schools throughout the Little Rock area.

University of Edinburgh. Dr. Paul Taylor and Dr. Alan Black are collaborating with Dr. Michael Macon at CSLU and Drs. Massaro and Cohen at UCSC to improve text-to-speech synthesis and explore its use with talking faces in the CSLU Toolkit.

Undergraduate Testing Sites. Dr. David Paulson at Evergreen State University in Washington, Dr. Radhakrishnan Srikanth at Clark Atlanta University (a historically black university), and Dr. Richard Alo, Chairman of the Mathematics and Computer Science Department at the University of Houston (predominantly a Hispanic minority institution) will work with us to integrate the toolkit into undergraduate programs.

Intel. Intel has supported the formation of the proposed Center through grants for course development and equipment, and by providing free consulting and software to CSLU. Intel is especially excited about applications involving learning and teaching that use computer networks, since its ProShare software allows teachers and students to communicate using video and speech on PCs over a network. Intel is interested in disseminating the Windows version of the CSLU Toolkit via its Web site.

The success of the short course for high school teachers and students, supported by grants from Intel and NSF, was a key factor in Intel's decision to grant our request for an equipment donation for a state-of-the-art computer teaching laboratory at OGI. The laboratory, which will come on-line on March 1, 1997, contains 19 Pentium Pro platforms running the toolkit, with telephone and Internet access to each computer.

References

Colton, D., Cole, R., Novick, D., Sutton, S., "A Laboratory Course for Designing and Testing Spoken Dialogue Systems," Proceedings of the 1996 International Conference on Acoustics, Speech and Signal Processing, Atlanta, GA, 1129-1132, May, 1996.

 

National Center for Education Statistics, "Adult Literacy in America," U.S. Department of Education, technical report no. GPO 065-000-00588-3, U.S. Government Printing Office, Washington, DC, September, 1993.

 

Sutton, S., Novick, D.G., Cole, R., Vermeulen, P., de Villiers, J., Schalkwyk, J., Fanty, M., "Building 10,000 Spoken-Dialogue Systems," Proceedings of the 1996 International Conference on Spoken Language Processing, Philadelphia, PA, 709-712, October, 1996.


 

Rusel DeMaria: High Tech Mediation of Social Interaction

I write about games. I analyze games. I design games. I have within me a finely honed barometer for fun. It's like the particular palate of a wine taster. I can sip a game of any particular vintage, swish around in it for a while, and spit it out with a pronouncement-- fun... or not fun. And explain the subtleties, the nuances, the gradations of why it is, or isn't, working.

OK, so it's not a talent that ranks with predicting the stock market or laying on of hands, but it is a particular ability and I try to use it wisely. How does that bring me here to speak about responsibility in the computer world or the "High Tech Mediation of Social Interaction," which, to a gamer, is quite a mouthful?

Surprisingly enough, I've given a lot of thought to both the issues of responsibility in gaming and how computers affect social interaction. I thought about it when games were mostly played on console machines and there was no popular internet. Now, with the online world becoming the next brave new world, it's imperative to consider questions of social responsibility, to consider the effect that content has on the world at large and individuals specifically.

When I began to think about it, I realized that spreadsheets, databases, and word processors, all venerable and distinguished members of the computer applications union (or would be if there were one), have no inherent responsibility in them. As with a gun, responsibility lies in the hands of the wielder. An improperly used spreadsheet could cause all kinds of harm, and I can think of several scenarios in which a database could be used to further evil intentions. And we need not speak of word processors, which daily are used to turn out the most odious text. And yet, you would not think in terms of responsibility when referring to these applications.

Games are different. Games are, in a weird way, like poetry or, if you will, novels or movies. They are ostensibly designed to entertain. In reality, they are designed to sell... that is, to make money. But cynicism aside, they are in fact meant to entertain and they can be designed responsibly or irresponsibly. I would generally categorize games into three rough categories: irresponsible, neutral, and responsible... and in most cases, responsible equals boring! And why is that so?

Remember what I said earlier about my nose for fun? Well, it doesn't perk up in contact with most games that are currently considered "responsible." (And even games that I consider primarily responsible sometimes do irresponsible things. I'll give you an example in a moment.) But first, what do we mean when we use the words "computer game" and "responsible" in the same sentence - a rare enough occurrence to be sure?

I think we generally mean "educational," the sure killer epithet for most games. To say, "It's educational," translates, for most members of the younger generations, to "It's deadly dull; it sucks; these guys want to cram a lot of useless crap down our throats and put a few absolutely lame game elements around it."

There are exceptions. I'll mention a few: Carmen Sandiego, Putt Putt and its sequels, Living Books, Sim City, Rocky's Boot, Widget's Workshop, Incredible Machine... although some of these games are not, technically speaking, games. They're more like activities. But they all are fun, imaginative, and teach without being overly obvious about it in most cases, or by being clever enough to make the learning part intriguing enough that players will want to continue on in the experience.

I could go on and on about this, but I want to get on to more interesting matters. I'll fulfill my promise first and mention an irresponsible moment in an otherwise responsible game. What I have in mind are the various disasters placed in Sim City. Earthquakes might be all right, they're natural, they do happen - but Godzilla? Let's get real... ! And yet, my nose perks up at that. It's fun. I mean, if a kid wants to build a great city and then watch Godzilla tromp all over it, why not? That's not so bad. It's an analog to the fate that befell my toy soldiers many, many times.

The important point is that the kid built the city. Not that he or she had some destructive fun afterward. That child is ready to be mayor of any small community. Well... maybe not quite, but you get my point. He or she learned a lot playing the game, even if all along the plan was to sic a ninety-foot dinosaur on it.

And I think that gets me to one of my points, finally. It has to do with how you introduce teaching in games. And whether there's anything you simply can't teach in a computer.

First, I think you can impart a wide variety of ideas, behaviors, concepts, facts, and multitudes of memes through a computer. The trick is to do it so that players on the other end want to be learning. An easy sell, for example, is a program to teach a foreign language - people wouldn't buy it if they didn't already know they wanted to learn to speak that language. But think about history. I really used to think history one of the most deadly dull subjects in school, right next to geometry. Looking back, I wonder if I didn't actually have deadly dull history teachers; certainly my geometry teacher would have put a speed freak to sleep.

In recent years, I stumbled across a geometry program that brought the subject alive to me. So, I surmise, if I had had history teachers who made the subject more vibrant... what would have happened? Who knows? But I DO know this: After having played a few good historically-based games, and then having researched several projects based in history, I discovered on my own that human history is not only fascinating, but rich, vibrant, woven with human stories, and simply way cool.

In fact, even in school, my favorite brush with history took place after class at a friend's home playing the board game Gettysburg. Go figure. A game. And I knew a lot more about the Battle of Gettysburg than I ever knew about anything else in American history. I knew the generals, the hills, the strategies. I lived that battle.

My point? Well, it should be obvious by now. What child or adolescent wouldn't learn about flying a Spitfire over England fighting off the German Messerschmitts... if that kid could be at the controls, making the decisions that control the outcome of the situation?

I'm actually talking about a new way of learning. An interactive approach that has something to do with a second standard, something I learned from a progressive industrialist. The second standard is a way of introducing a whole new system of doing something without invalidating the old. The story involves some cow breeders a few decades ago who wanted to increase the butterfat content in the milk their cows gave. They couldn't simply say to all their member breeders, "Your cows will, henceforth, give one percent more butterfat. We have spoken." What they did do, however, was to introduce a secondary breeding program whose purpose was to raise butterfat content gradually. Within a short time in cow years, they had succeeded in producing cows that did, indeed, deliver one percent more butterfat in their milk. Then, slowly, this second breed, or second standard, became the norm.

How do we do this with computer gaming? Simple. First you make the game fun. You make sure you understand what is fun content, by today's standards. Then, into that mix, you introduce ideas, facts, various models of social interaction, whatever it is that you REALLY want to communicate. People absorb a lot when they're having fun and are highly invested. If they're struggling or bored, they absorb much less and they care much less. My partner Alex and I are already embarked upon the road to designing games we know, first and foremost, will appeal to people who want to have fun. Perhaps our games aren't one hundred percent perfect. But they do have ideas in them. They have historical accuracy woven into them. Or they may even model some social behavior that we, in our dubious wisdom, consider appropriate.

Ultimately, responsibility begins and ends with the individual, but, if I can recognize that kids will flock to the latest fighting game, and I can be clever enough to find a way to create such a game, AND imbue it with some other content that players will notice peripherally, I may bring them closer to learning in general terms - or to some specific message that's important to me.

In my mind, you attract more flies with honey than with vinegar, and knowing what constitutes honey for a target audience is vital. Once you have an engaged, interested audience, all that's left is to provide valuable, meaningful content - a topic for another discussion at a later date, perhaps…

 

Alex Uttermann: The Meta-View: Computer Gaming & 3-d Graphic Worlds Online, or, How I Spent My Youth Practicing for This Moment

The subject of gaming-based and role-playing-based interactions has always fascinated me. When I was a child, my family used to engage in these high-speed, lightning-powered Monopoly games that provided some of the most giddy, hilarious and downright fun experiences I've ever had. The fun wasn't in the game itself, although certainly that was entertaining enough (plus, I always lost early on but still stayed around to hear my older brothers and sisters duke it out for domination of the high-rent districts). It had more to do with the shared focus of a group of diverse people. And with the energy they brought to our dining room table, suddenly transformed into the property deed office, the railroad stations, and of course, the dreaded bank.

Despite official records declaring otherwise, I was a backgammon major in college. Certainly I spent more contiguous hours engaged in rolling dice in the cafe than in other pursuits. Again, it wasn't the game itself - the fun was in the community of people with whom I played backgammon. Certainly I was ripe to develop into a Dungeons & Dragons player, and graduate from that fun into other role-playing experiences like Traveler and Diplomacy.

As a computer game designer and a writer, I want to be able to recreate a similar spirit in the games I produce. I've spent a good deal of time trying to remember just what was so compelling about those experiences. Was it the props, like the intricate sets of dice, the leathery smell of a backgammon board, the pot smoke in the air behind the Dungeon Master?

Partly. And certainly the flights of imagination that we took were another part of the picture. But I keep coming back to the same central point -- what was really amazing and magical about those games was the total unpredictability of them, as expressed by the rather chaotic group of individuals who came together to participate in them. The human element. The kibitzing, the banter, the gossip. The verbal equivalent of a butterfly in Tokyo flapping its wings until tidal waves appear in another part of the world.

Computer role-playing games, although somewhat diverting, don't leave me with the same inner giddy whirlwind. I'm not constantly waiting for the characters to do or say something impulsive and silly and thoroughly unexpected. They're pre-scripted, pre-rendered characters on a screen, fer goshsakes, and they're always going to do and say what the programmer and the writer of the game thought they ought to. They may be clever enough, but I'm more likely to lose interest and read a book, or sign online.

Online, well, that's a totally different experience. It's back to the human factor, the great unknown, the goofy whirl of Chaos. At first, I peeped into many a chat room, trying to get a handle on this type of communication. I got bored pretty quickly because the same thing that I'd responded to as a child -- shared focus -- just wasn't anywhere in sight. (With the notable exception of self-help rooms. But those are hardly entertaining, in the strict sense.)

On the other hand, I spent a lot of time in those rooms out of fascination. Not having been online during the early days of BBSs, I discovered that it was FUN to type words on a screen and see others respond. Clearly, millions of people think this is fun, too - and for many, it's more than just the sheer novelty of the experience.

I heard a lot of arguments - well, okay, I participated in them, too - about politics, the role of the net in global society, and so on. I wrote a proposal for a book of essays from prominent commentators, activists, celebrities, cryptographers, sci-fi writers, privacy advocates and government voices (a healthy cross-section of netizens). Something was happening on the net, something of Gargantuan import. And it had one of my middle names, Diversity, written all over it.

Not a MUD or MOO aficionado, I got interested in graphic 3-d worlds online. Looking over my partner's shoulder, watching these games and new worlds, has never been so fascinating. Although I've been designing single player CD-ROM games for the last three years, no medium has caught my imagination more than online gaming world development. It's a return to my dining room table, covered in paper money mounds and scattered property deeds. Only the pieces are nicely & neatly kept in an inventory, and 1s and 0s underscore the basic mechanics. Game design to me has become part sociology, part playground monitor, part entertainment coordinator (a la cruise ships), part encounter group facilitator, part live improv theatrical direction. It's about creating an experience that is as rich in social interaction as it is in challenging game play. Using technology as a delivery means for social tools, it's also about empowerment, about giving people from all over the world the means with which to create their own entertainment. To determine their own course, to expand their own communication skills. To play, in the way that other animals do, and as our children do, as a way of trying out & practicing modes of living. To widen their worlds, and words, and scope. I'm not going to stretch the metaphor to the point at which world peace is achievable through computer gaming. But I would suggest that if my brothers & sisters & I had had the means to expand our own on-going Monopoly games from a basis of spontaneous role-playing - well, the real estate industry today would be a startlingly different place . . .! Well, wouldn't it?
