
A Note on the Science and Technology Behind SMART

The EST/Sloan Project is committed to “challenge and broaden the public’s understanding of science and technology and their impact on our lives.” In that spirit, we offer this essay on the science and technology behind SMART by Mary Elizabeth Hamilton, the 2023 EST/Sloan mainstage production. SMART began previews on March 30 and runs through April 23. You can purchase tickets here.

Voice-Activated AI: Mixing Convenience with Risk

By Rich Kelley, Science Press Liaison

“A lot of cutting-edge AI has filtered into general applications, often without being called AI, because once something becomes useful enough and common enough, it’s not labeled AI anymore.”—Nick Bostrom

“If something is free, you’re the product.” —Richard Serra, 1973

Mechanical devices that talk to us have a long and storied history. In 1589, Robert Greene’s play Friar Bacon and Friar Bungay depicted the “Brazen Head” reportedly invented by the 13th-century Franciscan friar and philosopher Roger Bacon.

A woodblock engraving of Miles (the assistant to the friars) playing the tambour while friars Bacon and Bungay sleep and the Brazen Head finally speaks. From the 1630 edition of Robert Greene's The Honorable Historie of Frier Bacon, and Frier Bongay.

In the play, Bacon and his fellow friar build a large brass head that they hope will speak and reveal to them the secrets of the universe. It takes them seven years. Then, having watched the head night and day for two months waiting for it to speak, Bacon falls asleep and never hears the head’s mysterious oration: “Time is. Time was. Time is past.” After which the head explodes.

So even in our earliest imaginings, smart devices failed to live up to expectations.

Today we are most familiar with chatbots in the form of voice-driven digital assistants: Amazon’s Alexa, Apple’s Siri, Google Assistant, Microsoft’s Cortana, and Samsung’s Bixby. While they are all voice-activated, their features vary. As primarily virtual assistants for phones, Siri and Bixby are more integrated into their phones’ ecosystems: sending messages, making phone calls, setting reminders. Alexa, Cortana, and Google Assistant are more focused on controlling smart home devices like lights, locks, and thermostats.

These devices have enabled the corporations behind them to use the data they collect on customers’ transactions, web searches, and browsing to make conversational chitchat about the weather, sports scores, and what the listener should buy next. According to Business Insider, as of November 2022 Alexa was third in the voice-assistant wars: Google Assistant had 81.5 million users, Apple’s Siri 77.6 million, and Alexa 71.6 million.

Is Anyone Listening?

In 2019, Bloomberg reported that Amazon uses thousands of contractors and full-time Amazon employees in outposts from Boston to Costa Rica, India, and Romania to listen to voice recordings captured in Alexa owners’ homes and offices. The teams then transcribe and annotate those recordings and feed them back into the software to help improve Alexa’s understanding of human speech and its responses to user requests. These listeners work nine hours a day and can parse as many as 1,000 audio clips a day.

Amazon Echo unpacked (2015) (Photo: Brewbooks / CC 2.0)

Alexa software is designed to record snatches of audio continuously, listening for a “wake” word: “Alexa” by default for Amazon’s devices, “Hey, Google” for Google Home, and “Hey Siri” for Apple’s Siri. When Alexa detects the wake word, the light ring at the top of the Echo turns blue, indicating that the device has started recording and is sending a command to Amazon’s servers.

But sometimes Alexa begins recording without any prompt at all. One interviewee said the auditors can transcribe as many as 100 recordings a day when Alexa receives no wake command or an accident triggers the recording. The Bloomberg report adds:

“Occasionally, the listeners pick up things Echo owners likely would rather stay private: a woman singing badly off key in the shower, or a child screaming for help.”

Two workers interviewed in Romania said they picked up what they believed was a sexual assault. After requesting guidance, they were told it wasn’t Amazon’s job to interfere.

Recordings sent to the Alexa auditors don’t include a user’s full name and address but do include an account number, the user’s first name, and the device’s serial number. Apple’s Siri also uses human auditors. According to an Apple white paper, the recordings lack personally identifiable information and are stored for six months tied to a random identifier. Google also employs reviewers of audio snippets from its Google Assistant, but the company says the snippets are not associated with any personally identifiable information and the audio is distorted.

Enter ChatGPT

ChatGPT, released by the AI research company OpenAI in November 2022, uses a large language model, trained on millions of human-created texts available online, to produce answers based on which word it considers most likely to come next in a human response. As prominent computer scientist Stephen Wolfram explains in “What Is ChatGPT Doing … and Why Does It Work?”:

“ . . . at each step it gets a list of words with probabilities. But which one should it actually pick to add to the essay (or whatever) that it’s writing? One might think it should be the ‘highest-ranked’ word (i.e. the one to which the highest ‘probability’ was assigned). But this is where a bit of voodoo begins to creep in. Because for some reason if we always pick the highest-ranked word, we’ll typically get a very ‘flat’ essay, that never seems to ‘show any creativity.’ But if sometimes (at random) we pick lower-ranked words, we get a ‘more interesting’ essay. The fact that there’s randomness here means that if we use the same prompt multiple times, we’re likely to get different essays each time.” 
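The choice Wolfram describes, always taking the top-ranked word versus sometimes taking a lower-ranked one, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the candidate words and their scores are made up rather than taken from any real model, and the function names are hypothetical. The amount of randomness is controlled by a number commonly called the “temperature”: lower values make the top word dominate, higher values spread the choice across more words.

```python
import math
import random

# Made-up scores for five possible next words (illustration only,
# not real model output).
candidates = {"learn": 4.5, "predict": 3.5, "make": 3.2,
              "understand": 3.1, "do": 2.9}


def to_probabilities(scores, temperature=1.0):
    """Softmax with temperature: turn raw scores into probabilities."""
    scaled = {word: s / temperature for word, s in scores.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {word: math.exp(v - top) for word, v in scaled.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def pick_greedy(scores):
    """Always choose the highest-ranked word (the 'flat' strategy)."""
    return max(scores, key=scores.get)


def pick_sampled(scores, temperature, rng):
    """Choose at random, weighted by probability, so lower-ranked
    words are sometimes picked."""
    probs = to_probabilities(scores, temperature)
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so a run can be repeated
print("greedy: ", [pick_greedy(candidates) for _ in range(5)])
print("sampled:", [pick_sampled(candidates, 0.8, rng) for _ in range(5)])
```

The greedy line prints the same word every time, while the sampled line usually mixes in other candidates, which is why the same prompt can yield a different essay on each run.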

In March 2023, OpenAI released GPT-4, a major improvement over GPT-3.5, the model behind the original ChatGPT. Where GPT-3.5 scored around the tenth percentile on the Uniform Bar Exam, which aspiring lawyers must pass to practice, GPT-4 scored 298 out of 400, around the 90th percentile.

The Risks of Home Devices

Connectivity has its costs. Beyond the risk, recounted above, of smart devices recording conversations without having heard the “wake” word, there is the risk that anything connected to the cloud can be hacked. Cloud-based gadgets can be vulnerable because not all data transmitted over the web is encrypted, and many people secure their home networks with weak passwords, which makes those networks easier to break into.

Since your home network is likely to carry your personal and banking information, that information is also vulnerable. Many smart home setups also rely on location data, such as Global Positioning System (GPS) readings, that can identify the location of your home. If someone steals this information, your identity is at risk.

As AI and natural language processing technology continue to advance, smart devices will become even more sophisticated and capable of handling complex tasks. Smart consumers need to decide what tradeoffs of personal risk they want to make for the additional convenience.