Brown Engineering

Tuesday, 15 May 2012

Brian Reggiannini figures out who’s talking

Posted on 12:35 by Unknown
If computers could become ‘smart’ enough to recognize who is talking, they could produce real-time transcripts of meetings, courtroom proceedings, debates, and other important events. In the dissertation that earns him his Ph.D. at Commencement this year, Brian Reggiannini found a way to advance the state of the art in voice and speaker recognition.

Real-time tracking of who’s talking
With the right algorithms and signal processing software, an array of button-size microphones placed around the perimeter of a room can identify, follow, and record each of several people as they move about, interrupt each other, and converse.

Credit: Frank Mullin/Brown University
Everyone does signal processing every day, even if we don’t call it that. With friends at a sports bar, we peer up at the TV to see the score, we turn our head toward the crashing sound when a waitress drops a glass, and perhaps most remarkably, we can track the fast-paced banter of all the people in our booth, even if we’ve never met some of the friends-of-friends who have insinuated themselves into the scene.

Very few of us, however, could ever get a computer to do anything like that. That’s why doing it well has earned Brian Reggiannini a Ph.D. at Brown and a career in the industry.

In his dissertation, Reggiannini raised the bar for how well a computer connected to a roomful of microphones can keep track of who among a small group of speakers is talking. Further refined and combined with speech recognition, such a system could lead to instantaneous transcriptions of meetings, courtroom proceedings, or debates among, say, several rude political candidates who are prone to interrupt. It could help the deaf follow conversations in real time.

If only it weren’t so hard to do.

“We’re trying to teach a computer how to do something that we as humans do so naturally that we don’t even understand how we do it.” (Brian Reggiannini)

But Reggiannini, who came to Brown as an undergraduate in 2003 and began building microphone arrays in the lab of Harvey Silverman, professor of engineering, in his junior year, was determined to advance the state of the art.
The specific challenge he set for himself was real-time tracking of who is talking among at least a few people who are free to rove around a room. Hardware was not the issue. The test room on campus has 448 microphones around its walls, and he used only 96. That was enough to gather the kind of information that lets a system (think of your two ears) locate the source of a sound.
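The article doesn’t say how the array turns sound into a location. A common textbook building block, shown here as an assumption rather than Reggiannini’s actual method, is to estimate the time difference of arrival between two microphones by cross-correlating their signals and then convert that delay into a direction. The function names are hypothetical; a real 96-microphone system would combine many microphone pairs and use more noise-robust variants.

```python
import math
import random

def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation
    sum_n x[n] * y[n + lag], searching lags in [-max_lag, max_lag].
    A positive result means the sound reached microphone y later than x."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_to_angle(lag, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Convert an inter-microphone delay into a direction of arrival, in radians
    from broadside, assuming a distant (far-field) source and two microphones
    mic_spacing_m apart."""
    tau = lag / sample_rate                      # delay in seconds
    s = speed_of_sound * tau / mic_spacing_m     # sin(angle), before clamping
    return math.asin(max(-1.0, min(1.0, s)))     # clamp so noise can't break asin

# Usage: a noise burst that reaches the second microphone 5 samples late.
random.seed(0)
src = [random.uniform(-1.0, 1.0) for _ in range(400)]
mic1 = src
mic2 = [0.0] * 5 + src[:-5]                      # mic2[n] = src[n - 5]
lag = estimate_delay(mic1, mic2, max_lag=20)
angle = delay_to_angle(lag, sample_rate=16000, mic_spacing_m=0.2)
```

One pair only gives a direction; intersecting the directions (or delays) from several pairs is what pins a talker to a spot in the room.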
The real rub was in devising the algorithms and, more abstractly, in realizing where his reasoning about the problem had to abandon the conventional wisdom.
Previous engineers who had tried something like this were on the right track. After all, only so many cues are available in situations like this. Some tried analyzing accents, pronunciation, word use, and cadence, but those are complex to track and require a lot of data. The simpler features are the pitch, volume, and spectral statistics (a breakdown of a voice into its component frequencies) of each speaker’s voice. Systems can also determine where in the room a voice came from.
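To make the “simpler features” concrete, here is a minimal sketch that computes one number for each of them from a short frame of audio: root-mean-square amplitude for volume, an autocorrelation pitch estimate, and the spectral centroid as one example of a spectral statistic. These particular measures, and the function names, are assumptions for illustration; the article does not say which statistics Reggiannini used.

```python
import math

def rms(frame):
    """Volume proxy: root-mean-square amplitude of the frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def pitch_autocorr(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Crude pitch estimate: the lag with the highest autocorrelation within
    the typical range of voice pitch (fmin..fmax Hz), converted to Hz."""
    lo = int(sample_rate / fmax)
    hi = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best = lo, float("-inf")
    for lag in range(lo, hi + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best:
            best_lag, best = lag, r
    return sample_rate / best_lag

def spectral_centroid(frame, sample_rate):
    """'Center of mass' of the magnitude spectrum in Hz, via a naive DFT."""
    N = len(frame)
    num = den = 0.0
    for k in range(N // 2 + 1):
        re = sum(frame[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
        im = -sum(frame[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
        mag = math.hypot(re, im)
        num += (k * sample_rate / N) * mag   # bin frequency weighted by magnitude
        den += mag
    return num / den if den else 0.0

# Usage: a 50 ms, 200 Hz test tone sampled at 8 kHz.
sr = 8000
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]
loudness = rms(frame)
pitch = pitch_autocorr(frame, sr)
centroid = spectral_centroid(frame, sr)
```

Real speech is far messier than a pure tone, which is partly why such features are informative but never decisive on their own.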
Snippets, not speakers
But many attempts to build speaker identification systems (like the voice recognition in your personal computer) have relied on the idea that a computer could be extensively trained in “clean,” quiet conditions to learn a speaker’s voice in advance.
One of Reggiannini’s key insights was that just as a politician couldn’t possibly be primed to recognize every voter at a rally, it’s unrealistic to train a speaker-recognition system with the voice of everyone who could conceivably walk into a room.
Instead, Reggiannini sought to build a system that could learn to tell apart the voices of whoever is present during a session. It analyzes each new segment of speech and also notes the physical position of each individual within the room. The system compares each new segment, or snippet, of what it hears to previous snippets. It then estimates the statistical likelihood that the new snippet came from a speaker it has already identified.
“Instead of modeling talkers, I’m going to instead model pairs of speech segments,” Reggiannini recalled.
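Modeling pairs of segments rather than talkers can be illustrated with a log-likelihood ratio: how much more probable is the difference between two snippets’ features if one person spoke both than if two different people did? The sketch below is a deliberately simple stand-in, not Reggiannini’s model. It assumes each snippet is summarized by a few features (for instance mean pitch and spectral centroid), that each feature difference is zero-mean Gaussian with a narrow spread for the same talker and a wide one for different talkers, and that features are independent. The variances and feature values are hypothetical.

```python
import math

def log_gauss(x, var):
    """Log density at x of a zero-mean 1-D Gaussian with variance var."""
    return -0.5 * (math.log(2 * math.pi * var) + x * x / var)

def pair_llr(feat_a, feat_b, same_var, diff_var):
    """Log-likelihood ratio that two snippets came from the same talker.
    Positive favors 'same talker', negative favors 'different talkers'.
    Features are treated as independent, so per-feature terms add."""
    llr = 0.0
    for a, b, sv, dv in zip(feat_a, feat_b, same_var, diff_var):
        d = a - b
        llr += log_gauss(d, sv) - log_gauss(d, dv)
    return llr

# Hypothetical spreads of feature differences: (pitch Hz, centroid Hz).
SAME_VAR = (15.0 ** 2, 150.0 ** 2)
DIFF_VAR = (60.0 ** 2, 500.0 ** 2)

# Hypothetical snippet summaries: (mean pitch, spectral centroid).
a = (120.0, 1500.0)
b = (125.0, 1550.0)
c = (210.0, 2100.0)
same_ab = pair_llr(a, b, SAME_VAR, DIFF_VAR)   # positive: likely one talker
same_ac = pair_llr(a, c, SAME_VAR, DIFF_VAR)   # negative: likely two talkers
```

The appeal of a pairwise model is that its parameters describe how voices differ in general, so nothing has to be learned about a particular person before the session starts.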
A key characteristic of Reggiannini’s system is that it can work with very short snippets of speech. It doesn’t need full sentences to perform at least reasonably well. That matters because it’s realistic: people don’t speak in florid monologues. They speak in fractured conversations. No way! Yes, really.
People also move around. For that reason, position as inferred by the microphone array can be only an intermittent asset. At any single moment, especially at the beginning of a session, position helpfully distinguishes each talker from every other (no two people can be in the same place at the same time), but when people stop talking and start walking, the system necessarily loses track of them until they speak again.
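One way to treat position as an intermittent asset, sketched here under loudly stated assumptions, is to give it its own log-likelihood term that is added to the voice-based score only when it can be trusted: both snippets must have a location estimate, and they must be close enough in time that the talker probably hasn’t walked away. The time cutoff, the spreads in meters, and the function name are all hypothetical, and this is not a description of how Reggiannini combined position with voice.

```python
import math

def position_llr(pos_a, pos_b, gap_s, max_gap_s=2.0, same_sd_m=0.3, diff_sd_m=2.0):
    """Hypothetical position term for a same-talker log-likelihood ratio.

    pos_a, pos_b: (x, y) room coordinates in meters, or None if unknown.
    gap_s: seconds between the two snippets.
    Returns 0.0 (no evidence either way) when a location is missing or the
    snippets are too far apart in time for position to be trusted."""
    if pos_a is None or pos_b is None or gap_s > max_gap_s:
        return 0.0
    d2 = (pos_a[0] - pos_b[0]) ** 2 + (pos_a[1] - pos_b[1]) ** 2

    def log_gauss_2d(sd):
        # Log density of the displacement under an isotropic 2-D Gaussian.
        return -math.log(2 * math.pi * sd * sd) - d2 / (2 * sd * sd)

    return log_gauss_2d(same_sd_m) - log_gauss_2d(diff_sd_m)

# Usage: nearby in space and time favors "same talker"; far apart disfavors it;
# a long silence means position says nothing.
near = position_llr((1.0, 1.0), (1.1, 1.0), gap_s=0.5)
far = position_llr((1.0, 1.0), (4.0, 1.0), gap_s=0.5)
stale = position_llr((1.0, 1.0), (4.0, 1.0), gap_s=30.0)
```

If voice and position evidence are assumed independent, the position term can simply be added to a voice-based log-likelihood ratio; returning 0.0 leaves the voice evidence to decide alone.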
Reggiannini tested his system every step of the way. His experiments evaluated pitch analysis alone, spectral analysis alone, the two combined, position alone, and the full speech analysis combined with position tracking. He subjected the system to a multitude of voices: sometimes male-only, sometimes female-only, and sometimes mixed. In every case, at least until the speech snippets became quite long, his system discriminated among talkers better than two other standard approaches did.
That said, the system is sometimes uncertain, and in such cases it defers assigning speech to a talker until it is more confident. Once it is, it goes back and labels the earlier snippets accordingly.
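The defer-then-relabel behavior can be sketched as a small online loop. This is a hypothetical illustration, not Reggiannini’s algorithm: it takes any pairwise same-talker score as a parameter, assigns a snippet to a known talker when its average score against that talker clears an `accept` threshold, starts a new talker when it scores below `reject` against everyone, and otherwise holds the snippet. Whenever a talker gains a snippet, held snippets are re-examined and labeled retroactively if they have become clear. The class name, thresholds, and toy pitch-only score are all assumptions.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Features = Sequence[float]

class DeferredLabeler:
    """Hypothetical online talker labeler that defers uncertain snippets."""

    def __init__(self, score: Callable[[Features, Features], float],
                 accept: float = 2.0, reject: float = -2.0) -> None:
        self.score = score
        self.accept = accept
        self.reject = reject
        self.talkers: List[List[Features]] = []        # snippets per talker
        self.labels: Dict[int, int] = {}               # snippet index -> talker id
        self.pending: List[Tuple[int, Features]] = []  # held, undecided snippets

    def _decide(self, feats: Features) -> Optional[int]:
        """Return a talker id, -1 to start a new talker, or None to defer."""
        if not self.talkers:
            return -1
        means = [sum(self.score(feats, s) for s in snips) / len(snips)
                 for snips in self.talkers]
        best = max(range(len(means)), key=means.__getitem__)
        if means[best] > self.accept:
            return best
        if all(m < self.reject for m in means):
            return -1
        return None

    def _assign(self, idx: int, feats: Features, talker: int) -> None:
        if talker == -1:
            self.talkers.append([feats])
            talker = len(self.talkers) - 1
        else:
            self.talkers[talker].append(feats)
        self.labels[idx] = talker

    def add(self, idx: int, feats: Features) -> None:
        """Process snippet number idx: label it now or hold it for later."""
        talker = self._decide(feats)
        if talker is None:
            self.pending.append((idx, feats))
            return
        self._assign(idx, feats, talker)
        # Talker models changed, so retry held snippets (retroactive labeling),
        # repeating until no more of them resolve.
        changed = True
        while changed:
            changed = False
            for item in list(self.pending):
                talker = self._decide(item[1])
                if talker is not None:
                    self.pending.remove(item)
                    self._assign(item[0], item[1], talker)
                    changed = True

def toy_score(a: Features, b: Features) -> float:
    """Toy same-talker score on pitch alone: 3 for identical pitch,
    dropping by 1 for every 10 Hz of difference."""
    return 3.0 - abs(a[0] - b[0]) / 10.0

# Usage: snippet 1 (150 Hz) is ambiguous when it arrives and is held; it is
# labeled later, once snippets 2 and 3 make talker 0's voice clearer.
lab = DeferredLabeler(toy_score, accept=1.0, reject=-2.0)
for i, pitch in enumerate([120.0, 150.0, 135.0, 140.0, 220.0]):
    lab.add(i, (pitch,))
```

Holding a snippet costs a little latency but avoids committing early to a wrong label that would then contaminate that talker’s model.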
It’s no surprise that the system would err, or hedge, here and there. Reggiannini’s test room was noisy. Whereas some systems are fed very clean audio, the only major concessions Reggiannini allowed himself were that speakers wouldn’t run or jump across the room and that only one would speak from the script at a time. The ability to separate individual voices within overlapping speech is perhaps the biggest remaining barrier to turning the system from a research project into a commercial success.
A career in the field
While the ultimate fate of Reggiannini’s innovations is not yet clear, what is certain is that he has been able to embark on a career in the field he loves. Since leaving Brown last summer, he’s been working as a digital signal processing engineer at Analog Devices in Norwood, Mass., which happens to be his hometown.
Reggiannini has yet to work on an audio project, but that’s fine with him. His interest is signal processing, not sound per se. Instead, he’s applied his expertise to challenges in heart monitoring and wireless communications.
“I’ve been jumping around applications but all the fundamental signal processing theory applies no matter what the signal is,” he said. “My background lets me work on a wide range of problems.”
After seven years and three degrees at Brown, Reggiannini was prepared to pursue his passion.

- by David Orenstein