Speech recognition systems are amazing
April 5, 2004 | 12:00am
It was only sometime in 1992 that I personally experienced for the first time, within the context of artificial intelligence, speech recognition technology right before my very own eyes, and with my very own ears. Yet it was almost a decade earlier, in 1983, that I first became aware of attempts to develop speech recognition systems. How could a machine tell the difference between the sounds speakers make when uttering, for instance, "abominable" and "a bomb in a bull", or between "She is at anchor" and "She is a tanker"?
The prevailing wisdom back in 1983 was that the correct configuration of words could only be identified if the computer had a wider understanding of context and meaning. At a seminar way back then, computer scientists, computational linguists, even experimental psychologists and philosophers were in anguish over the challenge of programming computers to understand speech and the meaning of words. Progress was slow. But as we know now, it has been exciting and intriguing to learn that speech recognition for personal computers, and for other machines like the telephone, has since achieved dazzlingly high accuracy rates.
It was winter in Washington DC, 1992. On an official mission for the Department of Transportation and Communications, I was tasked to meet with a top Motorola executive who had been the head of the US delegation to the plenipotentiary conference of the International Telecommunication Union (ITU Plenipot) in Nice, France in 1989: Ambassador Travis Marshall. US delegations to the Plenipot have, by tradition, always been headed by a prominent member of their private telecom sector, who is given the rank of Ambassador. The rest of the delegation is composed of very senior public telecom sector officials, both policy and technical experts, and quite a number of business leaders of the industry. For the Philippines, the delegation is always headed by the most senior public telecom officials involved in policy and regulation, only about three of them, with the rest coming from the private sector.
The Nice Plenipot in 1989 was the first one I headed. Of course, the venue was enticing: Nice, the host state being France. It was my first exposure to the ITU arena, and the Philippines was running for election to the governing council for the first time.
The deputy head was Atty. Kathleen Heceta, then chief of the legal department of the National Telecommunications Commission, now its deputy commissioner. Arriving a day before the official start of the Plenipot, Kathy and I scheduled meetings with the different heads of delegations. The first one was with Ambassador Marshall and his deputy, Ambassador Barbarelli, and the four of us agreed to meet at the Negresco Hotel coffee shop. The whole point of our meeting with the most powerful head of delegation was of course to reaffirm mutual support for the elections. We got out of that briefing with the two gentlemen promising us two ladies the commitment of several blocs, made up of the many other countries that were then within the sphere of influence of the US.
And Travis Marshall became a good friend of mine. He visited the Philippines after the conference, and was instrumental in enlarging the presence of Motorola in our country. He enjoyed our country so much that he kept coming back. In fact, he visited the Philippines so often that when Mount Pinatubo erupted, he found himself marooned in Manila because the far-reaching ashfall had left even our international airport inoperable.
Travis, to this day, remains one of the nicest, most polished and gentlemanly Americans I have ever met.
And, of course, he provided me with one of the most technologically exciting experiences of my life in the dead of winter, some 12 years ago. Today, the world citizen is talking to his machines regularly, and not just to create letters and reports. In the morning, for example, he walks into his office and says something like "Machine on. E-mail. New messages." and then replies to the messages orally through voice recognition rather than via the keyboard.
I feel good that I witnessed a dress rehearsal of sorts of the future through Travis Marshall in Washington DC in 1992. After a conference in his office, he opened the door of his car for me and took the wheel to drive us to a lunch he had organized for me with about six other industry leaders of the US.
As he drove he told me, "Josie, let me show you something which you probably have not witnessed yet." He lifted the cover of the armrest and spoke to the telephone within, "Get me Bill Borman, please." Almost instantaneously, I heard the ringing of a phone and Borman answering, "What's up, Trav?" I had also met Bill, who worked with Motorola too, at the Plenipot. He must have answered in the same manner, just by speaking to the machine in his car, since he was also on his way to lunch. Marshall answered that he and I were on our way to the lunch and then told me to try to get his machine to respond to my voice. Of course I did not fall for that one. I had learned from technology expositions that the voice is the most difficult thing to imitate. Even the best mimic will not be able to fool the machine. The tonal and resonance qualities of each voice are so intricately and specifically distinct that it is the most difficult nut to crack.
The very real dynamism of technology on this score has brought us, more than 12 years later, to today. The systems are so much more complex. Their fail-safe features are amazing. The research laboratories really cracked this one over the past 12 years, not through artificial intelligence (AI), and not by programming computers to understand and recognize words as human beings do. Rather, by using the considerable processing power now available, it became possible to build statistical models of language, which helped tackle problems such as those of homophones (e.g. "to", "two", "too") and phrases of the "She is at anchor" variety.
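For readers who would like to see the idea at work, here is a minimal sketch in Python of how a statistical model might choose among "to", "two" and "too". It is only an illustration under stated assumptions, not the method of any actual product: the six training sentences are made up, and the names CORPUS, train_bigrams and pick_homophone are hypothetical. The sketch counts which words follow which in the sample text, then picks the spelling that best fits its neighbors.

```python
from collections import Counter

# Made-up sample sentences (an assumption for illustration only).
# A real system would be trained on a very large body of text.
CORPUS = [
    "i want to go home",
    "we went to the market",
    "she has two cats",
    "he bought two tickets",
    "that is too much",
    "it is too late",
]


def train_bigrams(sentences):
    """Count how often each word is immediately followed by another."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts


def pick_homophone(counts, prev_word, next_word,
                   candidates=("to", "two", "too")):
    """Pick the candidate that fits best between its two neighbors.

    Each candidate is scored by how often it was seen after prev_word
    and before next_word. Adding 1 to each count keeps unseen pairs
    from forcing a score of zero.
    """
    def score(word):
        before = counts[(prev_word, word)] + 1
        after = counts[(word, next_word)] + 1
        return before * after

    return max(candidates, key=score)


BIGRAMS = train_bigrams(CORPUS)

print(pick_homophone(BIGRAMS, "want", "go"))
print(pick_homophone(BIGRAMS, "has", "cats"))
print(pick_homophone(BIGRAMS, "is", "late"))
```

The program never "understands" the words. It simply prefers the spelling that has most often appeared in similar company, which is the essence of the statistical approach.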
Contemporary speech recognition systems based on statistical models come in various shapes and sizes. Speaker-dependent systems, like the one Travis Marshall proudly displayed to me, are those that recognize the speech of particular people: their pronunciation, accent, intonation, and speed of expression. Travis explained that this is achieved by studying the individual's speech patterns in advance, and building an acoustic analysis that can anticipate the way in which that individual will pronounce almost any word.
Who was it who said that the advance of technology to heights never before imagined is limited only by human genius and imagination? Who was it who coined the term "science fiction"? It is fast disappearing: the science fiction of yesterday is "science fact" today. Who was it who said, "The universe is full of magical things patiently waiting for human genius to tap"?
Thank you for your e-mails sent to jtl@info.com.ph.