Saturday, January 2, 2016

Phonetic Search

3:15 AM Posted by Anuradha Welivita


Have you ever been to McDonald’s? Or is it MacDonald’s? It certainly sounds like the latter. Phonetics is all about how words sound. People are bad at spelling: rather than memorizing spellings, they type a word by guessing its spelling from the way it sounds. Most of us are more comfortable listening to something to understand it than reading thousands of lines of text. Each year the number of distinct words we encounter grows bigger. And as technology advances, people rely more and more on auto-correct and spell checkers to fix their spelling. These are the main reasons why so many people are spelling-challenged.
Since people are the ones who use computers, it is risky for computers to depend on humans to type correctly spelled words into searches. The blog post http://www.nextmovie.com/blog/most-misspelled-celebrity-names shows that even the most popular celebrity names are frequently misspelled. People end up in this situation because they do not know either how to pronounce what they read or how to spell what they hear.
The problem is especially visible with trademarks, which often leave out vowels and arrange consonants in unusual ways to sound unique. This distinctiveness often troubles the general public. It can also affect the revenue these brands earn, because people who cannot spell a brand name correctly in a search engine may not get the right results or may end up finding a different brand.
Hence the need to make computers less dependent on exactly what people type in. The following approaches used in content technology reduce this problem to a certain extent.
  • Phonetic search
  • Voice search
  • Speech synthesis
Let’s look at phonetic search in detail in this article.
Phonetic Search
Most traditional searches rely on the user typing in a query word. Although search engines auto-generate suggestions or predict text, those features also rely on the part of the text the user has already typed. Phonetic search instead gives the user the flexibility to search according to how words sound. At the time of writing, Google does not support it. One example where phonetic search has been incorporated is the general-purpose search engine Exalead, from France’s Dassault Systèmes. This kind of search is most useful for e-commerce applications, where products, brands and trademarks are searched in bulk.
How it works
Basically, it identifies the sounds represented by the query word we type in and matches it against words that sound alike (homonyms or near-homonyms for that sound). These words sound alike but may have different meanings, like cell and sell. The article at http://www.cooper.com/alan/homonym_list.html lists many such words that sound alike but differ completely in meaning. For this to work, both the query and the target words are converted to a phonetic representation; when a query word is typed in, all the words whose phonetic representation is similar to that of the query word are returned.
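As a rough sketch of this idea, the Python snippet below groups target words under a phonetic key and looks up a query by its key. The names `TOY_KEYS`, `toy_key`, `build_index` and `phonetic_search` are made up for illustration, and the keys themselves are hand-written stand-ins for a real phonetic algorithm, so treat this as an assumption-laden outline rather than how any particular search engine works.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical hand-written phonetic keys, standing in for a real
# phonetic algorithm purely for illustration.
TOY_KEYS = {"cell": "SEL", "sell": "SEL", "sale": "SAYL", "sail": "SAYL"}


def toy_key(word: str) -> str:
    """Look up a made-up phonetic key; unknown words fall back to themselves."""
    return TOY_KEYS.get(word.lower(), word.upper())


def build_index(words: Iterable[str], key: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group the target words under their phonetic key."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        index[key(word)].append(word)
    return index


def phonetic_search(query: str, index: Dict[str, List[str]],
                    key: Callable[[str], str]) -> List[str]:
    """Return every indexed word whose phonetic key equals the query's key."""
    return index.get(key(query), [])


index = build_index(["cell", "sale", "sell", "sail"], toy_key)
print(phonetic_search("sell", index, toy_key))  # ['cell', 'sell']
```

Because the key function is passed in as a parameter, a real phonetic algorithm could be swapped in without touching the indexing or the lookup.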
Soundex Algorithm
Soundex is the main algorithm used in phonetic search. It was developed by Robert C. Russell and Margaret King Odell and patented in 1918 and 1922. Many more modern phonetic algorithms have come up since, but the basis for most of them is the Soundex algorithm.
The Soundex algorithm indexes words according to their phonetic sound, so that different words which sound alike get the same index. So, when words are stored following the Soundex algorithm, those with similar phonetic sounds can be related through their index and matched despite minor differences in their spellings.
The Soundex algorithm first removes the vowels (except a vowel that appears as the first letter of the word) and other irrelevant consonants. Then it calculates the Soundex index for the refined word. The Soundex index for a name consists of a letter followed by three numerical digits. The letter is the first letter of the name, and the digits encode the remaining consonants. The consonants are partitioned into groups by similar place of articulation, and all the consonants in the same group share the same digit. The following tables show the groups of consonants and their corresponding digits.
Original Soundex consonant grouping
Letters            Digit
B P F V            1
C S K G J Q X Z    2
D T                3
L                  4
M N                5
R                  6
Refined Soundex consonant grouping
Letters            Digit
B P                1
F V                2
C K S              3
G J                4
Q X Z              5
D T                6
L                  7
M N                8
R                  9
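In code, the two groupings can be kept as simple lookup tables. The snippet below is only a sketch: the names `ORIGINAL_GROUPS`, `REFINED_GROUPS` and `letter_codes` are invented here, and the letters and digits are copied from the tables above.

```python
# Consonant groups copied from the two tables above; the value is each group's digit.
ORIGINAL_GROUPS = {"BPFV": "1", "CSKGJQXZ": "2", "DT": "3",
                   "L": "4", "MN": "5", "R": "6"}
REFINED_GROUPS = {"BP": "1", "FV": "2", "CKS": "3", "GJ": "4", "QXZ": "5",
                  "DT": "6", "L": "7", "MN": "8", "R": "9"}


def letter_codes(groups):
    """Flatten {letters: digit} into a per-letter lookup such as {'B': '1', ...}."""
    return {letter: digit for letters, digit in groups.items() for letter in letters}


ORIGINAL = letter_codes(ORIGINAL_GROUPS)
REFINED = letter_codes(REFINED_GROUPS)

# P and F share a digit in the original grouping but not in the refined one.
print(ORIGINAL["P"], ORIGINAL["F"])  # 1 1
print(REFINED["P"], REFINED["F"])    # 1 2
```

The refined grouping splits the consonants into more groups, so fewer different-sounding letters end up with the same digit.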
Letters which are not included in the above tables (the vowels, H, W and Y) are removed in the first step, apart from the first letter of the word, which is always kept. Then adjacent letters from the same group are written as one letter. The result is truncated to the first letter followed by three digits. If fewer than three digits remain, the missing positions are padded with zeros. This is the Soundex index of a word, and two words which sound alike will usually end up having the same Soundex index.
The following example shows how to calculate the Soundex index of two words which sound the same.
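Here is a minimal Python sketch of the steps exactly as described in this article, using the original grouping. The function name `soundex` and the helper tables are my own, and this is not a reference implementation: standard variants such as American Soundex treat same-group letters separated by a vowel differently, so results can differ for some names.

```python
# Original Soundex grouping from the table in this article.
GROUPS = {"BPFV": "1", "CSKGJQXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
CODES = {letter: digit for letters, digit in GROUPS.items() for letter in letters}


def soundex(word: str) -> str:
    """Soundex index following the simplified steps described in this article."""
    # Keep only the letters A-Z, so apostrophes and spaces are ignored.
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    # Step 1: drop letters outside the table (vowels, H, W, Y), keeping the first letter.
    kept = [first] + [ch for ch in letters[1:] if ch in CODES]
    # Step 2: write adjacent letters from the same group as one letter.
    collapsed = [kept[0]]
    for ch in kept[1:]:
        if CODES.get(ch) != CODES.get(collapsed[-1]):
            collapsed.append(ch)
    # Step 3: first letter plus three digits, padded with zeros if needed.
    digits = "".join(CODES[ch] for ch in collapsed[1:])
    return first + (digits + "000")[:3]


print(soundex("Robert"), soundex("Rupert"))       # R163 R163
print(soundex("McDonald"), soundex("MacDonald"))  # M235 M235
```

Robert and Rupert both reduce to R, then a letter from group 1 (B or P), then R and T, which gives R163. McDonald and MacDonald both reduce to M C D N L D, and truncating to three digits gives M235 for each, so either spelling would land on the same index.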
