Alpine Consulting, Inc.



  Peak Spotlight


GNR: An Advanced Name Finding Tool

This article is about names. Not any names in particular, but about finding names. With an increasingly diverse set of names becoming an everyday occurrence for any business (large or small), there is a growing emphasis on resolving multiple variations of a person’s name into a single, unique identity. Resolving names has become a Critical Success Factor for key enterprise initiatives such as CDI (Customer Data Integration), Data Warehousing, and MDM (Master Data Management). Organizations are finding that getting lots of names is easy. Finding the names you want is hard.

Name Damage

Name Damage refers to errors or irregularities, whether inadvertent or intentional, introduced into a name. Let’s start by looking at some of the ways names can be damaged. The three main causes of name damage are:
  • Data Capture Issues
  • Hidden Names
  • Deception Issues

Data Capture Issues

Data Capture Issues concern how names are saved into your computers for long-term storage. Take a look at just a few examples of how people often spell the “same” name differently:

John, Jon, Jonathan, Johnny

How many inventive ways have you seen your own name spelled?  People can also make mistakes (simple typos or swapping first & last names).  Customers may speak unclearly; names may sound one way on the phone but be spelled differently (Thomson vs. Thompson, Cline vs. Kline).  Optical Character Recognition (OCR) document scanning can also damage names.  It only takes minor character changes to make it very hard to find someone later.

Sometimes people have to put an entire name into a single field, so they enter “Bob Smith”, “Robert Smith”, “Smith Bobby”, or “Smith Bob”. Which one is correct? Of course there is no single “correct” way; each may be good enough for some purpose. One division may keep notes the first way. Another group might track shipment history with the second. The Customer Payments group might use the third, while the Help Desk might use the fourth for support tickets. Style changes within a group can complicate finding records for any given person. Shifting styles across groups and divisions can make it nearly impossible.
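One simple way to see why these four entries are hard to reconcile is to try reconciling them. The Python sketch below is a minimal illustration, not GNR’s method: it treats a single-field name as an unordered set of lowercase tokens and maps a few nicknames to a formal name before comparing. The `NICKNAMES` table and the helper names `name_key` and `same_person_candidate` are hypothetical; a real system would need a far larger, culture-aware variant list.

```python
# A tiny, hypothetical nickname table; illustrative only.
NICKNAMES = {
    "bob": "robert",
    "bobby": "robert",
    "rob": "robert",
    "jon": "john",
    "johnny": "john",
}


def name_key(single_field: str) -> frozenset[str]:
    """Reduce a single-field name to an order-insensitive set of tokens."""
    tokens = single_field.replace(",", " ").lower().split()
    # Map each known nickname to its formal form; keep other tokens unchanged.
    return frozenset(NICKNAMES.get(token, token) for token in tokens)


def same_person_candidate(a: str, b: str) -> bool:
    """True when both fields reduce to the same key (a candidate match, not proof)."""
    return name_key(a) == name_key(b)


entries = ["Bob Smith", "Robert Smith", "Smith Bobby", "Smith Bob"]
print(len({name_key(e) for e in entries}))  # 1: all four entries share one key
```

Ignoring token order is what lets “Smith Bob” meet “Bob Smith”, but it also throws away the first/last distinction entirely, and a single typo (“Smyth”) still breaks the match.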

Different name lists can have their own ideas about how to best keep names.  What happens when you buy a mailing list or acquire another company?  Combining another name list with your existing names can be non-trivial.  What should one do when each list has its own idea of how first & last names are arranged? And can you be sure that “first name” and “last name” have been used consistently during the life of a given name list?

Hidden Names

So far we have considered fields that are intended to hold names, but what happens when people get creative and start hiding names in other fields? We are not referring to street names such as ADDRESS1=“MARTIN LUTHER KING DRIVE”, but rather to entries such as ADDRESS1=“ATTN MARIA HILBERT”.
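When a hidden name is introduced by an explicit marker, even a simple pattern can flag it. The sketch below is a minimal Python illustration under a stated assumption: that names hidden in address lines are prefixed by markers such as “ATTN” or “C/O”. The marker list and the helper name `hidden_name` are hypothetical, and names hidden without any marker would need genuine name recognition instead.

```python
import re
from typing import Optional

# Hypothetical prefixes that often introduce a person's name in an address line;
# real data would need a broader, locale-aware list.
NAME_PREFIX = re.compile(
    r"^\s*(?:ATTN:?|ATTENTION:?|C/O|CARE OF)\s+(?P<name>.+?)\s*$",
    re.IGNORECASE,
)


def hidden_name(address_line: str) -> Optional[str]:
    """Return the name embedded after a marker in an address line, else None."""
    match = NAME_PREFIX.match(address_line)
    return match.group("name") if match else None


print(hidden_name("ATTN MARIA HILBERT"))        # MARIA HILBERT
print(hidden_name("MARTIN LUTHER KING DRIVE"))  # None: a street, not a hidden name
```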

These are all things that “just happen” during the work day: data entry errors, individual mistakes, and many other everyday causes. However, Data Capture Issues pale in comparison to Deception Issues.

Deception Issues

Deception Issues occur when someone tries to hide their identity. Sometimes people change their names; they may use variations, intentional misspellings, or nicknames. They may also apply the same techniques to someone else’s name (e.g., identity theft). Researchers found that over 62% of the criminal records in the Tucson, AZ police department files contained misleading name data provided by suspects. There is a long list of theft and fraud activities that can hurt your organization, and some people may be highly motivated to hide their identities from you. A creative person with malicious intent can make the name damage caused by Data Capture Issues (above) look harmless by comparison.

Name Damage Impact

Small changes (even innocent typos) can cause names to just “drop off your radar.”  This is an especially challenging problem because people “don’t know what they don’t know.”  Your customer service center may successfully handle thousands of requests every day; who would ever know if a name fell through the cracks?  By definition, they are unknown because existing searches do not reveal them – and no human has the time (or ability) to dig through mountains of names looking for strays.  Now that you know how names can be damaged, we will take a look at why name searching is so hard.

Name Resolution – Limitations of Prevailing Search Strategies

A helpful search strategy will find similarities between names, either based on how names sound or how similar their spellings are.  The table below shows a brief history of name search strategies.  Using a search strategy is conceptually simple.  Take the name you want to find, and a list of candidate names to search through (maybe a phone book, or a stack of customer files, etc.).  Generate a score for the search-name and each candidate-name.  Scoring is where the “strategy” part comes into play; the strategy defines rules for cranking out a score based on the name’s original spelling.  If the scores match, you should keep that candidate because the names might be the same.  The difficulty with search strategies is that two names “might be” the same, but often they are not the same at all.  To see an example of this, let us look at how Soundex works.
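The scoring step can be made concrete with Soundex. Below is a minimal Python sketch of the widely used “American Soundex” rules (several Soundex variants exist and the article does not say which it means, so this choice is an assumption): keep the first letter, code the remaining consonants as digits, skip vowels, collapse repeated codes, and pad or truncate to four characters. The helper `soundex_search` is a hypothetical name for the “keep candidates whose scores match” step.

```python
# Digit codes for consonants under American Soundex; vowels, Y, H and W have none.
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W are dropped and do not separate letters with the same code.
            continue
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        # A vowel (no code) resets `previous`, so a code may repeat after it.
        previous = code
    return (first + "".join(digits) + "000")[:4]


def soundex_search(query: str, candidates: list[str]) -> list[str]:
    """Keep every candidate whose Soundex code equals the query's code."""
    key = soundex(query)
    return [name for name in candidates if soundex(name) == key]


print(soundex("CLINE"), soundex("KLINE"))         # C450 K450
print(soundex("MCALEVEY"), soundex("CALVERLEY"))  # M241 C416
print(soundex_search("CLINE", ["KLINE", "CLINE", "CULLEN", "CLAYTON"]))
# ['CLINE', 'CULLEN']
```

The search result shows both failure modes: KLINE, which sounds like CLINE, is missed because Soundex keeps the first letter as written, while CULLEN is kept only because it happens to share the code C450. Under the same rules THOMPSON (T512) and THOMSON (T525) also receive different codes.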

Original Name   Soundex (1918)   Edit Distance (1965)   NYSIIS (1970)   Metaphone (1990)
CLINE           C450             dist = 1               CLAN            KLN
KLINE           K450                                    CLAN            KLN

MCALEVEY        M241             dist = 4               MCALAFY         MKLF
CALVERLEY       C416                                    CALVARLY        KLFR


Search Strategy – A Brief History with Examples
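The Edit Distance column counts the smallest number of single-letter insertions, deletions, and substitutions needed to turn one name of a pair into the other (the Levenshtein distance). A minimal Python sketch of the standard dynamic-programming recurrence reproduces both values in the table; the function name `edit_distance` is our own.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only one previous row of the DP table."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


print(edit_distance("CLINE", "KLINE"))         # 1
print(edit_distance("MCALEVEY", "CALVERLEY"))  # 4
```

A distance only becomes a search rule once you pick a threshold: a limit of 1 catches CLINE/KLINE but misses MCALEVEY/CALVERLEY, while a limit of 4 on eight- or nine-letter names would likely admit many unrelated names.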

All of these strategies have trouble coping with Name Damage, Name Variants, and Cultural / Regional differences. For a simple example of Cultural differences, consider CLINE and KLINE. In Anglo cultures, C and K sometimes sound the same. In Russian, the letter С (Cyrillic, but written like a Latin C) always sounds like S, as in “city”. If you had to search Russian names, strategies designed for English sounds would work poorly. Thus each of the strategies above has its own limitations, and those limitations may keep it from meeting your business objectives.

In summary, the more a strategy knows about names (such as cultural background, phonetic rules, etc.), the better job it can do of finding names.

Resolving Names – The GNR Approach

If we wanted to make a better name search strategy, it would be a great help to have “inside information” about how names work. To an outside observer, it would look as if we had an unfair advantage over the other strategies. Let us look at some of the things we might be able to “learn” about names, and how that knowledge can help build better search strategies.

The Knowledge Advantage

Let us start with a small example. Consider the 100 most common Last Names for a given culture, such as Chinese. It turns out there are over 2,000 different Chinese Last Names, yet the top 100 account for 85% of all Chinese people. In other words, knowing no more than 5% of the names (100 out of 2,000 or more) covers 85% of the people.

So how does this help us build a better search strategy? When we make our sounds-like rules, we can focus on the most popular names and build rules that are customized for that culture. And because we know the most popular names, we can check whether the search request contains a typo and then automatically look for similar spellings. This would leave Soundex-like strategies wandering around lost while we home in on interesting search results. It would also be a remarkably effective way to deal with name damage, rather like having a spell checker that was name-smart.
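The “name-smart spell checker” idea can be sketched in a few lines of Python. This is an illustration under two assumptions, not a description of GNR: that we have a list of known surnames ordered from most to least common (the short list below is illustrative, not real frequency data), and that plain Levenshtein edit distance is an adequate similarity measure. The helper name `suggest` is hypothetical.

```python
# Illustrative surnames, most common first; the ordering is assumed for the example.
KNOWN_SURNAMES = ["WANG", "LI", "ZHANG", "LIU", "CHEN"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def suggest(query: str, known: list[str], max_distance: int = 1) -> list[str]:
    """Return the query if it is a known name; otherwise nearby known names.

    Because `known` is ordered by popularity, the more common names come first.
    """
    name = query.upper()
    if name in known:
        return [name]  # no typo suspected
    return [k for k in known if edit_distance(name, k) <= max_distance]


print(suggest("Liu", KNOWN_SURNAMES))    # ['LIU']
print(suggest("Liuu", KNOWN_SURNAMES))   # ['LIU']
print(suggest("Zhanh", KNOWN_SURNAMES))  # ['ZHANG']
```

Ordering by popularity is the “knowledge advantage” at work: when a misspelling is close to several known names, the more common ones are offered first.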

Untangling Names

Consider the following name:

Alejandr Rodriguez de la Pena y de Ybarra
   Which part is First Name?
   Which part is the Last Name?

If we can separate the different parts of a name, we can make much better decisions about searching.  This increase in “hi-definition” helps us resolve the name more meaningfully.

Name Quiz – Answer

Given      Family      1st Surname   2nd Surname
Alejandr   Rodriguez   de la Pena    y de Ybarra

Recall our discussion about Soundex-like strategies.  They work rather well when comparing any single part of a name.  However, those strategies run into huge difficulties when comparing entire names.  Part of the problem is that people can write the same names in so many different ways.  Soundex requires we feed it only “Last Names” – but how do you do that if you are not sure what part the “Last Name” is?
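Once a name is split into parts, each surname part can be handed to a single-part matcher such as Soundex on its own. The Python sketch below is a minimal rule-based splitter under stated assumptions: exactly one given name comes first, and particles from a small hypothetical list (`de`, `la`, `y`, …) attach to the surname word that follows them. The function name and particle list are our own, not GNR’s.

```python
# Hypothetical, non-exhaustive set of Spanish connectors and particles.
PARTICLES = {"de", "del", "la", "las", "los", "y", "e"}


def split_hispanic_name(full_name: str) -> tuple[str, list[str], list[str]]:
    """Split a name into (given name, surname segments, core surname words).

    Assumes exactly one given name; particles (matched case-insensitively)
    attach to the surname word that comes after them.
    """
    tokens = full_name.split()
    if not tokens:
        raise ValueError("empty name")
    given, rest = tokens[0], tokens[1:]
    segments: list[str] = []
    core: list[str] = []
    pending: list[str] = []
    for token in rest:
        if token.lower() in PARTICLES:
            pending.append(token)  # hold particles until their surname word arrives
        else:
            segments.append(" ".join(pending + [token]))
            core.append(token)  # the word a single-part matcher would be fed
            pending = []
    if pending and segments:
        # Trailing particles with no following word stay with the last segment.
        segments[-1] += " " + " ".join(pending)
    return given, segments, core


given, segments, core = split_hispanic_name("Alejandr Rodriguez de la Pena y de Ybarra")
print(given)     # Alejandr
print(segments)  # ['Rodriguez', 'de la Pena', 'y de Ybarra']
print(core)      # ['Rodriguez', 'Pena', 'Ybarra']
```

Fixed rules like these break quickly: a compound given name such as “María José” would be mis-split by the one-given-name assumption, which is exactly where knowledge about names beats hard-coded rules.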

Name Variations Across Cultures

Another useful kind of “name knowledge” pertains to name variations. We saw some simple examples of name variants earlier. Does culture really matter for name variations? Yes: culture makes a huge difference in determining which variations to use when searching for names. Since we are already keeping track of how popular names are in various cultures, we could also note which names are variants of one another. If we did that, we would end up with something like Table 1 below, which shows up to six of the most popular variations of each name in each culture.

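Original Culture: Original Spelling

Arabic: Isaac
  • Anglo variants: ISAAC, ISACK, ISSAC, ISAC, ISSAAC, ISSACE
  • Arabic variants: ISAAC, ALISAC, ESAAC, ESAK, ESSAC, ISAAQ
  • Chinese variants: ISAAC
  • Hispanic variants: ISAAC, YSAAC, IZAAC
  • Russian variants: ISAAC

Russian: Yury (Юрий)
  • Anglo variants: YURI, YURAI, YURY
  • Arabic variants: YIOURI, EURI, YURY
  • Chinese variants: YUJI, YOUJI, YURI, YURY
  • Hispanic variants: YURI, YURY, LLURI, LLURY
  • Russian variants: YURIY, IOURI, YURY, YURI, JOURI, YOURI

Hispanic: Manuel
  • Anglo variants: MANUEL, MANUELE
  • Arabic variants: MANUEL, MANAWEL, MANAWEIL, MANOAEL, MANOEIL, MANOIL
  • Chinese variants: MANUEL
  • Hispanic variants: MANUEL, MANOLO, LICO, MANOLETE, MEME
  • Russian variants: MANUEL

Chinese: Liu
  • Anglo variants: LIU, LIUU
  • Arabic variants: ABDEOLEU, LAYO, LIU
  • Chinese variants: LU, LIU, LIAO, LAU, LUI, LAO
  • Hispanic variants: LIU
  • Russian variants: LJU, LU, LIU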
Table 1 - Name Variants Across Cultures

How many of the variants listed in Table 1 would you have thought of? More to the point, how many would your computers have thought of? Knowing these name variants would give our search strategy a huge advantage in coping with name damage. This kind of advantage makes the difference between finding needles in haystacks and going home empty-handed.
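One way a search could use variant knowledge is query expansion: before matching, widen the query to the variants known for the cultures you expect in your data. The Python sketch below is a minimal illustration; its entries are copied from a few rows of Table 1, while the dictionary layout and the helper names `expand_query` and `search` are hypothetical and not GNR’s interface.

```python
# A small subset of Table 1: how "Manuel" and "Liu" tend to appear in
# records from different cultures.
VARIANTS = {
    "MANUEL": {
        "Anglo": ["MANUEL", "MANUELE"],
        "Hispanic": ["MANUEL", "MANOLO", "LICO", "MANOLETE", "MEME"],
    },
    "LIU": {
        "Chinese": ["LU", "LIU", "LIAO", "LAU", "LUI", "LAO"],
        "Russian": ["LJU", "LU", "LIU"],
    },
}


def expand_query(name: str, cultures: list[str]) -> set[str]:
    """Return the query plus its known variants for the given cultures."""
    key = name.upper()
    expanded = {key}
    for culture in cultures:
        expanded.update(VARIANTS.get(key, {}).get(culture, []))
    return expanded


def search(name: str, cultures: list[str], records: list[str]) -> list[str]:
    """Keep records that exactly match the query or one of its variants."""
    wanted = expand_query(name, cultures)
    return [record for record in records if record.upper() in wanted]


records = ["Manolo", "Manuel", "Emanuel", "Lau", "Lu"]
print(search("Manuel", ["Hispanic"], records))  # ['Manolo', 'Manuel']
print(search("Manuel", ["Anglo"], records))     # ['Manuel']
print(search("Liu", ["Chinese"], records))      # ['Lau', 'Lu']
```

The same query returns different results depending on the culture chosen, which is the point of keeping variants per culture rather than in one global list.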

GNR Defined

Global Name Recognition (GNR) is a powerful name searching toolkit.  For over 20 years, professional linguists and computer experts have been analyzing names to help make GNR smart.  They have compiled an extensive set of cultural databases from about one billion names gathered from all over the world (while not the whole planet, that is a respectable start).  They have also boiled down the sounds-like rules for different cultures.  This gives GNR peerless insight into names, pronunciation, and more.  In other words, GNR comes pre-tuned for everyone, everywhere, right out of the box.

GNR is a toolkit because it comes “straight from the factory” as a low-level set of libraries ready to bolt into your applications (typically through SOA services or linked in as C++ libraries). GNR is not a rip & replace strategy. Most companies have hundreds of man-years (or more!) invested in their systems, and “just” replacing these systems is non-trivial; in many cases, we find there are straightforward integration points that GNR can plug into. And of course GNR can be used in new projects as well.

Would you like to see GNR in action?  You already have!  Many of the examples in this article were produced with the GNR toolkit.

What Can GNR Do For Me?

Alpine would love to show you!  One of Alpine’s popular services is a Data Quality Assessment (DQA).  In addition to a data health-check, a DQA can give you a baseline of interesting name search results that you can measure your existing search strategy’s performance against.

Next, consider the Data Capture Issues that were discussed earlier.  You could try being strict with your customers and staff to eliminate or reduce those issues.  Your computers would have a much easier life if only people would behave properly (stop making typos, be consistent, no “creative” field use, etc.).  Another approach would be to give your computers a “lift” with GNR – this more reasonable approach would help your computers “accommodate” those unreliable, unpredictable humans.  I will leave it as an exercise for the reader to work out which approach is a “solving it once” scenario and which involves “solving it over and over again.”

Finally, consider the Deception Issues discussed earlier. If shrinkage or compliance issues are proving difficult to manage, tools like GNR can help rein them in.

Summary

In this article we covered how name damage happens and the kinds of challenges that must be handled when searching for names. As the profiles of organizations’ customers, suppliers, and partners become increasingly global, adopting state-of-the-art Name Resolution technologies holds the promise of better relationships and better protection of business interests. Many ‘Early Adopter’ organizations are already reaping the benefits of technologies such as GNR with regard to compliance, cost savings, and potential for additional revenue.

Alpine Consulting, Inc. 1100 East Woodfield Rd, Ste 105, Schaumburg, IL 60173
Tel: +1 847 605 0788 Fax: +1 847 240 5831 Email: info@alpineinc.com Web: www.alpineinc.com
© 2008 Alpine Consulting, Inc.