
The Importance of multi-language support in advanced search and text analytics

By: Dr Fern Halper, Partner, Hurwitz & Associates
Published: 17th March 2010
Copyright Hurwitz & Associates © 2010

I had an interesting briefing with the Basis Technology team the other week. They updated me on the latest release of their technology, called Rosette 7. In case you're not familiar with Basis Technology, it provides the multilingual engine embedded in some of the biggest Internet search engines out there, including Google, Bing, and Yahoo. Enterprises and government agencies also use it. But the company is not just about keyword search. Its technology also enables the extraction of entities (about 18 different kinds), such as organizations, people, and places. What does this mean? It means that the software can discover these kinds of entities across massive amounts of data and perform context-sensitive discovery in many different languages.

An Example
Here's a simple example. Say you're in the Canadian consulate and you want to understand what is being said about Canada across the world. You type "Canada" into your search engine and get back a listing of documents. How do you make sense of this? Using Basis Technology entity extraction (an enhancement to search and a basic component of text analytics), you could perform faceted (i.e., guided) navigation across multiple languages.

This is illustrated in the figure below. Here, the user typed "Canada" into the search engine and got back 89 documents. In the main browser pane, an arrow highlights the word Canada in documents written in a number of different languages, so you know that it is included in each of them. On the left-hand side of the screen is the guided navigation pane. For example, you can see that 15 documents contain a reference to Obama and another 6 contain a reference to Barack Obama. These references co-occur with Canada in the same document, not necessarily in the same sentence, so each of these articles contains a reference to both Obama and Canada. This would help you determine what Obama might have said about Canada, or what the connection is between Canada and the BBC (listed under organizations). This idea is not necessarily new, but the strong multilingual capabilities make it compelling for global organizations.
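To make the facet counts concrete, here is a minimal sketch of how a guided navigation pane could be populated once entities have been extracted. It is not Rosette's API: the toy corpus `DOCS`, its fields, and the function `facet_counts` are hypothetical, and it assumes the extractor has already normalized mentions in each language to a canonical form such as "Canada". Each document counts once per entity, mirroring the document-level (not sentence-level) co-occurrence described above.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each document carries (entity_text, entity_type) pairs
# that an entity extractor is ASSUMED to have produced and normalized already.
DOCS = {
    "doc1": {"lang": "en", "entities": [("Canada", "LOCATION"), ("Obama", "PERSON"), ("BBC", "ORGANIZATION")]},
    "doc2": {"lang": "fr", "entities": [("Canada", "LOCATION"), ("Barack Obama", "PERSON")]},
    "doc3": {"lang": "es", "entities": [("Canada", "LOCATION"), ("Obama", "PERSON"), ("Obama", "PERSON")]},
    "doc4": {"lang": "de", "entities": [("Canada Post", "ORGANIZATION")]},
}


def facet_counts(doc_ids, docs):
    """Count, per entity type, how many of the given documents mention each entity.

    A document is counted once per entity however often the entity appears,
    because facets reflect document-level co-occurrence.
    """
    facets = defaultdict(Counter)
    for doc_id in doc_ids:
        # Deduplicate within the document before counting.
        for text, etype in set(docs[doc_id]["entities"]):
            facets[etype][text] += 1
    return facets


# Result set: documents, in any language, that contain the country entity "Canada".
hits = [d for d, info in DOCS.items() if ("Canada", "LOCATION") in info["entities"]]
facets = facet_counts(hits, DOCS)
```

Here `facets["PERSON"]` plays the role of the "Obama" and "Barack Obama" entries in the navigation pane; treating them as separate facets, as the screen shot does, is a consequence of counting surface forms rather than resolved identities.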

If you have eagle eyes, you will notice that the search on Canada returned 89 documents, but the entity "Canada" returned only 61 documents. This illustrates what entity extraction is all about. When the search for Canada was run on the Rosette Name Indexer tab (see the upper right-hand corner of the screen shot), the query matched Canada against every automatically extracted "Canada" entity in all of the documents. That includes all persons, locations, and organizations with similar names, such as "Canada Post" and "Canada Life", which are organizations, not the country itself. Therefore, the other 28 documents (89 minus 61) contain a Canada variant that refers to an organization or other entity rather than the country.
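The gap between the two counts can be reproduced on a toy scale. This sketch is an illustration only: the corpus and the functions `name_search` and `entity_filter` are hypothetical, and a simple whole-token match stands in for the Name Indexer's real (and certainly more sophisticated) name matching. A name search matches any entity whose name contains "Canada", whatever its type, while the entity facet keeps only the country; the set difference is the organizations.

```python
# Hypothetical extracted entities per document: (entity_text, entity_type).
DOCS = {
    "d1": [("Canada", "LOCATION")],
    "d2": [("Canada", "LOCATION"), ("Canada Life", "ORGANIZATION")],
    "d3": [("Canada Post", "ORGANIZATION")],
    "d4": [("Canada Life", "ORGANIZATION")],
    "d5": [("Ottawa", "LOCATION")],
}


def name_search(query, docs):
    """Match documents with any entity whose name contains the query as a token, regardless of type."""
    q = query.lower()
    return {d for d, ents in docs.items() if any(q in text.lower().split() for text, _ in ents)}


def entity_filter(name, etype, docs):
    """Match only documents containing exactly this entity with this type."""
    return {d for d, ents in docs.items() if (name, etype) in ents}


broad = name_search("Canada", DOCS)                  # d1, d2, d3, d4
country = entity_filter("Canada", "LOCATION", DOCS)  # d1, d2
other = broad - country                              # d3, d4: only organizations named like Canada
```

In the article's numbers, `broad` corresponds to the 89 documents, `country` to the 61, and `other` to the remaining 28.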

Use Cases
There are a number of use cases where the ability to extract entities across languages is important. Here are three:

  • Watch lists. With the ability to extract entities, such as people, in multiple languages, this kind of technology is well suited to government or financial watch lists. Basis can resolve matches and translate names in 9 different languages, including resolving multiple spelling variations of foreign names. It also enables organizations to match names of people, places, and organizations against entries in a multilingual database.
  • Legal discovery. Basis Technology can identify entities in 55 different languages. For a global enterprise, for example, this can help in legal discovery by narrowing down the number of documents the company would need to analyze. The software could also process many documents and extract the entities associated with them to find the right set of documents needed for discovery.
  • Brand image, competitive intelligence. The technology can be used to extract company names across multiple languages. The software can also be run against disparate data sources, such as internal document management systems as well as external sources such as the Internet. This means that it could comb the Internet to extract a company's name (and variations on it) in multiple languages. I would expect this technology to be used by "listening posts" and other "Voice of the Customer" services in the near future.
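The watch-list case hinges on matching spelling variations of the same name. The sketch below is a deliberately simple stand-in, not how Basis resolves names: it strips accents with Python's `unicodedata`, scores pairs with `difflib.SequenceMatcher`, and flags entries above a threshold. The function names, the sample `WATCH_LIST`, and the 0.8 threshold are all hypothetical choices for illustration; real cross-script name matching (transliteration, nicknames, name order) needs far more than edit similarity.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, strip accents (e.g. 'é' -> 'e'), and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.lower().split())


def similarity(a, b):
    """Edit-based similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def screen(name, watch_list, threshold=0.8):
    """Return (entry, score) pairs that plausibly match, best match first.

    The 0.8 threshold is an arbitrary illustrative value, not a recommendation.
    """
    scored = [(entry, similarity(name, entry)) for entry in watch_list]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical watch list mixing transliterated and accented names.
WATCH_LIST = ["Muammar Gaddafi", "José Martínez", "Canada Life"]
```

For example, `screen("Moammar Gadhafi", WATCH_LIST)` flags only "Muammar Gaddafi", and the unaccented "Jose Martinez" matches "José Martínez" exactly after normalization, while "Canada Post" does not match "Canada Life".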

While this technology is not a full text analytics platform, it does provide an important piece of core functionality needed in a global economy. Look for more announcements from the company in 2010 about enhanced search in additional languages.

Published by: IT Analysis Communications Ltd.