Named Entity Recognition (NER)

Named Entity Recognition (NER) is a subproblem of information extraction. It involves processing structured and unstructured resources and identifying expressions that refer to people, places, organizations, dates, and times (see Figure). NER is a fundamental task in information extraction, whose goal is to extract information by analyzing natural language. The term Named Entity Recognition, along with the other terms above, was introduced at the Sixth Message Understanding Conference (MUC-6).


Because the NER task seems relatively simple, a system is expected to achieve a high accuracy rate, to be domain independent, and to resolve the sense of each name.

At first, building an NE system seems easy. Yet despite the many studies in this area, a large number of ambiguous cases remain that make the NER task difficult and reduce the performance of existing methods.

The following examples illustrate this:

When is "The White House" an organization, and when is it a location?
When is "June" a person's name, and when is it the name of a month?

In "He visited Bush at White House", White House is a location, but in "The White House has asked the Department of Justice to look into …", White House is an organization.

For humans, NER in everyday dialog and text review is intuitively very simple; only occasionally do they have trouble recognizing very difficult or unfamiliar names. Many named entities are proper names, most of which begin with a capital letter and can easily be recognized that way, and for unknown words people can consult a dictionary or other sources. For a machine, however, the task is hard. One might think that named entities can be classified easily using dictionaries and a fixed set of grammatical rules, because most named entities are proper nouns, but this is mistaken. New proper nouns are created continuously, and there are more than 500 different languages in the world, each with its own grammar rules that change and grow over time. It is therefore impossible to add all proper nouns to a dictionary, so dictionary-based systems are neither sufficient nor language independent enough to recover all named entities. Even when a named entity is registered in the dictionary, it is not easy to decide its sense. Most problems in NER stem from semantic (sense) ambiguity: a proper noun has different senses depending on the context.
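
The limits of a pure dictionary approach can be sketched in a few lines of Python. This is only a toy illustration: the GAZETTEER contents and the function names lookup and classify_by_dictionary are assumptions made up for this example, not part of any real system.

```python
# Hypothetical toy gazetteer: each lowercased name maps to every
# entity type it can take. The entries are illustrative only.
GAZETTEER = {
    "white house": {"LOCATION", "ORGANIZATION"},
    "june": {"PERSON", "DATE"},
    "bush": {"PERSON"},
}


def lookup(name):
    """Return the candidate types for a name, or an empty set if unknown."""
    return GAZETTEER.get(name.lower(), set())


def classify_by_dictionary(name):
    """Classify a name using the dictionary alone, with no context."""
    candidates = lookup(name)
    if not candidates:
        # A name that was never added to the dictionary cannot be classified.
        return "UNKNOWN"
    if len(candidates) > 1:
        # The dictionary lists several senses and cannot choose between them.
        return "AMBIGUOUS"
    return next(iter(candidates))


for name in ["Bush", "White House", "June", "Chandra"]:
    print(name, "->", classify_by_dictionary(name))
```

Only "Bush" receives a single label here. "White House" and "June" are in the dictionary but remain ambiguous, and a name missing from the dictionary gets no label at all, which are the two failure modes described above.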

NER involves two main processing tasks: first, the identification of proper names, and second, the classification of these names into a set of predefined categories of interest, such as person names, organizations (companies, government organizations, committees, etc.), locations (cities, countries, rivers, etc.), and date and time expressions. These may seem to be two independent tasks that can be separated and solved on their own, but we will show that there are strong relations between the identification and the classification of each name: classifying names without regard to the recognition step leaves the system ineffective.
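
The two tasks can be sketched as a small Python pipeline. This is a minimal sketch under strong assumptions, not a real NER method: identification is reduced to a capitalization heuristic, classification to two hand-picked cue-word lists, and all names (identify, classify, recognize, LOCATION_CUES_BEFORE, ORGANIZATION_CUES_AFTER) are hypothetical.

```python
# Hypothetical cue words, chosen only to cover the examples in this post.
LOCATION_CUES_BEFORE = {"at", "in", "to", "from"}
ORGANIZATION_CUES_AFTER = {"has", "said", "asked", "announced"}


def identify(tokens):
    """Task 1: return (start, end) spans of consecutive capitalized tokens.

    The sentence-initial token is skipped because it is capitalized
    whether or not it is part of a name.
    """
    spans = []
    i = 1
    while i < len(tokens):
        if tokens[i][:1].isupper():
            j = i
            while j < len(tokens) and tokens[j][:1].isupper():
                j += 1
            spans.append((i, j))
            i = j
        else:
            i += 1
    return spans


def classify(tokens, span):
    """Task 2: label a span using the word before it and the word after it."""
    start, end = span
    before = tokens[start - 1].lower() if start > 0 else ""
    after = tokens[end].lower() if end < len(tokens) else ""
    if before in LOCATION_CUES_BEFORE:
        return "LOCATION"
    if after in ORGANIZATION_CUES_AFTER:
        return "ORGANIZATION"
    return "UNKNOWN"


def recognize(sentence):
    """Run identification, then classification, on a whitespace-split sentence."""
    tokens = sentence.split()
    return [(" ".join(tokens[s:e]), classify(tokens, (s, e)))
            for s, e in identify(tokens)]


print(recognize("He visited Bush at White House"))
print(recognize("The White House has asked the Department of Justice"))
```

In the first sentence "White House" is labeled LOCATION because it follows "at"; in the second it is labeled ORGANIZATION because it is followed by "has". The second sentence also shows how the two tasks depend on each other: the capitalization heuristic splits "Department of Justice" into "Department" and "Justice", so the classifier never sees the full name and labels both pieces UNKNOWN.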
