Named entity

In information extraction, a named entity is a real-world object, such as a person, location, organization, or product, that can be denoted with a proper name. It can be abstract or have a physical existence. Examples of named entities include Barack Obama, New York City, the Volkswagen Golf, or anything else that can be named. Named entities can be viewed simply as entity instances (e.g., New York City is an instance of a city).

Historically, the term "Named Entity" was coined during the MUC-6 evaluation campaign[1], where it covered ENAMEX (entity name expressions, e.g. persons, locations and organizations) and NUMEX (numerical expressions).
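As a rough illustration of the two categories, the following Python sketch reads a sentence annotated with inline ENAMEX and NUMEX tags in the SGML style used for MUC data. The sample sentence, the function name `extract_annotations`, and the regular expression are assumptions made for this example, not part of the MUC-6 tooling.

```python
import re

# Invented example sentence, annotated with MUC-style inline tags.
ANNOTATED = (
    '<ENAMEX TYPE="PERSON">Barack Obama</ENAMEX> visited '
    '<ENAMEX TYPE="LOCATION">New York City</ENAMEX> and pledged '
    '<NUMEX TYPE="MONEY">2 million dollars</NUMEX>.'
)

# Matches an opening ENAMEX/NUMEX tag, its TYPE attribute, the enclosed
# text, and the matching closing tag (via the \1 backreference).
TAG_PATTERN = re.compile(r'<(ENAMEX|NUMEX) TYPE="([A-Z]+)">(.*?)</\1>')


def extract_annotations(text):
    """Return (tag, type, surface string) triples in order of appearance."""
    return [(m.group(1), m.group(2), m.group(3)) for m in TAG_PATTERN.finditer(text)]


annotations = extract_annotations(ANNOTATED)
for tag, entity_type, surface in annotations:
    print(tag, entity_type, surface)
```

A regular expression is enough here only because the tags in the sample are flat and well formed; nested or malformed markup would need a proper parser.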

A more formal definition can be derived from Saul Kripke's notion of the rigid designator. In the expression "Named Entity", the word "Named" restricts the possible set of entities to those for which one or more rigid designators stand for the referent.[2] A designator is rigid when it designates the same thing in every possible world. By contrast, flaccid designators may designate different things in different possible worlds.

As an example, consider the sentence "Obama is the president of the United States". Both "Obama" and "United States" are named entities, since they refer to specific objects (Barack Obama and the United States). However, "president" is not a named entity, since it can refer to many different objects in different worlds: it refers to different persons in different presidential periods, and to different people again in different countries or organizations. Rigid designators usually include proper names as well as certain natural-kind terms, such as biological species and substances.
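The distinction in the passage above can be sketched as a small Python model in which each possible world maps a designator to its referent. The worlds `w1` to `w3`, their contents, and the helper `is_rigid` are hypothetical and serve only to illustrate the definition; they are not a formalization of Kripke's account.

```python
# Toy model of possible worlds: each world maps a designator to its referent.
# The worlds and referents are invented for illustration only.
WORLDS = {
    "w1": {
        "Obama": "Barack Obama",
        "United States": "United States",
        "president": "Barack Obama",
    },
    "w2": {
        "Obama": "Barack Obama",
        "United States": "United States",
        "president": "some other person",
    },
    "w3": {
        "Obama": "Barack Obama",
        "United States": "United States",
        "president": "head of some organization",
    },
}


def is_rigid(designator, worlds):
    """True if the designator picks out one and the same referent in every world."""
    referents = {world.get(designator) for world in worlds.values()}
    # A designator missing from some world yields None and is not counted as rigid.
    return len(referents) == 1 and None not in referents


rigidity = {name: is_rigid(name, WORLDS) for name in ("Obama", "United States", "president")}
print(rigidity)
```

Under these assumptions "Obama" and "United States" come out as rigid, while "president" does not, because its referent changes from world to world.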

There is also general agreement in the Named Entity Recognition community to treat temporal and numerical expressions, such as amounts of money and other types of units, as named entities, even though they may violate the rigid designator perspective.

The task of recognizing named entities in text is called Named Entity Recognition, while the task of determining the identity of the named entities mentioned in text is called Named Entity Disambiguation. Both tasks require dedicated algorithms and resources.[3]
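To make the difference between the two tasks concrete, here is a minimal Python sketch. It assumes a tiny hand-made knowledge base, recognizes mentions by exact string lookup, and disambiguates by keyword overlap with the surrounding text. The knowledge base, the identifiers, and the functions `recognize` and `disambiguate` are all hypothetical; practical systems typically rely on statistical models and much larger resources.

```python
import re

# Hypothetical mini knowledge base: surface form -> candidate entities,
# each with a few context keywords. All entries are invented for illustration.
KNOWLEDGE_BASE = {
    "Paris": [
        {"id": "Paris_(France)", "keywords": {"france", "seine", "capital"}},
        {"id": "Paris_(Texas)", "keywords": {"texas", "county", "usa"}},
    ],
    "Volkswagen Golf": [
        {"id": "Volkswagen_Golf", "keywords": {"car", "hatchback"}},
    ],
}


def recognize(text):
    """Recognition: return (start, end, surface) for every known surface form found."""
    mentions = []
    for surface in KNOWLEDGE_BASE:
        start = text.find(surface)
        while start != -1:
            mentions.append((start, start + len(surface), surface))
            start = text.find(surface, start + 1)
    return sorted(mentions)


def disambiguate(surface, text):
    """Disambiguation: choose the candidate whose keywords overlap most with the text."""
    words = set(re.findall(r"\w+", text.lower()))
    candidates = KNOWLEDGE_BASE[surface]
    # On a tie, max() keeps the first candidate listed in the knowledge base.
    return max(candidates, key=lambda c: len(c["keywords"] & words))["id"]


text = "She drove a Volkswagen Golf from Paris to a small county seat in Texas."
links = [(surface, disambiguate(surface, text)) for _, _, surface in recognize(text)]
print(links)
```

Recognition answers "which spans are entity mentions?", while disambiguation answers "which entity does this mention refer to?". In the sample sentence, the words "county" and "Texas" steer "Paris" toward the Texas candidate.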

References

  1. Grishman, Ralph; Sundheim, Beth (1996). Design of the MUC-6 evaluation (PDF). TIPSTER '96 Proceedings.
  2. Nadeau, David; Sekine, Satoshi (2007). A survey of named entity recognition and classification (PDF). Lingvisticae Investigationes.
  3. Nouvel, Damien; Ehrmann, Maud; Rosset, Sophie (2015). Named Entities for Computational Linguistics. Wiley. ISBN 978-1-84821-838-3.
This article is issued from Wikipedia - version of the 11/4/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.