LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 11 : 5 May 2011
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         Jennifer Marie Bayer, Ph.D.
         S. M. Ravichandran, Ph.D.
         G. Baskaran, Ph.D.
         L. Ramamoorthy, Ph.D.





Copyright © 2010
M. S. Thirumalai



Identification of Different Feature Sets for NER Tagging
Using CRFs and Its Impact

Vijay Sundar Ram R., Pattabhi R.K. Rao and Sobha Lalitha Devi


Abstract

This paper presents a study of the impact of different types of language modeling, obtained by selecting different feature matrices in the Conditional Random Fields (CRFs) learning algorithm for Named Entity tagging. We devised four feature matrices and identified features at the word, phrase and sentence levels. We find that the language model that includes the structural feature performs better than the models with other features.

I. INTRODUCTION

In this paper, we present a study of how the performance of Named Entity Recognition (NER) using Conditional Random Fields (CRFs) varies with different features and feature matrices. Named Entity tagging is a labeling task: given a text document, named entities such as person names, organization names, location names and product names are identified and tagged. Identification of named entities is important in several higher-level language technology systems, such as information extraction and machine translation. Named Entity Recognition was one of the tasks defined in MUC-6. Several techniques have been used for Named Entity tagging, and a survey of Named Entity Recognition was done by David Nadeau [6]. The techniques used include a rule-based technique by Krupka [9], maximum entropy by Borthwick [4], Hidden Markov Models by Bikel [3], and hybrid approaches such as rule-based tagging for entities such as date, time and percentage combined with a maximum entropy based approach for entities such as location and organization [16]. There was also a bootstrapping approach using concept-based seeds [14] and an approach using a Maximum Entropy Markov Model [7].

Alegria et al. [1] developed NER for Basque, where NER was handled as a classification task. In their study, they used several classification techniques based on linguistic information and machine learning algorithms. They observed that different feature sets containing linguistic information give better performance.
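To make the labeling task and the notion of word-level features more concrete, the following is a minimal Python sketch. It is not the feature matrix used in this paper: the feature names, the context window, the example sentence and the label set (PERSON, LOCATION, O) are all assumptions chosen for illustration. It only shows how each token can be turned into a set of features that a sequence learner such as a CRF could consume, paired with one label per token.

```python
from typing import Dict, List


def word_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, object]:
    """Build an illustrative word-level feature dict for the token at index i.

    The features here (lowercased form, capitalization, digit check, suffix,
    neighbouring words) are hypothetical examples, not the paper's features.
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # e.g. "Chennai" -> True
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],             # last three characters
    }
    # Add the neighbouring words inside the context window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"{offset:+d}:word.lower"   # keys look like "-1:..." and "+1:..."
        feats[key] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


def sentence_features(tokens: List[str], window: int = 1) -> List[Dict[str, object]]:
    """Return one feature dict per token in the sentence."""
    return [word_features(tokens, i, window) for i in range(len(tokens))]


# Hypothetical example sentence with one label per token.
tokens = ["Ravi", "works", "in", "Chennai"]
labels = ["PERSON", "O", "O", "LOCATION"]
features = sentence_features(tokens)
```

Each entry of `features` lines up with the entry of `labels` at the same position, which is the input shape a sequence labeling learner typically expects. Changing which keys appear in the feature dict is one simple way to think about comparing different feature sets.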


This is only the beginning part of the article. The complete article is available in the printer-friendly version.


Vijay Sundar Ram R., Pattabhi R.K. Rao and Sobha Lalitha Devi
AU-KBC Research Centre
MIT Campus of Anna University
Chennai
Tamilnadu
India


