LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 12 : 11 November 2012
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         Jennifer Marie Bayer, Ph.D.
         S. M. Ravichandran, Ph.D.
         G. Baskaran, Ph.D.
         L. Ramamoorthy, Ph.D.
Assistant Managing Editor: Swarna Thirumalai, M.A.


Copyright © 2012
M. S. Thirumalai



Query Optimization:
A Solution for Low Recall Problem in Hindi Language Information Retrieval

Kumar Sourabh
Vibhakar Mansotra


Abstract

While information retrieval (IR) has been an active field of research for decades, for much of its history it has been strongly biased towards English as the language of choice for research and evaluation. Whatever the motivations for this almost exclusive focus on English may have been, many of them have lost their validity over the years. The Internet is no longer monolingual, and non-English content is growing rapidly. Hindi is the third most widely spoken language in the world (after English and Mandarin): an estimated 500-600 million people speak it. Information retrieval in Hindi is gaining popularity, but IR systems suffer from low recall when existing systems are used as-is. Certain characteristics of Indian languages prevent existing algorithms from matching relevant keywords in the documents to be retrieved. The major characteristics that affect Indian-language IR include language morphology, compound word formation, spelling variation, ambiguity, synonymy, foreign-language influence, and the lack of spelling standards. Taking these issues into consideration, we introduce a Hindi query optimization technique (design and development) that solves the low-recall problem to a great extent.

Keywords: Information retrieval, Hindi, Monolingual, Query optimization, Interface, Hindi WordNet.
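
The low-recall problem described in the abstract can be illustrated with a small sketch. This is not the authors' technique, which is not shown in this excerpt; it is a minimal example assuming exact token matching and a hypothetical, hand-written variant table (VARIANTS) that links the two common spellings of the word "Hindi" in Devanagari. The documents, the relevance judgements, and all function names are invented for illustration.

```python
# Two real spellings of the word "Hindi" in Devanagari, written with escapes
# so the code points are unambiguous.
HINDI_ANUSVARA = "\u0939\u093f\u0902\u0926\u0940"        # हिंदी
HINDI_HALF_NA = "\u0939\u093f\u0928\u094d\u0926\u0940"   # हिन्दी

# Hypothetical variant table (illustrative only, not from the article).
VARIANTS = {
    HINDI_ANUSVARA: {HINDI_HALF_NA},
    HINDI_HALF_NA: {HINDI_ANUSVARA},
}


def expand_query(terms):
    """Return the query terms plus every variant listed in VARIANTS."""
    expanded = set(terms)
    for term in terms:
        expanded |= VARIANTS.get(term, set())
    return expanded


def retrieve(query_terms, documents):
    """Return ids of documents sharing at least one whitespace token with the query."""
    return {
        doc_id
        for doc_id, text in documents.items()
        if query_terms & set(text.split())
    }


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


# Toy collection: documents 1 and 2 are both about Hindi but spell it differently.
DOCS = {
    1: f"{HINDI_ANUSVARA} भाषा सीखें",
    2: f"{HINDI_HALF_NA} साहित्य का इतिहास",
    3: "जल संरक्षण",
}
RELEVANT = {1, 2}  # assumed relevance judgements for the query "Hindi"

plain_hits = retrieve({HINDI_ANUSVARA}, DOCS)
expanded_hits = retrieve(expand_query({HINDI_ANUSVARA}), DOCS)

print("recall without expansion:", recall(plain_hits, RELEVANT))
print("recall with expansion:", recall(expanded_hits, RELEVANT))
```

With exact matching, the unexpanded query misses the document that uses the other spelling, so only half of the relevant documents are found; adding the variant to the query recovers both. A real system would need a far larger variant and synonym resource (the keywords mention Hindi WordNet) in place of the toy table.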

1. Introduction

The World Wide Web, or simply the web, may be seen as a huge collection of documents freely produced and published by a very large number of people, without any solid editorial control. It is probably the most democratic, and anarchic, widespread means for anyone to express feelings, comments, convictions and ideas, independently of ethnicity, sex, religion or any other characteristic of human societies. The web constitutes a comprehensive, dynamic, up-to-date repository of information on most areas of human knowledge, and it supports an increasingly important share of commercial, artistic, scientific and personal transactions, which gives rise to very strong interest from individuals, as well as from institutions, on a global scale. However, the web also exhibits characteristics that hinder the collection of information to satisfy specific needs. These include the large volume of data it contains, its dynamic nature, its largely unstructured or semi-structured data, heterogeneity of content and format, and irregular data quality. End users introduce additional difficulties into the retrieval process: information needs are often imprecisely defined, which creates a semantic gap between what users need and how they specify it. Search engines and other tools that help users gather information from the web exist to satisfy such specific information needs.


This is only the beginning of the article; the full text is available in the printer-friendly version.


Kumar Sourabh
Department of Computer Science and IT
University of Jammu
J&K 180001 INDIA
Kumar9211.sourabh@gmail.com

Vibhakar Mansotra
Department of Computer Science and IT
University of Jammu
J&K 180001 INDIA
Vibhakar20@yahoo.co.in
