LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 6 : 8 August 2006
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         K. Karunakaran, Ph.D.
         Jennifer Marie Bayer, Ph.D.








  • E-mail your articles and book-length reports (preferably in Microsoft Word) to thirumalai@mn.rr.com.
  • Contributors from South Asia may send their articles to
    B. Mallikarjun,
    Central Institute of Indian Languages,
    Manasagangotri,
    Mysore 570006, India
    or e-mail to mallikarjun@ciil.stpmy.soft.net
  • Your articles and book-length reports should be written following the MLA, LSA, or IJDL stylesheet.
  • The Editorial Board has the right to accept, reject, or suggest modifications to the articles submitted for publication, and to make suitable stylistic adjustments. High quality, academic integrity, ethics and morals are expected from the authors and discussants.

Copyright © 2004
M. S. Thirumalai


 
Web www.languageinindia.com

TELUGU PARTS OF SPEECH TAGGING IN WSD
T. Sree Ganesh, M.A., M.Phil.


CORPUS ANALYSIS FOCUSES ON LINGUISTIC DESCRIPTION OF PERFORMANCE

In modern linguistics, corpus analysis is a prominent trend. It is useful in every area of linguistics, including phonology, morphology, syntax, sociolinguistics, machine translation and computational linguistics. It focuses on describing quantitative patterns of linguistic elements.
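As a minimal illustration of the quantitative side of corpus analysis, the Python sketch below counts token frequencies in a tiny corpus. The two romanized Telugu sentences are an assumption made for illustration (they are not the author's data), and splitting on whitespace is a simplification; real Telugu text in its own script would need proper tokenization.

```python
from collections import Counter

# A toy "corpus": two short Telugu sentences in rough romanized
# transliteration (illustrative only, not drawn from the paper's corpus).
corpus = [
    "nenu manchi pustakam chaduvutanu",
    "ramu manchi pustakam konnadu",
]

def word_frequencies(sentences):
    """Count how often each whitespace-separated token occurs."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return counts

freqs = word_frequencies(corpus)
for word, count in freqs.most_common(3):
    print(word, count)
```

On a real corpus, the same counts are the starting point for frequency lists, collocations and the other quantitative patterns mentioned above.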

Above all, it describes linguistic performance in a particular language: many linguists today hold that what people actually use is the real language. This reflects both ideological and technological shifts in linguistics. The ideological shift is a move away from intuition-based, rationalist assumptions; the technological shift is that computers now offer massive storage and impressive processing power. In this paper we apply corpus analysis to derive parts-of-speech tagging rules computationally. To that end, we first consider the definitions of 'corpus' and 'corpora'.
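One way corpus analysis can feed tagging rules is by counting which tag tends to follow which in a hand-tagged corpus. The sketch below is a hypothetical example, not the paper's procedure: the tagged sentences and the tag labels N, PRON, V, ADJ and AVY are invented for illustration.

```python
from collections import Counter

# Toy POS-tagged corpus of (token, tag) pairs. The transliterations and
# tag labels are hypothetical, chosen only for illustration.
tagged_corpus = [
    [("nenu", "PRON"), ("manchi", "ADJ"), ("pustakam", "N"), ("chaduvutanu", "V")],
    [("ramu", "N"), ("manchi", "ADJ"), ("pustakam", "N"), ("konnadu", "V")],
]

def tag_bigrams(sentences):
    """Count adjacent tag pairs, with <S> marking the sentence start."""
    counts = Counter()
    for sentence in sentences:
        tags = ["<S>"] + [tag for _, tag in sentence]
        counts.update(zip(tags, tags[1:]))
    return counts

bigrams = tag_bigrams(tagged_corpus)
for (prev, curr), n in sorted(bigrams.items()):
    print(f"{prev} -> {curr}: {n}")
```

Frequent pairs such as ADJ -> N suggest candidate contextual rules, which a linguist would then check against the wider corpus.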

CORPUS-CORPORA

The word corpus derives from the Latin word corpus, meaning 'body'. A corpus is a finite-sized body of machine-readable texts, sampled so as to be maximally representative of the language variety under consideration. Representativeness is therefore its defining quality. Linguists have offered many definitions of the term.

TYPES OF CORPORA

Corpora may contain texts of many types as well as combinations of types, so it is very difficult to design an organized classification scheme based on content alone. Corpora are therefore classified by usage, according to their features.

USES OF CORPORA

In modern linguistics, corpus study is an essential part of any research activity. Corpus studies lay out the overall structure of a particular language before our eyes, and they are very useful in speech research, lexical studies, psycholinguistics, natural language processing (NLP) and so on.

TAGS

Using tags, we develop the "Telugu Tag Set". Telugu has five main parts of speech (POS):

  1. Noun
  2. Pronoun
  3. Verb
  4. Adjective
  5. Avyaya (indeclinables).

We further subclassify these main parts of speech depending on context.
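A tag set with main classes and context-dependent sub-classes can be represented as a simple two-level mapping. In this sketch the five main classes follow the list above, while the short tag strings and the sub-tags (N_PROP, N_COM, V_FIN, V_NFIN) are hypothetical placeholders, not the actual Telugu Tag Set.

```python
from enum import Enum

class MainPOS(Enum):
    """The five main Telugu parts of speech; tag strings are hypothetical."""
    NOUN = "N"
    PRONOUN = "PRON"
    VERB = "V"
    ADJECTIVE = "ADJ"
    AVYAYA = "AVY"  # indeclinables

# Hypothetical sub-classes: each sub-tag points back to its main class.
# The paper's actual sub-classification is not reproduced here.
SUBTAGS = {
    "N_PROP": MainPOS.NOUN,   # proper noun
    "N_COM": MainPOS.NOUN,    # common noun
    "V_FIN": MainPOS.VERB,    # finite verb
    "V_NFIN": MainPOS.VERB,   # non-finite verb
}

def main_class(tag):
    """Map a sub-tag (or a bare main tag) back to its main POS."""
    if tag in SUBTAGS:
        return SUBTAGS[tag]
    return MainPOS(tag)  # raises ValueError for an unknown tag

print(main_class("V_FIN"), main_class("ADJ"))
```

Keeping sub-tags linked to a main class lets a tagger report fine-grained tags while evaluation can still fall back to the five main classes.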

This paper illustrates the procedure involved in POS tagging for Telugu; the same procedure can also be applied to the analysis of other Indian languages.
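A common baseline for a tagging procedure of this kind is to look each word up in a lexicon learned from a tagged corpus and to fall back on simple rules for unknown words. The sketch below assumes that approach; it is not necessarily the procedure in the full article. The training sentences, tag labels and both fallback rules are hypothetical, though the sentence-final rule reflects the fact that Telugu is a verb-final (SOV) language.

```python
from collections import Counter, defaultdict

# Toy training data; transliterations and tags are illustrative only.
training = [
    [("nenu", "PRON"), ("manchi", "ADJ"), ("pustakam", "N"), ("chaduvutanu", "V")],
    [("ramu", "N"), ("manchi", "ADJ"), ("pustakam", "N"), ("konnadu", "V")],
]

def train_lexicon(sentences):
    """Record, for each word, the tag it carries most often in training."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_sentence(words, lexicon):
    """Tag known words from the lexicon; use fallback rules otherwise."""
    tagged = []
    for i, word in enumerate(words):
        if word in lexicon:
            tag = lexicon[word]
        elif i == len(words) - 1:
            tag = "V"  # hypothetical rule: Telugu is verb-final (SOV)
        else:
            tag = "N"  # hypothetical default: guess a noun
        tagged.append((word, tag))
    return tagged

lexicon = train_lexicon(training)
print(tag_sentence(["ramu", "manchi", "illu", "konnadu"], lexicon))
print(tag_sentence(["ramu", "intiki", "velladu"], lexicon))
```

Because the lexicon stores only each word's most frequent tag, an ambiguous word always receives the same tag; context-sensitive rules are the usual next refinement.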





T. Sree Ganesh, M.A., M.Phil.
Department of Telugu
University of Hyderabad
Hyderabad, A.P., India
mrthottempudi@yahoo.com
 