LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 9 : 10 October 2009
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         K. Karunakaran, Ph.D.
         Jennifer Marie Bayer, Ph.D.







Copyright © 2009
M. S. Thirumalai


 

Will Sentences Have Divergence Upon
Translation? : A Corpus-Evidence Based
Solution for Example Based Approach

Deepa Gupta, B.A. (Hons), M.Sc. (Maths), Ph.D.


Abstract

This paper presents a corpus-evidence based scheme for deciding whether the translation of an English sentence into Hindi will involve divergence. Divergence is the phenomenon in which sentences of similar structure in the source language do not translate into structurally similar sentences in the target language. Divergence assumes special significance in Example Based Machine Translation (EBMT), where the translation of a given sentence is generated by first retrieving translation example(s) of similar sentence(s) from the system's example base, and then adapting them to meet the requirements of the input sentence. The occurrence of divergence is a serious hindrance to efficient adaptation of the retrieved examples.

A possible remedy is to divide the example base of an EBMT system into two parts, with examples of normal translation in one and examples involving divergence in the other, so that, given an input, retrieval can be made from the appropriate part. The success of this scheme, however, depends heavily on the system's ability to judge a priori whether the translation of a given input will involve divergence. This task is not straightforward, as the occurrence of divergence does not follow any rules that would make its prior identification simple. The technique proposed here is aimed at achieving this goal. The scheme is explained and illustrated in the context of English to Hindi EBMT.

1 Introduction

Dealing with divergence is a major difficulty for any translation system. Typically, the structure of a translated sentence is guided by the syntactic and semantic properties of the target language. If, upon translation, the Parts of Speech (POS) and Functional Tags (FT) of the constituent words of the source language sentence do not undergo any change, we term it a normal translation. However, there are occasions when the structure of the translated sentence deviates from this normal structure. Such exceptions are called translation divergences [4]. Consider, for example, the English sentences "It is running" and "It is raining". Although these two sentences are structurally very similar, their Hindi translations are structurally very different. The first sentence is translated as "wah (it) bhaag (run) rahaa (-ing) hai (is)", which is a normal translation. The second is translated as "baarish (rain) ho (be) rahii (-ing) hai (is)". The second example is a clear case of divergence, where the subject of the Hindi sentence is realized from the verb of the English sentence.
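The definition of a normal translation can be made concrete with a minimal sketch. It assumes that a word alignment between the English and Hindi sentences is already available, and the tag labels (`pronoun`, `main_verb`, and so on) as well as the `AlignedWord` structure are illustrative assumptions rather than the tag set used in the paper. The check simply asks whether any aligned word changes its POS or FT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedWord:
    """One source word aligned with its target word (hypothetical tag labels)."""
    source: str
    source_pos: str
    source_ft: str
    target: str
    target_pos: str
    target_ft: str


def is_normal_translation(alignment):
    """Return True if no aligned word changes its POS or FT upon translation."""
    return all(
        w.source_pos == w.target_pos and w.source_ft == w.target_ft
        for w in alignment
    )


# "It is running" -> "wah bhaag rahaa hai": tags are preserved.
running = [
    AlignedWord("it", "pronoun", "subject", "wah", "pronoun", "subject"),
    AlignedWord("running", "verb", "main_verb", "bhaag rahaa", "verb", "main_verb"),
    AlignedWord("is", "verb", "auxiliary", "hai", "verb", "auxiliary"),
]

# "It is raining" -> "baarish ho rahii hai": the English verb is realized
# as the Hindi subject, so both POS and FT change for that word.
raining = [
    AlignedWord("raining", "verb", "main_verb", "baarish", "noun", "subject"),
    AlignedWord("is", "verb", "auxiliary", "hai", "verb", "auxiliary"),
]

print(is_normal_translation(running))
print(is_normal_translation(raining))
```

Such a check can only label examples whose translations are already known. It says nothing about a new input sentence, which is the a priori decision problem the paper addresses.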

Translation divergence has a heavy bearing on Example Based Machine Translation (EBMT). In an EBMT system the translation of a given input sentence is generated by retrieving the translation of a similar sentence from the system's example base, and then modifying (adapting) it to suit the requirements of the current input sentence [8] [1]. Selection of the right past example is, therefore, extremely important for successful EBMT. A wrong selection causes problems primarily in the following two scenarios:

  • The past example that is retrieved for carrying out the task of adaptation has a normal translation, but translation of the input sentence should involve divergence.
  • The translation of the retrieved example involves divergence, whereas the input sentence should have a normal translation.

In both situations the retrieved example may not be helpful in generating the translation of the given input, and consequently developing an efficient adaptation scheme becomes extremely difficult.

A possible solution is to separate the example base (EB) into two parts, a Divergence EB and a Normal EB, so that, given an input sentence, retrieval can be made from the appropriate part. However, this scheme can work only if the EBMT system is able to judge from the input sentence itself whether its translation will involve any divergence. Making such a decision is not straightforward, since the occurrence of divergence does not follow any simple patterns or rules. In fact, a divergence may be induced by various factors, such as the structure of the input sentence and the semantics of its constituent words. In this work we propose a corpus-evidence based approach to deal with this difficulty. Under this scheme, upon receiving an input sentence, the system looks into its example base to glean evidence for or against each possible type of divergence. Based on this evidence the system decides whether retrieval has to be made from the Normal EB or from the Divergence EB.
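The split-and-route idea can be sketched as follows. This is only an outline under stated assumptions: the `predicts_divergence` callable stands in for the paper's evidence-based decision (which is not reproduced here), `toy_predictor` is a deliberately trivial placeholder, and the word-overlap similarity from Python's `difflib` is a stand-in for whatever retrieval measure a real EBMT system would use.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Word-level similarity between two sentences (stand-in retrieval measure)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def retrieve(input_sentence, normal_eb, divergence_eb, predicts_divergence):
    """Choose the EB part via the a priori decision, then return the closest example.

    Each example base is a non-empty list of (english, hindi) pairs.
    """
    example_base = divergence_eb if predicts_divergence(input_sentence) else normal_eb
    return max(example_base, key=lambda ex: similarity(input_sentence, ex[0]))


# The two example bases, seeded with the sentences discussed in the text.
normal_eb = [("It is running", "wah bhaag rahaa hai")]
divergence_eb = [("It is raining", "baarish ho rahii hai")]


def toy_predictor(sentence):
    """Hypothetical placeholder: NOT the corpus-evidence scheme of the paper."""
    return "raining" in sentence.lower().split()


print(retrieve("It is raining", normal_eb, divergence_eb, toy_predictor))
print(retrieve("He is running", normal_eb, divergence_eb, toy_predictor))
```

Keeping the predictor as a separate argument reflects the point made above: routing is trivial once the decision is available, and the whole difficulty lies in making that decision from the input sentence alone.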

EBMT has been studied extensively as a major paradigm for machine translation for more than a decade [2]. At the same time, the literature is replete with work on translation divergence and its identification and resolution. However, work on these two aspects of machine translation has progressed somewhat independently. We have found no significant work on how divergence can be dealt with efficiently in an EBMT framework. The proposed work aims at bridging this gap. Since divergence is a language-dependent phenomenon, we have concentrated on a specific source and target language pair, English and Hindi.

Divergence in English to Hindi translation has been studied thoroughly in some of our earlier works ([5], [6], [7]). With respect to English to Hindi translation, seven different types of divergence have been identified: structural, categorial, conflational, demotional, pronominal, nominal and possessional. Of the seven types, possessional divergence is somewhat different in nature because, unlike the other six, its occurrence depends upon more than one Functional Tag of the sentence. The scheme in its present form cannot handle possessional divergence efficiently, so we exclude it from the present discussion. The algorithm proposed here, therefore, works with respect to the first six types of divergence. For convenience of presentation we denote them as d1, d2, d3, d4, d5 and d6, respectively.

Barring structural divergence (d1), the other five types of divergence (i.e. d2,...,d6) have been further classified into several sub-types, depending upon the variations in the role of different functional tags upon translation to Hindi. Appendix-A gives a brief description of the six divergence types mentioned above and their sub-types. It further provides the FT-features that a source language (English) sentence must have in order that a particular type/sub-type of divergence may occur. These features are necessary but not sufficient: a sentence having them will not necessarily produce a divergence upon translation. As a consequence, mere examination of the FTs of an input sentence cannot ascertain whether its translation will induce any divergence. Hence more evidence needs to be considered. In this work we describe this evidence and how it is to be used for making an a priori decision on whether the input English sentence will involve any divergence upon translation to Hindi.
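The role of the FT-features as a first filter can be illustrated with a small sketch. The feature names `FT_A`, `FT_B` and `FT_C` and the mapping below are purely hypothetical placeholders; the actual FT-features for each type and sub-type are those listed in Appendix-A, which is not reproduced here. The function only narrows down which divergence types remain possible for a sentence.

```python
# Hypothetical necessary FT-features per divergence type. These are
# placeholders for illustration only, not the features of Appendix-A.
REQUIRED_FTS = {
    "d1": {"FT_A"},
    "d2": {"FT_A", "FT_B"},
    "d3": {"FT_C"},
}


def candidate_divergences(sentence_fts):
    """Return the divergence types whose necessary FT-features are all present."""
    present = set(sentence_fts)
    return sorted(d for d, required in REQUIRED_FTS.items() if required <= present)


# A sentence with FT_A and FT_C cannot show d2 (FT_B is missing),
# but d1 and d3 remain possible and need further evidence.
print(candidate_divergences(["FT_A", "FT_C"]))
```

A type returned by this filter is only a candidate. Since the FT-features do not guarantee divergence, each candidate would still have to be confirmed or rejected using further evidence from the example base.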

This paper is organised as follows. Section 2 explains the different types of corpus-based evidence used by the proposed approach. Most of this evidence was formulated by analysing a parallel corpus of more than 4000 sentences collected from various sources, such as children's stories, translation books, advertisement materials and official letters. Section 3 explains how the different pieces of evidence are generated and combined to arrive at a final decision regarding an input. Section 4 provides illustrations of the scheme and experimental results.


This is only the beginning part of the article.




Deepa Gupta, B.A. (Hons), M.Sc. (Maths), Ph.D.
Department of Mathematics
Amrita School of Engineering
Amrita Vishwa Vidyapeetham University
Kasavanahalli, Bangalore - 560 035
Karnataka, India
deepag iitd@yahoo.com

 