LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 21:9 September 2021
ISSN 1930-2940

Editors:
         Sam Mohanlal, Ph.D.
         B. Mallikarjun, Ph.D.
         A. R. Fatihi, Ph.D.
         G. Baskaran, Ph.D.
         T. Deivasigamani, Ph.D.
         Pammi Pavan Kumar, Ph.D.
         Soibam Rebika Devi, M.Sc., Ph.D.

Managing Editor & Publisher: M. S. Thirumalai, Ph.D.

Celebrate India!
Unity in Diversity!!

  • E-mail your articles and book-length reports in Microsoft Word to languageinindiaUSA@gmail.com.
  • PLEASE READ THE GUIDELINES GIVEN ON THE HOME PAGE IMMEDIATELY AFTER THE LIST OF CONTENTS.
  • Your articles and book-length reports should be written following the APA, MLA, LSA, or IJDL Stylesheet.
  • The Editorial Board has the right to accept, reject, or suggest modifications to the articles submitted for publication, and to make suitable stylistic adjustments. High quality, academic integrity, ethics and morals are expected from the authors and discussants.

Copyright © 2021
M. S. Thirumalai

Publisher: M. S. Thirumalai, Ph.D.
11249 Oregon Circle
Bloomington, MN 55438
USA


Representing Structural Nuances of the Code-mixed/switched Data:
A Case Study of English-Bangla

Chaitali Chakraborty, M.A. in Linguistics


Abstract

This paper presents annotated data and the problems that arise in code-mixed or code-switched data, taking Bangla-English as the case. The goal of the paper is twofold: to work out the structure of lexical information with special reference to the linguistic phenomena of code-mixed or code-switched data, and to establish why such a structural representation is important. The paper also examines how the lexicon works when a systematic account of code-mixed data is presented.

Keywords: code-mixed/switched data, English-Bangla, computational linguistics, annotation, lexicon, parsing.

Introduction

Code-switched or code-mixed data is generally not regarded as ideal data for regularizing rules, for understanding the core grammar of a language, or for many theoretical and applied purposes. For a long time, linguists ignored such data on the assumption that it was unfit for describing a language's internal mechanisms. Recently, however, linguists have turned their attention to understanding the nature and grammar of code-switched or code-mixed data. Computational work has likewise only recently begun to treat such data as a natural occurrence and to attempt to decode it computationally. We have certainly developed an empirical understanding of code-switched or code-mixed data, and this has led to both theoretical and applied developments in recent times; what we lack, however, is an easy way forward. The problems posed by code-mixed/switched data are certainly not easy for researchers working in Natural Language Processing (NLP). Various methods, approaches, and applications decode code-switched or code-mixed data with an accuracy of 80% or more, but they are not free from problems and irregularities. The difficulty is not only that the same set of problems persists, but also that the data itself changes from day to day.

Earlier, moreover, exposure to such data was limited by the lack of means of collecting it. Instances of this kind of data could be found only in natural bilingual conversation, and obtaining ample data under such limited circumstances was not an easy task. With the recent worldwide surge in the use of social media platforms, data of this complex nature has become readily available.

The extended use of social media has made the data more complex, and it now includes trilingual data.

The problem exists at all levels of linguistic analysis, i.e., phonology, morphology, syntax, and semantics. In computational linguistics, data of such a varied nature gives rise to problems in language identification (difficulty in separating the phonological patterning of the languages), morphology (failure to identify inflectional grammatical morphemes or agreement), part-of-speech (POS) tagging (too little data in the system to check POS in two or three languages simultaneously), syntax (difficulty in choosing which syntax applies in bilingual or trilingual data), etc.
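The language-identification problem mentioned above can be illustrated with a minimal sketch of token-level language tagging. This is not the annotation scheme of the paper; the two toy lexicons, the label names (`bn`, `en`, `ambiguous`, `unk`), the example sentence, and the helper functions `tag_languages` and `switch_points` are all assumptions introduced here for illustration only.

```python
from typing import List, Tuple

# Hypothetical toy lexicons of romanized Bangla and English word forms.
# Illustrative only; a real system would need far larger resources.
BANGLA = {"ami", "kal", "jabo", "bhalo", "din"}
ENGLISH = {"office", "meeting", "the", "din"}


def tag_languages(tokens: List[str]) -> List[Tuple[str, str]]:
    """Label each token as 'bn', 'en', 'ambiguous', or 'unk' by lexicon lookup."""
    tagged = []
    for tok in tokens:
        key = tok.lower()
        in_bn = key in BANGLA
        in_en = key in ENGLISH
        if in_bn and in_en:
            label = "ambiguous"  # same romanized form exists in both lexicons
        elif in_bn:
            label = "bn"
        elif in_en:
            label = "en"
        else:
            label = "unk"  # not covered by either lexicon
        tagged.append((tok, label))
    return tagged


def switch_points(tagged: List[Tuple[str, str]]) -> List[int]:
    """Return indices where the language changes between clearly tagged tokens."""
    points = []
    last = None
    for i, (_, label) in enumerate(tagged):
        if label not in ("bn", "en"):
            continue  # ambiguous/unknown tokens do not count as a switch
        if last is not None and label != last:
            points.append(i)
        last = label
    return points


if __name__ == "__main__":
    sentence = "ami kal office jabo".split()
    tagged = tag_languages(sentence)
    print(tagged)
    print(switch_points(tagged))
```

Even in this toy setting, a form such as "din" (a Bangla word for "day" and an English word for "noise") cannot be resolved by lookup alone, which is the kind of ambiguity that makes language identification in code-mixed data difficult.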


This is only the beginning part of the article.


Chaitali Chakraborty
M.A. in Linguistics, Jadavpur University
Kolkata, India
write2chakraborty@gmail.com
