Indo-Aryan languages

Characteristics of the modern Indo-Aryan languages

Also known as: Indic languages

Written by George Cardona

Fact-checked by The Editors of Encyclopaedia Britannica

Last Updated: Apr 26, 2025

Key People: Rasmus Rask

Related Topics: Hindi language; Sanskrit language; Prakrit languages; Bengali language; Dhivehi language

The trends noted in Middle Indo-Aryan continue in New Indo-Aryan. The Middle Indo-Aryan vowel sequences ai and au were changed to single vowels during the development of New Indo-Aryan, final vowels were shortened and deleted, and ḍ and ḍh sounds between vowels were replaced by the sounds ṛ and ṛh. The noun cases were further reduced, and the introduction of nominal (noun) forms into the verb system became more pronounced.

Literary languages tend to become somewhat removed from the usual standard colloquial. Literary, or High, Hindi, for example, tends to replace some of the Perso-Arabic vocabulary with Sanskritic items, whereas literary Urdu makes great use of Perso-Arabic words. The gap is formalized in Bengali, in which a distinction is made between the highly Sanskritic literary language, Sadhu-Bhasa, and the colloquial standard, Calit-Bhasa.

Phonology

[Note: The forms of the words given below reflect actual pronunciation, rather than being transliterated versions of the standard orthographies. For New Indo-Aryan the symbols ə, pronounced as the a in English “sofa,” and a are used for the sounds earlier transcribed as a and ā, respectively; e.g., Gujarati karũ “I do” and māro “beat” are now written kərũ and maro. This practice permits certain contrasts to be made among sounds that are significant in the description of dialectal features. In Kashmiri words, a is short, opposed to ā.]

Vowels in sequence contracted in early New Indo-Aryan; e.g., Old Indo-Aryan aśīti became Middle Indo-Aryan asīi, Hindi and Punjabi əssī, and Bengali aši “80.” Further, ai and au sounds changed to e and o, and aũ to ũ, while iu developed into ī. The diphthongs ai and au were, however, retained well into the New Indo-Aryan period and are still pronounced in some areas; e.g., Braj Bhasha kərəũ “I do,” kərəi “he does.” Middle Indo-Aryan -ḍ- and -ḍh- developed into the flaps ṛ and ṛh; e.g., Prākrit sāḍiā “woman’s garment,” Kashmiri, Lahnda, Hindi, Gujarati, Bhojpuri, Bengali, Oriya saṛī “sari”; and Prākrit paḍh- “recite, read,” Sindhi pəṛh-əṇu, Lahnda pəṛh-əṇ, Hindi, Punjabi pəṛh-na, Gujarati pəṛh-vũ, Marathi pəṛh-ṇə “study.”

Stress is not generally contrastive in New Indo-Aryan as it is, for example, in English (e.g., noun “éxport,” verb “expórt”), though different areas have different rules for placing major emphasis on a given syllable. For example, in Hindi, in which vowel length is pertinent, gilá “swallowed” has major stress on the last syllable, gīla “wet,” on the first. In Gujarati, on the other hand, vowel length is not pertinent; the stress position depends on which vowels occur in contiguous syllables and on the structure of the syllables, whether open or closed; e.g., júno “old,” but dukán “store.” In Bengali each syllable of a word receives about equal stress.

The sounds that most clearly distinguish Indo-Aryan from the rest of Indo-European are the voiced aspirate stops (gh and the like, pronounced with an accompanying audible puff of breath) and the retroflexes (ṭ and so on, pronounced by curling the tongue upward toward the hard palate). In the outlying New Indo-Aryan areas, however, the sound system is reduced. Sinhalese has no aspirated stops, Assamese has no retroflexes, and Kashmiri has no voiced aspirates. The geographic position of these languages doubtless contributed to these losses: Sinhalese coexists with Tamil, Assamese is surrounded by Tibeto-Burman languages, and Kashmiri is on the border of the Iranian area.

New Indo-Aryan shows evidence of early dialect distribution; this is discernible by considering sound changes proper to each group. The eastern group (Assamese, Bengali, Oriya) has three important changes. Long and short i and u merged; e.g., Assamese nila, Oriya niḷɔ (ɔ is similar to the o of “coffee” in some English dialects), Bengali nil “blue-black” but Sanskrit nīla; Assamese dhuli, Bengali dhulo, Oriya dhuḷi “dust” but Hindi dhūl and Sanskrit dhūli. The vowel sound a of Middle Indo-Aryan was replaced by ɔ in Bengali and Oriya and ɒ (similar to the o of “hot” in southern British English) in Assamese in initial position and open syllables; e.g., Bengali mɔron, Oriya mɔrɔn, Assamese mɒrɒn “death”; Sindhi mərəno “mortal, death,” Sinhalese mərəṇə, Gujarati, Marathi mərəṇ (compare Sanskrit maraṇa-). Moreover, in this group a vowel is affected by the quality of the vowel in a following syllable. For example, in Bengali ami kori “I do,” the verb root has o followed by i in the next syllable, but tumi kɔro “you do” has an ɔ sound; similarly, ami kini “I buy” but tumi keno. As a result of vowel assimilation also, Assamese has an ɔ sound instead of ɒ representing Middle Indo-Aryan a: Assamese xɔhur, Bengali šošur “husband’s father” (compare Hindi səsur, Prākrit sasura-, Sanskrit śvaśura-).
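The Bengali vowel assimilation described above can be stated as a small rule. The following Python sketch is an illustration only: the underlying roots kɔr- and ken-, the function name attach_ending, and the raising table are assumptions chosen to reproduce the four Bengali forms cited in the text, and they are not offered as a full analysis of the language.

```python
# Hypothetical illustration of the Bengali pattern: a root vowel is raised
# one step when the next syllable contains the high vowel i.
# The raising table and the roots used below are assumptions.
RAISE = {"ɔ": "o", "e": "i"}
VOWELS = set("aeiouɔ")

def attach_ending(root: str, ending: str) -> str:
    """Join root and ending, raising the root vowel before a following i."""
    # Find the first vowel of the ending (the vowel of the next syllable).
    next_vowel = next((ch for ch in ending if ch in VOWELS), None)
    if next_vowel == "i":
        root = "".join(RAISE.get(ch, ch) for ch in root)
    return root + ending

for root in ("kɔr", "ken"):
    print(attach_ending(root, "i"), attach_ending(root, "o"))
# prints:
# kori kɔro
# kini keno
```

The rule looks only at the first vowel of the ending, which is enough for the 1st-person ending -i and the 2nd-person ending -o seen in the examples.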

Assamese and Bengali are set off from Oriya. In the former two, Middle Indo-Aryan ḍ and ḍh merge medially to ḍ (then ṛ) with a subsequent development to r in Assamese; e.g., Oriya daṛhi, Bengali daṛi, Assamese dari “beard”; Hindi, Gujarati daṛhī, Prākrit dāḍhiā. Assamese is also distinguished from Bengali by several developments, among them the merger of Assamese retroflex sounds with dental sounds; e.g., Assamese ut “camel” but Bengali uṭ, Oriya oṭɔ, Sindhi uṭhu, Lahnda, Pahari uṭṭh, and so on. Assamese also has s for earlier c and ch sounds and a z sound for j and jh; e.g., Assamese kas “glass,” Bengali kac; Assamese azi “today,” Oriya aji, Bengali, Hindi aj. In addition, Assamese replaced an s sound initially by x and between vowels by h—xɔhur.

Particular sound changes also characterize languages of the northwest. In this group, an older voiceless stop (e.g., t) became voiced (e.g., became d) after a nasal sound; in other areas, the voiceless stop is retained: Kashmiri dand, Punjabi dənd, Sindhi ḍəndu “tooth” (the ḍ in Sindhi is an imploded stop; see below) but Assamese, Bengali, Hindi, Gujarati, Marathi dãt, Sinhalese dətə (Sanskrit danta-). Moreover, in the northwest group a voiced stop (e.g., d) preceded by a nasal was assimilated to the latter, resulting in two nasals, which were subsequently reduced to one in some areas; in the rest of New Indo-Aryan, the vowel preceding the nasal was nasalized. Thus, Kashmiri don “churning stick,” Sindhi ḍənu “tribute,” Punjabi dənn “fine,” Lahnda ḍənn “force,” Kumauni dan “roof” contrast with Assamese dãr “pole,” Bengali dãṛ “oar,” Hindi dãḍ “oppression, fine,” and others; all forms derive from Old Indo-Aryan daṇḍa- “stick, staff, club, royal power, fine, punishment.”

In the sequence of a short vowel followed by two consonants, Pahari differs from the rest of the northwest group and agrees with the rest of New Indo-Aryan. In the northwest this sequence either remained unchanged or the cluster was simplified without lengthening of the vowel; other languages generally simplified the cluster and lengthened the vowel: Punjabi bhətt, Sindhi bhətu, Lahnda bhət, Kashmiri batɨ “cooked rice, food” but Nepali, Kumauni, Hindi, Assamese, Bengali, Gujarati, Marathi bhat.
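The three treatments of a short vowel before a doubled consonant can be compared side by side. The Python sketch below is a simplified illustration under stated assumptions: the function name treat_geminate and the mode labels are hypothetical, only doubled single-letter consonants are handled, and the table of short and long vowels follows this article's notation (ə short, a for earlier ā).

```python
# Short vowels and their long counterparts in this article's notation.
# Illustrative subset only; the table is an assumption, not a full inventory.
LENGTHEN = {"ə": "a", "i": "ī", "u": "ū"}
MODES = ("keep", "simplify", "lengthen")

def treat_geminate(word: str, mode: str) -> str:
    """Apply one treatment to each short vowel + doubled consonant.

    keep     : leave the cluster unchanged
    simplify : reduce the cluster, keep the vowel short
    lengthen : reduce the cluster and lengthen the vowel
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "keep":
        return word
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        doubled = (
            ch in LENGTHEN
            and i + 2 < len(word)
            and word[i + 1] == word[i + 2]
            and word[i + 1] not in LENGTHEN
        )
        if doubled:
            out.append(LENGTHEN[ch] if mode == "lengthen" else ch)
            out.append(word[i + 1])  # keep a single consonant
            i += 3
        else:
            out.append(ch)
            i += 1
    return "".join(out)

for mode in MODES:
    print(mode, treat_geminate("bhətt", mode))
# prints:
# keep bhətt
# simplify bhət
# lengthen bhat
```

Starting from the form bhətt, the three modes give the shapes cited in the text for Punjabi (bhətt), Lahnda (bhət), and Hindi and the other languages (bhat).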

Dardic occupies a special position. The sibilant sounds did not all merge here. For example, Kashmiri, a Dardic tongue, has šurah “16” with š rather than s, as in most other Indo-Aryan languages, and sat “7” with s. Further, voiced aspirated stops merged with unaspirated stops in Dardic; e.g., Kashmiri gur “horse” but Hindi ghoṛa; Kashmiri dɔd “milk” but Hindi dūdh.

One major feature distinguishing Sindhi from the rest of the northwest group is the development of a series of imploded stops (also called suction stops and recursive stops), for b, ḍ, j, and g. Implosive stops also occur in the Sindhi vicinity; for example, Kacchi has imploded b. Another feature that distinguishes Sindhi from other northwest languages, including Kacchi, is the retention of the Middle Indo-Aryan final short vowels; e.g., Sindhi əkhi “eye” but Hindi ãkh (Middle Indo-Aryan akkhi-).

Punjabi is distinguished from other members of the northwest group by its tonal system, having low (ˋ), mid (¯), and high (´) tones. Initial voiced aspirated stops of earlier Indo-Aryan appear in Punjabi as voiceless stops with low tone on the following vowel; e.g., Punjabi kòṛa “horse” but Hindi ghoṛa; Punjabi tàī “2 1/2” but Hindi ḍhaī. Non-initially, a voiced aspirate became unaspirated and the preceding vowel received high tone; thus, Punjabi dū́d “milk” but Hindi dūdh, and Punjabi láb “profit” but Hindi labh.
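The two Punjabi correspondences can be sketched as a rule over pre-segmented words. This is an illustrative Python sketch, not a description of how the changes actually proceeded: the function name punjabi_reflex is hypothetical, the two lookup tables contain only the stops attested in the examples above, and the input is assumed to be a list of segments in which a vowel stands next to the aspirate.

```python
import unicodedata

LOW_TONE = "\u0300"   # combining grave accent
HIGH_TONE = "\u0301"  # combining acute accent

def nfc(text: str) -> str:
    """Normalize so that precomposed and combining spellings compare equal."""
    return unicodedata.normalize("NFC", text)

# Partial tables, limited to the stops in the cited examples (assumption).
INITIAL = {nfc("gh"): "k", nfc("ḍh"): "t"}      # kòṛa, tàī
NON_INITIAL = {nfc("dh"): "d", nfc("bh"): "b"}  # dū́d, láb

def punjabi_reflex(segments):
    """Map a segmented Hindi-like form to the Punjabi shape given in the text."""
    segs = [nfc(s) for s in segments]
    # Initial voiced aspirate: voiceless stop, low tone on the next segment.
    if len(segs) > 1 and segs[0] in INITIAL:
        segs[0] = INITIAL[segs[0]]
        segs[1] = nfc(segs[1] + LOW_TONE)
    # Non-initial voiced aspirate: plain stop, high tone on the previous segment.
    for i in range(1, len(segs)):
        if segs[i] in NON_INITIAL:
            segs[i] = NON_INITIAL[segs[i]]
            segs[i - 1] = nfc(segs[i - 1] + HIGH_TONE)
    return nfc("".join(segs))

for word in (["gh", "o", "ṛ", "a"], ["ḍh", "a", "ī"],
             ["d", "ū", "dh"], ["l", "a", "bh"]):
    print("".join(word), "->", punjabi_reflex(word))
```

Segments are passed as a list because aspirates such as gh are written with two letters, and Unicode normalization is applied because the tone marks are combining characters.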

Gujarati, Marathi, and Konkani in the west and southwest differ from the languages of the midlands in that, as in the east, there is no contrast between long and short i and u vowels. The i of Gujarati and Marathi vis “20” is pronounced like the ee of English “teeth,” the i of Gujarati iccha and Marathi iččha “wish” like the i of “pitch,” but such a difference is not contrastive, as it is in Hindi (gīla “wet”: gila “swallowed”). Gujarati has certain features that, in turn, set it apart from the other languages of this group. In addition to e and o sounds, it has the open vowels ɛ, ɔ; e.g., cɔthũ “fourth” (Middle Indo-Aryan cauttha), bɛs-vũ “to sit” (Middle Indo-Aryan baisai “sits”). Moreover, Gujarati has murmured vowels, generally developed from vowels followed by h; e.g., kɛh che “says” (h represents murmuring of the vowel), Old Gujarati kahai chai. Marathi and Konkani have two series of affricate sounds; e.g., č (pronounced as the ch in English “chat”; the equivalent of c in some other languages) and c (pronounced as the ts of “rats”).

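There was clearly mutual influence among Indo-Aryan languages at an early time, together with movement of groups of speakers (compare the position of Pahari). Thus, while Punjabi səcc “true” is the expected form comparable to Middle Indo-Aryan sacca- (Old Indo-Aryan satya-), Hindi səc “true” does not represent the expected outcome: Hindi normally simplifies such a consonant cluster and lengthens the preceding vowel (compare Hindi bhat “cooked rice” beside Punjabi bhətt), whereas səc keeps the short vowel. The item səc must therefore come from the Punjabi area.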

Grammar

Like Middle Indo-Aryan, New Indo-Aryan distinguishes only two numbers—singular and plural. Unlike Middle Indo-Aryan, the New Indo-Aryan languages differ in the degree to which gender distinctions are made. Three genders are retained in the west and southwest (Gujarati, Marathi, Konkani), and this is true also of Sinhalese. Unlike Gujarati, Marathi, and Konkani, in which every noun, whether it denotes an animate being or not, has a particular gender that is unpredictable, Sinhalese restricts masculine and feminine gender to animates and neuter to inanimates. The eastern group (Assamese, Bengali, Oriya) has no grammatical gender distinctions, and two genders are distinguished elsewhere.

Over a large area of New Indo-Aryan the noun has only two cases: direct and oblique. A lack of distinction between direct and oblique cases in the plural is typical of several languages, including forms in Hindi, Gujarati, Marathi, and Bhojpuri. Direct forms are used independently, oblique forms before postpositions (words or word elements following a noun that function similarly to English prepositions) and other affixes; the combination of stem and postposition serves the function of inflected case forms of earlier Indo-Aryan. Thus, to denote an object (direct or indirect) Hindi uses the postposition ko, which occurs in direct object constructions normally only with nouns denoting animate beings; e.g., ləṛke-ko dekh-ta hɛ “He sees the boy,” ləṛke-ko miṭhaī do “Give a sweet to the boy.” Other postpositions are mẽ “in,” pər “on,” se “from, with, by means of.” A large group of postpositions is linked to the noun with the affix ka (oblique form ke, feminine kī), which also is used to form adjectives (possessives); e.g., ləṛke-ke sath gəya “He went with the boy,” ləṛke-ke pas hɛ “The boy has it” (literally, “It is by the boy”). Many such postpositions represent old nominal (noun) forms. Other New Indo-Aryan languages have systems similar to that of Hindi, though the forms of the postpositions differ.

Though the nominal (noun) system of Punjabi is very close to that of Hindi, it has separate ablative (indicating separation and source) and locative (indicating place) forms in the singular and plural, respectively, for nouns such as koṭha “house”; e.g., koṭhiõ “from the house,” koṭhĩ “in the houses.” Some languages have a fuller case system than that noted above; e.g., Bengali has a genitive singular ending, a genitive plural ending, and a locative case. Similarly, Kashmiri has nominative, dative, ablative, and agentive cases. Not all such case forms are inherited from Middle Indo-Aryan. In addition to case endings, these languages also use postpositions; e.g., Kashmiri garājas-andar “in the garage,” with -andar after the dative ending -as.

Adjectives behave generally in the same way as nouns but have a syntactic restriction. In Hindi the possessive is in the oblique (non-nominative) form, as is the noun after which it occurs; but in the plural, only the noun has the oblique form. Further, the formation of comparatives and superlatives with derivative affixes has been eliminated. To a Sanskrit sentence such as ime amū-bhyaḥ āḍhya-tarāḥ “These (people) are richer than those,” in which the comparative āḍhya-tara occurs construed with the ablative form, corresponds a Hindi sentence ye un-se əmīr hɛ̃, in which no comparative affix is used—literally, “These are rich from (i.e., in comparison with) those.” Comparable constructions with a postposition meaning “from” occur elsewhere in New Indo-Aryan.

The pronominal system of New Indo-Aryan formally resembles the Middle Indo-Aryan stage more than its noun system. For example, Gujarati hũ “I,” mɛ̃ “I” (agentive), əme “we” (also agentive) are directly comparable to Apabhraṃśa haũ, maĩ, amhaĩ. The number distinctions of the Middle Indo-Aryan pronoun have been replaced, however, by distinctions of familiarity and politeness. For example, Hindi and Bengali have a three-way distinction: Hindi ap, Bengali apni “you” are polite or honorific forms; Hindi tum, Bengali tumi are informal forms; and Hindi tū, Bengali tui are used only for inferiors and small children. (Hindi and Bengali differ, however, in the plural forms of these.) In Gujarati, on the other hand, tũ is a very familiar pronoun, whereas təme is used generally, covering the approximate domains of Hindi ap and tum; ap, if used, strikes the hearer as fawning. Marathi has a similar system. Southwestern languages also make a distinction in the 1st person plural between inclusive and exclusive, the exclusive excluding the person spoken to. In the form of the relative pronoun and the 3rd person pronoun, languages differ in the degree to which gender distinctions are made, thus contrasting with Old and Middle Indo-Aryan, in which these forms had three genders. For example, Marathi has masculine, feminine, and neuter for the relative pronoun, while Bengali has animate and inanimate.

New Indo-Aryan languages differ in the degree to which finite verb forms have been replaced by nominal (noun) forms. In Bengali a contrast is made between continuous or actual present (English “be . . . -ing”) and non-continuous or habitual present; e.g., ami kaj kor-i “I work” (literally, “I do work”), with the ending -i, contrasts with ami kaj kor-ch-i “I am working,” in which ch intervenes between the root and the ending. Hindi has a similar contrast but uses nominal forms; e.g., mɛ̃ kam kər-ta hũ “I work,” mɛ̃ kam kər rəh-a hũ “I am working.” Both contain the finite form hũ of the auxiliary; but kər-ta and rəh-a are nominal forms, the latter the past of rəh- “stay.” Gujarati has both types, the present tense using finite verb forms, the imperfect employing nominal forms; e.g., hũ kam kərũ chũ “I work, am working” and hũ kam kər-to hə-to “I was working, used to work.” Even in areas in which finite forms are not used in the present, they occur in the imperative forms and what may be called the subjunctive; e.g., Hindi tum kam kər-o “work,” mɛ̃ əndər aũ “May I come in?”

The person–number system of the New Indo-Aryan verb accords with the use of pronouns. For example, the forms ja-o, kər-o in Gujarati təme kyã jao cho “Where are you going?” and šũ kəro cho “What are you doing?” are historically plurals but are used with reference to one person addressed by the pronoun təme. Similarly, in Hindi, in which a person distinction is not made in the plural, ap kəhã ja rəhe hɛ̃, ap kya kər rəhe hɛ̃, equivalent in meaning to the Gujarati sentences, have the plural form rəhe hɛ̃. Bengali has completely given up any number distinction in verb forms: ami/amra kori “I/we do.” In the 3rd person a distinction is made between ordinary and honorific: še (ordinary)/tini kɔren, plural tara/tãra kɔren. Other languages (e.g., Hindi) also have honorific forms, for which the plural is used.

In the formation of the future there are again regional differences. Some languages retain the future in -s- (Gujarati hũ kər-iš, 3rd person e kər-š-e) or -h- (e.g., eastern dialects of Braj Bhasha, cəlihəõ “I will go”). Characteristic of the eastern languages and of Bihari (including Bhojpuri, Magahi, Maithili) is the suffix -b-; e.g., Bengali jabe “will go.” All of these are finite forms. On the other hand, in Hindi and adjoining areas, the future is inflected for gender.

A similar contrast between the use of verbal and nominally inflected forms also appears in the past tense forms. The predominant pattern in New Indo-Aryan is that of Middle Indo-Aryan: forms are used that are etymologically participles.

The New Indo-Aryan languages retain the passive and causative forms. The causative is conservative in retaining both the affixes that appear in Middle Indo-Aryan and vowel alternation. The passive is also formed by affixation in some areas. But many languages also have a compound formation involving the verb ja “go” and an auxiliary (hɛ̃); e.g., Hindi yahã hindī bol-ī ja-t-ī hɛ̃ “Hindi is spoken here.”

There are other auxiliaries, which, like hɛ̃, can occur with any verb in the language; e.g., the verb “can,” Hindi sək-, Gujarati šək. A characteristic feature of New Indo-Aryan, however, is the use of certain verbs, variously called vector verbs or compound verbs, in restricted contexts and with particular semantics. For example, one can say mər gə-ya “He died,” bhūl gə-ya “He forgot,” bol uṭh-a “He blurted out” in Hindi, using the verbs ja “go” (masculine singular past gə-ya), uṭh “stand up.” This phenomenon is pan-Indo-Aryan and still requires investigation.

The examples cited above also illustrate the normal word order in New Indo-Aryan languages: subject (including agential forms), object (with attributive adjectives preceding), verb (together with auxiliaries). Adverbials can precede the full sentence or occur after the subject, with slight differences in emphasis; e.g., Hindi mɛ̃ kəl aũga, or kəl mɛ̃ aũga “I will come tomorrow (kəl).” Relative clauses normally precede correlatives: Hindi jo admī kəl tumhare ghər-mẽ tha vo kɔn hɛ “Who (kɔn) is the man (admī) who (jo) was in your house yesterday?” A notable exception to the normal final position for verbs occurs in Kashmiri, in which the verb usually occurs in second position after the subject; thus, to Hindi vo kha rəha hɛ “he is eating” corresponds Kashmiri su chu kh́avān with the auxiliary chu after the subject.

Vocabulary

The two most important sources of non-Indo-Aryan vocabulary in New Indo-Aryan are Persian (including Arabic items introduced through Persian), the court language of the Mughals, and English. The Perso-Arabic vocabulary permeates every aspect of New Indo-Aryan vocabulary, especially in the midlands (Uttar Pradesh through the Punjab). There are, of course, Hindi-Urdu words proper to Islām: Hindi kuran “Qurʾān,” ʿīd (name of a holy day), nəmaz (certain prayers), məsjid “mosque,” as well as the word for “religion,” məźhəb. In addition, there are numerous Perso-Arabic military and administrative terms (kila “fort,” səvar “horseman,” ədalət “court of justice”); architectural and geographic terms (imarət “building,” məkan “house,” məhəl “palace,” duniya “world,” ilaka “province”); words having to do with learning and writing (kələm “pen,” kitab “book,” ədəb “literature, good manners”) and with apparel (jeb “pocket,” moja “socks,” rumal “handkerchief”) and anatomy (khūn “blood,” gərdən “neck,” dil “heart,” bazu “arm,” sər “head”). Indeed some of the most common vocabulary is of this origin: tārīkh “date,” vəkt “time,” sal “year,” həfta “week,” umər “age,” admī “man,” ɔrət “woman,” and others. Even the grammatical apparatus of postpositions and conjunctions reflects Perso-Arabic influence; e.g., -ke bad “after,” əgər “if,” məgər “but,” ya “or.”

The colloquial language used by any Hindu or Muslim communicating in Hindi-Urdu will contain a large number of such words. There have been efforts to polarize the two, and at times champions of Indo-Aryan have tried to replace Perso-Arabic vocabulary with Sanskritic words. The style that tends toward eliminating all but the most common Perso-Arabic words may be called High Hindi, written in the Devanāgarī script, as opposed to High Urdu, which retains Perso-Arabic of long standing, uses Persian and Arabic for learned vocabulary, and is written in the Perso-Arabic script.

The influence of English as a source of borrowing still continues, and it is rare to hear a conversation on any technical subject among speakers of any Indian language in which English words are not liberally used. Among loanwords from English are names of conveyances such as Hindi rel-gaṛi “railroad-train” and ṭɛksī “taxi”; profession names such as injinīr “engineer,” jəj “judge,” ḍaktər “Western doctor,” pulis “police”; and terms of educational administration such as kaləj “college” and yunivərsiṭī “university.” English words are susceptible to replacement in India by Sanskritic ones as are those of Perso-Arabic origin.

Of much lesser magnitude are New Indo-Aryan borrowings from other languages, among them Portuguese and Turkic. From the latter, the word urdū came to be used as the name of a language. From Portuguese come such Hindi words as ənənnas “pineapple,” paũ “(Western style) bread,” kəmīz “(Western) shirt,” kəmra “room,” and girja “(Christian) church.”

Writing systems

Ancient India had two main scripts in which Indo-Aryan languages were written. Kharoṣṭhī, used in the northwest, is of Aramaic origin and is written from right to left; Brāhmī, of North Semitic origin, is written from left to right and appears earliest on Aśokan inscriptions in areas other than the northwest. Most scripts of New Indo-Aryan are developments of the Brāhmī. The Devanāgarī (or simply Nāgarī), used for writing Sanskrit documents in North India, is the script of Hindī and Marāṭhī as well as Nepālī. Gujarātī uses a more cursive derivative. Devanāgarī also is used, mainly among Hindus, for Kashmiri, which has, in addition, a traditional script called Sarada; this is no longer in common use, and the Perso-Arabic script is used instead. Also usually written in Perso-Arabic are Urdū and Sindhī (for which the Devanāgarī also is used in schools in India). Punjabi is written in the Perso-Arabic script in Pakistan and also has a particular script of its own, known as Gurmukhi (“From the Teacher’s Mouth”), used in the sacred writings of the Sikhs. In the east, the scripts used for Bengali and Assamese are closely related; and that of Oṛiyā, related to the other two, is highly cursive like that of neighbouring Dravidian languages. Such is also the case with Sinhalese.

The traditional alphabets are both over-explicit and not clear enough with regard to accurate representation of the spoken word. As systems in which a consonant symbol with no other accessory symbol accompanying it stands for the syllable consisting of the consonant followed by short a, they require previous knowledge of items for correct interpretation; Hindī kərta is written ka-ra-tā in the Devanāgarī, and, to pronounce it properly, one must know that the word has only two syllables. Although Bengali has only the spirant sound š, the alphabet has symbols for ś, ṣ, and s, as in Old Indo-Aryan; but verb forms such as kori and kɔren are written ka-ri and ka-re-na, both with the same initial symbol. And, though syllabic ṛ was lost as early as Middle Indo-Aryan, the scripts have a separate symbol for this. Script reform has been suggested; it has even been proposed that all Indo-Aryan languages adopt a Latin (roman) alphabet with diacritics, but chances for this are poor. (See also alphabet.)
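The point about the unwritten short a can be illustrated with the Devanāgarī spelling of Hindī kərta. The Python sketch below is a minimal illustration: the function name spell_out and the two small lookup tables are assumptions covering only the letters of this one word, although the Unicode code points themselves are standard. It reads each consonant letter with an inherent a unless a vowel sign follows.

```python
# Tiny letter tables covering only the word in the example (assumption).
CONSONANTS = {"\u0915": "k", "\u0930": "r", "\u0924": "t"}  # क र त
VOWEL_SIGNS = {"\u093e": "ā"}                               # ा

def spell_out(word: str) -> str:
    """Spell a Devanagari word letter by letter, with inherent short a."""
    syllables = []
    for ch in word:
        if ch in CONSONANTS:
            # A bare consonant letter stands for consonant + short a.
            syllables.append(CONSONANTS[ch] + "a")
        elif ch in VOWEL_SIGNS and syllables:
            # A vowel sign replaces the inherent a of the preceding letter.
            syllables[-1] = syllables[-1][:-1] + VOWEL_SIGNS[ch]
    return "-".join(syllables)

print(spell_out("\u0915\u0930\u0924\u093e"))  # करता
# prints: ka-ra-tā
```

The spelling yields three written syllables, whereas the spoken word kərta has two; nothing in the script marks that the a of the second letter is not pronounced.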

George Cardona