Definiteness and Information-status in Hindi

Jason Baldridge
University of Pennsylvania
Department of Linguistics
December 1996


Hindi is one of many languages which lack a definite article. Although Hindi does have an indefinite article, its use is much more restricted than that of its English counterpart. As in many other languages, Hindi's indefinite article, ek, is the same as the word for one. Generally, ek may be used as an article only where its meaning is equivalent to 'a certain'. Demonstratives may be used in cases where the speaker wishes to stress the identification of a referent; however, most nouns lack determiners, and their information-status can generally be decided only from the context.

Cross-linguistic evidence suggests that Prince's (1981) taxonomy of Assumed Familiarity is a more useful way of conceptualizing the information-status of discourse entities than article-driven, definite vs. indefinite approaches. English articles do not necessarily specify a particular type of Assumed Familiarity in and of themselves; that is, the linguistic form does not dictate the information-status (Prince 1992). However, ek's function is to introduce a Discourse-new Unanchored entity. Even though nouns in Hindi are generally left underspecified in regard to grammatical definiteness, the existence of an article such as ek indicates that other languages may have articles which specifically indicate other types of Assumed Familiarity (such as Evoked). Such articles serve to remove or reduce non-determinism in the hearer's process of ascertaining the information-status of a discourse entity. These observations offer insight into how Assumed Familiarity might be modeled and also how articles would fit into such a model.

The number of determiners and the forms they may take vary widely from language to language. Both English and Spanish have indefinite articles and definite articles; however, where English has only one form, the, for the definite article, Spanish has four (el, los, la, and las), each corresponding to a particular gender and number configuration. In contrast to both, Hindi, Korean, Russian, and many other languages lack a definite article, an indefinite article, or both. Even so, such languages still manage to convey information and handle reference as well as languages which have a richer set of articles. This leads to a question: given that they are not essential, what motivates the existence of articles?

One might argue that redundant or unnecessary constructions abound in language and so we should not be surprised to find that articles exist within that class of constructions. Indeed, they have been referred to in the past as "useless ballast" and "old rubbish" (see Chesterman 1991, p.4). However, before making such claims, we must consider the concept of information-status. The information-status of a given noun is the speaker's specification to his/her audience as to where the entity which that noun refers to originated: directly from the speaker's world model; from the current situation; from what is thought to be mutually known; from the discourse itself; or from what may be inferred about something in the discourse. Prince (1981) classifies information-status according to her taxonomy of Assumed Familiarity: Brand-new (anchored and unanchored), Unused, Inferrable (containing and non-containing), and Evoked (textually and situationally). This proves to be a much more useful way of thinking about discourse entities than in terms of definiteness and indefiniteness, which can often lead to confusion with grammatical definiteness (Prince 1992).

The usefulness of articles becomes more apparent when we look at how they affect the information-status of the discourse entities they introduce. Determiners, including the definite and indefinite articles, have the specific ability to signal a set of possible information-statuses for the nouns they modify. In this respect, they serve to remove or reduce non-determinism in the hearer's process of tracking the information-status of a given discourse entity. In a language devoid of articles, information-status would necessarily be conveyed through other determiners and through the flow of the discourse. Looking at different languages to see how they utilize determiners should prove useful in understanding the linguistic variation we find with articles.

Hindi provides an excellent contrast with English. The greatest difference is that it lacks a definite article. Where English uses the definite article, Hindi generally uses the zero article (i.e., no article), and in fewer cases, the demonstratives yeh 'this', voh 'that', and ve 'those'. And, while Hindi does have an indefinite article, ek, its distribution and applicability differ from those of a/an. Ek is also the Hindi word for one, a relationship that is not uncommon across languages. Interestingly, while Old English had a fully inflected definite article (Baugh and Cable 1993), it seems that a:n, the linguistic progenitor of both a/an and one, had only limited use as an indefinite article. A quick survey of some Old English texts seems to indicate that a:n as an article had usage patterns very similar to those of ek in Hindi.

When native speakers of languages which lack one or more articles speak English, they often find proper usage of English articles quite difficult, even when their fluency is quite high. The converse is true as well: Kellog (1972) notes, "It should be observed, that most Europeans use ek for the indefinite article much too freely. In the majority of cases, it should not be translated into Hindi." People are accustomed to conveying information-status in particular ways, and these techniques seem to carry over into second languages. Indian English, which is spoken as a second language by the majority of its speakers, provides an interesting case in which many of the techniques for conveying information-status are drawn from Indian languages. The following excerpt is from a discussion about Indian English with three female Gujarati students (Baldridge 1995). The speaker is discussing a relative and one of his first experiences in Chicago.

  1. He was traveling by the train, and he was sitting on the seat
  2. somewhere over there. And one black lady came up to him and ask him,
  3. "What's up?" And in Gujarati what's up means like upar shu'n chhe ('what
  4. is above you'). So he made it into Gujarati, so he said, "The sky." So
  5. that lady gave him ten cents and went away.

Note that Gujarati is closely related to Hindi; in particular, the indefinite article is also ek in Gujarati. This passage, though in English, actually provides a sort of window into some of the ways that information-status is framed in Hindi. In line 1, both uses of the definite article seem to be examples of overcompensation: in Hindi, both 'train' and 'seat' would in most cases have received the zero article, which normally corresponds to a definite reference or a non-specific indefinite reference. Line 2 shows a usage of one which exactly parallels that of ek in Hindi. Finally, that in line 5 is equivalent to the use of voh 'that' as a determiner which indicates an Evoked entity.

To make progress toward answering the question first posed in this paper, I sorted 281 Hindi noun phrases according to the determiner which introduces them and then tallied the information-statuses found in each determiner class. I also looked at specificity and grammatical position. The sources of my Hindi data are two small Hindi instruction booklets which contain many short stories in Hindi. The determiners I looked at were the articles ek and zero, the demonstratives yeh/is 'this' and voh/us 'that', and the adjectives koi/kisii 'any, some' and kai 'many'. (I am using "/" to separate two inflectional forms of the same word.) None of the determiners showed a specific preference for subject, object, or postpositional object position. However, they have a very telling distribution in how they specify information-status:

[Chart: information-status by determiner. Surviving column labels: yeh (is); voh (us); koi (kisii) 'any, some'. Surviving row labels: Brand-new (unanchored); Brand-new (anchored); Inferrable (non-containing); Inferrable (containing); Situationally Evoked.]

As can be seen from this chart, ek specifies only Brand-new Unanchored entities, whereas the zero article can specify any type of Assumed Familiarity. Here we begin to see the answer to the question posed earlier: articles turn what is essentially a non-deterministic process of retrieving information-statuses into a deterministic, or at least less non-deterministic, process. To ascertain the information-status of a determiner-less noun, one must utilize the context and certain grammatical features of the noun (is it mass or generic? singular or plural? specific or non-specific?). However, given an article such as ek, no such judgments are necessary. It would be interesting to see how other languages handle information-status, and in particular whether any languages have determiners which single out some other category (e.g., Evoked). The other determiners (yeh/is 'this', voh/us 'that', koi/kisii 'any, some', and kai 'many') also appear to limit non-determinism in ways quite similar to their English counterparts (excluding indefinite this).

It can now be seen that articles (and other determiners) fulfill a useful, albeit not absolutely necessary, function. This situation invites us to view the determination of information-status as a sort of computational search. With a total lack of determiners, we might find ourselves entirely dependent on the context, which might cause considerable back-tracking in our search. However, with the presence of various determiners, we can more directly infer a particular information-status or a reduced set of information-statuses for a given discourse entity.
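The search metaphor can be made concrete with a small sketch in Python. It is illustrative only: the status labels follow Prince's taxonomy, the entries for ek and the zero article follow the findings reported above, and the entry for voh/us is a simplifying assumption based on its use with Evoked entities. The names STATUSES, CANDIDATES, candidate_statuses, and is_deterministic are hypothetical and introduced only for this example.

```python
# Prince's (1981) categories of Assumed Familiarity.
STATUSES = frozenset({
    "Brand-new Unanchored",
    "Brand-new Anchored",
    "Unused",
    "Inferrable Non-containing",
    "Inferrable Containing",
    "Textually Evoked",
    "Situationally Evoked",
})

# Determiner -> set of information-statuses it may signal.
CANDIDATES = {
    # Reported in this paper: ek introduces only Brand-new Unanchored entities.
    "ek": frozenset({"Brand-new Unanchored"}),
    # Reported in this paper: the zero article is compatible with any status.
    "zero": STATUSES,
    # Assumption for illustration: voh/us is treated as limited to Evoked entities.
    "voh": frozenset({"Textually Evoked", "Situationally Evoked"}),
}


def candidate_statuses(determiner):
    """Return the statuses the hearer must still choose among.

    A determiner not listed in CANDIDATES leaves the full set open.
    """
    return CANDIDATES.get(determiner, STATUSES)


def is_deterministic(determiner):
    """True when the determiner alone settles the information-status."""
    return len(candidate_statuses(determiner)) == 1
```

Under these assumptions, ek leaves a single candidate, so no contextual search is needed, while the zero article leaves all seven categories open and the hearer must fall back on context and the grammatical features of the noun.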

This viewpoint calls into question approaches such as those taken by Hawkins (1978) and Chesterman (1991), which attempt to characterize conceptual definiteness with respect to grammatical definiteness. Hawkins simply stops too soon. He analyzes the articles and lists the pragmatic contexts in which they may be felicitously used, but he fails to move this into a more general framework for conceptual definiteness. Chesterman correctly observes that we cannot regard definiteness as a binary concept and that we need a more scalar notion of definiteness. However, his proposal again comes down to a characterization of articles, which, though innovative and interesting, is not fully adequate. The main point to be taken here is that while articles may be useful for characterizing and refining our notion of information-status, the end result should not be framed in terms of articles.

What is needed is a precise model for information-status which is independent of language-specific phenomena. The taxonomy of Assumed Familiarity is an excellent move toward defining the categories which would be important to this model; its cross-linguistic applicability is a testament to this. As I envision it, the model would define transitions for the possible ways in which discourse entities may originate and be used in a discourse. Then, for any particular language, the allowable transitions which enter or re-enter an entity into the discourse could be specified, in the manner of finite-state automata, according to which determiners signal each transition (e.g., ek allowing an entity to move from the speaker's world representation into the discourse). There are, of course, issues such as accommodation for infelicitous use of articles which would need to be accounted for in the model, but the process of defining and refining a model of information-status should further our understanding considerably.
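One way such a transition table might be written down is sketched below in Python. This is a minimal illustration under stated assumptions, not a worked-out model: only the ek transition is taken from the example in the text, the voh entry and the origin labels "speaker_model" and "prior_discourse" are hypothetical, and accommodation of infelicitous uses is not handled.

```python
# Language-specific transition table:
# (origin of the entity, determiner) -> information-status on entering the discourse.
HINDI_TRANSITIONS = {
    # From the text: ek moves an entity from the speaker's world
    # representation into the discourse.
    ("speaker_model", "ek"): "Brand-new Unanchored",
    # Assumption for illustration only: voh re-enters an entity that was
    # already mentioned in the discourse.
    ("prior_discourse", "voh"): "Textually Evoked",
}


def enter_discourse(origin, determiner, transitions=HINDI_TRANSITIONS):
    """Return the status licensed by the transition, or None.

    None means the determiner does not signal a transition from this
    origin, so the status would have to be resolved from context.
    """
    return transitions.get((origin, determiner))
```

In this sketch, enter_discourse("speaker_model", "ek") yields "Brand-new Unanchored", while a pair with no entry, such as ("speaker_model", "zero"), yields None, which corresponds to the case where the hearer must rely on context. A table for another language could be passed through the transitions parameter without changing the function.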


