Thursday, 26 January 2012

Donkey Anaphora and the King(s) of France

An end-of-the-semester gift from one of my semantics students:


A t-shirt for a (as yet fictitious?) band. Started as an in-class joke which arose from the juxtaposition of two topics:

(1) presupposition failure in sentences like "The king of France is bald", and
(2) issues involving the binding of pronouns in sentences like "Every farmer who owns a donkey_i beats it_i."

Tuesday, 20 September 2011

Lizards, Walls, Dragons: on an apparently undocumented Nepali lexeme (भित्ति)

I have not posted in some time due to dissertating, searching for (and thankfully finding) a job, and subsequently moving. Here's a short posting on a Nepali word which I heard from my wife but can't find in any Nepali dictionary.

When we moved into our new house, we discovered that there were a number of house-lizards already resident (and, less amusingly, quite a few German roaches), which our cat has really enjoyed hunting down. I remembered having such lizards in our house in India, and as soon as I saw them I remarked to my wife "देखो! छिपकली है!" ("Look! There's a lizard!"), using the Hindi word for "lizard", छिपकली [chipkalī]. My wife replied, "in Nepali we call them 'bhitti' (भित्ति)."


I'd never heard this word before, and was curious. I checked Turner's A comparative and etymological dictionary of the Nepali language as well as his mammoth four-volume A comparative Dictionary of the Indo-Aryan languages. Neither mentions bhitti or anything like it. I also checked a number of Hindi dictionaries, none of which turned up anything, except for Platts' A dictionary of Urdu, classical Hindi, and English, which has an entry for भित्तिका bhittikā:
S بهتکا भित्तिका bhittikā, s.f. Wall (=bhīt, q.v.); small house lizard.
This isn't quite bhitti, but it's close. I had already supposed (and my wife had already suggested) that bhitti was connected with the word for "wall" (in Nepali, भित्तो bhitto or भित्ता bhittā), given that they're often found on walls. So bhitti is something like "wall-(related) creature". [Turner does have an entry for bhitti, but he gives the meaning "wall".] 

Platts' entry indicates a Sanskrit origin, and indeed bhittikā looks awfully Sanskritic, with the "diminutive" -(i)ka suffix, which is not always truly diminutive, since it can also attach to words with no change in meaning. But here perhaps a diminutive based on "wall" makes sense.

Interestingly, the Sanskrit word for "wall, panel, partition", bhittí, comes from a root √bhid- "to split", which is very dear to my heart (part of the Proto-Indo-European dragon mythology).

So there's a "new" Nepali word:  bhitti "house-lizard", which doesn't seem to have been recorded before. It may be dialectal (i.e. I'm not sure that Kathmandu Nepali speakers would use it), and that's perhaps why it wasn't previously recorded. In any case, I think it's a cool word, given that it does sort of connect lizards and dragons, indirectly.

[Incidentally, Platts suggests that Hindi छिपकली [chipkalī] derives from the root chip- "to hide", which is what I always assumed (going back to an early Indo-Aryan *chapp- "press, cover, hide"). Turner, on the other hand, derives it from Sanskrit शेप्या śepyā which means "tail" (and "penis", but I think "tail" is what is relevant here). The (potential) Nepali cognate of Hindi छिपकली [chipkalī] is छेपारो chepāro, though the latter might be more plausibly derived from Sanskrit शेप्या śepyā "tail", especially as छेपारो chepāro seems to refer to outdoor lizards (while माङ्सुलि māṅsuli is used for house lizards).]

Friday, 8 July 2011

Some ponderings on Google's research on inter-language linking (Bengali <-> Swahili, Nepali <-> Marathi)

On the Google Research Blog, the latest post concerns inter-language linking, i.e. looking at webpages' off-site links which go to a page in another language. From the post:
Most web pages link to other pages on the same web site, and the few off-site links they have are almost always to other pages in the same language. It's as if each language has its own web which is loosely linked to the webs of other languages. However, there are a small but significant number of off-site links between languages. These give tantalizing hints of the world beyond the virtual.
I'm particularly interested in the data on Indian language webpages' inter-language linking, especially as there are some perplexing findings. But let's start with some findings which aren't really that surprising.

One of the features measured is the degree to which webpages in a particular language are "introverted" or "extroverted", where more "introverted" webpage languages have fewer inter-language off-site links. The data are summarised here:

Webpage languages which are higher (on the y-axis) are more introverted; webpage languages which are further to the right (on the x-axis) represent languages with a greater number of total webpages.
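
To make the notion a bit more concrete, here is a minimal sketch of how an "introversion" score of this kind could be computed. This is my own guess at a plausible measure (the share of a language's off-site links which stay within that language), not the formula the Google researchers actually used, and the link counts below are invented purely for illustration.

```python
from collections import defaultdict

def introversion(links):
    """Given (source_language, target_language) pairs, one per off-site link,
    return for each source language the fraction of its off-site links
    that point to pages in the same language (an assumed measure)."""
    total = defaultdict(int)
    same = defaultdict(int)
    for src, dst in links:
        total[src] += 1
        if src == dst:
            same[src] += 1
    return {lang: same[lang] / total[lang] for lang in total}

# Invented toy data: 10 off-site links from Hindi pages, 10 from English pages.
toy_links = (
    [("hi", "hi")] * 9 + [("hi", "en")]
    + [("en", "en")] * 7 + [("en", "hi"), ("en", "sw"), ("en", "bn")]
)

scores = introversion(toy_links)
for lang, score in sorted(scores.items()):
    print(lang, score)
```

On these made-up numbers the Hindi pages come out as more "introverted" than the English ones, which is the general shape of the pattern in the chart.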

First, a word about the apparently high degree of English-language webpage "extroversion". The relatively high percentage of English-language websites which link to non-English websites is unlikely to represent a high percentage of native English speakers who are linking to non-English websites. Rather, this would seem to simply reflect English's status as a/the world language, so that even sites whose audience may largely consist of non-native English speakers may choose to create English-language websites simply in order to have a larger audience. And I suspect the "extroverted" English-language webpages are of that type: English is the language chosen for this type of website due to its ability to reach a more "universal" audience, but the site itself may have "local" interests, reflected by its linking to non-English language websites.

But it's the Indian languages that I really want to talk about. Given the large number of Hindi speakers, one might at first be surprised at the relatively small number of Hindi-language sites (compared to, say, Japanese). This, I think, is easily explained by the status of English in India, especially amongst people who would be more likely to create and use Internet sites. In other words, many native Hindi speakers would choose to create English- rather than Hindi-language webpages. The high degree of insularity ("introversion") of Hindi-language webpages in terms of inter-language linkage is likely not unconnected. In the context of modern India, choosing to create a Hindi- rather than English-language website is already a more "insular" choice, given the widespread use of English in India itself. Those website content creators who choose Hindi medium over English medium are likely to have more "insular" interests, and thus would not be as likely to link to non-Hindi sites (and even less likely to link to non-Indian language sites).

So, thus far, there isn't really anything terribly surprising about these findings. But when we look at the particular inter-language link connections which are strongest, especially in the case of Indian languages, there are some weird data:

[The arrows indicate directionality of linkage; red connections are stronger than green connections.] As the authors point out:
Surprising links include those from Hindi to Ukrainian, Kurdish to Swedish, Swahili to Tagalog and Bengali, and Esperanto to Polish.
I would add that the Swahili-Bengali and Swahili-Tagalog links are not only strong (red), but also bidirectional (e.g. Swahili pages are linking to Bengali pages, and Bengali pages to Swahili pages). It is hard to think of convincing explanations for the connections between Swahili and Bengali (or Swahili and Tagalog). One possibility comes to mind, which is that, in terms of total Internet representation, the number of pages in Bengali, Swahili, and Tagalog is relatively small. Here the Google researchers' webpage selection criteria are presumably relevant:
The particular choice of pages in our corpus here reflects decisions about what is `important'. For example, in a language with few pages every page is considered important, while for languages with more pages some selection method is required, based on pagerank for example.
This means that for languages with a smaller Internet population individuals could have a greater effect on the particular inter-language linkages than is the case for languages with larger Internet populations. And thus perhaps the existence of a few creators of Bengali webpage content who happen to live in central eastern Africa could be responsible for some of these unexpected inter-language linkages. I would be curious to see what sort of Bengali sites link to Swahili sites (and vice-versa), to judge whether this is a plausible idea.

There is something which worries me about these data though: look at the linkages between the Indo-Aryan languages (Punjabi, Gujarati, Marathi, Bengali, Nepali, Hindi). Punjabi, Gujarati, Marathi, Bengali, and Nepali all have strong bidirectional links with Hindi, which is to be expected given Hindi's status as an Indian lingua franca. Notice however that, other than being linked with Hindi, none of the other Indo-Aryan languages are inter-linked with each other, except for Nepali and Marathi.

In India, there are large Nepali communities in West Bengal and other eastern parts of India. Marathi is spoken in Maharashtra in the far western part of India. I would be unsurprised if there were strong Marathi-Gujarati inter-language linkages (since these two languages are spoken in neighbouring states), or if there were a strong inter-language linkage between Nepali and Bengali. But a Nepali-Marathi link doesn't make sense, at least in the absence of other intra-Indo-Aryan linkages.

There is one property which I can think of which does link Nepali and Marathi, namely the fact that they both are written in Devanagari script (also used for Hindi). Gujarati, Punjabi, and Bengali, on the other hand, are each written in their own scripts (distinct from Devanagari). So I wonder if there is any possibility that the script is creating "false hits" when the off-site link connections for Nepali and Marathi are being computed. 
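
To illustrate the worry, here is a small sketch of what a purely script-based guess would see. It is only an illustration: I don't know how Google's language identification actually works, and the function name dominant_script and the choice of sample words are my own. It relies on the fact that Unicode character names begin with the name of the script (e.g. "DEVANAGARI LETTER KA").

```python
import unicodedata
from collections import Counter

def dominant_script(text):
    """Return the most frequent script among the letters and combining
    marks in `text`, read off the first word of each Unicode character name."""
    counts = Counter()
    for ch in text:
        # Only count letters (L*) and combining marks (M*), not punctuation or spaces.
        if unicodedata.category(ch)[0] in ("L", "M"):
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None

# One sample word per language (illustrative choices).
samples = {
    "Hindi": "छिपकली",
    "Nepali": "भित्ति",
    "Marathi": "मराठी",
    "Bengali": "বাংলা",
    "Gujarati": "ગુજરાતી",
    "Punjabi": "ਪੰਜਾਬੀ",
}

scripts = {lang: dominant_script(word) for lang, word in samples.items()}
for lang, script in scripts.items():
    print(lang, script)
```

Hindi, Nepali, and Marathi all come out as the same script, while Bengali, Gujarati, and Punjabi (Gurmukhi) each get a script of their own. So any step in a pipeline which leans on script alone could not tell a Nepali page from a Marathi one.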

That also makes me worry about the other surprising inter-language linkages, such as Bengali-Swahili and Swahili-Tagalog. The worry is obviously not that these languages share a common script, but that some of the apparent connections may be artefacts of the algorithm, whether due to use of a common script or some other factor. If they're not simply artefacts, then it certainly would be interesting to find out why, for instance, Bengali-language and Swahili-language webpages are linking to each other.

Sunday, 3 July 2011

What speechitatest you? On engineered language change amongst high schoolers


The latest Saturday Morning Breakfast Cereal, on high school language change:




Note:
The type of language change the students are shown undergoing would require more than a source of new lexical items, I would think.

We find morphological change: Wouldsest for 2nd person singular present of "would".

And syntactic change: What speechitated Harvard? for "What did Harvard say?" (note the necessity of do-periphrasis in modern English).

How could a thesaurus (of fake synonyms) drive these sorts of changes? [Of course, under Minimalism, parametric variation, including differences in word order, is theorised to be a reflex of formal features which are borne by lexical items. So perhaps if the thesaurus had some way of encoding abstract syntactic features in such a way that they would be picked up along with the phonological and semantic aspects of the lexical item....]

Wednesday, 18 May 2011

The Rapture, now with more Harpies

The latest xkcd:
(Mouse-over text: But to us there is but one God, plus or minus one. --1 Corinthians 8:6±2.)

The first panel is really the funniest bit: a pun on raptor (referencing the Jurassic Park movie). But in fact, rapture and raptor are not only phonologically similar, they're also etymologically related: both deriving from Latin rapt-, the past participial stem of rapere "to seize, to snatch, to carry off".

Also from Latin rapere are subreptitious "snatching under", rapacious "(greedily) snatching (with the intent to eat)", and rape (originally "carrying off", then "carrying off, esp. with the intent of sexually despoiling", later coming to refer specifically to "forced sexual intercourse").

Raptor in classical Latin meant "robber, thief", which is also its meaning in early English; later on in English it could also mean "rapist". From the 18th century, it was applied to "birds of prey", whence its later extension to refer to a particular "dromaeosaurid dinosaur", the Velociraptor "swift seizer".

Rapture, on the other hand, is not found in classical Latin, though it does appear in mediaeval Latin. The earliest citation the OED provides is from an 8th-century British text, in the form raptura, referring to "poaching". Its use in English, however, is originally confined to the sense (attested from the 16th century) of "extreme joy, intense delight", though it was also used in the 17th and 18th centuries to refer to the "carrying off" or "rape" of women.

And not until the 18th century does rapture acquire its Millenarial sense (associated with ideas originally advanced by the Puritans Increase and Cotton Mather in Massachusetts). The word rapture in this Millenarial philosophy apparently picks up on the Latin word rapiemur (from rapere, see above) used in 1 Thessalonians 4:17 to refer to the faithful being "carried up" into the air (to meet Christ) in the Latin Vulgate:
deinde nos qui vivimus qui relinquimur simul rapiemur cum illis in nubibus obviam Domino in aera et sic semper cum Domino erimus
The Latin Vulgate of course is a translation of the Koine Greek text, and in this passage Latin rapiemur glosses the Greek ἁρπαγησόμεθα "we shall be caught up":
ἔπειτα ἡμεῖς οἱ ζῶντες οἱ περιλειπόμενοι ἅμα σὺν αὐτοῖς ἁρπαγησόμεθα ἐν νεφέλαις εἰς ἀπάντησιν τοῦ κυρίου εἰς ἀέρα: καὶ οὕτως πάντοτε σὺν κυρίῳ ἐσόμεθα.
Interestingly, Greek ἁρπάζω "catch up, snatch up" (of which ἁρπαγησόμεθα is the first person plural future passive indicative form) originates from the same Proto-Indo-European root as the Latin rapere which St Jerome uses to gloss it: PIE *h₁rep- "to snatch" (also the source of English reap).


From the same Greek root as ἁρπάζω "catch up" is the word which comes into English as harpy: Greek ἅρπυια "the snatcher". So, with that, I leave you with some Harpies to flavour your Rapturous visions, courtesy of Gustave Doré:

[Edit (20 May 2011): Now see Mark Liberman's "No Word for Rapture" on Language Log for further etymological discussion of rapture.]

Sunday, 15 May 2011

λ♥[love] (Linguistics Love Song)




See the Sentence First blog for the lyrics and also Language Log for comments and explanation.

[I'm currently dissertating, thus the lack of posts.]

Wednesday, 23 March 2011

Linguistics Behind the Wicket (LBW) #1: Shahid Afridi and Free Love Friday

In belated celebration of the breaking of Australia's 34-match unbeaten run in World Cup matches by Pakistan, I offer the first in what I plan to be a recurring series of cricket-related linguistic investigations. I'm dubbing this series LBW ("Linguistics Behind (the) Wicket").

Shahid Afridi after the 2011 World Cup Pakistani victory over Australia
Shahid Afridi during the Pakistani World Cup 2011 match with Australia

This first investigation is a study in onomastics, taking as its subject the name of the skipper of the Pakistan team: Shahid Afridi (Urdu: شاہد آفریدی). To find out the connection between Afridi and free, "love", and Friday, read on!

[A brief word about the sources of Hindi/Urdu words: alongside the native Indo-Aryan vocabulary (inherited, ultimately, from a vernacular cousin of Sanskrit), both the Hindi and Urdu varieties of Hindi/Urdu employ a large number of Persian and Arabic words (as a result of the Mughal invasion of India).]

Shahid (Hindi: शहीद; Urdu: شاہد) is a Hindi/Urdu word of Perso-Arabic origins, meaning "martyr" (religious or political). It derives ultimately from an Arabic root شہد, which Platts[1] glosses as meaning "to give testimony". Not being a semiticist, I cannot offer any further interesting discussion.

It is rather the name Afridi (Hindi: आफ़्रीदी; Urdu: آفریدی) which is of more interest for me. Jokingly, I have sometimes referred to Afridi as "Afriti", since his aggressive cricketing (Afridi holds the record, 37 deliveries, for fastest century in one-day cricket) and mercurial temperament are suggestive of an Arabian Afreet (an angry sort of djinn): Arabic ʻIfrīt عفريت, pl. ʻAfārīt عفاريت. [The origin of this word is rather opaque to me: Platts[1] derives it from an Arabic root عفر meaning "to roll in the dust"; the Wikipedia article suggests that it comes from عفرت (`afrt) meaning "the evil"; the translation of the Qur'anic passage, Sura An-Naml (27:39-40), seems to gloss it as "strong one". Maybe semiticists could enlighten me here?]

However, Āfrīdī, in fact, has no connection with Arabic "Afreet". Rather, it is a word of Iranian origin, which, being the name of a certain Pathan tribe, is thus presumably indicative of Shahid Afridi's ancestral origins.
Some Afridis in the Khyber Rifles

In terms of its etymology, the word āfrīdī can be derived from the Persian word آفريده āfrīda, which means "creature" (noun) or "created" (adjective). (The āfrīdīs are thus perhaps "the created people".)

Āfrīda itself can be derived as the past/perfect participial form of the Avestan root frī- "love" combined with the prefix ā- (theoretically contributing a sense of "near, towards", but sometimes resulting in idiosyncratic meanings). Avestan āfrīda would correspond to Sanskrit āprīta, both meaning "gladdened, joyous" etc.

The semantic change from Avestan "gladdened, joyous" to Persian "created" is intriguing. The earlier meaning of "joy" still seems to be present in Persian (and Hindi/Urdu) āfrīn/āfirīn, which can be used to mean "bravo! well done!" (though it too can have the "create" sense, at least in the compound jahān-āfirīn "creator of the world").

The root underlying both Sanskrit āprīta and Avestan āfrīda is Proto-Indo-Iranian *prī-, which itself can be traced back to the Proto-Indo-European root *prī- whose most basic sense is "to love".

The PIE root *prī- (see Watkins[2]) is also the source of English free (from Old English frēo, derived from the verb frēon "to love, to set free"), friend (from Old English frēond "friend, lover"), and Friday (from Old English Frīgedæge "Frigg's day", where Frigg, the name of the Scandinavian goddess of love, Odin's wife, derives from Proto-Germanic *frijjō "beloved, wife"); as well as Old English frioðu "peace", which sadly has no direct reflexes in modern English.

In fact, PIE *prī- underlies not only the Persian tribal name Afridi, but also a variety of Germanic-derived names (see Watkins[2]), including:
  1. Siegfried, from Old High German Sigi-frith "victorious peace"
  2. Godfrey, from Old High German Goda-frid "peace of god"
  3. Frederick, from French Frédéric, itself a borrowing of Old High German Fridu-rīh "peaceful ruler"
  4. Geoffrey, from Old French Geoffroi from mediaeval Latin Gaufridus, itself a borrowing from Germanic *Gawja-frithu- "(having a) peaceful region"
Thus perhaps Geoffrey Boycott can mention his "prī-" connection with Shahid Afridi if he ever needs some filler material when commentating a Pakistan match...

So, this concludes the first LBW. I'm open to suggestions for other cricketers or cricket terminology to etymologise for future episodes.

Bibliography:
[1] Platts, John T. 1884. A dictionary of Urdū, classical Hindī, and English. London: W. H. Allen & Co. (Reprinted, New Delhi: Munshiram Manoharlal, 2000.)
[2] Watkins, Calvert. 2000. The American Heritage dictionary of Indo-European roots. 2nd edn. Boston: Houghton Mifflin.
[3] McGregor, R.S. 1993. The Oxford Hindi-English dictionary. Oxford: Oxford University Press. (Indian edition: New Delhi: Oxford University Press, 1994.)