Tuesday 20 September 2011

Lizards, Walls, Dragons: on an apparently undocumented Nepali lexeme (भित्ति)

I have not posted in some time due to dissertating, searching for (and thankfully finding) a job, and subsequently moving. Here's a short post on a Nepali word that I heard from my wife and that I can't find in any Nepali dictionary.

When we moved into our new house, we discovered that there were a number of house-lizards already resident (and, less amusingly, quite a few German roaches), which our cat has really enjoyed hunting down. I remembered having such lizards in our house in India, and as soon as I saw them I remarked to my wife "देखो! छिपकली है!" ("Look! There's a lizard!"), using the Hindi word for "lizard", छिपकली [chipkalī]. My wife replied, "in Nepali we call them 'bhitti' (भित्ति)."


I'd never heard this word before, and was curious. I checked Turner's A comparative and etymological dictionary of the Nepali language as well as his mammoth four-volume A comparative Dictionary of the Indo-Aryan languages. Neither mentions bhitti or anything like it. I also checked a number of Hindi dictionaries, none of which turned up anything, except for Platts' A dictionary of Urdu, classical Hindi, and English, which has an entry for भित्तिका bhittikā:
S بهتکا भित्तिका bhittikā, s.f. Wall (=bhīt, q.v.); small house lizard.
This isn't quite bhitti, but it's close. I had already supposed (and my wife had already suggested) that bhitti was connected with the word for "wall" (in Nepali, भित्तो bhitto or भित्ता bhittā), given that these lizards are often found on walls. So bhitti is something like "wall-(related) creature". [Turner does have an entry for bhitti, but he gives the meaning "wall".]

Platts' entry indicates a Sanskrit origin, and indeed  bhittikā looks awfully Sanskritic, with the "diminutive" -(i)ka suffix, which is not really always diminutive, but rather can also attach to words with no change in meaning. But here perhaps a diminutive based on "wall" makes sense. 

Interestingly, the Sanskrit word for "wall, panel, partition", bhittí, comes from a root √bhid- "to split", which is very dear to my heart (part of the Proto-Indo-European dragon mythology).

So there's a "new" Nepali word:  bhitti "house-lizard", which doesn't seem to have been recorded before. It may be dialectal (i.e. I'm not sure that Kathmandu Nepali speakers would use it), and that's perhaps why it wasn't previously recorded. In any case, I think it's a cool word, given that it does sort of connect lizards and dragons, indirectly.

[Incidentally, Platts suggests that Hindi छिपकली [chipkalī] derives from the root chip- "to hide", which is what I always assumed (going back to an early Indo-Aryan *chapp- "press, cover, hide"). Turner, on the other hand, derives it from Sanskrit शेप्या śepyā which means "tail" (and "penis", but I think "tail" is what is relevant here). The (potential) Nepali cognate of Hindi छिपकली [chipkalī] is छेपारो chepāro, though the latter might be more plausibly derived from Sanskrit शेप्या śepyā "tail", especially as छेपारो chepāro seems to refer to outdoor lizards (while माङ्सुलि māṅsuli is used for house lizards).]

Friday 8 July 2011

Some ponderings on Google's research on inter-language linking (Bengali <-> Swahili, Nepali <-> Marathi)

On the Google Research Blog, the latest post concerns inter-language linking, i.e. looking at webpages' off-site links which go to a page in another language. From the post:
Most web pages link to other pages on the same web site, and the few off-site links they have are almost always to other pages in the same language. It's as if each language has its own web which is loosely linked to the webs of other languages. However, there are a small but significant number of off-site links between languages. These give tantalizing hints of the world beyond the virtual.
I'm particularly interested in the data on Indian language webpages' inter-language linking, especially as there are some perplexing findings. But let's start with some findings which aren't really that surprising.

One of the features measured is the degree to which webpages in a particular language are "introverted" or "extroverted", where more "introverted" webpage languages have fewer inter-language off-site links. The data are summarised here:

Webpage languages which are higher (on the y-axis) are more introverted; webpage languages which are further to the right (on the x-axis) represent languages with a greater number of total webpages.
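
To make the "introversion" measure concrete, here is a minimal sketch of how such a score could be computed. It assumes (my assumption, not necessarily the researchers' definition) that a language's introversion is simply the share of its off-site links which stay within the same language. The function name and the toy link data are made up for illustration.

```python
from collections import Counter

def introversion(links):
    """Share of each language's off-site links that point to the same language.

    `links` is an iterable of (source_language, target_language) pairs,
    one pair per off-site link. This definition is an assumption made
    for illustration only.
    """
    total = Counter()  # off-site links per source language
    same = Counter()   # off-site links that stay within the language
    for src, dst in links:
        total[src] += 1
        if src == dst:
            same[src] += 1
    return {lang: same[lang] / total[lang] for lang in total}

# Toy, invented data: four off-site links from Hindi pages, four from English pages.
toy_links = [
    ("hi", "hi"), ("hi", "hi"), ("hi", "hi"), ("hi", "en"),
    ("en", "en"), ("en", "en"), ("en", "hi"), ("en", "sw"),
]
scores = introversion(toy_links)
```

On this toy data the Hindi pages come out as more "introverted" than the English pages, since a larger share of their off-site links stay within the language.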

First, a word about the apparently high degree of English-language webpage "extroversion". The relatively high percentage of English-language websites which link to non-English websites is unlikely to represent a high percentage of native English speakers who are linking to non-English websites. Rather, this would seem to simply reflect English's status as a/the world language, so that even sites whose audience may largely consist of non-native English speakers may choose to create English-language websites simply in order to have a larger audience. And I suspect the "extroverted" English-language webpages are of that type: English is the language chosen for this type of website due to its ability to reach a more "universal" audience, but the site itself may have "local" interests, reflected by its linking to non-English language websites.

But it's the Indian languages that I really want to talk about. Given the large number of Hindi speakers, one might at first be surprised at the relatively small number of Hindi-language sites (compared to, say, Japanese). This, I think, is easily explained by the status of English in India, especially amongst people who would be more likely to create and use Internet sites. In other words, many native Hindi speakers would choose to create English- rather than Hindi-language webpages. The high degree of insularity ("introversion") of Hindi-language webpages in terms of inter-language linkage is likely not unconnected. In the context of modern India, choosing to create a Hindi- rather than English-language website is already a more "insular" choice, given the widespread use of English in India itself. Those website content creators who choose Hindi medium over English medium are likely to have more "insular" interests, and thus would not be as likely to link to non-Hindi sites (and even less likely to link to non-Indian language sites).

So, thus far, there isn't really anything terribly surprising about these findings. But when we look at the particular inter-language link connections which are strongest, especially in the case of Indian languages, there are some weird data:

[The arrows indicate directionality of linkage; red connections are stronger than green connections.] As the post's authors point out:
Surprising links include those from Hindi to Ukrainian, Kurdish to Swedish, Swahili to Tagalog and Bengali, and Esperanto to Polish.
I would add that the Swahili-Bengali and Swahili-Tagalog links are not only strong (red), but also bidirectional (i.e. Swahili pages are linking to Bengali pages, and Bengali pages to Swahili pages). It is hard to think of convincing explanations for the connections between Swahili and Bengali (or Swahili and Tagalog). One possibility comes to mind, which is that, in terms of total Internet representation, the number of pages in Bengali, Swahili, and Tagalog is relatively small. Here the Google researchers' webpage selection criteria are presumably relevant:
The particular choice of pages in our corpus here reflects decisions about what is `important'. For example, in a language with few pages every page is considered important, while for languages with more pages some selection method is required, based on pagerank for example.
This means that for languages with a smaller Internet population individuals could have a greater effect on the particular inter-language linkages than is the case for languages with larger Internet populations. And thus perhaps the existence of a few creators of Bengali webpage content who happen to live in central eastern Africa could be responsible for some of these unexpected inter-language linkages. I would be curious to see what sort of Bengali sites link to Swahili sites (and vice-versa) to judge whether this is a plausible idea.

There is something which worries me about these data though: look at the linkages between the Indo-Aryan languages (Punjabi, Gujarati, Marathi, Bengali, Nepali, Hindi). Punjabi, Gujarati, Marathi, Bengali, and Nepali all have strong bidirectional links with Hindi, which is to be expected given Hindi's status as an Indian lingua franca. Notice, however, that apart from their links with Hindi, none of the other Indo-Aryan languages are inter-linked with each other, except for Nepali and Marathi.

In India, there are large Nepali communities in West Bengal and other eastern parts of India. Marathi is spoken in Maharashtra in the far western part of India. I would be unsurprised if there were strong Marathi-Gujarati inter-language linkages (since these two languages are spoken in neighbouring states), or if there were a strong inter-language linkage between Nepali and Bengali. But a Nepali-Marathi link doesn't make sense, at least in the absence of other intra-Indo-Aryan linkages.

There is one property which I can think of which does link Nepali and Marathi, namely the fact that they both are written in Devanagari script (also used for Hindi). Gujarati, Punjabi, and Bengali, on the other hand, are each written in their own scripts (distinct from Devanagari). So I wonder if there is any possibility that the script is creating "false hits" when the off-site link connections for Nepali and Marathi are being computed. 
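
To illustrate the worry, here is a minimal sketch of a naive, purely script-based classifier. I have no idea whether anything like this is what the Google researchers actually used; the function `dominant_script` is hypothetical and exists only to show that script alone cannot tell Hindi, Nepali, and Marathi apart, while it separates Bengali, Gujarati, and Punjabi (Gurmukhi) cleanly. The sample words are the words from this post and the languages' own names.

```python
import unicodedata
from collections import Counter

def dominant_script(text):
    """Guess the script of `text` from its characters' Unicode names.

    Hypothetical helper for illustration: it takes the first word of each
    letter's or combining mark's Unicode name (e.g. 'DEVANAGARI') and
    returns the most frequent one, or None if there are no such characters.
    """
    counts = Counter()
    for ch in text:
        # Only count letters (L*) and combining marks (M*).
        if unicodedata.category(ch)[0] not in "LM":
            continue
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None

samples = {
    "Hindi": "छिपकली",
    "Nepali": "भित्ति",
    "Marathi": "मराठी",
    "Bengali": "বাংলা",
    "Gujarati": "ગુજરાતી",
    "Punjabi": "ਪੰਜਾਬੀ",
}
scripts = {language: dominant_script(word) for language, word in samples.items()}
```

Under this (deliberately crude) approach the Hindi, Nepali, and Marathi samples all receive the same label, so any step that relied on script to identify the language would be liable to confuse them.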

That also makes me worry about the other surprising inter-language linkages, such as Bengali-Swahili and Swahili-Tagalog. The worry is not, obviously, that these languages share a common script, but that some of the apparent connections may be artefacts of the algorithm, whether due to use of a common script or some other factor. If they're not simply artefacts, then it certainly would be interesting to find out why, for instance, Bengali-language and Swahili-language webpages are linking to each other.

Sunday 3 July 2011

What speechitatest you? On engineered language change amongst high schoolers


The latest Saturday Morning Breakfast Cereal, on high school language change:




Note:
The type of language change the students are shown undergoing would require more than a source of new lexical items, I would think.

We find morphological change: Wouldsest for 2nd person singular present of "would".

And syntactic change: What speechitated Harvard? for "What did Harvard say?" (note the necessity of do-periphrasis in modern English).

How could a thesaurus (of fake synonyms) drive these sorts of changes? [Of course, under Minimalism, parametric variation, including differences in word order, is theorised to be a reflex of formal features which are borne by lexical items. So perhaps if the thesaurus had some way of encoding abstract syntactic features in such a way that they would be picked up along with the phonological and semantic aspects of the lexical item....]

Wednesday 18 May 2011

The Rapture, now with more Harpies

The latest xkcd:
(Mouse-over text: But to us there is but one God, plus or minus one. --1 Corinthians 8:6±2.)

The first panel is really the funniest bit: a pun on raptor (referencing the Jurassic Park movie). But in fact, rapture and raptor are not only phonologically similar, they're also etymologically related: both deriving from Latin rapt-, the past participial stem of rapere "to seize, to snatch, to carry off".

Also from Latin rapere are subreptitious "snatching under", rapacious "(greedily) snatching (with the intent to eat)", and rape (originally "carrying off", then "carrying off, esp. with the intent of sexually despoiling", later coming to refer specifically to "forced sexual intercourse").

Raptor in classical Latin meant "robber, thief", which is also its meaning in early English; later on in English it can also mean "rapist". From the 18th century, it was applied to "birds of prey", whence its later extension to refer to a particular "dromaeosaurid dinosaur", the Velociraptor "swift seizer".

Rapture, on the other hand, is not found in classical Latin, though it does appear in mediaeval Latin. The earliest citation the OED provides is from an 8th-century British text, in the form raptura, referring to "poaching". Its use in English, however, was originally confined to the sense (attested from the 16th century) of "extreme joy, intense delight", though it was also used in the 17th and 18th centuries to refer to the "carrying off" or "rape" of women.

And not until the 18th century does rapture acquire its Millenarial sense (associated with ideas originally advanced by the Puritans Increase and Cotton Mather in Massachusetts). The word rapture in this Millenarial philosophy apparently picks up on the Latin word rapiemur (from rapere, see above) used in 1 Thessalonians 4:17 to refer to the faithful being "carried up" into the air (to meet Christ) in the Latin Vulgate:
deinde nos qui vivimus qui relinquimur simul rapiemur cum illis in nubibus obviam Domino in aera et sic semper cum Domino erimus
The Latin Vulgate of course is a translation of the Koine Greek text, and in this passage Latin rapiemur glosses the Greek ἁρπαγησόμεθα "we shall be caught up":
ἔπειτα ἡμεῖς οἱ ζῶντες οἱ περιλειπόμενοι ἅμα σὺν αὐτοῖς ἁρπαγησόμεθα ἐν νεφέλαις εἰς ἀπάντησιν τοῦ κυρίου εἰς ἀέρα: καὶ οὕτως πάντοτε σὺν κυρίῳ ἐσόμεθα.
Interestingly, Greek ἁρπάζω "catch up, snatch up"---of which ἁρπαγησόμεθα is the first person plural future passive indicative form---originates from the same Proto-Indo-European root as the Latin rapere which St Jerome uses to gloss it: PIE *h1rep- "to snatch" (also the source of English reap).


From the same Greek root as ἁρπάζω "catch up" is the word which comes into English as harpy: Greek ἅρπυια "the snatcher". So, with that, I leave you with some Harpies to flavour your Rapturous visions, courtesy of Gustave Doré:

[Edit (20 May 2011): Now see Mark Liberman's "No Word for Rapture" on Language Log for further etymological discussion of rapture.]

Sunday 15 May 2011

λ♥[love] (Linguistics Love Song)




See the Sentence First blog for the lyrics and also Language Log for comments and explanation.

[I'm currently dissertating, thus the lack of posts.]

Wednesday 23 March 2011

Linguistics Behind the Wicket (LBW) #1: Shahid Afridi and Free Love Friday

In belated celebration of the breaking of Australia's 34-match unbeaten run in World Cup matches by Pakistan, I offer the first in what I plan to be a recurring series of cricket-related linguistic investigations. I'm dubbing this series LBW ("Linguistics Behind (the) Wicket").

Shahid Afridi after the 2011 World Cup Pakistani victory over Australia
Shahid Afridi during the Pakistani World Cup 2011 match with Australia

This first investigation is a study in onomastics, taking as its subject the name of the skipper of the Pakistan team: Shahid Afridi (Urdu: شاہد آفریدی). To find out the connection between Afridi and free, "love", and Friday, read on!

[A brief word about the sources of Hindi/Urdu words: alongside of the native Indo-Aryan vocabulary (inherited, ultimately, from a vernacular cousin of Sanskrit), both the Hindi and Urdu varieties of Hindi/Urdu employ a large number of Persian and Arabic words (as a result of the Mughal invasion of India).]

Shahid (Hindi: शहीद; Urdu: شاہد) is a Hindi/Urdu word of Perso-Arabic origins, meaning "martyr" (religious or political). It derives ultimately from an Arabic root شہد, which Platts[1] glosses as meaning "to give testimony". Not being a semiticist, I cannot offer any further interesting discussion.

It is rather the name Afridi (Hindi: आफ़्रीदी; Urdu: آفریدی) which is of more interest for me. Jokingly, I have sometimes referred to Afridi as "Afriti", since his aggressive cricketing (Afridi holds the record (37 deliveries) for fastest century in one-day cricket) and mercurial temperament are suggestive of an Arabian Afreet (an angry sort of djinn): Arabic ʻIfrīt عفريت, pl. ʻAfārīt عفاريت. [The origin of this word is rather opaque to me: Platts[1] derives it from an Arabic root عفر meaning "to roll in the dust"; the Wikipedia article suggests that it comes from عفرت (`afrt) meaning "the evil"; the translation of the Qur'anic passage, Sura An-Naml (27:39-40), seems to gloss it as "strong one". Maybe semiticists could enlighten me here?]

However, Āfrīdī, in fact, has no connection with Arabic "Afreet". Rather, it is a word of Iranian origin, which, being the name of a certain Pathan tribe, is thus presumably indicative of Shahid Afridi's ancestral origins.
Some Afridis in the Khyber Rifles

In terms of its etymology, the word āfrīdī can be derived from the Persian word آفريده āfrīda, which means "creature" (noun) or "created" (adjective). (The āfrīdīs are thus perhaps "the created people".)

Āfrīda itself can be derived as the past/perfect participial form of the Avestan root frī- "love" combined with the prefix ā- (theoretically contributing a sense of "near, towards", but sometimes resulting in idiosyncratic meanings). Avestan āfrīda would correspond to Sanskrit āprīta, both meaning "gladdened, joyous" etc.

The semantic change from Avestan "gladdened, joyous" to Persian "created" is intriguing. The earlier meaning of "joy" still seems to be present in Persian (and Hindi/Urdu) āfrīn/āfirīn, which can be used to mean "bravo! well done!" (though it too can have the "create" sense, at least in the compound jahān-āfirīn "creator of the world").

The root underlying both Sanskrit āprīta and Avestan āfrīda is Proto-Indo-Iranian *prī-, which itself can be traced back to the Proto-Indo-European root *prī- whose most basic sense is "to love".

The PIE root *prī- (see Watkins[2]) is also the source of English free (from Old English frēo, derived from the verb frēon "to love, to set free"), friend (from Old English frēond "friend, lover"), and Friday (from Old English Frīgedæge "Frigg's day", where Frigg, the name of the Scandinavian goddess of love, Odin's wife, derives from Proto-Germanic *frijjō "beloved, wife"); as well as Old English frioðu "peace", which sadly has no direct reflexes in modern English.

In fact, PIE *prī- underlies not only the Persian tribal name Afridi, but also a variety of Germanic-derived names (see Watkins[2]), including:
  1. Siegfried, from Old High German Sigi-frith "victorious peace"
  2. Godfrey, from Old High German Goda-frid "peace of god"
  3. Frederick, from French Frédéric, itself a borrowing of Old High German Fridu-rīh "peaceful ruler"
  4. Geoffrey, from Old French Geoffroi from mediaeval Latin Gaufridus, itself a borrowing from Germanic *Gawja-frithu- "(having a) peaceful region"
Thus perhaps Geoffrey Boycott can mention his "prī-" connection with Shahid Afridi if he ever needs some filler material when commentating a Pakistan match...

So, this concludes the first LBW. I'm open to suggestions for other cricketers or cricket terminology to etymologise for future episodes.

Bibliography:
[1] Platts, John T. 1884. A dictionary of Urdū, classical Hindī, and English. London: W. H. Allen & Co. (Reprinted, New Delhi: Munshiram Manoharlal, 2000.)
[2] Watkins, Calvert. 2000. The American Heritage dictionary of Indo-European roots. 2nd edn. Boston: Houghton Mifflin.
[3] McGregor, R.S. 1993. The Oxford Hindi-English dictionary. Oxford: Oxford University Press. (Indian edition: New Delhi: Oxford University Press, 1994.)

Tuesday 8 March 2011

Indian voices from 1913-1929: Gramophone Recordings from the Linguistic Survey of India

George Grierson pioneered the vast Linguistic Survey of India in 1894, an immensely useful resource for anyone working on languages of the Indian subcontinent. A set of recordings were also made as part of the survey, which were recently uncovered in the British Library. These recordings are now freely available from the University of Chicago's Digital South Asian Library at http://dsal.uchicago.edu/lsi/



In order that the languages might be more easily compared (and because "it contains the three personal pronouns, most of the cases found in the declension of nouns, and the present, past, and future tenses of the verb"), Grierson chose to use translations of the Biblical "Parable of the prodigal son", and many of the recordings are of speakers reciting this parable in their native language.

Here is the recording of the "Parable of the prodigal son":
  In Hindi (one of the major languages of India)
  In Khasi (a Mon-Khmer language spoken in Shillong, Meghalaya, [the former capital of Assam])

OPEN Magazine has a great article about these recordings, their rediscovery and content, available here: Voices from Colonial India

It's well worth a read, but here are a few highlights. For instance:

Some of the Sanskrit recordings took a bit of doing. Background: strict followers of the Vedic/Hindu tradition are supposed to safeguard the Vedas from the ears of those who are not dvijas ("twice-born", those who wear the sacred thread). This prohibition was taken seriously by some authorities; for instance, in the Gautama Dharma Sutra we find:
अथ हास्य वेदमुपशृणवतस्त्रपुजतुभ्यांश्रोत्रप्रतिपूरणमुदाहरणे जिह्वाच्छेदो धारणेशरीरभेदः
"Now if he [a Shudra = a non-dvija/untouchable] listens intentionally to (a recitation of) the Veda, his ears shall be filled with (molten) tin or lac." [Gautama Dharma Sutra 12.4]
From the OPEN Magazine article:
...All of this, of course, could not have been accomplished without some Brahminical drama. The scholar Ganganath Jha, who was approached for the Sanskrit reading, was scandalised to learn that a mlechha [Sanskrit for "barbarian", "foreign devil", and thus by definition a non-dvija] would be privy to his chaste Sanskrit. A demand was made for a certifiably Brahmin gramophone operator. The Raj, almost as unbending as Brahmins, refused. A compromise was reached: Jha sat in a room and spoke into a large horn-like object that projected his voice into another room where the operator sat. Communication between the two was by means of a complicated system of switches to ensure that the operator didn’t physically hear the Sanskrit. And that was enough to assuage the Brahmin guilt about speaking Sanskrit into a device that held the power to broadcast it to the world...
Jha's recording must have been of some Vedic text, because I am unaware of any general prohibition against speaking Sanskrit in the presence of non-dvijas. Sadly, I cannot find this recording on the University of Chicago's Digital South Asian Library site (they do have a general entry for Jha here: https://coral.uchicago.edu:8443/display/lasa/Ganganath+Jha+Ken.+Sanskrit+Vidyapith+%28Allahabad%2C+India%29).

[Brahminical rationalisations can be both amusing and creative: My advisor, who is a (German) Sanskrit scholar, once told me about one spoken Sanskrit conference he attended (where, I believe, he was the only non-Brahmin/non-Indian) at which there was one attendee who was a bit unhappy with the presence of a non-Brahmin, and was careful not to let my advisor's shadow touch him... Other attendees came up with rationalisations: German sounds a bit like Sharma, a Brahmin surname, and so they theorised that Germans are perhaps "long-lost" Brahmins, and therefore my advisor's presence could be acceptable.]

Another interesting bit from the OPEN Magazine article:
Many of the speakers chose to sing or recite poems or limericks. Particularly lingering is the voice of Hassaina of Delhi who has clips in the Ahirwati and Mewati languages. Who was this girl who sang with such sang-froid of love and waiting on 26 April 1920?  Nothing is known of her. She survives only as a voice.
Here is Hassaina's song: http://dsal.uchicago.edu/lsi/6838AK

[27 May 2011: Nepali is actually represented too, hidden under "Khaskura", including both the parable of the prodigal son translation, and a delightful song sung by a Shillongwala Nepali, Babu Dhan.]

Thursday 10 February 2011

Minecræft. (Minakraft?)

The biggest indie game of 2010: Minecraft. Since it shares half a compound noun with the title of this blog, that seems as good an excuse as any to look at the "etymology" of Minecraft.

First, in case you're unfamiliar with Minecraft, here's a video:



In terms of Minecraft's design inspiration, its creator, Markus "Notch" Persson, attributes its origins to Infiniminer, Dungeon Keeper, Left 4 Dead, and Dwarf Fortress (the last has been described as a mixture of Nethack, Oregon Trail, The Sims, and Lemmings).

Etymology 1: "the art of mining"
If Minecraft were truly parallel to stæfcræft, then it would be what is referred to in Sanskrit grammar as a tatpuruṣa (तत्पुरुष) compound, that is, a compound of the form YX, where X is the head noun and Y relates to X as if it were some non-nominative case form, e.g. a genitive, dative, etc. For example, the tatpuruṣa compound mousehunter is a hunter (=X) of mice (=Y). If this were the case then Minecraft would mean something like "the craft of mining", i.e. "the art of mining"---which in fact is an analysis which makes eminent sense, given that mining is a major component of the game.


Etymology 2: "mining and crafting"
Another possibility, however, is that Minecraft is what is referred to in Sanskrit grammar as a dvandva (Sanskrit द्वन्द्व dvandva 'pair') compound, that is, a compound of the form YX which could otherwise be expressed as "Y and X". For example, the dvandva compound producer-director (duo) is a pair of people, one of whom is a producer, and one a director. This would also seem a plausible analysis, given that aside from mining, crafting items is the other major component of the game (well, alongside trying not to get eaten by zombies or blown up by creepers).


An excursus on mine and craft
A third possibility, which presents itself in view of the fact that the game's creator, Persson, is Swedish, is that it is indeed a tatpuruṣa compound, but a Swedish compound rather than an English one. To explore this possibility, it's worthwhile to delve deeper into the etymologies of mine and craft.

Mine, in English, derives from an Anglo-Norman word mine, which was more or less a form of Old French (the word mine is found c1220 in Old French with sense 'underground cavity or excavation where metals and minerals are found'). The Old French/Anglo-Norman word itself was most likely borrowed from some continental Celtic language (compare Welsh mwyn 'mineral, mine' (14th cent.); Old Irish méin 'ore, metal'; Scottish Gaelic mèinn 'ore, mine'; further etymology of the Celtic words is uncertain). A Swedish cognate, mina, is attested from the 17th century; cognate forms appear in other Scandinavian and Germanic languages, as well as in other Romance languages (Spanish, Portuguese, etc.), but all appear to ultimately be borrowings from the French. The Scandinavian words are likely to have been borrowed from German Mine (itself, of course, also originally borrowed from the French word).

Craft is an interesting word. Like mine, its ultimate ancestry is uncertain (i.e. there is no obvious reconstructable Proto-Indo-European source for either mine or craft); it appears only in Germanic, with no apparent cognates in other Indo-European languages. The original sense of craft is "power, might, strength". This was one of the prevalent senses of craft in early English (the last attestation in the Oxford English Dictionary for this sense is from 1526, W. Bonde Pylgrimage of Perfection ii. sig. Kviii, "By the craft [=power] of nature."), and this is in fact the only sense borne by its cognates in other Germanic languages.

Etymology 3 ("the Swedish etymology"): "the power of the mine"
The development into the more familiar English senses of craft, e.g. "skill, art", is a solely English development---though it took place very early in English, as evidenced by an abundance of words in Old English of the type stæfcræft ("skill of letters; grammar"). This development seems to have involved a metaphorical extension of craft's original sense "(physical) power" to "intellectual power" and therefore "ability, skill, art". Thus, its Swedish cognate, kraft, has only the more original sense "power, might, strength". Therefore, if Minecraft were actually a Swedish coinage (or, at least, an anglicisation of such a compound, which, I think, would have been Minakraft), then it could be treated as a (tatpuruṣa) compound meaning something like "power of the mine".


The real etymology
However, none of these proposed etymologies in fact appears to be correct. Persson on his blog (14 May 2009) originally proposed to call the game "Minecraft: Order of the Stone", a name "awesome but insane people in #tigirc helped [him] come up". Further investigation reveals that "[it was] RinkuHero...who suggested "Minecraft" (as an analogy to "Starcraft"), having not played the game and knowing nothing about it other than that it was a type of strategy game involving mines."

So Minecraft is simply an analogical form based on Starcraft (Starcraft is a game having to do with stars, and therefore a game having to do with mines is Minecraft). Now, Starcraft itself appears to be an analogical form based on the title of one of Blizzard Entertainment's other games(/game franchises): Warcraft. If the compound type (tatpuruṣa) is carried over in the analogical process, then, in a sense, Minecraft should indeed mean something like "art of mining", which was the first of the proposed etymologies.


Miscellany: wars and crafts
The word warcraft itself is of course not a new coinage specific to the Blizzard Entertainment series. Interestingly though, warcraft does appear to be a relatively recent coinage (recent compared to the history of English at least), with the Oxford English Dictionary's earliest citation being from around 1660 (T. Fuller Worthies (1662) Lanc. 124 "Duke Hambleton‥had Officers who did Ken the War-craft, as well as any of our Age."). Though warcraft itself appears relatively late in the history of English, there are earlier formations ending in -craft which bear the same meaning that appear in Old English, including beaducræft(ig), gūðcræft, and wīgcræft (the last is the most widespread; the first two occur only in Beowulf).

Incidentally, war is a word with a weird history: it's a "returnee"-type borrowed word, deriving from a Germanic word which was borrowed into French and thence "given back" to English (a Germanic language). [It appears in late Old English (c1050) in the form wyrre, werre, a word borrowed from North-eastern Old French werre (cp. modern French descendant, guerre "war") which itself was borrowed from Old High German werra "confusion, discord, strife", related to the Old High German verb werran "to bring into confusion or discord" (cp. modern German wirren "to confuse, perplex"), ultimately from a Proto-Germanic root *werz-, *wers-, which is the origin also of the modern English word worse.]


Anglo-Saxon Minecraft
Returning to Minecraft: what---you didn't ask---would be the Old English form of Minecraft, given that we have determined that it must mean "art/skill of mining"? Probably Delfingcræft.

Thus, on that note, we close with some gratuitous screenshots of Heorot, the famous meadhall of Beowulf, as constructed in Minecraft:





Friday 4 February 2011

Trocheeotomy? Trocheeectomy.

The latest xkcd:

If you Huffman-coded all the 'random' things everyone on the internet has said over the years, you'd wind up with, like, 30 or 40 bytes *tops*.

Click for larger image

Trocheeotomy? Or should it be trochee-ectomy? (~trocheeectomy)

[Some additional things:

Here's an interesting chart of trochee bigrams from the xkcd blag:



And Mark Liberman on Language Log offers some discussion here. (I imagine xkcd is thinking of Snow Crash rather than The Big U. By the bye, the linguistic-y part of Snow Crash I've always felt to be the weakest part of the book.) ]

The alt-text:

Monday 24 January 2011

Good-bye, Good Luck, and Godspeed: On linguistic (de)secularisation

Godspeed
What exactly does someone mean if they wish you Godspeed? Here's one answer, courtesy of comedian Eddie Izzard:

In fact, the speed of Godspeed refers to one of speed's other early senses, namely "success" or "(good) luck, fortune, prosperity". The OED's [1] earliest citation for Godspeed is from Tyndale's Bible translation:
[1526] Bible (Tyndale) - 2 John 10   Yf ther come eny vnto you and bringe not this learninge him receave not to housse: neither bid him God spede.
(Roughly: "If anyone comes to you who does not bring this (Christian) learning, don't let him into your house, nor wish him Godspeed.")
In such early uses of Godspeed, it appears that spede is used as a subjunctive verbal form, so that God spede means something like "may God speed (you)", i.e. "may God grant you success/prosperity".
[1597] Shakespeare Richard II i. iv. 31   A brace of draimen bid, God speed him wel.
More familiar is the use of Godspeed as a noun-noun compound, as in:
[1865] J. R. Lowell Polit. Ess. (1888) 229   Every humane and generous heart‥has wished us God-speed.
However, I have reason to suspect that examples like those from Tyndale and Shakespeare represent a later refashioning: in Old English, the verb spēdan does not seem to have been used as a causative (nor does there appear to be any other causative form of it), while the 16th century examples treat it as such (i.e. X speed Y as "may X cause Y to be speedy/to succeed").

Rather Old English spēdan meant simply "to speed, to be successful, to have good fortune", in which case, the later Middle/Modern English God spede should have meant something like "may God be successful" (or "may God go really fast"?!). The equivalent of "God causing someone to be successful"  in Old English required a periphrasis of the sort "God gave speed (i.e. success) to someone", as in:
Exodus 153b-4b: þær him mihtig god on ðam spildsiðe spede forgefe
("if Mighty God would grant them success on the destructive quest")
I argue that the origin of Godspeed is in fact as a compound word, formed of good+speed, which was later reanalysed as God+speed, whence back-formations like God spede (ye), with spede being reanalysed as a causative (i.e. "may (he) cause you to be successful").

Before turning to details of the analysis proper, a quick lesson on some Middle and early Modern English sound changes.

First, the Great English Vowel Shift. I won't go into all of the details here, for they don't concern us, but in general the Great English Vowel Shift raised all long vowels (and diphthongised /ī/ to /aɪ̯/ and /ū/ to /aʊ̯/). More relevantly, Old English gōd "good" (noun and adjective) became /gūd/. By other sound changes, Old English gŏd "god" (originally a neuter noun, but a masculine form was innovated in Christian contexts) became /gɔd/, which by later changes became /gɒd/ (British) or /gɑd/ (American). There were yet later changes affecting ū which caused various (and somewhat sporadic) shortenings, sometimes to /ʌ/ (e.g. blood /blʌd/ from OE blōd) and sometimes to /ʊ/, as in the case of good /gʊd/.

Additionally, in Middle English a phonological change occurred which shortened long vowels in closed syllables which preceded another syllable. For example, consider the changes which occurred in the vowel of the first member of the compound words shepherd and wisdom, contrasted with the lack of change in the simplex words sheep and wise, as summarised in the table below.

Old English     Middle English
scēap           sheep [shēp]
scēaphirde      shĕpherd
wīs             wīs
wīsdom          wĭsdom

Here the simplex words wise and sheep thus remained unaffected (and became, by the Great English Vowel Shift, /waɪ̯s/ and /ʃīp/, respectively), but the vowels of the first component in shepherd ("sheep-herd(er)") and wisdom were shortened.

With these sound changes in mind, let us consider what would have happened to a hypothetical Old English compound *gōd-spēd ("good fortune, good luck"): by the Middle English closed-syllable-before-another-syllable rule, this would become /godspēd/, which would thus have become Modern English Godspeed: /gɒdspīd/ (British), /gɑdspīd/ (American). 
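The shortening rule applied above can be sketched as a toy program. This is only an illustration under simplifying assumptions: the function name shorten_compound and the vowel table are made up for this sketch, each word is given as its first member plus an optional second member, and the first member is assumed to be a single syllable.

```python
import unicodedata

# Long vowels (written with a macron) and their short counterparts.
# Escapes are used so the table does not depend on how the file is encoded.
LONG_TO_SHORT = {
    "\u0101": "a",  # a with macron
    "\u0113": "e",  # e with macron
    "\u012b": "i",  # i with macron
    "\u014d": "o",  # o with macron
    "\u016b": "u",  # u with macron
}
VOWELS = set("aeiou") | set(LONG_TO_SHORT)


def shorten_compound(first, second=""):
    """Apply the toy shortening rule to first+second.

    A simplex word (no second member) is returned unchanged. In a compound,
    the long vowel of the first member is shortened if that member ends in
    a consonant, i.e. if its syllable is closed and another syllable follows.
    """
    first = unicodedata.normalize("NFC", first)
    second = unicodedata.normalize("NFC", second)
    if not second:
        return first  # no following syllable: the rule does not apply
    if first[-1] in VOWELS:
        return first + second  # open syllable: the rule does not apply
    shortened = "".join(LONG_TO_SHORT.get(ch, ch) for ch in first)
    return shortened + second


for first, second in [("shēp", ""), ("shēp", "herd"), ("wīs", ""),
                      ("wīs", "dom"), ("gōd", "spēd")]:
    print(first + second, "->", shorten_compound(first, second))
```

The simplex forms shēp and wīs come out unchanged, while the compounds come out as shepherd, wisdom and godspēd; the long vowel of the second member is left alone because the rule only targets the closed syllable that precedes another syllable.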

If this etymology is correct, then the original sense of Godspeed is good speed, in other words good luck---which makes eminent sense as a formula of well-wishing (note that in the video clip above, Izzard in fact glosses "good luck" as "Godspeed").

 In fact, an Old English form gōd-spēd is not entirely hypothetical: the adjectival form gōdspēdig is recorded in the verse "translation" of Genesis A:
Genesis A 1008b-9b: Him þa brego engla, godspedig gast geanþingade, "Hwæt...'
("The Lord of Angels (i.e. God), the "good-speedy" spirit, answered him: 'Listen...'")
[This is God addressing Cain, right after Cain utters his signature "I am not my brother's keeper" line.]

The question is how to translate godspedig (though, as is typical of Old English manuscripts, the vowel quantity is not indicated, it is fairly obvious that it is in fact gōdspēdig and not gŏdspēdig, given that it characterises God himself). If it is an adjectivalised noun-noun compound (as Bosworth and Toller treat it in their dictionary [2]), then perhaps "rich in good(ness)". But it could be based on an adjective-noun compound gōdspēd "good fortune", and thus mean something like "one who is good at success, full of good fortune".

In any case the occurrence of gōdspēdig in Old English strengthens the case for Godspeed as originating from gōdspēd "good luck" with god spēde "may God speed/grant good fortune to (you)" being a later reinterpretation (with reanalysis of spēde as causative).

Gospel
Another word which has a similar history is gospel. The original form in Old English was gōdspel "good news", glossing Latin ēvangelium (from Greek ευαγγελιον (evangelion), from eu- "good" and angelion "message", although in classical Greek the word meant only "a reward for bringing good news" and, in the plural, "a sacrifice offered on receiving good news"). Now, by the Middle English shortening rule described above, gōdspel would have become gŏdspel, and thus Modern English gospel /gɒspɛl/ (ignoring the loss of the d).

However, in this case, the word was in fact reanalysed much earlier, at some point in Old English, as gŏdspel "news about God", apparently based on the written form (which would not usually have indicated vowel quantity), as evidenced by the forms in other Germanic languages (the other Germanic peoples were evangelised by Anglo-Saxons): Old Saxon godspell, Old High German gotspell, Old Norse guð-, goðspiall.

Good-bye
Assuming the etymology proposed above for Godspeed is correct (that the original was good speed "good luck" and thus secular in nature), it is interesting to note that exactly the opposite change occurred in the case of Good-bye: an originally religious formula was secularised. Good-bye is a shortening of God be with you(/ye). Here God has been substituted by Good, presumably by analogy with/contamination by other leave-taking formulae like Good Day, Good Night, Good Morning etc.

Summary
Godspeed likely originated as a secular formula "good speed" (i.e. "good luck"), but due to a phonological change in Middle English affecting long vowels in closed syllables followed by one or more syllables, the vowel of the first word was shortened, becoming homophonous with God, thus giving rise to a reinterpretation of spede as a causative verb meaning "cause to succeed" and resulting in formations like God spede "God prosper (you)". Good-bye, by contrast, originally had religious connotations, being a shortened form of "God be with you/ye", but by analogy to other leave-taking formulae like Good Night, God was substituted by Good.

One secular formula, go(o)d-speed, was thus reinterpreted as religious in nature, and the other, originally religious formula, go(o)d-bye, took on a secular nature.  

[Updates (04-Feb-2011):

Some new information gleaned from the comments at languagehat's blog:

I. MMcM points out that:
(1) Webster (ca. 1830) glosses godspeed as "good speed" [link to Google Books page here].
(2) Tyndale's Bible translation (whence the first citation of godspeed, see above), also includes good speed [link to Google Books page here]: "LORde God of my maſter Abrahã, ſend me good ſpede this daye, & ſhewe mercy vnto my maſter Abraham".

II. Goodspeed also appears as a surname, e.g. Ben Zimmer's maternal line [hattip Ben Zimmer], the author of the 1923 American Translation of the New Testament, Edgar J. Goodspeed [hattip John Cowan].

]

References:
[1]The Oxford English Dictionary, September 2009 rev. ed.
[2]Bosworth, Joseph and T. Northcote Toller. 1898. An Anglo-Saxon dictionary. London: Oxford University Press.

Wednesday 12 January 2011

On analogy: Octopuses, Octopi, Octopodes, Emacsen

From a post on boingboing.net, Merriam Webster editor Kory Stamper discusses the "correct" plural of octopus:



Octopus:octopi is a standard example illustrating ("false") analogy that I've used in class before. The story I've told goes like this: Octopus sounds like a Latin word, and so by analogy to Latin borrowings like syllabus:syllabi, alumnus:alumni, people often form its plural as octopi. But, so the standard story goes, it's not a borrowing from Latin, but rather from Greek, so octopi is technically incorrect. The proper Greek plural is rather octopodes.

[More technically, it is a borrowing from Latin, but the Latin word itself is a borrowing/coinage from Greek, and the Greek word would be (in nominative singular) ὀκτώπους (oktṓpous), whose (nominative) plural would be ὀκτώποδες (oktṓpodes). Given that it's a scientific name, it will in fact be a Latin word (albeit one of Greek origins). By modern Latin rules for Greek borrowings, it should be a third declension noun, and form its plural with -es. Thus the Latin forms are octopus:octopodes.]

In the video, however, Stamper makes the following arguments: (1) octopodes sounds rather pedantic (I think a good compromise here though is to pronounce it to rhyme with nodes), and (2) once a word is borrowed into English, it becomes an English word and so should form its plural according to the standard English rules for pluralisation, i.e. it should be octopuses.

In fact, though it is true that -s is the dominant plural ending in English, and thus the one usually used for borrowings and new coinages, it is not the only possibility. Even coinages don't always form their plural with -s. For example, there is a powerful text-editing program called Emacs. Different varieties of this program have arisen, and thus a plural form is sometimes called for. And the standard plural used is Emacsen (by analogy to ox:oxen; cf. boxen and VAXen). The point being simply that if even coinages don't always use the -s plural, then we needn't expect that borrowings should either. And therefore, nothing forces us to accept octopuses as the "correct" plural. [Caveat: of course there is no real "correct" plural for any word, aside from whatever people accept/use.]

But, as it stands, it would seem that the "correct" plural should be either octopuses, since that conforms to the dominant pluralisation rule for English, or else octopodes, since octopus is a coinage made from Greek components.

As one of the commenters to the boingboing post (Anon #80) points out though, there is in fact a case to be made for octopi as the "historically correct plural". The case is as follows: Linnaeus may have coined octopus ("eight foot (creature)") by analogy to the old Latin word for "octopus", namely polypus ("many foot (creature)"). Now polypus is obviously also a borrowing from Greek, but in Latin the normal plural of polypus was in fact polypi! (And, likewise, the plural of the modern scientific term polypus is also polypi). 

[The commenter goes on to add that even the Greeks sometimes treated πολύπους (polúpous) as a second declension noun (which would give it a nominative plural of πολύποι (polúpoi)). So even the Romans might have had a precedent for their -i plural of polypus.]

So if octopus is seen as a modern "updating" of the original Latin word for "octopus" (polypus), then there is an interesting case to be made for octopi as the (historically) "correct" plural.

Thursday 6 January 2011

Free OED

Indulge your thirst for etymology. Use the online version of the Oxford English Dictionary free (for a month; until 5 February 2011).

Login with "trynewoed"/"trynewoed."

From languagehat.