Monday, July 15, 2013

Linus Torvalds: "There aren't enough swear-words in the English language", or, perkeleen vittupää

Linus Torvalds, commenting on a Linux commit, had to resort to swearing in (his semi-native) Finnish (emphasis added):
Sat, 13 Jul 2013 15:40:24 -0700
Subject Re: [GIT pull] x86 updates for 3.11
From Linus Torvalds <>

On Sat, Jul 13, 2013 at 4:21 AM, Thomas Gleixner  wrote:
>    * Guarantee IDT page alignment

What the F*CK, guys?

This piece-of-shit commit is marked for stable, but you clearly never even test-compiled it, did you?

Because on x86-64 (which is the only place where the patch
matters), I don't see how you could have avoided this honking huge
warning otherwise:
 arch/x86/kernel/traps.c:74:1: warning: braces around scalar
initializer [enabled by default]
  gate_desc idt_table[NR_VECTORS] __page_aligned_data = { { { { 0, 0 } } }, };
 arch/x86/kernel/traps.c:74:1: warning: (near initialization for
‘idt_table[0].offset_low’) [enabled by default]
 arch/x86/kernel/traps.c:74:1: warning: braces around scalar
initializer [enabled by default]
 arch/x86/kernel/traps.c:74:1: warning: (near initialization for
‘idt_table[0].offset_low’) [enabled by default]
 arch/x86/kernel/traps.c:74:1: warning: excess elements in scalar
initializer [enabled by default]
 arch/x86/kernel/traps.c:74:1: warning: (near initialization for
‘idt_table[0].offset_low’) [enabled by default]
and I don't think this is compiler-specific, because that code is
crap. The declaration for gate_desc is very very different for 32-bit
and 64-bit x86 for whatever braindamaged reasons.

Seriously, WTF? I made the mistake of doing multiple merges back-to-back with the intention of not doing a full allmodconfig build
in between them, and now I have to undo them all because this pull
request was full of unbelievable shit.

And why the hell was this marked for stable even *IF* it hadn't been
complete and utter tripe? It even has a comment in the commit message
about how this probably doesn't matter. So it's doubly crap: it's
*wrong*, and it didn't actually fix anything to begin with.

There aren't enough swear-words in the English language, so now I'll
have to call you perkeleen vittupää just to express my disgust and
frustration with this crap.

 Reddit has some additional comments.

What does perkeleen vittupää mean?

  • Pää =  head (From Proto-Uralic *päŋe)
  • Vittu = vulgar term for female genitalia (< Swedish fitta, with the same meaning, apparently ultimately from fett, meaning "fat" (n.), itself a borrowing in Swedish from Middle Low German vet, from Old Saxon *fētid, from Proto-Germanic *faitidaz)
  • Perkele = the name of the chief deity of the pre-Christian Finnish pantheon, now usually meaning something like "Devil" (itself ultimately from Proto-Indo-European; cp. Perkūnas, the common Baltic name for the god of thunder, deriving from Proto-Indo-European *Perkwunos, itself cognate with *perkwus, a word for "oak", "fir" or "wooded mountain")
Thus the gloss of one Redditor.

For what it's worth, note that two of the three morphemes (and the two which are actually crucial to the compound's vulgarity) are in fact of Indo-European extraction (or at least Germanic, in the case of vittu), so at least it's not a deficiency in Indo-European/Germanic...

Language Log has more on both vittupää and perkele.

[Bonus vulgarity: The version control system used for Linux kernel development is called "Git" (also originated & coined by Linus Torvalds).]

Saturday, August 4, 2012

The garden path to heaven

There is a Facebook page called "I Miss Someone Really Bad Who Is In Heaven".

My initial reading was:

Villain: "If only my parents could see me now..."
Sidekick: "Sir, I am sure they're smiling down from evil heaven."

I was somewhat disappointed to find that this is not the intended reading.

Saturday, July 28, 2012

Singular "they" and Minecraft

A quickish note about a recent posting by Notch of Mojang. Notch, the original creator of Minecraft (which I really haven't had a chance to play for some time), notes that though the default character skin appears somewhat masculine (and is referred to as "Steve"), the original intent was that characters in Minecraft be genderless. Notch points to the genderless aspects of the other living creatures in Minecraft (cows, birds, pigs etc.) and the fact that all of these can breed with any other member of the same species to produce offspring as part of the same outlook.

The linguistic angle is his closing footnote, which relates to referring to Minecraft's default character as him:
* I do regret using masculine terms to talk about the default character. These days I try to use the up-and-coming use of “they” as a genderless pronoun.

They, of course, has been an "up-and-coming" genderless pronoun for at least a few hundred years now:
Matt. 18:35: So likewise shall my heauenly Father doe also vnto you, if yee from your hearts forgiue not euery one his brother their trespasses. [Tyndale's translation, 1526]
It has been pointed out repeatedly that singular they has been used in the Biblical translations of Tyndale and the King James translators, as well as other reputed writers of English literature such as Shakespeare and Jane Austen:
There's not a man I meet but doth salute me
As if I were their well-acquainted friend
[Shakespeare, The Comedy of Errors IV, 3]
"It had been a miserable party, each of the three believing themselves most miserable."
[Austen, Mansfield Park]
 For more on singular "they", see Language Log's collection of posts on the topic, as well as Wikipedia's extensive page.

Monday, June 18, 2012

"Oh, no, maadarcho-": On subtitling vulgarities in Hindi films

John McWhorter, in his recent New Republic article, "Gosh, Golly, Gee: Mitt Romney's verbal stylings", discusses what he appears to view as Romney's over-sanitised style of public speaking as a marker of inauthenticity. Lucy Ferriss, in her post "Jeepers!" on the Lingua Franca site, is somewhat sceptical of this argument.

However, what struck me in Ferriss's post was the following paragraph, because it touches on something I was pondering a couple of days ago.
It is certainly true, as McWhorter observes, that public discourse has grown more casual and that examples of “taking the name of the Lord in vain” are not so proscribed as they once were. I only became aware of my own habitual use, not only of various expletives involving Judeo-Christian names for the deity, but of designated euphemisms, when I was in Pakistan recently. I would start to say, “Jesus, it’s hot,” and realize that my hosts’ theological frame of reference was somewhat different. Soon I began censoring not only “God” and “Christ,” but also “jeez,” “criminy,” “omigod,” and “lordy.” It was surprisingly easy to do, and as my speech changed, I also noticed no swearing (at least in English) on the part of my interlocutors, who did use other American slang freely.
 An oddly persistent feature of Hindi-language film English subtitling is the bowdlerisation of cursing. A particularly amusing instance of this occurs in the film Murder 2, a somewhat gruesome thriller. The main character, a hard-boiled ex-cop, is verbally abusing another character, and calls him मादरचोद (mādarchod).* Now mādarchod means "one who has sexual relations with his mother" and thus has a readily available and obvious English gloss. However, in the English subtitles mādarchod is rendered as "scoundrel". The disparity between the original and the translation afforded me a good chuckle (my wife simply ignores the subtitles, so wondered why I started laughing).

What is even more amusing is that this bowdlerised subtitling extends to subtitling English as well. So, in the same film, when the hero disgustedly says "Fuck." sotto voce, the subtitles tell us that he said "Oh, no!".

I wonder if there is a certain subset of South Asians (who can speak English, and reside somewhere in South Asia, as opposed to abroad) who are uncomfortable with cursing in English (even if they do so in other languages) - this subset would seem to include everyone who provides English subtitles for Hindi films.

Postscript: "Taking the name of the lord in vain" doesn't translate very well into a Hindu setting. Hindi speakers will exclaim हे भगवान! (he bhagwān) "Oh, lord!" in times of crisis (or mock-crisis), and likewise will say "Oh, lord!" or "Oh, god!" in English in the same fashion. But these are all what I would call vocative uses, supplications to divine powers for assistance (and I would think "lordy" would fit into this category too). I can't think of Hindi language curses which parallel zounds (< "by god's wounds"). Hindi swearing usually involves some sort of reference to sex or sexual organs, usually involving someone else's mother or sister --- बहिनचोद (bahinchod) "one who has sexual relations with his sister" being in fact a bit more typical than मादरचोद (mādarchod).

 * मादरचोद (mādarchod) is interesting from the standpoint that mādar is a borrowing from Persian but is infrequent outside of this compound. That may well not be accidental --- its use in other contexts may be "blocked" by association with mādarchod.

Thursday, January 26, 2012

Donkey Anaphora and the King(s) of France

An end-of-the-semester gift from one of my semantics students:

A t-shirt for an (as yet fictitious?) band. It started as an in-class joke which arose from the juxtaposition of two topics:

(1) presupposition failure in sentences like "The king of France is bald", and
(2) issues involving the binding of pronouns in sentences like "Every farmer who owns a donkey_i beats it_i."

Tuesday, September 20, 2011

Lizards, Walls, Dragons: on an apparently undocumented Nepali lexeme (भित्ति)

I have not posted in some time due to dissertating, searching for (and thankfully finding) a job, and subsequently moving. Here's a short posting on a Nepali word which I heard from my wife which I can't find in any Nepali dictionary.

When we moved into our new house, we discovered that there were a number of house-lizards already resident (and, less amusingly, quite a few German roaches), which our cat has really enjoyed hunting down. I remembered having such lizards in our house in India, and as soon as I saw them I remarked to my wife "देखो! छिपकली है!" ("Look! There's a lizard!"), using the Hindi word for "lizard", छिपकली [chipkalī]. My wife replied, "In Nepali we call them 'bhitti' (भित्ति)."

I'd never heard this word before, and was curious. I checked Turner's A comparative and etymological dictionary of the Nepali language as well as his mammoth four-volume A comparative dictionary of the Indo-Aryan languages. Neither gives bhitti, or anything like it, as a word for a lizard. I also checked a number of Hindi dictionaries, none of which turned up anything, except for Platts' A dictionary of Urdu, classical Hindi, and English, which has an entry for भित्तिका bhittikā:
S بهتکا भित्तिका bhittikā, s.f. Wall (=bhīt, q.v.); small house lizard.
This isn't quite bhitti, but it's close. I had already supposed (and my wife had already suggested) that bhitti was connected with the word for "wall" (in Nepali, भित्तो bhitto or भित्ता bhittā), given that these lizards are often found on walls. So bhitti is something like "wall-(related) creature". [Turner does have an entry for bhitti, but he gives the meaning "wall".]

Platts' entry indicates a Sanskrit origin, and indeed  bhittikā looks awfully Sanskritic, with the "diminutive" -(i)ka suffix, which is not really always diminutive, but rather can also attach to words with no change in meaning. But here perhaps a diminutive based on "wall" makes sense. 

Interestingly, the Sanskrit word for "wall, panel, partition", bhittí, comes from a root √bhid- "to split", which is very dear to my heart (part of the Proto-Indo-European dragon mythology).

So there's a "new" Nepali word:  bhitti "house-lizard", which doesn't seem to have been recorded before. It may be dialectal (i.e. I'm not sure that Kathmandu Nepali speakers would use it), and that's perhaps why it wasn't previously recorded. In any case, I think it's a cool word, given that it does sort of connect lizards and dragons, indirectly.

[Incidentally, Platts suggests that Hindi छिपकली [chipkalī] derives from the root chip- "to hide", which is what I always assumed (going back to an early Indo-Aryan *chapp- "press, cover, hide"). Turner, on the other hand, derives it from Sanskrit शेप्या śepyā which means "tail" (and "penis", but I think "tail" is what is relevant here). The (potential) Nepali cognate of Hindi छिपकली [chipkalī] is छेपारो chepāro, though the latter might be more plausibly derived from Sanskrit शेप्या śepyā "tail", especially as छेपारो chepāro seems to refer to outdoor lizards (while माङ्सुलि māṅsuli is used for house lizards).]

Friday, July 8, 2011

Some ponderings on Google's research on inter-language linking (Bengali <-> Swahili, Nepali <-> Marathi)

On the Google Research Blog, the latest post concerns inter-language linking, i.e. looking at webpages' off-site links which go to a page in another language. From the post:
Most web pages link to other pages on the same web site, and the few off-site links they have are almost always to other pages in the same language. It's as if each language has its own web which is loosely linked to the webs of other languages. However, there are a small but significant number of off-site links between languages. These give tantalizing hints of the world beyond the virtual.
I'm particularly interested in the data on Indian language webpages' inter-language linking, especially as there are some perplexing findings. But let's start with some findings which aren't really that surprising.

One of the features measured is the degree to which webpages in a particular language are "introverted" or "extroverted", where more "introverted" webpage languages have fewer inter-language off-site links. The data are summarised here:

Webpage languages which are higher (on the y-axis) are more introverted; webpage languages which are further to the right (on the x-axis) represent languages with a greater number of total webpages.
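To make the "introversion" measure concrete, here is a minimal sketch in Python. The post does not give Google's exact formula, so this assumes the simplest reading: a language's introversion is the share of its off-site links that point to pages in the same language. The function name and the toy data are made up for illustration.

```python
from collections import Counter

def introversion(links):
    """Return {language: share of off-site links that stay in the same language}.

    `links` is an iterable of (source_language, target_language) pairs,
    one pair per off-site link. This formula is an assumption, not
    necessarily the one used in the Google post.
    """
    total = Counter()
    same = Counter()
    for src, dst in links:
        total[src] += 1
        if src == dst:
            same[src] += 1
    return {lang: same[lang] / total[lang] for lang in total}

# Made-up toy data: (language of linking page, language of linked page).
toy_links = [
    ("hi", "hi"), ("hi", "hi"), ("hi", "hi"), ("hi", "en"),
    ("en", "en"), ("en", "hi"), ("en", "ja"), ("en", "en"),
]

scores = introversion(toy_links)
for lang, score in sorted(scores.items()):
    print(f"{lang}: {score:.2f}")
```

On this toy data the Hindi pages come out as more introverted than the English ones, since three of their four off-site links stay within Hindi.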

First, a word about the apparently high degree of English-language webpage "extroversion". The relatively high percentage of English-language websites which link to non-English websites is unlikely to represent a high percentage of native English speakers who are linking to non-English websites. Rather, this would seem to simply reflect English's status as a/the world language, so that even sites whose audience may largely consist of non-native English speakers may choose to create English-language websites simply in order to have a larger audience. And I suspect the "extroverted" English-language webpages are of that type: English is the language chosen for this type of website due to its ability to reach a more "universal" audience, but the site itself may have "local" interests, reflected by its linking to non-English language websites.

But it's the Indian languages that I really want to talk about. Given the large number of Hindi speakers, one might at first be surprised at the relatively small number of Hindi language sites (compared to say Japanese). This, I think, is easily explained by the status of English in India, especially amongst people who would be more likely to create and use Internet sites. In other words, many native Hindi speakers would choose to create English- rather than Hindi-language webpages. The high degree of insularity ("introversion") of Hindi-language webpages in terms of inter-language linkage is likely not unconnected. In the context of modern India, choosing to create a Hindi- rather than English-language website is already a more "insular" choice, given the widespread use of English in India itself. Those website content creators who choose Hindi medium over English medium are likely to have more "insular" interests, and thus would not be as likely to link to non-Hindi sites (and even less likely to link to non-Indian language sites).

So, thus far, there isn't really anything terribly surprising about these findings. But when we look at the particular inter-language link connections which are strongest, especially in the case of Indian languages, there are some weird data:

[The arrows indicate directionality of linkage; red connections are stronger than green connections.] As the authors point out:
Surprising links include those from Hindi to Ukrainian, Kurdish to Swedish, Swahili to Tagalog and Bengali, and Esperanto to Polish.
I would add that the Swahili-Bengali and Swahili-Tagalog links are not only strong (red), but also bidirectional (i.e. Swahili pages are linking to Bengali pages, and Bengali pages to Swahili pages). It is hard to think of convincing explanations for the connections between Swahili and Bengali (or Swahili and Tagalog). One possibility comes to mind, which is that, in terms of total Internet representation, the number of pages in Bengali, Swahili, and Tagalog is relatively small. Here the Google researchers' webpage selection criteria are presumably relevant:
The particular choice of pages in our corpus here reflects decisions about what is `important'. For example, in a language with few pages every page is considered important, while for languages with more pages some selection method is required, based on pagerank for example.
This means that for languages with a smaller Internet population individuals could have a greater effect on the particular inter-language linkages than is the case for languages with larger Internet populations. And thus perhaps the existence of a few creators of Bengali webpage content who happen to live in central eastern Africa could be responsible for some of these unexpected inter-language linkages. I would be curious to see what sort of Bengali sites link to Swahili sites (and vice-versa), to judge whether this is a plausible idea.
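The small-corpus effect is just arithmetic, and a rough sketch shows its scale. All of the numbers below are invented for illustration; nothing here comes from Google's data, and the helper name is hypothetical.

```python
def share_after_adding(links_to_target, total_links, extra):
    """Share of a language's off-site links that go to one target language
    after `extra` new links to that target are added."""
    return (links_to_target + extra) / (total_links + extra)

# Invented numbers: one enthusiastic site owner adds 20 links to Swahili pages.
small_corpus = share_after_adding(0, 200, 20)      # language with few links
large_corpus = share_after_adding(0, 200_000, 20)  # language with many links

print(f"small corpus: {small_corpus:.4f}")
print(f"large corpus: {large_corpus:.4f}")
```

Under these assumed figures the same twenty links account for roughly nine percent of the small corpus's off-site links but only about a hundredth of a percent of the large one's, which is the sense in which a handful of individuals could produce a "strong" link for a language with little web presence.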

There is something which worries me about these data though: look at the linkages between the Indo-Aryan languages (Punjabi, Gujarati, Marathi, Bengali, Nepali, Hindi). Punjabi, Gujarati, Marathi, Bengali, and Nepali all have strong bidirectional links with Hindi, which is to be expected given Hindi's status as an Indian lingua franca. Notice however that, other than being linked with Hindi, none of the other Indo-Aryan languages are inter-linked with each other, except for Nepali and Marathi.

In India, there are large Nepali communities in West Bengal and other eastern parts of India. Marathi is spoken in Maharashtra in the far western part of India. I would be unsurprised if there were strong Marathi-Gujarati inter-language linkages (since these two languages are spoken in neighbouring states), or if there were a strong inter-language linkage between Nepali and Bengali. But a Nepali-Marathi link doesn't make sense, at least in the absence of other intra-Indo-Aryan linkages.

There is one property which I can think of which does link Nepali and Marathi, namely the fact that they both are written in Devanagari script (also used for Hindi). Gujarati, Punjabi, and Bengali, on the other hand, are each written in their own scripts (distinct from Devanagari). So I wonder if there is any possibility that the script is creating "false hits" when the off-site link connections for Nepali and Marathi are being computed. 
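The worry about script-based "false hits" can be illustrated with a small Python sketch. It is not a claim about how Google identifies languages; it only shows that a classifier relying on script alone (a hypothetical `dominant_script` helper built on the standard `unicodedata` module) can separate Bengali, Gujarati, and Punjabi (written in Gurmukhi) from one another, but cannot tell Hindi, Marathi, and Nepali apart. The sample words are arbitrary.

```python
import unicodedata
from collections import Counter

# Scripts relevant to the languages discussed; Gurmukhi is used for Punjabi.
SCRIPTS = {"DEVANAGARI", "BENGALI", "GUJARATI", "GURMUKHI"}

def dominant_script(text):
    """Return the most frequent known script in `text`, or None.

    Relies on Unicode character names beginning with the script name,
    e.g. 'DEVANAGARI LETTER MA'.
    """
    counts = Counter()
    for ch in text:
        first_word = unicodedata.name(ch, "").split(" ", 1)[0]
        if first_word in SCRIPTS:
            counts[first_word] += 1
    return counts.most_common(1)[0][0] if counts else None

# Arbitrary sample words, keyed by language.
samples = {
    "Hindi": "छिपकली",
    "Nepali": "भित्ति",
    "Marathi": "मराठी",
    "Bengali": "বাংলা",
    "Gujarati": "ગુજરાતી",
    "Punjabi": "ਪੰਜਾਬੀ",
}

for language, word in samples.items():
    print(f"{language}: {dominant_script(word)}")
```

Hindi, Nepali, and Marathi all come out as Devanagari, so any step in a pipeline that leaned on script to guess language would need further evidence (vocabulary, character n-grams, and so on) to keep those three apart.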

That also makes me worry about the other surprising inter-language linkages, such as Bengali-Swahili, Swahili-Tagalog. Not, obviously, that these languages share a common script, but whether some of the apparent connections are artefacts of the algorithm, whether due to use of a common script or some other factor. If they're not simply artefacts, then it certainly would be interesting to find out why, for instance, Bengali-language and Swahili-language webpages are linking to each other.