Thursday, September 16, 2010

Philology and (La)TeX: on Proto-Indo-European dragon-slaying and Hittite ḫ

A couple of years ago I made the switch from Word to LaTeX. At the time I was in the middle of writing a paper on formulaic language in Proto-Indo-European, specifically working on the reconstruction of formulae connected with the PIE dragon-slaying mytheme. Though the (first draft of the) paper was mostly written, I decided I would reset it in LaTeX. This was a rather laborious task, but it resulted in a much more aesthetically pleasing document, and LaTeX allows for a much easier system of referring to numbered examples than does Word (amongst other benefits of the LaTeX typesetting system). [I use Wolfgang Sternefeld's linguex package for example numbering.]

As this was a philological paper dealing with a number of different languages (Old Irish, Old English, Old Saxon, Gothic, Vedic Sanskrit, Classical Greek, Avestan, Pahlavi, and Hittite), special diacritics and characters were required. Rei Fukui's TIPA package handles almost all of the characters/diacritics which were needed. The exceptions were the Hittite "laryngeal" ḫ and polytonic classical Greek.

I. How to typeset Hittite in LaTeX:
The character ḫ may be defined by the following macro (assuming that the TIPA package has been loaded in the preamble by \usepackage{tipa}):
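A minimal sketch of such a macro, assuming it is built on TIPA's low-level \tipaloweraccent command and that accent slot 8 of TIPA's T3 encoding is the breve. Both the definition of \hith and the capital variant \hitH are illustrative assumptions here, not necessarily the exact definitions used for the paper:

```latex
\documentclass{article}
\usepackage{tipa}

% Assumption: slot 8 in the T3 encoding holds the breve accent;
% \tipaloweraccent sets that accent beneath the base letter.
\newcommand{\hith}{\tipaloweraccent{8}{h}} % h with breve below
\newcommand{\hitH}{\tipaloweraccent{8}{H}} % capital counterpart (assumed to work the same way)

\begin{document}
tara{\hith}{\hith}\={u}wan
\end{document}
```

If the breve sits too close to or too far from the letter, the TIPA manual describes an optional vertical-adjustment argument to \tipaloweraccent that may be worth trying.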
Then whenever ḫ is required, it may be called via the command {\hith}, as in the following text:
n=an=za namma \super{\sc{mu\v{s}}} illuyanka$[$n$]$ tara{\hith}{\hith}\={u}wan d\=ai\v{s}
which results in:
n=an=za namma MUŠilluyanka[n] taraḫḫūwan dāiš
(meaning "He (the storm god) began to overcome the serpent"; from KBo. 3.7 iii 24-5)

II. How to typeset classical Greek in LaTeX:
In the philological tradition, the only language using a non-Latinate script which is not transliterated is Greek (I've always found this a bit unfair: why isn't Sanskrit rendered in Devanagari?). To typeset polytonic (ancient) Greek in LaTeX, we'll need the following packages: babel, teubner, fontenc, cbgreek. Defining a macro \greekfont then allows us to switch to polytonic Greek.

A minimal example illustrating the usage:
\usepackage{mathptmx} %OPTIONAL, in order to set Latin/English in Times font
\usepackage{tipa} %OPTIONAL, for typesetting diacritics/special characters for other lgs.


{\noindent}From Pindar's \textit{Olympian} 13.63--4:
{\noindent}\greekfont{\Ar{o}c t\cap{a}c \s{o}fi\'hdeos u\r{i}\'on pote Gorg\'onos \cap{\s{h}} p\'oll> \s{a}mf\`i krouno\cap{i}c\\P\'agason ze\cap{u}xai poj\'ewn \Gs{e}pajen}


(meaning "who (rel. pro.) beside the Springs, striving to break the serpent Gorgon's child, Pegasos, endured much hardship")

You'll need to make sure you have the full cbgreek package, otherwise the Greek font will be blurry and ugly.
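The preamble lines that load the Greek support and define \greekfont are not visible in the example above. What follows is a minimal sketch of a complete file, under stated assumptions: babel is loaded with the greek option and the polutoniko attribute, teubner supplies the accent macros (\s, \Ar, \cap and so on), the CB Greek fonts are installed, and \greekfont is a hypothetical one-argument wrapper around \foreignlanguage. The original definition may well have differed.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}              % Latin text encoding; babel's Greek support uses LGR
\usepackage[greek,english]{babel}     % last option (english) is the main language
\languageattribute{greek}{polutoniko} % ask for polytonic rather than monotonic Greek
\usepackage{teubner}                  % accent macros such as \s, \Ar, \cap

% Hypothetical definition: typeset the argument as Greek.
\newcommand{\greekfont}[1]{\foreignlanguage{greek}{#1}}

\begin{document}
\noindent From Pindar's \textit{Olympian} 13.63--4:

\noindent\greekfont{P\'agason ze\cap{u}xai}
\end{document}
```

Only two words of the Pindar passage are reproduced here to keep the sketch short; the full line given above should slot into the same \greekfont{...} call.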

III. Post scriptum
Here's the rub (of course): having produced a beautifully typeset document, I submitted it to Historische Sprachforschung (Adalbert Kuhn's old Zeitschrift für vergleichende Sprachforschung). It was accepted, but the journal could only process Word documents. So I had to go back and retypeset the whole thing in Word (again). This involved a lot of find-and-replace (to turn LaTeX code/macros into Unicode characters, Word formatting, example numbers, etc.). Unfortunately, this also meant that the table I had managed to fit on a single page (in order that the various formulae could be easily compared), by using smaller font sizes and rotating it to landscape orientation with the package lscape, is in HS split across three pages...
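For anyone wanting to try the same trick, here is a rough sketch of a rotated, small-type table using lscape. The column headings and cell contents are placeholders, not the contents of the actual table:

```latex
\documentclass{article}
\usepackage{lscape}

\begin{document}
\begin{landscape}      % contents of this environment are set rotated by 90 degrees
\begin{table}
  \centering
  \footnotesize        % smaller type so that more columns fit on the page
  \begin{tabular}{lll}
    Language & Formula & Source \\
    \hline
    (language 1) & (formula) & (citation) \\
    (language 2) & (formula) & (citation) \\
  \end{tabular}
  \caption{Placeholder comparison table}
\end{table}
\end{landscape}
\end{document}
```

The landscape environment starts on a fresh page, so the rotated table ends up on a page of its own.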

Happily, the original LaTeX-produced version did in fact appear earlier in Studies in the Linguistic Sciences: Illinois Working Papers 2009 (who do accept LaTeX submissions (since I designed a LaTeX style file for the journal...)).


[1] Slade, Benjamin. 2008[2010]. How (exactly) to slay a dragon in Indo-European? PIE *bheid- {h3égwhim, kwŕ̥mi-}. Historische Sprachforschung 121: 3-53. [link]
[2] Slade, Benjamin. 2009. Split serpents and bitter blades: Reconstructing details of the PIE dragon-combat. Studies in the Linguistic Sciences: Illinois Working Papers 2009: 1-57. [link]


  1. Things are moving in the TeX font world: these days XeTeX lets you write directly in Unicode and the (XeTeX-only) fontspec package gives you easy access to system fonts. I've only used it for modern Greek so I can't say if it's mature enough for your more complex needs, but perhaps worth a look? (I have a blog post with some related notes and source links.)

  2. I know about Xe(La)TeX, but I prefer macros to inputting Unicode directly (I hate having to search for symbols). Also, the last time I tried XeTeX, I couldn't get it to work properly. I also don't know how well it works for polytonic Greek.

    I read somewhere (though I don't remember where) that Unicode support is going to be integrated into LaTeX itself (or pdflatex anyway) at some point in time.

    I'll have to play around with XeTeX again sometime though.

  3. Fair enough; most of the benefits of straight Unicode input disappear if you can't type the characters you need!

    The whole area of font handling (including input encodings) seems to be a development hotspot at the moment; it's probably worth checking back every two or three years to see if things have improved.

  4. Sounds fascinating. Are you allowed to host preprints, or can you send them to the ArXiv?

    See also Nick Nicholas's essay Don't Proliferate; Transliterate!

  5. @Tikitu: There is some package which silently translates TIPA into Unicode characters (I think it's a preprocessor or the like). But so far I haven't really needed anything other than TIPA characters so there hasn't really been any need for me to use XeTeX. But I am interested in using Unicode input for Devanagari & Sinhala script at some point in time (though I haven't really needed to use it, for reasons mentioned in John Cowan's linked essay).

    @John: Thanks for the link. It's an interesting article. The nice thing with *not* relying on Unicode is that new characters can be created as needed.

    I think the typical linguist's equivalent of ArXiv is lingbuzz; but I think in theory I'm not supposed to do either. However, since the SLS "working paper" version is almost exactly the same article (except with a prettier table), I don't think I really need to worry about it anyway. Link to the SLS version is here:

  6. #1: Thank you, thank you, thank you, for sharing a way to set /ḫ/. My inability to do it with TIPA has been bothering me for a while.

    #2: I imagine historically it was probably just a complete pain in the ass for European typesetters in the 19th century to set combined devanagari in mundane academic publications, and the tradition of transliteration just stuck... Still, no excuse not to write in devanagari on the blackboard during a presentation.

  7. @Mattitiahu: On #2 - I suppose it would have been difficult for 19th-c. typesetters to set devanagari, but then again 19th-c. typesetters actually seem to have done a lot of rather difficult typesetting. Some of the 19th-c. books at least have typeset devanagari (like Speijer's Sanskrit syntax). And Greek must be harder to typeset than Roman too, yet Greek is rarely transliterated. Greek I think has/had a long history in European academia which resulted in Greek letters being entrenched.

  8. I think I mean more so that it would be time-consuming and expensive, and I'd imagine fewer typesetters would have devanagari faces. I openly admit I'm speculating with no concrete facts. Still, I agree in general that it's unfair to the other languages that everything gets transliterated in IE-istik except for Greek. I mean, I want to see more typeset Gothic. Gothic has such a *cool* alphabet...

    On the other hand now that we have cuneiform unicode fonts, I wouldn't want to have to be yanking the sign-list off the shelf every time I read an article on Hittite. Mηδὲν ἄγαν, I suppose.

  9. When I studied Sanskrit, I absolutely hated devanagari -- I used transliteration every chance I got, and never did really learn it well. This pissed off my Sanskrit teacher (who was a jerk anyway), but it's perfectly natural for an Indo-Europeanist, who has no interest in reading Kalidasa but simply wants to know the forms. As cool as various scripts are, transliteration is the only sensible way to go; I'd even be willing to use transliterated Greek, though it would feel weird.

  10. @Mattitiahu: I was thinking the same thing about Hittite.

    @Language: My Sanskrit teacher gave us a week to learn devanagari. For whatever reason, I didn't find it too bad and it's been very useful to know it (especially since I work on Hindi & Nepali too, which use the same script). But I've never successfully learned an Indian script since then. Bengali script, which of course derives from the same root as devanagari, has characters similar to devanagari, but unfortunately the similar characters often have different (phonetic) values---so actually knowing devanagari almost makes it worse. And I'm still trying to get a firm grasp on Sinhala script.

    Even as an Indo-Europeanist, sometimes devanagari would be useful. Some important books on Sanskrit (like Speijer's Sanskrit syntax) use untransliterated (and often untranslated) devanagari.

  11. I'm a bit late to this party, but let me say that XeTeX works just dandy for polytonic Greek. Everything I've done in the last few years has been done in XeTeX. With some metrical symbols: Delectus Indelectatus (PDF).

  12. Thanks, Wm. Good to know that XeTeX works for polytonic Greek as well. Can one use the cbgreek fonts with it though?

  13. I'm afraid I can't say. Once I went to XeTeX, I abandoned the cbgreek fonts for things like Gentium and the many fine faces from the Greek Font Society.

  14. @Tikitu: Just to update: LuaTeX supports Unicode directly, and the plan is that pdflatex will merge/become LuaTeX:

  15. Cheers, good to know. LuaTeX looks like the happening thing (and honestly the idea is just so sensible... use TeX for what it's good at, and something else better for what it's not).

  16. Thanks for showing me a way to typeset the h with breve below, but how do I make it a capital?

  17. @Pål:

    The following should work

    Define in preamble:

    And then use \hitH in the document for the capital version (H with breve beneath).

  18. @be_slayed:
    Thank you, worked like a charm.