texish

Bengali–Assamese

texish sets the Bengali–Assamese script — used for Bengali, Assamese and several other languages — with the syllable shaping the script needs. Like Devanagari it is written left to right, so it needs no bidirectional reordering; what it needs is work inside each syllable. A consonant carries an inherent vowel; a vowel sign attaches above, below, before or after it; and two consonants joined by the hasant (the virama) fuse into a conjunct. The engine segments a run into orthographic syllables and shapes each one — building the conjuncts, reordering and splitting the signs that are written and drawn in different places, and positioning the marks — all automatically from text typed in the usual Unicode order.
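The segmentation step can be pictured with a small Python sketch. It is not the engine's code: the character classes and the `segment` function are assumptions made for illustration, and the pattern ignores cases a real shaper handles (joiners, ya-phalaa after a vowel, khanda ta).

```python
import re

# Simplified character classes for the Bengali block (assumptions of this sketch).
CONSONANT = "[\u0995-\u09B9\u09DC\u09DD\u09DF]\u09BC?"  # consonant, optional nukta
HASANT = "\u09CD"
VOWEL_SIGNS = "[\u09BE-\u09CC\u09D7]"
MODIFIERS = "[\u0981-\u0983]"  # candrabindu, anusvara, visarga
INDEPENDENT_VOWEL = "[\u0985-\u0994]"

SYLLABLE = re.compile(
    # (consonant + hasant)* consonant, then a final hasant or vowel signs, then modifiers
    f"(?:{CONSONANT}{HASANT})*{CONSONANT}(?:{HASANT}|{VOWEL_SIGNS}*){MODIFIERS}*"
    # an independent vowel with optional modifiers
    f"|{INDEPENDENT_VOWEL}{MODIFIERS}*"
    # anything else (spaces, punctuation) is its own unit
    "|.",
    re.DOTALL,
)


def segment(text: str) -> list[str]:
    """Split text into rough orthographic syllables, in typed order."""
    return SYLLABLE.findall(text)
```

Under these assumptions, `segment("কর্ম")` yields the two units ক and র্ম: the ra and the hasant attach to the following consonant, which is the cluster the later shaping steps work on.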

The Bengali font

A Bengali face, Noto Serif Bengali, is bundled in a regular and a bold cut. Select it with \font bengali (or the alias \font assamese):

{\font bengali 14 regular
বাংলা একটি সমৃদ্ধ ভাষা।
}
{\font bengali 18 bold নমস্কার!}

(As with the bundled Hebrew, Arabic, CJK and Devanagari faces, the in-browser Scala.js build does not ship the Bengali font, to keep the download small.)

What the engine shapes

Everything below happens automatically for any run set in a Bengali font — you type the characters in reading order and the engine does the rest.

The pre-base vowel signs reorder. The i, e and ai signs (ি, ে, ৈ) are typed after their consonant but drawn before it. In কে (ka + e-sign) the sign is set to the left of the ka even though it follows it in memory, and the font’s word-initial variant is chosen when the sign opens a word.
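As a rough illustration of that reordering, the Python sketch below moves a pre-base sign to the front of a single syllable. The function name `visual_order` is hypothetical, and the sketch assumes its input is already one syllable; it does not model the choice of the word-initial glyph variant.

```python
# The three pre-base vowel signs named above: i, e and ai.
PRE_BASE = {"\u09BF", "\u09C7", "\u09C8"}


def visual_order(syllable: str) -> str:
    """Return the syllable with any pre-base sign moved to the front.

    Input is in typed (logical) order; output approximates drawing order.
    """
    pre = [ch for ch in syllable if ch in PRE_BASE]
    rest = [ch for ch in syllable if ch not in PRE_BASE]
    return "".join(pre + rest)
```

For কে (ka, then the e-sign) the result lists the e-sign first and the ka second, matching the left-to-right order in which the two are drawn.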

The two-part o and au signs split. The o sign (ো) and au sign (ৌ) are single characters that render in two pieces — an e-sign before the base and an aa-sign (for o) or a length mark (for au) after it. The engine splits them, following Unicode’s canonical decomposition, before shaping: কো, নৌকা.
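The decomposition in question can be read straight from the Unicode character database. The Python sketch below builds a split table from `unicodedata.decomposition`; the helper name `split_two_part` is an assumption for illustration, not part of texish.

```python
import unicodedata

# The two-part signs: o (U+09CB) and au (U+09CC).
TWO_PART = ["\u09CB", "\u09CC"]

# Build the split table from each sign's canonical decomposition,
# e.g. "09C7 09BE" for the o sign (e-sign + aa-sign).
SPLIT = {
    ch: "".join(chr(int(cp, 16)) for cp in unicodedata.decomposition(ch).split())
    for ch in TWO_PART
}


def split_two_part(text: str) -> str:
    """Replace each two-part vowel sign with its two canonical pieces."""
    return "".join(SPLIT.get(ch, ch) for ch in text)
```

After the split, the leading e-sign is an ordinary pre-base sign, so one plausible pipeline is to split first and then reorder, leaving the second piece after the base.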

Consonants fuse into conjuncts. A consonant, the hasant, and another consonant join into a conjunct ligature: ক্ত (kta), ছাত্র (tra), জ্ঞান (jña), কষ্ট (ṣṭa), বিদ্যা (dya).

A syllable-initial ra becomes a reph. A র followed by the hasant at the head of a syllable is not drawn in place but as a small stroke above the syllable’s base: কর্ম, ধর্ম. A য after the hasant becomes the post-base ya-phalaa (মধ্য), and a র after a consonant and the hasant hangs below that consonant, fused into the glyph: প্রেম, গ্রাম.
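The three cases differ only in where the ra or ya sits relative to the hasant, which the following Python sketch makes explicit. The `classify` function and its labels are hypothetical and assume the input is a single syllable in typed order; the sketch does not cover the joiner-controlled forms a full shaper distinguishes.

```python
RA, YA, HASANT = "\u09B0", "\u09AF", "\u09CD"


def classify(syllable: str) -> list[str]:
    """Name the special ra/ya forms that a syllable would trigger."""
    forms = []
    rest = syllable
    # ra + hasant at the head of the syllable, with something after it: reph.
    if syllable.startswith(RA + HASANT) and len(syllable) > 2:
        forms.append("reph")
        rest = syllable[2:]  # the reph is consumed; look at what follows
    # hasant + ya after a consonant: post-base ya-phalaa.
    if rest.find(HASANT + YA) >= 1:
        forms.append("ya-phalaa")
    # hasant + ra after a consonant: below-base ra.
    if rest.find(HASANT + RA) >= 1:
        forms.append("ra-phalaa")
    return forms
```

With this sketch the র্ম of কর্ম is reported as a reph, the ধ্য of মধ্য as a ya-phalaa, and the প্রে of প্রেম as a below-base ra.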

A nukta composes with its consonant. The dot written below a consonant joins it into a single letter — the rra, rha and yya of বড়, আষাঢ়, আয়না.
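Text can carry these letters either as one code point or as a consonant followed by the nukta (U+09BC). The Python sketch below folds the two-character form into the single letter; `compose_nukta` is a hypothetical helper, and whether texish composes at the character level or through the font is not stated here, so treat this only as a picture of the equivalence.

```python
import unicodedata

# The three precomposed nukta letters: rra, rha, yya.
PRECOMPOSED = ["\u09DC", "\u09DD", "\u09DF"]

# Map each "consonant + nukta" pair to its single-letter form,
# using the canonical decomposition from the Unicode database.
COMPOSE = {unicodedata.normalize("NFD", ch): ch for ch in PRECOMPOSED}


def compose_nukta(text: str) -> str:
    """Replace consonant + nukta pairs with the precomposed letter."""
    for pair, single in COMPOSE.items():
        text = text.replace(pair, single)
    return text
```

A dedicated table is used rather than NFC normalization because these three letters are composition exclusions in Unicode: NFC leaves them in their two-character form.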

Vowel signs and syllable marks are positioned by the font. The below-base u signs, and the anusvara and visarga, are placed on their base by the font’s anchors — the same mechanism that positions Hebrew niqqud and Arabic harakat: কু, কং, কঃ.
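Anchor-based placement comes down to one subtraction: the mark is shifted until its own anchor lands on the matching anchor of the base. The Python sketch below shows that arithmetic with made-up coordinates in font units; the function name and every number are assumptions, not values from the bundled face.

```python
Point = tuple[int, int]


def mark_offset(base_anchor: Point, mark_anchor: Point) -> Point:
    """Offset that moves the mark so its anchor coincides with the base's.

    Both anchors are measured from their own glyph's origin.
    """
    bx, by = base_anchor
    mx, my = mark_anchor
    return (bx - mx, by - my)


# Hypothetical anchors: a below-base attachment point on the base glyph
# and the matching attachment point on the u-sign glyph.
offset = mark_offset(base_anchor=(300, -50), mark_anchor=(120, 0))
```

Here `offset` is `(180, -50)`: the sign is moved 180 units right and 50 units down from the base glyph's origin.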

Setting a paragraph

Bengali sets in ordinary running text: the line breaker finds its breaks at the interword spaces exactly as for Latin, and only the shaping within each word is special. A whole Bengali paragraph needs nothing beyond selecting the font.

{\font bengali 12 regular
এটি বাংলা লিপিতে লেখা হয়েছে। প্রতিটি অক্ষর একটি ব্যঞ্জন এবং তার সঙ্গের স্বর মিলে তৈরি হয়,
আর দুটি ব্যঞ্জন হসন্ত দিয়ে যুক্ত হয়ে যুক্তাক্ষর গঠন করে।
}

Because Bengali sets left to right like a Latin text face, it uses the same TeX-style punctuation shorthands: `` and '' become curly quotes, -- and --- become en and em dashes, and ... becomes an ellipsis. The bundled face carries all of these glyphs. (The right-to-left Hebrew and Arabic faces do not use the shorthands: their quote conventions differ, so those documents type the marks literally.)
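The shorthand mapping can be sketched in a few lines of Python. The `expand` function and the replacement order are illustrative assumptions rather than the engine's implementation; the one detail that matters is that the three-hyphen form is handled before the two-hyphen form.

```python
# Longest shorthand first, so "---" is not consumed as "--" plus "-".
SHORTHANDS = [
    ("---", "\u2014"),  # em dash
    ("--", "\u2013"),   # en dash
    ("...", "\u2026"),  # ellipsis
    ("``", "\u201C"),   # opening curly quote
    ("''", "\u201D"),   # closing curly quote
]


def expand(text: str) -> str:
    """Replace TeX-style punctuation shorthands with their Unicode marks."""
    for shorthand, mark in SHORTHANDS:
        text = text.replace(shorthand, mark)
    return text
```

Bengali letters pass through untouched, since none of the shorthands contains anything but ASCII punctuation.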

The bundled scripts/bengali-demo.script sets a full page covering each of these features.
