Shaping Ligatures in Monospace Fonts

Published on
to joshleeb’s blog

For some time I’ve been working towards building a graphical code editor from
scratch. I’m still in the exploratory phase of this project, which involves
creating many small, conceptual pieces to better understand
the various problem spaces. The problem space I’m working through at the
moment is text rendering, with the current focus being shaping.

Shaping is the process of converting text (a sequence of Unicode code points,
decoded from UTF-8 in our case) into a sequence of glyphs with the positional
information needed to render them. It can get very complex and computationally
expensive. For more information, take a look at the HarfBuzz manual.

You might think shaping for a code editor with a monospace font should be
trivial (hint: nothing in text rendering is trivial). The text is almost all
English, with glyphs from the Basic Latin Unicode
block. That means there are rarely diacritics or
other complex font structures. So we should be able to map every code
point to a single glyph, and each glyph has the same width since we’re working
with a monospace font.
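To make those simplifications concrete, here is a minimal sketch of what "naive" monospace shaping amounts to, assuming a one-to-one character-to-glyph table. `shape_naive` and `PositionedGlyph` are hypothetical names, not swash API, and the `HashMap` is a stand-in for the font’s real character map. Fed the glyph ids that appear later in Code 2, it reproduces the ligatures-disabled sequence; by construction it can never produce a ligature.

```rust
use std::collections::HashMap;

/// One shaped glyph: which glyph to draw and where its origin sits on the line.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PositionedGlyph {
    pub id: u16,
    pub x: f32,
}

/// Naive monospace shaping (hypothetical helper, not swash): each code point
/// maps to exactly one glyph through a char -> glyph-id table standing in for
/// the font's cmap, and each glyph occupies one cell of `cell_width`.
/// Code points missing from the table fall back to the `notdef` glyph.
pub fn shape_naive(
    text: &str,
    cmap: &HashMap<char, u16>,
    notdef: u16,
    cell_width: f32,
) -> Vec<PositionedGlyph> {
    text.chars()
        .enumerate()
        .map(|(i, ch)| PositionedGlyph {
            id: cmap.get(&ch).copied().unwrap_or(notdef),
            x: i as f32 * cell_width,
        })
        .collect()
}
```

The positions fall out of the character index alone, which is exactly the property that ligatures and multi-code-point emoji break.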

With these simplifications you can go a long way with text rendering for a
basic code editor, at least until you want to handle emojis (which also rely
on a font fallback subsystem) or ligatures.

Inspecting the Glyphs

For this post, we’ll be using the swash crate to provide us with a
shaping algorithm. We’ll also be using the monospace font MonoLisa,
which supports ligatures, though I’ve observed similar behavior
with Fira Code.

Now, let’s try shaping some ligatures and inspect the sequence of glyphs we
get back.

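// `shape` and `Ligatures` are thin helpers around swash, not part of its API;
// `shape` yields glyph clusters whose glyphs also carry the font's glyph name.
for cluster in shape("#{Q}", Ligatures::Enabled) {
    for glyph in cluster.glyphs {
        // {:04} zero-pads the id and {:?} quotes the name, matching the output.
        println!("id: {:04}, name: {:?}", glyph.id, glyph.name);
    }
}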

// id: 0763, name: "numbersign_braceleft.liga"
// id: 1252, name: "LIGSPACE"
// id: 0129, name: "Q"
// id: 0705, name: "braceright"
Code 1. Glyph sequence from shaping the text “#{Q}”
with ligatures enabled.

And we’ll compare this to the output when ligatures are disabled.

for cluster in shape("#{Q}", Ligatures::Disabled) { ... }

// id: 0694, name: "numbersign"
// id: 0704, name: "braceleft"
// id: 0129, name: "Q"
// id: 0705, name: "braceright"
Code 2. Glyph sequence from shaping the text “#{Q}”
with ligatures disabled.

From the glyph names it appears we are correctly receiving the ligature
“numbersign_braceleft” when ligatures are enabled. To be sure, let’s rasterize
glyph #763 to see what we get.

Figure 1. MonoLisa glyph #763.

That looks good to me. But hold on… If this ligature is combining “#” and
“{” into a single glyph #763 representing “#{”, then shouldn’t we see an
output sequence of three glyphs, not four? What is this glyph “LIGSPACE”?

Browsing the font’s glyphs in Apple’s Font Book, we see glyph #1251
and glyph #1253, but glyph #1252 is nowhere to be found.

Let’s see if we can identify this glyph in code. The swash
GlyphCluster has a very convenient is_ligature
method we can call. I expect at least one of #763 and #1252 to be
identified as a ligature.

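for cluster in shape("#{Q}", Ligatures::Enabled) {
    for glyph in cluster.glyphs {
        // is_ligature() is asked of the cluster, so every glyph in the same
        // cluster reports the same value.
        println!("id: {:04}, is_lig: {}", glyph.id, cluster.is_ligature());
    }
}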

// id: 0763, is_lig: false
// id: 1252, is_lig: false
// id: 0129, is_lig: false
// id: 0705, is_lig: false
Code 3. Glyph sequence from shaping the text “#{Q}”
including `is_ligature` result.

But that isn’t what we get: every glyph reports false. At this stage it’s not
obvious how to handle the “LIGSPACE” glyph or others like it, or even how to
identify them in the first place.

As a quick aside, when I initially encountered this I thought it might be
specific to the shaping algorithm used by swash. I tried
rustybuzz and inspected the GDEF table, but got
effectively the same results. I also thought this might be specific to the
MonoLisa font, but the same is true of Fira Code.

How Ligatures Should Work

Perhaps this is just how monospace fonts handle ligatures. To get a better
idea, let’s take a look at how ligatures are shaped for the Apple Color Emoji
proportional font.

// `shape_emoji` is a helper like `shape`, but shaping with Apple Color Emoji.
for cluster in shape_emoji("\u{1f3f3}\u{fe0f}\u{200d}\u{1f308}") {
    for glyph in cluster.glyphs {
        println!("id: {:04}, name: {:?}", glyph.id, glyph.name);
    }
}

// id: 0967, name: "u1F3F3_u1F308"
Code 4. Glyph sequence from shaping an emoji with
four Unicode code points.

This works exactly as expected: the four code points are mapped to a
single glyph, #967, which rasterizes to the correct image.

Figure 2. Apple Color Emoji glyph #967.
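It helps to keep code points and bytes apart here. The small std-only check below, using a hypothetical helper `code_points_and_bytes`, counts both for the sequence in Code 4: it is four Unicode code points (flag, variation selector, zero-width joiner, rainbow), which UTF-8 encodes in 14 bytes. A one-glyph-per-code-point shaper would emit four glyphs for it, while the font maps all four to glyph #967.

```rust
/// The rainbow flag sequence from Code 4:
/// U+1F3F3 WAVING WHITE FLAG, U+FE0F VARIATION SELECTOR-16,
/// U+200D ZERO WIDTH JOINER, U+1F308 RAINBOW.
pub const RAINBOW_FLAG: &str = "\u{1f3f3}\u{fe0f}\u{200d}\u{1f308}";

/// Returns (number of Unicode code points, number of UTF-8 bytes).
/// `str::len` counts bytes; `chars()` iterates code points.
pub fn code_points_and_bytes(s: &str) -> (usize, usize) {
    (s.chars().count(), s.len())
}

/// UTF-8 length of each code point in order.
pub fn utf8_lengths(s: &str) -> Vec<usize> {
    s.chars().map(|c| c.len_utf8()).collect()
}
```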

If the expectation is that shaping a ligature maps multiple code points to
a single glyph, then maybe we should be ignoring this “LIGSPACE” glyph. Even
though we don’t yet know how to reliably determine whether a glyph is a ligature
spacer, I’ll hardcode skipping the glyph with id #1252 when rendering…

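// `render` stands for a helper that rasterizes one glyph into `image` with its
// origin at x = `cursor`; `width` and `height` are the canvas size in pixels.
let mut image = RgbaImage::new(width, height);
let mut cursor = 0.0;
for cluster in shape("#{Q}", Ligatures::Enabled) {
    for glyph in cluster.glyphs {
        // Skip the spacer entirely: no drawing and no advance.
        if glyph.id != 1252 {
            render(glyph, &mut image, cursor);
            cursor += glyph.advance;
        }
    }
}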
Code 5. Text rendering that skips the ligature spacer
glyph #1252.

… which clearly produces the wrong output.

Figure 3. Incorrect render of “#{Q}” with ligatures
enabled.

Correctly Handling Ligatures

From this experiment we know that the ligature spacer glyph carries some
information we need to process, and since we’re working with a monospace font,
my best guess is that it’s the horizontal advance. So we’ll update our loop to
accumulate the advance of all glyphs, including #1252…

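// Same hypothetical `render` helper and canvas size as in Code 5.
let mut image = RgbaImage::new(width, height);
let mut cursor = 0.0;
for cluster in shape("#{Q}", Ligatures::Enabled) {
    for glyph in cluster.glyphs {
        // Don't draw the spacer...
        if glyph.id != 1252 {
            render(glyph, &mut image, cursor);
        }
        // ...but always apply its advance.
        cursor += glyph.advance;
    }
}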
Code 6. Text rendering that skips rasterizing the
ligature spacer glyph #1252 but accumulates its advance.

… which produces the output we’re looking for.

Figure 4. Correct render of “#{Q}” with ligatures
enabled.
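The only difference between Code 5 and Code 6 is where the cursor sits when each glyph is drawn. The std-only model below replays both loops over the glyph ids from Code 1. It assumes, for illustration only, that every glyph (the spacer included) advances by one hypothetical 10-unit cell; `Glyph`, `LIGSPACE_ID`, and the two `place_*` functions are hypothetical names, not swash types. Under that assumption the first strategy draws “Q” one cell after the start of the “#{” ligature, where the ligature’s second half is, while the second places it two cells in.

```rust
/// A shaped glyph reduced to what the render loops use (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Glyph {
    pub id: u16,
    pub advance: f32,
}

/// The spacer id hardcoded in Codes 5 and 6 (specific to MonoLisa).
pub const LIGSPACE_ID: u16 = 1252;

/// Code 5 strategy: skip the spacer entirely, advance included.
/// Returns the (glyph id, x) pairs that would be drawn.
pub fn place_skipping_advance(glyphs: &[Glyph]) -> Vec<(u16, f32)> {
    let mut cursor = 0.0_f32;
    let mut placed = Vec::new();
    for g in glyphs {
        if g.id != LIGSPACE_ID {
            placed.push((g.id, cursor));
            cursor += g.advance;
        }
    }
    placed
}

/// Code 6 strategy: don't draw the spacer, but still apply its advance.
pub fn place_keeping_advance(glyphs: &[Glyph]) -> Vec<(u16, f32)> {
    let mut cursor = 0.0_f32;
    let mut placed = Vec::new();
    for g in glyphs {
        if g.id != LIGSPACE_ID {
            placed.push((g.id, cursor));
        }
        cursor += g.advance;
    }
    placed
}
```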

Identifying Ligature Spacers

Now that we have something working, we need to get rid of the hardcoded glyph
id. Since is_ligature always returns false here, it’s unclear what we’re meant
to do to identify this case.

Actually, it turns out that there is a more general solution that will
handle ligature spacers as well as any other glyph whose rasterization should
be skipped. To illustrate this, let’s rasterize each glyph and inspect
the size of the resulting image.

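for cluster in shape("#{Q}", Ligatures::Enabled) {
    for glyph in cluster.glyphs {
        // `rasterize` stands for a helper returning the glyph's bitmap as an image.
        let image = rasterize(glyph.id);
        // dimensions() returns a (width, height) tuple, which has no Display impl.
        let (width, height) = image.dimensions();
        println!("id: {:04}, dim: {}x{}", glyph.id, width, height);
    }
}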

// id: 0763, dim: 17x12
// id: 1252, dim: 2x0
// id: 0129, dim: 9x11
// id: 0705, dim: 8x12
Code 7. Glyph sequence from shaping the text “#{Q}”
including rasterization dimensions.

Of course, the most interesting output is that glyph #1252 (our ligature
spacer) rasterizes to an empty image: 2 pixels wide but 0 pixels tall. This
means we don’t need to identify and ignore ligature spacer glyphs specifically;
we can skip drawing any glyph whose rasterization is empty, while still
advancing the cursor by its advance.
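Here is a minimal sketch of that rule with the hardcoded id gone. It takes the raster sizes reported in Code 7, assumes a hypothetical 10-unit advance for every glyph, and uses hypothetical names (`RasterSize`, `draw_list`); in a real renderer the sizes would come from whatever rasterizer you use. A side effect worth noting: a space character, which typically has no outline, gets the same treatment (not drawn, but still advanced), which is what we want.

```rust
/// Pixel dimensions of a rasterized glyph (hypothetical stand-in for the
/// bitmap a real rasterizer returns).
#[derive(Debug, Clone, Copy)]
pub struct RasterSize {
    pub width: u32,
    pub height: u32,
}

impl RasterSize {
    /// A bitmap with zero area has nothing to draw.
    pub fn is_empty(&self) -> bool {
        self.width == 0 || self.height == 0
    }
}

/// Given (glyph id, advance, raster size) for each shaped glyph, return the
/// (glyph id, x) pairs to draw. Glyphs with empty rasters are not drawn, but
/// every glyph still moves the cursor by its advance.
pub fn draw_list(glyphs: &[(u16, f32, RasterSize)]) -> Vec<(u16, f32)> {
    let mut cursor = 0.0_f32;
    let mut draws = Vec::new();
    for &(id, advance, size) in glyphs {
        if !size.is_empty() {
            draws.push((id, cursor));
        }
        cursor += advance;
    }
    draws
}
```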

What’s Going On

Honestly, I’m not certain.

I’m not an expert with text rendering. I haven’t read the full OpenType
spec or the HarfBuzz shaping algorithm. I’m not a font
creator. And when I searched all over the web for answers (about ligatures
in monospace fonts specifically), I couldn’t find a thing.

However, based on these experiments I can make an educated guess.

Let’s take a look at the advance width of each non-empty glyph, i.e. skipping
our ligature spacer glyph. In Figure 5, the left edge of each blue box is the
cursor position and its width is the glyph’s horizontal advance.

Figure 5. Advance-width boxes of glyphs for text “#{Q}” skipping
glyph #1252.

Since we are working with a monospace font, every glyph must have the same
advance, but a single advance is too narrow for the ligature “#{”, which
stands in for two characters. To get around this, it seems the font designers
follow the ligature with a spacer glyph that draws nothing (it rasterizes to
2 x 0 pixels) but still contributes its advance, supplying the second cell the
ligature needs.
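If this guess is right, the spacer exists to preserve a fixed-grid property: the advances of a shaped run still add up to one cell per source character, ligatures or not. The check below is a hypothetical helper (`fills_expected_cells`) using hypothetical one-cell advances of 10 units; it only holds for single-width text like the Basic Latin this post deals with, since emoji and CJK characters can occupy two cells. The ligature run with its spacer passes, while the same run with the spacer’s advance dropped, as in Code 5, comes up one cell short.

```rust
/// Checks the fixed-grid property the spacer appears to preserve: in a
/// monospace run of single-width characters, the shaped advances should sum
/// to one cell per source character, ligatures or not.
/// (Hypothetical helper; wide characters such as emoji or CJK break it.)
pub fn fills_expected_cells(text: &str, advances: &[f32], cell_width: f32) -> bool {
    let total: f32 = advances.iter().sum();
    let expected = text.chars().count() as f32 * cell_width;
    // Small tolerance for accumulated floating-point error.
    (total - expected).abs() < 1e-3
}
```

In an editor that lays text out on a fixed grid, a check like this could serve as a debug assertion that shaping hasn’t shifted later columns.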

Figure 6. Advance-width boxes of glyphs for text “#{Q}” including
glyph #1252.

Wrapping Up

Having come to this conclusion about how to handle ligature spacer glyphs in
monospace fonts, it all seems very logical and straightforward. I was,
however, surprised that I couldn’t find an explanation of this anywhere, given
the number of code editors, text editors, word processors, browsers, terminals,
and other programs that need to shape and render text.

For anyone else undertaking their own text rendering journey, I hope this
helps make it a little bit easier. And for those with much more text
experience, if you know of any articles that either back up this educated
guess, or disprove it, please let me know.


