Deduplication in Embedded Fonts

When a presentation is exported from LibreOffice Impress as SVG, it embeds all of the fonts that are used in the document. This is done conservatively: the embedded fonts only include the glyphs that are actually used in the presentation. This is a great feature, but it introduces some complications we need to deal with.

When I started importing multiple SVG presentations into a single HTML presentation, each SVG would come with its own set of glyphs. This worked, but meant that each time I imported a new file I was adding a mostly-redundant embedded font to the document. Particularly after I added the ability to re-import exported presentations to make changes, this led to documents accumulating a lot of unnecessary bloat.

Fortunately this was not very hard to deal with, since SVG fonts are just normal DOM objects that can be manipulated like any other. After spending some time with the SVG fonts specification to figure out what distinguished a particular font definition, I wrote some functions to handle this. Each time a file is imported its font definitions are loaded into memory and removed from the SVG element. Redundant glyphs are ignored. Once the files have all been loaded we generate new embedded fonts, with no redundant data.
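A minimal sketch of that bookkeeping in JavaScript. It assumes each imported font has already been parsed into a plain object with a family, an optional weight and style, and a list of glyphs identified by their unicode attribute. The FontRegistry class and that object shape are hypothetical, not the actual Slide-Drive code, and the SVG fonts specification uses more descriptors than these to tell fonts and glyphs apart.

```javascript
// Hypothetical registry that merges glyphs from every imported SVG font.
// A font is identified here by family, weight and style only (an assumption;
// the SVG fonts spec has more descriptors).
class FontRegistry {
  constructor() {
    // font key -> { family, weight, style, glyphs: Map(unicode -> glyph) }
    this.fonts = new Map();
  }

  static describe(font) {
    return {
      family: font.family,
      weight: font.weight || "normal",
      style: font.style || "normal",
    };
  }

  // Adds the glyphs of one imported font, ignoring any glyph already seen
  // for the same font. Returns how many glyphs were new.
  addFont(font) {
    const desc = FontRegistry.describe(font);
    const key = `${desc.family}|${desc.weight}|${desc.style}`;
    if (!this.fonts.has(key)) {
      this.fonts.set(key, { ...desc, glyphs: new Map() });
    }
    const entry = this.fonts.get(key);
    let added = 0;
    for (const glyph of font.glyphs) {
      if (!entry.glyphs.has(glyph.unicode)) {
        entry.glyphs.set(glyph.unicode, glyph);
        added += 1;
      }
    }
    return added;
  }

  // Returns one merged font per key, ready to be serialized back into a
  // single embedded font definition.
  mergedFonts() {
    return [...this.fonts.values()].map((entry) => ({
      family: entry.family,
      weight: entry.weight,
      style: entry.style,
      glyphs: [...entry.glyphs.values()],
    }));
  }
}
```

Once every file has been imported, mergedFonts() yields one entry per distinct font, so each glyph is written out exactly once.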

SVG fonts are not as powerful as “real” font formats. There’s no hinting available, and the kerning options aren’t as flexible. (In most cases the kerning isn’t an issue for us because LibreOffice manually positions the characters, but that’s not a general solution.) Therefore if a font is installed on the system, we would prefer to use that local copy instead of the embedded copy.

Previously, I had been rewriting every font reference in the document to refer to both the embedded and native fonts, using different font-family names, with the native fonts having a higher priority. While I was reading font specifications to work on deduplication, a more elegant solution occurred to me.

Instead of inserting the embedded font directly into the document, we encode it as a base64 data: URL and embed it in a CSS @font-face declaration. These declarations let us specify a list of font sources, all grouped under a single font-family. For example, if we use Consolas in a presentation, our declaration will first reference local("Consolas"), the installed system font, before falling back to url("data:[…]") format("svg"), our embedded font.

This is a nice solution all around. It adds no unnecessary bloat, and elements in our document just need to reference their desired font-family in the normal manner, without needing to worry about fallbacks.
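The @font-face declaration described above could be generated along these lines. This is a sketch: buildFontFaceRule and toBase64Utf8 are hypothetical helpers, the fontId parameter assumes the embedded <font> element has an id that the URL fragment can point to (SVG font sources are addressed that way), and image/svg+xml is simply the standard SVG MIME type rather than something taken from Slide-Drive's code.

```javascript
// Base64-encodes a string as UTF-8 (btoa alone only accepts Latin-1 text).
function toBase64Utf8(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

// Builds an @font-face rule that prefers an installed copy of the font and
// falls back to the embedded SVG font carried in a data: URL.
// Hypothetical helper; fontId is assumed to be the id of the <font> element.
function buildFontFaceRule(family, svgFontMarkup, fontId) {
  const dataUrl = "data:image/svg+xml;base64," + toBase64Utf8(svgFontMarkup);
  return [
    "@font-face {",
    `  font-family: "${family}";`,
    `  src: local("${family}"),`,
    `       url("${dataUrl}#${fontId}") format("svg");`,
    "}",
  ].join("\n");
}
```

The browser tries the sources in order, so the embedded font is only used when no local copy matches. Note that local() matches a font's full name or PostScript name, which is not always identical to its family name.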


Follow-up to “HTML5 Media Events – .cancelable?”

Firefox developers recently responded to the bug report I previously mentioned, about HTML5 media events mistakenly being cancelable even though this contradicts the specification and has no effect. The conclusion is that

this is just a case of the spec changing and nobody noticing.

The specification was updated in April 2009 to include the behaviour I discussed. Firefox, WebKit and Opera all implemented the events before that, and didn’t entirely update their behaviour to reflect the new changes. A patch for Firefox has now been uploaded and tickets have been filed against WebKit and Opera.

Ironically, Internet Explorer 9 and 10 apparently do conform to the spec — probably because they didn’t implement any HTML5 media until years after the changes were made.


The only use of Slide-Drive so far has been to produce some demos, so I haven’t focused on user-friendliness. LibreOffice Impress doesn’t export links (yet), so when I want to include them in examples I’ve just edited the source or used debugging tools to make the changes to the DOM while the presentation is loaded in the editor. Importing files is (obviously) an essential part of using our editor, but the only way to do that has been to drag-and-drop files onto the Butter timeline. There’s been no way for a user to know this without being told.

To address this, I’ve added a floating control panel with these options. It allows users to import files using the standard file selector and to add and remove links from SVG slides. I’ll probably add some controls to deal with other shortcomings of the SVG output. For example, exported shapes and lines do not include a stroke width, so if they have a border it will always be one pixel wide, regardless of what it was in the original presentation. Adding a control to adjust these widths would help deal with this.

Another very important addition has been the ability to reload exported presentations. Until now, it has been impossible to edit a presentation once it’s been exported, resulting in a ton of repeated effort just to make a small change. It’s now possible to import previously-exported presentations, loading their slides into a new track and copying any embedded fonts. Their audio tracks will not be imported, for two reasons: we don’t want to clobber the audio tracks used by the current presentation (we might be copying slides into an existing presentation), and the presentations might be located at different paths, which would make the audio files’ paths invalid.

This change has already saved me quite a bit of headache when producing the latest demo.

Finding the external coordinates of an SVG element

Firefox’s SVG support is pretty good, but it has one irritating shortcoming: it’s impossible to select text. Being able to select and copy text from slides is an important feature for us to have, so it was necessary to come up with a workaround. I decided to copy all of the text from the SVG into invisible <div>s that were placed on top of the SVG. It would appear the same to the user, but whenever they attempted to select text they’d actually be interacting with HTML elements instead of SVG elements.

Unfortunately, elements inside an SVG document don’t use the same coordinate system as the rest of the HTML document. In fact, they don’t even use a single consistent coordinate system. For example, you can add the attribute transform="scale(0.5)" to reduce the size of an element and all its contents by half. As a result, an element with width="200" inside this element is equivalent to an element with width="100" outside of it.

The other common transformation is translate(x, y), which shifts an element and its contents by the specified amount. These two accounted for almost all of the transformations I encountered in the files I was working with, so they’re the ones I focused on. It’s also possible to rotate() or skew (skewX(), skewY()) the coordinate system, but handling those would make things many times more complicated for relatively little payoff.
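Restricting support to scale() and translate() keeps the math simple: any such transform list collapses to an independent scale and offset per axis. The sketch below (parseTransform, composeTransforms and applyTransform are hypothetical names) parses a transform attribute of that kind and refuses anything it can’t represent, such as rotate() or skewX().

```javascript
// A transform restricted to scaling and translation:
// parentX = sx * x + tx, parentY = sy * y + ty.
const IDENTITY = { sx: 1, sy: 1, tx: 0, ty: 0 };

// Returns the transform equivalent to applying `inner` first, then `outer`.
function composeTransforms(outer, inner) {
  return {
    sx: outer.sx * inner.sx,
    sy: outer.sy * inner.sy,
    tx: outer.sx * inner.tx + outer.tx,
    ty: outer.sy * inner.ty + outer.ty,
  };
}

// Parses a transform attribute such as "translate(10, 20) scale(0.5)".
// Throws on any function other than translate() or scale().
function parseTransform(attr) {
  let result = IDENTITY;
  const pattern = /([a-zA-Z]+)\s*\(([^)]*)\)/g;
  for (const [, name, rawArgs] of (attr || "").matchAll(pattern)) {
    const args = rawArgs.trim().split(/[\s,]+/).filter(Boolean).map(Number);
    let step;
    if (name === "translate") {
      step = { sx: 1, sy: 1, tx: args[0] || 0, ty: args[1] || 0 };
    } else if (name === "scale") {
      const sx = args.length > 0 ? args[0] : 1;
      step = { sx, sy: args.length > 1 ? args[1] : sx, tx: 0, ty: 0 };
    } else {
      throw new Error(`Unsupported transform: ${name}()`);
    }
    // Functions in the list apply to points right-to-left, so each new
    // step is composed on the inside.
    result = composeTransforms(result, step);
  }
  return result;
}

function applyTransform(t, x, y) {
  return { x: t.sx * x + t.tx, y: t.sy * y + t.ty };
}
```

For the earlier example, parseTransform("scale(0.5)") gives sx = 0.5, so a width of 200 inside the element corresponds to 200 * 0.5 = 100 outside it.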

When I started working on this, I quickly realized another issue: if you allow the SVG document to be resized, for example to fit the size of the window, the relationship between its internal coordinate system and its external dimensions becomes inconsistent. If you make an <img> wider without making it taller, the image will be stretched to fit. If you make an <svg> wider without making it taller, it will just increase its internal width and shift its contents so they remain centred. I could have a <text> element at (0, 0) and position a <div> on top of it relative to the SVG, but if I stretch the SVG the <text> element may move to (100, 0) while the <div> remains fixed.

I addressed this by making sure that the SVG elements maintained their aspect ratio when they were resized. As long as I did this and defined the coordinates in terms of percentages of the SVG’s overall dimensions, things would be fine. However, there’s no built-in way to force an SVG to maintain a fixed aspect ratio like there is for <img>s. My solution generates something like this:

<div style="position: relative;">
<img src="data:…" style="width: 100%; height: auto; display: block;" />
<svg style="position: absolute; top: 0; bottom: 0; left: 0; right: 0; width: 100%; height: 100%;">…</svg>
</div>

I use a canvas to generate a transparent image with the same dimensions the SVG initially had. The image is scaled to fit the available width, and its height automatically adjusts to maintain the same aspect ratio. This stretches the parent <div> to match. The <svg> is absolutely positioned to fill this <div>, so it matches the dimensions of the <img> and therefore keeps a fixed aspect ratio. Any elements I want to position relative to the SVG (the text overlays) can also be positioned relative to the <div>.

This works, thankfully. Being forced to recalculate all of the coordinates whenever the window was resized would have been obnoxiously laggy in many cases.
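Once a text box’s position is known in the root SVG’s coordinate system, turning it into percentages is a direct division. In this sketch, toPercentStyle is a hypothetical name, and svgWidth and svgHeight are assumed to be the SVG’s dimensions in the same units as the box, such as the size it had when the transparent spacer image was generated.

```javascript
// Converts a box in root-SVG units into CSS percentages of the SVG's size, so
// an overlay positioned inside the wrapper <div> stays aligned when the
// wrapper is resized with a fixed aspect ratio.
function toPercentStyle(box, svgWidth, svgHeight) {
  const pct = (value, total) => `${(100 * value) / total}%`;
  return {
    position: "absolute",
    left: pct(box.x, svgWidth),
    top: pct(box.y, svgHeight),
    width: pct(box.width, svgWidth),
    height: pct(box.height, svgHeight),
  };
}
```

In a browser, Object.assign(overlay.style, toPercentStyle(box, 800, 600)) would apply the result to an overlay <div>; because every value is relative, nothing needs recomputing when the window is resized.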

Within the SVG document, each element’s coordinates and dimensions are defined relative to a “viewport” element. In our case this will be the closest ancestor that defines a transformation or, if no ancestor is transformed, the root SVG element.

There may not have been a function to get the coordinates and dimensions of an element relative to the HTML document, or even relative to the root SVG element, but if there were at least some way to find them relative to the viewport, then it would just be a matter of stepping up through the viewport, the viewport’s viewport, the viewport’s viewport’s viewport and so on. At each step, the internal and external dimensions and coordinates of the element define the linear relationship between the two coordinate systems; applying these relationships one after another leaves you with the relationship between the element and the root SVG.
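That walk amounts to composing one linear mapping per level. A minimal sketch, assuming each level’s internal box (its own coordinate system) and external box (where it lands in its parent) are already known; levelFromBoxes and toRootCoordinates are hypothetical names, and only scaling and translation are handled.

```javascript
// Derives the per-axis linear mapping parent = s * child + t from where a
// level's internal box ends up in its parent's coordinates.
function levelFromBoxes(internal, external) {
  const sx = external.width / internal.width;
  const sy = external.height / internal.height;
  return {
    sx,
    sy,
    tx: external.x - sx * internal.x,
    ty: external.y - sy * internal.y,
  };
}

// Maps a box through each level in turn, ordered from the element's own
// viewport outward to the root <svg>.
function toRootCoordinates(box, levels) {
  let { x, y, width, height } = box;
  for (const level of levels) {
    x = level.sx * x + level.tx;
    y = level.sy * y + level.ty;
    width *= level.sx;
    height *= level.sy;
  }
  return { x, y, width, height };
}
```

For example, a level whose 100×100 internal space is drawn as a 50×50 box at (50, 50) in its parent gets sx = 0.5 and tx = 50, so a point at x = 10 inside it lands at x = 55 one level up.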

At first, I thought I had found such a function: element.getBBox(). When I played with it in the console, it gave me values that looked like they could be right, and I wrote the rest of the logic assuming they were. When I finally ran the code, it sort-of worked: on some slides, the text would be positioned perfectly. However, on others it would be too low, too high, or stretched awkwardly.

It turns out that getBBox() doesn’t return the actual dimensions and coordinates of the element itself, but those of a bounding box containing all of the rendered elements inside it. If you have a giant <g> element but the only renderable element it contains is a tiny <rect>, you’ll end up with dimensions that are too small. If you have a tiny <g> element containing a <rect> that is far larger than it, you’ll end up with dimensions that are too large.

I couldn’t find an elegant solution to this. What I ended up doing was, instead of applying getBBox() to the element itself, creating a <rect> inside the element, giving it the attributes needed to have the same coordinates and dimensions as the element (x="0" y="0" width="100%" height="100%"), and then applying getBBox() to this rect instead. Because the <rect> is a renderable element with no children, its bounding box is always exactly the same as its coordinates and dimensions, which are exactly the same as those of the element I was actually interested in. The rect was still considered renderable even with fill="none" and stroke="none", so I left these elements in the <svg> instead of removing them. (If I removed them, I’d need to re-add them the next time I needed the coordinates of the element, which would be slower and would happen several times for most elements, since they’d be the ancestors of multiple text elements.)
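A sketch of that probe-rect workaround, assuming it runs in a browser where SVG elements support getBBox(). getOwnBox and its WeakMap cache are hypothetical, but they follow the description: the rect is created once per element, left in the document with no fill or stroke, and reused on later calls.

```javascript
const SVG_NS = "http://www.w3.org/2000/svg";

// Probe rects already inserted, keyed by the element they measure. A WeakMap
// lets entries disappear if the element itself is garbage-collected.
const probeRects = new WeakMap();

// Returns the bounding box of a childless <rect> placed at (0, 0) with 100%
// width and height inside `element`, instead of the union of its children.
function getOwnBox(element) {
  let probe = probeRects.get(element);
  if (!probe) {
    probe = element.ownerDocument.createElementNS(SVG_NS, "rect");
    const attributes = {
      x: "0", y: "0", width: "100%", height: "100%",
      fill: "none", stroke: "none",
    };
    for (const [name, value] of Object.entries(attributes)) {
      probe.setAttribute(name, value);
    }
    // Left in place on purpose so later lookups can reuse it.
    element.appendChild(probe);
    probeRects.set(element, probe);
  }
  return probe.getBBox();
}
```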

After implementing this, everything finally worked! The hacky workarounds left me feeling a little dirty, but my relief and satisfaction were enough to squash that. If you’re interested, you can find all of this code and a few other SVG-related functions in /js/svg-container.js.

HTML5 Media Events – .cancelable?

I’ve recently been trying to track down a bug related to permalinks in Slide-Drive. They usually work fine, jumping to the right point in the presentation as soon as the document is loaded. However, it would occasionally jump right back to the beginning after jumping to the correct time. Removing MediaElementPlayer fixed the bug, but I haven’t been able to narrow it down any further and have difficulty even reproducing it reliably.

My current goal is to have a working demo, not an entirely polished product, so after fighting with this for a while I decided to just try to find a workaround. I could see that a timeupdate event was being fired each time the audio would jump back. Inspecting the event, I noticed that its .cancelable property was true.

Great! I’ll just add a handler to cancel any timeupdate events that try to move to time 0 within a short time of the page loading. It’s ugly, but it should serve well enough for my demo, right?

Wrong. I tried this but it had no effect. My handler would run and call .preventDefault(), but nothing would happen. I also threw in .stopPropagation() and return false; in case they would help, but to no avail.

Frustrated, I consulted the specification for HTML5 media elements and events. Most of the events it specified, including timeupdate, were described as “simple events”. It described a “simple event” as an event… (emphasis mine)

…which does not bubble (except where otherwise stated) and is not cancelable (except where otherwise stated), and which uses the Event interface

As it happens, it is not otherwise stated that timeupdate should be cancelable, so it should not be. Upon reflection, this is reasonable: cancelling a direct navigation to a specific time makes sense, but timeupdate is also fired at intervals while the video is playing normally. What would it mean to cancel one of those? Perhaps pause the video and rewind to the time of the last timeupdate event? Something could be defined, but there isn’t an obvious interpretation.
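The spec’s definition of a simple event maps directly onto the Event constructor, whose bubbles and cancelable options both default to false. The hypothetical helper below fires such an event; because it is not cancelable, a listener’s call to preventDefault() is ignored and dispatchEvent() still returns true.

```javascript
// Fires a "simple event" as the spec defines it: no bubbling, not cancelable.
// dispatchEvent() returns false only if a listener cancelled the event, which
// cannot happen here.
function fireSimpleEvent(target, type) {
  const event = new Event(type, { bubbles: false, cancelable: false });
  return target.dispatchEvent(event);
}

// Example: the listener tries to cancel, but the call has no effect.
const media = new EventTarget(); // stands in for an <audio> element
media.addEventListener("timeupdate", (event) => {
  event.preventDefault();
  console.log(event.cancelable, event.defaultPrevented); // false false
});
console.log(fireSimpleEvent(media, "timeupdate")); // true
```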

So, why was the event marked as .cancelable?

I experimented with some other “simple events” on HTML5 media: play, playing, pause, seeking, seeked. None of them were specified as cancelable in the specification and none of them responded to .preventDefault(), but all of them were marked as .cancelable. This behaviour was the same in Firefox, Chrome, Safari and Opera.
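A small probe in the spirit of that experiment: it attaches a one-shot listener per event type that calls preventDefault() and records what the event reports. probeCancelability is a hypothetical name, and the results only capture the cancelable and defaultPrevented flags; whether playback actually changed still has to be observed separately.

```javascript
// Records, for each event type, the cancelable flag and whether
// preventDefault() set defaultPrevented, the first time that event fires.
function probeCancelability(target, types) {
  const results = {};
  for (const type of types) {
    target.addEventListener(
      type,
      (event) => {
        event.preventDefault();
        results[type] = {
          cancelable: event.cancelable,
          defaultPrevented: event.defaultPrevented,
        };
      },
      { once: true }
    );
  }
  return results;
}
```

In a browser, calling probeCancelability(document.querySelector("audio"), ["play", "playing", "pause", "seeking", "seeked", "timeupdate"]) and then playing and seeking the audio fills in the returned object. Under the specification, every entry should report cancelable: false.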

The specification seems unambiguous, and the behaviour it describes actually makes sense. Why do no browsers adhere to it?

I’ve filed a bug against Firefox. I’m going to wait for a response to see if there’s some explanation I’ve missed. If not, I’ll file bugs against the others too.

SVG and Web-Safe Fonts

Fonts have been a source of annoyance as I’ve worked on Slide Drive’s LibreOffice-SVG importing. SVG doesn’t have any way to flow text like it does in a web page; it must be positioned on a per-line or per-character basis. This makes it particularly important to get fonts right. If text is positioned per-line, using the wrong font will cause lines to be too short or too long, possibly ruining the slide layout. If text is positioned per-character, using the wrong font will cause the inter-character spacing to be really off, making it ugly and much more difficult to read.

Things initially looked good: Marco Cecchetti programmed LibreOffice’s SVG exporting feature to embed copies of all necessary font/character combinations in the exported document. It seemed like users would be able to use any fonts they wanted without needing to worry. Unfortunately, I soon realized that Firefox does not support embedded fonts in SVGs. This isn’t going to change, either: they chose not to implement them because of concern that it could harm adoption of the Web Open Font Format.

Tools exist to convert between SVG fonts and WOFF, but none of them are written in JavaScript, which is the only language we’re using on both the client and the server (Node.js). We’re left with nothing but the fonts that happen to be installed on users’ systems.

There are a handful of “safe” fonts that we can expect to find on 90%+ of clients, but no fonts with 100% reach. Most fonts, including widely-used ones like Arial, are proprietary and aren’t always available on *nixes. However, a couple of alternatives each for Arial, Times New Roman and Courier New have been designed to exactly match the kerning/spacing of the original fonts. These won’t look identical, but they should maintain the readability and layout of the originals.

The “Liberation” font set is one of these alternatives, and is becoming incredibly popular on Linuxes. Where it isn’t installed, Google has its own set available through Google Web Fonts, providing a cross-platform option (at least when an Internet connection is available).

So, here’s the approach I’m going with: include a list of alternatives for many common fonts. The top priority is fonts with identical metrics, as mentioned above; at a lower priority we’ll include approximately-similar fonts, and finally fall back to a generic font category (serif, sans-serif or monospace). For each font in the document, try the following options in order:

  • Check if the font is installed on the system.
  • Check if the font is embedded in the SVG, if supported.
  • Check if any of the fallback fonts are installed.
  • Check if any of the fallback fonts are available through Google web fonts.
  • Give up.
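The lookup order above could be expressed as a small resolver. Everything environment-specific is passed in as an assumption: isInstalled and isEmbedded are hypothetical predicates (detecting installed fonts in a browser needs its own workaround), webFonts is a hypothetical set of families known to be on Google Web Fonts, and the FALLBACKS table is only an illustrative fragment. Liberation Sans and Arimo are both designed to be metric-compatible with Arial (likewise Liberation Serif and Tinos for Times New Roman, Liberation Mono and Cousine for Courier New); the last entry in each list is only meant to be roughly similar.

```javascript
// Illustrative fallback table: metric-compatible fonts first, then a font
// that is only roughly similar, then a generic CSS family.
const FALLBACKS = new Map([
  ["Arial", { fallbacks: ["Liberation Sans", "Arimo", "Helvetica"], generic: "sans-serif" }],
  ["Times New Roman", { fallbacks: ["Liberation Serif", "Tinos", "Times"], generic: "serif" }],
  ["Courier New", { fallbacks: ["Liberation Mono", "Cousine", "Courier"], generic: "monospace" }],
]);

// Follows the order from the list above and reports where the font came from.
function resolveFont(name, env) {
  const entry = FALLBACKS.get(name);
  if (env.isInstalled(name)) return { source: "installed", family: name };
  if (env.embeddedFontsSupported && env.isEmbedded(name)) {
    return { source: "embedded", family: name };
  }
  const alternatives = entry ? entry.fallbacks : [];
  const local = alternatives.find((alt) => env.isInstalled(alt));
  if (local) return { source: "fallback-installed", family: local };
  const web = alternatives.find((alt) => env.webFonts.has(alt));
  if (web) return { source: "web-font", family: web };
  // Give up: use the generic family if we know one, otherwise nothing.
  return { source: "generic", family: entry ? entry.generic : null };
}
```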

When the user is creating a presentation, they’ll be warned if they use a font that we don’t expect to be “safe”, and warned more strongly if they use a font we don’t recognize at all (i.e. have no fallbacks for, of any quality).
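The two warning levels could be derived from the same kind of data. In this sketch, SAFE_FONTS and KNOWN_FONTS are hypothetical sets: the first holds fonts expected on nearly every client, the second every font that has at least one fallback of any quality.

```javascript
// Hypothetical data: fonts expected almost everywhere, and fonts we know
// fallbacks for (of any quality).
const SAFE_FONTS = new Set(["Arial", "Times New Roman", "Courier New"]);
const KNOWN_FONTS = new Set([...SAFE_FONTS, "Consolas", "Calibri"]);

// Returns the warnings to show for the fonts used in a presentation,
// one per distinct font, in order of first use.
function fontWarnings(usedFonts) {
  const warnings = [];
  for (const font of new Set(usedFonts)) {
    if (SAFE_FONTS.has(font)) continue;
    warnings.push({
      font,
      level: KNOWN_FONTS.has(font) ? "warning" : "strong-warning",
    });
  }
  return warnings;
}
```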

This isn’t perfect, but it should provide the best experience possible in most cases.

GSoC Progress Report: Week of May 4th, 2012

(GitHub diff for the week.)

Implemented HTML/JavaScript tool to convert LibreOffice’s SVG exports to Slide Drive presentations

So far I’ve been using a Python command-line script to convert LibreOffice’s exported SVG documents into Slide Drive presentations. This served its purpose for testing, but fails pretty hard when it comes to user friendliness.

The new tool still needs some polish, but isn’t too bad. The user is able to specify the audio source(s) as well as the durations and transcript text associated with each slide. Embedded fonts are stripped from the SVG (we avoid them due to incomplete browser support) and references to them are edited to refer to system fonts instead.

Heavily Refactored Existing JavaScript

Our existing JavaScript works fine for stand-alone presentations, but Butter is a much more temperamental environment. Among other issues, race conditions would frequently cause the page to fail to load. We also needed to be able to specify different behaviour for (1) a normal presentation, (2) a presentation being edited in Butter and (3) a presentation that has been exported from Butter. The existing code structure made this difficult and ugly.

I merged external/popcornjs/parser.deckjs.js (it wasn’t actually external) and js/setup.js into a single file. After Butter-supporting changes these two were very interdependent and it didn’t feel like a very natural way to split the code.

Modified Popcorn Plugin, Implemented Butter Template

The existing Popcorn plugin we were using assumes that slides are going to have the same order in the DOM as they are supposed to have in the presentation. That assumption isn’t safe when running in Butter. We also needed the slide’s transcript to be part of the Popcorn event for it to be editable from Popcorn.
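One way to drop the DOM-order assumption is to let the Popcorn events decide. This sketch assumes each slide event is a plain object with slideId, start and end (in seconds) and transcript fields; that shape and both helper names are hypothetical, not the plugin’s actual data model.

```javascript
// Returns slide ids in presentation order, based on event start times
// rather than on where the slides happen to sit in the DOM.
function slidesInPresentationOrder(slideEvents) {
  return [...slideEvents]
    .sort((a, b) => a.start - b.start)
    .map((event) => event.slideId);
}

// Finds the event (and so the slide and transcript) active at `time`,
// treating each event as covering [start, end).
function slideEventAt(slideEvents, time) {
  return slideEvents.find((e) => e.start <= time && time < e.end) || null;
}
```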

Once this and the refactoring were done, I put together a Butter template that could be used to edit presentations. There’s still a lot of work to be done here, but the basic case is working.