Date: | 2013/06 |
---|---|
Author: | Thomas Bonfort |
Contact: | thomas.bonfort@gmail.com |
Status: | Adopted |
Version: | MapServer 7.0 |
When a feature needs to be labelled, the following (simplified) steps are undertaken:
- The string is passed through iconv, if needed, to be converted to UTF-8.
- The string goes through FriBidi to reorder the glyphs from “logical order” to “visual order”, to support languages that are written from right to left. For example, supposing capital letters are Arabic letters, the text stored in the feature as “this is some ARABIC text” is transformed into “this is some CIBARA text” so that it can be fed transparently to the renderers, which currently lay out glyphs from left to right.
- The string goes through line breaking, which is currently broken for RTL scripts: for the text “ARABIC \n TEXT” we render

      TXET
      CIBARA

  instead of the required

      CIBARA
      TXET

- The string goes through alignment, which is currently hacky and imprecise (thanks to yours truly), as we (try to) achieve alignment by padding with spaces instead of using precise per-line offsets.
- The string is then passed as-is to the renderers, which are responsible for laying out each glyph.
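The line-breaking problem described above can be reproduced with a small toy model. The sketch below is not MapServer code: it assumes, as in the text, that runs of capital letters stand in for RTL text, and the function names `to_visual`, `legacy_lines` and `fixed_lines` are made up for illustration. It shows that reordering the whole string before breaking lines swaps the line order, whereas breaking lines first and reordering each line separately does not.

```python
import re

# Toy convention from the text: runs of capital letters (possibly separated
# by whitespace, including newlines) stand in for right-to-left text.
_RTL_RUN = re.compile(r"[A-Z]+(?:\s+[A-Z]+)*")


def to_visual(text):
    """Reverse every toy RTL run, mimicking logical-to-visual reordering."""
    return _RTL_RUN.sub(lambda m: m.group(0)[::-1], text)


def legacy_lines(text):
    """Reorder the whole string first, then break lines (the broken order)."""
    return to_visual(text).split("\n")


def fixed_lines(text):
    """Break lines first, then reorder each line on its own."""
    return [to_visual(line) for line in text.split("\n")]


if __name__ == "__main__":
    print(to_visual("this is some ARABIC text"))
    print(legacy_lines("ARABIC\nTEXT"))
    print(fixed_lines("ARABIC\nTEXT"))
```

A real implementation would rely on the Unicode bidirectional algorithm rather than a regular expression; the toy only illustrates why the order of the two steps matters.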
While the current situation is simple to understand, and works reasonably well for Latin languages, it has a number of shortcomings that this RFC aims to resolve:
For the big picture, see the state of text rendering.
In short, instead of passing a string of text and a starting position to the renderers, we pass a list of glyphs (a glyph being a specific entry inside a specific font file) along with their precise positions, much as we already do when rendering FOLLOW labels, and as already outlined in bug 3611. The actual shaping and layout happen in a single MapServer function, and the renderers just dumbly place glyphs where they are told to.
Architecturally, this modification is significant, as the whole label rendering chain needs to be refactored to take a list of positioned glyphs into account instead of plain text strings.
Note that the FriBidi and HarfBuzz dependencies remain optional and can be disabled at compile time by those who only handle Latin scripts. HarfBuzz and FriBidi must be enabled or disabled together.
(fontcache.c) A global (thread-protected if needed) glyph and font cache ensures that cached glyphs are reusable across multiple requests in the FastCGI case, but in turn requires some thread-level protection and probably some pruning so that it remains of reasonable size. Some APIs have changed to make the fontcache accessible. The fontcache contains caches for:
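As a rough illustration of the general idea only (a shared cache guarded by a lock and pruned to a bounded size), here is a minimal Python sketch. It is not the structure used in fontcache.c: the class name `GlyphCache`, the `(font, size, codepoint)` key, the `load` callback and the least-recently-used pruning policy are all assumptions made for the example.

```python
import threading
from collections import OrderedDict


class GlyphCache:
    """Hypothetical process-wide glyph cache with LRU pruning."""

    def __init__(self, max_entries=1024):
        self._lock = threading.Lock()      # protects the cache across threads
        self._entries = OrderedDict()      # key -> glyph, oldest first
        self._max_entries = max_entries

    def get(self, font, size, codepoint, load):
        """Return the cached glyph, calling load(font, size, codepoint) on a miss."""
        key = (font, size, codepoint)
        with self._lock:
            if key in self._entries:
                self._entries.move_to_end(key)     # mark as recently used
                return self._entries[key]
            glyph = load(font, size, codepoint)
            self._entries[key] = glyph
            if len(self._entries) > self._max_entries:
                self._entries.popitem(last=False)  # prune the oldest entry
            return glyph

    def __len__(self):
        with self._lock:
            return len(self._entries)
```

Holding the lock while loading keeps the sketch short; a production cache would probably avoid blocking other threads during a slow glyph load.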
This step will be added to MapServer and consists of coordinating the outputs of FreeType, the bidi algorithm, HarfBuzz, line breaking, text alignment and, in the future, text and line spacing. Most of this step could be implemented transparently through Pango; however, Pango is hard to support across platforms and, to my knowledge, has a hard dependency on Fontconfig, which is incompatible with our fontset approach.
Text is represented by a “textPathObj”, which is basically a list of positioned glyphs (e.g. the word “Label” at size 10 in an Arial font is represented by the Arial font’s glyph “L” at position (x,y) = (0,0), glyph “a” at position (10,0), “b” at (18,0), etc.). Multiline text is handled transparently by positioning glyphs at different y values. A textPath can be either “absolute” (i.e. the glyph positions are in absolute image coordinates, as used to position glyphs for angle follow labels) or “relative”, in which case the positions must be offset by the labeling point.
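The distinction between relative and absolute text paths can be sketched as follows. This is an illustrative Python model, not the actual C definition of textPathObj: the `Glyph` and `TextPath` classes, their fields, and the `image_positions` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Glyph:
    font: str        # font the glyph is taken from
    glyph_id: int    # index of the glyph inside that font (illustrative)
    x: float         # position of the glyph
    y: float


@dataclass
class TextPath:
    glyphs: List[Glyph] = field(default_factory=list)
    absolute: bool = False   # True: positions are already image coordinates


def image_positions(path: TextPath,
                    label_point: Tuple[float, float]) -> List[Tuple[float, float]]:
    """Return where a renderer should place each glyph in the image."""
    if path.absolute:
        # Absolute paths are used as-is (e.g. labels following a line).
        return [(g.x, g.y) for g in path.glyphs]
    px, py = label_point
    # Relative paths are offset by their labeling point.
    return [(g.x + px, g.y + py) for g in path.glyphs]
```

With the first three glyphs of the “Label” example at (0,0), (10,0) and (18,0) and a labeling point of (100,50), a relative path yields (100,50), (110,50) and (118,50), while an absolute path is returned unchanged.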
All the shaping happens in textlayout.c, whose principal role is to take a string of text as input and return a list of positioned glyphs as output. The input string goes through multiple steps and is split into multiple runs. Each run has a distinct line number, bidi direction, and script “language”.
As an example, we’ll work with the input Unicode string “this is some text in english, ARABIC and JAPANESE”. Capital letters denote non-Latin glyphs. Note also that ARABIC is stored in logical (= reading) order, whereas it would be rendered as CIBARA.
Initially, the whole string forms a single run:

run1 = "this is some text in english, ARABIC and JAPANESE", line=0

After line breaking:

run1 = "this is some text in english,", line=0
run2 = "ARABIC and JAPANESE", line=1

After splitting by bidi direction:

run1 = "this is some text in english,", line=0, direction=LTR
run2 = "ARABIC", line=1, direction=RTL
run3 = " and JAPANESE", line=1, direction=LTR

After splitting by script:

run1 = "this is some text in english,", line=0, direction=LTR, script=latin
run2 = "ARABIC", line=1, direction=RTL, script=arabic
run3 = " and ", line=1, direction=LTR, script=latin
run4 = "JAPANESE", line=1, direction=LTR, script=hiragana
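The successive splits shown above can be approximated in a few lines of Python. This is only a sketch under simplifying assumptions, not the algorithm in textlayout.c: direction comes from `unicodedata.bidirectional`, the script is guessed from the first word of the Unicode character name, and neutral characters (spaces, punctuation) simply stay with the preceding run, which is cruder than the real bidi algorithm and attaches spaces differently from the listing above. The names `Run`, `classify` and `split_runs` are hypothetical.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Run:
    text: str
    line: int
    direction: str = ""   # "LTR", "RTL", or "" while only neutrals were seen
    script: str = ""


def classify(ch):
    """Return (direction, script) for strong characters, None for neutrals."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        direction = "LTR"
    elif bidi in ("R", "AL"):
        direction = "RTL"
    else:
        return None
    # Crude script guess: first word of the character name, e.g. "LATIN".
    script = unicodedata.name(ch, "unknown").split()[0].lower()
    return direction, script


def split_runs(text):
    runs = []
    for line_no, line in enumerate(text.split("\n")):
        current = None
        for ch in line:
            props = classify(ch)
            if current is None:
                current = Run(ch, line_no, *(props or ("", "")))
            elif props is None or current.direction == "" \
                    or (current.direction, current.script) == props:
                current.text += ch
                if props is not None and current.direction == "":
                    current.direction, current.script = props
            else:
                runs.append(current)          # properties changed: new run
                current = Run(ch, line_no, *props)
        if current is not None:
            runs.append(current)
    return runs


if __name__ == "__main__":
    sample = "hello,\n\u0639\u0631\u0628\u064a and \u306b\u307b\u3093"
    for run in split_runs(sample):
        print(run)
```

The sample string uses real Arabic and hiragana characters (written as escapes) in place of the capital-letter convention used in the text.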
LABEL "arialuni,arial,cjk,arabic"
can now be written prefixed by a script identifier, i.e.
LABEL "arialuni,en:arial,ja:cjk,ar:arabic"
This is needed as there is, and will be, overlap between font glyph coverages, and it should be possible to prioritize which font is used for which language.
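One possible reading of the prefixed font list is sketched below in Python. The parsing follows the syntax shown above, but the priority rule is an assumption made for this example (fonts prefixed with the requested script come first, followed by unprefixed fonts in their listed order), and `fonts_for_script` is a hypothetical helper, not a MapServer function.

```python
def fonts_for_script(font_spec, script):
    """Return candidate font names for a script, most preferred first.

    Assumed rule: entries prefixed with "<script>:" are tried before
    unprefixed entries; entries prefixed with another script are skipped.
    """
    preferred, generic = [], []
    for entry in font_spec.split(","):
        entry = entry.strip()
        prefix, sep, name = entry.partition(":")
        if not sep:
            generic.append(entry)        # no script prefix: usable for any script
        elif prefix == script:
            preferred.append(name)       # explicitly prioritized for this script
    return preferred + generic


if __name__ == "__main__":
    spec = "arialuni,en:arial,ja:cjk,ar:arabic"
    for script in ("en", "ja", "ar", "ru"):
        print(script, fonts_for_script(spec, script))
```

Under this assumed rule, Japanese text would try the cjk font before falling back to arialuni, and a script with no dedicated entry would use arialuni only.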
The labelcache and the renderers will need to be updated to work with a list of glyphs. Changes here are extensive but should remain conceptually simple. Individual renderers are substantially simplified.
Work has been done to trim down the labelcache computations as much as possible:
When inserting features into the labelcache:
At the msDrawLabelCache phase:
We delay computing the label text bounding box until after we have checked the conditions that would cause the label not to be rendered, i.e.
Collision detection has been optimized:
The speedups from these changes are very significant for cluttered maps, cf. https://plus.google.com/u/0/118271009221580171800/posts/PrwhFYSkhea (e.g. rendering time goes from 800 seconds to 1 second for 500,000 labels).
Potentially numerous: