Escaping and markup
-------------------
In some cases it is common to have other kinds of markup mixed in with
translatable text, especially for things like HTML/web outputs. Handling these
requires extra functionality to ensure that everything is escaped properly,
especially external arguments that are passed in.
For example, suppose you need embedded HTML in your translated text::
happy-birthday =
Hello { $name }, happy birthday!
In this situation, it is important that ``$name`` is HTML-escaped. The rest of
the text needs to be treated as already escaped (i.e. it is HTML markup), so
that ```` is not changed to ``<b>``.
fluent-compiler supports this use case by allowing a list of ``escapers`` to be
passed to the ``FluentBundle`` constructor or to ``compile_messages``.
.. code-block:: python
bundle = FluentBundle('en', resources, escapers=[my_escaper])
An ``escaper`` is an object that defines the following set of attributes. The
object could be a module, or a simple namespace object you could create using
``types.SimpleNamespace`` or the provided
:class:`fluent_compiler.escapers.Escaper` dataclass, or an instance of a class
with appropriate methods defined. The attributes are:
- ``name: str`` - a simple text value that is used in error messages.
- ``select(**hints)``
A callable that is used to decide whether or not to use this escaper for a
given message (or message attribute). It is passed a number of hints as
keyword arguments, currently only the following:
- ``message_id: str`` - a string that is the name of the message or term. For terms
it is a string with a leading dash - e.g. ``-brand-name``. For message
attributes, it is a string in the form ``messsage-name.attribute-name``
In the future, probably more hints will be passed (for example, comments
attached to the message), so for future compatibility this callable should use
the ``**hints`` syntax to collect remaining keyword arguments.
The callable should return ``True`` if the escaper should be used for that
message, ``False`` otherwise. For every message and message attribute, the
``select`` callable of each escaper in the list of escapers is tried in turn,
and the first to return ``True`` is used.
- ``output_type: type`` - the type of values that are returned by ``escape``,
``mark_escape``, and ``join``, and therefore by the whole message.
- ``escape(text_to_be_escaped: str)``
A callable that will escape the passed in text. It must return a value that is
an instance of ``output_type`` (or a subclass).
``escape`` must also be able to handle values that have already been escaped
without escaping a second time.
- ``mark_escaped(markup)``
A callable that marks the passed in text as markup i.e. already escaped. It
must return a value that is an instance of ``output_type`` (or a subclass).
- ``join(parts: Iterable)``
A callable that accepts an iterable of components, each of type
``output_type``, and combines them into a larger value of the same type.
- ``use_isolating: bool | None``
A boolean that determines whether the normal bidi isolating characters should
be inserted. If it is ``None`` the value from the ``FluentBundle`` will be
used, otherwise use ``True`` or ``False`` to override.
The escaping functions need to obey some rules:
- ``escape`` must be idempotent:
``escape(escape(text)) == escape(text)``
- ``escape`` must be a no-op on the output of ``mark_escaped``:
``escape(mark_escaped(text)) == mark_escaped(text)``
- ``mark_escaped`` should be distributive with string
concatenation:
``join([mark_escaped(a), mark_escaped(b)]) == mark_escaped(a + b)``
(This is used for optimizing the generated code)
Example
~~~~~~~
This example is for
`MarkupSafe `__:
.. code-block:: python
from fluent_compiler.escapers import Escaper
from markupsafe import Markup, escape
empty_markup = Markup('')
html_escaper = Escaper(
select=lambda message_id, **hints: message_id.endswith('-html'),
output_type=Markup,
mark_escaped=Markup,
escape=escape,
join=empty_markup.join,
name='html_escaper',
use_isolating=False,
)
This escaper uses the convention that message IDs that end with
``-html`` are selected by this escaper. This will match
``message-html``, ``message.attr-html``, and ``-term-html``, for
example, but not ``message-html.attr``.
We have set ``use_isolating=False`` here (which will override the
``use_isolating`` value specified in ``compile_messages``) because isolation
characters can cause problems in various HTML contexts - for example:
::
signup-message-html =
Hello guest - please remember to
make an account.
Isolation characters around ``$signup_url`` will break the link. For HTML, you
should instead use the `bdi element
`__ in the FTL
messages when necessary.
Escaper compatibility
~~~~~~~~~~~~~~~~~~~~~
When using escapers that with messages that include other messages or terms,
some rules apply:
- A message or term with an escaper applied can include another message or term
with no escaper applied (the included message will have ``escape`` called on
its output).
- A message with an escaper applied can include a message or term with the same
escaper applied.
- A message with an escaper applied cannot include a message or term with a
different escaper applied - this will generate a ``TypeError`` in the list of
errors returned.
- A message with no escaper applied cannot include a message with an escaper
applied.