By Wojciech Gawroński | January 21, 2019
The pain of fiddling with XML via
Let’s agree that the official library, called xmerl, is far from perfect, mostly because it lacks sane defaults for DTD handling (like XML entities) and has deficiencies when it comes to XSD validation. On the other hand, it contains exciting stuff like the feature documented by Brujo Benavides here.
However, there are some justified cases when you have to deal with xmerl, mostly for legacy reasons.
Fortunately, the shortcomings mentioned above are less important, and you can live with them. However, there is one critical feature that xmerl does not have.
You can parse a CDATA section, but you cannot write one out. 😱
How old is this problem? The first mention you will find when searching for the phrase points here, to an old thread on the official mailing list.
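To make the asymmetry concrete, here is a minimal sketch of a round trip through the stock callbacks. The module name `cdata_roundtrip_sketch` and the sample payload are made up for illustration; `xmerl_scan:string/1` and `xmerl:export/2` are the regular xmerl entry points.

```erlang
-module(cdata_roundtrip_sketch).
-export([run/0]).

%% Record definitions (#xmlElement{}, #xmlText{}) shipped with xmerl.
-include_lib("xmerl/include/xmerl.hrl").

run() ->
    %% Parsing: the CDATA payload comes back as ordinary character data.
    {Doc, _Rest} = xmerl_scan:string("<pw><![CDATA[a<b&c]]></pw>"),
    Parsed = lists:append([V || #xmlText{value = V} <- Doc#xmlElement.content]),
    %% Writing: the stock xmerl_xml callbacks emit escaped text,
    %% not a CDATA section.
    Written = lists:flatten(xmerl:export([Doc], xmerl_xml)),
    {Parsed, Written}.
```

Calling `cdata_roundtrip_sketch:run()` should give back the raw text `a<b&c` for the parsed side, while the written side carries `a&lt;b&amp;c` and no CDATA markers at all.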
What is CDATA and why is it important?
Let’s check the official documentation here.
According to the W3C standard, CDATA is the following:
CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string “<![CDATA[” and end with the string “]]>”.
Those sections are relevant when you need to pass characters that should not be interpreted as part of the XML tree. Anything may appear inside, other than the termination token ]]>.
In many cases you can work around that problem by encoding XML entities (e.g. < is encoded as &lt;), but not in all cases.
Problem: Auth-Info Code
What is an AuthInfo? In the domain industry, it is a way of verifying the identity of a domain owner:
An Auth-Code (also called an Authorization Code, Auth-Info Code, or transfer code) is a code created by a registrar to help identify the domain name holder (also known as a registrant or registered name holder) of a domain name in a generic top-level domain (gTLD) operated under contract with ICANN.
In other words, before performing a possibly destructive operation, such as a domain transfer, a registrar will ask you to provide the authorization code as a confirmation.
Most domain registrars base their APIs on a standard called EPP, which is XML-based.
What does an example AuthInfo code look like? The EPP standard (RFC5731) defines this element as being of the XML type eppcom:pwAuthInfoType, which is itself defined in RFC5730. In summary, it is an XML normalizedString.
Basically, it can be a string of any length and any characters (except three: newline, carriage return, and tab), so this one also fits:
Ouch. Now you can see the problem: to send such a code to an XML-based API that does not know that we will encode XML entities, we need to send that code inside a CDATA section.
What if I need to deal with CDATA?
Are we doomed? Luckily, xmerl offers one mechanism that we can leverage, called callbacks:
How does it work?
xmerl allows us to pass our own callback implementations when serializing (using one of its export functions). To do that, we need to create our own module, which looks like this:
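The original listing did not survive on this page, so what follows is only a minimal sketch of such a callback module, not the author's exact code. The module name `xmerl_xml_cdata_sketch` is hypothetical, and the sketch assumes that CDATA content is wrapped in a pseudo-element named `cdata`, which xmerl dispatches to a `cdata/4` callback. The content is emitted exactly as the callback receives it.

```erlang
-module(xmerl_xml_cdata_sketch).
-export(['#xml-inheritance#'/0, cdata/4]).

%% Fall back to the stock XML callbacks for everything not defined here.
'#xml-inheritance#'() -> [xmerl_xml].

%% Called by xmerl for every element whose tag is the atom `cdata`.
%% Data is the already exported content of that element.
cdata(Data, _Attrs, _Parents, _Element) ->
    ["<![CDATA[", Data, "]]>"].
```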
One important thing is the '#xml-inheritance#'() section, which allows us to reuse the callbacks already defined by xmerl and just add our CDATA support on top.
Now, in order to write out XML with a CDATA section, we need to invoke the serialization function with our module like this:
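A call along these lines could look as follows. This is a sketch under assumptions: to keep it self-contained, the hypothetical module `cdata_export_sketch` doubles as its own callback module, the `pw` element and its payload are invented, and `xmerl:export_simple/2` stands in for whichever export function you actually use.

```erlang
-module(cdata_export_sketch).
-export([run/0]).
%% Callbacks, so that this module can be handed to xmerl directly.
-export(['#xml-inheritance#'/0, cdata/4]).

'#xml-inheritance#'() -> [xmerl_xml].

cdata(Data, _Attrs, _Parents, _Element) ->
    ["<![CDATA[", Data, "]]>"].

run() ->
    %% "Simple form" content: {Tag, Attributes, Children}.
    Content = [{pw, [], [{cdata, [], ["a<b"]}]}],
    lists:flatten(xmerl:export_simple(Content, ?MODULE)).
```

With this naive callback, the result of `cdata_export_sketch:run()` should contain `<![CDATA[a&lt;b]]>`: the section is there, but its content is still entity-encoded.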
Why do I see escaped XML entities here?
One more problem
Unfortunately, by default the whole content passed to export/2 or related functions is escaped before it is passed to the callbacks. So, as an argument of cdata/4, we receive a string with escaped XML entities.
Luckily, it escapes only &, < and >, so we can reliably unescape it:
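The author's unescaping code is not shown on this page. As a stand-in, here is a minimal sketch that takes a slightly different route: instead of several replacement passes, it walks the string once from left to right, so a sequence such as `&amp;lt;` turns into the literal text `&lt;` and is never unescaped twice. The module and function names are hypothetical, and the sketch assumes the callback receives a (possibly nested) list of characters.

```erlang
-module(cdata_unescape_sketch).
-export([unescape/1]).

%% Reverses the three escapes applied by xmerl in a single pass.
unescape(Data) ->
    unescape_chars(lists:flatten(Data)).

unescape_chars("&lt;" ++ Rest)  -> [$< | unescape_chars(Rest)];
unescape_chars("&gt;" ++ Rest)  -> [$> | unescape_chars(Rest)];
unescape_chars("&amp;" ++ Rest) -> [$& | unescape_chars(Rest)];
unescape_chars([Char | Rest])   -> [Char | unescape_chars(Rest)];
unescape_chars([])              -> [].
```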
One additional pass for & at the beginning is necessary in case someone passes a string that already contains encoded XML entities to the serialization.
After applying that fix, we can finally cheer and use that library to solve our problem described above:
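Putting the pieces together for the AuthInfo case might look like the sketch below. Everything named here is an assumption made for illustration: the module `epp_authinfo_sketch`, the `auth_info_xml/1` helper, and the single-pass unescape. The element names follow the `domain:authInfo` and `domain:pw` structure from RFC5731, and the result is only a fragment, so the `domain` namespace still has to be declared on an enclosing element.

```erlang
-module(epp_authinfo_sketch).
-export([auth_info_xml/1]).
%% Callbacks, so that this module can be handed to xmerl directly.
-export(['#xml-inheritance#'/0, cdata/4]).

'#xml-inheritance#'() -> [xmerl_xml].

%% Undo xmerl's escaping before wrapping the text in a CDATA section.
cdata(Data, _Attrs, _Parents, _Element) ->
    ["<![CDATA[", unescape(lists:flatten(Data)), "]]>"].

unescape("&lt;" ++ Rest)  -> [$< | unescape(Rest)];
unescape("&gt;" ++ Rest)  -> [$> | unescape(Rest)];
unescape("&amp;" ++ Rest) -> [$& | unescape(Rest)];
unescape([Char | Rest])   -> [Char | unescape(Rest)];
unescape([])              -> [].

%% Code is the authorization code as a plain string.
auth_info_xml(Code) ->
    Content = [{'domain:authInfo', [],
                [{'domain:pw', [], [{cdata, [], [Code]}]}]}],
    %% export_simple_content/2 emits the fragment without an XML prolog.
    lists:flatten(xmerl:export_simple_content(Content, ?MODULE)).
```

For a made-up code such as `a<b&c`, the `domain:pw` element should now carry `<![CDATA[a<b&c]]>`. A code that itself contains the terminator `]]>` would still break the section and needs separate handling.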
Phew. That’s all. Enough struggling with xmerl. If you are forced to deal with this library like us and you want to avoid doing such acrobatics on your own, we have combined them in a helper library, xmerl_ext, which is available here:
Enjoy! And remember: friends do not let friends use xmerl for XML manipulation in Erlang.