Wednesday, October 29, 2014

org.w3c.dom.DOMException: Motörhead at org.apache.harmony.xml.dom.NodeImpl.setNameNS(NodeImpl.java:241)

As an example, let's take a look at a simple XML file (test.xml):
<?xml version="1.0" encoding="UTF-8" ?>
<test>
 <Motörhead>Ace of Spades</Motörhead>
</test>
If the tag name with a special character looks suspicious, rest assured that it is perfectly valid: accents and diacritics are acceptable in XML names, see here.

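Let's take the following Java snippet and compile and run it with Java 1.6 or 1.7:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// INPUT_FOLDER is the directory that holds test.xml
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);

DocumentBuilder db = factory.newDocumentBuilder();
File file = new File(INPUT_FOLDER + "test.xml");
Document document = db.parse(file);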
Everything runs fine, as it should. However, if we run the same code on Android (any version), we'll get the following:

org.w3c.dom.DOMException: Motörhead
at org.apache.harmony.xml.dom.NodeImpl.setNameNS(NodeImpl.java:241)
at org.apache.harmony.xml.dom.ElementImpl.<init>(ElementImpl.java:51)
...
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:183)
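To try this without a test.xml on disk, the same document can be parsed from an in-memory string. The following is only a minimal sketch: the class name ParseDemo and the helper firstChildElementName are made up for illustration and are not part of the original snippet. On a desktop JVM the helper should return the name of the child element; on an affected Android runtime I would expect the parse call to throw the DOMException shown above.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper class, used only for this illustration.
final class ParseDemo {
    // Same content as test.xml; the o-umlaut is written as a Unicode escape
    // so the source file encoding does not matter.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
        + "<test>\n"
        + " <Mot\u00f6rhead>Ace of Spades</Mot\u00f6rhead>\n"
        + "</test>";

    // Parses the XML and returns the name of the first child element of the root,
    // or null if the root has no element children.
    static String firstChildElementName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder db = factory.newDocumentBuilder();
            Document document = db.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            Node n = document.getDocumentElement().getFirstChild();
            while (n != null) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    return n.getNodeName();
                }
                n = n.getNextSibling();
            }
            return null;
        } catch (RuntimeException e) {
            throw e; // DOMException is unchecked, so it propagates unchanged
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problems
        }
    }

    public static void main(String[] args) {
        System.out.println(firstChildElementName(XML));
    }
}
```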


Ah, that little green robot ... always full of surprises. Let us not be lazy and see what is really happening in Android's NodeImpl.setNameNS() method:

(240)  if (!DocumentImpl.isXMLIdentifier(qualifiedName)) {
(241)      throw new DOMException(DOMException.INVALID_CHARACTER_ERR, 
               qualifiedName);
(242)  }
OK, let's take a look at DocumentImpl.isXMLIdentifier():
private static boolean isXMLIdentifierStart(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c == '_');
}

private static boolean isXMLIdentifierPart(char c) {
    return isXMLIdentifierStart(c) || (c >= '0' && c <= '9')
            || (c == '-') || (c == '.');
}

static boolean isXMLIdentifier(String s) {
    if (s.length() == 0) {
        return false;
    }

    if (!isXMLIdentifierStart(s.charAt(0))) {
        return false;
    }

    for (int i = 1; i < s.length(); i++) {
        if (!isXMLIdentifierPart(s.charAt(i))) {
            return false;
        }
    }

    return true;
}
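This is a faulty implementation: it accepts only ASCII letters, digits, '_', '-' and '.', while the XML specification also allows a wide range of non-ASCII characters, such as ö, in names. That is no surprise, since Android's org.apache.harmony.xml has a lot of open issues. I submitted this one here.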
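For comparison, here is a minimal sketch of what a check based on the Name production of the XML 1.0 specification (fifth edition) could look like. This is not Android's code and not the eventual fix; the class and method names (XmlNames, isXmlName) are made up for illustration, and the character ranges are taken from the spec's NameStartChar and NameChar rules. Note that the Name production allows ':', which a check for namespace prefixes or local names would have to exclude.

```java
// Hypothetical helper class, used only for this illustration.
final class XmlNames {
    private static boolean in(int c, int lo, int hi) {
        return c >= lo && c <= hi;
    }

    // NameStartChar from XML 1.0 (fifth edition).
    static boolean isNameStartChar(int c) {
        return c == ':' || c == '_' || in(c, 'A', 'Z') || in(c, 'a', 'z')
            || in(c, 0xC0, 0xD6) || in(c, 0xD8, 0xF6) || in(c, 0xF8, 0x2FF)
            || in(c, 0x370, 0x37D) || in(c, 0x37F, 0x1FFF) || in(c, 0x200C, 0x200D)
            || in(c, 0x2070, 0x218F) || in(c, 0x2C00, 0x2FEF) || in(c, 0x3001, 0xD7FF)
            || in(c, 0xF900, 0xFDCF) || in(c, 0xFDF0, 0xFFFD) || in(c, 0x10000, 0xEFFFF);
    }

    // NameChar adds '-', '.', digits, the middle dot and combining marks.
    static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.' || in(c, '0', '9')
            || c == 0xB7 || in(c, 0x300, 0x36F) || in(c, 0x203F, 0x2040);
    }

    // Walks the string by code point so characters outside the BMP are handled.
    static boolean isXmlName(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int c = s.codePointAt(i);
            if (!isNameChar(c)) {
                return false;
            }
            i += Character.charCount(c);
        }
        return true;
    }
}
```

With this check, the ö in Motörhead (U+00F6) falls into the 0xD8 to 0xF6 range, so the tag name from test.xml is accepted, while a name that starts with a digit or '-' is still rejected.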

Now I am sure that you, just like me, cannot wait for Android engineers to fix the problem. Besides, the faulty library is already present on millions of Android devices, and your app has to run on those devices too. So the workaround is not to use Android's library at all, but to replace it with something reliable. I solved my problem using Xerces for Android, which I found here.

Case closed!