Wednesday, October 29, 2014

org.w3c.dom.DOMException: Motörhead at org.apache.harmony.xml.dom.NodeImpl.setNameNS(NodeImpl.java:241)

As an example, let's take a look at a simple XML file (test.xml):
<?xml version="1.0" encoding="UTF-8" ?>
<test>
 <Motörhead>Ace of Spades</Motörhead>
</test>
If you are confused by a tag name that contains a special character, rest assured that it is perfectly valid: accented letters and other diacritics are acceptable in XML names, see here.

Let's take the following Java snippet, compile it with Java 1.6 or 1.7, and run it on a desktop JVM:
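import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// INPUT_FOLDER is a constant defined elsewhere; it points at the folder that holds test.xml
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);

DocumentBuilder db = factory.newDocumentBuilder();
File file = new File(INPUT_FOLDER + "test.xml");
Document document = db.parse(file);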
Everything runs fine, as it should. However, if we run the same code on Android (any version), we'll get the following:

org.w3c.dom.DOMException: Motörhead
at org.apache.harmony.xml.dom.NodeImpl.setNameNS(NodeImpl.java:241)
at org.apache.harmony.xml.dom.ElementImpl.<init>(ElementImpl.java:51)
...
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:183)


Ah, that little green robot ... always full of surprises. Let us not be lazy and see what is really happening in Android's NodeImpl.setNameNS() method:

(240)  if (!DocumentImpl.isXMLIdentifier(qualifiedName)) {
(241)      throw new DOMException(DOMException.INVALID_CHARACTER_ERR, 
               qualifiedName);
(242)  }
OK, let's take a look at DocumentImpl.isXMLIdentifier() and its two helpers:
private static boolean isXMLIdentifierStart(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c == '_');
}

private static boolean isXMLIdentifierPart(char c) {
    return isXMLIdentifierStart(c) || (c >= '0' && c <= '9')
            || (c == '-') || (c == '.');
}

static boolean isXMLIdentifier(String s) {
    if (s.length() == 0) {
        return false;
    }

    if (!isXMLIdentifierStart(s.charAt(0))) {
        return false;
    }

    for (int i = 1; i < s.length(); i++) {
        if (!isXMLIdentifierPart(s.charAt(i))) {
            return false;
        }
    }

    return true;
}
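A faulty implementation: it accepts only ASCII letters, digits, '_', '-' and '.', while the XML specification allows a much wider range of Unicode characters in names, so 'ö' is rejected. This is no surprise, since Android's org.apache.harmony.xml has a lot of open issues. I submitted this one here.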
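For comparison, here is a minimal sketch of a name check that follows the Name production of the XML 1.0 (Fifth Edition) grammar. The class XmlNames and its method names are hypothetical, made up for illustration; this is neither Android's nor Xerces' code, and it does no namespace-specific handling of the ':' character.

```java
// Hypothetical helper for illustration only; not Android's or Xerces' code.
final class XmlNames {

    // NameStartChar as listed in the XML 1.0 (Fifth Edition) grammar.
    static boolean isNameStartChar(int c) {
        return c == ':' || c == '_'
                || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
                || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
                || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
                || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
                || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
                || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // NameChar adds '-', '.', digits, the middle dot and two combining ranges.
    static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.'
                || (c >= '0' && c <= '9') || c == 0xB7
                || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    // Walks the string by code point so characters outside the BMP are handled.
    static boolean isName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int c = s.codePointAt(i);
            if (!isNameChar(c)) {
                return false;
            }
            i += Character.charCount(c);
        }
        return true;
    }

    public static void main(String[] args) {
        // \u00F6 is the letter o with diaeresis used in the tag name from test.xml.
        System.out.println(isName("Mot\u00F6rhead"));
    }
}
```

With a check like this, both the tag from test.xml and Cyrillic tag names pass, while names that start with a digit are still rejected.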

Now, I am sure that you, just like me, cannot wait for the Android engineers to fix the problem. Besides, the faulty library is already present on millions of Android devices, and your app has to run on those devices too. So the workaround is not to use Android's library at all, but to replace it with something reliable. I solved my problem using Xerces for Android, which I found here.
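If you want to confirm whether a particular runtime is affected, a small probe along these lines may help. It is only a sketch: the class NonAsciiNameProbe and its method are hypothetical names, and it relies on the failure surfacing as a DOMException from parse(), as in the stack trace above. It uses only standard JAXP calls and parses the same document as test.xml from memory.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;

// Hypothetical helper for illustration only; not part of any library.
final class NonAsciiNameProbe {

    // Same content as test.xml; \u00F6 is the accented letter in the tag name.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>"
            + "<test><Mot\u00F6rhead>Ace of Spades</Mot\u00F6rhead></test>";

    // Returns true if the platform's default DOM parser accepts the sample,
    // false if it rejects the element name with a DOMException.
    static boolean parserAcceptsNonAsciiNames() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder db = factory.newDocumentBuilder();
            db.parse(new ByteArrayInputStream(SAMPLE.getBytes("UTF-8")));
            return true;
        } catch (DOMException e) {
            return false;
        } catch (Exception e) {
            // Configuration, I/O or SAX problems are unrelated to this probe.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("non-ASCII element names accepted: "
                + parserAcceptsNonAsciiNames());
    }
}
```

On a desktop JVM the probe should report that such names are accepted; on a runtime with the faulty check it would be expected to report the opposite.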

Case closed!

2 comments:

  1. Hey, thanks a lot for this post! I was having the same issue (not with Motörhead but with some Cyrillic letters) and posted a question on stackoverflow.com. Someone referred to your issue on Google Code.
    I am going to try using Xerces for parsing. However, I am new to the Android platform (I was coding only for iOS until recently) and I find it difficult to do. The documentation on http://xerces.apache.org/ isn't very helpful, either. Would you bother to help me import the Xerces lib and point to its implementation instead of the Android built-in? You can reach me at skype burningaxe.

  2. Hello there. No, I won't be able to help you; I haven't used that version of the Xerces library. See the link at the end of the blog post.
