2010-07-19

Tips on XML

Common XML Schemas

SAX2 Features and Properties

Making JAXP recognize RELAX-NG schema

Although JDK 1.5 and 1.6 know about RELAX NG, as the XMLConstants.RELAXNG_NS_URI constant shows, neither of them includes a RELAX NG implementation. So, if you want to use a RELAX NG schema with the validation API of JAXP, you have to put a RELAX NG implementation on your classpath and set a system property that names its SchemaFactory class before the validation code runs.


   ...
   System.setProperty("javax.xml.validation.SchemaFactory:" + XMLConstants.RELAXNG_NS_URI, 
         "com.thaiopensource.relaxng.jaxp.CompactSyntaxSchemaFactory");

   SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.RELAXNG_NS_URI);
   ...

The best-known RELAX NG implementation for Java seems to be Jing.
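
Before relying on the system property, it can help to check whether JAXP finds a factory for a schema language at all. The following is only a minimal sketch: the class name SchemaLanguageProbe and its method are made up for illustration, and the result for RELAX NG depends on whether an implementation such as Jing is on your classpath.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

// Hypothetical helper, not part of JAXP or Jing.
final class SchemaLanguageProbe {

    /** Returns true if JAXP can locate a SchemaFactory for the given schema language URI. */
    static boolean isSupported(String schemaLanguage) {
        try {
            SchemaFactory.newInstance(schemaLanguage);
            return true;
        } catch (IllegalArgumentException e) {
            // JAXP throws this when no implementation of the schema language is available.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("W3C XML Schema: " + isSupported(XMLConstants.W3C_XML_SCHEMA_NS_URI));
        // Expected to be false on a plain JDK without a RELAX NG implementation on the classpath.
        System.out.println("RELAX NG:       " + isSupported(XMLConstants.RELAXNG_NS_URI));
    }
}
```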

Define empty element using XML Schema

Built-in Datatypes of XML Schema

  • A value of '100.0' is invalid for xsd:integer and its subtypes, because xsd:integer is derived from xsd:decimal with fractionDigits fixed to 0, so no decimal point is allowed.
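
You can check this behaviour with the JAXP validation API. The sketch below assumes a tiny schema with a single element n of type xs:integer; the class name and the schema are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class: validates small documents against <n> of type xs:integer.
final class IntegerLexicalCheck {

    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='n' type='xs:integer'/>"
        + "</xs:schema>";

    private static Schema compileSchema() {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return sf.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the demo schema itself is broken", e);
        }
    }

    /** Returns true if the document is valid against the demo schema. */
    static boolean isValid(String xml) {
        try {
            // Without a custom ErrorHandler, the validator throws on the first error.
            compileSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<n>100</n>"));   // true
        System.out.println(isValid("<n>100.0</n>")); // false
    }
}
```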

Meaning of Fundamental Element of XML Schema

simpleType, complexType, simpleContent, complexContent: all of these are easy to confuse. So, you need to understand the exact meaning and usage of each element and be able to tell them apart.
simple types
  • Elements that contain numbers (or strings, dates, etc.) but carry no attributes and contain no subelements are said to have simple types.
complex types
  • Elements that contain subelements or carry attributes are said to have complex types.
simpleContent
  • The simpleContent element defines a complex type whose content is a simple value, which lets you add attributes to a simple type.
  • Within simpleContent, extension adds attributes to a simple type or to a complex type with simple content, and restriction narrows the content or the attribute types of a complex type with simple content.
complexContent
  • The complexContent element can specify nested element types. This includes the special case of zero elements, also known as 'empty content'. complexContent also permits text interspersed with elements, also known as 'mixed content'.
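
The four cases are easier to compare side by side. The following is only a sketch: the schema, its element names (item, qty, price, sep, note) and the class name are made up for illustration, and the schema is embedded in a Java string so that it can be validated with JAXP.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class with one element per content kind.
final class ContentModelDemo {

    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        // complex type: <item> contains subelements
        + "<xs:element name='item'><xs:complexType><xs:sequence>"
        // simple type: text only, no attributes, no subelements
        + "<xs:element name='qty' type='xs:int'/>"
        // simpleContent: simple text content plus an attribute
        + "<xs:element name='price'><xs:complexType><xs:simpleContent>"
        + "<xs:extension base='xs:decimal'>"
        + "<xs:attribute name='currency' type='xs:string'/>"
        + "</xs:extension></xs:simpleContent></xs:complexType></xs:element>"
        // complexContent with empty content: no text and no subelements
        + "<xs:element name='sep'><xs:complexType><xs:complexContent>"
        + "<xs:restriction base='xs:anyType'/>"
        + "</xs:complexContent></xs:complexType></xs:element>"
        // complexContent with mixed content: text interspersed with <b> elements
        + "<xs:element name='note'><xs:complexType><xs:complexContent mixed='true'>"
        + "<xs:restriction base='xs:anyType'><xs:sequence>"
        + "<xs:element name='b' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:restriction>"
        + "</xs:complexContent></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    private static Schema compileSchema() {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return sf.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the demo schema itself is broken", e);
        }
    }

    /** Returns true if the document is valid against the demo schema. */
    static boolean isValid(String xml) {
        try {
            compileSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<item><qty>2</qty><price currency='USD'>9.50</price>"
                  + "<sep/><note>see <b>page 3</b> first</note></item>";
        System.out.println(isValid(ok));
        // An attribute on the simple-typed <qty> and text inside the empty <sep> are both rejected.
        System.out.println(isValid(ok.replace("<qty>", "<qty unit='kg'>")));
        System.out.println(isValid(ok.replace("<sep/>", "<sep>x</sep>")));
    }
}
```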

Thread-safeness of Factories in JAXP

Factory classes in JAXP such as SAXParserFactory, DocumentBuilderFactory, and SchemaFactory are not thread-safe.

DocumentBuilder

The core DOM-related classes in JAXP, namely DocumentBuilderFactory, DocumentBuilder, and Document, are not thread-safe. The presence of mutator methods such as DocumentBuilderFactory.setSchema(), DocumentBuilderFactory.setFeature(), DocumentBuilderFactory.setIgnoringComments(), DocumentBuilderFactory.setNamespaceAware(), DocumentBuilderFactory.setValidating(), DocumentBuilder.setEntityResolver(), and DocumentBuilder.setErrorHandler() shows that these classes carry mutable state and are not meant to be shared between threads.

A DocumentBuilderFactory or DocumentBuilder object may be relatively expensive to create, so you should not instantiate one every time you build a document. But they are not thread-safe, so they must be properly confined to a single thread.
In the usual case, where your application needs parsing methods for its own specific documents, you had better provide public parse methods that create the Document objects and reuse the document builder factory and the document builders internally.


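import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// The builders are not thread-safe, so confine each instance of this class to one thread.
public class ApplicationDocumentParsers{

   protected DocumentBuilder orderDocBuilder;
   protected DocumentBuilder paymentDocBuilder;
   protected DocumentBuilder deliveryDocBuilder;

   public ApplicationDocumentParsers() throws SAXException, ParserConfigurationException{

      SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true);
      dbf.setValidating(false);   // no DTD validation; the schema set below is still applied
      dbf.setXIncludeAware(false);

      //initiate document builders; setSchema() takes a Schema object, not a file name
      dbf.setSchema(sf.newSchema(new File("order.xsd")));
      this.orderDocBuilder = dbf.newDocumentBuilder();

      dbf.setSchema(sf.newSchema(new File("payment.xsd")));
      this.paymentDocBuilder = dbf.newDocumentBuilder();

      dbf.setSchema(sf.newSchema(new File("delivery.xsd")));
      this.deliveryDocBuilder = dbf.newDocumentBuilder();

   }

   public Document parseOrderDoc(InputStream is) throws SAXException, IOException{
      return this.orderDocBuilder.parse(is);
   }

   public Document parsePaymentDoc(InputStream is) throws SAXException, IOException{
      return this.paymentDocBuilder.parse(is);
   }

   public Document parseDeliveryDoc(InputStream is) throws SAXException, IOException{
      return this.deliveryDocBuilder.parse(is);
   }
}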

I think there may be a thread-safe or immutable document builder or document builder factory somewhere, but I still haven't found a well-known one.
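
One common way to confine a DocumentBuilder without creating a new one for every document is to keep one builder per thread in a ThreadLocal. This is only a sketch of that idea, not a replacement for the class above; the class name ThreadConfinedParser is made up for illustration, and schema setup is left out to keep it short.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: each thread lazily gets its own DocumentBuilder and reuses it.
final class ThreadConfinedParser {

    private static final ThreadLocal<DocumentBuilder> BUILDER =
        ThreadLocal.withInitial(ThreadConfinedParser::newBuilder);

    private static DocumentBuilder newBuilder() {
        try {
            // The factory is used only inside this method, so it is never shared.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("cannot create DocumentBuilder", e);
        }
    }

    /** Parses the given XML text with the builder that belongs to the calling thread. */
    static Document parse(String xml) throws SAXException, IOException {
        return BUILDER.get().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Runnable task = () -> {
            try {
                Document doc = parse("<order id='1'/>");
                System.out.println(Thread.currentThread().getName()
                    + ": " + doc.getDocumentElement().getNodeName());
            } catch (SAXException | IOException e) {
                throw new IllegalStateException(e);
            }
        };
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Note that the returned Document is not thread-safe either, so it should also stay with the thread that parsed it or be handed over carefully.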

javax.xml.validation

The SchemaFactory class is not thread-safe, but the Schema class is immutable and thread-safe.
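
In practice this means you can compile a Schema once, share it freely, and create a fresh Validator for each validation, because Validator objects are not thread-safe. The following is a minimal sketch; the class name and the inline order schema are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.stream.IntStream;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo class: one shared Schema, one Validator per call.
final class SharedSchemaValidation {

    private static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType>"
        + "<xs:attribute name='id' type='xs:int' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compiled once; the SchemaFactory is used only here and is never shared.
    private static final Schema SCHEMA = compileSchema();

    private static Schema compileSchema() {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return sf.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the demo schema itself is broken", e);
        }
    }

    /** Safe to call from any thread: every call uses its own Validator. */
    static boolean isValid(String xml) {
        Validator validator = SCHEMA.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Validates n generated documents on several threads at once. */
    static boolean allValidInParallel(int n) {
        return IntStream.range(0, n).parallel()
            .allMatch(i -> isValid("<order id='" + i + "'/>"));
    }

    public static void main(String[] args) {
        System.out.println(allValidInParallel(100));
        System.out.println(isValid("<order/>")); // false: the required id attribute is missing
    }
}
```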

Meaning and Pronunciation of Xerces

For me, as a non-native English speaker, words starting with 'x' are very unfamiliar, and I can't even guess how to pronounce them. Although I have used Apache Xerces for more than 5 years, I only recently learned the correct pronunciation of 'Xerces'. This may sound silly to native English speakers, but most of the application developers around me are in the same position. Anyway, you can hear the pronunciation and read about the meaning of 'Xerces' on the following pages.

Resources on JAXB

There aren't as many books, tutorials, or articles about JAXB as there are about JAXP. In particular, it's quite difficult to find in-depth material on JAXB 2.0.
The following are the ones I have found useful.

Embedding Schematron to XML Schema

Schematron can define complex rules, such as relations between the values of elements, which can't be expressed with XML Schema. But defining the whole schema of an XML document with Schematron is too expensive and inappropriate. So, defining the basic structure and rules with XML Schema and the more complex rules with Schematron seems to be a good strategy.

However, maintaining a pair of schema files for one XML document is somewhat bothersome. Is it possible to merge the two schema files into one?

The following article explains how to embed constraints expressed in Schematron syntax into the XML Schema file. The basic idea is to use the xsd:appinfo element.
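
To give a rough idea of what a merged file can look like, here is a sketch that assumes the Schematron rules are placed inside xs:annotation/xs:appinfo and use the ISO Schematron namespace. The schema, the rule, and the class name are made up for illustration. The code only shows that the merged file still compiles as an ordinary XML Schema and that the embedded patterns can be located. It does not run the Schematron rules, which would need a separate Schematron processor.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: an XML Schema with one embedded Schematron pattern.
final class EmbeddedSchematronDemo {

    static final String SCH_NS = "http://purl.oclc.org/dsdl/schematron";

    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:sch='" + SCH_NS + "'>"
        + "<xs:element name='range'>"
        // The rule relates two element values, which XML Schema alone cannot express.
        + "<xs:annotation><xs:appinfo>"
        + "<sch:pattern><sch:rule context='range'>"
        + "<sch:assert test='min &lt;= max'>min must not exceed max</sch:assert>"
        + "</sch:rule></sch:pattern>"
        + "</xs:appinfo></xs:annotation>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='min' type='xs:int'/>"
        + "<xs:element name='max' type='xs:int'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element></xs:schema>";

    /** Returns true if the merged file is still accepted as an XML Schema. */
    static boolean schemaCompiles() {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    /** Counts the sch:pattern elements embedded in the schema document. */
    static int countEmbeddedPatterns() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XSD)));
        NodeList patterns = doc.getElementsByTagNameNS(SCH_NS, "pattern");
        return patterns.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("compiles as XML Schema: " + schemaCompiles());
        System.out.println("embedded patterns: " + countEmbeddedPatterns());
    }
}
```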

Resources on XML Catalog

Document type declaration, public identifier, system identifier

Syntax
document-type-declaration = (external-subset, internal-subset)|external-subset|internal-subset

document-type-declaration = '<!DOCTYPE' root-element-name external-subset? ('[' internal-subset ']')? '>'

external-subset = ('PUBLIC' public-identifier system-identifier)|('SYSTEM' system-identifier)
  • public-identifier : an identifier which is meant to be universally unique within its application scope.
  • system-identifier : typically a fragmentless URI reference which identifies a document type that is used exclusively in one application.
Sample
  • XHTML 1.0
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
                             "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    
       
  • Web application deployment descriptor for Servlet 2.3
    <!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" 
                                "http://java.sun.com/dtd/web-app_2_3.dtd">
       
  • IoC configuration of Spring framework 2.0
    <!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN 2.0//EN"
                "http://www.springframework.org/dtd/spring-beans-2.0.dtd">
       
  • SQL map of iBATIS 2.0
    <!DOCTYPE sqlMap PUBLIC "-//ibatis.apache.org//DTD SQL Map 2.0//EN"
                            "http://ibatis.apache.org/dtd/sql-map-2.dtd">
       
  • DocBook 5.0
    <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V5.0//EN"
                   "http://www.oasis-open.org/docbook/xml/5.0/docbook.dtd" [
    <!ENTITY chap1 SYSTEM "chap1.xml">
    <!ENTITY chap2 SYSTEM "chap2.xml">
    ]>
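
The identifiers of a document type declaration can be read back through the DOM DocumentType node. The following is a minimal sketch using the XHTML declaration from the samples; the class name is made up for illustration, and the entity resolver returns an empty DTD on purpose so that nothing is downloaded while parsing.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: extracts the root name and both identifiers from a DOCTYPE.
final class DoctypeReader {

    static final String XHTML =
          "<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' "
        + "'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>"
        + "<html/>";

    static DocumentType readDoctype(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Resolve every external entity to an empty stream instead of fetching the URL.
        db.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return db.parse(new InputSource(new StringReader(xml))).getDoctype();
    }

    public static void main(String[] args) throws Exception {
        DocumentType dt = readDoctype(XHTML);
        System.out.println("root element      : " + dt.getName());
        System.out.println("public identifier : " + dt.getPublicId());
        System.out.println("system identifier : " + dt.getSystemId());
    }
}
```

An XML catalog plays the same role as the resolver in this sketch, except that it maps the public or system identifier to a local copy of the DTD instead of an empty one.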
       

XML Schema Documentation Tools

  • xsddoc
    The xsddoc subproject is an XML Schema documentation generator for W3C XML Schemas.
  • xs3p
    The XS3P schema documentation generator is simply an XSLT stylesheet, which generates HTML documentation from an XSD schema file.

Small Tips
