XML Schema for E-commerce

Tom Worthington FACS

Visiting Fellow, Department of Computer Science, Australian National University, Canberra

For: Computing 3410 Students, The Australian National University

This document is Version 2.0, 8 August 2001: http://www.tomw.net.au/2001/schema.html

Introduction
The Schema Problem
XML Schema
Five core points about XML Schema
Schema Applied
Sequence of Elements
Complex Type
Abstract Attribute
Enabling Business Components

Introduction

This material was prepared for the unit Information Technology in Electronic Commerce (COMP3410) at the Australian National University, semester 2, 2001. Accompanying documents discuss A Common Understanding and Electronic Document Management and the Digital Library for E-commerce.

This work is summarized from Using Schema and Serialization to Leverage Business Logic by Eric Schmidt, Microsoft Corporation, April 2001. It omits the details of software implementation to give an overview of features of XML Schema.

This document is intended to provide both a set of "slides" for a group presentation and notes. The notes can be read or printed for individual use. For a slide-show group presentation, set your web browser to use a large font size and the accompanying style sheet, then select the frames version of the document. The style sheet is designed to omit the notes sections of the document, which are marked with the class definition "optional" and leave a large margin before titles marked "newslide". These slides do not fit precisely on screen, but provide more flexibility than a conventional slide show.

The Schema Problem

Data models (structure, content, and semantics) are lagging behind XML use.
DTDs are document-focused, not data-focused.
Numerous XML validation, structuring, and typing systems have been created, including DCD, SOX, Schematron, RELAX and XDR.
W3C created XML Schema for defining the structure, content, and semantics of XML documents.
XML Schema should be extensible to any type of business or processing logic.

The surge of XML usage over the past several years has not led to a complementary increase in defined data models (structure, content, and semantics) for XML documents.

Document Type Definitions (DTDs) focus on XML from a document perspective and not a data and type perspective. Due to the simple structuring and typing mechanisms in DTDs, numerous XML validation, structuring, and typing systems have been created, including Document Content Description (DCD), SOX, Schematron, RELAX and XML-Data Reduced (XDR).

Building on the lessons learned from previous schema implementations, the W3C XML Schema working group created XML Schema for defining the structure, content, and semantics of XML documents. Ultimately, this specification should provide an extensible environment that can be applied to any type of business or processing logic.

XML Schema

The XML Schema recommendation has three parts:

Part 0: Primer,
Part 1: Structures, and
Part 2: Datatypes.

Five core points about XML Schema are:

XML Schema is represented in XML 1.0 syntax
Data typing of simple content: primitive data types (string, float, double, and so on), derived types (int, short, and unsignedShort), user-defined types, constraints like length, range, and format (patterns).
Typing of complex content: define content models as types, explicit or abstract, restricted or extended.
Distinction between the type definition and instance of that type: Unlike XDR, XML Schema type definitions are independent of instance declarations to reuse type definitions.
W3C support and industry implementation: Approved May 2, 2001, Open Source and Web Based Tools are available.

Core points in detail:

XML Schema is represented in XML 1.0 syntax: This makes parsing XML Schema available to any XML 1.0-compliant parser.
Data typing of simple content: XML Schema provides a specification for primitive data types (string, float, double, and so on) found in most common programming languages. Expanding upon these primitive types, XML Schema provides derived types like int, short, and unsignedShort. To extend these primitive and derived types, XML Schema provides the ability to create user-defined types. In addition, you can further constrain the types with facets such as length, range, and format (patterns). This typing facility gives a way to express validity constraints for any type of XML document.
Typing of complex content: XML Schema provides the ability to define content models as types. These types can be explicit or abstract and can be restricted or extended in a type instance. For example, you can create a manager type that is based on an employee content model.
Distinction between the type definition and instance of that type: Unlike XDR, XML Schema type definitions are independent of instance declarations. This makes it possible to reuse type definitions in different contexts to describe distinct nodes within the instance document. For example, manager and supervisor elements within the same instance document can both be instances of the same type.
W3C support and industry implementation: On May 2, 2001, the XML Schema specification reached recommendation status. This means that the specification is stable and can be used as the basis for production-level implementations. As with many other XML-based recommendations produced by W3C working groups (for example, DOM, XSLT and XPath), such recommendations become the industry standard. Born from these standards are implementations, which are the "center of the universe" for any XML-enabled application. The most common implementation is the parser, but most parsers do more than just parse XML. These parsers or engines provide a spectrum of services including parsing, validation, DOM creation, firing SAX events, and XSLT and XPath functionality. All of these must be done in a compliant manner. These types of standards-based implementations are crucial for interoperability between services and systems.
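The length and pattern constraints mentioned in the core points above can be illustrated outside a schema processor. The following is a minimal Python sketch, not taken from the source article: the function names and the example pattern are hypothetical, and it assumes that Python's regular expression syntax is close enough to XML Schema's for this simple expression.

```python
import re

def check_max_length(value: str, max_length: int) -> bool:
    """Mimic a maxLength facet: the value may hold at most max_length characters."""
    return len(value) <= max_length

def check_pattern(value: str, pattern: str) -> bool:
    """Mimic a pattern facet. XML Schema patterns must match the whole value,
    so fullmatch is used rather than search."""
    return re.fullmatch(pattern, value) is not None

# Hypothetical example pattern: five digits with an optional four-digit suffix.
POSTAL_PATTERN = r"[0-9]{5}(-[0-9]{4})?"

if __name__ == "__main__":
    for candidate in ("98052", "98052-6399", "9805"):
        print(candidate, check_pattern(candidate, POSTAL_PATTERN))
```

A real schema processor applies such facets automatically during validation; the sketch only shows what each facet checks.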

Schema Applied

This article focuses on the concept of purchase order processing. Here is a snapshot of a purchase order schema:

<xsd:element name="PurchaseOrder" type="PurchaseOrderType"/>

<xsd:complexType name="PurchaseOrderType">
   <xsd:sequence>
      <xsd:element name="Comment" type="xsd:string"/>
      <xsd:element name="PurchaseOrderID" type="xsd:string"/>
      <xsd:element name="PurchaseOrderDate" type="xsd:date"/>
      <xsd:element name="BuyerInformation" type="BuyerInformationType"/>
      <xsd:element name="BillingInformation" type="BillingInformationType"/>
      <xsd:element name="ShippingInformation" type="ShippingInformationType"/>
      <xsd:element name="OrderLineItems" type="OrderLineItemsType"/>
      <xsd:element name="ShipTerms" type="xsd:string"/>
      <xsd:element name="ShippingCost" type="xsd:float"/>
      <xsd:element name="SubTotal" type="xsd:float"/>
      <xsd:element name="TaxesAndFees" type="xsd:float"/>
      <xsd:element name="Total" type="xsd:float"/>
      <xsd:element name="PaymentInformation" type="PaymentInformationType"/>
   </xsd:sequence>
   <xsd:attribute name="CorrelationID" type="xsd:string"/>
   <xsd:attribute name="OriginatorID" type="xsd:string"/>
</xsd:complexType>

The top-level element of the schema is named PurchaseOrder, which is of type PurchaseOrderType. PurchaseOrderType contains a sequence of elements, all of which are typed with built-in XML Schema types or references to declared user-defined types. The sequence element is important here because it enforces the order of the elements as they appear below PurchaseOrder. Sequencing is not mandatory, but it can provide a more restrictive environment for consumers. For example, if sequencing is guaranteed, parsers and processors can look for named data at known ordinal positions in an XML document. Typing is also very important because it provides type safety and supports reflection on a given element. For example, the above element named ShippingInformation is based on the complexType named ShippingInformationType.
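To make the ordering guarantee concrete, here is a small Python sketch that checks whether the children of a PurchaseOrder instance appear in the order declared by the sequence. It is an illustration only: the helper names are hypothetical, the instance is assumed to use no namespace, and the check covers element names and order, not the types a schema processor would also verify.

```python
import xml.etree.ElementTree as ET

# Child element names in the order declared by PurchaseOrderType's xsd:sequence.
EXPECTED_ORDER = [
    "Comment", "PurchaseOrderID", "PurchaseOrderDate", "BuyerInformation",
    "BillingInformation", "ShippingInformation", "OrderLineItems", "ShipTerms",
    "ShippingCost", "SubTotal", "TaxesAndFees", "Total", "PaymentInformation",
]

def children_follow_sequence(xml_text: str) -> bool:
    """Return True if the root's children appear exactly in EXPECTED_ORDER."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root] == EXPECTED_ORDER

def make_order(names) -> str:
    """Build a skeleton instance with empty child elements (hypothetical helper)."""
    body = "".join(f"<{name}/>" for name in names)
    return f"<PurchaseOrder>{body}</PurchaseOrder>"

if __name__ == "__main__":
    print(children_follow_sequence(make_order(EXPECTED_ORDER)))
```

Because every element in the sequence has the default occurrence of exactly one, comparing the full list of child names is sufficient for this sketch.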

Sequence of Elements

<xsd:complexType name="ShippingInformationType">
   <xsd:sequence>
      <xsd:element name="Name" type="NameType"/>
      <xsd:element name="StreetAddress" type="ShippingStreetAddressType"/>
      <xsd:element name="BriefContact" type="BriefContactType" minOccurs="0"/>
   </xsd:sequence>
</xsd:complexType>

The ShippingInformationType is built upon a sequence of elements: Name, StreetAddress, and BriefContact. Notice that the minOccurs attribute for the BriefContact element is set to zero. This allows the application to omit the BriefContact element. The significant element in the ShippingInformationType is the StreetAddress element, which is based on the type ShippingStreetAddressType. The implementation of the complexType named ShippingStreetAddressType appears under the next heading.
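The effect of minOccurs="0" on a consumer can be sketched in Python as follows. This is an assumption-laden illustration: the function name is hypothetical, the instance documents are assumed to use no namespace, and the child elements are left empty because their types are not shown here.

```python
import xml.etree.ElementTree as ET

def parse_shipping(xml_text: str) -> dict:
    """Read a ShippingInformation element; BriefContact may legitimately be absent."""
    root = ET.fromstring(xml_text)
    name = root.find("Name")
    address = root.find("StreetAddress")
    # Name and StreetAddress use the default minOccurs of 1, so they are required.
    if name is None or address is None:
        raise ValueError("Name and StreetAddress are required")
    # find() returns None when the optional element was omitted.
    return {
        "Name": name,
        "StreetAddress": address,
        "BriefContact": root.find("BriefContact"),
    }

if __name__ == "__main__":
    doc = "<ShippingInformation><Name/><StreetAddress/></ShippingInformation>"
    print(parse_shipping(doc)["BriefContact"])
```

The consuming code has to treat the optional element as possibly missing, which is exactly what the schema communicates to it.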

Complex Type

<xsd:complexType name="ShippingStreetAddressType">
   <xsd:complexContent>
      <xsd:extension base="AbstractStreetAddressType">
         <xsd:sequence>
            <xsd:element name="HouseColor" type="xsd:string"
                         minOccurs="0" maxOccurs="unbounded"/>
         </xsd:sequence>
      </xsd:extension>
   </xsd:complexContent>
</xsd:complexType>

The complexType ShippingStreetAddressType is based upon an extension with a base of AbstractStreetAddressType. The extension mechanism provides the ability to inherit from AbstractStreetAddressType. This is extremely powerful because any XML Schema-enabled consumer will understand that the complexType ShippingStreetAddressType is really based on the AbstractStreetAddressType type or class. The implementation of AbstractStreetAddressType appears under the next heading.
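Because a schema is itself XML, a consumer can discover this inheritance relationship with an ordinary XML parser. The Python sketch below is one way this might look; the function name is hypothetical, and the xsd:schema wrapper with the standard XML Schema namespace is assumed, since the source shows only fragments.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The fragment from the text, wrapped in an assumed xsd:schema root element.
SCHEMA_TEXT = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="ShippingStreetAddressType">
    <xsd:complexContent>
      <xsd:extension base="AbstractStreetAddressType">
        <xsd:sequence>
          <xsd:element name="HouseColor" type="xsd:string"
                       minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>"""

def base_type_of(schema_text: str, type_name: str):
    """Return the extension base of the named complexType, or None if not found."""
    root = ET.fromstring(schema_text)
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        if ctype.get("name") == type_name:
            ext = ctype.find(f"{{{XSD_NS}}}complexContent/{{{XSD_NS}}}extension")
            return None if ext is None else ext.get("base")
    return None

if __name__ == "__main__":
    print(base_type_of(SCHEMA_TEXT, "ShippingStreetAddressType"))
```

This is the practical benefit of expressing the schema in XML 1.0 syntax: no special-purpose parser is needed to inspect type relationships.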

Abstract Attribute

<xsd:complexType name="AbstractStreetAddressType" abstract="true">
   <xsd:sequence>

      <xsd:element name="AddressCode">
         <xsd:simpleType>
            <xsd:restriction base="xsd:string">
               <xsd:maxLength value="100"/>
            </xsd:restriction>
         </xsd:simpleType>
      </xsd:element>

      <xsd:element name="AddressLine" minOccurs="0" maxOccurs="2">
         <xsd:simpleType>
            <xsd:restriction base="xsd:string">
               <xsd:maxLength value="100"/>
            </xsd:restriction>
         </xsd:simpleType>
      </xsd:element>

      <xsd:element name="City">
         <xsd:simpleType>
            <xsd:restriction base="xsd:string">
               <xsd:maxLength value="75"/>
            </xsd:restriction>
         </xsd:simpleType>
      </xsd:element>

      <xsd:element name="State_Province" type="State_ProvinceEnum"/>

      <xsd:element name="PostalCode">
         <xsd:simpleType>
            <xsd:restriction base="xsd:string">
               <xsd:pattern value="[0-9]{5}(-[0-9]{4})?"/>
            </xsd:restriction>
         </xsd:simpleType>
      </xsd:element>

      <xsd:element name="Country" type="xsd:string"/>
      <xsd:element name="Room" type="xsd:string"/>
      <xsd:element name="Building" type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
   </xsd:sequence>
</xsd:complexType>

Notice that on the complexType named AbstractStreetAddressType the abstract attribute is set to true. This restricts the type so that it can be used only through a derived complexType, much like an abstract base class in C++. This is exactly what the ShippingStreetAddressType does: it extends AbstractStreetAddressType by adding a HouseColor element that the base type does not provide. This model provides a completely typed environment. Any time that street address information is dealt with, the base class can be used to build a content model in a consistent, typed manner.
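The abstract base class analogy can be sketched in Python. The class and field names below are hypothetical, only a few of the address fields are modelled, and the kind method exists purely so that the base class cannot be instantiated; none of this comes from the source article.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List

@dataclass
class AbstractStreetAddress(ABC):
    """Counterpart of a type declared with abstract="true": not usable directly."""
    address_code: str
    city: str
    country: str

    @abstractmethod
    def kind(self) -> str:
        """Derived classes must say what kind of address they are."""

@dataclass
class ShippingStreetAddress(AbstractStreetAddress):
    """Counterpart of the extension: inherits the base fields and adds house colors."""
    house_colors: List[str] = field(default_factory=list)

    def kind(self) -> str:
        return "shipping"

if __name__ == "__main__":
    address = ShippingStreetAddress("A1", "Canberra", "Australia", ["blue"])
    print(address.kind(), address.house_colors)
```

Trying to create an AbstractStreetAddress directly raises a TypeError, which mirrors a validator rejecting an element that uses the abstract type without a derived type.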

Enabling Business Components

Logical view of processing XML in this application:

Figure 1. How XML is processed in this application

Step 1: Validation

The first task is to hydrate an object instance of the PurchaseOrder class. Basically, this will deserialize an XML document instance that represents a purchase order by mapping the XML data to the members of the object. Before the purchase order XML document can be deserialized, it must be checked to be well-formed XML and valid against the Purchase Order schema. The easiest way to accomplish this task is with a validating parser or validating XML reader.
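The two checks can be sketched in Python. The standard library parser checks well-formedness but does not validate against XML Schema, so the second half below is only a hand-written stand-in for what a validating parser would do. The function name and the choice of which elements to check are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def precheck_purchase_order(xml_text: str) -> list:
    """Return a list of problems found; an empty list means the checks passed.

    This is NOT schema validation: it tests well-formedness and then looks
    for the root element and two required children by name.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    if root.tag != "PurchaseOrder":
        problems.append(f"unexpected root element: {root.tag}")
    for required in ("PurchaseOrderID", "PurchaseOrderDate"):
        if root.find(required) is None:
            problems.append(f"missing element: {required}")
    return problems

if __name__ == "__main__":
    print(precheck_purchase_order("<PurchaseOrder><PurchaseOrderID>1</PurchaseOrderID>"))
```

In a production system the hand-written checks would be replaced by a validating parser loaded with the purchase order schema, so that the rules live in one place.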

Step 2: Deserialization

Now that the validation code is implemented, we can start deserializing the XML and hydrating the Purchase Order object. The deserialization code reads XML from the source (in this case a string of XML) in a node-based fashion from the source buffer. As it reads the XML, you have full control over what data you want to interrogate or pull from the reader.
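Node-based reading can be sketched with the incremental parser in Python's standard library. The PurchaseOrder class here is a hypothetical, much reduced stand-in for the business object, and the sketch assumes that the element names it looks for occur only once in the document.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

@dataclass
class PurchaseOrder:
    """Hypothetical business object holding a few of the purchase order fields."""
    purchase_order_id: Optional[str] = None
    purchase_order_date: Optional[str] = None
    total: Optional[float] = None

def hydrate(xml_text: str) -> PurchaseOrder:
    """Walk the document element by element and pull out only the data we want."""
    order = PurchaseOrder()
    source = io.BytesIO(xml_text.encode("utf-8"))
    # An "end" event fires once an element and its text have been fully read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "PurchaseOrderID":
            order.purchase_order_id = elem.text
        elif elem.tag == "PurchaseOrderDate":
            order.purchase_order_date = elem.text
        elif elem.tag == "Total":
            order.total = float(elem.text)
    return order

if __name__ == "__main__":
    sample = ("<PurchaseOrder><PurchaseOrderID>PO-1</PurchaseOrderID>"
              "<PurchaseOrderDate>2001-08-08</PurchaseOrderDate>"
              "<Total>42.5</Total></PurchaseOrder>")
    print(hydrate(sample))
```

Elements that the reader does not ask about are simply skipped, which is the "full control" the text refers to.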

Step 3: Serialization

Okay, so we have successfully deserialized the XML and hydrated the purchase order. Now we need to serialize the purchase order back out to XML for continued processing by another service. This is a key point. You can write serialization code to target any necessary service. If a down-level service needs the purchase order in a binary format, simply add another serializer.

There are several ways to serialize data into XML. The most barbaric way would be to do string concatenation. Although this is simple to do, the code is very fragile and there is no built-in mechanism for constructing a well-formed document instance. A more efficient and safe means of serializing your data into XML is to use some type of XML writer.
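The difference between the two approaches can be shown with a short Python sketch. The function names are hypothetical and only two of the purchase order elements are written; the point is how each approach handles text containing markup characters.

```python
import xml.etree.ElementTree as ET

def serialize_order(order_id: str, comment: str) -> str:
    """Build the document through an XML API, which escapes text automatically."""
    root = ET.Element("PurchaseOrder")
    ET.SubElement(root, "Comment").text = comment
    ET.SubElement(root, "PurchaseOrderID").text = order_id
    return ET.tostring(root, encoding="unicode")

def concatenate_order(order_id: str, comment: str) -> str:
    """Fragile alternative: plain string concatenation with no escaping."""
    return ("<PurchaseOrder><Comment>" + comment + "</Comment>"
            "<PurchaseOrderID>" + order_id + "</PurchaseOrderID></PurchaseOrder>")

if __name__ == "__main__":
    tricky = "Fish & chips <urgent>"
    print(serialize_order("PO-1", tricky))
    print(concatenate_order("PO-1", tricky))
```

With a comment containing an ampersand or angle bracket, the concatenated string is no longer well-formed XML, while the output of the XML API still parses and returns the original text.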