
Metadata and Electronic Document Management for Electronic Commerce

Contents

  1. Introduction
  2. Metadata
  3. Standards for eCommerce
  4. E-commerce Examples
  5. Electronic Document Management
  6. Digital Library
  7. Publishing
  8. Future Use
  9. Home Page

Introduction

This material was first presented for the Australian National University course "Information Technology in Electronic Commerce" (COMP3410/COMP6341), prepared and presented by Tom Worthington FACS HLM, a Visiting Fellow in the Department of Computer Science at the Australian National University and Director of Tomw Communications Pty Ltd.

Two topics are introduced: metadata and data management (digital library, electronic document management). Use of the technology for practical e-commerce and e-publishing applications is emphasized using case studies and anecdotes drawing on the author's experience. The material is divided into three sections on metadata, three on Data Management, a tutorial on Metadata, plus a tutorial on Data Management, assignments and examination questions. There have been revisions to the material from 2001/2002, 2003, 2004 and 2005/2006.

This document is intended to serve both as a live group presentation and as accompanying lecture notes for individual use. The material may also be of use to those interested in the issues but not undertaking formal study. However, it is not intended as an on-line course. Those wishing to use the material as part of a course are invited to contact the author.

Standards for a Civil Society, Government and Commerce

  1. The Internet: a global computer data communications network
  2. The World Wide Web: electronic documents hypertext linked
  3. Now add document and e-commerce standards for global services

Before looking in detail at metadata and electronic document technology, it is useful to consider what the technologist is attempting to accomplish with them. The Internet was designed to provide a global computer data communications network. The World Wide Web provides hypertext-linked electronic documents, most commonly over the Internet. The aim of metadata, document and e-commerce standards is to add a further layer over the web to provide global services for civil society, government and business.

Services should be available from hand held wireless devices, as well as desktop computers. These services should use formats which are globally standardized, usable for decades and have legal standing. The systems of non-government, government and commercial organisations should be able to interoperate securely to provide services to the public. Such a global service is now achievable.

The Internet was intended for people to communicate with computers using computer terminals, and for computers to communicate with each other using specially designed machine to machine protocols. The web was designed for people to create electronic documents for people to read. Later web technologies, such as XML, were introduced for machine to machine communications (for use in e-commerce, for example).

Electronic formats can be optimized for either human or machine communication. However, there are benefits from compromising with a format which can be used for both. Web technologies provide this ability. Documents can have sufficient structure to be processed by an automated system, but also rendered to a format readable by a person.
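
As a minimal sketch of this dual use, the Python fragment below reads one small XML document in two ways: a program extracts a due date from it, and the same document is rendered as plain text for a person. The element names (`notice`, `title`, `due`, `para`) are invented for illustration and do not come from any published standard.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical document; the element names are illustrative only.
NOTICE_XML = """
<notice>
  <title>Payment reminder</title>
  <due>2007-08-28</due>
  <para>Invoice 42 is due for payment.</para>
  <para>Please contact us with any questions.</para>
</notice>
"""

def due_date(root):
    """Machine use: pull a typed value out of the structured document."""
    return date.fromisoformat(root.findtext("due"))

def render_text(root):
    """Human use: render the same document as readable plain text."""
    heading = root.findtext("title").upper()
    paragraphs = [para.text for para in root.findall("para")]
    return "\n\n".join([heading, f"Due: {due_date(root).isoformat()}"] + paragraphs)

root = ET.fromstring(NOTICE_XML)
print(due_date(root) < date(2008, 1, 1))  # an automated system can compare dates
print(render_text(root))                  # a person can read the rendered text
```

The structure costs the author a little extra markup, but neither use requires a second copy of the document.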

Building systems which can be read by both people and machines is challenging. Such formats need to be efficient for storage and transmission, while being able to be converted into a format for human reading (rendered). The format needs to be agreed by all those who use it (ideally worldwide) and fixed for long enough to be useful (ideally for decades), yet adaptable to new uses.

Metadata provides a tool to make electronic documents more efficient and flexible. In many cases a short summary of a document (the metadata) can be used in place of the full document, saving on transmission and processing as well as saving time for the human reader. The metadata can be used to manipulate the information in documents to create new documents. The same encoding used for describing documents can be used by data processing systems to carry out electronic commerce.
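
The idea can be sketched in a few lines of Python. The record layout below (`identifier`, `title`, `subject`, `size_bytes`) and the sample entries are assumptions made up for this example, not a real metadata standard. The point is only that selection and the creation of a new document (an index) can be done from the metadata without fetching the full documents.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metadata:
    """Hypothetical metadata record; the field names are illustrative."""
    identifier: str
    title: str
    subject: str
    size_bytes: int  # size of the full document the record describes

RECORDS = [
    Metadata("doc-1", "Tsunami warning procedures", "emergency", 480_000),
    Metadata("doc-2", "Invoice formats for SMEs", "e-commerce", 120_000),
    Metadata("doc-3", "Online ordering guide", "e-commerce", 250_000),
]

def find_by_subject(records, subject):
    """Select documents using only their metadata, not their content."""
    return [record for record in records if record.subject == subject]

def make_index(records):
    """Create a new document (a title index) from the metadata alone."""
    ordered = sorted(records, key=lambda record: record.title)
    return "\n".join(f"{record.identifier}: {record.title}" for record in ordered)

matches = find_by_subject(RECORDS, "e-commerce")
print(make_index(matches))
not_fetched = sum(record.size_bytes for record in RECORDS)
print(f"Full documents not fetched during the search: {not_fetched} bytes")
```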

Standards are Difficult

  1. W3C Aus Standards Symposium 28 August 2007

  2. OASIS standards: ODF, UBL, CIQ and CAP
  3. Geoscience Australia for geospatial standards and Tsunami warnings.

To a computer scientist, web and XML formats look trivial. However, the politics of creating standards which will be used in an industry sector, a country or internationally can take years of complex negotiation. A one day Standards Symposium was held in Canberra on 28 August 2007. The symposium was organized by the World Wide Web Consortium's Australian office (W3C Aus), located in the Computer Science and Information Technology Building at ANU.

W3C issue what they call "recommendations", but which are really standards, for HTML, XML, CSS and other key web technologies. Also represented at the symposium was the Organisation for the Advancement of Structured Information Standards (OASIS), which was founded in 1993 for SGML related standards (more recently XML standards). Its more than 60 technical committees create standards such as ODF, based on the OpenOffice.org office document format. OASIS produces horizontal standards (general purpose technology) and vertical standards (for a particular business function). Other OASIS standards are Universal Business Language (UBL), Customer Information Quality (CIQ) for identifying locations, organisations and people, and the Common Alerting Protocol (CAP) for emergency messages.

Geoscience Australia is an Australian government agency working on national and international geospatial standards and Tsunami warnings.

Standards are Useful

  1. ACS Digital Library
  2. online sales
  3. tax statements online.

Once the difficult work of creating standards for metadata, e-commerce and electronic documents is done, they can be used to build very useful systems. As an example, the ACS Digital Library provides academic publications online. Amazon.com provides online sales and the Australian Taxation Office accepts tax statements online.

Getting there

This section is adapted from "Documents and databases: Making sense of developments in eBusiness, eCommerce, ePublishing and eLaw", Tom Worthington, Information Industry Outlook Conference, Canberra, 2002, URL: http://www.tomw.net.au/2002/ebcwxml.html

  1. Get on-line and get email
  2. Get Internet banking
  3. Get a website, initially to advertise the business phone number and email address
  4. Get an interactive dynamic e-commerce system integrated with traditional business systems
  5. Get voice and data systems integrated.

From "Accelerating the Uptake of E-Commerce by SMEs: A Report and Action Plan", SME E-Commerce Taskforce, July 2002, URL: http://www.setel.com.au/smeforum2002

The first three steps of this action plan for accelerating the uptake of e-commerce by Australian small and medium sized enterprises (SMEs) are good simple advice. But step 4 ("Get an interactive dynamic e-commerce system...") is an absurdly large leap. This is the equivalent of telling a new aerospace company to first build a wood glider, then a space shuttle. There need to be more steps for an easier transition between a simple web site and an e-commerce system.

The last step of voice and data integration doesn't appear to relate to e-commerce and seems to have been included because the list came from a telecommunications vendor. ;-)

Electronic publishing provides a transition step between e-mail and e-commerce. Business documents can first be transmitted electronically (e-publishing) and later be made automatically processable (e-commerce).

Using the Internet for business is much harder than it looks. Small businesses can be shown how to save money by using the Internet to do simple things like replacing paperwork with electronic documents. They will then be ready to do something more complex, with integrated e-commerce. New XML technologies can make that transition possible.

Steps for SME e-commerce:

  1. Internet access: email & web
  2. Internet banking
    • Check statements
    • Receive payments
    • Make payments
  3. Website for business details
  4. Electronic documents to replace paper
    • Electronic brochures
    • Invoices and payment advice
  5. Automate processing of e-documents
    • On-line catalogues
    • On-line ordering
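
The move from step 4 to step 5 can be illustrated with a short Python sketch, in which a catalogue published as an XML document is read by a program that prices an incoming order. The catalogue layout, product codes and function names here are hypothetical and are not taken from UBL or any other e-commerce standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical on-line catalogue; element names and product codes are made up.
CATALOGUE_XML = """
<catalogue>
  <product code="W1"><name>Widget</name><price>9.50</price></product>
  <product code="G1"><name>Gadget</name><price>20.00</price></product>
</catalogue>
"""

def load_catalogue(xml_text):
    """Return a mapping of product code to (name, price in cents)."""
    catalogue = {}
    for product in ET.fromstring(xml_text).findall("product"):
        cents = round(float(product.findtext("price")) * 100)
        catalogue[product.get("code")] = (product.findtext("name"), cents)
    return catalogue

def price_order(catalogue, order_lines):
    """Price an order given as (code, quantity) pairs; reject unknown codes."""
    total = 0
    for code, quantity in order_lines:
        if code not in catalogue:
            raise ValueError(f"unknown product code: {code}")
        total += catalogue[code][1] * quantity
    return total

catalogue = load_catalogue(CATALOGUE_XML)
total_cents = price_order(catalogue, [("W1", 2), ("G1", 1)])
print(f"Order total: ${total_cents / 100:.2f}")
```

The same catalogue file could still be rendered as an electronic brochure for customers, which is what makes step 4 a useful stepping stone.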

Documents or databases?

document, n. ...

Something written, inscribed, etc., which furnishes evidence or information upon any subject, as a manuscript, title-deed, tomb-stone, coin, picture, etc.

Database ...

A structured collection of data held in computer storage; esp. one that incorporates software to make it accessible in a variety of ways; transf., any large collection of information.

From: OED Online, SECOND EDITION, 1989

Documents and databases represent two extremes in the aims and methods of electronic commerce. At the one extreme we have electronic documents which are fixed in content and format, are individual distinct entities, can be displayed using software from different suppliers, are expected to last for years and outlive the software which created them. At the other extreme a database has content which changes, can be displayed in different ways, may only be of value for minutes or months and may depend on one version of database software. This is not to say that all documents are fixed and all databases fluid, but is a useful generalization.
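
The contrast can be made concrete with a small Python sketch. The table, the `quote` element and the function names are assumptions invented for this example. A row in a database changes over time, while a document produced from that row at a given moment stays fixed and can be identified by a fingerprint of its content.

```python
import hashlib
import sqlite3
import xml.etree.ElementTree as ET

# A database: content that changes. Table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE price (product TEXT PRIMARY KEY, cents INTEGER)")
db.execute("INSERT INTO price VALUES ('Widget', 950)")

def snapshot(db, product):
    """Freeze one database row as a fixed XML document (returned as bytes)."""
    (cents,) = db.execute(
        "SELECT cents FROM price WHERE product = ?", (product,)
    ).fetchone()
    quote = ET.Element("quote", product=product)
    quote.text = str(cents)
    return ET.tostring(quote)

def fingerprint(document):
    """Identify a document by a hash of its exact content."""
    return hashlib.sha256(document).hexdigest()

document = snapshot(db, "Widget")
original = fingerprint(document)

# The database moves on; the document already issued does not.
db.execute("UPDATE price SET cents = 990 WHERE product = 'Widget'")
print(fingerprint(document) == original)      # the issued document is unchanged
print(snapshot(db, "Widget") == document)     # a new snapshot now differs
```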

At the one extreme HTML provides a way to create simple electronic documents which can display on a variety of systems, including small wireless devices, TV displays and on special devices for the disabled. But HTML doesn't provide fine control over the format of the document, especially when printed.

At the other extreme PDF provides a format for close control over the look of a document, as to layout, font and such like, but less flexibility. While recent improvements in PDF do allow more options for flowing text to make it more readable and to structure the document in an XML-like format, this requires extra work from the author and so far few people have bothered. In practice two versions of a document have to be produced: the web version for on-screen display and the PDF version for printing. Even where these two versions are automatically generated from the one common source, they involve extra effort for the people creating and reading them.

XML The Answer?

XML now provides formatting options to allow the HTML-like flexibility, plus the fine formatting control of PDF. OpenOffice.org's XML based file format is not perfect, but it does provide a way to package up all the elements of an XML document (including images) into one compressed file. This provides the prospect of formats which can be edited in a word processor, displayed as a web page, transformed for a hand held device or printed with specific styles.
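
The packaging idea can be sketched with Python's standard `zipfile` module. This is a simplified illustration, not the OpenOffice.org format itself: the XML vocabulary (`document`, `para`, `image`) is invented and the real format uses namespaced elements and additional package members. It shows only the general approach of one compressed file holding the XML content together with its images.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Simplified, hypothetical content; the real format uses a richer vocabulary.
CONTENT_XML = (
    '<document><para>Hello, world.</para>'
    '<image href="Pictures/logo.png"/></document>'
)

def build_package(content_xml, images):
    """Bundle the XML content and its images into one compressed file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("content.xml", content_xml)
        for name, data in images.items():
            package.writestr(f"Pictures/{name}", data)
    return buffer.getvalue()

def read_paragraphs(package_bytes):
    """Open the package and extract the paragraph text from the XML."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    return [para.text for para in root.iter("para")]

package_bytes = build_package(CONTENT_XML, {"logo.png": b"placeholder image bytes"})
print(read_paragraphs(package_bytes))
```

Because the content is ordinary XML inside an ordinary compressed archive, tools other than the word processor that wrote the file can read it.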

In 2002 OASIS announced a committee to work on a standard office XML format, with these requirements for the format:

  1. it must be suitable for office documents containing text, spreadsheets, charts, and graphical documents,
  2. it must be compatible with the W3C Extensible Markup Language (XML) v1.0 and W3C Namespaces in XML v1.0 specifications,
  3. it must retain high-level information suitable for editing the document,
  4. it must be friendly to transformations using XSLT or similar XML-based languages or tools
  5. it should keep the document's content and layout information separate such that they can be processed independently of each other, and
  6. it should "borrow" from similar, existing standards ...

From: "OASIS TC Call For Participation: Open Office XML TC", Karl Best, 4 Nov 2002, URL: http://lists.oasis-open.org/archives/tc-announce/200211/msg00001.html

Open Office Specification

The committee decided the OpenOffice.org XML format specification met these criteria and had proven its value in real life, so used it as the basis for its work. The first draft was released in March 2004 but, at 607 pages, it is complex to implement.

This document defines an XML schema for office applications and its semantics. The schema is suitable for office documents, including text documents, spreadsheets, charts and graphical documents like drawings or presentations, but is not restricted to these kind of documents.

The schema retains high-level information suitable for editing document and is friendly to transformations using XSLT or similar XML-based languages or tools.

From: "Open Office Specification 1.0", Committee Draft 1, 22 Mar 2004 Document identifier: office-spec-1.0-cd-1.sxw, URL: http://www.oasis-open.org/committees/download.php/6037/office-spec-1.0-cd-1.pdf

Are e-documents legal?

In 2002 the High Court of Australia considered the difficult question of whether the Migration Act's definition of documents included electronic documents stored in a database:

  1. ... The ordinary dictionary meaning of "document" is a printed or written paper containing information. ... No violence is done to the object or language of s 418(3) by holding that "document" includes information that is stored in a computer or a fax machine and which can be printed out by pressing one or more keys or buttons. No reason appears for thinking that Parliament intended to distinguish between information stored on paper and information stored in the electronic impulses of a computer that can be printed on paper by pressing a key or keys on the computer's keyboard. ...

From: "Muin v Refugee Review Tribunal; Lie v Refugee Review Tribunal", 8 August 2002, High Court of Australia, http://www.austlii.edu.au/au/cases/cth/high_ct/2002/30.html

Identifying e-documents

The High Court also considered how you "give" someone a document which is stored in a database.

  1. "Documents" may include electronic documents: ... Today, in ordinary speech, one can readily refer to a "document" in a database, although such a document may never have been reduced to tangible form. Typically, a database will yield information that appears in paginated format....

  2. ... Electronic "documents" could perhaps be "given" by separate identification and annexure to an electronic transmission. Yet even that was not done in the present case. Merely making such "documents" (or some of them) "available" in a mass of undifferentiated material in a database of constantly changing content does not comply with the language and particular design of the Act ...

From: "Muin v Refugee Review Tribunal; Lie v Refugee Review Tribunal", 8 August 2002, High Court of Australia, http://www.austlii.edu.au/au/cases/cth/high_ct/2002/30.html

High Court Referencing Web Pages

The High Court didn't say if it wanted documents printed in a particular font or with page numbers, but the decision itself is published as a web page with no font style or size specified and with paragraph numbers, rather than page numbers. As the High Court web site has links to such documents, it could be assumed the court is happy with this format.

From: "Legal Links", High Court of Australia, 2001(?), URL: http://www.hcourt.gov.au/legal.html