Metadata and Electronic Document Management for Electronic Commerce

Outline

Tom Worthington

Version of 2 August 2008 (a 2009 version is also available)

Introduction

This material was first presented for the Australian National University course "Information Technology in Electronic Commerce" (COMP3410/COMP6341). The segment is designed to be delivered as six one-hour lectures. It deals with metadata and electronic document management, digital libraries, document representation, and e-commerce applications and their technology.

Contents

  1. Introduction and overview
  2. Metadata
  3. Standards for eCommerce
  4. Electronic Document Management
  5. Digital Library
  6. Publishing

Overview

The fundamental concept is 'electronic document management'. The way 'metadata' is used to manage 'electronic documents' is explained and examples are given for its use in 'electronic publishing' and 'Electronic Commerce'.

Electronic documents have existed for many years and predate the web. However, web technology, particularly the use of XML standards, facilitates document production and management. Some pre-web examples of formats will be given, but XML implementations will be emphasized.

With the expansion of web-based systems and their acceptance by the business and general community, it is likely that they will be the first choice for electronic document management (EDM) systems. The use of 'social networking' may accelerate that adoption and see EDM systems become more than just computerized versions of paper files.

Several examples are introduced, from business, government and academia. While web technology is normally thought of as applying to desktop computers with wired broadband, it is likely to expand to mobile wireless applications.

Lecture Outline

This document is intended to serve both as slides for live group presentation and as accompanying lecture notes for individual use. The slides and these notes are provided in the one HTML document, using HTML Slidy.

This material revises earlier versions from 2001/2002, 2003, 2004, 2005, 2006 and 2007.

Standards for a Civil Society, Government and Commerce

  1. The Internet: a global computer data communications network
  2. The World Wide Web: electronic documents hypertext linked
  3. Add document and e-commerce standards for global services

Before looking in detail at metadata and electronic document technology, it is useful to consider what the technologist is attempting to accomplish with them. The Internet was designed to provide a global computer data communications network. The World Wide Web provides hypertext-linked electronic documents, most commonly over the Internet. The aim with metadata, document and e-commerce standards is to add a further layer over the web to provide global services for civil society, government and business.

Services can be available from handheld wireless devices, as well as desktop computers. These services should use formats which are globally standardized, usable for decades and have legal standing. The systems of non-government, government and commercial organisations should be able to interoperate securely to provide services to the public. Such a global service is now achievable.

The Internet was intended for people to communicate with computers using computer terminals, and for computers to communicate with each other using specially designed machine-to-machine protocols. The web was designed for people to create electronic documents for people to read. Later web technologies, such as XML, were introduced for machine-to-machine communications (for example, in e-commerce).

Electronic formats can be optimized for either human or machine communication. However, there are benefits from compromising with a format which can be used for both. Web technologies provide this ability. Documents can have sufficient structure to be processed by an automated system, but also rendered to a format readable by a person.
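
As a minimal sketch of this idea, the Python below treats one small XML document in two ways: it computes a total from the structured fields (the machine view) and renders the same content as plain text (the human view). The order document, its element names and its attribute names are all invented for illustration and do not come from any e-commerce standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document; element and attribute names are assumptions.
ORDER_XML = """<order id="A-1001">
  <customer>Example Pty Ltd</customer>
  <item sku="PEN-01" quantity="10" unit_price="1.50">Blue pen</item>
  <item sku="PAD-02" quantity="4" unit_price="3.25">Note pad</item>
</order>"""


def order_total(root):
    """Machine view: compute a value from the structured fields."""
    return sum(
        int(item.get("quantity")) * float(item.get("unit_price"))
        for item in root.findall("item")
    )


def render_text(root):
    """Human view: render the same document as readable text."""
    lines = [f"Order {root.get('id')} for {root.findtext('customer')}"]
    for item in root.findall("item"):
        price = float(item.get("unit_price"))
        lines.append(f"  {item.get('quantity')} x {item.text} @ ${price:.2f}")
    lines.append(f"Total: ${order_total(root):.2f}")
    return "\n".join(lines)


root = ET.fromstring(ORDER_XML)
print(render_text(root))
```

The same source document feeds both functions, so there is no separate "human copy" to keep in step with the data an automated system processes.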

Building systems which can be read by both people and machines is challenging. Such formats need to be efficient for storage and transmission, while being able to be converted into a format for human reading (rendered). The format needs to be agreed by all those who use it (ideally worldwide) and fixed for long enough to be useful (ideally for decades), yet adaptable to new uses.

Metadata provides a tool to make electronic documents more efficient and flexible. In many cases a short summary of a document (the metadata) can be used in place of the full document, saving on transmission and processing as well as saving time for the human reader. The metadata can be used to manipulate the information in documents to create new documents. The same encoding used for describing documents can be used by data processing systems to carry out electronic commerce.
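
One way to picture metadata standing in for the full document is to read only the title and the named `meta` elements from a web page's header. The sketch below uses Python's standard `html.parser` module. The sample page and its `description` value are hypothetical, and the choice of which header elements count as metadata is an assumption made for this example.

```python
from html.parser import HTMLParser

# Hypothetical page: the description text is invented for illustration.
PAGE = """<html><head>
<title>Metadata and Electronic Document Management for Electronic Commerce</title>
<meta name="author" content="Tom Worthington">
<meta name="description" content="Lecture notes on metadata and e-commerce.">
</head><body><p>Full text of the document ...</p></body></html>"""


class HeadMetadataParser(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.metadata[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data.strip()


def extract_metadata(html_text):
    """Return a small dictionary summarising the page."""
    parser = HeadMetadataParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


summary = extract_metadata(PAGE)
print(summary)
```

A catalogue or search system could store and transmit only this dictionary, fetching the full document when a reader asks for it.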

Open Access to Information

The Australia license has now been integrated into the Creative Commons licensing process, so you are able to license your works under this jurisdiction's law.

The latest version of the licenses available for this jurisdiction are:

From: Australia, Creative Commons, 2007

The Internet has lowered the cost of access to information, placing pressure on governments and others to provide it. Systems such as Creative Commons provide a way to license information so that it can be distributed freely while the creator retains ownership.

The Victorian Parliament is conducting an inquiry into government open access.

Social Networking for Business and Government

... <h1 id="name">
<span class="fn n">
<span class="given-name">Tom</span>
<span class="family-name">Worthington</span>
</span></h1> </div>
<div class="content">
<div class="info"> <div
class="image">
<img src="http://media.linkedin....jpg" class="photo" alt="Tom Worthington"></div>
<p class="headline title">Adjunct Senior Lecturer ...</p>
<div class="adr"> <p class="locality"> Canberra Area, Australia ...

From: Source code of a Profile, LinkedIn Corporation, 2008

Social networking software allows a computer system to help people interact in groups. While normally thought of as serving social purposes, it is now being adopted for business. LinkedIn provides a way for professionals to interact with each other and find colleagues. Naymz provides a reputation management service. It is likely such systems will be used within and between organisations, including government, to manage work, grant access to information, and work out remuneration for staff. This requires the metadata about people and their actions to be carefully encoded and stored.

HTML has only limited provision for metadata. Systems such as LinkedIn get around the problem with Microformats, which use HTML class names as the metadata element names. This allows the metadata to be included in the body of the HTML document instead of the header, and requires less duplication of information.
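
To illustrate how class names can carry metadata, the sketch below reads a few properties out of the body of a page using Python's standard `html.parser` module. The fragment is a simplified stand-in modelled on the profile excerpt above, with the elided parts left out rather than reconstructed. The set of class names collected is an assumption for this example, and the parser is not a complete Microformats implementation.

```python
from html.parser import HTMLParser

# Class names as seen in the profile excerpt; which ones to collect is an
# assumption made for this sketch.
WANTED = {"given-name", "family-name", "title", "locality"}
# Elements with no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}

# Simplified, hypothetical fragment modelled on the excerpt above.
PROFILE = """<div>
<h1 id="name"><span class="fn n">
<span class="given-name">Tom</span>
<span class="family-name">Worthington</span>
</span></h1>
<p class="headline title">Adjunct Senior Lecturer</p>
<div class="adr"><p class="locality">Canberra Area, Australia</p></div>
</div>"""


class MicroformatParser(HTMLParser):
    """Collect text found inside elements whose class names are in WANTED."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._open = []  # stack of (tag, wanted class names) for open elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self._open.append((tag, classes & WANTED))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag.
        for index in range(len(self._open) - 1, -1, -1):
            if self._open[index][0] == tag:
                del self._open[index:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return
        for _tag, names in self._open:
            for name in names:
                previous = self.properties.get(name, "")
                self.properties[name] = (previous + " " + text).strip()


parser = MicroformatParser()
parser.feed(PROFILE)
parser.close()
properties = parser.properties
print(properties)
```

The visible text of the page doubles as the metadata values, which is why this approach needs less duplication than repeating the same facts in the header.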

Documents or databases?

document, n. ...

Something written, inscribed, etc., which furnishes evidence or information upon any subject, as a manuscript, title-deed, tomb-stone, coin, picture, etc.

Database ...

A structured collection of data held in computer storage; esp. one that incorporates software to make it accessible in a variety of ways; transf., any large collection of information.

From: OED Online, SECOND EDITION, 1989

Documents and databases represent two views of data in a computer system. Electronic documents are fixed in content and format, are individual distinct entities, can be displayed using software from different suppliers, and are expected to last for years and outlive the software which created them. Databases have content which changes, can be displayed in different ways, may only be of value for minutes or months and may depend on one version of database software. This is not to say that all documents are fixed and all databases fluid, but it is a useful generalization.

XML The Answer?

  1. it must be suitable for office documents containing text, spreadsheets, charts, and graphical documents,
  2. it must be compatible with the W3C Extensible Markup Language (XML) v1.0 and W3C Namespaces in XML v1.0 specifications,
  3. it must retain high-level information suitable for editing the document,
  4. it must be friendly to transformations using XSLT or similar XML-based languages or tools
  5. it should keep the document's content and layout information separate such that they can be processed independently of each other, and
  6. it should "borrow" from similar, existing standards ...

From: "OASIS TC Call For Participation: Open Office XML TC", Karl Best, 4 Nov 2002, URL: http://lists.oasis-open.org/archives/tc-announce/200211/msg00001.html

XML now supports both HTML-like document formatting and database-style processing. OpenOffice.org's XML-based file format provides a way to package up all the elements of an XML document (including images) into one compressed file. This provides the prospect of formats which can be edited in a word processor, displayed as a web page, transformed for a handheld device or printed with specific styles.
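
The packaging idea can be sketched with Python's standard `zipfile` module: an XML file and an image are stored together in one compressed archive, and the text is later read back out of it. The member names, the XML vocabulary and the placeholder image bytes are assumptions for illustration. The result is not claimed to be a valid OpenOffice.org file, which requires further members and a defined XML schema.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Hypothetical content; the element names are not the real office schema.
CONTENT_XML = "<document><p>First paragraph.</p><p>Second paragraph.</p></document>"
IMAGE_BYTES = b"placeholder bytes standing in for an image"


def build_package(content_xml, image_bytes):
    """Bundle the XML content and an image into one compressed archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as package:
        package.writestr("content.xml", content_xml)
        package.writestr("Pictures/logo.png", image_bytes)
    return buffer.getvalue()


def read_package(package_bytes):
    """Return the member names and the paragraph text stored in the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        names = package.namelist()
        root = ET.fromstring(package.read("content.xml"))
    return names, [paragraph.text for paragraph in root.iter("p")]


package_bytes = build_package(CONTENT_XML, IMAGE_BYTES)
names, paragraphs = read_package(package_bytes)
print(names)
print(paragraphs)
```

Because the content stays as XML inside the archive, any tool that can open a ZIP file and parse XML can process the document without the word processor that created it.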

In 2002 OASIS announced a committee to work on an office XML standard format based on OpenOffice.org's XML format. The first draft was released in March 2004.

Are e-documents legal?

  1. ... The ordinary dictionary meaning of "document" is a printed or written paper containing information. ... No violence is done to the object or language of s 418(3) by holding that "document" includes information that is stored in a computer or a fax machine and which can be printed out by pressing one or more keys or buttons. No reason appears for thinking that Parliament intended to distinguish between information stored on paper and information stored in the electronic impulses of a computer that can be printed on paper by pressing a key or keys on the computer's keyboard. ...

From: "Muin v Refugee Review Tribunal; Lie v Refugee Review Tribunal", 8 August 2002, High Court of Australia, http://www.austlii.edu.au/au/cases/cth/high_ct/2002/30.html

In 2002 the High Court of Australia concluded that the Migration Act's definition of documents included electronic documents stored in a database.

Identifying e-documents

  1. "Documents" may include electronic documents: ... Today, in ordinary speech, one can readily refer to a "document" in a database, although such a document may never have been reduced to tangible form. Typically, a database will yield information that appears in paginated format....

  2. ... Electronic "documents" could perhaps be "given" by separate identification and annexure to an electronic transmission. Yet even that was not done in the present case. Merely making such "documents" (or some of them) "available" in a mass of undifferentiated material in a database of constantly changing content does not comply with the language and particular design of the Act ...

From: "Muin v Refugee Review Tribunal; Lie v Refugee Review Tribunal", 8 August 2002, High Court of Australia, http://www.austlii.edu.au/au/cases/cth/high_ct/2002/30.html

The High Court also concluded that electronic documents need to be separately identified.



© Tomw Communications Pty Ltd 2006 - 2008

Creative Commons License
Metadata and Electronic Document Management for Electronic Commerce by Tom Worthington is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.5 Australia License.
