Metadata for Publishing

Tom Worthington FACS HLM

Visiting Fellow, Department of Computer Science, Australian National University, Canberra

For: COMP3410: Information Technology in Electronic Commerce, at The Australian National University
This document is Version 2.1, 13 August 2003.



This material is part of Metadata and Electronic Document Management: Searching for a Common Understanding, prepared for "Information Technology in Electronic Commerce" (COMP3410), at the Australian National University, semester 2, 2003.


The Oxford English Dictionary describes metadata as:

metadata n., a set of data that describes and gives information about other data...
[1968 Proc. IFIP 4th Congr.: Suppl. 10 I. 113/2 There are categories of information about each data set as a unit in a data set of data sets, which must be handled as a special meta data set.] 1987 Philos. Trans. Royal Soc. A. 322 373 The challenge is to accumulate data..from diverse sources, convert it to machine-readable form with a harmonized array of *metadata descriptors and present the resulting database(s) to the user. 1998 New Scientist 30 May 35/2 With XML, attaching metadata to a document is easy, at least in theory.
Oxford English Dictionary, (Online) Draft entry Dec. 2001, URL:

Metadata can be described more simply as "Data about Data". As an example, the "creator" of this document is "Tom Worthington". The data is "Tom Worthington" and the metadata is "creator".

Metadata is essential for e-commerce, as it provides standard data items to allow parties to communicate about their organisations, products, terms and conditions. The actual payment, and the "money" itself, consist of data in an agreed metadata format within an electronic transaction. Without suitable metadata standards, e-commerce could not take place and "money" in our online financial systems would cease to exist.

Metadata can also be used to describe published documents. The use of metadata for e-commerce and for publishing has converged in the last few years, with the same XML-based technology used for both applications.

Here is an example of the metadata for the ANU home page:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name='area' content='Corporate Information Services' >
<meta name='contentStatus' content='official'>
<meta name='dc.creator' content=''>
<meta name='' content=''>
<meta name='' content='2003-2-1'>
<meta name='' content='2003-12-31'>
<meta name='dc.description' content='The Australian National University's home page'>
<meta name='dc.publisher' content='Director, Corporate Information Services'>
<meta name='' content=''>
<meta name='dc.subject' content='Australia, Canberra, university, Australian National University, Institute of Advanced Studies, research, undergraduate, graduate, students, CRICOS Provider Number: 00120C '>
<title>The Australian National University</title>
From: The Australian National University, Marketing and Communications, 17 July 2003, URL:
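A program reads such metadata by pulling the name and content pairs out of the page header. The following is a minimal sketch in Python, assuming the standard library's html.parser module is acceptable for the task. The class name MetaExtractor is hypothetical, and the sample input is limited to two tags copied from the example above.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect (name, content) pairs from <meta name=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        # Skip tags without a usable name, such as http-equiv tags.
        if name:
            self.pairs.append((name, attributes.get("content") or ""))


# Two tags copied from the ANU example.
SAMPLE = """
<meta name='contentStatus' content='official'>
<meta name='dc.publisher' content='Director, Corporate Information Services'>
"""

extractor = MetaExtractor()
extractor.feed(SAMPLE)
metadata = dict(extractor.pairs)
print(metadata)
```

Each key in the resulting dictionary is the metadata ("dc.publisher") and each value is the data it describes.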

Here is an example of an e-commerce transaction. This is an Australian Taxation Office electronic tax form for the Goods and Services Tax (GST):

<EFT_CODE> 51111 121 059 9059</EFT_CODE>
<GST_LABEL_TEXT>for the QUARTER from 1 Jul 2001 to 30 Sep 2001</GST_LABEL_TEXT>
From: Formatting the eBAS with XSL, Tom Worthington, 29 November 2002, URL:
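In the XML form the element name plays the role of the metadata and the element text is the data. As a rough sketch, the two elements above can be read with Python's xml.etree.ElementTree. The enclosing FORM element is an assumption added only so that the fragment has a single root; it is not part of the tax form format.

```python
import xml.etree.ElementTree as ET

# The two inner elements are copied from the example. The <FORM> wrapper is
# hypothetical: it is added only because an XML document needs one root.
FRAGMENT = """<FORM>
<EFT_CODE> 51111 121 059 9059</EFT_CODE>
<GST_LABEL_TEXT>for the QUARTER from 1 Jul 2001 to 30 Sep 2001</GST_LABEL_TEXT>
</FORM>"""

root = ET.fromstring(FRAGMENT)
# The tag name is the metadata; the element text is the data.
fields = {child.tag: (child.text or "").strip() for child in root}
for name, value in fields.items():
    print(f"{name}: {value}")
```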

The (Australian) Commonwealth Government Entry Point indexes over 500 Federal Government web sites using AGLS, with 1,000,000 pages using metadata:

Commonwealth Departments and agencies (including authorities) are required to:
From: Government Policy –, Department of Communications, Information Technology and the Arts, 2001, URL:

The Politics of Data Standards

The common theme of this work is the creation, transmission, storage, discovery and display of information in electronic format. The subtitle "Searching for a Common Understanding" comes from the need for those creating electronic information to agree a common format for the information to be understood. The challenge is to create formats which are sufficiently expressive to be able to communicate what is needed, but simple enough to be implemented efficiently.

Those involved in creating a standard, and in using it, must have a common understanding of what is needed and what is enough. In implementing metadata and data management standards IT professionals need to keep the politics of standards development in mind. Most standards need to be profiled, to create a workable subset, before they can be used for practical purposes. Some standards need to be enhanced and others not used at all.

The World Wide Web Consortium (W3C) standard for Scalable Vector Graphics (SVG) provides a way to define images in web pages. As well as the expected features of shapes, filling, symbols, colours and patterns, there is the 'metadata' element:

<!ENTITY % metadataExt "" >
<!ELEMENT metadata (#PCDATA %metadataExt;)* >
<!ATTLIST metadata %stdAttrs; >
From 21.2 The 'metadata' element, Scalable Vector Graphics (SVG) 1.0 Specification W3C Proposed Recommendation 19 July, 2001, URL:

This apparently technically simple definition is made politically complex by a preceding paragraph:

Individual industries or individual content creators are free to define their own metadata schema but are encouraged to follow existing metadata standards and use standard metadata schema wherever possible to promote interchange and interoperability. If a particular standard metadata schema does not meet your needs, then it is usually better to define an additional metadata schema in an existing framework such as RDF and to use custom metadata schema in combination with standard metadata schema, rather than totally ignore the standard schema.
From 21.1 Introduction, Scalable Vector Graphics (SVG) 1.0 Specification W3C Proposed Recommendation 19 July, 2001, URL:

The important points here are: "to define their own metadata schema but are encouraged to follow existing metadata standards ... better to define an additional metadata schema in an existing framework ...". In some ways the ease of defining metadata using new web-based tools has made the standardization process more difficult. If an existing definition is not quite right, it is very tempting to define a new standard and hope that some tool will allow conversion between the standards. However, having many standards is a similar problem to having no standards at all.
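To make the recommendation concrete, the sketch below places Dublin Core elements, wrapped in RDF, inside an SVG 'metadata' element and then reads them back with Python's xml.etree.ElementTree. The drawing and the title value are invented for illustration; only the three namespace URIs are the published ones for SVG, RDF and Dublin Core.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative document: the title and the rectangle are made up.
SVG_DOC = f"""<svg xmlns="{SVG_NS}" width="100" height="100">
  <metadata>
    <rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
      <rdf:Description rdf:about="">
        <dc:title>Example drawing</dc:title>
        <dc:creator>Tom Worthington</dc:creator>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <rect width="100" height="100"/>
</svg>"""

root = ET.fromstring(SVG_DOC)
metadata = root.find(f"{{{SVG_NS}}}metadata")
# Collect every Dublin Core element found anywhere inside <metadata>.
dc_values = {
    element.tag.split("}")[1]: element.text
    for element in metadata.iter()
    if element.tag.startswith(f"{{{DC_NS}}}")
}
print(dc_values)
```

Because the metadata reuses a standard schema, a program that knows nothing about this particular drawing can still find its title and creator.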


According to the official version of events, the Australian Government Locator Service (AGLS) metadata standard (discussed in detail later) was originally called "AUSGILS" and intended to be based on the U.S. Government Information Locator Service (GILS), but this was abandoned in favour of the Dublin Core metadata standard in 1997:

At the time of the IMSC it was thought that an Australian Government Locator Service would be a variant of the U.S. Government Information Locator Service (GILS). Consequently, for much of its gestation period what is now known as AGLS was referred to as AUSGILS. However, late last year when a workshop of experts convened to develop the AUSGILS standard it was decided to abandon the GILS framework and instead base the online locator service on the Dublin Core metadata standard.
From: Enabling Seamless Online Access to Government, Adrian Cunningham, National Archives of Australia, 26 August 1998, URL (archived copy):

However, this author's recollection differs. The proposed standard was first called "AGILS" in an earlier architecture proposal:

META tag of HTML is used in the header section of HTML documents. Example: <META NAME="Date" CONTENT="1966-01-12">. The field identifiers from the selected meta-data set is used in the NAME field and the field value in CONTENT. The set of meta-data definitions being used (the meta-meta-data) should be included in a tag. Example: <meta name="metadata" content="AGILS">.
From: Architecture For Access To Government Information, Report of the IMSC -Technical Group, Commonwealth of Australia, 25 July 1996, URL (archived copy):

This was done for political reasons, to suggest compatibility with the US Government standard. There was not necessarily any intention to achieve compatibility. The name was later shortened to AGLS.

Standards, Definitions and Dollars

Standards politics are very important to metadata and electronic document development in the real world. Few decisions are made based on the technical merits of proposals. There are few cases where metadata standards are developed from first principles. Selections are made from existing metadata standards, based on the level of support for those standards, and the perceived importance of the organisations and individuals supporting them. Standards are then adapted, extended, made into subsets or combined.

Thousands of millions of dollars of e-commerce and electronic publishing business depend on decisions about which standards to use. Previously separate standards for electronic commerce, documents and television are converging on the same format (XML). How rapidly and how effectively will this convergence happen?

One example of where standards for document formats and commercial interests collide is the Portable Document Format (PDF). Developed by Adobe as an extension to the PostScript format for desktop publishing, PDF has provided a popular electronic document format. However, PDF has a number of limitations as an on-screen format and for disabled users. Adobe have attempted to address these limitations with "Tagged Adobe PDF", which adds some XML interoperability to the PDF format.

Adobe Acrobat 5.0 software introduces tagged Adobe PDF, an enhancement to the PDF specification that allows PDF files to contain logical document structure. Logical structure refers to the organization of a document, such as the title page, chapters, sections, and subsections. Tagged Adobe PDF documents can be reflowed to fit small-screen devices and offer better support for repurposing content. They also are more accessible to the visually impaired.
From Adobe PDF, Adobe Systems Incorporated, 2001, URL:

However, additional work is needed by document creators to use these new features. There is also an inherent contradiction between one of PDF's original selling points, providing an accurate representation of a printed document, and the aim of the enhancement, which is to allow the representation to be transformed. Adobe are not the only ones struggling with this problem. One possible solution is's XML Packages format. This packages up XML documents and supplementary binary format data, such as images, in ZIP file format.

Dublin Core

Dublin Core (DC) is a metadata standards project originating from a workshop held in Dublin, Ohio, USA in 1995. The "Dublin Core" metadata element set is a small set of metadata definitions intended for cross-domain information resources. However, DC has its origins in the work of librarians and so tends to work better for describing printed text than for other items, such as video.

The intention with DC is to provide a brief standard set of essential metadata items for resources:

The Elements

  1. Element Name: Title
  2. Element Name: Creator
  3. Element Name: Subject
  4. Element Name: Description
  5. Element Name: Publisher
  6. Element Name: Contributor
  7. Element Name: Date
  8. Element Name: Type
  9. Element Name: Format
  10. Element Name: Identifier
  11. Element Name: Source
  12. Element Name: Language
  13. Element Name: Relation
  14. Element Name: Coverage
  15. Element Name: Rights

Reformatted from: Dublin Core Metadata Element Set, Version 1.1: Reference Description, DCMI, 2003-06-02, URL:

Encoding the date value

Metadata items may be free text fields, but will typically be constrained in some way. As an example, the element Date is recommended to be encoded using ISO 8601 (the International Standard for representation of dates). DC gives the example of the form YYYY-MM-DD.
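A constraint like this can be checked mechanically. The sketch below, in Python, accepts only the full YYYY-MM-DD form and also rejects impossible calendar dates. The function name parse_dc_date is hypothetical, and the check is deliberately stricter than ISO 8601 itself, which permits other forms such as a year on its own.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, separated by hyphens.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_dc_date(value):
    """Return a date for a strict YYYY-MM-DD string, or None otherwise."""
    match = DATE_PATTERN.fullmatch(value)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Correct shape but not a real calendar date, e.g. 2003-02-30.
        return None


for candidate in ("2003-12-31", "2003-2-1", "2003-02-30"):
    print(candidate, "->", parse_dc_date(candidate))
```

Note that a value such as '2003-2-1', as seen in the ANU home page example, would not pass this strict check because the month and day are not zero-padded.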

Controlled vocabulary

Some elements are recommended to be used with a value from a controlled vocabulary. As an example, for the Type element the DCMI Type Vocabulary includes "Event" and "Text":

Term Name: Event ...
Definition: An event is a non-persistent, time-based occurrence. Metadata for an event provides descriptive information that is the basis for discovery of the purpose, location, duration, responsible agents, and links to related events and resources. The resource of type event may not be retrievable if the described instantiation has expired or is yet to occur. Examples - exhibition, web-cast, conference, workshop, open-day, performance, battle, trial, wedding, tea-party, conflagration. ...
Term Name: Text ...
Definition: A text is a resource whose content is primarily words for reading. For example - books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre text. ...
From DCMI Type Vocabulary, DCMI Usage Board, 2003-02-12, URL:

Other examples of controlled vocabularies are the Internet Media Types (MIME), used to define computer media formats in the Format element, and language tags, such as "en-AU" for Australian English.
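Checking a value against a controlled vocabulary amounts to a membership test. The Python sketch below uses only the two Type terms quoted above, so the set is deliberately incomplete; the names DCMI_TYPE_EXCERPT and check_type are hypothetical.

```python
# Only the two terms quoted above are listed. The full DCMI Type Vocabulary
# contains more terms, so this set is an excerpt, not the standard.
DCMI_TYPE_EXCERPT = {"Event", "Text"}


def check_type(value, vocabulary=DCMI_TYPE_EXCERPT):
    """Return True if value is exactly one of the vocabulary terms."""
    return value in vocabulary


for candidate in ("Text", "text", "Wedding"):
    verdict = "accepted" if check_type(candidate) else "rejected"
    print(f"{candidate}: {verdict}")
```

The comparison is case-sensitive by design in this sketch: "text" is rejected, and so is "Wedding", which is an example of an Event rather than a term in the vocabulary.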

Australian Digital Theses Program

The Australian Digital Theses Program provides a database of digitised theses produced at Australian universities. Authors at ANU use a deposit form, the data from which is expressed as DC metadata and made available via a search facility.

Other Dublin Core Projects are listed at URL:

Australian Government Locator Service (AGLS)

The Australian Government Locator Service (AGLS) metadata standard is a set of 19 descriptive elements to improve the visibility and accessibility of services and information over the Internet. The AGLS standard is based on the 15 Dublin Core elements, plus four extra elements:





Function: the business function to which the resource relates.

<META NAME="AGLS.Function" CONTENT="School Education">

Availability: how the resource can be obtained or accessed, or contact information.

<META NAME="AGLS.Availability" CONTENT="Medical assistance is available by contacting the after hours hotline on 1800 123456">

Audience: the target audience of the resource.

Mandate: a specific legal instrument which requires a resource to be created or made available.


Compiled from AGLS Metadata Element Set, Part 2: Usage Guide, Version 1.3, National Archives of Australia, 2002, URL:

No elements are mandatory for DC, but AGLS requires five (or six) of the following:

From: AGLS Metadata Element Set, Part 2: Usage Guide, Version 1.3 , National Archives of Australia, 2002, URL:


Qualifiers are used to restrict the semantics of the relationship between the resource and the element value. AGLS encourages more use of qualifiers than DC, but does not require it:

Qualifiers are additions and extensions to the metadata elements that give metadata creators the option to refine the semantics of the element set, and add precision to the values of the metadata elements. For example, it may be useful to indicate that the value has been selected from a particular controlled vocabulary, such as a list of keywords, or is encoded using a particular convention – the format for dates is an important case – or in a particular natural language.
From: AGLS Metadata Element Set, Part 2: Usage Guide, Version 1.3 , National Archives of Australia, 2002, URL:

AGLS uses two types of qualifiers:

  1. Element refinements are represented in HTML <meta> syntax with qualifiers appended to the element names. For example: "DC.Type.documentType". Note that the "T" in "Type" in the example is in upper case, whereas the "d" of "documentType" is not. This is a somewhat odd practice in DC.

  2. Encoding schemes indicate how the value is to be interpreted if it has been chosen from a controlled vocabulary, or externally defined standard. For example:

<META NAME="DC.Date.modified" SCHEME="ISO8601" CONTENT="1998-08-27">.
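The dotted names used in these examples can be taken apart programmatically. The following Python sketch splits a NAME value into its prefix, element and optional refinement. The QualifiedName type and split_name function are hypothetical, and the sketch assumes that names have exactly two or three dot-separated parts, as in the examples in this section.

```python
from typing import NamedTuple, Optional


class QualifiedName(NamedTuple):
    prefix: str                # "DC" or "AGLS"
    element: str               # e.g. "Date"
    refinement: Optional[str]  # e.g. "modified", or None if unqualified


def split_name(name):
    """Split a dotted META NAME value into prefix, element and refinement."""
    parts = name.split(".")
    if len(parts) == 2:
        return QualifiedName(parts[0], parts[1], None)
    if len(parts) == 3:
        return QualifiedName(parts[0], parts[1], parts[2])
    raise ValueError(f"unexpected metadata name: {name!r}")


print(split_name("DC.Date.modified"))
print(split_name("AGLS.Function"))
```

The encoding scheme is carried separately, in the SCHEME attribute, so it does not appear in the name at all.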

Metadata Tools

Metadata is rarely entered by the document author typing in text. When encoded in the header of a HTML document, the metadata is not displayed by a web browser. Specialized software, such as content management systems, or features in word processors, are used to enter and display the metadata. The user of the system is likely to be unaware that they are using a metadata standard, or of how it is encoded. Examples of these systems will be shown later.

The Distributed Systems Technology Centre (DSTC Pty Ltd) has produced a metadata tool, Reggie, to create AGLS and Dublin Core metadata. Reggie can be used to generate AGLS metadata syntax. This would be too cumbersome for creating real metadata, but is a useful way to learn about the process.
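The core of what such a tool does, turning field values into HTML syntax, can be sketched in a few lines of Python. This is not Reggie's implementation or output format; the function name meta_tag and the sample record are assumptions made for illustration, following the <META> syntax shown earlier.

```python
from html import escape


def meta_tag(name, content, scheme=None):
    """Build one HTML <META> tag, escaping quotes and markup in the values."""
    attributes = [f'NAME="{escape(name, quote=True)}"']
    if scheme is not None:
        attributes.append(f'SCHEME="{escape(scheme, quote=True)}"')
    attributes.append(f'CONTENT="{escape(content, quote=True)}"')
    return "<META " + " ".join(attributes) + ">"


# Illustrative record: the values are examples, not official metadata.
record = [
    ("DC.Creator", "Tom Worthington", None),
    ("DC.Title", "Metadata for Publishing", None),
    ("DC.Date.modified", "1998-08-27", "ISO8601"),
]
for name, content, scheme in record:
    print(meta_tag(name, content, scheme))
```

Escaping the values matters because a quotation mark or apostrophe inside a description would otherwise end the attribute early and corrupt the tag.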

Further Information

Copyright © Tom Worthington 2000 - 2003