Metadata and Electronic Document Management for Electronic Commerce
Metadata for Publishing
Version of 2 August 2008
This item on "Metadata for Publishing" is the second of a segment on "Metadata and Electronic Document Management for Electronic Commerce" first presented for the Australian National University course "Information Technology in Electronic Commerce" (COMP3410/COMP6341).
This document is intended both for live group presentation and as accompanying lecture notes for individual use. The slides and these notes are provided in a single HTML document, using HTML Slidy.
metadata n., a set of data that describes and gives information about other data...
[1968 Proc. IFIP 4th Congr.: Suppl. 10 I. 113/2 There are categories of information about each data set as a unit in a data set of data sets, which must be handled as a special meta data set.] 1987 Philos. Trans. Royal Soc. A. 322 373 The challenge is to accumulate data..from diverse sources, convert it to machine-readable form with a harmonized array of *metadata descriptors and present the resulting database(s) to the user. 1998 New Scientist 30 May 35/2 With XML, attaching metadata to a document is easy, at least in theory.
Metadata can be described simply as "Data about Data". For example, the "creator" of this document is "Tom Worthington". The data is "Tom Worthington" and the metadata is "creator".
Metadata is essential for e-commerce, as it provides standard data items to allow parties to communicate about their organisations, products, terms and conditions. In an electronic transaction, the payment and the "money" itself consist of data in an agreed metadata format. Without suitable metadata standards, e-commerce could not take place and "money" in our online financial systems would cease to exist.
Metadata can also be used to describe published documents. The use of metadata for e-commerce and for publishing has converged in the last few years with the use of the same XML technology for both applications.
Australian Government Metadata
<meta name="DC.Publisher" scheme="X500" content="ou=Australian Government Information Management Office (AGIMO) ; o= Commonwealth of Australia ; c=AU">
<meta name="DC.Description" content="The australia.gov.au website is your connection with government in Australia...">
<meta name="DC.Subject" scheme="TAGS" content="Government information; Federal government; Government services; Government publications; Web sites">
<meta name="DC.Type.documentType" scheme="agls-document" content="homepage">
This metadata is from the Australian Government home page. It was intended that data in this format would be inserted into the HEAD of all government web pages, to aid data retrieval.
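As a rough illustration of how such tags might be produced by software rather than typed by hand, the following Python sketch renders a list of records as HTML meta elements. The function name dc_meta_tags and the shortened record values are assumptions made for this example, not part of any government tool.

```python
from html import escape


def dc_meta_tags(records):
    """Render (name, scheme, content) triples as HTML <meta> elements.

    A scheme of None means the element has no scheme attribute.
    """
    lines = []
    for name, scheme, content in records:
        scheme_attr = f' scheme="{escape(scheme, quote=True)}"' if scheme else ""
        lines.append(
            f'<meta name="{escape(name, quote=True)}"{scheme_attr} '
            f'content="{escape(content, quote=True)}">'
        )
    return "\n".join(lines)


# Illustrative records, shortened from the home page example above.
records = [
    ("DC.Subject", "TAGS", "Government information; Federal government"),
    ("DC.Type.documentType", "agls-document", "homepage"),
]
head_fragment = dc_meta_tags(records)
print(head_fragment)
```

Escaping the attribute values matters here because descriptions often contain quotes or ampersands, which would otherwise break the HTML.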
The challenge is to create formats which are sufficiently expressive to be able to communicate what is needed, but simple enough to be implemented efficiently.
Creating and using metadata standards is both a technical and political process. Most standards need to be profiled, to create a workable subset, before they can be used for practical purposes. Some standards need to be enhanced and others should not be used at all.
Tax Office e-commerce transaction
<FORM_PERIOD_LABEL_TEXT>July to September 2001</FORM_PERIOD_LABEL_TEXT>
<EFT_CODE> 51111 121 059 9059</EFT_CODE>
<GST_LABEL_TEXT>for the QUARTER from 1 Jul 2001 to 30 Sep 2001</GST_LABEL_TEXT>
Here is an example of an e-commerce transaction. This is an Australian Taxation Office electronic tax form for the Goods and Services Tax (GST). This is a different use of metadata, for defining the data in a financial transaction.
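A receiving system could read such a form with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree on the three elements shown above. The fragment has no enclosing element, so a wrapper named FORM is added here purely as an assumption to make it well-formed; the real form's root element is not shown in these notes.

```python
import xml.etree.ElementTree as ET

fragment = """<FORM_PERIOD_LABEL_TEXT>July to September 2001</FORM_PERIOD_LABEL_TEXT>
<EFT_CODE> 51111 121 059 9059</EFT_CODE>
<GST_LABEL_TEXT>for the QUARTER from 1 Jul 2001 to 30 Sep 2001</GST_LABEL_TEXT>"""

# "FORM" is a hypothetical wrapper so the fragment parses as one document.
root = ET.fromstring(f"<FORM>{fragment}</FORM>")

# The element names are the metadata; the element text is the data.
fields = {child.tag: (child.text or "").strip() for child in root}

for name, value in fields.items():
    print(f"{name}: {value}")
```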
Scalable Vector Graphics Metadata Definition
<!ENTITY % metadataExt "" >
<!ELEMENT metadata (#PCDATA %metadataExt;)* >
<!ATTLIST metadata %stdAttrs; >
The World Wide Web Consortium (W3C) standard for Scalable Vector Graphics (SVG) provides a way to define images in web pages. As well as the expected features of shapes, filling, symbols, colours and patterns, there is the 'metadata' element.
Scalable Vector Graphics Metadata Explanation
Individual industries or individual content creators are free to define their own metadata schema but are encouraged to follow existing metadata standards and use standard metadata schema wherever possible to promote interchange and interoperability. If a particular standard metadata schema does not meet your needs, then it is usually better to define an additional metadata schema in an existing framework such as RDF and to use custom metadata schema in combination with standard metadata schema, rather than totally ignore the standard schema.
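One way to follow this advice is to place Dublin Core elements, wrapped in RDF, inside the SVG metadata element. The Python sketch below builds such a file with the standard library. The drawing and the title and creator values are invented for illustration; the three namespace URIs are the published ones for SVG, RDF and the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Register readable prefixes so the output uses rdf: and dc: names.
for prefix, uri in (("", SVG), ("rdf", RDF), ("dc", DC)):
    ET.register_namespace(prefix, uri)

svg = ET.Element(f"{{{SVG}}}svg", {"width": "100", "height": "100"})

# Standard metadata schema (Dublin Core) in an existing framework (RDF).
metadata = ET.SubElement(svg, f"{{{SVG}}}metadata")
rdf = ET.SubElement(metadata, f"{{{RDF}}}RDF")
description = ET.SubElement(rdf, f"{{{RDF}}}Description")
ET.SubElement(description, f"{{{DC}}}title").text = "Example drawing"
ET.SubElement(description, f"{{{DC}}}creator").text = "Tom Worthington"

# The image content itself: a single circle.
ET.SubElement(svg, f"{{{SVG}}}circle", {"cx": "50", "cy": "50", "r": "40"})

svg_text = ET.tostring(svg, encoding="unicode")
print(svg_text)
```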
Simple definition politically complex
The ease of defining metadata using new web-based tools has made standardization more difficult. It is technically simple to define a new standard if an existing definition is not quite right. However, having many standards is as much a problem as having no standards at all.
AUSGILS to AGLS
At the time of the IMSC it was thought that an Australian Government Locator Service would be a variant of the U.S. Government Information Locator Service (GILS). Consequently, for much of its gestation period what is now known as AGLS was referred to as AUSGILS. However, late last year when a workshop of experts convened to develop the AUSGILS standard it was decided to abandon the GILS framework and instead base the online locator service on the Dublin Core metadata standard.
According to this official version of events, the Australian Government Locator Service (AGLS) metadata standard (discussed later) was originally called "AUSGILS" and intended to be based on the U.S. Government Information Locator Service (GILS), but this was abandoned in favour of the Dublin Core metadata standard in 1997. However, the proposed standard was first called "AGILS" in an earlier architecture proposal:
META tag of HTML is used in the header section of HTML documents. Example: <META NAME="Date" CONTENT="1966-01-12">. The field identifiers from the selected meta-data set is used in the NAME field and the field value in CONTENT. The set of meta-data definitions being used (the meta-meta-data) should be included in a tag. Example: <meta name="metadata" content="AGILS">.
This was done for political reasons, to suggest compatibility with the US Government standard. The name was later shortened to AGLS.
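The META convention quoted above is simple enough to be read back by a few lines of software. The following Python sketch collects name and content pairs from a page using the standard html.parser module. The class name MetaCollector and the sample page are assumptions for this example; the two META tags are the ones from the quoted proposal.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            self.meta[name] = attributes.get("content", "")


page = """<html><head>
<META NAME="Date" CONTENT="1966-01-12">
<meta name="metadata" content="AGILS">
</head><body></body></html>"""

collector = MetaCollector()
collector.feed(page)
print(collector.meta)
```

The second tag is the "meta-meta-data": a program could use its value to decide which element set the other names belong to.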
Standards, Definitions and Dollars
Adobe Acrobat 5.0 software introduced tagged Adobe PDF, an enhancement to the PDF specification that allows PDF files to contain logical document structure. Logical structure refers to the organization of a document, such as the title page, chapters, sections, and subsections. Tagged Adobe PDF documents can be reflowed to fit small-screen devices and offer better support for repurposing content. They also are more accessible to the visually impaired.
Standards politics are important to metadata and electronic document development in the real world. Standards are selected based on the importance of the organisations and individuals supporting them, not technical merit. Standards are then adapted, extended, made into subsets or combined.
E-commerce and electronic publishing depend on decisions made on what standards to use. Previously separate standards for electronic commerce, documents and television are converging to use the same format (XML).
One example of where standards for document formats and commercial interests collide is the Portable Document Format (PDF). Developed by Adobe as an extension to the PostScript format for desktop publishing, PDF has provided a popular electronic document format. However, PDF has a number of limitations as an on-screen format and for disabled users. Adobe have attempted to address these limitations with "Tagged Adobe PDF", which added some XML interoperability to the PDF format.
However, additional work is needed by document creators to use these features. There is also an inherent contradiction between one of PDF's original selling points, providing an accurate representation of a printed document, and the aim of the enhancement, allowing that representation to be transformed. Adobe are not the only ones struggling with this problem. One possible solution is OpenOffice.org's XML Packages format. This packages up XML documents and supplementary binary format data, such as images, in ZIP file format.
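The packaging idea can be sketched in Python with the standard zipfile module: XML parts and binary parts are stored side by side in one ZIP archive. The member names and their contents below are illustrative assumptions only and do not reproduce the actual OpenOffice.org package layout.

```python
import io
import zipfile

# Build the package in memory; a real application would write a file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
    # XML parts: the document text and its metadata (hypothetical names).
    package.writestr("content.xml", "<document><p>Hello</p></document>")
    package.writestr("meta.xml", "<meta><creator>Tom Worthington</creator></meta>")
    # A binary part, standing in for an image.
    package.writestr("Pictures/logo.bin", bytes([0x89, 0x50, 0x4E, 0x47]))

# Reopen the archive and read the parts back.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as package:
    names = package.namelist()
    content = package.read("content.xml").decode("utf-8")

print(names)
print(content)
```

Because the XML parts remain ordinary XML once unzipped, they can be transformed for different displays, while the binary parts are carried along unchanged.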
- Title: Typically, Title will be a name by which the resource is formally known.
- Creator: Examples of Creator include a person, an organization, or a service. ...
- Subject: ... keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.
- Description: ... an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content. ...
Dublin Core (DC) is a metadata standards project originating from a workshop held in Dublin, Ohio, USA in 1995. The "Dublin Core" metadata element set is a small set of metadata definitions intended for cross-domain information resources. However, DC has its origins in the work of librarians and so tends to work better for describing printed text than for other items, such as video.
The intention with DC is to provide a brief standard set of essential metadata items for resources: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights.
Other examples of controlled vocabularies are the Internet Media Types (MIME), used to define computer media formats in the Format element, and language tags, such as "en-AU" for Australian English.
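Using a controlled vocabulary means the value can be looked up rather than invented. As a small sketch, Python's standard mimetypes module can supply the Internet Media Type for a file name, which could then be used as the value of the Format element. The function names are assumptions for this example, and the scheme label "IMT" is assumed here to be the usual Dublin Core name for the Internet Media Type encoding scheme.

```python
import mimetypes


def dc_format(filename):
    """Return the Internet Media Type guessed from the file extension."""
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type


def dc_format_tag(filename):
    """Render a Format element, refusing files of unknown type."""
    media_type = dc_format(filename)
    if media_type is None:
        raise ValueError(f"no known media type for {filename!r}")
    # "IMT" is assumed to be the scheme label for Internet Media Types.
    return f'<meta name="DC.Format" scheme="IMT" content="{media_type}">'


print(dc_format_tag("thesis.pdf"))
print(dc_format_tag("index.html"))
```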
Australian Digital Theses Program
Dublin Core metadata will be automatically generated out of the ADT Deposit form. This metadata will form the basis of the database of distributed digitised theses across the 7 participating institutions. ...
<meta name="DC.language" scheme="RFC3066" content="en">
*** English will be the default language. In order to add another language the Deposit form will need to be amended to add another field. As theses will be predominantly in English, this will remain the default and the issue of other languages and the appropriate scheme to use will be investigated at a future date if necessary.
Other Dublin Core Projects are listed at URL: http://dublincore.org/projects/subject.shtml
The Australian Digital Theses Program provides a database of digitised theses produced at Australian universities. Authors at ANU use a deposit form, the data from which is expressed as DC metadata and made available via a search facility.
Element and example:
- Function: <META NAME="AGLS.Function" CONTENT="School Education">
- Availability: <META NAME="AGLS.Availability" CONTENT="Medical assistance is available by contacting the after hours hotline on ...">
- Audience: <agls:audience>anglers</agls:audience>
- Mandate: <META NAME="AGLS.Mandate.case" SCHEME="URI" CONTENT="http://...">
The Australian Government Locator Service (AGLS) metadata standard is a set of 19 descriptive elements to improve the visibility and accessibility of services and information over the Internet. The AGLS standard is based on the 15 Dublin Core elements, plus four extra elements: Function, Availability, Audience and Mandate.
AGLS Mandatory Elements
- Publisher (note: this element is not mandatory for descriptions of services)
- Subject OR Function
- Identifier OR Availability
No elements are mandatory for DC, but AGLS requires five (or six) of them.
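Rules of the "A OR B" kind are easy to check by program. The Python sketch below tests a metadata record against only the three rules listed above; it is not a complete AGLS validator, and any other mandatory elements are deliberately left out. The function name and the sample records are assumptions for this example.

```python
# Each rule is a tuple of alternatives; at least one must be present.
# Only the rules listed above are covered; this is not the full AGLS set.
LISTED_RULES = [
    ("Publisher",),
    ("Subject", "Function"),
    ("Identifier", "Availability"),
]


def missing_rules(record, is_service=False):
    """Return the listed rules that the record does not satisfy."""
    missing = []
    for alternatives in LISTED_RULES:
        # Publisher is not mandatory for descriptions of services.
        if is_service and alternatives == ("Publisher",):
            continue
        if not any(record.get(name) for name in alternatives):
            missing.append(" OR ".join(alternatives))
    return missing


document = {"Publisher": "AGIMO", "Function": "School Education"}
service = {"Subject": "Medical assistance", "Availability": "After hours hotline"}

print(missing_rules(document))
print(missing_rules(service, is_service=True))
```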
Qualifiers are additions and extensions to the metadata elements that give metadata creators the option to refine the semantics of the element set, and add precision to the values of the metadata elements. For example, it may be useful to indicate that the value has been selected from a particular controlled vocabulary, such as a list of keywords, or is encoded using a particular convention - the format for dates is an important case - or in a particular natural language.
Qualifiers are used to restrict the semantics of the relationship between the resource and the element value. AGLS encourages more use of qualifiers than DC, but does not require it.
Element refinements are represented in HTML <meta> syntax with qualifiers appended to the element names. For example: "DC.Type.documentType". Note that the "T" in "Type" in the example is in upper case, whereas the "d" of "document" is not. This is a somewhat odd practice in DC.
Encoding schemes indicate how the value is to be interpreted if it has been chosen from a controlled vocabulary, or externally defined standard. For example:
<META NAME="DC.Date.modified" SCHEME="ISO8601" CONTENT="1998-08-27">
AGLS uses two types of qualifiers: element refinements and encoding schemes.
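Both kinds of qualifier can be handled mechanically. The Python sketch below splits a qualified name into its element set, element and refinement, and uses the encoding scheme to decide how to check the value. The function names are assumptions for this example, and only the plain YYYY-MM-DD form of ISO 8601 is checked, which is a small subset of that standard.

```python
from datetime import date


def split_name(name):
    """Split 'DC.Date.modified' into ('DC', 'Date', 'modified')."""
    parts = name.split(".", 2)
    if len(parts) < 2:
        raise ValueError(f"expected 'Set.Element', got {name!r}")
    refinement = parts[2] if len(parts) > 2 else None
    return parts[0], parts[1], refinement


def check_value(scheme, content):
    """Check a value against its encoding scheme, where one is known."""
    if scheme == "ISO8601":
        try:
            # Only the YYYY-MM-DD calendar form is checked here.
            date.fromisoformat(content)
        except ValueError:
            return False
    # Unknown or absent schemes are accepted without checking.
    return True


print(split_name("DC.Date.modified"))
print(check_value("ISO8601", "1998-08-27"))
print(check_value("ISO8601", "27/08/1998"))
```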
This is a demonstration of DSTC's Reg metadata editor. Reg allows you to:
- enter metadata
- export metadata in a number of syntaxes
- save metadata records to a test repository
- reload metadata records from a repository for editing
Reg uses metadata schemas to customize itself for different metadata element sets. ...
Metadata is rarely entered by the document author typing in text. When encoded in the header of an HTML document, the metadata is not displayed by a web browser. Specialized software, such as a content management system, or features in word processors are used to enter and display the metadata. The user of the system is likely to be unaware they are using a metadata standard or how it is encoded. Examples of these systems will be shown later.
The Distributed Systems Technology Centre (DSTC Pty Ltd) has produced a metadata tool to create AGLS and Dublin Core metadata. Reg can be used to generate AGLS metadata syntax. This would be too cumbersome for creating real metadata, but is a useful way to learn about the process.