Metadata and Electronic Document Management for Electronic Commerce

Electronic Publishing

Tom Worthington

Version of 12 August 2008

This item on "Electronic Publishing" is one of a segment on "Metadata and Electronic Document Management for Electronic Commerce" first presented for the Australian National University course "Information Technology in Electronic Commerce" (COMP3410/COMP6341).

This document is intended both for live group presentation and as accompanying lecture notes for individual use. The slides and these notes are provided in one HTML document, using HTML Slidy.

Electronic Publishing

The Australian Computer Society (ACS) has been publishing for thirty years in support of its mission and objects.

  • Mission
    • To advance professional excellence in information technology.
  • Principal Object
    • To promote the development of Australian information and communications technology resources.
  • Secondary Objects
    • To advance professional excellence in information and communications technology.
    • To further the study, science and application of information and communications technology. ...

"The ACS", Australian Computer Society, 2003, URL:

How and why might the ACS do scholarly publishing online?

Petition from Public Library of Science

We believe that the permanent, archival record of scientific research and ideas should neither be owned nor controlled by publishers, but should belong to the public, and should be made freely available.

We support the establishment of international online public libraries of science that contain the complete text of all published scientific articles in searchable and interlinked formats.

From: "Open Letter", Public Library of Science, Patrick O. Brown and Michael Eisen, 2001, URL:

The question of how free and open access to scholarly research should be came to prominence with a 2001 petition from the Public Library of Science. An advocacy group made up of 11 people from US-based academic institutions and one from a UK institution proposed the establishment of international online public libraries of science with the complete text of all published scientific articles.

The group claimed that 26,144 researchers from 170 countries signed the open letter urging publishers to allow research reports from their journals to be publicly available. The web site for the group is maintained by Patrick O. Brown of Stanford University School of Medicine and the Howard Hughes Medical Institute, and Michael Eisen of the Lawrence Berkeley National Lab and University of California at Berkeley.

There was no subsequent boycotting of traditional publishers, but there has been a gradual change in the way research publishing is done. An interesting question is the position of information technology researchers on the issue, given their role in creating the technology used for electronic publishing.

E-Publishing at ACS

The Australian Computer Society (ACS) publishes:

Some editions of some publications are made available free on-line in PDF or web format. However, there was no overall digital library. The ACS took several years considering publishing strategies, including e-publishing.

Open Archives Initiative

Digital Library Federation Encourages Use of Open Archives Initiative

The Digital Library Federation (DLF) is supporting the development of a small number of Internet gateways through which users will access distributed digital library holdings as if they were part of a single uniform collection. The gateways will be built using the OAI Metadata Harvesting Protocol. DLF gateways will contribute to a practical evaluation of the OAI's harvesting technique and its application within libraries to encourage digital collection managers to expose metadata and build services.

From: Open Archives Initiative, URL:, 2001

Activities such as the Open Archives Initiative are attempting to construct a virtual library of material using distributed document archives and shared metadata.
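As a rough illustration of the harvesting approach, the sketch below builds a request URL and pulls identifiers and titles out of a response. It assumes version 2.0 of the OAI Protocol for Metadata Harvesting with unqualified Dublin Core (the "oai_dc" prefix), which postdates the 2001 quotation above. The repository address, the sample response and the function names are hypothetical, and no network access is attempted.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by OAI-PMH 2.0 responses carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response from an archive, trimmed to the parts used below.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Paper One</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Paper Two</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for one archive."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def harvest(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    records = []
    for record in root.iter("{%s}record" % NS["oai"]):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    return records


if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    for identifier, title in harvest(SAMPLE_RESPONSE):
        print(identifier, title)
```

A gateway of the kind described would repeat this for each participating archive and merge the harvested records into one searchable index.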

Organisations now considering electronic publications strategies can use an integrated approach using newer XML tools to create and maintain content. The ACS had a tradition of providing the content of its journal free for non-profit use. This was extended into an electronic edition in a format suitable for direct citation and annotation with metadata in a format suitable for harvesting by specialised virtual library tools as well as traditional web search engines. The content was made available for education.

ACM Digital Library

The Association for Computing Machinery (ACM) is a professional society that publishes research journals and magazines in computer science. It also organizes a wide variety of conferences, many of which publish proceedings. ACM is typical of the publishers that have moved rapidly into electronic publication of conventional journals. In 1993, the ACM decided that its future publication process would be a computer system that creates a database of journal articles, conference proceedings, magazines and newsletters, all marked up in SGML. Subsequently, ACM also decided to convert large numbers of its older journals and build a digital library covering its publications from 1985. The digital library will eventually extend back to ACM's foundation in 1948.

From: "Preservation of Scientific Serials: Three Current Examples", WILLIAM Y. ARMS, The Journal of Electronic Publishing, December 1999, Volume 5, Issue 2, ISSN 1080-2711

A pioneer of e-publishing for IT has been the Association for Computing Machinery (ACM). The ACM collection was made available on-line in 1997 and the web interface allows the contents pages of the journals to be browsed and metadata searched. New content was created in SGML, then web, PDF and print versions were generated from it. The online service is available by paid subscription to members, non-members and institutions, or by per-article sales. The service has proved popular and ACM is considering discontinuing some print titles.

ACM journals accept articles in a number of electronic formats using supplied templates. The PDF versions of documents generated are close in format to the print editions, but the HTML versions use a different format more suited to on-line viewing. Graphics are shown as small thumbnail versions, with links to high resolution versions.

E-publishing not easy

... the current track 1 production process:

1. The paper is received from EIC, and is logged into the system.

2. The paper is converted from whatever original format into SGML (requires intervention). For mathematics, ACM requires that minimum customization be inserted into LaTeX. ....

3. The SGML is copy edited (by the managing editors). ... email notification to the lead author to let them know to expect a galley in one week and that they will have 48 hours to respond to the galley. ...

From: "Minutes of the Publications Board Meeting", ACM, May 5, 2000, URL:

E-publishing still problematic

Some of these issues were to do with limitations in electronic publishing software, which are still apparent today:

4. The reference section is created separately from the SGML file because it has to be citation-linked ....

5. ... Proof is sent to the author before any tweaking takes place. After feedback from the author, layout is tweaked ...

6. Problems in layout: tables with multiple columns which have different widths (the auto-table generator makes all columns of equal widths, so these must be tweaked by hand during composition).

7. Illustrations and figures are processed separately. If received figures are in TIF or EPS, they can be electronically processed and inserted during composition. Many times, the EPS file is non-standard ...

From: "Minutes of the Publications Board Meeting", ACM, May 5, 2000, URL:


Given the rapid development of XML, it was considered better for the ACS to wait until the technology was more widely available, rather than implement an SGML/PDF system which would then have to be replaced.

IEEE Xplore, the online delivery system for all the IEEE's journals, magazines, conference proceedings, and standards, is now bigger and better than ever, thanks to its latest release, launched in December. ...

Another enhancement is full-text HTML formats for issues of IEEE Spectrum and Proceedings of the IEEE going back to January 2002. PDF versions are still available, but articles presented in HTML are easier to navigate, Williams says.

From: "Upgrade Makes IEEE Xplore Easier to Explore", ERICA VONDERHEID, IEEE, 23 February 2004

The minutes of the ACM Publications Board show the considerable complexities and manual processing steps which had to be automated.

JRPIT PDF example

JRPIT is published in a relatively efficient PDF format (only 39 kbytes for a 10 page paper with one photo).

PDF example

"The Future of Open Source Software", Bill Appelbe, JRPIT, Volume 35, No. 4, 2003, URL:


Zooming in far enough to read the text results in lines dropping off the right-hand side of the screen:

Detail from PDF example

Detail from "The Future of Open Source Software", Bill Appelbe, JRPIT, Volume 35, No. 4, 2003, URL:

Using OpenOffice.org to Translate Documents

Template Translation to OpenOffice XML Format

Template          instructions.ms_word        TRANS-JOUR.DOC
Size (Kbytes)     23                          111
Converted File    instructions.ms_word.sxw    TRANS-JOUR.sxw
Size (Kbytes)     10                          39
XML               icontent.xml                tjcontent.xml

An example of a document converted using OpenOffice.org is "ICT Development in Australia - A Strategic Policy Review", prepared for the Australian Computer Society by Professor Houghton. The web adaptation of the report was created from the MS-Word version. This was done by first importing the MS-Word document into OpenOffice.org and saving it as HTML. The HTML was run through the "Tidy" utility to replace formatting commands throughout the document with styles. The table of contents was then manually re-linked to the document sections and ALT text was placed on images.
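The last manual step, placing ALT text on images, is easy to miss on a long report. The following is a minimal sketch of a check for it, using only the Python standard library; the class name, function name and sample markup are hypothetical and are not part of the process used for the report.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> element that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            # An empty alt="" is allowed for decorative images, so only a
            # completely absent attribute is reported.
            if "alt" not in attributes:
                self.missing.append(attributes.get("src", ""))


def images_missing_alt(html):
    """Return the src values of images in the HTML text that lack ALT text."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


if __name__ == "__main__":
    sample = '<p><img src="chart.png" alt="Chart"><img src="logo.png"></p>'
    print(images_missing_alt(sample))
```

Run over the converted HTML, a check like this lists the images that still need a description before the page is published.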

Using OpenOffice.org to produce HTML has limitations. A better approach may be to use OpenOffice's internal XML format as an intermediate format. This retains more information about the original MS-Word document than is present in an HTML translation.

As an example the Microsoft Word Style Files for ACM Journals and IEEE Transactions were converted to OpenOffice format:

An OpenOffice file is stored as a ZIP-compressed archive containing a directory of files. The text of the word processing document is stored in a file named "content.xml" in the archive. Images and other binary files are stored in sub-directories.
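Because the container is an ordinary ZIP archive, the document text can be extracted with general-purpose tools. The sketch below uses Python's standard zipfile module. To stay self-contained it builds a small stand-in archive in memory rather than opening a real converted template; the member name "Pictures/figure1.png" and the function names are assumptions for illustration.

```python
import io
import zipfile


def make_sample_archive():
    """Build a tiny in-memory stand-in for a converted word processing file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(
            "content.xml",
            '<text:p text:style-name="Title">This Is the Title of the Paper</text:p>',
        )
        # Placeholder bytes standing in for an embedded image.
        archive.writestr("Pictures/figure1.png", b"not a real image")
    buffer.seek(0)
    return buffer


def list_members(document):
    """Return the names of the files stored inside the archive."""
    with zipfile.ZipFile(document) as archive:
        return archive.namelist()


def read_content_xml(document):
    """Return the text of content.xml from a file path or file-like object."""
    with zipfile.ZipFile(document) as archive:
        return archive.read("content.xml").decode("utf-8")


if __name__ == "__main__":
    sample = make_sample_archive()
    print(list_members(sample))
    sample.seek(0)
    print(read_content_xml(sample))
```

Passing the path of a real converted file, such as TRANS-JOUR.sxw, to read_content_xml should work the same way, since zipfile accepts either a path or a file-like object.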

WP Styles Translate to XML


<text:p text:style-name="Title">This Is the Title of the Paper</text:p>


<text:p text:style-name="Primary Head">1. INTRODUCTION </text:p>

From: " Publishing Options for Academic Material", Tom Worthington, 2002: URL:

XML Document Formats

Styles from the original style sheet are reflected in text styles in the translated XML documents.
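As a minimal sketch of how those text styles can be read back out of the translated XML, the example below lists each paragraph with its style name. The two paragraphs are the fragments shown on the slide; the wrapper elements and the namespace URIs are assumptions about the OpenOffice 1.x format, and ODF files declare different URIs, so the constant would need changing for them.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the "text:" prefix in OpenOffice 1.x files.
TEXT_NS = "http://openoffice.org/2000/text"

# The slide's two paragraphs, wrapped in a hypothetical minimal document.
SAMPLE_CONTENT = f"""<office:document-content
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:text="{TEXT_NS}">
  <office:body>
    <text:p text:style-name="Title">This Is the Title of the Paper</text:p>
    <text:p text:style-name="Primary Head">1. INTRODUCTION </text:p>
  </office:body>
</office:document-content>"""


def paragraphs_by_style(content_xml):
    """Return (style name, paragraph text) pairs in document order."""
    root = ET.fromstring(content_xml)
    paragraph_tag = f"{{{TEXT_NS}}}p"
    style_attribute = f"{{{TEXT_NS}}}style-name"
    pairs = []
    for paragraph in root.iter(paragraph_tag):
        text = "".join(paragraph.itertext()).strip()
        pairs.append((paragraph.get(style_attribute), text))
    return pairs


if __name__ == "__main__":
    for style, text in paragraphs_by_style(SAMPLE_CONTENT):
        print(f"{style}: {text}")
```

A publishing pipeline could use the style names to map "Title" and "Primary Head" paragraphs onto the corresponding heading elements in a web version.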

A modified version of the OpenOffice format developed by Sun Microsystems was adopted as an OASIS Standard on May 1, 2005. This "Open Document Format" (ODF) was adopted as international standard ISO/IEC 26300:2006 on May 3, 2006.

Microsoft's Office Open XML format (OOXML) has similar features to ODF and is Draft International Standard 29500.

Both ODF and OOXML suffer from being derived from legacy word processing packages. A better alternative would be to use XHTML 2 and new CSS standards.

Progress on Publishing Systems

  1. Arrow Discovery Service with metadata for scholars
  2. ACS Digital Library
  3. IFIP Digital Library

While the formats for publishing have been controversial, progress has been made on the metadata for publishing systems. The ACS has produced a Digital Library system which implements metadata for scholars via services such as the Arrow Discovery Service. A similar system is being implemented at the ANU for IFIP in 2008: IFIP Digital Library.

© Tomw Communications Pty Ltd 2006 - 2008

Creative Commons License
Metadata and Electronic Document Management for Electronic Commerce by Tom Worthington is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.5 Australia License.