Case Study: Metadata Management Facility And Search Tool for New Zealand

Tom Worthington FACS

Visiting Fellow, Department of Computer Science, Australian National University, Canberra

For: Computing 3410 Students, The Australian National University

This document is Version 1.1, 9 August 2001: http://www.tomw.net.au/2001/nzmmf.html

Introduction

This material was prepared for the unit Information Technology in Electronic Commerce (COMP3410) at the Australian National University, semester 2, 2001. Accompanying documents discuss A Common Understanding and Electronic Document Management and the Digital Library for E-commerce.

On 6th August 2001 the New Zealand government issued a Request for Information (RFI) for the provision of a Metadata Management Facility (MMF), encompassing all metadata-related functions required to support its E-government Citizens Portal. The RFI will be used to assess the general availability of products, the current state of the market, and indicative costing for budgeting and planning purposes. If it appears that suitable products exist, Requests for Proposals (RFPs) will be used to solicit competitive tenders. The MMF is required by the end of 2001, with phase one of the Portal due 30 June 2002, leaving little time for selection and installation.

The NZ RFI is a useful case study for an Australian audience, as it is current, similar to Australian requirements and self-contained. At the same time the RFI is broad in its scope, covering essentially all on-line information for the government of a country. The envisaged facility is to create, collect, maintain and search the metadata used when citizens interact with the Government on-line.

Request for Information

The RFI includes two short appendices giving an overview of the NZ E-government Programme and the scope and key functional components of the envisaged system. The NZ E-Government web site gives details of the NZ E-government Strategy and projects. The RFI is part of the Portal Strategy, one of approximately 15 E-government projects in progress.

The NZ Government has sought both partial and complete solutions for the Metadata Management Facility. A preference for proven, standards-based products was expressed. Given the rapid development of metadata standards, the project will need to make difficult decisions between proven products and standards conformance. As an example, the "XML Schema" standard discussed below, which may be critical to the project, was issued in May 2001, only a few months before the RFI. The NZ Government will need to decide between new software which supports such standards and older, proven software which does not.

The RFI asked that responses include:

  1. a brief summary of the products offered and their applications;
  2. the company's profile, size, and support base in the New Zealand region;
  3. sufficient information to understand the company's capability in key areas;
  4. a list of organisations where the products are currently deployed for a similar purpose;
  5. indicative pricing of the products, and licensing/maintenance charges;
  6. indication of the extent to which the products use Open Source components;
  7. a list of the platforms capable of hosting the products.

These are typical requirements for an RFI, apart from the second last: "indication of the extent to which the products use Open Source components". While not explicitly stated, it can be assumed that Open Source products are preferred. This is a radical step for a government agency and may be the first time Open Source has been explicitly solicited for a government project of this type anywhere in the world. It should be noted that this is different to a requirement for the support of standards, which is common in government IT RFI/Ts. There is no necessary link between standards compliance by software and Open Source.

While requesting a company's profile, size, local support base and clients is usual for an RFI, it may be less relevant for one seeking Open Source and standards-based technology. It is generally assumed that a large company with local staff and a large number of clients is a good indication of capability. In the Internet age the opposite may be true. Open Source technology is developed by individuals and small organisations cooperating world wide, and the standards are developing rapidly. The result may be that large established companies, relying on local staff to support existing products for many clients, are least able to offer good products. Smaller, newly established companies which can implement the latest developments quickly and support their clients remotely via the Internet may provide a better service.

The RFI was also somewhat unusual in the "open" way it was distributed. Government RFI/Ts are now routinely made available on-line, free of charge. However, at least in Australia, there is typically a requirement for companies to register their contact details before being provided with the documents. The on-line systems for this are typically cumbersome to use and frequently do not work at all, slowing down access to the documents. This encumbrance is usually justified by the agencies claiming there is a requirement to provide potential tenderers with updates of the tender documents. However, an optional registration for updates would be adequate.

Typically fax, telephone and postal communication are emphasised, even where the government agency has an on-line RFT. Agencies appear to have difficulty handling e-mail queries. The NZ RFI is unusual in expressing a strong preference for e-mail: "All questions or communication should be sent via e-mail ...". Formal responses to an RFI are also usually required to be on paper, with electronic delivery either not permitted or required to be confirmed by a paper copy. Agencies justify this by claiming that electronic documents lack sufficient authentication and security, despite there being case law accepting communication by more primitive electronic means. The NZ RFI simply requires an e-mail message.

The RFI asks for responses to be sent to the same e-mail address as used for queries. This may create some problems. While those sending responses are requested to use a subject line of 'Metadata RFI', there is the potential for confusion as to what is a formal response and what is a query. It may have been better to create a special e-mail address for responses and have it issue an automated acknowledgement of receipt.

To authenticate information about the RFI, the official web site is relied on: "Any current or future reference to this project on any website other than the State Services Commission's corporate site (www.ssc.govt.nz) or the E-government Unit site (www.e-government.govt.nz) is NOT authorized or recognized by the Commission". Given the RFI is about metadata, it is ironic that the web version of the document contains no metadata, apart from the title. The Requests for Information/Proposals web page the RFI is referenced from has description ("New Zealand's E-government Programme") and keywords ("e-government, interoperability, portal, information infrastructure, S.E.E, metadata, e-procurement") as metadata.
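As an illustration of how such page-level metadata can be read by software, the following is a minimal sketch in Python using only the standard library. The sample markup is an assumption: only the title, description and keyword values are taken from the page described above, while the exact form of the tags, and the names MetaTagParser, SAMPLE_PAGE and extract_metadata, are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.metadata[name.lower()] = content


# Hypothetical markup: the values are quoted from the page described
# above, but the real page's tag layout is assumed, not copied.
SAMPLE_PAGE = """<html><head>
<title>Requests for Information/Proposals</title>
<meta name="description" content="New Zealand's E-government Programme">
<meta name="keywords" content="e-government, interoperability, portal, information infrastructure, S.E.E, metadata, e-procurement">
</head><body></body></html>"""


def extract_metadata(html_text):
    """Return a dictionary of metatag names and their content values."""
    parser = MetaTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


metadata = extract_metadata(SAMPLE_PAGE)
# Keywords are a comma-separated list inside a single content value.
keywords = [keyword.strip() for keyword in metadata["keywords"].split(",")]
print(metadata["description"])
print(keywords)
```

Run against the web version of the RFI itself, a parser of this kind would find nothing beyond the title, which is the irony noted above.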

The deadline for responses is Friday 24th August 2001, giving respondents less than a month. This is less onerous than it might appear, due to the omission of company financial status and other detail required for an RFT, and the ability to submit by e-mail.

The RFI is from the NZ E-Government Unit, which was established in July 2000 as part of the State Services Commission, to optimize the use of information and information technology in the public sector. It should be noted that the E-Government Unit addresses only internal public sector ICT processes. The equivalent agency in the Australian Government was the Office for Government Online (OGO). The Australian Government also established the National Office for the Information Economy (NOIE) in 1997, addressing technical, regulatory and social issues affecting government, business and consumers in the take-up of online services and the development of the information economy. The overlap in function between OGO and NOIE, which addressed many of the same issues, one within the Public Service and the other for the wider community, was resolved in October 2000 when OGO was incorporated into NOIE. The impact on the wider NZ economy is not addressed in the RFI. However, there is an E-Government Advisory Board with private sector members, which might help provide a broader perspective.

Key Functional Components

The high-level summary of the key functional components of the Metadata Management Facility in the RFI required:

  • A web-based metadata creation tool, with the following functionality:
    1. enforced validation against NZGLS criteria;
    2. links to externally managed thesauri, controlled lists, and directories;
    3. customised entry forms to meet specialised entry requirements;
    4. handling of customised element extensions for individual users;
    5. full, simultaneous handling of English and Maori, for the data entry, display and storage of metadata, based on the Unicode (UTF-8) standard;
    6. context sensitive help.
  • A metadata repository, or database, to securely manage NZGLS metadata and deliver it to searchers on the E-government portal.
  • A programmatic interface to enable searching of the metadata repository.
  • An import/harvesting mechanism to collect NZGLS metadata from clients in RDF/XML or HTML metatag format.
  • A web-based export mechanism to deliver NZGLS metadata to clients in RDF/XML or HTML metatag format.
  • Software to create thesauri, maintain them, and publish them on the web.
  • A built-in process for managing the workflow and quality assurance of metadata.
  • A role-based security system controlling access to all features of the system.

(RFI)

These requirements are very similar to the approach taken by the Australian Government:

5.1 For the short term, adoption of a cooperative model in which locally created metadata, together with index data for document full texts (where appropriate) would be gathered in a shared index for the whole-of-Australian government search facility. This model could, in the first instance, be an adaptation of an existing Internet search service to meet the functional requirements set out in this report.

5.2 For the medium and long term, adoption of a distributed model for the whole-of-Australian government search facility. This model would involve the creation and indexing of AGLS metadata by individual agencies or consortia in the agency layer. The whole-of-Australian government search facility would interact with the agency indexes by invoking the respective agency or consortia search engines via the middle layer.

From Functional Requirements for a Whole-of-Australian-Government Search Architecture, Search Engine Working Group (SEWG), January 1998.

In practice the middle layer has not been implemented by the Australian Government, with metadata currently harvested directly from web pages. Also there is more reliance on free text searching than on metadata. There have been considerable developments in metadata software and standards since then, which may make the NZ implementation easier than in Australia. One requirement considered only briefly in Australia was that of multiple languages. NZ requires simultaneous handling of English and Maori.
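The two-language requirement has a practical consequence for storage and searching. Maori text uses vowels with macrons, and Unicode can represent such a vowel either as a single code point or as a plain vowel followed by a combining macron. The following minimal sketch, in Python, shows one way to make both forms compare equal before a value is stored as UTF-8. The choice of NFC normalisation and the function name normalise_text are assumptions for illustration, not requirements of the RFI.

```python
import unicodedata


def normalise_text(value):
    """Return the NFC form so composed and decomposed macrons compare equal."""
    return unicodedata.normalize("NFC", value)


composed = "M\u0101ori"     # a-macron as one code point (U+0101)
decomposed = "Ma\u0304ori"  # 'a' followed by a combining macron (U+0304)

# The two strings display identically but differ code point by code point.
print(composed == decomposed)

# After normalisation they are identical, so a search would match either.
print(normalise_text(composed) == normalise_text(decomposed))

# In UTF-8 the macron vowel occupies two bytes, so the byte length of a
# value differs from its character length.
encoded = normalise_text(decomposed).encode("utf-8")
print(len(composed), len(encoded))
```

Without a step of this kind, metadata entered through different tools could hold visually identical Maori terms which fail to match in a search.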

NZGLS (New Zealand Government Locator Service)

The NZ Government is planning to implement an adaptation of the Australian Government Locator Service (AGLS), which is in turn based on the widely adopted Dublin Core metadata standard. Version 1.0 of the New Zealand Government Locator Service (NZGLS) was released in April 2001. According to the manual, NZGLS is based closely on AGLS, and it is intended that NZGLS remain compatible with AGLS.

The need for compatibility creates some complex restrictions on the use of metadata:

A guiding principle established by the Dublin Core Metadata Initiative (DCMI) to ensure compatibility is the so-called 'dumb-down' rule. Put simply, this rule states that an element value must be meaningful when no qualifiers are present. Thus, for example, a value for DC.Coverage.spatial must be meaningful when the element is in the simple form DC.Coverage. Similarly in NZGLS, the value for the qualified element NZGLS.Mandate.act must make sense when the element is in the form NZGLS.Mandate. This rule does allow for the use of value components that are not part of the DC standard, because value components appear as part of the value. However, in order to ensure that the 'dumb-down' rule can be applied to values containing value components, element values containing more than one value component must be contained in a single instance of the relevant element. In other words, when using value components, individual value components should not be put into repeated instances of the base element.

From 6.5 RFI
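The element-name part of the 'dumb-down' rule can be sketched in a few lines of Python. This is an illustration only: it assumes qualified names always take the dotted form scheme.element.qualifier seen in the examples above, the sample values are invented, and the function names dumb_down and dumb_down_record are hypothetical. It does not attempt to handle value components.

```python
def dumb_down(element_name):
    """Reduce a qualified element name to its unqualified base form."""
    parts = element_name.split(".")
    # Keep the scheme and the element; drop any qualifier after them.
    return ".".join(parts[:2])


def dumb_down_record(record):
    """Map qualified element names to base names, keeping values unchanged."""
    simple = {}
    for name, value in record:
        simple.setdefault(dumb_down(name), []).append(value)
    return simple


# Invented sample values, for illustration only.
record = [
    ("DC.Coverage.spatial", "New Zealand"),
    ("NZGLS.Mandate.act", "Example Act 2001"),
    ("DC.Title", "Example title"),
]

for base_name, values in dumb_down_record(record).items():
    print(base_name, values)
```

The values pass through untouched, which is why the rule requires each value to remain meaningful once its qualifier has been removed: a client which understands only the base element sees exactly the same text.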

XML Open Source Tools for Metadata

One of the difficulties of implementing metadata standards has been the lack of widely available tools to create, check and interpret the data. The recent adoption of XML Schema may considerably simplify the process, allowing for automatic generation of input and output systems. A draft XML Schema for Dublin Core has already been created and this could be expanded to cover NZGLS. This would then allow existing XML tools, including open source code, to be used to build applications.

XML Schema's richer expressive power could be used to enforce validation against NZGLS criteria, with less need for custom software development or additional tools. Links to externally managed thesauri, controlled lists, and directories could be implemented using the same syntax. Customised entry forms to meet specialised entry requirements could be automatically generated from the NZGLS definition. Handling of customised element extensions for individual users could be done using complex types, the abstract attribute and other advanced features of XML Schema. Available Open Source software could be used to collect NZGLS metadata from clients in RDF/XML or HTML metatag formats and translate it to a suitable format for internal storage. A web-based export mechanism to deliver NZGLS metadata to clients in RDF/XML or HTML metatag format could be generated using already existing tools.
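The translation between the two exchange formats can be illustrated with a minimal sketch in Python, using only the standard library. It assumes a very simple RDF/XML record holding unqualified Dublin Core elements with literal values, and the "DC.Title" naming style for the HTML metatags. The sample record and the function name rdf_to_meta_tags are hypothetical, and a real facility would also need to handle NZGLS elements, qualifiers and schemes.

```python
import xml.etree.ElementTree as ET
from html import escape

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record: the resource and its values are invented.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.example.org/service">
    <dc:title>Example service</dc:title>
    <dc:creator>Example agency</dc:creator>
    <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>"""


def rdf_to_meta_tags(rdf_text):
    """Convert simple Dublin Core elements in RDF/XML to HTML metatags."""
    root = ET.fromstring(rdf_text)
    tags = []
    for description in root.iter("{%s}Description" % RDF_NS):
        for child in description:
            # Only elements in the Dublin Core namespace are exported.
            if not child.tag.startswith("{%s}" % DC_NS):
                continue
            local_name = child.tag.split("}", 1)[1]
            name = "DC." + local_name.capitalize()
            # Escape the value so it is safe inside an HTML attribute.
            content = escape((child.text or "").strip(), quote=True)
            tags.append('<meta name="%s" content="%s">' % (name, content))
    return tags


tags = rdf_to_meta_tags(SAMPLE_RDF)
for tag in tags:
    print(tag)
```

The reverse direction, from HTML metatags to RDF/XML, would follow the same pattern, with a schema-aware validator checking each record before it is accepted into the repository.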

Tools and infrastructure built for publishing web pages are now being expanded through new XML standards to implement business computing functions. In particular the Apache XML Project may provide a suitable set of tools, built on the popular Apache open source web server, to provide e-business applications. This approach would allow the NZ Government to easily add business applications which exploit the metadata stored in the system to provide custom applications for citizens. While manual procedures and automated tools can be used to check the quality and consistency of metadata entered, the use of that metadata in systems used by the clients will ensure any deficiencies are quickly identified and rectified.

Further Information

Copyright © Tom Worthington. 2001