Namur, January 2004
Ian H. Witten
Department of Computer Science, University of Waikato, New Zealand
ihw@cs.waikato.ac.nz
Digital libraries are large, organized collections of information objects. Whereas
standard library automation systems provide a computerized version of the catalog—a
gateway into the treasure-house of information stored in the library—digital
libraries incorporate the treasure itself, namely the information objects that
constitute the library’s collection. Whereas standard libraries are, of
necessity, ponderous and substantial institutions, with large buildings and
significant funding requirements, even large digital libraries can be lightweight.
Whereas standard libraries, whose mandate includes preservation as well as access,
are “conservative” by definition, with institutional infrastructure
to match, digital libraries are nimble: they emphasize access and evolve rapidly.
Most existing digital library projects, being research-oriented, are predicated
on state-of-the-art equipment and interfaces, academic and research institutions,
special collections. In contrast, this article argues for universal access:
digital library technology can and should be available to everyone, on all platforms,
in all countries; and it can and should enable ordinary people to exercise their
creative powers to conceive, assemble, build, and disseminate new information
collections that are designed not just for western academics but for a wide
diversity of different audiences throughout the world. Though less glamorous,
this may, in the end, be a more important goal for society. Digital libraries
pose an inherent tension between the technologist’s desire for advanced
solutions that use the latest and greatest hardware and software, and the librarian’s
desire for wide, cross-platform availability and long-term preservation—as
epitomized by the sustained success of paper as a delivery medium. To achieve
universal access for both information consumers and collection-builders is really
a problem for HCI.
In the next section we examine the social need for digital libraries, particularly
in developing countries, by briefly sketching some trends in commercial publishing
and contrasting them with a growing international perspective of information
as a public good. We draw out the implications for the user interface, which
is the principal bottleneck in allowing non-specialist people to make public
information available in focused collections that are universally usable. Then
we introduce digital library technology and illustrate it with a particular
example, the Greenstone digital library software, which is designed for a broad
user base and is in widespread use in many corners of the world—from Uganda
to the US, Kazakhstan to Canada, Nepal to New Zealand. Following that, we review
a project that is applying digital library technology to the distribution of
humanitarian information in the developing world, a context that is both innovative
and socially motivated. Next we discuss issues of universal access and illustrate
them with reference to the Greenstone software. We include a brief demonstration
of a prototype system that is intended to allow anyone to build and disseminate
information collections, and illustrates some human interface challenges that
arise when providing necessarily complex functionality to a non-computer-oriented
user base. We close with the hope that future digital libraries will find a
new role to play in helping to reduce the social inequity that haunts today’s
world, both within our own countries and between nations.
Today, the long-standing three-way tension between the commercial interests
of publishers, the needs of society and information users, and the social mandate
of public libraries, is being pulled and stretched as never before.
First, the very notion of a “book” is evolving in many different
directions: books become more interactive; publishers rent content; books are
distributed under restrictive conditions that mechanically prohibit sharing.
While it would be premature to make specific predictions, it seems likely that
these trends will further disadvantage the disadvantaged—particularly
those in poorer countries who have yet to benefit from ready access to ordinary
books. Second, a huge body of information is becoming freely available on the
Internet. Much is of questionable quality, but some is very good indeed. In
many cases the information is provided for the “public good” rather
than for commercial profit, and the redistribution of such information is likely
to be encouraged, rather than prohibited, by those who make it available. Initiatives
like UNESCO’s “Information for all” program and the upcoming
World Summit on the Information Society highlight the importance of public information;
they are founded on the belief that information literacy will help alleviate
many of the problems confronting human societies. Third, the implications for
libraries are mixed. Whereas new controls by publishers over how the content
they own may be used presents libraries with significant problems, the ready
availability of “public good” information meshes well with library
philosophy. A new role is emerging for information professionals who can select
material, index it, add appropriate metadata, and redistribute it in added-value
form for the good of society. Suitable technological infrastructure is being
provided by the open source movement, which is making available high-quality
software for repackaging and distribution of information (and not just on computer
networks).
We expand on each of these points below, and then summarize the prospects of
digital libraries and what we see as the implications for the field of HCI.
What future has the book in the digital world? The question is a complex one
that is being widely aired (see Lynch, 2001, for a particularly thoughtful and
comprehensive discussion). Authors and publishers ask how many copies of a work
will be sold if networked digital libraries enable worldwide access to an electronic
copy of it. Their nightmare is that the answer is one: how many books will be
published online if the entire market can be extinguished by the sale of one
electronic copy to a public library (Samuelson and Davis, 2000)? To counter
this threat, the entertainment industry is promoting new “digital rights
management” (DRM) schemes that permit a degree of control over what users
can do that goes far beyond the traditional legal bounds of copyright. Indeed,
the acronym is more aptly expanded as “digital restrictions management”
because it is concerned solely with content owners rights and not at all with
user’s rights. It is, in effect, a “private governance system in
which computer systems regulate which acts users are and are not authorized
to perform” (Samuelson, 2003). Anti-circumvention rules are sanctioned
by the Digital Millennium Copyright Act (DMCA) in the US (similar legislation
is being enacted in other countries). The DMCA has been used, for example, to
prosecute a Norwegian teenager for writing software to play a DVD that he had
purchased on a computer for which no commercial playback systems exist.
Can DRM be applied to books? The motion picture industry can compel manufacturers
to incorporate encryption into their products because it holds key patents on
DVD players. Commercial book publishers are promoting e-book readers that, if
adopted on a wide scale, would allow the same kind of control to be exerted
over reading material. Basic rights that we take for granted (and are legally
enshrined in the concept of copyright)—such as the ability to lend a book
to a friend, resell it on the second-hand market, keep it indefinitely, continue
to use it when your e-book reader breaks down, donate it to charity, preserve
it for your grandchildren, copy excerpts without resorting to a handwritten
transcription—are in jeopardy. DRM allows such rights to be controlled,
monitored, and withdrawn instantly, and DMCA legislation makes it illegal for
users to seek redress by taking matters into their own hands. Fortunately, perhaps,
lack of standardization and compatibility issues are delaying consumer adoption
of e-books.
In the realm of scholarly publishing, digital rights management is more advanced.
Academic libraries license access to content in electronic form, often in tandem
with purchase of print versions too. They have been able to negotiate reasonable
conditions with publishers—probably because they represent the lion’s
share of the scholarly market. However, the extent of libraries’ power
in the consumer book market is moot. One can envisage a scenario where publishers
establish a system of commercial, pay-per-view, libraries for e-books and refuse
public libraries access to books in a form that can be circulated (Roehl and
Varian, 2001, describe an interesting parallel between historical circulating
libraries and video rental stores).
These new directions present our society with puzzling challenges, and it would
be rash to predict what society’s response will be. But one thing is certain:
they will surely increase the degree of disenfranchisement of those who do not
have access to the technology.
In parallel with publishers’ moves to reposition books as technological
artifacts with refined and flexible control over how they can be used, an opposing
trend has emerged: the ready availability of free information on the Internet.
Of course, the world-wide web is an unreliable source of enlightenment, and
undiscriminating use is dangerous—and widespread. As early as 1996 complaints
arose that the Web’s contents are largely unattributed, undated, unannotated,
unreliable; information about author and publisher is unavailable or incomplete;
far too many resource catalogues (“hubs”) are chasing far too few
original or non-trivial documents (“authorities”) (Ciolek, 1996)—complaints
that are very familiar today. But one thing has changed: search engines and
other portals have enormously increased our ability to locate information that
is at least ostensibly relevant to any given question. Teachers complain bitterly
that students view the Web as a replacement for the library, harvesting information
indiscriminately to provide answers to assignments that are at best shallow
and at worst incoherent and incorrect. One consolation is that the very same
search facilities can be used to detect plagiarism.
Nevertheless, the Web abounds with accessible, high-quality information. Many
social groups, non-profit societies and charities make it their business to
create sites and collect and organize information there. To take a single example
at random, a Google search for diabetes returns three national diabetes associations
(US, Canada, UK) in the top ten hits, and of course many more exist. Each of
these sites offers a cornucopia of valuable information on the disease, which
is not commercial and provided for the public good. Widespread use is strongly
encouraged, and it seems likely that arrangements could be made for re-distribution
of the material presented there, particularly it was intended as a not-for-profit
service and appropriate acknowledgement was made.
One of the key problems with information distribution via the Web is that it
disenfranchises developing countries. Although the Web does not extend into
the homes of the socially disadvantaged in developed countries either, various
national programs are working to provide access (such as the Bill and Melinda
Gates Foundation grants to public libraries). But network access varies enormously
across the world. Whereas in 1998 more than a quarter of the US population were
surfing the Internet, the figures for Latin America and the Caribbean was 0.8%,
for Sub-Saharan Africa 0.1%, and for South Asia 0.04% (UNDP, 1999). Schools
and hospitals in developing countries are poorly connected. Even in South Africa,
the best-connected African country, many hospitals and 75% of schools have no
telephone line. Universities are better equipped, but even there up to 1,000
people can depend on just one terminal. The Internet “is failing the developing
world” (Arunachalam, 1998).
Prompted by this inequity, the importance of information, and particularly public
information, is today being highlighted by prominent international bodies. For
example, UNESCO’s “Information for all” programme was established
in 2001 to foster debate on the political, ethical and societal challenges of
the emerging global knowledge society and to carry out projects promoting equitable
access to information. It reflects a growing awareness that information is playing
an increasing role in generating wealth and human capital, and that participation
in the “global knowledge society” is essential for social and individual
development. Information literacy is described as “a new frontier”
by the Director of UNESCO’s Information Society Division (Quéau,
2001). The International Telecommunications Union has established a World Summit
on the Information Society, held in Geneva in December 2003 and Tunis in 2005,
to promote a global discussion of the fundamental changes that are being brought
about in our lives by the transformation from an industrial to an information
society, and to confront the extreme disparities of access to information between
the industrialized countries and the developing world.
What is the librarian to make of all this? The mandate of today’s public
libraries, in sharp contrast to that of publishers, is to facilitate the open
distribution of knowledge. Librarians strive to enable the free flow of information.
Their traditions are liberal, founded on the belief that libraries should serve
democracy. To help fulfill their mission as resource centers for citizens, public
libraries maintain collections of records, policy statements, government documents,
and so on. A recent promotional video from the American Librarian’s Association
exults that “the library is democracy’s place of worship”
(ALA, 2002).
Clearly, the impending redefinition of the book as a digital artifact that is
licensed rather than sold, tied to a particular replay device, with restrictions
that are clearly laid out and mechanically enforced, is an innovation that goes
right to the heart of libraries. The changing nature of the book may make it
hard, or even impossible, for libraries to fulfill their mandate by providing
quality information to readers. And on the other hand, the emergence of a vast
storehouse of information on the Internet poses a different kind of conundrum.
Librarians, the traditional gatekeepers of knowledge, are in danger of being
bypassed, their skills ignored, their advice unsought. Search engines send users
straight to the information they require—or so users think—without
any need for an intermediary to classify, catalogue, cross-reference, advise
on sources.
The ready availability of information on the Internet, and its widespread use,
really presents librarians with an opportunity, not a threat. Savvy users realize
they need help, which librarians can provide. A good example is Infomine, a
cooperative project of the University of California and California State University
(amongst others) (Mason et al., 2000). Infomine contains descriptions and links
to a wealth of scholarly and educational Internet resources, each of which has
been selected and described by a professional academic librarian who is a specialist
in the subject and in resource description generally. Participating librarians
see this as an important expenditure of effort for their users, a natural evolution
of their traditional task of collecting and organizing information in print.
What kind of technical infrastructure is needed to support and promote this
kind of work? Open source software is a powerful ally for librarians who wish
to extend liberal traditions of information access. These systems make the source
code freely available for others to view, modify, and adapt; and the very nature
of the licensing agreement prevents the software from being appropriated by
proprietary vendors. But the open-source movement is more than just a vehicle
for librarians to use: its link with library traditions goes much deeper. Public
libraries and open source software both enshrine the same philosophy: to promote
learning and understanding through the dissemination of knowledge. Both are
pervaded by a sense of community, on the one hand the kind of inter-institutional
cooperation exemplified by inter-library loan and on the other teams of designers
and programmers that frequently cross national boundaries.
New trends in information access present librarians in developed countries with
difficult and conflicting challenges. Meanwhile, however, the situation in the
developing world is dire. Here, traditional publishing and distribution mechanisms
have failed tragically. For example, according to the 1999 UN Human Development
Report (UNDP, 1999), whereas a US medical library subscribes to about 5,000
journals, the Nairobi University Medical School Library, long regarded as a
flagship center in East Africa, last year received just 20 journals (compared
with 300 a decade before). In Brazzaville, Congo, the university has only 40
medical books and a dozen journals, all from before 1993, and the library in
a large district hospital consisted of a single bookshelf filled mostly with
novels.
Traditional libraries are substantial institutions that occupy physical space,
present a physical appearance, and exhibit tangible physical organization. When
standing on the threshold of a large bricks-and-mortar library you gain a sense
of presence and permanence that reflects the care taken in building and maintaining
the collection inside. Digital libraries, in contrast, are lightweight. But
they provide potentially far greater accessibility, which means that they will
have even greater social effects. Once created, they can, without significant
institutional support, continue to serve users. They can be distributed throughout
most of the developed world over the Internet. In developing countries and remote
corners of the developed world they can be circulated on removable media—CD-ROM,
DVD, or 100 Gb disk units the size of videocassettes—and updated over
radio (or in Internet cafés). Issues of copyright pose difficult problems,
but they are manageable. For example, there is plenty of non-copyright material,
or material whose owners are prepared to donate copyright for socially useful
purposes, and trends towards more open access to academic and humanitarian information
are visible. Not everyone sees digital rights management and the DMCA as the
way forward, and in the longer term publishers, to remain viable, will have
to investigate alternative revenue models for the information they own. No wonder
international organizations such as the United Nations, along with many smaller
non-government organizations (NGOs), are keenly interested in digital library
technology.
Advances in digital library technology are radically lowering the bar for the
design and production of richly-organized, coherent, focused collections of
information. Now, anyone with access to sufficient source material can use public-domain
software to build large, fully-searchable, collections the size of traditional
personal or institutional libraries—in minutes. Let the minutes stretch
to hours and the collection can be polished, organized, branded, distributed.
It can include fully-illustrated text, images, video, music. It can present
attractively-designed pages with consistent use of icons. Keywords, key phrases,
even acronyms and their definitions, can be extracted—automatically—and
used to underpin novel means of access. Let the hours stretch to days and metadata
can be manually added that permits further levels of organization. Given access
to programming skills, creative new facilities that stretch the imagination
can be rapidly integrated into the system.
All this, one might say, can be done with ordinary Web sites: there is no need
for digital library technology. However, bitter experience has shown that all
but the most rudimentary sites do require significant institutional support—for
organization and maintenance. The Web is littered with incomplete, unfinished,
unmaintained, out-dated, inconsistently-organized, useless information collections.
Just as traditional library cataloging procedures integrate new works into existing
collections with minimal overhead so that they immediately become first-class
members of the collection, so digital libraries allow new documents to be added
completely automatically. In the case of traditional libraries this is done
through the small but non-negligible overhead of generating a new catalog entry
(one to two hours per book). With ordinary Web sites it requires inserting links
manually into index pages and the like, and may involve adding links not only
into the new document but also into existing ones that ought to reference it—it’s
like rewriting the book, and maybe revising all other books in the library too!
In contrast, digital libraries bring access structures instantly and effortlessly
up to date whenever new documents are added.
The challenge for HCI is to design and build digital library systems that fulfill
the potential of digital libraries as a “killer app” for computers
in developing countries, which will bring concomitant benefits in almost every
other sphere of application. For the information consumer we need access to
information that is guaranteed across space, time, and culture. We need flexible
distribution mechanisms for documents, and for information objects of all types,
that can be accessed on all computer platforms—including the lowliest.
We need a choice of distribution over the Web or on removable media such as
CD-ROM or DVD. Digital libraries can incorporate flexible presentation that
caters to individual differences, such as large-font displays or spoken output
for the visually impaired. Libraries are places where information is preserved,
not rendered obsolete, and digital libraries must instill confidence that information
prepared today can be accessed next week, next decade, next century—regardless
of technological changes. An important aspect of digital libraries is their
ability to work in local languages, promoting pluralism and reducing the risks
of homogeneity. Because language is the vehicle of thought, communication, and
cultural identity, this will encourage diversity and strengthen individual cultures.
But there is a long way to go: even Unicode is woefully incomplete in certain
areas, such as African languages.
Naturally, today’s digital library systems focus principally on the reader:
the consumer of the material stored in the library’s treasure-house. But
digital libraries make a more radical, and perhaps ultimately more important,
contribution by empowering ordinary users to conceive, assemble, build, and
disseminate new information collections themselves. In principle, modest computing
resources are quite sufficient to enable users to build new collections by gathering
together material in local files or on the Web (or both); augmenting it with
appropriate metadata that supports convenient search and browsing operations;
incorporating advanced features like key-phrase extraction, document summarization,
and metadata extraction; designing an attractive and functional interface; and
publishing the collection on a variety of different media that are suitable
for the intended readership. The HCI challenge is to realize this potential
for users—such as most librarians—who have a strong understanding
of information and its organization, but no more interest in computers than
they have in paper-making technology, last millennium’s vehicle for information
dissemination.
A digital library is an organized collection of information,
a focused collection of digital objects, including text, video, and audio, along
with methods for access and retrieval, and for selection, organization, and
maintenance of the collection.
(Witten and Bainbridge, 2002)
This definition deliberately accords equal weight to user (access and retrieval)
and librarian (selection, organization and maintenance). The latter functions
are often overlooked by digital library proponents, who often work from a technology
perspective rather than from the viewpoint of library or information science,
but it is precisely these aspects that allow digital libraries to be used to
democratize information dissemination.
As a concrete example, consider the Humanity Development Library, a collection
of some 1,200 authoritative books and periodicals, produced by many disparate
organizations—UN agencies and other international organizations—on
various areas of human development, from agricultural practice to economic policies,
from water and sanitation to society and culture, from education to manufacturing,
from disaster mitigation to micro-enterprises. It contains 160,000 pages and
30,000 images, which if printed would weigh 340 kg, cost $20,000, and occupy
a small library book stack. Instead, it takes the form of a digital library
and is distributed on a CD-ROM throughout the developing world at essentially
no cost. (It’s also on the Web at nzdl.org.)
The Humanity Development Library is produced using the Greenstone software,
a freely-distributed open-source project whose aim is to create novel digital
library technologies and make them available for others to use. Greenstone digital
libraries are arranged in collections. A collection comprises several (typically
several thousand, or several million) documents, and a library may include several
collections, each organized differently. Collections built with Greenstone offer
simple but effective searching and browsing facilities based on metadata and
the full text of electronic documents. Each collection is individually designed
to take advantage of whatever metadata is available. All collections support
full-text searching and most provide several different browsing options, although
they differ depending on the collection design and the metadata available. Typically
you can search for particular words that appear in the text, or within a section
of a document, or within a title or section heading. A variety of interfaces
exist for browsing collections by title, subject, date, or any other metadata
chosen by the collection designer.
Greenstone collections present documents as automatically-generated Web pages.
This allows documents in different source formats to be presented in a consistent
manner, and lets users view the entire collection with a standard Web browser—no
special viewing applications are required. However, the collection maintainer
may choose to present the original source document (whether Word, PDF, PostScript,
PowerPoint, Excel, a QuickTime movie, an audio file, or whatever) instead of,
or as well as, the HTML version, and rely on the user’s web browser to
select a suitable application to display the document. In general, Greenstone
deals well with documents and metadata in a wide variety of different formats.
Digital libraries provide perhaps the first really compelling raison d’être
for computing technology in the developing world. Priorities in these countries
include health, agriculture, nutrition, hygiene, sanitation, and safe drinking
water. Though computers per se are not a priority, simple, reliable access to
practical information relevant to these basic needs certainly is. In an article
entitled “The promise of digital libraries in developing countries”
Witten et al. (2002) mention ten information collections, available on the Web
(at nzdl.org) and CD-ROM, from organizations ranging from UN agencies to small
NGOs, in which Greenstone is being used to deliver humanitarian and related
information in developing countries. For example, the Humanity Development Library
described above is a compendium of practical information aimed at helping reduce
poverty, increasing human potential, and giving a practical and useful education.
Rather than recapitulating the brief summaries of collections that appear in
the above-cited paper, we describe four new ones that have been created recently,
and distributed in the same way.
The Researching Education Development library is a project of the Department
for International Development (DFID), a British government department responsible
for promoting development. Its central focus is a commitment to an internationally
agreed target of halving the proportion of people living in extreme poverty
by 2015. Associated targets include ensuring universal primary education, gender
equality in schooling, and skills development. It works in partnership with
other governments and multilateral institutions, with business and the private
sector, with civil society and the research community. It has created a CD-ROM
library containing many education research papers and other documents. Each
one represents a study or piece of commissioned research on some aspect of education
and training in developing countries.
The Energy for Sustainable Development library was initiated as part of the
outreach phase of the World Energy Assessment, which was initiated jointly by
the United Nations Development Programme (UNDP), the United Nations Department
of Economic and Social Affairs (UNDESA), and the World Energy Council (WEC),
along with funding from the United Nations Foundation. This library contains
a broad and valuable collection of 350 documents (26,000 pages) from UNDP, UNDESA,
WEC and many other organizations. It includes titles that all these organizations
have published on the subjects of energy for sustainable development—technical
guidelines, journals and newsletters, case studies, manuals, reports, and other
training material. The documents are in English, Spanish and French, and one
document has Arabic, Russian and Chinese translations as well.
The UNAIDS Library contains publications in the “Best Practice”
collection (including key materials, case studies, technical updates, and points
of view) which form a unique resource for those working in planning and practice.
It is produced by the Joint United Nations Programme on HIV/AIDS, whose global
mission is to lead, strengthen and support a response to the AIDS epidemic that
will prevent the spread of HIV, provide care and support for those infected
by the disease, reduce the vulnerability of individuals and communities to HIV/AIDS,
and alleviate the socioeconomic and human impact of the epidemic.
The Health Library for Disasters is the result of a collaboration between the
emergency and disaster programs of the World Health Organization (WHO) and the
Pan American Health Organization (PAHO), with the participation of many other
organizations: the United Nations High Commissioner for Refugees (UNHCR), the
United Nations Children’s Fund (UNICEF), the International Strategy for
Disaster Reduction (EIRD); the Red Cross Movement (ICRC and IFRC); the SPHERE
Project; non-governmental organizations such as OXFAM; and national organizations
such as the National Emergency Commission of Costa Rica. It contains more that
300 technical and scientific documents on disaster reduction and public health
issues related to emergencies and humanitarian assistance. A follow-up to the
Spanish-language Biblioteca Virtual de Desastres discussed by Witten et al.
(2002), it includes technical guidelines, field guidelines, case studies, emergency
kits, manuals, disaster reports, and other training materials.
Universal access to digital libraries presents huge challenges to software
engineers and HCI practitioners. The Greenstone digital library software allows
us to glimpse some of the issues, although it certainly does not yet effectively
address them all. We summarize some technical details in the next subsection,
before turning to more interesting questions of access for readers, collection
builders, and international users.
Most digital libraries are accessed over the web, using any web browser. However,
in many environments, particularly in developing countries, web access is insufficient
and the system must run locally. And if people are to build and control their
own libraries, a centralized solution is inadequate: the software must run on
their own computers. Thus digital library systems intended for broad access
should run on a wide variety of computer systems, particularly low-end ones.
Developed under Linux, the Greenstone server runs on any Windows, Unix, or MacOS/X
system. All versions of Windows are supported, from 3.1 up (including 3.1/3.11,
95/98/ME/, NT/2000 and XP). Supporting primitive platforms poses substantial
challenges of a rather mundane nature: for example, Microsoft compilers no longer
support Windows 3.1 and it is necessary to acquire obsolete versions (e.g. at
software auctions). Under Windows, pre-built collections can be viewed on any
system with at least 8 Mb RAM, but collections cannot be built under Windows
3.1/3.11—for this at least a Pentium processor is generally required,
except for very small collections. The fact that Greenstone does not run on
early Macintosh systems is a serious drawback in certain environments (e.g.
many schools).
In an international cooperative effort established in August 2000 with UNESCO
and the Belgium-based Human Info NGO, Greenstone is being distributed widely
in developing countries with the aim of empowering users, particularly in universities,
libraries, and other public service institutions, to build their own digital
libraries. UNESCO recognizes that digital libraries are radically reforming
how information is acquired and disseminated in its partner communities and
institutions in the fields of education, science and culture around the world,
and particularly in developing countries. Their hope is that this software will
encourage the effective deployment of digital libraries to share information
and place it in the public domain.
The UNESCO distribution of Greenstone is a CD-ROM that contains the full source
code and executable binaries for Windows and Linux, along with all necessary
associated software (e.g. Perl for Windows). Full documentation (four PDF manuals)
and five demonstration collections are included. The current CD-ROM is trilingual,
with complete interfaces, instructions, and documentation in English, French
and Spanish. For those with Web access, the same package is also available for
download from the Greenstone Web site (greenstone.org), often in a form that
is slightly ahead of the CD-ROM version—for example, many other language
interfaces are included and full documentation is available in Russian and Kazakh
too. Providing accessibility in different languages is more difficult than one
might at first realize. As well as the manuals, installation instructions and
installation prompts, the licensing agreement, and the readme files have to
be translated too.
Greenstone collections like the Humanity Development Library can be published
as standalone collections on removable media such as CD-ROM, or presented on
the Web. CD-ROM is a very practical format in developing countries. Any Greenstone
collection can be converted into a self-contained Windows CD-ROM that includes
the Greenstone server software itself (in a version that runs right down to
Windows 3.1) and an integrated installation package. The installation procedure
has been thoroughly honed to ensure that only the most basic of computer skills
are needed to install and run a collection under Windows.
Even standalone Greenstone users interact through a Web browser: Netscape is
supplied on each CD-ROM for those who do not already have a browser. In standalone
mode the software runs locally but incorporates a Web server so that if the
system happens to be connected to a network—say a hospital or school intranet—information
is available to other machines that may not possess CD drives. This happens
automatically: no special configuration is necessary. Another difficult engineering
challenge is checking for the existence of a network. While installed network
software is easily detected, it is hard to determine non-intrusively whether
it is operational (sending oneself a message often results in the user being
asked to dial their local Internet service provider). Incorrectly installed
or configured software is endemic in developing countries, because computers
there are often cast-offs whose software is inappropriate to their present environment,
yet system support to rectify the problems is unavailable. It is essential for
universal access that such problems are addressed properly and solved satisfactorily
without involving the user, even though they are mundane and time-consuming.
Greenstone provides some support for the visually impaired by incorporating
a “textual” mode of access that replaces all images by textual prompts.
This output is suitable for users with speech synthesizers or other specialized
access devices. However, the facility is not well advanced: in particular, we
have not yet refined it through usability testing and interface improvement.
Effective human development blossoms from empowerment rather than gifting.
As the Chinese proverb says, “Give a man a fish and he will eat for a
day; teach him to fish and he will eat for the rest of his days.” Disseminating
information originating in the developed world, like the Humanity Development
Library, is a useful activity for developing countries. But a more effective
strategy for sustained long-term human development is to disseminate the capability
of creating information collections, rather than the collections themselves.
This will allow developing countries to participate actively in our information
society, rather than observing it from outside. It will stimulate the creation
of new industry. And it will help ensure that intellectual property remains
where it belongs, in the hands of those who produce it.
Users whose skills resemble those of librarians rather than computer specialists
should be able to build and distribute their own digital library collections.
Greenstone’s “librarian” interface allows you to gather together
sets of documents, import or assign metadata, build them into a Greenstone collection,
and serve it from your web site or distribute it on a self-installing Greenstone
CD-ROM. It supports six basic activities: opening an existing collection or
defining a new one; copying documents into it, with metadata attached (if any);
enriching the documents by adding further metadata to individual documents or
groups; designing the collection by determining its appearance and the access
facilities it will support; building it using Greenstone; and previewing the
newly created collection from your Greenstone home page (Bainbridge et al.,
2003).
The interface explicitly supports four levels of user:
∑ Library assistants can add documents and metadata to collections, and
create new ones whose structure mirrors that of existing collections
∑ Librarians can, in addition, design new collections, but cannot use
advanced design features (e.g. regular expressions)
∑ Library Systems Specialists can use all design features, but cannot
perform troubleshooting tasks (e.g. interpreting debugging output from Perl
scripts)
∑ Experts can perform all functions.
The Librarian Interface builds on lessons learned in previous interactive interfaces
to Greenstone, and has been honed through beta testing by UNESCO at sites in
Argentina, India, Kazakhstan, Mexico, and South Africa. It has extensive on-line
help facilities, and is being translated in its entirety into French, Spanish
and Russian by UNESCO.
An important component of access is allowing people to control the appearance
of the collections they create. Many who build digital libraries want to brand
them to ensure that they with an appropriate personal, institutional, or corporate
image, and some can only contemplate software solutions that allow them to do
so. Although Greenstone comes with a standard appearance—a distinctive
bar down the side of all pages except those that show documents in the library,
a green access bar with yellow buttons, etc.—the interface is highly configurable.
Greenstone creates all pages that appear on the screen on the fly: none are
stored in advance. They are generated using macros, written in a simple language
specially designed for the job, that perform textual replacement. One reason
is that Greenstone accommodates a large number of different interface languages
(see next subsection), and macros help cope with this. All text fragments are
couched as macro definitions. To add a new language, just the macro contents
need to be translated—no web pages need be reworked. Every page displayed
by the system is passed through a macro interpreter that expands all the macros
on the page. The interpreter checks a language variable and uses the macro definitions
pertaining to it, which loads the page in the appropriate language.
Macros can have parameters. In this case, the parameter is the language variable:
it causes the appropriate text fragment to be used for the macro’s expansion.
If there is no Arabic version for a particular macro, the interpreter will automatically
substitute the default version (English). This lets system developers experiment
with the interface without having to worry about translating every little bit
of new text immediately. Defaulting to English is not ideal—it reflects
an Anglo-centric mindset—but it seems better than displaying nothing.
(However, if “nothing” were preferred, it would be a simple matter
to alter the software to default to the empty language!)
Macros are also used to deal with display variables. Whenever a web page contains
information that is not known in advance—like the number of documents
returned by a search, or the value of a particular metadata item, or the content
of a document page—a macro name is used in the page description. Unlike
language macros, these macros are dynamic: their content is not stored in advance
but generated by the system in accordance with the value of the variable in
question.
Users can completely alter the form of the user interface by rewriting the macro
files—or even by writing their own web pages which embed the dynamic macros
that generate bits of Greenstone output.
The international Unicode character set is used throughout Greenstone, and documents
in any Unicode-supported language and character encoding can be imported. (In
fact, the software can automatically detect the language and encoding of most
documents.) Collections of documents in Arabic, Chinese, Cyrillic, English,
French, Spanish, German, Hindi, and Maori are publicly available. The New Zealand
Digital Library Web site (nzdl.org) hosts many of these, and the Greenstone
Web site (greenstone.org) links to sites that contain further examples.
It makes little sense to have a collection whose content is in Chinese or Russian,
but whose supporting text—instructions, navigation buttons, labels, images,
help text, and so on—are in English. Consequently, the entire Greenstone
interface has been translated into a range of languages, and the interface language
can be changed by the user as they browse from the Preferences page. As noted
above, all the language fragments in the interface (and also the contents of
language-dependent images) are stored in macro files. These have been translated
by Greenstone users in other parts of the world and contributed back to the
project. (The same mechanism provides text-only versions of the interface to
accommodate visually impaired users.) Currently, interfaces are available in
Arabic, Czech, Chinese, Dutch, French, Galician, German, Hebrew, Indonesian,
Italian, Kazakh, Maori, Portuguese, Russian, Spanish, Turkish, and English.
Managing the organizational and software complexity of any comprehensive and
evolving open source software system presents a significant challenge. However,
the challenge is greatly magnified when the interface is available in different
languages, for enhancements to the software and changes to the interface must
be faithfully reflected in each language version. No single person knows all
interface languages; no single person knows about all modifications to the software—indeed
there is likely no overlap at all between those who translate the interface
and those who build the software. Currently, Greenstone has about twenty interface
languages and there are around 600 linguistic fragments in each interface, ranging
from single words like search, through short phrases like search for, which
contain, of the words, to sentences like More than ... documents matched the
query, to complete paragraphs like those in the on-line help text. Maintaining
the interface in many different languages is a logistic nightmare. The solution
adopted by Greenstone is to incorporate a language translation facility, which
allows authorized people to update the interface in specified languages. A standard
version control system is used to manage software change, and from this the
system automatically determines which language fragments need updating and presents
them to the human translator.
By allowing people to easily create and disseminate large information collections,
digital libraries extend the applications of modern technology in socially responsible
directions, and counter a possible threat towards the commercialization of information
in line with practices developed by the entertainment industry. As far as the
developing world is concerned, digital libraries may prove to be a “killer
app” for computer technology—that is, an application that makes
a sustained market for a promising but under-utilized technology. The World-Wide
Web is often described as the Internet’s killer app. But the Internet
does not really extend to developing countries, and the developing world is
missing out on the prodigious amount of basic, everyday human information that
the Web provides, and its enormous influence on promoting and internationalizing
business opportunities. There is little incentive to make copies of the entire
Web available locally because of its vast size, rapid change, and questionable
information value per gigabyte. However, it is easy to provide focused information
collections on both the Web and, in exactly the same form, on removable media
such as CD-ROM, DVD, or bulk disk storage devices—indeed, the Greenstone
software described above allows one to create a complete, runnable, self-installing
CD-ROM image from a Web collection in just a few mouse clicks.
Public libraries are founded on the principle of universal access, and digital
libraries should be too. This provides HCI with enormous practical challenges.
Universal access means running on low-end devices, but one does not want to
provide a lowest-common-denominator solution that sacrifices high-end capability
where it is available. Universal access means that interfaces should be available
in the world’s languages, but one does not want the burden of translation
to stifle the development of new functionality and features. Universal access
means educating users: UNESCO is mounting training courses on building collections
with Greenstone in Bangalore, Almaty, Senegal, and Suva, and discussions are
underway for Latin America; the Tulane Institute has run courses that use Greenstone
collections as a resource in many locations in Africa (e.g. Burkina Faso, Cameroon,
Cote d’Ivoire, Democratic Republic of Congo, Ghana, Rwanda, Senegal, Sierra
Leone, Togo) and Latin America (e.g. Argentina, Bolivia, Colombia, Ecuador,
Guatemala).
Universal access also means that non-textual material should enjoy first-class
status in a digital library—perhaps first-class status in “the literature.”
This has important cultural ramifications. It should be possible to create digital
library collections intended for use by people in oral cultures, who may be
illiterate or semi-literate. Or people who, though they can read and write their
own language, cannot speak or read the language of the digital library. Imagine
having access to collections that spring out of the rich cultures of China or
Arabia, created by people who grew up in these cultures, without having to learn
a new language. More practically—since you, the reader, being culturally
privileged, can probably access this kind of information in translation—imagine
giving someone in the highlands of Peru, fluent and literate in her native language
of Quechua, first-hand access to the information in humanitarian collections
such as the Humanity Development Library (currently available only in English
and French) or the Biblioteca Virtual de Desastres (until recently available
only in Spanish). Opening up digital libraries for the illiterate is a radical
and potentially revolutionary benefit of new interface technology.
An important, and liberating, different between digital libraries and conventional
ones is that anyone should be able to create their own digital collections.
This presents HCI challenges that are difficult yet more conventional: providing
non-computer users with access to advanced and complex functionality. Users
should be able to collect their own source material, provide their own metadata,
design their own collections, and present it through their own interface.
Digital libraries give software engineers and HCI practitioners a golden opportunity
to help reverse the negative impact of information technology on developing
countries and reduce the various “digital divides” that cleave our
world (Norris, 2001)—the “social divide” between the information
rich and the information poor in our own nations, the “democratic divide”
between those who do and do not use the panoply of digital resources to engage,
mobilize and participate in public life, as well as the “global divide”
that reflects the huge disparity in access to information between people in
industrialized and developing societies.
I gratefully acknowledge all members of the New Zealand Digital Library project
for their enthusiasm, ideas and commitment, particularly David Bainbridge and
John Thompson who worked on the Greenstone librarian’s interface. I have
benefited enormously from cooperation with John Rose of UNESCO, and Michel Loots
of Human Info NGO. Harold Thimbleby of University College London and Gary Marsden
of the University of Cape Town made very useful comments on a draft of this
paper.
ALA (2002) “Rediscover America @ your library.” Video produced by
the American Library Association, Chicago, IL. Available from www.ala.org/@yourlibrary/rediscoveramerica.
Arunachalam, S. (1998) “How the Internet is failing the developing world.”
Presented at Science Communication in the Next Millennium, Egypt; June.
Bainbridge, D., Thompson, J. and Witten, I. H. (2003) “Assembling and
enriching digital library collections.” Proc Joint Conf on Digital Libraries,
pp. 323-334.
Ciolek, T. M. (1996) “The six quests for the electronic grail: Current
approaches to information quality in WWW resources.” Review Informatique
et Statistique dans les Sciences humaines (RISSH), No. 1-4. Centre Informatique
de Philosophie et Lettres, Universite de Liege, Belgium. pp. 45-71.
Lynch, C. (2001) “The battle to define the future of the book in the digital
world.” First Monday, Vol. 6, No. 5; June.
Mason, J., Mitchell, S., Mooney, M., Reasoner, L. and Rodriguez, C. (2000) “INFOMINE:
Promising directions in virtual library development.” First Monday, Vol.
5, No. 6; June.
Norris, P. (2001) Digital divide? Civic engagement, information poverty and
the Internet worldwide. Cambridge University Press, New York
Quéau, P. (2001) “Information literacy: a new frontier.”
UNISIST Newsletter, Vol. 29, No. 2, pp. 3-4.
Roehl, R. and Varian, H.R. (2001) “Circulating libraries and video rental
stores.” First Monday, Vol. 6, No. 5; May.
Samuelson, P. and Davis, R. (2000) “The digital dilemma: a perspective
on intellectual property in the information age.” Presented at the Telecommunications
Policy Research Conference, Alexandria, Virginia; September.
UNDP (1999) Human development report 1999. UNDP/Oxford University Press, New
York.
Witten, I.H., Loots, M., Trujillo, M.F. and Bainbridge, D. (2002) “The
promise of digital libraries in developing countries.” The Electronic
Library, Vol. 20, No. 1, pp. 7–13.
Witten, I.H. and Bainbridge, D. (2002) How to build a digital library. Morgan
Kaufmann, San Francisco.
Insitut d'Informatique - 19/10/2000