semantic libraries

October 5th, 2005 by Romeo Anghelache

Why semantic libraries

1.Context

Scientific research and its use by the public are strongly affected by the way authors, librarians and publishers interact.

The fast evolution of the digital environment has brought this interaction into a modern crisis: scientists today write their articles with non-semantic software tools (tools designed around particular types of media rather than around the concept of a semantic document), while librarians and publishers try, in an ad hoc and lossy way, to recover as much semantics as they can in order to serve their online users (among whom are the researchers themselves).
This effort to recover semantics is also one of the reasons for the recent price escalation by commercial publishers, a phenomenon that ignited reactions such as the Open Access initiative.

Ignoring the lack of semantic depth in scientific documents produced with traditional tools will only prolong the current crisis. Any other way of resolving this conflict merely masks its primary cause: the high cost of dealing with digital documents built on shallow semantics.

Without a semantic authoring language focused on the domains relevant to scientific authors, and to the librarians and publishers involved in their research process, it is virtually impossible to substantially improve the quality of modern research activity or to pull it out of the current crisis in the flow of scientific information.

Current digital technologies allow a better, systemic and long-term approach to the process of building science and its history, within an Open Access paradigm.

I salute the plans for European digital libraries, and I hope to get directly involved in them.

To construct a clear picture, I propose three definitions: semantic library, functional document, semantic authoring tool.

A functional document is a digital, semantically rich, platform-independent document that allows reuse, data mining and interoperation with other digital documents or applications. It does so by providing a list of the digital resources it contains and an interface through which external entities can use (parse, manipulate) them.

A semantic library is a digital library built on functional documents.

A semantic authoring tool is software that supports building functional documents.
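
To make the first two definitions more concrete, here is a minimal Python sketch of how a functional document and a semantic library might look as data structures. All class, field and method names (Resource, FunctionalDocument, SemanticLibrary, list_resources, get, find) are hypothetical illustrations and not part of any existing specification; the only assumption taken from the definitions is that a document exposes a list of its resources and an interface for external entities to use them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Resource:
    """One digital resource held by a document (hypothetical shape)."""
    identifier: str   # local name of the resource, e.g. "eq:1"
    kind: str         # e.g. "formula", "dataset", "citation"
    vocabulary: str   # controlled vocabulary it is expressed in, e.g. "OpenMath"
    content: str      # the resource itself, in the vocabulary's own encoding


@dataclass
class FunctionalDocument:
    """A document that lists its resources and exposes them to other programs."""
    title: str
    resources: List[Resource] = field(default_factory=list)

    def list_resources(self) -> List[Tuple[str, str]]:
        """Return (identifier, kind) pairs: the document's public inventory."""
        return [(r.identifier, r.kind) for r in self.resources]

    def get(self, identifier: str) -> Resource:
        """Return one resource so that an external tool can parse or reuse it."""
        for r in self.resources:
            if r.identifier == identifier:
                return r
        raise KeyError(identifier)


@dataclass
class SemanticLibrary:
    """A library built on functional documents."""
    documents: List[FunctionalDocument] = field(default_factory=list)

    def find(self, kind: str) -> List[Tuple[str, Resource]]:
        """Return (document title, resource) pairs for every resource of a kind."""
        return [(d.title, r)
                for d in self.documents
                for r in d.resources
                if r.kind == kind]


if __name__ == "__main__":
    doc = FunctionalDocument(
        title="A note on divisibility",
        resources=[Resource("eq:1", "formula", "OpenMath", "gcd(a, b)")],
    )
    library = SemanticLibrary([doc])
    print(doc.list_resources())
    print(library.find("formula"))
```

The point of the sketch is only that the inventory and the access interface belong to the document itself, so a library engine does not have to guess at its contents.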

2.Proposal

To alleviate the burden of circulating documents with shallow semantics, I propose:

  • the study, design and implementation of a human-friendly, semantically rich authoring language for scientists, along with grammar-based tools able to transform documents written in this language into machine-friendly documents (e.g. XML) and to round-trip between the two representations (i.e. the author-friendly space and the machine-friendly space) while preserving the document’s semantics;
  • initiating an international collaborative process to construct domain-specific controlled vocabularies, or to semantically enrich existing ones (e.g. OpenMath, MathML, MusicML, ChemML), by proposing appropriately focused E.U. research projects and by building consortia of interested parties (university/research libraries, publishers of science, researchers involved in language structure, data mining, domain-specific vocabularies, semantic annotations, digital ontologies, education etc.);
  • helping, through example, the scientific and educational communities become aware of the benefits of authoring documents with a flexible, layered semantic architecture, for their own use and that of their readers.
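
The round trip named in the first point can be illustrated with a toy example. The sketch below assumes an invented inline notation, [[vocabulary:symbol|surface text]], for marking a controlled-vocabulary term inside natural-language prose; this notation, the element name term, its attributes vocab and ref, and the function names are all hypothetical and are not the proposed authoring language. It only shows that a grammar-based transformation to XML and back can leave the author's text unchanged.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical inline annotation: [[vocabulary:symbol|surface text]]
# Assumption: the surface text contains no "]" and the vocabulary name no ":".
TERM = re.compile(r"\[\[([^:|\]]+):([^|\]]+)\|([^\]]+)\]\]")


def to_xml(source: str) -> ET.Element:
    """Turn one annotated paragraph into a <p> element with <term> children."""
    p = ET.Element("p")
    pos = 0
    last = None  # most recently created <term>, if any
    for m in TERM.finditer(source):
        plain = source[pos:m.start()]
        if last is None:
            p.text = plain        # text before the first term
        else:
            last.tail = plain     # text between two terms
        last = ET.SubElement(p, "term", vocab=m.group(1), ref=m.group(2))
        last.text = m.group(3)
        pos = m.end()
    rest = source[pos:]
    if last is None:
        p.text = rest             # paragraph without any annotated term
    else:
        last.tail = rest          # text after the last term
    return p


def to_source(p: ET.Element) -> str:
    """Inverse of to_xml: rebuild the author-friendly text from the XML."""
    parts = [p.text or ""]
    for term in p:
        parts.append(f"[[{term.get('vocab')}:{term.get('ref')}|{term.text}]]")
        parts.append(term.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    text = "The [[OpenMath:arith1.gcd|greatest common divisor]] of a and b."
    xml_text = ET.tostring(to_xml(text), encoding="unicode")
    print(xml_text)
    print(to_source(ET.fromstring(xml_text)) == text)
```

A real authoring language would need a proper grammar, nesting and escaping rules; the regular expression here only stands in for that grammar.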

No short-term project can really cover all these directions: my experience in the field suggests that there are difficult, subject-specific legacy issues to solve before these solutions can deliver their full benefit, while building and extending domain-specific controlled vocabularies is, in principle, a never-ending set of necessarily parallel tasks.

The optimal framework for a concerted approach to these problems is the long-term study, design and creation of a set of Open Source software tools and specifications for authoring scientific documents and for building, with them, semantic libraries for public use.

This direction of research will address the needs of authors, librarians and publishers in a democratic way, by continuously incorporating their feedback through the facilities the Internet typically offers today (e.g. collaborative content management tools and communication standards), so that imbalances like the crisis described in the Context section can no longer appear or persist.

The author of this proposal is prepared to get involved in the development of semantic libraries at any level of detail.

3.Benefits

The beneficial consequences of such an effort for modern scientific research are multiple and deep:

  • the semantics used at the authoring stage guides the archiving agents or library engines, enabling a high-quality library service and highly efficient reuse of research results;
  • librarians and/or professional groups will have a well-defined framework for developing and refining controlled vocabularies and for building richer semantic structures on top of them;
  • researchers/authors can reuse these vocabularies to structure their documents better and to make better use of the documents themselves, by feeding their structured sections to automated tools where appropriate;
  • from a library that stores semantic documents, a researcher or student can effectively assemble up-to-date monographs on the fly, based on a class of subjects of interest;
  • the history of science and long-term preservation can be effectively supported, because the archiving process is oriented around semantics that the creators of the document made available at the authoring stage, and that can preserve usability through another drastic change of media;
  • a more effective scientific exchange is enabled because the semantic structures can be rendered in the notional space of arbitrary readers;
  • the publishing industry is freed to focus on providing renderings better tuned to specific users, machines or media, thanks to the availability of rich semantics in the original documents.
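
As an illustration of the monograph-on-the-fly benefit listed above, the following Python sketch selects every section tagged with a subject of interest and orders the result from newest to oldest. The Section record, its fields and the function names are hypothetical; the sketch assumes that each section was tagged with controlled-vocabulary subjects at the authoring stage, which is exactly what non-semantic documents lack.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Section:
    """A structured section of a semantic document (hypothetical shape)."""
    source_title: str          # document the section comes from
    heading: str
    subjects: FrozenSet[str]   # controlled-vocabulary terms set by the author
    published: date
    text: str


def select_sections(library: Iterable[Section], subject: str) -> List[Section]:
    """Pick the sections tagged with the subject, newest first."""
    selected = [s for s in library if subject in s.subjects]
    selected.sort(key=lambda s: s.published, reverse=True)
    return selected


def render_monograph(sections: List[Section], subject: str) -> str:
    """Concatenate the selected sections into one plain-text monograph."""
    lines = [f"Monograph on: {subject}", ""]
    for s in sections:
        lines.append(f"{s.heading} (from {s.source_title}, {s.published.isoformat()})")
        lines.append(s.text)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    library = [
        Section("Paper A", "Definitions", frozenset({"number theory"}),
                date(2003, 5, 1), "Text of the first section."),
        Section("Paper B", "Recent results", frozenset({"number theory", "algorithms"}),
                date(2005, 9, 12), "Text of the second section."),
        Section("Paper C", "Unrelated", frozenset({"chemistry"}),
                date(2004, 1, 20), "Text of the third section."),
    ]
    chosen = select_sections(library, "number theory")
    print(render_monograph(chosen, "number theory"))
```

Matching on an exact subject term is the simplest possible choice; a real library engine would more plausibly match on a class of related terms from the vocabulary.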

4.The big picture

  • let scientists, while authoring, easily interleave their own natural language with controlled vocabularies they helped create,
  • so that librarians can use the layered semantics for long-term preservation and for answering research queries with higher relevance than today,
  • to enable publishers to improve the quality and keep the price of their offerings low by making their reader-related processing orthogonal to the authoring process,
  • and to encourage and build a sustainable practice of scientific self-archiving while simplifying the peer review process.

For legacy documents that exist in physical form and are waiting to be digitized, I have some comments on their copyright.

Those interested further in this subject may want to read this simple essay on the meaning of scientific documents.

2 Responses to “semantic libraries”

  1. Humanist @ roua.org » Blog Archive » USA and Europe Says:

    [...] I lived 3 years in Berlin and 1 year in Vienna, the rest (37) in Romania. The Germanic part of Europe seemed much more attentive and eager to discuss about what’s to be fixed. I felt them as healthy societies albeit the climate must affect their overall well-being. Peanut butter is not so good in Europe, unless imported from US, but public transportation, groceries and cafes allow people to socialize, to mature as humans. The high-density of people in Europe pushes many of them to fight to create a necessity and occupy a job to resolve it, meaning there are layers upon layers of people in unnecessary positions. This helps one mature, in general, through the brushing-with-each-other phenomenon, but also drives one to cynicism: non-manufacturing people start manufacturing interferences, smoke and mirrors. High-density population generates plenty of bullshit in Europe. In the background, Europe tries to copy all the free-capital based mistaken optimism, and it will fail similarly, with a variation: it did not assume that oil is forever; otherwise, the same wrong assumption that a higher number of people means a larger market and that this is supposed to solve all the problems. Because of the bullshitters in Europe, I couldn’t get a research grant related to public digital libraries for four tries (one a year) in a row, and that in the context of the “knowledge-based society”. Knowledge-based society my foot. The bullshitters need ignorants to work for them, not documented and inventive neighbors. European bullshitters think Europe can exist as an entity and work for them because it is or becomes a unified market. My hope is that a common language can save the European Union from committing the same basic mistakes as the US. [...]

  2. Humanist @ roua.org » Blog Archive » the 20-year reunion, conclusions on the go Says:

    [...] publicly funded are not accessible to the public), with open access (somewhat like here); I sent the project outline and my CV to the Ministry of Education, the Library of the Academy (Iași and Bucharest), [...]