humanist @ roua.org : - fais que ton rêve soit plus long que la nuit

semantic libraries

Written by Romeo Anghelache

Why semantic libraries

1. Context

Scientific research and its use by the public are strongly affected by the way authors, librarians and publishers interact.

The fast evolution of the digital environment has brought this interaction to a crisis: today, scientists author their articles with non-semantic software tools (tools designed around certain types of media rather than around the concept of a semantic document), while librarians and publishers try to recover, sloppily, as much ad hoc semantics as they can in order to serve their online users (among whom are the researchers themselves). This semantics recovery effort is also one of the reasons for the recent price escalation by commercial publishers, a phenomenon which ignited reactions such as the Open Access initiative.

Ignoring the lack of semantic depth in scientific documents produced with traditional tools will only lengthen the current crisis; any other way of resolving this conflict only masks its primary cause: the high cost of dealing with digital documents built on shallow semantics.

Without a semantic authoring language focused on the domains relevant to scientific authors, and to the librarians and publishers involved in their research process, it is virtually impossible to substantially improve the quality of modern research activity or to pull it out of the current crisis in the flow of scientific information.

Current digital technologies allow a better, systemic and long-term approach to the process of building science and its history, within an Open Access paradigm.

I salute the plans for European digital libraries, and I hope to get directly involved in them.

To construct a clear picture, I propose three definitions: semantic library, functional document, semantic authoring tool.

A functional document is a digital, semantically rich, platform-independent document which allows reuse, data mining and interoperation with other digital documents or applications by providing a list of the digital resources it contains and an interface to them for external entities to use (parse, manipulate).

A semantic library is a digital library built on functional documents.

A semantic authoring tool is software that supports building functional documents.
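
The three definitions can be pictured with a minimal Python sketch. Everything named here (Resource, FunctionalDocument, SemanticLibrary, manifest, select, find) is a hypothetical illustration, not an existing specification; the only things taken from the definitions are that a functional document lists the resources it contains and offers an interface to them, and that a semantic library is built on such documents.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One digital resource carried by a document (hypothetical shape)."""
    resource_id: str
    kind: str      # e.g. "formula", "citation", "dataset" (illustrative kinds)
    content: str


@dataclass
class FunctionalDocument:
    """A document that lists its resources and exposes them to external tools."""
    title: str
    resources: list[Resource] = field(default_factory=list)

    def manifest(self) -> list[tuple[str, str]]:
        """The list of resources the document contains, as (id, kind) pairs."""
        return [(r.resource_id, r.kind) for r in self.resources]

    def select(self, kind: str) -> list[Resource]:
        """Interface for external entities: fetch every resource of one kind."""
        return [r for r in self.resources if r.kind == kind]


@dataclass
class SemanticLibrary:
    """A library built on functional documents."""
    documents: list[FunctionalDocument] = field(default_factory=list)

    def find(self, kind: str) -> list[tuple[str, Resource]]:
        """Mine every document for resources of one kind."""
        return [(d.title, r) for d in self.documents for r in d.select(kind)]
```

In this picture, a semantic authoring tool would be whatever program helps an author produce FunctionalDocument instances instead of flat, media-oriented files.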

2. Proposal

To alleviate the burden of putting documents with shallow semantics into circulation, I propose:

  • the study, design and implementation of a human-friendly, semantically rich authoring language for scientists, along with grammar-based tools able to transform documents authored in this language into machine-friendly documents (e.g. XML), and to round-trip between these two structures (i.e. the authoring-friendly space and the machine-friendly space) while preserving the document's semantics in the process;
  • initiating an international collaborative process to construct domain-specific controlled vocabularies, or to semantically enrich the existing ones (e.g. OpenMath, MathML, MusicML, ChemML), by proposing appropriately focused E.U. research projects and by building collaborative consortia of interested parties (university/research libraries, publishers of science, researchers involved in language structure, data mining, domain-specific vocabularies, semantic annotations, digital ontologies, education etc.);
  • helping, through example, the scientific and education communities to become aware of the benefits of authoring documents with a flexible and layered semantic architecture, for their own use and for that of their readers.
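
The round-trip requirement in the first point can be illustrated with a toy example. The sketch below assumes a made-up, line-based authoring notation ("tag | body", where the tag is either text or a vocabulary:term pair such as the hypothetical math:plus) and an equally made-up XML shape; neither is a proposed language, they only show what "transform and come back without losing semantics" means in practice.

```python
import xml.etree.ElementTree as ET


def to_xml(source: str) -> str:
    """Turn the toy authoring notation into a machine-friendly XML string."""
    root = ET.Element("document")
    for line in source.strip().splitlines():
        tag, _, body = line.partition(" | ")
        if tag == "text":
            # Plain natural-language line.
            ET.SubElement(root, "text").text = body
        else:
            # Line tagged with a controlled-vocabulary term, e.g. "math:plus".
            vocab, _, name = tag.partition(":")
            ET.SubElement(root, "term", vocab=vocab, name=name).text = body
    return ET.tostring(root, encoding="unicode")


def to_source(xml: str) -> str:
    """Turn the XML back into the authoring notation."""
    lines = []
    for node in ET.fromstring(xml):
        if node.tag == "text":
            lines.append(f"text | {node.text or ''}")
        else:
            tag = f"{node.get('vocab')}:{node.get('name')}"
            lines.append(f"{tag} | {node.text or ''}")
    return "\n".join(lines)


source = "text | The sum is written\nmath:plus | a + b"
# The round trip preserves both the wording and the vocabulary tags.
assert to_source(to_xml(source)) == source
```

A real authoring language would need a proper grammar rather than line splitting, but the acceptance criterion stays the same: converting in either direction and back must return the original document.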

No short-term project can really cover all these directions: my experience in the field suggests that there are difficult, subject-specific legacy issues to solve as a prerequisite to benefiting fully from these solutions, while building and extending domain-specific controlled vocabularies is, in principle, a never-ending set of necessarily parallel tasks.

The optimal framework for a concerted approach to these problems is the long-term study, design and creation of a set of Open Source software tools and specifications for authoring scientific documents and for building, with them, semantic libraries for public use.

This direction of research will address the needs of authors, librarians and publishers in a democratic way by continuously incorporating their feedback, making full use of today's typical Internet facilities (e.g. collaborative content management tools and communication standards), so that imbalances like the crisis described in the Context section can no longer appear or last.

The author of this proposal is prepared to get involved in the development of semantic libraries at any level of detail.

3. Benefits

The beneficial consequences of such an effort for modern scientific research are multiple and deep:

  • the semantics used at the authoring stage guides the archiving agents or library engines, which enables a high-quality library service and highly efficient reuse of research results;
  • the librarians and/or the professional groups will have a well-defined framework for developing and refining controlled vocabularies and for building richer semantic structures based on them;
  • the researchers/authors can reuse these vocabularies to structure their documents better, and to make better use of the documents themselves by feeding their structured sections to automata where appropriate;
  • from a library which stores semantic documents, a researcher or student can effectively assemble up-to-date monographs on the fly, based on a class of subjects of interest;
  • the history of science and long-term preservation can be effectively supported, because the archiving process is oriented towards semantics that was made available at the authoring stage by the creators of the document, and this can preserve usability when facing a new, drastic change of media;
  • a more effective scientific exchange is enabled, because the semantic structures can be rendered in the notional space of arbitrary readers;
  • the publishing industry is freed to focus on providing renderings better tuned to specific users, machines or media, thanks to the availability of rich semantics in the original documents.
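
As a rough illustration of assembling a monograph on the fly, the sketch below assumes that each stored section carries subject terms attached at authoring time. The Section class, its fields and the assemble_monograph function are hypothetical names chosen for this example, and "newest first" is just one possible reading of "up-to-date".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    title: str
    subjects: frozenset[str]  # controlled-vocabulary terms set by the author
    year: int
    body: str


def assemble_monograph(sections, wanted):
    """Collect sections tagged with any wanted subject, newest first."""
    wanted = set(wanted)
    chosen = [s for s in sections if s.subjects & wanted]
    chosen.sort(key=lambda s: s.year, reverse=True)
    return "\n\n".join(f"{s.title} ({s.year})\n{s.body}" for s in chosen)
```

The selection works only because the subjects were recorded by the authors themselves; with shallow-semantics documents the same query would have to guess the subjects from the rendered text.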

4. The big picture

  • let scientists easily interleave their own natural language with controlled vocabularies they helped create while authoring,
  • so that librarians can use the layered semantics for long-term preservation and for answering research queries with higher relevance than today,
  • to enable publishers to improve the quality and maintain a low price of their offerings by making their reader-related processing orthogonal to the authoring process,
  • to encourage and build a sustainable concept of scientific self-archiving while simplifying the peer review processes.

For the legacy documents, available in physical form and waiting to be digitized, I have some comments related to their copyright.

Those interested further in this subject may want to read this simple essay on the meaning of scientific documents.

subject and object

Written by Romeo Anghelache

To be able to observe an object, you need to note some number of events; in essence, to count. This number must be larger than the number of events proper to the object (the observed object stays relatively "the same": it has some features that are invariant for the observer during the observation).

So, by definition, the observer (the subject) has a proper (characteristic) unit of time smaller than that of the observed (the object); in other words, a lower entropy (we will see, a bit further on, why and how the proper unit of time of a something is proportional to the entropy of that something).

the representation of "to be"

Written by Romeo Anghelache

And these names, which we invent for the things that are not (yet) part of any "law of nature": how do we choose them? Either we already have some associations in mind, and then the name is, let's call it that, social; or no satisfying association is possible, and then the name is synesthetic, almost arbitrary: it depends on a momentary agreement between your own senses and your mind, an agreement that promises you a slower degradation.

Either way, any name that does not yet belong to a declared, social law "of nature" belongs to an almost individual law, a nonlocal one in any case. A version of a law that is not explicitly/socially verified, but stands in relation to the biology of mammals, say, or to your experiences up to then, or to what you ate on the day you invented the name for a thing you had never met before, never thought before.

An invented name is a psychological agreement between you and yourself; it can be like a revelation, it can give a feeling of completeness, it is a network of threads connecting a heap of apparently uncorrelated events. And if it ties together many of the things your neighbours feel in the same way, that name becomes social; and if it simplifies the solving of some problem, it becomes a law of nature, while it moves other names into irrelevance, into silence; and it shatters other laws, other grammars, which were solving other problems...

representations of being

Written by Romeo Anghelache

Is means limited, and vice versa, as pointed out before.

In describing something which is, the natural tendency is to simplify the expression of this experience, that is, to reduce it to a particular case of a rule, a law of "nature". This is a choice, and, by committing to it, we push some of the rest of the world outside our descriptive reach.

That is, once a set of laws has been chosen to describe things that are, we are left with a rest of existent things which can only be enumerated, only pointed out, things which are out of the picture covered by our choice of laws. Enumerated, not necessarily counted, that is, all we can do is name them. This choice of a name is, at best, the choice of a placeholder for a law, but one without any warranties until a society of observers settles on filling it out.

These enumerable things can only be met, stumbled upon, say, touched, and sometimes these events force us to review our previous choice of picture, our previous set of "natural" laws. By modifying our choice, we include the newly met things in our reductionist picture, but we are bound to move some of the old "explained/covered by law" things into the new, only enumerable, set.

The laws we choose to use at any time, we call them "natural", and they are opportunistic choices, according to our wishes, incomplete, as the part of all they are describing: wave/particle, or gravitation/probability.

Our reductionist part of knowledge, whatever our choice is now or may be whenever, is incomplete relative to the incompleteness of is itself.

Briefly: is is already different from all, less, and a law simplifying the less is even lesser, because the law is ours.

To put it differently: all(nothing) > set of primary signs (is, etc.) > set of laws(relational attributes, grammars).

version 0.9.3

Written by Romeo Anghelache

Hermes version 0.9.3 is online:

  • the library document is made of sections (type envelope, section, subsection, subsubsection, bibliography etc.),
  • it has better structured metadata,
  • the citations/bibliography have gone semantic too (ids are no longer generated on the fly),
  • some bugs are fixed (DeclareMathOperator of AMS-LaTeX works fine now),
  • the publishing stylesheet is more elegant and exports the metadata in the typical XHTML fields.

The example collection of converted articles (XML+MathML+Unicode) is available here.

enjoy.
