philosophy - humanist @ roua.org :

semantic libraries

Written by Romeo Anghelache

Why semantic libraries

1. Context

Scientific research and its use by the public are strongly affected by the way authors, librarians and publishers interact.

The fast evolution of the digital environment has brought this interaction into a modern crisis: scientists today author their articles with non-semantic software tools (tools designed around certain types of media rather than around the concept of a semantic document), while librarians and publishers try, in an ad hoc fashion, to recover as much semantics as they can in order to serve their online users (among whom are the researchers themselves). This semantics-recovery effort is also one of the reasons for the recent price escalation by commercial publishers, a phenomenon which ignited reactions such as the Open Access initiative.

Ignoring the lack of semantic depth in scientific documents produced with traditional tools will only prolong the current crisis. Any other way of resolving this conflict merely masks its primary cause: the high cost of dealing with digital documents built on shallow semantics.

Without a semantic authoring language focused on the domains relevant to scientific authors, and to the librarians and publishers involved in their research process, it is virtually impossible to substantially improve the quality of modern research activity or to pull it out of the current crisis in the flow of scientific information.

Current digital technologies allow a better, systemic and long-term approach to the process of building science and its history, within an Open Access paradigm.

I salute the plans for European digital libraries, and I hope to get directly involved in them.

To construct a clear picture, I propose three definitions: semantic library, functional document, semantic authoring tool.

A functional document is a digital, semantically rich, platform-independent document which allows reuse, data mining and interoperation with other digital documents or applications, by providing a list of the digital resources it contains and an interface through which external entities can use (parse, manipulate) them.

A semantic library is a digital library built on functional documents.

A semantic authoring tool is software that provides support for building functional documents.
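To make the definition of a functional document a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the names Resource, FunctionalDocument, list_resources, get and by_kind are hypothetical and do not refer to any existing tool or standard. It shows a document that lists the resources it contains and offers a small interface that external code can use to reach them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """One digital resource contained in a document (hypothetical model)."""
    identifier: str  # name external entities use to address the resource
    kind: str        # semantic role, e.g. "formula", "dataset", "citation"
    content: str     # platform-independent payload, here plain text


@dataclass
class FunctionalDocument:
    """A document that lists its resources and exposes them to external code."""
    title: str
    resources: dict = field(default_factory=dict)

    def add(self, resource: Resource) -> None:
        self.resources[resource.identifier] = resource

    def list_resources(self) -> list:
        """The list of digital resources the document contains."""
        return sorted(self.resources)

    def get(self, identifier: str) -> Resource:
        """The interface through which external entities reach a resource."""
        return self.resources[identifier]

    def by_kind(self, kind: str) -> list:
        """A simple data-mining hook: all resources with a given semantic role."""
        return [r for r in self.resources.values() if r.kind == kind]


# Illustrative content only.
doc = FunctionalDocument(title="Example article")
doc.add(Resource("eq1", "formula", "E = m * c**2"))
doc.add(Resource("ref1", "citation", "Eco, Kant and the Platypus"))
```

A library engine could then call doc.list_resources() to index the document, or doc.by_kind("formula") to collect every formula without having to guess at the layout.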

2. Proposal

As an attempt to alleviate the burden of putting shallow-semantics documents into circulation, I propose:

  • the study, design and implementation of a human-friendly, semantically rich authoring language for scientists, along with grammar-based tools able to transform documents authored in this language into machine-friendly documents (e.g. XML), and to round-trip between these two structures (i.e. the authoring-friendly space and the machine-friendly space) while preserving the document's semantics in the process;
  • initiating an international collaborative process to construct domain-specific controlled vocabularies, or to semantically enrich the existing ones (e.g. OpenMath, MathML, MusicML, ChemML), by proposing appropriately focused E.U. research projects and by building collaborative consortia of interested parties (university/research libraries, publishers of science, researchers involved in language structure, data mining, domain-specific vocabularies, semantic annotations, digital ontologies, education etc.);
  • helping, through example, the scientific and education communities to become aware of the benefits of authoring documents with a flexible and layered semantic architecture, for their own and their readers' use.
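The round trip mentioned in the first point can be sketched in a few lines of Python. This is a toy under stated assumptions, not the proposed language: the "tag: text" line notation and the function names to_xml and from_xml are invented for the example, and only the standard xml.etree.ElementTree module is used. It shows an authoring-friendly text being turned into XML and recovered unchanged.

```python
import xml.etree.ElementTree as ET


def to_xml(source: str) -> str:
    """Turn 'tag: text' lines (a hypothetical authoring notation) into XML."""
    root = ET.Element("document")
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no semantics in this toy notation
        tag, _, text = line.partition(":")
        ET.SubElement(root, tag.strip()).text = text.strip()
    return ET.tostring(root, encoding="unicode")


def from_xml(xml: str) -> str:
    """Recover the authoring notation from the machine-friendly form."""
    root = ET.fromstring(xml)
    return "\n".join(f"{child.tag}: {child.text or ''}" for child in root)


# Illustrative input only.
authored = "title: On semantic libraries\nclaim: shallow semantics is costly"
machine = to_xml(authored)
recovered = from_xml(machine)
```

Here recovered equals authored, which is the property a real grammar-based tool would have to guarantee for a far richer language: nothing the author stated is lost on the way to the machine-friendly space and back.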

No short-term project can really cover all these directions: my experience in the field suggests there are difficult, subject-specific legacy issues to solve as a prerequisite to taking full advantage of these solutions, while building and extending domain-specific controlled vocabularies is, in principle, a never-ending set of necessarily parallel tasks.

The optimal framework for a concerted approach to these problems is the long-term study, design and creation of a set of Open Source software tools and specifications for authoring scientific documents and for building, with them, semantic libraries for public use.

This direction of research will address the needs of authors, librarians and publishers in a democratic way, by continuously incorporating their feedback through the facilities typical of today's Internet (e.g. collaborative content management tools and communication standards), so that imbalances like the crisis described in the Context section can no longer appear or last.

The author of this proposal is prepared to get involved in the development of semantic libraries at any level of detail.

3. Benefits

The beneficial consequences of such an effort on the modern scientific research processes are multiple and deep:

  • the semantics used at the authoring stage gives hints to archiving agents and library engines, which enables a high-quality library service and highly efficient reuse of research results;
  • librarians and/or professional groups will have a well-defined framework for developing and refining controlled vocabularies and for building richer semantic structures on top of them;
  • researchers/authors can reuse these vocabularies to structure their documents better, and to make better use of the documents themselves by feeding their structured sections to automata where appropriate;
  • from a library which stores semantic documents, a researcher or student can effectively assemble up-to-date monographs on the fly, based on a class of subjects of interest;
  • the history of science and long-term preservation can be effectively supported, because the archiving process is oriented towards semantics which the creators of the document made available at the authoring stage, and which can preserve usability through any new drastic change of media;
  • a more effective scientific exchange is enabled because the semantic structures can be rendered in the notional space of arbitrary readers;
  • the publishing industry is freed to focus on providing renderings better tuned to specific users, machines or media, due to the availability of rich semantics in the original documents.

4. The big picture

  • let scientists easily interleave their own natural language with controlled vocabularies they helped create while authoring,
  • so that librarians can use the layered semantics for long term preservation and satisfying research queries with answers of higher relevance than today,
  • to enable publishers to improve the quality and keep the price of their offerings low, by making their reader-related processing orthogonal to the authoring process,
  • to encourage and build a sustainable concept of scientific self-archiving while simplifying the peer review processes.

For the legacy documents, available in physical form and waiting to be digitized, I have some comments related to their copyright.

Those interested further in this subject may want to read this simple essay on the meaning of scientific documents.

representations of being

Written by Romeo Anghelache

Is means limited, and vice versa, as pointed out before.

In describing something which is, the natural tendency is to simplify the expression of this experience, that is, to reduce it to a particular case of a rule, a law of "nature". This is a choice, and, by committing to it, we push some of the rest of the world outside our descriptive reach.

That is, once a set of laws has been chosen to describe things that are, we are left with a remainder of existing things which can only be enumerated, only pointed out: things which fall outside the picture covered by our choice of laws. Enumerated, not necessarily counted; that is, all we can do is name them. This naming choice is, at best, the choice of a placeholder for a law, but one without any guarantees until a society of observers settles on how to fill it in.

These enumerable things can only be met, stumbled upon, say, touched, and sometimes these events force us to review our previous choice of picture, our previous set of "natural" laws. By modifying our choice, we include the newly met things in our reductionist picture, but we're bound to move some of the old "explained/covered by law" things into the new, merely enumerable set.

We call the laws we choose to use at any given time "natural", but they are opportunistic choices, made according to our wishes, and incomplete, like the part of all they describe: wave/particle, or gravitation/probability.

Our reductionist part of knowledge, whatever our choice is now or may be whenever, is incomplete relative to the incompleteness of is itself.

Briefly: is is already different from all, it is less, and a law simplifying the less is even less, because the law is ours.

To put it differently: all (nothing) > set of primary signs (is, etc.) > set of laws (relational attributes, grammars).

definition of being

Written by Romeo Anghelache

In "Kant and the platypus", Umberto Eco notes that there is something, before somebody talks about it.

I think that is is exactly and entirely equivalent to limited. A subject can't notice something without perceiving, or inventing, a limit (or attribute) of that something.

The general case is the universe. Here, we invented a name that means all, that is, unlimited; that is, unlimited by any distinctive feature, because it can't be compared to another all.

Therefore all, the universe, includes everything, and, along with it, anything anybody can imagine beyond everything. The only notion equivalent to all is nothing.

Therefore, because any subject/observer is necessarily a part of all/nothing, or a feature of it, it follows that any subject is bound to consider anything noticeable as being, that it is.

So, is means limited, that is, has attributes, that is, can be talked about or indicated, enumerated.

So, I think that is, or can be talked about, or can be described, or can be pointed at, are entirely equivalent and synchronous attributes of a subject's knowledge.

It also follows that the only something which is in itself, independent of any observer's existence, is all (has all the imaginable and unimaginable properties).

And all, the universe, can be, in a simpler manner, denoted as nothing: the all has all the properties and their opposites simultaneously.

If this picture seems unintelligible to you, note that the number 0 (zero) can be represented as a compensating sum of arbitrary integer numbers, but any part of this combination can be said to have a specific attribute, that is, can be said to be.
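The zero analogy can be checked with a few lines of Python. The numbers are arbitrary, chosen by me for illustration, and the helper name rest_of is hypothetical: the whole sums to nothing, yet every part has a definite value, and each part is exactly compensated by the rest.

```python
# Arbitrary integers whose sum is zero (illustrative values only).
parts = [7, -3, -4]


def rest_of(index: int) -> int:
    """Sum of everything except the part at the given position."""
    return sum(parts) - parts[index]


total = sum(parts)
# Pair every part with what remains once it is singled out.
compensations = [(part, rest_of(i)) for i, part in enumerate(parts)]
```

The total is 0, while compensations pairs each part with its exact opposite: whatever is singled out is balanced by what remains.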

This picture also implies, as a byproduct, that any subject/observer is compensated for by the rest of the universe, which means, operationally, that no observer can be precisely delimited objectively, that is, any defined observer is a convention among many other observers, or, briefly, there is no unique and local definition of an observer/subject. The same applies to anything that can be said to be.

Therefore a unique and local definition of anything is, in fact, a belief, a dogma, a convention, a contract; and needs to be interpreted as nothing more than a proposal.

One has the freedom to agree or not with this proposal, but naturally any choice has consequences. And from here come the criteria we use when making choices: the best choice seems to be the one which also gives you the means to estimate, and check (while you can still do something about it), the consequences you care about. That's one of the reasons why science is preferable to any institutionalized religion.

publicly funded copyright

Written by Romeo Anghelache

If the public of a country funds some research or educational activity which results in an article, book or report, that result should be unconditionally accessible to that public.

In other words, the results of any kind of activity that is at least partially funded from public money should be accessible to the public, right? There's no justification for copyright, then.

Ah, some would say, public money, ok, but access to the results of private research should be paid for. Wait a minute: the public pays for that too. If you buy an apple, or a detergent, you are funding the research of the company which sells you the detergent or the apple. So you have the right to access and use the results.

When you hear that a large company is funding a large musical event, remember it's your own money at work if you ever bought something from them. If not, then it's your neighbour's money, so go thank him for that.

Copyright is a way of getting paid at least twice for the same thing. And it is only encouraged by the people who profit from maintaining the copyright without participating in the creation of the copyrighted work (lawyers, publishers).

In the current form, copyright is just another way of transferring money from those who work, to those who make a business out of handling that work, and outside of that work. Aren't you tired of it?

Sounds too radical, or abstract? Read on.

For example, the Spanish people should have the right to access directly the results of a group of Spanish researchers who seem to have found an effective solution for a certain class of cancers. Clicking on the above link will give you the abstract. Would you like to see the details? Pay 23 USD. But the Spanish people already paid for that.

So, what's the point of the copyright then? The only point is to make money at least once more for those who claim to protect such a concept, without ever getting involved in the real work. The irony is, they are already paid once by the same public, either by private or public funding, or by buying from them different consulting services.

Nobody writes or does something out of thin air: there are research grants people use to write books, and they get a salary for that too, or a raise, from either the government or a private company. And the public pays for both. So the public has the right to access their results.

My point is that whoever structures information has the natural right to be considered the author of that work, and that's all there is to it. Because of that, the author gets known, consulted, hired and paid for those services. Who would hire someone else for help in the specific area where the author committed the work, unless that someone else became a specialist in the same area by making some other work visible?

Besides paying several times for this, everybody's access to the work paid for is effectively cut: copyright stands against progress, it slows down or postpones work built on previous works. If you want to acknowledge the funding of your public, copyleft your work or use a Creative Commons license which ensures others can build their work on yours.

It's relatively cheap these days to provide electronic access to the research the public paid for, because almost everybody is editing on a computer. Don't forget to ask your government for that "free" access, today. It's not free anyway: you already paid for it.

on limited wealth or property

Written by Romeo Anghelache

I was defining, a few days ago, the principle of limited property.

Today, I used Google and Yahoo to search for "the principle of limited property", or "put limits to wealth", or "imposing limits to wealth". Curiously enough, only my blog entry showed up, or nothing at all.

However, playing with alternative formulations of it, I found a page containing something qualitatively equivalent.

But I was expecting most of us to be concerned with this issue, because it's the primary thing which has been shuffling our lives permanently for as long as history has been recorded. Surprise. What have the socialists been doing? Just tuning up the taxes?

I'll keep an eye on this, and will add new links if I discover anything related.
