Meaning of scientific documents

Here I list a few pressing issues in semantic authoring and preservation, as they arise in the digital creation, administration and use of scientific documents, and sketch some present and future solutions to them.

Status of authoring

Several standards for structuring documents have emerged: DocBook for generic documents (books, articles), MathML and/or OpenMath for semantically clean authoring of mathematical expressions, MARCXML for representing and communicating bibliographic records and related metadata, Unicode for the unambiguous specification of international or domain-specific symbols, and so on. Yet authors of scientific documents still write in TeX or in some alternative, proprietary software.

That is, while authoring articles or books the scientist is in the same situation as 10-20 years ago: he has no clue as to how these standards can help him, and no effective tools, open source or otherwise, that would let him use them effortlessly. Why?

Stating the issues

Part of the answer is that neither publishers nor librarians have helped authors become aware of, or concerned with, the fate of their own written works. This awareness was not urgent in the paper publishing era (the article lasts as long as the paper and sits on a shelf), but in the digital era it becomes a real issue: it is easy and cheap to create multiple versions or copies of a digital document, so how can the author make sure that these versions are not corrupted in the process, that their rendering is not broken later (when the reader accesses them), and that they are stored where an indexing machine can find them and list them for the appropriate query?

The answer to this question is of a much higher priority than, say, digital access rights, unless one chooses to protect a corrupted representation of one's work.

The answer is bound to rely on the open standards noted above.

In comparison to these, proprietary formats and proprietary document authoring solutions do not guarantee an appropriate rendering (or meaning) in the future (be it near or far), unless they commit to a standard semantic vocabulary (or a set of them) which should be used by the author while editing his document.

Defining vocabularies with a meaning (that is, with a formally defined way of using them) is an exciting research topic today (the steps and standards needed to create ontologies in the digital era are detailed by others). But one cannot reasonably expect an author to jump from writing plain text or mathematical expressions straight to using ontology-defined concepts: authoring would become tedious and would resemble computer programming. In practice, the author is still helpless to ensure that his work will be reachable and usable after a period of time.

The ontologies are more helpful in extracting and managing the knowledge created by the authors and machines. We are, though, concerned here mainly with the knowledge creation process.

The need for an effective authoring solution, one that is directly useful to machines yet plain and simple for humans to type, is becoming obvious. The solution sketched in the following sections is biased towards protecting the time of human authors.

What do I mean? To whom?

These are common questions in the author's mind: the meaning of his work is its capability of being used for a purpose (whether intended or not).

A handwritten article will have a meaning to an appropriately educated human; a computer typed text will have a meaning to some rendering, printing or indexing software (this is the lowest level semantic layer in a digital document) and a different meaning to the final human reader (presumably the highest level of semantics); again, a scan of an old article will have a meaning for the graphical rendering software, another meaning for the character recognition software and a different meaning for the final human reader.

We note, even if it sounds trivial to some, that an article is in all cases meant to be found, read and used by a human being: it is, in short, a message.

The machines can help in the process: index an article, act in a certain way when a specific expression is found (flag a misspelling, validate an expression or start an external process), advertise the presence of the article to the interested audience, check its consistency according to the available semantic rules, render it on different media, append a reader's comments to a section of it, store it in the appropriate digital library slot and relate its presence to the neighbouring articles, keep a version history of it, and assemble it with other documents at an editor's or library user's request.

These functionalities depend on the availability of the semantic layers in the digital document. A collection of such documents, with the services they enable, would form a semantic library.

Some of these layers can be hinted at by the author: the computer cannot even infer where a paragraph starts unless the author types some specific keys, nor can it relate concepts accurately (and so return effectively useful search results) without the author's hints pointing to a vocabulary of concepts.

Defining a semantic solution

The set of hints should stay as small as possible while maximizing the range of functions in which the document can take part. The fuzzy constraint in this problem is the author's patience: he always has the alternative of creating a semantically flat document, at the cost of his editors' time and of his audience's time and size (a cost which is almost invisible at the time of authoring).

One can name the above requirement: user-friendliness of the authoring package.

But also, the author of a scientific article wants to communicate something and to preserve that message for future readers.

This requirement means: the authored document has to have a well defined structure. Well defined, in turn, means that the document should satisfy the following conditions, at the end of the authoring process:

  • be created in an open, platform-neutral format (XML),
  • contain enough information to locate it (administrative metadata: author, date etc.),
  • contain enough semantic hints for a librarian to store, preserve and manage it (a document structure definition for a validating procedure),
  • contain enough presentational hints for a publisher to render it or relate it to other documents (TeX-like suggestions about how a symbol should look),
  • contain enough hints for the reader to locate and use it (consistent use of semantic vocabularies defined in open standards, e.g. MathML-content, and keywords as often as, and wherever, necessary).
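To make the conditions above a little more concrete, here is a minimal sketch in Python of what checking them could look like. The element names (article, metadata, keywords, body, para) and the sample values are my own placeholders, not taken from DocBook or any other standard; a real tool would validate against a published schema instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, chosen for illustration only.
REQUIRED_METADATA = ("author", "date", "title")


def build_article(title, author, date, keywords, paragraphs):
    """Return an XML element holding metadata, keywords and body text."""
    article = ET.Element("article")
    meta = ET.SubElement(article, "metadata")
    for name, value in (("title", title), ("author", author), ("date", date)):
        ET.SubElement(meta, name).text = value
    keyword_list = ET.SubElement(meta, "keywords")
    for word in keywords:
        ET.SubElement(keyword_list, "keyword").text = word
    body = ET.SubElement(article, "body")
    for text in paragraphs:
        ET.SubElement(body, "para").text = text
    return article


def missing_hints(article):
    """List the hints a librarian or reader would find missing."""
    problems = []
    for name in REQUIRED_METADATA:
        node = article.find(f"metadata/{name}")
        if node is None or not (node.text or "").strip():
            problems.append(f"metadata/{name}")
    if article.find("metadata/keywords/keyword") is None:
        problems.append("metadata/keywords")
    if article.find("body/para") is None:
        problems.append("body/para")
    return problems


# Placeholder values, not real bibliographic data.
doc = build_article(
    title="Sample title",
    author="Sample Author",
    date="2000-01-01",
    keywords=["semantic authoring"],
    paragraphs=["An article is, in short, a message."],
)
print(ET.tostring(doc, encoding="unicode"))
print(missing_hints(doc))
```

The point of the sketch is only that each condition becomes something a machine can test at the end of the authoring process, before the document leaves the author's hands.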

The authoring tool

The requirements above can be satisfied by an authoring tool that lets the author type continuously, without switching between the computer's input devices.

The author should be able to type natural language and expressions belonging to controlled vocabularies. The authoring tool should provide a straightforward way of creating new definitions by combining older ones, or just renaming them into a shorter/friendlier form, adapted to that specific author's needs and habits, without deteriorating the semantics in the background. The author should also be able to apply semantic emphases on portions of the message. These become hints to the search agents, and shrink the dimensionality of the search space. This technique enables a much faster location of, say, particular concepts or subtle differences the authors want to point out. Usage of this latter feature would make circulation and reviewing of concepts more fluid than it is today.
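The idea of renaming or combining definitions without losing the semantics in the background can be sketched as follows. This is only an illustration under my own assumptions: the backslash-prefixed shorthand, the `expand` function and the `urn:example:` identifiers are invented for this sketch, and the markup merely imitates MathML-content.

```python
import re

# A shorthand token is a backslash followed by letters, as in TeX.
TOKEN = re.compile(r"\\([A-Za-z]+)")


def expand(text, definitions, max_depth=10):
    """Replace every \\name token by its definition, repeatedly.

    Unknown tokens are left untouched. Definitions may refer to older
    definitions; circular ones are rejected after max_depth passes.
    """
    for _ in range(max_depth):
        expanded = TOKEN.sub(
            lambda match: definitions.get(match.group(1), match.group(0)),
            text,
        )
        if expanded == text:
            return expanded
        text = expanded
    raise ValueError("definitions nest too deeply or are circular")


# Hypothetical definitions: the long form carries the semantics,
# the short form is what the author actually types.
definitions = {
    "energy": '<ci definitionURL="urn:example:energy">E</ci>',
    "mass": '<ci definitionURL="urn:example:mass">m</ci>',
    "E": r"\energy",  # a friendlier alias for an older definition
}

print(expand(r"The \E of a body of \mass", definitions))
```

The author keeps typing short, personal names, while the stored document only ever contains the expanded, vocabulary-bound form.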

paid online advertisement

Paid online advertising on sites that belong to anyone other than the producers, advertisement listing engines, or the local administration is a category of spam and should be outlawed. In today's terms, only Froogle-like sites or passive/administrative listings of the services/products offered by entrepreneurs should be legal.

That is because: 1. it consumes the involuntary reader's bandwidth (especially the graphics), and that costs money; 2. it clutters the sought-after information with garbage (especially the text ads) and consumes the involuntary reader's time, which also costs money; 3. it mixes marketing power with the real qualities of the products offered; this gives a huge advantage to monopolies, that is, it delays or kills any alternative new solutions, that is, it results in a monoculture and a society of flat brained morons, that is, it goes against any notion of free market economy (if that ever stood for something); 4. it damages the visual and auditory sensibility of any human receiver, as well as his discernment.

These points are aggressions committed at random on members of society. This is nothing less than violence and, as such, it should be stopped.

If this continues, our society builds on distortions, and every member of it is bound to feel the consequences, one way or another.

If you publish paid advertisements for things you don't produce then you are a dealer or reseller, so anything else you have on your site is a fake: a masked intention to make the reader spend his resources on something he wasn't looking for.

Restricting paid online advertising by law doesn't affect anybody's wish to link to products/services they personally prefer, as long as the links are placed for free, not for a fee.

market and prices

The argument that prices in Romania must align with those in the European Union is false. Of course, only those who sell (or those who help them sell) have an interest in saying so: if they manage to convince people, their profit grows.

It is false because the only rational basis for discussion is the average wage in the economy or, perhaps, an even more official figure: gross income per head (GDP). That is the only resource that sets the market in motion or blocks it.

For example, on the apartment market, average prices are supposedly 50,000-60,000 euro for 3 rooms. Hot air: only a shark would pay that much, and only with a plan to earn far more than that after the purchase.

Romanian apartment databases say a lot: 1. the ratio between the number of sale offers and the number of purchase offers is 100 to 1, that is, the market does not exist, and no wonder. 2. there are fake offers: a month-old ad reads "buying an apartment in Berceni/București for 65,000 euro"; come on, in the meantime the peak of the distribution of listed "sale prices" has dropped by 10,000 euro, from 50,000-60,000 to 40,000-50,000, in just 3 weeks. Fake purchase offers serve to maintain the appearance of high prices. The ad above was probably posted by a desperate manager at a real estate agency, who forgot his own lie :).

A price can nevertheless be estimated by simple reasoning: GDP in France is 25,000 euro/year, GDP in Romania is 2,500 euro/year (an optimistic estimate). That is roughly the ratio between our resources and theirs: 10. Average wages stand in a ratio of approximately 1200/150 euro/month, that is, 8.

On average, a 2-room apartment in a relatively (that is, not ultra) central neighbourhood of Paris costs 150,000 euro. How much, then, should a 2-room apartment in a relatively central area of București cost on average (they have about the same floor area)? Clearly 8 to 10 times less: somewhere between 15,000 and roughly 19,000 euro.
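The same reasoning as a few lines of Python, using only the figures quoted in this post (the function name is mine, and the figures are the post's rough estimates, not official statistics):

```python
def scaled_price(reference_price, resource_ratio):
    """Scale a reference price down by the ratio of buyers' resources."""
    return reference_price / resource_ratio


paris_price = 150_000          # euro, 2-room apartment, figure from the post
gdp_ratio = 25_000 / 2_500     # France vs Romania, euro per year
wage_ratio = 1_200 / 150       # average monthly wages, euro

low = scaled_price(paris_price, gdp_ratio)    # 15000.0
high = scaled_price(paris_price, wage_ratio)  # 18750.0
print(f"estimated price range: {low:.0f} to {high:.0f} euro")
```

Plug in any other European market and its ratios to redo the estimate.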

You can do the calculation yourselves, taking any European market as an example (France is close to the E.U. average). Anyone who comes up with a different, supposedly more realistic estimate is in fact describing what they would like to be, not what actually is, and it has nothing to do with the buyers' resources.

E.U. membership does not directly promise growth in either GDP or the average wage. If it does help, GDP or the average wage still remains the basis of any discussion about the market.

The moral: - for those thinking of selling their apartment: do not harbour illusions, although real estate agencies have every interest in feeding them (if an agency manages a single transaction at 3 times the actual price, it can afford to wait for the next one, while the owners who want to sell their home dream of riches at their own expense); a reasonable way around the obstacle set up by intermediaries is to post notices on poles, where that is legal, or to tell your friends and colleagues what you have for sale; do not lend an ear to the rumours of real estate agents, ask instead your neighbour the notary, who makes the real transactions official and is about the only one who has heard of a price actually paid; - for real estate agencies: if you have managed to pull off a windfall or two, consider yourselves lucky, but it is time you understood that a stable, long-term profitable business is not built on windfalls; stop spreading rumours and ask the notaries to inform the public about the value and number of real transactions; - for those who want to buy an apartment: prices are now inflated 3 times over, that is, sale ads amount to "maybe something will turn up", they are not real intentions to sell but scratch-card draws; beware, too, of the rumours of real estate agents.

the end of all the evils ;)

Here it is: in countries where most people live on wages (that is, all the countries I have heard of), there is a natural limit for the total wealth an individual can have: a lifetime of average wages.

I would call this the principle of limited property, and it should belong to your (and my) country's constitution, nothing less.

Most people's property/wealth would not be affected by this principle. Just ask yourself if you, or your friends, would be. And then double-check: if the average monthly salary in your country is currently 1000 monetary units, then your total wealth should be limited to 120 years x 12 months x 1000 = 1,440,000 monetary units (assuming a human life expectancy of 120 years). Compare with how much you own now :).
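If you want to try it with your own country's numbers, the limit is a one-line formula. A small sketch in Python (the function name and the default of 120 years are just the assumptions made above):

```python
def wealth_limit(average_monthly_wage, lifetime_years=120):
    """A lifetime of average wages: years x 12 months x monthly wage."""
    return lifetime_years * 12 * average_monthly_wage


print(wealth_limit(1000))  # 1440000 monetary units
```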

Anyone who manages to amass such wealth at any time can afford to relax for the rest of his life. Someone who is 70 years old still has the right to the maximum wealth, although, most probably, he won't live long enough to enjoy it unless he gambles it.

When the average wage changes, so does the wealth limit, so if there is real progress in your surrounding society, you'll be able to have more.

Some consequences of applying this principle: - the wealthiest among us would be more honestly concerned with the well-being of society as a whole; - they won't be able to twist your government's arm as easily as they do now; - monopolies would be harder to assemble: you need more wealthy people to control anything significant, and more people collaborating means a better society; - the wealthiest people will possibly move to countries where the average wage is higher at that moment, but that means trading the power they had in the state where they grew wealthy for a more ordinary status in the new state: some of them will do it, but most won't, since it's not the wealth but the power of influence it represents that matters to them; - a smaller distance between the wealthiest and the poorest in the country, which will make life in that society less tense.

Do you see other consequences? Express them.

to work, online

... to discuss where we are heading, what makes sense and what doesn't, what is missing and what isn't, what we have to do, in plain Romanian.

