Diachronic publishing and the Octopus document

March 16th, 2006 by Romeo Anghelache

This piece has been reachable on my website at a different URL since 1999, but I have just decided to move it here.
****

Abstract (of a talk at IUK99), 1999, March 24

A possible way of maintaining the validity and accessibility of references in an electronically published scientific document is presented. A definition is attempted of an Active Brokers Network, a technical means of maintaining valid links from the document containing a reference to the referenced documents. Such tools would enable diachronic publishing.

Detail
Science is built step by step, starting from an idea, axiom, line of reasoning or empirical result, so each further step must reference the earlier one(s). That is, a reader of a scientific paper (A) must have access to the context, foundation, premises and details of what is read. Therefore (A) contains references, which must remain valid over time if (A) and the citations used in it are to remain a comprehensive document.

Many of us still spend a lot of time searching for and piling up cited papers in order to cover most of the subject of the paper being read. An Internet-based solution for citing/referencing naturally has a dynamic character, as opposed to a printed paper containing static pointers to other sources of information.

We read some electronically stored documents (Bx) and want to cite them when writing a document (A). We can do that in HTML, but what if one or all of those (Bx) documents are moved to another public place? A cited scientific paper (B) remains valid as a referenced document if it remains accessible and its content doesn’t change over time. How can we keep the reference valid?

Some partially active solutions for keeping it accessible might be:

* (HTML) (B) leaves traces and the reader of (A) tracks them from the initial location and updates the reference to (B) to its final location; this is obviously inefficient because traces can be erased, or can become very long.
* (XML) (A) contains multiple references to the already mirrored (B); we encounter the same inefficiency because of the static character of the references.
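
The weakness of the trace-following approach can be illustrated with a minimal sketch. It assumes that each old location stores a forwarding trace to the next one. The names `follow_traces`, `traces` and `live`, and the example locations, are hypothetical and only serve the illustration.

```python
from typing import Dict, Optional, Set


def follow_traces(start: str, traces: Dict[str, str], live: Set[str],
                  max_hops: int = 10) -> Optional[str]:
    """Follow forwarding traces from `start` until a live copy of (B) is found.

    Returns the final location, or None if a trace was erased, the chain
    loops, or it needs more than `max_hops` hops.
    """
    location = start
    visited = {start}
    for _ in range(max_hops + 1):
        if location in live:
            return location
        nxt = traces.get(location)
        if nxt is None or nxt in visited:
            return None  # erased trace or circular chain
        visited.add(nxt)
        location = nxt
    return None  # chain too long


# (B) moved twice; only the last location still holds the document.
traces = {"site1/b.html": "site2/b.html", "site2/b.html": "site3/b.html"}
live = {"site3/b.html"}
found = follow_traces("site1/b.html", traces, live)

# The same chain with the second trace erased can no longer be resolved.
broken = follow_traces("site1/b.html", {"site1/b.html": "site2/b.html"}, live)
```

A single erased trace makes (B) unreachable from (A), even though (B) is still publicly available.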

Entirely active solutions might be more promising, so where should we implement the “active” mechanism: in (A), in (B) or at a point between them? (A) would generate traffic by periodically checking the location of (B), while (B) doesn’t know or care about (A). Therefore we need activation at a midpoint: an Active Brokers Network (ABN) which (B) beeps when it moves. (A) declares its own properties to an AB point (ABP) and cites the (potentially different) ABP to which (B) has already declared its properties.
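
As a rough sketch of this midpoint activation, a single ABP could be modelled as a table from UIDs to current locations. The class and method names (`ActiveBrokerPoint`, `declare`, `beep_move`, `resolve`), the UID format and the example URLs are assumptions made for illustration, not part of any existing system.

```python
class ActiveBrokerPoint:
    """Hypothetical ABP: maps ABN-assigned UIDs to current locations."""

    def __init__(self, name):
        self.name = name
        self._locations = {}
        self._next_id = 1

    def declare(self, location):
        """Register a newly public document and return its UID."""
        uid = f"{self.name}:{self._next_id}"
        self._next_id += 1
        self._locations[uid] = location
        return uid

    def beep_move(self, uid, new_location):
        """Called on behalf of the document when it moves."""
        if uid not in self._locations:
            raise KeyError(f"unknown UID: {uid}")
        self._locations[uid] = new_location

    def resolve(self, uid):
        """Return the current location for a cited UID."""
        return self._locations[uid]


abp = ActiveBrokerPoint("abp1")
uid_b = abp.declare("http://old.example.org/b.html")

# (A) stores only the UID of (B), never its location.
citation_in_a = uid_b

# (B) moves and beeps the ABP; (A) is not touched.
abp.beep_move(uid_b, "http://new.example.org/b.html")
current = abp.resolve(citation_in_a)
```

Because (A) cites the UID and not the location, the reference stays valid after the move without (A) having to poll (B).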

The properties (A) should declare to the ABP might be the classical ones: title, author, date of creation, keywords, its (ABN assigned) Unique Identifier (UID), and the ABN UIDs of the cited documents.

Therefore the ABN is a distributed database of the above properties, which should relate UIDs to the (older) referenced UIDs. A query on it should reveal paths through series of scientific articles that have a property in common. Obviously, references point backwards in time; this feature may be used as part of the validation criteria for an electronic reference and/or as a simplifying constraint on searching.
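
A minimal in-memory sketch of such a query follows, assuming the whole database fits in one dictionary (a real ABN would be distributed). The `Record` fields mirror the properties listed above. The function names and sample records are hypothetical, and allowing a citation between two documents with the same date is an assumption of the sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    uid: str
    title: str
    author: str
    created: date
    keywords: frozenset
    cites: Tuple[str, ...] = ()


def points_backwards(db: Dict[str, Record], citing: str, cited: str) -> bool:
    """Validation criterion: the cited document must not be newer."""
    return db[cited].created <= db[citing].created


def keyword_paths(db: Dict[str, Record], start: str, keyword: str) -> List[List[str]]:
    """All maximal citation paths from `start` through records sharing `keyword`."""
    if keyword not in db[start].keywords:
        return []
    paths: List[List[str]] = []

    def walk(uid: str, path: List[str]) -> None:
        nexts = [c for c in db[uid].cites
                 if c in db and c not in path
                 and keyword in db[c].keywords
                 and points_backwards(db, uid, c)]
        if not nexts:
            paths.append(path)
            return
        for c in nexts:
            walk(c, path + [c])

    walk(start, [start])
    return paths


db = {
    "b1": Record("b1", "B1", "X", date(1990, 1, 1), frozenset({"percolation"})),
    "b2": Record("b2", "B2", "Y", date(1995, 1, 1),
                 frozenset({"percolation", "random walks"}), ("b1",)),
    "b3": Record("b3", "B3", "Z", date(1997, 1, 1), frozenset({"optics"})),
    "a": Record("a", "A", "W", date(1999, 3, 24),
                frozenset({"percolation"}), ("b2", "b3")),
}
paths = keyword_paths(db, "a", "percolation")
```

The backwards-in-time constraint also prunes the search, since a path can never return to a newer document.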

In this view, a new (A) document should actually look like a folder containing, besides its internal objects (texts, equations, data tables, scripts, graphics...), other folders representing the referenced (B) documents, and so on recursively. That is, every (A) becomes a virtual and distributed file system based on the ABN.
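
The recursive folder view can be sketched as a nested dictionary built from the citation relation. The function `as_folder` and its two input tables are hypothetical; the guard against revisiting an ancestor is a precaution, since references that point backwards in time should not form cycles.

```python
def as_folder(uid, objects, cites, _ancestors=()):
    """Expand a document into a folder of its own objects and cited folders.

    `objects` maps a UID to its internal objects; `cites` maps a UID to the
    UIDs it references. Both tables are assumptions of this sketch.
    """
    folder = {"objects": list(objects.get(uid, [])), "references": {}}
    for cited in cites.get(uid, []):
        if cited == uid or cited in _ancestors:
            continue  # defensive: never descend into a cycle
        folder["references"][cited] = as_folder(
            cited, objects, cites, _ancestors + (uid,))
    return folder


objects = {"A": ["text", "data table"], "B": ["text"], "C": ["equations"]}
cites = {"A": ["B"], "B": ["C"]}
tree = as_folder("A", objects, cites)
```

Here `tree["references"]["B"]["references"]["C"]` is the folder of (C) as seen from inside (A), two citations away.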

The addition of a document to an ABN may be done through an enhancement to current operating systems, e.g. adding an active “public/local/private” property beside the passive “rwx” ones. When a document gets the “public” attribute, the operating system managing it should beep its creation/movement to the nearest ABP.

The ABPs could implement various mechanisms, such as obsoleting their documents (but better not: if we want a history of science, once a scientific document is made “public” it must be frozen and cited as it is), or extinction: a document which is not cited at all for a long time (say, ten years) should disappear (this way, scientific garbage is thrown out simply by Time).
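
The extinction rule might be sketched as follows, assuming the ABP keeps the creation date of each document and the date of its most recent citation. The function name, the two tables and the approximation of a year as 365.25 days are assumptions of this sketch.

```python
from datetime import date, timedelta


def extinct_uids(created, last_cited, today, idle_years=10):
    """Return the UIDs that have gone uncited for more than `idle_years`.

    A document that was never cited is measured from its creation date.
    """
    cutoff = today - timedelta(days=round(365.25 * idle_years))
    gone = []
    for uid, born in created.items():
        last_activity = last_cited.get(uid, born)
        if last_activity < cutoff:
            gone.append(uid)
    return sorted(gone)


created = {
    "old-uncited": date(1985, 1, 1),
    "old-cited": date(1985, 1, 1),
    "recent": date(1998, 6, 1),
}
last_cited = {"old-cited": date(1997, 3, 1)}
gone = extinct_uids(created, last_cited, today=date(1999, 3, 24))
```

Only the old document that nobody cited is selected; the equally old but recently cited one survives.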

Possible results of implementing the Active Brokers Network:

* General
  o boneless, flexible documents (Octopus) which can incorporate knowledge related to a subject over time, i.e. diachronic publishing. The Octopus document becomes a continuously growing monograph written by several authors, and thus it tends to form an exhaustive/comprehensive unit of scientific knowledge.
  o a keyword search in such a distributed database would reveal paths (unidirectional in time) of referenced/referring papers, creating ad-hoc dynamic documents structured around and focused on a specific scientific subject. This brings to mind percolation and all the mathematics/physics associated with random walks, which may allow a deeper formal understanding of information retrieval and structured knowledge.
  o a search by keyword AND author may reveal true schools of thought in science.
* Publishing
  o the refereeing of (B) can be done by the authors of the (Ax) documents which cite (B), who are interested in and aware of its content, rather than by hidden, possibly indifferent, readers.
  o the editor’s selection process for an issue of a review can be done by simple path selections through these Octopus documents.
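
The refereeing idea above amounts to a reverse lookup in the citation data: who cites (B)? A minimal sketch, with a hypothetical `candidate_referees` function and made-up author labels, and with the extra assumption that self-citations do not count:

```python
from typing import Dict, List


def candidate_referees(target_uid: str, documents: Dict[str, dict]) -> List[str]:
    """Authors of documents citing `target_uid`, excluding its own author."""
    own_author = documents[target_uid]["author"]
    referees = {
        doc["author"]
        for doc in documents.values()
        if target_uid in doc["cites"] and doc["author"] != own_author
    }
    return sorted(referees)


documents = {
    "B": {"author": "author_b", "cites": []},
    "A1": {"author": "author_x", "cites": ["B"]},
    "A2": {"author": "author_b", "cites": ["B"]},  # self-citation, ignored
    "A3": {"author": "author_y", "cites": ["B", "A1"]},
    "C": {"author": "author_z", "cites": ["A1"]},
}
referees = candidate_referees("B", documents)
```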
