Journal Articles

CVu Journal Vol 14, #5 - Oct 2002

Title: 4DML Revisited

Author: Administrator

Date: Thu, 03 October 2002 13:15:54 +01:00

I'm not happy with the way I explained 4DML in C Vu 14.4 p26, particularly the way the "depth" dimension seemed to be grafted on to the design almost arbitrarily. My description followed the historical order of the design process, but it may be clearer to set that order aside and take another approach.

Attributes as Co-ordinates

If you want to represent a sparse N-dimensional matrix of objects (i.e. an N-dimensional space that contains objects), then each object must somehow be associated with N numerical values, to position it on each of the N dimensions. You can think of these N values as extra attributes that the object has. You can also do the reverse: take a collection of objects that all possess certain numerical attributes, and treat these attributes as dimensions in N-space; an object's co-ordinates are then given by its values of those attributes.
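
As a minimal sketch of this idea (in Python, chosen here only for illustration; the article does not show the prototype's own data structures), the following stores a few hypothetical objects with two numeric attributes, `bar` and `beat`, and treats those attributes as co-ordinates in a sparse two-dimensional matrix. The attribute names, data and helper functions are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical objects: an opaque payload plus numeric attributes.
objects: List[Tuple[str, Dict[str, int]]] = [
    ("C", {"bar": 1, "beat": 1}),
    ("E", {"bar": 1, "beat": 2}),
    ("G", {"bar": 2, "beat": 1}),
]

# The attributes we choose to treat as axes (N = 2 here).
DIMENSIONS: Tuple[str, ...] = ("bar", "beat")


def coordinates(attrs: Dict[str, int]) -> Tuple[int, ...]:
    """Read an object's position in N-space off its attribute values."""
    return tuple(attrs[d] for d in DIMENSIONS)


def attributes(point: Tuple[int, ...]) -> Dict[str, int]:
    """The reverse view: turn a co-ordinate tuple back into named attributes."""
    return dict(zip(DIMENSIONS, point))


# A sparse N-dimensional matrix: only occupied co-ordinates are stored.
space: Dict[Tuple[int, ...], str] = {
    coordinates(attrs): payload for payload, attrs in objects
}

print(space[(1, 2)])        # E
print(attributes((2, 1)))   # {'bar': 2, 'beat': 1}
```

Using a dictionary keyed by co-ordinate tuples keeps the matrix sparse: empty cells simply have no entry.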

It is, of course, possible to do this with attributes that are not numerical, so long as they are sortable, or at least comparable, so that two objects that are equal in some attribute can be put at the same co-ordinate on the axis (dimension) that corresponds to that attribute. However, 4DML uses only numerical values, for reasons I'll come back to later.

More generally, objects can have differing numbers of attributes; they are not limited to having a fixed number of them. If we imagine a non-Euclidean space in which objects can have positions on some dimensions but not others, then this space can represent a general collection of objects with arbitrary (numerical) attributes. Working with a non-Euclidean space does make the geometry slightly more interesting, but it turns out that a surprising number of operations are conceptually unchanged.

We now have a way of taking a near-arbitrary collection of objects, arranging them (in N-space) by their attributes, and doing geometric operations on this arrangement. This does not add any functionality, but a lot of people find geometry easier to think about than algebra, so expressing a problem in geometric terms can help.
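
To make the "conceptually unchanged" claim concrete, here is a hypothetical Python sketch in which each object carries positions on only some dimensions (a missing key means "no position on that axis"). A slice (fixing one co-ordinate) and a projection (dropping one axis) work as they would in an ordinary space; objects with no position on the sliced axis simply fall outside the slice. The names `slice_at` and `project`, and the data, are my own assumptions, not 4DML's.

```python
from typing import Dict, List, Tuple

# dimension name -> position; an absent key means "no position on that axis"
Point = Dict[str, int]

# Hypothetical objects that do not all share the same dimensions.
objects: List[Tuple[str, Point]] = [
    ("Title", {"section": 1}),
    ("C", {"section": 2, "bar": 1, "beat": 1}),
    ("E", {"section": 2, "bar": 1, "beat": 2}),
    ("G", {"section": 2, "bar": 2, "beat": 1}),
]


def slice_at(objs: List[Tuple[str, Point]], dim: str, value: int) -> List[Tuple[str, Point]]:
    """Keep objects positioned at `value` on `dim`; objects with no
    position on `dim` are not in the slice."""
    return [(p, pt) for p, pt in objs if pt.get(dim) == value]


def project(objs: List[Tuple[str, Point]], dim: str) -> List[Tuple[str, Point]]:
    """Drop the axis `dim` from every object (a projection)."""
    return [(p, {d: v for d, v in pt.items() if d != dim}) for p, pt in objs]


bar_one = slice_at(objects, "bar", 1)
print([p for p, _ in bar_one])   # ['C', 'E']
print(project(bar_one, "bar"))
# [('C', {'section': 2, 'beat': 1}), ('E', {'section': 2, 'beat': 2})]
```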

Trees as Attributes

The next thing to note is that an object's position in a tree can be represented by a list of attributes. For example, one attribute can indicate which one of the top-level branches the object is to be found under; another attribute can represent which one of the second-level branches to go down; and so on. Since these attributes can also correspond to positions in the non-Euclidean N-space, this shows that the N-space has all the functionality of a tree.
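
A hypothetical Python sketch of this mapping follows. It assumes a toy tree format (a leaf is a string, a branch is a list) and invents attribute names `level1`, `level2` and so on, meaning "which top-level branch", "which second-level branch", counted from 1. The reverse function shows that the attributes carry enough information to rebuild the tree, apart from empty branches, which have no leaves to record them.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical tree format: a leaf is a string, a branch is a list of subtrees.
Tree = Union[str, List["Tree"]]
Leaf = Tuple[str, Dict[str, int]]


def tree_to_attributes(tree: List[Tree], path: Tuple[int, ...] = ()) -> List[Leaf]:
    """Give each leaf one attribute per level: level1 = which top-level
    branch, level2 = which second-level branch, and so on (1-based)."""
    leaves: List[Leaf] = []
    for pos, child in enumerate(tree, start=1):
        here = path + (pos,)
        if isinstance(child, str):
            leaves.append((child, {f"level{i}": p for i, p in enumerate(here, start=1)}))
        else:
            leaves.extend(tree_to_attributes(child, here))
    return leaves


def attributes_to_tree(leaves: List[Leaf], level: int = 1) -> List[Tree]:
    """Rebuild the tree by grouping leaves on level1, then level2, ...
    (empty branches have no leaves, so they cannot be recovered)."""
    groups: Dict[int, List[Leaf]] = {}
    for leaf in leaves:
        groups.setdefault(leaf[1][f"level{level}"], []).append(leaf)
    tree: List[Tree] = []
    for pos in sorted(groups):
        group = groups[pos]
        if len(group) == 1 and f"level{level + 1}" not in group[0][1]:
            tree.append(group[0][0])   # a leaf sits directly at this position
        else:
            tree.append(attributes_to_tree(group, level + 1))
    return tree


doc: List[Tree] = ["x", ["y", "z"]]
leaves = tree_to_attributes(doc)
print(leaves)
# [('x', {'level1': 1}), ('y', {'level1': 2, 'level2': 1}), ('z', {'level1': 2, 'level2': 2})]
print(attributes_to_tree(leaves) == doc)   # True
```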

This is significant because matrix-like data (such as a musical score) is often marshalled ("serialized" in Java-speak) into a hierarchical (tree-like) representation such as XML, and programmers then try to work with it as a tree (using rewriting systems and so forth), without the freedom to perform the many geometric operations they could have applied to the geometric representation. However, the method above, representing the tree as attributes and the attributes as geometry, provides a 'natural' way of converting between the two representations. This can make serialized matrix-like data considerably easier to process (and to transform into completely different serializations), and it can also bring benefits when working with data that would not normally be thought of as matrix-like. As a bonus, several independent hierarchies can be represented over the same data just by merging their attribute lists (so long as there are no naming ambiguities).
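
The round trip can be sketched with Python's standard `xml.etree.ElementTree` module. The scheme here is an assumption, not the 4DML prototype's: each text-bearing element becomes a point whose co-ordinates are keyed by element tag, with the position counting same-tag siblings from 1. Re-serializing with the dimensions nested in a different order turns a part-by-bar score into a bar-by-part one. The sketch ignores XML attributes and tail text and, in line with the caveat above, assumes no tag name is reused at different levels.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[str, Dict[str, int]]


def xml_to_points(elem: ET.Element, coords: Optional[Dict[str, int]] = None) -> List[Point]:
    """Each element's text becomes a point; its co-ordinates are the
    (tag -> sibling position) pairs on the path from the root."""
    coords = coords or {}
    points: List[Point] = []
    if elem.text and elem.text.strip():
        points.append((elem.text.strip(), dict(coords)))
    counts: Dict[str, int] = {}
    for child in elem:
        counts[child.tag] = counts.get(child.tag, 0) + 1
        points.extend(xml_to_points(child, {**coords, child.tag: counts[child.tag]}))
    return points


def points_to_xml(points: List[Point], order: Sequence[str], root_tag: str = "score") -> str:
    """Serialize the points again, nesting the dimensions in `order`."""
    root = ET.Element(root_tag)
    made: Dict[Tuple[int, ...], ET.Element] = {(): root}
    for text, attrs in sorted(points, key=lambda p: tuple(p[1][d] for d in order)):
        key: Tuple[int, ...] = ()
        for dim in order:
            parent = made[key]
            key = key + (attrs[dim],)
            if key not in made:   # create each element once, in sorted order
                made[key] = ET.SubElement(parent, dim)
        made[key].text = text
    return ET.tostring(root, encoding="unicode")


src = "<score><part><bar>C</bar><bar>D</bar></part><part><bar>E</bar><bar>F</bar></part></score>"
points = xml_to_points(ET.fromstring(src))
print(points_to_xml(points, ["bar", "part"]))
# <score><bar><part>C</part><part>E</part></bar><bar><part>D</part><part>F</part></bar></score>
```

Passing `["part", "bar"]` instead reproduces the original serialization.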

Labels for Co-ordinates

Now, the problem is that attributes such as "which top-level branch to go down", "which second-level branch to go down" and so on are not sufficient if those branches have names as well as numbers and we want to be able to set constraints on both. It is, of course, possible to store these names in extra attributes, in additional objects or in some other manner, and all kinds of complex constraints could be designed into the system to make sure that they are used sensibly, but that might make the design too complicated. I preferred to support attributes with pairs of values; that is, each attribute can have both a named value and a numerical value. This can be regarded as attaching a label to each of an object's co-ordinates in the N-space. One obvious extension would be to support more than one label on each co-ordinate, but people who need that functionality should probably be using additional attributes instead.
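
A hypothetical Python sketch of a labelled co-ordinate: each attribute value is a pair of a number and an optional name, and a query can constrain the number, the name, or both. The `Coord` type, the document data and the `select` helper are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Coord(NamedTuple):
    position: int          # the numerical co-ordinate on an axis
    name: Optional[str]    # the label attached to it (may be absent)


Obj = Tuple[str, Dict[str, Coord]]

# Hypothetical document: sections carry names as well as numbers.
objects: List[Obj] = [
    ("Welcome", {"section": Coord(1, "intro"), "para": Coord(1, None)}),
    ("Step one", {"section": Coord(2, "method"), "para": Coord(1, None)}),
    ("Step two", {"section": Coord(2, "method"), "para": Coord(2, None)}),
]


def select(objs: List[Obj], dim: str, position: Optional[int] = None,
           name: Optional[str] = None) -> List[str]:
    """Return payloads whose co-ordinate on `dim` matches the given
    number, the given label, or both."""
    result = []
    for payload, coords in objs:
        c = coords.get(dim)
        if c is None:
            continue
        if position is not None and c.position != position:
            continue
        if name is not None and c.name != name:
            continue
        result.append(payload)
    return result


print(select(objects, "section", name="method"))   # ['Step one', 'Step two']
print(select(objects, "para", position=2))         # ['Step two']
```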

4DML (four-dimensional markup language) is so called because the prototype represents its non-Euclidean N-space as a set of four-dimensional points; each of those points gives a reference to the object that it helps to describe, the attribute that it sets, the value, and the label. For historical reasons the prototype calls these "scope", "depth", "position" and "name" respectively, and lists them in reverse order. 4DML can represent trees and matrices, and blurs the distinction between the two; it can also be hacked to represent arbitrary relationships between objects (objects can be regarded as related if they share a common value of a certain attribute, i.e. they are at the same position on a certain dimension).
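
A minimal Python sketch of the four-dimensional point representation, assuming points are plain tuples in the field order described above (name, position, depth, scope), that depth is an integer identifying the attribute, and that scope is an integer object identifier. The `Point4` type, the data and the helper names are hypothetical. The sketch regroups the points into objects and finds objects that are related because they share a position on one dimension.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Point4(NamedTuple):
    name: str       # the label
    position: int   # the value (co-ordinate)
    depth: int      # which attribute (dimension) is being set
    scope: int      # which object the point helps to describe


points: Set[Point4] = {
    Point4("part", 1, 1, 100), Point4("bar", 1, 2, 100),
    Point4("part", 1, 1, 101), Point4("bar", 2, 2, 101),
    Point4("part", 2, 1, 102), Point4("bar", 1, 2, 102),
}


def objects_from_points(pts: Set[Point4]) -> Dict[int, Dict[int, Tuple[int, str]]]:
    """Regroup the points by scope: object -> {depth: (position, name)}."""
    objs: Dict[int, Dict[int, Tuple[int, str]]] = defaultdict(dict)
    for p in sorted(pts, key=lambda q: (q.scope, q.depth)):   # deterministic order
        objs[p.scope][p.depth] = (p.position, p.name)
    return dict(objs)


def related(pts: Set[Point4], scope: int, depth: int) -> List[int]:
    """Objects at the same position as `scope` on dimension `depth`."""
    mine = {p.position for p in pts if p.scope == scope and p.depth == depth}
    return sorted({p.scope for p in pts
                   if p.depth == depth and p.position in mine and p.scope != scope})


print(objects_from_points(points)[101])   # {1: (1, 'part'), 2: (2, 'bar')}
print(related(points, 100, 1))            # [101]  (same part)
print(related(points, 100, 2))            # [102]  (same bar number)
```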

Not a 'Real' Database

Earlier, I mentioned that 4DML treats only numerical values as co-ordinates, although it does support labels. It is important to realise that 4DML is somewhat different from a conventional database. In most databases, an object can have attributes of various types and the attributes store the data; for example, a record about a book might have the author's name(s) as an attribute (or 'field'). It might then be possible to sort the database by the "author names" attribute. If you wanted to do the equivalent in 4DML, you would have to give each author a position number (effectively pre-sorting them), and then arrange for the name and any other information, including all of the books that s/he wrote, to be stored at that position on an "author" axis of the N-space. The point is that the position itself conveys no information (except, possibly, positioning information); it only serves to categorise and structure the information that is stored in the objects themselves (usually strings), which are opaque to 4DML's organisation. It's like using a database system in which everything is indexed by a unique identifier. This keeps the design simpler and also helps prevent certain kinds of mistake (such as assuming that people's names are unique).
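
A hypothetical Python sketch of the author example: authors are pre-sorted by name, each is given a position on an `author` axis, and the name and the books are stored as opaque payloads at that position. Because the name is only a payload, two different authors called "J. Smith" simply occupy different positions. The `name` and `book` axis names, the data and the helper functions are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[str, Dict[str, int]]

# Hypothetical input: two different authors happen to share a name.
records: List[Tuple[str, List[str]]] = [
    ("J. Smith", ["Book A", "Book B"]),
    ("A. Jones", ["Book C"]),
    ("J. Smith", ["Book D"]),
]


def build_points(recs: List[Tuple[str, List[str]]]) -> List[Point]:
    """Pre-sort the authors, give each one a position on an 'author' axis,
    and store the name and the books as opaque payloads at that position."""
    points: List[Point] = []
    # sorted() is stable, so authors with equal names keep their input order
    ordered = sorted(recs, key=lambda r: r[0])
    for author_pos, (name, books) in enumerate(ordered, start=1):
        points.append((name, {"author": author_pos, "name": 1}))
        for book_pos, title in enumerate(books, start=1):
            points.append((title, {"author": author_pos, "book": book_pos}))
    return points


def books_by(points: List[Point], author_pos: int) -> List[str]:
    """Everything on the 'book' axis at one position on the 'author' axis."""
    return [p for p, c in points if c.get("author") == author_pos and "book" in c]


pts = build_points(records)
print([p for p, c in pts if "name" in c])   # ['A. Jones', 'J. Smith', 'J. Smith']
print(books_by(pts, 2), books_by(pts, 3))   # ['Book A', 'Book B'] ['Book D']
```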

Although 4DML can be used for database-like applications, its main purpose is to represent documents in various notations. Documents are essentially collections of symbols with one or more reading orders (that is, sequences in which the symbols can be arranged); any markup over the symbols should reflect the arrangement and interpretation of the symbols in the document, not the symbols themselves, and by extension, any values of any attributes associated with such markup should also reflect only this. It is possible to hack things differently, but I'd call that an improper use of the design. Perhaps I was wrong to use the analogy of objects and attributes to explain 4DML (it wasn't the way I designed it); the analogy may make things easier to understand, but only if it is taken in the proper context.
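
As a small illustration of "one or more reading orders", the hypothetical Python sketch below places four symbols on `voice` and `bar` dimensions and derives two reading orders simply by changing which dimension is most significant; the symbols themselves are never inspected. The dimension names and the `read` helper are assumptions, not part of 4DML.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical symbols; the payloads are opaque, the markup only places them.
symbols: List[Tuple[str, Dict[str, int]]] = [
    ("C", {"voice": 1, "bar": 1}),
    ("D", {"voice": 1, "bar": 2}),
    ("E", {"voice": 2, "bar": 1}),
    ("F", {"voice": 2, "bar": 2}),
]


def read(syms: List[Tuple[str, Dict[str, int]]], order: Sequence[str]) -> List[str]:
    """One reading order: sort on the listed dimensions, most significant first."""
    return [s for s, pos in sorted(syms, key=lambda item: tuple(item[1][d] for d in order))]


print(read(symbols, ["voice", "bar"]))   # ['C', 'D', 'E', 'F']
print(read(symbols, ["bar", "voice"]))   # ['C', 'E', 'D', 'F']
```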
