    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: An Introduction to 4DML</title>
        <link>https://members.accu.org/index.php/articles/1179</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Design of applications and programs + CVu Journal Vol 14, #4 - Aug 2002</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c67/">Design</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c77/">CVu</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c113/">144</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/c67-113/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/articles/c67+113/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;An Introduction to 4DML</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> 03 August 2002 13:15:53 +01:00 or Sat, 03 August 2002 13:15:53 +01:00</p>
<p><strong>Summary:</strong>&nbsp;<p>In my research project I have developed a data structure called 4DML (four-dimensional markup language), which is something like a cross between a tree and a matrix.</p></p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e22" id="d0e22"></a></h2>
</div>
<p>In my research project I have developed a data structure called
4DML (four-dimensional markup language), which is something like a
cross between a tree and a matrix. It's not particularly efficient,
but it can be an interesting way of representing things; if you can
think in the appropriate paradigm, there are some tasks that can be
implemented quite rapidly.</p>
<p>Computer Science has always used tree-like structures to
represent things. Mathematical expressions are structured
hierarchically, and so are computer languages, as you would know if
you've ever tried to use a compiler generator. There is a vast body
of theory that deals with storing and manipulating tree-like data
structures for all kinds of purposes. The recent popularity of
generalised markup languages such as XML is a continuation of this
trend.</p>
<p>Much data is also matrix-like in nature. Tables and spreadsheets
are matrices. A musical score (which represents parallel streams of
events) is matrix-like, and so are parallel translations of
literary works. Often it is possible to read a document in several
different ways, using, in effect, several different methods of
indexing into the items of data that make up the document. It can
be useful to treat these indices as the different dimensions of a
multi-dimensional matrix, so that switching from one system to
another amounts to switching to a different dimension.</p>
<p>However, not everything makes sense as a matrix. Sometimes it is
better to use a conventional tree-like hierarchical structure,
particularly if the data's structure is recursive (as is the case
with mathematical expressions, which can contain other expressions
to any depth). But using a hierarchical structure throughout makes
things more complex when they would perhaps be better represented
as matrices.</p>
<p>The 4DML data structure aims to represent data in such a way
that it can be treated as either or both, as appropriate. As a
side-effect, it also allows multiple, independent hierarchies over
the same data. When you read a book, you are looking at at least
two independent hierarchies - the logical hierarchy of headings,
paragraphs and sentences, and the physical hierarchy of pages,
columns and lines. Sometimes it is necessary to store both; for
example, Braille versions of books frequently have special notes to
tell the blind reader where they are in the printed edition, so
that, if they are in a classroom full of sighted people and the
teacher says &quot;now look at the top of the page'', they'll be able to
find where this is in the Braille. Some religious texts have
chapter and verse numbers, which is yet another hierarchy and can
be more useful than layout (life in a diverse classroom would be so
much easier if all teaching materials were numbered like that).</p>
<p>Why might mixing trees with matrices be useful? For one thing,
navigation around the structure becomes easier to program. For
example, imagine a document with several sections. Each section has
several subsections, and each subsection has several paragraphs.
Now imagine you have an iterator (or cursor) that is pointing to
section 2, subsection 3, paragraph 6. If the document is
represented as a hierarchical, tree-like structure, it should be
fairly simple to move that iterator on to the next paragraph or the
previous paragraph, or to go up one level and move it on to the
next or previous subsection. This kind of navigation is sufficient
for most conventional tasks that involve documents.</p>
<p>However, now suppose that the document has a translation into
another language. The translation is represented as a second
document, with the same structure of sections, subsections and
paragraphs as the first. The iterator, as before, is in section 2,
subsection 3, paragraph 6, of document 1. For some applications, it
can make sense to jump from this point to the same point in
document 2, that is, document 2, section 2, subsection 3, paragraph
6. If the documents can only be navigated hierarchically, then this
jump will involve quite a number of navigational steps. In this
particular case, the problem can be addressed by maintaining two
separate iterators (one in each document), but things could get
more complex.</p>
<p>For example, imagine a music book with many songs. Each song has
many translations, each translation has several verses, and each
verse has many words. Each song also has music, which has several
parts for different instruments. The whole thing can be represented
hierarchically, but it would be a fairly complex task to jump from
(song 3, translation 1, verse 2, note 5) to the same place in
translation 2, or to the same place in verse 3, or to the data that
represents which notes are being played at that time. Such jumps
would be required very frequently by certain kinds of typesetting
algorithms.</p>
<p>Problems like the above arise because some hierarchical
representations are not &quot;simply connected&quot;. In a tree-like data
structure, there may be no direct link from, for example, (song 3,
translation 1, verse 2, note 5) to the same place in translation 2,
even though there is a conceptual link as far as some types of
processing are concerned. If conceptual links are not physically
represented in the data structure, then the algorithms that use
them will become more complex and difficult to develop, since there
is a more considerable code overhead for navigating around the
data.</p>
<p>The problem could be addressed by adding more links (pointers)
in the hierarchical structure, but doing this can become
cumbersome. However, adding more dimensions can allievate the need
for such pointers, because the items that are linked are adjacent
to each other in a different dimension. (Of course, the actual
implementation might use references or pointers, but the programmer
doesn't have to think about that.)</p>
<p>To illustrate: take a piece of paper, mark any two points on it,
and then try to fold the paper in such a way that those two points
touch each other. In two-dimensional space (on the piece of paper),
those two points are quite far apart, but in our 3D space, they are
touching. If you were trying to explain all this to imaginary
beings who can only understand two dimensions, then you would have
to say that there is some kind of &quot;wormhole&quot; magically linking the
two points together. But if you can think in three dimensions, you
don't have to invent such things as wormholes - the points are
touching because the paper is folded, and they are adjacent to each
other in the third dimension. Mathematically, the space on the
paper used to be &quot;simply connected&quot; (each place on the paper was
only &quot;touching&quot; the places next to it), but when the paper was
folded, it was no longer simply connected (some places were now
touching other places that were quite far away on the paper).
However, in three dimensions, it is still simply connected.</p>
<p>Then why not work with N-dimensional matrices directly?</p>
<p>Computer memory is one-dimensional; everything is indexed by
memory address. It is possible to represent objects of two or more
dimensions in computer memory, but doing so requires somebody
(either the programmer or the compiler writer) to think about how
to store such things in a one-dimensional space. This can be
particularly difficult if there is no limit to how the object can
grow. Anybody who has tried to write a simple full-screen text
editor from scratch will testify that this is not as easy as a
novice might imagine; as the user moves the cursor on a
two-dimensional screen, the editor has to calculate where it is in
the one-dimensional sequence of characters that makes up the file
being edited.</p>
<p>This problem is made more acute when the number of dimensions is
not constant. Additionally, when representing a document in an
Ndimensional matrix, the resulting matrix is sparse; most of the
data is close to the origin, and it would be very wasteful to
allocate a large Ndimensional array that is big enough to hold the
maximum extent of every dimension.</p>
<p>There are many ways of representing sparse N-dimensional
matrices in computer memory. 4DML is one of these, but it has been
designed in such a way that it can also represent hierarchical
structures, or a mixture of the two. To begin with, we note that an
N-dimensional matrix can be represented as a set of
three-dimensional points:</p>
<p>(dimension, position, object)</p>
<p>In this representation, each object (which we assume can be
uniquely identified) is given a position on each of the dimensions.
For example, if object Q is at co-ordinates (5,8,13,21,34,55) then
the threedimensional point-set will contain the following points
(in no particular order):</p>
<p>(1, 5, Q), (2, 8, Q), (3, 13, Q), (4, 21, Q), (5, 34, Q), (6,
55, Q)</p>
<p>In a real application, it is useful to give the values of the
first co-ordinate names rather than numbers; 4DML terms them
&quot;elements&quot;, since they often reflect the names of XML elements, and
it is important not to confuse a dimension in the N-dimensional
matrix with a dimension of the 3D representation.</p>
<p>The fourth dimension in 4DML arises from the need to also
represent hierarchical structures. It is a depth dimension, and it
is used to disambiguate elements with the same name at different
depths in the tree.</p>
<p>Thus 4DML is a collection of points of the form:</p>
<p>(element, position, depth, object)</p>
<p>These points can represent any hierarchical data structure and
also any N-dimensional matrix, or a combination of the two. They
can also represent several different structures over the same data.
The ordering of the points themselves is not significant, which
means they can be kept in a hash table or other efficient data
structure.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e70" id="d0e70"></a>Transforming
it</h2>
</div>
<p>An algorithm called &quot;transformation by model&quot; takes two 4DML
documents, an &quot;input&quot; and a &quot;model&quot;, and transforms the input as
necessary to reflect the structure of the model. The algorithm can
also report which objects have been lost in the process, if
any.</p>
<p>The algorithm works by performing a top-down traversal of the
model, and reads off relevant parts of the input as it does so.
Since 4DML can be read in many different ways, it is not difficult
to read it in whatever way is dictated by the structure of the
model, regardless of whether or not this matches the original
structure. Hence it is possible to perform complex structural
transformations merely by writing down the form of the desired
result. Of course, it is also possible to follow the structure of
the input like a conventional stylesheet processor.</p>
<p>I won't go into the precise workings of the algorithm here (it
would take too long) but readers are welcome to download and
experiment with the hacky Python prototype on my website, at
<a href="http://www.cus.cam.ac.uk/~ssb22/4dml/" target=
"_top">http://www.cus.cam.ac.uk/~ssb22/4dml/</a>. The prototype can
take its input in XML as well as a few simple text-based languages
specially invented for the purpose (these are usually easier to
hand-code than XML is). It is only a proof of concept though; it
can be used, but it would need more development before it is a
quality software deliverable (but then, academic research is not
about making software deliverables).</p>
<p>Of course, feedback is most welcome, although I cannot guarantee
to reply to all of your hundreds of emails :-)</p>
</div>
</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
