Journal Articles

CVu Journal Vol 12, #6 - Dec 2000


Title: Members' Experiences

Author: Administrator

Date: Sat, 02 December 2000 13:15:41 +00:00


Beware of Causality

Steve Cornish

I was recently migrating a web application from Windows NT4/IIS 4.0 to Windows 2000/IIS 5.0. The aim of the exercise was to validate that the code behaved the same when running on Windows 2000. Needless to say, there were issues, and the rest of this piece describes the most interesting of them. This article assumes the reader is familiar with Microsoft's Internet Information Server (IIS) and COM threading models.

The design (simplified)

The client calls into a web server, which executes an ASP, which uses a COM object, which calls into an NT Service, which creates a new thread to service the request.

What happened

When running this scenario under Windows NT4/IIS 4.0, the application executes fine: the Service returns results, the COM object arranges the content, and the user sees a web page. On Windows 2000, however, the application deadlocks inside the Service. Eventually the call times out, leaving the user with an error page.

Why did that happen?

Because of causality, folks! We need to take a closer look at how IIS works, and at what exactly the Service is doing.

If a COM object marked with threading model "both" is created by an ASP page, IIS loads it into a single-threaded apartment (STA). This means that only one thread ever executes calls on the object. The ASP page calls a method on the COM object, which creates another object (let's say one implementing the interface ISCCallback) to pass as a parameter to the NT Service. The STA thread then blocks whilst waiting for the Service to return.

It is an important detail that this Service is an out-of-process COM server and happens to be multi-threaded. Also, the object implementing ISCCallback does not marshal by value, so the Service receives a proxy, and every call through it has to go back to the object in the ASP's STA.

If the Service were to call a method on the callback object, how could deadlock be avoided, given that the ASP's thread is blocked? The answer lies in something called a causality id. The Service thread inherits the causality id of the client thread that called it. When the Service calls a method on ISCCallback, the COM runtime realises that the object lives in a blocked STA. It compares the causality id of the calling thread with that of the STA thread and, if they match, allows the STA thread to temporarily ignore its blocked status to service the callback. This makes sense.

However, our Service does not make a straight callback; instead it creates a worker thread. The RPC thread (the one servicing the incoming call) then sleeps until the worker thread has finished, and it is the worker thread that performs our callback.

Child threads do not inherit causality ids from their parents. When the COM runtime compares the worker thread's causality id with the STA's, there is no match, so it does not let the STA thread unblock, and the worker thread blocks in turn. Because the worker thread is blocked, it never finishes and so never signals the RPC thread to continue; hence the eventual timeout for the ASP page. The underlying behaviour makes sense: otherwise you could create multiple threads that all call into an STA, which defeats the purpose of STAs.
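To make the chain of reasoning concrete, here is a toy model in Python. It is not COM code and no real COM API is used or implied; it assumes only what the text describes: a COM call passes the caller's causality id to the thread that services it, a freshly created thread gets a new id, and a blocked STA admits only calls whose id matches its own outgoing call. The names LogicalThread, BlockedSTA, com_call and spawn_child are hypothetical.

```python
import itertools
from dataclasses import dataclass, field

# Toy model only: real COM tracks causality inside the runtime; nothing here is a COM API.
_next_id = itertools.count(1)


@dataclass
class LogicalThread:
    """A thread of execution tagged with a causality id."""
    causality_id: int = field(default_factory=lambda: next(_next_id))


class BlockedSTA:
    """An STA whose single thread is blocked in an outgoing call."""

    def __init__(self, outgoing_call_id):
        self.outgoing_call_id = outgoing_call_id

    def admits(self, caller):
        # Re-entry is allowed only for calls belonging to the same causal chain.
        return caller.causality_id == self.outgoing_call_id


def com_call(caller):
    """A COM call: the thread servicing it inherits the caller's causality id."""
    return LogicalThread(caller.causality_id)


def spawn_child(parent):
    """A plain child thread: it gets a fresh causality id and inherits nothing."""
    return LogicalThread()


asp_thread = LogicalThread()               # the ASP page's STA thread
sta = BlockedSTA(asp_thread.causality_id)  # blocked, waiting for the Service
rpc_thread = com_call(asp_thread)          # the Service's RPC thread
worker = spawn_child(rpc_thread)           # the worker the Service creates

print(sta.admits(rpc_thread))  # True: a direct callback is let in
print(sta.admits(worker))      # False: the worker's callback blocks
```

In this model the direct callback from the RPC thread is admitted, while the worker's callback is refused; since the RPC thread is waiting for the worker, neither can make progress.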

The blocking of the worker thread is the behaviour observed under Windows 2000, which uses the COM+ runtime. Under Windows NT4 the worker thread's callback succeeds, even though it breaks the causality rules.

Conclusion

When first run on Windows 2000, the application hung. My first reaction was annoyance, because the app runs perfectly well under NT4. On investigation we gradually realised that it should not have run even on NT4. Of course, the very fact that the Service was making a cross-process callback was cause for a brief redesign (there was no reason for the object to live anywhere other than in the Service). MSDN contains hardly any information on causality, which is why I am sharing this experience with you all. Hopefully, if you come across an app that fails under Windows 2000 when it used to work under NT4, you can rub your chin sagely and say, "beware of causality!"

Wikis for Workgroups

Bryan Scattergood

Take back your whiteboards

I changed jobs at the end of the summer. By then I had spent a decade working with a research group, both as a student and at a related company. Whiteboards are essential in such an environment: they provide the space to thrash out ideas. Unfortunately, they are also used as giant post-it notes. The edges silt up with debris like phone numbers, to-do lists, and reminders. Discussions are crammed into the clear patches because erasing the board might destroy important information (whiteboards are also hard to back up).

Moving the debris onto our network was clearly a good idea, but I could not figure out how to make it happen. Our internal web server was great for static documentation, but updating material was a pain. A web front-end to a database was a possibility, but we could not agree what the solution would look like, never mind finding the time to build it. We even considered commercial software, but the available groupware solutions are all proprietary, expensive, complicated, and widely loathed by those forced to use them.

Then, during a session at the spring JACC, someone mentioned the original WikiWikiWeb [1] used by the patterns and extreme-programming communities. I suddenly realised that this could be the solution I had been looking for.

What's a Wiki?

There are two aspects to this question: technical and social.

In a technical sense, a Wiki is simply a writable, extensible web site. Click on a page's edit button and the source for the page appears in a text field in your browser. Click the submit button and your changes are uploaded to the site. Refer to a new page and it will be created when you follow that link. Simple "structured text" conventions greatly reduce the need to enter markup tags. The mechanics are as simple as they can be without needing additional features in your web browser.
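The "refer to a new page" mechanism can be sketched in a few lines of Python. This is a minimal illustration, not ZWiki's or the original WikiWikiWeb's code: it assumes the CamelCase "WikiWord" convention for page names, and the '/edit' URL used for creating a page is hypothetical.

```python
import html
import re

# Assumed convention (as on the original WikiWikiWeb): a page name is a
# "WikiWord" made of two or more capitalised runs, e.g. FrontPage, ToDoList.
WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")


def render(source, existing_pages):
    """Escape the page source, then turn each WikiWord into a link."""
    def link(match):
        name = match.group(0)
        if name in existing_pages:
            return f'<a href="{name}">{name}</a>'
        # Unknown page: show a "?" link; following it would create the page.
        return f'{name}<a href="{name}/edit">?</a>'

    return WIKI_WORD.sub(link, html.escape(source))


print(render("See ToDoList and FrontPage.", {"FrontPage"}))
```

Because a page comes into existence simply by being mentioned and then followed, nobody has to design the site's structure in advance.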

The social aspect is more complicated. A Wiki provides a space in which people can interact. Mailing lists and newsgroups do the same, but the character is different. While newsgroup threads eventually drift and die, each page in a Wiki is a collaborative work in progress. It may settle down to a stable state, but it is still available for reference from other pages.

The pattern and extreme programming communities have made use of the original Wiki, and visiting it can help give a feel for the tone of a successful community, but a Wiki can be used for whatever you want. My hope was that I could use one to hold the clutter from our whiteboards.

Finding a Wiki

Having decided that a Wiki sounded promising, I went looking for one. I found dozens [2] and settled on one called ZWiki™ [3], which is just one of many extension modules for Zope™ [4].

Zope™ is a web-application server. It is cited by Eric S Raymond [5] as an example of the "give away the recipe, open a restaurant" strategy for open source software. ZWiki™ inherits many desirable features from Zope™, including web-based administration, access control, rollback, hooks to external databases, and user-written modules.

Living with ZWiki™

Zope™ can be installed on most platforms (including Win32), either alongside an existing web server (via CGI) or as a web server in its own right. We chose to deploy it and ZWiki™ on an elderly 64MB P166 box running FreeBSD™ 4 [6]. It has run flawlessly for months.

After announcing that our Wiki was available, I transferred my to-do list from a whiteboard and settled back to see how other people were going to use it. Pages for the other technical staff appeared and gradually filled with some of the items from the whiteboards, after which we dumped the remaining debris onto one page and ceremonially cleaned the boards.

Descriptions of our network and backup procedures were added. Some fun articles appeared. It was used to collaborate on the guest list for an anniversary party, and for the specification and ordering of a PC cluster. A small extension module allowed ZWiki™ pages to monitor and control our dial-up connection.

After less than six months our Wiki had become a valuable resource. Important company information - previously only available by asking key staff or by rummaging through daybooks - was easily shared, searched, updated and backed up. All of these are major benefits, and because the software is open source the only cost was a few days of staff time performing the installation.

Conclusion

Wikis are great tools for workgroups. The software is free, the hardware requirements are reasonable, you control the structure and contents, and staff will find the best way to use it over time. This compares well with traditional "corporate groupware."

Acknowledgements

My former colleagues Paul Whittaker and Phil Armstrong took an empty Wiki and ran with it, helping it to grow into a useful resource. They also provided valuable comments on early drafts of this article.

References

[1] http://c2.com/cgi/wiki

[2] http://c2.com/cgi/wiki?WikiWikiClones

[3] http://joyful.com/zwiki

[4] http://www.zope.org/

[5] Eric S. Raymond, The Cathedral and the Bazaar: Musings on Linux and Open Source, O'Reilly, 1999, ISBN 1-56592-724-9. Also available from http://www.tuxedo.org

[6] http://www.freebsd.org/

Apache Cocoon and XML Publishing

Silas Brown

The Apache Cocoon™ project is an open source XML publishing system, written in Java. Given the recent prevalence of XML (eXtensible Markup Language) and a host of related acronyms, I decided to give it a try.

If you have written any Web pages then you hopefully know what HTML looks like. HTML (HyperText Markup Language) is a "markup" language: it puts "tags" around text to suggest formatting and so forth, and a Web browser renders the result. An XML file looks very similar to an HTML file, except that the meaning of each tag is not defined; how to interpret the file is up to the program reading it. This makes XML a popular format for data files with arbitrary hierarchical structures; a good XML library takes away most of the work of writing your "saving and loading" code, and you stand a chance of exchanging data with other XML programs (although this is ridiculously overhyped: XML does not magically make all programs compatible with each other; you still have to interpret the tags correctly).

XML also has a stricter syntax than HTML (and indeed than SGML, the Standard Generalised Markup Language, from which both HTML and XML are derived). For example, in HTML some tags, such as <H1>, need closing tags such as </H1>, but others, such as <HR>, are not closed. In XML every tag must be closed (you can abbreviate by writing, for example, <BR /> instead of <BR></BR>). This restriction simplifies parsing and validation. There is a variant of HTML called XHTML, which is XML compliant (every tag must be closed); existing Web browsers have no trouble with it because they just ignore the surplus closing tags.
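Python's standard xml.etree parser is enough to show the difference. It checks well-formedness only (it does not validate against a DTD), and the sample strings are made up; the last one also shows that XML tag names are case sensitive.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the string parses as XML (well-formedness only, no DTD)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


html_style = "<BODY><H1>Title</H1><HR><P>Some text</BODY>"      # HR and P never closed
xhtml_style = "<BODY><H1>Title</H1><HR /><P>Some text</P></BODY>"
wrong_case = "<BODY><H1>Title</h1></BODY>"                       # </h1> does not close <H1>

for doc in (html_style, xhtml_style, wrong_case):
    print(is_well_formed(doc))
```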

The idea behind XML publishing is to separate the "content" (the textual data in a document) from the "presentation" (the layout and formatting that a Web browser might perform). XML tags can be used to indicate the abstract structure of the data they enclose, such as <news-item>, <weather-report>, <example> and so on; how to translate these into actual formatting is specified in a different file (which might be maintained by a different person). The potential advantages of this separation are obvious: Content and presentation can be independently maintained and debugged; there can be different versions of the presentation for different browsers, media types (not necessarily limited to HTML), intended audiences, and so on; presentations can select relevant data out of the document rather than having to display all of it; and any text that is presented in multiple places need only be written once. Note that XML does not actually enforce any of this; you still have to use it properly.
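As a sketch of the idea, assuming Python's standard ElementTree rather than any XML publishing system, one content file can be given two presentations: a full HTML list, and a text-only index that selects only some of the data. The tag names and sample text are made up for illustration.

```python
import html
import xml.etree.ElementTree as ET

# Content only: abstract tags, no formatting. Tag names and text are made up.
CONTENT = """<news>
  <news-item date="2000-12-01">
    <headline>Server moved</headline>
    <body>The web server now runs on the new machine.</body>
  </news-item>
  <news-item date="2000-12-02">
    <headline>Meeting &amp; party</headline>
    <body>Details are on the notice board.</body>
  </news-item>
</news>"""


def as_html(root):
    """Full presentation: every item, with the headline in bold."""
    rows = [
        f"<li><b>{html.escape(item.findtext('headline'))}</b> "
        f"{html.escape(item.findtext('body'))}</li>"
        for item in root.iter("news-item")
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


def as_text_index(root):
    """Text-only presentation that selects just the dates and headlines."""
    return "\n".join(
        f"{item.get('date')}  {item.findtext('headline')}"
        for item in root.iter("news-item")
    )


root = ET.fromstring(CONTENT)
print(as_html(root))
print(as_text_index(root))
```

Editing a headline in CONTENT changes both outputs, while restyling the HTML touches only as_html.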

Before XML was all the rage, there was CSS (Cascading Style Sheets), which went some way toward separating content from presentation by separating off details like fonts and colours. CSS is a "client-side technology"; its processing is done by the browser, which means that if you do not want the style sheets then you can turn them off (if you know how), use a browser that does not support them, or get my access gateway to quietly remove them. However, CSS does not really go far enough from the XML point of view; a paragraph is still a paragraph and a table is still a table, no matter what its fonts, colours, borders and margins are, and you still cannot edit a document's content without thinking in HTML. XML allows more abstraction, and because it is "server-side" (the browser only sees the HTML result), it does not introduce additional browser compatibility issues. However, XML is not a replacement for CSS; they can be used together (although you still have to be careful with CSS because there are so many buggy browsers out there).

Emacs has a mode for editing XML. You can write a Document Type Definition (DTD) file, which lists the names of your tags (and attributes) and specifies how they are related to each other (what can go inside what); Emacs can then help you by checking for errors, telling you which tags are valid at the cursor position (and inserting them for you), doing your indentation, hiding away the tags that you don't want to see, and maybe highlighting it all in different colours, but you have to give it the right commands (or put them in your startup file).

Before I could download Cocoon, I tried transforming an XML file into HTML with some "search and replace" operations (first with sed, then with perl). This was surprisingly effective, but it was limited, sometimes difficult, and not very robust. XML publishing systems like Cocoon use scripts in a language called XSL to transform the XML into HTML. XSL is a mixture of HTML (actually XHTML) and processing directives, built around "templates" that match the tags in your XML. XSL files can include other XSL files, call procedures in them, set variables, test conditions, and so on.
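XSL's template matching can be imitated in Python to show the shape of the idea. This is an analogue, not XSL and not Cocoon: the template decorator, the TEMPLATES table and the tag names are all hypothetical, and the default rule mirrors XSL's built-in behaviour of simply processing an element's children.

```python
import html
import xml.etree.ElementTree as ET

# A Python analogue of XSL template rules, not XSL itself: each tag maps to a
# function, and unmatched tags fall through to a default rule.
TEMPLATES = {}


def template(tag):
    """Register a rule for one tag (loosely like <xsl:template match="tag">)."""
    def register(fn):
        TEMPLATES[tag] = fn
        return fn
    return register


def apply_templates(elem):
    """Render an element's text and children (loosely like <xsl:apply-templates/>)."""
    parts = [html.escape(elem.text or "")]
    for child in elem:
        parts.append(render(child))
        parts.append(html.escape(child.tail or ""))
    return "".join(parts)


def render(elem):
    # Default rule, as with XSL's built-in template: just process the children.
    return TEMPLATES.get(elem.tag, apply_templates)(elem)


@template("weather-report")
def weather_report(elem):
    return f'<p class="weather">{apply_templates(elem)}</p>'


@template("example")
def example(elem):
    return f"<pre>{apply_templates(elem)}</pre>"


doc = ET.fromstring(
    "<doc>Today: <weather-report>Rain &amp; wind</weather-report>"
    "<example>x &lt; y</example></doc>"
)
print(render(doc))
```

Unlike a sed or perl substitution, each rule sees a parsed element, so attributes, nesting and escaped characters are handled by the parser rather than by the pattern.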

Cocoon is based on a Java "servlet" that renders the XML (using an XSL stylesheet) "on the fly" when the document is requested by a Web browser (it does cache documents it has already rendered). Installation is complex, but it is easier if Cocoon is installed as a "package" from a Linux distribution (I installed the Debian package).

At the time of writing, Cocoon is very much under development; many features (such as XSP) are not yet implemented. One problem with writing about a large open source project is that it may well have moved on by the time the article is published. Nevertheless, I found a number of shortcomings in Cocoon 1.5 and I feel I should at least mention some of them.

For a Web server, Cocoon is slow, particularly on the first request of a page; it sometimes takes several seconds to respond on my 150MHz processor. The Debian package has scant documentation on XSL; you have to look at examples, use trial and error, or decompile the program. Some parts of the XSL processor are case sensitive, which is not intuitive in markup languages and can be a trap. The XML parser is not fully internationalised; certain character set encodings, such as the ISO-2022 encodings of Chinese, Japanese and Korean, can confuse it (you can still use these languages but you have to encode them in EUC or UTF-8). The XSL commands to examine the browser's query string do not work at all, and the XSL "include" commands must be given absolute pathnames. Error reporting is scant; if you do make a mistake then you might end up with the raw data in your browser, and then Cocoon's cache will not recognise any changes to the XSL until you update the time stamp on the XML file itself. And although an XSL file is itself an XML document, Emacs cannot treat it as such because its "psgml mode" does not support XML namespaces (":" in tags), so you have to edit XSL files in Fundamental mode (this is not Cocoon's problem but it does make debugging more difficult).
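The stale-cache trap is easy to reproduce with a cache keyed on file timestamps. The sketch below is hypothetical and does not reflect how Cocoon is implemented: with watch_stylesheet=False it checks only the XML file's modification time, which gives the behaviour described above, while watching both files avoids it. The "renderer" is a stand-in that just joins the two files.

```python
import os
import tempfile


class RenderCache:
    """Hypothetical render cache keyed on file timestamps (not Cocoon's code)."""

    def __init__(self, render, watch_stylesheet=True):
        self.render = render
        self.watch_stylesheet = watch_stylesheet
        self._entries = {}

    def _stamp(self, xml_path, xsl_path):
        stamp = (os.path.getmtime(xml_path),)
        if self.watch_stylesheet:
            stamp += (os.path.getmtime(xsl_path),)
        return stamp

    def get(self, xml_path, xsl_path):
        stamp = self._stamp(xml_path, xsl_path)
        cached = self._entries.get((xml_path, xsl_path))
        if cached is not None and cached[0] == stamp:
            return cached[1]  # cache hit: sources considered unchanged
        output = self.render(xml_path, xsl_path)
        self._entries[(xml_path, xsl_path)] = (stamp, output)
        return output


def _write(path, text):
    with open(path, "w") as f:
        f.write(text)


def concat_render(xml_path, xsl_path):
    """Stand-in for a real XSL transform: just joins the two files."""
    with open(xml_path) as x, open(xsl_path) as s:
        return x.read() + "|" + s.read()


def demo(watch_stylesheet):
    """Render, edit only the stylesheet, then render again."""
    with tempfile.TemporaryDirectory() as d:
        xml_path = os.path.join(d, "page.xml")
        xsl_path = os.path.join(d, "style.xsl")
        _write(xml_path, "content")
        _write(xsl_path, "style-v1")
        cache = RenderCache(concat_render, watch_stylesheet)
        cache.get(xml_path, xsl_path)
        _write(xsl_path, "style-v2")
        st = os.stat(xsl_path)
        os.utime(xsl_path, (st.st_atime, st.st_mtime + 10))  # make the edit visible
        return cache.get(xml_path, xsl_path)


print(demo(watch_stylesheet=False))  # stale: the XML timestamp did not change
print(demo(watch_stylesheet=True))
```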

At university I help maintain a website for one of the Chinese societies, and I moved it to XML with the aid of Cocoon. The use of XML allows the committee to make changes to the content without having to worry about the complex HTML that their design requires, and the XSL stylesheets can make sure that the various indices on the site are kept up-to-date and the different versions of each page (default, "noframes", and text only) are kept in step. Writing and debugging the XSL files was more effort than I had anticipated, and the site still needed a shell script to get Cocoon to process the files and save the resulting HTML, because this was the only working way of applying more than one XSL stylesheet to the same XML file and allowing the user to choose between versions (and anyway the site had to go on an ordinary Web server).

Given its prevalence, XML is probably worth knowing about, and XML publishing can be useful for managing complex websites, although I would not recommend moving every website to it just for the sake of keeping up. Cocoon shows promise (especially as an Apache project), but only time will tell if it becomes as common as the Apache Web server.

ED: Should You Know It?

Silas S. Brown

<silas@flatline.org.uk>

ED™ is an editor that runs under Unix. It is hardly an ideal programming environment by modern standards, because, rather than displaying the file you are editing and letting you cursor through it, everything is done at a command prompt that accepts cryptic-looking commands to edit a line at a time, and there is very little on-line help. It's a bit like a more powerful version of DOS's EDLIN™ (which was based on it), or the 'editor' that came with BBC BASIC™. So why bother with it nowadays?

The answer is that, particularly if you are involved in Unix system administration (or trying to help someone to install your software), then sooner or later you may have to edit a file using a machine with a telnet client that does not know how to position the cursor and therefore cannot run a full-screen editor, or does know how to position the cursor but is full of bugs that corrupt the display when it does so. Alternatively, you might be doing the same when the remote machine is over a very slow link or on the other side of the planet, and the latency destroys the ease of full-screen editors. I've been in that situation and I found a rudimentary knowledge of ED useful.

ED is invoked by typing "ed" with a filename, and usually responds with the size of the file in bytes. It then waits for a command, such as 1,10n to list lines 1 through 10 with line numbers ('p' instead of 'n' prints them without line numbers, as does 'l', which also makes non-printing characters visible; these are less useful because practically anything you do with ED involves knowing the line numbers). $ represents the last line in the file, and offsets are possible, so $-9,$n lists the last 10 lines and 1,$n lists the entire file (with paging). The "current" line is represented by . (dot); this is initially the last line (so you can type ".a" to append to the file) but is usually set to any line you "do something" with.
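The addressing rules above are simple enough to model. This Python sketch is a toy, not a description of any ed implementation: it handles only numbers, $, . and +/- offsets, and the function names are made up.

```python
import re

# Toy model of ed line addresses: a number, "$" (last line) or "." (current
# line), optionally followed by offsets such as "-9" or "+1". Real ed accepts more.
ADDRESS = re.compile(r"(\d+|\$|\.)?((?:[+-]\d+)*)")


def resolve(addr, current, last):
    """Turn one address into a line number, checking it is inside the file."""
    m = ADDRESS.fullmatch(addr)
    if not addr or not m:
        raise ValueError(f"bad address: {addr!r}")
    base, offsets = m.groups()
    if base == "$":
        line = last
    elif base is None or base == ".":
        line = current  # a bare offset such as "+1" is relative to dot
    else:
        line = int(base)
    line += sum(int(off) for off in re.findall(r"[+-]\d+", offsets))
    if not 1 <= line <= last:
        raise ValueError(f"address out of range: {addr!r}")
    return line


def resolve_range(spec, current, last):
    """Resolve "a,b" (or a single address) to an inclusive (first, last) pair."""
    first, comma, second = spec.partition(",")
    lo = resolve(first, current, last)
    hi = resolve(second, current, last) if comma else lo
    if lo > hi:
        raise ValueError(f"backwards range: {spec!r}")
    return lo, hi


print(resolve_range("$-9,$", current=50, last=50))
print(resolve_range("1,$", current=50, last=50))
```

With a 50-line file, "$-9,$" resolves to lines 41 to 50, the "last 10 lines" of the text.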

For small configuration files, this may be all the navigation you need; you can view the whole file and decide what to change. To change a line, use "c", e.g. 27c changes line 27 - just type the replacement text (it may be more than one line) and end by typing a single dot on a line of its own (as in the SMTP protocol). You can also insert lines - 27i inserts before line 27 and 27a appends after line 27. 'd' deletes (so 1,5d deletes the first five lines). Change ('c') is essentially delete and insert (so you can give it a range of lines too). Also worth knowing is "wq" (write and quit), "q" (quit without saving - you have to enter it twice if the file has been modified), and "u" (undo - but it only remembers one operation).
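The commands above can be modelled as operations on a list of lines. This is a hypothetical Python toy, not ed's internals; it uses 1-based line numbers as ed does, and keeps a single undo level, where a second undo reverts the first.

```python
class Buffer:
    """Toy model of ed's buffer: 1-based line numbers, one level of undo."""

    def __init__(self, lines):
        self.lines = list(lines)
        self._undo = None

    def _checkpoint(self):
        self._undo = list(self.lines)  # only the most recent change is remembered

    def append(self, n, new):          # "na": add text after line n
        self._checkpoint()
        self.lines[n:n] = new

    def insert(self, n, new):          # "ni": add text before line n
        self._checkpoint()
        self.lines[n - 1:n - 1] = new

    def delete(self, lo, hi):          # "lo,hid"
        self._checkpoint()
        del self.lines[lo - 1:hi]

    def change(self, lo, hi, new):     # "lo,hic": delete, then insert
        self._checkpoint()
        self.lines[lo - 1:hi] = new

    def undo(self):                    # "u": a second "u" undoes the undo
        if self._undo is not None:
            self.lines, self._undo = self._undo, self.lines


buf = Buffer(["one", "two", "three"])
buf.change(2, 2, ["TWO", "two and a half"])
print(buf.lines)
buf.undo()
print(buf.lines)
```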

For larger files, ED's regular expression facilities are useful. /x/ finds the next line containing x (?x? searches backwards); both can be used as line addresses, so /x/+1n lists the line after the next line containing x (errors can happen if you try to list lines beyond the beginning or the end of the file). A small edit will often take the form of

/x/n

.c

to find (and list) an occurrence of something and then change the line it is on.
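The search addresses can be modelled in the same toy style. Real ed searches wrap around the end (or beginning) of the file and reach the current line last, which this hypothetical Python function reproduces; note that it uses Python regular expressions, whose syntax differs from ed's POSIX basic regular expressions.

```python
import re


def search(lines, current, pattern, forward=True):
    """Toy model of ed's /re/ (forward) and ?re? (backward) addresses.

    The search starts next to the current line, wraps around the end of the
    file, and reaches the current line itself last.
    """
    n = len(lines)
    for step in range(1, n + 1):
        offset = step if forward else -step
        index = (current - 1 + offset) % n
        if re.search(pattern, lines[index]):
            return index + 1  # ed line numbers are 1-based
    raise ValueError(f"no match for {pattern!r}")


lines = ["alpha", "beta", "gamma", "beta again"]
print(search(lines, 2, "beta"))                     # like /beta/ with dot at line 2
print(search(lines, 4, "beta"))                     # wraps round to the top
print(search(lines, 3, "beta", forward=False) + 1)  # like ?beta?+1 with dot at line 3
```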

There is also a search and replace facility: 1,10s/A/B/g replaces all (g = global) occurrences of the regular expression A with B in lines 1 through 10. A number can be substituted for the 'g' to replace only that occurrence on each line (1 for the first occurrence, and so on). Note that in all cases we are dealing with regular expressions (like those grep uses), not simple search strings.
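The effect of the trailing flag can be shown with a hypothetical Python function operating on one line (ed applies the command to each line in the range). It uses Python's regex syntax rather than ed's, so, for example, the whole match is written \g<0> rather than ed's &.

```python
import re


def substitute(line, pattern, replacement, flag="1"):
    r"""Toy model of ed's s/RE/replacement/flag on a single line.

    flag "g" replaces every match; a number N replaces only the Nth match.
    Python regex syntax is used, so the whole match is \g<0>, not ed's "&".
    """
    if flag == "g":
        return re.sub(pattern, replacement, line)
    nth = int(flag)
    matches = list(re.finditer(pattern, line))
    if len(matches) < nth:
        return line  # real ed reports an error if no line in the range changes
    m = matches[nth - 1]
    return line[:m.start()] + m.expand(replacement) + line[m.end():]


print(substitute("a-b-c", "-", "+", "g"))
print(substitute("a-b-c", "-", "+", "2"))
```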

When dealing with a slow remote machine, it helps to go into "line mode" so that a whole line of input may be edited locally before it is sent to the remote machine in one go. On most telnet clients this can be done by going into command mode (usually Control-']') and typing "mode line"; you may have to do this twice (for some reason).

Finally

Season's Greetings and best wishes for the coming year from all who write C Vu to all who read it. Remember we need each other. Let us make 2001 even better than 2000.
