Compiling a Static Web Site Using the C Preprocessor

Overload Journal #108 - April 2012 + Programming Topics   Author: Sergey Ignatchenko
Sometimes the obvious way is still too complex. Sergey Ignatchenko relates how ‘No Bugs’ Bunny found an unexpectedly simple approach to creating a web site.

Disclaimer: as usual, the opinions within this article are those of ‘No Bugs’ Bunny, and do not necessarily coincide with opinions of the translator or Overload editors; please also keep in mind that translation difficulties from Lapine (like those described in [LoganBerry2004]) might have prevented us from providing an exact translation. In addition, both translator and Overload expressly disclaim all responsibility for any action or inaction resulting from reading this article.

Quite recently, I was told by one of my fellow rabbits that they’re compiling their web site; moreover, that they’re doing this using the C preprocessor. My first reaction? ‘You guys must be crazy!’ My second thought was ‘This is so crazy it might just work’, so I’ve taken a deeper look into it, and the more I looked the more I liked it (of course, it is not a silver bullet, but in some cases it can certainly save significant time and effort).

It was so interesting that I felt obliged (with the kind permission of the authors) to share it with the audience of Overload. This approach should be most interesting for C- and C++-oriented readers, since the preprocessor is a tool of the trade for most of them.

The task in hand

Once upon a time, there was a software development company, mostly specializing in C/C++ coding. And at some point they needed to create a web site. Nothing fancy, just your usual small company web site with predominantly static content and updates no more frequent than once a week. And as we’re in 201x, the website needed to be ‘Web2.0-ish’ and (more importantly for our purposes) needed to be usable on PCs, Macs and on smartphones. This is where our story starts.

Desktop != mobile

Rather quickly it was realized that if you want to ensure a reasonable user experience both on desktops and mobiles, there is absolutely no way you can use the same HTML for both sites. After some deliberation they decided to use jQuery Tools for the desktop site and jQuery Mobile for the mobile one, but the exact choice is not important for the purposes of this article; what is important is that these days we tend to have to assume that the HTML will be different for the mobile and desktop versions. Unfortunately, despite all the efforts of CSS, merely changing the CSS to switch from desktop to mobile is good enough only in textbook examples. This doesn’t mean that CSS shouldn’t be used – in fact, it was used very extensively in this case (in particular, both jQuery Tools and jQuery Mobile rely on CSS heavily) – but CSS alone is not enough: there are way too many differences between desktop and mobile sites, from potentially different numbers of HTML pages to very different navigation.

And when you have two HTML codebases with essentially the same content, any update needs to be copied very carefully to two places, which, as we all know, is not a good thing. In HTML it becomes even worse, as a missing </div> can easily break the whole thing, and figuring out what went wrong can be difficult even if source control is used. Also, in a static website there are lots of similar (usually navigational) fragments across all the pages, and maintaining them manually (style changes, pages being added or moved, etc.) certainly means a lot of mundane, tedious and error-prone work.

The classical approach: PHP+MySQL

Usually these problems are addressed by using a server-side engine, like PHP/ASP/Perl/etc., with content essentially residing in a database and formatted into HTML on each request. This would certainly work in this case too, but it was argued that for a site with just a dozen predominantly static pages, having to study one more programming language, installing and maintaining the database (with backups etc. etc.) and dealing with the additional security issues, is not exactly desirable. As I was told, at that point the attitude was ‘yes, this is a solution but it is really bulky for such a simple task; is there a chance to find something simpler and more familiar?’

When there is a will, there is a way

Os e layth Frithyeer hyaones, on layth zayn yayn dahloil

If it’s sunny today, we’ll go and find dandelions

Eventually somebody had a thought: if we describe what we need in familiar terms, then the task can be stated as follows: we need to compile the web site from source texts into multiple HTML targets, one being for desktops, another for mobile devices. As soon as this was stated, things moved rather quickly and the solution came up pretty soon; due to the background of the guys involved, the solution was based on the familiar C preprocessor and on sed.

C preprocessor for compiling web sites? You guys must be crazy!

According to the idea of compiling the web site from source texts into HTML, the source code was structured as follows:

  • there are text files. These contain all the site’s textual content and are allowed to have only very limited HTML (usually restricted to stuff like <b>, <p>, <h*>, occasional <a> and so on); they are the same for both desktop and mobile versions.
  • there are a few HTML template files (in this specific case they were given .c extensions – I was told this was to avoid some obscure problem with applying GCC to .html files). These are regular HTML files but with C preprocessor directives allowed, the most important being #include. These template files are specific to the desktop/mobile version, and in particular it is easy to have different layouts, which is important for dealing with jQuery Mobile. These HTML template files #include *.txt files to include specific content.
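
The layout described above can be sketched roughly as follows. This is only an illustration, not the authors' actual setup: it assumes GCC is installed, and the directory and file names (text/about.txt, desktop/about.c, mobile/about.c) are hypothetical.

```shell
# Hypothetical layout: one shared text file, one template per target.
site=$(mktemp -d)
mkdir -p "$site/text" "$site/desktop" "$site/mobile"

# Shared content: plain text with only very limited HTML.
cat > "$site/text/about.txt" <<'EOF'
<h1>About us</h1>
<p>We write C and C++ code.</p>
EOF

# Desktop template: ordinary HTML plus an #include directive.
cat > "$site/desktop/about.c" <<'EOF'
<html><body class="desktop">
#include "about.txt"
</body></html>
EOF

# Mobile template: different markup around the same content.
cat > "$site/mobile/about.c" <<'EOF'
<html><body><div data-role="page">
#include "about.txt"
</div></body></html>
EOF

# -E stops after preprocessing, -P suppresses linemarkers,
# -I tells the preprocessor where to find the shared text files.
for target in desktop mobile; do
  gcc -E -P -I "$site/text" "$site/$target/about.c" > "$site/$target/about.html"
done
```

Note that the text files still pass through a C preprocessor, so sequences with a meaning in C (such as // or an unmatched apostrophe) may need care; the sample content above deliberately avoids them.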

The devil is in the details

‘Es lay elil?’ e laynt meth.

‘Are you an enemy?’ he said.

The basic file structure described above is enough to start developing in this model, but as usual figuring out more subtle details may take a while.

First of all, in practice 99.9% of the text content is the same for both sites, but the remaining 0.1% needs to be addressed. To deal with these cases, usual C-style macros like DESKTOP_ONLY and MOBILE_ONLY were used (with a parameter telling what to insert). There is a caveat though – with the C preprocessor one cannot use arbitrary strings as parameters unless they’re quoted, and when you use quoted strings the quotes themselves are inserted into the resulting HTML. To get around this the following solution (IMHO quite a dirty one, but it does work) was used:

  • the macro is defined in the following manner:
    		#define DESKTOP_ONLY( x ) @@x@@
  • the macro is used like DESKTOP_ONLY( "<br>" ). Then after the preprocessor is run it becomes @@"<br>"@@
  • after running the preprocessor, the Unix-like sed with a rule like
    		s/@@"\([^"]*\)"@@/\1/g
    is run over the post-preprocessed code. This changes our @@"<br>"@@ into <br>, which is exactly what is needed. Note that @@ has no special meaning (importantly, neither in the C preprocessor, nor in sed, nor in HTML); it is used merely as a temporary escape sequence which should not normally appear, but if your HTML does include it, you can always use another escape sequence.
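
To see the sed step in isolation, here is a small sketch that applies the rule to one hand-written line of the kind the preprocessor is expected to produce (the surrounding words are made up for the example):

```shell
# A line as it might look after DESKTOP_ONLY( "<br>" ) has been expanded.
preprocessed='First line@@"<br>"@@second line'

# Strip the @@"..."@@ wrapper, keeping only what was between the quotes.
cleaned=$(printf '%s\n' "$preprocessed" | sed 's/@@"\([^"]*\)"@@/\1/g')

printf '%s\n' "$cleaned"   # prints: First line<br>second line
```

One limitation follows from the [^"]* part of the rule: the quoted argument cannot itself contain a double quote.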

Obviously this whole exercise only makes sense if DESKTOP_ONLY is defined this way only when generating desktop HTML templates, and is defined as

  #define DESKTOP_ONLY( x )

for mobile HTML templates. Also it should be mentioned that while this solution is indeed rather ‘dirty’, it doesn’t clutter the text files, and this is what is really important.
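
One possible way to switch between the two definitions is a shared header selected by a command-line define. The sketch below assumes GCC; the names macros.h, page.c and TARGET_MOBILE are hypothetical and not taken from the original site.

```shell
work=$(mktemp -d)

# Hypothetical shared header: TARGET_MOBILE is an assumed name, set with -D.
cat > "$work/macros.h" <<'EOF'
#ifdef TARGET_MOBILE
#define DESKTOP_ONLY( x )
#define MOBILE_ONLY( x ) @@x@@
#else
#define DESKTOP_ONLY( x ) @@x@@
#define MOBILE_ONLY( x )
#endif
EOF

# A page that uses both macros.
cat > "$work/page.c" <<'EOF'
#include "macros.h"
<p>Opening hours</p>
DESKTOP_ONLY( "<br>" )
MOBILE_ONLY( "<hr>" )
EOF

# The same sed rule as above removes the @@"..."@@ wrapper.
unwrap='s/@@"\([^"]*\)"@@/\1/g'
desktop=$(gcc -E -P "$work/page.c" | sed "$unwrap")
mobile=$(gcc -E -P -DTARGET_MOBILE "$work/page.c" | sed "$unwrap")
```

With this arrangement the desktop build should keep only the <br> and the mobile build only the <hr>, while the text of the page itself stays identical.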

Another similar example occurs if you need to concatenate strings (note that the usual C technique of putting 2 string literals next to each other is not recognized in HTML). So, for example, a macro to insert an image may look like

  #define IMG( x ) <img src="images/"@*@x>

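with an additional sed rule (note the backslash, which makes sed treat the * as a literal character rather than as a repetition operator) being

  s/"@\*@"//g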

Many other situations can be handled in a similar way.
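
As an illustration of the concatenation trick, the sketch below runs sed over a hand-written line of the kind the IMG macro is expected to produce for IMG( "logo.png" ); the file name logo.png is made up. In this sketch the * is escaped with a backslash, because an unescaped * in a sed pattern means ‘repeat the previous character’ and would not match the literal "@*@" sequence.

```shell
# A line as the preprocessor might emit it for IMG( "logo.png" ).
preprocessed='<img src="images/"@*@"logo.png">'

# Remove the "@*@" glue so the two quoted pieces become one attribute value.
joined=$(printf '%s\n' "$preprocessed" | sed 's/"@\*@"//g')

printf '%s\n' "$joined"   # prints: <img src="images/logo.png">
```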

Additional benefits

Now, as string stuff has been handled, the system is ready to go, and there are several additional (and familiar for C developers) features which can be utilized.

In many cases there will be repeated HTML parts on many HTML pages (headers, footers, navigation and so on), and with this approach this repeated stuff can be easily moved either into #includes or macros. I’ve seen it, and it is indeed a great improvement over repeating the same stuff over and over if editing HTML manually.

Another such feature is conditional compilation; as practice has shown it is particularly useful when dealing with navigation. One typical pattern I’ve seen was used to have the same navigation pattern for a set of the pages while using macros to disable links back to itself using the following technique:

  • in HTML template file:
    		#define NO_HOME_LINK
    		#include "navigation.inc"
  • in navigation.inc:
    		...
    		#ifndef NO_HOME_LINK
    		<a href="/">Home</a>
    		#else
    		<b>Home</b>
    		#endif
    		...

While the same thing can be achieved with Javascript, why not do it once instead of doing it on each and every client computer (not to mention that #ifdef is much more familiar to developers with a C or C++ background)?
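
The navigation pattern can be tried out with a sketch like the following, assuming GCC is available. The Contact link and the file names index.c and contact.c are hypothetical additions for the example.

```shell
work=$(mktemp -d)

# Shared navigation fragment, following the pattern shown above.
cat > "$work/navigation.inc" <<'EOF'
#ifndef NO_HOME_LINK
<a href="/">Home</a>
#else
<b>Home</b>
#endif
<a href="/contact.html">Contact</a>
EOF

# The home page disables the link to itself before including the fragment.
cat > "$work/index.c" <<'EOF'
#define NO_HOME_LINK
#include "navigation.inc"
EOF

# Any other page includes the fragment unchanged.
cat > "$work/contact.c" <<'EOF'
#include "navigation.inc"
EOF

home_nav=$(gcc -E -P "$work/index.c")
contact_nav=$(gcc -E -P "$work/contact.c")
```

Here home_nav should contain the bold, unlinked Home entry, while contact_nav should contain the regular link.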

Using sed also provides its own benefits: for example, to reduce the size of the resulting file, the following sed rule is useful:

  /^$/d

If you don’t want to look for a preprocessor command-line switch which suppresses the generated #line directives, the following sed rule will help:

  /^#/d

And for an easy fix of problems with handling &apos; (which works everywhere, except for Internet Explorer) – the following sed rule will help:

  s/&apos;/\&#039;/g
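
The three clean-up rules can also be combined in a single sed invocation. The sketch below applies them to a few hand-written lines standing in for preprocessor output; the sample content is made up.

```shell
# Sample preprocessor output: a linemarker, a blank line and an &apos; entity.
raw=$(printf '%s\n' '# 1 "index.c"' '<p>It&apos;s compiled.</p>' '' '<p>Done.</p>')

# All three clean-up rules from the text, applied in one pass.
clean=$(printf '%s\n' "$raw" | sed -e '/^$/d' -e '/^#/d' -e 's/&apos;/\&#039;/g')

printf '%s\n' "$clean"
# prints:
# <p>It&#039;s compiled.</p>
# <p>Done.</p>
```

Keep in mind that /^#/d removes every line starting with #, so it is only safe if the final HTML never needs such a line.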

Pros and cons

So, now we’ve described the system, the question is: what are the advantages and disadvantages of this approach?

Compared to manual HTML programming

Pros:

  • single content source in text files, easily editable (even by management)
  • much easier maintenance due to better HTML code structuring (using #includes / macros)

Cons:

  • initial setup costs, although these seem low: the guys have told me it took one person 3 hours to get from coming up with the idea to a working implementation

Compared to classical ‘PHP+MySQL’ stuff

Pros:

  • no bulky installation (PHP or another engine, database with backups, etc.)
  • no need to learn a new programming language (Javascript is bad enough on its own!)

Cons:

  • works only for rather static sites, where updates are not more frequent than around once per day

Is it only for *nix?

If you liked the idea but are working with Windows, don’t worry – at least in theory the same approach should work too. If you’re using Microsoft Developer Studio, something like

      cl /P /EP

should perform the preprocessing, and sed for Windows can be found, e.g. at [GnuWin]. You’ll still need to learn how to deal with a command line though ;).

Is the C preprocessor the best tool for the job?

While this example has been concentrating on the C preprocessor, it is not the only tool which can help. In fact, almost any kind of text processor can be used to compile web sites in a similar way to what has been described above (for example, m4 or some kind of standalone XSLT processor can be used), but which one is the best, especially for small projects, depends a lot on your previous experience, so for a C-oriented audience the C preprocessor might indeed be just the ticket.

References

[LoganBerry2004] David ‘Loganberry’, Frithaes! – an Introduction to Colloquial Lapine!, http://bitsnbobstones.watershipdown.org/lapine/overview.html

[GnuWin] http://gnuwin32.sourceforge.net/packages/sed.htm
