
Embedded Scripting Languages

Overload Journal #55 - Jun 2003 + Programming Topics   Author: Jonathan Tripp

or how to add extra user functionality to your application

Why Do It?

What do I mean by an embedded scripting language and why are they useful? By a "scripting language" I mean a simple, cheap (as in free and easy to maintain) and cheerful language with just enough functionality. It should be easy to explain to application users who may have only a little or no programming experience. The syntax should be clear and expressive. It would be better, from the user's perspective, to limit functionality for an easier ride. Think of, for example, an early dialect of BASIC, rather than an object-oriented extension of Lisp. By "embedded", I mean that an interpreter for this language can be integrated into your C/C++ application. This may seem crazy, but it really isn't that difficult and it can be very beneficial.

The principal reason for embedding a scripting language is to allow your application's functionality to be adjusted after it has been built. At the simplest level, most applications choose to externalise some of their operational parameters in a configuration file. This is a reasonable approach if you are able to determine in advance which parameters are likely to change, but that isn't always the case. For example, if your application needs to use a serial port, you could make an entry in a configuration file like:

[Comms]
; The port to use
port = "COM2"
timeout = 1000

So, a configuration file can be regarded as a group of keyword-value pairs. Here the keyword is port with corresponding value COM2. The pairs are grouped into sections, each introduced by a line such as [Comms]. Optional comments are on lines beginning with a semicolon. Unfortunately, as it stands this isn't always flexible enough, as I shall explain.
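For reference, reading a file in this format takes very little C. What follows is a minimal sketch, not a complete parser: the names cfg_parse_line, cfg_copy_trimmed and CfgKind are hypothetical, and it assumes one entry per line with no escape sequences or continuation lines.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of one line of a configuration file. */
typedef enum { CFG_BLANK, CFG_COMMENT, CFG_SECTION, CFG_PAIR, CFG_ERROR } CfgKind;

/* Copy [begin, end) into out, dropping surrounding whitespace and,
   if present, one pair of enclosing double quotes. */
static void cfg_copy_trimmed(const char *begin, const char *end,
                             char *out, size_t cap) {
  size_t n;
  assert(cap > 0);
  while (begin < end && isspace((unsigned char)*begin)) ++begin;
  while (end > begin && isspace((unsigned char)end[-1])) --end;
  if (end - begin >= 2 && *begin == '"' && end[-1] == '"') { ++begin; --end; }
  n = (size_t)(end - begin);
  if (n >= cap) n = cap - 1; /* truncate rather than overflow */
  memcpy(out, begin, n);
  out[n] = '\0';
}

/* Classify one line. For CFG_SECTION the section name goes into key;
   for CFG_PAIR the keyword goes into key and its value into value.
   Both buffers are assumed to hold cap bytes. */
static CfgKind cfg_parse_line(const char *line, char *key, char *value,
                              size_t cap) {
  const char *end = line + strlen(line);
  const char *eq;
  while (line < end && isspace((unsigned char)*line)) ++line;
  while (end > line && isspace((unsigned char)end[-1])) --end;
  if (line == end) return CFG_BLANK;
  if (*line == ';') return CFG_COMMENT;
  if (*line == '[') {
    if (end[-1] != ']') return CFG_ERROR;
    cfg_copy_trimmed(line + 1, end - 1, key, cap);
    return CFG_SECTION;
  }
  eq = memchr(line, '=', (size_t)(end - line));
  if (eq == NULL || eq == line) return CFG_ERROR; /* no '=' or no keyword */
  cfg_copy_trimmed(line, eq, key, cap);
  cfg_copy_trimmed(eq + 1, end, value, cap);
  return CFG_PAIR;
}
```

Calling cfg_parse_line on each line of the [Comms] example in turn yields a section, a comment and two pairs. Everything comes back as a string, so deciding that timeout is a number is still left to the application.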

I write applications in C++ for controlling scientific equipment. I have a library of routines to control and test each physical component, and I combine these to build each final application. It is during this stage that I am most exposed to the customer's whims. Much is written about managing projects and customer requirements, but I am not sure that any one system really works. The reality is that a customer may not actually know what they want until they can see a prototype working. For example, in a control environment, suppose you have machines "A" and "B", and a specification like:

TEST 10 IS:
Switch "A" on
Wait for 5 seconds for it to warm up
Switch "B" on
Wait for 10 seconds for it to warm up
Prime "B"
Trigger "B"
Collect data with "A" for 20 seconds
Switch all off

Anticipating that the start-up times will need some fine-tuning, you would externalise them into your configuration file as:

[TEST 10]
; A start-up time
Startup_A_Time = 5
; B start-up time
Startup_B_Time = 10
; Collect data for
Collection_Time = 20

This works well until someone points out that in fact the instrument start-up order needs reversing. A quick response is to now externalise your "if" clause to the configuration file.

; False for reverse start-up order
Startup_A_Then_B = True

with corresponding pseudo-code:

if (getBoolFromConfigurationFile("Test 10",
                "Startup_A_Then_B") == true) {
  StartA(getIntegerFromConfigurationFile(
         "Startup_A_Time"));
  StartB(getIntegerFromConfigurationFile(
         "Startup_B_Time"));
}
else {
  // the other way round
}

You can imagine that on a complicated system this will quickly get silly. Problems like this can and do show up even when installing systems at the client's site. A busy shop floor is definitely not the right environment to be going through the compile/build/link cycle for a large C/C++ application. At this point, what you really want is to be able to program your control algorithm in the configuration file. The installation engineer can then modify the scripts using a simple text editor, reload them into the application and get on with testing. The configuration file must have the ability to present simple functions to your application and call back into your application. So, if we imagine a simple Pascal-like syntax:

-- A start-up time
Startup_A_Time = 5
-- B start-up time
Startup_B_Time = 10
-- Collect data for
Collection_Time = 20
-- False for reverse start-up order
Startup_A_Then_B = True
function Test10()
  if (Startup_A_Then_B)
    StartA(Startup_A_Time)
    StartB(Startup_B_Time)
  else
    -- the other way round
  end
  -- the rest of the algorithm
end

This is your chance as the instigator of your fledgling language to make it as simple as possible for non-programmers. Use a simple syntax, i.e., no semi-colons and if possible infer the type from the situation! I think you'll agree that this approach is a lot simpler and easier for non-programmers to understand. It can be argued that the original algorithm is more clearly preserved from the specification. Even more so if you imagine the C/C++ version cluttered with all the ancillary error checking and logging. Its greatest strength is that it is open to change by you, other non-programming engineers and possibly even the end-user. Note that I am not advocating rewriting the whole application in a scripting language, because I consider C/C++ the perfect languages for the controlling libraries.
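The "call back into your application" half of this can be pictured as a table that maps the names a script uses onto C functions. The following is only a sketch of that idea under simplifying assumptions: every exposed function takes a single time in seconds, and the names HostFn, HostBinding, host_call, start_a and start_b are hypothetical stand-ins for the real control routines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Every function offered to scripts shares one signature here:
   it takes a time in seconds and returns 0 on success. */
typedef int (*HostFn)(int seconds);

static char host_log[32]; /* order of calls, kept only for the demo */
static int host_waited;   /* total seconds requested so far */

static void host_note(char who, int seconds) {
  size_t n = strlen(host_log);
  if (n + 1 < sizeof host_log) {
    host_log[n] = who;
    host_log[n + 1] = '\0';
  }
  host_waited += seconds;
}

/* Stand-ins for the real instrument control routines. */
static int start_a(int seconds) { host_note('A', seconds); return 0; }
static int start_b(int seconds) { host_note('B', seconds); return 0; }

typedef struct {
  const char *name; /* the name a script would use */
  HostFn fn;        /* the C function behind it */
} HostBinding;

static const HostBinding host_bindings[] = {
  { "StartA", start_a },
  { "StartB", start_b }
};

/* Call the host function registered under name.
   Returns its result, or -1 if no such name is registered. */
static int host_call(const char *name, int seconds) {
  size_t i;
  assert(name != NULL);
  for (i = 0; i < sizeof host_bindings / sizeof host_bindings[0]; ++i) {
    if (strcmp(host_bindings[i].name, name) == 0)
      return host_bindings[i].fn(seconds);
  }
  return -1;
}
```

With this in place, reversing the start-up order means issuing host_call("StartB", 10) before host_call("StartA", 5), a decision that can live in a script and needs no rebuild. A real embedded interpreter provides this lookup, plus argument passing and error reporting, for you.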

Examples

I'll now look at some other examples of this technique, in roughly historical order:

Firstly, GNU Emacs. From the Emacs documentation:

Emacs is the extensible, customizable, self-documenting realtime display editor. If this seems to be a bit of a mouthful, an easier explanation is Emacs is a text editor and more. At its core is an interpreter for Emacs Lisp ("elisp", for short), a dialect of the Lisp programming language with extensions to support text editing.

After a few prior implementations, Emacs now consists of a light C core that contains the display code and a Lisp interpreter. The rest of Emacs is programmed in Lisp; the scripts can be edited (in Emacs) and reloaded whilst the system is running. This tremendous flexibility is the main reason why Emacs is loved.

Secondly, for me, are the CAD systems, like AutoCAD. These, like Emacs, generally have a C/C++ core, and also expose a Lisp interpreter. Through Lisp bindings to the core application, the user can write scripts to manipulate much of the system from the graphical user interface to the models.

Thirdly, VBA from the Microsoft Office Suite. From the Microsoft web site:

Finally, Visual Basic for Applications takes the same power available through the Visual Basic programming system and applies it to highly functional applications, enabling infinite levels of automation, customization, and integration.

Since Microsoft's initial business was BASIC interpreters, it should be no surprise that they chose BASIC as the prototype for their embedded language, Visual BASIC for Applications (VBA). Unlike Lisp programs, small BASIC programs can be written easily with little or no prior programming experience; after all, the B is for Beginner's.

Fourthly, computer games: many contemporary games have some form of scripting included. The complexity of a modern game requires it. The core graphics and artificial intelligence libraries are written in C++, but hooks are exposed to an embedded scripting language. Then the script for the game and the levels can be developed, changed and tweaked all in the embedded scripting language. For a popular example, the game "Unreal", developed by Epic Games, includes a very sophisticated language called UnrealScript. By exposing this facility they have created a very configurable game engine that can be customised easily. Its versatility is proven by Epic Games selling their engine to other games companies.

Finally, everyone's favourite web server: Apache HTTP Server. This web server also contains a small embedded scripting language for processing what they refer to as "directives". When the server starts, it loads and parses a configuration file, httpd.conf by default, which contains directives. These are essentially function callbacks to the main server, with the addition of some conditional processing, based on either command line parameters or module availability.

These are all very successful applications, and I maintain that a large part of their success is due to the fact that they have exposed key configuration data and functions to the end-user.

How To Guide

The simplest way to identify which part of your application would benefit from this is to ask yourself: "which parts of your system are you frequently asked to change?" I think there is a general pattern with most applications; requests for changes will be targeted at those areas the user has most interaction with. This will probably be the gross functionality, i.e. the interactions of your libraries and probably the graphical user interface, if you have one.

In choosing or designing an embedded language, keep in mind your target users. To be accessible for a modern user, you should probably avoid Lisp. I know it is a very powerful language, there are free interpreters available and it is easy to bind to C, but it is a little daunting to a novice. BASIC is fun and, as for many developers in their 30s, it was the first language I learnt on a home computer. You may pause before using Microsoft's VBA since it will require extensive use of COM and it will cost you an indeterminate amount to get a licence from Microsoft. The main scripting languages, Perl, Python and Ruby, can all function as an embedded scripting language, and Tcl was designed for just such a role. However, I feel they are probably just too inaccessible to a novice. There is a freely available language called Lua that fits my requirements. Lua is available as C source code and comes with a liberal licence. It also has all my other desirables: it is relatively small, can be used as a simple procedural language and has a clean interface to C. Lua was designed to be flexible; it is more of a language framework. It can be coaxed into offering objects with member functions and function overloading, and there are mechanisms available to expose C++ classes directly in Lua. It also uses a virtual machine for speed and performs automatic garbage collection.

First download the Lua source. Version 5.0 has just become available, and I shall be using that. Check you can build it as a static library, and build the standalone Lua interpreter to begin experimenting. This can be used interactively; alternatively, to process a file test.lua, type dofile("test.lua") at the command prompt. Just to get a feel for the language, here is a gentle introduction.

Firstly, note that Lua uses dynamically typed variables. For example:

-- two global variables
port = "COM2"
timeout = 1000

Comments follow two hyphens and continue to the end of the line. port and timeout are global variables and do not have a type, although their values have types of string and number respectively. Lua has base types of nil, boolean, number, string, function, userdata, thread, and table. nil is the terminal type; boolean, number and string are all as expected, but note that by default Lua is compiled with numbers as doubles. functions in Lua are first class, which means that they can be passed around, created and stored like any other value. userdata types are for smoothing integration with C. Lua treats them as simple memory blocks, although this default behaviour can be controlled, as I will show later. threads are new to Lua 5.0 and outside the scope of this article. Finally we have the most important type, table, which is used extensively within Lua. A table is an associative map, for example:

-- a global table
default_comms = { port = "COM1",
timeout = 5000 }

Which creates a global table with keys port and timeout with corresponding values COM1 and 5000. Note that the key may be omitted in which case it defaults to the first unused numeric index:

another_comms = { port = "COM1",
timeout = 5000, true }

will add key 1 with boolean value true. The fields can be added or accessed using the familiar dot notation, here using the debugging function print:

print("Default comms port: ",
default_comms.port, " with timeout ",
default_comms.timeout)

will produce output of the following form (print separates its arguments with tabs, so the real spacing is wider):

Default comms port: COM1 with timeout 5000

Lua allows you to define functions:

-- a global function
function comms_open(port, timeout)
  local time = 1300
  local status = "OK"
  -- opening comms port (just fake it for now)
  return time, status
end
-- and function call
-- this first print will print nils because the
-- time and status variables inside comms_open
-- are local and invisible here
print("Before comms_open: ", time, status)
time, status = comms_open(another_comms.port,
                          another_comms.timeout)
print("After comms_open: ", time, status)

This is a simple function taking some comms settings and returning the time to start up and a status string to the caller. Since functions are treated like any other value, they can be added to tables. The standard libraries supplied with Lua all package their functions within tables in analogy to namespaces in C++. For example, the standard library for table utilities contains a function foreach that can be used as follows:

-- debugging, print out the contents of a table
-- using the table library foreach function:
table.foreach(another_comms, print)

This will visit each of the keys in another_comms, calling the function print with the values (key, value). The order in which the keys are visited is not specified, so the output will be something like:

1 true
port COM1
timeout 5000

Lua also supports the usual control structures, as demonstrated by the following function for printing even numbers:

-- demonstration of control structures
function even_numbers(total)
  local step = 2
  for counter = 0, total, step do
    if counter >= 20 then
      print("Twenties", counter)
    elseif counter >= 10 then
      print("Tens", counter)
    else
      print("Units", counter)
    end
  end
end
even_numbers(24)

I think we now know enough for Lua to be a useful language, and we can move directly on to integrating this with the application. To use Lua as an embedded language within a C program, you first need to create an instance of Lua and load in all the standard libraries. Finally this resource should be freed before the program ends with a balancing close statement:

#include <stdio.h>
#include <string.h>
/* Lua is strictly C, so add a guard for
   C++ compilation */
#ifdef __cplusplus
extern "C" {
#endif
#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>
#ifdef __cplusplus
}
#endif
int main(int argc, char* argv[]) {
  lua_State* L = lua_open(); /* Create a new
                                instance of Lua
                                (lua_State *) */
  /* Initialize Lua standard library
     functions */
  luaopen_base(L);
  luaopen_table(L);
  luaopen_io(L);
  luaopen_string(L);
  luaopen_math(L);
  luaopen_debug(L);
  /* do some stuff */
  lua_close(L);
  return 0;
}

All communication between C and Lua is done using a stack mechanism: function call parameters are pushed, the call is made, and the results are left on the stack. Positive stack indices count from the bottom and negative stack indices count from the top; as usual, a push adds elements to the top of the stack. So, to add a new global variable and a new global table to Lua:

/* Create a global variable in Lua */
lua_pushstring(L, "baud_rate");
    /* push the variable name */
lua_pushnumber(L, 9600);
    /* then its value */
lua_settable(L, LUA_GLOBALSINDEX);
    /* finally set it in the global table */

/* We can create a global table too */
lua_pushstring(L, "backup_comms");
    /* push the table name */
lua_newtable(L);
    /* create a new table on the stack */
lua_pushstring(L, "timeout");
    /* push the field name and value */
lua_pushnumber(L, 2500);
lua_settable(L, -3);
    /* now the table we created has been
       pushed to -3 */
lua_settable(L, LUA_GLOBALSINDEX);
    /* finally set it in the global table */

The functions lua_push***(L, ***) just push their datatypes onto the stack. The function lua_settable pops the key and the value from the top of the stack and stores them in the table at the given stack index, so the first call adds the field baud_rate with value 9600 to the table at the stack index LUA_GLOBALSINDEX. This is a special reserved index to identify the table that holds the global variables. This C code is directly equivalent to the following Lua code:

baud_rate = 9600
backup_comms = { timeout = 2500 }
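The indexing rule is easy to get wrong, so it may help to see it in isolation. The following toy model is not Lua's implementation and uses no Lua headers; ToyStack, toy_push, toy_offset and toy_at are hypothetical names, string labels stand in for real values, and pseudo-indices such as LUA_GLOBALSINDEX are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STACK_MAX 16

typedef struct {
  const char *slot[TOY_STACK_MAX]; /* labels stand in for real values */
  int top;                         /* number of live slots */
} ToyStack;

static void toy_push(ToyStack *s, const char *label) {
  assert(s->top < TOY_STACK_MAX);
  s->slot[s->top++] = label;
}

/* Convert a stack index to a 0-based array offset, or -1 if invalid.
   Positive indices count up from the bottom (1 is the first element
   pushed); negative indices count down from the top (-1 is the last). */
static int toy_offset(const ToyStack *s, int index) {
  if (index > 0 && index <= s->top) return index - 1;
  if (index < 0 && index >= -s->top) return s->top + index;
  return -1;
}

/* Return the label at the given index, or NULL if the index is invalid. */
static const char *toy_at(const ToyStack *s, int index) {
  int off = toy_offset(s, index);
  return off < 0 ? NULL : s->slot[off];
}
```

Pushing the table name, the new table, the field name and the field value in that order leaves the table at index -3 (equally, index 2), which is why the C listing above passes -3 to the first lua_settable call.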

Note that it is possible to get Lua to execute code fragments from C as follows:

lua_dostring(L, "baud_rate = 9600\n"
                "backup_comms = { timeout = 2500 }");

To make a C function callable from Lua we follow the stack conventions above. Note that when Lua calls C it does so with a clean stack each time. The calling parameters are available at stack indices +1, +2, etc.; before returning, the function pushes its return values and then returns the number of values it pushed. For example:

/* A C function callable from Lua */
int l_comms_open(lua_State *L) {
  const char *port = NULL;
  double timeout = 0.0;
  double time = 1300;
  const char *status = "OK";
  /* Function parameters passed in at the
     beginning of the stack */
  if (lua_isstring(L, 1))
    /* check that the first parameter is
       a string */
    port = lua_tostring(L, 1);
    /* retrieve the first parameter */
  if (lua_isnumber(L, 2))
    /* ditto for numbers */
    timeout = lua_tonumber(L, 2);

  /* Do something interesting...omitted */

  lua_pushnumber(L, time);
      /* push the return values */
  lua_pushstring(L, status);
  return 2;
  /* return the number of results */
}

This function can be registered in C as a global function in Lua as follows:

lua_register(L, "c_comms_open",
             l_comms_open);

Now we have passed some data down to Lua, we can load a Lua script and see how it all works together. Add the following to the end of our test script:

print("baud_rate: ", baud_rate)
table.foreach(backup_comms, print)
time, status = c_comms_open("COM4",
                            backup_comms.timeout);
print("Result of comms_open: ", time, status)

To load a script from C and confirm the variables are populated correctly, use:

lua_dofile(L, "test.lua");

We can also get values from Lua and invoke functions in Lua in the same manner. To get a global value, just push the table key and call lua_gettable(L, LUA_GLOBALSINDEX) to ask Lua to look up the value and put it at the top of the stack. Similarly, to get a value from a global table, first ask Lua to look up the table and put it on the stack, and then push the table key before calling lua_gettable to finally look up the value. Using the same test script we can make a call to the Lua comms_open function as follows:

/* Call a Lua function */
double time = 0.0;
const char *status = NULL;
lua_pushstring(L, "comms_open");
    /* ask Lua to find the global function
       and push it onto the stack */
lua_gettable(L, LUA_GLOBALSINDEX);
if (lua_isfunction(L, -1)) {
  lua_pushstring(L, "COM4");
    /* push the two operands */
  lua_pushnumber(L, 3500);
  lua_call(L, 2, 2);
    /* make the function call, two inputs
       and two outputs */
  if (lua_isnumber(L, -2))
    /* results will be on the top of the
       stack */
    time = lua_tonumber(L, -2);
  if (lua_isstring(L, -1))
    status = lua_tostring(L, -1);
    /* status points into memory owned by
       Lua: copy the string if it is needed
       after the pop below */
  lua_pop(L, 2);
}
else {
  lua_pop(L, 1);
    /* not a function: discard the value */
}

At this point we can get and set Lua global variables from C, and call Lua global functions from C. We can also call back from Lua into C and load and execute Lua scripts. The example functions above could easily be isolated into a Lua interface library, and I think there is an obvious wrapping into a C++ class if you'd prefer. This is enough functionality to begin exploring Lua and C integration in earnest.

Earlier I mentioned the userdata type available in Lua and hinted that its behaviour could be modified. In fact, Lua exposes most of its internal functionality through "metatables", and these can be modified from either C or Lua script itself. These tables are used to hold the operators for each object. To take the example from the Lua documentation, consider the behaviour of adding two objects. The internal processing done by Lua is as follows: if both operands are numeric, just add them together. Otherwise, if the first operand has an __add field in its metatable, use that function; failing that, look for an __add field in the second operand's metatable.

The operators available for overloading in this way are: __add, __sub, __mul, __div, __pow, __unm (for unary minus), __concat (for string concatenation), __eq, __lt (less than) and __le (less than or equal), __index (for field getters) and __newindex (for field setters) and __call (for function calls). Additionally, for userdata types there is the __gc event called by the garbage collector for object finalisation. To see how this can be used for your userdata types, consider the following example:

#define COMMSHANDLE "Comms*"

typedef struct tagComms {
  char *port;
  double timeout;
} Comms;

static int l_comms_new(lua_State *L) {
  /* Create a new userdata object of the
     correct size */
  Comms *comms =
    (Comms *)lua_newuserdata(L,
                             sizeof(Comms));
  comms->port = NULL;
  comms->timeout = 0.0;
  return 1;
}

If you now register this function with Lua then when it is called it will create a new Comms struct and initialise it. Unfortunately, since this example does not contain just plain old data, there will be a memory leak for each of these structures since there is no way to clean them up. To remedy this, we need to create a new metatable object implementing the correct garbage collection to override the default Lua behaviour for userdata types. To create a new metatable object, just use the function:

luaL_newmetatable(L, COMMSHANDLE);
/* create new metatable for Comms objects */

and to attach the Comms userdata objects to this metatable, just add the lines

luaL_getmetatable(L, COMMSHANDLE);
/* retrieve the metatable for this type */
lua_setmetatable(L, -2);
/* set this metatable for this object */

to the Comms constructor function above. Connecting a userdata object with its metatable in this way is the Lua equivalent of constructing a v-table for a C++ object with virtual member functions. You can override the garbage collection behaviour by creating a C function as follows:

/* Define the garbage collection
   finalizer for Comms objects */
static int l_comms_gc(lua_State *L) {
  Comms *comms =
              (Comms *)lua_touserdata(L, 1);
  free(comms->port);
  comms->port = NULL;
  return 0;
}

Register this as the garbage collection routine for this metatable as follows (assuming that the metatable object is currently on the top of the stack):

lua_pushliteral(L, "__gc");
lua_pushcfunction(L, l_comms_gc);
lua_settable(L, -3);
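The v-table analogy can be made concrete with a toy model in plain C. This is not how Lua is implemented and it uses no Lua headers; ToyMeta, ToyObject, ToyComms, toy_comms_new and toy_collect are hypothetical names, and the "collector" is simply a function called by hand. It only illustrates why an object carrying a pointer to shared handlers lets the collector run type-specific clean-up before releasing a raw block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct ToyObject ToyObject;

/* Plays the part of a metatable: handlers shared by every object
   of one type. Only the finaliser is modelled here. */
typedef struct {
  const char *type_name;
  void (*gc)(ToyObject *self); /* may be NULL */
} ToyMeta;

struct ToyObject {
  const ToyMeta *meta; /* NULL: no metatable, default behaviour */
  void *block;         /* the raw memory block */
};

typedef struct {
  char *port;
  double timeout;
} ToyComms;

static int toy_finalised = 0; /* counts finaliser calls, demo only */

static void toy_comms_gc(ToyObject *self) {
  ToyComms *comms = self->block;
  free(comms->port); /* release what the raw block merely points to */
  comms->port = NULL;
  ++toy_finalised;
}

static const ToyMeta toy_comms_meta = { "Comms*", toy_comms_gc };

/* Build a Comms object; pass NULL for meta to leave it a bare block.
   Returns 0 on success, -1 if memory runs out. */
static int toy_comms_new(ToyObject *obj, const ToyMeta *meta,
                         const char *port, double timeout) {
  ToyComms *comms = malloc(sizeof *comms);
  if (comms == NULL) return -1;
  comms->port = malloc(strlen(port) + 1);
  if (comms->port == NULL) { free(comms); return -1; }
  strcpy(comms->port, port);
  comms->timeout = timeout;
  obj->meta = meta;
  obj->block = comms;
  return 0;
}

/* What a collector does on reclaiming an object: run the type's
   finaliser if there is one, then release the block itself. */
static void toy_collect(ToyObject *obj) {
  assert(obj->block != NULL);
  if (obj->meta != NULL && obj->meta->gc != NULL)
    obj->meta->gc(obj);
  free(obj->block);
  obj->block = NULL;
}
```

An object created with toy_comms_meta has its port string released when toy_collect runs. One created with a NULL meta loses only the block itself, and the port string leaks unless the caller frees it first, which mirrors the leak described above for userdata without a __gc entry.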

I hope I have shown that your application could benefit from an embedded scripting language of some form. I have discussed some of the prior examples and have introduced a more modern language called Lua. I've given a quick taste of Lua and indicated that it can be extended easily and it can be embedded easily. The interface between C and Lua is easy to understand and easy to isolate. There are examples of other third party wrappers available that promise to even wrap C++ classes for easy access from Lua. There is also a growing body of third party libraries available for processing XML and for creating GUIs with Tk.
