Programming Topics + CVu Journal Vol 17, #4 - Aug 2005

Title: Sharp as C

Author: Administrator

Date: Wed, 03 August 2005 05:00:00 +01:00

"For in much wisdom is much grief: and he that increaseth knowledge increaseth sorrow." Ecclesiastes 1:18, KJV

In the beginning was … a word.

And the word was … an algorithm!? Or should I say al-khwarizm? What does Wikipedia (http://en.wikipedia.org/wiki/Main_Page) say about the term algorithm?

"An algorithm (the word is derived from the name of the Persian mathematician Al-Khwarizmi), is a finite set of well-defined instructions for accomplishing some task which, given an initial state, will terminate in a corresponding recognizable end-state"

Al-Khwarizmi? Citing: "Abu Abdullah Muhammad bin Musa al-Khwarizmi, was a Persian scientist, mathematician, astronomer/astrologer, and author. He was probably born in 780, or around 800; and probably died in 845, or around 840."

1200 years!

What This Article is About and How to Read This

This article presents my own point of view on the general approach to software development and its architecture. Out of respect for your time, I am going to hide the thoughts I had and the path I took, whether short or long, and offer you just the final conclusions. Sometimes these conclusions might seem strange, but they are what I think, my personal opinion. In this article I am doing nothing but expressing my own point of view; whether it makes any sense to you, whether you see any useful ideas here or think it all absolutely useless, is for you to decide.

The plan of the article is pretty simple. In the 'How Should It Be' section I describe the general idea. If you find it interesting, go on to the 'It Is Possible' section, where you will find the details of the realization. Everything else is no more than the pros and cons of the approach proposed in 'How Should It Be'. If the idea described there seems pointless to you, you might save your time by reading no further.

How Should It Be

"...since brevity is the soul of wit, And tediousness the limbs and outward flourishes, I will be brief: your noble son is mad:" POLONIUS, Hamlet, Prince of Denmark, W. Shakespeare

Is it possible to draw the architecture of an application in general? Somebody might say it depends on the business. In my opinion it should look like Figure 1.

Figure 1.

Business logic is to be written in script to make it as plain as possible. The business entities are whatever your business needs: collections (vectors, maps, sets, etc.), logging systems, DB vendors (MSSQL, Oracle, Sybase, etc.), IPC (DCOM, RPC, sockets, pipes), threading systems (such as POSIX) and so on. All these entities should expose some kind of plain interface, which basically consists of getters and setters. The entities should be as simple in logic as possible; in general they should export either data or simple functionality.

The business logic is written in some scripting language and is portable as well. If some entity is going to be changed (you are switching from MSSQL to Oracle, for instance), the logic should, in the perfect case, not have to change.

What I am trying to say here is that business logic and business entities are to be kept separate. Let us look at a classic sample:

#include <stdio.h>

int main()
{
  printf("Hello world.\n");
  return 0;
}

In this sample the business logic is represented by a C script (in general this is a script, since we have no idea how it is going to be started), and there is only one business entity: the C library (libc or msvcrt, for instance), which exposes plain 'exported' C functions (printf in our case). That is the interface (see Figure 2).

Figure 2.
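
The separation can be sketched in a few lines of portable C. This is only an illustration under my own assumptions: the names StoreEntity, g_memory_store and greet are hypothetical and do not appear in the sample project. The entity exposes nothing but a getter and a setter, and the logic sees only that interface, so the backend could be swapped without touching the logic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical "entity": exposes only a plain setter and getter. */
typedef struct {
    int         (*set)(const char *key, const char *value);
    const char *(*get)(const char *key);
} StoreEntity;

/* One possible backend: a single in-memory slot (assumption for the demo). */
static char s_key[32];
static char s_value[32];

static int mem_set(const char *key, const char *value)
{
    snprintf(s_key, sizeof s_key, "%s", key);
    snprintf(s_value, sizeof s_value, "%s", value);
    return 0;
}

static const char *mem_get(const char *key)
{
    return strcmp(key, s_key) == 0 ? s_value : "";
}

const StoreEntity g_memory_store = { mem_set, mem_get };

/* "Business logic": knows the interface only, never the backend.
   Returns the number of characters written to out. */
int greet(const StoreEntity *store, char *out, size_t out_size)
{
    assert(store != NULL && out != NULL);
    store->set("greeting", "Hello world.");
    return snprintf(out, out_size, "%s", store->get("greeting"));
}
```

Calling greet(&g_memory_store, buf, sizeof buf) fills buf with the greeting; replacing g_memory_store with another StoreEntity would leave greet untouched.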

This approach runs contrary to the traditional OOP approach to development: OOP puts an object and its functionality together, whereas this approach keeps them apart. It is even thus. To keep things short, I would dare to say that OOP has exhausted its resources and it is … dead.

My claim is that developing business entities is not as painful as developing business logic algorithms. The construction of business algorithms is a much more peculiar, painful and nervous process, and it takes much more time and resources than anything else.

Therefore business logic is to be written in plain script, and this script is to be changeable at run time, without any recompilation. My basic objective is that business logic should not be 'sacred ground', once-working-never-changed. It is to be 'playable' whenever required. Especially at the development and QA stages, the script should be changeable at run time with no commits or check-ins, no rebuilding, no restarting and no other annoying procedures: simply changing the script should immediately affect the running system. Let me guess: you say it is impossible or, if it is possible, too complicated.

It Is Possible

And it is not so complicated. This section will show how it works. (The sample code is written for the Microsoft Windows platform)

This is an application tree:

+--c_dispatcher
+--Debug
+--frontend_app
+--include
+--my_script
+--my_script_c_proxy
+--my_script_d_proxy

Inside the frontend_app folder is the main (console) application. There is only one file there: frontend_app.cpp

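// frontend_app.cpp
// (c) George Shagov, 2005
#include <stdio.h>
#include <string.h>
#include <windows.h>
#include "..\\include\\my_structs.h"

typedef int (__cdecl *MYFARPROC)
            (int nArg,
             char* pString,
             SMyStructure* pMyStruct);

int main(int argc, char* argv[])
{
  /* load the proxy library that exports
     the entry point of the script */
  HMODULE hMyScript = LoadLibrary
        ("my_script_d_proxy.dll");
  if (!hMyScript)
  {
    printf("cannot load my_script_d_proxy.dll\n");
    return 1;
  }
  MYFARPROC pProcSource = (MYFARPROC)
        GetProcAddress
        (hMyScript, "c__my_entry_point");
  if (!pProcSource)
  {
    printf("cannot find c__my_entry_point\n");
    FreeLibrary(hMyScript);
    return 1;
  }

  SMyStructure myStruct;

  myStruct.m_nVal = 0;
  strcpy(myStruct.m_sString, "");
  /*
  * calling for entry point.
  * directly
  */
  char sMyString[32];
  strcpy(sMyString, "My string here.");
  pProcSource(argc, sMyString, &myStruct);

  FreeLibrary(hMyScript);
  return 0;
}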

As you can see, it obtains the address of the script's entry point and calls it.

The script itself can be found in the my_script folder; the file name is my_script.c_. There are some additional files there, my_script.gnrtd.c and my_script.gnrtd.h, which are generated from my_script.c_ (below).

// my_script.c_
// (c) George Shagov, 2005

/*********************************************
*
* this file is automatically generated from 
* my_script.c_    do not modify it
*
*********************************************/
#include <stdio.h>
#include <string.h>
#include "..\\include\\my_structs.h"
#include "my_script.gnrtd.h"

int c__get_value_1_impl(char* pString)
{
  return 1;
}

int c__get_value_2_impl(int nArg)
{
  return 2;
}

int c__call_in_case_varables_are_equal_impl
    (SMyStructure* pMyStruct)
{
  pMyStruct->m_nVal = 0;
  strcpy(pMyStruct->m_sString, "equal");
  return 0;
}

int c__call_in_case_varables_are_not_equal_impl
      (SMyStructure* pMyStruct)
{
  pMyStruct->m_nVal = 0;
  strcpy(pMyStruct->m_sString, "not equal");
  return 0;
}

int c__re_entry_impl(int nArg, char* pString,
                     SMyStructure* pMyStruct)
{
  int nVar1 = c__get_value_1(pString);
  int nVar2 = c__get_value_2(nArg);

  if (nVar1 == nVar2)
  {
    c__call_in_case_varables_are_equal
          (pMyStruct);
  }
  else
  {
    c__call_in_case_varables_are_not_equal
         (pMyStruct);
  }

  return 11;
}

int c__my_entry_point_impl(int nArg,
      char* pString, SMyStructure* pMyStruct)
{
  int nRet;

  printf("------\nbefore:\n");
  printf("nArg: %d, string: %s\n", nArg,
        pString);
  printf("pMyStruct->m_nVal: %d, "
        "pMyStruct->m_sString: %s\n",
        pMyStruct->m_nVal,
        pMyStruct->m_sString);

  nRet = c__re_entry(nArg, pString,
        pMyStruct);

  printf("++++++after:\n");
  printf("nArg: %d, string: %s\n", nArg,
        pString);
  printf("pMyStruct->m_nVal: %d, "
          "pMyStruct->m_sString: %s\n",
          pMyStruct->m_nVal,
          pMyStruct->m_sString);
  printf("ret: %d\n-------\n", nRet);

  return nRet;
}

c__my_entry_point_impl is the entry point to be called from frontend_app. my_script.gnrtd.c is merely a copy of the original script; my_script.gnrtd.h holds the declarations.

As you can see, frontend_app uses the my_script_d_proxy library to make the call to c__my_entry_point_impl.

There are two files in the my_script_d_proxy folder, my_script_d_proxy.gnrtd.c and my_script_d_proxy.gnrtd.h; both are also generated from the original script (my_script.c_). my_script_d_proxy.gnrtd.c contains stubs for all the functions written in the script, like this:

int c__re_entry_stub(int nESP, int nArg, 
    char* pString, SMyStructure* pMyStruct)
{
  void* pArgs = 0;
  int nSize = 0;
  _asm
  {
    push eax; /* saving eax */
    mov eax, ebp; /* ebp points out at the
          parameters (as known) */
    add eax, 8; /* now eax points out at the first
          argument, which is nESP*/
    mov pArgs, eax;
    add pArgs, 4; /* since first argument is esp,
          but we need real argument here */
    mov eax, nESP;
    sub eax, pArgs; /* eax now has a physical size
          of the stack */
    shr eax, 2; /* eax/4 - eax now has an amount
          of arguments put in the stack */
    mov nSize, eax; /* saving that size */
    pop eax; /* restoring eax */
  }
  return g_pDispatcherEntry("c__re_entry",
        pArgs, nSize);
}

int c__re_entry(int nArg, char* pString,
    SMyStructure* pMyStruct)
{
  int nESP;
  _asm
  {
    mov nESP, esp;
  }
  return c__re_entry_stub(nESP, nArg, pString,
        pMyStruct);
}

The assembly code records the pointer to the first argument that was put on the stack and the count of arguments on the stack, and then passes the call to the c_dispatcher library, which exports the g__c_dispatcher_entry_point function.
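
For readers without an x86 compiler at hand, the following portable sketch shows what the stub hands over: a function name, a pointer to the packed argument words and their count. It is not the generated code. Instead of reading the stack with inline assembly, it packs the arguments by hand, and the layout of SMyStructure, the DispatcherEntry type and demo_dispatcher are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout; the real one lives in my_structs.h. */
typedef struct {
    int  m_nVal;
    char m_sString[32];
} SMyStructure;

/* Hypothetical dispatcher signature: name, packed words, word count. */
typedef int (*DispatcherEntry)(const char *name,
                               const uintptr_t *args, int count);

DispatcherEntry g_dispatcher = NULL; /* set by the host at load time */

/* Stub with the script function's signature: forwards generically. */
int c__re_entry(int nArg, char *pString, SMyStructure *pMyStruct)
{
    uintptr_t args[3];
    args[0] = (uintptr_t)nArg;      /* one word per argument */
    args[1] = (uintptr_t)pString;
    args[2] = (uintptr_t)pMyStruct;
    assert(g_dispatcher != NULL);
    return g_dispatcher("c__re_entry", args, 3);
}

/* Demo dispatcher: unpacks the words and records what it received. */
int demo_dispatcher(const char *name, const uintptr_t *args, int count)
{
    SMyStructure *s = (SMyStructure *)args[2];
    s->m_nVal = (int)args[0];
    strncpy(s->m_sString, name, sizeof s->m_sString - 1);
    s->m_sString[sizeof s->m_sString - 1] = '\0';
    return count;
}
```

The hand-packed array plays the role of pArgs and nSize in the listing above; the price is that each stub must know its own parameter list, which a generator can supply.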

The Code of c_dispatcher.cpp

// c_dispatcher.cpp
// (c) George Shagov, 2005
#include <stdio.h>
#include <windows.h>
#include "c_dispatcher.h"

static HINSTANCE s_hCSource = NULL;
static HINSTANCE s_hProxy = NULL;
typedef int (__cdecl *MYFARPROC)();

MYFARPROC GetMyProcAddress(
    const char* pFunctionName)
{
  char pFile[128];
  char pFnName[128];
  sprintf(pFile, "my_script.%s_impl.c_", 
        pFunctionName);
  sprintf(pFnName, "%s_impl", pFunctionName);
  FILE* f = fopen(pFile, "r");
  if (f)
  {
    fclose(f);
    return (MYFARPROC)GetProcAddress(s_hProxy,
          pFnName);
  }
  else
    return (MYFARPROC)GetProcAddress
          (s_hCSource, pFnName);
}

BOOL APIENTRY DllMain( HANDLE hModule,
                     DWORD ul_reason_for_call,
                     LPVOID lpReserved)
{
  switch (ul_reason_for_call)
  {
    case DLL_PROCESS_ATTACH:
      s_hCSource = LoadLibrary
            ("my_script.dll");
      s_hProxy = LoadLibrary
            ("my_script_c_proxy.dll");
    break;
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    break;
    case DLL_PROCESS_DETACH:
      FreeLibrary(s_hCSource);
      FreeLibrary(s_hProxy);
    break;
  }
  return TRUE;
}

// This is an example of an exported function.
C_DISPATCHER_API int g__c_dispatcher_entry_point(
    const char* pFunctionName, 
    const void* pArguments, 
    int nArgumentsCount)
{
  MYFARPROC pProc = GetMyProcAddress(pFunctionName);
  void* pStack = 0;
  if (nArgumentsCount)
  {
    _asm
    {
      mov ecx, nArgumentsCount;
    loop_start_01:
      push 0;
      loop loop_start_01;
      mov pStack, esp;
    }
    memcpy(pStack, pArguments, nArgumentsCount*4);
    int nRet = pProc();
    _asm
    {
      mov ecx, nArgumentsCount;
    loop_start_02:
      pop eax;
      loop loop_start_02;
    }
    return nRet;
  }
  else
    return pProc();
}

As you can see, when the dispatcher finds a file named my_script.<function_name>_impl.c_, it delegates the call to the my_script_c_proxy library; otherwise it defaults to my_script.dll, where the compiled script code is located. This is the actual substitution. Before the call the dispatcher recreates the stack, knowing the pointer to the original arguments and their count; after the call it simply unwinds. Simple, right?
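
The selection rule on its own can be sketched in portable C. This is a simplified illustration, not the dispatcher itself: plain function pointers stand in for the two GetProcAddress lookups, and the names marker_file_name and select_impl are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

typedef int (*ImplFn)(void);

/* Builds "my_script.<function_name>_impl.c_".
   Returns 0 on success, -1 if the name does not fit into out. */
int marker_file_name(const char *function_name,
                     char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "my_script.%s_impl.c_",
                     function_name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Returns `interpreted` if the marker file exists in the current
   directory, otherwise `compiled`. */
ImplFn select_impl(const char *function_name,
                   ImplFn compiled, ImplFn interpreted)
{
    char file_name[256];
    FILE *f;

    assert(function_name != NULL);
    if (marker_file_name(function_name, file_name,
                         sizeof file_name) != 0)
        return compiled;
    f = fopen(file_name, "r");
    if (f == NULL)
        return compiled;  /* no marker: use the compiled code */
    fclose(f);
    return interpreted;   /* marker found: substitute */
}

/* Stand-ins for the compiled and the interpreted implementation. */
int compiled_get_value_1(void)    { return 1; }
int interpreted_get_value_1(void) { return 2; }
```

Unlike the fixed 128-byte buffers in the listing above, the sketch checks the length of the generated file name, which matters once function names become long.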

The my_script_c_proxy library contains four files. Since we are going to change the code at run time, we need some kind of C interpreter. I took cint: a free C interpreter, powerful enough and well suited to this demo, although it has a couple of issues, so some disadvantages of this demo implementation are tied to this particular interpreter. The files G__clink.c and G__clink.h are generated by cint from my_script_d_proxy.gnrtd.h (in the my_script_d_proxy folder). They are needed because cint, while interpreting, should call not the script functions directly but the stubs implemented in the my_script_d_proxy library; this lets us re-implement any single function we need rather than the whole script. The rest of the functions are called from my_script.dll. It's a little bit tricky. The file my_script_c_proxy.gnrtd.c contains stubs which look like this:

MY_SCRIPT_C_PROXY_API int 
c__my_entry_point_impl(int nArg, 
     char* pString, SMyStructure* pMyStruct)
{
  char tmp[128];
  int nRet;

  s__setup_cint();
  /* build the call as text; pointers are
     passed to the interpreter as addresses */
  sprintf(tmp, "c__my_entry_point_impl((int)%d,"
     "(void*)0x%08lx,(SMyStructure*)0x%08lx);",
     nArg, (unsigned long)pString,
     (unsigned long)pMyStruct);
  /* Call Cint parser */
  nRet = G__calc(tmp).obj.i;
  return nRet;
}

G__calc is a cint function which makes the call to the script.
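
The text-building step can be tried out without cint. The sketch below only formats the call expression and checks that it fits into the buffer; handing the text to an interpreter is left out. The helper name format_call and its parameter list are my own assumptions, and snprintf is used in place of sprintf so that an overlong call is reported instead of overflowing the buffer.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Formats "name((int)N, (void*)0xADDR);" into out.
   Returns 0 on success, -1 if the text does not fit. */
int format_call(char *out, size_t out_size, const char *name,
                int nArg, const void *pointer)
{
    int n;

    assert(out != NULL && name != NULL);
    /* uintptr_t with PRIxPTR prints the address portably,
       whatever the width of a pointer is */
    n = snprintf(out, out_size,
                 "%s((int)%d, (void*)0x%" PRIxPTR ");",
                 name, nArg, (uintptr_t)pointer);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Passing addresses as text works only because the interpreter runs inside the same process, so the pointers stay valid on the other side of the call.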

Well, actually, that's it.

Let us see how it works.

The contents of the c_\Debug folder (after the project has been built) look like this:

c_dispatcher.dll
frontend_app.exe
my_script.dll
my_script_c_proxy.dll
my_script_d_proxy.dll

Starting the application we get:

------
before:
nArg: 1, string: My string here.
pMyStruct->m_nVal: 0, pMyStruct->m_sString:
++++++after:
nArg: 1, string: My string here.
pMyStruct->m_nVal: 0, pMyStruct->m_sString: not equal
ret: 11
-------

This is the output produced by the compiled script, which is now located in the my_script.dll library.

In the Debug folder we create an empty file, my_script.c__get_value_1_impl.c_. The existence of this file signals to the dispatcher that there is a substitution for the c__get_value_1_impl function. We should also create the my_script.c_ file with the following content (the need for two files is the disadvantage caused by cint that I referred to earlier):

// my_script.cpp : Defines the entry point for the DLL application.
//

#include <stdio.h>
#include "..\\include\\my_structs.h"

int c__get_value_1_impl(char* pString)
{
  pString[1] = 'X'; 
  printf("c__get_value_1 ==>> str: %s\n",
        pString);
  return 2;
}

The contents of the Debug folder look like this:

c_dispatcher.dll
frontend_app.exe
my_script.c_
my_script.c__get_value_1_impl.c_
my_script.dll
my_script_c_proxy.dll
my_script_d_proxy.dll

Restarting application gives the result:

------
before:
nArg: 1, string: My string here.
pMyStruct->m_nVal: 0, pMyStruct->m_sString:
c__get_value_1 ==>> str: MX string here.
++++++after:
nArg: 1, string: MX string here.
pMyStruct->m_nVal: 0, pMyStruct->m_sString: equal
ret: 11
-------

Now let us try to re-implement two functions. For this purpose we create a second file, my_script.c__re_entry_impl.c_, to signal the dispatcher, and modify the script:

// my_script.cpp : Defines the entry point for
// the DLL application.
#include <stdio.h>
#include "..\\include\\my_structs.h"

int c__get_value_1_impl(char* pString)
{
  pString[1] = 'X'; 
  printf("c__get_value_1 ==>> str: %s\n",
        pString);
  return 2;
}

int c__re_entry_impl(int nArg, char* pString,
    SMyStructure* pMyStruct)
{
  printf(
    "\"I'll not be juggled with.\n");
  printf(
    "To hell, allegiance! Vows, to the blackest devil!\n");
  printf(
    "Conscience and grace, to the profoundest pit!\n");
  printf(
    "I dare damnation. To this point I stand,\"\n");
  printf("...for this is script\n");

  int nVar1 = c__get_value_1(pString);
  int nVar2 = c__get_value_2(nArg);

  if (nVar1 == nVar2)
  {
    c__call_in_case_varables_are_equal
          (pMyStruct);
  }
  else
  {
    c__call_in_case_varables_are_not_equal
          (pMyStruct);
  }

  return 11;
}

The result:

------
before:
nArg: 1, string: My string here.
pMyStruct->m_nVal: 0, pMyStruct->m_sString:
"I'll not be juggled with.
To hell, allegiance! Vows, to the blackest devil!
Conscience and grace, to the profoundest pit!
I dare damnation. To this point I stand,"
...for this is script
c__get_value_1 ==>> str: MX string here.
++++++after:
nArg: 1, string: MX string here.
pMyStruct->m_nVal: 0, pMyStruct->m_sString: equal
ret: 11
-------

Now a little bit about the parameters, or arguments, of the functions. You might have noticed already that the string 'My string here' has been changed to 'MX string here'. This was done by c__get_value_1_impl, re-implemented in the script. We are able to do the same with structures. We create a new file, my_script.c__call_in_case_varables_are_equal_impl.c_, and add the following function to the script:

int c__call_in_case_varables_are_equal_impl(
    SMyStructure* pMyStruct)
{
  pMyStruct->m_nVal = 0;
  strcpy(pMyStruct->m_sString, "- EQUAL -");
  return 0;
}

The result:

------
before:
nArg: 1, string: My string here.
pMyStruct->m_nVal: 0, pMyStruct->m_sString:
"I'll not be juggled with.
To hell, allegiance! Vows, to the blackest devil!
Conscience and grace, to the profoundest pit!
I dare damnation. To this point I stand,"
...for this is script
c__get_value_1 ==>> str: MX string here.
++++++after:
nArg: 1, string: MX string here.
pMyStruct->m_nVal: 0, pMyStruct->m_sString: - EQUAL -
ret: 11
-------

By now the contents of the Debug folder look like this:

c_dispatcher.dll
frontend_app.exe
my_script.c_ 
my_script.c__call_in_case_varables_are_equal_impl.c_
my_script.c__get_value_1_impl.c_
my_script.c__re_entry_impl.c_
my_script.dll
my_script_c_proxy.dll
my_script_d_proxy.dll

It works. As you can see:

  1. It is possible to change (or rather, to substitute) the code (script) at run time, with no recompilation required.

  2. It is not a hard task.

Performance

Yes, of course, using script instead of native code means a significant loss of performance, yet there are two things to say:

  • In systems where performance is a key point (such as real-time systems), no substitution is to be allowed. This means there should be no dispatcher library: all calls are compiled as direct ones and resolved at link time, so no performance is lost. In development and QA, where the ability to substitute is highly desirable and performance does not play a significant role, the approach is applicable as shown.

  • In general, performance is not a key point. Still, if we need substitution right in production, it is possible to have it without a significant loss of performance. In order to do that we should:

  1. Create and compile a separate library; let it be named my_script_subst.dll. This library would contain the re-implementation of those functions which we need to substitute.

  2. Create and compile an additional proxy library; let it be named my_script_s_proxy.dll. It should look like my_script_c_proxy.dll, except that all the calls are delegated not to cint but to my_script_subst.dll (see step 1).

  3. Modify the dispatcher so that it knows about my_script_s_proxy.dll.

I didn't do that, simply in order not to overload the code. If the basic idea is understandable, the rest is but technique.

Cons and Pros

Had I patience and time I would write a book here, or two. Yet in brief.

Disadvantages

  • The build procedure becomes more complicated, additional parsing is required.

  • An interpreter has to be supplied.

  • See the 'Performance' section.

  • Using C as a script might cause some problems, since C by default has direct access to memory and has no mechanism of automatic unwinding, which may potentially cause leaks. Yet this should be a C-script, not C: all functions which access memory should be exposed as entities.

Benefits

  • Clarity. OOP code is much less readable than plain script and this clarity is the main goal.

  • Ability to change the business logic at run time.

  • Control. Just think what we are able to do having all entry points in our hands.

TODO

A lot.

  • There should be a suitable C-interpreter

  • Parsing procedure

  • See the second clause described in the 'Performance' section

  • Dispatcher. Yes, of course, the way it is implemented is not applicable to a real system. There should be a map of functions, updated in a separate thread according to the modification timestamps.

  • And so on…
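
The map of functions mentioned in the dispatcher item might look roughly like the sketch below. It is an assumption-laden outline, not part of the sample project: MapEntry, refresh_map, lookup and the two hook types are hypothetical, a timestamp of 0 is taken to mean "no marker file", and the separate thread and its locking are left out, so refresh_map would have to be called periodically by such a thread.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

typedef int (*ImplFn)(void);

typedef struct {
    const char *name;       /* function name, e.g. "c__get_value_1" */
    ImplFn      compiled;   /* implementation from the compiled script */
    ImplFn      current;    /* what callers get right now */
    time_t      seen_mtime; /* marker timestamp at last refresh, 0 = none */
} MapEntry;

/* Hypothetical hooks: read a marker's timestamp, load an override. */
typedef time_t (*MtimeFn)(const char *name);
typedef ImplFn (*LoadFn)(const char *name);

/* Re-checks every entry; returns how many entries were updated. */
int refresh_map(MapEntry *map, size_t count,
                MtimeFn mtime_of, LoadFn load)
{
    size_t i;
    int changed = 0;

    assert(map != NULL && mtime_of != NULL && load != NULL);
    for (i = 0; i < count; ++i) {
        time_t now = mtime_of(map[i].name);
        ImplFn fresh;
        if (now == map[i].seen_mtime)
            continue;                 /* nothing new for this entry */
        fresh = (now != 0) ? load(map[i].name) : NULL;
        map[i].current = fresh ? fresh : map[i].compiled;
        map[i].seen_mtime = now;
        ++changed;
    }
    return changed;
}

/* Returns the current implementation, or NULL for an unknown name. */
ImplFn lookup(const MapEntry *map, size_t count, const char *name)
{
    size_t i;
    for (i = 0; i < count; ++i)
        if (strcmp(map[i].name, name) == 0)
            return map[i].current;
    return NULL;
}

/* Demo hooks, so the sketch can be exercised without a file system. */
time_t g_fake_mtime = 0;
time_t fake_mtime(const char *name) { (void)name; return g_fake_mtime; }
int compiled_value(void) { return 1; }
int override_value(void) { return 2; }
ImplFn fake_load(const char *name) { (void)name; return override_value; }
```

With such a map the dispatcher would no longer open a file on every call; a lookup costs one pass over the table, and removing the marker file restores the compiled implementation at the next refresh.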
