Overload Journal #135 - October 2016 + Programming Topics

Title: Attacking Licensing Problems with C++

Author: Martin Moene

Date: 02 October 2016 21:19:35 +01:00

Summary: Software licenses are often crackable. Deák Ferenc presents a technique for tackling this problem.

Body: 

From the early days of the commercialization of computer software, malicious programmers, also known as crackers, have been continuously nettling the programmers of the aforementioned software by constantly bypassing the clever licensing mechanisms they have implemented in their software, thus causing financial damages to the companies providing the software.

This trend has not changed in recent years: the cleverer the routines the programmers write, the more time is spent by crackers in invalidating the newly created routines, and in the end the crackers always succeed. For companies to be able to keep up with the constant pressure provided by the cracking community, they would need to constantly change their licensing and identification algorithms, but in practice this is not a feasible way to deal with the problem.

An entire industry has evolved around software protection and licensing technologies, where renowned companies offer advanced (and expensive) solutions to tackle this problem. The protection schemes range from using various resources such as hardware dongles, to network activation, from unique license keys to using complex encryption of personalized data – the list is long.

This article provides a short introduction to illustrate a very simple and naive licensing algorithm’s internal workings. We will show how to bypass it in an almost real life scenario, and finally present a software based approach to mitigate the real problem by hiding the license checking code in a layer of obfuscated operations generated by the C++ template metaprogramming framework, which will make the life of the person wanting to crack the application a little bit harder. Certainly, if they are well determined, the code will still be cracked at some point, but at least we’ll make it harder for them.

A naive licensing algorithm

The naive licensing algorithm is a very simple implementation that checks the validity of a license associated with the name of the user who purchased the associated software. It is not an industrial strength algorithm: it only has demonstrative power, while trying to provide insight to the actual responsibilities of a real licensing algorithm.

Since the license checking code is usually shipped with the software product in compiled form, I’ll show both the C++ source of the licensing algorithm and the code generated from it (in Intel x86 assembly), since the latter is what crackers will see after successfully disassembling the executable. In order not to pollute precious page space with unintelligible binary code, I will include only the relevant part of the generated code: the bits that naively determine whether a supplied license is valid or not.

Listing 1 is the source code of the licensing algorithm.

#include <cstring>  // strlen
#include <string>   // std::string

static const char letters[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
bool check_license(const char* user,
                   const char* users_license)
{
  std::string license;
  size_t ll = strlen(users_license);
  size_t l = strlen(user), lic_ctr = 0;
  int add = 0;
  // Strip the dashes from the supplied license
  for (size_t i = 0; i < ll; i++)
    if (users_license[i] != '-')
      license += users_license[i];
  // Check each license character in turn
  while (lic_ctr < license.length() ) {
    size_t i = lic_ctr;
    i %= l;
    int current = 0;
    // Sum the user name's characters from i on
    while (i < l) current += user[i ++];
    current += add;
    add++;
    // Note: sizeof letters is 27, it counts the
    // terminating '\0' as well
    if (license[lic_ctr]
        != letters[current % sizeof letters])
      return false;
    lic_ctr++;
  }
  return true;
}
			
Listing 1

The license which this method validates comes in the form ABCD-EFGH-IJKL-MNOP, and there is an associated generate_license method which is presented as an appendix to this article.

Also, the naivety of this method is easily exposed by its very fitting name, check_license, which immediately reveals to the would-be attacker where to look for the code checking the ... license. If you want to make it harder for the attacker to identify the license checking method, I’d recommend either using some irrelevant names or just stripping all symbols from the executable as part of the release process.

The interesting part is the binary code of the method (see Listing 2), which we obtained by compiling the corresponding C++ code with Microsoft Visual C++ 2015. I compiled it in Release mode (with debug information included for educational purposes); it is intentionally not the Debug version, since we would hardly ship the debug version of the code to our customers.

if (license[lic_ctr] 
    != letters[current % sizeof letters])
  00FC15E4  lea      ecx,[license]
  00FC15E7  cmovae   ecx,dword ptr [license]
  00FC15EB  xor      edx,edx
  00FC15ED  push     1Bh
  00FC15EF  pop      esi
  00FC15F0  div      eax,esi
  00FC15F2  mov      eax,dword ptr [lic_ctr]
  00FC15F5  mov      al,byte ptr [ecx+eax]
  00FC15F8  cmp      al,byte ptr [edx+0FC42A4h]
  00FC15FE  jne      check_license+0DEh (0FC1625h)
  return false;
lic_ctr++;
  00FC1600  mov      eax,dword ptr [lic_ctr]
  00FC1603  mov      ecx,dword ptr [add]
  00FC1606  inc      eax
  00FC1607  mov      dword ptr [lic_ctr],eax
  00FC160A  cmp      eax,dword ptr [ebp-18h]
  00FC160D  jb       check_license+7Fh (0FC15C6h)
}
return true;
  00FC160F  mov      bl,1
  00FC1611  push     0
  00FC1613  push     1
  00FC1615  lea      ecx,[license]
  00FC1618  call
    std::basic_string<char,std::char_traits<char>,
    std::allocator<char> >::_Tidy (0FC1944h)
  00FC161D  mov      al,bl
}
  00FC161F  call     _EH_epilog3_GS (0FC2F7Ch)
  00FC1624  ret
  00FC1625  xor      bl,bl
  00FC1627  jmp      check_license+0CAh (0FC1611h)
			
Listing 2

I have also used the built-in debugger of the VS IDE to visualize the generated code next to the source, which facilitates a better understanding of the relation between the two of them.

Let’s analyze it for a few moments. The essence of the validity checking happens at address 00FC15F8 where the comparison cmp al, byte ptr [edx+0FC42A4h] takes place (for those wondering, edx gets its value as being the remainder of the division at 00FC15F0).

At this stage, the value of the al register is already initialized with the value of license[lic_ctr] and that is what is compared to the expected character. If it does not match, the code jumps to 0FC1625h where the bl register is zeroed out (xor bl, bl) and from there the jump goes backward to 0FC1611h to leave the method with the ret instruction found at 00FC1624. Otherwise the loop continues.

The most common way of returning a value from a method call is to place the value in the eax register and let the calling code handle it, so before returning from the method the value of al is populated with the value of the bl register (via mov al, bl found at 00FC161D).

Please remember that if the check discussed above failed, the value of the bl register is 0; if the entire loop completed successfully, bl was instead set to 1 (via mov bl,1 at 00FC160F).

From the perspective of an attacker, the only thing that needs to be done is to replace the binary sequence of xor bl,bl with the binary code of mov bl,1 in the executable. Since luckily these two are the same length (2 bytes), the crack is ready to be published within a few seconds.

Moreover, due to the simplicity of the implementation of the algorithm, a highly skilled cracker could easily create a key-generator for the application, which would be an even worse scenario as the cracker doesn’t have to modify the executable. This means that further safety steps, such as integrity checks of the application, would all be executed correctly, but there would be a publicly available key-generator which could be used by anyone to generate a license-key without ever paying for it, or malicious salesmen could generate counterfeit licenses which they could sell to unsuspecting customers.
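To make the key-generator threat concrete, here is a minimal sketch of what such a generator could look like for the toy algorithm of Listing 1. It is not the generate_license method from the appendix: the names expected_char and generate_license_sketch are hypothetical, and a non-empty ASCII user name is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Character that the check in Listing 1 expects at position pos
// (positions count license characters only, dashes excluded).
char expected_char(const char* user, std::size_t pos) {
    const std::size_t len = std::strlen(user);
    assert(len > 0);  // assumption: the user name is not empty
    int current = 0;
    for (std::size_t i = pos % len; i < len; ++i)
        current += user[i];
    current += static_cast<int>(pos);  // 'add' always equals the position
    // Mirrors Listing 1: sizeof letters is 27 (includes the '\0').
    return letters[current % sizeof letters];
}

// Hypothetical generator: 16 characters in four dash-separated groups.
std::string generate_license_sketch(const char* user) {
    std::string license;
    for (std::size_t pos = 0; pos < 16; ++pos) {
        if (pos != 0 && pos % 4 == 0) license += '-';
        license += expected_char(user, pos);
    }
    return license;
}
```

For the user name "A" the sketch produces a license whose first group is LMNO. Because sizeof letters counts the terminating '\0', a position can map to a NUL character; the sketch deliberately mirrors that quirk of the check instead of correcting it.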

Here let’s look at our C++ obfuscating framework.

The C++ obfuscating framework

The C++ obfuscating framework provides a simple macro-based mechanism, combined with advanced C++ template meta-programming techniques for relevant methods and control structures, to replace the basic C++ control structures and statements with highly obfuscated code, which makes reverse engineering the product a complicated procedure.

By using the framework, reverse engineering the license checking algorithm presented in the previous section would prove to be a highly challenging task due to the sheer amount of extra code generated by the framework engine.

The framework has adopted a familiar, BASIC-like, syntax to make the switch from real C++ source code to the macro language of the framework as easy and painless as possible.

Functionality of the framework

The role of the obfuscating framework is to generate extra code, while providing functionality which is expected by the user, with as few syntax changes to the language as possible.

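The framework provides the following functionality, each part of which is described in the sections below: wrappers for numeric values and variables, replacements for the decision-making and looping control structures, and macros for altering the control flow (CONTINUE, BREAK and RETURN).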

Debugging with the framework

Like every developer who has been there, we know that debugging complex and highly templated C++ code can sometimes be a nightmare. In order to avoid this nightmare while using the framework, we decided to implement a debugging mode.

To activate the debugging mode of the framework, define the OBF_DEBUG identifier before including the obfuscation header file. Please see the specific control structures for how the debugging mode alters the behaviour of the macro.

Using the framework

The basic usage of the framework boils down to including the header file providing the obfuscating functionality

  #include "instr.h"

then using the macro pair OBF_BEGIN and OBF_END as delimiters of the code sequences that will be using obfuscated expressions.

For a more under-the-hood view of the framework, the OBF_BEGIN and OBF_END macros declare a try-catch block, which has support for returning values from the obfuscated current code sequence, and also provides support for basic control flow modifications such as the usage of continue and break emulator macros CONTINUE and BREAK.

Behind the scenes: OBF_BEGIN and OBF_END

OBF_BEGIN expands to:

  #define OBF_BEGIN \
   try {obf::next_step __crv = \
      obf::next_step::ns_done; \
      std::shared_ptr<obf::base_rvholder> \
      __rvlocal;

and OBF_END becomes:

  #define OBF_END } \
  catch(std::shared_ptr<obf::base_rvholder>& r) { \
  return *r; } catch (...) {throw;}

In order to support ‘returning’ a value from the current obfuscated block we need a special variable, __rvlocal. At later stages, this variable will be populated with meaningful values as a result of executing the code of the RETURN macro (which will ‘throw’ a value of type std::shared_ptr<obf::base_rvholder>). OBF_END will catch this specific value and handle it appropriately, while all other values thrown will be re-thrown in order not to disturb the client code’s exception handling.
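The return-by-throw mechanism can be tried out without the framework. The following is a minimal sketch under stated assumptions: base_holder and holder are hypothetical, much simplified stand-ins for obf::base_rvholder and obf::rvholder (whose real definitions are not shown here), and the try and catch blocks are written out by hand where OBF_BEGIN, RETURN and OBF_END would normally appear.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified stand-in for obf::base_rvholder.
struct base_holder {
    virtual ~base_holder() = default;
    virtual const void* raw() const = 0;
    // Convert to the return type of the enclosing function.
    // Assumption: T is exactly the type that was stored.
    template <typename T>
    operator T() const { return *static_cast<const T*>(raw()); }
};

// Hypothetical, simplified stand-in for obf::rvholder.
template <typename T>
struct holder final : base_holder {
    explicit holder(T v) : value(v) {}
    const void* raw() const override { return &value; }
    T value;
};

int answer(bool take_early_exit) {
    try {                                        // role of OBF_BEGIN
        std::shared_ptr<base_holder> rv;
        if (take_early_exit) {
            rv.reset(new holder<int>(42));       // role of RETURN(42)
            throw rv;
        }
        rv.reset(new holder<int>(7));            // role of RETURN(7)
        throw rv;
    } catch (std::shared_ptr<base_holder>& r) {  // role of OBF_END
        assert(r != nullptr);
        return *r;                               // operator T() with T = int
    } catch (...) {
        throw;                                   // foreign exceptions pass through
    }
}
```

Calling answer(true) yields 42 and answer(false) yields 7; any exception of another type leaves the function untouched, which is the property the re-throwing catch (...) clause is there to preserve.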

Value and numerical wrappers

To achieve an extra layer of obfuscation, integral numeric values can be wrapped in the macro N(), and all integral numeric variables (int, long, ...) can be wrapped in the macro V(), which obfuscates the calculations performed on them. The V() value wrapper can also wrap individual array elements (x[2]), but not whole arrays (x); it also cannot wrap class instances, because the macro expands to a reference holder object.

The implementation of the wrappers uses the compile-time random number generator provided by [Andrivet], and the values are obfuscated by performing various operations to hide the original value.

And here is an example for using the value and variable wrappers:

  int a, b = N(6);
  V(a) = N(1);

After executing the statement above, the value of a will be 1.

The value wrappers implement a limited set of operations which you can use to change the value of the wrapped variable. These are the compound assignment operators +=, -=, *=, /=, %=, <<=, >>=, &=, |= and ^=, and the pre- and post-increment and decrement operators ++ and --. All of the binary operators (+, -, *, /, %, &, |, << and >>) are also implemented, so you can write V(a) + N(1) or V(a) - V(b).

Also, the assignment operator to a specific type and from a different value wrapper is implemented, together with the comparison operators.

As the name implies, the value wrappers will wrap values by offering a behaviour similar to the usage of simple values, so be aware that variables which are const values can be wrapped into the V() wrapper but, as with real const variables, you cannot assign to them. So for example the following code will not compile:

  const char* t = "ABC";
  if( V(t[1]) == 'B')
  {
    V( t[1] ) = 'D';
  }

And the following

  char* t = "ABC";
  if( V(t[1]) == 'B')
  {
    V( t[1] ) = 'D';
  }

will be undefined behaviour because the compiler will most likely place the string "ABC" in a constant memory area (although I would expect your compiler to choke heavily on this expression, since it is no longer valid modern C++). To work with this kind of data, always use char[] instead of char*.
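As a small illustration of the char[] advice, the sketch below performs the same replacement on a writable array. It is plain C++ without the V() wrapper, since the framework header is not assumed to be available here; the function name replace_second_letter is hypothetical.

```cpp
#include <cassert>
#include <string>

// A char array initialized from a literal is a writable copy of it,
// so modifying an element is well defined.
std::string replace_second_letter() {
    char t[] = "ABC";
    if (t[1] == 'B') {
        t[1] = 'D';
    }
    return std::string(t);
}
```

The function returns "ADC"; with the framework in place, the two accesses to t[1] are the expressions that would be wrapped in V().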

Behind the scenes of the implementation of numeric wrapping

The N macro is defined like the following:

  #define N(a) (obf::Num<decltype(a), \
    obf::MetaRandom<__COUNTER__, 4096>:: value ^ \
    a>().get() ^ obf::MetaRandom<__COUNTER__ - 1, \
    4096>::value)

As a first step, let’s consider that due to the implementation of [Andrivet] and the (more or less standard) __COUNTER__ macro, the following will have the same value:

  obf::MetaRandom<__COUNTER__, 4096>::value
  obf::MetaRandom<__COUNTER__ - 1, 4096>::value
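The reason the two expressions agree is that every expansion of __COUNTER__ yields the next integer, so the second use sees a value one higher than the first and the subtraction brings it back. A minimal sketch of just that property (the variable names are hypothetical, and __COUNTER__ itself is a common compiler extension rather than standard C++):

```cpp
// Each expansion of __COUNTER__ produces the next integer in sequence.
constexpr int first_use  = __COUNTER__;      // some value k
constexpr int second_use = __COUNTER__ - 1;  // (k + 1) - 1 == k

static_assert(first_use == second_use,
              "both expressions name the same counter value");
```

This is why the N macro can use the same random value once to mask the number at compile time and once to unmask it at run time.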

Now, taking the obf::Num class into view, we have Listing 3, where the iteration of the templates is finalized by Listing 4.

template<typename T, T n> class Num final {
public:
  enum { value = ( (n & 0x01)  
        | ( Num < T , (n >> 1)>::value << 1) ) 
  };
  Num() : v(0) {
    v = value ^  MetaRandom<32, 4096>::value;
  }
  T get() const { 
    volatile T x = v ^ MetaRandom<32,
    4096>::value; return x;
  }
private:
  volatile T v;
};
			
Listing 3

struct ObfZero { enum {value = 0}; };
struct ObfOne { enum {value = 1}; };

#define OBF_ZERO(t) template <> struct Num<t,0> final : public ObfZero { t v = value; };

#define OBF_ONE(t) template <> struct Num<t,1> final : public ObfOne { t v = value; };

#define OBF_TYPE(t) OBF_ZERO(t) OBF_ONE(t)
OBF_TYPE(int) // And for all other integral types
			
Listing 4
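The recursion in Listings 3 and 4 reassembles the number bit by bit: the lowest bit is kept, and the remaining bits are rebuilt recursively and shifted back into place. The following sketch restates only that recursion, without the xor masking; the name rebuild is hypothetical and non-negative values are assumed.

```cpp
// Primary template: lowest bit of n, OR-ed with the rebuilt upper bits.
template <typename T, T n>
struct rebuild {
    static constexpr T value =
        (n & 1) | (rebuild<T, (n >> 1)>::value << 1);
};

// The recursion ends at 0 and 1. Full specializations per type are
// used here, as the OBF_TYPE macro of Listing 4 does.
template <> struct rebuild<int, 0> { static constexpr int value = 0; };
template <> struct rebuild<int, 1> { static constexpr int value = 1; };

static_assert(rebuild<int, 42>::value == 42,
              "the recursion reproduces the original number");
```

For 42 the chain of instantiations is 42, 21, 10, 5, 2 and 1, and unwinding it yields 42 again, so the template machinery adds compile-time indirection without changing the value.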

The Num class tries to add some protection by adding some extra xor operations to the use of a simple number, thus turning a simple numeric assignment into several steps of assembly code (Visual Studio 2015 generated the code in Listing 5 in Release With Debug Info mode).

  int n;
  OBF_BEGIN
   n = N(42);
002A5F74  mov         dword ptr [ebp-4],0
002A5F7B  mov         dword ptr [ebp-4],78Ch
002A5F82  mov         eax,dword ptr [ebp-4]
002A5F85  xor         eax,0E8Fh
002A5F8A  mov         dword ptr [ebp-4],eax
002A5F8D  mov         eax,dword ptr [ebp-4]
002A5F90  xor         eax,929h
    OBF_END
			
Listing 5

However, please note the several volatile variables ... which are required to circumvent today’s extremely clever optimizing compilers. If we remove the volatile from the variables, the compiler is clever enough to guess the value I wanted to obfuscate, so ... there goes the obfuscation.
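The role of volatile can be seen in a stripped-down version of the idea. This is only a sketch: masked_constant is a hypothetical class, and a fixed mask is assumed in place of the MetaRandom values used by the framework.

```cpp
#include <cassert>

template <int Value, int Mask>
class masked_constant final {
public:
    // Only the masked value is ever stored.
    masked_constant() : stored(Value ^ Mask) {}

    int get() const {
        // volatile forces a real load and a real xor at run time;
        // without it the optimizer may fold everything back to Value.
        volatile int plain = stored ^ Mask;
        return plain;
    }

private:
    volatile int stored;
};

int unmask_42() {
    return masked_constant<42, 0x5A5>().get();
}
```

unmask_42() returns 42, but the generated code has to store the masked value and xor it back rather than simply loading the constant.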

Behind the scenes of the implementation of variable wrapping

When we are not building the code in debugging mode, the macro V expands to the following C++ nightmare:

  #define MAX_BOGUS_IMPLEMENTATIONS 3

  #define V(a) ([&]() \
    {obf::extra_chooser<std::remove_reference \
      <decltype(a)>::type, \
      obf::MetaRandom<__COUNTER__,\
      MAX_BOGUS_IMPLEMENTATIONS>::value > \
      ::type _JOIN(_ec_,__COUNTER__)(a);\
      return obf::stream_helper();}() << a)

So let’s dissect it in order to understand the underlying operations.

The value wrappers add an extra obfuscation layer to the values they wrap, by performing an extra addition, an extra subtraction or an extra xor operation on the value itself. This is picked randomly when compilation happens by the extra_chooser class, which is like:

  template <typename T, int N>
  class extra_chooser
  {
    using type = basic_extra;
  };

and is helped by the following constructs:

 #define DEFINE_EXTRA(N,implementer) template \
  <typename T> struct extra_chooser<T,N> { \
  using type = implementer<T>; }

  DEFINE_EXTRA(0, extra_xor);
  DEFINE_EXTRA(1, extra_substraction);
  DEFINE_EXTRA(2, extra_addition);

These macro invocations bind each index to one of the classes implementing the extra operations. Listing 6 shows the xor variant; the classes for the extra addition and subtraction are very similar.

template <class T>
class extra_xor final : public basic_extra
{
public:
  extra_xor(T& a) : v(a)
  {
    volatile T lv 
      = MetaRandom<__COUNTER__, 4096>::value;
    v ^= lv;
  }
  virtual ~extra_xor() 
  {
    volatile T lv 
      = MetaRandom<__COUNTER__ - 1, 4096>::value;
    v ^= lv; 
  }
private:
  volatile T& v;
};
			
Listing 6

The next thing we observe is that an object of this kind (extra bogus operation chooser) is defined in a lambda function for the variable we are wrapping. The variable name for this is determined by _JOIN(_ec_,__COUNTER__)(a), where _JOIN is just a simple joiner macro:

  #define _JOIN(a,b) a##b

Upon creation and destruction of this extra_chooser object, the value of the object will remain unchanged; however, extra code will be generated by the compiler (thanks to the numerous volatile modifiers found in the extra operation classes, otherwise the compiler would ‘cheat’ again and just ‘skip’ our obfuscation). This is actually an extensible interface, so you can use it to define your own class for bogus operations using the DEFINE_EXTRA macro (and increase the MAX_BOGUS_IMPLEMENTATIONS as required).
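To illustrate how such self-cancelling operations and their compile-time selection fit together, here is a minimal sketch. All names (scoped_xor, scoped_add, op_chooser, round_trip) are hypothetical, a fixed key is assumed instead of a MetaRandom value, and the input is assumed small enough that the addition cannot overflow.

```cpp
#include <cassert>

// Perturbs the referenced value on construction, restores it on destruction.
template <class T>
class scoped_xor final {
public:
    explicit scoped_xor(T& a) : v(a) { volatile T k = key; v = v ^ k; }
    ~scoped_xor()                    { volatile T k = key; v = v ^ k; }
private:
    static constexpr T key = 0x48B;  // fixed here; random in the framework
    volatile T& v;
};

template <class T>
class scoped_add final {
public:
    explicit scoped_add(T& a) : v(a) { volatile T k = key; v = v + k; }
    ~scoped_add()                    { volatile T k = key; v = v - k; }
private:
    static constexpr T key = 0x48B;
    volatile T& v;
};

// Compile-time selection by index, in the spirit of extra_chooser.
template <class T, int N> struct op_chooser;
template <class T> struct op_chooser<T, 0> { using type = scoped_xor<T>; };
template <class T> struct op_chooser<T, 1> { using type = scoped_add<T>; };

template <int N>
int round_trip(int x) {
    {
        typename op_chooser<int, N>::type guard(x);  // value perturbed here
    }                                                // and restored here
    return x;
}
```

Both round_trip<0>(5) and round_trip<1>(5) return 5: the value is unchanged, yet the compiler has to emit the perturbing and restoring instructions because of the volatile accesses.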

Now, back to the lambda because it plays an important role. The lambda returns an object of type obf::stream_helper(), which is basically an empty class (class stream_helper {};), but the role of the lambda is still not done. As we can see in the macro, the lambda is executed and into its result (the obf::stream_helper() object) we stream the parameter of the macro (<< a). This gives control to the following operator:

  template <typename T>
  refholder<T> operator << (stream_helper, T& a)
  {
    return refholder<T>(a);
  }

providing us with a controversial class, refholder (Listing 7).

template <typename T>
class refholder final
{
public:
  refholder() = delete;
  refholder(T& pv) : v(pv) {}
  refholder(T&&) = delete;
  ~refholder() = default;
  refholder<T>& operator = (const T& ov) {
    v = ov; return *this;
  }
  refholder<T>& operator 
    = (const refholder<T>& ov) {
    v = ov.v; return *this;
  }
  bool operator == (const T& ov) {
    return !(v ^ ov);
  }
  bool operator != (const T& ov) {
    return !operator ==(ov);
  }
  COMPARISON_OPERATOR(>=)
  COMPARISON_OPERATOR(<=)
  COMPARISON_OPERATOR(>)
  COMPARISON_OPERATOR(<)
  operator T() {return v;}
  refholder<T>& operator++() {
    ++ v; return *this;
  }
  refholder<T>& operator--() {
    -- v; return *this; 
  }
  refholder<T> operator++(int) {
    refholder<T> rv(*this); operator ++();
    return rv; 
  }
  refholder<T> operator--(int) {
    refholder<T> rv(*this); operator --();
    return rv;
  }
  COMP_ASSIGNMENT_OPERATOR(+)
  COMP_ASSIGNMENT_OPERATOR(-)
  COMP_ASSIGNMENT_OPERATOR(*)
  COMP_ASSIGNMENT_OPERATOR(/)
  COMP_ASSIGNMENT_OPERATOR(%)
  COMP_ASSIGNMENT_OPERATOR(<<)
  COMP_ASSIGNMENT_OPERATOR(>>)
  COMP_ASSIGNMENT_OPERATOR(&)
  COMP_ASSIGNMENT_OPERATOR(|)
  COMP_ASSIGNMENT_OPERATOR(^)
private:
  volatile T& v;
};
			
Listing 7

This class supports all the basic operations you can execute on a variable, either via the member operators (defined explicitly or via the macro COMP_ASSIGNMENT_OPERATOR) or via the DEFINE_BINARY_OPERATOR macro, which defines binary operators for refholder classes. For cases when the variable wrapping is done on constant variables, there are specializations of this template class for constant Ts.

There are various arguments against storing references as class members [Stackoverflow]; however, I consider this situation to be a reasonably safe one, in which the construct can be exploited for this specific purpose. So, here (Listing 8) comes a piece of generated assembly code for a very simple expression.

    int n;
    OBF_BEGIN
        V(n) = N(42);
00048466  mov         dword ptr [ebp-8],0  
0004846D  mov         dword ptr [ebp-8],97Ch  
00048474  push        esi  
00048475  mov         esi,dword ptr [ebp-8]  
00048478  mov         dword ptr [ebp-8],48Bh  
0004847F  xor         esi,0DC4h  
00048485  mov         eax,dword ptr [ebp-8]  
00048488  add         eax,dword ptr [n]  
0004848B  mov         dword ptr [n],eax  
0004848E  mov         dword ptr [ebp-8],48Bh  
00048495  mov         eax,dword ptr [ebp-8]  
00048498  sub         dword ptr [n],eax  
0004849B  lea         eax,[n]  
0004849E  push        eax  
0004849F  push        dword ptr [ebp-8]  
000484A2  lea         eax,[ebp-0Ch]  
000484A5  push        eax  
000484A6  call        obf::operator<<<int> (0414C9h)  
000484AB  add         esp,0Ch  
000484AE  xor         esi,492h  
000484B4  mov         eax,dword ptr [eax]  
000484B6  mov         dword ptr [eax],esi  
    OBF_END
			
Listing 8

The sheer amount of extra code generated for a simple assignment is overwhelming.
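The core trick of the V() macro, streaming a variable into an empty helper object to obtain a reference holder that writes through to the original, can be tried out in isolation. The sketch below is a much reduced illustration: stream_tag, ref_box and WRAP are hypothetical names standing in for obf::stream_helper, refholder and V(), and only three operators are provided.

```cpp
#include <cassert>

struct stream_tag {};    // plays the role of obf::stream_helper

template <typename T>
class ref_box final {    // much reduced stand-in for refholder
public:
    explicit ref_box(T& target) : ref(target) {}
    ref_box& operator=(const T& value) { ref = value; return *this; }
    ref_box& operator+=(const T& value) { ref = ref + value; return *this; }
    bool operator==(const T& value) const { return !(ref ^ value); }
    operator T() const { return ref; }
private:
    volatile T& ref;     // every access goes to the wrapped variable
};

// Streaming a variable into the tag yields a holder for that variable.
template <typename T>
ref_box<T> operator<<(stream_tag, T& target) {
    return ref_box<T>(target);
}

#define WRAP(a) (stream_tag() << a)

int wrap_demo() {
    int n = 0;
    WRAP(n) = 42;            // assigns through the temporary holder
    WRAP(n) += 1;
    assert(WRAP(n) == 43);   // comparison implemented with xor
    return n;
}
```

wrap_demo() returns 43. The holder only lives for the duration of the expression it appears in, which is what keeps the stored reference reasonably safe in this usage.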

Control structures of the framework

The basic control structures which are familiar from C++ are made available for immediate use by the developers by means of macros, which expand into complex templated code.

They are meant to provide the same functionality as the standard C++ keyword they are emulating, and if the framework is compiled in DEBUG mode, most of them actually expand to the C++ control structure itself.

Decision making

When there is a need in the application to take a decision based on the value of a specific expression, the obfuscating framework offers the familiar if-then-else statement for the developers in the form of the IF-ELSE-ENDIF construct.

The IF statement

For checking the true-ness of an expression the framework offers the IF macro which has the following form:

  IF (expression)
  ....statements
  ELSE
  ....other statements
  ENDIF

where the ELSE is not mandatory, but the ENDIF is, since it indicates the end of the IF block’s statements.

And here is an example for the usage of the IF macro.

  IF( V(a) == N(9) )
    V(b) = a + N(5);
  ELSE
    V(a) = N(9);
    V(b) = a + b;
  ENDIF

Due to the way the IF macro is defined, it is not necessary to create a new scope between the IF and ENDIF; it is automatically defined and all variables declared in the statements between IF and ENDIF are destroyed.

Since the evaluation of the expression is bound to the execution of a hidden (well, at least from the outer world) lambda, unfortunately it is not possible to declare variables in the expression so the following:

  IF( int x = some_function() )

is not valid, and will yield a compiler error. This is partially intentional, since it gives that extra layer of obfuscation required to hide the operations done on a variable in a nameless lambda somewhere deep in the code.

In cases when debugging mode is active, the IF-ELSE-ENDIF macros are defined to expand to the following statements:

  #define IF(x)  if(x) {
  #define ELSE   } else {
  #define ENDIF  }

Implementation of the IF construct

The IF macro expands to the following:

  #define IF(x) { \
    std::shared_ptr<obf::base_rvholder> __rvlocal;\
    obf::if_wrapper(( [&]()->bool{ return (x); \
    })).set_then( [&]() {

the ELSE macro expands to:

  #define ELSE return __crv;}).set_else( [&]() {

and the ENDIF will give:

  #define ENDIF return __crv;}).run(); }

So to wrap it all up, the following code:

  IF( n == 42)
    n = 43;
  ELSE
    n = 44;
  ENDIF

will expand to Listing 9.

{
  std::shared_ptr<obf::base_rvholder> __rvlocal; 
  obf::if_wrapper( ([&]()->bool
  { 
    return (n == 42); 
  }) )
  .set_then( [&]() 
  {
    n = 43;
    return __crv;
  })
  .set_else( [&]() 
  {
    n = 44;
    return __crv;
  })
  .run(); 
}
			
Listing 9

Now let’s examine the if_wrapper class (Listing 10).

class if_wrapper final
{
public:
  template<class T>
  if_wrapper(T lambda) {
    condition.reset(new bool_functor<T>(lambda));}
  void run()
  {
    if(condition->run()) { if(thens) {
      thens->run();
    }}
    else { if(elses) {
      elses->run();
    }}
  }
  ~if_wrapper() noexcept = default;
  template<class T>
  if_wrapper& set_then(T lambda) 
  { 
    thens.reset(new next_step_functor<T>(lambda));
    return *this; 
  }
  template<class T>
  if_wrapper& set_else(T lambda) 
  { 
    elses.reset(new next_step_functor<T>(lambda));
    return *this; 
  }
private:
  std::unique_ptr<bool_functor_base> condition;
  std::unique_ptr<next_step_functor_base> thens;
  std::unique_ptr<next_step_functor_base> elses;
};
			
Listing 10

It is very clear why we needed the lambda created by the IF macro (([&]()->bool { return (n == 42); })): we needed to create an object of type class bool_functor from it, which will give us the true-ness of the if condition. The bool functor class looks like Listing 11, where the important part is the bool run() – which in fact runs the condition and returns its true-ness.

struct bool_functor_base
{
  virtual bool run() = 0;
};

template <class T>
struct bool_functor final : public bool_functor_base
{
  bool_functor(T r) : runner(r) {}
  virtual bool run() {return runner();}

private:
  T runner;
};
			
Listing 11

The two branches of the if are represented by the member variables std::unique_ptr<next_step_functor_base> thens; std::unique_ptr<next_step_functor_base> elses; and they behave very similarly to the conditional.

The run() method of the if_wrapper class firstly checks the condition and then, depending on the presence of the then and else branches, executes the required operations.
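The builder-style chaining behind IF, ELSE and ENDIF can be reproduced in a few lines. Note that this sketch swaps in std::function for the article's bool_functor and next_step_functor hierarchy to stay short, and that mini_if and classify are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <utility>

class mini_if final {
public:
    explicit mini_if(std::function<bool()> c) : condition(std::move(c)) {}

    mini_if& set_then(std::function<void()> f) {
        thens = std::move(f);
        return *this;
    }
    mini_if& set_else(std::function<void()> f) {
        elses = std::move(f);
        return *this;
    }

    // Evaluate the condition, then run whichever branch is present.
    void run() {
        if (condition()) { if (thens) thens(); }
        else             { if (elses) elses(); }
    }

private:
    std::function<bool()> condition;
    std::function<void()> thens;
    std::function<void()> elses;
};

// Hand-written equivalent of: IF(n == 42) n = 43; ELSE n = 44; ENDIF
int classify(int n) {
    mini_if([&]() -> bool { return n == 42; })
        .set_then([&]() { n = 43; })
        .set_else([&]() { n = 44; })
        .run();
    return n;
}
```

classify(42) returns 43 and any other argument gives 44. Omitting the set_else call corresponds to an IF without an ELSE, because run() checks whether each branch was set before calling it.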

Support for looping

There are times when every application needs to iterate over a set of values, so I tried to re-implement the basic loop structures used in C++: the for loop, the while and the do-while have been reincarnated in the framework.

The FOR statement

The macro provided to imitate the for statement is:

  FOR(initializer, condition, incrementer)
  .... statements
  ENDFOR

Please note that, since FOR is a macro, its parts are separated by , (comma) and not by the traditional ; used in standard C++ for loops. Do not forget to enclose your initializer, condition and incrementer in parentheses if they are expressions which contain a , (comma).

The FOR loops should be ended with an ENDFOR statement to signal the end of the structure. Here is a simple example for the FOR loop.

  FOR(V(a) = N(0), V(a) < N(10), V(a) += 1)
    std::cout << V(a) << std::endl;
  ENDFOR

The same restriction concerning the variable declaration in the initializer as in the case of the IF applies for the FOR macro too, so it is not valid to write:

  FOR(int x=0, x<10, x++)

and the reasons are again the same as presented above.

In a debugging session, the FOR-ENDFOR macros expand to the following:

  #define FOR(init,cond,inc) for(init;cond;inc) {
  #define ENDFOR }

The WHILE loop

The macro provided as replacement for the while is:

  WHILE(condition)
  ....statements
  ENDWHILE

The WHILE loop has the same characteristics as the IF construct and behaves the same way as you would expect from a well-mannered while statement: it checks the condition at the top, and executes the statements repeatedly as long as the given condition is true. Here is an example for WHILE:

  V(a) = 1;
  WHILE( V(a)  < N(10) )
    std::cout << "IN:" << a<< std::endl;
    V(a) += N(1);
  ENDWHILE

Unfortunately the WHILE loop also has the same restrictions as the IF: you cannot declare a variable in its condition.

If compiled in debugging mode, the WHILE evaluates to:

  #define WHILE(x) while(x) {
  #define ENDWHILE }

The REPEAT-AS_LONG_AS construct posing as do-while

Due to the complexity of the solution, the familiar do-while construct of the C++ language had to be renamed a bit, since the WHILE ‘keyword’ was already taken for the benefit of the while loop, so I created the REPEAT-AS_LONG_AS keywords to achieve this goal.

This is the syntax of the REPEAT-AS_LONG_AS construct:

  REPEAT
  ....statements
  AS_LONG_AS( expression )

This will execute the statements at least once and then evaluate the expression: if it is true, execution continues from the beginning of the loop; if it is false, the loop is exited.

And here is an example:

  REPEAT
    std::cout << a << std::endl;
    ++ V(a);
  AS_LONG_AS( V(a) != N(12) )

When debugging, the REPEAT - AS_LONG_AS construct expands to the following:

  #define REPEAT   do {
  #define AS_LONG_AS(x) } while (x);

Implementation of the looping constructs

The logic and design of the looping constructs are very similar to each other. They behave very similarly to IF and each of them uses the same building blocks. There are the wrapper classes (for_wrapper, repeat_wrapper, while_wrapper), each of them with their functors for verifying the condition, and the steps to be executed.

The implementation of the run() method of each wrapper class follows the logic of the keyword it tries to emulate, with the exception that the commands are wrapped in a try-catch block to enable BREAK and CONTINUE to function properly. Let’s see, for example, the run() of the for wrapper:

  void run()
  {
    for( initializer->run(); condition->run();
      increment->run())
    {
      try
      {
        next_step c = body->run();
      }
      catch(next_step& c)
      {
        if(c == next_step::ns_break) break;
        if(c == next_step::ns_continue) continue;
      }
    }
  }

Altering the control flow of the application

Sometimes there is a need to alter the execution flow of a loop. C++ supports this operation by providing the continue and break statements. The framework offers the CONTINUE and BREAK macros to achieve this goal.

The CONTINUE statement

The CONTINUE statement will skip all statements that follow it in the body of the loop and proceed with the next iteration, thus altering the flow of the application.

Here is an example for the CONTINUE used in a FOR loop:

  FOR(a = 0, a < 5, a++)
    std::cout << "counter before=" << a 
    <<   std::endl;
    IF(a == 2)
      CONTINUE
    ENDIF
    std::cout << "counter after=" << a 
    << std::endl;
  ENDFOR

and the equivalent WHILE loop:

  a = 0;
  WHILE(a < 5)
    std::cout << "counter before=" << a
    << std::endl;
    IF(a == 2)
      a++;
      CONTINUE
    ENDIF
    std::cout << "counter after=" << a
    << std::endl;
    a++;
  ENDWHILE

Neither of these should print out the counter after=2 text.

The BREAK statement

The BREAK statement terminates the loop statement it resides in and transfers execution to the statement immediately following the loop.

Here is an example for the BREAK statement used in a FOR loop:

  FOR(a = 0, a < 10, a++)
    std::cout << "counter=" << a << std::endl;
    IF(a == 1)
      BREAK
    ENDIF
  ENDFOR

This loop will print counter=0 and counter=1, then leave the body of the loop and continue the execution after the ENDFOR.

The RETURN statement

As expected, the RETURN statement ends the execution of the current function and returns the specified value to the caller. Here is an example of returning 42 from a function:

  int some_fun()
  {
    OBF_BEGIN
      RETURN(42)
    OBF_END
  }

With the introduction of RETURN, an important issue arose: the obfuscation framework does not support the use of void functions, so the following code will not compile:

  void void_test(int& a)
  {
    OBF_BEGIN
      IF(V(a) == 42)
        V(a) = 43;
      ENDIF
    OBF_END
  }

This is a seemingly annoying feature, but it can easily be fixed by simply changing the return type of the function to any non-void type. The reason is that the RETURN macro and the underlying C++ constructs should handle a wide variety of returnable types in a manner which can be handled easily by the programmer without causing confusion.

Implementation of CONTINUE, BREAK and RETURN

When not compiled in debug mode, these keywords expand to the following:

  #define BREAK __crv = obf::next_step::ns_break; \
     throw __crv;

  #define CONTINUE __crv = \
     obf::next_step::ns_continue; throw __crv;

  #define RETURN(x) __rvlocal.reset\
    (new obf::rvholder<std::remove_reference\
    <decltype(x)> ::type>(x,x)); throw __rvlocal;

BREAK and CONTINUE offer no surprises in their implementation, and they comply with the expectation set up by the looping constructs: they throw a specific value, which is then caught in the loop of the wrapper’s run() method, which handles it accordingly.

However, RETURN is a different kind of beast.

It initializes __rvlocal (the local return value) with the returned value and then throws it. The matching catch is found in the OBF_END macro, which in turn handles it correctly.

As you can see, the x macro parameter appears three times in the expansion: once inside decltype, which does not evaluate its operand, and twice as a constructor argument, where it is evaluated each time. To avoid unwanted behaviour in your application, do not pass expressions with side effects, such as RETURN (x++);, which applies the increment more than once in a single expression and can lead to undefined behaviour.
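The number of evaluations is easy to check with a counting function. The sketch below uses a hypothetical macro, MAKE_PAIR_OF, which only mimics the shape of RETURN (one use inside decltype, two uses as constructor arguments); it is not part of the framework.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical macro with the same shape as RETURN: the argument is named
// inside decltype (an unevaluated operand) and passed twice to a constructor.
#define MAKE_PAIR_OF(x) \
    std::pair<std::remove_reference<decltype(x)>::type, \
              std::remove_reference<decltype(x)>::type>((x), (x))

// Increments the counter every time it is actually called.
inline int counted(int& calls)
{
    ++calls;
    return 42;
}

// Returns how many times the macro argument was evaluated.
inline int evaluations_in_macro()
{
    int calls = 0;
    auto p = MAKE_PAIR_OF(counted(calls));
    assert(p.first == 42 && p.second == 42);
    (void)p; // silences the unused warning when asserts are disabled
    return calls;
}
```

evaluations_in_macro() reports two calls. The decltype operand contributes only its type, so the cost of a side effect in the argument comes from the two constructor arguments.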

The rvholder class has the body shown in Listing 12.

struct base_rvholder
{
  virtual ~base_rvholder() = default;

  template<class T>
  operator T () const
  {
    return *reinterpret_cast<const T*>(get());
  }
  template<class T>
  bool operator == (const T& o) const
  {
    return o == operator T ();
  }
  template<class T>
  bool equals(const T& o) const
  {
    return o ==
      *reinterpret_cast<const T*>(get());
  }
  virtual const void* get() const = 0;
};

template<class T>
class rvholder : public base_rvholder
{
public:
  rvholder(T t, T c) :
    base_rvholder(), v(t), check(c) {}
  ~rvholder() = default;
  virtual const void* get() const override 
  {
    return reinterpret_cast<const void*>(&v);
  }
private:
  T v;
  T check;
};
			
Listing 12

As you can see, there is a redundant equals method in the base class. During development of the framework, the Visual Studio compiler repeatedly crashed with an internal error on the implementation of the CASE construct, and it always reported the error in the operator == of the base class. To make it work, I added the extra equals member.
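To see how a value can travel through such a holder, here is a stripped-down sketch. The base_holder and holder types are simplified stand-ins for base_rvholder and rvholder (no check member, no comparison operators), and the catch block only assumes that OBF_END does something roughly similar.

```cpp
#include <cassert>
#include <memory>

// Simplified stand-in for base_rvholder: exposes the stored value as an
// untyped pointer and converts it back on request.
struct base_holder
{
    virtual ~base_holder() = default;
    virtual const void* get() const = 0;

    // The caller must ask for exactly the type that was stored.
    template<class T>
    T as() const { return *static_cast<const T*>(get()); }
};

// Simplified stand-in for rvholder<T>.
template<class T>
struct holder : base_holder
{
    explicit holder(T t) : v(t) {}
    const void* get() const override { return &v; }
    T v;
};

// Emulates a function body that "returns" by throwing the holder.
inline int answer()
{
    try
    {
        std::shared_ptr<base_holder> rv;
        rv.reset(new holder<int>(42));
        throw rv;              // roughly what RETURN(42) boils down to
    }
    catch (const std::shared_ptr<base_holder>& r)
    {
        return r->as<int>();   // converted back at the function boundary
    }
}
```

answer() yields 42. The conversion is only safe because the requested type matches the stored one; nothing in the holder checks this, which is one reason the return type of the enclosing function has to be known.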

The CASE statement

When programming in C++, the switch-case statement comes in handy when there is a need to avoid long chains of if statements. The obfuscation framework provides a similar construct, although not exactly a functional and syntactical copy of the original switch-case construct.

Here is the CASE statement:

  CASE (<variable>)
    WHEN(<value>) [OR WHEN(<other_value>)] DO
    ....statements
    ....[BREAK]
    DONE
    [DEFAULT
    ....statements
    DONE]
  ENDCASE

The functionality is very similar to the well-known switch-case construct, the main differences are:

  1. It is possible to use non-numeric, non-constant values (variables and strings) for the WHEN, because the whole CASE statement is wrapped up in a templated, lambda-based construct that is well hidden from the outside world. Be careful with this extra feature when using the debugging mode of the library: there the CASE macro expands to the standard switch statement, which accepts only integral or enumeration values and constant case labels.
  2. It is possible to have multiple conditions for a WHEN label joined together with OR.

The fall through behaviour of the switch construct which is familiar to C++ programmers was kept, so there is a need to put in a BREAK statement if you wish the operation to stop after entering a branch.

Listing 13 is an example for the CASE statement.

std::string something = "D";
std::string something_else = "D";

CASE (something)
  WHEN("A") OR WHEN("B") DO
    std::cout <<"Hurra, something is " 
    << something << std::endl;
    BREAK;
  DONE

  WHEN("C") DO
    std::cout <<"Too bad, something is " 
    << something << std::endl;
    BREAK;
  DONE

  WHEN(something_else) DO
    std::cout <<"Interesting, something is " 
    << something_else << std::endl;
    BREAK;
  DONE

  DEFAULT
    std::cout << "something is neither A, B or C,"
    " but:" << something <<std::endl;
  DONE
ENDCASE
			
Listing 13
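For comparison, the branch selection of Listing 13 can be written in ordinary C++ as a chain of comparisons. This is only a sketch of the intended semantics; pick_branch is a hypothetical helper that returns the message instead of printing it.

```cpp
#include <cassert>
#include <string>

// Plain C++ rendering of the branch selection in the CASE example.
// Every WHEN branch ends in BREAK, so at most one message is produced,
// and the default branch is reached only when nothing matched.
inline std::string pick_branch(const std::string& something,
                               const std::string& something_else)
{
    if (something == "A" || something == "B")
        return "Hurra, something is " + something;
    if (something == "C")
        return "Too bad, something is " + something;
    if (something == something_else)
        return "Interesting, something is " + something_else;
    return "something is neither A, B or C, but:" + something;
}
```

With both strings set to "D", as in the listing, the third branch is selected.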

In cases when the framework is used in debugging mode, the macros expand to the following statements:

  #define CASE(a) switch (a) {
  #define ENDCASE }
  #define WHEN(c) case c:
  #define DO {
  #define DONE }
  #define OR
  #define DEFAULT default:

Implementation of the CASE construct

Certainly, the most complex of all constructs is the CASE one. Just the number of macros supporting it is huge:

  #define CASE(a) try { \
    std::shared_ptr<obf::base_rvholder> __rvlocal;\
    auto __avholder = a; \
    obf::case_wrapper<std::remove_reference \
    <decltype(a)>::type>(a).

  #define ENDCASE run(); } \
    catch(obf::next_step& cv) {}

  #define WHEN(c)\
    add_entry(obf::branch<std::remove_reference\
    <decltype(__avholder)>::type> \
    ( [&,__avholder]() -> \
    std::remove_reference<decltype(__avholder)>\
    ::type { \
    std::remove_reference<decltype(__avholder)>\
    ::type __c = (c); return __c;} )).
  #define DO add_entry( obf::body([&](){

  #define DONE return \
    obf::next_step::ns_continue;})).

  #define OR join().

  #define DEFAULT add_default(obf::body([&](){

Let’s dive into it.

The case_wrapper name should already be familiar from the various wrappers, but for CASE, the real workhorse is the case_wrapper_base class. The case_wrapper class is necessary in order to make CASE selection on const or non-const objects possible, so the case_wrapper classes just derive from case_wrapper_base and specialize on the constness of the CASE expression. Please note that the CASE macro also evaluates its a parameter more than once, so writing CASE(x++) will apply the side effect repeatedly and lead to unexpected behaviour.

The case_wrapper_base class looks like Listing 14.

template <class CT>
class case_wrapper_base
{
public:
  explicit case_wrapper_base(const CT& v) : check(v), default_step(nullptr) {}
  case_wrapper_base& add_entry(const case_instruction& lambda_holder) {
    steps.push_back(&lambda_holder);
    return *this;
  }
  case_wrapper_base& add_default(const
    case_instruction& lambda_holder) {
      default_step = &lambda_holder;
      return *this;
    }
  case_wrapper_base& join() {
    return *this;
  }
  void run() const ; // body extracted from here,
      // see later in the article for the
      // description of it
private:
  std::vector<const case_instruction*> steps;
  const CT check;
  const case_instruction* default_step;
};
			
Listing 14

The const CT check; is the expression that is being checked for the various case branches. Please note the add_entry and add_default methods, together with the join() method which allow chaining of expressions and method calls on the same object. The std::vector<const case_instruction*> steps; is a cumulative container for all the branch condition expressions and bodies (code which is executed in a branch). This will introduce more complex code at a later stage; however, it was necessary to have these two joined in the same container in order to allow behaviour as similar to the original way the C++ case works as possible.
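The chaining trick is easier to follow on a toy example. In the sketch below, the recorder class and the MY_WHEN, MY_OR and MY_END macros are hypothetical names; they only illustrate how macros that end in a dot stitch several member calls into a single expression.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical recorder: each call logs its name and returns *this,
// so the next member call can be appended to the same expression.
class recorder
{
public:
    recorder& when(const std::string& v)
    {
        log.push_back("when:" + v);
        return *this;
    }
    recorder& join()
    {
        log.push_back("or");
        return *this;
    }
    std::vector<std::string> run() const { return log; }
private:
    std::vector<std::string> log;
};

// Illustrative macros ending in a dot, in the spirit of WHEN and OR.
#define MY_WHEN(v) when(v).
#define MY_OR join().
#define MY_END run()

inline std::vector<std::string> chained()
{
    // Expands to: recorder().when("A").join().when("B").run()
    return recorder(). MY_WHEN("A") MY_OR MY_WHEN("B") MY_END;
}
```

chained() returns the three recorded calls in source order, which shows that the macro sequence really is one long chain of calls on the same object.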

The inner mechanism of the CASE depends on the following classes:

  1. The obf::case_instruction class, which acts as a base class for:
  2. obf::branch and
  3. obf::body classes.

The obf::branch class is the class which gets instantiated by the WHEN macro in a call to the add_entry method of the case_wrapper object created by CASE. Its role is to act as the condition chooser, and it looks like Listing 15.

template<class CT>
class branch final : public case_instruction
{
public:
  template<class T>
  branch(T lambda) 
  {
    condition.reset(new any_functor<T>(lambda));
  }
  bool equals(const base_rvholder& rv, CT lv) const
  {
    return rv.equals(lv);
  }
  virtual next_step execute(const base_rvholder& against) const override
  {
    CT retv;
    condition->run(const_cast<void*>
      (reinterpret_cast<const void*>(&retv)));
    return equals(against,retv) ?
      next_step::ns_done : next_step::ns_continue;
  }
private:
  std::unique_ptr<any_functor_base> condition;
};
			
Listing 15

The WHEN macro has a rather confusing lambda declaration, which captures the local __avholder by value. This is again due to the fact that various compilers decided not to compile the same source code in the same way... well, some of them staged a coup and bluntly declined to compile what the others had already digested, which is why this ugly solution came into existence.

The code that is executed upon entering a branch (including the default branch) is created by the DO and the DEFAULT macros. They both create an instance of the obf::body class: DO adds it to the steps of the case wrapper class, and DEFAULT calls the add_default member in order to specify a default branch. The obf::body class is much simpler, just a few lines (see Listing 16).

class body final : public case_instruction
{
public:
  template<class T>
  body(T lambda) 
  {
    instructions.reset
      (new next_step_functor<T>(lambda));
  }
  virtual next_step execute
    (const base_rvholder&) const override
  {
    return instructions->run();
  }
private:
  std::unique_ptr<next_step_functor_base>
    instructions;
};
			
Listing 16

The most interesting (and longest) part of the case implementation is the run() method, presented here (in a somewhat stripped manner – I have removed all the security checks in order to have presentable code considering its length) – see Listing 17.

void run() const
{
  auto it = steps.begin();
  while(it != steps.end()) {
    next_step enter 
      = (*it)->execute(rvholder<CT>(check,check));
    if(enter == next_step::ns_continue) {
      ++it;
    }
    else {
      while(it != steps.end()
        && ! dynamic_cast<const body*>(*it) )
      {
        ++it;
      }
      // found the first body.
      while(it != steps.end()) {
        if(dynamic_cast<const body*>(*it))
        {
          (*it)->execute(rvholder<CT>
            (check,check));
        }
        ++it;
      }
    }
  }
  if(default_step) {
    default_step->execute(rvholder<CT>
      (check,check));
  }
}
			
Listing 17

As a first step, the code looks for the first branch which satisfies the condition (if (*it)->execute(rvholder<CT>(check,check)); returns next_step::ns_done, it has found a branch satisfying the check). In this case, it skips all the other conditions for this branch and starts executing the code of all the obf::body objects that follow. If a BREAK statement is issued while executing the bodies, the code will throw, and the catch in ENDCASE (catch(obf::next_step& cv)) will swallow the exception and return the execution to the normal flow.

As a last resort, if there is a default_step and we are still in the body of run() (no one issued a BREAK command), it is executed as well.
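The same fall-through logic can be sketched without the class hierarchy. The version below is an illustration under simplifying assumptions: conditions and bodies are std::function objects in one container, a body signals BREAK by returning true instead of throwing, and the step and dispatch names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical step: either a condition or a body.
struct step
{
    bool is_body;
    // Condition: returns true on a match. Body: returns true to BREAK.
    std::function<bool()> action;
};

// Returns the names of the bodies that ran for the given value, in order.
inline std::vector<std::string> dispatch(int value)
{
    std::vector<std::string> ran;
    auto when = [&](int v) {
        return step{false, [value, v]() { return value == v; }};
    };
    auto body = [&](const char* name, bool brk) {
        return step{true, [&ran, name, brk]() { ran.push_back(name); return brk; }};
    };
    const std::vector<step> steps = {
        when(1), body("one", false),   // no BREAK: falls through
        when(2), body("two", true),    // BREAK after this body
        when(3), body("three", true)
    };

    auto it = steps.begin();
    // 1. Skip forward to the first condition that matches.
    while (it != steps.end() && (it->is_body || !it->action())) ++it;
    // 2. From there on, run every body until one of them breaks.
    bool broke = false;
    for (; it != steps.end() && !broke; ++it)
        if (it->is_body) broke = it->action();
    // 3. The default branch runs only if nothing broke out.
    if (!broke) ran.push_back("default");
    return ran;
}
```

For the value 1, the bodies "one" and "two" both run, because the first branch has no BREAK. For a value with no matching branch, only the default runs.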

And with this we have presented the entire framework, together with implementation details, and now we are ready to catch up with our initial goal.

The naive licensing algorithm revisited

Now that we are aware of a library that offers code obfuscation without too many headaches from our side (at least, this was the intention of the author) let’s re-consider the implementation of the naive licensing algorithm using these new terms (see Listing 18).

bool check_license1(const char* user,
                    const char* users_license)
{
  OBF_BEGIN
  std::string license;
  size_t ll = strlen(users_license);
  size_t l = strlen(user), lic_ctr = N(0);
  size_t add = N(0), i =N(0);

  FOR (V(i) = N(0), V(i) < V(ll), V(i)++)
    IF ( V(users_license[i]) != N(45) )
      license += users_license[i];
    ENDIF
  ENDFOR

  WHILE (V(lic_ctr) < license.length() )
    size_t i = lic_ctr;
    V(i) %= l;
    int current = 0;
    WHILE(V(i) < V(l) )
      V(current) += user[V(i)++];
    ENDWHILE
    V(current) += V(add);
    ++V(add);
    IF ( (license [lic_ctr]
        != letters[current % sizeof letters]) )
      RETURN(false);
    ENDIF
    lic_ctr++;
  ENDWHILE

  RETURN (true);
  OBF_END
}
			
Listing 18

Indeed, it looks a little bit more ‘obfuscated’ than the original source, but after compilation it adds a great layer of extra code around the standard logic, and the generated binary is much more cumbersome to understand than the one ‘before’ the obfuscation. And due to the sheer size of the generated assembly code, we simply omit publishing it here.

Disadvantages of the framework

Those who dislike the usage of CAPITAL letters in code may find the framework to be annoying. As presented in [Wakely14] this almost feels like the code is shouting at you. However, for this particular use case, I intentionally made it like this because of the need to have familiar words that a developer can instantly connect with (because the lower case words are already keywords), and also to subscribe to the C++ rule that macros should be upper case.

This brings us back to the swampy area of C++ and macros. There are several voices whispering loudly that macros have nothing to do in C++ code, and there are several voices echoing back that macros, if used wisely, can help C++ code as well as good old style C. I personally have nothing against the wise use of macros, indeed they became very helpful while developing this framework.

Last but not least, the numeric value wrappers do not work with floating point numbers. This is due to the fact that extensive binary operations are used on the number to obfuscate its value and this would be impossible to accomplish with floating point values.
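A toy example shows why integers are the natural fit. The masked_int class below is purely hypothetical and far simpler than what the framework does: it keeps the value XOR-ed with a fixed key. The ^ operator used here is defined for integral operands only, so the same trick cannot be applied directly to a float or a double.

```cpp
#include <cassert>
#include <cstdint>

// Arbitrary key chosen for this illustration.
constexpr std::uint32_t mask_key = 0xA5A5A5A5u;

// Hypothetical wrapper: memory holds value ^ key instead of the value.
class masked_int
{
public:
    explicit masked_int(std::uint32_t v) : stored(v ^ mask_key) {}

    // Applying the same XOR again recovers the original value.
    std::uint32_t get() const { return stored ^ mask_key; }

    // What would actually be visible in memory.
    std::uint32_t raw() const { return stored; }
private:
    std::uint32_t stored;
};
```

A masked_int built from 42 returns 42 from get(), while raw() shows a different bit pattern.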

Some requirements

The code is written with ‘older’ compilers in mind, so not all the latest and greatest features of C++14 and C++17 are used. Clang version 3.4.1 happily compiles the source code, and so does g++ 4.8.2. Visual Studio 2015 also compiles the code.

Unit testing is done using the Boost Unit test framework. The build system for the unit tests is CMake and there is support for code coverage (the last two were tested only under Linux).

License and getting the framework

The library is a header-only library, released as open source under the MIT license. You can get it from https://github.com/fritzone/obfy

Conclusion

History has shown us that if a piece of software is crackable, it will be cracked. How soon that happens depends only on the dedication, time and effort the software cracker is willing to invest. There is no Swiss army knife when it comes to protecting your software against malicious interference, because from the moment it left your build server and was downloaded, the software was out of your hands and entered an uncontrollable environment. The only sensible thing you can do to protect your intellectual property is to make it as hard to crack as possible. This little framework provides a few ways of achieving this goal. By making it open source, freely available to and modifiable by the developer community, we can only hope to give it an advantage by allowing everyone to tailor it to suit their needs best.

Appendix: the license generating algorithm

As promised, Listing 19 is the naive license generating algorithm. Any further improvements to it are more than welcome.

static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
std::string generate_license(const char* user)
{
  // an empty name would make the modulo below divide by zero
  if(!user || !*user) return "";
  // the license will contain only the characters found in letters
  // 16 chars + 0
  char result[17] = { 0 };
  size_t l = strlen(user), lic_ctr = 0;
  int add = 0;
  while (lic_ctr < 16)
  {
    size_t i = lic_ctr;
    i %= l;
    int current = 0;
    while (i < l)
    {
      current += user[i];
      i++;
    }
    current += add;
    add++;
    result[lic_ctr] = 
      letters[current % sizeof letters];
    lic_ctr++;
  }
  return std::string(result);
}
			
Listing 19
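A short round trip shows how the generator and a checker fit together. The sketch below repeats the generator in condensed form so that it is self-contained. The check_license_plain and with_dashes helpers are hypothetical: the checker is an unobfuscated simplification that strips the dashes and compares against a freshly generated license, which is stricter than the character-by-character loop of Listing 18.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Listing 19 in condensed form, with a guard against empty names.
inline std::string generate_license(const char* user)
{
    if (!user || !*user) return "";
    char result[17] = { 0 };
    const std::size_t l = std::strlen(user);
    int add = 0;
    for (std::size_t lic_ctr = 0; lic_ctr < 16; ++lic_ctr)
    {
        int current = 0;
        for (std::size_t i = lic_ctr % l; i < l; ++i) current += user[i];
        current += add++;
        result[lic_ctr] = letters[current % sizeof letters];
    }
    return std::string(result);
}

// Hypothetical helper: groups a license into dash-separated blocks of four.
inline std::string with_dashes(const std::string& license)
{
    std::string out;
    for (std::size_t i = 0; i < license.size(); ++i)
    {
        if (i != 0 && i % 4 == 0) out += '-';
        out += license[i];
    }
    return out;
}

// Hypothetical plain checker: drops the dashes, then compares the rest
// with a license generated for the same user.
inline bool check_license_plain(const char* user,
                                const std::string& users_license)
{
    std::string stripped;
    for (char c : users_license)
        if (c != '-') stripped += c;
    return !stripped.empty() && stripped == generate_license(user);
}
```

A license produced by generate_license for a given name passes check_license_plain for that name, with or without the dashes, while changing any of its characters makes the check fail.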

References

[Andrivet] Random Generator by Sebastien Andrivet, https://github.com/andrivet/ADVobfuscator

[Stackoverflow] http://stackoverflow.com/questions/12387239/reference-member-variables-as-class-members

[Wakely14] ‘Stop the Constant Shouting’ Overload 121 June 2014, Jonathan Wakely
