    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Open Source – And Still Deceiving Programmers</title>
        <link>https://members.accu.org/index.php/journals/2423</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>


        <h2>Journal Articles</h2>


<div class="xar-mod-head"><span class="xar-mod-title">Overload Journal #141 - October 2017 + Internet Topics</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Open Source – And Still Deceiving Programmers</h1>
<p><strong>Author:</strong>&nbsp;Bob Schmidt</p>
<p>
<strong>Date:</strong> Tue, 03 October 2017 19:00:09 +01:00</p>
<p><strong>Summary:</strong>&nbsp;Malware can hijack the normal flow of your program. Deák Ferenc walks through the ELF format to avoid malicious code injection.</p>
<p><strong>Body:</strong>&nbsp;<p class="Byline">Malware can hijack the normal flow of your program. Deák Ferenc walks through the ELF format to avoid malicious code injection.</p>

<p>Computer viruses, trojan horses, rootkits and other pieces of malicious software have been around for a very long time. Since the first application that could be classified as a ‘classic’ computer virus (based on the theory presented in <a href="#[Neumann66]">[Neumann66]</a>) appeared in 1971, countless variations of the same construct have appeared, with more or less destructive intentions, varying from harmless jokes to highly specialized pieces of malicious software targeting industrial processes and machines. According to <a href="#[BBC]">[BBC]</a>, their number had passed 1,000,000 by 2008.</p>

<p>The threat presented by these noxious pieces of code is so significant that a new word has emerged in order to classify them: ‘malware’. This is short for malicious software. Malware is a collective category encompassing several types of harmful applications, from the classic virus (i.e. an application which replicates itself by modifying already existing files or data structures on a computer) through worms (applications which move through the network infecting computers) to trojan horses (applications posing as something other than they are, very often disguised as legitimate applications). Malware also covers spyware and keyloggers (these spy on your activities, frequently registering your keystrokes, which are then sent to malicious parties), rootkits (very low-level applications, more often found at the hardware/OS level, hidden from user-level access) and various ransomware applications, which hold your computer hostage by encrypting your data until you pay a ‘ransom’.</p>

<p>Recently, the trend in the propagation of these damaging applications has changed. Due to the improved security features of the operating systems that were traditionally affected by viruses, the number of classical ‘viruses’, which multiply by modifying existing system files and spread when executed, is in decline; however, there is a sharp increase in sightings of other types of malware, which propagate via email attachments, malicious downloads or simply by exploiting vulnerabilities in operating systems <a href="#[Wikipedia_2]">[Wikipedia_2]</a>. </p>

<p>Most of these vicious applications have, however, one thing in common: we rarely see the original source code which led to their creation (except for some high-profile leaks such as <a href="#[TheIntercept]">[TheIntercept]</a>). Indeed, it would be kind of silly to ask fellow programmers to “please compile and run this code, it is a virus, it will infect your computer and it will multiply itself in countless pieces before rendering your machine unusable” … this would be similar to the old joke of receiving an email with the content “Hi, this is a manual virus. Since I am not so technically proficient as to be able to write a real virus, please forward this email to everyone in your contact list and delete all your files. With kind regards, Amateur Virus Writer”.</p>

<p>This certainly does not mean that everything you download from the internet and compile yourself is guaranteed to be clean and harmless. The open source community focused around various free products goes to great lengths in order to provide high quality applications without backdoors (i.e. unofficial ways of gaining access to certain systems) and, considering the backdoor attempt of 2003 targeting Linux <a href="#[LinuxBackdoor]">[LinuxBackdoor]</a>, the effort invested in this direction is more than welcome.</p>

<p>Considering this introduction, you might envision that this article will be a tutorial on how to write viruses, Trojan horses and other maleficent pieces of code in order to achieve world domination, or just simply for fun. You couldn’t be further from the truth. On the contrary, this article will present practical ways of defending your open source application against hideous interventions by programmers with hidden intentions, who would like to hijack the normal flow of your code by various means we will discuss later.</p>

<p>We will see different ways of executing code, we will dig deep into the binary sections of executable files, and we will have the chance to examine the source code of a true shape-shifting application. All of these can occur in real life situations when you are merging code from your contributors, and if you don’t pay attention, some not so well intended modifications will end up in the final product.</p>

<h2>Running an application</h2>

<p>The simple and mundane task of starting an executable application compiled for your platform in fact involves a long list of processes from the operating system’s side. Since the details of this topic in itself are worth a small book, I will just provide a very high-level overview of what happens when you start an application.</p>

<h3>Starting a process in Linux </h3>

<p>Applications in Linux use the so-called ‘ELF’ format (Executable and Linkable Format <a href="#[Wikipedia_1]">[Wikipedia_1]</a>, <a href="#[Oâ€™Neil16]">[O’Neil16]</a>). This is a format adopted by various Unix-like operating systems, but more recently the Linux subsystem of Windows 10 also shows support for this type of executable.</p>

<p>A short overview of the steps taken when a new application is launched (either from a shell or from somewhere else) is as follows:</p>

<ol>
	<li>The <code>fork</code> system call is used to create a new process. The fork will create a ‘copy’ of the current process and set up a set of flags reflecting the state of the new process <a href="#[fork]">[fork]</a>.</li>
	
	<li>The <code>execve</code> system call is then invoked in the new process with the path of the application to run <a href="#[execve]">[execve]</a>.</li>
	
	<li>Down in the Linux kernel, a <code>linux_binprm</code> <a href="#[linux_binprm]">[linux_binprm]</a> structure is built in order to accommodate the new process, and it is passed to <code>static int load_elf_binary(struct linux_binprm  *bprm)</code> in <code>fs/binfmt_elf.c</code> <a href="#[load_elf_binary]">[load_elf_binary]</a>.</li>
	
	<li><code>load_elf_binary</code> does the actual loading of the ELF executable according to the specifications and at the end it calls <code>start_thread</code>, which is the platform dependent way of starting the loaded executable.</li>
</ol>

<p>For those expressing a deeper interest in this field, the excellent article <a href="#[HowKernelRuns]">[HowKernelRuns]</a> or Robert Love’s outstanding book <em>Linux Kernel Development</em> <a href="#[Love10]">[Love10]</a> will provide all the details required. </p>

<h2>The ELF format</h2>

<p>The ELF binary in itself is a complex subject – a full description can be found in <em>Learning Linux Binary Analysis</em> <a href="#[Oâ€™Neil16]">[O’Neil16]</a>, and a shorter one on Wikipedia <a href="#[Wikipedia_1]">[Wikipedia_1]</a> – so let’s summarize it briefly:</p>

<h3>The ELF headers</h3>

<p>The file header of the ELF file starts with a few magic numbers that identify it as a valid ELF file: 0x7F followed by the characters <code>E</code>, <code>L</code> and <code>F</code> (0x45, 0x4c, 0x46). The header then specifies the architecture of the file (32 or 64 bit) and whether the encoding of the file is big endian or little endian.</p>

<p>In the header, a special field denotes the target operating system’s Application Binary Interface and the instruction set of the binary. ELF files can be of different types, such as relocatable, executable, shared or core. This information is also stored in the ELF header.</p>

<p>There are several fields in the header dealing with the length and format of the ELF sections, described below.</p>

<p>The file header of the ELF file is followed by a program header, which describes to the system how to create a process image, and several section headers.</p>

<h3>ELF sections</h3>

<p>There are several sections in an ELF file, each containing various data vital to correctly understanding and running the application. Among these sections are:</p>

<ul>
	<li>the <code>.text</code> section, which contains the actual code of the application</li>
	<li>the <code>.rodata</code> section, containing the constant strings from the application</li>
	<li>the <code>.data</code> section, which contains for example the initialized global variables</li>
	<li>the <code>.bss</code> section, which contains uninitialized global data</li>
	<li>various other sections describing how this application handles shared libraries and also…</li>
	<li>sections describing application startup and destruction steps in the <code>.ctors</code> and <code>.dtors</code> sections (correspondingly <code>.init_array</code> and <code>.fini_array</code>). These sections contain function pointers to the methods which will be called on application startup and shutdown, and we will have a more detailed look at them in later paragraphs of this article.</li>
</ul>

<p>There is a handy Linux utility called <code>readelf</code>, which displays information about a specific ELF file’s structure. We will refer to it in this article and will present its output frequently.</p>

<h2>Deceiving techniques</h2>

<p>So, after this short but necessary background introduction, we have finally arrived at the focal point of the article. We will present here various techniques you have to watch for carefully in order to keep your source healthy and free of unwanted side effects.</p>

<h3>Mainless application</h3>

<p>The <code>main</code> function in a ‘normal’ application is the place where the application starts <a href="#[main_function]">[main_function]</a>. However, be aware that C and C++ compilers handle <code>main</code> very differently. For example, Listing 1 will compile flawlessly using <code>gcc</code> with default compilation flags even though <code>void main(int a)</code> is strictly not standard compliant, but <code>g++</code>, being more picky, will refuse to compile it. For <code>gcc</code>, you need to use <code>-pedantic</code> to warn you about the return type of <code>main</code> not being <code>int</code> as required by the standard.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
#include &lt;stdlib.h&gt;
#include &lt;stdio.h&gt;
void main(int a)
{
  void (*fp[])(int) = {main,exit};
  printf(&quot;%d\n&quot;,a++);
  fp[a/101](a);
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 1</td>
	</tr>
</table>

<p>But what if an attacker does not wish to provide a main function (since he wants to pass the source code off as part of a library)?</p>

<p>With <code>gcc</code>, there is always the possibility of using the <code>-e &lt;entrypoint&gt;</code> switch to specify a different entry point for your application. This is easy to spot in the build files, and it brings further complications that have to be dealt with, such as:</p>

<ol>
	<li>The need to specify <code>-nostartfiles</code> on the <code>gcc</code> command line in order to avoid the linking error: <code>(.text+0x20): undefined reference to 'main'</code>. As a short explanation, in the background <code>gcc</code> always links to some architecture specific files (such as <span class="filename">crtstuff.c</span>), which provide the application with the required startup functionality and will end up calling <code>main</code> <a href="#[linuxProgramStartup]">[linuxProgramStartup]</a>.</li>
	
	<li>Explicitly use the function <code>exit(&lt;CODE&gt;);</code> to properly exit the application in order to avoid the segmentation fault at exit.</li>
	
	<li>There is no access to the common argc and argv values passed in to your application.</li>
</ol>

<p>With these in mind, the source file in Listing 2, compiled with the command below should work as expected, by totally avoiding the <code>main</code> function thus deceiving you into believing the validity of the application.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

int my_main()
{
  printf(&quot;Mainless\n&quot;);
  exit(2);
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 2</td>
	</tr>
</table>

<p>Compiled with:</p>

<pre class="programlisting">
  gcc mainless.c -o mainless -e my_main \
    -nostartfiles</pre>

<p>It is interesting to note that comparing the binary file produced from compiling a ‘mainless’ program with a more standard binary – with ‘main’ and the linked-in <code>gcc</code> startup files – gives us the (not so) surprising result that the ‘mainless’ file is smaller, with a difference of up to 2000 bytes. Also, the ELF structure, analyzed with <code>readelf</code>, gives a much simpler layout. A few differences in the header of the ELF file can also be observed:</p>

<table class="journaltable">
	<tr>
		<th>Header</th>
		<th>mainless</th>
		<th>with main</th>
	</tr>
	<tr>
		<td>Entry point address</td>
		<td>0x400390</td>
		<td>0x400430</td>
	</tr>
	<tr>
		<td>Start of section headers</td>
		<td>5184</td>
		<td>6624</td>
	</tr>
	<tr>
		<td>Number of section headers</td>
		<td>20</td>
		<td>31</td>
	</tr>
</table>

<p>This all proves that the ‘mainless’ file is, indeed, much smaller than the corresponding one with main.</p>

<h3>Running code before main</h3>

<p>When your targeted compiler is a C compiler, it can be really difficult to run code before <code>main</code> (or, as we have seen, its replacement) starts. C++, by contrast, has a dynamic initialization phase in which arbitrary code is executed in order to initialize non-local variables. However, that comes with the well-known static initialization order fiasco: in C++, the order in which non-local variables defined in different translation units are initialized is unspecified, so if one depends on another, the application in question may work flawlessly in some builds, while others may end up reading a variable that has not been initialized yet.</p>

<p>Fortunately, for C compilers, the ELF format provides extra support for running code before application start in the so-called ‘constructor’ section, and it is also possible to hijack the <code>.init</code> section in order to execute the code we want using special assembly syntax.</p>

<h3>The .init_array section of the ELF binary</h3>

<p>The <code>.init_array</code> (and the <code>.preinit_array</code>) section of the ELF binary contains a list of pointers (addresses of functions) called by the code initializing the application. This code (the one calling the functions in the <code>.init_array</code> section) usually resides in the <code>.init</code> section. The difference between <code>.preinit_array</code> and <code>.init_array</code> is that code in the <code>.preinit_array</code> is called before the <code>.init_array</code>.</p>

<p>The <code>gcc</code> compiler has a non-standard C extension that supports placing user specified code and data into various dedicated ELF sections via the <code>__attribute__</code> syntax. An example is in Listing 3.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
#include &lt;stdio.h&gt;

void my_main(int argc, char* argv[], 
  char* envp[])
{
  printf(&quot;my main: %d parameters\n&quot;, argc);
}
int main(int argc, char* argv[])
{
  printf(&quot;main: %d parameters\n&quot;, argc);
}
__attribute__((section(&quot;.init_array&quot;))) void 
  (* p_my_main)(int,char*[],char*[]) = &amp;my_main;
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 3</td>
	</tr>
</table>

<p>Analyzing a debug session of this application will reveal interesting insights on the working of libc and the application startup procedure (see Figure 1).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
(linux)$ gdb ./init_array
(gdb) break my_main
Breakpoint 2 at 0x40052a
(gdb) run
Starting program: .../init_array

Breakpoint 2, 0x000000000040052a in my_main(int, char**, char**) ()
(gdb) bt
#0  0x000000000040052a in my_main(int, char**, char**) ()
#1  0x00000000004005cd in __libc_csu_init ()
#2  0x00007ffff7a2d7bf in __libc_start_main (main=0x400550 &lt;main&gt;, ... ) at ../csu/libc-start.c:247
#3  0x0000000000400459 in _start ()
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Figure 1</td>
	</tr>
</table>

<p>For further detail, combine this backtrace with the relevant section of the disassembly (the output of <code>objdump -h -S init_array</code>). Together they show how the code is actually executed, and make it easy to trace how the <code>.init_array</code> section is handled on application startup.</p>

<p>If we don’t require access to the command line parameters, <code>gcc</code> also supports, as an extension, specifying a function to be called before main using a much less cryptic syntax: <code>__attribute__ ((constructor))</code>, which has a similar effect to the section initializer syntax:</p>

<pre class="programlisting">
  void __attribute__ ((constructor)) premain()
  {
    printf(&quot;premain called\n&quot;);
  }</pre>
  
<p>An even more obscure way of defining a method to be called before main is to rely on the ‘.init’ section of the ELF format and do some assembly level magic (see Listing 4).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

int my_main()
{
  __asm__ (&quot;.section .init \n call my_main \n .section .text\n&quot;);
  printf(&quot;my_main\n&quot;);
}

int main()
{
  printf(&quot;main\n&quot;);
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 4</td>
	</tr>
</table>

<p>This will directly instruct the compiler to assemble the code <code>call my_main</code> into the section ‘.init’ by using the assembly directive <code>.section .init</code>.</p>

<p>And, last but not least, some proprietary compilers targeting commercial environments (but not <code>gcc</code>) support a special compiler specific <code>#pragma</code> directive: <code>#pragma init</code>, which has the same effect as <code>__attribute__((constructor))</code> has for <code>gcc</code>.</p>

<h3>Process tracing</h3>

<p>Just a side note: the initialization phase of the application is a preferred place among wanna-be virus writers to place ptrace related code. ptrace is a mechanism offered by Linux making it possible for a parent process to observe and influence the execution of other processes. It is mainly used in debugging and examining the state of other processes; however, certain anti debugging features – if compiled into an executable – will make the analysis of the compiled application difficult.</p>

<p>You should be looking for <code>PTRACE_TRACEME</code>: a <code>ptrace</code> call with this request fails if the current application is already being traced by a debugger, which lets the application detect the debugger and act accordingly.</p>

<h3>Running code after main</h3>

<p>Very similar to the above scenario where we want to run code before main, the ELF binary again comes to our aid by making it possible to run C code after <code>main</code> has finished its lifetime. The ‘destructors’ section of the ELF is named <code>.fini_array</code> and we can get access to it via the following code construct:</p>

<pre class="programlisting">
  void end_app(void)
  {
    printf(&quot;after main\n&quot;);
  }
  __attribute__((section(&quot;.fini_array&quot;))) void
    (* p_end_app)(void) = &amp;end_app;</pre>

<p>or there is also support for the <code>__attribute__ ((destructor))</code> syntax, if we find the above one too cryptic. This section comes in very handy when we need to run some ‘last minute’ cleanup jobs.</p>

<p>Similarly to the <code>.init</code> section, we can instruct the assembler to generate code into a <code>.fini</code> section, and also some proprietary compilers support the <code>#pragma fini</code> directive, to mark some identifiers as a finalization function.</p>

<h3>A shape shifting application</h3>

<p>It is important that you carefully observe not only your C and C++ files, but also the accompanying build instructions. Today, there are several tools which facilitate the management of build scripts (such as CMake, SCons, etc.) and these tools often use very complex files, which makes it easy to hide a few unwanted pieces of code, so be sure to check those too – most of the time that is where a deception begins.</p>

<p>So, let’s consider the following situation, where you are working on a free and open source budget management application, and one of your contributors submits the code in Listing 5.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
// TODO: This is still work in progress, more
// months in the test plan are required and some
// code cleanup is necessary to remove the clutter.
// Will FIX ASAP!!!
#define INTEREST void*

#ifndef DEBUG_INTEREST
  #define L (int*)
  #define TEST void
  #define OPEN_INTEREST(a) \
    printf(&quot;Opening: %s\n&quot;, #a);
  #define READ_INTEREST(a) \
    printf(&quot;Reading: %s\n&quot;, #a);
  #define CLOSE_INTEREST(a) \
    printf(&quot;Closing: %s\n&quot;, #a);
  #define INTEREST_VALUE(a,b) b
#endif

TEST interest_calculator_test(char const *period)
{
  static INTEREST array[] = {
    L(26), L(2), 0,  // January
    L(5), L(26), 0,  // February
    L(8), L(2), 0,   // March
    L(11), L(2), 0,  // April
    L(14), L(14),0,  // May
    L(14), L(17), 0, // June
    L(20), L(20),0,  // July
    L(20), L(23), 0, // August
    L(2), L(2)};     // September

  INTEREST earning = array, *entry = array;
  char interest[1024] = {0}, *q, type = 3;
     // type 3 = Recurring
  char* c = interest; const char* name 
    = &quot;Monthly&quot;;

  OPEN_INTEREST(earning);
  if(earning == 0) READ_INTEREST(entry);
  if(entry != 0)  // Do we have an interest at
                  // the current point?
  {
    if(INTEREST_VALUE(entry,type) == 4)
      if((*INTEREST_VALUE(entry,name) != 46
                  // 11822 minutes ~ 8 days
        &amp;&amp; *(int16_t*)
          (INTEREST_VALUE(entry,name)) != 11822)) 
        c = interest; q = (char*)period; 
  }
  if(*q) *c++ = *q++;   // move to next period
  if(*q) {
    *c++ = 47;          // 47 - a check value,
                        // comes from test data
    q = (char*)INTEREST_VALUE(entry,name);  
                        // get its value, save it
    while(*q) *c++ = *q++; // skip period
  }
  if(*q) {
    *(int16_t*)c = 0x000a;
    printf(&quot;Current value: %s&quot;, interest);
    *c = 0;
    interest_calculator_test(interest); 
                    // advance month to next one
  }
  if(*q) CLOSE_INTEREST(entry);   // Done
}
int main()
{
  const char interest_period[] = {47, 0};
  interest_calculator_test(interest_period);
  return 0;
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 5</td>
	</tr>
</table>

<p>Surely, this is a short unit test for some functions (supposedly <code>OPEN_INTEREST</code>, <code>READ_INTEREST</code> and <code>INTEREST_VALUE</code> are the functions we ‘want’ to test), albeit a pretty poorly written one. However, it seems to be harmless for the moment and the comment on top clearly says it needs improvement, so you decide to keep it in your source code base, hoping that the developer who submitted the patch just had a bad day and will come back with clarification later. The code compiles, so it appears to do no harm, and it does not really disturb anything in the normal flow of the application.</p>

<p>Soon a new patch comes in from the developer (maybe a different one, just to cause some more confusion), presented as a fix for something in the build system of the application concerning the unit test. It looks just like the lines in Listing 6.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
all:
  ${CC} -UL -UOPEN_INTEREST -UREAD_INTEREST -UCLOSE_INTEREST \
    -UINTEREST_VALUE -DL\(n\)=\&amp;\&amp;n_label\#\#n -DD=__COUNTER__ \
    -DT\(x\,y\)=x\#\#y -include /usr/include/dirent.h -DT2\(x\,y\)=T\(x\,y\) \
    -DDO=T2\(n_label\,D\)\: \
    -DOPEN_INTEREST\(entry\)=entry\=T2\(open\,dir\)\(period\)\; \
    -Dif\(x\)=T2\(go,to\)\ \*array\[x\?D\:D\]\;\ DO -include /usr/include/sys/types.h -Dwhile=if \
    -DREAD_INTEREST\(entry\)=entry\=T2\(read\,dir\)\(\(DIR\*\)earning\)\; \
    -DCLOSE_INTEREST\(entry\)=T2\(close\,dir\)\(\(DIR\*\)earning\)\; -include /usr/include/unistd.h \
    -DINTEREST_VALUE\(INTEREST_VALUE\,t\)=\(\(struct\ T2\(dir\,ent\)\*\)INTEREST_VALUE\)\-\&gt;d_\#\#t \
    -DTEST=void -DDEBUG_INTEREST ${SOURCE}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 6</td>
	</tr>
</table>

<p>Yes, seemingly it patches something in the compilation of the unit tests, but it’s highly complex and (intentionally) difficult to read, and in any case this does not look like a significant part of the application, since the unit tests are just run on your computer.</p>

<p>The first sign of suspicion should have come from the forced include of <span class="filename">/usr/include/dirent.h</span> directly from the compiler’s command line... This makes it possible for the malicious code writer to include a file into the compilation process from the command line, without it appearing in the source file, thus avoiding suspicion. If we look further, the malicious make entry contains some fragments which disturbingly resemble the functions used to handle directory structures in Linux: <code>opendir</code>, <code>readdir</code> and <code>closedir</code>… (I intentionally left it in this half-baked stage to raise awareness of this kind of issue, and the word ‘label’ was left intentionally in the defines too...)</p>

<p>Other signs of malevolence are the forced redefinitions of the <code>if</code> and <code>while</code> keywords. Unfortunately, there is nothing in the compiler to stop you from doing this, so all this will compile and is considered valid. Although it will only affect this file, there are numerous <code>if</code>s in it, so let’s dig a bit more. Soon you realize there is something fishy going on, so you decide to look at the preprocessed source code of this innocent looking unit test. It is in Listing 7.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
void interest_calculator_test(char const *period)
{
  static void* array[] = { 
    &amp;&amp;n_label26, &amp;&amp;n_label2, 0,
    &amp;&amp;n_label5,  &amp;&amp;n_label26, 0,
    &amp;&amp;n_label8,  &amp;&amp;n_label2, 0,
    &amp;&amp;n_label11, &amp;&amp;n_label2, 0,
    &amp;&amp;n_label14, &amp;&amp;n_label14,0,
    &amp;&amp;n_label14, &amp;&amp;n_label17, 0,
    &amp;&amp;n_label20, &amp;&amp;n_label20,0,
    &amp;&amp;n_label20, &amp;&amp;n_label23, 0,
    &amp;&amp;n_label2,  &amp;&amp;n_label2};

  void* earning = array, *entry = array;
  char interest[1024] = {0}, *q, type = 3;
  char* c = interest; 
  const char* name = &quot;Monthly&quot;;

  earning=opendir(period);

  goto *array[earning == 0?0:1];
n_label2: 

  entry=readdir((DIR*)earning);;

  goto *array[entry != 0?3:4];
n_label5:

  goto *array[((struct dirent*)entry)-&gt;d_type
    == 4?6:7];
n_label8:

  goto *array[(*((struct dirent*)entry)-&gt;d_name
  != 46 &amp;&amp; *(int16_t*)( ((struct dirent*)entry)-&gt;
  d_name) != 11822)?9:10]; 
n_label11:

  c = interest; q = (char*)period;

  goto *array[*q?12:13]; 
n_label14: 

  *c++ = *q++;

  goto *array[*q?15:16]; 
n_label17: 

  *c++ = 47;
  q = (char*)((struct dirent*)entry)-&gt;d_name;

  goto *array[*q?18:19]; 
n_label20: *c++ = *q++;

  goto *array[*q?21:22]; 
n_label23: 

  *(int16_t*)c = 0x000a;
  printf(&quot;Current value: %s&quot;, interest);
  *c = 0;
  interest_calculator_test(interest);
  
  goto *array[*q?24:25]; 
n_label26: 

    closedir((DIR*)earning);;
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 7</td>
	</tr>
</table>

<p>To your horror, the source code has turned into something incomprehensible, full of <code>goto</code>s and Linux directory-access calls, and it now does something totally different: it walks the directory structure of your computer and prints the directories it finds to the console. (I took the liberty of beautifying some of the preprocessed output and stripped out <code>main</code>, which is unchanged.)</p>
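<p>For comparison, the behaviour described above could be written in plain, readable C roughly as follows. This is only a sketch of what Listing 7 appears to do, not the author's code: the function name <code>list_directories</code> and the <code>max_depth</code> parameter are hypothetical additions (the listing recurses without any limit), and the path buffer is bounds-checked here, which the listing does not do. It assumes Linux with glibc, where <code>d_type</code> and <code>DT_DIR</code> are available.</p>

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes DT_DIR */
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: prints every non-hidden subdirectory below `path`,
   descending at most `max_depth` further levels. Returns the number of
   directories printed (0 if `path` cannot be opened). */
static long list_directories(const char *path, int max_depth)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return 0;                       /* missing or unreadable directory */

    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_type != DT_DIR)    /* DT_DIR is the literal 4 in Listing 7 */
            continue;
        if (entry->d_name[0] == '.')    /* skips ".", ".." and hidden names */
            continue;

        char child[4096];
        int len = snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        if (len < 0 || (size_t)len >= sizeof child)
            continue;                   /* path too long: skip, do not overflow */

        printf("Current value: %s\n", child);
        ++count;
        if (max_depth > 0)
            count += list_directories(child, max_depth - 1);
    }
    closedir(dir);
    return count;
}
```

<p>Calling <code>list_directories(".", 0)</code> prints only the immediate subdirectories of the current directory. Seen in this form, the code is obviously out of place in a unit test for an interest calculator, which is exactly what the macros were hiding.</p>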

<p>The evil programmer has used some of the lesser-known <code>gcc</code> extensions, such as storing the addresses of labels. With a carefully constructed array of label addresses and indexes, he has abused the <code>__COUNTER__</code> macro (which yields an increasing sequence of numbers) both to calculate the jump locations and to generate matching labels. The result achieves his real intention: traversing your filesystem and performing operations on it (in this specific scenario, just printing names).</p>
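<p>The two building blocks can be seen in isolation in the following minimal sketch. It is an illustration only: the function names <code>sign_of</code> and <code>counter_step</code> are made up for this example, and the code compiles only with compilers that support the labels-as-values extension and <code>__COUNTER__</code> (GCC and Clang do).</p>

```c
/* Labels as values: &&label yields the address of a label as a void*,
   and `goto *ptr` jumps to the address stored in ptr. */
static int sign_of(int x)
{
    static void *targets[] = { &&negative, &&zero, &&positive };
    int index = (x > 0) - (x < 0) + 1;   /* 0, 1 or 2 */

    goto *targets[index];
negative:
    return -1;
zero:
    return 0;
positive:
    return 1;
}

/* __COUNTER__ expands to an integer that grows by one on every expansion
   within the translation unit, so two consecutive uses differ by one. */
static int counter_step(void)
{
    int first = __COUNTER__;
    int second = __COUNTER__;
    return second - first;
}
```

<p>Here <code>sign_of</code> contains no <code>if</code> at all: the branch is taken by indexing into the table, which is the same trick the redefined <code>if</code> in Listing 7 relies on.</p>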

<p>A few very strange lines of code appear, such as <code>*(int16_t*)( ((struct dirent*)entry)-&gt;d_name) != 11822</code>. After some thought, however, this is nothing but a comparison of the first two bytes of the <code>d_name</code> field of the <code>struct dirent</code> structure <code>entry</code> with &quot;..&quot;, because 11822 = 0x2E2E and 0x2E is the ASCII code of &quot;.&quot;.</p>
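<p>The arithmetic behind the magic number can be checked with a few lines of C. The helper name <code>first_two_bytes_are_dots</code> is hypothetical, and the sketch uses <code>memcpy</code> instead of the pointer cast from the listing to stay clear of alignment and aliasing problems. Because both bytes are identical, the result does not depend on byte order.</p>

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the first two bytes of `name` are both '.' (0x2E).
   Precondition: at least two bytes are readable at `name`, which holds
   for any non-empty C string because of the terminating NUL. */
static int first_two_bytes_are_dots(const char *name)
{
    uint16_t word;
    memcpy(&word, name, sizeof word);   /* same bytes the int16_t cast reads */
    return word == 0x2E2E;              /* 0x2E2E == 46 * 256 + 46 == 11822 */
}
```

<p>Note that this matches any name that merely starts with two dots, not only the parent directory entry itself.</p>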

<p>Each redefined <code>if</code> consumes three consecutive values of <code>__COUNTER__</code>: two as array indexes and one as a label number. To keep that sequence continuous across both the array indexes and the label numbers, the <code>array</code> therefore contains a set of unused elements in every third slot, here zeroes; those values could be anything.</p>

<p>At this point I stop, since the intention of this article is not to publish destructive code but to raise awareness of its existence and to provide meaningful ways of detecting and combating it.</p>
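<p>As one possible starting point for detection, a reviewer's tooling could flag any source line that redefines a language keyword. The following is a deliberately naive sketch, not a complete scanner: the function name <code>redefines_keyword</code> and the keyword list are choices made for this example, and it ignores line continuations, comments, macros passed on the compiler command line and definitions hidden in included headers.</p>

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if `line` is a preprocessor directive that #defines one of the
   listed C keywords (for example "#define while if"), 0 otherwise. */
static int redefines_keyword(const char *line)
{
    static const char *const keywords[] = {
        "if", "else", "while", "for", "do", "switch", "case",
        "return", "goto", "break", "continue"
    };

    while (isspace((unsigned char)*line)) ++line;
    if (*line != '#') return 0;
    ++line;
    while (isspace((unsigned char)*line)) ++line;   /* "#  define" is legal */
    if (strncmp(line, "define", 6) != 0) return 0;
    line += 6;
    if (!isspace((unsigned char)*line)) return 0;
    while (isspace((unsigned char)*line)) ++line;

    /* Measure the macro name: identifier characters only, so that
       "if(x)" yields the name "if". */
    size_t len = 0;
    while (isalnum((unsigned char)line[len]) || line[len] == '_') ++len;

    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; ++i) {
        if (strlen(keywords[i]) == len && strncmp(line, keywords[i], len) == 0)
            return 1;
    }
    return 0;
}
```

<p>Run over the evil defines of this article, such a check would report the lines defining <code>if</code> and <code>while</code> and nothing else, which is enough to tell a reviewer where to look first.</p>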

<p>As a side note, for those who want to experiment with the code, here are the evil defines:</p>

<pre class="programlisting">
  #define INTEREST void*
  #define L(n) &amp;&amp;n_label##n
  #define D __COUNTER__ 
  #define T(x,y) x##y
  #define T2(x,y) T(x,y) 
  #define DO T2(n_label,D): 
  #define OPEN_INTEREST(entry) \
    entry=T2(open,dir)(period); 
  #define if(x) T2(go,to) *array[x?D:D]; DO
  #define while if 
  #define READ_INTEREST(entry) \
    entry=T2(read,dir)((DIR*)earning); 
  #define CLOSE_INTEREST(entry) \
    T2(close,dir)((DIR*)earning);
  #define INTEREST_VALUE(INTEREST_VALUE,t) \
    ((struct T2(dir,ent)*)INTEREST_VALUE)-&gt;d_##t</pre>

<h2>Conclusion</h2>

<p>As we have seen, the threats are real, and this article can by no means offer a full overview of all the software menaces present in our everyday life. Since we focused on open source deception techniques, we have tried to make the article as informative as possible without turning it into a ‘how to write your own virus’ essay. Please note that besides the few non-destructive scenarios presented here, there could be several more that have not yet been identified... or that have been omitted intentionally.</p>

<h2>References</h2>

<p class="bibliomixed"><a id="[BBC]"></a>[BBC]  <a href="http://news.bbc.co.uk/2/hi/technology/7340315.stm">http://news.bbc.co.uk/2/hi/technology/7340315.stm</a></p>

<p class="bibliomixed"><a id="[execve]"></a>[execve]  <a href="http://man7.org/linux/man-pages/man2/execve.2.html">http://man7.org/linux/man-pages/man2/execve.2.html</a></p>

<p class="bibliomixed"><a id="[fork]"></a>[fork]  <a href="http://man7.org/linux/man-pages/man3/fork.3p.html">http://man7.org/linux/man-pages/man3/fork.3p.html</a></p>

<p class="bibliomixed"><a id="[HowKernelRuns]"></a>[HowKernelRuns] <a href="https://0xax.gitbooks.io/linux-insides/content/SysCall/syscall-4.html">https://0xax.gitbooks.io/linux-insides/content/SysCall/syscall-4.html</a></p>

<p class="bibliomixed"><a id="[linux_binprm]"></a>[linux_binprm] <a href="http://elixir.free-electrons.com/linux/v4.6.7/source/include/linux/binfmts.h#L14">http://elixir.free-electrons.com/linux/v4.6.7/source/include/linux/binfmts.h#L14</a></p>

<p class="bibliomixed"><a id="[LinuxBackdoor]"></a>[LinuxBackdoor] <a href="https://freedom-to-tinker.com/2013/10/09/the-linux-backdoor-attempt-of-2003/">https://freedom-to-tinker.com/2013/10/09/the-linux-backdoor-attempt-of-2003/</a></p>

<p class="bibliomixed"><a id="[linuxProgramStartup]"></a>[linuxProgramStartup] <a href="http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html">http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html</a></p>

<p class="bibliomixed"><a id="[load_elf_binary]"></a>[load_elf_binary] <a href="http://elixir.free-electrons.com/linux/latest/source/fs/binfmt_elf.c#L682">http://elixir.free-electrons.com/linux/latest/source/fs/binfmt_elf.c#L682</a></p>

<p class="bibliomixed"><a id="[Love10]"></a>[Love10] Robert Love, <em>Linux Kernel Development</em>, Addison-Wesley Professional, 2010</p>

<p class="bibliomixed"><a id="[main_function]"></a>[main_function] <a href="http://en.cppreference.com/w/cpp/language/main_function">http://en.cppreference.com/w/cpp/language/main_function</a></p>

<p class="bibliomixed"><a id="[Neumann66]"></a>[Neumann66] von Neumann, John and Arthur W. Burks. 1966. <em>Theory of Self-Reproducing Automata</em>, Univ. of Illinois Press, Urbana IL.</p>

<p class="bibliomixed"><a id="[Oâ€™Neil16]"></a>[O’Neil16] Ryan ‘elfmaster’ O’Neill, <em>Learning Linux Binary Analysis</em>, Packt, 2016</p>

<p class="bibliomixed"><a id="[TheIntercept]"></a>[TheIntercept]  <a href="https://theintercept.com/2017/04/14/leaked-nsa-malware-threatens-windows-users-around-the-world/">https://theintercept.com/2017/04/14/leaked-nsa-malware-threatens-windows-users-around-the-world/</a></p>

<p class="bibliomixed"><a id="[Wikipedia_1]"></a>[Wikipedia_1]  <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">https://en.wikipedia.org/wiki/Executable_and_Linkable_Format</a></p>

<p class="bibliomixed"><a id="[Wikipedia_2]"></a>[Wikipedia_2]  <a href="https://en.wikipedia.org/wiki/Timeline_of_computer_viruses_and_worms">https://en.wikipedia.org/wiki/Timeline_of_computer_viruses_and_worms</a></p>
</p>
</div>
</channel>
</rss>
