Regarding Deák Ferenc’s interesting article in Overload #135 (October 2016), with code at https://github.com/fritzone/obfy, on the generation of more-obfuscated binaries using C++ constructs, I think a better application is that of decoding obfuscated data rather than a simple license check.
Attackers tend to go for the weakest link. If you make the check_license function very obfuscated, then sooner or later the weakest link will be, not the license check itself, but the call to it. The attacker finds the code that calls it and jumps over that call, or replaces the first few bytes of the compiled check_license with code to return true.
Stripping out the name check_license might help, but from your code it looks like reverse-engineering the location of check_license (or a call to it) will become the simplest attack, unless we’re in a context where the attacker has read-only access to the code and the only plausible attack is a counterfeit license file (but that’s not the normal context if you’re shipping out proprietary software to an end user).
Therefore, I suggest this code could more appropriately be used in programs that need to ship with an obfuscated copy of proprietary data. For example, consider a dictionary program whose publishers wish to allow the end-user to look up individual words for display on screen, but not to copy out the entire dictionary and print their own (at least, not without going to the trouble of manually writing it down from the screen). Such a publisher might wish to store their dictionary entries encrypted, with the decryption algorithm obfuscated using your code. Unless the attacker can figure out how the decryption algorithm works, they’d be restricted to using the provided program to display the entries, which is what the publisher wants.
For the dictionary example, there might still be other weak points: it might be possible to feed keystrokes to the program, causing it to display each entry in turn, and automatically copy the text off the screen. The program might try to protect against this by limiting the speed at which entries can be displayed and/or the total number of entries per session. These limits could then be targeted in an attack. And so on. But this is all more difficult than bypassing a yes/no license check.
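As a rough illustration of the dictionary idea, the following is a minimal sketch and not code from the obfy framework. The names `xor_transform` and `Dictionary`, the repeating-key XOR used as a stand-in for a real cipher, and the per-session lookup cap are all assumptions made for the example. In the scenario described above, the decoding routine is the part that would be written with the obfuscation framework, and the entries would be encoded before shipping rather than at run time.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a real cipher: XOR with a repeating key.
// Applying it twice with the same key restores the original text.
inline std::string xor_transform(const std::string& in, const std::string& key) {
    if (key.empty()) return in;  // avoid a modulo by zero; nothing to encode with
    std::string out = in;
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    return out;
}

// Hypothetical dictionary that keeps definitions only in encoded form and
// decodes a single entry per lookup, up to a fixed number per session.
class Dictionary {
public:
    Dictionary(std::string key, std::size_t max_lookups_per_session)
        : key_(std::move(key)), remaining_(max_lookups_per_session) {}

    // Stores the definition encoded. (Done here at run time for brevity.)
    void add(const std::string& word, const std::string& definition) {
        entries_[word] = xor_transform(definition, key_);
    }

    // Decodes one entry into `out`. Returns false if the word is unknown
    // or the per-session limit has been reached.
    bool lookup(const std::string& word, std::string& out) {
        if (remaining_ == 0) return false;
        auto it = entries_.find(word);
        if (it == entries_.end()) return false;
        --remaining_;
        out = xor_transform(it->second, key_);
        return true;
    }

private:
    std::string key_;
    std::size_t remaining_;
    std::map<std::string, std::string> entries_;
};
```

A caller would construct `Dictionary d("k3y", 100)`, add entries, and call `d.lookup(word, text)` for each word the user asks for. Only the requested entry is ever decoded, and the session cap is itself one of the limits that could be targeted, as noted above.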
Of course, if enough publishers started using Ferenc’s code, then sooner or later somebody might try to write a decompiler for it. Such a decompiler would be dependent on the C++ compiler that had been used, and would be very difficult indeed to write, but once written could be used to attack many products. The publishers might however be able to stay ahead for a while by randomising the exact set of optimisations they allow their compiler to use. (The use of volatile prohibits pre-computation optimisations, but other types of optimisation could be switched on and off for more variations.)
Thanks.
Silas S. Brown
Ferenc replies:
Hi,
I would like to thank Silas for his constructive comments. The suggestions he made are valid: indeed, the weakest link in license checking during the application’s lifetime is the actual call to the license checking routine. The attacker can always ‘NOP’ out the calls to any specific routine in the application, once that routine is identified as the one checking the license, or just patch the method to return a valid license.
To mitigate this patching, I could recommend that several routines perform the task of checking the license, each of them written with a different sub-set of the obfuscation framework to generate different code. Also, different parts of the application could perform license checking in seemingly unrelated scenarios (for example, opening a dialog box could trigger the license check, as well as saving the current progress) and react accordingly. This would certainly increase the size of the delivered binary, however.
The call to license checking, however, is a different problem entirely. Gone are the DOS days when we could (easily) dynamically patch our executable in memory by decrypting a sequence of binary commands and jumping to them to perform operations which the disassembler cannot see, or simply by constructing a new sequence of commands by jumping into the middle of a carefully crafted binary opcode sequence, thus making debugging and disassembly a real nightmare (both of these techniques were widely used in viruses 20 years ago).
Without further research, right now I’m only aware of a few ways (some standard and some non-standard) to call methods in an indirect way, but in the end they all end up in the generated binary code. For example, one could store the addresses of several ‘facade’ methods (which call the license checking method) in a structure, and randomly select one of them in order to confuse the would-be attacker (of course, there can be more than one license checking method too). Another (pretty hackish, non-portable, non-standard and definitely not recommended) method is to carefully engineer a local stack overflow which will end up in the license checking method, without entirely destroying the stack of the application (again: not recommended).
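The ‘facade’ idea might look roughly like the sketch below. It is an illustration only: `g_license_valid`, `check_license`, the three facade functions and `license_ok_via_random_facade` are hypothetical names, and the license check is reduced to reading a flag. An optimising compiler is free to inline or merge such trivial facades, so a real build would need them to differ in substance (for instance, each written with a different part of the obfuscation framework).

```cpp
#include <array>
#include <cstddef>
#include <random>

// Hypothetical license state; a real check would validate a file or key.
bool g_license_valid = true;

// Hypothetical license check (stands in for an obfuscated routine).
bool check_license() { return g_license_valid; }

// 'Facade' functions: each one reaches check_license() on its own path.
bool facade_a() { return check_license(); }
bool facade_b() { bool ok = check_license(); return ok; }
bool facade_c() { return !(!check_license()); }

using Facade = bool (*)();

// Stores the facade addresses in a table and calls one chosen at random,
// so there is no single fixed call site for the license check.
bool license_ok_via_random_facade() {
    static const std::array<Facade, 3> facades = {{facade_a, facade_b, facade_c}};
    static std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<std::size_t> pick(0, facades.size() - 1);
    return facades[pick(rng)]();
}
```

As the letter says, all of this still ends up in the generated binary: the table and every facade remain visible to a disassembler, so the technique only raises the cost of finding the call sites.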
In order to circumvent the Data Execution Prevention policy of some operating systems, it is possible to execute constructed binary code by creating specific memory areas which will be marked as executable, where we can generate code outside of the binary and make the call to the license checking algorithm invisible in the stored binary, leaving nothing to patch. But this topic is so wide that it deserves a dedicated article, and it was outside the scope of the ‘Obfuscation Frameworks’ article.
Regarding the usage of code the way you suggested (i.e. as a data storage component): indeed, it is a very appropriate use case for this framework, so please feel free to experiment with it and let me know the results.
Thank you again for your constructive ideas,
Kind regards,
Ferenc
Silas also noticed an error in the Editorial in Overload 135:
I just noticed a typo in Overload 135’s editorial.
It says A ⇒ B is identical to !A ⇒ !B.
This is incorrect. If A implies B, that does not mean lack of A implies lack of B. (Fire implies smoke, but if there’s no fire there might still be smoke from dry ice or something.)
Instead, A ⇒ B is identical to !A | B.
(Either there’s no fire, or there’s fire and smoke with it, but we’re not saying anything about the value of smoke when there’s no fire.)
Silas S Brown
And Fran agrees that Silas is right...
As you say, A ⇒ B is the same as !A | B
I meant !B ⇒ !A (I think).
A | B | A ⇒ B | !B ⇒ !A |
---|---|---|---|
0 | 0 | 1 | 1 |
0 | 1 | 1 | 1 |
1 | 0 | 0 | 0 |
1 | 1 | 1 | 1 |
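The equivalences discussed in these letters can also be checked mechanically by enumerating all four combinations of A and B. The following is a small sketch; the helper names (`implies`, `same_truth_table` and the four expression functions) are made up for the example.

```cpp
// Material implication, written straight from its truth table:
// false only when a is true and b is false.
constexpr bool implies(bool a, bool b) { return (a && !b) ? false : true; }

// Returns true if f and g agree on all four combinations of (A, B).
template <typename F, typename G>
bool same_truth_table(F f, G g) {
    const bool values[] = {false, true};
    for (bool a : values)
        for (bool b : values)
            if (f(a, b) != g(a, b)) return false;
    return true;
}

inline bool a_implies_b(bool a, bool b)    { return implies(a, b); }    // A => B
inline bool not_a_or_b(bool a, bool b)     { return !a || b; }          // !A | B
inline bool contrapositive(bool a, bool b) { return implies(!b, !a); }  // !B => !A
inline bool inverse(bool a, bool b)        { return implies(!a, !b); }  // !A => !B
```

Here `same_truth_table(a_implies_b, not_a_or_b)` and `same_truth_table(a_implies_b, contrapositive)` both hold, while `same_truth_table(a_implies_b, inverse)` does not: the two differ at A = 0, B = 1, which is the smoke-without-fire case.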
Overload Journal #141 - October 2017 + Letters to the Editor