Programming Topics + Overload Journal #138 - April 2017

Title: (Not Really So) New Niche for C++: Browser!?

Author: Martin Moene

Date: Mon, 03 April 2017 19:35:30 +01:00

Summary: How do you run C++ in a browser? Sergey Ignatchenko demonstrates how to use Emscripten.

Body: 

Disclaimer: as usual, the opinions within this article are those of ‘No Bugs’ Hare, and do not necessarily coincide with the opinions of the translators and Overload editors; also, please keep in mind that translation difficulties from Lapine (like those described in [Loganberry04]) might have prevented an exact translation. In addition, the translator and Overload expressly disclaim all responsibility from any action or inaction resulting from reading this article.

[In chess annotation,] ‘!?’… usually indicates that the move leads to exciting or wild play but that the objective evaluation of the move is unclear
~ Wikipedia

For quite a long while, C++ had been losing popularity; for example, as reported in [Widman16], in 2016 it got 7% less of the listings on Dice.com compared with a year earlier; and according to [TIOBE17], from the C++ Golden Age in 2004 till 2017, the C++ share fell from ~17% to a measly 6%.

As all of us (as in, ‘hardcore C++ fans’) know </tongue-in-cheek>, this has nothing to do with the deficiencies of C++; rather it is related to an observation that the time of downloadable clients (which was one of the main C++ strongholds) has changed into the time of browser-based clients – and all the attempts to get C++ onto browsers were sooo ugly (ActiveX, anyone?) that this didn’t really leave a chance to use C++ there.

Well, it seems that this tendency is already in the process of being reverted:

C++ can already run on all four major browsers – and moreover, it has several all-important advantages over JavaScript, too.

And this – not too surprisingly – is what this article is all about.

A word of warning: please do NOT expect any revelations here; this article is admittedly long overdue – and quite a few people know MUCH more than I can fit here (and MUCH more than I know myself). Still, given the lack of such overviews intended for those of us who haven’t tried it yet, I am sure that such an article has its merits. In the article, I will try to provide a very high-level overview of Emscripten, of the technologies involved, of the performance which can be expected, of the APIs which can be used – and what we can gain from using it.

JavaScript to the rescue!

Attempts to get C++ on browsers were continuing all the time (such as (P)NaCl), but all of them were platform- (and/or browser-)specific, and (as a result) were very problematic for browser deployments. However, help for the C++ side of things has come from exactly the same rival which has been stealing the browser show for all these years – from JavaScript. It wasn’t easy, and took several all-important (and IMO ingenious) pieces of the puzzle to make it useful.

Piece I – asm.js

In 2013, so-called asm.js was released. Essentially, asm.js is just a very small subset of JavaScript, intended to simulate good old assembler. If we take a look at a real-world asm.js program (not hand-written, but compiled from C++), we’ll see something along the lines of Listing 1 [Resig13].

function Vb(d) {
  d = d | 0;
  var e = 0, f = 0, h = 0, j = 0, k = 0, l = 0,
    m = 0, n = 0, o = 0, p = 0, q = 0, r = 0, s = 0;
  e = i;
  i = i + 12 | 0;
  f = e | 0;
  h = d + 12 | 0;
  j = c[h >> 2] | 0;
  if ((j | 0) > 0) {
    c[h >> 2] = 0;
    k = 0
  } else {
    k = j
  }
  j = d + 24 | 0;
  if ((c[j >> 2] | 0) > 0) {
    c[j >> 2] = 0
  }
     ...
}
			
Listing 1

As we can see, it is nothing like your usual high-level JavaScript, which deals with DOM and high-level onclick handlers. Instead (except for the if statements and function declarations) it directly translates into what we’d usually expect from an assembler language.

On taking a closer look, we can observe the following elements of more-or-less typical assembler in the code above:

Well, that’s pretty much all we need to get the full-scale assembler rolling.☹

For our current purposes, we don’t really want to go any deeper, but hopefully I’ve managed to describe the idea behind asm.js: essentially, it is pretty much a simulator of a strange CPU with a strange instruction set. In other words, asm.js did NOT try to simulate any existing instruction sets (and doing so would make it fatally inefficient).

Instead, asm.js has invented its own instruction set, which can be still seen as an instruction set of a CPU, at least from the point of view of a C++ compiler.
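
To make the ‘strange CPU’ idea a bit more tangible, here is a minimal C++ sketch of what the `c[h >> 2] | 0` pattern from Listing 1 roughly amounts to: loading a 32-bit integer from one flat, byte-addressed memory. The names `Heap`, `load32`, `store32` and `clear_if_positive` are hypothetical and exist only for this illustration; they are not part of asm.js or Emscripten.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of the single flat memory an asm.js module works with.
// In asm.js, a typed-array view (such as `c` in Listing 1) exposes it as 32-bit cells.
struct Heap {
    std::vector<std::uint8_t> bytes;
    explicit Heap(std::size_t size) : bytes(size, 0) {}

    // Rough equivalent of `c[addr >> 2] | 0`: read the 32-bit integer stored
    // in the aligned 4-byte cell containing `addr` (no bounds checking here).
    std::int32_t load32(std::uint32_t addr) const {
        std::int32_t value;
        std::memcpy(&value, &bytes[(addr >> 2) << 2], sizeof value);
        return value;
    }

    // Rough equivalent of `c[addr >> 2] = value`.
    void store32(std::uint32_t addr, std::int32_t value) {
        std::memcpy(&bytes[(addr >> 2) << 2], &value, sizeof value);
    }
};

// Mirrors the first `if` of Listing 1: if the 32-bit field at offset 12 of the
// object at address `d` is positive, reset it to zero; returns what Listing 1 calls `k`.
std::int32_t clear_if_positive(Heap& heap, std::uint32_t d) {
    const std::uint32_t h = d + 12;
    const std::int32_t j = heap.load32(h);
    if (j > 0) {
        heap.store32(h, 0);
        return 0;
    }
    return j;
}
```

Seen this way, the asm.js in Listing 1 is little more than loads, stores, integer arithmetic and branches over one big array, which is exactly the kind of target a C++ compiler back-end knows how to generate code for.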

Piece II – LLVM/Emscripten

The above observation has made it possible to write a back-end for the LLVM compiler, and this back-end has allowed the generation of asm.js out of our usual C++ (some restrictions apply, batteries not included). Moreover, such a compiler is not only possible, but it exists and is working: it is Emscripten.

Actually, the asm.js in the example above has been generated by Emscripten. Using Emscripten is indeed rather simple: we just take our existing standard-compliant and not-using-platform-specific-stuff C++ code (hey, you DO write your code as cross-platform and standard-compliant, don’t you?</wink>), and compile it into asm.js. As long as our code is just ‘moving bits around’, it works near-perfectly (and what will happen when we need to interact with the rest of the world, we’ll discuss in the ‘APIs’ section below), producing asm.js code which looks similar to the example above.
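
As an illustration of code which is just ‘moving bits around’, here is a small sketch of a standard-only C++ function (a 32-bit FNV-1a hash) of the kind that should need no changes to go through Emscripten. The function name `fnv1a32`, the file name and the command line in the comment are assumptions made for this example; the exact output format and defaults depend on the Emscripten version in use.

```cpp
#include <cstdint>
#include <string>

// Illustrative build command (file names are assumptions):
//   em++ -O2 fnv.cpp -o fnv.html
// The very same source also builds with any native C++11 compiler.

// 32-bit FNV-1a hash: pure integer arithmetic, no platform-specific calls.
std::uint32_t fnv1a32(const std::string& data) {
    std::uint32_t hash = 2166136261u;          // FNV offset basis
    for (unsigned char byte : data) {
        hash ^= byte;                          // mix in the next byte
        hash *= 16777619u;                     // multiply by the FNV prime (wraps mod 2^32)
    }
    return hash;
}
```

Calling `fnv1a32` from an ordinary `main()` is enough to get a complete program; nothing in it knows (or needs to know) whether it ends up as native code or as asm.js.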

Piece III – optimizations for asm.js

When looking at all the stuff above, a very natural scepticism goes along the lines of “Ok, this compiled piece of [CENSORED] stuff MAY work correctly, but how slow it is going to be???” And here is the point where the third piece of the C++-to-asm.js puzzle comes in. I’m speaking about asm.js-specific optimizations.

The thing is that with asm.js being this simple and restricted, it becomes possible to optimize it during a JIT compile. That’s it – we can have our cake (write in C++) and eat it (run it in asm.js with a reasonable speed) too!

As of now, all four major browsers (in alphabetical order: Chrome, Edge, Firefox, and Safari) at least try to optimize for asm.js. Results vary, but currently, most of the time, we’re speaking about a less than 2× performance degradation of asm.js compared to native C++ (say, compiled with Clang) [Zakai14]. While comparisons with native C++ are difficult to find (which BTW does make me raise an eyebrow), the few resources available seem to support this claim (see, for example, [AreWeFastYet17]). BTW, the Firefox results listed at the link are of special interest – in fact, it manages to keep the performance of asm.js within a mere 20% of the ‘native’ performance – and while we cannot rely on such performance (hey, we don’t want to be restricted only to Firefox users), it still serves as an indication of what it is possible to achieve (well, if enough effort is spent on it).

BTW, one important property of asm.js is that

As asm.js is a strict subset of JavaScript – it will run even if there is no special support for asm.js in browser.

Sure, without special support asm.js will be pretty slow – but if we’re speaking about ‘glue code’, it still may fly even with asm.js support being unavailable/disabled.

Restrictions

While Emscripten provides a full-scale and very usable environment, there are certain limitations due to the need to run from within browser. When you’re ready to go ahead with Emscripten, make sure to read [Emscripten.Porting]; the following is only a very short summary of the Emscripten restrictions and capabilities.

APIs

The most annoying restriction of Emscripten is (arguably) related to the provided APIs. First of all, we can use pretty much all the C++ standard libraries which don’t need to interact with the system – and that’s including STL (phew). boost:: libraries are not explicitly supported, but there are reports that some of them can be compiled too (not without some associated headaches); most of the header-only boost:: libraries are expected to work with Emscripten ‘out of the box’ (no warranties of any kind, batteries not included).

As noted above, libraries which interact with the rest of the world are a different story. Still, in general, all the stuff which we’d need to use on the client is present in the APIs; in particular, the following APIs are supported:

Threads and main loop

Due to the Emscripten runtime being run on top of the JS engine, threading in Emscripten is quite limited from the point of view of a C++ developer.

First of all:

Unless we’re speaking about ‘Workers’, everything within our app happens within a single ‘browser main loop’

In practice, this means a few things:

Personally, I do NOT think that this is really restrictive; in other words, I argue for writing code in such an event-driven manner (which I like to name ‘(Re)Actor-style’) in any case, even when there is no Emscripten in sight. Very briefly – considering that I have been arguing for years now that having thread sync at app-level is evil (see [NoBugs10] and [NoBugs15]) – going for a bunch of event-driven (Re)Actors exchanging messages is a Good Thing™.
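
A minimal sketch of what living within a single ‘browser main loop’ can look like in code is given below. It assumes that the work can be cut into short non-blocking steps; `App`, `step`, `browser_frame` and `run` are hypothetical names introduced for this example, while `emscripten_set_main_loop_arg` and `emscripten_cancel_main_loop` come from Emscripten’s `emscripten.h`. Outside of Emscripten, the same steps are simply driven by an ordinary loop.

```cpp
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

// Hypothetical application state; one call to step() is one short unit of work.
struct App {
    int frames_done = 0;
    int frames_wanted = 3;
    bool finished() const { return frames_done >= frames_wanted; }
};

// One non-blocking step: must return quickly, never wait or sleep.
void step(App& app) {
    ++app.frames_done;
}

#ifdef __EMSCRIPTEN__
// Called by the browser once per iteration of its main loop.
void browser_frame(void* arg) {
    App& app = *static_cast<App*>(arg);
    step(app);
    if (app.finished()) emscripten_cancel_main_loop();
}
#endif

// Drives the application to completion.
// Under Emscripten this only registers the callback and returns at once;
// the browser then calls browser_frame() later, so `app` must outlive the loop.
void run(App& app) {
#ifdef __EMSCRIPTEN__
    emscripten_set_main_loop_arg(browser_frame, &app, 0 /* use browser frame rate */,
                                 0 /* do not simulate an infinite loop */);
#else
    while (!app.finished()) step(app);   // native build: we own the loop
#endif
}
```

The design point is that `step()` is identical on both platforms; only the few lines deciding who calls it differ.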

Using multiple cores

While I am all for event-driven single-threaded processing, I am the first one to admit that there are situations when one single thread (and as a result, a single CPU core) is not sufficient to do whatever we need to do. Which means that we do need a way to use multiple cores.

However, being able to use multiple cores DOES NOT necessarily imply the need to go into the traditional mutex- and atomics-ridden untestable nightmare. Rather, we can have more than one separate event processor a.k.a. (Re)Actors (in Emscripten-speak, additional (Re)Actors – that is, beyond the original one running within the ‘browser main loop’ – are called ‘workers’) and exchange messages with them. It provides several benefits compared to classical mutex-based shared-state synchronization models:

Pthread support

In theory, Emscripten has support for pthreads. However, the support is experimental – and moreover, it is Firefox-only. This, of course, makes its use for serious projects a non-starter; however, my rant about pthreads goes deeper than that:

Even in the long run, I would prefer support for (Re)Actor-with-Extractors to support for pthreads.

Sure, having full-scale pthreads, we can implement (Re)Actor-with-Extractors ourselves; however:

64-bit int and 32-bit float issues

As of now, the only numeric data type in JavaScript is 64-bit float; in addition, some operations (mostly bitwise ones) return 32-bit integer (which always fits into 64-bit float). As a result, any operations which are neither 64-bit float nor 32-bit integer are not 100%-efficient in asm.js. In particular:

There are some proposals to deal with it (see, for example, [Zakai14]) but as far as I know, these slowdowns still apply, so if you’re after best-possible performance, you need to keep them in mind.
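
To give a feeling for where such a slowdown comes from, here is a small sketch of a 64-bit addition built from 32-bit pieces only, which is roughly the kind of extra work needed when the underlying ‘CPU’ offers nothing wider than 32-bit integers. This is an illustration under that assumption, not Emscripten’s actual lowering; `U64Pair`, `from_u64`, `to_u64` and `add64` are hypothetical names.

```cpp
#include <cstdint>
#include <limits>

// Every 32-bit integer is exactly representable in a 64-bit float
// (a double has more than 32 bits of mantissa); 64-bit integers are not.
static_assert(std::numeric_limits<double>::digits >= 32,
              "32-bit integers must fit into a double exactly");

// A 64-bit unsigned value kept as two 32-bit halves.
struct U64Pair {
    std::uint32_t lo;
    std::uint32_t hi;
};

U64Pair from_u64(std::uint64_t v) {
    return U64Pair{static_cast<std::uint32_t>(v), static_cast<std::uint32_t>(v >> 32)};
}

std::uint64_t to_u64(U64Pair p) {
    return (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
}

// One 64-bit add becomes two 32-bit adds plus carry handling.
U64Pair add64(U64Pair a, U64Pair b) {
    U64Pair r;
    r.lo = a.lo + b.lo;                              // wraps mod 2^32
    const std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;
    r.hi = a.hi + b.hi + carry;                      // wraps mod 2^32, as uint64_t would
    return r;
}
```

A single native instruction thus turns into several operations, which is why 64-bit-integer-heavy code is one of the places to look at first when chasing performance.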

Practical uses

As noted above, I haven’t used Emscripten for a serious project (yet). However, quite a few projects were reported as compiled and working, including:

For a much more comprehensive list of ports and demos, please refer to [Emscripten.PortingExamples].

Competition: NaCl/PNaCl

An alternative way of running C++ code on browsers is NaCl/PNaCl by Google. It serves pretty much the same noble purpose of running C++ on the browser; however, it has the BIG problem of being restricted to Chrome. As (a) no other browser has followed suit, and (b) Chrome’s market share, while it grew to about 60%, slowed down its growth in 2016, I do NOT think that NaCl/PNaCl is a viable option (except for some very narrowly defined scenarios) – especially when comparing it to Emscripten+asm.js.

Moreover, I’ve got a feeling (no warranties of any kind) that Google itself has realized the futility of (P)NaCl and has slowed down development as a result; overall, my wild guess is that in a few years from now, (P)NaCl will be quietly abandoned in favor of asm.js (and Google is already working on support for asm.js optimizations) or in favor of WebAssembly (see below).

As a result, while the only thing which is certain is that nothing is certain yet, if faced with the task of developing/porting a new C++ Client for browser, I would clearly prefer Emscripten+asm.js.

Oh, BTW – if you already have a (P)NaCl client, there is a library pepper.js, which aims to provide a migration path from (P)NaCl to Emscripten; while I didn’t try it myself – well, it seems to be worth trying.

Ongoing development: WebAssembly a.k.a. wasm

As a next step in this development (and to compensate for certain problems such as asm.js parsing times on mobile devices), an alternative representation – known as WebAssembly or wasm – is being actively worked on.

The idea is to use (give or take) the same C++ source code as already can be used to compile into asm.js, and to compile it to a very different assembler (wasm). Then wasm will be loaded into the browser, where it will be JIT-compiled and then executed.

There seems to be quite significant momentum behind wasm – but as of now, it is too early to tell anything specific. What matters though is that

As app-level developers, we do NOT really care much whether it is asm.js or wasm which wins in the end. Rather, we can use asm.js right now, and hope that we won’t need to change our programs too much when re-compiling them into wasm (when/if it is widely available)

Whether these hopes will stand in reality, we’ll see, but as of now, it is IMNSHO by far the best option we have to try pushing our C++ Clients into browsers.

Practical uses: porting downloadable clients to the web

Well, all this stuff is certainly technically exciting, but what can we get from it in practice? Most importantly,

we can port our (well-written-enough) C++ Clients to the web.

Until two or so years ago, there was no way to port an existing downloadable Client into a web app. In other words, whatever we were doing with our C++, we weren’t able to avoid download and at least some warnings about how malicious our code can be from the browser – and this was the point where our potential users were dropping out the most.

So, for a long while, when deciding how to develop our Client,

we were facing a tough choice: either to develop it in JS-only (losing all the bells, whistles, and performance of C++ development) – or to have it in C++ but at the cost of dropping those users who don’t want to download.

With Emscripten and asm.js, these problems are gone. We can have our C++ cake and eat it on browsers too.

In addition, such an option opens a door for some things that are not really widely used yet – such as creating live demo versions which can be viewed in-browser without the need to download and install them; it looks very promising for reducing drop-out rates of potential customers (as showing a live demo tends to work orders of magnitude better than showing a screenshot, and if we can get a live demo without download, we have a clear winner).

Of course, to achieve this holy grail of multi-platform clients with one of the platforms being ‘web browser’, we’ll need to re-learn how to write cross-platform programs (and apparently, with all the vendor efforts to lock us in, it is not an easy feat), but as soon as we do it (and some of us were doing it all the way regardless of Emscripten), we will be able to have one single C++ code base over all of the following: desktops, phones/tablets, and web (with AAA gamedevs being able to add consoles to the mix too).
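
One common way to keep such a single code base manageable is to confine platform checks to a handful of small functions and keep everything else platform-agnostic. The sketch below shows the idea; `platform_name` and `window_title` are hypothetical names for this example, while the preprocessor macros are the ones the respective toolchains predefine (`__EMSCRIPTEN__` being the one set by Emscripten).

```cpp
#include <string>

// The only place in this sketch that knows which platform we are built for.
std::string platform_name() {
#if defined(__EMSCRIPTEN__)
    return "web";
#elif defined(_WIN32)
    return "windows";
#elif defined(__ANDROID__)
    return "android";
#elif defined(__APPLE__)
    return "apple";
#elif defined(__linux__)
    return "linux";
#else
    return "unknown";
#endif
}

// Platform-agnostic logic: compiles unchanged for desktop, mobile and web.
std::string window_title(const std::string& app_name) {
    return app_name + " (" + platform_name() + ")";
}
```

The fewer such `#if` islands there are, and the smaller they stay, the easier it is to add ‘web browser’ as just one more target.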

Acknowledgement

Cartoon by Sergey Gordeev from Gordeev Animation Graphics, Prague.

References

[AreWeFastYet17] AreWeFastYet, https://arewefastyet.com/#machine=28&view=single&suite=asmjs-apps

[Emscripten.BrowserMainLoop] Emscripten Contributors, Emscripten Runtime Environment#Browser main loop, https://kripken.github.io/emscripten-site/docs/porting/emscripten-runtime-environment.html#browser-main-loop

[Emscripten.Porting] Emscripten Contributors, Porting, https://kripken.github.io/emscripten-site/docs/porting/index.html

[Emscripten.PortingExamples] Emscripten Contributors, Porting Examples and Demos, https://github.com/kripken/emscripten/wiki/Porting-Examples-and-Demos

[Loganberry04] David ‘Loganberry’, Frithaes! – an Introduction to Colloquial Lapine!, http://bitsnbobstones.watershipdown.org/lapine/overview.html

[NoBugs10] ‘No Bugs’ Hare, Single-Threading: Back to the Future?, Overload #97/#98

[NoBugs15] ‘No Bugs’ Hare, Multi-threading at Business-logic Level is Considered Harmful, Overload #128

[NoBugs16] ‘No Bugs’ Hare, Asynchronous Processing for Finite State Machines/Actors: from plain event processing to Futures (with OO and Lambda Call Pyramids in between), http://ithare.com/asynchronous-processing-for-finite-state-machines-actors-from-plain-events-to-futures-with-oo-and-lambda-call-pyramids-in-between/

[NoBugs17] ‘No Bugs’ Hare, upcoming book Development & Deployment of Multiplayer Online Games, Vol.II, chapter on (Re)Actors, current beta available at Leanpub and Indiegogo

[Resig13] John Resig, Asm.js: The JavaScript Compile Target, http://ejohn.org/blog/asmjs-javascript-compile-target/

[TIOBE17] TIOBE Index (February 2017), http://www.tiobe.com/tiobe-index/

[Widman16] Jake Widman, The Most Popular Programming Languages of 2016, https://blog.newrelic.com/2016/08/18/popular-programming-languages-2016-go/

[Zakai14] Alon Zakai, NATIVE SPEED ON THE WEB. JAVASCRIPT & ASM.JS, http://kripken.github.io/mloc_emscripten_talk/sloop.html#/
