Journal Articles

CVu Journal Vol 31, #6 - January 2020 + Programming Topics


Title: Restaurant C++ and Pidgin Python

Author: Bob Schmidt

Date: Thu, 09 January 2020 18:12:30 +00:00

Summary: Pete Goodliffe looks at the idioms of language.

Body:

I guess that I’m a typical Briton. I am a tea-drinking fish-and-chip junkie who goes red 35 seconds after exposure to sunlight. I don’t own any bulldogs, although I’ve been known to spout plenty of bull. And my grip of foreign languages is typically poor. I have what you’d politely call restaurant-French. Any slightly taxing conversation involves me gesticulating wildly, whilst repeating the same sentence louder and more slowly until the conversant finally understands what I’m saying. Or politely pretends that they do. It works. At least, until they give up and talk back to me in English.

So, not a natural French-speaker, then.

Just how good is your mastery of your programming languages? Is it like my French – do you have restaurant Java, or tourist Python? Or do you really know the languages you use? Do you need phrase books and cheat sheets, or are you fluent, knowing the natural language idioms? Do you have to screw your eyes up and think hard when crafting code to make sure that it makes sense? Or is your coding fluent? Do other readers understand what you write? Or do you write pidgin-code, translating idioms from a different language into the one you’re writing?

A fluent French speaker doesn’t think in English and convert their thoughts to French before speaking them. They think in French, and what they speak aloud comes naturally. There is no mismatch of idioms. There is no need for internal translation of the English idiom to the French idiom. To be truly effective in a programming language, to be able to craft Really Good Code, you have to operate in the same way.

There is a very real difference between fluent, idiomatic code, and pidgin-code. Stop for a moment and consider the number of ways that Listing 1 (Ouch!) offends you. Count them all. How is Listing 2 better? Is it any better? And what language is each one written in, anyway?

bool ouch()
  {
    if (do_something() == FAILED)
   goto fail_1;
    if (do_something_else() != OK)
   goto fail_2;

    return true;
      fail_2:
    tidy_up_second_thing();
fail_1:
    tidy_up_first_thing();
return false;
}

Listing 1

bool better()
try
{
    do_something();
    do_something_else();
    return true;
}
catch (...) { return false; }

Listing 2

When we’re staring at code, we like to see clear structure and code patterns that we’re familiar with – the natural idioms of that language. Certain code patterns are offensive – they naturally cause us to sit up, take note, and apply extreme caution. The mere sight of a goto invokes the gag reflex; when we see messy and inconsistent layout, we feel bile rising; global variables cause an allergic reaction; and in the face of illogical and unmalleable structure, we’re gripped with the urge to run away very quickly. This kind of judgement is a part of what sets great programmers apart from the merely adequate ones. And we all want to be great programmers, don’t we? How advanced do you think your internal quality meter is?

Beauty is in the idiom of the beholder

It’s interesting to note that our sense of ‘beauty’ is shaped by familiarity – by the prevalent idioms of the implementation domain. What constitutes natural and beautiful code differs from language to language. Idiomatic C code is quite a different beast from idiomatic Python. Listings 3 (idiomatic C code), 4 (the equivalent idiomatic Python code), and 5 (equivalent non-idiomatic Python code) illustrate this. The Python code in Listing 4 is idiomatic, but it could have been written like Listing 5 – that listing is a more direct translation of the original C code. But it’s not idiomatic Python, and it doesn’t look or feel ‘right’ as a consequence. Listing 5 is more verbose, and consequently harder to comprehend and more likely to harbour bugs.

int list[] = { 1, 2, 3, 5, 8 };
for (int n = 2; n < 4; n++)
{
   do_something(list[n]);
}

Listing 3

list = [1, 2, 3, 5, 8]
for element in list[2:4]:
   do_something(element)

Listing 4

n = 2
while n < 4:
   do_something(list[n])
   n += 1

Listing 5

And that’s just a really small example. (After all, small examples are idiomatic for magazine columns.)
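For comparison, here is a rough sketch of how the same loop might look in C++, leaning on iterators and a standard algorithm rather than a hand-managed index. It is only one of several ways a C++ programmer might write it. The visited vector and the body of do_something are stand-ins invented for this sketch; the listings above never say what do_something does.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Hypothetical stand-in for the do_something() of Listings 3 to 5:
// it simply records each value it is handed.
std::vector<int> visited;

void do_something(int value)
{
    visited.push_back(value);
}

// Visit the elements at indices 2 and 3, as list[2:4] does in Listing 4.
void visit_middle()
{
    const std::array<int, 5> list = {{1, 2, 3, 5, 8}};
    std::for_each(list.begin() + 2, list.begin() + 4, do_something);
}
```

As with the Python slice, the half-open range [2, 4) is stated once, and there is no loop counter to get wrong.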

We become accustomed to code that fits the natural idioms of the language we’re using. Non-idiomatic code is most often what sets our internal alarm bells ringing. And rightly so. Idioms don’t just look nice, they help us to write safe, correct code, avoiding the subtle pitfalls in the language. Like old wives’ tales, there is a body of collected wisdom in our programming language idioms that is perilous to ignore.

Learning the specific idioms of a language is a rite of passage, and marks your mastery of that language, like a journeyman becoming acquainted with his tools.

No idiom is an island

Important as they are, idioms are not sacred. Nor are idioms fixed and immutable. Fashion doesn’t stand still; over time tastes change. Some classic programming idioms have dated: Hungarian Notation used to be a conventional, safe, and well-regarded practice. These days it is not merely passé, but socially unacceptable: the modern equivalent of software leprosy.

Idioms don’t have all the answers, either. Multiple idioms compete over the same coding practice, and no one is necessarily right. Some aspects of code beauty are not clear cut and are continually the root of religious debates. For example, what is your opinion on the following; do you even care about them?

Do you indent with spaces or tabs?
If you use C-like languages, where do you put your braces?
Do you advocate single-entry-single-exit functions?
Do you prefer the functional coding style, where functions have no side effects?

The functional coding paradigm, in particular, has gained mindshare recently as functional programming enjoys a renaissance and the industry begins to learn how powerful and tractable some of the functional idioms are. It’s informed the design and common usage idioms of many traditionally ‘non-functional’ programming languages.
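To make ‘no side effects’ concrete, here is a small sketch of the two styles side by side in C++. All of the names (running_total, add_to_total, total) are invented for illustration.

```cpp
#include <numeric>
#include <vector>

// Side-effecting style: each call changes state that outlives the call,
// so the outcome depends on what has been called before.
int running_total = 0;

void add_to_total(const std::vector<int>& values)
{
    for (int v : values)
        running_total += v;
}

// Functional style: the result depends only on the argument, and nothing
// outside the function is modified.
int total(const std::vector<int>& values)
{
    return std::accumulate(values.begin(), values.end(), 0);
}
```

Calling total twice with the same vector gives the same answer both times. Calling add_to_total twice leaves running_total holding double the sum, which is exactly the kind of hidden coupling the functional idiom tries to avoid.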

Idiom idiocy

But sometimes the desire for elegant, beautiful, idiomatic code can trip us up. Well intentioned use of idioms can bite you. Here’s a cautionary tale involving C++. Now, C++ idioms are particularly amusing. The oft-cited Perl mantra is ‘There’s more than one way to do it’. C++ is like that for idioms – there’s always more than one idiom for anything in C++, and you can bet that each has zealots who fervently believe that theirs is the Only Right Way To Do It.

One of the most basic, contentious, C++ idioms is the naming of member variables. Many of these idioms involve the subtle incursion of Hungarian Notation. See Listing 6 for examples.

class ExampleMemberNames
{
  // Many programmers prefix member variables by
  // "_"
  int _common;

  // Also common is an "m_" prefix
  int m_member;
  int m_another;

  // Herb Sutter advocates a trailing underscore
  int sutter_;

  // Scott Meyers is the sensible advocate of
  // minimalism
  int meyers;
};

Listing 6

Many C++ programmers prefix member variables by an underscore. However, this is dubious practice as the language standard reserves many identifier names beginning with an underscore. Class member variables are not actually one of the reserved cases (provided the underscore is not followed by an upper-case letter or a second underscore), but this convention sails dangerously close to the wind. Also common is the m_ prefix (where m stands for member). I’ve even seen the disgustingly cute my_ prefix, e.g. my_member. Euch!

So, what do the C++ experts do? Herb Sutter advocates a trailing underscore on member variable names. Andrei Alexandrescu’s books have sadly followed his lead here. I have to admit that I have a personal dislike for this approach as the variable names read very strangely.

Scott Meyers is the sensible advocate of minimalism – he writes the variable name, the whole name, and nothing but the name (see the end of Listing 6). To my mind, this approach makes sense. If you need any extra indication of memberyness then you probably have code that is too hard to read – your function has too many parameters, or your class is too large. Fix that problem, don’t mask it with silly variable names.

Why does this simple naming issue matter? Well, it shouldn’t, until we combine the Meyers Minimal Member Moniker Mechanism with another idiom. When constructing a C++ class, members are given their initial values in the member initialisation list. There’s another naming minefield here: three of the options are enumerated in Listing 7.

class Foo
{
private:
  int thing;

public:
  // How would you name the ctor parameter?
  // (Alternatives: a real class could declare only one of these.)
  Foo(int thing_in) : thing(thing_in) {} // (1)
  Foo(int t) : thing(t) {}               // (2)
  Foo(int thing) : thing(thing) {}       // (3)
};

Listing 7

They are only subtly different, are functionally equivalent, and each seems perfectly adequate. They are all common in modern C++ code. I’ve most often seen idiom 3; it’s nicely symmetric and doesn’t introduce another unnecessary name into the code.

Great.

Well, not quite. Idiom 3 has a hidden sting in its tail. Sure enough, in Listing 7 it works just as advertised. But consider what happens when you need some slightly more complex constructor logic, like Listing 8.

Foo::Foo(int thing) : thing(thing)
{
  if (thing == 1)
    thing = 2;
}

Listing 8

What is the value of the thing member when a Foo is constructed with the value 1? It’s 2, right? Well, no it isn’t. It’s 1. But how can that be, I hear you ask? The name thing inside the constructor is bound to the constructor parameter, not to the class’s member variable. If you want to assign the member, you must write

  this->thing = 2;

otherwise you’re just writing to the parameter, a local copy that will shortly be thrown away. This is a subtle but nasty way to introduce obscure bugs into your codebase. So, there you have a collision of idioms. Some idioms are bad for you! Hurrah for C++, and hurrah for idioms.
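If you would like to see this for yourself, the following sketch puts the two spellings side by side. The class names Shadowed and Qualified and the value() accessor are invented for the example; otherwise the constructors follow Listing 8.

```cpp
// Follows Listing 8: the parameter hides the member inside the body.
class Shadowed
{
public:
    explicit Shadowed(int thing) : thing(thing)
    {
        if (thing == 1)
            thing = 2;        // assigns the parameter; the member stays 1
    }
    int value() const { return thing; }

private:
    int thing;
};

// Same constructor, but the assignment names the member explicitly.
class Qualified
{
public:
    explicit Qualified(int thing) : thing(thing)
    {
        if (thing == 1)
            this->thing = 2;  // assigns the member
    }
    int value() const { return thing; }

private:
    int thing;
};
```

Constructing each with 1 and asking for value() gives 1 from Shadowed and 2 from Qualified. Depending on your compiler and warning settings, you may or may not be told about the shadowing.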

How could you alleviate this problem? There are many ways. For example, you could make the thing parameter const in the constructor implementation. But for built-in types passed by value, this isn’t idiomatic! How else could you avoid it? (You could always choose to follow a different set of idioms. Or use a different language.)
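As a sketch of the const approach, assuming the Foo of Listing 7 with a single constructor: the explicit keyword and the value() accessor are additions for the sake of the example.

```cpp
class Foo
{
public:
    explicit Foo(int thing);
    int value() const { return thing; }

private:
    int thing;
};

// const on a by-value parameter only affects the definition, so the
// declaration inside the class can stay as Foo(int).
Foo::Foo(const int thing) : thing(thing)
{
    if (thing == 1)
    {
        // thing = 2;     // would no longer compile: the parameter is const
        this->thing = 2;  // so the member has to be named explicitly
    }
}
```

The const doesn’t fix the bug for you; it turns the silent mistake of Listing 8 into a compile error, which forces you to say which thing you mean.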

The moral of the story

It’s important to consider the idioms of the language you’re working in – and to gauge the beauty and quality of code against the familiar idioms it should adhere to. Common language idioms have several important uses: they help to show the elegance, beauty, and artistry of a piece of code. They help you to write code that seems familiar and easy to work with, and they (usually) help you to avoid simple bugs. You can gauge your mastery of a language by how well you know its idioms.

It’s particularly important to understand why these idioms exist. Learn to think in the programming language you’re using, to think in terms of its idioms.

But don’t blindly trust idioms. Idioms can be flawed. Always use your brain. Of course, if this seems like too much work, perhaps you should give up and produce boring, ugly code. Or learn to speak French properly instead.

Pete Goodliffe is a programmer who never stays at the same place in the software food chain. He has a passion for curry and doesn’t wear shoes.
