Journal Articles

CVu Journal Vol 26, #5 - November 2014 + Programming Topics

Title: Perl is a Better Sed, and Python 2 is Good

Author: Martin Moene

Date: Wed, 05 November 2014 07:07:56 +00:00

Summary: Silas S. Brown sweats the differences between tools on common platforms.

Body:

If you’ve done any Unix shell scripting, you’ve probably come across the Stream Editor (sed). It’s most often used for simple substitution, for example:

  for N in *.wav ; do
    lame "$N" -o "$(echo "$N" | sed -e 's/wav$/mp3/')"
  done

which goes through all *.wav files and calls the MP3 encoder ‘lame’ on each one, passing a -o parameter as the filename with the wav at the end changed to mp3 – it’s the sed -e s/x/y/ that does this substitution. [The -e argument allows you to provide multiple commands for a single invocation. Ed]

In this example, the $ at the end of wav is there so that the substitution is made only at the very end of the filename; I don’t want to confuse things if a filename happens to contain ‘wav’ part-way through. In other situations you might want to add a g after the closing / to globally replace a regular expression many times in a line.

As this example shows, however, you do have to think carefully about your regular expressions (regexps), especially if you don’t know what input you’re going to get. In the above example, if I knew in advance exactly which filenames the command will be working with – say, a particular set of a dozen or so .wav files – and I knew that none of them contain the letters ‘wav’ except at the very end of the filename, then I wouldn’t need to worry about including the $ character in the regexp. (Also, if I knew there were no spaces or other special characters in the filenames, then I wouldn’t have to put quite so many quote marks around everything.) But if, instead of writing a one-liner to do something with a particular set of filenames, I’m writing a script that I’ll be using later, or even sharing with other people, then I must be more careful.
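To see what the $ anchor and the g flag change, here is a minimal sketch in Python rather than sed. The function names and the sample filename are made up for illustration. Python’s re.sub replaces every match unless told otherwise, so count=1 stands in for sed’s default of replacing only the first match on a line.

```python
import re

def wav_at_end(name):
    # Anchored: only a trailing 'wav' is replaced, like sed 's/wav$/mp3/'.
    return re.sub(r'wav$', 'mp3', name)

def first_wav(name):
    # Unanchored, first match only, like sed 's/wav/mp3/'.
    return re.sub(r'wav', 'mp3', name, count=1)

def every_wav(name):
    # Unanchored, every match, like sed 's/wav/mp3/g'.
    return re.sub(r'wav', 'mp3', name)

name = 'shortwave.wav'
print(wav_at_end(name))  # shortwave.mp3
print(first_wav(name))   # shortmp3e.wav
print(every_wav(name))   # shortmp3e.mp3
```

Only the anchored version leaves the ‘wav’ inside ‘shortwave’ alone, which is the case the $ is guarding against.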

Sed is a fairly universal tool: it’s installed ‘out of the box’ on nearly every version of Linux, even many small ‘embedded’ versions, and also on other Unix systems, such as BSD and its derivative Darwin, which underlies Mac OS X. So if you use sed for small jobs like this, it should work on all of these systems. At least, that’s the theory.

In practice, there are a few annoying differences between BSD’s version of sed (on the Mac) and GNU’s version of sed (on Linux). If you develop and test a script on Linux, it might not work on the Mac, and vice versa. For example, on Linux you can include \n in the replacement string to indicate an extra newline should be added, but you can’t do that on the Mac’s version of sed.

Yes, you can install GNU tools on the Mac, but I like my scripts to be able to run ‘out of the box’ to the extent possible, without requiring the installation of too much extra software. That’s because I often need to run my scripts on other people’s computers (or give them to others to run), so I want to make a reasonable attempt to minimise the amount of system setup that’s needed before the script will run. (That’s also why I tend to be parsimonious about how many third-party libraries my programs rely on: if such libraries won’t already be there on the system, and aren’t very easy to bundle, then they’d better be good enough to be worth the hassle of an extra dependency. A large library I want to make extensive use of, like the Tornado web framework in Python, might be a justifiable dependency, but I wouldn’t want to bring in an extra dependency just to save myself from writing a 10-line function – not unless I know for a fact that I’ll never have to set up this program with its dependencies anywhere else. The trouble with dependencies is you never know when someone will come along with a system on which they don’t compile, or which doesn’t give them enough rights to run the installer, or something, and if it’s not your code then it’s that much harder to figure out what to do about it.)

And so we come to perl. I’m not an expert perl programmer (most of the perl I’ve done has been making changes to other people’s scripts rather than writing my own), but perl does have a very nice (and often overlooked) command-line option to sort-of ‘emulate’ sed: the -p option. Try:

  perl -p -e 's/wav$/mp3/'

and you’ll find it behaves just the same as sed -e, except it’s the same across Linux and BSD (and supports things like ‘newline in replacement text’ on both platforms). Also, you don’t have to put backslashes in front of any parentheses you use (in fact you shouldn’t), which makes your regexps more readable. The other thing to watch for is, if you’re doing multiple substitutions then you should separate them with semicolons rather than supplying additional -e commands as with sed.

Apart from these minor differences to be aware of (which generally go in perl’s favour), perl -p is more or less a ‘drop-in replacement’ for most uses of sed, except it’s more powerful (and you don’t have to backslash-escape so much) and it’s more likely to work across platforms. So if you find yourself using sed -e in scripts a lot, I’d recommend being aware of this.
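The -p option works by wrapping the code given with -e in a loop that reads each line of input, runs the code on it, and prints the result. A rough Python sketch of that loop follows, assuming every rule is a plain first-match substitution. The names sed_like and RULES are hypothetical, and Python’s re syntax is close to Perl’s but not identical.

```python
import re

def sed_like(lines, substitutions):
    """Apply each (pattern, replacement) pair, in order, to every line."""
    out = []
    for line in lines:
        for pattern, replacement in substitutions:
            # count=1 mirrors s/x/y/ without a trailing g: first match only.
            line = re.sub(pattern, replacement, line, count=1)
        out.append(line)
    return out

# Hypothetical rules, applied in order like 's/.../.../; s/.../.../'.
RULES = [
    (r'wav$', 'mp3'),              # $ also matches just before a final newline
    (r'(\w+) (\w+)', r'\2 \1'),    # groups are written without backslashes
]

print(sed_like(['take one.wav\n'], RULES))       # ['one take.mp3\n']
print(sed_like(['a;b\n'], [(r';', ';\n')]))      # ['a;\nb\n']
```

The second call shows a newline in the replacement text, the feature that differs between the two versions of sed.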

Of course, there will be some ‘embedded’ systems out there that have sed but not perl. But generally speaking, perl is quite ubiquitous these days, and it has for some years ‘settled down’ to a nice stable language that’s not likely to change under your feet, so it is very well suited for use in shell scripts like this.

What I call a ‘stable’ language, some people might call ‘stagnated’. But I don’t see what’s wrong with a bit of stability: if you want your code to be portable to many systems ‘out there’ with minimum fuss, it’s probably easiest if you’re using a language that has ‘settled down’ to being pretty much the same everywhere, even if this does mean you’re ‘living in the past’ to an extent.

Python 2 is now a nice stable language as well, especially since Python 3 has syphoned off all new development but Python 2 is still (just about) supported for essential bug fixes and security checks. Python 2 is pre-installed on nearly every Linux and Mac OS X machine, is available for all kinds of older systems that Python 3 has yet to be back-ported to – Windows Mobile, Android SL4A, Series 60, EPOC, even RISC OS – and there’s also a tool to turn a Python program into a standalone Windows executable, including interpreter, which can be run without needing any administrator privileges on the Windows machine (later versions of this tool began to require administrator privileges, which rules out use in a computer lab; I have a nice early version which even lets me update the Windows package from the comfort of Linux without having to go into Windows at all, although it does mean I can’t add new libraries to it).

It’s even possible to write code in such a way that it will run on very old 2.x versions of Python, on older systems. For example, for Python 2.2 and earlier, do this:

  # True and False are missing on the oldest 2.x interpreters
  try: True
  except NameError: exec("True = 1 ; False = 0")

which defines True and False as variables if the built-in names don’t yet exist. And try to avoid writing ‘string1 in string2’ where string1 can be more than one character (not supported in versions of Python before 2.3). You could also do:

  try: set
  except NameError:
    # No built-in set(): fall back to a dict whose keys are the members
    def set(l):
      d = {}
      for i in l: d[i]=True
      return d

to emulate the set() constructor (from a list) on versions of Python before real sets were introduced.
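As a small illustration of these back-porting habits, the sketch below tests for a substring with the find() string method, which accepts a multi-character argument, and exercises a dict-based stand-in for set(). The names contains, dict_set and seen are hypothetical; dict_set repeats the idea of the fallback under its own name so that it can be tried even where the real set() exists.

```python
def contains(haystack, needle):
    # find() returns -1 when needle is absent, and unlike 'needle in haystack'
    # it does not rely on the substring support added in Python 2.3.
    return haystack.find(needle) != -1

def dict_set(items):
    # Dict whose keys are the members; duplicate items collapse to one key.
    d = {}
    for i in items:
        d[i] = 1
    return d

seen = dict_set(['wav', 'mp3', 'wav'])

print(contains('take one.wav', 'wav'))  # a true value
print('mp3' in seen)                    # membership test works on the dict
print(len(seen))                        # 2
```

The dict stands in for a set wherever the code only tests membership, counts or iterates. Set methods such as add() are not available on it, so code relying on them would need more than this fallback.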

But these days I usually target Python 2.7 if there is no great need to be that multi-platform (i.e. the script I’m writing will probably not be useful on Series 60 etc., but I still want it to work on any Linux or Mac system from the last few years). Even so, I try to code in such a way that it won’t be that much of a hassle to back-port to earlier versions of Python 2 if necessary (although if I have to depend on a library like Tornado then there’s no point even trying to support versions of Python that are older than the library supports – or at least there’s no point going before the oldest version of Python that’s supported by the oldest sensible version of the library).

I do remember writing for Python 1.x, and I’m glad I’m not doing that any more. But it now seems Python 2 has reached a nice balance of features and stability, and I really don’t see the need to move to Python 3: its advantages are not worth the extra dependency of installing it on every system I want my programs to work on (including older Mac OS X machines). Perhaps a Python 3 enthusiast would like to point out what’s so good about Python 3? But it had better be amazingly outstanding if I have to insist all my users install it first instead of using what’s already on their systems.

Incidentally, this year the Ubuntu distribution of Linux declared an intention to eventually ship only Python 3 by default, and to make Python 2 an optional package. This has not yet come to fruition, but if it does, it still won’t help non-Ubuntu distributions, or BSD, especially all the older Mac OS X machines that for various reasons might not be upgradable to whichever future version of Mac OS X actually ships Python 3 by default (as far as I know none of the existing versions of Mac OS X do this). In the current climate, if Ubuntu were to ship Python 3 by default then I’d just tell Ubuntu users to install the Python 2 package, because I’m concerned about all those other systems as well, some of which don’t have easy-to-use package managers like Ubuntu does. But I don’t understand why anyone would want to ‘kill off’ Python 2 anyway: why can’t they leave it alone like Perl 5 as a super-stable ubiquitous tool? Yes, I’m all for playing with new languages, but not when I’m trying to write something that’s supposed to run everywhere (well, not unless I can first compile my code into a more widespread language to ship, but that’s not the case with Python – if you want it to run somewhere then you need a suitable version of Python ‘on site’ there, and that usually means Python 2).
