Journal: CVu 265
Title: Perl is a Better Sed, and Python 2 is Good
Author: Martin Moene
Date: Wed, 05 November 2014 07:07:56 +00:00
Summary: Silas S. Brown sweats the differences between tools on common platforms.
Body:
If you’ve done any Unix shell scripting, you’ve probably come across the Stream Editor (sed). It’s most often used for simple substitution, for example:
for N in *.wav ; do lame "$N" -o "$(echo "$N"|sed -e 's/wav$/mp3/')"; done
which goes through all *.wav files and calls the MP3 encoder ‘lame’ on each one, passing as the -o parameter the filename with the wav at the end changed to mp3 – it’s the sed -e s/x/y/ that does this substitution. [The -e argument allows you to provide multiple commands for a single invocation. Ed]
In this example, the $ at the end of wav is there so that the substitution is made only at the very end of the filename; I don’t want to confuse things if a filename happens to contain ‘wav’ part-way through. In other situations you might want to add a g after the closing / to replace every match of the regular expression in a line rather than just the first.
As this example shows, however, you do have to think carefully about your regular expressions (regexps), especially if you don’t know what input you’re going to get. In the above example, if I knew in advance exactly which filenames the command would be working with – say, a particular set of a dozen or so .wav files – and I knew that none of them contained the letters ‘wav’ except at the very end of the filename, then I wouldn’t need to worry about including the $ character in the regexp. (Also, if I knew there were no spaces or other special characters in the filenames, then I wouldn’t have to put quite so many quote marks around everything.) But if, instead of writing a one-liner to do something with a particular set of filenames, I’m writing a script that I’ll be using later, or even sharing with other people, then I must be more careful.
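As a rough illustration of why the anchor matters, here is a minimal Python sketch of the same substitution with and without the $. The function names and the sample filename are invented for the example, and Python’s re syntax is close to, but not identical to, sed’s.

```python
import re

def to_mp3_anchored(filename):
    # 'wav$' can only match at the very end of the name
    return re.sub(r"wav$", "mp3", filename)

def to_mp3_unanchored(filename):
    # Without the anchor the first 'wav' found is replaced; count=1 mirrors
    # sed's default of one substitution per line when no 'g' flag is given
    return re.sub(r"wav", "mp3", filename, count=1)

name = "wave-test.wav"  # hypothetical awkward filename
print(to_mp3_anchored(name))    # wave-test.mp3
print(to_mp3_unanchored(name))  # mp3e-test.wav
```

The unanchored version damages the start of the name and leaves the extension alone, which is exactly the kind of surprise the $ guards against.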
Sed is a fairly universal tool: it’s installed ‘out of the box’ on nearly every version of Linux, even many small ‘embedded’ versions, and also on other Unix systems, such as BSD and its derivative Darwin, which underlies Mac OS X. So if you use sed for small jobs like this, it should work on all of these systems. At least, that’s the theory.
In practice, there are a few annoying differences between BSD’s version of sed (on the Mac) and GNU’s version of sed (on Linux). If you develop and test a script on Linux, it might not work on the Mac, and vice versa. For example, on Linux you can include \n in the replacement string to indicate an extra newline should be added, but you can’t do that on the Mac’s version of sed.
Yes, you can install GNU tools on the Mac, but I like my scripts to be able to run ‘out of the box’ to the extent possible, without requiring the installation of too much extra software. That’s because I often need to run my scripts on other people’s computers (or give them to others to run), so I want to make a reasonable attempt to minimise the amount of system setup that’s needed before the script will run. (That’s also why I tend to be parsimonious about how many third-party libraries my programs rely on: if such libraries won’t already be there on the system, and aren’t very easy to bundle, then they’d better be good enough to be worth the hassle of an extra dependency. A large library I want to make extensive use of, like the Tornado web framework in Python, might be a justifiable dependency, but I wouldn’t want to bring in an extra dependency just to save myself from writing a 10-line function – not unless I know for a fact that I’ll never have to set up this program with its dependencies anywhere else. The trouble with dependencies is you never know when someone will come along with a system on which they don’t compile, or which doesn’t give them enough rights to run the installer, or something, and if it’s not your code then it’s that much harder to figure out what to do about it.)
And so we come to perl. I’m not an expert perl programmer (most of the perl I’ve done has been making changes to other people’s scripts rather than writing my own), but perl does have a very nice (and often overlooked) command-line option to sort-of ‘emulate’ sed: the -p option. Try:
perl -p -e 's/wav$/mp3/'
and you’ll find it behaves just the same as sed -e, except it’s the same across Linux and BSD (and supports things like ‘newline in replacement text’ on both platforms). Also, you don’t have to put backslashes in front of any parentheses you use (in fact you shouldn’t), which makes your regexps more readable. The other thing to watch for is, if you’re doing multiple substitutions then you should separate them with semicolons rather than supplying additional -e commands as with sed.
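For readers who think more readily in Python, the loop that perl -p wraps around your code can be pictured roughly as follows. This is only an analogy, not how perl is implemented: the helper name perl_p_like and the sample data are invented for the illustration, and the patterns use Python’s re syntax rather than Perl’s.

```python
import re

def perl_p_like(lines, substitutions):
    """Apply every (pattern, replacement) pair to each line, in order.

    Mirrors the shape of: perl -p -e 's/a/b/; s/c/d/'
    """
    output = []
    for line in lines:  # -p: loop over each input line
        for pattern, replacement in substitutions:
            # count=1 mirrors s/// without the g flag (first match only)
            line = re.sub(pattern, replacement, line, count=1)
        output.append(line)  # -p: the line is printed after the code runs
    return output

sample = ["take1.wav\n", "left,right\n"]
rules = [(r"wav$", "mp3"),  # $ also matches just before the trailing newline
         (r",", "\n")]      # newline in the replacement text
print("".join(perl_p_like(sample, rules)))
```

Each line keeps its trailing newline while the substitutions run, which is why the anchored pattern still matches at the end of the text.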
Apart from these minor differences to be aware of (which generally go in perl’s favour), perl -p is more or less a ‘drop-in replacement’ for most uses of sed, except it’s more powerful (and you don’t have to backslash-escape so much) and it’s more likely to work across platforms. So if you find yourself using sed -e in scripts a lot, I’d recommend being aware of this.
Of course, there will be some ‘embedded’ systems out there that have sed but not perl. But generally speaking, perl is quite ubiquitous these days, and it has for some years ‘settled down’ to a nice stable language that’s not likely to change under your feet, so it is very well suited for use in shell scripts like this.
What I call a ‘stable’ language, some people might call ‘stagnated’. But I don’t see what’s wrong with a bit of stability: if you want your code to be portable to many systems ‘out there’ with minimum fuss, it’s probably easiest if you’re using a language that has ‘settled down’ to being pretty much the same everywhere, even if this does mean you’re ‘living in the past’ to an extent.
Python 2 is now a nice stable language as well, especially since Python 3 has syphoned off all new development but Python 2 is still (just about) supported for essential bug fixes and security fixes. Python 2 is pre-installed on nearly every Linux and Mac OS X machine, is available for all kinds of older systems that Python 3 has yet to be back-ported to – Windows Mobile, Android SL4A, Series 60, EPOC, even RISC OS – and there’s also a tool to turn a Python program into a standalone Windows executable, including interpreter, which can be run without needing any administrator privileges on the Windows machine (later versions of this tool began to require administrator privileges, which rules out use in a computer lab; I have a nice early version which even lets me update the Windows package from the comfort of Linux without having to go into Windows at all, although it does mean I can’t add new libraries to it).
It’s even possible to write code in such a way that it will run on very old 2.x versions of Python, on older systems. For example, for Python 2.2 and earlier, do this:
try: True
except: exec("True = 1 ; False = 0")
which defines True and False as variables if those built-in names don’t yet exist. And try to avoid writing ‘string1 in string2’ where string1 can be more than one character (not supported in versions of Python before 2.3). You could also do:
try: set
except:
  def set(l):
    d = {}
    for i in l: d[i]=True
    return d
to emulate the set() constructor (from a list) on versions of Python before real sets were introduced.
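One way to sidestep the ‘string1 in string2’ limitation is the find() string method, which returns -1 when the substring is absent. The sketch below is an assumption about how one might wrap it, not code from the article; the name contains is made up, and I have only reasoned about, not tested, its behaviour on very old interpreters.

```python
def contains(haystack, needle):
    # find() gives the index of the first match, or -1 if there is none,
    # and works for a needle of any length
    return haystack.find(needle) != -1

print(contains("recording.wav", "wav"))
print(contains("recording.mp3", "wav"))
```

On current interpreters this prints True and then False; the comparison itself avoids relying on the True and False names, so it should not need the shim above.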
But these days I usually target Python 2.7 if there is no great need to be that multi-platform (i.e. the script I’m writing will probably not be useful on Series 60 etc, but I still want it to work on any Linux or Mac system from the last few years). Even so, I try to code in such a way that it won’t be that much of a hassle to back-port to earlier versions of Python 2 if necessary (although if I have to depend on a library like Tornado then there’s no point even trying to support versions of Python that are older than the library supports – or at least there’s no point going before the oldest version of Python that’s supported by the oldest sensible version of the library).
I do remember writing for Python 1.x, and I’m glad I’m not doing that any more. But it now seems Python 2 has reached a nice balance of features and stability, and I really don’t see the need to move to Python 3: its advantages are not worth the extra dependency of installing it on every system I want my programs to work on (including older Mac OS X machines). Perhaps a Python 3 enthusiast would like to point out what’s so good about Python 3? But it had better be amazingly outstanding if I have to insist all my users install it first instead of using what’s already on their systems.
Incidentally, this year the Ubuntu distribution of Linux declared an intention to eventually ship only Python 3 by default, and to make Python 2 an optional package. This has not yet come to fruition, but if it does, it still won’t help non-Ubuntu distributions, or BSD, especially all the older Mac OS X machines that for various reasons might not be upgradable to whichever future version of Mac OS X actually ships Python 3 by default (as far as I know none of the existing versions of Mac OS X do this). In the current climate, if Ubuntu were to ship Python 3 by default then I’d just tell Ubuntu users to install the Python 2 package, because I’m concerned about all those other systems as well, some of which don’t have easy-to-use package managers like Ubuntu does. But I don’t understand why anyone would want to ‘kill off’ Python 2 anyway: why can’t they leave it alone like Perl 5 as a super-stable ubiquitous tool? Yes I’m all for playing with new languages, but not when I’m trying to write something that’s supposed to run everywhere (well, not unless I can first compile my code into a more widespread language to ship, but that’s not the case with Python – if you want it to run somewhere then you need a suitable version of Python ‘on site’ there, and that usually means Python 2).