    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: A Python Script to Relocate Source Trees</title>
        <link>https://members.accu.org/index.php/articles/205</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + CVu Journal Vol 16, #2 - Apr 2004</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;A Python Script to Relocate Source Trees</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> Thu, 01 April 2004 22:53:48 +01:00</p>
<p><strong>Summary:</strong>&nbsp;<p>Files form the raw ingredients of a software system - source
files, build files, configuration files, resource files, scripts
etc. These files are organised into directories.</p>
</p>
<p><strong>Body:</strong>&nbsp;<p>As the system develops, this directory structure must develop
with it: maybe an extra level of hierarchy needs adding to
accommodate a new revision of an operating system; maybe third
party libraries need gathering into a single place; maybe we are
porting to a platform which imposes some restriction on file names;
or maybe the name originally chosen for a directory has simply
become misleading.</p>
<p>This article describes the development of a simple Python script
to facilitate relocating a source tree. It is as much - if not more
- about getting started with Python as it is about solving the
particular problem used as an example.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Statement of
the Problem</h2>
</div>
<p>Let's suppose that we have a directory structure we wish to
modify. This existing structure has been reviewed and the decision
has been taken to remap source directories as follows:</p>
<pre class="screen">
png         -&gt; graphics/thirdparty/png
jpeg        -&gt; graphics/thirdparty/jpeg
bitmap      -&gt; graphics/common/bitmap
UserIF      -&gt; ui
UserIF/Wgts -&gt; ui/widgets
os          -&gt; platform/os
os/hpux     -&gt; platform/os/hpux10
</pre>
<p>By &quot;re-mapped&quot;, I mean that the directory and its contents
should be recursively moved to the new location. So, for
example:</p>
<pre class="screen">
UserIF/Wgts/buttons/switchbutton.cpp
  -&gt; ui/widgets/buttons/switchbutton.cpp
</pre>
<p>Although it's straightforward to create a new top-level directory
and copy existing directories to their new locations, the problem
we then face is that our source files will no longer build because
the files they include have moved. In fact, some of the build files
themselves need adjusting, since they too reference moving
targets.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>An Outline
Solution</h2>
</div>
<p>In outline, our script will implement a two-pass algorithm:</p>
<div class="variablelist">
<dl>
<dt><span class="term">1st Pass:</span></dt>
<dd>
<p>Traverse all files in the current source tree, working out where
they will move to. The output of this pass is a container which
maps existing files to their new locations.</p>
</dd>
<dt><span class="term">2nd Pass:</span></dt>
<dd>
<p>For each file found in the first pass, perform the actual
relocation, updating any internal references to file paths.</p>
</dd>
</dl>
</div>
<p>The actual processing for a file in the 2nd pass depends on the
type of the file. We can simply copy a bitmap, for example, to its
new home, but when relocating a C/C++ source file we'll need to be
more careful. This is why I've chosen a two-pass solution: when
updating internal file references, I prefer look-up to
recalculation.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>First Pass -
Iterating Over Files</h2>
</div>
<p>We want to map existing files to their new locations. In Python,
the builtin mapping type is called a dictionary. The output of pass
one will be a dictionary, which we initialise to be empty.</p>
<pre class="programlisting">
files_map = {}
</pre>
<p>There are two standard modules which support file and directory
operations:</p>
<pre class="screen">
<span class=
"bold"><b>os</b></span> - &quot;Miscellaneous operating system interfaces&quot;
<span class=
"bold"><b>os.path</b></span> - &quot;Common pathname manipulations&quot;
</pre>
<p>Both of these will be of use to our script. In fact, both
provide a mechanism for traversing a directory tree:</p>
<div class="variablelist">
<dl>
<dt><span class="term">os.path.walk</span></dt>
<dd>
<p>which calls back a supplied function at each subdirectory,
passing that function the subdirectory name and a list of the files
it contains.</p>
</dd>
<dt><span class="term">os.walk</span></dt>
<dd>
<p>which generates a 3-tuple (<tt class="literal">dirpath</tt>,
<tt class="literal">dirnames</tt>, <tt class=
"literal">filenames</tt>) for each subdirectory in the tree.</p>
</dd>
</dl>
</div>
<p>The second option, <tt class="literal">os.walk</tt>, only exists
in Python 2.3 and later (2.3 strengthens the language's support for
generators). I prefer it since it makes the script more direct.</p>
<pre class="programlisting">
import os
# Initialise a dictionary to map current file
# path to new file path.
files_map = {}

# Fill the dictionary by remapping all files
# beneath the current working directory.
for (subdir, dirs, files) in os.walk('.'):
  print &quot;Mapping files in subdir [%s]&quot; % subdir
  files_map.update(
             mapFiles(subdir, files)
             )
</pre>
<p>Note the general absence of visible symbols to delimit blocks
and statements - a colon marks the end of the <tt class=
"literal">for</tt> clause, and that's about it. Statements are
terminated by a newline, unless the newline is escaped with a
backslash or the statement is waiting for a closing bracket to
complete it. Thus the <tt class="literal">print</tt> statement
terminates at the newline, but the dictionary <tt class=
"literal">update</tt> call spreads over three lines. Note also
that statements can be grouped into a block by placing them at the
same indentation level: the body of the for loop is a block of two
statements.</p>
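<p>As a minimal sketch of these layout rules (it is not part of the
relocation script, and the variable names are purely illustrative),
the three cases can be seen side by side:</p>

```python
# Implicit continuation: an open bracket keeps the statement going.
total = sum([1,
             2,
             3])

# Explicit continuation: a trailing backslash escapes the newline.
message = "one" + \
          "two"

# Indentation groups statements: both indented lines form the loop body.
squares = []
for n in range(3):
    sq = n * n
    squares.append(sq)
```

<p>The bracketed form is usually the more robust of the two
continuation styles, since a stray space after a backslash is enough
to break the statement.</p>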
<p>To a C/C++ programmer these syntactical rules may seem unusual,
dangerous even - attaching meaning to whitespace!? - but I would
argue that they actually encourage clean and well laid out
scripts.</p>
<p>Incidentally, the default behaviour of the print statement is to
add a newline after printing. Appending a trailing comma would
print a space instead of this newline.</p>
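<p>For readers who want to see what <tt class="literal">os.walk</tt>
yields before pointing it at a real source tree, here is a small
sketch. It assumes a current Python 3 interpreter rather than the 2.3
release discussed in this article, builds a throwaway tree in a
temporary directory, and uses a hypothetical helper called
<tt class="literal">list_tree</tt>; the file names are borrowed from
the examples above.</p>

```python
import os
import tempfile

def list_tree(root):
    """Return sorted (dirpath, filename) pairs, with dirpath relative to root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in filenames:
            found.append((rel, name))
    return sorted(found)

# Build a tiny throwaway tree (hypothetical names) and walk it.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "png"))
os.makedirs(os.path.join(demo_root, "UserIF", "Wgts"))
for parts in (("png", "pngRead.h"), ("UserIF", "Wgts", "Menu.hpp")):
    with open(os.path.join(demo_root, *parts), "w") as fp:
        fp.write("// placeholder\n")

tree = list_tree(demo_root)
```

<p>Each iteration of the loop handles one directory, so the files of
a whole tree are visited without any explicit recursion in the
calling code.</p>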
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Pass One:
Mapping Files and Directories</h2>
</div>
<p>If we attempt to run the script as it stands, we'll see an
exception thrown:</p>
<pre class="screen">
&lt;...snip...&gt;
NameError: name 'mapFiles' is not defined
</pre>
<p>which is as we'd expect. We need to define the function:</p>
<pre class="programlisting">
def mapFiles(dirname, files):
  &quot;&quot;&quot;Return a dictionary mapping files to their new locations.&quot;&quot;&quot;
  new_dir = mapDirectory(dirname)
  print &quot;mapDirectory [%s] -&gt; [%s]&quot; % \
                          (dirname, new_dir)
  fm = {}
  for f in files:
    fm[os.path.join(dirname, f)] = \
       os.path.join(new_dir, f)
  return fm
</pre>
<p>The Python interpreter needs to know about this function before
it can use it, so we'll place it before the path traversal
loop.</p>
<p>The first statement of the function body is the function's
(optional) documentation string, or docstring. The Python
documentation explains why it's worth getting into the habit of using
docstrings and the conventions for their use.</p>
<p>The function fills a dictionary mapping files to their new
location. It uses <tt class="literal">os.path.join</tt> from the
<tt class="literal">os.path</tt> module to construct a file path.
The backslash is there to escape a newline, allowing the dictionary
item assignment to continue onto a second line.</p>
<p>The final component of the first pass is the function <tt class=
"function">mapDirectory</tt>, which maps an existing directory to
its new location.</p>
<div class="sidebar">
<p class="title c2">The Python Interpreter</p>
<p>If we were to start an interactive Python shell and load the
mapFiles function, we could then query it and its attributes:</p>
<pre class="programlisting">
&gt;&gt;&gt; mapFiles
&lt;function mapFiles at 0x01106630&gt;
&gt;&gt;&gt; dir(mapFiles)
['__call__', '__class__', '__delattr__',
'__dict__', '__doc__', '__get__',
'__getattribute__', '__hash__', '__init__',
'__module__', '__name__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__str__', 'func_closure',
'func_code', 'func_defaults', 'func_dict',
'func_doc', 'func_globals', 'func_name']

&gt;&gt;&gt; mapFiles.func_doc
'Return a dictionary mapping files to their new locations.'
</pre>
<p>I like to switch between editor and interpreter when developing
scripts (on Windows, the PythonWin IDE makes this easy to do, or,
alternatively, Python's <tt class="literal">-i</tt> option), since
it helps me understand both how my script works and how Python
works. Here we can see the name &quot;<tt class="literal">mapFiles</tt>&quot;
refers to a function object which has a list of attributes.</p>
<p>The Python interpreter also allows functions to be exercised
immediately:</p>
<pre class="programlisting">
&gt;&gt;&gt; mapFiles('png',('pngRead.h','pngWrite.h'))
</pre></div>
<pre class="programlisting">
def mapDirectory(dname):
  &quot;&quot;&quot;Return the new location of the input directory.&quot;&quot;&quot;
  # The following dictionary maps existing
  # directories to their new locations.
  dirmap = {
    'png'         : 'graphics/thirdparty/png',
    'jpeg'        : 'graphics/thirdparty/jpeg',
    'bitmap'      : 'graphics/common/bitmap',
    'UserIF'      : 'ui',
    'UserIF/Wgts' : 'ui/widgets',
    'os'          : 'platform/os',
    'os/hpux'     : 'platform/os/hpux10'
  }
  # Successively reduce the directory path
  # until it matches one of the keys in the
  # dictionary.
  mapped_dir = p = dname
  while p and not p in dirmap:
    p = os.path.dirname(p)
  if p:
    mapped_dir = os.path.join(dirmap[p],
                              dname[len(p) + 1:])
  return mapped_dir
</pre>
<p>The directory rearrangement described earlier in this article
has been represented as a dictionary. The input directory is
reduced until we match a key in this dictionary. As soon as we find
such a match, we construct our return value from the value at this
key and the un-matched tail of the input directory; or, if no such
match is found, the input value is returned unmodified.</p>
<p>The expression <tt class="literal">dname[len(p) + 1:]</tt> is a
slice operation applied to a string. Bearing in mind that
<tt class="literal">p</tt> is the first <tt class=
"literal">len(p)</tt> characters in <tt class="literal">dname</tt>,
this expression returns what's left of <tt class=
"literal">dname</tt>, omitting the slash which separates the head
from the tail of this path.</p>
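<p>The slice can be tried in isolation. The following sketch uses
the values from the article's own example; the names
<tt class="literal">head</tt>, <tt class="literal">tail</tt> and
<tt class="literal">empty_tail</tt> are introduced here purely for
illustration.</p>

```python
dname = 'os/hpux/include'
p = 'os/hpux'

# p is a prefix of dname, so dname[:len(p)] reproduces it exactly.
head = dname[:len(p)]

# Skipping one extra character drops the separating slash.
tail = dname[len(p) + 1:]

# When p is the whole path, the slice runs off the end and is simply empty.
empty_tail = p[len(p) + 1:]
```

<p>Slices never raise an exception for an out-of-range start, which
is why the function needs no special case when the whole directory
name matches a key.</p>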
<p>For example, when mapping the directory <tt class=
"literal">'os/hpux/include'</tt> we would expect to exit the while
loop when <tt class="literal">p == 'os/hpux'</tt>, and return the
result of joining the path <tt class=
"literal">'platform/os/hpux10'</tt> to <tt class=
"literal">'include'</tt>.</p>
<pre class="programlisting">
mapDirectory('os/hpux/include')
  -&gt; os.path.join('platform/os/hpux10',
                   'include')
  -&gt; 'platform/os/hpux10/include'
</pre></div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Pass One:
Testing</h2>
</div>
<p>Let's test these expectations by adding the following lines to
our script:</p>
<pre class="programlisting">
assert(mapDirectory('os/hpux/include')
    == 'platform/os/hpux10/include')
assert(mapDirectory('os/win32')
    == 'platform/os/win32')
assert(mapDirectory('unittests')
    == 'unittests')
</pre>
<p>These tests pass on Unix platforms, but if you run them on
Windows the first two tests raise <tt class=
"literal">AssertionError</tt> exceptions (although the final one
passes). For now, I'll leave you to work out why - but promise a
more platform independent solution in the final version of the
script.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Second
Pass</h2>
</div>
<p>Recall that:</p>
<div class="variablelist">
<dl>
<dt><span class="term">2nd Pass:</span></dt>
<dd>
<p>For each file found in the first pass, perform the actual
relocation, updating any internal references to file paths.</p>
</dd>
</dl>
</div>
<p>and that the output of the first pass is a dictionary, <tt class=
"literal">files_map</tt>. The main loop for the 2nd pass is:</p>
<pre class="programlisting">
# Create the new root directory for relocated
# files
new_root = '../relocate'
os.makedirs(new_root)

# Now actually perform the relocation
for srcdst in files_map.items():
  relocate(srcdst, new_root, files_map)
</pre>
<p>The function <tt class="literal">os.makedirs</tt> recursively
creates a directory path. It throws an exception if the directory
path already exists. We will not catch this exception since we want
to ensure the files are being relocated to a completely new
directory.</p>
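<p>This behaviour of <tt class="literal">os.makedirs</tt> is easy to
confirm. The sketch below works in a temporary directory so that
nothing of value is touched; the nested path and the variable names
are hypothetical.</p>

```python
import os
import tempfile

base = tempfile.mkdtemp()                       # scratch area for the demo
new_root = os.path.join(base, "a", "relocate")  # hypothetical nested target

os.makedirs(new_root)          # creates "a" and "a/relocate" in one call
created = os.path.isdir(new_root)

try:
    os.makedirs(new_root)      # second attempt: the path now exists
    second_call_failed = False
except OSError:
    second_call_failed = True
```

<p>Leaving the exception uncaught in the main script therefore acts
as a cheap guard against relocating into a directory left over from
a previous run.</p>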
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Pass Two:
Relocating by Copying</h2>
</div>
<pre class="programlisting">
import shutil
def relocate(srcdst, dst_root, files_map):
  &quot;&quot;&quot;Relocate a file, correcting internal file references.&quot;&quot;&quot;
  dst_file = os.path.join(dst_root, srcdst[1])
  dst_dir = os.path.dirname(dst_file)
  if not os.path.isdir(dst_dir):
    os.makedirs(dst_dir)
  if isSourceFile(dst_file):
    relocateSource(srcdst, dst_file,
                   files_map)
  else:
    shutil.copyfile(srcdst[0], dst_file)
</pre>
<p>There aren't many new features to comment on here. The first
parameter to the function, <i class="parameter"><tt>srcdst</tt></i>,
is a (key, value) item from the <tt class="literal">files_map</tt>
dictionary, so <tt class="literal">srcdst[0]</tt> is the path to
the original file, and <tt class="literal">srcdst[1]</tt> is the
path to the relocated file, relative to the new root directory.</p>
<p>We create the destination directory unless it already exists.
Then, if our file is a source file, we call <tt class=
"function">relocateSource</tt>; otherwise, we simply copy the
file across.</p>
<p>I admit this isn't the most object-oriented of functions. The
meaning of the literal <tt class="literal">0</tt> and <tt class=
"literal">1</tt> isn't transparent, and switching on type often
indicates unfamiliarity with polymorphism. It isn't Python that's
to blame here, nor a lack of familiarity with polymorphism: rather
a lack of familiarity on my part with Python's support for
polymorphism, and a reluctance to add such sophistication to a
simple script.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Pass Two:
Identifying Source Files</h2>
</div>
<pre class="programlisting">
import re

def isSourceFile(file):
  &quot;&quot;&quot;Return True if the input file is a C/C++ source file.&quot;&quot;&quot;
  src_re = re.compile(r'\.(c|h)(pp)?$',re.IGNORECASE)
  return src_re.search(file) is not None
</pre>
<p>We identify source files using a regular expression pattern:</p>
<pre class="programlisting">
r'\.(c|h)(pp)?$'
</pre>
<p>Here, the &quot;r&quot; stands for raw string, which means that
backslashes are not handled in any special way by Python - the
string literal is passed directly on to the regular expression
module.</p>
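<p>The effect of the raw string prefix can be checked directly. This
is a small illustrative sketch; the variable names and the sample
file name are assumptions made for the example.</p>

```python
import re

plain = '\n'     # one character: a newline
raw = r'\n'      # two characters: a backslash followed by 'n'

# Without the r prefix, each backslash meant for the regex must be doubled.
same_pattern = (r'\.(c|h)(pp)?$' == '\\.(c|h)(pp)?$')

matches = re.search(r'\.(c|h)(pp)?$', 'switchbutton.cpp') is not None
```

<p>Both spellings hand the same characters to the regular expression
module; the raw form is simply easier to read.</p>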
<p>Regular expression patterns in Python are as powerful, concise
and downright confusing to the uninitiated as they are elsewhere. I
would say that subsequent use of regular expression matches is a
little more friendly.</p>
<p>In this case, the regex reads: &quot;match a '.' followed by either a
'<tt class="literal">c</tt>' or an '<tt class="literal">h</tt>'
followed by one or no '<tt class="literal">pp</tt>'s, followed by
the end of the string&quot;. The <tt class="literal">re.IGNORECASE</tt>
flag tells the regex compiler to ignore case.</p>
<p>So, we expect:</p>
<pre class="programlisting">
assert(isSourceFile('a.cpp'))
assert(isSourceFile('a.C'))
assert(not isSourceFile('a.cc'))
assert(not isSourceFile('a.cppp'))
assert(isSourceFile('a.cc.h'))
</pre>
<p>This time, the assertions hold.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Pass Two:
Relocating Source Files</h2>
</div>
<pre class="programlisting">
def relocateSource(srcdst, dstfile, files_map):
  &quot;&quot;&quot;Relocate a source file, correcting paths to included files.&quot;&quot;&quot;
  fin = open(srcdst[0], 'r')
  fout = open(dstfile, 'w')
  # Convert each line in turn and write it to the new file.
  for line in fin:
    fout.write(processSourceLine(line,
               srcdst, files_map))
  fin.close()
  fout.close()
</pre>
<p>The function <tt class="function">relocateSource()</tt> simply
reads the input file line by line. Each line is converted and
written to the output file.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Pass Two:
Processing a Line of a Source File</h2>
</div>
<pre class="programlisting">
def processSourceLine(line, srcdst, files_map):
  &quot;&quot;&quot;Process a line from a source file, correcting included file paths.&quot;&quot;&quot;
  include_re = re.compile(
        r'^\s*#\s*include\s*&quot;'
        r'(?P&lt;inc_file&gt;\S+)'
        '&quot;')
  match = include_re.match(line)
  if match:
    mapped_inc_file = mapIncludeFile(
        match.group('inc_file'),
        srcdst,
        files_map)
    line = line[:match.start('inc_file')] + \
           mapped_inc_file + \
           line[match.end('inc_file'):]
  return line
</pre>
<p>The function <tt class="function">processSourceLine</tt> has a
rather more complicated regex at its core. Essentially we want to
spot lines similar to:</p>
<pre class="programlisting">
#include &quot;UserIF/Wgts/Menu.hpp&quot;
</pre>
<p>and extract the double-quoted file path. The complication is
that there may be any amount of whitespace at several points on the
line - hence the appearances of \s*, which reads &quot;zero or more
whitespace characters&quot;.</p>
<p>The three string literals which make up the regex (two of them raw) will be
concatenated before the regex is compiled - in the same way that
adjacent string literals in C/C++ get joined together in an early
phase of compilation. I have split the string in this way to
emphasise its meaning.</p>
<p>The bizarre <tt class="literal">(?P&lt;inc_file&gt;\S+)</tt>
syntax creates a named group: essentially, it allows us to identify
the sub-group of a match object using &quot;<tt class=
"literal">inc_file</tt>&quot;.</p>
<p>So, the function looks for lines of the form:</p>
<pre class="programlisting">
#include &quot;inc_file&quot;
</pre>
<p>then calls <tt class="literal">mapIncludeFile(inc_file...)</tt>
to find what should now be included, and returns:</p>
<pre class="programlisting">
#include &quot;mapped_inc_file&quot;
</pre>
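<p>The match-and-splice step can be exercised on its own. The sketch
below reuses the regular expression from
<tt class="function">processSourceLine</tt> but, as a simplifying
assumption, substitutes a fixed replacement path taken from the
directory mapping instead of calling
<tt class="function">mapIncludeFile</tt>.</p>

```python
import re

# Adjacent string literals are joined before re.compile sees them.
include_re = re.compile(r'^\s*#\s*include\s*"'
                        r'(?P<inc_file>\S+)'
                        '"')

line = '#  include "UserIF/Wgts/Menu.hpp"\n'
match = include_re.match(line)

inc_file = match.group('inc_file')      # the text captured by the named group
start = match.start('inc_file')         # offsets of that text within line
end = match.end('inc_file')

# Splice a replacement path (taken from the article's mapping) into the line.
new_line = line[:start] + 'ui/widgets/Menu.hpp' + line[end:]

# A line without a double-quoted path does not match at all.
no_match = include_re.match('#include <vector>')
```

<p>Because only the characters between the group's start and end
offsets are replaced, any unusual spacing in the original directive
is carried over unchanged.</p>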
<p>Incidentally, I am assuming here that the angle brackets are
reserved for inclusion of standard library files - or at least not
the files we are moving. That is, we don't try and alter lines such
as:</p>
<pre class="programlisting">
#include &lt;vector&gt;
</pre></div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Pass Two:
Mapping Include Files</h2>
</div>
<pre class="programlisting">
import sys

def mapIncludeFile(inc, srcdst, files_map):
  &quot;&quot;&quot;Determine the remapped include file path.&quot;&quot;&quot;
  # First, obtain a path to the include file
  # relative to the original source root
  if os.path.dirname(inc):
    pass # Assumption 1) - &quot;inc&quot; is our
         # relative path
  else:
    # Assumption 2) The file must be located
    # in the same directory as the source file
    # which includes it.
    inc = os.path.join(
               os.path.dirname(srcdst[0]), inc)
  # Look up the new home for the file
  try:
    mapped_inc = files_map[inc]
    if (os.path.dirname(mapped_inc) ==
        os.path.dirname(srcdst[1])):
      mapped_inc = os.path.basename(mapped_inc)
  except KeyError:
    print 'Failed to locate [%s] (included ' \
          'by [%s]) ' \
          'relative to source root.' % (
          inc, srcdst[0])
    sys.exit(1)
  return mapped_inc
</pre>
<p>The function <tt class="function">mapIncludeFile</tt> is
actually quite simple, though only because of an assumption I have
made about the way include paths are used in this source tree. The
assumption is:</p>
<div class="blockquote">
<blockquote class="blockquote">
<p>All <tt class="literal">#include</tt> directives give a path
name relative to the root of the source tree, except when the
included file is present in the same directory as the source file -
in which case the file can be included directly by its basename.
Furthermore, there are no source files at the top-level of the
source tree (there are only directories at this level).</p>
</blockquote>
</div>
<p>For source trees with more complex include paths, and
correspondingly more subtle <tt class="literal">#include</tt>
directives, this function will need fairly heavyweight adaptation.
(Alternatively, run another script to simplify your include paths
first.)</p>
<p>If this assumption holds, we can easily determine the original
path to the included files, then use our files_map dictionary to
look up the new path. If the assumption doesn't hold, then the
dictionary look up will fail, raising a <tt class=
"literal">KeyError</tt> exception. The exception is caught, a
diagnostic printed, then the script exits with status 1.</p>
<p>We could test whether &quot;<tt class="literal">inc</tt>&quot; is a key in our
dictionary before attempting to use it; and if it were absent, we
could simply print an error and continue. However, we choose to view
such an absence as exceptional since it undermines the assumptions
made by the script. We do not wish to risk moving thousands of files
without being sure of what we're doing.</p>
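<p>The two policies can be compared in a small sketch. The function
names <tt class="literal">lookup_strict</tt> and
<tt class="literal">lookup_lenient</tt>, the
<tt class="literal">problems</tt> list and the one-entry dictionary
are all hypothetical; the strict version mirrors the approach taken
in this script, while the lenient version shows the alternative that
was rejected.</p>

```python
files_map = {'png/pngRead.h': 'graphics/thirdparty/png/pngRead.h'}
problems = []   # diagnostics gathered by the lenient policy

def lookup_strict(inc):
    """Treat a missing key as fatal, as the article's script does."""
    try:
        return files_map[inc]
    except KeyError:
        raise SystemExit('Failed to locate [%s]' % inc)

def lookup_lenient(inc):
    """Test membership first; record the problem and carry on."""
    if inc in files_map:
        return files_map[inc]
    problems.append('Failed to locate [%s]' % inc)
    return inc
```

<p>The lenient version would finish the run whatever happens, but it
could leave a relocated tree containing stale include paths, which
is precisely the risk the strict version avoids.</p>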
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Finishing
Touches</h2>
</div>
<p>The final script appears at the end of this article.</p>
<p>The Windows problem I mentioned earlier is caused by the
platform specific directory path separator. Unix uses forward
slashes, Windows backslashes. My chosen solution is to work with
&quot;normalised&quot; paths internally until we actually write out the
include files, when we make sure forward slashes are used as
separators.</p>
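<p>The separator difference can be reproduced on any machine,
because the standard library exposes both flavours of
<tt class="literal">os.path</tt> as the modules
<tt class="literal">posixpath</tt> and
<tt class="literal">ntpath</tt>. The following is a sketch only; the
helper <tt class="literal">swap_slashes</tt> is a simplified stand-in
for the <tt class="function">swapSlashes</tt> function in the final
script.</p>

```python
import ntpath      # the Windows flavour of os.path, importable anywhere
import posixpath   # the Unix flavour of os.path, importable anywhere

# join inserts the platform's own separator between components.
unix_joined = posixpath.join('platform/os', 'win32')
win_joined = ntpath.join('platform/os', 'win32')

# normpath converts forward slashes to backslashes on Windows only.
unix_norm = posixpath.normpath('os/hpux')
win_norm = ntpath.normpath('os/hpux')

def swap_slashes(text):
    """Return text with backslashes replaced by forward slashes."""
    return text.replace('\\', '/')
```

<p>On Windows the joined path contains a backslash, so comparing it
with a literal written using forward slashes fails; normalising both
the keys and the values of the directory map removes the
mismatch.</p>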
<p>As a gesture towards user-friendliness I have added an
indication of progress and an output log. However, this script
remains very much for software developers who understand what it's
doing. I have chosen &quot;sys.stdout.write&quot; in preference to the
&quot;print&quot; statement used during the script's development, since it
gives greater control over output format.</p>
<p>I have not done anything special with Makefiles, project files,
Jamfiles - or whatever else you use with your build system. There
will be an order of magnitude fewer of these to deal with (unless
you have a very strange build system), but the same techniques
apply.</p>
<p>The script as it stands has several weaknesses. It is, of
course, suited to doing a very specific job: solving the exact
problem laid out at the start of this article, right down to the
specified directory mapping. Whilst it would be overkill to provide
a GUI allowing users to enter this mapping, this input data could
usefully be separated from the body of the script. Similarly, as
already mentioned, the script makes some big assumptions about the way
include paths are used in this particular system. Finally, there
are no unit tests - the only testing has been a rather ad hoc
probing of functions during the script's development.</p>
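<p>One way the directory mapping might be separated from the script
is to keep it in a plain text file using the same arrow notation as
the table at the start of this article. The sketch below is an
assumption about how that could look, not part of the script: the
function name <tt class="literal">parse_dirmap</tt>, the comment
syntax and the sample text are all hypothetical.</p>

```python
def parse_dirmap(text):
    """Build a directory map from lines of the form 'old -> new'."""
    dirmap = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue                       # skip blanks and comments
        old, new = [part.strip() for part in line.split('->')]
        dirmap[old] = new
    return dirmap

# Hypothetical file contents, reusing entries from the article's table.
sample = """
# old location -> new location
png         -> graphics/thirdparty/png
UserIF/Wgts -> ui/widgets
"""
dirmap = parse_dirmap(sample)
```

<p>A malformed line makes the two-name unpacking fail with a
<tt class="literal">ValueError</tt>, which fits the script's general
policy of stopping rather than guessing.</p>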
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Concluding
Thoughts</h2>
</div>
<p>This sort of bulk re-arrangement of files is well suited to a
scripted solution for reasons of:</p>
<div class="variablelist">
<dl>
<dt><span class="term">Reliability:</span></dt>
<dd>
<p>The script can be shown to work by unit tests and by system
tests on small data sets. Then it can be left to do its job.</p>
</dd>
<dt><span class="term">Efficiency:</span></dt>
<dd>
<p>Editing dozens - perhaps hundreds - of files by hand is error
prone and tedious. What's worse, unless some moratorium on checkins
has been imposed during the restructure, the new structure may be
out of date before it is ready. A script can process megabytes of
source in minutes. Alternatively, it will be happy to run at night
after even the most nocturnal of programmers has logged out.</p>
</dd>
<dt><span class="term">Recordability:</span></dt>
<dd>
<p>The script becomes part of the source tree (perhaps in the
tools/scripts directory), in which place it can record accurately
and repeatably the tasks it performs.</p>
</dd>
<dt><span class="term">Reusability:</span></dt>
<dd>
<p>A well-written script can be amended and enhanced to solve
future source re-organisations. And even if it can't be re-used, a
knowledge of scripting can.</p>
</dd>
</dl>
</div>
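<p>As an indication of what such unit tests might look like, here is
a sketch using the standard <tt class="literal">unittest</tt> module.
It assumes a copy of <tt class="function">mapDirectory</tt> pinned to
<tt class="literal">posixpath</tt>, so that the results do not depend
on the platform running the tests; the test class, its method names
and the <tt class="literal">run_tests</tt> helper are
hypothetical.</p>

```python
import posixpath
import unittest

def mapDirectory(dirmap, dname):
    """Copy of the article's function, pinned to posixpath for repeatable tests."""
    mapped_dir = p = dname
    while p and p not in dirmap:
        p = posixpath.dirname(p)
    if p:
        mapped_dir = posixpath.join(dirmap[p], dname[len(p) + 1:])
    return mapped_dir

class MapDirectoryTest(unittest.TestCase):
    dirmap = {'os': 'platform/os', 'os/hpux': 'platform/os/hpux10'}

    def test_longest_prefix_wins(self):
        self.assertEqual(mapDirectory(self.dirmap, 'os/hpux/include'),
                         'platform/os/hpux10/include')

    def test_parent_mapping_applies(self):
        self.assertEqual(mapDirectory(self.dirmap, 'os/win32'),
                         'platform/os/win32')

    def test_unmapped_directory_is_unchanged(self):
        self.assertEqual(mapDirectory(self.dirmap, 'unittests'), 'unittests')

def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(MapDirectoryTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

<p>The three cases correspond to the assertions used while developing
the first pass; gathering them into a test case means they can be
re-run whenever the mapping logic changes.</p>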
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Final
Script</h2>
</div>
<pre class="programlisting">
import os
import re
import shutil
import sys

def mapDirectory(dirmap, dname):
  &quot;&quot;&quot;Return new location of the input directory.&quot;&quot;&quot;
  # Successively reduce the directory path until it
  # matches one of the keys in the directory map.
  mapped_dir = p = dname
  while p and not p in dirmap:
    p = os.path.dirname(p)
  if p:
    mapped_dir = os.path.join(dirmap[p], dname[len(p) + 1:])
  return mapped_dir

def mapFiles(logfp, dirmap, dname, files):
  &quot;&quot;&quot;Return a dictionary mapping files in dir to their new locations.&quot;&quot;&quot;
  dname = os.path.normpath(dname)
  new_dir = mapDirectory(dirmap, dname)
  logfp.write(&quot;mapDirectory [%s] -&gt; [%s]\n&quot; % (dname, new_dir))
  fm = {}
  for f in files:
    src = os.path.join(dname, f)
    dst = os.path.join(new_dir, f)
    logfp.write(&quot;\t[%s] -&gt; [%s]\n&quot; % (src, dst))
    fm[src] = dst
  return fm

def isSourceFile(file):
  &quot;&quot;&quot;Return True if the input file is a C/C++ source file.&quot;&quot;&quot;
  src_re = re.compile(r'\.(c|h)(pp)?$', re.IGNORECASE)
  return src_re.search(file) is not None

def swapSlashes(str):
  &quot;&quot;&quot;Return the input string with backslashes swapped to forward slashes.&quot;&quot;&quot;
  back_to_fwd_re = re.compile(r'\\')
  return back_to_fwd_re.sub('/', str)

def mapIncludeFile(logfp, inc, srcdst, files_map):
  &quot;&quot;&quot;Determine the remapped include file path.&quot;&quot;&quot;
  # First, obtain a path to the include file
  # relative to the original source root
  if os.path.dirname(inc):
    pass # Assumption 1) - &quot;inc&quot; is our relative path
  else:
    # Assumption 2) The file must be located in the
    # same directory as the source file which
    # includes it.
    inc = os.path.join(os.path.dirname(srcdst[0]), inc)
    inc = os.path.normpath(inc)
  # Look up the new home for the file
  try:
    mapped_inc = files_map[inc]
  except KeyError:
    err_msg= ('\nFatal error: Failed to locate [%s] '
              '(included by [%s]) '
              'relative to source root.' %
              (inc, srcdst[0]))
    logfp.write(err_msg)
    sys.stderr.write(err_msg)
    sys.exit(1)
  if (os.path.dirname(mapped_inc) == os.path.dirname(srcdst[1])):
    mapped_inc = os.path.basename(mapped_inc)
  return mapped_inc

def processSourceLine(logfp, line, srcdst, files_map):
  &quot;&quot;&quot;Process a line from a source file, correcting included file paths.&quot;&quot;&quot;
  include_re = re.compile(r'^\s*#\s*include\s*&quot;'
                          r'(?P&lt;inc_file&gt;\S+)'
                          '&quot;')
  match = include_re.match(line)
  if match:
    logfp.write(' [%s] -&gt; ' % line.rstrip())
    mapped_inc_file = mapIncludeFile(logfp,
                                     match.group('inc_file'),
                                     srcdst,
                                     files_map
                                    )
    line = line[:match.start('inc_file')] + mapped_inc_file + line[match.end('inc_file'):]
    line = swapSlashes(line)
    logfp.write('[%s]\n' % line.rstrip())
  return line

def relocateSource(logfp, srcdst, dstfile, files_map):
  &quot;&quot;&quot;Relocate a source file, correcting paths to included files.&quot;&quot;&quot;
  # The with statement closes both files even if
  # processing a line fails part-way through
  with open(srcdst[0], 'r') as infp, open(dstfile, 'w') as outfp:
    logfp.write('Relocating source file [%s] -&gt; [%s]\n' % (srcdst[0], dstfile))
    for line in infp:
      outfp.write(processSourceLine(logfp, line, srcdst, files_map))

def relocate(logfp, srcdst, dst_root, files_map):
  &quot;&quot;&quot;Relocate a file, correcting internal file references.&quot;&quot;&quot;
  dst_file = os.path.join(dst_root, srcdst[1])
  dst_dir = os.path.dirname(dst_file)
  if not os.path.isdir(dst_dir):
    os.makedirs(dst_dir)
  if isSourceFile(dst_file):
    relocateSource(logfp, srcdst, dst_file, files_map)
  else:
    logfp.write('Copying [%s] -&gt; [%s]\n' % (srcdst[0], dst_file))
    shutil.copyfile(srcdst[0], dst_file)

def percent(num, denom):
  &quot;&quot;&quot;Return num / denom expressed as an integer percentage.&quot;&quot;&quot;
  return int(num * 100 / denom)

def printSettings(fp, dmap, src, dst):
  &quot;&quot;&quot;Output script settings to the given file.&quot;&quot;&quot;
  fp.write('Relocating source tree from [%s] '
           'to [%s]\n' %
           (src, dst))
  fp.write('Relocating directories:\n')
  for d in dmap:
    fp.write(' [%s] -&gt; [%s]\n' % (d, dmap[d]))
  fp.write('\n')

# Main processing starts here...
# First, set up script data:
# - a dictionary mapping existing dirs to
# their new locations,
# - the source and destination roots for
# the source tree,
np = os.path.normpath
dirmap = {
  np('png') : np('graphics/thirdparty/png'),
  np('jpeg') : np('graphics/thirdparty/jpeg'),
  np('bitmap') : np('graphics/common/bitmap'),
  np('UserIF') : np('ui'),
  np('UserIF/Wgts') : np('ui/widgets'),
  np('os') : np('platform/os'),
  np('os/hpux') : np('platform/os/hpux10')
}

from_root = '.'
to_root = '../relocate'

# Further initialisation.
logfp = open('relocate.log', 'w')
printSettings(logfp, dirmap, from_root, to_root)
printSettings(sys.stdout, dirmap, from_root, to_root)

# Initialise a dictionary to map current file path
# to new file path.
files_map = {}

# Fill the dictionary by remapping all files beneath
# the current working directory.
sys.stdout.write('Preprocessing files. '
                 'Please wait.\n')

for (subdir, dirs, files) in os.walk(from_root):
  files_map.update(
    mapFiles(logfp, dirmap, subdir, files)
  )

# Create the new root directory for relocated files
os.makedirs(to_root)

# Now actually perform the relocation
count = len(files_map)
item = 0
sys.stdout.write('Relocating [%d] files.\n'
                 'Logfile at [%s]\n'
                 'Progress [%02d%%]' %
                 (count, logfp.name, percent(item, count))
                 )

for srcdst in files_map.items():
  item += 1
  sys.stdout.write('\b\b\b\b%02d%%]' % percent(item, count))
  relocate(logfp, srcdst, to_root, files_map)

logfp.close()
sys.stdout.write('\nRelocation completed '
                 'successfully.\n')
</pre></div>
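<p>The include-rewriting step from the listing can be tried in isolation. What follows is a minimal sketch, not part of the original script: the function name rewrite_include, the sample file names and the small files_map are hypothetical, and it assumes posixpath rather than os.path so that it behaves the same on any platform (which is also why no slash swapping is needed here).</p>

```python
import posixpath
import re

# Same pattern shape as the listing, using a numbered group
# in place of the named one.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"(\S+)"')

def rewrite_include(line, src_path, dst_path, files_map):
    """Return line with any quoted include path remapped via files_map."""
    match = INCLUDE_RE.match(line)
    if not match:
        return line
    inc = match.group(1)
    if not posixpath.dirname(inc):
        # A bare file name is assumed to sit beside the including file.
        inc = posixpath.join(posixpath.dirname(src_path), inc)
    inc = posixpath.normpath(inc)
    mapped = files_map[inc]
    if posixpath.dirname(mapped) == posixpath.dirname(dst_path):
        # Same destination directory: the bare file name is enough.
        mapped = posixpath.basename(mapped)
    return line[:match.start(1)] + mapped + line[match.end(1):]

# Hypothetical file map, loosely modelled on the dirmap entries above.
files_map = {
    'UserIF/Wgts/button.h': 'ui/widgets/button.h',
    'UserIF/Wgts/button.c': 'ui/widgets/button.c',
    'bitmap/image.h': 'graphics/common/bitmap/image.h',
}
src = 'UserIF/Wgts/button.c'
dst = files_map[src]

print(rewrite_include('#include "button.h"\n', src, dst, files_map), end='')
print(rewrite_include('#include "bitmap/image.h"\n', src, dst, files_map), end='')
print(rewrite_include('int main(void);\n', src, dst, files_map), end='')
```

<p>With these sample entries, an include of a file that lands in the same destination directory keeps its bare name, an include that crosses directories is rewritten to its full new path, and lines that are not includes pass through unchanged.</p>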
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Notes and References</h2>
</div>
<div class="orderedlist">
<ol type="1">
<li>
<p>I used the PythonWin IDE (available with the ActivePython
distribution) while developing this script, and can recommend it to
Windows users. The context-sensitive help is great, the debugger is
good, and the key bindings, incredibly, include my favourites
from both GNU Emacs and Microsoft Visual Studio. ActivePython:
<a href="http://www.activestate.com/Products/ActivePython/" target=
"_top">http://www.activestate.com/Products/ActivePython/</a></p>
</li>
<li>
<p>Thanks to Dan Tallis for reviewing an earlier draft of this
article, and for inspiring me to learn Python.</p>
</li>
</ol>
</div>
<div class="bibliography">
<div class="bibliomixed"><a name="python" id="python"></a>
<p class="bibliomixed">[python] <span class="bibliomisc"><a href=
"http://www.python.org" target="_top">http://www.python.org</a> -
the official website for the Python language.</span></p>
</div>
</div>
</p>
</div>
</channel>
</rss>
