    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Adding Python 3 Compatibility to Python 2 Code</title>
        <link>https://members.accu.org/index.php/journals/2762</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>


        <h2>Journal Articles</h2>


<div class="xar-mod-head"><span class="xar-mod-title">CVu Journal Vol 32, #1 - March 2020 + Programming Topics</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Adding Python 3 Compatibility to Python 2 Code</h1>
<p><strong>Author:</strong>&nbsp;Bob Schmidt</p>
<p>
<strong>Date:</strong> Wed, 04 March 2020 23:05:11 +00:00</p>
<p><strong>Summary:</strong>&nbsp;Silas S. Brown explains how to cope with the differences.</p>
<p><strong>Body:</strong>&nbsp;<p>When Python 3 was new, its pace of change was fairly quick, and as most of us didn’t want to spend too long rewriting our code to adapt to every new release, we carried on using the far more stable Python 2. Now that Python 2 is being thrown out of GNU/Linux distributions, we’re finally having to convert all our code to Python 3 (unless we want to compile Python 2 in our home directories and just hope no more security issues arise, although that approach is not possible in every situation), and Python’s ‘2to3’ tool does not help with everything (I don’t use it, as in my case it did more harm than good to my code). Since I have a lot of legacy Python code and I’d rather work with ‘stable intermediate forms’, I have been trying to convert as much of it as possible to work on both Python 2 and Python 3 from the same codebase. But this dual compatibility comes with caveats of its own.</p>

<h2>Byte-strings</h2>

<p>In Python 2, the default string type is a byte-string, and Unicode strings are something else. But a Unicode string containing only ASCII will compare as equal to the same ASCII in a byte-string, and the index operator <code>[]</code> gives a string of length 1 for both byte-strings and Unicode strings. In Python 3, however, the default string type is Unicode (and the representation for byte-string literals is not compatible with all versions of Python 2). More subtly, a Unicode string containing only ASCII will <em>not</em> be considered equal to its equivalent byte-string, and the index operator <code>[]</code> on a byte-string gives an integer: if you want a string of length 1, use slice notation, i.e. <code>s[i:i+1]</code> instead of <code>s[i]</code>. Since the slice-notation version behaves identically in Python 2 and Python 3, I suggest converting all single-index operators to that, and making sure as much of your code as possible works regardless of whether it’s given byte-strings or Unicode strings as input, using <code>type</code> if necessary to determine the type of its input. But remember that <code>str</code> means different things on the two platforms; a quick way of checking if we’re on Python 3 is to check if <code>type(&quot;&quot;)==type(u&quot;&quot;)</code>.</p>
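<p>The indexing and comparison differences described above can be seen in a minimal sketch like the following. It assumes a Python version that accepts both <code>b&quot;&quot;</code> and <code>u&quot;&quot;</code> literals (Python 2.6+ or 3.3+); the variable names are purely illustrative.</p>

```python
# Detect the interpreter in the way the text suggests: on Python 3
# the default string type is already Unicode.
PY3 = type("") == type(u"")

data = b"abc"

# Indexing differs: Python 2 gives the byte-string b"a",
# Python 3 gives the integer 97.
first_by_index = data[0]

# Slicing gives a length-1 byte-string on both versions.
first_by_slice = data[0:1]

# ASCII-only byte-strings and Unicode strings compare equal
# on Python 2 but not on Python 3.
ascii_compares_equal = (b"abc" == u"abc")
```

<p>Only <code>first_by_slice</code> has the same value on both versions, which is why slices are the safer choice in dual-version code.</p>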

<p>Code that mentions <code>encode('utf-8')</code> or <code>decode('utf-8')</code> will particularly need attention (and even more so if other character sets are in use). I also find it useful to define some small helper functions to ‘make sure this thing is a byte-string’ (calling <code>.encode</code> if it’s Unicode) or ‘make sure this thing is a Unicode string’ (calling <code>.decode</code> if it’s a byte-string); sometimes these are best written so that non-string objects are passed through unchanged. String operations like <code>.replace</code> (and the regex library) can work on both Unicode strings and byte-strings, but they’ll fault if their parameters are of inconsistent types (e.g. <code>b.replace(x,y)</code> where <code>b</code> is a byte-string and <code>x</code> and <code>y</code> are Unicode strings will fail), so those ‘make sure this thing is a’ helper functions can be especially useful for porting regex-related code.</p>
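<p>Helper functions of the kind described above might look something like this sketch. The names <code>to_bytes</code>, <code>to_unicode</code> and <code>replace_bytes</code> are hypothetical, and UTF-8 is assumed as the default encoding; adjust both to suit your own code.</p>

```python
def to_bytes(s, encoding="utf-8"):
    """Make sure this thing is a byte-string (non-strings pass through)."""
    if isinstance(s, type(u"")):   # unicode on Python 2, str on Python 3
        return s.encode(encoding)
    return s

def to_unicode(s, encoding="utf-8"):
    """Make sure this thing is a Unicode string (non-strings pass through)."""
    if isinstance(s, bytes):       # bytes is an alias of str on Python 2.6+
        return s.decode(encoding)
    return s

def replace_bytes(b, old, new):
    """Byte-string replace that tolerates Unicode arguments."""
    return to_bytes(b).replace(to_bytes(old), to_bytes(new))
```

<p>For example, <code>replace_bytes(b&quot;a-b&quot;, u&quot;-&quot;, u&quot;+&quot;)</code> works on both versions, whereas calling <code>.replace</code> directly with those mixed types fails on Python 3.</p>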

<p>Another thing to be aware of is that file I/O (and <code>stdin</code>, <code>stdout</code> and <code>stderr</code>) might or might not be done in UTF-8 by default: it depends on your system’s locale. When you have the luxury of a GNU/Linux system that’s set to UTF-8 by default, it’s easy to forget that the Microsoft Windows platform has an annoying habit of setting the locale charset to something other than UTF-8, and even some Linux-based environments (such as containers) use the ‘C’ locale instead, in which case Python 3’s I/O (when not done in binary mode) will fault on anything that isn’t ASCII. To work around this from inside your script (i.e. if setting up the right environment variables before Python runs is not an option), the easiest way is probably to write code like Listing 1. Obviously you should do this only if you know for sure that the input and output really should be in UTF-8 and the system’s locales are simply not set up properly.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
import sys

if type(&quot;&quot;)==type(u&quot;&quot;): # Python 3+
  import codecs
  # Make sure stdin and stdout are set to UTF-8,
  # even if the system's locales don't have 
  # UTF-8.
  stdin=codecs.getreader(&quot;utf-8&quot;)(
    sys.stdin.buffer)
  stdout=codecs.getwriter(&quot;utf-8&quot;)(
    sys.stdout.buffer)
  # Keep the original streams so they stay open
  old_stdin, sys.stdin = sys.stdin, stdin
  old_stdout, sys.stdout = sys.stdout, stdout
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 1</td>
	</tr>
</table>

<h2>Numbers</h2>

<p>In Python 2, division of two integers is an integer operation just as it is in C. But in Python 3, dividing two integers with <code>/</code> produces a floating-point number, and if you want an integer result you must ask for it explicitly (with the <code>//</code> operator, which also exists in Python 2). This likely means many of the divisions in your code will need some attention. Also, the <code>L</code> suffix for long integers has been removed; if you want compatibility with early versions of Python 2 (which required <code>L</code>) and also Python 3, you’ll probably have to reach these numbers by multiplying up or similar, and may also have to detect the Python version at runtime and go down different branches as appropriate.</p>
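<p>A small sketch of divisions written so that they give the same result on both versions follows. The function and variable names are illustrative only, and the large constant is simply one way of ‘multiplying up’ instead of writing a literal with an <code>L</code> suffix.</p>

```python
def midpoint_index(length):
    # Floor division gives an integer on both Python 2 and Python 3.
    return length // 2

# A float operand forces true division on both versions.
true_quotient = 7 / 2.0

# Explicit floor division: same integer result on both versions.
floor_quotient = 7 // 2

# A large integer reached by multiplication, with no L suffix.
big = 1024 * 1024 * 1024 * 1024
```

<p>Writing <code>//</code> wherever an integer result is intended also documents that intent, which helps when reviewing old code whose divisions were never annotated.</p>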

<h2>Standard output and error</h2>

<p>Python 3 of course makes <code>print</code> into a function which requires parentheses (and I still don’t understand why that change gets more attention than the byte-strings change, but perhaps I do more work with Unicode than most English developers do). <code>print</code> with parentheses will also work in Python 2, but if it is given more than one argument, Python 2 treats the parenthesised arguments as a tuple and prints that, which is probably not what you want. To stay compatible with both versions, restrict <code>print</code> to one argument and use format strings or construct the string manually (but remember to account for Unicode string / byte-string differences in Python 3). Also of note is that Python 2 code containing <code>print</code> by itself for a blank line will need to be written as <code>print()</code> in Python 3, or <code>print(&quot;&quot;)</code> for compatibility with both versions.</p>
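<p>As a minimal sketch of the single-argument approach (the variable names here are illustrative), the string is built first and then handed to <code>print</code> as its only argument:</p>

```python
name, count = "items", 3

# One argument built with a format string: Python 2 sees a parenthesised
# string, Python 3 sees a function call; both print the same text.
line = "%s: %d" % (name, count)
print(line)

# A blank line on both versions.
print("")
```

<p>By contrast, <code>print(name, count)</code> would show a tuple on Python 2 and two space-separated values on Python 3.</p>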

<p>You might prefer to use <code>sys.stdout</code>, and/or <code>sys.stderr</code> for the ‘standard error’ stream (which is a separate stream if your program’s standard output has been redirected to a file or pipe). But another difference between Python 2 and Python 3 is that, in Python 3, <code>sys.stderr</code> is buffered in the same way as <code>sys.stdout</code> is, i.e. the output won’t happen until you call <code>sys.stderr.flush()</code> or output a newline. If this matters, you might need to add some calls to <code>sys.stderr.flush()</code> that are unnecessary (but harmless) in Python 2.</p>
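<p>One way to avoid scattering flush calls through the code is a small wrapper such as the sketch below; <code>progress</code> is a hypothetical name, not something from the standard library.</p>

```python
import sys

def progress(message):
    # Write a partial line to standard error and flush at once, so it
    # appears immediately on Python 3 as well as on Python 2.
    sys.stderr.write(message)
    sys.stderr.flush()

progress("working...")
progress(" done\n")
```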

<p>In Python 3, reading from and writing to files opened in text mode automatically converts to/from Unicode strings; if you want bytes, you must either open the file in binary mode (<code>rb</code> or <code>wb</code>) or else use the file’s <code>.buffer</code> member (which is not present on Python 2, so you’ll have to write an <code>if</code>-<code>else</code> branch depending on the Python version). Note that holding on to <code>.buffer</code> alone does not keep the file open: you must keep a reference to the file itself, not just its buffer, or you’ll find it has been automatically closed.</p>
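<p>The sketch below shows both routes to bytes. Note that it uses <code>getattr</code> with a fallback rather than an explicit version check, which is a substitution on my part; <code>byte_stream</code> is a hypothetical helper name and the temporary file is only there to make the example self-contained.</p>

```python
import os
import tempfile

def byte_stream(f):
    # Python 3 text files expose the underlying byte stream as .buffer;
    # Python 2 file objects have no such attribute and already take bytes.
    return getattr(f, "buffer", f)

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
payload = u"caf\xe9\n".encode("utf-8")

f = open(path, "w")          # keep this reference while using raw
raw = byte_stream(f)
raw.write(payload)
f.close()                    # closing the file also closes its buffer

with open(path, "rb") as g:  # binary mode yields bytes on both versions
    round_trip = g.read()
```

<p>Keeping <code>f</code> in a variable until the write is finished is what prevents the file from being closed underneath <code>raw</code>.</p>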

<h2>Library changes</h2>

<p>There are too many standard library changes between Python 2 and Python 3 to list here. In some cases it’s just a matter of importing a different module, and you can have <code>if</code>-<code>else</code> branches in your imports to maintain compatibility with both versions. For example, <code>commands.getoutput</code> now needs to be <code>subprocess.getoutput</code>, <code>thread</code> now needs to be <code>_thread</code>, and various HTML-related and urllib-related libraries may need importing differently. But there are other libraries with more substantial changes, e.g. the <code>email</code> module works completely differently in Python 3 (my IMAP-processing code is still stuck in Python 2 for this reason); some usage of <code>StringIO</code> might need to be <code>BytesIO</code> on Python 3 (and now imported from <code>io</code>); some exceptions have been renamed and might need assigning for compatibility; and version 6 of the third-party Tornado library has completely changed the way it does callbacks and <code>IOLoop</code> (although I managed to make Web Adjuster compatible with both versions by writing some fancy decorators).</p>
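<p>For the simple renames mentioned above, the import branches might be sketched as follows. The <code>PY3</code> flag is an illustrative name, and the sketch assumes Python 2.6 or later, where <code>io.BytesIO</code> is already available.</p>

```python
PY3 = type("") == type(u"")

if PY3:
    from subprocess import getoutput
    import _thread as thread
else:
    from commands import getoutput
    import thread

# io.BytesIO can be imported the same way on both versions.
from io import BytesIO

buf = BytesIO()
buf.write(b"abc")
```

<p>After these imports the rest of the code can refer to <code>getoutput</code>, <code>thread</code> and <code>BytesIO</code> without caring which version it is running on.</p>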

<p>Some built-in functions are also no longer available in Python 3, so you might have to write things like:</p>

<pre class="programlisting">
  try: unichr # Python 2
  except NameError: unichr,xrange = chr,range # Python 3</pre>
  
<p>to keep your code compatible. Also, some things that used to return lists now return iterators, and if you want a list you must explicitly ask for one, so for example you can no longer say:</p>

<pre class="programlisting">
  Unicode_Greek_letters = (range(0x3b1,0x3ca)
    + range(0x391,0x3aa)) # wrong: TypeError in Python 3
</pre>

<p>you’ll have to say <code>list(range())</code> instead. Most notably, <code>.items()</code> no longer returns a list: some Python 2 code will assume that it does, and will assume that the dictionary from which it was taken may be changed without adverse effect on the <code>.items()</code> list it has (this is now likely to raise an exception if used in a loop), so you may wish to wrap all use of <code>.items()</code> in <code>list()</code> to help port this.</p>
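<p>Both points can be illustrated with a short sketch that behaves the same on Python 2 and Python 3; the dictionary contents are made up for the example.</p>

```python
# Wrapping each range in list() makes the concatenation valid on both versions.
greek_letters = list(range(0x3b1, 0x3ca)) + list(range(0x391, 0x3aa))

counts = {"a": 1, "b": 0, "c": 2}

# Iterate over a snapshot so the dictionary can be modified inside the loop.
for key, value in list(counts.items()):
    if value == 0:
        del counts[key]
```

<p>Without the <code>list()</code> around <code>counts.items()</code>, deleting a key inside the loop raises a <code>RuntimeError</code> on Python 3.</p>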

<p>Also, the <code>sort()</code> method and <code>sorted()</code> function have changed: they no longer take comparison functions, only key functions. Python 2 <code>sort()</code> can also take <code>key=</code>, so if you can rewrite all your comparison functions as key functions, i.e. functions that return the ‘equivalent value’ of a single item for sorting purposes, then you can write this in a way that’s compatible with both 2 and 3.</p>
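<p>As a sketch of what rewriting a comparison function as a key function can look like (the word list is invented for the example), a tuple-valued key covers the common case of comparing on one property and breaking ties on another:</p>

```python
words = ["pear", "Apple", "fig", "kiwi"]

# Python 2 only: words.sort(cmp=lambda a, b: cmp(len(a), len(b)))
# Portable equivalent: the key is the value each item is compared by.
by_length = sorted(words, key=len)

# Case-insensitive ordering via a key function.
case_insensitive = sorted(words, key=lambda w: w.lower())

# Several criteria: return a tuple, which is compared element by element.
by_length_then_alpha = sorted(words, key=lambda w: (len(w), w.lower()))
```

<p>Where a comparison function really cannot be expressed as a key, <code>functools.cmp_to_key</code> (available from Python 2.7 and 3.2) can wrap it, at the cost of dropping older Python 2 versions.</p>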

<p>There are many other subtle changes, and you will need to test the code carefully in both versions of Python before considering it compatible with both. But the above changes were the most important ones to make in my code so far.</p>

<h2>Summary</h2>

<p>The most likely places that will need amending are:</p>

<ol>
	<li>Anywhere where Unicode is converted to/from UTF-8, or where files are written/read</li>
	<li>Any <code>[]</code> index operators that might be applied to byte strings (use slices for maximum compatibility)</li>
	<li>Any use of <code>.replace</code> or <code>re.sub</code> (make sure it’s all the same type)</li>
	<li>Any divisions (should we take the integer?)</li>
	<li><code>print</code> and <code>import</code> statements</li>
	<li>Any writes to <code>sys.stderr</code> (do we need to flush?)</li>
	<li>Any use of <code>.items()</code> (does it need to be put into a <code>list()</code> now?), and <code>sort()</code> with comparison function</li>
</ol>

<p>As always, good test coverage is the most important thing, and you may have to go through several iterations before it works.</p>

<p class="bio"><span class="author"><b>Silas S. Brown</b></span> is a partially-sighted Computer Science post-doc in Cambridge who currently works in part-time assistant tuition and part-time for Oracle. He has been an ACCU member since 1994.</p>
</p>
</div>
</channel>
</rss>
