    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: A Python project (2)</title>
        <link>https://members.accu.org/index.php/journals/1213</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">CVu Journal Vol 15, #3 - Jun 2003 + Programming Topics</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;A Python project (2)</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> 03 June 2003 13:15:57 +01:00</p>
<p><strong>Summary:</strong>&nbsp;<p>This article continues my description of the Python code in a small project designed to help increase one's vocabulary in a foreign language by playing audio samples.</p></p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e22" id="d0e22"></a></h2>
</div>
<p>This article continues my description of the Python code in a
small project designed to help increase one's vocabulary in a
foreign language by playing audio samples.</p>
<p>Some readers might have noticed that the <tt class=
"literal">__init__</tt> method (i.e. the constructor) of the class
<tt class="literal">Lesson</tt> set two variables <tt class=
"literal">self.newWords</tt> and <tt class=
"literal">self.oldWords</tt> which were not explained in the text.
This was an oversight on my part due to having rushed the article
to publication (to fill a gap). The original code kept track of the
number of new and old words added to each lesson so that it could
give the user a textual report, but in the interests of brevity I
deleted this functionality (among other things) when writing the
article. But I overlooked the initialisation of those two
variables, which were now no longer needed.</p>
<p>To continue from where we left off last time, we need to
construct some <tt class="literal">GluedEventLists</tt> for the
<tt class="literal">Lesson</tt>. Each list will correspond to the
revision of one word at increasing intervals as previously
described; each time the word is revised, it will be prompted for
and then repeated one or more times, with appropriate delay to
allow the user to anticipate the response. (In other words, the
computer should play recordings such as &quot;please say&quot;, word in
English, pause, word in Chinese; if the word is relatively new then
this might be repeated.) These clusters of &quot;anticipation events&quot;
will be separated with &quot;glue&quot; so that the whole sequence can be
added to the <tt class="literal">Lesson</tt> and interspersed with
other sequences (but not interspersed in such a way that the
individual anticipation events are interfered with).</p>
<p>Here is a function that returns the name of a randomly-chosen
prompt (such as &quot;please say&quot;), along with an integer specifying
whether the prompt should precede (1) or follow (0) the
first-language word. The function is hard-coded so that the first
time a word is introduced it will say &quot;listen and repeat&quot; and the
second time it says &quot;say again&quot;; subsequent occurrences are chosen
at random from the rest of the list.</p>
<pre class="programlisting">
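def randomInstruction(numTimesBefore):
  # Returns (prompt file name, instrIsPrefix): 1 = play the prompt
  # before the first-language word, 0 = play it afterwards
  if not numTimesBefore: return (&quot;repeatAfterMe.wav&quot;,0)
  if numTimesBefore==1: return (&quot;sayAgain.wav&quot;,1)
  # Third and later occurrences: pick one of these at random
  return random.choice([
    (&quot;whatSay.wav&quot;,0),
    (&quot;pleaseSay.wav&quot;,1),
    (&quot;nowPleaseSay.wav&quot;,1),
    (&quot;tryToSay.wav&quot;,1),
    ])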
</pre>
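<p>The hard-coded cases are easy to check in isolation. Here is a self-contained Python 3 sketch of the same function; the extra <tt class="literal">rng</tt> parameter is a hypothetical addition (not in the original script) so that the random choice can be made repeatable. As in <tt class="literal">anticipation</tt>, the second element of each pair is treated as <tt class="literal">instrIsPrefix</tt>.</p>

```python
import random

# Prompts for the third and later occurrences of a word:
# (file name, instrIsPrefix) where 1 = play before the English word, 0 = after it
LATER_PROMPTS = [
    ("whatSay.wav", 0),
    ("pleaseSay.wav", 1),
    ("nowPleaseSay.wav", 1),
    ("tryToSay.wav", 1),
]

def randomInstruction(numTimesBefore, rng=random):
    """Return (promptFileName, instrIsPrefix) for a word heard numTimesBefore times."""
    if numTimesBefore == 0:
        return ("repeatAfterMe.wav", 0)
    if numTimesBefore == 1:
        return ("sayAgain.wav", 1)
    return rng.choice(LATER_PROMPTS)

# The first two occurrences are fixed; later ones vary.
rng = random.Random(1)
print(randomInstruction(0), randomInstruction(1))
print(sorted({randomInstruction(5, rng) for _ in range(200)}))
```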
<p>Now for a function that returns a single item in an
&quot;anticipation sequence&quot;, given the first-language word (<tt class=
"literal">promptFile</tt>) and second-language word (<tt class=
"literal">zhFile</tt>) and the number of previous repetitions of
that word. The result is returned as a <tt class=
"literal">CompositeEvent</tt> made up of all appropriate recordings
and pauses; such <tt class="literal">CompositeEvent</tt>s will not
be interleaved with others, but the <tt class="literal">Glue</tt>
that intersperses them will be. Note that the variable <tt class=
"literal">numTimesBefore</tt> means the number of times that the
word has ever been repeated in the entire &quot;course&quot; (not just the
current <tt class="literal">Lesson</tt>); the counts persist
between sessions. The hard-coded numbers of repetitions are
arbitrary and experimental (you can sometimes get away with that
sort of thing in Python scripts).</p>
<pre class="programlisting">
def anticipation(promptFile,zhFile,numTimesBefore=0):
  instruction, instrIsPrefix = randomInstruction(numTimesBefore)
  instruction = os.path.join(promptsDirectory,instruction)
  zhFile = os.path.join(samplesDirectory,zhFile)
  promptFile = os.path.join(samplesDirectory,promptFile)
  # Pause of one second plus the length of the second-language recording
  secondPause = 1+WavEvent(zhFile).length
  # A brand-new word gets only a 1-second pause before the answer
  if not numTimesBefore: anticipatePause = 1
  else: anticipatePause = secondPause
  # Newer words are repeated more often (experimental numbers)
  if numTimesBefore == 1: numRepeat = 3
  elif numTimesBefore &lt; 5: numRepeat = 2
  elif numTimesBefore &lt; 10: numRepeat=random.choice([1,2])
  else: numRepeat = 1
  pauseAfter = random.choice([1,2,3])
  # Now ready to go (named events to avoid hiding the built-in list)
  events = []
  if instrIsPrefix: events.append(WavEvent(instruction))
  events.append(WavEvent(promptFile))
  if not instrIsPrefix: events.append(WavEvent(instruction))
  for i in range(numRepeat):
    events.append(Event(anticipatePause))
    events.append(WavEvent(zhFile))
    anticipatePause = secondPause
  events.append(Event(pauseAfter))
  return CompositeEvent(events)
</pre>
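<p>Since <tt class="literal">WavEvent</tt>, <tt class="literal">Event</tt> and <tt class="literal">CompositeEvent</tt> come from the previous article, the following sketch replaces them with plain tuples so that the order of recordings and pauses can be inspected on its own. The name <tt class="literal">anticipationPlan</tt> and its <tt class="literal">wavLength</tt> parameter (standing in for <tt class="literal">WavEvent(...).length</tt>) are assumptions for illustration, and the instruction file is passed in rather than chosen by <tt class="literal">randomInstruction</tt>.</p>

```python
import random

def anticipationPlan(promptFile, zhFile, instruction, instrIsPrefix,
                     numTimesBefore, wavLength, rng=random):
    """List the steps of one anticipation cluster as ("play", file) and
    ("pause", seconds) tuples, following the same rules as anticipation().

    wavLength(file) is a hypothetical callback returning a recording's
    length in seconds; it stands in for WavEvent(file).length."""
    secondPause = 1 + wavLength(zhFile)
    # A brand-new word gets only a 1-second pause before the answer.
    anticipatePause = 1 if numTimesBefore == 0 else secondPause
    # Newer words are repeated more often.
    if numTimesBefore == 1:
        numRepeat = 3
    elif numTimesBefore >= 10:
        numRepeat = 1
    elif numTimesBefore >= 5:
        numRepeat = rng.choice([1, 2])
    else:
        numRepeat = 2
    steps = []
    if instrIsPrefix:
        steps.append(("play", instruction))
    steps.append(("play", promptFile))
    if not instrIsPrefix:
        steps.append(("play", instruction))
    for _ in range(numRepeat):
        steps.append(("pause", anticipatePause))
        steps.append(("play", zhFile))
        anticipatePause = secondPause
    steps.append(("pause", rng.choice([1, 2, 3])))
    return steps

plan = anticipationPlan("cat_en.wav", "cat_zh.wav", "repeatAfterMe.wav", 0,
                        numTimesBefore=0, wavLength=lambda name: 1.5)
for step in plan:
    print(step)
```

<p>For a brand-new word with a 1.5-second recording, this gives the English word, the instruction, a 1-second pause, the answer, a 2.5-second pause, the answer again, and a final pause of 1 to 3 seconds.</p>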
<p>Now that this has been defined, we can define another function
that builds an &quot;anticipation sequence&quot; from a list of such events
separated by <tt class="literal">Glue</tt>. In this case the number
of previous repetitions will be incremented with each item; the
function to construct a list is passed a range of values (a start
value and an end value, the end being exclusive as with Python's <tt class="literal">range()</tt>) for this, and will generate the list with an
appropriate number of items to fill this range. Thus we can use the
function to continue where a previous lesson left off. Again, a
certain amount of guesswork is involved with the numbers; I've
found these work reasonably well if you are already familiar with
the second language.</p>
<pre class="programlisting">
def anticipationSequence(promptFile,zhFile,start,to):
  sequence = []
  sequence.append(GluedEvent(initialGlue(),anticipation(promptFile,zhFile,start)))
  for i in range(start+1,to):
    sequence.append(GluedEvent(glueBefore(i),anticipation(promptFile,zhFile,i)))
  return sequence

def glueBefore(num):
  if num==0: return initialGlue()
  elif num==1: return Glue(15,15)
  elif num==2: return Glue(45,15)
  elif num==3: return Glue(130,30)
  elif num==4: return Glue(500,60)
  else: return Glue(500,150+3*(num-5))
</pre>
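<p>To see the spacing this produces, the sketch below lists the glue that <tt class="literal">anticipationSequence</tt> would place before each repetition. Glue is modelled as (ideal, stretch) tuples rather than the previous article's <tt class="literal">Glue</tt> class, and <tt class="literal">glueSchedule</tt> is a hypothetical helper written only for this illustration.</p>

```python
maxLenOfLesson = 30 * 60   # as in the constants section (seconds)

def glueBefore(num):
    """(ideal, stretch) in seconds for the glue before repetition num,
    using the same numbers as glueBefore() above."""
    table = {0: (0, maxLenOfLesson), 1: (15, 15), 2: (45, 15),
             3: (130, 30), 4: (500, 60)}
    if num in table:
        return table[num]
    return (500, 150 + 3 * (num - 5))

def glueSchedule(start, to):
    """Glue before each repetition start, start+1, ..., to-1; the first
    one is the freely stretchable initial glue, as in anticipationSequence."""
    return [(0, maxLenOfLesson)] + [glueBefore(i) for i in range(start + 1, to)]

for num, glue in zip(range(0, 7), glueSchedule(0, 7)):
    print(num, glue)
```

<p>For a new word (start 0, end 7) the ideal gaps grow from 15 seconds to 500; once the count passes 4, the ideal gap stays at 500 seconds while the permitted stretch widens by 3 seconds per repetition.</p>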
<p><tt class="literal">initialGlue()</tt> will return some glue
that is arbitrarily stretchable, to separate the first event from
the beginning of the lesson (since the word can be introduced
anywhere in the lesson). Other calls to <tt class=
"literal">Glue</tt>'s constructor give it the ideal length of the glue
and its stretchability (the &quot;+/-&quot; on that length), both in seconds, as
arguments. Here is the implementation of <tt class=
"literal">initialGlue()</tt>:</p>
<pre class="programlisting">
def initialGlue(): return Glue(0,maxLenOfLesson)
</pre>
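<p>The <tt class="literal">Glue</tt> class itself was defined in the previous article. As a reminder of what its two constructor arguments mean, here is a hypothetical minimal stand-in (its attribute and method names are assumptions, not the original's) that only answers whether a given gap is acceptable. It also assumes that glue can shrink to nothing but never below zero.</p>

```python
class Glue:
    """Hypothetical stand-in for the previous article's Glue class: an ideal
    gap length and how far (the "+/-") the gap may deviate from it, in seconds."""

    def __init__(self, length, stretch):
        self.length = length
        self.stretch = stretch

    def minLength(self):
        # Assumption: a gap can shrink to nothing but never below zero.
        return max(0, self.length - self.stretch)

    def maxLength(self):
        return self.length + self.stretch

    def accepts(self, gap):
        """True if a gap of this many seconds is within the allowed range."""
        return self.maxLength() >= gap >= self.minLength()

maxLenOfLesson = 30 * 60   # as in the constants section: 30 minutes, in seconds

def initialGlue():
    return Glue(0, maxLenOfLesson)

print(Glue(45, 15).accepts(50), Glue(45, 15).accepts(70), initialGlue().accepts(1200))
# prints: True False True
```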
<p>Now we will have a class <tt class=
"literal">ProgressDatabase</tt> that keeps track of the user's
&quot;progress&quot; (the total number of times each word has been repeated
in previous sessions) and generates new lessons accordingly. Its
main member data is a list of words and repetitions; a list is used
rather than a dictionary because that way we can sort it as
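required. The list can easily be saved to a text file in Python
syntax, and loaded back, with Python's own tools: the <tt class="literal">pprint</tt> module writes it out and <tt class="literal">eval()</tt> reads it back in:</p>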
<pre class="programlisting">
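class ProgressDatabase:
  def __init__(self):
    self.data = []
    try:
      f = open(progressFile)
      try:
        # The file holds a Python list literal written by save();
        # ast.literal_eval would be a safer reader where available
        self.data = eval(f.read())
      finally:
        f.close()
    except IOError: pass      # no progress file yet
    except SyntaxError: pass  # empty file, maybe /dev/null
    # Pick up any sample files added since the last save
    mergeProgress(self.data,scanSamples())
  def save(self):
    f = open(progressFile,'w')
    try:
      f.write(progressFileHeader)
      pprint.PrettyPrinter(indent=2,width=60,stream=f).pprint(self.data)
    finally:
      f.close()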
</pre>
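<p>Here is a Python 3 sketch of the same save/load round trip. It swaps in <tt class="literal">ast.literal_eval</tt> for <tt class="literal">eval</tt>, so that a tampered progress file cannot run arbitrary code; the function names <tt class="literal">saveProgress</tt> and <tt class="literal">loadProgress</tt> are hypothetical, and the file lives in a temporary directory just for the demonstration.</p>

```python
import ast
import os
import pprint
import tempfile

# Same header as progressFileHeader in the constants section
progressFileHeader = ("# -*- mode: python -*-\n"
                      "# Do not add more comments - this file will be overwritten\n")

def saveProgress(path, data):
    """Write data as a commented, pretty-printed Python literal."""
    with open(path, "w") as f:
        f.write(progressFileHeader)
        pprint.PrettyPrinter(indent=2, width=60, stream=f).pprint(data)

def loadProgress(path):
    """Read the list back; return [] if the file is missing or empty."""
    try:
        with open(path) as f:
            text = f.read()
    except OSError:       # Python 3's IOError is an alias of OSError
        return []
    try:
        # Unlike eval(), literal_eval only accepts literals such as lists,
        # tuples, numbers and strings, so the file cannot run arbitrary code.
        return ast.literal_eval(text)
    except SyntaxError:   # e.g. an empty file such as /dev/null
        return []

data = [(0, "cat_en.wav", "cat_zh.wav"), (4, "dog_en.wav", "dog_zh.wav")]
path = os.path.join(tempfile.mkdtemp(), "progress.txt")
saveProgress(path, data)
print(loadProgress(path))
```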
<p>Once the data has been loaded, it is merged with the result of
<tt class="literal">scanSamples</tt>, a function to scan a
sound-samples directory for matching files in the first and second
language. This means that new words can be added to the vocabulary
merely by putting them in the right directory, without having to
take any special action to tell the program about them. Here is an
implementation of <tt class="literal">scanSamples</tt>, along with
a companion function <tt class="literal">isDirectory</tt> that
tries to determine in a platform-independent way whether or not a
particular file is a directory; this is used when recursing into
subdirectories.</p>
<pre class="programlisting">
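def isDirectory(directory):
  # Portable test for a directory: try to change into it, then go back.
  # (os.path.isdir(directory) is a more direct test.)
  oldDir = os.getcwd()
  try:
    os.chdir(directory)
    ret = 1
  except OSError:
    ret = 0
  os.chdir(oldDir)
  return ret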

def scanSamples(directory=samplesDirectory):
  retVal = []
  ls = os.listdir(directory)
  firstLangSuffix = &quot;_&quot;+firstLanguage+&quot;.&quot;
  secLangSuffix = &quot;_&quot;+secondLanguage+&quot;.&quot;
  for file in ls:
    if isDirectory(directory+os.sep+file):
      for i,j,k in scanSamples(directory+os.sep+file):
        retVal.append((i,file+os.sep+j,file+os.sep+k))
    elif file.find(firstLangSuffix)&gt;=0:
      file2 = file.replace(firstLangSuffix,secLangSuffix)
      if file2 in ls:
        retVal.append((0,file,file2))
  return retVal
</pre>
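<p>The pairing rule in <tt class="literal">scanSamples</tt> can be tried out without creating any files. The hypothetical <tt class="literal">pairSamples</tt> below applies the same suffix substitution to a plain list of names (it does not recurse into subdirectories). Because only the language suffix is replaced, the partner file must have the same extension.</p>

```python
def pairSamples(names, firstLanguage="en", secondLanguage="zh"):
    """Return (0, firstFile, secondFile) for every first-language file whose
    second-language counterpart is also in names (same rule as scanSamples,
    without touching the file system)."""
    firstSuffix = "_" + firstLanguage + "."
    secondSuffix = "_" + secondLanguage + "."
    present = set(names)
    pairs = []
    for name in names:
        if firstSuffix in name:
            partner = name.replace(firstSuffix, secondSuffix)
            if partner in present:
                pairs.append((0, name, partner))
    return pairs

names = ["cat_en.wav", "cat_zh.wav", "dog_en.wav", "fish_zh.wav", "notes.txt"]
print(pairSamples(names))
```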
<p>The <tt class="literal">mergeProgress</tt> function merges a
progress database with a samples scan, to pick up any new samples
that were added since last time the program saved its state:</p>
<pre class="programlisting">
def mergeProgress(progList,scan):
  for (_,j,k) in scan:
    found=0
    for (i2,j2,k2) in progList:
      if j==j2:
        found=1
        break
    if not found: progList.append((0,j,k))
  return progList
</pre>
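<p>Because <tt class="literal">mergeProgress</tt> compares each scanned file against every progress entry, its running time grows with the product of the two list sizes. For a large vocabulary, a set of already-known file names does the same job in a single pass. This is a sketch of that alternative, not the original code:</p>

```python
def mergeProgress(progList, scan):
    """Append (0, first, second) for each scanned pair whose first-language
    file is not already in progList; existing counts are left alone."""
    known = {entry[1] for entry in progList}
    for (_, first, second) in scan:
        if first not in known:
            progList.append((0, first, second))
            known.add(first)   # also skips duplicates within the scan
    return progList

prog = [(3, "a_en.wav", "a_zh.wav")]
mergeProgress(prog, [(0, "a_en.wav", "a_zh.wav"), (0, "b_en.wav", "b_zh.wav")])
print(prog)
```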
<p><tt class="literal">ProgressDatabase</tt>'s method to create a
<tt class="literal">Lesson</tt> involves a little more &quot;scripting&quot;
(experimental / arbitrary coding). First it tries to add some
recently-learned old words, then new words, then some older words
and so on. Each group of words is handled by a service routine that
tries to add words according to constraints; for example, each new
word should be repeated at least <tt class=
"literal">newWordsTryAtLeast</tt> times (initially, <tt class=
"literal">newInitialNumToTry</tt> repetitions should be tried; if
this cannot be fitted in, one fewer repetition should be tried, and
so on down to <tt class="literal">newWordsTryAtLeast</tt>). The
values of the global variables referred to will be open to
tinkering later.</p>
<pre class="programlisting">
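# makeLesson and addToLesson are methods of ProgressDatabase
# (shown here without the class-level indentation)
def makeLesson(self):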
  self.l = Lesson()
  self.data.sort() ; jitter(self.data)
  self.exclude = {}
  # First priority: Recently-learned old words
  # (But not too many - want room for new words)
  self.addToLesson(1,knownThreshold,1,recentInitialNumToTry,maxReviseBeforeNewWords)
  # Now some new words
  self.addToLesson(0,0,newWordsTryAtLeast,newInitialNumToTry,maxNewWords)
  # Now some more recently-learned old words
  self.addToLesson(1,knownThreshold,1,recentInitialNumToTry,0)
  self.addToLesson(knownThreshold,reallyKnownThreshold,1,recentInitialNumToTry,0)
  # Finally, fill in the gaps with ancient stuff (1 try only of each)
  self.addToLesson(reallyKnownThreshold,-1,1,1,0)
  l = self.l ; del self.l
  assert l.events,&quot;Didn't manage to put anything in the lesson&quot;
  return l
def addToLesson(self,minTimesDone=0,maxTimesDone=-1,minNumToTry=0, \
                maxNumToTry=0,maxNumToAdd=0):
  numberAdded = 0
  numToTry = maxNumToTry
  while numToTry &gt;= minNumToTry:
    managed = 0
    for i in range(len(self.data)):
      if maxNumToAdd and numberAdded &gt;= maxNumToAdd: break # too many
      if i in self.exclude: continue # already had it
      (timesDone,promptFile,zhFile)=self.data[i]
      if timesDone &lt; minTimesDone or (maxTimesDone&gt;=0 and timesDone &gt; maxTimesDone):continue # out of range this time
      if timesDone &gt;= knownThreshold: thisNumToTry = min(random.choice([2,3,4]),numToTry)
      else: thisNumToTry = numToTry
      if timesDone &gt;= randomDropThreshold \
        and random.random() &lt;= calcDropLevel(timesDone):
        # dropping it at random
        self.exclude[i] = 1 # pretend we've done it
        continue
      try:
        self.l.addSequence(anticipationSequence(promptFile,zhFile,timesDone, \
        timesDone+thisNumToTry))
        managed = 1
        numberAdded = numberAdded + 1
        self.exclude[i] = 1
        # Keep a count
        if not timesDone: self.l.newWords=self.l.newWords + 1
        else: self.l.oldWords=self.l.oldWords+1
        self.data[i]=(timesDone+thisNumToTry,promptFile,zhFile)
      except StretchedTooFar:
        pass
      except IOError:
        # maybe this file isn't accessible at the moment; keep the progress data though
        self.exclude[i] = 1 # save trouble
    if not managed:
      numToTry = numToTry - 1
  return numberAdded
</pre>
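<p>Stripped of the bookkeeping, the back-off rule (try <tt class="literal">newInitialNumToTry</tt> repetitions, then one fewer, down to <tt class="literal">newWordsTryAtLeast</tt>) is a count-down loop. The simplified sketch below shows it for a single word; <tt class="literal">tryToFit</tt> is a hypothetical callback standing in for the <tt class="literal">addSequence</tt> call that raises <tt class="literal">StretchedTooFar</tt>, and unlike <tt class="literal">addToLesson</tt> it does not make repeated passes over the whole word list.</p>

```python
def fitRepetitions(tryToFit, maxNumToTry, minNumToTry):
    """Try maxNumToTry, maxNumToTry-1, ..., minNumToTry repetitions in turn
    and return the first count that tryToFit accepts, or None if none fits."""
    for n in range(maxNumToTry, minNumToTry - 1, -1):
        if tryToFit(n):
            return n
    return None

# Hypothetical lesson with room for at most three more repetitions
room = 3

def fits(n):
    return room >= n

print(fitRepetitions(fits, 5, 3))   # newInitialNumToTry=5, newWordsTryAtLeast=3 -> 3
print(fitRepetitions(fits, 5, 4))   # nothing from 5 down to 4 fits -> None
```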
<p>That code referred to a function <tt class=
"literal">calcDropLevel</tt> which calculates the probability that
an old word should be completely omitted from a lesson; this
increases with the number of previous repetitions of the word, so
as to avoid monotony. Here is one possible implementation:</p>
<pre class="programlisting">
def calcDropLevel(timesDone):
  # assume timesDone &gt;= randomDropThreshold (as checked in addToLesson)
  if timesDone &gt; randomDropThreshold2:
    return randomDropLevel2
  # or linear interpolation between the two thresholds
  return dropLevelK * timesDone + dropLevelC
try:
  dropLevelK = (randomDropLevel2-randomDropLevel)/(randomDropThreshold2-randomDropThreshold)
  dropLevelC = randomDropLevel-dropLevelK*randomDropThreshold
except ZeroDivisionError: # thresholds are the same
  dropLevelK = 0
  dropLevelC = randomDropLevel
</pre>
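<p><tt class="literal">calcDropLevel</tt> reads five module-level values. The same straight-line interpolation can be packaged as a closure, which makes it easy to check against the default constants given later in this article; <tt class="literal">makeDropLevel</tt> is a hypothetical name, not part of the script.</p>

```python
def makeDropLevel(threshold1, level1, threshold2, level2):
    """Return a function giving the probability of dropping a word that has
    been repeated timesDone times: level1 at threshold1, rising in a
    straight line to level2 at threshold2, and level2 from then on.
    (The same interpolation as calcDropLevel, but held in a closure.)"""
    if threshold2 == threshold1:
        k = 0.0                    # thresholds are the same
    else:
        k = (level2 - level1) / float(threshold2 - threshold1)
    c = level1 - k * threshold1

    def dropLevel(timesDone):
        if timesDone > threshold2:
            return level2
        return k * timesDone + c

    return dropLevel

# Defaults from the constants section:
# randomDropThreshold=14, randomDropLevel=0.67,
# randomDropThreshold2=35, randomDropLevel2=0.97
dropLevel = makeDropLevel(14, 0.67, 35, 0.97)
for t in (14, 21, 28, 35, 50):
    print(t, round(dropLevel(t), 2))
```

<p>With the defaults, the drop probability rises by 0.3/21 (about 0.0143) per repetition, so a word repeated 21 times is left out of about 77% of lessons.</p>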
<p>The constants will be defined later. Also there is a function
<tt class="literal">jitter()</tt> which &quot;jitters&quot; (slightly
shuffles) the elements of a list, again to avoid monotony:</p>
<pre class="programlisting">
def jitter(items):
  # Assumes each item is a tuple whose item[0] (the repetition count)
  # may be equal to its neighbour's.
  # Doesn't touch &quot;new&quot; words (count==0)
  swappedLast = 0
  for i in range(len(items)-1):
    same = items[i][0] == items[i+1][0]
    if items[i][0] and ((same and random.choice([1,2])==1) or
                        (not same and random.choice([1,2,3,4,5,6])==1
                         and not swappedLast)):
      # Move item i one place later, i.e. swap it with item i+1
      items[i], items[i+1] = items[i+1], items[i]
      swappedLast = 1
    else: swappedLast = 0
</pre>
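<p>A quick way to gain confidence in <tt class="literal">jitter</tt> is to check that it only ever permutes the list and that new words stay put. The Python 3 sketch below follows the same rules; the <tt class="literal">rng</tt> parameter is a hypothetical addition so that the shuffle can be seeded.</p>

```python
import random

def jitter(items, rng=random):
    """Slightly shuffle, in place, a list of tuples sorted by item[0].

    Same rules as jitter() above. Items whose count is 0 ("new" words) are
    never chosen to move, so after sorting they stay at the front."""
    swappedLast = False
    for i in range(len(items) - 1):
        count, nextCount = items[i][0], items[i + 1][0]
        if not count:
            move = False
        elif count == nextCount:
            move = rng.choice([1, 2]) == 1              # 1 chance in 2
        else:
            move = rng.choice([1, 2, 3, 4, 5, 6]) == 1 and not swappedLast
        if move:
            items[i], items[i + 1] = items[i + 1], items[i]
        swappedLast = move
    return items

words = [(0, "new1"), (0, "new2"), (3, "a"), (3, "b"), (5, "c"), (8, "d")]
print(jitter(list(words), random.Random(42)))
```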
<p>As another method of avoiding monotony, the length of long glue
is randomly adjusted before checking for collisions (this was in
the previous issue's code but was not explained, sorry).</p>
<p>All that remains, apart from defining the constants, is to write
a main program. We'll put it in a function, and only call the
function if this Python module is the main module; that way the
file can also be imported as a library module without running
<tt class="literal">main()</tt>.</p>
<pre class="programlisting">
def main():
  dbase = ProgressDatabase()
  lesson = dbase.makeLesson()
  firstTime = 1
  while 1:
    lesson.play()
    if firstTime:
      dbase.save()
      firstTime = 0
    if not getYN(&quot;Hear this lesson again?&quot;): break
def getYN(msg):
  ans=None
  while not ans=='y' and not ans=='n':
    ans = raw_input(&quot;%s (y/n): &quot; % (msg,))
  if ans=='y': return 1
  return 0
if __name__==&quot;__main__&quot;:
  main()
</pre>
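<p><tt class="literal">raw_input</tt> is the Python 2 name; in Python 3 the built-in is called <tt class="literal">input</tt>. Here is a Python 3 sketch of <tt class="literal">getYN</tt> that takes the input function as a hypothetical parameter, so that it can be driven from a test. It also strips spaces and accepts upper-case answers, which the original does not.</p>

```python
def getYN(msg, ask=input):
    """Ask msg until the reply is y or n; return True for y, False for n.

    ask is a hypothetical parameter (defaulting to Python 3's built-in
    input, the successor of raw_input) so that replies can be scripted."""
    while True:
        ans = ask("%s (y/n): " % (msg,)).strip().lower()
        if ans in ("y", "n"):
            return ans == "y"

# Scripted replies: an invalid one first, then "Y"
replies = iter(["maybe", "Y"])
print(getYN("Hear this lesson again?", ask=lambda prompt: next(replies)))
# prints: True
```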
<p>Of course, in a production release <tt class=
"literal">getYN</tt> and so forth should be more
internationalised.</p>
<p>The initialisation of the constants needs to go before
<tt class="literal">main()</tt> is called; I think it's a good idea
to put them near the top of the script for easy access.</p>
<pre class="programlisting">
samplesDirectory = &quot;samples&quot;
promptsDirectory = &quot;prompts&quot;
firstLanguage = &quot;en&quot;
secondLanguage = &quot;zh&quot;
maxLenOfLesson = 30*60 # 30 minutes
maxNewWords = 5
maxReviseBeforeNewWords = 3
newInitialNumToTry = 5
recentInitialNumToTry = 3
newWordsTryAtLeast = 3
knownThreshold = 5
reallyKnownThreshold = 10
randomAdjustmentThreshold = 500
randomDropThreshold = 14
randomDropLevel = 0.67
randomDropThreshold2 = 35
randomDropLevel2 = 0.97
progressFile = &quot;progress.txt&quot;
progressFileHeader = &quot;&quot;&quot;# -*- mode: python -*-
# Do not add more comments - this file will be overwritten\n&quot;&quot;&quot;
</pre>
<p>For convenience, I also put the following code to support
overriding the defaults in a different module called override.py,
if it exists. The code should go before any of the variables are
actually used; note that the default values of function and method
parameters are evaluated when the <tt class="literal">def</tt> statement is executed, not each time the function is called, so this code should really
go before any of the other code (but after the definition of the
defaults).</p>
<pre class="programlisting">
try:
  from override import *
except ImportError: pass  # no override.py: keep the defaults
</pre>
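<p>The default-argument pitfall is easy to demonstrate. In the toy functions below (hypothetical names, not part of the script), the first captures <tt class="literal">samplesDirectory</tt> when its <tt class="literal">def</tt> runs, as <tt class="literal">scanSamples</tt> does; the second uses the common <tt class="literal">None</tt>-default idiom, an alternative the article does not use, to look the value up at call time instead.</p>

```python
samplesDirectory = "samples"

def whichDirectory(directory=samplesDirectory):
    # The default value was fixed when this def statement ran.
    return directory

def whichDirectoryLate(directory=None):
    # Looking the global up at call time sees any later override.
    if directory is None:
        directory = samplesDirectory
    return directory

samplesDirectory = "my_samples"    # e.g. a value set later by override.py
print(whichDirectory(), whichDirectoryLate())
# prints: samples my_samples
```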
<p>Finally, to complete the script a few more imports are needed
toward the beginning:</p>
<pre class="programlisting">
import time,sched,sndhdr,sys,os,random,math,pprint
if sys.platform.startswith(&quot;win&quot;): import winsound  # find(&quot;win&quot;) would also match &quot;darwin&quot;
else: winsound=None
</pre></div>
</p>
</div>
</channel>
</rss>
