Title: A Python project (2)

Author:

Date: 03 June 2003 13:15:57 +01:00 or Tue, 03 June 2003 13:15:57 +01:00

Summary:

This article continues my description of the Python code in a small project designed to help increase one's vocabulary in a foreign language by playing audio samples.

Body:

Some readers might have noticed that the __init__ method (i.e. the constructor) of the class Lesson set two variables self.newWords and self.oldWords which were not explained in the text. This was an oversight on my part due to having rushed the article to publication (to fill a gap). The original code kept track of the number of new and old words added to each lesson so that it could give the user a textual report, but in the interests of brevity I deleted this functionality (among other things) when writing the article. But I overlooked the initialisation of those two variables, which were now no longer needed.

To continue from where we left off last time, we need to construct some GluedEventLists for the Lesson. Each list will correspond to the revision of one word at increasing intervals as previously described; each time the word is revised, it will be prompted for and then repeated one or more times, with appropriate delay to allow the user to anticipate the response. (In other words, the computer should play recordings such as "please say", word in English, pause, word in Chinese; if the word is relatively new then this might be repeated.) These clusters of "anticipation events" will be separated with "glue" so that the whole sequence can be added to the Lesson and interspersed with other sequences (but not interspersed in such a way that the individual anticipation events are interfered with).

Here is a function that returns the name of a randomly chosen prompt (such as "please say"), along with an integer specifying whether the prompt should precede (1) or follow (0) the first-language word. The function is hard-coded so that the first time a word is introduced it says "listen and repeat" and the second time it says "say again"; on subsequent occurrences the prompt is chosen at random from the rest of the list.

def randomInstruction(numTimesBefore):
  # Returns (filename, isPrefix); isPrefix=1 means the prompt is
  # played before the first-language word, 0 means after it
  if not numTimesBefore: return ("repeatAfterMe.wav",0)
  if numTimesBefore==1: return ("sayAgain.wav",1)
  return random.choice([
    ("whatSay.wav",0),
    ("pleaseSay.wav",1),
    ("nowPleaseSay.wav",1),
    ("tryToSay.wav",1),
    ])

Now for a function that returns a single item in an "anticipation sequence", given the first-language word (promptFile), the second-language word (zhFile) and the number of previous repetitions of that word. The result is returned as a CompositeEvent made up of all the appropriate recordings and pauses; such CompositeEvents will not be interleaved with others, but the Glue that separates them will be. Note that the variable numTimesBefore means the number of times that the word has ever been repeated in the entire "course" (not just the current Lesson); the counts persist between sessions. The hard-coded numbers of repetitions are arbitrary and experimental (you can sometimes get away with that sort of thing in Python scripts).

def anticipation(promptFile,zhFile,numTimesBefore=0):
  instruction, instrIsPrefix = randomInstruction(numTimesBefore)
  instruction = promptsDirectory+os.sep+instruction
  zhFile=samplesDirectory+os.sep+zhFile
  promptFile=samplesDirectory+os.sep+promptFile
  secondPause = 1+WavEvent(zhFile).length
  if not numTimesBefore: anticipatePause = 1
  else: anticipatePause = secondPause
  if numTimesBefore == 1: numRepeat = 3
  elif numTimesBefore < 5: numRepeat = 2
  elif numTimesBefore < 10: numRepeat=random.choice([1,2])
  else: numRepeat = 1
  pauseAfter = random.choice([1,2,3])
  # Now ready to go
  list = []
  if instrIsPrefix: list.append(WavEvent(instruction))
  list.append(WavEvent(promptFile))
  if not instrIsPrefix: list.append(WavEvent(instruction))
  for i in range(numRepeat):
    list.append(Event(anticipatePause))
    list.append(WavEvent(zhFile))
    anticipatePause = secondPause
  list.append(Event(pauseAfter))
  return CompositeEvent(list)
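
To make the hard-coded repetition counts easier to see at a glance, here is a minimal sketch that isolates just that decision from anticipation(). The helper name repeatChoices is hypothetical and not part of the project; it returns the possible values of numRepeat for a given numTimesBefore, so the random case shows both options.

```python
def repeatChoices(numTimesBefore):
  # Hypothetical helper: mirrors the if/elif chain in anticipation()
  # and returns the possible values of numRepeat as a list.
  if numTimesBefore == 1: return [3]
  elif numTimesBefore < 5: return [2]     # includes numTimesBefore == 0
  elif numTimesBefore < 10: return [1,2]  # one of these, chosen at random
  else: return [1]

for n in (0,1,2,4,5,9,10,20):
  print(n, repeatChoices(n))
```

Note that a brand-new word (numTimesBefore of 0) falls into the second branch, because only the value 1 is tested for explicitly.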

Now that anticipation() has been defined, we can define another function that builds an "anticipation sequence" from a list of such events separated by Glue. In this case the number of previous repetitions is incremented with each item; the function that constructs the list is passed a range of values (start value and end value) for this, and generates the list with the appropriate number of items to fill this range (the end value itself is not included, so a range from start to start+n gives n items). Thus we can use the function to continue where a previous lesson left off. Again, a certain amount of guesswork is involved with the numbers; I've found that these work reasonably well if you are already familiar with the second language.

def anticipationSequence(promptFile,zhFile,start,to):
  sequence = []
  sequence.append(GluedEvent(initialGlue(),anticipation(promptFile,zhFile,start)))
  for i in range(start+1,to):
    sequence.append(GluedEvent(glueBefore(i),anticipation(promptFile,zhFile,i)))
  return sequence

def glueBefore(num):
  if num==0: return initialGlue()
  elif num==1: return Glue(15,15)
  elif num==2: return Glue(45,15)
  elif num==3: return Glue(130,30)
  elif num==4: return Glue(500,60)
  else: return Glue(500,150+3*(num-5))

initialGlue() will return some glue that is arbitrarily stretchable, to separate the first event from the beginning of the lesson (since the word can be introduced anywhere in the lesson). Other calls to Glue's constructor give it the ideal length of glue and the stretchability (the "+/-" of the initial length) as arguments. Here is the implementation of initialGlue():

def initialGlue(): return Glue(0,maxLenOfLesson)
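
As a rough illustration of what these numbers mean, the sketch below prints the shortest and longest gap allowed before each repetition. It assumes that the stretchability is a symmetric "+/-" around the ideal length, and it uses a stand-in Glue (a namedtuple with the hypothetical field names length and plusMinus) in place of the real class; gapRange is likewise a made-up helper.

```python
from collections import namedtuple

# Stand-in for the real Glue class (an assumption for illustration):
# it only records the ideal length and the stretchability, in seconds.
Glue = namedtuple("Glue", ["length", "plusMinus"])

def glueBefore(num):
  # Same numbers as above; initialGlue() is left out, so num starts at 1
  if num==1: return Glue(15,15)
  elif num==2: return Glue(45,15)
  elif num==3: return Glue(130,30)
  elif num==4: return Glue(500,60)
  else: return Glue(500,150+3*(num-5))

def gapRange(num):
  # Shortest and longest gap allowed before repetition number num
  g = glueBefore(num)
  return (g.length-g.plusMinus, g.length+g.plusMinus)

for n in range(1,8):
  print(n, gapRange(n))
```

From the fifth repetition onward the ideal gap stays at 500 seconds and only the tolerance grows, which gives the scheduler progressively more freedom in placing well-practised words.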

Now we will have a class ProgressDatabase that keeps track of the user's "progress" (the total number of times each word has been repeated in previous sessions) and generates new lessons accordingly. Its main member data is a list of words and repetitions; a list is used rather than a dictionary because that way we can sort it as required. The list can easily be saved to and loaded from a text file in Python syntax, using the standard pprint module to write it and the built-in eval to read it back:

class ProgressDatabase:
  def __init__(self):
    self.data = []
    try:
      f = open(progressFile)
      self.data = eval(f.read())
      f.close()
    except IOError: pass
    except SyntaxError: pass # maybe /dev/null
    mergeProgress(self.data,scanSamples())
  def save(self):
    f = open(progressFile,'w')
    f.write(progressFileHeader)
    pprint.PrettyPrinter(indent=2,width=60,stream=f).pprint(self.data)
    f.close()
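
The save/load round trip can be tried in isolation. The sketch below is an illustration only: it writes to an in-memory io.StringIO instead of the progress file, the sample file names are made up, and it reads the text back with ast.literal_eval, which is a stricter substitute for eval (it accepts only literals) rather than what the class above does.

```python
import ast, io, pprint

header = ("# -*- mode: python -*-\n"
          "# Do not add more comments - this file will be overwritten\n")
# Made-up progress data: (timesDone, promptFile, zhFile)
data = [(0,"hello_en.wav","hello_zh.wav"), (7,"thanks_en.wav","thanks_zh.wav")]

# Save: header comments followed by the pretty-printed list
f = io.StringIO()   # stands in for open(progressFile,'w')
f.write(header)
pprint.PrettyPrinter(indent=2,width=60,stream=f).pprint(data)
text = f.getvalue()

# Load: the leading comment lines are ignored when the text is parsed
loaded = ast.literal_eval(text)
print(loaded == data)
```

Because the file is ordinary Python syntax, it can also be inspected or corrected by hand in a text editor.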

Once the data has been loaded, it is merged with the result of scanSamples, a function that scans a sound-samples directory for matching files in the first and second languages. This means that new words can be added to the vocabulary merely by putting them in the right directory, without having to take any special action to tell the program about them. Here is an implementation of scanSamples, along with a companion function isDirectory that tries to determine in a platform-independent way whether or not a particular file is a directory; it is used for recursing into subdirectories.

def isDirectory(directory):
  # Try to change into it; if that works, it is a directory
  oldDir = os.getcwd()
  try:
    os.chdir(directory)
    ret = 1
  except OSError:
    ret = 0
  os.chdir(oldDir)
  return ret

def scanSamples(directory=samplesDirectory):
  retVal = []
  ls = os.listdir(directory)
  firstLangSuffix = "_"+firstLanguage+"."
  secLangSuffix = "_"+secondLanguage+"."
  for file in ls:
    if isDirectory(directory+os.sep+file):
      for i,j,k in scanSamples(directory+os.sep+file):
        retVal.append((i,file+os.sep+j,file+os.sep+k))
    elif file.find(firstLangSuffix)>=0:
      file2 = file.replace(firstLangSuffix,secLangSuffix)
      if file2 in ls:
        retVal.append((0,file,file2))
  return retVal
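
To see what the scan returns, here is a self-contained sketch that builds a throwaway directory tree and scans it. It is a variant for illustration, not the function above: the name scanPairs is hypothetical, the languages are passed as parameters, and os.path.isdir and os.path.join are used in place of isDirectory and os.sep concatenation. The file names are made up.

```python
import os, tempfile

def scanPairs(directory, firstLanguage="en", secondLanguage="zh"):
  # Variant of scanSamples for illustration (see the note above)
  retVal = []
  ls = os.listdir(directory)
  firstLangSuffix = "_"+firstLanguage+"."
  secLangSuffix = "_"+secondLanguage+"."
  for name in ls:
    path = os.path.join(directory,name)
    if os.path.isdir(path):
      # Recurse, prefixing the results with the subdirectory name
      for i,j,k in scanPairs(path,firstLanguage,secondLanguage):
        retVal.append((i,os.path.join(name,j),os.path.join(name,k)))
    elif name.find(firstLangSuffix)>=0:
      name2 = name.replace(firstLangSuffix,secLangSuffix)
      if name2 in ls: retVal.append((0,name,name2))
  return retVal

# Build a throwaway samples tree; orphan_en.wav has no partner
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top,"food"))
for rel in ["hello_en.wav","hello_zh.wav","orphan_en.wav",
            os.path.join("food","rice_en.wav"),
            os.path.join("food","rice_zh.wav")]:
  open(os.path.join(top,rel),"w").close()

pairs = sorted(scanPairs(top))
print(pairs)
```

The unmatched first-language file is silently skipped, so a half-recorded word does no harm until its second-language partner is added.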

The mergeProgress function merges a progress database with a samples scan, to pick up any new samples that were added since last time the program saved its state:

def mergeProgress(progList,scan):
  for (_,j,k) in scan:
    found=0
    for (i2,j2,k2) in progList:
      if j==j2:
        found=1
        break
    if not found: progList.append((0,j,k))
  return progList
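
A small worked example, with made-up file names, shows the effect: a word that is already in the progress list keeps its count, and a newly scanned word is appended with a count of zero. The function is repeated here only so that the sketch runs on its own.

```python
def mergeProgress(progList,scan):
  # Same logic as above: append any scanned pair whose
  # first-language file is not already in the progress list
  for (_,j,k) in scan:
    found=0
    for (i2,j2,k2) in progList:
      if j==j2:
        found=1
        break
    if not found: progList.append((0,j,k))
  return progList

progress = [(7,"hello_en.wav","hello_zh.wav")]
scan = [(0,"hello_en.wav","hello_zh.wav"),
        (0,"rice_en.wav","rice_zh.wav")]
merged = mergeProgress(progress,scan)
print(merged)
```

Note that the list is modified in place as well as returned, which is why ProgressDatabase can ignore the return value.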

ProgressDatabase's method to create a Lesson involves a little more "scripting" (experimental / arbitrary coding). First it tries to add some recently-learned old words, then new words, then some older words and so on. Each group of words is handled by a service routine that tries to add words according to constraints; for example, each new word should be repeated at least newWordsTryAtLeast times (initially, newInitialNumToTry repetitions should be tried; if this cannot be fitted in, one repetition less should be tried and so on down to newWordsTryAtLeast). The values of the global variables referred to will be open to tinkering later.

def makeLesson(self):
  self.l = Lesson()
  self.data.sort() ; jitter(self.data)
  self.exclude = {}
  # First priority: Recently-learned old words
  # (But not too many - want room for new words)
  self.addToLesson(1,knownThreshold,1,recentInitialNumToTry,maxReviseBeforeNewWords)
  # Now some new words
  self.addToLesson(0,0,newWordsTryAtLeast,newInitialNumToTry,maxNewWords)
  # Now some more recently-learned old words
  self.addToLesson(1,knownThreshold,1,recentInitialNumToTry,0)
  self.addToLesson(knownThreshold,reallyKnownThreshold,1,recentInitialNumToTry,0)
  # Finally, fill in the gaps with ancient stuff (1 try only of each)
  self.addToLesson(reallyKnownThreshold,-1,1,1,0)
  l = self.l ; del self.l
  assert l.events,"Didn't manage to put anything in the lesson"
  return l
def addToLesson(self,minTimesDone=0,maxTimesDone=-1,minNumToTry=0, \
                maxNumToTry=0,maxNumToAdd=0):
  numberAdded = 0
  numToTry = maxNumToTry
  while numToTry >= minNumToTry:
    managed = 0
    for i in range(len(self.data)):
      if maxNumToAdd and numberAdded >= maxNumToAdd: break # too many
      if self.exclude.has_key(i): continue # already had it
      (timesDone,promptFile,zhFile)=self.data[i]
      if timesDone < minTimesDone or (maxTimesDone>=0 and timesDone > maxTimesDone):continue # out of range this time
      if timesDone >= knownThreshold: thisNumToTry = min(random.choice([2,3,4]),numToTry)
      else: thisNumToTry = numToTry
      if timesDone >= randomDropThreshold \
        and random.random() <= calcDropLevel(timesDone):
        # dropping it at random
        self.exclude[i] = 1 # pretend we've done it
        continue
      try:
        self.l.addSequence(anticipationSequence(promptFile,zhFile,timesDone, \
        timesDone+thisNumToTry))
        managed = 1
        numberAdded = numberAdded + 1
        self.exclude[i] = 1
        # Keep a count
        if not timesDone: self.l.newWords=self.l.newWords + 1
        else: self.l.oldWords=self.l.oldWords+1
        self.data[i]=(timesDone+thisNumToTry,promptFile,zhFile)
      except StretchedTooFar:
        pass
      except IOError:
        # maybe this file isn't accessible at the moment; keep the progress data though
        self.exclude[i] = 1 # save trouble
    if not managed:
      numToTry = numToTry - 1
      firstPass = 0
  return numberAdded
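
The "try the longest sequence first, then fall back to shorter ones" loop is easier to follow in a stripped-down form. The sketch below is a toy model under stated assumptions, not the project's code: fillLesson is a hypothetical name, a word repeated n times is assumed to cost n units, and the lesson is assumed to hold a fixed number of units, whereas the real code relies on Lesson.addSequence raising StretchedTooFar.

```python
def fillLesson(words, capacity, minNumToTry, maxNumToTry):
  # Toy model of the fallback loop in addToLesson (see assumptions above)
  chosen = {}   # word -> number of repetitions scheduled
  used = 0
  numToTry = maxNumToTry
  while numToTry >= minNumToTry:
    managed = 0
    for w in words:
      if w in chosen: continue                 # already had it
      if used + numToTry > capacity: continue  # does not fit at this length
      chosen[w] = numToTry
      used = used + numToTry
      managed = 1
    # Only shorten the sequences once a whole pass has added nothing
    if not managed: numToTry = numToTry - 1
  return chosen

result = fillLesson(["a","b","c","d"], capacity=13,
                    minNumToTry=3, maxNumToTry=5)
print(result)
```

With these numbers the first two words get five repetitions each, the third fits only at three, and the fourth is left for a later lesson.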

The addToLesson method refers to a function calcDropLevel, which calculates the probability that an old word should be completely omitted from a lesson; this increases with the number of previous repetitions of the word, so as to avoid monotony. Here is one possible implementation:

def calcDropLevel(timesDone):
  # assume timesDone > randomDropThreshold
  if timesDone > randomDropThreshold2:
    return randomDropLevel2
  # or linear interpolation between the two thresholds
  return dropLevelK * timesDone + dropLevelC
try:
  dropLevelK = (randomDropLevel2-randomDropLevel)/(randomDropThreshold2-randomDropThreshold)
  dropLevelC = randomDropLevel-dropLevelK*randomDropThreshold
except ZeroDivisionError: # thresholds are the same
  dropLevelK = 0
  dropLevelC = randomDropLevel
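
As a worked example of the interpolation, the sketch below plugs in the values that this article gives for the four constants further on (thresholds of 14 and 35, levels of 0.67 and 0.97) and evaluates the function at a few points. The float() call is an addition to keep the division exact regardless of interpreter version.

```python
randomDropThreshold, randomDropLevel = 14, 0.67
randomDropThreshold2, randomDropLevel2 = 35, 0.97

# Slope and intercept of the line through (14, 0.67) and (35, 0.97)
dropLevelK = (randomDropLevel2-randomDropLevel) / \
             float(randomDropThreshold2-randomDropThreshold)
dropLevelC = randomDropLevel-dropLevelK*randomDropThreshold

def calcDropLevel(timesDone):
  # Capped at the second level beyond the second threshold
  if timesDone > randomDropThreshold2: return randomDropLevel2
  return dropLevelK * timesDone + dropLevelC

for t in (14, 24, 35, 50):
  print(t, round(calcDropLevel(t),3))
```

The probability rises by 0.30 over 21 repetitions, that is by about 0.014 per repetition, starting from 0.67 at the first threshold.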

The constants will be defined later. Also there is a function jitter() which "jitters" (slightly shuffles) the elements of a list, again to avoid monotony:

def jitter(list):
  # Assumes item is a tuple and item[0] might be ==
  # Doesn't touch "new" words (tries==0)
  swappedLast = 0
  for i in range(len(list)-1):
    if list[i][0] and ((list[i][0] == list[i+1][0] and random.choice([1,2])==1) \
      or (not list[i][0] == list[i+1][0] \
          and random.choice([1,2,3,4,5,6])==1 \
          and not swappedLast)):
      x = list[i]
      del list[i]
      list.insert(i+1,x)
      swappedLast = 1
    else: swappedLast = 0
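
The following sketch runs the same swapping rule on a small sorted list so that its two guarantees can be checked: no item is lost or duplicated, and the "new" words (count 0) at the front stay where they are. The parameter is renamed to items here to avoid shadowing the built-in, the swap is written as a tuple assignment, and the sample data is made up.

```python
import random

def jitter(items):
  # Same rule as above: neighbours with equal counts swap half the
  # time; neighbours with different counts swap one time in six,
  # and never immediately after another swap
  swappedLast = 0
  for i in range(len(items)-1):
    if items[i][0] and ((items[i][0] == items[i+1][0] and random.choice([1,2])==1)
      or (not items[i][0] == items[i+1][0]
          and random.choice([1,2,3,4,5,6])==1
          and not swappedLast)):
      items[i], items[i+1] = items[i+1], items[i]
      swappedLast = 1
    else: swappedLast = 0

data = [(0,"a"),(0,"b"),(2,"c"),(2,"d"),(3,"e"),(5,"f"),(5,"g")]
before = list(data)
jitter(data)
print(data)
```

Since the result is random, the printed order varies from run to run; only the two properties mentioned above are fixed.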

As another method of avoiding monotony, the length of long glue is randomly adjusted before checking for collisions (this was in the previous issue's code but was not explained, sorry).

All that remains, apart from defining the constants, is to write a main program. We'll put it in a function, and only call the function if this Python module is the main module; that way it can also be used as a library module in which case it will not execute its main() when imported.

def main():
  dbase = ProgressDatabase()
  lesson = dbase.makeLesson()
  firstTime = 1
  while 1:
    lesson.play()
    if firstTime:
      dbase.save()
      firstTime = 0
    if not getYN("Hear this lesson again?"): break
def getYN(msg):
  ans=None
  while not ans=='y' and not ans=='n':
    ans = raw_input("%s (y/n): " % (msg,))
  if ans=='y': return 1
  return 0
if __name__=="__main__":
  main()

Of course, in a production release getYN and so forth should be more internationalised.

The initialisation of the constants needs to go before main() is called; I think it's a good idea to put them near the top of the script for easy access.

samplesDirectory = "samples"
promptsDirectory = "prompts"
firstLanguage = "en"
secondLanguage = "zh"
maxLenOfLesson = 30*60 # 30 minutes
maxNewWords = 5
maxReviseBeforeNewWords = 3
newInitialNumToTry = 5
recentInitialNumToTry = 3
newWordsTryAtLeast = 3
knownThreshold = 5
reallyKnownThreshold = 10
randomAdjustmentThreshold = 500
randomDropThreshold = 14
randomDropLevel = 0.67
randomDropThreshold2 = 35
randomDropLevel2 = 0.97
progressFile = "progress.txt"
progressFileHeader = """# -*- mode: python -*-
# Do not add more comments - this file will be overwritten\n"""

For convenience, I also added the following code, which lets the defaults be overridden by a separate module called override.py if it exists. The code should go before any of the variables are actually used; note that the default values of function and method parameters are evaluated when the def statement is executed (not each time the function is called), so this code should really go before any of the other code (but after the definition of the defaults).

try:
  from override import *
except ImportError: pass
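
The point about default values is easy to demonstrate. In the sketch below (the function names are hypothetical), the first function captures the value of the global when its def statement runs, so a later override is not seen; the second looks the global up inside the body and does see it.

```python
samplesDirectory = "samples"

def scanDefault(directory=samplesDirectory):
  # The default was captured when this def statement ran
  return directory

def scanLate(directory=None):
  # Looking the global up inside the body picks up later changes
  if directory is None: directory = samplesDirectory
  return directory

# Stands in for what "from override import *" might do
samplesDirectory = "my-samples"

early = scanDefault()
late = scanLate()
print(early, late)
```

This is why scanSamples, whose directory parameter defaults to samplesDirectory, must be defined after the override has been imported.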

Finally, to complete the script a few more imports are needed toward the beginning:

import time,sched,sndhdr,sys,os,random,math,pprint
if sys.platform.find("win")>=0: import winsound
else: winsound=None
