    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: I_mean_something_to_somebody, Part Two</title>
        <link>https://members.accu.org/index.php/articles/208</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + CVu Journal Vol 16, #2 - Apr 2004</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;I_mean_something_to_somebody, Part Two</h1>
<p>
<strong>Date:</strong> Thu, 01 April 2004 22:53:48 +01:00</p>
<p><strong>Summary:</strong>&nbsp;<p>This is the second part of a
two-part article describing an experiment carried out during the 2003
ACCU conference. The first part was published in a previous issue of
C Vu (15.6, December 2003) and discussed the background to the
experiment and some of the applicable characteristics of the subjects
taking part; this part discusses the results of the experiment.</p>
</p>
<p><strong>Body:</strong>&nbsp;<div class="titlepage">
<h2><a name="d0e20" id=
"d0e20"></a>Introduction</h2>
</div>
<p>The aim of this experiment was to measure one particular aspect
of software developers' behaviour when assigning meaning to
identifier names. This aspect was the extent to which knowledge of
the application domain of the source code containing an identifier
affects the meaning developers assign to that identifier name.</p>
<p>Software developers are constantly exhorted to use '<span class=
"emphasis"><em>meaningful</em></span>' identifier names. However,
there have not been any published studies investigating the kinds
of information readers extract from identifier names or of any
benefits the availability of this information might provide to
readers. Reading source code whose identifier names are based on a
human language the reader does not speak provides a vivid example
of the often unappreciated benefit that identifier names can
provide to readers (when these names are based on a human language
spoken by the reader).</p>
<pre class="programlisting">
if(pParametreFichier != (FILE*)NULL) {
  memset(&amp;Enregistrement.CodeInterne1, '\0',
         sizeof(Enregistrement.CodeInterne1));
  memset(&amp;Enregistrement.BlocPrimaireNumerique ,
         '\0', sizeof(Enregistrement.
                          BlocPrimaireNumerique));
  while(!ExcTrouve)
    ...
</pre></div>
<div class="sect1" lang="en">
<p>Words are used both to communicate with other people and for
internal thought processes. The culture we are born into provides
us with a predefined set of words and a network of meanings
associated with them. The use of words in their spoken form to
communicate with other people has a cost that speakers attempt to
minimise by using them in a way that is consistent with the meaning
they believe their listeners will assign to them. A lifetime of
realtime feedback from the people spoken to enables users of a
language to build a detailed collection of beliefs on the meanings
assigned to words by both people in general and some specialist
groups of people (e.g., software engineers).</p>
<p>When speaking it is expected that not only will listeners make
an effort to comprehend the speakers' thought processes, but that
speakers will make an effort to ensure that what they are saying is
comprehensible to their listeners. When writing text people must
make use of their experience with the spoken form to help ensure
that readers will assign a meaning to the words that is consistent
with that intended. However, there is no realtime feedback between
writer and reader<sup>[<a name="d0e41" href="#ftn.d0e41" id=
"d0e41">1</a>]</sup> and experience shows that readers often have
to invest significantly more effort to assign a coherent meaning to
what they read, compared to the effort needed while listening
during a spoken conversation.</p>
<p>Software developers are not usually told which identifiers they
should use in a given context and are rarely given rules for
creating new identifier names from existing ones<sup>[<a name=
"d0e47" href="#ftn.d0e47" id="d0e47">2</a>]</sup>.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Selecting
identifiers</h2>
</div>
<p>Experience shows that many developers believe that the names
they select for identifiers are '<span class=
"emphasis"><em>obvious</em></span>', '<span class=
"emphasis"><em>self-evident</em></span>', or '<span class=
"emphasis"><em>natural</em></span>'. Studies of people's
performance in creating names for objects suggest that this belief
is false [<a href="#Carroll">Carroll</a>, <a href=
"#Furnas-1983">Furnas-1983</a>, <a href=
"#Furnas-1987">Furnas-1987</a>]. When asked to provide names for
various kinds of entities people have been found to select a wide
variety of different names, showing that there is nothing
'<span class="emphasis"><em>obvious</em></span>' about the choice
of a name.</p>
<p>One naming study [<a href="#Furnas-1983">Furnas-1983</a>, <a href=
"#Furnas-1987">Furnas-1987</a>] described operations (e.g., hypothetical
text editing commands, categories in '<span class=
"emphasis"><em>Swap 'n Sale</em></span>' classified ads, keywords
for recipes) to subjects, who were not domain experts, and asked
them to suggest a name for each operation. The results showed that
the name selected by one subject was, on average, different from
the name selected by 80-90% of the other subjects (one experiment
included subjects who were domain experts and the results for those
subjects were also consistent with this performance). The number of
occurrences of different names chosen tended to follow an inverse
law with a few words occurring frequently and most only rarely.</p>
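<p>The disagreement figure quoted above can be computed from any list of
chosen names. The following is a minimal sketch, assuming one name per
subject; the function name <tt class="literal">pairwise_agreement</tt> and
the twenty responses are invented for illustration and are not data from
the cited studies.</p>

```python
from collections import Counter

def pairwise_agreement(names):
    """Probability that two different subjects, picked at random, chose the same name."""
    n = len(names)
    counts = Counter(names)
    # Ordered pairs of distinct subjects who agree, over all ordered pairs.
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return same_pairs / (n * (n - 1))

# Invented responses from 20 hypothetical subjects naming one operation:
# a few names are popular, most occur only once.
responses = (["delete"] * 6 + ["remove"] * 3 + ["erase"] * 2 + ["cut"] * 2
             + ["kill", "zap", "omit", "drop", "clear", "strike", "rub out"])

agreement = pairwise_agreement(responses)
print(f"agree: {agreement:.1%}, differ: {1 - agreement:.1%}")
# agree: 10.5%, differ: 89.5%
```

<p>Even with one name chosen by almost a third of these hypothetical
subjects, a given subject's choice differs from that of most other
subjects.</p>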
<p>Various factors have been found to influence the selection of
what is believed to be the appropriate word in a given context. A
study by Labov [<a href="#Labov">Labov</a>] showed subjects
pictures of individual items that could be classified as either
cups or bowls, as shown in Figure 1. These items were presented in
one of two contexts: a neutral context, in which the pictures were
simply presented, and a food context (subjects were asked to think
of the items as being filled with mashed potatoes).</p>
<div class="figure"><a name="d0e90" id="d0e90"></a>
<p class="title c2">Figure 1. Cup- and bowl-like objects of various
widths (ratios 1.2, 1.5, 1.9, and 2.5) and heights (ratios 1.2,
1.5, 1.9, and 2.4). From Labov [<a href="#Labov">Labov</a>].</p>
<div class="mediaobject c3"><img src="resources/jones-figure1.png"
align="middle" alt=
"Cup- and bowl-like objects of various widths (ratios 1.2, 1.5, 1.9, and 2.5) and heights (ratios 1.2, 1.5, 1.9, and 2.4). From Labov [Labov]."></div>
</div>
<div class="figure"><a name="d0e99" id="d0e99"></a>
<p class="title c2">Figure 2. The percentage of subjects who
selected the term 'cup' or 'bowl' to describe the object they were
shown (the paper did not explain why the figures do not sum to
100%). From Labov [<a href="#Labov">Labov</a>].</p>
<div class="mediaobject c3"><img src="resources/jones-figure2.png"
align="middle" alt=
"The percentage of subjects who selected the term &lsquo;cup&rsquo; or &lsquo;bowl&rsquo; to describe the object they were shown (the paper did not explain why the figures do not sum to 100%). From Labov [Labov]."></div>
</div>
<p>The results (Figure 2) showed that as the width of the item seen
was increased, an increasing number of subjects classified it as a
bowl. Introducing a food context shifted subjects' responses
towards classifying the item as a bowl at narrower widths.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Recognizing
words</h2>
</div>
<p>Human languages have a relatively fixed set of letter sequences
that are acknowledged by speakers of a language as being 'root
words' (this glosses over the heated discussions that sometimes
occur over what letter sequences should be treated as root words).
Additional words can be derived from these words using language
specific rules (e.g., <span class="emphasis"><em>write -&gt;
writes, writing, written; writer</em></span> could be treated as
either a derived or a root word).</p>
<p>Identifiers sometimes contain more than one word. In this case
readers need to either use their knowledge of existing words to
subdivide an identifier's character sequences, or use deduction
based on common naming conventions to extract words (e.g.,
<tt class="literal">IsHot</tt> is likely to be interpreted as the
phrase 'is hot', rather than 'I shot').</p>
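<p>The convention-based subdivision described above can be sketched in a
few lines of Python. This is only an illustration, assuming that
underscores and a change from lower to upper case mark word boundaries;
the function name <tt class="literal">split_identifier</tt> is
hypothetical, and nothing here models how readers actually perform the
task.</p>

```python
import re

# Word boundary: a run of underscores, or the gap between a lowercase
# letter or digit and a following uppercase letter ("camel case").
_BOUNDARY = re.compile(r"_+|(?<=[a-z0-9])(?=[A-Z])")

def split_identifier(name):
    """Split an identifier into lowercase words using naming conventions only."""
    return [part.lower() for part in _BOUNDARY.split(name) if part]

print(split_identifier("IsHot"))       # ['is', 'hot']
print(split_identifier("blue_pos"))    # ['blue', 'pos']
print(split_identifier("fragstotal"))  # ['fragstotal']
```

<p>An all-lowercase name such as <tt class="literal">fragstotal</tt> gives
a convention-based splitter nothing to work with, which is where a
reader's knowledge of existing words has to take over.</p>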
<p>Identifiers often have the form of one or more abbreviated
words. A study by Ehrenreich and Porcu [<a href=
"#Ehrenreich-">Ehrenreich-</a>] found that readers' performance in
reconstructing the original word, from an abbreviated form, was
significantly better when they knew the rules used to create the
abbreviation (81-92% correct), compared to when the abbreviation
rules were not known (at best 62% after six exposures to the letter
sequences). Because the experiment described in this article was not
intended to measure subjects' abbreviation-to-word reconstruction
performance, no identifiers containing rarely occurring abbreviations
were used.</p>
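<p>One way to see why knowing the abbreviation rule helps is that a known
rule can be applied forwards to candidate words and the results compared
with the abbreviation. The sketch below assumes two simple rules,
truncation and vowel deletion, chosen purely for illustration; they are
not claimed to be the rules taught in the study, and the function names
and vocabulary are hypothetical.</p>

```python
def truncate(word, length=4):
    """Truncation rule: keep the first `length` letters."""
    return word[:length]

def drop_vowels(word):
    """Vowel-deletion rule: keep the first letter, drop every later vowel."""
    return word[0] + "".join(ch for ch in word[1:] if ch not in "aeiou")

def expand(abbreviation, rule, vocabulary):
    """Words in `vocabulary` that the known rule abbreviates to `abbreviation`."""
    return [word for word in vocabulary if rule(word) == abbreviation]

# Invented vocabulary, for illustration only.
vocabulary = ["position", "possible", "message", "massage", "number"]
print(expand("posi", truncate, vocabulary))     # ['position']
print(expand("mssg", drop_vowels, vocabulary))  # ['message', 'massage']
```

<p>Even with the rule known, an abbreviation may map back to more than one
word, so the surrounding context still matters.</p>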
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Studies of
meaning assignment</h2>
</div>
<p>While there have been no other published studies of how people
assign a meaning to identifiers there have been a few studies of a
similar nature for words.</p>
<p>A study by Nickerson and Cartwright [<a href=
"#Nickerson-">Nickerson-</a>] asked subjects to write down as many
different meanings as they could for each word (words were presented
one at a time, in written form, for 30 seconds). Combining the results
from all subjects
showed that words were often given over 6 and sometimes as many as
20 different meanings. The majority of the responses for a given
word were usually contained within one or two meanings.</p>
<p>Word association is an activity that has some similarities to
providing a meaning for a word. Studies of word association give
subjects a word and ask them to write down the first meaningfully
related word that comes to mind (e.g., <span class=
"emphasis"><em>doctor -&gt; nurse</em></span>).</p>
<p>The results of these studies<sup>[<a name="d0e145" href=
"#ftn.d0e145" id="d0e145">3</a>]</sup> have found that there is
rarely a single answer, a wide range of responses is given, and
words given by subjects do not always overlap those of other
subjects.</p>
<p>A subject's age has also been found to be a factor in word
association performance. A study by Hirsh and Tree [<a href=
"#Hirsh-">Hirsh-</a>] compared the responses of young (21-30) and
older (66-81) adults to 90 stimulus words. The results showed that
the same word was the most popular response in both age groups in
36 out of 90 cases (when the top three
responses were considered the overlap between groups was 57%). They
also found that the younger group produced a wider range of
responses, and that members of the older group were much more
likely to select the most popular response for their group (40%,
against 20% for the younger group).</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Experimental
setup</h2>
</div>
<p>The experiment was performed during two 30 minute sessions on
different days of the 2003 ACCU conference held in Oxford, UK.
Subjects were given a brief introduction to the experiment, during
which they filled out background information about themselves. They
then spent 15 minutes working on the identifier list. All subjects
volunteered their time and were anonymous.</p>
<p>The first part of this paper describes the background of the
subjects and how this information was collected.</p>
<p>Almost any sequence of characters could serve as an identifier.
However, the initial list of identifiers considered for use in the
experiment was obtained by extracting all identifiers that were
common to the source code of a variety of programs. These programs
were the Linux kernel, the game Doom, gcc (the GNU Compiler
Collection), the Netscape internet browser, the PostgreSQL database, AFS
(Andrew File System) from IBM, and OpenMotif from The Open Group.
It was hoped that usage in a wide variety of programs was an
indication that an identifier had a significant meaning to a large
number of developers. This method also removes experimenter bias
from the choice of identifier names (but not from the choice of
programs to consider).</p>
<p>The initial list was refined by removing those identifiers that
were the names of standard library functions (these might be
recognized as such and their library meaning given as a response),
or contained rarely occurring abbreviations, or contained a single
character. The resulting list of identifiers was randomized and
printed one per line on A4 sheets of paper.</p>
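<p>The selection steps described above might be sketched as follows. This
is an illustration under stated assumptions, not the tooling actually used:
'common to' is taken to mean present in every program, the judgement that a
name contains a rarely occurring abbreviation is represented by a
hand-made set, and all names and data in the example are invented.</p>

```python
import random

def candidate_identifiers(identifiers_by_program, library_names, rare_abbreviations):
    """Identifiers common to every program, minus the excluded kinds."""
    common = set.intersection(*identifiers_by_program.values())
    kept = [
        name for name in common
        if name not in library_names          # standard library functions
        and name not in rare_abbreviations    # judged to contain a rare abbreviation
        and len(name) > 1                     # single-character names
    ]
    return sorted(kept)

def randomised(names, seed):
    """Return the names in a reproducible random order."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)
    return shuffled

# Tiny invented identifier sets; the real inputs were whole source trees.
identifiers_by_program = {
    "program_a": {"i", "printf", "pagenum", "drop", "xfrm"},
    "program_b": {"i", "printf", "pagenum", "drop", "xfrm", "rover"},
    "program_c": {"i", "printf", "pagenum", "drop", "xfrm", "purge"},
}
names = candidate_identifiers(identifiers_by_program,
                              library_names={"printf"},
                              rare_abbreviations={"xfrm"})
print(names)                      # ['drop', 'pagenum']
print(randomised(names, seed=1))  # same names; the order depends on the seed
```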
<p>All subjects from both groups saw an identical list of
identifiers. However, one group was told that the identifiers came
from a multiplayer game, while the other was told that they came from the
Linux kernel. The instructions given were:</p>
<div class="blockquote">
<blockquote class="blockquote">
<p>The following pages contain identifiers that have been extracted
from the source of {a very large multiplayer game program}/{the
Linux kernel}. For each identifier:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>when you first see the identifier, write down any ideas that pop
into your head about what it might represent,</p>
</li>
<li>
<p>briefly (5-10 seconds is sufficient) think about what the
identifier might represent. Write any new ideas you have on a
separate line.</p>
</li>
</ol>
</div>
</blockquote>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Threats to
validity</h2>
</div>
<p>There are a number of reasons why the responses given in this
experiment might not be valid in a source code comprehension
context. These include:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>developers are not usually asked to provide the kind of
information that they were asked to provide in this experiment. It
is possible that the subjects were unsure of the responses expected
of them, or misinterpreted the instructions they were given,</p>
</li>
<li>
<p>identifiers invariably exist within a context when they are read
in source code. For instance, there are other identifiers (e.g.,
the name of the function in which an identifier is referenced)
whose names often provide a subcontext,</p>
</li>
<li>
<p>providing a possible meaning for an identifier requires a lot of
intellectual effort. It is unusual for developers to be asked to
provide a meaning to so many identifiers over such a relatively
short period of time. Over the period of the experiment fatigue may
have caused subjects' performance to decline, because of the high
cognitive work load.</p>
</li>
</ul>
</div>
<p>A few of the subjects had a different cultural background from
the majority of the subjects (i.e., they were not British). It is
possible that these subjects made use of different cultural
conventions when assigning meaning to identifiers. For instance, in
the US politicians <span class="emphasis"><em>run</em></span> for
office, while in Spain and France they <span class=
"emphasis"><em>walk</em></span>, and in Britain they <span class=
"emphasis"><em>stand</em></span> for office.</p>
<p>It is possible that on the first day I failed to point out
during the introduction that the identifiers were extracted from a
multiplayer game (I did point out that the identifiers came from
the Linux kernel on the second day). This information is given in
the instructions, but it is possible that subjects did not read the
sentence containing this information.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Results</h2>
</div>
<p>The 45 subjects produced a total of 1662 responses (34.8% Linux,
65.2% game), covering 74 different identifiers. There were
179 responses (for 45 different identifiers) where the subject had written
&quot;none&quot; (or a question mark, or a dash). The identifiers were
printed on both sides of the page and some subjects only gave
responses for identifiers appearing on the odd numbered pages. In
this case the identifiers appearing on the even numbered pages were
not counted as &quot;none&quot;. See Table 1 for a summary
of responses.</p>
<p>Each subject's response for each identifier needed to be
classified. The following process was intended to ensure that the
person doing the classification (your author) was not influenced by
information about the subject who gave the response (i.e., whether
the subject belonged to the Linux or games group, and which
responses were given by the same subject). Every response was
automatically assigned a random number and the resulting list of
identifier/response pairs was sorted. This list of randomised
responses was the one used for classification.</p>
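<p>A blinding step of this kind could look like the sketch below. It
assumes the list was sorted on the random number itself, which the
description above does not state explicitly; the record layout, the
function name <tt class="literal">blind_order</tt> and the sample
responses are all invented for illustration.</p>

```python
import random

def blind_order(responses, seed=None):
    """Return (identifier, response) pairs in an order that hides who wrote them.

    `responses` holds (subject, group, identifier, text) tuples. Each one
    gets a random key, the list is sorted on that key, and the subject
    and group fields are dropped before classification.
    """
    rng = random.Random(seed)
    keyed = [(rng.random(), identifier, text)
             for _subject, _group, identifier, text in responses]
    keyed.sort(key=lambda item: item[0])
    return [(identifier, text) for _key, identifier, text in keyed]

# Invented records, for illustration only.
responses = [
    (1, "game", "blue_pos", "position of blue player"),
    (1, "game", "pagenum", "page number"),
    (2, "linux", "blue_pos", "index into array"),
    (2, "linux", "pagenum", "memory page number"),
]
for identifier, text in blind_order(responses, seed=2003):
    print(identifier, "->", text)
```

<p>Dropping the subject and group fields before the list is read is what
keeps the classifier blind; the random ordering additionally breaks up
runs of responses written by the same subject.</p>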
<p>Certain words and phrases occurred several times in the
responses and were assumed to imply a game context, but not a Linux
context. These included: <span class=
"emphasis"><em>player</em></span>, <span class=
"emphasis"><em>game</em></span>, <span class="emphasis"><em>skill
level</em></span>, and <span class=
"emphasis"><em>shoot</em></span>. While some words appear to have
an obvious games meaning (e.g., <span class=
"emphasis"><em>kill</em></span>), if it was possible that they also
had a Linux meaning they were not classified as being games
related.</p>
<p>Words and phrases that might be claimed to be a strong
indication of a Linux context (e.g., <span class=
"emphasis"><em>Linux, operating system</em></span>) rarely occurred
in the responses. Much of the functionality provided by an
operating system (e.g., Linux) might reasonably also be expected to
be provided internally within a game. For instance, <span class=
"emphasis"><em>virtual memory</em></span> refers to a memory
management mechanism used by both operating systems and games
(which, for efficiency reasons, might swap unneeded game
information out of fast memory). This overlap in functionality,
which many subjects are likely to be aware of, makes it difficult
to reliably classify any responses as belonging to a Linux
context.</p>
<p>A games context was assigned to 134 responses (12.4% of
responses made by games subjects) scattered over 33 different
words. A Linux context was assigned to 10 responses (1.2% of
responses made by Linux subjects) scattered over 6 different
words.</p>
<p>The forms of the meanings given were such that it was rarely
possible to definitely specify which group a response belonged to.
For instance, for the identifier <tt class="literal">blue_pos</tt>
many subjects gave a response of the form <span class=
"emphasis"><em>position of some blue thing</em></span>. In itself
this response is not sufficient to be able to assign a Linux or
game context. Additional information such as <span class=
"emphasis"><em>index into array</em></span> could apply in either
context, while use of the word <span class=
"emphasis"><em>player</em></span> would suggest a game context.</p>
<p>In many cases the responses described a possible role that the
identifier might fill, e.g., flag, or counter, while in other cases
subjects simply expanded an identifier to a non-abbreviated form,
e.g., gave <span class="emphasis"><em>page number</em></span> as
the response to <tt class="literal">pagenum</tt>.</p>
<p>The responses contained fewer different meanings per identifier
than the Nickerson and Cartwright [<a href="#Nickerson-">Nickerson-</a>] study. However, this
experiment did not explicitly request subjects to list all possible
meanings of an identifier.</p>
<div class="table"><a name="d0e266" id="d0e266"></a>
<p class="title c2">Table 1. <span class="bold">Responses.</span>
The five most common responses for identifiers having more than 20
responses (&quot;most&quot; indicates that most responses had this form).</p>
<table summary=
"Responses. The five most common responses for identifiers having more than 20 responses (&ldquo;most&rdquo; indicates that most responses had this form)."
border="1" cellspacing="0">
<tr>
<th>Identifier</th>
<th>Number of Responses</th>
<th>Responses (number)</th>
</tr>
<tr>
<td>accurate</td>
<td align="center">27</td>
<td>flag (12), none (6), numeric value (4), game (3)</td>
</tr>
<tr>
<td>answer</td>
<td align="center">29</td>
<td>input value (9), result (4), none (4), string value (2), game
(1)</td>
</tr>
<tr>
<td>blue_pos</td>
<td align="center">42</td>
<td>game (15), none (14), position of (10)</td>
</tr>
<tr>
<td>body</td>
<td align="center">30</td>
<td>none (7), game (7), code (7), html (3)</td>
</tr>
<tr>
<td>children</td>
<td align="center">19</td>
<td>tree structure (6), processes (3), OO (3), counter (2), none
(1)</td>
</tr>
<tr>
<td>cur_mode</td>
<td align="center">40</td>
<td>cursor (4), current mode (24), none (2), game (2), linux
(1)</td>
</tr>
<tr>
<td>def</td>
<td align="center">19</td>
<td>definition (6), define (6), none (2), language preprocessor
(2)</td>
</tr>
<tr>
<td>digest</td>
<td align="center">44</td>
<td>cryptography (12), summary (9), eat (8), none (6), game
(3)</td>
</tr>
<tr>
<td>disconnected</td>
<td align="center">28</td>
<td>flag (most), not connected (1), game (1)</td>
</tr>
<tr>
<td>driver</td>
<td align="center">38</td>
<td>device driver (15), game (5), none (4)</td>
</tr>
<tr>
<td>drop</td>
<td align="center">44</td>
<td>delete/discard (11), game (8), none (4), connection (4)</td>
</tr>
<tr>
<td>event_mask</td>
<td align="center">21</td>
<td>bit map/mask (all)</td>
</tr>
<tr>
<td>force</td>
<td align="center">22</td>
<td>physical force (8), none (4), flag (4)</td>
</tr>
<tr>
<td>fraction</td>
<td align="center">23</td>
<td>mathematical (16), ration (2), none (2)</td>
</tr>
<tr>
<td>fragstotal</td>
<td align="center">32</td>
<td>total fragments (12), game (10), memory fragments (2), 'frags'
(2), none (1)</td>
</tr>
<tr>
<td>inactive</td>
<td align="center">24</td>
<td>flag (most), none (1), game (1)</td>
</tr>
<tr>
<td>inc</td>
<td align="center">33</td>
<td>increment (most), none (3), include (3)</td>
</tr>
<tr>
<td>last_sent</td>
<td align="center">40</td>
<td>time message sent (most), none (2)</td>
</tr>
<tr>
<td>levels</td>
<td align="center">32</td>
<td>level count (most), game (8), none (3)</td>
</tr>
<tr>
<td>Lock</td>
<td align="center">44</td>
<td>concurrency (most), game (2), none (1), lake (1)</td>
</tr>
<tr>
<td>magnitude</td>
<td align="center">45</td>
<td>size of (most), absolute value (5), none (3), game (3)</td>
</tr>
<tr>
<td>mirror</td>
<td align="center">45</td>
<td>copy/cache/backup (most), none (6), game (5)</td>
</tr>
<tr>
<td>misses</td>
<td align="center">34</td>
<td>count of (most), game (6), cache (4), wife (1), none (1)</td>
</tr>
<tr>
<td>near</td>
<td align="center">39</td>
<td>close (11), shortptr (8), none (8), game (2)</td>
</tr>
<tr>
<td>numsegs</td>
<td align="center">44</td>
<td>number of segments (most), linux (4), game (2), none (1)</td>
</tr>
<tr>
<td>origin</td>
<td align="center">24</td>
<td>coordinates (most), parent (1)</td>
</tr>
<tr>
<td>outside</td>
<td align="center">39</td>
<td>none (9), flag (7), game (4), linux (1)</td>
</tr>
<tr>
<td>pagenum</td>
<td align="center">45</td>
<td>number (15), document (14), memory (3), counter (3), none
(2)</td>
</tr>
<tr>
<td>picture</td>
<td align="center">24</td>
<td>image (13), pointer (3), none (2), Cobol (1)</td>
</tr>
<tr>
<td>play</td>
<td align="center">45</td>
<td>sound (14), start something (11), game (8), none (2)</td>
</tr>
<tr>
<td>position</td>
<td align="center">35</td>
<td>location/coordinates (most), in list (5), game (5)</td>
</tr>
<tr>
<td>purge</td>
<td align="center">41</td>
<td>clean out (13), delete (12)</td>
</tr>
<tr>
<td>quick</td>
<td align="center">44</td>
<td>flag (most), none (10), fast (9), game (4), optimization
(3)</td>
</tr>
<tr>
<td>registered</td>
<td align="center">45</td>
<td>flag (most), registration (6), signed on (4), Linux (2), game
(1)</td>
</tr>
<tr>
<td>reliable</td>
<td align="center">43</td>
<td>none (9), trustworthy (5), correct (4), communication link (3),
game (2)</td>
</tr>
<tr>
<td>routine</td>
<td align="center">22</td>
<td>function (9), none (6), ordinary normal (3)</td>
</tr>
<tr>
<td>rover</td>
<td align="center">32</td>
<td>none (14), dog (6), data structure (3), car (3)</td>
</tr>
<tr>
<td>self</td>
<td align="center">44</td>
<td>this (18), object (18), game (6)</td>
</tr>
<tr>
<td>single</td>
<td align="center">35</td>
<td>none (8), game (6), singleton (5), flag (3)</td>
</tr>
<tr>
<td>stopped</td>
<td align="center">44</td>
<td>process (11), finished (6), none (2), game (2), flag (1)</td>
</tr>
<tr>
<td>transformed</td>
<td align="center">43</td>
<td>none (2), game (2), flag (1), changed (1)</td>
</tr>
<tr>
<td>translation</td>
<td align="center">40</td>
<td>language (12), cartesian (6), none (4)</td>
</tr>
</table>
</div>
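<p>A summary in the shape of Table 1 can be produced by tallying
classified responses per identifier. The sketch below assumes each
response has already been reduced to a single category label; the
function name <tt class="literal">summarise</tt> and the sample data are
invented, and the threshold is lowered in the example to suit the toy
data.</p>

```python
from collections import Counter, defaultdict

def summarise(classified, min_responses=20, top=5):
    """Map each identifier with more than `min_responses` responses to
    (total responses, its `top` most common categories with counts)."""
    counts = defaultdict(Counter)
    for identifier, category in classified:
        counts[identifier][category] += 1
    summary = {}
    for identifier in sorted(counts):
        total = sum(counts[identifier].values())
        if total > min_responses:
            summary[identifier] = (total, counts[identifier].most_common(top))
    return summary

# Invented classified responses, for illustration only.
classified = ([("digest", "cryptography")] * 4 + [("digest", "summary")] * 3
              + [("digest", "eat")] * 2 + [("rover", "dog")] * 2)
for identifier, (total, common) in summarise(classified, min_responses=5).items():
    print(identifier, total, common)
# digest 9 [('cryptography', 4), ('summary', 3), ('eat', 2)]
```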
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e575" id=
"d0e575"></a>Discussion</h2>
</div>
<p>This study set out to investigate the extent to which knowledge
of the application domain affected the meaning assigned
to identifier names. A single experiment was performed, resulting
in a single data point. More measurements, based on responses for
other identifiers and application domains, are needed before it is
possible to draw any general conclusions about the interaction
between developer knowledge of the application domain and the
meaning assigned to identifiers.</p>
<p>However, the 12.4% of game subject responses having a game
context is significantly less than 100%. Some of the possible
reasons for this include:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>subjects implicitly knew that many identifiers appearing in
source code have no direct connection to the application domain.
That is to say, many identifiers are used in the implementation of
some algorithm and the choice of their names is primarily
influenced by this algorithmic context. The meanings assigned to
identifiers reflected this developer knowledge of typical
identifier usage patterns,</p>
</li>
<li>
<p>a failure by subjects to provide all of the information needed
by this study. It is possible that the large number of identifiers
appearing in the handout and the short amount of time available led
to subjects deciding to provide brief, rather than detailed,
responses. Subjects were not aware of the exact nature of the
experiment or the kind of information it was hoped they would
provide.</p>
</li>
</ul>
</div>
<p>A &quot;flag&quot; meaning was given in a surprising number of responses.
This may represent a default response, given when subjects could
not think of anything else to write, or perhaps the identifier
names used in this experiment often have this meaning in source
code.</p>
<p>The responses generally involved concepts encountered in
software engineering.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e593" id=
"d0e593"></a>Conclusion</h2>
</div>
<p>As the first of its kind, this experiment
encountered a number of problems:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>feedback from subjects suggested that in the short space of time
available they were not able to reliably estimate the quantity of
code read/written. Given that few developers regularly measure the
amount of source they have read/written it is not clear that
anybody would be able to provide a reasonably accurate answer to
this question,</p>
</li>
<li>
<p>many of the written responses provided by subjects had a low
information content (i.e., the question being asked was not
answered). Providing subjects with more time and asking them to
provide a detailed response, or interviewing subjects on a
one-to-one basis would solve this problem,</p>
</li>
<li>
<p>feedback from subjects suggested that without the context of the
surrounding code it was difficult to provide what they considered
to be a good interpretation of the likely meaning of an
identifier's name,</p>
</li>
<li>
<p>choosing identifiers based on their occurrence in various
programs may prevent experimenter bias and provide a good
justification for their use, but it severely restricts the semantic
range of identifiers that can be used.</p>
</li>
</ul>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2>Further
reading</h2>
</div>
<p>In many ways identifiers are metaphors. For a fascinating
introduction to metaphors in English see: &quot;Metaphors we live by&quot; by
G. Lakoff and M. Johnson.</p>
<p>For an interesting, and readable, discussion of people's
performance in answering questions they do not know the answer to
see: &quot;Simple heuristics that make us smart&quot; by G. Gigerenzer and P.
M. Todd.</p>
<p>The University of South Florida word association norms can be
downloaded from: http://w3.usf.edu/FreeAssociation</p>
<p>The responses given in the experiment (stripped of subject
background information) can be downloaded from:
http://www.knosof.co.uk/cbook/accu2003.html</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e622" id=
"d0e622"></a>Acknowledgements</h2>
</div>
<p>The author wishes to thank everybody who volunteered their time
to take part in the experiment and the ACCU conference organizers
for making conference slots available to run it.</p>
</div>
<div class="bibliography">
<div class="titlepage">
<h2>Bibliography</h2>
</div>
<div class="bibliomixed"><a name="JSC" id="JSC"></a>
<p class="bibliomixed">[JSC] Justice Standards Clearinghouse:
<span class="bibliomisc"><a href=
"http://it.ojp.gov/jsr/public/index.jsp"
target="_top">http://it.ojp.gov/jsr/public/index.jsp</a></span>,
2004</p>
</div>
<div class="bibliomixed"><a name="Carroll" id="Carroll"></a>
<p class="bibliomixed">[Carroll] J.M.Carroll, <span class=
"citetitle"><i class="citetitle">What's in a Name? An essay on the
psychology of reference</i></span>, W.H.Freeman, 1985</p>
</div>
<div class="bibliomixed"><a name="Ehrenreich-" id=
"Ehrenreich-"></a>
<p class="bibliomixed">[Ehrenreich-] S.L.Ehrenreich and T.Porcu,
&quot;Abbreviations for automated systems: Teaching operators the
rules&quot;, in A.Badre and B.Shneiderman, editors, <span class=
"citetitle"><i class="citetitle">Directions in Human/Computer
Interaction</i></span>, chapter 6, pages 111-135, Ablex Publishing
Corp., 1982</p>
</div>
<div class="bibliomixed"><a name="Furnas-1983" id=
"Furnas-1983"></a>
<p class="bibliomixed">[Furnas-1983] G.W.Furnas, T.K.Landauer,
L.M.Gomez, and S.T.Dumais, &quot;Statistical semantics: Analysis of the
potential performance of key-word information systems&quot;,
<span class="citetitle"><i class="citetitle">The Bell System
Technical Journal</i></span>, 62(6):1753-1805, 1983</p>
</div>
<div class="bibliomixed"><a name="Furnas-1987" id=
"Furnas-1987"></a>
<p class="bibliomixed">[Furnas-1987] G.W.Furnas, T.K.Landauer,
L.M.Gomez, and S.T.Dumais, &quot;The vocabulary problem in human-system
communication: an analysis and a solution&quot;, <span class=
"citetitle"><i class="citetitle">Communications of the
ACM</i></span>, 30(11):964-971, 1987</p>
</div>
<div class="bibliomixed"><a name="Hirsh-" id="Hirsh-"></a>
<p class="bibliomixed">[Hirsh-] K.W.Hirsh and J.J.Tree, &quot;Word
association norms for two cohorts of British adults&quot;, <span class=
"citetitle"><i class="citetitle">Journal of
Neurolinguistics</i></span>, 14(???):1-44, 2001</p>
</div>
<div class="bibliomixed"><a name="Labov" id="Labov"></a>
<p class="bibliomixed">[Labov] W.Labov, &quot;The boundaries of words
and their meaning&quot;, in C.-J.N.Bailey and R.W.Shuy, editors,
<span class="citetitle"><i class="citetitle">New ways of analyzing
variation in English</i></span>, pages 340-373, Georgetown University Press,
1973</p>
</div>
<div class="bibliomixed"><a name="Nelson-" id="Nelson-"></a>
<p class="bibliomixed">[Nelson-] D.L.Nelson, C.L.McEvoy, and
T.A.Schreiber, <span class="citetitle"><i class="citetitle">The
University of South Florida word association, rhyme and word
fragment norms</i></span>, Technical Report ???, University of
South Florida, Aug. 1999</p>
</div>
<div class="bibliomixed"><a name="Nickerson-" id="Nickerson-"></a>
<p class="bibliomixed">[Nickerson-] C.A.Nickerson and
D.S.Cartwright, <span class=
"citetitle"><i class="citetitle">An empirical thesaurus: Meaning
norms for 90 common words, complete tables</i></span>, Technical
Report 85, University of
Colorado at Boulder, Oct. 1979</p>
</div>
</div>
<div class="footnotes"><br>
<hr class="c4" width="100">
<div class="footnote">
<p><sup>[<a name="ftn.d0e41" href="#d0e41" id=
"ftn.d0e41">1</a>]</sup> 'Talking' via text messaging is not
discussed here.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e47" href="#d0e47" id=
"ftn.d0e47">2</a>]</sup> The high cost of having database fields
that represent the same data item (e.g., a person's first name)
under different names (e.g., first_name, given_name, or
christian_name) across multiple databases has led some
organizations to plan to mandate the use of specific names
to denote specific data items [<a href="#JSC">JSC</a>].</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e145" href="#d0e145" id=
"ftn.d0e145">3</a>]</sup> The University of South Florida word
association norms [<a href="#Nelson-">Nelson-</a>] list nearly
three-quarters of a million responses to 5,019 stimulus words
produced by 6,000 participants.</p>
</div>
</p>
</div>
</channel>
</rss>
