Title: Professionalism in Programming #7
Author: Administrator
Date: Fri, 06 April 2001 13:15:45 +01:00
Summary:
Practising safe source
Body:
As 'professional' programmers we have a duty to take responsibility for our work: to guarantee the quality, safety and maintainability of our code. In this column we are going to think not so much about the safety of our executables as about that of our development processes.
Even though we are programmers, we have the responsibility to ensure that our work is:
- safe and secure (i.e. it will not be accidentally lost after three months of development, and cannot be leaked out of the company as trade secret information),
- accessible (i.e. modifiable by the appropriate people, and not by others),
- reproducible (once released, the source is not thrown away and can still be used to build exactly the same application image 16 years later, when the tool versions have changed and its language is no longer supported), and
- maintainable (this does not just include good programming practices, but also the use of 'configuration management' and source control).
We will touch on these issues below. What we need to bear in mind is that we have a responsibility to ensure that these requirements are met, not necessarily to meet them entirely ourselves. Maybe these issues seem tediously removed from the act of writing code, but we should not belittle their importance.
So this is obvious, is it not? We are all well aware that we are stupid if we do not make regular backups of our work. But we are human; if we rely on doing it manually then we will just forget, or leave it too late, or... I am frightened when I discover how much work gets done on computer systems and workstations that are not being backed up[1]. The level of risk is preposterous.
Now, I am not advocating that the professional programmer should make a daily personal backup copy of their hard drive. Rather, their responsibility lies in ensuring that important files are placed on a file system that is being backed up. In fact, it is often best for the responsibility for performing the backups to rest with someone specific, usually the IT department of a company - it is then more likely to get done. After all, it seems that most good programmers have little or no common sense, let alone memory.
There is not a great deal of difference between the requirements for personal and corporate backups; the only real difference is the scale. Backups need to be:
- regular,
- checked and audited,
- easily retrievable, and
- preferably automatic.
For example, even when working on an un-backed-up NT workstation I often save my work on an NFS-mounted Unix file server which is backed up, rather than on the unsafe local disk.
There is a whole realm of technical detail involved in making backups that we are not able to go into in depth in this article. For example, in multi-user environments, when are the backups made? (The answer is usually during the night, when there is less computer activity and less information changing on the file systems being backed up.) We should also note the difference between 'full' and 'incremental' backups - the former being a complete physical copy of a filesystem, the latter a record of the differences between the filesystem now and how it looked at the last backup. There are various benefits and costs to each approach - a mixed approach is usually best.
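As a minimal sketch of such a mixed approach (assuming a Unix system with GNU tar; the paths are hypothetical, and a real scheme would be scheduled from cron and would write to separate backup media):

    # Weekly full backup: a complete copy of everything under /home
    tar czf /backup/home-full.tar.gz /home

    # Nightly incremental: archive only the files changed in the last day
    find /home -type f -mtime -1 -print | tar czf /backup/home-incr.tar.gz -T -

To restore, the last full backup is unpacked first and the subsequent incrementals are layered on top of it.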
The main point we should bear in mind is that our work is not safe if it is not retrievable in the event of human or mechanical failure. Is all your work backed up? If not then I urge you to remedy this now, even if it is 'only' for personal use.
It is one thing ensuring that your source code is obtainable in the event of failure. But how accessible is it - can other developers in your company use it? Perhaps you could work on a shared file server so others can see your source files too, maybe even work on them alongside you. But how ideal is this? How do you make sure two people are not working on the exact same file at the same time, causing all sorts of confusion when they each hit the 'save' button? A better way of managing the code is to use a source control tool.
So what is source control? Well, it is a method for one or many people to work on the same 'repository' of source code in a controlled manner. What does 'controlled' mean? Let us think about some of the problems that would be faced if people were to work on the same repository of code (on a shared disk on a common file server, as in the example above).
There are a number of potential problems:
1. Altering the same file at the same time.
2. Performing a build whilst files are being edited, or even performing two conflicting builds together.
3. Not saving files, leaving them open in the editor whilst they are in an inconsistent state on the disk.
4. Releasing a build of the system when the files are still being worked on, having new features added.
A source control system works around these kinds of problems. It allows each developer to 'check out' their own copy of the common source repository and work on that in isolation, whilst enabling them to keep up to date with all the changes being made by the other users - bringing their copy up to date as and when required. Any changes that have been made can be 'checked back in' to the main repository for all developers to see. The source control tool also allows us to control which users have access to which parts of the code base; the system administrator will have the privilege to set this up[2].
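As an illustration of this cycle (using CVS, which we discuss below; the module and file names here are hypothetical):

    cvs checkout myproject    # take a private working copy of the repository
    cd myproject
    vi server.c               # work on your copy in isolation
    cvs update                # merge in changes committed by other developers
    cvs commit -m "Fix timeout handling"   # check your changes back in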
Some systems do not allow two users to edit the same file at the same time (using a kind of locking mechanism), alleviating problem (1) above. The more sophisticated systems allow users to edit the same files concurrently; the changes are merged back together as they are 'checked back in'.
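With CVS, for instance, if a colleague has committed an overlapping change to a file you have edited, 'cvs update' reports a conflict and marks the clashing region in your working copy for you to resolve by hand (the file content here is hypothetical):

    <<<<<<< server.c
        timeout = 10;     /* your uncommitted change */
    =======
        timeout = 30;     /* the change your colleague checked in */
    >>>>>>> 1.8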
The repository will contain important revision information that allows you to check out any version of each file. Therefore you can make changes in the knowledge that they can be fully backed out of. This is a very powerful weapon indeed. Because the repository is a log of all the changes made, you also get extra benefits (a few example commands follow this list):
- You can undo any change that you make, to any level.
- You can track changes made to the source as you are working on it.
- You can see who changed what and when they did it (and can even do complex searches, such as seeing how much work a single developer has done on a particular product - useful when a product's lifetime is many years).
- You can mark a particular revision (release) of a particular product so that it can be retrieved at any time (useful if your customer reports a bug in a three year old version of your product and you need to look at its source code quickly).
- You can check out a version of the repository as it stood at some particular date.
- You can separate 'bleeding edge' code from stable code via named release points (fixing problems (2) and (4) above).
- ... and so on ...
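To give a flavour of how some of these operations look in practice with CVS (the file name, revision numbers, tag and date here are hypothetical):

    cvs log server.c                   # who changed what, when, and why
    cvs diff -r 1.4 -r 1.5 server.c    # exactly what changed between revisions
    cvs annotate server.c              # which developer last touched each line
    cvs update -D "2000-06-01"         # the working copy as it stood on a date
    cvs checkout -r release-1_0 myproject   # retrieve a marked release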
The main repository can be held on a local machine (I tend to work like this for small pieces of personal work not destined for the main company repository) or on a remote machine, accessed over a network connection. Indeed, good source control systems allow repositories to be accessed within a local network or by developers world-wide. In the latter case they can be very useful indeed, removing the burden of developers in different time zones needing to co-ordinate when they update files. (In fact, I have recently worked like this on a project with half of the development team in the UK and half in the US - and the UK developers split over two sites.)
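With CVS, for example, a remote repository is simply named when you log in and check out (the server name and repository path here are hypothetical):

    cvs -d :pserver:jo@cvs.example.org:/usr/local/cvsroot login
    cvs -d :pserver:jo@cvs.example.org:/usr/local/cvsroot checkout myproject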
If you are using an important company repository, it will be held on a server that is being backed up. This means that there is less of a reliance on personal workstation backups - most of the work you have done is checked in, and the loss of your workstation will not be so critical to the project as a whole.
There are a couple of different ways of working with a source code repository. The 'little and often' approach sees each file checked in whenever any small change is made. The repository therefore contains many, many revisions of each file, each consisting of a very small change. Doing this makes it easy to track changes made during development. The alternative approach is to check in only the big, important changes, i.e. check a version in for each release of the product. This makes it easier to obtain a particular previous version of the product, but much harder to track changes.
I favour the little and often approach, although in many ways the choice may depend on the quality of the source control tool used. Usually a repository will have a mechanism to allow you to mark each major milestone (e.g. CVS' tags) so the little and often approach in no way lacks the capabilities of its counterpart.
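For example, CVS marks a milestone with a symbolic tag that can later be used to retrieve exactly those revisions (the tag name here is hypothetical; note that CVS tag names may not contain dots, hence the underscore):

    cvs tag release-1_0                     # label the revisions in your working copy
    cvs checkout -r release-1_0 myproject   # retrieve that release at any time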
There are a number of different source control systems available, some have an open licence, some are proprietary. Each has its own advantages, each its disadvantages. Often, the choice of system is enforced by company practice. Sadly, this does not always mean it is the right tool for the job.
The father of all version control systems was SCCS (the Source Code Control System), developed at Bell Labs in 1972. I have used this system at a place of work, and by modern standards it is fair to say that it sucks. It has largely been superseded by RCS (the Revision Control System) [RCS]. The most commonly used source control tool in the open source world is CVS [CVS]. It is built upon RCS, and provides a collaborative environment in which several developers can work on the same file at the same time. Whereas RCS uses file locking, CVS is a concurrent system. Any conflicts in check-ins are marshalled by the CVS repository. CVS can work across the internet, and is a very powerful tool. In itself it is a command line program, but there are a number of different front ends available for any conceivable platform - take WinCVS/MacCVS (Windows and Macintosh front ends) [CVSgui], Cervisia (a Unix/KDE front end) [Cervisia], and cvsweb (a web-based interface to CVS) [cvsweb] as examples.
There are a huge number of proprietary source control tools that provide different/better[3] functionality to the free tools available. These include MKS Source Integrity [MKS], Visual SourceSafe [Microsoft], and PVCS [Merant]. Interfaces for both the proprietary and free source control tools can be embedded in the popular IDEs. Although such graphical front ends exist, personally I prefer the raw power and customisability of the command line. Your tastes and/or company policy may dictate which tool you use and in which environment.
If you are looking for a source control tool to begin to use on private projects, I would strongly advise you to take a look at CVS, and one of the available front ends.
Most source control systems also serve as configuration management tools, allowing you to generate multiple 'products' from a single repository of source code. They manage the configuration issues, i.e. which files are included in which product, which versions of those files, what sort of compilation environment[4] is required, which documentation relates to which product variants, etc. Some configuration management systems also integrate bug tracking and work flow control.
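CVS, for example, provides only a modest version of this through its 'modules' administrative file, which maps product names onto parts of the repository (the module and directory names in this sketch are hypothetical):

    # CVSROOT/modules
    common   libs/common              # shared code used by both products
    editor   apps/editor  &common     # checking out 'editor' also pulls in 'common'
    viewer   apps/viewer  &common     # likewise for 'viewer'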
In some senses the term configuration management describes how a source control tool is used during the entire software development process. It is subtly different from pure 'source control', encompassing it and adding a development procedure to its use. However, some tools provide more configuration management capabilities than others.
Using configuration management we would store more than just source code. We would also store object code for which we have no source (e.g. from external suppliers), specifications for each part of the system (tied to packages by the system), test cases, test results, documentation and release notes, tracking information for hardware versions, plus entire build environments.
Finally, we should be thinking about what we do with the source code as we work on it. You can never fully account for human stupidity. 'Top secret' company work should not be left on a laptop in an unlocked car, for example. Likewise source code should not be left on a publicly accessible network.
An element of professionalism includes understanding the terms of any contract that you might be working under - and acting appropriately. Many employment contracts limit who you can discuss work-related information with, and also assign the rights to any work you produce to the company. Tied in with this is the common use of non-disclosure agreements (or NDAs), legally binding agreements which prevent information shared between companies from being leaked to parties outside the agreement.
Programmers, like all computer system users, should be careful to ensure that their log-in passwords are known only to themselves. Outsiders (and even possibly malicious insiders) should not be able to sabotage work by exploiting inappropriate access rights.
We have discussed various methods of working that ensure we take responsibility for the source code we create, and that we handle it in a safe and controlled manner.
I have not really spent much time with any of the proprietary source control tools mentioned in this article. Perhaps readers who have would care to write a few words on their experiences with them for the Members' Experiences section of C Vu.
[RCS] RCS. Available from: www.gnu.org/software/rcs/rcs.html. See also [MKS].
[CVS] CVS. Available from: www.cvshome.org/
[CVSgui] WinCVS/MacCVS. Available from: www.cvsgui.org/
[Cervisia] Cervisia. Available from: cervisia.sourceforge.net/
[cvsweb] cvsweb. Available from: stud.fh-heilbronn.de/~zeller/cgi/cvsweb.cgi/
[PRCS] PRCS. Available from: www.xcf.berkeley.edu/~jmacd/prcs.html
[MKS] MKS. Source Integrity. Available from: www.mks.com/products/scm/si/
[Microsoft] Microsoft. Visual SourceSafe. Available from: www.microsoft.com/
[Merant] Merant. PVCS. Available from: www.pvcs.com/
[1] It seems that NT is, as a workstation system, less likely to be backed up than Unix. Perhaps this is because it is harder to do, or more likely to need an expensive proprietary backup package.
[2] This may seem draconian, but it tends to increase responsibility in programming if certain rights are required before some code can be modified.
[3] Or, in some people's opinion, 'worse'.
[4] The compilation environment is far more important than many people realise. It incorporates issues such as compiler runtime flags and versions, what other software is installed on the machine at the same time, and even the physical build hardware itself.