Process Topics + CVu Journal Vol 29, #4 - September 2017


Title: Navigating a Route

Author: Bob Schmidt

Date: 09 September 2017 16:44:33 +01:00

Summary: Pete Goodliffe helps us work with unfamiliar code.

Body:

...the Investigation of difficult Things by the Method of Analysis ought ever to precede the Method of Composition.
~ Sir Isaac Newton

A new recruit joined my development team. Our project, whilst not vast, was relatively large and contained a number of different areas.

There was a lot to learn before he could become effective. How could he plot a route into the code? From a standing start, how could he rapidly become productive?

It’s a common situation, one that we all face from time to time. If you don’t, then you need to see more code and move on to new projects more often. (It’s important not to get stale from working on one codebase with one team forever.)

Coming into any large existing codebase is hard. You have to rapidly:

Discover where to start looking at the code
Work out what each section of the code does, and how it achieves it
Gauge the quality of the code
Work out how to navigate around the system
Understand the coding idioms, so your changes will fit in sympathetically
Find the likely location of any functionality (and the consequent bugs caused by it)
Understand the relationship of the code to its important satellite parts (e.g., its tests and documentation)

You need to learn this quickly, as you don’t want your first changes to be too embarrassing, accidentally duplicate existing work, or break something elsewhere.

A little help from my friends

My new colleague had a wonderful head start in this learning process. He joined an office with people who already knew the code, who could answer innumerable small questions about it, and point out where existing functionality could be found. This kind of help is simply invaluable.

If you are able to work alongside someone already versed in the code, then exploit this. Don’t be afraid to ask questions. If you can, take opportunities to pair program and/or to get your changes reviewed.

Your best route into code is to be led by someone who already knows the terrain. Don’t be afraid to ask for help!

If you can’t pester people nearby, don’t fear; there may still be helpful people further afield. Look for online forums or mailing lists that contain helpful information and helpful people. There is often a healthy community that grows around popular open source projects.

The trick when asking for help is to always be polite, and to be grateful. Ask sensible, appropriate questions. “Can you do my homework for me?” is never going to get a good response. Always be prepared to help others out with information in return.

Employ common sense: make sure that you’ve Googled for an answer to your question first. It’s simple politeness to not ask foolish questions that you could easily research yourself. You won’t endear yourself to anyone if you continually ask basic questions and waste people’s precious time. Like the boy who cried wolf and failed to get help when he really needed it, a series of mind-numbingly dumb questions will make you less likely to receive more complex help when you need it.

Look for clues

If you are rooting in the murky depths of a software system without a personal guide, then you need to look for the clues that will orient you around the code.

These are good indicators:

Ease of getting the source

How easy is it to obtain the source?

Is it a single, simple checkout from version control that can be placed in any directory on your development machine? Or must you check out multiple separate parts, and install them in specific locations on your computer?

Hardcoded file paths are evil. They prohibit you from easily building different versions of the code.

Healthy projects require a single checkout to obtain the whole codebase, and the code can be placed in any directory on your build machine. Do not rely on multiple checkout steps, or code in hardcoded locations.
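
One way to keep a checkout relocatable is to derive every path from the location of a file inside the checkout, rather than from an absolute path. The following is a minimal Python sketch. It assumes a hypothetical layout in which a helper script sits in a `tools` directory directly under the checkout root and data files live in `data`; both names, and the functions themselves, are illustrative only.

```python
from pathlib import Path


def project_root(anchor: Path) -> Path:
    """Return the checkout root for a file that lives in <root>/tools/.

    Assumption: `anchor` is a file one directory below the root, so the
    root is the parent of the anchor's own directory.
    """
    return anchor.resolve().parent.parent


def data_file(anchor: Path, name: str) -> Path:
    """Build the path to <root>/data/<name> without any absolute prefix."""
    return project_root(anchor) / "data" / name
```

A script inside the hypothetical `tools` directory would call `data_file(Path(__file__), "settings.ini")`, so the same code works wherever the checkout is placed.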

As well as availability of the source code itself, consider availability of information about the code’s health. Is there a CI (continuous integration) build server that continually ensures that all parts of the code build successfully? Are there published results of any automated tests?

Ease of building the code

This can be very telling. If it’s hard to build the code, it’s often hard to work with it.

Does the build depend on unusual tools that you’ll have to install? (How up-to-date are those tools?)

How easy is it to build the code from scratch? Is there adequate and simple documentation in the code itself? Does the code build straight out of source control, or do you first have to manually perform many small configuration tweaks before it will build?

Does one simple, single step build the entire system, or does it require many individual build steps? Does the build process require manual intervention? [1] Can you work on a small part of the code, and only build that section, or must you rebuild the whole project repeatedly to work on a small component?

A healthy build runs in one step, with no manual intervention during the build process.
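
If a project does need several steps, they can at least be wrapped behind one entry point. The sketch below is an assumption about how that might look in Python, not a description of any particular build system. It runs a list of commands in order and stops at the first failure. `EXAMPLE_STEPS` is a hypothetical placeholder that only prints the step names; a real project would substitute its own configure, compile, and test commands.

```python
import subprocess
import sys
from typing import Sequence


def run_steps(steps: Sequence[Sequence[str]]) -> int:
    """Run each command in order; return 0, or the first non-zero exit code."""
    for command in steps:
        print("running:", " ".join(command))
        completed = subprocess.run(list(command))
        if completed.returncode != 0:
            # Stop immediately so a broken step cannot be overlooked.
            return completed.returncode
    return 0


# Hypothetical placeholder steps: each one just prints its own name.
EXAMPLE_STEPS = [
    [sys.executable, "-c", "print('configure')"],
    [sys.executable, "-c", "print('compile')"],
    [sys.executable, "-c", "print('test')"],
]
```

Because `run_steps` returns an exit code and needs no input, a script that ends with `sys.exit(run_steps(EXAMPLE_STEPS))` can be called unchanged from a CI job.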

How is a release build made? Is it the same process as the development builds, or do you have to follow a very different set of steps?

When the build runs, is it quiet? Or are there many, many warnings that may obscure more insidious problems?

Tests

Look for tests: unit tests, integration tests, end-to-end tests, and the like. Are there any? How much of the codebase is under test? Do the tests run automatically, or do they require an additional build step? How often are the tests run? How much coverage do they provide? Do they appear appropriate and well constructed, or are there just a few simple stubs to make it look like the code has test coverage?

There is an almost universal link here: code with a good suite of tests is usually also well factored, well thought out, and well designed. These tests act as a great route into the code under test, helping you understand the code’s interface and usage patterns. It’s also a great place from which to start working on a bugfix (you can start by adding a simple, failing unit test, then fix that test, without breaking the others).

File structure

Look at the directory structure. Does it match the code shape? Does it clearly reveal the areas, subsystems, or layers of the code? Is it neat? Are third-party libraries neatly separated from the project code, or is it all messily intermingled?

Documentation

Look for the project documentation. Is there any? Is it well written? Is it up-to-date? Perhaps the documentation is written in the code itself using NDoc, Javadoc, Doxygen, or a similar system. How comprehensive and up-to-date does this documentation appear?

Static analysis

Run tools over the code to determine the health and to plot out the associations. There are some great source navigation tools available, and Doxygen can also produce very usable class diagrams and control flow diagrams.

Requirements

Are there any original project requirements documents or functional specifications? (In my experience, these often tend to bear little relation to the final product, but they are interesting historical documents nonetheless.) Is there a project wiki where common concepts are collected?

Project dependencies

Does the code use specific frameworks and third-party libraries? How much information do you need to know about them? You can’t learn every aspect of all of them initially, especially because some libraries are huge (Boost, I’m looking at you). But it pays to get a feel for what facilities are provided for you, and where you can look for them.

Does the code make good use of the language’s standard library? Or do many wheels get reinvented? Be wary of code with its own set of custom collection classes or homegrown thread primitives. System-supplied core code is more likely to be robust, well tested, and bug-free.

Code quality

Browse through the code to get a feel for the quality. Observe the amount and the quality of code comments. Is there much dead code (redundant code commented out but left to rot)? Is the coding style consistent throughout?
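
A rough count of commented-out code can be gathered mechanically. The following Python sketch is only a heuristic, resting on stated assumptions: it looks at single-line comments (`//` by default) and treats a comment as probable dead code when its text ends in `;`, `{`, or `}`, as statements in C-family languages tend to. It will miss some cases and flag the odd prose comment. The function name is hypothetical.

```python
import re
from typing import List

# Heuristic: a comment whose text ends like a C-family statement or block.
CODE_LIKE = re.compile(r"[;{}]\s*$")


def commented_out_lines(source: str, marker: str = "//") -> List[int]:
    """Return 1-based numbers of lines that look like commented-out code."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(marker):
            body = stripped[len(marker):].strip()
            if body and CODE_LIKE.search(body):
                hits.append(number)
    return hits
```

Run over each source file, the totals give a quick, approximate picture of where dead code has accumulated.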

It’s hard to draw a conclusive opinion from a brief investigation like this, but you can quickly get a reasonable feel for a codebase from some basic reading.

Architecture

By now you should be able to get a reasonable feel for the shape and the modularisation of the system. Can you identify the main layers? Are the layers cleanly separated, or are they all rather interwoven? Is there a database layer? How sensible does it look? Can you see the schema? Is it sane? How does the app talk to the outside world? What is the GUI technology? The file I/O tech? The networking tech?

Ideally, the architecture of a system is a top-level concept that you learn before digging in too deeply. However, this is often not the case, and you discover the real architecture as you delve into the code.

Often the real architecture of a system differs from the ideal design. Always trust the code, not the documentation.

Perform software archaeology on any code that looks questionable. Drill back through version control logs and ‘svn blame’ (or the equivalent) to see the origin and evolution of some of the messes. Try to get a feel for the number of people who worked on the code in the past. How many of them are still on the team?
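
To put numbers on who wrote a questionable file, the blame output can be tallied. This Python sketch assumes the project is in Git rather than Subversion, and uses `git blame --line-porcelain`, which emits an `author` header for every line of the file. The function names are illustrative.

```python
import subprocess
from collections import Counter


def count_authors(porcelain: str) -> Counter:
    """Count source lines per author in `git blame --line-porcelain` output."""
    counts = Counter()
    for line in porcelain.splitlines():
        # Header lines start at column 0; file content lines start with a tab,
        # so source text that happens to contain "author " is not miscounted.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def blame_authors(path: str) -> Counter:
    """Run git blame on `path` (inside a Git checkout) and tally the authors."""
    output = subprocess.run(
        ["git", "blame", "--line-porcelain", path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return count_authors(output)
```

Calling `blame_authors("some/file.c").most_common()` lists authors by the number of surviving lines, which you can compare against the current team list.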

Learn by doing

A woman needs a man like a fish needs a bicycle.
~ Irina Dunn

You can read as many books as you like about the theory of riding a bicycle. You can study bicycles, take them apart, reassemble them, investigate the physics and engineering behind them. But you may as well be learning to ride a fish. Until you get on a bicycle, put your feet on the pedals and try to ride it for real, you’ll never advance. You’ll learn more by falling off a few times than from days of reading about how to balance.

It’s the same with code.

Reading code will only get you so far. You can only really learn a codebase by getting on it, by trying to ride it, by making mistakes and falling off. Don’t let inactivity prevent you from moving on. Don’t erect an intellectual barrier to prevent you from working on the code.

I’ve seen plenty of great programmers initially paralysed through their own lack of confidence in their understanding.

Stuff that. Jump in. Boldly. Modify the code.

The best way to learn code is to modify it. Then learn from your mistakes.

So what should you modify?

As you are learning the code, look for places where you can deliver an immediate benefit while minimising the chance that you’ll break something (or write embarrassing code).

Aim for anything that will take you around the system.

Low-hanging fruit

Try some simple, small things, like tracking down a minor bug that has a very direct correlation to an event you can start hunting from (e.g., a GUI activity). Start with a small, repeatable, low-risk fault report, rather than a meaty intermittent nightmare.

Inspect the code

Run the codebase through some code validators (like Lint, Fortify, Cppcheck, FxCop, ReSharper, or the like). Look to see if compiler warnings have been disabled; re-enable them, and fix the messages. This will teach you the code structure and give you a clue about the code quality.

Fixing this kind of thing is often not tricky, but very worthwhile; a great introduction. It often gets you around most of the code quickly. This kind of nonfunctional code change teaches you how things fit together and about what lives where. It gives you a great feel for the diligence of the existing developers, and highlights which parts of the code are the most worrisome and will require extra care.
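
Once warnings are re-enabled, a per-file tally of the build log shows where to start. The Python sketch below assumes GCC- or Clang-style diagnostics of the form `path:line:column: warning: message`. Other compilers format their output differently, and paths containing a colon (such as Windows drive letters) are not handled. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "src/a.c:10:5: warning: unused variable 'x'".
WARNING_LINE = re.compile(r"^(?P<path>[^:\s][^:]*):\d+(?::\d+)?: warning: ")


def warnings_per_file(build_log: str) -> Counter:
    """Count compiler warnings per source file in a captured build log."""
    counts = Counter()
    for line in build_log.splitlines():
        match = WARNING_LINE.match(line)
        if match:
            counts[match.group("path")] += 1
    return counts
```

`warnings_per_file(log).most_common(10)` then names the ten noisiest files, which are often the ones that need the most care.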

Study, then act

Study a small piece of code. Critique it. Determine if there are weak spots. Refactor it. Mercilessly. Name variables correctly. Turn sprawling code sections into smaller well-named functions.
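
As a small illustration of that kind of refactoring, here is an invented Python function before and after renaming and extraction. Every name in it (`proc`, `OrderLine`, `order_total`, and so on) is made up for the example. The behaviour is intended to be identical, so the old and new versions can be checked against each other.

```python
from typing import Iterable, NamedTuple


# Before: terse names and one dense loop over anonymous tuples.
def proc(d):
    t = 0
    for x in d:
        if x[1] > 0 and x[2] != "void":
            t += x[0] * x[1]
    return t


# After: the same logic, with the data and each decision given a name.
class OrderLine(NamedTuple):
    unit_price: int  # smallest currency unit, e.g. pence
    quantity: int
    status: str


def is_billable(line: OrderLine) -> bool:
    """A line counts only if something was ordered and it was not voided."""
    return line.quantity > 0 and line.status != "void"


def line_total(line: OrderLine) -> int:
    return line.unit_price * line.quantity


def order_total(lines: Iterable[OrderLine]) -> int:
    return sum(line_total(line) for line in lines if is_billable(line))
```

Keeping the old function around briefly, and asserting that both return the same result on real data, is a cheap safety net while you are still learning what the code is meant to do.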

A few such exercises will give you a good feel for how malleable the code is and how yielding to fixes and modifications. (I’ve seen codebases that really fought back against refactoring.)

Be wary: writing code is easier than reading it. Many programmers, rather than putting in the effort to read and understand existing code, prefer to say “it’s ugly” and rewrite it. This certainly helps them get a deep understanding of the code, but at the expense of lots of unnecessary code churn, wasted time, and in all likelihood, new bugs.

Test first

Look at the tests. Work out how to add a new unit test, and how to add a new test file to the suite. How do the tests get run?

A great trick is to try adding a single, one-line, failing test. Does the test suite immediately fail? This smoke test proves that the tests are not actively being ignored.
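
In Python’s standard `unittest` framework, the one-line failing test might look like the sketch below. The class and function names are hypothetical, and the test case is built inside a function only so that this example can be run on its own. In a real project the single `self.fail(...)` line would go into an existing test file, and the project’s usual test command would be expected to go red.

```python
import unittest


def build_canary_case() -> type:
    """Build a test case whose only test fails on purpose."""

    class Canary(unittest.TestCase):
        def test_suite_is_really_run(self):
            self.fail("deliberate failure: remove once the suite has gone red")

    return Canary


def suite_goes_red(case: type) -> bool:
    """Run the given test case and report whether the run failed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return not result.wasSuccessful()
```

If the normal build or CI run stays green after such a test has been added, the tests are not being run where you think they are.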

Do the tests serve to illustrate how each component works? Do they illustrate the interface points well?

Housekeeping

Do some spit-and-polish on the user interface. Make some simple UI improvements that don’t affect core functionality, but do make the app more pleasant to use.

Tidy the source files: correct the directory hierarchy. Make it match the organisation in the IDE or project files.

Document what you find

Does the code have any kind of top-level README documentation file explaining how to start working on it? If not, create one and include the things that you have learned so far.

Ask one of the more experienced programmers to review it. This will show how correct your knowledge is, and also help future newbies.

As you gain understanding of the system, maintain a layer diagram of the main sections of code. Keep it up-to-date as you learn. Do you discover that the system is well layered, with clear interfaces between each layer and no unnecessary coupling? Or do you find the sections of code are needlessly interconnected? Look for ways of introducing interfaces to bring about separation without changing the existing functionality.

If there are no architectural descriptions so far, yours can serve as the documentation that will lead the new recruit into the system.
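
Introducing an interface between two layers, as suggested above, can be sketched in Python with `typing.Protocol`. Everything here is invented for illustration: `CustomerStore` stands for the boundary between a presentation layer and a storage layer, and `InMemoryCustomerStore` stands in for whatever concrete storage code already exists.

```python
from typing import Dict, Protocol


class CustomerStore(Protocol):
    """The only thing the upper layer is allowed to know about storage."""

    def name_for(self, customer_id: int) -> str:
        ...


class InMemoryCustomerStore:
    """Stand-in for existing storage code; satisfies CustomerStore structurally."""

    def __init__(self, names: Dict[int, str]) -> None:
        self._names = dict(names)

    def name_for(self, customer_id: int) -> str:
        return self._names[customer_id]


def greeting(store: CustomerStore, customer_id: int) -> str:
    """Upper-layer code: depends on the interface, not on a concrete store."""
    return f"Hello, {store.name_for(customer_id)}"
```

Existing classes need no changes to satisfy a `Protocol`, as long as they already have matching methods. The interface can therefore be added without altering behaviour, and a type checker will then flag callers that reach past it.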

Conclusion

The more you exercise, the less pain you feel and the greater the benefit you receive. Coding is no different. The more you work on new codebases, the more you are able to pick up new code effectively.

Questions

Do you often enter new codebases? Do you find it easy to work your way around unfamiliar code? What are the common tools you use to investigate a project? What tools can you add to this arsenal?
Describe some strategies for adding new code to a system you don’t understand fully yet. How can you put a firewall around the existing code to protect it (and you)?
How can you make code easier for a new recruit to understand? What should you do now to improve the state of your current project?
Does the likely time you will spend working on the code in the future affect the effort and manner in which you learn existing code? Are you more likely to make a ‘quick and dirty’ fix to code that you will no longer have to maintain, even though others will have to later on? Is this appropriate?

Notes

[1] A single, automatic build step means your build can be placed into a CI harness and run automatically.
