An Analysis of Errors in a Reuse-Oriented Development Environment [PDF] is a paper written by William M. Thomas, Alex Delis, and Victor R. Basili. It’s an analysis of a bunch of projects written in Ada and FORTRAN at the NASA Goddard Space Flight Center, with the objective of figuring out what kinds of problems you get with code reuse, depending on the kind of code reuse you practice.

3 categories are defined in the text:

  • verbatim reuse, in which the component is unchanged;
  • reuse with slight modification, in which the original component is slightly tailored for the new application (less than 25% of the code changed);
  • reuse with extensive modification, in which the original component is extensively altered for the new application (25% or more of the code changed).

Additionally, a 4th category, rewriting stuff from scratch, is introduced as a comparison point.
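
To make those thresholds concrete, here’s a minimal sketch (in Python, my own illustration rather than anything from the paper) of how a reused component could be binned into one of the categories based on the fraction of its code that was changed:

    def classify_reuse(changed_lines: int, total_lines: int) -> str:
        # Rough proxy: bin a *reused* component by the fraction of lines changed.
        # Components written from scratch are the paper's fourth category
        # ("new development") and don't go through this classification at all.
        ratio = changed_lines / total_lines
        if ratio == 0:
            return "verbatim reuse"
        elif ratio < 0.25:
            return "reuse with slight modification"
        else:
            return "reuse with extensive modification"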

The paper has an intriguingly long introduction before getting down to the cold, hard facts. I will spare you the authors’ hypotheses, which are explained in great detail in the paper itself.

Ada Vs. FORTRAN

The way Ada and FORTRAN are used at NASA differs in approach. The following differences are noted:

The Ada approach was to develop a set of generics that can be instantiated to support a variety of application types. In contrast, the FORTRAN approach was to develop a collection of libraries specific to each application type. On projects within a very narrow domain, both approaches achieved similar high levels of reuse.

There is more than one valid type of reuse, and different approaches may lead to different ones. A note is added regarding the adaptability of such approaches, though:

[…] When there was a significant change in the domain, the Ada approach achieved a sizable amount of reuse (50% verbatim reuse), while the FORTRAN approach showed less than 10% verbatim reuse […] Thus, it would appear that the parameterized, generic approach is better suited to development in a dynamic, evolving domain.

By parameterization, the authors seem to imply the idea of options and configurations to tweak:

The Ada approach is centered on the development of a reuse library containing generics that can be instantiated with mission-specific parameters to develop new application. […] The FORTRAN approach was to develop separate libraries containing subsystems for certain mission-specific options.
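
The paper’s actual examples are in Ada and FORTRAN; purely as an illustration (in Python, with hypothetical names like TelemetryDecoder and frame_size that aren’t from the paper), here is roughly what the two styles look like — one generic component instantiated with mission-specific parameters, versus a separate, hand-tailored copy per mission:

    # "Ada style": one generic component, instantiated with
    # mission-specific parameters.
    class TelemetryDecoder:
        def __init__(self, frame_size: int, byte_order: str):
            self.frame_size = frame_size
            self.byte_order = byte_order

        def decode(self, raw: bytes) -> list[int]:
            # Split the stream into fixed-size frames and decode each one.
            frames = [raw[i:i + self.frame_size]
                      for i in range(0, len(raw), self.frame_size)]
            return [int.from_bytes(f, self.byte_order) for f in frames]

    # A new mission only supplies its parameters:
    mission_a = TelemetryDecoder(frame_size=4, byte_order="big")
    mission_b = TelemetryDecoder(frame_size=8, byte_order="little")

    # "FORTRAN style": a separate, near-identical routine per mission,
    # copied and tailored by hand; every copy is maintained separately.
    def decode_mission_a(raw: bytes) -> list[int]:
        return [int.from_bytes(raw[i:i + 4], "big")
                for i in range(0, len(raw), 4)]

    def decode_mission_b(raw: bytes) -> list[int]:
        return [int.from_bytes(raw[i:i + 8], "little")
                for i in range(0, len(raw), 8)]

When the domain shifts, the parameterized version only needs new parameter values, while each copied routine has to be reworked by hand, which is consistent with the adaptability note quoted above.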

In some cases, they mention the parameterized components as being simpler than heavily modified ones:

[…] the reused verbatim components […] are smaller in size […] and external dependencies. […] The extensively modified units tend to be the most complex, both in terms of their size and external dependencies

But the generic ones may still be complex, according to metrics that are sometimes different, sometimes similar, and somewhat contradictory:

We see an increasing complexity (expressed both in terms of module size and external dependencies) in the reused components. Also, we see a rise in the number of parameters per subprogram in the verbatim units, suggesting an increasing generality among them.

Pros of code reuse

Although most developers would be easily convinced that code reuse is good, it’s nice to have some evidence for it:

The percentage of effort spent in the Coding/Unit Test phase has dropped from 44% on an early simulator, to only 18% on one of the more recent simulators (Stark, 1993). This suggests that there is a significant leveraging of the stored experience […]

The less you modify code, the less likely it is to have errors and defects:

There is a clear benefit from reuse in terms of reduced error density when the reuse is verbatim or via slight modification. However, reuse through slight modification only shows about a 59% reduction in total error density, while verbatim reuse results in more than a 90% reduction compared to newly developed code.

And the following “bomb” is dropped:

reuse via extensive modification appears to yield no advantage over new code development.

Or, to reword things: if you need to rewrite more than 25% of a piece of code, rewriting it from scratch may be just as good an option when it comes to errors and defects.

But wait, there’s more! The number of defects and errors isn’t the only thing at play. How hard an error is to correct is also important. Here again, reuse yields interesting results:

Basili and Perricone (1984) in their study of a FORTRAN development project, reported that modified components typically required more correction effort than new components.

Whether this is due to the nature of FORTRAN’s maintainability or something that can be extrapolated at large remains to be seen, but the NASA code, both in Ada and in FORTRAN, followed the same tendency.

So when it comes to choosing between rewriting large parts of an application and starting from scratch, starting from scratch may be a very interesting approach! The authors do note that other factors should influence the decision:

Reuse via extensive modification does not provide the reduction in error density that the other modes of reuse yield, and it also results in errors that typically were more difficult to isolate and correct than the errors in newly developed code. In terms of the rework due to the errors in these components, it appears that this mode of development is more costly than new development. However, extensive modification may offer savings in development effort that outweigh the increased cost of rework.

Rushing for a new system isn’t always smarter.

It is also worth noting that the effort to isolate errors is rather uniform across the board:

We do not see much variation in the effort to isolate an error, as the percentage of difficult-to-isolate errors ranges from 12.4% for new components to 14.5% for the extensively modified components.

Although modified components are the hardest to fix in total, the percentage of hard-to-fix bugs is highest in the verbatim reused components:

The reused verbatim components had the highest percentage of errors requiring more than one day to complete an error correction, and the new components had the lowest percentage, while the modified components fell in between.

The explanations given are:

  1. Devs are familiar with new code; it’s faster to fix
  2. Reused components usually had a longer life. Most simple bugs were weeded out already and the tough stuff was left.

The Environment Matters

The authors assert that the approach an organization takes towards code reuse makes a big difference in the results it obtains when attempting to reuse code.

This comes from a few observations, such as the following:

[Other Research] reported that the verbatim reused modules tend to have a smaller interface than newly created units. We observed the opposite: that the verbatim reused modules tend to have more parameters than either the modified or new components.

The authors highlight other conflicts with prior research throughout the text, including things like specification-related errors, but then offer the following explanation:

A difference from the environments examined in those studies is that reuse has been well planned for […] This result suggests that the reused functionality is more likely to be well specified. This is not surprising, since the reused components have been specified previously, with the expectation that they would be reused.

You will not see nearly as many specification errors when you do take the time to specify things, which is reassuring, since it means specifications may still hold some value.

I personally have doubts about how far this can be pushed, especially when considering the content of “Programs, Life Cycles, and Laws of Software Evolution”. Still, it throws reasonable doubt on a strict You-Ain’t-Gonna-Need-It approach; when in doubt, study and specify; it might lead to decent reuse, and flexibility (possibly known as ‘common sense’?) could be valuable.

In Favor of slightly modifying code

Although the paper takes what appears to be a strong stance in favour of verbatim reuse, the following is noted:

Across all errors, we see little difference between the classes of new, extensively modified, and reused verbatim components, as nearly two thirds of the errors in these classes escaped unit test. This is significantly higher than what we observe in the slightly modified components, where only 43% escaped unit test.

Unit tests sucked in most cases, but sucked less with slightly modified code. The quality of the unit tests in place isn’t mentioned, and no mention is made of whether techniques such as property-based testing may have been used.
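
As a purely hypothetical illustration (nothing of the sort appears in the paper), a property-based test of a component’s interface using the Hypothesis library might look like this, where encode and decode are made-up stand-ins for a reused component’s interface:

    from hypothesis import given, strategies as st

    def encode(values: list[int]) -> bytes:
        # Hypothetical component interface: pack small non-negative ints.
        return bytes(values)

    def decode(payload: bytes) -> list[int]:
        return list(payload)

    @given(st.lists(st.integers(min_value=0, max_value=255)))
    def test_round_trip(values):
        # Interface contract: decoding an encoded list gives it back intact.
        assert decode(encode(values)) == values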

In any case, the following explanation is given, which attributes the result to something somewhat coincidental:

It appears that the nature of the changes being made to these components lend themselves well to detection by unit-level verification processes.

But more oddly, the errors that do escape unit tests are way simpler to fix with the slightly modified code:

There is a significant reduction in the slightly modified class compared to the other modes of reuse, in the percentage of difficult-to-complete errors that escape unit test, as only 58% of these errors escape unit test, compared to 87% and 100% in the extensively modified and verbatim classes.

Not much is offered as an explanation except that validation may be more efficient with the slightly modified components, although the authors point out the following:

Slightly modified components […] have a much higher frequency of interface errors than any other class. This suggests that the nature of the modifications is likely to be associated with the interface.

Could it be that unit tests are especially good or bad at figuring out interface errors? Are the hardest-to-fix errors related to interfaces, in general? If a good causal link can be established, this could be something very interesting in terms of error prevention.

As a side note, maybe verbatim reusers shouldn’t drop their unit test phase effort from “44% on an early simulator, to only 18% on one of the more recent simulators”, to reuse an earlier quote!

More Statistics, Facts & Observations

On industry-wide reuse:

Jones (1984) estimates that only 15% of the developed software is unique to the applications for which it was developed.

On how to solve problems and find errors:

code reading was found to be the most effective technique for isolating interface errors, while functional testing was found to be more effective at finding logic errors.

Code is the problem:

Across all classes, “code” is the most common error source […]

Also, everyone is terrible at managing data:

New components are more likely to have data errors than the reused components. […] Basili and Perricone (1984) found the opposite effect, namely, that the modified components had a greater percentage of data errors than did the new components.

Using someone else’s code without changing it much is better than writing your own, in general:

In both the verbatim and slightly modified classes of reuse, the relative amount of rework was less than in new code. This suggests that while there is a cost of increased correction effort per error associated with such reuse, the cost is outweighed by the benefit of the reduced number of errors.

Recommendations?

One thing the authors noticed at NASA is that even with a strong reuse-oriented mentality (some departments were in charge of writing reusable libraries while others were mostly users), reusable code can be hard to find and to understand, and this can be very important:

One might want to investigate techniques to better describe the components stored in the experience base so that the likelihood of a misunderstanding of the function and implementation is lessened. The experience with reuse in an organization and the approach taken toward reuse are likely to influence the nature of errors.

If you were to introduce a more reuse-oriented mentality in your environment, it would appear to be recommended to begin with utility functions that are very generic by nature. Then, as time goes on, you can ramp up the complexity:

The reused components appear to be simpler, have fewer dependencies, and be more parameterized than new components. However, as this organization gained reuse experience, the distinction became less apparent: more and more complex components, at higher levels in the application hierarchy, were reused. As an organization moves toward a reuse-oriented development approach, it must evolve its practices to accommodate the new effects of reuse.

It also reminds us that the longer code reuse is pushed forth, the more complex the reused components are likely to become, and that this, on its own, can be a problem.
