Not long ago, one of our colleagues left the team and joined a company that develops software for embedded systems. There is nothing extraordinary about that: people come and go in every firm, all the time, and their choice is driven by bonuses, convenience, and personal preferences. What we find interesting is something else. Our ex-colleague is sincerely worried about the quality of the code he deals with at his new job, and that has resulted in us writing this joint article. You see, once you have figured out what static analysis is all about, you just don't feel like settling for "simply programming".
Forest Reserves
I find an interesting phenomenon occurring in the world nowadays. What happens when a software development department turns into a secondary entity, not closely related to the company's main line of business? A forest reserve appears. However significant and critical the company's area of activity may be (say, medicine or military equipment), a small swamp forms anyway, where new ideas get stuck and 10-year-old technologies are in use.
Here are a couple of extracts from the correspondence of a man working in the software development department at a nuclear power plant:
And then he says, "What do we need git for? Look, I've got it all written down in my paper notebook."
...
And do you have any version control at all?
Two people use git. The rest of the team use numbered zips at best, though I'm sure about only one person with the zips.
Don't be scared. Software developed at nuclear power plants serves various purposes, and no one has abolished hardware safety systems yet. In that particular department, people collect and process statistical data. Yet the tendency toward swamping is pretty obvious. I don't know why it happens, but the fact is certain. What's interesting, the larger the company, the stronger the swamping effect.
I want to point out that stagnation in large companies is an international phenomenon; things are much the same abroad. There is an article on the subject, but I don't remember its title. I spent quite some time trying to find it, but in vain. If somebody knows it, please send me the link so that I can post it. In that article, a programmer tells the story of working in some military department. It was, naturally, terribly secret and bureaucratic, so much so that it took several months to agree on which level of access permissions he could be granted on his computer. As a result, he wrote a program in Notepad (without ever compiling it) and was soon fired for inefficiency.
Foresters
Now let's get back to our ex-colleague. On arriving at his new office, he experienced something of a culture shock. You see, after spending so much time and effort studying and working with static analysis tools, it's very painful to watch people ignore even compiler warnings. It's like a separate world where they program according to their own canons and even use their own terms. He told me some stories about it, and what I liked most was the phrase "grounded pointers", common among the local programmers. See how close they are to the hardware?
We are proud to have raised within our team a skilled specialist who cares about code quality and reliability. He hasn't silently accepted the established situation; he is trying to improve it.
As a start, he did the following: he studied the compiler warnings, then checked the project with Cppcheck, made some fixes, and thought about how to prevent the typical mistakes in the future.
One of his first steps was preparing a paper aimed at improving the quality of the code the team creates. Introducing and integrating a static code analyzer into the development process might be the next step. It will certainly not be PVS-Studio: first, they work under Linux; second, it's very difficult to sell a software product to such companies. So he has chosen Cppcheck for now, a very good tool for getting started with the static analysis methodology.
I invite you to read the paper he has prepared. It is titled "The Way You Shouldn't Write Programs". Many of the items may look like they were written in Captain Obvious style, but they are real problems he is trying to address.
The Way You Shouldn't Write Programs
Issue 1
Ignoring compiler warnings. When there are many of them in the list, it's easy to miss genuine errors in freshly written code. That's why you should address them all.
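A practical first step, assuming the code is built with GCC or Clang (the article doesn't say which compiler the team uses, and the file name below is made up), is to raise the warning level and, once the backlog is cleaned up, treat new warnings as errors:
g++ -Wall -Wextra -c stats_module.cpp           # enable the common warning sets
g++ -Wall -Wextra -Werror -c stats_module.cpp   # later: fail the build on any new warning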
Issue 2
In the condition of an if statement, a variable is assigned a value instead of being compared against it:
if (numb_numbc = -1) { }
The code compiles in this case, but the compiler produces a warning. The correct code is shown below:
if (numb_numbc == -1) { }
Issue 3
A using namespace std; statement written in a header file pulls that namespace into every file that includes the header, which in turn may lead to calling the wrong functions or to name collisions.
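A minimal sketch of the safer style (the header and function names are invented for the example): qualify names explicitly in headers and keep any using-directives inside .cpp files.
// report.h - illustrative header
#pragma once
#include <string>
#include <vector>

// No "using namespace std;" at file scope; the names are qualified instead.
std::string MakeReport(const std::vector<int> &values);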
Issue 4
Comparing signed variables to unsigned ones:
unsigned int BufPos;
std::vector<int> ba;
....
if (BufPos * 2 < ba.size() - 1) { }
Keep in mind that mixing signed and unsigned variables may result in:
- overflows;
- occurrence of always true or always false conditions and, as a consequence, infinite loops;
- a value larger than INT_MAX may be written into a signed variable (and it will be negative);
- an int-variable participating in addition/subtraction/etc. with an unsigned variable becomes unsigned too (so that negative values turn into large positive ones);
- other unpleasant surprises.
The code sample above incorrectly handles the situation of the ba array being empty. The expression ba.size() - 1 evaluates to an unsigned size_t value: if the array contains no items, it evaluates to 0xFFFFFFFFu (with a 32-bit size_t).
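One way to avoid the underflow is to move the subtraction to the other side of the comparison, so that nothing is ever subtracted from an unsigned zero (a sketch; the exact intent of the original condition is assumed):
// Equivalent to the original check for a non-empty vector,
// but an empty vector now simply makes the condition false.
if (BufPos * 2 + 1 < ba.size()) { }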
Issue 5
Neglecting to use constants may let hard-to-find bugs slip through. For example:
void foo(std::string &str)
{
if (str = "1234")
{
}
}
The = operator is mistakenly used instead of ==. If the str variable were declared as a constant, the compiler would not even compile the code.
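The same function with the parameter declared const; the accidental assignment would then be rejected at compile time:
void foo(const std::string &str)
{
  if (str == "1234")   // with a const parameter, the assignment form would not compile
  {
  }
}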
Issue 6
Pointers to strings are compared instead of strings themselves:
char TypeValue [4];
...
if (TypeValue == "S") {}
Even if the string "S" is stored in the variable TypeValue, the comparison will always return false. The correct way to compare strings is to use the special functions strcmp or strncmp.
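The corrected comparison, assuming TypeValue holds a null-terminated string (strcmp is declared in <string.h> / <cstring>):
if (strcmp(TypeValue, "S") == 0) { }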
Issue 7
Buffer overflow:
memset(prot.ID, 0, sizeof(prot.ID) + 1);
This code may cause several bytes of the memory area right after prot.ID to be cleared as well. Don't mix up sizeof() and strlen(). The sizeof() operator returns the complete size of an item in bytes. The strlen() function returns the string length in characters (without counting the null terminator).
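The fixed call clears exactly the bytes that belong to the field:
memset(prot.ID, 0, sizeof(prot.ID));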
Issue 8
Buffer underflow:
struct myStruct
{
float x, y, h;
};
myStruct *ptr;
....
memset(ptr, 0, sizeof(ptr));
In this case only N bytes will be cleared instead of the whole *ptr structure (N is the pointer size on the current platform). The correct way is to use the following code:
myStruct *ptr;
....
memset(ptr, 0, sizeof(*ptr));
Issue 9
Incorrect expression:
if (0 < L < 2 * M_PI) { }
The compiler doesn't see any error here, yet the expression doesn't do what was intended: it is evaluated as (0 < L) < 2 * M_PI, so the result of the first comparison (0 or 1) is then compared with 2 * M_PI, and in this case the condition is always true. The compiler generates a warning for such expressions. The correct version of the code is:
if (0 < L && L < 2 * M_PI) { }
Issue 10
unsigned int K;
....
if (K < 0) { }
...
if (K == -1) { }
Unsigned variables cannot be less than zero, so the first condition is always false. In the second one, -1 is implicitly converted to UINT_MAX before the comparison, which may or may not be what the programmer intended.
Issue 11
Comparing a variable to a value it can never reach. For example:
short s;
...
if (s == 0xaaaa) { }
The compiler produces warnings against such things.
Issue 12
Memory is allocated with new or malloc but is never freed with delete or free, respectively. It may look something like this:
void foo()
{
std::vector<int> *v1 = new std::vector<int>;
std::vector<int> v2;
v2.push_back((*v1)[0]);
...
}
Perhaps a pointer to the std::vector used to be stored in v2. Now, after modifications of some code parts, that is no longer needed and plain int values are stored instead. Meanwhile, the memory allocated for v1 is never freed, since that wasn't necessary before. To fix the code we should add delete v1 at the end of the function, or use smart pointers. Even better is to finish the refactoring and make v1 a local object, since it no longer needs to be passed anywhere:
void foo()
{
std::vector<int> v1;
std::vector<int> v2;
v2.push_back(v1[0]);
...
}
Issue 13
Memory is allocated through new[] and freed through delete. Or, vice versa, memory is allocated through new and freed through delete[].
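A minimal illustration of the correct pairing (the variable names are made up for the example):
int *items = new int[100];
// ...
delete [] items;   // new[] is paired with delete[]

int *single = new int;
// ...
delete single;     // plain new is paired with plain delete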
Issue 14
Using uninitialized variables:
int sum;
...
for (int i = 0; i < 10; i++)
{
sum++;
}
In C/C++, local variables are not zero-initialized by default. Sometimes such code only appears to run well, but that is merely luck.
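The fix is simply to initialize the variable explicitly:
int sum = 0;
for (int i = 0; i < 10; i++)
{
  sum++;
}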
Issue 15
A function returns a reference or pointer to local objects:
char* CreateName()
{
char FileName[100];
...
return FileName;
}
On leaving the function, FileName will refer to an already-released memory area, since all local objects are created on the stack, so the returned pointer cannot be used safely.
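In C++, one safe option is to return an object that owns its memory, for example std::string (a sketch; the real purpose of the function isn't shown, and the value is made up):
#include <string>

std::string CreateName()
{
  std::string fileName = "report.txt";
  return fileName;   // returned by value, so nothing dangles
}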
Issue 16
Values returned by functions are not checked, even though they may return an error code or -1 in case of failure. A function may report an error while we keep working without noticing or reacting to it, which will result in a sudden program crash at some point. Such defects take a lot of time to debug afterwards.
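A typical case is a C library call whose result is silently dropped. A minimal sketch of checking it (the file name is invented):
#include <cstdio>

bool LoadData()
{
  FILE *f = fopen("data.bin", "rb");
  if (f == NULL)
    return false;   // react to the failure instead of carrying on with a bad handle
  // ... read the data ...
  fclose(f);
  return true;
}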
Issue 17
Neglecting static and dynamic analysis tools, as well as the creation and use of unit tests.
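Since Cppcheck has already been chosen, the simplest way to start is to run it regularly over the source tree, for example (the path is a placeholder):
cppcheck --enable=all src/
The --enable=all switch turns on the style, performance, and portability checks in addition to the default error checks.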
Issue 18
Skimping on parentheses in arithmetic expressions, which results in code like the following:
D = ns_vsk.bit.D_PN_ml + (int)(ns_vsk.bit.D_PN_st) << 16;
In this case, the addition is executed first and only then the left shift (see "Operation priorities in C/C++"). Judging by the program logic, the operations must be executed in the reverse order: shift first, then addition. A similar error occurs in the following fragment:
#define A 1
#define B 2
#define TYPE A | B
if (type & TYPE) { }
The error here is that the programmer forgot to enclose the TYPE macro in parentheses. As a result, the type & A expression is evaluated first, and then its result is OR'ed with B, i.e. (type & A) | B. As a consequence, the condition is always true.
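The corrected versions, with parentheses making the intended order explicit:
D = ns_vsk.bit.D_PN_ml + ((int)(ns_vsk.bit.D_PN_st) << 16);

#define TYPE (A | B)
With the parentheses in place, type & TYPE expands to type & (A | B), which checks whether any of the flag bits is set.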
Issue 19
Array index out of bounds:
int mas[3];
mas[0] = 1;
mas[1] = 2;
mas[2] = 3;
mas[3] = 4;
The mas[3] = 4; expression addresses a non-existing array item, since it follows from the declaration of the int mas[N] array that its items can be indexed within the range [0...N-1].
Issue 20
Priorities of the logical operations && and || are mixed up. The && operator has a higher priority. Example of bad code:
if (A || B && C) { }
This may not conform to the required execution logic. It's often assumed that logical expressions are executed from left to right.
The compiler generates warnings for such suspicious fragments.
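Explicit parentheses remove the ambiguity; which variant is right depends on the intended logic:
if (A || (B && C)) { }   // what the original code actually does, now stated explicitly
if ((A || B) && C) { }   // what a left-to-right reading would suggest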
Issue 21
An assigned value will have no effect outside the function:
void foo(int *a, int b)
{
if (b == 10)
{
*a = 10;
}
else
{
a = new int;
}
}
The address assigned to the pointer a inside the function is not visible to the caller. To make it visible, you need to declare the function in the following way:
void foo(int *&a, int b) {....}
or:
void foo(int **a, int b) {....}
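A sketch of the reference-to-pointer variant; the assignment to a is now visible to the caller (who also becomes responsible for freeing the allocation):
void foo(int *&a, int b)
{
  if (b == 10)
  {
    *a = 10;
  }
  else
  {
    a = new int;   // the caller sees the new address
  }
}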
References:
- "Enough Rope to Shoot Yourself in the Foot. Rules for C and C++ Programming". Allen I. Holub;
- "C++ Coding Standards: 101 Rules, Guidelines, and Best Practices". Herb Sutter, Andrei Alexandrescu;
- "Code Complete". Steve McConnel;
- "C++ Gotchas: Avoiding Common Problems in Coding and Design". Stephen C. Dewhurst;
- "Effective C++: 50 Specific Ways to Improve Your Programs and Designs". Scott Meyers.
Conclusion
I haven't drawn any specific and significant conclusions. I'm only sure that in one particular place the situation with software development is beginning to improve. And that's pleasant.
On the other hand, it makes me sad that many people haven't even heard of static analysis. And these people are usually responsible for serious and important things. The field of programming develops very fast. As a result, those who just keep "doing their job" fail to keep track of contemporary trends and tools in the industry and eventually come to work much less efficiently than freelancers and programmers at startups and small companies.
Thus we get a strange situation. A young freelancer can do his work better (because he knows TDD, continuous integration, static analysis, version control systems, and so on) than a programmer who has worked for 10 years at Russian Railways/a nuclear power plant/... (add your own variant of a large enterprise). Thank God, it's not always like that yet. But it does happen.
Why does this make me sad? I wish we could sell PVS-Studio to them. But they don't have even the slightest suspicion that such tools exist and are useful. :)