
"Hard-C"

Started by October 29, 2014 12:28 AM
8 comments, last by Oolala 10 years, 3 months ago

Two questions, following a quick description.

Been dealing with a lot of C code lately, porting it from being compiled on one system to being compiled on a new system. The old system was pretty conventional, but the new system is really fragile: a lot of things that fell safely into the undefined-behavior domain on the old system result in weird explosions and hangs on the new one. What I really need is some way to hold the C code to a higher standard, and to get it to raise flags when it strays from the straight-and-narrow.

C/C++ seem to have two "standards". One is the actual standard, and the other is this weird de-facto standard that comes about when enough big projects misuse a decision made by a compiler author for long enough that changing that decision, or the compiler/system/etc., becomes problematic.

So....

1- Is there a hardened C or C++ compiler out there that implements all "undefined" behavior as a fail-stop, an exception, or something along those lines? Something for which all behavior is defined, and which can be used to iron out the kinks in existing code.

2- If the answer to (1) is "no", is there interest in such a thing from people other than myself?

GCC (and by extension MinGW, and likely Clang) accepts several command-line arguments that restrict some of the commonly accepted abuses of C++.

I don't remember all the details or which flag does what, but you can do some googling for these: (copy+pasted from a makefile project, with my own possibly inaccurate comments)


#Adds GCC warn-about-every-little-coding-mistake when compiling.
-Wall

#Adds GCC warnings about non-standard code.
-ansi -pedantic

#Turns on Scott Meyers' Effective C++ suggested warnings:
#(See:  http://stackoverflow.com/questions/8174127/the-effective-c-warnings-in-mingw  )
-Weffc++

I've never tried it, but apparently you can selectively make some of those warnings into full errors:

http://stackoverflow.com/questions/475407/make-one-gcc-warning-an-error
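For concreteness, here's a contrived snippet (the file name and code are made up): with something like "gcc -Wall -ansi -pedantic -c toy.c" the 'long long' line should draw a pedantic warning, and with optimization turned on (-O2) -Wall should also flag 'total' as possibly uninitialized.

/* toy.c -- contrived example of the kind of thing -Wall / -ansi -pedantic complain about */
int sum(int *values, int count)
{
    int total;            /* -Wall (-Wuninitialized): "total" may be used uninitialized */
    long long big = 1;    /* -ansi -pedantic: ISO C90 does not support 'long long' */
    int i;

    for (i = 0; i < count; i++)
        total += values[i];

    (void)big;            /* silence the unused-variable warning; not the point here */
    return total;
}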


I think -Wall and -pedantic pretty much cover all the compile-time undefined behaviors. But run-time undefined behavior would be very costly to check. For example, you'd need to check every signed addition for overflow.
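To make that concrete, here's a sketch (my own, not from any library) of the kind of check a fully-defined implementation would have to wrap around every signed addition. Note the test has to happen before the add, because the overflowing add itself is already undefined behavior.

#include <limits.h>
#include <stdlib.h>

/* Detect signed overflow *before* performing the addition. */
int checked_add(int a, int b)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        abort();    /* fail-stop, as the original poster asked for */
    return a + b;
}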

Also, in the real world, undefined behavior sometimes can or even needs to be exploited, because you are so close to the OS and hardware, and there are so many features that the standard cannot and does not want to cover.

I'm hoping to capture things like pointer arithmetic that results in the computation of an out-of-range index, whether or not that index is later dragged back in range, or casting a void pointer to something other than the type it was originally allocated as. GCC with -Wall is a great start, but I'm hoping to also capture the stuff that isn't obvious through localized static analysis, or might even be run-time dependent.
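To illustrate (contrived code, not from the actual benchmarks), these are the two patterns I mean, and as far as I can tell a plain -Wall build at -O0 won't complain about either:

#include <stdlib.h>

void examples(void)
{
    int arr[8];
    int *p = arr + 12;      /* UB: computing a pointer more than one past the end,
                               even before it is ever dereferenced */
    p -= 6;                 /* "dragged back in range" -- but the damage is done */
    *p = 0;

    void *mem = malloc(sizeof(float));
    double *d = mem;        /* allocated with room for a float, used as a double... */
    *d = 1.0;               /* ...so this write runs off the end of the allocation */
    free(mem);
}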

Also, I agree that sometimes it is unavoidable, such as in device driver authorship, but those cases are rare and are often tested with rigor far beyond "normal code". I'm more worried about the stuff some goon programmer thought seemed clever, but that is in fact a bug waiting to happen.

Keep in mind that "undefined behavior" is actually a concept that is used extensively in the standard, at least for C++; I don't know about C, but I presume it's similar. There's no Java-esque version of the language that tries to define run-time behavior for every single edge case. Any time that source code uses some aspect of officially undefined behavior, it's either due to sloppy and/or ignorant coding, or due to an assumption that the compiler and platform will remain consistent.

I think the best option in this case is to begin writing extensive unit tests, and perhaps also get aggressive with the use of debug-build assert()s when possible. At least C code tends to be a bit easier to unit test than C++ code. Extensive use of global and static variables can muck that up some, though, and I would not be surprised if you're dealing with plenty of those.
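Something along these lines is what I mean by aggressive assert() use (hypothetical helper, obviously): preconditions get documented and enforced in debug builds, and defining NDEBUG in release builds compiles the checks away.

#include <assert.h>
#include <stddef.h>

double average(const double *samples, size_t count)
{
    size_t i;
    double sum = 0.0;

    assert(samples != NULL);
    assert(count > 0);      /* otherwise the division below is meaningless */

    for (i = 0; i < count; ++i)
        sum += samples[i];
    return sum / (double)count;
}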

"We should have a great fewer disputes in the world if words were taken for what they are, the signs of our ideas only, and not for things themselves." - John Locke

I don't think there is a general purpose tool to do what you want (in fact, it is probably asking too much) but at least for memory accesses you can use valgrind on unix-like systems, which will track the stack, heap, and data segments and check that all memory accesses are valid. I think it may even handle unaligned reads, but not sure about that.
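For example (toy program; the exact message wording may differ between valgrind versions), something like this appears to run fine natively, but "gcc -g -O0 leak.c -o leak && valgrind ./leak" reports an invalid write of size 4 plus a definitely-lost block in the leak summary:

#include <stdlib.h>

int main(void)
{
    int *buf = malloc(10 * sizeof *buf);
    if (buf == NULL)
        return 1;
    buf[10] = 42;    /* heap overrun: one element past the end of the allocation */
    return 0;        /* buf is never freed, so it shows up as "definitely lost" */
}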

For the rest, you are probably on your own, and just need to ramp up your coding practices: unit tests, modular code, strict adherence to the standard with the system-specific calls you need well encapsulated, and a clear mental image of which types go where. (This helps with type-aliasing analysis: if you keep your types straight and make judicious use of structs, unions, enums and typedefs, the language's type system will assist you; if you just make everything ints and char pointers, you're simply asking for trouble.) And do not ignore warnings.
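A trivial illustration of the "keep your types straight" point (all names made up):

/* Everything-as-int: the compiler can't tell the arguments apart, so swapping
   them compiles silently and fails at run time. */
int draw_sprite_raw(int texture_id, int palette_id);

/* Thin wrapper structs: passing a palette where a texture is expected is now
   a compile-time error instead of a run-time mystery. */
typedef struct { int id; } texture_handle;
typedef struct { int id; } palette_handle;

int draw_sprite(texture_handle tex, palette_handle pal);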

If you have clang, it has in my experience much better diagnostics than gcc, and has the awesome -Weverything flag, which basically turns on every single warning, even those with a high false-positive rate. It's probably not a good idea to use that flag all the time, but you can check your code with it on a regular basis to occasionally discover some very subtle bugs, especially relating to floating-point correctness and structure padding (do be wary because they are sometimes in error, but it's better to discard a warning as erroneous than to never discover a bug because the warning was conservatively turned off!).
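As a small example of the structure-padding case (made-up struct), "clang -Weverything -c pad.c" will, among many other diagnostics, enable -Wpadded and point out the padding bytes inserted here:

/* pad.c */
struct sample {
    char  flag;     /* 1 byte, then (typically) 3 bytes of padding so that... */
    int   count;    /* ...this 4-byte member is properly aligned */
    float ratio;
};

struct sample make_sample(void)
{
    struct sample s = { 1, 10, 0.5f };
    return s;
}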

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”


I'm hoping to also capture the stuff that isn't obvious through localized static analysis, or might even be run-time dependent.

Then why not use a static analyzer? Most static analyzers cost buckets of money, and the free ones aren't too great (but at least are better than nothing). CppCheck is one of the free ones.

Undefined behavior (UB) is sort of a necessary evil. In both C and C++, UB allows the compiler to eliminate what would otherwise be costly checks around operations that may or may not be UB at run time, and it also serves to avoid over-specifying the language, so that it remains portable to a wider variety of platforms than it otherwise would be.
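A classic illustration of the "eliminating costly checks" part: because signed overflow is undefined, the compiler can assume the loop counter below never wraps, derive the trip count, and unroll or vectorize the loop without emitting any wraparound handling; with defined-wraparound semantics it would have to be more conservative.

void scale(float *data, int n, float k)
{
    int i;
    for (i = 0; i < n; i++)    /* compiler may assume i never overflows INT_MAX */
        data[i] *= k;
}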

You can get 90% or more of what you're asking for with a combination of turning on all warnings (GCC's -Wall -pedantic or Microsoft's /W4), having the compiler treat warnings as errors, and static analysis. You can close the gap further by using tools like valgrind, and by using your own system of asserts.

throw table_exception("(╯°□°)╯︵ ┻━┻");

We usually build with "gcc -std=c89 -Wall -Wextra -pedantic -Werror" (or "g++ -std=c++11 ..."), which gets you close. For additional fun, we'll sometimes run through scan-build. We've also used Coverity, which has a program for Free software. Run-time checks can be performed using valgrind. Between all those tools, our software is pretty stable and in widespread daily use by millions all over the planet, so it's probably good enough.

Stephen M. Webb
Professional Free Software Developer

I'm hoping to also capture the stuff that isn't obvious through localized static analysis, or might even be run-time dependent.

Then why not use a static analyzer? Most static analyzers cost buckets of money, and the free ones aren't too great (but at least are better than nothing). CppCheck is one of the free ones.

Because "I'm hoping to also capture the stuff that isn't obvious through localized static analysis, or might even be run-time dependent.", and thus flies under the radar of static analysis.

Regarding improving my coding practices, the main problem here is that I'm using a lot of code that isn't mine. The majority of the code I'm using is external code being used for benchmarking and proof-of-concept of a compilation/optimization workflow. The problem is that, while the compilation flow I'm working on adheres to the C standard, it cannot extend a lot of the implementation-side guarantees that things like MSVC or GCC make. A lot of the benchmarking code I'm working with is large (many programs on the order of 10K to 100K lines of code), and sanitizing all this junk by hand is a real pain.

Thus I was hoping to find a means of saying either that, for a given source and input, the code adheres to the C standard completely, or that it deviates from the standard in situation X.

This topic is closed to new replies.
