
How many of you write self-documenting code?

Started by July 09, 2015 12:54 PM
63 comments, last by SimonForsman 9 years, 3 months ago


e for exponent, or possibly derived from the mathematical constant that is the base of the natural logarithm; n from the set of natural numbers; i from index; and S from summation, often represented in mathematical formulas by the symbol sigma (Σ, also the Greek alphabet equivalent of 'S'). What you are running into here is the tension inherent in jargon: it increases communicative density for the expert, but it baffles the uninitiated.

int exponent;
int summation;
// Why not just...ah, forget it!

P.S. The code we're referencing is from a tutorial site.
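To illustrate the tension being described, here is a minimal sketch (hypothetical code, not the tutorial's actual listing): the same power-by-repeated-multiplication routine written first with terse, math-flavoured names and then with plain-English names.


// Hypothetical example, not the tutorial's code: terse, math-style names.
int power_terse(int b, int e)
{
    int S = 1;                    // S: accumulator ("summation"-style shorthand)
    for (int i = 0; i < e; ++i)   // i: index, e: exponent
        S *= b;
    return S;
}

// The same routine with plain-English names.
int power_plain(int base, int exponent)
{
    int result = 1;
    for (int step = 0; step < exponent; ++step)
        result *= base;
    return result;
}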

It's generally understood that jargon should not be used when explaining concepts to those who aren't familiar with it. When it does come up, it's the responsibility of the instructor to explain what it means to avoid confusion. If I were teaching someone how to play bass, I wouldn't expect him or her to know what an E-minor chord refers to. Similarly, when speaking to friends who don't play many video games, I wouldn't expect them to know what RPG or FPS means.


Learning programming doesn't mean you're learning mathematics, though. Formally, basic algebra and calculus are prerequisites for most undergraduate computer science work; is it entirely unreasonable that tutorials targeting freshman-level programmers assume these familiarities?

Further, code that implements a mathematical formula, rather than a logical heuristic or a response to an input/control event, tends to have a tighter scope and fewer method invocations, in which case naming brevity is tolerable.

Again, I'm not encouraging this kind of naming, I'm merely pointing out where it may have come from and why it may have remained unchallenged: the original, or even just prior, audiences for this material may have had a greater formal grounding before approaching programming.

Your point stands, your example is just poor.


I think type aliases for complex types are awesome, regardless of whether you use typedef or using, but it can get pretty annoying when you type-alias basic built-in types and simple containers all over the place.

Whenever I'm working on something that has to support multiple platforms, I end up aliasing most of the basic built-in types.

As with anything, it's something that can be abused, and it does get annoying when it doesn't have any semantic impact.

e.g. stuff like this is good:


typedef signed char		i8;
typedef unsigned char		u8;

stuff like this is stupid:


typedef signed char        schar;
typedef unsigned char      uchar;

That's a good point. It annoys me to no end when people typedef, for example, intN_t to myprojectprefix_intN or whatever. How many different fixed-width N-bit types can you possibly have? If it has special properties particular to your project (at least N bits, at most N bits, etc.), then it deserves a special name, but if you're just aliasing, what's the point? I can understand old codebases and libraries having their own ifdef'd typedefs of built-in types to fixed-width types, because before the existence of stdint.h you had to do it yourself, and having to keep them for good old backwards compatibility, but namespacing primitive types for the sake of it is idiotic.
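To make the "before stdint.h" point concrete, here's a rough sketch of the kind of ifdef'd fallback old codebases carried around (the HAVE_STDINT_H macro and the alias names are invented for illustration):


// Hypothetical pre-<stdint.h> fallback; HAVE_STDINT_H is assumed to be set by the build system.
#if defined(HAVE_STDINT_H)
    #include <stdint.h>
    typedef int32_t  i32;
    typedef uint32_t u32;
#elif defined(_MSC_VER)
    typedef __int32          i32;   // MSVC's compiler-specific fixed-width type
    typedef unsigned __int32 u32;
#else
    typedef int          i32;       // assumes int is 32 bits on the remaining targets
    typedef unsigned int u32;
#endif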

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

Further, code that implements a mathematical formula, rather than a logical heuristic or a response to an input/control event, tends to have a tighter scope and fewer method invocations, in which case naming brevity is tolerable.

This is a great point. I'd actually go one further and say that if code implements a mathematical formula, then it had better use the same terminology, including abbreviations, as the original formula.
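For instance, here's a small sketch of what that looks like in practice (my own example, not from the thread): the quadratic formula coded with the formula's own symbols, plus a comment pointing back at the formula.


#include <cmath>

// Quadratic formula: x = (-b +/- sqrt(b^2 - 4ac)) / (2a)
// The names deliberately match the formula: a, b, c, discriminant.
bool solveQuadratic(double a, double b, double c, double& x0, double& x1)
{
    if (a == 0.0)
        return false;                         // not a quadratic
    const double discriminant = b * b - 4.0 * a * c;
    if (discriminant < 0.0)
        return false;                         // no real roots
    const double sqrtD = std::sqrt(discriminant);
    x0 = (-b + sqrtD) / (2.0 * a);
    x1 = (-b - sqrtD) / (2.0 * a);
    return true;
}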

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

if code isn't readable, it doesn't need to be commented, it needs to be rewritten.
This.

It annoys me to no end when people typedef, for example, intN_t to myprojectprefix_intN or whatever. How many different fixed-width N-bit types can you possibly have? If it has special properties particular to your project (at least N bits, at most N bits, etc...) then it deserves a special name
That's the C and C++ committees' fault, though. For decades, you had no such thing as <cstdint> in C++ at all, and even now it's only borrowed from C. Really, variable templates (the hell, who needs those!), but no proper support for exact/minimum-size types?

And although C has had <stdint.h> for a long time, compilers didn't necessarily have that file, and that was "OK" since it was not strictly required.

Not to mention the fact that you need to include a file to get typedefs that should probably be keywords. At least the least-size and exact-size stdint types should be part of the language, anyway (the _fast types are a bit silly if you ask me: if I need a particular size, that's what I need, and otherwise I expect the compiler to make it as fast as possible anyway). Heck, wchar_t, char16_t, and char32_t are all keywords, and they are so utterly useless. Apparently, there is no hindrance to adding keywords to the language.
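For reference, the three size families being discussed look like this (a minimal sketch using <cstdint> as it stands today):


#include <cstdint>
#include <cstdio>

int main()
{
    std::int16_t       exact;   // exactly 16 bits; only defined if the platform has such a type
    std::int_least16_t least;   // smallest type with at least 16 bits
    std::int_fast16_t  fast;    // "fastest" type with at least 16 bits; often plain int
    std::printf("%zu %zu %zu\n", sizeof(exact), sizeof(least), sizeof(fast));
    return 0;
}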

I've been writing a collaborative (non-gaming) project with a colleague for about two years now, and while we agree on some things, we definitely have a clash of coding styles. I would typically write variable names of perhaps 5-10 characters (and include a small comment descriptor explaining exactly what it is), whereas he would write longer 'sentences' explaining what is what. So, for example, I would write the following:


int i;                   // Loop counter over all particles
int Npart = 16;          // No. of gas particles in loop
float rpart[3];          // Position of current particle in loop

// Compute forces for all particles
for (i=0; i<Npart; i++) {
  rpart[0] = ...
  ...
}

whereas he would write something like


int number_of_particles = 16;

// Compute forces on all particles
for (int iparticle=0; iparticle<number_of_particles; iparticle++) {
  float position_of_current_particle[3];
  position_of_current_particle[0] = ...
  ...
}

The big problem I have with the long 'sentence' style names is that even small, trivial parts of code (e.g. adding two variables together into a third summation variable) suddenly become way too long to read comfortably, and I simply can't glance over the code to get an overview of what is happening because I can't even see where the variables begin and end easily (maybe using camelCase here would help instead of snake_case??). So I end up going through some parts of his code and 'trimming' the variable names down to size so it's not so verbose.

Another aspect we disagree on is that I declare all my variables either at the top of the subroutine (with a comment next to each) or at the beginning of the current code block (e.g. if/else or for loop), whereas he declares all variables inline just before he needs them. I guess if the code is 'self-documenting', it doesn't really matter where they are declared, but I am a bit of a neat-freak when it comes to coding, so I like to keep things together :-)

Anyway, I always assumed that his style was 'wrong' because I found it hard to read but perhaps reading some of the comments here, it is not so uncommon. My style might also be a little excessive (in the opposite extreme) in that I comment every variable declaration, but I find that helps me to keep a nice coding style. Is there anybody here who uses my style? Or am I the lone 'freak' out there?? :-)


Anyway, I always assumed that his style was 'wrong' because I found it hard to read but perhaps reading some of the comments here, it is not so uncommon. My style might also be a little excessive (in the opposite extreme) in that I comment every variable declaration, but I find that helps me to keep a nice coding style. Is there anybody here who uses my style? Or am I the lone 'freak' out there?? :-)

I tend to go even farther than you. On several projects I'm involved with it is standard practice to start code with a paragraph or more of comment text explaining what the goal of the next section is, how it is supposed to work, and why I'm doing it that way. I spend more time writing and formatting comments than writing the actual code. When it comes time to make a change or edit, then I start writing a comment on what is going to be changed and why, start editing the actual code, and then write a closing remark in the comment to sign off that the intended changes are working as expected.

When someone else involved in the project comes along and has to make changes to the code, then there is zero confusion about what is going on, and they have to be exceptionally thick to misunderstand things because the entire thought process behind the code itself is embedded right there above the code, and if I ever have to dig into code that someone else wrote then I have all the information I need right in front of me. There are additional notes and design info in external docs and specifications, but we've found the 'really insanely detailed notes' right there makes things flow better. There is no "oh, well they are clearly doing this thing with that short bit of code, obviously the bug is related to not checking whatever" which is followed up days, weeks, months, or even years later with "Well, that code was actually meant to work slightly differently than what numb-nuts read it as, and his 'fix' broke two other things."

As for the common argument against it of "Well, once the comments are out of sync with the code, they're worthless and a bigger headache!", the counter is that the comments don't get out of sync. The group is very keen on this, and every change gets looked over and double-checked to confirm that the comments are clear and useful and the code is easy to follow. If anyone breaks that rule, even just reformatting some code layout without changing functionality but without a clear comment about it, that's grounds for dismissal. If you are going to be lazy and careless with the code base, then you're not welcome on the projects.
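As a rough illustration of that style (my own sketch, not code from those projects): the comment block carries the goal, the approach, the reasoning, and a sign-off note for the latest change, and the code itself stays short.


#include <algorithm>
#include <cstddef>
#include <vector>

/*
 * Goal: keep only the N closest particles for the high-detail pass.
 * Approach: partial_sort by squared distance (cheaper than a full sort),
 * then truncate the vector to the requested count.
 * Why this way: a full std::sort showed up in profiles on large scenes.
 *
 * Change note: switched from std::sort to std::partial_sort; verified
 * identical output on the test scene before signing off.
 */
void keepClosest(std::vector<float>& squaredDistances, std::size_t keepCount)
{
    keepCount = std::min(keepCount, squaredDistances.size());
    std::partial_sort(squaredDistances.begin(),
                      squaredDistances.begin() + static_cast<std::ptrdiff_t>(keepCount),
                      squaredDistances.end());
    squaredDistances.resize(keepCount);
}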

Old Username: Talroth
If your signature on a web forum takes up more space than your average post, then you are doing things wrong.

It's generally understood that jargon should not be used when explaining concepts to those who aren't familiar with it. When it does come up, it's the responsibility of the instructor to explain what it means to avoid confusion. If I were teaching someone how to play bass, I wouldn't expect him or her to know what an E-minor chord refers to. Similarly, when speaking to friends who don't play many video games, I wouldn't expect them to know what RPG or FPS means.


Learning programming doesn't mean you're learning mathematics, though. Formally, basic algebra and calculus are prerequisites for most undergraduate computer science work; is it entirely unreasonable that tutorials targeting freshman-level programmers assume these familiarities?

Further, code that implements a mathematical formula, rather than a logical heuristic or a response to an input/control event, tends to have a tighter scope and fewer method invocations, in which case naming brevity is tolerable.

Again, I'm not encouraging this kind of naming, I'm merely pointing out where it may have come from and why it may have remained unchallenged: the original, or even just prior, audiences for this material may have had a greater formal grounding before approaching programming.

Your point stands, your example is just poor.


Yes. I do find it unreasonable (I'll explain why in a moment). You should never make assumptions about what someone does or doesn't already know. This is especially true when it's pertinent to the material that you're teaching.

This is my point of view: I do not find it appropriate to use naming brevity in this manner because it's based on an assumption that everyone else will know exactly what you mean (even without a comment). An individual's previous knowledge should be moot in the teacher's mind to some degree. That's why good instructors will briefly review Algebra, Trig, Geometry, etc. at the beginning of a Calculus course, as mine had in high school. It's because they don't want to assume your immortal knowledge of the subject, and then lose you later on because you're having trouble using the FOIL (first, outer, inner, last) method.

Side note: Taking Calculus has no correlation to assuming "S" means summation in programming. We use the Greek symbol, not a capital "S".

I started this topic because this is a widespread issue. There's really no reason why one programmer should have trouble reading someone else's code without copious lines of comments. Even if it isn't self-documenting by technicalities, why not just use names written in plain English? That is the heart of this discussion, I'd say.

So please stop focusing on an example that I used (it wasn't the best, I will acknowledge that) and let's discuss the real problem: the fact that code has become unreasonably difficult to read due to the use of previous programming conventions, and what we can do to establish a new programming convention that will encourage us to write code in plain English.

I know that the long method names commonly used in JavaScript's DOM (Document Object Model) bother a lot of people, but hey, you know exactly what the function does, right?


var exampleVariable = document.getElementById("exampleElement");

document.getElementsByClassName("className");

document.getElementsByTagName("tagName");

document.getElementById("test").removeAttribute("href");

More here

»sigh«

Yes. I do find it unreasonable (I'll explain why in a moment). You should never make assumptions about what someone does or doesn't already know. This is especially true when it's pertinent to the material that you're teaching.


They are not "assumptions." They are prerequisites. Look at the page you took your example from: what is the code supposed to do? It's an implementation of a power function. Are you saying that if I give you an assignment to write a program that implements a power function, I should not "assume" that you are familiar with what a power function is and the basic nomenclature of describing them? That's just silly talk.

It is perfectly acceptable to hold reasonable, domain-specific expectations of familiarity, and to leverage them within that limited context. Even when programming, we don't describe every routine from scratch; we leverage units that we have organized into functions and classes and methods and modules in order to both improve communicative density to other programmers and to improve machine performance (along the axis of code size).

This is my point of view: I do not find it appropriate to use naming brevity in this manner because it's based on an assumption that everyone else will know exactly what you mean (even without a comment). An individual's previous knowledge should be moot in the teacher's mind to some degree. That's why good instructors will briefly review Algebra, Trig, Geometry, etc. at the beginning of a Calculus course, as mine had in high school. It's because they don't want to assume your immortal knowledge of the subject, and then lose you later on because you're having trouble using the FOIL (first, outer, inner, last) method.


LOL, no. The reason your high school instructors review algebra and trig when starting calculus is because they know that most of you weren't paying attention the first time around, or have forgotten what little you learned. You think Calc IV begins with a refresher on Calc II? You think Systems Design spends the first couple hours revisiting Computer Organization?

Prerequisites are a fact of life. Refreshers are an inefficiency. Where the context is known and pre-disclosed and the scope is acceptably finite, it is perfectly fine to employ more informationally dense nomenclature. Stop trying to foist some incompetence-based absolute on everyone in every situation. Some shit just ain't for beginners.

Side note: Taking Calculus has no correlation to assuming "S" means summation in programming. We use the Greek symbol, not a capital "S".


Rebuttal: "Σ" is not a legal source code character in C. Don't get fresh.

I started this topic because this is a widespread issue. There's really no reason why one programmer should have trouble reading someone else's code without copious lines of comments. Even if it isn't self-documenting by technicalities, why not just use names written in plain English? That is the heart of this discussion, I'd say.


That's mostly fair, but my point is that it is not an absolute. There are context-specific exceptions to your rule of thumb, and In My No-Longer So Humble Opinion mathematical formula code is one of them.

So please stop focusing on an example that I used (it wasn't the best, I will acknowledge that) and let's discuss the real problem: the fact that code has become unreasonably difficult to read due to the use of previous programming conventions, and what we can do to establish a new programming convention that will encourage us to write code in plain English.


You picked the example. I already told you you have a decent point, but picked a poor example. Yet here you are simultaneously defending your example choice and asking me to ignore it? Figure out what you want.

You may think that citing JavaScript's verbosity is a meaningful rejoinder here, but it isn't because you're looking at two different levels of audience. The broader and more general your audience, the more explicit and verbose your nomenclature. System-wide APIs should be as readable as possible, as should libraries intended for consumption by other code. Inner workings only to be maintained by experts can be more terse, because a high degree of familiarity is a prerequisite of maintenance responsibility.

Again, you're not wrong; you're just not completely right.
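A small sketch of that audience split (the names are invented for illustration): the public-facing function spells everything out, while the internal helper can stay terse because its only readers are maintainers who already know the context.


#include <cstdint>

namespace detail {
    // Internal helper: terse is fine here; "lcg" (linear congruential generator)
    // is assumed knowledge for whoever maintains this file.
    inline std::uint32_t lcg(std::uint32_t s) { return s * 1664525u + 1013904223u; }
}

// Public API: verbose and self-explanatory, because callers may know nothing
// about the implementation behind it.
std::uint32_t generateNextRandomNumber(std::uint32_t previousSeedValue)
{
    return detail::lcg(previousSeedValue);
}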


The reason your high school instructors review algebra and trig when starting calculus is because they know that most of you weren't paying attention the first time around, or have forgotten what little you learned.

I can confirm this as truth from empirical self-reflection. Also, it happened again 3 years out of university math class. :)

This topic is closed to new replies.
