
Question about portability

Started by October 30, 2024 04:20 PM
6 comments, last by LorenzoGatti 1 month, 2 weeks ago

Is Func4 the only good answer here? If bufferSize returns a value of only 24 bytes in Visual Studio, should I still use the Func4 example? It seems to be the only option that satisfies the compiler. When I tried using uint8_t, I received warnings about potential data loss. Should I always use size_t in this situation?

/EDIT: I misread the intention, please disregard.

/Original post follows:

No, it is not portable.

In general don't create functions like that with potentially overlapping integral types. It is something that can be done with extreme caution and testing on a specific system, knowing you are in treacherous code.

Overload resolution rules are complex enough, especially once you get into implicit conversions and implementation variations on fundamental types. It's even worse if you use numeric constants like the value 10 directly.

It's the kind of scenario where people comb through the standard and argue over highly specific wording, and then check it against the major compiler vendors to see what they actually do, since an implementation can differ subtly from the language standard. Each compiler can potentially rank the candidates differently depending on its implementation-defined choices for the fundamental types. The overload resolution rules run 23 pages in C++20, pages 314 to 337.
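As a minimal sketch (the function names are made up, not from the original example), here is how quickly overlapping integral overloads and a bare literal like 10 become a problem:

```cpp
#include <cstddef>

// Overlapping integral overloads: which one is picked, or whether this even
// compiles, depends on the implementation-defined fundamental types.
void store(unsigned int) {}
void store(long) {}

// Adding a size_t overload may or may not be a redefinition: on some platforms
// size_t is unsigned long or unsigned long long (no collision here), on others
// it is unsigned int, which would collide with the first overload.
// void store(std::size_t) {}

int main() {
    // store(10);  // ambiguous: int -> unsigned int and int -> long are both
    //             // integral conversions of the same rank
    store(10u);    // OK: exact match for unsigned int
    store(10L);    // OK: exact match for long
}
```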

The only winning move is not to play.


@frob I think he's asking about what integer type to use in general, not whether the overloading in the example is portable (and since the functions have different names, there is no issue with overloading there).

There are differing opinions on using size_t vs. int vs. other types. Some people say to always use int for uniformity, but I don't find those arguments convincing. Using int has a few benefits but several significant drawbacks:

  • Pros:
    • Can represent negative integer values. However, a negative value is often a bug, since most integers represent a count or an index, which cannot be negative.
    • Some operations (e.g. cast from float to int) are faster with int than with unsigned types like size_t, at least on Intel CPUs. The cast is 1 instruction for 32-bit int, while other types require multiple instructions to cast (float to size_t/uint64_t is pretty bad).
  • Cons:
    • A smaller range of positive values (only half as many), because the type is signed.
    • The size can vary depending on the platform. It might be 16, 32 or 64 bits.
    • If used to represent an index or count, bounds checking requires checking ≥0 and <N, rather than just <N as with unsigned types.
    • All std namespace data structures use size_t. If you don't use size_t as well, you will get countless compiler warnings about type conversions and sign mismatches (see the sketch after this list). Don't silence these warnings; they indicate potential bugs.
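A minimal sketch of the kind of warning I mean (the function is made up for illustration):

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

void printAll(const std::vector<int>& values) {
    // Mixing int with size() triggers -Wsign-compare (GCC/Clang) or C4018 (MSVC).
    for (int i = 0; i < values.size(); ++i) {       // warning: signed/unsigned comparison
        std::printf("%d\n", values[i]);
    }

    // Using size_t for the index compiles cleanly.
    for (std::size_t i = 0; i < values.size(); ++i) {
        std::printf("%d\n", values[i]);
    }
}
```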

I prefer to use size_t almost everywhere. It has some benefits:

  • Pros:
    • Since size_t is always as big as a pointer, and unsigned, you always have access to the full range of values supported by the hardware.
    • The size can vary based on the platform, but that's actually good for portability. You always get the largest type that the hardware can manipulate efficiently.
    • No need to check for values <0 when doing bounds checking.
    • Some operations can be faster (e.g. load effective address) if the value size matches the pointer size. With 32-bit int on x64 platforms, I've seen an extra instruction emitted that wasn't there with size_t. This extra leaq instruction seems to happen if loop counters are int or uint32_t, but not with size_t.
    • Good compatibility with std namespace data structures.
  • Cons:
    • On 64-bit platforms, 8 bytes per value is overkill most of the time. This increases the size of data structures which might cause more cache misses in some situations (not likely to be a real issue most of the time).
    • Potential for underflow when doing subtraction. However, this is avoidable in most cases (you should have already checked that A ≥ B elsewhere). If you want to saturate at 0, you can compute max(A,B) - B rather than max(A-B,0); see the sketch after this list.
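A minimal sketch of that last point (the helper name is made up):

```cpp
#include <algorithm>
#include <cstddef>

// Saturating subtraction for unsigned sizes: returns A - B, or 0 if B > A.
std::size_t remaining(std::size_t A, std::size_t B) {
    // max(A, B) - B never wraps. The naive max(A - B, 0) is wrong for unsigned
    // types, because A - B already wrapped to a huge value before max() runs.
    return std::max(A, B) - B;
}
```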

My general advice is to use size_t most of the time. The only cases where I use fixed-size types (uint32_t, uint64_t) are when I need a specific integer size, such as when doing serialization. I almost never use signed integer types, except when the value represents something that can be negative (e.g. an offset from the current file position, as int64_t). There are only a handful of places in my codebase (>1 million LOC) where I use plain “int”, and it's mostly for faster float <-> int conversions where I don't mind the limited range; in that case I actually use int32_t instead of int, in case int is not 32 bits. So I can safely say there is little good reason to use int.
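To make that concrete, here is a small sketch of how the convention looks in practice (the struct and function names are made up for illustration):

```cpp
#include <cstddef>
#include <cstdint>

// Fixed-size types where the exact width matters, e.g. data written to disk.
struct MeshHeaderV1 {
    std::uint32_t vertexCount;
    std::uint32_t indexCount;
};

// size_t for in-memory counts and indices.
std::size_t totalIndices(const MeshHeaderV1& header) {
    return static_cast<std::size_t>(header.indexCount);
}

// int32_t (rather than plain int) where a fast float -> integer conversion is
// wanted and the reduced range is acceptable.
std::int32_t quantize(float value) {
    return static_cast<std::int32_t>(value * 1000.0f);
}
```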

@Aressera thank you very much, you speak my language. You answered my question precisely! Big love!

Aressera said:
No need to check for values <0 when doing bounds checking.

As soon as you have multiple sizes and, e.g., want to compute a remaining size, you start subtracting sizes from each other and may end up in the “negative” part of the range, except it's interpreted as a very large number because the type is unsigned.

In other words, you still need the <0 check, except the threshold isn't around 0 but somewhere up in the 10^9 range.

Wondering what Stroustrup has to say about it, I got a random result from a search engine, https://open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf, which is apparently a standards proposal. There is, however, also quite a bit of general argument about int vs. uint typing for “positive only” values, eventually favoring int.

Alberth said:

Aressera said:
No need to check for values <0 when doing bounds checking.

As soon as you have multiple sizes and, e.g., want to compute a remaining size, you start subtracting sizes from each other and may end up in the “negative” part of the range, except it's interpreted as a very large number because the type is unsigned.

In other words, you still need the <0 check, except the threshold isn't around 0 but somewhere up in the 10^9 range.

In my post I addressed that concern. If you only need to know whether size A ≥ B or A ≤ B, you can do that comparison directly. If you want to know the remainder with saturation at 0, then you can do max(A,B) - B instead of max(A-B,0). If you need to know the negative remainder, then you can cast to signed types, with all the caveats about the reduced range.
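A minimal sketch of those options (the names are illustrative):

```cpp
#include <cstddef>
#include <cstdint>

// Option 1: compare directly; no subtraction, so no underflow.
bool fits(std::size_t available, std::size_t required) {
    return required <= available;
}

// Option 2: saturate at 0 with max(A, B) - B, as in the earlier post.

// Option 3: cast to a signed type when the difference itself may be negative.
// Caveat: values above INT64_MAX would be misinterpreted.
std::int64_t signedRemainder(std::size_t A, std::size_t B) {
    return static_cast<std::int64_t>(A) - static_cast<std::int64_t>(B);
}
```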


Aressera said:
On 64-bit platforms, 8 bytes per value is overkill most of the time. This increases the size of data structures which might cause more cache misses in some situations (not likely to be a real issue most of the time).

While size_t is “pointer friendly”, for small integers of known limits the portable signed and unsigned integer types of guaranteed size (<cstdint> in C++) are likely to be a more flexible and equally safe choice.

For example, a permutation of 7 objects can be a uint_fast16_t, or a uint16_t if you care about padding or total size of structs (or if you don't like surprises). It is definitely convertible to size_t unless pointers are less than 16 bits.
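For instance, a minimal sketch (the struct is made up): 7! = 5040 distinct permutations fit easily in 16 bits.

```cpp
#include <cstddef>
#include <cstdint>

// 7! = 5040 possible permutations of 7 objects, so 16 bits are plenty.
struct PuzzleState {
    std::uint16_t permutationIndex;  // 0 .. 5039; keeps the struct compact
    std::uint16_t moveCount;
};

// Converting to size_t for indexing is safe, since size_t is at least 16 bits.
std::size_t toIndex(const PuzzleState& state) {
    return static_cast<std::size_t>(state.permutationIndex);
}
```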

Omae Wa Mou Shindeiru

This topic is closed to new replies.
