
Data type diversity

Started by July 27, 2022 10:55 AM
51 comments, last by Calin 2 years, 5 months ago

Oberon_Command said:
bitset's size must be known at compile time

Ah OK, your use case is different. I use dynamic bitsets only, so a bit is identified by its index everywhere. The actual index value has little meaning; you make matches by checking for the same index value in other sets.

a light breeze said:
A bool is effectively an enum { false, true }

this confirms my suspicion.

a light breeze said:
The reasons for using bools are basically the same as the reasons for using enums, and you should be using enums all of the time

I'm not really an enum person, but the idea of enums does help me understand things in a broader scope.

My project's Facebook page is “DreamLand Page”


Alberth said:

a light breeze said:
You should still use bools wherever they make sense, even though they are potentially less efficient than chars.

Huh, you really think compiler implementors will pick a less efficient representation?

I know for a fact that Visual C++ prior to 5.0 had 4-byte bools. It's at least possible for a modern compiler to do the same when targeting a CPU on which 32-bit operations are faster than 8-bit operations, trading storage efficiency for runtime performance.

a light breeze said:
But if you have a big container of them, it might make sense to use a char-sized enum instead, for efficiency.

A 32- or 64-bit integer is far more efficient (by a factor of 8 just for storage, and more if you want to perform operations on your bits, as a CPU can then handle 32 or 64 bits in a single instruction; read about “bit twiddling” for more information). For more bits, use a bitset, which afaik is simply std::vector<bool> (but I never tried that).

Sure, if your container is an array (preferably of fixed size), and you know that it contains bools (as opposed to generic code which works on both bools and other values). Other containers exist, like std::deque and std::map.
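To make the storage argument concrete, here is a minimal sketch of packing 64 flags into one integer. The helper names (with_flag, has_flag, common_flags) are hypothetical and chosen only for illustration; nothing here comes from the posters' own code.

```cpp
#include <cstdint>

// Hypothetical helper names, for illustration only.

// Returns `word` with bit `index` set (index must be in 0..63).
constexpr std::uint64_t with_flag(std::uint64_t word, unsigned index) {
    return word | (std::uint64_t{1} << index);
}

// Reports whether bit `index` is set.
constexpr bool has_flag(std::uint64_t word, unsigned index) {
    return ((word >> index) & 1u) != 0;
}

// Flags present in both sets: a single AND covers all 64 positions at once.
constexpr std::uint64_t common_flags(std::uint64_t a, std::uint64_t b) {
    return a & b;
}

// One 64-bit word is always smaller than an array of 64 bools,
// since every bool occupies at least one byte.
static_assert(sizeof(std::uint64_t) < sizeof(bool[64]),
              "packed flags take less storage than bool[64]");
```

As the reply above points out, this only pays off when the code knows it is storing bools in a fixed-size, array-like layout; generic containers do not get this for free.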

Calin said:

a light breeze said:
A bool is effectively an enum { false, true }

this confirms my suspicion.

As I wrote earlier in the thread, there are plenty of additional rules about bools as a special type, especially when it comes to conversions which happen automatically behind your back. It provides much stronger guarantees than an enum.

The unscoped enumeration above, enum { false, true }, could just as easily be given values like -1 or 42, especially on old compilers respecting older language standards. C++11 tightened the rules a lot, and C++17 refined them further. Using a scoped enumeration, enum class MyBool { myFalse, myTrue }, or strictly enforcing the modern rules adds some additional type checking, so you couldn't write MyBool b = 42; but you can still cast into it with MyBool b = MyBool(42); and similar, and end up with the value 42 stored inside.

The bool type has stronger conversion requirements built into the language. Casting or converting into bool uses whatever compiler-specific conversion is needed, but casting or converting back out must result in only the value 0 or 1; anything else is a compiler defect. That is, bool b = bool(42); will still give you the integer value 1 when converted back.
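The difference can be checked at compile time. This is a small sketch, assuming a C++11 or later compiler; MyBool is the enumeration named in the post, while through_bool and through_enum are hypothetical helper names introduced for illustration.

```cpp
// MyBool mirrors the scoped enumeration named in the post.
enum class MyBool { myFalse, myTrue };

// Any nonzero integer converts to true, and true converts back to exactly 1.
constexpr int through_bool(int value) {
    return static_cast<int>(static_cast<bool>(value));
}

// A scoped enum's default underlying type is int, so an explicit cast
// stores the value unchanged, even though no enumerator has that value.
constexpr int through_enum(int value) {
    return static_cast<int>(static_cast<MyBool>(value));
}

static_assert(through_bool(42) == 1, "bool normalizes to 0 or 1");
static_assert(through_bool(0) == 0, "zero stays zero");
static_assert(through_enum(42) == 42, "enum class keeps 42");

// MyBool b = 42;  // would not compile: no implicit conversion from int
```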

a light breeze said:
I know for a fact that Visual C++ prior to 5.0 had 4-byte bools. It's at least possible for a modern compiler to do the same when targeting a CPU on which 32-bit operations are faster than 8-bit operations, trading storage efficiency for runtime performance.

In theory, but I've checked every modern mainstream compiler, and std::uint_fast8_t is always defined as "unsigned char". Also, after dealing with assembly for a while for my JIT compiler, I can confirm that dealing with char-sized data types is usually not inherently slower. There are always byte-sized register names (AL, BL, …) that are supported by most instructions, and the encoding is not less efficient than for dword/qword. The only place where using a char is less efficient is when you have to promote it to a larger type; then you have to use an instruction that does either zero or sign extension. That is usually not even that much slower, since there are specific instructions for it, but it's something to keep in mind, especially when doing pointer arithmetic, since using a type smaller than the architecture's native size_t will likely cost an additional instruction (or more) in that case.
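Anyone who wants to see what their own toolchain chose can query the sizes directly. This is only a sketch: TypeSizes and query_sizes are hypothetical names, and the assertions state just the lower bounds the language guarantees, not what any particular compiler picks.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical struct name; collects the sizes this implementation chose.
struct TypeSizes {
    std::size_t bool_bytes;
    std::size_t fast8_bytes;
    std::size_t least8_bytes;
};

// Gathers the implementation-defined sizes so they can be printed or compared.
constexpr TypeSizes query_sizes() {
    return TypeSizes{sizeof(bool), sizeof(std::uint_fast8_t),
                     sizeof(std::uint_least8_t)};
}

// Only lower bounds are guaranteed; the exact values are up to the compiler.
static_assert(sizeof(bool) >= 1, "bool occupies at least one byte");
static_assert(sizeof(std::uint_fast8_t) >= sizeof(std::uint_least8_t),
              "the fast type is never smaller than the least type");
```

Printing the three fields of query_sizes() on a given platform shows whether it trades storage for speed in the way discussed above.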

I have worked on FPGA chips with 32-bit chars. They are not PC class hardware, but they absolutely exist.

Also, C++ as a language doesn't typically concern itself directly with performance. There are plenty of operations like bit shifting that lead to nasty performance on hardware which does not have support. Virtual indirection had a notable cost even on desktop computers until 1996, as C++ was gaining popularity but still not standardized.

The language is concerned about correctness first and foremost. They added the exact size and fast size variations because of uncommon hardware. Specific compilers can implement exceptions and modifications, and compiler switches that forgo correctness in favor of performance.


frob said:
could just as easily be set to values like -1

Isn't an enum always uint (positive values)?

My project's Facebook page is “DreamLand Page”

Calin said:

frob said:
could just as easily be set to values like -1

Isn't an enum always uint (positive values)?

No. An enum value can be anything that the underlying type can represent. C++ not only allows the underlying type to be explicitly declared signed, but the default type is usually int.

See: https://en.cppreference.com/w/cpp/language/enum
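A short compile-time sketch of the same point, assuming C++11 or later. The enumeration names (Offset, Delta, Plain) are made up for illustration.

```cpp
#include <type_traits>

// Unscoped enum with no fixed underlying type: the implementation picks an
// integral type able to hold every enumerator, so -1 forces a signed one.
enum Offset { kBack = -1, kNone = 0, kFar = 42 };

// Underlying type explicitly declared signed.
enum class Delta : signed char { down = -1, same = 0, up = 1 };

// Scoped enum with no underlying type stated: it is int.
enum class Plain { a, b };

static_assert(std::is_signed<std::underlying_type<Offset>::type>::value,
              "an enum with a negative enumerator has a signed underlying type");
static_assert(std::is_same<std::underlying_type<Delta>::type, signed char>::value,
              "the declared underlying type is used as written");
static_assert(std::is_same<std::underlying_type<Plain>::type, int>::value,
              "scoped enums default to int");
static_assert(static_cast<int>(Delta::down) == -1, "negative values are kept");
```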

Juliean said:

a light breeze said:
I know for a fact that Visual C++ prior to 5.0 had 4-byte bools. It's at least possible for a modern compiler to do the same when targeting a CPU on which 32-bit operations are faster than 8-bit operations, trading storage efficiency for runtime performance.

In theory, but I've checked every modern mainstream compiler, and std::uint_fast8_t is always defined as "unsigned char".

I'm mostly thinking about word addressing CPU architectures, e.g. Cray. Compilers for such processors can emulate byte-level access in the same way that compilers for byte addressing CPU architectures can support bit fields, but this comes at a significant runtime cost. It's at least possible that compilers for these architectures would choose to support 8-bit chars but use full word-sized bools.

It would make no sense to do this in the mainstream ARM/x86 world. Visual C++ prior to 5.0 did it anyway.

fleabay said:
I fear this is going to spawn a large thread of conversation about the nuances of booleans and the bool data type in various languages. None of which will be of any use.

a light breeze said:
I know for a fact that Visual C++ prior to 5.0 had 4-byte bools.

I really appreciate you taking the time to confirm my post by talking about a 25-year-old compiler.

🙂🙂🙂🙂🙂 ← The tone posse, ready for action.

This topic is closed to new replies.
