Advertisement

C++ pointers that are released upon going out of scope

Started by February 13, 2025 05:02 PM
49 comments, last by taby 3 days, 2 hours ago

taby said:
That’s brilliant. Is there any chance that you could provide a code sample?

I'd have to first read about modern C++, last time I touched was with C++11 🙂

taby said:
What is the advantage of using a malloc wrapper, versus using a vector?

I saw al these array allocations and setting up pointers between them. Standard containers assume they can do with their data what they like. They aren't designed for pointing into the middle of their data, it's not safe.

I do agree with frob though. “it's from C” is not a good argument. In C too (even more than in C++) you also must handle memory allocation carefully. So it's not unlikely that the C code that you have does already do a good job with malloc/free.

I can see though that you may not trust it and it feels dangerous. C++ pushes that idea a lot. It takes some getting used to them. C is a very different language from C++.

If I would be in that situation, I would likely write proper C++ from the ground up, using the algorithmic description with a few peeks at the C code for details that are not clear. Porting the body of C code mostly unmodified is faster, but it will remain a somewhat magic not completely understood code piece.

Just for fun, here are some data about the code. As you can see, the triple pointer count is low enough to handle manually, but I'm still going to do it automatically just for fun.

Declaration count: 1695
Pointer declaration count: 640

DIR *   1
FILE *   41
FILE **   4
char *   44
char **   52
char ***   6
double *   289
double **   22
double ***   1
float *   23
float **   4
gzFile *   1
int *   116
int **   10
size_t *   2
struct sorting_double *   1
struct sorting_string *   8
unsigned *   7
unsigned **   1
unsigned char *   1
unsigned char **   2
unsigned int *   2
unsigned short *   1
unsigned short **   1
Advertisement

Why? What is the actual problem you're trying to solve? Pointers themselves are part of the language and intended to be used, even in modern code there are lots of places where pointers are the best solution.

The two things that would make me ask questions in a code review are the plain “unsinged *”, shorthand for “unsigned int” which is fine in C but less fine in C++, and the triple pointers. Even so, it is fundamentally getting back to the question of WHY. Everything about this looks like you're changing code merely for the purpose of changing it. C code works great with C++ code, just wrap the interface in extern “C” {} and call it done.

The whole point of the exercise is to get away from using malloc and free.

Please consider this Huffman code that I co-wrote. It beats the AI method, which used a sentinel value of \0, making it impossible to encode all values.

I use pointers, and I keep track of them in a vector, and upon destruction, I free them all in one fell swoop. Maybe this is something closer to what I'd want.

#include <iostream>
#include <vector>
#include <string>
#include <queue>
#include <map>
#include <algorithm>
#include <complex>
using namespace std;


// Modified from:
// https://gist.github.com/pwxcoo/72d7d3c5c3698371c21e486722f9b34b
//
// also see:
// Mastering Algorithms with C by Kyle Loudon


// For the use of complex<float> with map
bool operator<(const complex<float>& left, const complex<float>& right)
{
	if (right.real() > left.real())
		return true;
	else if (right.real() < left.real())
		return false;

	if (right.imag() > left.imag())
		return true;
	else if (right.imag() < left.imag())
		return false;

	return false;
}

template <typename T>
class huffman_codec
{
public:
	template <typename T>
	class Node
	{
	public:
		Node(void)
		{
			ch = 0;
			freq = 0;
			left = right = nullptr;
		}

		T ch;
		int freq;
		Node<T> *left, *right;
	};

private:
	vector<Node<T>*> nodes_to_clean_up;
	map<T, string> huffman_codes;

	size_t map_bit_count;
	vector<T> text;

public:
	huffman_codec(const vector<T>& plaintext)
	{
		set_plaintext(plaintext);
	}

	huffman_codec(void)
	{
		map_bit_count = 0;
		text.clear();
	}

	~huffman_codec(void)
	{ 
		clean_up(); 
	}

	void set_plaintext(const vector<T>& plaintext)
	{
		map_bit_count = 0;

		if (plaintext.size() == 0)
			return;

		text = plaintext;

		clean_up();

		init_huffman_codes(text);

		for (const auto pair : huffman_codes)
			map_bit_count += sizeof(T) * 8 + pair.second.size(); // m bits per key + n bits per element
	}

	size_t get_map_bit_count(void)
	{
		return map_bit_count;
	}

	void print_huffman_codes(void)
	{
		cout << "Huffman codes:" << endl;

		for (auto pair : huffman_codes)
			cout << pair.first << " " << pair.second << endl;

		cout << endl;
	}

	void get_huffman_codes(map<T, string> &huffman_codes_output)
	{
		huffman_codes_output = huffman_codes;
	}
	
	bool get_encoded_string(string &encoded_string)
	{
		encoded_string = "";

		for (const T c : text)
			encoded_string += huffman_codes[c];

		return true;
	}

	bool get_decoded_vector(const string& encoded_string, vector<T>& decoded_vector)
	{
		decoded_vector.clear();

		if (huffman_codes.size() == 0 || encoded_string == "")
			return false;

		if (huffman_codes.size() == 1)
		{
			const T c = huffman_codes.begin()->first; // should be '0'

			for (size_t i = 0; i < encoded_string.size(); i++)
				decoded_vector.push_back(c);

			return true;
		}

		// Get the minimum and maximum token size
		size_t min_bits = static_cast<size_t>(-1); // Casting the number -1 transforms it into the largest possible value that can be held by a size_t
		size_t max_bits = 0;

		for (const auto pair : huffman_codes)
		{
			if (pair.second.size() < min_bits)
				min_bits = pair.second.size();

			if (pair.second.size() > max_bits)
				max_bits = pair.second.size();
		}

		const size_t encoded_len = encoded_string.length();

		// Use a sliding window of variable length
		size_t begin_index = 0; // Start at the very beginning of the string
		size_t len = min_bits; // Don't waste time by starting at len = 1 if it's not necessary

		// While stuff to parse 
		while (is_valid_window(encoded_len, begin_index, len))
		{
			if (len > max_bits)
			{
				// No match was found up to here, which is exceptionally weird
				return false;
			}

			// Match token with map element
			const string token = encoded_string.substr(begin_index, len);

			bool found_token = false;

			for (const auto pair : huffman_codes)
			{
				if (pair.second == token)
				{
					decoded_vector.push_back(pair.first);
					found_token = true;
					break;
				}
			}

			if (found_token)
			{
				// Slide window by token size number of steps,
				// then reset window length to the minimum
				begin_index += token.size();
				len = min_bits;
			}
			else
			{
				// Expand window length by 1
				len++;
			}
		}

		return true;
	}

private:
	// traverse the Huffman Tree and store Huffman Codes
	// in a map.
	void encode(Node<T>* root, const string str, map<T, string>& huffman_codes)
	{
		if (root == nullptr)
			return;

		// found a leaf node
		if (!root->left && !root->right)
			huffman_codes[root->ch] = str;

		encode(root->left, str + "0", huffman_codes);
		encode(root->right, str + "1", huffman_codes);
	}
	
	// Function to allocate a new tree node
	Node<T>* getNode(const T ch, const int freq, Node<T>* left, Node<T>* right)
	{
		Node<T>* node = new Node<T>();
		nodes_to_clean_up.push_back(node);

		node->ch = ch;
		node->freq = freq;
		node->left = left;
		node->right = right;

		return node;
	}

	// Comparison object to be used to order the heap
	struct node_comp
	{
		bool operator()(const Node<T>* l, const Node<T>* r)
		{
			// highest priority item has lowest frequency
			return l->freq > r->freq;
		}
	};

	void clean_up(void)
	{
		if (nodes_to_clean_up.size() == 0)
			return;

		//cout << "Cleaning up " << nodes_to_clean_up.size() << " nodes." << endl;

		for (size_t i = 0; i < nodes_to_clean_up.size(); i++)
			delete nodes_to_clean_up[i];

		nodes_to_clean_up.clear();
	}

	bool is_valid_window(const size_t string_length, const size_t begin_window_index, const size_t window_length) const
	{
		if (begin_window_index >= string_length)
			return false;

		if ((begin_window_index + window_length - 1) >= string_length)
			return false;

		return true;
	}

	void init_huffman_codes(const vector<T>& input)
	{
		huffman_codes.clear();

		if (input.size() == 0)
			return;

		Node<T>* root = nullptr;

		// count frequency of appearance of each character
		// and store it in a map
		map<T, int> freq;

		for (T ch : input)
			freq[ch]++;

		if (freq.size() == 1)
		{
			root = getNode(freq.begin()->first, freq.begin()->second, nullptr, nullptr);
			huffman_codes[root->ch] = "0";

			return;
		}

		// Create a priority queue to store live nodes of
		// Huffman tree;
		priority_queue<Node<T>*, vector<Node<T>*>, node_comp> pq;

		// Create a leaf node for each character and add it
		// to the priority queue.
		for (auto pair : freq)
			pq.push(getNode(pair.first, pair.second, nullptr, nullptr));

		// do till there is more than one node in the queue
		while (pq.size() != 1)
		{
			// Remove the two nodes of highest priority
			// (lowest frequency) from the queue
			Node<T>* left = pq.top();
			pq.pop();

			Node<T>* right = pq.top();
			pq.pop();

			// Create a new internal node with these two nodes
			// as children and with frequency equal to the sum
			// of the two nodes' frequencies. Add the new node
			// to the priority queue.
			int sum = left->freq + right->freq;

			pq.push(getNode('\0', sum, left, right));
		}

		// root stores pointer to root of Huffman Tree
		root = pq.top();

		// traverse the Huffman Tree and store Huffman Codes
		// in a map. Also prints them

		encode(root, "", huffman_codes);
	}
};

template <typename T>
float get_vector_binary_entropy(const vector<T> &v)
{
	const float length = static_cast<float>(v.size());

	float entropy = 0.0f;

	map<T, size_t> input_map;

	for (size_t i = 0; i < v.size(); i++)
		input_map[v[i]]++;

	for (const auto pair : input_map)
	{
		const float probability = pair.second / length;

		// See Definition 10.1.1, where we use the binary Shannon entropy
		entropy += probability * logf(probability);
	}

	if (entropy == 0.0f)
		return 0.0f;
	
	return -(entropy / logf(2.0f));
}


int main(void)
{
	const complex<float> u = 1.0f;
	const complex<float> a = u / sqrtf(2.0f);
	const complex<float> b = complex<float>(-1.0f,  1.0f) / sqrtf(6.0f);
	const complex<float> c = complex<float>(-1.0f, -1.0f) / sqrtf(6.0f);
	const complex<float> d = complex<float>( 1.0f, -1.0f) / sqrtf(6.0f);

	// Vectors with lower entropy produce higher compression rates
	vector<complex<float>> plaintext = { a, b, c, d, a, u, u, u };
	cout << "Vector entropy: " << get_vector_binary_entropy<complex<float>>(plaintext) << endl;


	huffman_codec<complex<float>> h;
	h.set_plaintext(plaintext);
	h.print_huffman_codes();


	cout << "Original vector was: ";

	for (size_t i = 0; i < plaintext.size(); i++)
		cout << plaintext[i] << ' ';

	cout << endl;
	

	string encoded_string = "";
	h.get_encoded_string(encoded_string);
	cout << "Encoded string is:   " << encoded_string << endl;


	vector<complex<float>> decoded_vector;
	h.get_decoded_vector(encoded_string, decoded_vector);
	cout << "Decoded vector is:   ";

	for (size_t i = 0; i < decoded_vector.size(); i++)
		cout << decoded_vector[i] << ' ';

	cout << endl;


	// The number of map bits becomes negligible for large encoded vector length
	size_t num_map_bits = h.get_map_bit_count();

	size_t num_encoded_bits = encoded_string.size() + num_map_bits;
	size_t num_decoded_bits = decoded_vector.size() * sizeof(complex<float>)*8;
	float scale = static_cast<float>(num_encoded_bits) / static_cast<float>(num_decoded_bits);

	// Scale is less than 1.0 if compression occurs
	cout << "Scale: " << scale << endl;

	return 0;
}

I don't know. There are like a few dozen types used by the code. I'd have to keep a vector for each type. Seems not elegant.

Advertisement

taby said:
The whole point of the exercise is to get away from using malloc and free.

Getting used to other peoples coding conventions would be a much better exercise. /(:D\

I'm trying to use a vector<vector<vector<float>>> instead of a float ***. I've got it figured out.

https://gist.github.com/sjhalayka/43575da24e0661749e63e0f73d10c234

I've got it figured out.

No, you really don't. I think I am going to walk away from this. Good luck on the project.

You're right. The last dimension was not allocated using malloc, so I had to change the type to vector<vector<float *>>.

Thanks for your insight. I am serious about trying to do this the best way. I appreciate everyone's recommendations.

int read_data_fly(char* datafile, int dtype, double* data, float** probs,
    int num_samples_use, int* keepsamps, int start, int end,
    int* keeppreds_use, gzFile inputgz, size_t current,
    int num_samples, int num_preds, int genskip, int genheaders,
    int genprobs, size_t* bgen_indexes, double missingvalue,
    double threshold, double minprob, int nonsnp,
    int maxthreads) 
{
    int  thread;
    int  threadstart;
    int  threadend;
    int  threadlength;
    float*** threadprobs;
    vector<vector<float*>> threadprobs3_;



    // this is only temporary
    dtype = 2;
    maxthreads = 1;
    probs = malloc(sizeof(float*) * 2);
    num_samples_use = 1000;

    const size_t probs_size = num_samples_use;

    probs[0] = malloc(sizeof(float) * probs_size);
    probs[1] = malloc(sizeof(float) * probs_size);

    float val = 0.0f;

    for (size_t i = 0; i < probs_size; i++)
    {
        val += 1.0f;

        probs[0][i] = val;
        probs[1][i] = val;
    }





    if (dtype == 1 || dtype == 2 || dtype == 3 ||
        dtype == 4) // can read in parallel
    {
        threadlength = (end - start - 1) / maxthreads + 1;
        if (dtype == 2) 
        {
            threadprobs3_.resize(maxthreads);
            //threadprobs = malloc(sizeof(float**) * maxthreads);
        }

#pragma omp parallel for private(thread, threadstart, threadend)   schedule(static, 1)


        for (thread = 0; thread < maxthreads; thread++) {
            ;
            threadstart = start + thread * threadlength;
            threadend = start + (thread + 1) * threadlength;
            if (threadend > end) {
                threadend = end;
            }

            if (dtype == 1) {
                read_bed_fly(
                    datafile, data + (size_t)(threadstart - start) * num_samples_use,
                    num_samples_use, keepsamps, threadend - threadstart,
                    keeppreds_use + threadstart, num_samples, num_preds, missingvalue);
            }

            if (dtype == 2) {
                if (probs == NULL) {
                    read_bgen_fly(datafile,
                        data + (size_t)(threadstart - start) * num_samples_use,
                        NULL, num_samples_use, keepsamps, threadstart,
                        threadend, keeppreds_use, num_samples, num_preds,
                        bgen_indexes, missingvalue, threshold, minprob);
                }
                else{
                    //threadprobs_[thread] = malloc(sizeof(float*) * 2);
                    threadprobs3_[thread].resize(2);

                    cout << "ptr usage ahead" << endl;
       
					//threadprobs[thread][0] = probs[0] + (size_t)(threadstart - start) * num_samples_use;
					//threadprobs[thread][1] = probs[1] + (size_t)(threadstart - start) * num_samples_use;
                    ((float**)&threadprobs3_[thread])[0] = probs[0] + (size_t)(threadstart - start) * num_samples_use;
                    ((float**)&threadprobs3_[thread])[1] = probs[1] + (size_t)(threadstart - start) * num_samples_use;

                    //for (size_t x = 0; x < maxthreads; x++)
                    //{
                    //    for (size_t y = 0; y < 2; y++)
                    //    {
                    //        for (size_t z = 0; z < probs_size; z++)
                    //        {
                    //            cout << ((float**)&threadprobs3_[x])[y][z] << endl;;
                    //        }
                    //    }
                    //}

                    cout << "ptr usage done" << endl;

                    read_bgen_fly(
                        datafile, data + (size_t)(threadstart - start) * num_samples_use,
                        &threadprobs3_[thread][0], num_samples_use, keepsamps, threadstart,
                        threadend, keeppreds_use, num_samples, num_preds, bgen_indexes,
                        missingvalue, threshold, minprob);
 
                    //free(threadprobs[thread]);
                    threadprobs3_[thread].clear();
                    

                }
            }

            if (dtype == 3) {
                read_sped_fly(
                    datafile, data + (size_t)(threadstart - start) * num_samples_use,
                    num_samples_use, keepsamps, threadstart, threadend, keeppreds_use,
                    num_samples, num_preds, missingvalue, threshold, nonsnp);
            }

            if (dtype == 4) {
                read_speed_fly(
                    datafile, data + (size_t)(threadstart - start) * num_samples_use,
                    num_samples_use, keepsamps, threadstart, threadend, keeppreds_use,
                    num_samples, num_preds, missingvalue, threshold, nonsnp);
            }
        }

        if (dtype == 2) {
         
           //free(threadprobs);
           threadprobs3_.clear();
        }
    }

    if (dtype == 5) {
        (void)read_gen_fly(datafile, data, probs, num_samples_use, keepsamps, start,
            end, keeppreds_use, inputgz, current, num_samples,
            num_preds, genskip, genheaders, genprobs, missingvalue,
            threshold, minprob, nonsnp);
    }

    //free(probs[0]);// = malloc(sizeof(float) * num_samples_use * 1000);
    //free(probs[1]);// = malloc(sizeof(float) * num_samples_use * 1000);
    //free(probs);// = malloc(sizeof(float*) * 2);


    return (current + end);
}
Advertisement