Advertisement

random selection of array elements

Started by August 30, 2017 01:16 PM
27 comments, last by heh65532 7 years, 5 months ago

If you're using C++, the standard library provides just such a distribution.


 

#include <iostream>
#include <random>

int
main()
{
  std::random_device           rd;
  std::minstd_rand             gen(rd());
  std::discrete_distribution<> d({1000,800,400,200});
  
  for(int n=0; n<10; ++n)
  {
    std::cout << d(gen) << "\n";
  }
}

 

Stephen M. Webb
Professional Free Software Developer

3 minutes ago, Bregma said:

If you're using C++, the standard library provides just such a distribution.




 

 

 

Not using C++ just running tests with it. thanks though!

Advertisement
10 minutes ago, heh65532 said:

I'm unsure if I got the math right.

I made sample program where there's no arrays but I'm trying to calculate the probability for each number. 100 / 100 for example should have 50% probability.

 

Or in another example max number being 100 the number 10 should have 10% change of being picked.

(max number just means the highest number found in array)

It's hard to see how these 2 examples represent the same mathematics.

In the first example, it's 50/50 because each number is equal. Another way to view it is that the total weighting of the numbers is 200, and each of those numbers is 100, so each has a 100/200 chance, i.e. 0.5, which equals 50%.

In the second example, we know there is a number 100, and a number 10 - but that's a total weight of at least 110. If those are the only 2 numbers, then you'd expect the 10 to be picked 9% of the time (i.e. 10/110). If there were a few other numbers, that proportion could go down.

So I think you need to come up with a more rigorous definition of your weighting formula.

28 minutes ago, heh65532 said:

I made sample program where there's no arrays but I'm trying to calculate the probability for each number. 100 / 100 for example should have 50% probability.

Do you mean to say you have a set S of integers and you wish to choose some s in S such that the probability P(s) of s being chosen is \(P(s) = \frac{s}{\sum(S)}\) (ie. a discrete distribution just like in the C++ standard library)?

Or, from your example, do you mean to say the largest value in S should always have a probability of 0.5 and all remaining values should be proportional to that (which is going to give you painful math).

Stephen M. Webb
Professional Free Software Developer

Actually I think the code I posted earlier works like supposed to.

Here's my final code with array (I'll let the code show what I'm trying to archive):

 


#include <iostream>
#include <string>

using namespace std;



int main()
{
 srand (time(NULL));
    
    // Array already sorted
 int array[] = {200,300,700,1000,1500};
 int asize = 5;
 
 int maxNum = array[asize - 1];
    
 bool picked = false;
    

 for(int i = 0; i < asize; i++)
 {
     float change = (float)array / ((float)maxNum) / 2.0;
     int r = rand() % 100;
     
     int ichange = (int(change * 100));
         
     cout << " change: " << ichange << "% random: " << r << "%" << endl;
     
     if( r <= ichange )
     {
         cout << " Picked random array element: " << array << " with change of " << ichange << "%" << endl;
         picked = true;
         break;
     }
     
 }
 
 
 if(!picked) 
 {
  cout << "Max picked: " << array[asize - 1] << endl;
 }
 
}

 

Not 100% sure if it really works mathematically correct so please let me know what you think.

Whether it is mathematically correct depends on whether you've defined what "correct" is. If you're using it for a game (presumably), then if it "feels good" then that is probably enough.

That said, the code you've posted may not very general, I suspect, as it depends on the largest number to calculate the "weight", so I imagine you might be less happy with the results if the largest number were either extremely large or not very large compared with the others. Generating the random numbers inside the inner loop could result in surprising distributions, particularly as the array size increases. That said, I cannot really visualise the resulting distribution, I could be wrong, these are just intuitive guesses.

Without knowing your use case, from your description you probably do want the "Fitness proportionate selection" algorithm.

Advertisement

If you want to produce the numbers 0, 1, 2, 3 with probabilities proportional to 1000, 800, 600, 400 respectively, make an array with the partial sums of the non-normalized probabilities, like this:

a[4] = {0, 1000, 1800, 2400};

total_sum = 2800

Now, pick a random number between 0 and 2800 (including 0 and not including 2800) and find the highest index i such that a is less or equal than your number. You can do this part with binary search, if you want the procedure to be fast even when using a large table.

 

Thank you all for the replies.

@alvaro that's just what I needed, you approach is so much better than mine.

here's the resulting code, I think this is what you meant:

 


#include <iostream>
#include <string>

using namespace std;

int main()
{
 srand (time(NULL));
     
  int asize = 4;
  int array[asize] = {0, 1000, 2000, 3000};
  int total = 6000;
  
  int ra = rand() % total;
  
  for(int i = asize - 1; i >= 0; i--)
  {
      if( array <= ra )
      {
          cout << "selected " << i << " --- " << array << " <= " << ra << endl;
          break;
      }
  }
  
  cout << "finished" << endl;
}

 

You should include <cstdlib> to have access to srand and rand, <ctime> for time, and you don't need anything from <string>. But other than that, your code seems OK to me.

There are a few issues with your use of pseudo-random numbers that would go away if you were to use C++11's <random> library, and then you may as well use std::discrete_distribution instead of rolling your own.

 

The code wrongly uses the relative frequencies in the array rather than the cumulative frequencies, so that it's possible to generate a random value that will never be less than or equal to any of the values in the array.

And the iteration should be forwards, not backwards. Currently you'd exit the loop on the first iteration every time (if the previous bug was fixed).

Once those are fixed, this would be a reasonable implementation of roulette wheel selection.

This topic is closed to new replies.

Advertisement