Advertisement

Histogram: logarithmic binning in O(1)?

Started by November 30, 2013 01:55 AM
5 comments, last by Prune 11 years, 2 months ago

I need a histogram with a logarithmic X axis; that is, the bin size is logarithmic (say with ranges 0.1-1, 1-10, 10-100, etc.). I'm storing the histogram as a plain 1D array.

Given a value, to find the corresponding bin, code I've found online loops over the bins, calculating bin width and checking if x falls in it, else continuing. That's O(num_bins), which is awful. How do I calculate the bin/array index in O(1) as is done in the linear case? Surely there's a closed-form analytic formula.

"But who prays for Satan? Who, in eighteen centuries, has had the common humanity to pray for the one sinner that needed it most?" --Mark Twain

~~~~~~~~~~~~~~~Looking for a high-performance, easy to use, and lightweight math library? http://www.cmldev.net/ (note: I'm not associated with that project; just a user)

bin = floor(log10(x));

or am I not undertanding the question?

If it's not log10 base you can use natural logs to convert the ranges.

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley
Advertisement

\[\text{bin}(x) = \left \lfloor \log_{10}\left ({\frac{x}{x_\text{min}}} \right ) \right \rfloor \]

In your case, \( x_\text{min} = 0.1 \) and the bin numbering starts at zero. You can derive this by starting the histogram at the 1..10 bin and accounting for the smaller bins afterwards.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

Thanks a lot!

"But who prays for Satan? Who, in eighteen centuries, has had the common humanity to pray for the one sinner that needed it most?" --Mark Twain

~~~~~~~~~~~~~~~Looking for a high-performance, easy to use, and lightweight math library? http://www.cmldev.net/ (note: I'm not associated with that project; just a user)

Actually, not quite there. When I said "say with ranges 0.1-1, 1-10, 10-100, etc." that was just an example. The formula above works when num_bins = log10(max_range). But I want num_bins to be a parameter that can be set independently of max_range.

"But who prays for Satan? Who, in eighteen centuries, has had the common humanity to pray for the one sinner that needed it most?" --Mark Twain

~~~~~~~~~~~~~~~Looking for a high-performance, easy to use, and lightweight math library? http://www.cmldev.net/ (note: I'm not associated with that project; just a user)

The general formula is:

bin = floor(N * (log(x) - log(min)) / (log(max) - log(min)))

The value of bin is the bin index such that there are N bins logaritmically spaced between min (inclusive) and max (exclusive).

Advertisement

Thanks again!

"But who prays for Satan? Who, in eighteen centuries, has had the common humanity to pray for the one sinner that needed it most?" --Mark Twain

~~~~~~~~~~~~~~~Looking for a high-performance, easy to use, and lightweight math library? http://www.cmldev.net/ (note: I'm not associated with that project; just a user)

This topic is closed to new replies.

Advertisement