An initial upload is available at SourceForge, although be warned: it is still pretty messy, and a lot of the new features are thoroughly untested. In particular, watch out for the noise_gen.cpp file, where the value, gradient, cellular, white and simplex noise types are generated. It is uncommented, messy, experimental, and in need of reorganization.
One change that I am making is to allow certain properties of noise modules that are currently scalar-only (the threshold/falloff parameters for Select modules, the frequency parameter for Fractals, and so on) to be overridden with module sources if desired. This could open up some interesting behavior, I think.
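To make that concrete, here is a rough sketch of the sort of thing I mean. This is just an illustration, not the library's actual code; the class names here (ImplicitModule, ScalarParameter) are stand-ins for whatever the refactored interfaces end up being.

[code]
// Illustration only -- not the library's actual code. A parameter that holds either
// a plain constant or a pointer to another module; an attached module source
// overrides the constant at evaluation time.
class ImplicitModule
{
public:
    virtual ~ImplicitModule() {}
    virtual double get(double x, double y) = 0;
};

class ScalarParameter
{
public:
    ScalarParameter(double v = 0.0) : value_(v), source_(0) {}

    void set(double v)               { value_ = v; source_ = 0; }
    void set(ImplicitModule* module) { source_ = module; }

    // If a source module is attached, evaluate it at the given point;
    // otherwise fall back to the stored constant.
    double get(double x, double y) const
    {
        return source_ ? source_->get(x, y) : value_;
    }

private:
    double          value_;
    ImplicitModule* source_;
};
[/code]

With something like this, a Select module's threshold could just as easily be driven by a fractal as set to a fixed number.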
A couple of things on my TODO list that I'm going to try to work on as things progress:
* General bug-fixing. It's hard to thoroughly test these things for bugs, especially in the higher orders. I am certain there are many nasty, insidious bugs lurking in this thing, and I'll write tests and squash them as I progress.
* Speed up cellular noise. I'm currently doing a brain-dead implementation that uses a single seed point per cell (and not the Poisson distribution of seed points per cell that Worley initially described). I also don't perform the speed optimization that Worley describes in his reference implementation, as found in the book Texturing and Modeling: A Procedural Approach; I just iterate the entire neighborhood of cells (a rough 2D sketch of this approach follows the list below). Cellular noise is going to need a bit of an overhaul, considering that the library has to support up to 6 dimensions of it. But trying to do Worley's optimization (basically, unrolling the inner loops by hand) is a god-awful ugly task for 6D noise. Bad enough for 4D, but 6D... *shudder*
* Simplex noise variants (especially 4- and 6-dimensional) are still under development. They currently work, but I'm still fiddling with them.
* General, across-the-board optimizations. I'm not a very good optimizer; I haven't spent enough time studying optimization methods and theory, and my computer science college years are so far behind me as to be a dim, fond memory that I can't fully piece together, like the remnants of a pleasant dream that I just can't quite completely recall. It's like, "I know the dream had something to do with roller skates and a girl in a toga, but the rest is a greenish haze." That kind of dream. But I would like to put some effort and study into how the library should be changed to make it thread-safe, and to facilitate parallel processing in general. Maybe even to be able to use OpenCL or something, to do on-GPU evaluation of the functions. This is all just pie-in-the-sky, since I'm not really all that up on parallel processing, have never used OpenCL, and very rarely even use multi-threaded architectures in my projects. But this kind of function chaining, it seems to me, would be an ideal application of parallel processing.
* Re-integrate the RGBA modules. The initial file upload to the SourceForge project includes only the current set of refactored Implicit components, which output double-type values. I have a large set of additional modules that operate in RGBA space, with hooks and adapters that draw data from the Implicit space. They're still being refactored to fit the new system of ScalarParameters.
* Refactor and reintegrate the buffer-based components of the library. I have a large set of functions that operate on 2D buffers of data (either doubles or RGBAs). These include things like taking the normal-map of a 2D mapping, calculating a bump-map, performing a blur, erosion simulation, and so on. I am unsure whether it is even appropriate to include these things in the library. While it is convenient to be able to specify the same sort of module chaining with these operations as with the purely mathematical noise functions, I think the methodology has some flaws. These buffer-based functions implement a mapFunction() interface that allocates buffers of memory, populates them with data from sources, and uses those source buffers to calculate values for output. It works fine in theory, but in practice, if I call a very complex chain of functions (which happens quite often in my case) on large buffers of data (again, quite common), memory usage skyrockets: each step along the chain allocates some number of temporary buffers, then calls recursively up the chain into other functions that allocate yet more temporary buffers, and so forth, until the buffers are finally populated directly from the implicit sources. I have quite frequently gotten Lua "out of memory" errors while performing certain elaborate computations. You wouldn't believe how quickly memory is consumed when building complex, layered procedural textures at 1024x1024 or 2048x2048.
So I'm thinking of removing the chaining of buffer-based functions altogether, trading some convenience for a greater level of user control over buffer allocations. The adapter functions (which convert implicit output to buffered output) would remain as Utility functions. At a later date, I might rethink the system and see if I can redesign it to be better behaved, because it really is very convenient sometimes to be able to chain them like that.
* Source control. Currently, the sole download is a .ZIP archive. I'd like to move to proper source control soon.
* Library build and configuration. As it stands, you simply add the .cpp files to your own project and include the header anl.h to use the modules. It might be better to do a library build instead.
* And many more. I'd like to flesh out the set of basis functions some more (lattice convolution, maybe sparse convolution, etc.); the more, the merrier. Also, pattern functions and periodic functions based on sin and the like.
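To make the cellular item above concrete, here is a rough 2D sketch of the brain-dead approach: one hashed feature point per integer cell, and a brute-force scan of the surrounding cells for the nearest one. Again, this is just an illustration rather than the library's actual code, and hashCoords() stands in for whatever cell-hashing routine is used. In 6D the same scan visits 3^6 = 729 neighboring cells, which is why it needs an overhaul.

[code]
// Illustration only: simplistic 2D cellular (Worley-style) noise with a single
// feature point per cell and a full neighborhood scan.
#include <cmath>

double hashCoords(int x, int y, unsigned int seed);  // hypothetical hash, returns [0,1)

double cellular2D(double px, double py, unsigned int seed)
{
    int cx = (int)std::floor(px);
    int cy = (int)std::floor(py);
    double best = 1e30;

    // Scan the full 3x3 neighborhood. Worley's optimization would skip cells
    // that cannot possibly contain a point closer than the best found so far.
    for (int j = cy - 1; j <= cy + 1; ++j)
    {
        for (int i = cx - 1; i <= cx + 1; ++i)
        {
            // One seed point per cell, placed by hashing the cell coordinates.
            double fx = i + hashCoords(i, j, seed);
            double fy = j + hashCoords(i, j, seed + 1);
            double dx = fx - px;
            double dy = fy - py;
            double d = dx * dx + dy * dy;
            if (d < best) best = d;
        }
    }
    return std::sqrt(best);
}
[/code]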
Anyway, that's it for now. Enjoy, and if you find bugs, please let me know.
If you use STL algorithms already (no idea if you do) you can pretty much just drop in the parallel equivalents. The algorithm/functor pattern makes it relatively simple to see how to be thread-safe with PPL/TBB, too.
Also worth looking at OpenMP if you prefer something a bit less STL-like.
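For example (made-up names here, not ANL's actual API), parallelizing a buffer-fill loop with OpenMP is just a pragma on the outer loop:

[code]
// Hypothetical example, not ANL's API: fill a buffer of noise values row by row.
// Compile with OpenMP enabled (/openmp on MSVC, -fopenmp on GCC).
#include <vector>

double evaluateNoise(double x, double y);  // stand-in for a chained module's get()

void fillBuffer(std::vector<double>& buffer, int width, int height)
{
    #pragma omp parallel for
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            buffer[y * width + x] = evaluateNoise(x / (double)width,
                                                  y / (double)height);
        }
    }
}
[/code]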
I wrote my mandelbrot renderer using std algorithms and lambdas, and got around a 4x speed-up using PPL parallel_for instead of std::for_each in my outer loop (over each row of pixels in the image). Such an easy win! I also tried CUDA, but the speed-up wasn't as impressive as I'd hoped: between 1x and 2x over PPL. That said, my CUDA implementation is pretty naive, and I'm reading the image back every time. I imagine I'd get a lot of benefit if I was using an OpenGL/D3D texture instead of reading back to system memory, and rendering a constant stream of images rather than stalling after each one.
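Roughly speaking, the swap looked like this (a sketch from memory, with renderRow() standing in for the real per-row work):

[code]
// Sketch only: the serial outer loop versus the PPL version.
#include <ppl.h>
#include <algorithm>
#include <numeric>
#include <vector>

void renderRow(int y);  // placeholder: computes one row of pixels

void renderSerial(int height)
{
    std::vector<int> rows(height);
    std::iota(rows.begin(), rows.end(), 0);
    std::for_each(rows.begin(), rows.end(), [](int y) { renderRow(y); });
}

void renderParallel(int height)
{
    // Rows are independent, so the outer loop parallelizes trivially.
    concurrency::parallel_for(0, height, [](int y) { renderRow(y); });
}
[/code]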
I'm also pretty sure the unpredictable innermost loop which decides if a point is in the set is much less parallelism-friendly than most image synthesis stuff. So I highly recommend giving PPL/TBB/OpenMP/CUDA/whatever a go.