Flexible and fast pluggable math library?
So I'm just about finished with my container library, and I've started thinking about how I'm going to implement my math library. I've decided to use a somewhat flexible, module-based approach, with support for different ways of accomplishing the results, swappable at run time (modules optimised for 3DNow!, SSE, etc.).
While this sounds nice on paper, I realised today that virtual dispatch requires all the functions to be non-inline...
and the function-call overhead is larger than the actual operation being performed.
Ugh. Anyhow, I was wondering if anyone has done anything like this before. For my intense number-crunching code, I am thinking of using a strategy pattern to represent an algorithm, with all the processor-specific code inlined inside the strategy. Sound good?
And on a side note, does anyone know where I can find some in-depth information on the 3DNow! instructions?
===============================================
Have I no control, is my soul not mine?
Am I not just man, destiny defined?
Never to be ruled, nor held to heel!
This is my signature. There are many like it, but this one is mine. My signature is my best friend. It is my life. I must master it as I must master my life. My signature, without me, is useless. Without my signature, I am useless.
AMD 3dnow: www.amd.com/devconn/3dsdk/index.html
The module-based approach sounds pretty cool. Make dynamically loadable libraries for the different processor instruction sets (MMX, 3DNow!, ...),
determine what processor the user has (take a look at flipcode, they recently had a tip for this), and then load the proper library. Every library must export the same functions (AARGH! what was that word... just the same names and parameters), and the function-loading mechanism takes care of the rest.
Inline functions are expanded into "normal" code at compile time, so you can't take advantage of them with a dynamic approach.
Instead of loading a library, you could use function pointers or polymorphism (but these don't allow inlining either).
You were worried about the overhead of function calls. There is a dirty way using the C++ preprocessor: define at compile time which processor instructions to use. But this requires the user to be able to compile the program himself (not really an option on Windows, but maybe on Linux).
So as a final conclusion, I would suggest the dynamic approach for complex mathematical functions (ones that do enough work per call) and inlines for the simple functions.
Well, it's up to you what you use. Do some research. But I suppose you aren't a professional game programmer, so don't be too worried about speed: just code your solution, profile it later, and make it better and faster...
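Something like this is what the dynamic-loading suggestion boils down to; a minimal sketch assuming a Win32 build. The DLL names and the HasSSE()/Has3DNow() helpers are made up for illustration, only LoadLibraryA and GetProcAddress are real API calls.

```cpp
// Rough sketch of "detect the CPU, then load the matching DLL".
#include <windows.h>

// Every DLL exports this same function with the same signature.
typedef void (*VecAddFunc)(float* dst, const float* a, const float* b, int count);

VecAddFunc VecAdd = 0;

// Stubs: a real version would use the CPUID instruction here.
bool HasSSE()   { return false; }
bool Has3DNow() { return false; }

bool InitMathModule()
{
    const char* dllName = "math_x87.dll";           // generic fallback
    if (HasSSE())        dllName = "math_sse.dll";
    else if (Has3DNow()) dllName = "math_3dnow.dll";

    HMODULE module = LoadLibraryA(dllName);
    if (!module)
        return false;

    // Same exported name in every DLL, so whichever one was loaded wins.
    VecAdd = (VecAddFunc)GetProcAddress(module, "VecAdd");
    return VecAdd != 0;
}
```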
Be worried about nothing else than speed! I suppose you aren't a professional game programmer, so don't be too worried about flexibility: just code the fastest solution and do a bit of extra work when using the library... ![](wink.gif)
If you want both speed and flexibility, you can put the prototypes of all the functions (for every processor) in your library header, together with function pointers that are updated at run time depending on the processor. That way the lazy user can go through the function pointers, and the speed freak can check the processor type in his own code and call the specific versions directly, removing any function-call overhead.
Another way of removing function-call overhead is to process many data elements inside each of your virtual functions. This should work well because most of these extensions are SIMD (Single Instruction, Multiple Data) instructions anyway.
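A rough sketch of what such a header could look like; all the names here are invented, and the CPU detection is assumed to be done elsewhere.

```cpp
// Hypothetical math library header: every implementation is declared,
// plus a function pointer that a one-time init routine fills in.

// Concrete implementations, one per instruction set (defined in the library).
void VecAdd_x87  (float* dst, const float* a, const float* b, int count);
void VecAdd_3DNow(float* dst, const float* a, const float* b, int count);
void VecAdd_SSE  (float* dst, const float* a, const float* b, int count);

// The "lazy user" path: one indirect call, no processor check in user code.
extern void (*VecAdd)(float* dst, const float* a, const float* b, int count);

// Called once at startup; the caller supplies the results of its CPU check.
inline void InitMathPointers(bool hasSSE, bool has3DNow)
{
    if (hasSSE)        VecAdd = VecAdd_SSE;
    else if (has3DNow) VecAdd = VecAdd_3DNow;
    else               VecAdd = VecAdd_x87;
}

// The "speed freak" path: after checking the processor once, call
// VecAdd_SSE (etc.) directly and skip the indirection entirely.
```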
quote:
I am thinking of using a strategy pattern to represent an algorithm, with all the processor-specific code inlined inside the strategy.
I didn't understand what you mean. Can you give an example?
Maybe just use preprocessor definitions and compile three (or more) different executables? With all functions inline, of course... I think this is the fastest way...
The key here is to operate on large arrays of data. For single values, optimisation rarely makes sense. Operating on arrays is very easy to implement and very cache friendly.
The other approach is to code complex functions in the math library: functions that take thousands of cycles to execute. This can be very time consuming and makes the library very specialized.
Obviously the first approach is simpler, but it is not always (though very often) applicable.
quote:
I am thinking of using a strategy pattern to represent an algorithm, with all the processor-specific code inlined inside the strategy.
quote:
Original post by Diodor
I didn't understand what you mean. Can you give an example?
Okay, I was using a term from the Design Patterns book, which people here are probably not accustomed to.
The 'Strategy' pattern is an object which represents an algorithm.
The easiest way I can explain it is with an example: an array class has a sub-module which sorts the array. This module is virtual and can be swapped out for different algorithms at run time (quick sort, bubble sort, merge sort, heap sort, etc.). This module is considered a 'Strategy'.
My plan is to have my graphics engine use a strategy to represent transformations, rather than have the transformation code be generic and call into a specific math library. This way, the strategy algorithm will have all the low-level math code inlined, and the transformation algorithm can still be swapped.
Sounds good? I know it will end up with some code duplication between the separate implementations, but it's the only flexible method I can think of that is fast as well.
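Just to make the idea concrete, here is a minimal sketch of what such a transformation strategy could look like; the class names, the Vec3 layout, and the matrix convention are all invented for illustration, not the actual library code.

```cpp
struct Vec3 { float x, y, z; };

// The Strategy: one virtual call per batch of vertices, not per vertex.
class TransformStrategy
{
public:
    virtual ~TransformStrategy() {}
    virtual void Transform(Vec3* out, const Vec3* in, int count,
                           const float m[16]) = 0;
};

// Plain C++ version; an SSE or 3DNow! version would be another subclass
// with its own fully inlined low-level math inside the loop.
class TransformGeneric : public TransformStrategy
{
public:
    virtual void Transform(Vec3* out, const Vec3* in, int count,
                           const float m[16])
    {
        for (int i = 0; i < count; ++i)
        {
            // All per-vertex math lives here, so the only virtual call
            // is the one that started the whole batch.
            out[i].x = m[0]*in[i].x + m[4]*in[i].y + m[8]*in[i].z  + m[12];
            out[i].y = m[1]*in[i].x + m[5]*in[i].y + m[9]*in[i].z  + m[13];
            out[i].z = m[2]*in[i].x + m[6]*in[i].y + m[10]*in[i].z + m[14];
        }
    }
};

// The engine just holds a TransformStrategy* and can swap it at run time.
```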
===============================================
But as for me, hungry oblivion
Devour me quick, accept my orison
My earnest prayers Which do importune thee,
With gloomy shade of thy still empery,
To vail both me and my poesy
This is my signature. There are many like it, but this one is mine. My signature is my best friend. It is my life. I must master it as I must master my life. My signature, without me, is useless. Without my signature, I am useless.
quote:
Original post by nullguid
Maybe just use preprocessor definitions and compile three (or more) different executables? With all functions inline, of course... I think this is the fastest way...
I was hoping to avoid that approach, as it isn't very flexible.
quote:
Original post by zel
The key here is to operate on large arrays of data. For single values, optimisation rarely makes sense. Operating on arrays is very easy to implement and very cache friendly.
The other approach is to code complex functions in the math library: functions that take thousands of cycles to execute. This can be very time consuming and makes the library very specialized.
Obviously the first approach is simpler, but it is not always (though very often) applicable.
Yeah, since the 3DNow! and SSE options are all SIMD, I was thinking of perhaps allowing the user to pass in a large array of data that all needs the same instruction performed on it.
Or even to take that one step further: pass in a queue of commands to be executed sequentially on an array of data, to take advantage of the multiple 3DNow! units on the processor (K6s have 2, Athlons have 3).
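A bare-bones sketch of what that command-queue interface might look like; the op codes and structure names are made up purely to illustrate the idea.

```cpp
#include <vector>

// Hypothetical op codes for the batched operations.
enum MathOp { OP_ADD, OP_MUL, OP_NORMALIZE };

struct MathCommand
{
    MathOp       op;
    const float* operand;   // second operand for OP_ADD / OP_MUL, unused otherwise
};

// The processor-specific module receives the whole batch at once, so it can
// keep intermediate results in registers between operations instead of
// paying a call (and a memory round trip) per operation.
class MathModule
{
public:
    virtual ~MathModule() {}
    virtual void Execute(float* data, int count,
                         const std::vector<MathCommand>& commands) = 0;
};
```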
Strategy sounds good. I was thinking the same thing: give the lib user access to all the functions for each processor and let him choose the right ones with a run-time processor check. That looks as fast as it can be done (no function overhead, and taking advantage of the special instruction sets).
Btw, what actually happens when you execute Intel-specific instructions on an AMD chip, or the reverse?
You need to reduce the granularity of your algorithms to use the strategy pattern. Instead of writing functions that perform an operation on one item, you make them perform it on an array of them.
If that's not feasible, then you need to program the entire algorithm and compile it multiple times. You could use some macros & preprocessor defs to compile the same DLL multiple times, each time targeting a different processor. This way you can inline the particular CPU-specific code and decide which DLL to use at run time, and all the DLLs are guaranteed to contain the same functions.
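One possible shape for that "same source, compiled once per target" trick; the MATH_TARGET_SSE define is an invented build flag, and the function itself is just a toy example.

```cpp
#if defined(MATH_TARGET_SSE)
#include <xmmintrin.h>   // SSE intrinsics
#endif

// The same source file is built into math_sse.dll, math_x87.dll, etc.,
// each build defining a different MATH_TARGET_* macro. Every build
// exports the identical symbol, so the loader can pick any of them.
extern "C" __declspec(dllexport)
void VecAdd(float* dst, const float* a, const float* b, int count)
{
#if defined(MATH_TARGET_SSE)
    int i = 0;
    for (; i + 4 <= count; i += 4)              // four floats per SSE op
        _mm_storeu_ps(dst + i,
                      _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < count; ++i)                      // leftover elements
        dst[i] = a[i] + b[i];
#else
    for (int i = 0; i < count; ++i)             // plain fallback
        dst[i] = a[i] + b[i];
#endif
}
```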
Intel & AMD both have math & DSP libraries with code specific to their chips. They are twice as fast as anything that I've written! I use the strategy pattern extensively in the spectrum analyzer program that I've been working on. I'm always processing an array of something, so it works out well.
The utility of the strategy method is immense: not only can you decide which algorithm to use at start-up, you can change it while the program's running. Download a new DLL/COM object for the P4 / K8 and poof, it's available for the program to use! You don't even need to exit!
Magmai Kai Holmlor
- The disgruntled & disillusioned
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara