1.) When is your math lib going to be finished? 2.) You sent me some code a couple of days ago and I have a question. Is it possible to rewrite your code in asm() blocks without any performance loss? I can hardly understand any of those #define __64xor functions (just a guess at the name - I don't have the code open); the asm is much easier to understand.
And one last thing - in your post above you said :
quote:
Proof :
- my 3DNow version of the quaternion mul : 18 cycles
- my 387 version of the quaternion mul : 28 cycles
I don't have an SSE machine atm to test, but I suppose it would give 12 cycles.
And I get :
-------------------------------------------
Version | Cycles | Difference
3DNow!  | 21.076 | +3 more than you
387     | 29.598 | +1.5 more than you
-------------------------------------------
So does the cycle count vary between different computers?
@Red Drake
1.) When is your math lib going to be finished?
1.0 is not for now, but the project itself, say some 0.x version, will be public very soon, probably on a CVS. To me, 1.0 should already be tested on many platforms, since portability is one of the key selling points of the lib: write totally portable, ultra-high-performance code, and write it once.
Currently I am slowed down on the gcc version because I don't have a good, reliable debugger at the moment. I'll probably install Linux on another drive next week.
I am also waiting for an answer from someone who has already started a math lib project on Sourceforge. I hope he'll help me with the site and CVS stuff. My plan is to release the different layers of components one by one, so that potential contributors don't feel confused or overwhelmed by the number of files and tricks in the lib.
Is it possible to rewrite your code in asm() blocks without any performance loss?
On gcc, maybe in some cases, but it would be far more tedious than using my VSIMD. With Visual you would certainly be obliged to write pure functions, with 8 cycles lost to function call overhead. Thus, in general, no; that's the advantage of using intrinsics.
Most of my lib functions are macros or inline functions based on C 'intrinsics' like add_2f(), even if some of these 'intrinsics' map to inlined asm instructions. Thus the compiler can:
- reorder instructions to improve scheduling and hide latencies,
- optimize register allocation across intrinsics,
- eliminate redundant code in global contexts.
None of these things will be done if you write long blocks of handwritten asm. You see, my VSIMD intrinsics:
- enable portability across hardware, systems and compilers,
- use the intelligence of a human asm expert,
- and blend it with the real strengths of the C/C++ compiler.
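To illustrate the idea, here is a minimal sketch in plain C. It is not the actual VSIMD code: the vec2f type, the portable body of add_2f() and the add3_2f() helper are assumptions made up for this example. The point is only that an inline wrapper stays visible to the compiler, so two chained calls can be scheduled and register-allocated together, which an opaque asm block would prevent.

```c
#include <assert.h>

/* Hypothetical 2-float vector type; the real VSIMD types are not shown in this thread. */
typedef struct { float x, y; } vec2f;

/* Hypothetical stand-in for an add_2f() 'intrinsic'.
   This is a portable C fallback; a 3DNow! build could map it to pfadd instead. */
static inline vec2f add_2f(vec2f a, vec2f b)
{
    vec2f r = { a.x + b.x, a.y + b.y };
    return r;
}

/* Two chained 'intrinsics'. Since both are inline, the compiler is free to
   keep the intermediate sum in registers and to reorder the four additions. */
static inline vec2f add3_2f(vec2f a, vec2f b, vec2f c)
{
    return add_2f(add_2f(a, b), c);
}

/* Small usage check with values that are exact in float arithmetic. */
static void add_2f_demo(void)
{
    vec2f b = { 1.0f, 2.0f };
    vec2f c = { 3.0f, 4.0f };
    vec2f a = add_2f(b, c);
    assert(a.x == 4.0f && a.y == 6.0f);
}
```

Swapping the body of add_2f() for a platform-specific version would leave every caller untouched, which is the portability argument made above.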
But in some cases, the compiler is too weak. For instance, the Visual 6.0 compiler (.Net too, it seems) is bad at compiling the Intel intrinsics library. So sometimes it's better to write a function in 'native' asm and call it. For the quaternion multiplication, Visual gives 30 cycles with intrinsics and 24 with a called asm function.
So does the cycle count vary between different computers?
Yes, it does (latencies depend on the processor, and memory accesses on the chipset). Still, you have roughly the same delta between the two versions, and 3DNow remains significantly faster (roughly +30-40%).
In many cases, latencies (in cycles) tend to grow on newer machines. Compare the K6 documents (most 3DNow instructions: 2 cycles) with the Athlon ones (most 3DNow instructions: 3-4 cycles), for instance. But of course the newer machines are still faster, since they usually come with higher frequencies, more parallelism (hard to fill) and better chipsets.
Or else a small compiler option changed between the project I benched and the one I sent you; I have been modifying it constantly.
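If you want to compare numbers between machines yourself, here is a minimal timing sketch in plain C. It is not the code from my benchmark project: the quat type and the quat_mul() and quat_mul_avg_ns() names are assumptions for this example, and it uses a scalar quaternion product with the standard clock() function, so it reports nanoseconds rather than cycles. Converting to cycles would require multiplying by the CPU frequency, which is left out here.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical quaternion type, stored as (w, x, y, z). */
typedef struct { float w, x, y, z; } quat;

/* Scalar Hamilton product a * b. */
static quat quat_mul(quat a, quat b)
{
    quat r;
    r.w = a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z;
    r.x = a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y;
    r.y = a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x;
    r.z = a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w;
    return r;
}

/* Average time per call in nanoseconds over `iters` chained calls.
   Each call depends on the previous result, so this includes latency
   and loop overhead; treat the figure as a rough comparison only. */
static double quat_mul_avg_ns(long iters)
{
    quat a = { 0.5f, 0.5f, 0.5f, 0.5f };    /* unit quaternion */
    quat b = { 0.5f, -0.5f, 0.5f, -0.5f };  /* unit quaternion, keeps values bounded */
    volatile float sink = 0.0f;              /* stops the loop from being optimized away */
    clock_t t0, t1;
    long i;

    assert(iters > 0);
    t0 = clock();
    for (i = 0; i < iters; ++i) {
        a = quat_mul(a, b);
        sink = a.w;
    }
    t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / (double)CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

Run it with a large iteration count and the same compiler options on both machines; as said above, a single changed option can move the result by a few cycles.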
because I can hardly understand any of those #define __64xor (just a guess - I don't have the code open) functions
Well, normally you don't have to put your nose into such code. Only the most qualified contributors will have to. I did not give you my library code; this was just an independent project where everything is mixed together, with a lot of preprocessing stuff, compiler directives, inline asm, etc. interleaved. I used it to test many things about gcc, so that I could see how to make the gcc version, which would have been impossible directly in my library.
As a user of, or contributor to, the higher layers, all you have to know is the user headers. The private headers that implement the inlines and macros of the lower layers will probably be given as precompiled files. This way there is no chance that an end user has the same confusion and reaction as you.
the asm is much easier to understand
Then, as a user, with the docs and header files that explain it all, I doubt that writing:

    A = add_2f(B, C); // C

is more difficult to understand and write than:

    movq  mm0, C
    pfadd mm0, B
    movq  A, mm0

Anyway, in C++ it becomes:

    A = B + C; // C++
Off topic : good luck to Croatia. They'd better win 3 points now, 'cause otherwise ... especially if France does not get the three points against England.
[edited by - Charles B on June 13, 2004 10:44:10 AM]
quote:
because I can hardly understand any of those #define __64xor (just a guess - I don't have the code open) functions
Well, normally you don't have to put your nose into such code. Only the most qualified contributors will have to. I did not give you my library code; this was just an independent project where everything is mixed together, with a lot of preprocessing stuff, compiler directives, inline asm, etc. interleaved. I used it to test many things about gcc, so that I could see how to make the gcc version, which would have been impossible directly in my library.
As a user of, or contributor to, the higher layers, all you have to know is the user headers. The private headers that implement the inlines and macros of the lower layers will probably be given as precompiled files. This way there is no chance that an end user has the same confusion and reaction as you.
You see, I have a name for people like this - "normal people". Any normal person would do what you said - use the headers and ignore the other things - but as I stated before:
1) I am merely learning, and not coding for some high-end engine.
2) I want to know this in case you decide that your library is too good to be FREE and decide to sell it - what then? (I would probably do this in your place.)
3) I am a very curious person and I like challenges.
quote:
Off topic : good luck to Croatia. They'd better win 3 points now, 'cause otherwise ... especially if France does not get the three points against England.