
a memset that doesn't suck


Started by Shannon Barber February 10, 2001 03:58 PM

39 comments, last by Shannon Barber 23 years, 11 months ago

122

February 13, 2001 05:12 PM

No, I haven't looked at the assembly listings, but if they are the same, why did my version consistently perform slightly better? Interesting... Now I'll have to look...

Regards,
Jumpster

Regards, Jumpster. Semper Fi

maq

122

February 15, 2001 04:39 AM

Yesterday I profiled Jumpster's code against the standard memset function, which produces the exact same listing, as NuffSaid said, and I found out that it was a cache effect. If I put Jumpster's code right before the normal memset, the normal memset was slightly faster. If I did it the opposite way, with the normal memset first and Jumpster's code right after, Jumpster's code was slightly faster.

/maq

- maq

NuffSaid

122

February 15, 2001 03:32 PM

Sounds interesting. Could someone explain to me what maq meant by the cache being an influence?

==========================================
In a team, you either lead, follow or GET OUT OF THE WAY.

Shannon Barber

Author

1,684

February 15, 2001 07:18 PM

If you memcpy'ed a small amount of data (smaller than the L3/L2/L1 cache size), it'd still be sitting in the cache for the next memcpy, so whichever routine is called first pays a cache-miss penalty that the second call does not.

Copy ten megs; if the difference goes away, the original gap may very well have been a cache-hit bonus on the second call.
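A rough sketch of that test, in case it helps: time the same fill twice on one freshly allocated buffer and compare the first (cold) run with the second (warm) run, once with a small size and once with something like ten megs. The names time_memset and cold_vs_warm are made up for illustration, clock() is a coarse timer, and an optimizing compiler may drop some of the stores, so treat the numbers as a hint only.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: time `reps` calls of memset over `size` bytes.
   Returns elapsed CPU time in seconds as reported by clock(). */
double time_memset(unsigned char *buf, size_t size, int reps)
{
    clock_t start = clock();
    for (int i = 0; i < reps; ++i) {
        memset(buf, i & 0xFF, size);   /* fill value changes every pass */
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: run the same timed fill twice on one new buffer.
   out[0] is the first (cold) run, out[1] the second (warm) run.
   Returns 0 on success, -1 on a zero size or failed allocation. */
int cold_vs_warm(size_t size, int reps, double out[2])
{
    if (size == 0) {
        return -1;
    }
    unsigned char *buf = malloc(size);
    if (buf == NULL) {
        return -1;
    }
    out[0] = time_memset(buf, size, reps);
    out[1] = time_memset(buf, size, reps);

    /* Read one byte back so the buffer contents are actually used. */
    volatile unsigned char sink = buf[size - 1];
    (void)sink;

    free(buf);
    return 0;
}
```

Calling cold_vs_warm(4096, 100000, t) and then cold_vs_warm(10u * 1024 * 1024, 50, t) gives the two cases to compare. If a gap between t[0] and t[1] shows up only for the small buffer, that points at cache warm-up rather than at the code itself.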

quote:

Notice the pushad/popad instructions are missing?

MSVC may add them for safety in debug mode; are they still inserted in a 'retail' build?

Magmai Kai Holmlor
- The disgruntled & disillusioned

- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted. -- Tajima & Matsubara

Anonymous

February 15, 2001 08:09 PM

quote:
Original post by Magmai Kai Holmlor

Notice the pushad/popad instructions are missing?

MSVC may add them for safety in debug mode; are they still inserted in a 'retail' build?

I believe MSVC always automatically preserves certain registers for you (though not all) in the function prologue and epilogue when you use __asm blocks. So, in all probability, the pushad/popad combo will be redundant even when compiling in release mode.

– Bevan

Jumpster

122

February 16, 2001 05:30 AM

Cache miss penalty? But my numbers were not created by calling the two functions in sequence. I actually created two separate programs, identical in every way except for the call that copies the memory: MemCopy32Bit() in one and memcpy() in the other. MemCopy32Bit() was slightly faster in every execution of the program. If the code is the same, then why would that be?

Regards,
Jumpster

Regards, Jumpster. Semper Fi

Seyedof

123

February 16, 2001 10:30 AM

hi
rep stosd is slow.
During a memory set or copy operation you must take care of
DWORD alignment as well as cache optimizations.
I think a loop which copies 16 bytes in each iteration
would be faster than just copying one dword at a time. The 16 comes from
the cache line size.
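A minimal sketch of that kind of loop, assuming nothing about the code posted earlier in the thread: byte stores until the pointer is DWORD aligned, then four ordinary 32-bit stores (16 bytes) per pass, then a byte tail. No wider bus is assumed, the loop body is simply unrolled. The name fill16 is made up, the 16-byte stride just follows the suggestion above (real cache line sizes vary by CPU), and whether this beats rep stosd or the library memset has to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memset-style fill: align, then 16 bytes per iteration.
   Assumes dst points at memory where 32-bit stores are allowed
   (e.g. a malloc'ed block); storing uint32_t into an object declared
   with another type is formally a strict-aliasing violation in C. */
void *fill16(void *dst, int value, size_t n)
{
    unsigned char *p = dst;
    unsigned char b = (unsigned char)value;

    /* Head: single bytes until p sits on a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)p & 3u) != 0) {
        *p++ = b;
        --n;
    }

    /* Body: replicate the byte into a dword, store four dwords per pass. */
    uint32_t pattern = (uint32_t)b * 0x01010101u;
    uint32_t *w = (uint32_t *)p;
    while (n >= 16) {
        w[0] = pattern;
        w[1] = pattern;
        w[2] = pattern;
        w[3] = pattern;
        w += 4;
        n -= 16;
    }

    /* Tail: the remaining 0..15 bytes. */
    p = (unsigned char *)w;
    while (n > 0) {
        *p++ = b;
        --n;
    }
    return dst;
}
```

It takes the same arguments as memset, so fill16(buf, 0, size) can be swapped into a timing harness without other changes.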

--MFC (The Matrix Foundation Crew)

NuffSaid

122

February 16, 2001 07:21 PM

I'm no computer architecture guru, but I'd just like to know how you're going to copy 16 bytes at each iteration? AFAIK, that's only possible if you've got a 128 bit bus, which many platforms don't have, right?

Edited by - NuffSaid on February 17, 2001 5:50:18 AM

==========================================
In a team, you either lead, follow or GET OUT OF THE WAY.

Shannon Barber

Author

1,684

February 16, 2001 08:18 PM

Good point, is there a write cache, or can writes be pipelined?

Magmai Kai Holmlor
- The disgruntled & disillusioned

- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted. -- Tajima & Matsubara

NickB

146

February 17, 2001 04:48 AM

If you want to get rid of the calling overhead, you can use '__declspec(naked)' for the function; MSVC then does not add a prolog or epilog to it. Or you could just write the function in an asm file and set custom build rules to assemble it with MASM (or the like).

And now a quick question: does anyone know how the AMD 'prefetch' instruction works? I just get errors if I try to assemble 'prefetch eax' (or whatever). And would it possibly help in this situation (i.e. with writes)?
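On the assembler error: the 3DNow! prefetch instruction takes a memory operand, so the form would be 'prefetch [eax]' rather than 'prefetch eax'. As a sketch of how a prefetch hint can sit inside a copy loop, here is a C version that uses compiler intrinsics instead of the AMD instruction itself: __builtin_prefetch on GCC/Clang, _mm_prefetch (the SSE prefetcht0 hint) on MSVC. The name copy_with_prefetch and the CHUNK and AHEAD values are assumptions picked for illustration. It prefetches the source that is about to be read; whether any of this helps a pure write loop like memset is something the sketch does not settle.

```c
#include <stddef.h>
#include <string.h>

/* Pick a prefetch hint the compiler understands; fall back to a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#  define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#  include <xmmintrin.h>
#  define PREFETCH_READ(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#  define PREFETCH_READ(p) ((void)0)
#endif

/* Assumed tuning values, not measured: bytes copied per pass and how far
   ahead of the current read position the hint is issued. */
enum { CHUNK = 64, AHEAD = 256 };

/* Hypothetical memcpy-style copy that hints at upcoming source data. */
void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= (size_t)CHUNK) {
        /* Only hint at addresses that are still inside the source. */
        if (n > (size_t)AHEAD) {
            PREFETCH_READ(s + AHEAD);
        }
        memcpy(d, s, CHUNK);
        d += CHUNK;
        s += CHUNK;
        n -= CHUNK;
    }
    memcpy(d, s, n);   /* remaining 0..CHUNK-1 bytes */
    return dst;
}
```

The prefetch is only a hint, so the result is the same with or without it; only the timing can change, and that needs to be measured on the target CPU.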
