Compute Shader (d3d11 sm5) hangs when I hardcode a value

Started by September 27, 2018 01:27 PM
14 comments, last by Adam Miles 6 years, 4 months ago

Hello there. First post :).

I'm trying to run a cellular automaton on the GPU without double buffering. I came across a very weird problem, though, which I could reduce to a few lines of code:


    for (int x = 0; x < max; ++x)
    {
        for (int y = 1; y < max; ++y)
        {
            uint down = map[int2(x, y-1)];
            if (!down)
            {
                map[int2(x, y-1)] = 64;
                map[int2(x, y)] = 0;
            }
        }
    }

`map` is declared as `RWTexture2D<uint> map : register(u0);`. The shader is dispatched with (1,1,1) and uses numthreads (1,1,1), so a single thread loops over the whole area (256*256 in this case). I know this isn't good, but it's the simplest way to repro the issue I could come up with. It works fine when `max` is passed in a constant buffer; however, as soon as I hardcode that value by adding `int max = 256;` (and removing it from the cbuffer), the shader hangs. I think it could be some compiler optimization gone wrong, but I'm compiling debug shaders.

Do you guys have any idea what could be causing that? The shader compiles fine with no errors whatsoever; it simply hangs forever when I hardcode that value. I have made sure that the hardcoded value is exactly the same as the one passed in the constant buffer. I have no idea what kind of nonsense is going on there :(

Thanks in advance

I assume if you hardcode the value, the compiler is stupid enough to unroll both loops, and it becomes just too much code. There should be a warning or error on this, but who knows...

What happens if you use unroll pragmas? (I don't remember the syntax for this).

@JoeJ thanks for your reply. I thought about that too, but if that's true then there's a bug in d3dcompiler, because it shouldn't be applying any optimization whatsoever. In any case, [unroll] only affects loops with hardcoded bounds, and I thought unrolling wasn't supposed to happen implicitly. However, that really seems to be the case, so I'll see if I can find some way to force the loops not to be unrolled.
EDIT: After adding [loop] decorators the problem persists :(

1 hour ago, AlanGameDev said:

(and removing from the cbuffer)

Maybe that removal messed other things up? But I'm out of ideas; you should post the complete shader code. So far it sounds like a compiler bug, and AFAIK there are some MS guys here who might want to try to reproduce it.

Nope, I don't think that's the case, here's the full shader:



    RWTexture2D<uint> map : register(u0);

    cbuffer cbuf : register(b0) {
        uint curLine;
    };

    [numthreads(1, 1, 1)]
    void main(uint3 tid : SV_DispatchThreadID, uint gi : SV_GroupIndex, uint3 gid : SV_GroupID, uint3 gtid : SV_GroupThreadID)
    {
        int w = 256;
        //w = curLine; <- this works
        [loop] for (int x = 0; x < w; ++x)
        {
            [loop] for (int y = 1; y < w; ++y)
            {
                uint down = map[int2(x, y-1)];
                if (!down)
                {
                    map[int2(x, y-1)] = 64; // map[int2(x, y)]; <- this works
                    map[int2(x, y)] = 0;
                }
            }
        }
    }

That runs on a 256*256 texture with random 0/64 values.

As-is it doesn't work; however, if you switch to either of the commented-out alternatives, it magically works again.
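For anyone trying to reproduce this, a CPU reference of the same sweep may help to check what the shader should leave in the texture once it does complete. This is only a minimal C++ sketch: it assumes the texture has been read back into a row-major array of `w * w` uints, and `sweep_in_place` is a hypothetical name that does not come from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference for the single-threaded sweep in the shader.
// Assumption: `map` holds a w*w texture in row-major order (index = y * w + x).
// `sweep_in_place` is a hypothetical helper written for this sketch.
void sweep_in_place(std::vector<std::uint32_t>& map, int w)
{
    assert(map.size() == static_cast<std::size_t>(w) * static_cast<std::size_t>(w));

    // Same traversal order as the shader: x outer, y inner, starting at y = 1.
    for (int x = 0; x < w; ++x)
    {
        for (int y = 1; y < w; ++y)
        {
            const std::size_t below = static_cast<std::size_t>(y - 1) * w + x;
            const std::size_t here  = static_cast<std::size_t>(y) * w + x;

            // If the cell at y-1 is empty, mark it as 64 and clear the cell at y.
            if (map[below] == 0)
            {
                map[below] = 64;
                map[here] = 0;
            }
        }
    }
}
```

Because the write to (x, y) is read back by the next iteration, the first empty cell found in a column cascades: every cell from there up to y = w-2 ends as 64 and the cell at y = w-1 ends as 0.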

How do I ping the "MS guys"? Also, do you know some place to get DirectX support?

5 minutes ago, AlanGameDev said:

How do I ping the "MS guys"? Also, do you know some place to get DirectX support?

I was only in contact with AMD or NV. They have developer forums on their sites, and usually they request a full repo of your project (which you can strip down, of course). Some driver releases later the bugs are fixed :)

So if you can test on multiple hardware you would know whom to contact, NV, AMD or MS.

But maybe some further response here already helps...

3 hours ago, AlanGameDev said:

How do I ping the "MS guys"? Also, do you know some place to get DirectX support?

I felt my ears burning...

If you have a repro I'd be happy to take a look.

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

@ajmiles ?

It's not gonna be very trivial to make a minimal repro because there are some dependencies like DirectXTK and SDL which I'm using for windowing and input.

When the loop area is small it works fine, so I really think the problem is loop unrolling causing some massive code generation, but I'm not 100% sure. If I remove one of those assignments it works again (presumably because it's less code), but if I keep the assignment and loop over 1/4 of the area the problem persists, and that should unroll into less code, so nothing is making much sense :P. I also don't think these problems should be happening when you pass `D3DCOMPILE_DEBUG`, which is even stranger. I also tried using `Load()` instead of the indexer just in case, but that didn't make any difference.

@ajmiles I might look into it again soon, and if you're OK with those dependencies I could make a simplified branch and share the bitbucket repo. Removing all the deps is too much work, although I guess I could copy-paste the windowing code from DXUT, I don't know.

In the meantime I'm using the cbuffer to store the size, which is the ugliest workaround ever.

Thank you

@AlanGameDev Which version of the D3D compiler are you using? I'm worried that the fact you're still using DXUT might mean you're also using the DirectX SDK from June 2010 rather than the Windows 10 SDK from 2018.

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

@ajmiles Definitely not. I'm using the Windows SDK 10.0.17134.0 which I believe is the latest. I just mentioned DXUT because I'm using SDL for windowing and I believe you don't want that dependency so maybe I could copypaste some DXUT snippet to handle windowing using the Win API. I think DXUT is being maintained though, but I could be wrong of course.

This topic is closed to new replies.
