Hey Guys,
Assume we have a job queue buffer and two compute shaders (cs1, cs2) that add jobs to it by calling IncrementCounter on its counter (DX12). To make this work correctly, we can dispatch cs1, then issue a barrier on the job queue buffer's counter buffer, and then dispatch cs2. That avoids a race condition where cs1 is still updating the job queue while cs2 starts to run.
But I was wondering how IncrementCounter works under the hood. If the GPU hardware guarantees its atomicity, I think we probably don't need the barrier between the cs1 and cs2 dispatches. I have no idea how the GPU guarantees the atomicity of IncrementCounter, though, so it would be nice if someone could explain what happens when overlapping compute shaders call IncrementCounter on the same buffer. Also, does the same concept apply to all atomic operations? That is, if overlapping compute shaders use atomic operations to update the same buffer, do we have to serialize those shaders to ensure correctness?
Also, what's the difference between an Append/Consume buffer and a StructuredBuffer used as a stack-style buffer with IncrementCounter/DecrementCounter? I feel they are essentially the same thing (I may be wrong, though).
Thanks