Hey guys,
Recently I've been having a lot of fun playing with GPU atomics, compute shaders, and DispatchIndirect, and I've run into a tricky situation:
================================Background====================================
In my project I'm doing volume rendering, and to speed it up I partitioned the volume into blocks (8^3 voxels each) and keep a GPU buffer containing the indices of the blocks that are not empty. So I have a large buffer of non-empty block coordinates, which I call occupiedBlocksBuf.
The scene in the volume is dynamic, so each frame a compute shader updates the whole volume (this is fast, since it only touches voxels that actually changed). As a result, some previously empty blocks become non-empty, and some previously non-empty blocks become empty.
For blocks that become empty, I maintain a buffer called freedSlotsBuf: when a block is freed, the compute shader finds its index in occupiedBlocksBuf, writes FREED_FLAG into that location, and then appends that index to freedSlotsBuf.
For blocks that become non-empty, the compute shader first tries to grab an available slot from freedSlotsBuf and writes the new block's coordinate into that slot of occupiedBlocksBuf, so freed slots in occupiedBlocksBuf get filled first; if there are no freed slots left, it appends the block's coordinate to the end of occupiedBlocksBuf.
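To make sure I'm describing the scheme right, here is a minimal CPU-side sketch of the free/allocate bookkeeping (FREED_FLAG is from my description above; the struct and method names are just illustrative, and on the GPU the pushes/pops would of course be atomic counter ops):

```cpp
#include <cstdint>
#include <vector>

// CPU-side model of the GPU scheme: occupiedBlocks holds packed block
// coordinates, freedSlots holds indices of entries marked FREED_FLAG.
constexpr uint32_t FREED_FLAG = 0xFFFFFFFFu;

struct BlockList {
    std::vector<uint32_t> occupiedBlocks; // block coords (never shrinks)
    std::vector<uint32_t> freedSlots;     // indices of freed entries

    // Block becomes empty: mark its slot and remember it for reuse.
    void Free(uint32_t slotIdx) {
        occupiedBlocks[slotIdx] = FREED_FLAG;
        freedSlots.push_back(slotIdx);
    }

    // Block becomes non-empty: reuse a freed slot if any, else append.
    void Allocate(uint32_t blockCoord) {
        if (!freedSlots.empty()) {
            uint32_t slot = freedSlots.back();
            freedSlots.pop_back();
            occupiedBlocks[slot] = blockCoord;
        } else {
            occupiedBlocks.push_back(blockCoord);
        }
    }
};
```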
So that's the basic idea. As you may notice, the 'size' of occupiedBlocksBuf never decreases, and as the program runs the buffer can in some cases become fragmented (lots of slots get freed), which is bad...
===============================Problem=======================================
I then wrote a defragmentation shader (freedSlotsBuf tells me how many freed slots I've got and where they are, so I have everything I need) and use DispatchIndirect to defragment occupiedBlocksBuf. The indirect params are written by a compute shader based on the size of freedSlotsBuf: when that size is below a threshold, the params are (0, 1, 1), which launches zero thread groups.
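For clarity, the indirect-arg logic I mean is something like the following (shown CPU-side; on the GPU it's a one-thread compute shader writing the same three uints into the indirect args buffer; THREAD_GROUP_SIZE and DEFRAG_THRESHOLD are just illustrative names/values):

```cpp
#include <cstdint>

// Illustrative constants; the real shader has its own values.
constexpr uint32_t THREAD_GROUP_SIZE = 64;
constexpr uint32_t DEFRAG_THRESHOLD  = 128;

struct DispatchArgs { uint32_t x, y, z; };

// Mirrors the compute shader that fills the indirect args buffer:
// below the threshold it writes (0, 1, 1) so DispatchIndirect launches
// zero thread groups; otherwise one thread per freed slot, rounded up
// to whole thread groups.
DispatchArgs BuildDefragArgs(uint32_t freedSlotCount) {
    DispatchArgs args{0, 1, 1};
    if (freedSlotCount >= DEFRAG_THRESHOLD) {
        args.x = (freedSlotCount + THREAD_GROUP_SIZE - 1) / THREAD_GROUP_SIZE;
    }
    return args;
}
```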
However, with defragmentation done this way, I have to issue the following code every frame on the CPU side, even though I know that 99% of the time it maps to an empty GPU dispatch:
void TSDFVolume::OnDefragment(ComputeContext& cptCtx)
{
    GPU_PROFILE(cptCtx, L"Defragment");
    cptCtx.SetPipelineState(_cptBlockQDefragment);
    cptCtx.SetRootSignature(_rootsig);
    cptCtx.TransitionResource(_occupiedBlocksBuf, UAV);
    cptCtx.TransitionResource(_freedFuseBlocksBuf, csSRV);
    cptCtx.TransitionResource(_jobParamBuf, csSRV);
    cptCtx.TransitionResource(_indirectParams, IARG);
    Bind(cptCtx, 2, 0, 1, &_occupiedBlocksBuf.GetUAV());
    Bind(cptCtx, 2, 1, 1, &_freedFuseBlocksBuf.GetCounterUAV(cptCtx));
    Bind(cptCtx, 3, 0, 1, &_freedFuseBlocksBuf.GetSRV());
    Bind(cptCtx, 3, 1, 1, &_jobParamBuf.GetSRV());
    cptCtx.DispatchIndirect(_indirectParams, 48);
}
There are UAV transitions and PSO changes involved, so this definitely has non-zero CPU/GPU cost, which looks sub-optimal...
To avoid that, the CPU could decide whether to call OnDefragment at all, but that requires reading back the freedSlotsBuf size from the GPU at some frequency, which may have an even worse perf impact...
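(The readback I'm imagining would be latency-hidden rather than a blocking Map: copy the counter into a small ring of readback buffers each frame and have the CPU read the value from a few frames ago, so the decision is slightly stale but never stalls. A CPU-side sketch of just the ring bookkeeping, with the actual GPU copy/fence omitted and kLatency as an assumed frames-in-flight count:)

```cpp
#include <array>
#include <cstdint>

// Ring of per-frame staged counter values. Slot (frame % kLatency) is
// overwritten this frame and read back kLatency frames later, by which
// time the corresponding GPU copy would have completed.
constexpr uint32_t kLatency = 3; // e.g. triple-buffered frames in flight

struct CounterReadback {
    std::array<uint32_t, kLatency> staged{};
    uint64_t frame = 0;

    // Called once per frame with the value the GPU copied this frame.
    // Returns the counter from kLatency frames ago (0 until warmed up).
    uint32_t Update(uint32_t gpuCounterThisFrame) {
        uint32_t slot = static_cast<uint32_t>(frame % kLatency);
        uint32_t stale = staged[slot]; // written kLatency frames ago
        staged[slot] = gpuCounterThisFrame;
        ++frame;
        return stale;
    }
};
```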
So, any suggestions? Or are there existing, better ways to do this kind of GPU buffer defragmentation?
Thanks in advance.
P.S. When does a UAV barrier actually have non-zero GPU cost? I feel like if there are no reads/writes between two UAV barriers on the same resource, the second barrier should take no GPU time, right? (Please correct me if I got that wrong.)