Hey Guys,
Recently I encountered a rendering bug that only shows up when the debug layer is off. Even after I fixed all validation errors/warnings (including the GPU-Based Validation ones), the bug still exists when the debug layer is off. It happens on all GPUs available to me (GTX 680M, GTX 1080) but does not happen on the WARP device*
After days of struggling I found two ways to 'solve' the bug: 1. replacing one particular split barrier with a normal barrier; 2. breaking one cmdlist into two and submitting them to the GPU in order... Neither of these 'solutions' makes any sense to me, and I have almost run out of ideas. Please see here and there.
So I trimmed my project to get rid of the Kinect dependencies (it uses the Kinect color and depth sensor images as input) and made a repo for anyone who is interested or willing to test/help (thanks!)
Here is the repo: https://github.com/pengliu916/BugRepo.git
To successfully compile and run the code you need a DX12-capable GPU and Windows SDK 10.0.14393.0. To get rid of the GPU-Based Validation warnings/errors, your GPU needs to support typed loads.
(The following paragraph is not necessary for understanding the bug, but is here in case someone needs more information)
This project originally uses the depth map from the Kinect depth sensor to create/update a TSDF (truncated signed distance field) volume and reconstruct a 3D model of what the Kinect sees. To maintain this dynamic sparse volume efficiently, I use blocks to avoid updating every voxel every frame: instead of checking each voxel against the depth map, I first check each block (containing 8^3 voxels) against the depth map, and then in the next pass do the voxel-depthmap check only for voxels in the blocks that need it. The bug is in this block update routine. To avoid depending on the Kinect, I modified the project to use a GPU-generated depth map as input (a sphere orbiting in the foreground with a wall in the background). To let the extremely slow WARP device also generate a reasonable result, I made the animation frame-based instead of time-based, and I changed the volume resolution to 64^3. You can change it to 512^3 and it will run at 70 fps on a GTX 680M, but remember to make the voxel size smaller to see the whole picture, and then you will more or less see that this is a data corruption bug. You can also press the 'ResetVolume' button to reset the related resources. All other features shown on the right panel may not work or may even crash, since I removed a lot of important components in a very short time...
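For readers who want a concrete picture of the two-pass block scheme, here is a minimal CPU-side sketch. It is not the repo's compute shader code: every name in it (`DepthMap`, `blockNeedsUpdate`, `buildUpdateQueue`, `voxelTsdf`) is made up for illustration, and it assumes an orthographic camera with one depth pixel per voxel column, which is much simpler than a real Kinect projection.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative assumptions only: orthographic view along +z, depth in voxel units.
constexpr int kBlockSize = 8; // 8^3 voxels per block, as described in the post

struct DepthMap {
    int width, height;
    std::vector<float> z; // surface depth per pixel
    float at(int x, int y) const { return z[y * width + x]; }
};

struct BlockId { int x, y, z; };

// Pass 1 test: a block needs updating if any pixel in its footprint sees a
// surface within the truncation distance of the block's z extent.
inline bool blockNeedsUpdate(const DepthMap& d, int bx, int by, int bz, float trunc) {
    const float zMin = bz * kBlockSize - trunc;
    const float zMax = (bz + 1) * kBlockSize + trunc;
    for (int y = by * kBlockSize; y < (by + 1) * kBlockSize; ++y)
        for (int x = bx * kBlockSize; x < (bx + 1) * kBlockSize; ++x) {
            const float s = d.at(x, y);
            if (s >= zMin && s <= zMax) return true;
        }
    return false;
}

// Pass 1: collect the blocks whose voxels must be visited in pass 2.
inline std::vector<BlockId> buildUpdateQueue(const DepthMap& d, int reso, float trunc) {
    assert(reso % kBlockSize == 0 && d.width == reso && d.height == reso);
    std::vector<BlockId> queue;
    const int n = reso / kBlockSize;
    for (int bz = 0; bz < n; ++bz)
        for (int by = 0; by < n; ++by)
            for (int bx = 0; bx < n; ++bx)
                if (blockNeedsUpdate(d, bx, by, bz, trunc))
                    queue.push_back({bx, by, bz});
    return queue;
}

// Pass 2 (per voxel): signed distance to the surface along z, measured from
// the voxel center, divided by trunc and clamped to [-1, 1].
inline float voxelTsdf(const DepthMap& d, int x, int y, int z, float trunc) {
    const float sd = d.at(x, y) - (z + 0.5f);
    return std::max(-1.0f, std::min(1.0f, sd / trunc));
}
```

With a flat wall in a 32^3 volume, only the single layer of blocks that contains the wall ends up in the queue, so pass 2 touches a small fraction of the voxels.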
If you directly compile and run the project you will see the following
As you can see, the sphere is broken and the background wall is broken, both due to wrong block updates. (I visualized wrong result blocks (missing blocks) as small red boxes, and correct blocks as big green boxes.) So ideally you should not see any small red boxes, only big green boxes appearing and disappearing as the sphere moves, as in the following HWDeviceReference and WarpDeviceReference
*I lied: the WARP device won't give you the expected result (though it still doesn't show this bug) unless you uncheck the circled checkbox. That is totally unrelated, though: the checkbox affects the rendering part, while the bug is in the volume updating part.
So to 'solve' the bug, there are three ways:
1. Change Core::g_config.enableDebuglayer to true in file KinectVisualizer.cpp, line 180, and make sure you are in a debug build (the debug layer is disabled in other builds)
void KinectVisualizer::OnConfiguration()
{
    Core::g_config.FXAA = false;
    Core::g_config.warpDevice = false;
    Core::g_config.enableDebuglayer = false; // change this to true will enable debug layer
    Core::g_config.enableGPUBasedValidationInDebug = false;
    Core::g_config.swapChainDesc.Width = _width;
    Core::g_config.swapChainDesc.Height = _height;
    Core::g_config.swapChainDesc.BufferCount = 5;
    Core::g_config.passThroughMsg = true;
    Core::g_config.useSceneBuf = false;
}
This enables the debug layer, and in a Debug build the bug magically disappears (I don't know why...)
2. Comment out line 1215 in file TSDFVolume\TSDFVolume.cpp
        cptCtx.DispatchIndirect(_indirectParams, 0);
    }
    //======================================================================
    // Code Part A
    //
    // The following line will cause the bug if 'Code Part B' is commented
    BeginTrans(cptCtx, _occupiedBlocksBuf, UAV); // this line
    // Add blocks to UpdateBlockQueue from DepthMap
    Trans(cptCtx, _fuseBlockVol, UAV);
    Trans(cptCtx, *pDepthTex, psSRV | csSRV);
    Trans(cptCtx, *pWeightTex, csSRV);
    Trans(cptCtx, _updateBlocksBuf, UAV);
This removes the split transition (only the begin half is removed; the end half then automatically becomes a normal transition). This 'fixes' the bug, and I also don't know why...
3. Uncomment 3 lines of code starting at line 1259 in TSDFVolume\TSDFVolume.cpp
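For anyone unfamiliar with the term: a split barrier is a resource transition recorded in two halves (in raw D3D12, the same transition issued once with D3D12_RESOURCE_BARRIER_FLAG_BEGIN_ONLY and once with D3D12_RESOURCE_BARRIER_FLAG_END_ONLY), and the GPU must not read or write the resource while the transition is pending. The snippet below is only a small CPU-side model of that rule, not the D3D12 API and not the repo's BeginTrans/Trans helpers. `State`, `TrackedResource`, `beginTransition`, `endTransition`, `transition` and `canAccess` are hypothetical names, and the way `transition` completes a pending begin is an assumption based on the description above.

```cpp
#include <cassert>

// Hypothetical, simplified resource states (not the real D3D12 enum).
enum class State { Common, UnorderedAccess, ShaderResource };

// Hypothetical CPU-side tracker for a single resource.
struct TrackedResource {
    State current = State::Common;
    bool hasPending = false;        // true while a split barrier is open
    State pending = State::Common;  // target state of the open split barrier
};

// First half of a split barrier: announce the upcoming transition.
inline bool beginTransition(TrackedResource& r, State after) {
    if (r.hasPending || r.current == after) return false; // already open, or nothing to do
    r.hasPending = true;
    r.pending = after;
    return true;
}

// Second half: only valid if it matches the pending target state.
inline bool endTransition(TrackedResource& r, State after) {
    if (!r.hasPending || r.pending != after) return false;
    r.current = after;
    r.hasPending = false;
    return true;
}

// Normal barrier. If a begin half is pending it acts as the end half,
// otherwise it performs the whole transition at once.
inline bool transition(TrackedResource& r, State after) {
    if (r.hasPending) return endTransition(r, after);
    r.current = after;
    return true;
}

// The resource is usable in state s only when no split barrier is pending.
inline bool canAccess(const TrackedResource& r, State s) {
    return !r.hasPending && r.current == s;
}
```

In this model, removing the begin call simply makes the later `transition` do the full state change in one step, which matches what workaround 2 does on the CPU side. It says nothing about why the GPU result differs.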
    //======================================================================
    // Code Part B
    //
    // The following 3 lines is one work around the bug if
    // 'Code Part A' is uncommented
    //cptCtx.Flush();
    //cptCtx.SetRootSignature(_rootsig);
    //_UpdateAndBindConstantBuffer(cptCtx);
    // Update voxels in blocks from UpdateBlockQueue and create queues for
    // NewOccupiedBlocks and FreedOccupiedBlocks
    Trans(cptCtx, _occupiedBlocksBuf, UAV);
    Trans(cptCtx, _renderBlockVol, UAV);
    Trans(cptCtx, _fuseBlockVol, UAV);
    Trans(cptCtx, _newFuseBlocksBuf, UAV);
This ends recording of the current cmdlist, flushes all cached resource barriers, submits the cmdlist to the GPU for execution, grabs a new cmdlist from the cmdlist pool for the following GPU calls, and sets the rootsig and constant buffer again. This also 'fixes' the bug, and I don't know why...
Please let me know if you have any trouble compiling and running the code. Any comments are appreciated.
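The reason the workaround needs the two extra lines after the flush is that a freshly started command list does not inherit bindings from the previous one, so the root signature and constants have to be set again. Below is a tiny mock of that behavior. `MockCmdList` and `MockContext` are invented stand-ins, not the repo's context class and not D3D12 types, and the "submit" step is just a vector push.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one recorded command list.
struct MockCmdList {
    bool rootSigSet = false;            // bindings live per command list
    std::vector<std::string> commands;  // recorded dispatches, in order
};

// Hypothetical stand-in for a compute context that records into a list.
struct MockContext {
    MockCmdList current;
    std::vector<MockCmdList> submitted; // lists handed to the queue, in order

    void SetRootSignature() { current.rootSigSet = true; }

    // Refuses to record a dispatch when no root signature is bound.
    bool Dispatch(const std::string& name) {
        if (!current.rootSigSet) return false;
        current.commands.push_back(name);
        return true;
    }

    // Models the workaround: close and submit the current list, then
    // continue on a fresh list that has no inherited bindings.
    void Flush() {
        submitted.push_back(std::move(current));
        current = MockCmdList{};
    }
};
```

After `Flush()`, a dispatch in this mock is rejected until `SetRootSignature()` is called again, which mirrors why the three lines in 'Code Part B' belong together.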
Thanks in advance