kdtree_cuda#86
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds GPU-accelerated GICP (Generalized Iterative Closest Point) matching using KdTree on CUDA. The implementation provides a GPU-based alternative to the existing CPU-based GICP factor, allowing for faster point cloud registration on systems with CUDA support.
Changes:
- Introduces
KdTreeGPUclass for GPU-accelerated nearest neighbor search using KdTree data structure - Adds
IntegratedGICPFactorGPUfactor class for GPU-accelerated GICP matching cost computation - Implements CUDA kernels for KdTree search, correspondence computation, and GICP derivatives linearization
- Adds comprehensive test coverage for both KdTree GPU functionality and GICP GPU factor integration
- Updates demo and CI configuration to support the new GPU GICP factor
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
include/gtsam_points/ann/kdtree_gpu.hpp |
Header defining KdTreeGPU class with GPU nearest neighbor search interface |
include/gtsam_points/factors/integrated_gicp_factor_gpu.hpp |
Header defining IntegratedGICPFactorGPU class for GPU-accelerated GICP matching |
include/gtsam_points/factors/integrated_gicp_derivatives.cuh |
CUDA header for GICP derivatives computation utilities |
include/gtsam_points/cuda/kernels/kdtree.cuh |
CUDA kernel for KdTree nearest neighbor search on GPU |
include/gtsam_points/cuda/kernels/gicp_derivatives.cuh |
CUDA kernels for computing GICP derivatives and error |
include/gtsam_points/cuda/kernels/correspondence.hpp |
Header defining Correspondence structure for point matching |
src/gtsam_points/ann/kdtree_gpu.cpp |
CPU-side implementation of KdTreeGPU (tree construction, memory management) |
src/gtsam_points/ann/kdtree_gpu.cu |
GPU-side implementation of KdTree nearest neighbor search kernel |
src/gtsam_points/factors/integrated_gicp_factor_gpu.cpp |
Implementation of IntegratedGICPFactorGPU factor methods |
src/gtsam_points/factors/integrated_gicp_derivatives.cu |
Base implementation of GICP derivatives computation |
src/gtsam_points/factors/integrated_gicp_derivatives_inliers.cu |
Inlier correspondence management for GICP |
src/gtsam_points/factors/integrated_gicp_derivatives_linearize.cu |
GICP factor linearization on GPU |
src/gtsam_points/factors/integrated_gicp_derivatives_compute.cu |
GICP error computation on GPU |
src/test/test_kdtree.cpp |
Adds KdTreeGPU unit tests for nearest neighbor search accuracy |
src/test/test_matching_cost_factors.cpp |
Adds GICP_CUDA to integration tests for factor optimization |
src/demo/demo_matching_cost_factors.cpp |
Adds GICP_GPU option to matching cost factors demo |
CMakeLists.txt |
Adds new CUDA source files to build configuration |
.github/workflows/build.yml |
Removes CUDA 12.2 build, keeping 12.5 and 13.1 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| check_error << cudaMemcpyAsync(nodes, h_nodes.data(), sizeof(KdTreeNodeGPU) * num_nodes, cudaMemcpyHostToDevice, stream); | ||
| } | ||
|
|
||
| KdTreeGPU::~KdTreeGPU() { |
There was a problem hiding this comment.
Inconsistent stream parameter in cudaFreeAsync calls. In the destructor, cudaFreeAsync is called with nullptr for the stream parameter, but these allocations were done with a specific stream in the constructor (line 52-55). The stream should match the allocation stream or be synchronized before freeing. Using nullptr may cause undefined behavior if the allocation stream is still using the memory.
| KdTreeGPU::~KdTreeGPU() { | |
| KdTreeGPU::~KdTreeGPU() { | |
| // Ensure all GPU work that might use 'indices' or 'nodes' has completed | |
| check_error << cudaDeviceSynchronize(); |
There was a problem hiding this comment.
Fixed in the latest commit. Added cudaDeviceSynchronize() before the cudaFreeAsync calls to ensure all GPU work using those allocations has completed before freeing.
| if (stack_size > MAX_STACK_SIZE - 2) { | ||
| printf("kdtree stack overflow!!"); | ||
| } else if (cut_sq_dist < result.second) { | ||
| search_stack[stack_size].first = other_child; | ||
| search_stack[stack_size++].second = cut_sq_dist; | ||
| } | ||
|
|
||
| search_stack[stack_size].first = best_child; | ||
| search_stack[stack_size++].second = 0.0f; |
There was a problem hiding this comment.
Stack overflow handling is incomplete. When stack_size exceeds MAX_STACK_SIZE - 2, a warning is printed but execution continues, potentially writing beyond array bounds on lines 59-60. After printing the warning, the function should either return early with an error result or skip adding to the stack to prevent buffer overflow.
There was a problem hiding this comment.
Fixed in the latest commit. Added continue after the overflow warning so both the other_child and best_child pushes are skipped when the stack is near-full, preventing any writes beyond array bounds.
|
|
||
| /** | ||
| * @brief Create a unary GICP_GPU factor between a fixed target pose and an active source pose. | ||
| * @param targfixed_target_pose Fixed target pose |
There was a problem hiding this comment.
Typo in parameter name documentation. "targfixed_target_pose" should be "fixed_target_pose" to match the actual parameter name on line 60.
| * @param targfixed_target_pose Fixed target pose | |
| * @param fixed_target_pose Fixed target pose |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- kdtree.cuh: Add `continue` after stack overflow warning to prevent unconditional best_child push from writing beyond array bounds - kdtree_gpu.cpp: Add cudaDeviceSynchronize() before cudaFreeAsync calls in destructor to ensure GPU work on allocated memory has completed
| void IntegratedGICPDerivatives::update_inliers(int new_num_inliers) { | ||
| if (inlier_evaluation_point_gpu == nullptr) { | ||
| return; | ||
| } | ||
|
|
||
| // For now, we keep all correspondences without filtering | ||
| // The correspondence filtering can be added later with validity checks | ||
| this->num_inliers = new_num_inliers; | ||
| } |
| namespace { | ||
|
|
||
| /// @brief Kernel to compute source-target correspondence by KdTree NN search | ||
| struct kdtree_correspondence_kernel { | ||
| kdtree_correspondence_kernel( |
There was a problem hiding this comment.
Fixed in the latest commit. Removed the unused kdtree_correspondence_kernel anonymous-namespace struct and trimmed the includes down to only those actually required by issue_compute_error: thrust/iterator/transform_iterator.h, cub/device/device_reduce.cuh, gicp_derivatives.cuh, and stream_temp_buffer_roundrobin.hpp.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…rated_gicp_derivatives_compute.cu
No description provided.