Skip to content

kdtree_cuda#86

Open
koide3 wants to merge 17 commits into
masterfrom
kdtree_cuda
Open

kdtree_cuda#86
koide3 wants to merge 17 commits into
masterfrom
kdtree_cuda

Conversation

@koide3

@koide3 koide3 commented Dec 6, 2025

Copy link
Copy Markdown
Owner

No description provided.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds GPU-accelerated GICP (Generalized Iterative Closest Point) matching using KdTree on CUDA. The implementation provides a GPU-based alternative to the existing CPU-based GICP factor, allowing for faster point cloud registration on systems with CUDA support.

Changes:

  • Introduces KdTreeGPU class for GPU-accelerated nearest neighbor search using KdTree data structure
  • Adds IntegratedGICPFactorGPU factor class for GPU-accelerated GICP matching cost computation
  • Implements CUDA kernels for KdTree search, correspondence computation, and GICP derivatives linearization
  • Adds comprehensive test coverage for both KdTree GPU functionality and GICP GPU factor integration
  • Updates demo and CI configuration to support the new GPU GICP factor

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
include/gtsam_points/ann/kdtree_gpu.hpp Header defining KdTreeGPU class with GPU nearest neighbor search interface
include/gtsam_points/factors/integrated_gicp_factor_gpu.hpp Header defining IntegratedGICPFactorGPU class for GPU-accelerated GICP matching
include/gtsam_points/factors/integrated_gicp_derivatives.cuh CUDA header for GICP derivatives computation utilities
include/gtsam_points/cuda/kernels/kdtree.cuh CUDA kernel for KdTree nearest neighbor search on GPU
include/gtsam_points/cuda/kernels/gicp_derivatives.cuh CUDA kernels for computing GICP derivatives and error
include/gtsam_points/cuda/kernels/correspondence.hpp Header defining Correspondence structure for point matching
src/gtsam_points/ann/kdtree_gpu.cpp CPU-side implementation of KdTreeGPU (tree construction, memory management)
src/gtsam_points/ann/kdtree_gpu.cu GPU-side implementation of KdTree nearest neighbor search kernel
src/gtsam_points/factors/integrated_gicp_factor_gpu.cpp Implementation of IntegratedGICPFactorGPU factor methods
src/gtsam_points/factors/integrated_gicp_derivatives.cu Base implementation of GICP derivatives computation
src/gtsam_points/factors/integrated_gicp_derivatives_inliers.cu Inlier correspondence management for GICP
src/gtsam_points/factors/integrated_gicp_derivatives_linearize.cu GICP factor linearization on GPU
src/gtsam_points/factors/integrated_gicp_derivatives_compute.cu GICP error computation on GPU
src/test/test_kdtree.cpp Adds KdTreeGPU unit tests for nearest neighbor search accuracy
src/test/test_matching_cost_factors.cpp Adds GICP_CUDA to integration tests for factor optimization
src/demo/demo_matching_cost_factors.cpp Adds GICP_GPU option to matching cost factors demo
CMakeLists.txt Adds new CUDA source files to build configuration
.github/workflows/build.yml Removes CUDA 12.2 build, keeping 12.5 and 13.1

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread include/gtsam_points/factors/integrated_gicp_factor_gpu.hpp Outdated
check_error << cudaMemcpyAsync(nodes, h_nodes.data(), sizeof(KdTreeNodeGPU) * num_nodes, cudaMemcpyHostToDevice, stream);
}

KdTreeGPU::~KdTreeGPU() {

Copilot AI Feb 16, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent stream parameter in cudaFreeAsync calls. In the destructor, cudaFreeAsync is called with nullptr for the stream parameter, but these allocations were done with a specific stream in the constructor (line 52-55). The stream should match the allocation stream or be synchronized before freeing. Using nullptr may cause undefined behavior if the allocation stream is still using the memory.

Suggested change
KdTreeGPU::~KdTreeGPU() {
KdTreeGPU::~KdTreeGPU() {
// Ensure all GPU work that might use 'indices' or 'nodes' has completed
check_error << cudaDeviceSynchronize();

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit. Added cudaDeviceSynchronize() before the cudaFreeAsync calls to ensure all GPU work using those allocations has completed before freeing.

Comment on lines +52 to +60
if (stack_size > MAX_STACK_SIZE - 2) {
printf("kdtree stack overflow!!");
} else if (cut_sq_dist < result.second) {
search_stack[stack_size].first = other_child;
search_stack[stack_size++].second = cut_sq_dist;
}

search_stack[stack_size].first = best_child;
search_stack[stack_size++].second = 0.0f;

Copilot AI Feb 16, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stack overflow handling is incomplete. When stack_size exceeds MAX_STACK_SIZE - 2, a warning is printed but execution continues, potentially writing beyond array bounds on lines 59-60. After printing the warning, the function should either return early with an error result or skip adding to the stack to prevent buffer overflow.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit. Added continue after the overflow warning so both the other_child and best_child pushes are skipped when the stack is near-full, preventing any writes beyond array bounds.

Comment thread include/gtsam_points/cuda/kernels/kdtree.cuh

/**
* @brief Create a unary GICP_GPU factor between a fixed target pose and an active source pose.
* @param targfixed_target_pose Fixed target pose

Copilot AI Feb 16, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in parameter name documentation. "targfixed_target_pose" should be "fixed_target_pose" to match the actual parameter name on line 60.

Suggested change
* @param targfixed_target_pose Fixed target pose
* @param fixed_target_pose Fixed target pose

Copilot uses AI. Check for mistakes.
koide3 and others added 2 commits February 21, 2026 11:20
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- kdtree.cuh: Add `continue` after stack overflow warning to prevent
  unconditional best_child push from writing beyond array bounds
- kdtree_gpu.cpp: Add cudaDeviceSynchronize() before cudaFreeAsync calls
  in destructor to ensure GPU work on allocated memory has completed

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Comment thread include/gtsam_points/cuda/kernels/kdtree.cuh Outdated
Comment thread include/gtsam_points/cuda/kernels/kdtree.cuh Outdated
Comment on lines +50 to +58
void IntegratedGICPDerivatives::update_inliers(int new_num_inliers) {
if (inlier_evaluation_point_gpu == nullptr) {
return;
}

// For now, we keep all correspondences without filtering
// The correspondence filtering can be added later with validity checks
this->num_inliers = new_num_inliers;
}
Comment on lines +21 to +25
namespace {

/// @brief Kernel to compute source-target correspondence by KdTree NN search
struct kdtree_correspondence_kernel {
kdtree_correspondence_kernel(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit. Removed the unused kdtree_correspondence_kernel anonymous-namespace struct and trimmed the includes down to only those actually required by issue_compute_error: thrust/iterator/transform_iterator.h, cub/device/device_reduce.cuh, gicp_derivatives.cuh, and stream_temp_buffer_roundrobin.hpp.

koide3 and others added 2 commits June 12, 2026 09:26
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants