# Active 3D Reconstruction with TSDF on a Table, Based on Dual Robot Manipulators
This README explains how to use the provided Python script to build a TSDF-based 3D map from an RGB-D sequence and camera poses, using GPU acceleration via Open3D’s Tensor VoxelBlockGrid.
This code provides:

- Camera intrinsic loading from `cam_params.json`
- Pose loading from `traj.txt` (supports both 9+3 and 16-float formats)
- RGB & depth frame pairing with name-based ordering
- Optional ROI mask input per frame
- GPU-based TSDF fusion using Open3D's Tensor `VoxelBlockGrid`
- Mesh or point-cloud extraction from the TSDF volume
When ROI masks are enabled, only depth values inside the masked region are fused into the TSDF volume, effectively restricting reconstruction to selected objects or regions.
Requirements:
- Python 3.8 – 3.11
```bash
# 1) Create and activate a virtual environment
conda create -n tsdf_env python=3.10 -y
conda activate tsdf_env

# 2) Install required packages
pip install numpy scipy tifffile imageio open3d pillow matplotlib
```
**Dependency roles:**
* `numpy` — array operations
* `scipy` — rotation utilities
* `tifffile` — reading `.tif` depth images
* `imageio` — RGB reading
* `open3d` — TSDF, visualization, mesh/point cloud
* `Pillow` — loading `.jpg/.png`
* `matplotlib` — debugging plots
### 2.3 CUDA Requirements
The TSDF module uses:
```python
device = o3c.Device("CUDA:0")
vbg = create_gpu_tsdf(..., device=device)
```

You need:
- NVIDIA GPU
- CUDA drivers installed
- CUDA-enabled Open3D
If you have no GPU:

```python
device = o3c.Device("CPU:0")
```

Performance will be slower.
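If you would rather select the device automatically, here is a minimal sketch using Open3D's `open3d.core.cuda.is_available()`; the one-liner itself is just a suggestion, not part of the script:

```python
import open3d.core as o3c

# Use the first GPU when CUDA is available, otherwise fall back to CPU.
device = o3c.Device("CUDA:0") if o3c.cuda.is_available() else o3c.Device("CPU:0")
```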
```python
scene_dir = "/path/to/scene"
json_path = os.path.join(scene_dir, "cam_params.json")
pose_txt = os.path.join(scene_dir, "traj.txt")
rgb_dir = os.path.join(scene_dir, "results")
depth_dir = os.path.join(scene_dir, "results")
```

Suggested directory structure:
```
scene/
├── cam_params.json
├── traj.txt
├── results/
│   ├── frame0000.jpg
│   ├── frame0001.jpg
│   ├── depth0000.png
│   ├── depth0001.png
│   └── ...
└── masks/            # OPTIONAL
    ├── mask0000.png
    ├── mask0001.png
    └── ...
```
File matching used in the script:

```python
rgb_paths = sorted(glob.glob(os.path.join(rgb_dir, "frame*.jpg")))[::stride]
depth_paths = sorted(glob.glob(os.path.join(depth_dir, "depth*.png")))[::stride]
```

Modify this if your naming differs.
Example:

```json
{
  "camera": {
    "fx": 1000.0,
    "fy": 1000.0,
    "cx": 640.0,
    "cy": 360.0,
    "scale": 1000.0,
    "w": 1280,
    "h": 720
  }
}
```

Meaning:
- `fx`, `fy`, `cx`, `cy` — intrinsic parameters
- `scale` — depth scaling; real depth = `pixel_value / scale`
- `w`, `h` — image size
Loaded via:

```python
K, scale, (w, h) = load_camera_intrinsics(json_path)
```
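`load_camera_intrinsics` is defined in the script; as a minimal sketch, and assuming the `cam_params.json` layout above, it can be read as:

```python
import json
import numpy as np

def load_camera_intrinsics(json_path):
    # Read the "camera" block and assemble the 3x3 pinhole matrix K.
    with open(json_path, "r") as f:
        cam = json.load(f)["camera"]
    K = np.array([[cam["fx"], 0.0,       cam["cx"]],
                  [0.0,       cam["fy"], cam["cy"]],
                  [0.0,       0.0,       1.0]])
    return K, cam["scale"], (cam["w"], cam["h"])
```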
Poses are loaded via:

```python
poses = load_poses_mat16(pose_txt)
```

Format per line:
```
t00 t01 t02 t03 t10 t11 t12 t13 t20 t21 t22 t23 t30 t31 t32 t33
```
Also supports 9+3 format:
```
r11 r12 r13 r21 r22 r23 r31 r32 r33 tx ty tz
```
TSDF path uses the 16-float version by default.
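A minimal sketch of the 16-float loader, assuming one row-major 4×4 matrix per line (the function body is an illustration, not the script's exact implementation):

```python
import numpy as np

def load_poses_mat16(path):
    # Parse one pose (16 whitespace-separated floats) per line into a 4x4 matrix.
    poses = []
    with open(path, "r") as f:
        for line in f:
            vals = [float(v) for v in line.split()]
            if len(vals) == 16:
                poses.append(np.array(vals).reshape(4, 4))
    return poses
```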
Assumptions:
- Each pose represents `world_T_cam`
- The solver uses its inverse as the TSDF extrinsic:

```python
cam_T_world = np.linalg.inv(world_T_cam)
```

Example from the main script:
```python
depth_raw = np.array(Image.open(depth_paths[i])).astype(np.uint16)
depth_np = depth_raw
```

Requirements:
- Depth stored as `uint16`
- Actual depth (meters) = `depth_np / depth_scale`
- `depth_max` is defined in meters
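For reference, a short sketch of how the raw values map to metric depth and how `depth_max` bounds the fused range (`depth_scale` and `depth_max` as configured in the script):

```python
# Convert raw uint16 values to meters and keep only the valid range.
depth_m = depth_np.astype(np.float32) / depth_scale
valid = (depth_m > 0) & (depth_m <= depth_max)   # pixels outside are ignored
```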
Mask requirements:
- Shape: `(H, W)`
- Type: `uint8`
- Semantics: `mask == 1` → valid ROI, `mask == 0` → ignored region
- Mask resolution must match the RGB & depth resolution
Enable masks with:

```python
use_masks = True
```

When enabled, masks are loaded and applied per frame.
```python
if use_masks:
    mask_dir = os.path.join(scene_dir, "masks")
    mask_paths = sorted(glob.glob(os.path.join(mask_dir, "mask*.png")))[::stride]
```

The number of masks must match the number of RGB / depth frames.
Inside the main streaming loop:

```python
rgb_np = np.array(Image.open(rgb_paths[i]))                    # (H, W, 3), uint8
depth_raw = np.array(Image.open(depth_paths[i])).astype(np.uint16)
depth_np = depth_raw

if use_masks:
    mask = np.array(Image.open(mask_paths[i])).astype(np.uint8)
    depth_np[mask == 0] = 0  # <-- ROI filtering happens here
```

This ensures that only masked pixels contribute to TSDF fusion.
The ROI logic is fully handled at the depth-image level.
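`integrate_frame_gpu` is defined in the script; below is a hedged sketch of one way to implement it with Open3D's tensor `VoxelBlockGrid` API (the argument names are assumptions, not the script's exact signature):

```python
import open3d as o3d
import open3d.core as o3c

def integrate_frame_gpu(vbg, depth_np, rgb_np, K, cam_T_world,
                        depth_scale, depth_max, device):
    # Wrap the numpy images as tensor Images on the target device.
    depth = o3d.t.geometry.Image(o3c.Tensor(depth_np, device=device))
    color = o3d.t.geometry.Image(o3c.Tensor(rgb_np, device=device))
    intrinsic = o3c.Tensor(K, o3c.float64)
    extrinsic = o3c.Tensor(cam_T_world, o3c.float64)
    # Activate only the blocks seen by this frame, then fuse into them.
    block_coords = vbg.compute_unique_block_coordinates(
        depth, intrinsic, extrinsic, depth_scale, depth_max)
    vbg.integrate(block_coords, depth, color, intrinsic, intrinsic,
                  extrinsic, depth_scale, depth_max)
```

Because the ROI filtering zeroes out masked depth pixels before this call, those pixels never receive TSDF updates.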
Pose loaders:
- `load_poses_txt` — 12-float R+t → 4×4
- `load_poses_mat16` — 16-float → 4×4

Other helpers:
- `rgbd_to_pcd(...)`
- `integrate_sequence`, `integrate_sequence_streaming`
- `load_camera_from_K`, `create_gpu_tsdf`, `integrate_frame_gpu`, `extract_open3d_pcd_from_tsdf`
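A hedged sketch of `create_gpu_tsdf`, assuming it wraps the standard `o3d.t.geometry.VoxelBlockGrid` constructor with TSDF, weight, and color channels:

```python
import open3d as o3d
import open3d.core as o3c

def create_gpu_tsdf(voxel_size, block_resolution, block_count, device):
    # Allocate a sparse voxel grid: one TSDF, one weight, three color channels.
    return o3d.t.geometry.VoxelBlockGrid(
        attr_names=("tsdf", "weight", "color"),
        attr_dtypes=(o3c.float32, o3c.float32, o3c.float32),
        attr_channels=((1,), (1,), (3,)),
        voxel_size=voxel_size,
        block_resolution=block_resolution,
        block_count=block_count,
        device=device,
    )
```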
The main script:
- Load intrinsics
- Load poses
- Build the TSDF `VoxelBlockGrid`
- Loop through frames and integrate
- Extract mesh and visualize
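The final extraction step can look like this; a sketch using the standard `VoxelBlockGrid` surface-extraction calls, converting to legacy geometry for the viewer:

```python
# Pull surfaces out of the fused volume and visualize.
mesh = vbg.extract_triangle_mesh().to_legacy()
mesh.compute_vertex_normals()
o3d.visualization.draw_geometries([mesh])

# Or extract a point cloud instead of a mesh:
pcd = vbg.extract_point_cloud().to_legacy()
```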
Install dependencies (Section 2).

Ensure:
- `cam_params.json` exists
- `traj.txt` uses the 16-float pose format
- `results/` contains `frame*.jpg` and `depth*.png`
```python
if __name__ == "__main__":
    scene_dir = "/your/absolute/path"
    json_path = os.path.join(scene_dir, "cam_params.json")
    K, scale, _ = load_camera_intrinsics(json_path)
    pose_txt = os.path.join(scene_dir, "traj.txt")
    rgb_dir = os.path.join(scene_dir, "results")
    depth_dir = os.path.join(scene_dir, "results")
```

TSDF hyperparameters:
```python
voxel_size = 0.02
sdf_trunc = 0.08
depth_max = 4.0
stride = 1
max_steps = None
```

TSDF voxel grid:
```python
vbg = create_gpu_tsdf(
    voxel_size=voxel_size,
    block_resolution=8,
    block_count=100000,
    device=device,
)
```

Run:

```bash
python tsdf_tsdf_mapping.py
```

Pipeline executed:
1. Load intrinsics
2. Load & subsample poses
3. Create the TSDF grid
4. For each RGB-D frame:
   - Load RGB
   - Load depth
   - Load pose
   - Integrate
5. Extract mesh → visualize
If you want a coordinate frame, uncomment:

```python
axis = o3d.geometry.TriangleMesh.create_coordinate_frame(
    size=0.5, origin=[0, 0, 0]
)
o3d.visualization.draw_geometries([mesh, axis])
```

Otherwise use:

```python
o3d.visualization.draw_geometries([mesh])
```

`voxel_size`:
- Smaller → higher detail, higher memory
- Larger → faster, coarser
- Recommended: 0.01–0.05
`sdf_trunc`:
- Typically `3 * voxel_size`
- Too large → blurred edges
- Too small → missing surfaces
`depth_max`:
- Valid depth range cutoff
- Indoor scenes: 3–6 m
`stride`:
- `1` = use all frames
- `>1` = skip frames (faster)
`block_resolution` and `block_count`:
- `block_resolution` — voxels per block side
- `block_count` — maximum number of allocated blocks
- Large scenes require a larger `block_count`
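As a rough sizing estimate (assuming the usual five float32 channels per voxel: TSDF, weight, and RGB): `block_count = 100000` with `block_resolution = 8` allows 100000 × 8³ ≈ 51.2 M voxels, i.e. about 51.2 M × 20 B ≈ 1 GB of GPU memory when fully allocated.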
Use `integrate_sequence(...)` or `integrate_sequence_streaming(...)`.
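Both are defined in the script. As a rough sketch, the streaming variant can be thought of as fusing one frame at a time (reusing the hypothetical `integrate_frame_gpu` sketch above), so host memory stays flat on long sequences; the signature below is an assumption for illustration:

```python
import numpy as np
from PIL import Image

def integrate_sequence_streaming(vbg, rgb_paths, depth_paths, poses, K,
                                 depth_scale, depth_max, device,
                                 mask_paths=None, max_steps=None):
    # Fuse frames one by one instead of preloading the whole sequence.
    n = len(rgb_paths) if max_steps is None else min(max_steps, len(rgb_paths))
    for i in range(n):
        rgb_np = np.array(Image.open(rgb_paths[i]))
        depth_np = np.array(Image.open(depth_paths[i])).astype(np.uint16)
        if mask_paths is not None:
            mask = np.array(Image.open(mask_paths[i])).astype(np.uint8)
            depth_np[mask == 0] = 0  # ROI filtering
        cam_T_world = np.linalg.inv(poses[i])  # poses are world_T_cam
        integrate_frame_gpu(vbg, depth_np, rgb_np, K, cam_T_world,
                            depth_scale, depth_max, device)
    return vbg
```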