Agent Workflow Instruction: Unity-Based Laparoscopic Stereo Benchmark Simulator
1. Mission
Prepare a workflow for building a Unity-based 3D laparoscopic scene simulator for on-the-fly benchmarking of stereo-matching and 3D reconstruction algorithms.
The simulator must generate controllable synthetic laparoscopic stereo scenes with exact ground truth, allowing researchers to test how different algorithms behave under controlled surgical imaging conditions such as specularity, low texture, smoke, blood, tissue deformation, lighting variation, camera motion, tool occlusion, and stereo-baseline changes.
The main research contribution should not be “a synthetic dataset” alone. The contribution should be:
A controllable, reproducible, parameterized benchmarking environment for failure-mode analysis of stereo depth estimation and 3D reconstruction in laparoscopic scenes.
2. Core Research Question
Design the workflow around this question:
How do different stereo-matching and 3D reconstruction algorithms fail under specific laparoscopic imaging conditions, and can a controllable simulator expose these failure modes more systematically than fixed real-world datasets?
The system should support:
- real-time or batch generation of stereo laparoscopic image pairs;
- exact metric ground truth;
- automatic algorithm evaluation;
- controlled parameter sweeps;
- comparison with real laparoscopic stereo benchmarks.
3. Target Users
The workflow should assume the simulator will be used by:
- computer vision researchers;
- medical image analysis researchers;
- surgical robotics researchers;
- PhD/MSc students working on laparoscopic 3D reconstruction;
- developers benchmarking stereo algorithms before testing on real datasets.
4. Required Simulator Outputs
For every generated frame or sequence, the simulator must export the following:
left_rgb.png
right_rgb.png
depth_left.exr / depth_left.npy
depth_right.exr / depth_right.npy
disparity_left.npy
disparity_right.npy
surface_normals.npy
occlusion_mask.png
specularity_mask.png
instrument_mask.png
tissue_mask.png
semantic_labels.png
camera_intrinsics.json
camera_extrinsics.json
stereo_calibration.json
scene_parameters.json
The scene_parameters.json file must record all randomized and manually set parameters, including:
{
"baseline_mm": 5.0,
"focal_length_px": 800,
"working_distance_mm": 80,
"camera_pitch_deg": 0,
"camera_yaw_deg": 0,
"light_intensity": 1.0,
"light_angle_deg": 30,
"tissue_specularity": 0.6,
"tissue_roughness": 0.35,
"smoke_density": 0.2,
"blood_coverage": 0.1,
"deformation_amplitude_mm": 3.0,
"deformation_frequency_hz": 1.2,
"tool_occlusion_ratio": 0.25,
"motion_blur": false,
"noise_level": 0.01,
"random_seed": 12345
}
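As a rough illustration of how such a record could be produced reproducibly, the following Python sketch draws a few parameters from a generator seeded with `random_seed` and serializes them. The function name `sample_scene_parameters` and the sampling ranges are assumptions made for this example, not part of the specification, and only a subset of the keys is shown.

```python
import json
import random


def sample_scene_parameters(seed: int) -> dict:
    """Draw one scene configuration from a seeded generator (ranges are illustrative)."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return {
        "baseline_mm": rng.choice([2, 4, 6, 8, 10]),
        "working_distance_mm": rng.choice([40, 60, 80, 100, 120]),
        "tissue_specularity": round(rng.uniform(0.0, 1.0), 3),
        "smoke_density": round(rng.uniform(0.0, 0.5), 3),
        "motion_blur": rng.random() < 0.5,
        "random_seed": seed,  # stored so the scene can be regenerated later
    }


params = sample_scene_parameters(12345)
serialized = json.dumps(params, indent=2, sort_keys=True)
```

Calling the function twice with the same seed yields the same dictionary, which is the property the benchmark relies on when a scene has to be regenerated.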
5. Core Scene Components
The simulator should contain the following modular scene components.
5.1 Stereo Laparoscope Camera
Implement a virtual stereo laparoscope with configurable:
- stereo baseline;
- focal length;
- sensor size;
- image resolution;
- lens distortion;
- convergence angle;
- near/far clipping planes;
- working distance;
- camera pose;
- camera trajectory.
The camera should support:
- fixed stereo capture;
- moving stereo camera;
- rectified output;
- non-rectified raw output;
- exportable calibration parameters.
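The exportable calibration could look roughly like the sketch below, which derives a pinhole intrinsics matrix from physical camera settings and describes an ideal rectified pair. It assumes square pixels, a principal point at the image centre, and no lens distortion; the function names and JSON keys are hypothetical and would need to match whatever the Unity exporter actually writes.

```python
import json


def intrinsics_matrix(focal_length_mm, sensor_width_mm, width_px, height_px):
    """Return a 3x3 pinhole matrix K as nested lists."""
    fx = focal_length_mm * width_px / sensor_width_mm  # focal length in pixels
    fy = fx  # assumption: square pixels
    cx, cy = width_px / 2.0, height_px / 2.0  # assumption: centred principal point
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


def stereo_calibration(K, baseline_mm):
    """Describe an ideal rectified pair: shared intrinsics, parallel optical axes."""
    return {
        "K_left": K,
        "K_right": K,
        # Position of the right camera expressed in the left camera frame.
        "right_camera_position_mm": [baseline_mm, 0.0, 0.0],
        "baseline_mm": baseline_mm,
    }


K = intrinsics_matrix(focal_length_mm=4.0, sensor_width_mm=6.4, width_px=1280, height_px=720)
calibration_json = json.dumps(stereo_calibration(K, baseline_mm=5.0), indent=2)
```

Sign and axis conventions differ between Unity and common computer vision libraries, so the exported file should state which convention it uses.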
5.2 Tissue and Organ Scene
Create at least one deformable laparoscopic tissue scene.
Minimum viable scene:
- one abdominal organ or tissue surface;
- non-planar geometry;
- wet/specular material;
- subtle tissue texture;
- deformable surface;
- local folds, ridges, and valleys.
The tissue shader should support:
- diffuse color variation;
- roughness variation;
- specular highlights;
- wetness;
- subsurface-like appearance;
- procedural vessels or texture;
- optional blood patches.
5.3 Surgical Instruments
Add laparoscopic instruments that can:
- enter and leave the field of view;
- occlude tissue;
- touch or deform tissue;
- create difficult object boundaries;
- generate metallic specular highlights.
At minimum, include:
- grasper;
- forceps or generic tool shaft;
- optional needle holder or scissors.
5.4 Lighting System
Implement controllable endoscopic lighting:
- point or spot lights attached near the camera;
- adjustable light intensity;
- adjustable falloff;
- adjustable direction;
- asymmetric illumination;
- overexposure;
- shadow regions.
Lighting must be logged in the scene parameter file.
5.5 Surgical Artifacts
Add optional artifacts:
- smoke or haze;
- blood;
- bubbles;
- blur;
- image noise;
- compression artifacts;
- vignetting;
- lens distortion;
- saturation;
- specular bloom.
Each artifact must be controllable independently.
6. Parameter Sweep Design
The simulator must support automatic benchmark generation through parameter sweeps.
Define parameter groups:
6.1 Camera Parameters
baseline_mm: [2, 4, 6, 8, 10]
working_distance_mm: [40, 60, 80, 100, 120]
focal_length_px: [600, 800, 1000, 1200]
camera_motion_speed: [static, slow, medium, fast]
6.2 Tissue Parameters
texture_level: [low, medium, high]
specularity: [none, low, medium, high]
roughness: [low, medium, high]
deformation_amplitude: [0, 1, 3, 5, 10] mm
deformation_frequency: [0, 0.5, 1.0, 2.0] Hz
6.3 Scene Difficulty Parameters
smoke_density: [0, 0.1, 0.3, 0.5]
blood_coverage: [0, 0.05, 0.15, 0.3]
tool_occlusion_ratio: [0, 0.1, 0.25, 0.5]
light_angle: [0, 15, 30, 45, 60] degrees
image_noise: [none, low, medium, high]
motion_blur: [false, true]
The workflow must include scripts for generating benchmark subsets such as:
baseline_sweep/
specularity_sweep/
smoke_sweep/
deformation_sweep/
tool_occlusion_sweep/
lighting_sweep/
combined_hard_cases/
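One possible way to generate these subsets is sketched below: single-parameter sweeps vary one factor while the rest stay at defaults, and the combined hard cases use a full factorial grid. The default values and function names are assumptions for illustration only.

```python
import itertools

# Hypothetical defaults; a real configuration would cover every logged parameter.
DEFAULTS = {
    "baseline_mm": 6,
    "specularity": "medium",
    "smoke_density": 0.0,
    "tool_occlusion_ratio": 0.0,
}


def one_factor_sweep(name, values, defaults=DEFAULTS):
    """Vary a single parameter while every other parameter stays at its default."""
    return [{**defaults, name: value} for value in values]


def grid_sweep(grid, defaults=DEFAULTS):
    """Full factorial combination of the listed parameters."""
    names = list(grid)
    combos = itertools.product(*(grid[name] for name in names))
    return [{**defaults, **dict(zip(names, combo))} for combo in combos]


baseline_sweep = one_factor_sweep("baseline_mm", [2, 4, 6, 8, 10])
combined_hard_cases = grid_sweep(
    {"smoke_density": [0.3, 0.5], "tool_occlusion_ratio": [0.25, 0.5]}
)
```

Keeping single-factor sweeps separate from the combined grid makes it easier to attribute an error increase to one condition before interactions are studied.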
7. Algorithm Benchmarking Interface
Design a plug-in interface so different stereo-matching algorithms can be benchmarked automatically.
Each algorithm adapter should accept:
left_rgb
right_rgb
camera_intrinsics
stereo_calibration
Each algorithm should output:
predicted_disparity
predicted_depth
optional_confidence_map
runtime_ms
The benchmark runner should support both:
- classical stereo methods;
- deep-learning stereo methods.
Example algorithm categories:
Block Matching
Semi-Global Matching
ELAS / LIBELAS-style methods
RAFT-Stereo
PSMNet
GC-Net
StereoNet
HITNet
laparoscopic-specific stereo methods
custom user algorithms
The agent should design a wrapper format such as:
python run_algorithm.py \
--algorithm raft_stereo \
--left path/to/left.png \
--right path/to/right.png \
--calib path/to/stereo_calibration.json \
--output path/to/prediction/
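Behind such a command line, the adapter contract might be expressed as a small base class. The sketch below is one hypothetical design: the class names, the `ADAPTERS` registry, and the dictionary keys `focal_length_px` and `baseline_mm` are assumptions, and images are plain nested lists so the example runs without extra dependencies.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StereoPrediction:
    predicted_disparity: Any
    predicted_depth: Any
    runtime_ms: float
    optional_confidence_map: Optional[Any] = None


class StereoAdapter(ABC):
    """Hypothetical base class; concrete adapters only implement compute_disparity."""

    @abstractmethod
    def compute_disparity(self, left_rgb, right_rgb):
        """Return a disparity map in pixels as a list of rows."""

    def run(self, left_rgb, right_rgb, camera_intrinsics, stereo_calibration):
        start = time.perf_counter()
        disparity = self.compute_disparity(left_rgb, right_rgb)
        runtime_ms = (time.perf_counter() - start) * 1000.0
        focal = camera_intrinsics["focal_length_px"]
        baseline = stereo_calibration["baseline_mm"]
        # Depth in mm; non-positive disparities are marked invalid with None.
        depth = [[focal * baseline / d if d > 0 else None for d in row] for row in disparity]
        return StereoPrediction(disparity, depth, runtime_ms)


class ConstantDisparityAdapter(StereoAdapter):
    """Placeholder algorithm that predicts the same disparity everywhere."""

    def __init__(self, value_px):
        self.value_px = value_px

    def compute_disparity(self, left_rgb, right_rgb):
        return [[self.value_px for _ in row] for row in left_rgb]


ADAPTERS = {"constant": ConstantDisparityAdapter}  # lookup for an --algorithm flag
```

Timing only the `compute_disparity` call keeps model loading and file I/O out of `runtime_ms`; whether that is the desired definition of runtime is a design choice the protocol should state.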
8. Evaluation Metrics
The benchmark must compute metrics at several levels.
8.1 Disparity Metrics
End-Point Error
Mean Absolute Disparity Error
Median Disparity Error
Bad-1px
Bad-2px
Bad-3px
Bad-5px
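A minimal NumPy sketch of these disparity metrics follows. For scalar disparity maps the end-point error reduces to the absolute disparity difference, so the sketch reports a single `epe` value for both the first two entries. The function name, the dictionary keys, and the convention that Bad-N counts errors strictly greater than N pixels are assumptions of this example.

```python
import numpy as np


def disparity_metrics(pred, gt, valid, thresholds=(1, 2, 3, 5)):
    """Compute disparity errors over the pixels where `valid` is True."""
    err = np.abs(np.asarray(pred, dtype=np.float64) - np.asarray(gt, dtype=np.float64))
    err = err[np.asarray(valid, dtype=bool)]
    if err.size == 0:  # no valid pixels: report NaN rather than raising
        nan = float("nan")
        return {"epe": nan, "median": nan, **{f"bad_{t}px": nan for t in thresholds}}
    metrics = {"epe": float(err.mean()), "median": float(np.median(err))}
    for t in thresholds:
        # Bad-t: fraction of valid pixels whose error exceeds t pixels.
        metrics[f"bad_{t}px"] = float((err > t).mean())
    return metrics
```

The `valid` mask would typically exclude occluded pixels and pixels without ground truth, so that all algorithms are scored on the same support.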
8.2 Depth Metrics
Depth RMSE in mm
Depth MAE in mm
Absolute Relative Error
Squared Relative Error
Threshold Accuracy (fraction of pixels with max(pred/gt, gt/pred) below a threshold such as 1.25)
Scale Drift
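Most of the depth metrics listed here can be computed in a few lines, as in the sketch below. It assumes that `valid` already excludes pixels with non-positive predicted or ground-truth depth, and it uses 1.25 as the accuracy threshold; scale drift is left out because its definition depends on the sequence protocol. Names and keys are illustrative.

```python
import numpy as np


def depth_metrics(pred_mm, gt_mm, valid):
    """Depth errors in millimetres over valid pixels (both depths must be > 0 there)."""
    mask = np.asarray(valid, dtype=bool)
    pred = np.asarray(pred_mm, dtype=np.float64)[mask]
    gt = np.asarray(gt_mm, dtype=np.float64)[mask]
    diff = pred - gt
    ratio = np.maximum(pred / gt, gt / pred)  # symmetric relative error
    return {
        "rmse_mm": float(np.sqrt(np.mean(diff ** 2))),
        "mae_mm": float(np.mean(np.abs(diff))),
        "abs_rel": float(np.mean(np.abs(diff) / gt)),
        "sq_rel": float(np.mean(diff ** 2 / gt)),
        "delta_1.25": float(np.mean(ratio < 1.25)),
    }
```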
8.3 3D Reconstruction Metrics
Point-cloud Chamfer Distance
Surface-to-surface distance
Normal consistency
Completeness
Accuracy
F-score at distance thresholds
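The point-cloud metrics can be prototyped with a brute-force nearest-neighbour search, as sketched below. This is only practical for small clouds; a spatial index would be needed at full resolution. Conventions vary between papers, so the choices here (Chamfer distance as the sum of the two mean directed distances, strict threshold comparison) are assumptions of the example.

```python
import numpy as np


def nearest_distances(source, target):
    """Distance from every source point to its closest target point (O(N*M) memory)."""
    diff = source[:, None, :] - target[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def reconstruction_metrics(pred_points, gt_points, tau_mm):
    pred = np.asarray(pred_points, dtype=np.float64)
    gt = np.asarray(gt_points, dtype=np.float64)
    pred_to_gt = nearest_distances(pred, gt)  # accuracy direction
    gt_to_pred = nearest_distances(gt, pred)  # completeness direction
    precision = float((pred_to_gt < tau_mm).mean())
    recall = float((gt_to_pred < tau_mm).mean())
    total = precision + recall
    return {
        "accuracy_mm": float(pred_to_gt.mean()),
        "completeness_mm": float(gt_to_pred.mean()),
        "chamfer_mm": float(pred_to_gt.mean() + gt_to_pred.mean()),
        "f_score": 0.0 if total == 0 else 2 * precision * recall / total,
    }
```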
8.4 Temporal Metrics
For sequences:
Temporal depth consistency
Frame-to-frame disparity jitter
Optical-flow-aware depth stability
Runtime stability
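As a simple starting point for the jitter metric, the sketch below measures how much the per-pixel disparity error changes between consecutive frames. It compares the same pixel coordinates across frames and does not warp by optical flow, so it is only a rough proxy when the camera or tissue moves quickly; the function name is hypothetical.

```python
import numpy as np


def residual_jitter(pred_sequence, gt_sequence):
    """Mean absolute change of the signed disparity error between consecutive frames."""
    pred = np.asarray(pred_sequence, dtype=np.float64)  # shape (T, H, W), T >= 2
    gt = np.asarray(gt_sequence, dtype=np.float64)
    residual = pred - gt                         # signed error per frame
    change = np.abs(np.diff(residual, axis=0))   # error change from frame t to t+1
    return float(change.mean())
```

Subtracting the ground truth first removes the true disparity change caused by scene motion, so a perfectly accurate algorithm scores zero even on a moving sequence.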
8.5 Region-Stratified Metrics
Metrics must also be computed separately for:
tissue regions
instrument regions
specular regions
blood regions
smoke-affected regions
shadow regions
occlusion boundaries
depth discontinuities
low-texture tissue
high-curvature tissue
This region-stratified evaluation is essential. The benchmark should reveal not only which algorithm is best on average, but also which visual or geometric condition causes failure.
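Region stratification amounts to evaluating the same error map under different boolean masks. A minimal sketch, assuming the masks come from the exported mask images and using hypothetical names:

```python
import numpy as np


def region_stratified_error(error_map, valid, region_masks):
    """Mean error and pixel count per named region, restricted to valid pixels."""
    error = np.asarray(error_map, dtype=np.float64)
    valid = np.asarray(valid, dtype=bool)
    report = {}
    for name, mask in region_masks.items():
        selected = valid & np.asarray(mask, dtype=bool)
        count = int(selected.sum())
        report[name] = {
            "pixels": count,
            # Empty regions yield NaN so they are not mistaken for zero error.
            "mean_error": float(error[selected].mean()) if count else float("nan"),
        }
    return report
```

Reporting the pixel count alongside each mean helps readers judge how much weight a region's score deserves, since small regions such as specular highlights can produce noisy averages.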
9. Benchmark Reports
The workflow must generate automatic reports containing:
summary table per algorithm
metric curves across parameter sweeps
failure-case visualizations
error heatmaps
depth/disparity overlays
runtime comparison
ranking by scenario
ranking by robustness
per-region error breakdown
Each benchmark report should include plots such as:
Depth RMSE vs specularity
Bad-3px vs smoke density
Chamfer distance vs deformation amplitude
Runtime vs image resolution
Error near tool boundaries
Error in low-texture regions
The report should clearly identify:
best average algorithm
most robust algorithm
fastest algorithm
best algorithm under smoke
best algorithm under specular highlights
best algorithm near instruments
worst failure modes per algorithm
10. Validation Against Real Datasets
The simulator workflow must include a validation stage against real laparoscopic stereo datasets.
Use these real benchmarks as external references:
SCARED
SERV-CT
EndoAbS
Hamlyn laparoscopic datasets
StereoMIS
The agent should prepare a protocol for checking whether synthetic benchmark results correlate with real-data performance.
Important validation questions:
Do algorithms that perform well in simulation also perform well on SCARED?
Do specularity failures in simulation match failures on real laparoscopic tissue?
Do smoke and tool occlusion scenarios produce realistic ranking changes?
Does the simulator overestimate performance?
Which synthetic parameters best predict real-world errors?
The goal is not to prove the simulator fully replaces real data. The goal is to show that it provides controlled failure-mode analysis that complements real datasets.
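One way to quantify whether simulation and real-data results agree is a rank correlation between per-algorithm errors on the two sources. The sketch below implements Spearman's coefficient with average ranks for ties in plain Python; the function names are hypothetical, and the choice of Spearman over other agreement measures is an assumption of this example rather than a requirement of the protocol.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_correlation(sim_errors, real_errors):
    """Pearson correlation of the rank vectors; 1.0 means identical ordering."""
    rx, ry = average_ranks(sim_errors), average_ranks(real_errors)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one side has no variation
    return cov / (var_x * var_y) ** 0.5
```

Each list holds one error value per algorithm, in the same algorithm order. With only a handful of algorithms the coefficient is coarse, so it is best read together with the per-condition plots.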
11. Repository Structure
Prepare the project with the following structure:
laparo-stereo-sim/
│
├── unity_project/
│ ├── Assets/
│ ├── Packages/
│ ├── ProjectSettings/
│ └── README.md
│
├── benchmark_runner/
│ ├── algorithms/
│ │ ├── sgm/
│ │ ├── raft_stereo/
│ │ ├── psmnet/
│ │ └── custom_template/
│ │
│ ├── evaluation/
│ │ ├── disparity_metrics.py
│ │ ├── depth_metrics.py
│ │ ├── pointcloud_metrics.py
│ │ ├── temporal_metrics.py
│ │ └── region_metrics.py
│ │
│ ├── visualization/
│ │ ├── plot_metrics.py
│ │ ├── render_error_maps.py
│ │ └── generate_report.py
│ │
│ ├── configs/
│ │ ├── baseline_sweep.yaml
│ │ ├── specularity_sweep.yaml
│ │ ├── smoke_sweep.yaml
│ │ ├── deformation_sweep.yaml
│ │ ├── tool_occlusion_sweep.yaml
│ │ ├── lighting_sweep.yaml
│ │ └── combined_hard_cases.yaml
│ │
│ └── run_benchmark.py
│
├── datasets/
│ ├── synthetic/
│ ├── real/
│ │ ├── scared/
│ │ ├── serv_ct/
│ │ ├── endoabs/
│ │ ├── hamlyn/
│ │ └── stereomis/
│ └── README.md
│
├── reports/
│ ├── figures/
│ ├── tables/
│ └── benchmark_summary.md
│
├── docs/
│ ├── simulator_design.md
│ ├── benchmark_protocol.md
│ ├── algorithm_interface.md
│ └── validation_protocol.md
│
├── scripts/
│ ├── export_unity_sequence.py
│ ├── convert_depth_to_disparity.py
│ ├── rectify_stereo_pair.py
│ └── prepare_real_datasets.py
│
├── environment.yml
├── requirements.txt
└── README.md
12. Development Phases
Phase 1: Literature and Requirement Review
Prepare a compact review of:
existing laparoscopic stereo datasets
existing surgical simulators
existing stereo depth algorithms
evaluation metrics for stereo and reconstruction
known laparoscopic stereo failure modes
Deliverables:
docs/literature_summary.md
docs/design_requirements.md
docs/benchmark_gap_analysis.md
Phase 2: Minimal Unity Prototype
Build a simple Unity scene with:
one deformable tissue surface
one stereo laparoscope
one controllable light source
left/right RGB export
depth export
camera calibration export
Deliverables:
unity_project/
sample_output/
docs/prototype_notes.md
Phase 3: Ground Truth Export
Add export of:
depth maps
disparity maps
surface normals
segmentation masks
occlusion masks
camera parameters
scene parameter logs
Deliverables:
sample_output/ground_truth/
scripts/convert_depth_to_disparity.py
docs/ground_truth_specification.md
Phase 4: Scene Realism and Artifacts
Add:
specular tissue material
low-texture tissue
blood patches
smoke
instrument occlusion
motion blur
noise
lighting variation
deformation
Deliverables:
unity_project/realistic_scene/
docs/artifact_controls.md
sample_sequences/
Phase 5: Benchmark Runner
Implement the algorithm interface and evaluation system.
Deliverables:
benchmark_runner/run_benchmark.py
benchmark_runner/algorithms/custom_template/
benchmark_runner/evaluation/
benchmark_runner/visualization/
Phase 6: Algorithm Integration
Integrate several baseline algorithms.
Minimum recommended set:
OpenCV StereoBM
OpenCV StereoSGBM
RAFT-Stereo or equivalent deep model
one laparoscopic-specific method if available
one custom placeholder adapter
Deliverables:
benchmark_runner/algorithms/
docs/algorithm_interface.md
reports/baseline_algorithm_results.md
Phase 7: Controlled Experiments
Run parameter sweeps.
Minimum experiments:
specularity sweep
smoke sweep
baseline sweep
deformation sweep
tool occlusion sweep
lighting sweep
combined hard-case sweep
Deliverables:
reports/parameter_sweep_results.md
reports/figures/
reports/tables/
Phase 8: Real Dataset Validation
Evaluate selected algorithms on real datasets and compare trends.
Deliverables:
datasets/real/
scripts/prepare_real_datasets.py
reports/real_dataset_validation.md
reports/sim_to_real_correlation.md
Phase 9: Final Research Packaging
Prepare final outputs:
README.md
docs/methodology.md
docs/benchmark_protocol.md
reports/final_benchmark_report.md
paper_outline.md
demo_video_plan.md
13. Technical Requirements
Unity Requirements
Use:
Unity 2022 LTS or newer
HDRP if photorealistic rendering is needed
Unity Perception or custom ground-truth export tools
C# scripts for camera control and scene randomization
deterministic random seeds
headless or batch rendering if possible
The Unity project must support:
manual scene preview
scripted batch generation
reproducible randomization
per-frame metadata export
config-driven experiments
Python Requirements
Use Python for:
benchmark orchestration
algorithm adapters
metric computation
plot generation
report generation
real dataset preparation
Recommended libraries:
numpy
opencv-python
scipy
matplotlib
pandas
open3d
torch
torchvision
tqdm
pyyaml
scikit-image
14. Reproducibility Requirements
Every experiment must be reproducible.
Each benchmark run must save:
git commit hash
Unity version
Python environment
algorithm version
algorithm parameters
simulator configuration
random seed
date and time
hardware information
runtime logs
Each output folder should contain:
config.yaml
scene_parameters.json
algorithm_parameters.json
metrics.csv
summary.json
visualizations/
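Part of this run record can be collected automatically from Python, as in the sketch below. The function names and dictionary keys are hypothetical, `num_disparities` is a placeholder algorithm parameter, and the Unity version is not covered because it would have to be written by the Unity exporter itself.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone


def git_commit_hash():
    """Return the current commit hash, or "unknown" outside a git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True, timeout=5,
        )
        return result.stdout.strip()
    except (OSError, subprocess.SubprocessError):
        return "unknown"


def collect_run_metadata(random_seed, algorithm_name, algorithm_parameters):
    return {
        "git_commit": git_commit_hash(),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "random_seed": random_seed,
        "algorithm": algorithm_name,
        "algorithm_parameters": algorithm_parameters,
    }


summary = json.dumps(collect_run_metadata(12345, "sgm", {"num_disparities": 128}), indent=2)
```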
15. Dataset Format
Use a clear dataset format:
sequence_0001/
│
├── left/
│ ├── 000000.png
│ ├── 000001.png
│ └── ...
│
├── right/
│ ├── 000000.png
│ ├── 000001.png
│ └── ...
│
├── depth_left/
│ ├── 000000.npy
│ ├── 000001.npy
│ └── ...
│
├── disparity_left/
│ ├── 000000.npy
│ ├── 000001.npy
│ └── ...
│
├── masks/
│ ├── tissue/
│ ├── instrument/
│ ├── specularity/
│ ├── occlusion/
│ └── smoke/
│
├── normals/
│ ├── 000000.npy
│ └── ...
│
├── calibration/
│ ├── intrinsics_left.json
│ ├── intrinsics_right.json
│ ├── extrinsics.json
│ └── stereo_calibration.json
│
└── metadata/
  ├── scene_parameters.json
  ├── random_seed.txt
  └── config.yaml
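A small helper can turn this layout into concrete file paths and report what is missing from a sequence. The sketch below covers only some of the folders, and the function names are hypothetical.

```python
from pathlib import Path


def frame_paths(sequence_root, frame_index):
    """Map a frame index to the files expected for that frame (six-digit names)."""
    root = Path(sequence_root)
    stem = f"{frame_index:06d}"
    return {
        "left": root / "left" / f"{stem}.png",
        "right": root / "right" / f"{stem}.png",
        "depth_left": root / "depth_left" / f"{stem}.npy",
        "disparity_left": root / "disparity_left" / f"{stem}.npy",
        "normals": root / "normals" / f"{stem}.npy",
        "calibration": root / "calibration" / "stereo_calibration.json",
        "scene_parameters": root / "metadata" / "scene_parameters.json",
    }


def missing_files(sequence_root, frame_index):
    """Names of expected entries that do not exist on disk."""
    paths = frame_paths(sequence_root, frame_index)
    return sorted(name for name, path in paths.items() if not path.exists())
```

Running `missing_files` over every frame before a benchmark starts catches incomplete exports early instead of partway through an evaluation.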
16. Ground Truth Rules
The ground truth must be geometrically consistent.
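For rectified pairs, depth and disparity must satisfy the relation below, with baseline and depth expressed in the same unit (millimetres in the parameter files of this document):
disparity_px = focal_length_px * baseline_mm / depth_mm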
The agent must verify:
depth values are metric
invalid pixels are masked
occluded regions are labeled
left/right consistency is documented
camera intrinsics are correct
baseline units are correct
coordinate systems are documented
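Several of these checks can be automated. The sketch below converts metric depth to disparity using the depth/disparity relation from this section, marks non-positive or non-finite depth as invalid, and reports the largest disagreement with an exported disparity map. It assumes rectified images with baseline and depth in the same unit; the function names are hypothetical.

```python
import numpy as np


def depth_to_disparity(depth_mm, focal_length_px, baseline_mm):
    """Return (disparity_px, valid_mask); invalid pixels get disparity 0."""
    depth = np.asarray(depth_mm, dtype=np.float64)
    valid = np.isfinite(depth) & (depth > 0)
    disparity = np.zeros_like(depth)
    disparity[valid] = focal_length_px * baseline_mm / depth[valid]
    return disparity, valid


def max_consistency_error(depth_mm, disparity_px, focal_length_px, baseline_mm):
    """Largest absolute gap between exported disparity and disparity implied by depth."""
    expected, valid = depth_to_disparity(depth_mm, focal_length_px, baseline_mm)
    if not valid.any():
        return 0.0
    exported = np.asarray(disparity_px, dtype=np.float64)
    return float(np.abs(expected - exported)[valid].max())
```

With the example values used earlier (800 px focal length, 5 mm baseline, 80 mm working distance), the expected disparity is 800 * 5 / 80 = 50 px. A unit mix-up between metres and millimetres shows up immediately as an error of several orders of magnitude.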
Do not apply image-to-image realism translation unless the workflow includes a validation step proving that depth/disparity labels remain geometrically valid.
17. Failure-Mode Taxonomy
The benchmark should classify failures into categories:
specular highlight failure
low-texture failure
smoke or haze failure
blood contamination failure
tool-boundary failure
occlusion-boundary failure
deformation failure
motion-blur failure
overexposure failure
shadow failure
narrow-baseline failure
long-working-distance failure
For each algorithm, the report should answer:
Where does the algorithm fail?
Why does it fail?
Is the failure local or global?
Is the failure stable across frames?
Does confidence estimation detect the failure?
Does the failure also appear on real data?
18. Minimum Viable Prototype
The minimum useful prototype should include:
one synthetic organ surface
one stereo laparoscope
one light source
one specular tissue material
one moving tool
one deformation mode
RGB stereo export
depth export
disparity export
calibration export
OpenCV SGBM benchmark
one deep stereo model benchmark
metric report
error visualization
The minimum demonstration should show:
Algorithm A vs Algorithm B under increasing specularity
Algorithm A vs Algorithm B under increasing smoke
Algorithm A vs Algorithm B near tool occlusions
Algorithm A vs Algorithm B on deforming tissue
19. Expected Research Output
The final project should support a paper or thesis with the following structure:
Title
Abstract
Introduction
Related Work
Simulator Design
Benchmark Protocol
Ground Truth Generation
Algorithm Interface
Controlled Experiments
Real Dataset Validation
Results
Failure-Mode Analysis
Limitations
Conclusion
Possible title:
A Unity-Based Controllable Benchmark for Failure-Mode Analysis of Stereo Depth Estimation in Laparoscopic Scenes
Alternative title:
On-the-Fly Synthetic Benchmarking of Stereo Reconstruction Algorithms under Controlled Laparoscopic Imaging Conditions
20. Key Success Criteria
The workflow is successful if it produces:
a working stereo laparoscopic simulator
accurate ground-truth depth and disparity
reproducible parameter sweeps
automatic stereo algorithm evaluation
clear failure-mode analysis
comparison with real laparoscopic datasets
useful plots and benchmark reports
documentation sufficient for other researchers to reproduce results
The project should be considered weak if it only produces visually appealing synthetic images without rigorous ground truth, controlled experiments, or validation against real datasets.
21. Agent Deliverables Checklist
The agent must prepare:
[ ] literature summary
[ ] benchmark gap analysis
[ ] simulator architecture
[ ] Unity scene design
[ ] camera model specification
[ ] ground-truth export specification
[ ] dataset format specification
[ ] parameter sweep plan
[ ] algorithm plug-in interface
[ ] evaluation metric definitions
[ ] benchmark report template
[ ] real dataset validation plan
[ ] repository structure
[ ] development milestones
[ ] risk analysis
[ ] minimum viable prototype plan
[ ] final paper/thesis outline
22. Major Risks
The workflow must explicitly address these risks:
synthetic-to-real gap
unrealistic tissue appearance
incorrect ground-truth disparity
invalid labels after post-processing
lack of real-data validation
too much focus on rendering instead of benchmarking
too few baseline algorithms
weak novelty compared with existing synthetic datasets
poor documentation
non-reproducible experiments
For each risk, define a mitigation strategy.
23. Recommended Initial Milestone
The first milestone should be:
Build a minimal Unity stereo laparoscope scene that exports rectified left/right RGB images, metric depth, disparity, masks, and calibration, then benchmark OpenCV SGBM and one deep stereo model under a controlled specularity sweep.
This milestone should produce:
one synthetic sequence
one parameter sweep
two algorithm outputs
one metrics table
one error heatmap
one short technical report
Only after this milestone is complete should the project expand to smoke, blood, deformation, tools, and real-dataset validation.