# Agent Workflow Instruction: Unity-Based Laparoscopic Stereo Benchmark Simulator

## 1. Mission

Prepare a workflow for building a **Unity-based 3D laparoscopic scene simulator** for **on-the-fly benchmarking of stereo-matching and 3D reconstruction algorithms**.

The simulator must generate controllable synthetic laparoscopic stereo scenes with exact ground truth, allowing researchers to test how different algorithms behave under controlled surgical imaging conditions such as specularity, low texture, smoke, blood, tissue deformation, lighting variation, camera motion, tool occlusion, and stereo-baseline changes.

The main research contribution should not be “a synthetic dataset” alone. The contribution should be:

> A controllable, reproducible, parameterized benchmarking environment for failure-mode analysis of stereo depth estimation and 3D reconstruction in laparoscopic scenes.

---

## 2. Core Research Question

Design the workflow around this question:

> How do different stereo-matching and 3D reconstruction algorithms fail under specific laparoscopic imaging conditions, and can a controllable simulator expose these failure modes more systematically than fixed real-world datasets?

The system should support:

* real-time or batch generation of stereo laparoscopic image pairs;
* exact metric ground truth;
* automatic algorithm evaluation;
* controlled parameter sweeps;
* comparison with real laparoscopic stereo benchmarks.

---

## 3. Target Users

The workflow should assume the simulator will be used by:

* computer vision researchers;
* medical image analysis researchers;
* surgical robotics researchers;
* PhD/MSc students working on laparoscopic 3D reconstruction;
* developers benchmarking stereo algorithms before testing on real datasets.

---

## 4. Required Simulator Outputs

For every generated frame or sequence, the simulator must export the following:

```text
left_rgb.png
right_rgb.png
depth_left.exr / depth_left.npy
depth_right.exr / depth_right.npy
disparity_left.npy
disparity_right.npy
surface_normals.npy
occlusion_mask.png
specularity_mask.png
instrument_mask.png
tissue_mask.png
semantic_labels.png
camera_intrinsics.json
camera_extrinsics.json
stereo_calibration.json
scene_parameters.json
```

The `scene_parameters.json` file must record all randomized and manually set parameters, including:

```json
{
  "baseline_mm": 5.0,
  "focal_length_px": 800,
  "working_distance_mm": 80,
  "camera_pitch_deg": 0,
  "camera_yaw_deg": 0,
  "light_intensity": 1.0,
  "light_angle_deg": 30,
  "tissue_specularity": 0.6,
  "tissue_roughness": 0.35,
  "smoke_density": 0.2,
  "blood_coverage": 0.1,
  "deformation_amplitude_mm": 3.0,
  "deformation_frequency_hz": 1.2,
  "tool_occlusion_ratio": 0.25,
  "motion_blur": false,
  "noise_level": 0.01,
  "random_seed": 12345
}
```
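
As a rough illustration, a benchmark runner could check a `scene_parameters.json` file before trusting it. This is only a sketch: the function names `validate_scene_parameters` and `load_scene_parameters` are hypothetical, and treating every key from the example above as required is an assumption, not a rule stated by this document.

```python
import json

# Keys copied from the example above; making all of them mandatory is an assumption.
REQUIRED_KEYS = {
    "baseline_mm", "focal_length_px", "working_distance_mm",
    "camera_pitch_deg", "camera_yaw_deg", "light_intensity",
    "light_angle_deg", "tissue_specularity", "tissue_roughness",
    "smoke_density", "blood_coverage", "deformation_amplitude_mm",
    "deformation_frequency_hz", "tool_occlusion_ratio", "motion_blur",
    "noise_level", "random_seed",
}


def validate_scene_parameters(params: dict) -> dict:
    """Raise ValueError if keys are missing or basic constraints are violated."""
    missing = sorted(REQUIRED_KEYS - params.keys())
    if missing:
        raise ValueError(f"missing scene parameters: {missing}")
    if params["baseline_mm"] <= 0 or params["focal_length_px"] <= 0:
        raise ValueError("baseline_mm and focal_length_px must be positive")
    if not isinstance(params["random_seed"], int):
        raise ValueError("random_seed must be an integer")
    return params


def load_scene_parameters(path: str) -> dict:
    """Read a scene parameter file (hypothetical helper) and validate it."""
    with open(path, "r", encoding="utf-8") as f:
        return validate_scene_parameters(json.load(f))
```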

---

## 5. Core Scene Components

The simulator should contain the following modular scene components.

### 5.1 Stereo Laparoscope Camera

Implement a virtual stereo laparoscope with configurable:

* stereo baseline;
* focal length;
* sensor size;
* image resolution;
* lens distortion;
* convergence angle;
* near/far clipping planes;
* working distance;
* camera pose;
* camera trajectory.

The camera should support:

* fixed stereo capture;
* moving stereo camera;
* rectified output;
* non-rectified raw output;
* exportable calibration parameters.
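
The exportable calibration could be derived from the configurable camera values roughly as sketched here. The sketch assumes an ideal pinhole model with square pixels, a centred principal point, no lens distortion, and zero convergence angle; the function names and the JSON key names are illustrative, not a format defined by this document.

```python
import json


def focal_length_mm_to_px(focal_length_mm: float, sensor_width_mm: float,
                          image_width_px: int) -> float:
    """Convert a physical focal length to pixels using the sensor width."""
    return focal_length_mm * image_width_px / sensor_width_mm


def intrinsic_matrix(focal_length_px: float, width_px: int, height_px: int):
    """3x3 pinhole matrix K with square pixels and a centred principal point."""
    cx, cy = width_px / 2.0, height_px / 2.0
    return [
        [focal_length_px, 0.0, cx],
        [0.0, focal_length_px, cy],
        [0.0, 0.0, 1.0],
    ]


def stereo_calibration(focal_length_px: float, width_px: int, height_px: int,
                       baseline_mm: float) -> dict:
    """Calibration dict for an ideal rectified pair (key names are assumptions)."""
    K = intrinsic_matrix(focal_length_px, width_px, height_px)
    return {
        "image_size": [width_px, height_px],
        "K_left": K,
        "K_right": K,
        "baseline_mm": baseline_mm,
        # Right camera centre in the left camera frame, x axis pointing right.
        "right_camera_center_mm": [baseline_mm, 0.0, 0.0],
    }


calib = stereo_calibration(800.0, 1280, 720, 5.0)
calib_json = json.dumps(calib, indent=2)  # ready to write to stereo_calibration.json
```

Lens distortion and a non-zero convergence angle would need additional fields (distortion coefficients and a rotation between the cameras), which this sketch leaves out.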

### 5.2 Tissue and Organ Scene

Create at least one deformable laparoscopic tissue scene.

Minimum viable scene:

* one abdominal organ or tissue surface;
* non-planar geometry;
* wet/specular material;
* subtle tissue texture;
* deformable surface;
* local folds, ridges, and valleys.

The tissue shader should support:

* diffuse color variation;
* roughness variation;
* specular highlights;
* wetness;
* subsurface-like appearance;
* procedural vessels or texture;
* optional blood patches.

### 5.3 Surgical Instruments

Add laparoscopic instruments that can:

* enter and leave the field of view;
* occlude tissue;
* touch or deform tissue;
* create difficult object boundaries;
* generate metallic specular highlights.

At minimum, include:

* grasper;
* forceps or generic tool shaft;
* optional needle holder or scissors.

### 5.4 Lighting System

Implement controllable endoscopic lighting:

* point or spot lights attached near the camera;
* adjustable light intensity;
* adjustable falloff;
* adjustable direction;
* asymmetric illumination;
* overexposure;
* shadow regions.

Lighting must be logged in the scene parameter file.

### 5.5 Surgical Artifacts

Add optional artifacts:

* smoke or haze;
* blood;
* bubbles;
* blur;
* image noise;
* compression artifacts;
* vignetting;
* lens distortion;
* saturation;
* specular bloom.

Each artifact must be controllable independently.

---

## 6. Parameter Sweep Design

The simulator must support automatic benchmark generation through parameter sweeps.

Define parameter groups:

### 6.1 Camera Parameters

```text
baseline_mm: [2, 4, 6, 8, 10]
working_distance_mm: [40, 60, 80, 100, 120]
focal_length_px: [600, 800, 1000, 1200]
camera_motion_speed: [static, slow, medium, fast]
```

### 6.2 Tissue Parameters

```text
texture_level: [low, medium, high]
specularity: [none, low, medium, high]
roughness: [low, medium, high]
deformation_amplitude: [0, 1, 3, 5, 10] mm
deformation_frequency: [0, 0.5, 1.0, 2.0] Hz
```

### 6.3 Scene Difficulty Parameters

```text
smoke_density: [0, 0.1, 0.3, 0.5]
blood_coverage: [0, 0.05, 0.15, 0.3]
tool_occlusion_ratio: [0, 0.1, 0.25, 0.5]
light_angle: [0, 15, 30, 45, 60] degrees
image_noise: [none, low, medium, high]
motion_blur: [false, true]
```

The workflow must include scripts for generating benchmark subsets such as:

```text
baseline_sweep/
specularity_sweep/
smoke_sweep/
deformation_sweep/
tool_occlusion_sweep/
lighting_sweep/
combined_hard_cases/
```
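
One possible way to generate such subsets is to expand a default configuration either one factor at a time (for the single-parameter sweeps) or as a full grid (for combined hard cases). The sketch below is an assumption about how a sweep script might look; the helper names and the `DEFAULTS` values are illustrative, with the value lists taken from the parameter groups above.

```python
from itertools import product

# Illustrative defaults; a real run would load these from a config file.
DEFAULTS = {"baseline_mm": 5.0, "smoke_density": 0.0, "tool_occlusion_ratio": 0.0}


def one_factor_sweep(defaults: dict, name: str, values: list) -> list:
    """Vary a single parameter while every other parameter stays at its default."""
    return [{**defaults, name: value} for value in values]


def grid_sweep(defaults: dict, grid: dict) -> list:
    """Cartesian product over several parameters, e.g. for combined hard cases."""
    names = list(grid)
    return [
        {**defaults, **dict(zip(names, combo))}
        for combo in product(*(grid[n] for n in names))
    ]


baseline_sweep = one_factor_sweep(DEFAULTS, "baseline_mm", [2, 4, 6, 8, 10])
combined_hard_cases = grid_sweep(
    DEFAULTS,
    {"smoke_density": [0.3, 0.5], "tool_occlusion_ratio": [0.25, 0.5]},
)
```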

---

## 7. Algorithm Benchmarking Interface

Design a plug-in interface so different stereo-matching algorithms can be benchmarked automatically.

Each algorithm adapter should accept:

```text
left_rgb
right_rgb
camera_intrinsics
stereo_calibration
```

Each algorithm should output:

```text
predicted_disparity
predicted_depth
optional_confidence_map
runtime_ms
```

The benchmark runner should support both:

* classical stereo methods;
* deep-learning stereo methods.

Example algorithm categories:

```text
Block Matching
Semi-Global Matching
ELAS / LIBELAS-style methods
RAFT-Stereo
PSMNet
GC-Net
StereoNet
HITNet
laparoscopic-specific stereo methods
custom user algorithms
```

The agent should design a wrapper format such as:

```bash
python run_algorithm.py \
  --algorithm raft_stereo \
  --left path/to/left.png \
  --right path/to/right.png \
  --calib path/to/stereo_calibration.json \
  --output path/to/prediction/
```
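
On the Python side, the plug-in interface might be expressed as an abstract base class that each algorithm subclasses. The following is a sketch under stated assumptions: the class names `StereoAdapter`, `StereoResult`, and `ConstantDisparityAdapter` are hypothetical, and the calibration inputs are reduced to `focal_length_px` and `baseline_mm` for brevity rather than the full intrinsics and stereo calibration files.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class StereoResult:
    predicted_disparity: np.ndarray            # pixels, same HxW as the left image
    predicted_depth: np.ndarray                # same length unit as the baseline
    runtime_ms: float
    confidence_map: Optional[np.ndarray] = None


class StereoAdapter(ABC):
    """Base class for algorithm adapters (hypothetical interface)."""

    @abstractmethod
    def predict_disparity(self, left_rgb: np.ndarray,
                          right_rgb: np.ndarray) -> np.ndarray:
        """Return a left-view disparity map in pixels."""

    def run(self, left_rgb: np.ndarray, right_rgb: np.ndarray,
            focal_length_px: float, baseline_mm: float) -> StereoResult:
        start = time.perf_counter()
        disparity = np.asarray(
            self.predict_disparity(left_rgb, right_rgb), dtype=np.float64)
        runtime_ms = (time.perf_counter() - start) * 1000.0
        # Depth is undefined where disparity is not positive; mark it as NaN.
        depth = np.full(disparity.shape, np.nan)
        valid = disparity > 0
        depth[valid] = focal_length_px * baseline_mm / disparity[valid]
        return StereoResult(disparity, depth, runtime_ms)


class ConstantDisparityAdapter(StereoAdapter):
    """Placeholder adapter that predicts one disparity value everywhere."""

    def __init__(self, value_px: float = 50.0):
        self.value_px = value_px

    def predict_disparity(self, left_rgb, right_rgb):
        return np.full(left_rgb.shape[:2], self.value_px)


left = np.zeros((4, 6, 3), dtype=np.uint8)
result = ConstantDisparityAdapter(50.0).run(left, left, 800.0, 5.0)
```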

---

## 8. Evaluation Metrics

The benchmark must compute metrics at several levels.

### 8.1 Disparity Metrics

```text
End-Point Error
Mean Absolute Disparity Error
Median Disparity Error
Bad-1px
Bad-2px
Bad-3px
Bad-5px
```
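
A minimal NumPy sketch of these disparity metrics follows. It assumes that `valid` is a boolean mask of pixels with usable ground truth and that Bad-N is reported as a fraction in [0, 1] rather than a percentage; the function name is illustrative. For scalar disparities, the end-point error coincides with the mean absolute disparity error, so one value covers both.

```python
import numpy as np


def disparity_metrics(pred: np.ndarray, gt: np.ndarray, valid: np.ndarray,
                      thresholds=(1, 2, 3, 5)) -> dict:
    """EPE, median error, and Bad-N fractions over valid ground-truth pixels."""
    err = np.abs(pred[valid].astype(np.float64) - gt[valid].astype(np.float64))
    if err.size == 0:
        raise ValueError("no valid pixels to evaluate")
    metrics = {
        "epe_px": float(err.mean()),        # equals mean absolute disparity error
        "median_px": float(np.median(err)),
    }
    for t in thresholds:
        # Fraction of valid pixels whose error exceeds t pixels.
        metrics[f"bad_{t}px"] = float((err > t).mean())
    return metrics


gt = np.zeros(4)
pred = np.array([0.5, 1.5, 2.5, 6.0])
m = disparity_metrics(pred, gt, np.ones(4, dtype=bool))
```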

### 8.2 Depth Metrics

```text
Depth RMSE in mm
Depth MAE in mm
Absolute Relative Error
Squared Relative Error
Threshold Accuracy
Scale Drift
```
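
The depth metrics could be computed along the following lines. This sketch assumes depths in millimetres, a boolean `valid` mask that excludes zero or missing depth in both maps, and the commonly used 1.25 ratio for threshold accuracy; scale drift is not covered here because it needs a sequence-level definition.

```python
import numpy as np


def depth_metrics(pred_mm: np.ndarray, gt_mm: np.ndarray,
                  valid: np.ndarray, delta: float = 1.25) -> dict:
    """RMSE, MAE, relative errors, and threshold accuracy over valid pixels."""
    p = pred_mm[valid].astype(np.float64)
    g = gt_mm[valid].astype(np.float64)
    if p.size == 0:
        raise ValueError("no valid pixels to evaluate")
    diff = p - g
    # Symmetric ratio: at least 1, and close to 1 when prediction matches ground truth.
    ratio = np.maximum(p / g, g / p)
    return {
        "rmse_mm": float(np.sqrt(np.mean(diff ** 2))),
        "mae_mm": float(np.mean(np.abs(diff))),
        "abs_rel": float(np.mean(np.abs(diff) / g)),
        "sq_rel": float(np.mean(diff ** 2 / g)),
        "threshold_accuracy": float(np.mean(ratio < delta)),
    }


gt_mm = np.array([100.0, 100.0])
pred_mm = np.array([110.0, 90.0])
d = depth_metrics(pred_mm, gt_mm, np.ones(2, dtype=bool))
```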

### 8.3 3D Reconstruction Metrics

```text
Point-cloud Chamfer Distance
Surface-to-surface distance
Normal consistency
Completeness
Accuracy
F-score at distance thresholds
```
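
Several of these metrics share the same nearest-neighbour distances between the predicted and ground-truth point clouds. The sketch below uses one common convention (Chamfer distance as the sum of the two mean unsquared distances); other definitions exist, so the chosen convention should be documented with the results. The brute-force search is an assumption made for clarity and needs O(N*M) memory, so larger clouds would call for a spatial index such as a KD-tree.

```python
import numpy as np


def nearest_distances(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Distance from each point in src (N, 3) to its nearest point in dst (M, 3)."""
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def reconstruction_metrics(pred: np.ndarray, gt: np.ndarray,
                           tau_mm: float = 1.0) -> dict:
    """Accuracy, completeness, Chamfer distance, and F-score at threshold tau_mm."""
    pred_to_gt = nearest_distances(pred, gt)   # accuracy direction
    gt_to_pred = nearest_distances(gt, pred)   # completeness direction
    precision = float((pred_to_gt < tau_mm).mean())
    recall = float((gt_to_pred < tau_mm).mean())
    denom = precision + recall
    return {
        "accuracy_mm": float(pred_to_gt.mean()),
        "completeness_mm": float(gt_to_pred.mean()),
        "chamfer_mm": float(pred_to_gt.mean() + gt_to_pred.mean()),
        "f_score": 2 * precision * recall / denom if denom > 0 else 0.0,
    }


gt_pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred_pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.5]])
r = reconstruction_metrics(pred_pts, gt_pts, tau_mm=0.25)
```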

### 8.4 Temporal Metrics

For sequences:

```text
Temporal depth consistency
Frame-to-frame disparity jitter
Optical-flow-aware depth stability
Runtime stability
```

### 8.5 Region-Stratified Metrics

Metrics must also be computed separately for:

```text
tissue regions
instrument regions
specular regions
blood regions
smoke-affected regions
shadow regions
occlusion boundaries
depth discontinuities
low-texture tissue
high-curvature tissue
```

This region-stratified evaluation is essential. The benchmark should reveal not only which algorithm is best on average, but also which visual or geometric condition causes failure.
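
In practice, region stratification can reuse the exported masks: each metric is restricted to the intersection of the validity mask and a region mask. The sketch below shows this for end-point error only, assuming all masks are boolean arrays with the same shape as the disparity maps; the function name and the region names in the example are illustrative.

```python
import numpy as np


def region_stratified_epe(pred: np.ndarray, gt: np.ndarray,
                          valid: np.ndarray, regions: dict) -> dict:
    """Mean absolute disparity error per region; None where a region is empty."""
    results = {}
    for name, mask in regions.items():
        selected = valid & mask
        if selected.any():
            results[name] = float(np.abs(pred[selected] - gt[selected]).mean())
        else:
            results[name] = None  # region absent in this frame
    return results


pred = np.array([[1.0, 2.0], [3.0, 4.0]])
gt = np.zeros((2, 2))
valid = np.ones((2, 2), dtype=bool)
regions = {
    "tissue": np.array([[True, True], [False, False]]),
    "instrument": np.array([[False, False], [True, True]]),
    "blood": np.zeros((2, 2), dtype=bool),
}
per_region = region_stratified_epe(pred, gt, valid, regions)
```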

---

## 9. Benchmark Reports

The workflow must generate automatic reports containing:

```text
summary table per algorithm
metric curves across parameter sweeps
failure-case visualizations
error heatmaps
depth/disparity overlays
runtime comparison
ranking by scenario
ranking by robustness
per-region error breakdown
```

Each benchmark report should include plots such as:

```text
Depth RMSE vs specularity
Bad-3px vs smoke density
Chamfer distance vs deformation amplitude
Runtime vs image resolution
Error near tool boundaries
Error in low-texture regions
```

The report should clearly identify:

```text
best average algorithm
most robust algorithm
fastest algorithm
best algorithm under smoke
best algorithm under specular highlights
best algorithm near instruments
worst failure modes per algorithm
```

---

## 10. Validation Against Real Datasets

The simulator workflow must include a validation stage against real laparoscopic stereo datasets.

Use these real benchmarks as external references:

```text
SCARED
SERV-CT
EndoAbS
Hamlyn laparoscopic datasets
StereoMIS
```

The agent should prepare a protocol for checking whether synthetic benchmark results correlate with real-data performance.

Important validation questions:

```text
Do algorithms that perform well in simulation also perform well on SCARED?
Do specularity failures in simulation match failures on real laparoscopic tissue?
Do smoke and tool occlusion scenarios produce realistic ranking changes?
Does the simulator overestimate performance?
Which synthetic parameters best predict real-world errors?
```

The goal is not to prove the simulator fully replaces real data. The goal is to show that it provides controlled failure-mode analysis that complements real datasets.
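
One simple way to quantify whether simulated and real-data results agree is a rank correlation between per-algorithm errors on the two sources. The sketch below implements Spearman's coefficient in plain Python as an illustration; choosing Spearman is an assumption, not a protocol fixed by this document, and with only a handful of algorithms the coefficient carries little statistical weight. The error values in the example are placeholders, not measured results.

```python
def average_ranks(values: list) -> list:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: list, y: list) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Placeholder per-algorithm errors (same algorithm order in both lists).
sim_error = [1.2, 2.5, 3.1, 4.0]
real_error = [2.0, 2.9, 3.5, 6.0]
rho = spearman(sim_error, real_error)
```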

---

## 11. Repository Structure

Prepare the project with the following structure:

```text
laparo-stereo-sim/
│
├── unity_project/
│   ├── Assets/
│   ├── Packages/
│   ├── ProjectSettings/
│   └── README.md
│
├── benchmark_runner/
│   ├── algorithms/
│   │   ├── sgm/
│   │   ├── raft_stereo/
│   │   ├── psmnet/
│   │   └── custom_template/
│   │
│   ├── evaluation/
│   │   ├── disparity_metrics.py
│   │   ├── depth_metrics.py
│   │   ├── pointcloud_metrics.py
│   │   ├── temporal_metrics.py
│   │   └── region_metrics.py
│   │
│   ├── visualization/
│   │   ├── plot_metrics.py
│   │   ├── render_error_maps.py
│   │   └── generate_report.py
│   │
│   ├── configs/
│   │   ├── baseline_sweep.yaml
│   │   ├── specularity_sweep.yaml
│   │   ├── smoke_sweep.yaml
│   │   ├── deformation_sweep.yaml
│   │   └── combined_hard_cases.yaml
│   │
│   └── run_benchmark.py
│
├── datasets/
│   ├── synthetic/
│   ├── real/
│   │   ├── scared/
│   │   ├── serv_ct/
│   │   ├── endoabs/
│   │   └── hamlyn/
│   └── README.md
│
├── reports/
│   ├── figures/
│   ├── tables/
│   └── benchmark_summary.md
│
├── docs/
│   ├── simulator_design.md
│   ├── benchmark_protocol.md
│   ├── algorithm_interface.md
│   └── validation_protocol.md
│
├── scripts/
│   ├── export_unity_sequence.py
│   ├── convert_depth_to_disparity.py
│   ├── rectify_stereo_pair.py
│   └── prepare_real_datasets.py
│
├── environment.yml
├── requirements.txt
└── README.md
```

---

## 12. Development Phases

### Phase 1: Literature and Requirement Review

Prepare a compact review of:

```text
existing laparoscopic stereo datasets
existing surgical simulators
existing stereo depth algorithms
evaluation metrics for stereo and reconstruction
known laparoscopic stereo failure modes
```

Deliverables:

```text
docs/literature_summary.md
docs/design_requirements.md
docs/benchmark_gap_analysis.md
```

### Phase 2: Minimal Unity Prototype

Build a simple Unity scene with:

```text
one deformable tissue surface
one stereo laparoscope
one controllable light source
left/right RGB export
depth export
camera calibration export
```

Deliverables:

```text
unity_project/
sample_output/
docs/prototype_notes.md
```

### Phase 3: Ground Truth Export

Add export of:

```text
depth maps
disparity maps
surface normals
segmentation masks
occlusion masks
camera parameters
scene parameter logs
```

Deliverables:

```text
sample_output/ground_truth/
scripts/convert_depth_to_disparity.py
docs/ground_truth_specification.md
```

### Phase 4: Scene Realism and Artifacts

Add:

```text
specular tissue material
low-texture tissue
blood patches
smoke
instrument occlusion
motion blur
noise
lighting variation
deformation
```

Deliverables:

```text
unity_project/realistic_scene/
docs/artifact_controls.md
sample_sequences/
```

### Phase 5: Benchmark Runner

Implement the algorithm interface and evaluation system.

Deliverables:

```text
benchmark_runner/run_benchmark.py
benchmark_runner/algorithms/custom_template/
benchmark_runner/evaluation/
benchmark_runner/visualization/
```

### Phase 6: Algorithm Integration

Integrate several baseline algorithms.

Minimum recommended set:

```text
OpenCV StereoBM
OpenCV StereoSGBM
RAFT-Stereo or equivalent deep model
one laparoscopic-specific method if available
one custom placeholder adapter
```

Deliverables:

```text
benchmark_runner/algorithms/
docs/algorithm_interface.md
reports/baseline_algorithm_results.md
```

### Phase 7: Controlled Experiments

Run parameter sweeps.

Minimum experiments:

```text
specularity sweep
smoke sweep
baseline sweep
deformation sweep
tool occlusion sweep
lighting sweep
combined hard-case sweep
```

Deliverables:

```text
reports/parameter_sweep_results.md
reports/figures/
reports/tables/
```

### Phase 8: Real Dataset Validation

Evaluate selected algorithms on real datasets and compare trends.

Deliverables:

```text
datasets/real/
scripts/prepare_real_datasets.py
reports/real_dataset_validation.md
reports/sim_to_real_correlation.md
```

### Phase 9: Final Research Packaging

Prepare final outputs:

```text
README.md
docs/methodology.md
docs/benchmark_protocol.md
reports/final_benchmark_report.md
paper_outline.md
demo_video_plan.md
```

---

## 13. Technical Requirements

### Unity Requirements

Use:

```text
Unity 2022 LTS or newer
HDRP if photorealistic rendering is needed
Unity Perception or custom ground-truth export tools
C# scripts for camera control and scene randomization
deterministic random seeds
headless or batch rendering if possible
```

The Unity project must support:

```text
manual scene preview
scripted batch generation
reproducible randomization
per-frame metadata export
config-driven experiments
```

### Python Requirements

Use Python for:

```text
benchmark orchestration
algorithm adapters
metric computation
plot generation
report generation
real dataset preparation
```

Recommended libraries:

```text
numpy
opencv-python
scipy
matplotlib
pandas
open3d
torch
torchvision
tqdm
pyyaml
scikit-image
```

---

## 14. Reproducibility Requirements

Every experiment must be reproducible.

Each benchmark run must save:

```text
git commit hash
Unity version
Python environment
algorithm version
algorithm parameters
simulator configuration
random seed
date and time
hardware information
runtime logs
```
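
Part of this record can be collected automatically from Python, as in the sketch below. It is an assumption about how a runner might do this: the function names and dictionary keys are illustrative, the Unity version is passed in by the caller because Python cannot discover it on its own, and the full Python environment (for example a package list) and runtime logs are left out.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone


def git_commit_hash():
    """Return the current commit hash, or None when git or a repository is absent."""
    try:
        out = subprocess.run(["git", "rev-parse", "HEAD"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def collect_run_metadata(random_seed: int, unity_version: str,
                         algorithm_parameters: dict) -> dict:
    """Gather a JSON-serializable record for one benchmark run."""
    return {
        "git_commit": git_commit_hash(),
        "unity_version": unity_version,          # supplied by the caller
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "random_seed": random_seed,
        "algorithm_parameters": algorithm_parameters,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


# "2022.3" and the parameter dict are placeholder values for illustration.
metadata = collect_run_metadata(12345, "2022.3", {"num_disparities": 64})
metadata_json = json.dumps(metadata, indent=2)
```

Writing `metadata_json` next to the results of each run keeps the record attached to the outputs it describes.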

Each output folder should contain:

```text
config.yaml
scene_parameters.json
algorithm_parameters.json
metrics.csv
summary.json
visualizations/
```

---

## 15. Dataset Format

Use a clear dataset format:

```text
sequence_0001/
│
├── left/
│   ├── 000000.png
│   ├── 000001.png
│   └── ...
│
├── right/
│   ├── 000000.png
│   ├── 000001.png
│   └── ...
│
├── depth_left/
│   ├── 000000.npy
│   ├── 000001.npy
│   └── ...
│
├── disparity_left/
│   ├── 000000.npy
│   ├── 000001.npy
│   └── ...
│
├── masks/
│   ├── tissue/
│   ├── instrument/
│   ├── specularity/
│   ├── occlusion/
│   └── smoke/
│
├── normals/
│   ├── 000000.npy
│   └── ...
│
├── calibration/
│   ├── intrinsics_left.json
│   ├── intrinsics_right.json
│   ├── extrinsics.json
│   └── stereo_calibration.json
│
└── metadata/
    ├── scene_parameters.json
    ├── random_seed.txt
    └── config.yaml
```

---

## 16. Ground Truth Rules

The ground truth must be geometrically consistent.

Depth and disparity must satisfy:

```text
disparity = focal_length_px * baseline_m / depth_m
```
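
The relation holds for any length unit as long as baseline and depth use the same one, so values stored in millimetres (as in `scene_parameters.json`) can be used directly. It applies to rectified pairs with parallel optical axes. A minimal conversion sketch, with an illustrative function name and NaN chosen here as the marker for invalid pixels:

```python
import numpy as np


def depth_to_disparity(depth_mm, focal_length_px: float, baseline_mm: float):
    """Convert metric depth to disparity in pixels for a rectified stereo pair.

    Returns the disparity map and a boolean validity mask. Pixels with
    non-finite or non-positive depth are marked invalid and set to NaN.
    """
    depth = np.asarray(depth_mm, dtype=np.float64)
    valid = np.isfinite(depth) & (depth > 0)
    disparity = np.full(depth.shape, np.nan)
    disparity[valid] = focal_length_px * baseline_mm / depth[valid]
    return disparity, valid


# Values from the scene parameter example: f = 800 px, baseline = 5 mm, depth = 80 mm.
disparity, valid = depth_to_disparity([80.0, 0.0, np.nan], 800.0, 5.0)
```

With these values the first pixel has a disparity of 800 * 5 / 80 = 50 px, while the zero-depth and NaN pixels are flagged as invalid.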

The agent must verify:

```text
depth values are metric
invalid pixels are masked
occluded regions are labeled
left/right consistency is documented
camera intrinsics are correct
baseline units are correct
coordinate systems are documented
```

Do not apply image-to-image realism translation unless the workflow includes a validation step proving that depth/disparity labels remain geometrically valid.

---

## 17. Failure-Mode Taxonomy

The benchmark should classify failures into categories:

```text
specular highlight failure
low-texture failure
smoke or haze failure
blood contamination failure
tool-boundary failure
occlusion-boundary failure
deformation failure
motion-blur failure
overexposure failure
shadow failure
narrow-baseline failure
long-working-distance failure
```

For each algorithm, the report should answer:

```text
Where does the algorithm fail?
Why does it fail?
Is the failure local or global?
Is the failure stable across frames?
Does confidence estimation detect the failure?
Does the failure also appear on real data?
```

---

## 18. Minimum Viable Prototype

The minimum useful prototype should include:

```text
one synthetic organ surface
one stereo laparoscope
one light source
one specular tissue material
one moving tool
one deformation mode
RGB stereo export
depth export
disparity export
calibration export
OpenCV SGBM benchmark
one deep stereo model benchmark
metric report
error visualization
```

The minimum demonstration should show:

```text
Algorithm A vs Algorithm B under increasing specularity
Algorithm A vs Algorithm B under increasing smoke
Algorithm A vs Algorithm B near tool occlusions
Algorithm A vs Algorithm B on deforming tissue
```

---

## 19. Expected Research Output

The final project should support a paper or thesis with the following structure:

```text
Title
Abstract
Introduction
Related Work
Simulator Design
Benchmark Protocol
Ground Truth Generation
Algorithm Interface
Controlled Experiments
Real Dataset Validation
Results
Failure-Mode Analysis
Limitations
Conclusion
```

Possible title:

```text
A Unity-Based Controllable Benchmark for Failure-Mode Analysis of Stereo Depth Estimation in Laparoscopic Scenes
```

Alternative title:

```text
On-the-Fly Synthetic Benchmarking of Stereo Reconstruction Algorithms under Controlled Laparoscopic Imaging Conditions
```

---

## 20. Key Success Criteria

The workflow is successful if it produces:

```text
a working stereo laparoscopic simulator
accurate ground-truth depth and disparity
reproducible parameter sweeps
automatic stereo algorithm evaluation
clear failure-mode analysis
comparison with real laparoscopic datasets
useful plots and benchmark reports
documentation sufficient for other researchers to reproduce results
```

The project should be considered weak if it only produces visually appealing synthetic images without rigorous ground truth, controlled experiments, or validation against real datasets.

---

## 21. Agent Deliverables Checklist

The agent must prepare:

```text
[ ] literature summary
[ ] benchmark gap analysis
[ ] simulator architecture
[ ] Unity scene design
[ ] camera model specification
[ ] ground-truth export specification
[ ] dataset format specification
[ ] parameter sweep plan
[ ] algorithm plug-in interface
[ ] evaluation metric definitions
[ ] benchmark report template
[ ] real dataset validation plan
[ ] repository structure
[ ] development milestones
[ ] risk analysis
[ ] minimum viable prototype plan
[ ] final paper/thesis outline
```

---

## 22. Major Risks

The workflow must explicitly address these risks:

```text
synthetic-to-real gap
unrealistic tissue appearance
incorrect ground-truth disparity
invalid labels after post-processing
lack of real-data validation
too much focus on rendering instead of benchmarking
too few baseline algorithms
weak novelty compared with existing synthetic datasets
poor documentation
non-reproducible experiments
```

For each risk, define a mitigation strategy.

---

## 23. Recommended Initial Milestone

The first milestone should be:

> Build a minimal Unity stereo laparoscope scene that exports rectified left/right RGB images, metric depth, disparity, masks, and calibration, then benchmark OpenCV SGBM and one deep stereo model under a controlled specularity sweep.

This milestone should produce:

```text
one synthetic sequence
one parameter sweep
two algorithm outputs
one metrics table
one error heatmap
one short technical report
```

Only after this milestone is complete should the project expand to smoke, blood, deformation, tools, and real-dataset validation.