Artur Mukhamadiev c51e6d1369 feat(models) added models/textures and Agents related stuff
:Release Notes:
- Agents files for opencode and claude
- Skills for opencode and claude
- 3d models with medical organs
- Some textures

:Detailed Notes:
-

:Testing Performed:
-

:QA Notes:
- Looks like shit :)

:Issues Addressed:
TG-1
2026-06-22 16:57:48 +03:00

Agent Workflow Instruction: Unity-Based Laparoscopic Stereo Benchmark Simulator

1. Mission

Prepare a workflow for building a Unity-based 3D laparoscopic scene simulator for on-the-fly benchmarking of stereo-matching and 3D reconstruction algorithms.

The simulator must generate controllable synthetic laparoscopic stereo scenes with exact ground truth, allowing researchers to test how different algorithms behave under controlled surgical imaging conditions such as specularity, low texture, smoke, blood, tissue deformation, lighting variation, camera motion, tool occlusion, and stereo-baseline changes.

The main research contribution should not be “a synthetic dataset” alone. The contribution should be:

A controllable, reproducible, parameterized benchmarking environment for failure-mode analysis of stereo depth estimation and 3D reconstruction in laparoscopic scenes.


2. Core Research Question

Design the workflow around this question:

How do different stereo-matching and 3D reconstruction algorithms fail under specific laparoscopic imaging conditions, and can a controllable simulator expose these failure modes more systematically than fixed real-world datasets?

The system should support:

  • real-time or batch generation of stereo laparoscopic image pairs;
  • exact metric ground truth;
  • automatic algorithm evaluation;
  • controlled parameter sweeps;
  • comparison with real laparoscopic stereo benchmarks.

3. Target Users

The workflow should assume the simulator will be used by:

  • computer vision researchers;
  • medical image analysis researchers;
  • surgical robotics researchers;
  • PhD/MSc students working on laparoscopic 3D reconstruction;
  • developers benchmarking stereo algorithms before testing on real datasets.

4. Required Simulator Outputs

For every generated frame or sequence, the simulator must export the following:

left_rgb.png
right_rgb.png
depth_left.exr / depth_left.npy
depth_right.exr / depth_right.npy
disparity_left.npy
disparity_right.npy
surface_normals.npy
occlusion_mask.png
specularity_mask.png
instrument_mask.png
tissue_mask.png
semantic_labels.png
camera_intrinsics.json
camera_extrinsics.json
stereo_calibration.json
scene_parameters.json

The scene_parameters.json file must record all randomized and manually set parameters, including:

{
  "baseline_mm": 5.0,
  "focal_length_px": 800,
  "working_distance_mm": 80,
  "camera_pitch_deg": 0,
  "camera_yaw_deg": 0,
  "light_intensity": 1.0,
  "light_angle_deg": 30,
  "tissue_specularity": 0.6,
  "tissue_roughness": 0.35,
  "smoke_density": 0.2,
  "blood_coverage": 0.1,
  "deformation_amplitude_mm": 3.0,
  "deformation_frequency_hz": 1.2,
  "tool_occlusion_ratio": 0.25,
  "motion_blur": false,
  "noise_level": 0.01,
  "random_seed": 12345
}
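
A minimal sketch of how such a parameter record could be produced reproducibly. The helper names `sample_scene_parameters` and `to_json` and the sampling ranges are illustrative assumptions, not part of this specification; only a subset of the keys above is sampled to keep the example short.

```python
import json
import random


def sample_scene_parameters(seed: int) -> dict:
    """Draw one scene configuration from a seeded RNG (ranges are illustrative)."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return {
        "baseline_mm": rng.choice([2, 4, 6, 8, 10]),
        "working_distance_mm": rng.choice([40, 60, 80, 100, 120]),
        "tissue_specularity": round(rng.uniform(0.0, 1.0), 3),
        "smoke_density": round(rng.uniform(0.0, 0.5), 3),
        "motion_blur": rng.random() < 0.5,
        "random_seed": seed,  # always logged next to the sampled values
    }


def to_json(params: dict) -> str:
    # sort_keys makes identical parameters serialize to identical text
    return json.dumps(params, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json(sample_scene_parameters(12345)))
```

Calling the sampler twice with the same seed returns the same dictionary, which is the property the random_seed field is meant to guarantee.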

5. Core Scene Components

The simulator should contain the following modular scene components.

5.1 Stereo Laparoscope Camera

Implement a virtual stereo laparoscope with configurable:

  • stereo baseline;
  • focal length;
  • sensor size;
  • image resolution;
  • lens distortion;
  • convergence angle;
  • near/far clipping planes;
  • working distance;
  • camera pose;
  • camera trajectory.

The camera should support:

  • fixed stereo capture;
  • moving stereo camera;
  • rectified output;
  • non-rectified raw output;
  • exportable calibration parameters.
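
As a sketch of what the exportable calibration could contain, the snippet below builds pinhole intrinsics and a parallel-rig stereo calibration. It assumes square pixels, zero skew, a principal point at the image centre, zero convergence angle, and the convention that `T_mm` maps left-camera coordinates to right-camera coordinates; the function and key names are hypothetical.

```python
def focal_length_px(focal_length_mm: float, sensor_width_mm: float,
                    image_width_px: int) -> float:
    """Convert a physical focal length to pixels (square pixels assumed)."""
    return focal_length_mm * image_width_px / sensor_width_mm


def intrinsics_matrix(f_px: float, width: int, height: int) -> list:
    # Assumption: principal point at the image centre, zero skew.
    cx, cy = width / 2.0, height / 2.0
    return [[f_px, 0.0, cx], [0.0, f_px, cy], [0.0, 0.0, 1.0]]


def stereo_calibration(f_px: float, width: int, height: int,
                       baseline_mm: float) -> dict:
    """Calibration for an ideal rectified rig with parallel optical axes."""
    k = intrinsics_matrix(f_px, width, height)
    return {
        "image_size": [width, height],
        "K_left": k,
        "K_right": k,
        # Identity rotation: both cameras look in the same direction.
        "R": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
        # Right camera sits baseline_mm along +x of the left camera, so a
        # point expressed in the right frame is shifted by -baseline_mm.
        "T_mm": [-baseline_mm, 0.0, 0.0],
        "baseline_mm": baseline_mm,
    }
```

A rig with a non-zero convergence angle would need a non-identity `R` and a rectification step before the disparity relation in the Ground Truth Rules applies.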

5.2 Tissue and Organ Scene

Create at least one deformable laparoscopic tissue scene.

Minimum viable scene:

  • one abdominal organ or tissue surface;
  • non-planar geometry;
  • wet/specular material;
  • subtle tissue texture;
  • deformable surface;
  • local folds, ridges, and valleys.

The tissue shader should support:

  • diffuse color variation;
  • roughness variation;
  • specular highlights;
  • wetness;
  • subsurface-like appearance;
  • procedural vessels or texture;
  • optional blood patches.

5.3 Surgical Instruments

Add laparoscopic instruments that can:

  • enter and leave the field of view;
  • occlude tissue;
  • touch or deform tissue;
  • create difficult object boundaries;
  • generate metallic specular highlights.

At minimum, include:

  • grasper;
  • forceps or generic tool shaft;
  • optional needle holder or scissors.

5.4 Lighting System

Implement controllable endoscopic lighting:

  • point or spot lights attached near the camera;
  • adjustable light intensity;
  • adjustable falloff;
  • adjustable direction;
  • asymmetric illumination;
  • overexposure;
  • shadow regions.

Lighting must be logged in the scene parameter file.

5.5 Surgical Artifacts

Add optional artifacts:

  • smoke or haze;
  • blood;
  • bubbles;
  • blur;
  • image noise;
  • compression artifacts;
  • vignetting;
  • lens distortion;
  • saturation;
  • specular bloom.

Each artifact must be controllable independently.


6. Parameter Sweep Design

The simulator must support automatic benchmark generation through parameter sweeps.

Define parameter groups:

6.1 Camera Parameters

baseline_mm: [2, 4, 6, 8, 10]
working_distance_mm: [40, 60, 80, 100, 120]
focal_length_px: [600, 800, 1000, 1200]
camera_motion_speed: [static, slow, medium, fast]

6.2 Tissue Parameters

texture_level: [low, medium, high]
specularity: [none, low, medium, high]
roughness: [low, medium, high]
deformation_amplitude_mm: [0, 1, 3, 5, 10]
deformation_frequency_hz: [0, 0.5, 1.0, 2.0]

6.3 Scene Difficulty Parameters

smoke_density: [0, 0.1, 0.3, 0.5]
blood_coverage: [0, 0.05, 0.15, 0.3]
tool_occlusion_ratio: [0, 0.1, 0.25, 0.5]
light_angle_deg: [0, 15, 30, 45, 60]
image_noise: [none, low, medium, high]
motion_blur: [false, true]

The workflow must include scripts for generating benchmark subsets such as:

baseline_sweep/
specularity_sweep/
smoke_sweep/
deformation_sweep/
tool_occlusion_sweep/
lighting_sweep/
combined_hard_cases/
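
One possible way to generate these subsets is sketched below: a one-factor sweep varies a single parameter while all others stay at their defaults, and a grid sweep takes the Cartesian product of several parameters. The helper names and the `sweep` tag are assumptions made for illustration; the default values are taken from the scene_parameters.json example.

```python
from itertools import product

# Defaults copied from the scene_parameters.json example in this document.
DEFAULTS = {"baseline_mm": 5.0, "tissue_specularity": 0.6, "smoke_density": 0.2}


def one_factor_sweep(name: str, values: list, defaults: dict) -> list:
    """Vary one parameter and hold every other parameter at its default."""
    return [{**defaults, name: v, "sweep": f"{name}_sweep"} for v in values]


def grid_sweep(grid: dict, defaults: dict) -> list:
    """Full Cartesian product over the parameters listed in `grid`."""
    names = sorted(grid)  # fixed ordering keeps run order reproducible
    runs = []
    for combo in product(*(grid[n] for n in names)):
        runs.append({**defaults, **dict(zip(names, combo))})
    return runs


if __name__ == "__main__":
    for run in one_factor_sweep("smoke_density", [0, 0.1, 0.3, 0.5], DEFAULTS):
        print(run)
```

One-factor sweeps keep the attribution of a failure to a single condition clean; grid sweeps grow multiplicatively and are better reserved for a subset such as combined_hard_cases/.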

7. Algorithm Benchmarking Interface

Design a plug-in interface so different stereo-matching algorithms can be benchmarked automatically.

Each algorithm adapter should accept:

left_rgb
right_rgb
camera_intrinsics
stereo_calibration

Each algorithm should output:

predicted_disparity
predicted_depth
optional_confidence_map
runtime_ms

The benchmark runner should support both:

  • classical stereo methods;
  • deep-learning stereo methods.

Example algorithm categories:

Block Matching
Semi-Global Matching
ELAS / LIBELAS-style methods
RAFT-Stereo
PSMNet
GC-Net
StereoNet
HITNet
laparoscopic-specific stereo methods
custom user algorithms

The agent should design a wrapper format such as:

python run_algorithm.py \
  --algorithm raft_stereo \
  --left path/to/left.png \
  --right path/to/right.png \
  --calib path/to/stereo_calibration.json \
  --output path/to/prediction/
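
Behind such a command-line wrapper, each adapter could implement a small common interface. The sketch below is one hypothetical shape for it: `StereoAdapter`, `StereoPrediction`, and `ConstantDisparity` are invented names, and the constant-disparity adapter is only a placeholder that exercises the interface, not a real stereo method.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class StereoPrediction:
    predicted_disparity: np.ndarray  # pixels, shape (H, W)
    predicted_depth: np.ndarray      # same unit as the baseline, NaN if invalid
    runtime_ms: float
    optional_confidence_map: Optional[np.ndarray] = None


class StereoAdapter(ABC):
    name = "base"

    @abstractmethod
    def compute_disparity(self, left_rgb: np.ndarray,
                          right_rgb: np.ndarray) -> np.ndarray:
        """Return a left-view disparity map in pixels."""

    def run(self, left_rgb, right_rgb, focal_length_px: float,
            baseline_mm: float) -> StereoPrediction:
        start = time.perf_counter()
        disp = self.compute_disparity(left_rgb, right_rgb).astype(np.float32)
        runtime_ms = (time.perf_counter() - start) * 1000.0
        # Convert to depth only where disparity is positive.
        depth = np.full(disp.shape, np.nan, dtype=np.float32)
        valid = disp > 0
        depth[valid] = focal_length_px * baseline_mm / disp[valid]
        return StereoPrediction(disp, depth, runtime_ms)


class ConstantDisparity(StereoAdapter):
    """Placeholder adapter: predicts the same disparity everywhere."""
    name = "constant"

    def __init__(self, value: float = 50.0):
        self.value = value

    def compute_disparity(self, left_rgb, right_rgb):
        return np.full(left_rgb.shape[:2], self.value, dtype=np.float32)
```

Timing only the `compute_disparity` call keeps file I/O and depth conversion out of runtime_ms, so classical and deep-learning adapters are compared on the matching step alone.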

8. Evaluation Metrics

The benchmark must compute metrics at several levels.

8.1 Disparity Metrics

End-Point Error
Mean Absolute Disparity Error
Median Disparity Error
Bad-1px
Bad-2px
Bad-3px
Bad-5px
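
A minimal sketch of these metrics, assuming Bad-N is the fraction of valid pixels whose absolute disparity error exceeds N pixels and that predictions are dense and finite. Because disparity is one-dimensional, End-Point Error and Mean Absolute Disparity Error coincide under this definition. The function name and output keys are hypothetical.

```python
import numpy as np


def disparity_metrics(pred, gt, mask=None, thresholds=(1, 2, 3, 5)) -> dict:
    """Disparity errors over pixels with valid ground truth (values in px)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Ground truth is valid where it is finite and strictly positive.
    valid = np.isfinite(gt) & (gt > 0)
    if mask is not None:
        valid &= np.asarray(mask, dtype=bool)
    err = np.abs(pred - gt)[valid]
    if err.size == 0:
        return {"valid_pixels": 0}
    out = {
        "valid_pixels": int(err.size),
        "epe_px": float(err.mean()),        # equals mean absolute error
        "median_px": float(np.median(err)),
    }
    for t in thresholds:
        out[f"bad_{t}px"] = float((err > t).mean())
    return out
```

Passing one of the exported masks (for example the specularity mask) as `mask` restricts the same metrics to that region.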

8.2 Depth Metrics

Depth RMSE in mm
Depth MAE in mm
Absolute Relative Error
Squared Relative Error
Threshold Accuracy (δ < 1.25, 1.25², 1.25³)
Scale Drift
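
The sketch below computes the depth metrics in their commonly used forms, assuming depth in millimetres and assuming that pixels with invalid predictions are dropped rather than penalized (a design choice that should be documented either way). Scale Drift is left out because this document does not define it. The function name and keys are hypothetical.

```python
import numpy as np


def depth_metrics(pred_mm, gt_mm) -> dict:
    """Standard depth errors over pixels where both maps are valid."""
    pred = np.asarray(pred_mm, dtype=np.float64)
    gt = np.asarray(gt_mm, dtype=np.float64)
    valid = np.isfinite(gt) & (gt > 0) & np.isfinite(pred) & (pred > 0)
    if not valid.any():
        return {}
    p, g = pred[valid], gt[valid]
    diff = p - g
    # Symmetric ratio used by the threshold accuracy (delta) metrics.
    ratio = np.maximum(p / g, g / p)
    return {
        "rmse_mm": float(np.sqrt(np.mean(diff ** 2))),
        "mae_mm": float(np.mean(np.abs(diff))),
        "abs_rel": float(np.mean(np.abs(diff) / g)),
        "sq_rel": float(np.mean(diff ** 2 / g)),
        "delta_1": float(np.mean(ratio < 1.25)),
        "delta_2": float(np.mean(ratio < 1.25 ** 2)),
        "delta_3": float(np.mean(ratio < 1.25 ** 3)),
    }
```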

8.3 3D Reconstruction Metrics

Point-cloud Chamfer Distance
Surface-to-surface distance
Normal consistency
Completeness
Accuracy
F-score at distance thresholds
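
A brute-force sketch of the point-cloud metrics follows. It assumes the Chamfer distance is the sum of the two mean nearest-neighbour distances (other definitions use squared distances or an average) and uses a dense pairwise distance matrix, which is only practical for small clouds; a KD-tree would be the usual replacement at scale. All names are hypothetical.

```python
import numpy as np


def nearest_distances(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Distance from every point in src to its nearest neighbour in dst."""
    diff = src[:, None, :] - dst[None, :, :]          # (N, M, 3)
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def reconstruction_metrics(pred_pts, gt_pts, tau: float) -> dict:
    pred = np.asarray(pred_pts, dtype=np.float64)
    gt = np.asarray(gt_pts, dtype=np.float64)
    d_pred_to_gt = nearest_distances(pred, gt)  # accuracy direction
    d_gt_to_pred = nearest_distances(gt, pred)  # completeness direction
    precision = float((d_pred_to_gt < tau).mean())
    recall = float((d_gt_to_pred < tau).mean())
    denom = precision + recall
    return {
        "accuracy": float(d_pred_to_gt.mean()),
        "completeness": float(d_gt_to_pred.mean()),
        "chamfer": float(d_pred_to_gt.mean() + d_gt_to_pred.mean()),
        "precision": precision,
        "recall": recall,
        "f_score": 2.0 * precision * recall / denom if denom > 0 else 0.0,
    }
```

The distance threshold `tau` shares the unit of the point coordinates, so metric ground truth is a precondition for comparing F-scores across scenes.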

8.4 Temporal Metrics

For sequences:

Temporal depth consistency
Frame-to-frame disparity jitter
Optical-flow-aware depth stability
Runtime stability
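
This document does not define the temporal metrics precisely. One possible reading of frame-to-frame disparity jitter, offered here only as an assumption, is the mean absolute change of the signed disparity error between consecutive frames, which removes true scene motion at each pixel because the ground truth is subtracted first.

```python
import numpy as np


def disparity_jitter(pred_seq, gt_seq) -> np.ndarray:
    """Per-frame-pair jitter for sequences shaped (T, H, W), in pixels."""
    pred = np.asarray(pred_seq, dtype=np.float64)
    gt = np.asarray(gt_seq, dtype=np.float64)
    error = pred - gt                       # signed error per frame
    step = np.abs(np.diff(error, axis=0))   # change between frames t and t+1
    return step.mean(axis=(1, 2))           # one value per consecutive pair
```

A prediction with a constant bias yields zero jitter under this definition, so it should be reported next to the per-frame accuracy metrics rather than instead of them.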

8.5 Region-Stratified Metrics

Metrics must also be computed separately for:

tissue regions
instrument regions
specular regions
blood regions
smoke-affected regions
shadow regions
occlusion boundaries
depth discontinuities
low-texture tissue
high-curvature tissue

This region-stratified evaluation is essential. The benchmark should reveal not only which algorithm is best on average, but also which visual or geometric condition causes failure.
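
Region stratification can be sketched as masking a per-pixel error map with each exported mask. The function name, the output layout, and the choice to report NaN for empty regions are assumptions for illustration.

```python
import numpy as np


def region_stratified_error(abs_error, masks: dict, valid=None) -> dict:
    """Mean error and pixel count per named region.

    abs_error: (H, W) per-pixel absolute error map.
    masks: region name -> (H, W) array, non-zero inside the region.
    valid: optional (H, W) boolean map of pixels with usable ground truth.
    """
    abs_error = np.asarray(abs_error, dtype=np.float64)
    if valid is None:
        valid = np.isfinite(abs_error)
    results = {}
    for name, mask in masks.items():
        selected = np.asarray(mask, dtype=bool) & valid
        count = int(selected.sum())
        # Empty regions are reported as NaN instead of being silently dropped.
        mean = float(abs_error[selected].mean()) if count else float("nan")
        results[name] = {"pixels": count, "mean_error": mean}
    return results
```

Reporting the pixel count next to each mean makes it visible when a region score rests on very few pixels.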


9. Benchmark Reports

The workflow must generate automatic reports containing:

summary table per algorithm
metric curves across parameter sweeps
failure-case visualizations
error heatmaps
depth/disparity overlays
runtime comparison
ranking by scenario
ranking by robustness
per-region error breakdown

Each benchmark report should include plots such as:

Depth RMSE vs specularity
Bad-3px vs smoke density
Chamfer distance vs deformation amplitude
Runtime vs image resolution
Error near tool boundaries
Error in low-texture regions

The report should clearly identify:

best average algorithm
most robust algorithm
fastest algorithm
best algorithm under smoke
best algorithm under specular highlights
best algorithm near instruments
worst failure modes per algorithm

10. Validation Against Real Datasets

The simulator workflow must include a validation stage against real laparoscopic stereo datasets.

Use these real benchmarks as external references:

SCARED
SERV-CT
EndoAbS
Hamlyn laparoscopic datasets
StereoMIS

The agent should prepare a protocol for checking whether synthetic benchmark results correlate with real-data performance.

Important validation questions:

Do algorithms that perform well in simulation also perform well on SCARED?
Do specularity failures in simulation match failures on real laparoscopic tissue?
Do smoke and tool occlusion scenarios produce realistic ranking changes?
Does the simulator overestimate performance?
Which synthetic parameters best predict real-world errors?

The goal is not to prove the simulator fully replaces real data. The goal is to show that it provides controlled failure-mode analysis that complements real datasets.
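
One way to quantify whether synthetic results track real-data performance is a rank correlation between per-algorithm errors on both sides. The sketch below implements Spearman's coefficient with average ranks for ties in plain Python; the function names are hypothetical, and any numbers passed in are the caller's own measurements.

```python
import math


def average_ranks(values: list) -> list:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: list, y: list) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side has no rank variation
    return cov / math.sqrt(vx * vy)
```

Here `x` would hold one error value per algorithm on a synthetic subset and `y` the matching values on a real dataset, in the same algorithm order. With only a handful of algorithms the coefficient is coarse, so it supports a trend statement rather than a significance claim.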


11. Repository Structure

Prepare the project with the following structure:

laparo-stereo-sim/
│
├── unity_project/
│   ├── Assets/
│   ├── Packages/
│   ├── ProjectSettings/
│   └── README.md
│
├── benchmark_runner/
│   ├── algorithms/
│   │   ├── sgm/
│   │   ├── raft_stereo/
│   │   ├── psmnet/
│   │   └── custom_template/
│   │
│   ├── evaluation/
│   │   ├── disparity_metrics.py
│   │   ├── depth_metrics.py
│   │   ├── pointcloud_metrics.py
│   │   ├── temporal_metrics.py
│   │   └── region_metrics.py
│   │
│   ├── visualization/
│   │   ├── plot_metrics.py
│   │   ├── render_error_maps.py
│   │   └── generate_report.py
│   │
│   ├── configs/
│   │   ├── baseline_sweep.yaml
│   │   ├── specularity_sweep.yaml
│   │   ├── smoke_sweep.yaml
│   │   ├── deformation_sweep.yaml
│   │   ├── tool_occlusion_sweep.yaml
│   │   ├── lighting_sweep.yaml
│   │   └── combined_hard_cases.yaml
│   │
│   └── run_benchmark.py
│
├── datasets/
│   ├── synthetic/
│   ├── real/
│   │   ├── scared/
│   │   ├── serv_ct/
│   │   ├── endoabs/
│   │   ├── hamlyn/
│   │   └── stereomis/
│   └── README.md
│
├── reports/
│   ├── figures/
│   ├── tables/
│   └── benchmark_summary.md
│
├── docs/
│   ├── simulator_design.md
│   ├── benchmark_protocol.md
│   ├── algorithm_interface.md
│   └── validation_protocol.md
│
├── scripts/
│   ├── export_unity_sequence.py
│   ├── convert_depth_to_disparity.py
│   ├── rectify_stereo_pair.py
│   └── prepare_real_datasets.py
│
├── environment.yml
├── requirements.txt
└── README.md

12. Development Phases

Phase 1: Literature and Requirement Review

Prepare a compact review of:

existing laparoscopic stereo datasets
existing surgical simulators
existing stereo depth algorithms
evaluation metrics for stereo and reconstruction
known laparoscopic stereo failure modes

Deliverables:

docs/literature_summary.md
docs/design_requirements.md
docs/benchmark_gap_analysis.md

Phase 2: Minimal Unity Prototype

Build a simple Unity scene with:

one deformable tissue surface
one stereo laparoscope
one controllable light source
left/right RGB export
depth export
camera calibration export

Deliverables:

unity_project/
sample_output/
docs/prototype_notes.md

Phase 3: Ground Truth Export

Add export of:

depth maps
disparity maps
surface normals
segmentation masks
occlusion masks
camera parameters
scene parameter logs

Deliverables:

sample_output/ground_truth/
scripts/convert_depth_to_disparity.py
docs/ground_truth_specification.md

Phase 4: Scene Realism and Artifacts

Add:

specular tissue material
low-texture tissue
blood patches
smoke
instrument occlusion
motion blur
noise
lighting variation
deformation

Deliverables:

unity_project/realistic_scene/
docs/artifact_controls.md
sample_sequences/

Phase 5: Benchmark Runner

Implement the algorithm interface and evaluation system.

Deliverables:

benchmark_runner/run_benchmark.py
benchmark_runner/algorithms/custom_template/
benchmark_runner/evaluation/
benchmark_runner/visualization/

Phase 6: Algorithm Integration

Integrate several baseline algorithms.

Minimum recommended set:

OpenCV StereoBM
OpenCV StereoSGBM
RAFT-Stereo or equivalent deep model
one laparoscopic-specific method if available
one custom placeholder adapter

Deliverables:

benchmark_runner/algorithms/
docs/algorithm_interface.md
reports/baseline_algorithm_results.md

Phase 7: Controlled Experiments

Run parameter sweeps.

Minimum experiments:

specularity sweep
smoke sweep
baseline sweep
deformation sweep
tool occlusion sweep
lighting sweep
combined hard-case sweep

Deliverables:

reports/parameter_sweep_results.md
reports/figures/
reports/tables/

Phase 8: Real Dataset Validation

Evaluate selected algorithms on real datasets and compare trends.

Deliverables:

datasets/real/
scripts/prepare_real_datasets.py
reports/real_dataset_validation.md
reports/sim_to_real_correlation.md

Phase 9: Final Research Packaging

Prepare final outputs:

README.md
docs/methodology.md
docs/benchmark_protocol.md
reports/final_benchmark_report.md
paper_outline.md
demo_video_plan.md

13. Technical Requirements

Unity Requirements

Use:

Unity 2022 LTS or newer
HDRP if photorealistic rendering is needed
Unity Perception or custom ground-truth export tools
C# scripts for camera control and scene randomization
deterministic random seeds
headless or batch rendering if possible

The Unity project must support:

manual scene preview
scripted batch generation
reproducible randomization
per-frame metadata export
config-driven experiments

Python Requirements

Use Python for:

benchmark orchestration
algorithm adapters
metric computation
plot generation
report generation
real dataset preparation

Recommended libraries:

numpy
opencv-python
scipy
matplotlib
pandas
open3d
torch
torchvision
tqdm
pyyaml
scikit-image

14. Reproducibility Requirements

Every experiment must be reproducible.

Each benchmark run must save:

git commit hash
Unity version
Python environment
algorithm version
algorithm parameters
simulator configuration
random seed
date and time
hardware information
runtime logs

Each output folder should contain:

config.yaml
scene_parameters.json
algorithm_parameters.json
metrics.csv
summary.json
visualizations/
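
A sketch of collecting the per-run provenance listed above into one record, using only the standard library. The function names and manifest keys are assumptions; the Unity version cannot be detected from Python and is passed in by the caller, and the git hash falls back to "unknown" outside a repository.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone


def git_commit_hash() -> str:
    """Current commit hash, or 'unknown' if git or a repository is missing."""
    try:
        out = subprocess.run(["git", "rev-parse", "HEAD"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def build_run_manifest(algorithm: str, algorithm_parameters: dict,
                       random_seed: int, unity_version: str = "unknown") -> dict:
    return {
        "git_commit": git_commit_hash(),
        "unity_version": unity_version,       # supplied by the caller
        "python_version": sys.version.split()[0],
        "algorithm": algorithm,
        "algorithm_parameters": algorithm_parameters,
        "random_seed": random_seed,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "hardware": {
            "machine": platform.machine(),
            "system": platform.system(),
            "processor": platform.processor(),
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_run_manifest("sgbm", {"block_size": 5}, 12345), indent=2))
```

GPU model and driver version matter for deep-learning runtimes but are not covered by `platform`; they would have to come from the framework in use.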

15. Dataset Format

Use a clear dataset format:

sequence_0001/
│
├── left/
│   ├── 000000.png
│   ├── 000001.png
│   └── ...
│
├── right/
│   ├── 000000.png
│   ├── 000001.png
│   └── ...
│
├── depth_left/
│   ├── 000000.npy
│   ├── 000001.npy
│   └── ...
│
├── disparity_left/
│   ├── 000000.npy
│   ├── 000001.npy
│   └── ...
│
├── masks/
│   ├── tissue/
│   ├── instrument/
│   ├── specularity/
│   ├── occlusion/
│   └── smoke/
│
├── normals/
│   ├── 000000.npy
│   └── ...
│
├── calibration/
│   ├── intrinsics_left.json
│   ├── intrinsics_right.json
│   ├── extrinsics.json
│   └── stereo_calibration.json
│
└── metadata/
    ├── scene_parameters.json
    ├── random_seed.txt
    └── config.yaml
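
The layout above maps a frame index to a fixed set of files, which a small helper can make explicit. This sketch assumes six-digit, zero-padded frame names as in the tree; mask file names are not specified in this document, so masks are left out. The function names are hypothetical.

```python
from pathlib import Path


def frame_paths(sequence_dir, index: int) -> dict:
    """Expected per-frame files for one frame of a sequence folder."""
    root = Path(sequence_dir)
    stem = f"{index:06d}"  # e.g. 3 -> "000003"
    return {
        "left": root / "left" / f"{stem}.png",
        "right": root / "right" / f"{stem}.png",
        "depth_left": root / "depth_left" / f"{stem}.npy",
        "disparity_left": root / "disparity_left" / f"{stem}.npy",
        "normals": root / "normals" / f"{stem}.npy",
    }


def missing_files(sequence_dir, index: int) -> list:
    """Names of expected files that do not exist on disk for this frame."""
    return [name for name, path in frame_paths(sequence_dir, index).items()
            if not path.exists()]
```

Running `missing_files` over every frame before a benchmark starts catches incomplete exports early instead of failing midway through a sweep.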

16. Ground Truth Rules

The ground truth must be geometrically consistent.

Depth and disparity must satisfy:

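disparity = focal_length_px * baseline_m / depth_m

Disparity is in pixels. Baseline and depth must use the same metric unit (the parameter files in this document use millimetres), and the relation holds only for rectified stereo pairs.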

The agent must verify:

depth values are metric
invalid pixels are masked
occluded regions are labeled
left/right consistency is documented
camera intrinsics are correct
baseline units are correct
coordinate systems are documented

Do not apply image-to-image realism translation unless the workflow includes a validation step proving that depth/disparity labels remain geometrically valid.
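
The depth/disparity relation can be checked automatically on every exported frame. The sketch below assumes rectified pairs, millimetre units for both baseline and depth, and a tolerance chosen by the caller; the function names are hypothetical.

```python
import numpy as np


def depth_to_disparity(depth_mm, focal_length_px: float, baseline_mm: float):
    """Return (disparity_px, valid_mask); invalid depth maps to disparity 0."""
    depth = np.asarray(depth_mm, dtype=np.float64)
    valid = np.isfinite(depth) & (depth > 0)
    disparity = np.zeros_like(depth)
    disparity[valid] = focal_length_px * baseline_mm / depth[valid]
    return disparity, valid


def labels_are_consistent(depth_mm, disparity_px, focal_length_px: float,
                          baseline_mm: float, tol_px: float = 1e-3) -> bool:
    """True if the exported disparity matches the one implied by depth."""
    expected, valid = depth_to_disparity(depth_mm, focal_length_px, baseline_mm)
    exported = np.asarray(disparity_px, dtype=np.float64)
    return bool(np.all(np.abs(expected[valid] - exported[valid]) <= tol_px))
```

With the example values from scene_parameters.json (800 px focal length, 5 mm baseline, 80 mm working distance) the implied disparity is 800 * 5 / 80 = 50 px, which also serves as a quick sanity check on baseline units.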


17. Failure-Mode Taxonomy

The benchmark should classify failures into categories:

specular highlight failure
low-texture failure
smoke or haze failure
blood contamination failure
tool-boundary failure
occlusion-boundary failure
deformation failure
motion-blur failure
overexposure failure
shadow failure
narrow-baseline failure
long-working-distance failure

For each algorithm, the report should answer:

Where does the algorithm fail?
Why does it fail?
Is the failure local or global?
Is the failure stable across frames?
Does confidence estimation detect the failure?
Does the failure also appear on real data?

18. Minimum Viable Prototype

The minimum useful prototype should include:

one synthetic organ surface
one stereo laparoscope
one light source
one specular tissue material
one moving tool
one deformation mode
RGB stereo export
depth export
disparity export
calibration export
OpenCV SGBM benchmark
one deep stereo model benchmark
metric report
error visualization

The minimum demonstration should show:

Algorithm A vs Algorithm B under increasing specularity
Algorithm A vs Algorithm B under increasing smoke
Algorithm A vs Algorithm B near tool occlusions
Algorithm A vs Algorithm B on deforming tissue

19. Expected Research Output

The final project should support a paper or thesis with the following structure:

Title
Abstract
Introduction
Related Work
Simulator Design
Benchmark Protocol
Ground Truth Generation
Algorithm Interface
Controlled Experiments
Real Dataset Validation
Results
Failure-Mode Analysis
Limitations
Conclusion

Possible title:

A Unity-Based Controllable Benchmark for Failure-Mode Analysis of Stereo Depth Estimation in Laparoscopic Scenes

Alternative title:

On-the-Fly Synthetic Benchmarking of Stereo Reconstruction Algorithms under Controlled Laparoscopic Imaging Conditions

20. Key Success Criteria

The workflow is successful if it produces:

a working stereo laparoscopic simulator
accurate ground-truth depth and disparity
reproducible parameter sweeps
automatic stereo algorithm evaluation
clear failure-mode analysis
comparison with real laparoscopic datasets
useful plots and benchmark reports
documentation sufficient for other researchers to reproduce results

The project should be considered weak if it only produces visually appealing synthetic images without rigorous ground truth, controlled experiments, or validation against real datasets.


21. Agent Deliverables Checklist

The agent must prepare:

[ ] literature summary
[ ] benchmark gap analysis
[ ] simulator architecture
[ ] Unity scene design
[ ] camera model specification
[ ] ground-truth export specification
[ ] dataset format specification
[ ] parameter sweep plan
[ ] algorithm plug-in interface
[ ] evaluation metric definitions
[ ] benchmark report template
[ ] real dataset validation plan
[ ] repository structure
[ ] development milestones
[ ] risk analysis
[ ] minimum viable prototype plan
[ ] final paper/thesis outline

22. Major Risks

The workflow must explicitly address these risks:

synthetic-to-real gap
unrealistic tissue appearance
incorrect ground-truth disparity
invalid labels after post-processing
lack of real-data validation
too much focus on rendering instead of benchmarking
too few baseline algorithms
weak novelty compared with existing synthetic datasets
poor documentation
non-reproducible experiments

For each risk, define a mitigation strategy.


23. First Milestone

The first milestone should be:

Build a minimal Unity stereo laparoscope scene that exports rectified left/right RGB images, metric depth, disparity, masks, and calibration, then benchmark OpenCV SGBM and one deep stereo model under a controlled specularity sweep.

This milestone should produce:

one synthetic sequence
one parameter sweep
two algorithm outputs
one metrics table
one error heatmap
one short technical report

Only after this milestone is complete should the project expand to smoke, blood, deformation, tools, and real-dataset validation.