Metadata-Version: 2.4
Name: cellmap-analyze
Version: 0.2.1
Summary: Code to perform analysis on segmentations like those produced by CellMap
Author-email: David Ackerman <ackermand@janelia.hhmi.org>
Maintainer-email: David Ackerman <ackermand@janelia.hhmi.org>
Requires-Python: <3.13,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: bokeh>=3.1.0
Requires-Dist: fastremap==1.15.2
Requires-Dist: funlib.geometry
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: connected-components-3d
Requires-Dist: dask[distributed]==2025.5.1
Requires-Dist: dask-jobqueue==0.9.0
Requires-Dist: fastmorph
Requires-Dist: pybind11-rdp==0.1.5
Requires-Dist: scipy
Requires-Dist: scikit-image
Requires-Dist: tensorstore
Requires-Dist: tqdm
Requires-Dist: zarr>=3.0.8
Requires-Dist: mwatershed==0.5.2
Requires-Dist: pyarrow
Requires-Dist: neuroglancer
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: coverage; extra == "dev"
Requires-Dist: build; extra == "dev"
Dynamic: license-file

[![CI Status](https://github.com/janelia-cellmap/cellmap-analyze/actions/workflows/tests.yml/badge.svg)](https://github.com/janelia-cellmap/cellmap-analyze/actions/workflows/tests.yml) [![Codecov](https://codecov.io/gh/janelia-cellmap/cellmap-analyze/branch/main/graph/badge.svg)](https://app.codecov.io/gh/janelia-cellmap/cellmap-analyze)

# cellmap-analyze

A suite of Dask-powered tools for processing and analyzing terabyte-scale 3D segmentation datasets. Supports both isotropic and anisotropic voxel sizes.

---

## Features

### Processing Tools

| Tool                     | CLI Command                  | Description                                                 |
| ------------------------ | ---------------------------- | ----------------------------------------------------------- |
| **Connected Components** | `connected-components`       | Threshold predictions, apply masks, and extract connected components. Volume thresholds are in physical units (nm³). |
| **Clean Components**     | `clean-connected-components` | Refine existing segmentations by removing small/large components. |
| **Contact Sites**        | `contact-sites`              | Identify regions where two segmentations are within a configurable physical distance. Handles mismatched voxel sizes by resampling to a common resolution. |
| **Fill Holes**           | `fill-holes`                 | Fill interior gaps in segmented volumes.                    |
| **Filter IDs**           | `filter-ids`                 | Exclude unwanted segmentation IDs.                          |
| **Mutex Watershed**      | `mws`                        | Mutex watershed agglomeration from affinities.              |
| **Label With Mask**      | `label-with-mask`            | Label one dataset with IDs from another.                    |
| **Morphological Operations** | `morphological-operations` | Erosion and dilation of segmented datasets. Processing order across blocks is not guaranteed. |
| **Skeletonize**          | `skeletonize`                | Generate skeletons from segmented objects with optional pruning and simplification. Automatically resamples to isotropic resolution before skeletonization. |

### Analysis Tools

| Tool                | CLI Command                  | Description                                                           |
| ------------------- | ---------------------------- | --------------------------------------------------------------------- |
| **Measurement**     | `measure`                    | Compute metrics (volume, surface area, radius of gyration, bounding box) for objects and contact sites. Supports raw intensity statistics when a raw dataset is provided. |
| **Fit Lines**       | `fit_lines_to_segmentations` | Fit geometric lines to elongated/cylindrical structures.              |
| **Assign to Organelles** | `assign_to_organelles`  | Map segmented objects to organelles based on centers of mass.         |

---

## Anisotropic data

All operations handle anisotropic voxel sizes (e.g. `(8, 8, 32)` nm in ZYX). Physical-unit parameters like `minimum_volume_nm_3`, `contact_distance_nm`, and `gaussian_smoothing_sigma_nm` are automatically converted to the appropriate per-axis voxel units. When two datasets have different voxel sizes, they are resampled to a common resolution using nearest-neighbor interpolation.
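
As a rough illustration of the unit conversion described above, the sketch below turns a physical distance and a physical volume into voxel units for an anisotropic voxel size. The helper names are hypothetical and are not part of the cellmap-analyze API; the package's own conversion may differ, for example in how it rounds.

```python
import math


def nm_to_voxels_per_axis(distance_nm, voxel_size_nm):
    """Express a physical distance as a (fractional) voxel count along each axis.

    Hypothetical helper for illustration only.
    """
    return tuple(distance_nm / size for size in voxel_size_nm)


def volume_nm3_to_voxel_count(volume_nm_3, voxel_size_nm):
    """Express a physical volume as a (fractional) number of voxels.

    Hypothetical helper for illustration only.
    """
    return volume_nm_3 / math.prod(voxel_size_nm)


voxel_size_nm = (8, 8, 32)

# A 16 nm distance spans 2 voxels along the 8 nm axes but only half a voxel
# along the 32 nm axis.
print(nm_to_voxels_per_axis(16, voxel_size_nm))       # (2.0, 2.0, 0.5)

# One voxel occupies 8 * 8 * 32 = 2048 nm^3, so 1e7 nm^3 is about 4883 voxels.
print(volume_nm3_to_voxel_count(1e7, voxel_size_nm))  # 4882.8125
```

The same physical threshold therefore corresponds to a different voxel count for every voxel size, which is why the parameters are specified in nm rather than in voxels.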

---

## Installation

```bash
pip install cellmap-analyze
```

---

## Usage

All commands share the same basic interface:

```bash
<command> [options] <config_path>
```

* `<command>`: One of the processing or analysis tools listed above.
* `<config_path>`: Directory containing:

  * `run-config.yaml` (parameters for your chosen command)
  * `dask-config.yaml` (Dask cluster settings)

**Options**:

* `-n, --num-workers N`: Number of Dask workers to launch.

> **Output:** A new directory named `<config_path>-<YYYYMMDDHHMMSS>` will be created, containing copies of your configs and an `output.log` for monitoring.
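
The directory layout and output naming described above can be sketched in Python. This is only an illustration: the function names are hypothetical, and placing the timestamped directory next to the config directory is an assumption rather than documented behavior.

```python
from datetime import datetime
from pathlib import Path

# File names taken from the list above.
REQUIRED_CONFIG_FILES = ("run-config.yaml", "dask-config.yaml")


def missing_config_files(config_path):
    """Return the required config files that are absent from config_path."""
    config_path = Path(config_path)
    return [name for name in REQUIRED_CONFIG_FILES
            if not (config_path / name).is_file()]


def timestamped_run_dir(config_path, now=None):
    """Build a `<config_path>-<YYYYMMDDHHMMSS>` path (hypothetical helper).

    Assumes the run directory sits alongside the config directory.
    """
    config_path = Path(config_path)
    now = now or datetime.now()
    return config_path.with_name(f"{config_path.name}-{now:%Y%m%d%H%M%S}")


# Example with a fixed timestamp so the result is reproducible.
run_dir = timestamped_run_dir("configs/mito", datetime(2025, 1, 2, 3, 4, 5))
print(run_dir.name)  # mito-20250102030405
```

Checking for both files before launching can save a failed cluster submission.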

---

## Configuration Examples

### run-config.yaml

The following `run-config.yaml` could be used to run `connected-components`:

```yaml
input_path: /path/to/predictions.zarr/mito/s0
output_path: /path/to/segmentations.zarr/mito
intensity_threshold_minimum: 0.71
minimum_volume_nm_3: 1E7
delete_tmp: true
connectivity: 1
mask_config:
  cell:
    path: /path/to/masks.zarr/cell/s0
    mask_type: inclusive
fill_holes: true
```
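
To give a feel for what `intensity_threshold_minimum`, `minimum_volume_nm_3`, and `connectivity` control, here is a minimal in-memory sketch using NumPy and SciPy. It is not the package's implementation, which processes data blockwise with Dask. The function name is hypothetical, and the sketch assumes that the threshold is inclusive (`>=`) and that `connectivity: 1` means face connectivity; masking and hole filling are left out.

```python
import numpy as np
from scipy import ndimage


def threshold_and_label(predictions, voxel_size_nm,
                        intensity_threshold_minimum,
                        minimum_volume_nm_3, connectivity=1):
    """Threshold, label connected components, and drop small components.

    Hypothetical helper for illustration only.
    """
    # Assumption: voxels at or above the threshold are foreground.
    binary = predictions >= intensity_threshold_minimum

    # Assumption: connectivity=1 means face-connected neighbors (6 in 3D).
    structure = ndimage.generate_binary_structure(predictions.ndim, connectivity)
    labels, _ = ndimage.label(binary, structure=structure)

    # Convert per-component voxel counts to physical volume in nm^3.
    voxel_volume_nm_3 = float(np.prod(voxel_size_nm))
    volumes_nm_3 = np.bincount(labels.ravel()) * voxel_volume_nm_3

    # Zero out components below the minimum volume (label 0 is background).
    too_small = volumes_nm_3 < minimum_volume_nm_3
    too_small[0] = False
    labels[too_small[labels]] = 0
    return labels


predictions = np.zeros((4, 4, 4), dtype=np.float32)
predictions[0:2, 0:2, 0:2] = 0.9   # 8 voxels = 4096 nm^3 at 8 nm isotropic
predictions[3, 3, 3] = 0.8         # 1 voxel  =  512 nm^3

labels = threshold_and_label(predictions, (8, 8, 8), 0.71, 1024)
print(np.count_nonzero(labels))    # 8: only the larger component survives
```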

### dask-config.yaml

The following `dask-config.yaml` examples cover local runs and LSF cluster runs.

#### Local

```yaml
jobqueue:
  local:
    ncpus: 1
    processes: 1
    cores: 1
    log-directory: job-logs
    name: dask-worker

distributed:
  scheduler:
    work-stealing: true
```

#### LSF Cluster

```yaml
jobqueue:
  lsf:
    ncpus: 8        # CPU slots requested per job (passed to `bsub -n`)
    processes: 12   # worker processes per job
    cores: 12       # total cores per job (12 cores / 12 processes = 1 thread each)
    memory: 120GB   # total memory per job (15 GB per requested slot)
    walltime: 08:00
    mem: 12000000000
    use-stdin: true
    log-directory: job-logs
    name: cellmap-analyze
    project: charge_group

distributed:
  scheduler:
    work-stealing: true
  admin:
    log-format: '[%(asctime)s] %(levelname)s %(message)s'
    tick:
      interval: 20ms
      limit: 3h
```
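
In dask-jobqueue, `cores` and `memory` are totals for one job, and `processes` splits that job into worker processes. The short sketch below works out what each worker process receives for the LSF example above; the function name is hypothetical.

```python
def per_worker_resources(cores, processes, memory_gb):
    """Return (threads, memory in GB) for each worker process in one job."""
    threads_per_worker = cores // processes
    memory_per_worker_gb = memory_gb / processes
    return threads_per_worker, memory_per_worker_gb


# Values from the LSF example: cores: 12, processes: 12, memory: 120GB
print(per_worker_resources(cores=12, processes=12, memory_gb=120))  # (1, 10.0)
```

With these settings each worker process runs a single thread with roughly 10 GB of memory, so lowering `processes` is one way to give memory-hungry tasks more headroom.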

### Submission

To run with 12 Dask workers (`-n 12`):

**Local run example:**

```bash
connected-components -n 12 config_path
```

**Cluster submit example (LSF):**

```bash
bsub -n 4 -P charge_group connected-components -n 12 config_path
```

---

## Acknowledgements

The center-finding implementation is taken from [funlib.evaluate](https://github.com/funkelab/funlib.evaluate).
