
Work with (remote) ome-zarr

German BioImaging e.V.

One of the cool things about ome-zarr files is that it makes no difference whether a file is stored locally or remotely. You can work with ome-zarr files on the web, in cloud storage or on your local disk using the same code.

import ngff_zarr as nz
import matplotlib.pyplot as plt
import dask.array as da
from skimage import filters
url = r'https://s3.zih.tu-dresden.de/joso140h:2309xx23ds/cells3d.ome.zarr/'

ngff_image = nz.from_ngff_zarr(url)
ngff_image
Multiscales(images=[NgffImage(data=dask.array<from-zarr, shape=(2, 60, 256, 256), dtype=uint16, chunksize=(1, 15, 128, 128), chunktype=numpy.ndarray>, dims=['c', 'z', 'y', 'x'], scale={'c': 1.0, 'z': 0.5, 'y': 0.5, 'x': 0.5}, translation={'c': 0.0, 'z': 0.0, 'y': 0.0, 'x': 0.0}, name='image', axes_units={'c': None, 'z': None, 'y': None, 'x': None}, axes_orientations=None, computed_callbacks=[]), NgffImage(data=dask.array<from-zarr, shape=(2, 30, 128, 128), dtype=uint16, chunksize=(1, 15, 64, 128), chunktype=numpy.ndarray>, dims=['c', 'z', 'y', 'x'], scale={'c': 1.0, 'z': 1.0, 'y': 1.0, 'x': 1.0}, translation={'c': 0.0, 'z': 0.25, 'y': 0.25, 'x': 0.25}, name='image', axes_units={'c': None, 'z': None, 'y': None, 'x': None}, axes_orientations=None, computed_callbacks=[]), NgffImage(data=dask.array<from-zarr, shape=(2, 15, 64, 64), dtype=uint16, chunksize=(2, 15, 64, 64), chunktype=numpy.ndarray>, dims=['c', 'z', 'y', 'x'], scale={'c': 1.0, 'z': 2.0, 'y': 2.0, 'x': 2.0}, translation={'c': 0.0, 'z': 0.75, 'y': 0.75, 'x': 0.75}, name='image', axes_units={'c': None, 'z': None, 'y': None, 'x': None}, axes_orientations=None, computed_callbacks=[])], metadata=Metadata(axes=[Axis(name='c', type='channel', unit=None, orientation=None), Axis(name='z', type='space', unit=None, orientation=None), Axis(name='y', type='space', unit=None, orientation=None), Axis(name='x', type='space', unit=None, orientation=None)], datasets=[Dataset(path='scale0/cells3d', coordinateTransformations=[Scale(scale=[1.0, 0.5, 0.5, 0.5], type='scale'), Translation(translation=[0.0, 0.0, 0.0, 0.0], type='translation')]), Dataset(path='scale1/cells3d', coordinateTransformations=[Scale(scale=[1.0, 1.0, 1.0, 1.0], type='scale'), Translation(translation=[0.0, 0.25, 0.25, 0.25], type='translation')]), Dataset(path='scale2/cells3d', coordinateTransformations=[Scale(scale=[1.0, 2.0, 2.0, 2.0], type='scale'), Translation(translation=[0.0, 0.75, 0.75, 0.75], type='translation')])], coordinateTransformations=None, 
omero=None, name='image', type='itkwasm_gaussian', metadata=MethodMetadata(description='Smoothed with a discrete gaussian filter to generate a scale space, ideal for intensity images. ITK-Wasm implementation is extremely portable and SIMD accelerated.', method='itkwasm_downsample.downsample', version='1.8.0')), scale_factors=None, method=<Methods.ITKWASM_GAUSSIAN: 'itkwasm_gaussian'>, chunks=None)
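
The scale and translation entries in the output above map array indices to physical coordinates. Here is a minimal sketch of where the translation values could come from. It assumes that coordinates refer to pixel centres (physical = index * scale + translation) and that the three levels are downsampled by factors of 1, 2 and 4; the helper names level_transform and index_to_physical are hypothetical and not part of ngff_zarr. Under those assumptions it reproduces the z, y and x values printed above.

```python
def level_transform(base_scale, factor):
    """Scale and translation of a level downsampled by `factor`.

    Assumes each coarse pixel covers `factor` base pixels and that
    coordinates refer to pixel centres (an assumption, not read from the file).
    """
    scale = base_scale * factor
    # Centre of the first coarse pixel = mean of the centres of the base
    # pixels it covers: 0, base_scale, ..., (factor - 1) * base_scale
    translation = base_scale * (factor - 1) / 2
    return scale, translation


def index_to_physical(index, scale, translation):
    """Map an array index along one axis to a physical coordinate."""
    return index * scale + translation


base_scale = 0.5  # spacing of the highest-resolution level, from the output above
levels = [level_transform(base_scale, factor) for factor in (1, 2, 4)]
for scale, translation in levels:
    print(f"scale={scale}, translation={translation}")
```

For example, index 1 at the second level gives index_to_physical(1, 1.0, 0.25) = 1.25, which lies halfway between the centres of the two full-resolution pixels it covers (1.0 and 1.5).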

Visualizing remote data

Let’s look at some of the raw data in the highest-resolution level of the ome-zarr object.

ngff_image.images[0].data
fig, ax = plt.subplots()
ax.imshow(ngff_image.images[0].data[0, 30, :, :], cmap='gray')
<Figure size 640x480 with 1 Axes>
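
Why is this cheap? The slice plotted above only overlaps a few of the chunks the array is split into. The sketch below counts them using the shape and chunk size reported for the highest-resolution level; chunks_touched is a hypothetical helper written for this illustration, and it only handles integers and non-empty slices with step 1.

```python
import math


def chunks_touched(shape, chunksize, selection):
    """Count the chunks that overlap a selection of ints and slices.

    Sketch only: assumes non-empty slices with step 1.
    """
    touched = 1
    for size, chunk, sel in zip(shape, chunksize, selection):
        if isinstance(sel, int):
            start, stop = sel, sel + 1
        else:
            start, stop, _ = sel.indices(size)
        first = start // chunk          # index of the first chunk hit
        last = (stop - 1) // chunk      # index of the last chunk hit
        touched *= last - first + 1
    return touched


shape = (2, 60, 256, 256)        # from the output of ngff_image above
chunksize = (1, 15, 128, 128)
selection = (0, 30, slice(None), slice(None))   # data[0, 30, :, :]

total = math.prod(math.ceil(s / c) for s, c in zip(shape, chunksize))
needed = chunks_touched(shape, chunksize, selection)
print(f"{needed} of {total} chunks")   # prints "4 of 32 chunks"
```

So displaying one plane of one channel needs only an eighth of the chunks in this level.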

All of this works without first downloading the dataset to our local disk; only the chunks needed for the plot are fetched. Let’s try this with a bigger dataset stored remotely:

url = r'https://s3.zih.tu-dresden.de/joso140h:2309xx23ds/20250721_selectedPos_1_Focus_Plane_order2_1.ome.zarr/'

ngff_image = nz.from_ngff_zarr(url)
ngff_image.images[0].data

This is almost 1 GB of data and we can still access it lazily over the network without downloading the entire dataset. Let’s visualize a part of it using matplotlib.

fig, ax = plt.subplots()
ax.imshow(ngff_image.images[0].data[5000:6000, 5000:6000], cmap='gray')
<Figure size 640x480 with 1 Axes>

Processing

Viewing is one thing, but in practice we usually need to process images (filter, segment, etc.). Surely we’ll have to load the data locally for that, right? Luckily, dask in combination with remote, lazily loaded ome-zarr data gives us a way out.

We apply an Otsu threshold to every chunk of the image with dask’s map_blocks:

def binarize_block(block):
    # Otsu threshold computed from this chunk only
    thresh = filters.threshold_otsu(block)
    return block > thresh
# Builds a task graph only; no chunk is fetched or thresholded yet
binarized = da.map_blocks(binarize_block, ngff_image.images[0].data, dtype=bool)
binarized

We can observe two things:

  • the operation returns almost instantly because no data is loaded yet
  • the resulting binarized is a dask array
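
To see this laziness in isolation, here is a small sketch on a synthetic in-memory array (an assumption; no remote data is involved). The function add_one and the list processed_shapes are hypothetical names used only to record when dask actually calls the function on a real chunk.

```python
import numpy as np
import dask.array as da

processed_shapes = []


def add_one(block):
    # dask may probe the function with an empty array to infer metadata,
    # so only calls on real (non-empty) chunks are recorded
    if block.size:
        processed_shapes.append(block.shape)
    return block + 1


# 512 x 512 array split into 16 chunks of 128 x 128
source = da.from_array(np.zeros((512, 512), dtype=np.uint16), chunks=(128, 128))

lazy = da.map_blocks(add_one, source, dtype=source.dtype)
calls_before = len(processed_shapes)   # nothing has been processed yet

crop = lazy[:128, :128].compute()      # triggers the actual work
calls_after = len(processed_shapes)

print(f"chunks processed before compute: {calls_before}, after: {calls_after}")
```

Building the lazy array processes no chunk at all, and computing the crop should only process the chunk(s) that overlap it rather than all 16.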

Let’s plot a crop of binarized to confirm this. Only now are the required chunks fetched and thresholded.

fig, ax = plt.subplots()
ax.imshow(binarized[5000:6000, 5000:6000], cmap='gray')
<Figure size 640x480 with 1 Axes>
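
One caveat of the block-wise approach: map_blocks calls the function once per chunk, so every chunk gets its own Otsu threshold, and a chunk that contains only background is still split into two classes. If a single threshold for the whole image is wanted, one option is to compute the histogram lazily with da.histogram and derive the threshold from it. The sketch below is an illustration on synthetic data, not the method used above; otsu_from_histogram and binarize_block are hypothetical helpers with a small numpy implementation of Otsu’s method, and the choice of 256 bins is an assumption.

```python
import numpy as np
import dask.array as da


def otsu_from_histogram(counts, centers):
    """Otsu threshold from bin counts and bin centres."""
    counts = counts.astype(float)
    weight_lo = np.cumsum(counts)                  # pixels in bins 0..i
    weight_hi = np.cumsum(counts[::-1])[::-1]      # pixels in bins i..end
    mean_lo = np.cumsum(counts * centers) / np.maximum(weight_lo, 1)
    mean_hi = (np.cumsum((counts * centers)[::-1])
               / np.maximum(weight_hi[::-1], 1))[::-1]
    # Between-class variance for a split between bin i and bin i + 1
    between = weight_lo[:-1] * weight_hi[1:] * (mean_lo[:-1] - mean_hi[1:]) ** 2
    return centers[np.argmax(between)]


def binarize_block(block):
    # Threshold derived from this chunk only
    counts, edges = np.histogram(block, bins=256)
    return block > otsu_from_histogram(counts, (edges[:-1] + edges[1:]) / 2)


# Synthetic stand-in image: dark left half, bright right half, plus noise
rng = np.random.default_rng(0)
pixels = rng.normal(100, 5, size=(256, 256))
pixels[:, 128:] += 900
image = da.from_array(pixels.astype(np.uint16), chunks=(128, 128))

# Per-chunk thresholds, as with map_blocks above
per_block = da.map_blocks(binarize_block, image, dtype=bool)

# One global threshold from a lazily computed histogram
lo, hi = (float(v) for v in da.compute(image.min(), image.max()))
counts, edges = da.histogram(image, bins=256, range=(lo, hi))
edges = np.asarray(edges)
global_thresh = otsu_from_histogram(counts.compute(), (edges[:-1] + edges[1:]) / 2)
global_mask = image > global_thresh   # still a lazy dask array

dark_per_block = float(per_block[:, :128].mean().compute())
dark_global = float(global_mask[:, :128].mean().compute())
bright_global = float(global_mask[:, 128:].mean().compute())
print(f"dark half marked as foreground: per-block {dark_per_block:.2f}, "
      f"global {dark_global:.2f}")
```

In this synthetic case the per-chunk version marks a large part of the dark half as foreground, while the global threshold separates the two halves. The global variant needs one extra pass over the data for the minimum, maximum and histogram, but it still never holds the full image in memory at once.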