Install and usage

omidb is a Python 3 command-line interface and package for parsing and interacting with the OPTIMAM Mammography Image Database. Unless you have authorised access to the official database, it is assumed that you have downloaded the database (most likely a subset of it) via the OMI-DB Sync Tool. For an overview of the database, see Database structure.

CLI

A simple command-line interface (CLI), omidb, has been developed to automate useful data extraction tasks commonly implemented by the hands of researchers working with the database.

The CLI is currently limited to one (very useful) command, summarise, which can be applied to your local copy of OMI-DB:

omidb summarise <path-to-omidb> <path-to-output-csv-file>

This will pass over the JSON data (within the data directory of OMI-DB), and generate a CSV file that summarises the database, at the image level. For example, the majority of images will be associated with a series, medical procedure, study, NBSS episode and client. The command also extracts a few useful DICOM tags, such as the manufacturer of the device, and the intent of presentation. This does not require access to the DICOM images themselves.

The --clients-file <my-client-list.txt> option can be added to specify a list of clients to parse, rather than traversing the entire database. It should point to the path of a text file holding one line per client:

# my-client-list.txt
demd1
demd2

The omidb package logger provides detailed information about the parsing process, e.g. studies that can’t be linked to an event, so, if interested, we recommend you route logging to a file by adding the --log-file <path-to-log-file> option.

Package Usage

Import the package:

>>> import omidb

The following code iteratively parses clients demd8482 and demd11022 from the database:

>>> db = omidb.DB('./OMI-DB', clients=['demd8482', 'demd11022'])
>>> clients = [client for client in db]
>>> [print(client.id) for client in clients]
demd11022
demd8482

The hierarchical structure of OMI-DB is modelled by nested objects:

>>> clients[0].episodes[0].studies[0].series[0].images[0].marks

NBSS data attributes are available through class members:

>>> print(clients[0].classification.value)
Noraml
>>> print(clients[0].episodes[0].value)
RR

Access dicom properties for an image (using pydicom):

>>> print(clients[0].episodes[0].studies[0].series[0].images[0].dcm.PresentationIntentType)
FOR PRESENTATION

or via provided JSON representations of the DICOM headers (so no need for the DICOMs themselves):

>>> print(clients[0].episodes[0].studies[0].series[0].images[0].attributes['00080068'])
{'vr': 'CS', 'Value': ['FOR PRESENTATION']}

Plot individual images (via matplotlib) and images within a series:

>>> clients[0].episodes[0].studies[0].series[0].images[0].plot()
>>> clients[0].episodes[0].studies[0].series[0].plot()

Use FilterImages to perform inplace, recursive dicom property filtering over images:

>>> image_filter = omidb.filters.FilterImages.dicom_filter(
    {'PresentationIntentType': ['FOR PROCESSING']})
>>> image_filter(clients[0])  # In-place filtering

See omidb for API documentation.

Installing

You will need version >=3.7 of Python.

For the CLI only, we recommend pipx:

pipx install omidb

To install the package in your project:

poetry add omidb

or:

pip install omidb

For development:

git clone https://bitbucket.org/scicomcore/omi-db.git
poetry install --dev

To build the documentation:

cd ./docs
poetry run make html