Middle ear OCT/CT image co-registration and fusion

Combine the best of two modalities for better middle ear imaging

High-resolution computed tomography (CT) imaging is a mainstay of modern middle ear diagnostics, long serving as the preferred non-invasive visualization modality for diagnosing most middle ear pathologies and for surgical planning. Middle ear optical coherence tomography (ME-OCT) is an emerging, point-of-care diagnostic technology capable of providing volumetric images of the middle ear space through the intact tympanic membrane without exposing the patient to ionizing radiation. These two imaging modalities are complementary in several respects.

CT enables visualization of osseous structures anywhere in the middle ear without suffering from shadowing artefacts. However, CT is not real-time, exposes the patient to ionizing radiation, has difficulty resolving fine soft tissue structures and delivers lower resolution than optical imaging. In contrast, ME-OCT images using light and so does not expose the patient to ionizing radiation. ME-OCT provides video-rate, real-time images and can resolve fine soft tissue structures like the tympanic membrane. However, being a line-of-sight modality, ME-OCT imaging can be obstructed by bone or soft tissue thicker than approximately 1 mm, so it is limited to the field of view visible through the tympanic membrane. ME-OCT also exhibits artefacts from shadowing and multiple scattering that do not affect CT images. Both technologies are capable of producing images with high geometric fidelity to the patient anatomy, so CT and OCT images of the same anatomy can be co-registered through rigid transformations.


It is natural to ask whether the strengths of the two modalities can be combined. Image co-registration and fusion does this by combining the wide field of view (FOV) and excellent bony delineation of CT with the high spatial resolution and soft tissue contrast of OCT, generating better middle ear images than either modality provides alone. In general, registration establishes a mathematical relationship (a spatial transformation) between two related images, while fusion concerns presenting the combined image intuitively for clear interpretation.

This technique has significantly improved clinical hearing diagnostics by combining images from different modalities (e.g., CT and MRI) or with different fields of view. It is particularly beneficial in tasks such as evaluating cochlear implant (CI) placement, localising cholesteatoma, and diagnosing otitis media. Beyond otology, it has also proven useful in wider medical practice.

Case selection and high-resolution CT acquisition

Three cases with conductive hearing loss

In our interactive ME-OCT atlas, we present a superimposed comparison of volumetric images taken with both OCT and high-resolution clinical CT in the same patient's ear. The co-registered and fused CT and OCT images allow for a direct assessment of each modality's strengths and limitations. We showcase images from three patients, provided courtesy of Dr. David P Morris and Dr. Nael Shoman: a normal ear, an ear with traumatic injury, and an ear with chronic cholesteatoma. Each patient was referred to our clinic for conductive hearing loss and had already received high-resolution CT scans as part of their standard medical care. The study was performed at a hospital tertiary otology clinic under approval from the hospital’s research ethics board. Informed consent was obtained in accordance with the Helsinki Declaration. High-resolution CT images without contrast were acquired by scanning the head axially from the skull vertex to the base. Details of the CT scanning configurations are summarized here.


ME-OCT acquisition

Point-of-care, real-time structural visualization

Prior to OCT measurement, patients had their ear canals cleared of any cerumen or other debris. With patient and clinician both sitting, as shown in middle ear tour with OCT, the clinician inserted the OCT handpiece speculum into the ear canal, centered the image on the tympanic membrane (TM) under otoscopic guidance and pressed a foot pedal to initiate the collection of volumetric data. During acquisition, the OCT beam was scanned over the TM surface in a spiral fashion to collect 262,000 image lines distributed roughly uniformly over the TM in 4.5 seconds. The data were processed offline using a custom denoising algorithm to remove artefacts and noise from the OCT images, corrected for the geometrical distortion introduced during acquisition, and then exported to DICOM format.
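As a quick check on these figures, and assuming the image lines were acquired at a constant rate over the whole scan (an assumption, since the text gives only the totals), the implied average line rate is roughly:

\[ \frac{262{,}000\ \text{lines}}{4.5\ \text{s}} \approx 5.8 \times 10^{4}\ \text{lines/s} \]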

Image fusion and evaluation

Provide clinical benefits without interrupting existing workflow

The OCT and CT DICOM datasets were then imported into the software package 3D Slicer for co-registration and fusion. Processing consisted of the following steps:

  1. The CT volume was cropped to create a region of interest (ROI) that contained the OCT volume.
  2. Non-isotropic voxels from the CT dataset were converted to isotropic voxels.
  3. The OCT amplitude data was log-compressed and mapped to a 0-255 greyscale range.
  4. The OCT volume was oriented so that its axial, coronal, and sagittal views matched those of the CT, using a rigid transformation in 3D Slicer and matching corresponding structures in both sets of images.
  5. A semi-automated, intrinsic, rigid registration was performed using the SlicerIGT module (fiducial registration wizard) of 3D Slicer to co-register the OCT and CT datasets. At least three pairs of anatomical landmarks visible in both the CT and OCT images were manually identified.
  6. Fused images were generated by overlaying semi-transparent color OCT images onto the greyscale CT images.
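Steps 3 and 5 above can be illustrated with a minimal NumPy sketch. This is not the code used in the study, where registration was carried out in 3D Slicer. It shows one common way to log-compress amplitude data to 8-bit greyscale and a standard least-squares rigid fit to paired landmarks (the Kabsch/SVD approach). The function names, the 50 dB display range, and the landmark coordinates are all hypothetical, chosen only for illustration.

```python
import numpy as np


def log_compress_to_uint8(amplitude, dynamic_range_db=50.0):
    """Step 3: log-compress OCT amplitude and map it to 0-255 greyscale.

    dynamic_range_db is a hypothetical display range; the source does not
    state one. Assumes at least one voxel has non-zero amplitude.
    """
    amp = np.asarray(amplitude, dtype=float)
    peak = amp.max()
    # Decibels relative to the brightest voxel; the floor avoids log10(0).
    db = 20.0 * np.log10(np.maximum(amp, peak * 1e-12) / peak)
    # Clip to [-dynamic_range_db, 0] dB, then rescale linearly to [0, 255].
    scaled = (np.clip(db, -dynamic_range_db, 0.0) + dynamic_range_db) / dynamic_range_db
    return np.round(scaled * 255.0).astype(np.uint8)


def fit_rigid_transform(moving_pts, fixed_pts):
    """Step 5: least-squares rigid fit so that fixed ~= R @ moving + t.

    Needs at least three non-collinear landmark pairs, given as (N, 3) arrays.
    """
    P = np.asarray(moving_pts, dtype=float)
    Q = np.asarray(fixed_pts, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Force det(R) = +1 so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def landmark_rmse(moving_pts, fixed_pts, R, t):
    """Root mean square of the distances between mapped and fixed landmarks."""
    mapped = np.asarray(moving_pts, dtype=float) @ R.T + t
    residual = mapped - np.asarray(fixed_pts, dtype=float)
    return float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))


# Hypothetical OCT landmark coordinates in mm (not patient data).
oct_landmarks = np.array([[0.0, 0.0, 0.0],
                          [4.0, 0.0, 0.0],
                          [0.0, 3.0, 0.0],
                          [1.0, 1.0, 2.0]])
# Synthetic "CT" landmarks: the same points rotated 90 degrees about z and shifted.
true_R = np.array([[0.0, -1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
true_t = np.array([10.0, -5.0, 2.5])
ct_landmarks = oct_landmarks @ true_R.T + true_t

R, t = fit_rigid_transform(oct_landmarks, ct_landmarks)
rmse = landmark_rmse(oct_landmarks, ct_landmarks, R, t)
grey = log_compress_to_uint8([1.0, 0.1, 1e-6])
```

With noise-free synthetic landmarks the fit recovers the applied rotation and translation and the residual is essentially zero. With manually placed landmarks the residual is non-zero, and it is this kind of landmark residual that an RMSE figure summarizes.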

For each ear, we independently co-registered CT and OCT images six times. We estimated the registration accuracy using the root mean square error (RMSE) between the selected landmark pairs that are visible in both CT and OCT images. The reproducibility of co-registration was quantified using the Hausdorff distance and Dice similarity coefficient (DSC) of the six independently generated fused images with the results shown below.
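For readers unfamiliar with the Dice similarity coefficient, the sketch below shows how it can be computed for repeated segmentations. It assumes each segmentation is represented as a set of voxel indices and that reproducibility is summarized by averaging over all pairs of runs. Both are illustrative assumptions, since the text does not specify how the six repeats were compared. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def dice_coefficient(seg_a, seg_b):
    """Dice similarity 2|A & B| / (|A| + |B|) for two sets of voxel indices."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0  # convention: two empty segmentations agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def mean_pairwise_dice(segmentations):
    """Average Dice over every pair of runs (assumed summary, see lead-in)."""
    pairs = list(combinations(segmentations, 2))
    return sum(dice_coefficient(a, b) for a, b in pairs) / len(pairs)


# Toy segmentations from three hypothetical registration runs.
run_1 = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
run_2 = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
run_3 = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}

dsc_12 = dice_coefficient(run_1, run_2)
dsc_mean = mean_pairwise_dice([run_1, run_2, run_3])
```

The coefficient is dimensionless and ranges from 0 (no overlap) to 1 (identical segmentations).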

Co-registration accuracy and reproducibility

We began the registration process by identifying corresponding anatomical landmarks in both CT and OCT images. Subsequently, we aligned the axial, coronal, and sagittal planes of the CT and OCT images by fixing the CT orientation and adjusting the OCT orientation through rigid transformation, so that these landmarks were consistently positioned across the same planes in their respective images. The landmarks used were the scutum, annulus, umbo and prominent features on the cochlear promontory, as these structures exhibit a high level of similarity between CT and OCT in both coronal and axial views. For each patient, six independent co-registrations were performed, yielding an average root mean square error (RMSE) of \(0.100 \pm 0.035\) mm between the landmark pairs across co-registrations. Following threshold-based segmentation of the fused images for each patient, the average maximum Hausdorff distance across the six co-registrations was \(0.951 \pm 0.228\) mm, whereas the average Hausdorff distance was \(0.045 \pm 0.011\) mm. The Dice similarity coefficient (DSC) for the segmented volumes, which is dimensionless, was \(0.747 \pm 0.0061\). The high similarity between the segmentations indicates that this method can reliably fuse temporal bone CT with middle ear OCT images.
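The difference between the maximum and the average Hausdorff distance can be seen in a small pure-Python sketch. It assumes each segmentation is reduced to a list of surface points in mm and uses one common symmetric definition of the average Hausdorff distance (the mean of the two directed mean nearest-neighbour distances). The text does not state which definition was used, so this is an illustration only, and the function names and coordinates are hypothetical.

```python
import math


def directed_distances(points_a, points_b):
    """For each point in A, the distance to its nearest neighbour in B."""
    return [min(math.dist(a, b) for b in points_b) for a in points_a]


def hausdorff_metrics(points_a, points_b):
    """Return (maximum, average) symmetric Hausdorff distances."""
    d_ab = directed_distances(points_a, points_b)
    d_ba = directed_distances(points_b, points_a)
    # Maximum: the worst nearest-neighbour mismatch in either direction.
    maximum = max(max(d_ab), max(d_ba))
    # Average: mean of the two directed mean nearest-neighbour distances.
    average = (sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba)) / 2.0
    return maximum, average


# Hypothetical surface points (mm) from two registration runs.
surface_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
surface_2 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]

hd_max, hd_avg = hausdorff_metrics(surface_1, surface_2)
```

The maximum is driven by the single worst-matching point, while the average reflects typical surface agreement. This is why the two values reported above can differ by more than an order of magnitude.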