Real-time off-axis quantitative phase imaging using CUDA

Posted on February 27, 2012 by admin

In the past decade, quantitative phase imaging (QPI) has attracted increasing scientific interest in cell and tissue imaging because it can study structure and dynamics with nanoscale sensitivity and without exogenous contrast agents. Typically, QPI relies on off-line post-processing to obtain the pathlength map from an acquired interferogram. In particular, off-axis methods require removing the high-frequency spatial modulation (the carrier) and then unwrapping the extracted phase. Phase unwrapping is the process of reconstructing the true phase from the measured wrapped values, which lie between −π and +π. High-throughput, high-speed, real-time phase unwrapping is highly desirable in many applications, including applied physics and biomedicine. However, to the best of our knowledge, there are currently no phase unwrapping algorithms that allow QPI operation at video rates (i.e., ~30 frames/s).
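To make the wrapping problem concrete, here is a minimal one-dimensional sketch in Python. It is only an illustration: it uses Itoh's simple neighbor-integration method on a made-up phase ramp, not the two-dimensional Goldstein algorithm used in this work, and the names `wrap` and `unwrap_1d` are hypothetical.

```python
import math

def wrap(phase):
    """Wrap a phase value (radians) into the interval [-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))

def unwrap_1d(wrapped):
    """Itoh's 1D unwrapping: integrate the wrapped neighbor-to-neighbor differences."""
    unwrapped = [wrapped[0]]
    for previous, current in zip(wrapped, wrapped[1:]):
        unwrapped.append(unwrapped[-1] + wrap(current - previous))
    return unwrapped

# Hypothetical test signal: a smooth phase ramp spanning several multiples of 2*pi.
true_phase = [0.4 * n for n in range(40)]
wrapped_phase = [wrap(p) for p in true_phase]   # what the instrument would measure
recovered = unwrap_1d(wrapped_phase)            # the reconstructed true phase

max_error = max(abs(r - t) for r, t in zip(recovered, true_phase))
print(f"largest unwrapping error: {max_error:.2e} rad")
```

This simple integration works only while the true phase changes by less than π between neighboring samples. Noise and undersampling break that assumption in real two-dimensional data, which is why path-following methods such as Goldstein's are needed.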

Off-axis interferometry takes advantage of the spatial phase modulation introduced by the angularly shifted (tilted) reference plane wave and of the spatially resolved measurement allowed by a 2D detector array such as a CCD. Essentially, off-axis interferometry is the spatial equivalent of heterodyne detection in the time domain. Compared to phase-shifting methods, off-axis interferometry allows for single-shot measurements and, thus, fast acquisition rates.
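The heterodyne analogy can be sketched on a single detector line. The example below assumes a Fourier-filtering demodulation, which is one common way to extract phase from off-axis fringes and is not necessarily the exact procedure used in this work. The line length, carrier frequency, band width, and test phase are all made-up values for illustration.

```python
import cmath
import math

N = 64          # number of detector pixels along one line (assumed)
CARRIER = 16    # fringe frequency in cycles per line, set by the reference tilt (assumed)
HALF_BAND = 8   # half-width of the frequency band kept around the carrier (assumed)

def sample_phase(x):
    """Hypothetical smooth sample phase, in radians."""
    return 0.5 * math.sin(2 * math.pi * x / N)

# Off-axis interferogram: a background plus fringes whose phase carries the sample phase.
interferogram = [1.0 + math.cos(2 * math.pi * CARRIER * x / N + sample_phase(x))
                 for x in range(N)]

def dft(signal):
    """Plain O(n^2) discrete Fourier transform, adequate for a short line."""
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n))
            for k in range(n)]

spectrum = dft(interferogram)

# Keep only the +1 order around the carrier; this rejects the DC term and the -1 order.
band = range(CARRIER - HALF_BAND, CARRIER + HALF_BAND + 1)

extracted = []
for x in range(N):
    # Inverse DFT restricted to the band gives the complex fringe signal.
    fringe = sum(spectrum[k] * cmath.exp(2j * math.pi * k * x / N) for k in band) / N
    # Demodulate: remove the carrier, leaving the (wrapped) sample phase.
    baseband = fringe * cmath.exp(-2j * math.pi * CARRIER * x / N)
    extracted.append(cmath.phase(baseband))

max_error = max(abs(extracted[x] - sample_phase(x)) for x in range(N))
print(f"largest phase extraction error: {max_error:.2e} rad")
```

Because the whole phase map comes from one interferogram, no sequence of phase-shifted frames is needed, which is what makes single-shot acquisition possible.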

We demonstrate real-time off-axis quantitative phase imaging (QPI) using a phase reconstruction algorithm based on NVIDIA’s CUDA programming model. The phase unwrapping component is based on Goldstein’s algorithm. Fig. 1 illustrates the phase reconstruction procedure in the QPI system.
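The first stage of Goldstein's algorithm, residue identification, can be sketched sequentially as follows. This is an illustrative Python version, not the CUDA kernel used in this work. The function names and the small vortex test map are hypothetical.

```python
import math

def wrap(phase):
    """Wrap a phase value (radians) into the interval [-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))

def find_residues(phase):
    """Return the residue charge (-1, 0, or +1) of every 2x2 pixel loop.

    The wrapped phase differences are summed around each closed 2x2 loop.
    A sum of +/- 2*pi marks a residue, a point the unwrapping path must not encircle.
    """
    rows, cols = len(phase), len(phase[0])
    residues = [[0] * (cols - 1) for _ in range(rows - 1)]
    for i in range(rows - 1):
        for j in range(cols - 1):
            loop = [phase[i][j], phase[i][j + 1],
                    phase[i + 1][j + 1], phase[i + 1][j], phase[i][j]]
            total = sum(wrap(b - a) for a, b in zip(loop, loop[1:]))
            residues[i][j] = round(total / (2 * math.pi))
    return residues

# Hypothetical test map: a phase vortex centered between the four middle pixels.
vortex = [[math.atan2(i - 1.5, j - 1.5) for j in range(4)] for i in range(4)]
residues = find_residues(vortex)
for row in residues:
    print(row)
```

Each 2×2 loop is evaluated independently of the others, so this stage maps naturally onto one GPU thread per pixel. Branch-cut placement then connects residues of opposite charge so that the final unwrapping path never crosses a cut.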


Fig. 1. Phase reconstruction in the QPI system

By mapping phase extraction and unwrapping to the GPU, we speed up the whole procedure by about 18.8× for a single 1024×1024 frame with respect to CPU processing (and by more than 40× when ten frames are processed together), ultimately achieving video rate for megapixel images. Table 1 compares the run times of the two implementations. The results shown were averaged over 20 images for each image size. Our CUDA implementation also supports processing multiple images simultaneously, which enables our imaging system to support high-speed, high-throughput, real-time image acquisition and visualization.

Table 1: CUDA implementation versus C-based sequential implementation

Image size             CPU/GPU          Phase extraction (ms)  Residue identification (ms)  Branch cut placement (ms)  Unwrap (ms)  Total (ms)
1024×1024, 1 frame     CPU              317.42                 43.42                        6.74                       89.32        460.7
1024×1024, 1 frame     GPU              5.05                   0.58                         1.125                      10.014       24.55
1024×1024, 1 frame     Speedup factor   62.86                  74.19                        5.99                       8.92         18.77
1024×1024, 10 frames   CPU              3174.2                 434.2                        67.4                       893.2        4607.4
1024×1024, 10 frames   GPU              40.486                 5.55                         1.128                      45.285       111.1
1024×1024, 10 frames   Speedup factor   78.4                   78.19                        59.71                      19.72        41.47
512×512, 1 frame       CPU              71                     11                           5                          16           105
512×512, 1 frame       GPU              2.18                   0.2                          0.02                       1.87         8
512×512, 1 frame       Speedup factor   32.61                  55.84                        250                        8.55         13.13
512×512, 10 frames     CPU              710                    110                          50                         160          1050
512×512, 10 frames     GPU              11.57                  1.4                          0.02                       6.722        26
512×512, 10 frames     Speedup factor   61.37                  78.57                        2500                       23.8         40.38

Clearly, the GPU implementation demonstrates a substantial improvement in run-time performance. The total run time for a single 1024×1024 image dropped from an average of 460.7 milliseconds for the sequential C implementation to 24.55 milliseconds on the GPU, or about 40 frames/s, which is sufficient for video-rate operation. The total run time for a single lower-resolution (512×512) image is 8 milliseconds, allowing for much higher image acquisition rates.
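The speedup and frame-rate figures follow directly from the total run times in Table 1. The short Python sketch below reproduces that arithmetic. The dictionary layout and the `summarize` helper are illustrative choices; only the timing values come from the table.

```python
# Total run times in milliseconds, copied from Table 1: (frames, CPU total, GPU total).
TOTALS_MS = {
    "1024x1024, 1 frame": (1, 460.7, 24.55),
    "1024x1024, 10 frames": (10, 4607.4, 111.1),
    "512x512, 1 frame": (1, 105.0, 8.0),
    "512x512, 10 frames": (10, 1050.0, 26.0),
}

def summarize(frames, cpu_ms, gpu_ms):
    """Return (speedup factor, GPU throughput in frames per second)."""
    speedup = cpu_ms / gpu_ms
    gpu_fps = frames * 1000.0 / gpu_ms
    return speedup, gpu_fps

summary = {label: summarize(*row) for label, row in TOTALS_MS.items()}
for label, (speedup, fps) in summary.items():
    print(f"{label}: speedup {speedup:.2f}x, GPU throughput {fps:.1f} frames/s")
```

The batch rows show why processing several frames together pays off: the per-frame GPU time falls well below the single-frame figure, so throughput rises even though each batch takes longer to complete.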

We anticipate that, in the near future, CUDA-based modules will compute quantitative parameters of the imaged objects from the unwrapped phase images in real time, e.g., cell volumes, refractive indices, and tissue morphological parameters, which are useful for both basic biological studies and medical diagnosis.

Related Publication
H. Pham, H. Ding, N. Sobh, M. Do, S. Patel, and G. Popescu, Off-axis quantitative phase imaging processing using CUDA: toward real-time applications, Biomed. Opt. Exp., 2 (7), (2011).