Yusheng Zhou, Hao Li, Jianan Liu, Zhengmin Kong, Tao Huang, Senior Member, IEEE, Euijoon Ahn, Zhihan Lv, Senior Member, IEEE, Jinman Kim, Member, IEEE, and David Dagan Feng, Life Fellow, IEEE
This work was accepted by IEEE Journal of Biomedical and Health Informatics (DOI: 10.1109/JBHI.2024.3444771).
Yusheng Zhou and Hao Li contributed equally to the work and are co-first authors. Corresponding author: Zhengmin Kong.
Yusheng Zhou and Zhengmin Kong are with the School of Electrical Engineering and Automation, Wuhan University, China (email: yushengzhou@whu.edu.cn; zmkong@whu.edu.cn).
Hao Li is with the Department of Neuroradiology, University Hospital Heidelberg, Heidelberg, Germany (email: hao.li@med.uni-heidelberg.de).
Jianan Liu is with Vitalent Consulting, Gothenburg, Sweden (email: jianan.liu@vitalent.se).
Tao Huang and Euijoon Ahn are with the College of Science and Engineering, James Cook University, Cairns, Australia (email: tao.huang1@jcu.edu.au; euijoon.ahn@jcu.edu.au).
Zhihan Lv is with the Department of Game Design, Faculty of Arts, Uppsala University, Sweden (email: lvzhihan@gmail.com).
Jinman Kim and David Dagan Feng are with the School of Computer Science, University of Sydney, Sydney, Australia (email: jinman.kim@sydney.edu.au; dagan.feng@sydney.edu.au).
Abstract
Motion artifacts compromise the quality of magnetic resonance imaging (MRI) and pose challenges to achieving diagnostic outcomes and image-guided therapies. In recent years, supervised deep learning approaches have emerged as successful solutions for motion artifact reduction (MAR). One disadvantage of these methods is their dependency on paired sets of motion artifact-corrupted (MA-corrupted) and motion artifact-free (MA-free) MR images for training. Obtaining such image pairs is difficult, which limits the application of supervised training. In this paper, we propose a novel UNsupervised Abnormality Extraction Network (UNAEN) to alleviate this problem. Our network is capable of working with unpaired MA-corrupted and MA-free images. It converts MA-corrupted images into MA-reduced images using a proposed artifact extractor, which explicitly extracts the residual artifact maps from the MA-corrupted MR images, and a reconstructor, which restores the original input from the MA-reduced images. The performance of UNAEN was assessed on various publicly available MRI datasets and compared with state-of-the-art methods. Quantitative evaluation demonstrates the superiority of UNAEN over alternative MAR methods, and its outputs visually exhibit fewer residual artifacts. Our results substantiate the potential of UNAEN as a promising solution applicable in real-world clinical environments, with the capability to enhance diagnostic accuracy and facilitate image-guided therapies. Our code is publicly available at https://github.com/YuSheng-Zhou/UNAEN.
Index Terms
Magnetic Resonance Imaging, Motion Artifact Reduction, Unsupervised Learning, Domain Adaptation, Explicit Abnormality Extraction.
1 Introduction
Magnetic resonance imaging (MRI) is one of the most widely used non-invasive medical imaging techniques and involves no radiation exposure. However, because of its long acquisition time, e.g., 20-25 minutes for the whole brain, MRI is highly sensitive to patient movement [1], and motion artifacts (MA) are frequently unavoidable. MA is caused by incorrectly acquired signals being filled into K-space, resulting in blurring or ghosting. MA substantially degrades overall image quality, impeding accurate interpretation of diagnostic information and hindering the efficacy of image-guided therapeutic interventions [1]. To tackle this problem, various methods have been proposed to prevent patient movement and/or correct for MA [2, 3, 4, 5, 6, 7, 8]. Methods to prevent MA involve additional tools such as MR navigators [9, 10] or external tracking devices [11]. Another approach is online motion measurement based on, e.g., the extended Kalman filter algorithm or orientation short tracking pulse-sequence, which prospectively compensates for motion or partially reacquires the K-space data in the case of extreme motion [5, 6]. Although MA may be mitigated with these methods, they have not been widely used because of prolonged scan time and additional costs. Alternatively, statistical signal processing-based artifact correction is commonly used because it does not prolong the scan. Atkinson et al. [7] applied an entropy-based criterion to an MR image to correct motion-induced artifacts. Batchelor et al. [8] proposed a general matrix that maps the artifact-free image to the corrupted image, and then used the inverse mapping to reconstruct the artifact-free image. These techniques use prior statistical information to explore the influence of motion on the K-space data and build models to reconstruct motion artifact-free (MA-free) images. However, these methods still have limited capabilities due to the complexity and unpredictability of patients' movements [7, 8].
In recent studies, supervised deep learning techniques have been proposed for motion artifact reduction (MAR) in MRI. They avoid additional acquisition equipment and prolonged scan time, and they have demonstrated better performance than traditional statistical signal processing methods [12, 13, 14]. However, these methods rely heavily on a large number of training samples, in which MA-free images serve as the reference ground truth for learning the disparities between motion artifact-corrupted (MA-corrupted) and MA-free images. Moreover, the MA-corrupted images used in these studies are synthetically generated, because authentic paired MA-corrupted and MA-free images are exceedingly rare in clinical settings [15], similar to the difficulty of acquiring paired high-resolution and low-resolution images for the super-resolution task [16, 17, 18]. In clinical practice, a measurement is typically repeated only when MA is evident, and repeating the measurement does not guarantee an MA-free image. Consequently, the availability of paired MA-corrupted and MA-free image sets is limited. Furthermore, patient movement during the imaging process can result in misalignment between images acquired in different series. These misaligned images are unsuitable for training, as using them can lead to highly blurred reconstructed images [17, 18]. Notably, existing image registration algorithms cannot effectively rectify such misalignments [17, 19].
To overcome the issue of limited paired training data, unsupervised learning methods have been leveraged, as they can be trained without paired images. The success of unsupervised learning in various computer vision tasks suggests possible solutions to the aforementioned problems [20, 21, 22, 23]. Unlike supervised methods, unsupervised learning can find hidden patterns or features in data without feedback from the ground truth and does not rely heavily on prior knowledge of the dataset. Specifically, several recent unsupervised learning methods have been applied to medical image processing and have shown promising results with unpaired training data on tasks similar to MAR, such as ISCL [24] and UIDnet [25] for medical image denoising in MRI and computed tomography (CT), ADN proposed by Liao et al. [26] for CT metal artifact reduction, and CycleGAN [27] proposed by Zhu et al. for image style transfer. These methods use the mechanism of domain transfer, in which they implicitly convert images from one domain to another. In an implicit conversion, the network learns the pattern of MA in the source domain and converts the features of the target domain to approach those of the source domain. The specific abnormal patterns in the target domain are not tackled directly, and thus the network could be distracted by other non-critical features.
As opposed to implicit methods, the proposed explicit method guides the network to focus on the MA pattern and extract it from the MA-corrupted images, resulting in a more powerful representation learning ability for motion artifacts. Several previous studies have preferred to extract essential patterns explicitly. Tamada et al. [28] utilized a neural network to estimate the artifacts from the input images. Rai et al. [29] used dictionary learning-based and residual learning-based methods to explicitly extract the noise from MRI/CT patches, and then preserved the noise characteristics by averaging them. Xiao et al. [30] achieved 3D MRI super-resolution by directly learning the residual volume between the input and target using a modified U-Net. Since MAR is a similar task, a motion artifact reduction network with explicit artifact extraction can be expected to achieve better performance than the implicit methods.
To address the problems mentioned above, we proposed an unsupervised MRI artifact reduction framework with explicit artifact extraction. Specifically, we regarded the motion artifacts in MRI as a separable abnormality independent of the image content, and applied a network to directly learn its representation from the MA-corrupted image, explicitly extracting the artifact residues. Motion artifact reduction was then achieved by simply subtracting the extracted artifacts from the MA-corrupted images. The model is equipped with cycle consistency, is trained with unpaired MA-free and MA-corrupted MR images, and produces high-quality MA-corrected MR images. The contributions of this work are summarized as follows:
We proposed an unsupervised abnormality extraction network (UNAEN) to extract and remove MA by learning deep feature differences between unpaired MA-free images and MA-corrupted images, avoiding the need for paired images, which are impractical to obtain in the real-world clinical environment.
Different from the existing methods, UNAEN aimed to explicitly extract abnormal MA information to improve the model's representation ability for motion artifacts, and removed the abnormal information from the images. As a result, the feature distribution of MA-reduced images approached that of the MA-free images.
Experimental results showed that, compared with other state-of-the-art unsupervised methods, our method obtained improved performance and generated images with superior quality.
2 RELATED WORK
2.1 Deep learning-based Motion Artifact Reduction
Because of the great success of deep learning in the field of computer vision, deep learning-based retrospective MAR schemes (especially those based on convolutional neural networks, CNNs) have been widely investigated. A CNN model can be trained with the MA-corrupted images as input and the MA-free images acquired from the same patient as ground truth. Johnson et al., among the pioneers in using deep learning for MA correction, applied a deep neural network (DNN) to reconstruct the MA-corrected MR image from the MA-corrupted K-space [13]. Han et al. proposed a denoising algorithm based on U-Net to remove the streak artifacts induced in radially acquired images [12]. Sommer et al. utilized a CNN to extract the MA-only image, which was obtained by subtracting the MA-free image from the MA-corrupted image, resulting in less deformation [14]. However, in most instances, it is very difficult to obtain paired MRI datasets for training neural networks. Although several motion simulation algorithms have been designed to solve this problem by generating synthetic MA-corrupted images from MA-free images using certain predefined movement patterns [31, 28, 32], such approaches may incur a domain gap between the real and synthetic MA-corrupted images. This domain gap would degrade the performance of a network that is trained on simulated data but applied to real data [16, 17, 18].
2.2 Unsupervised Image-to-Image Translation
Most low-level computer vision tasks, e.g., denoising, super-resolution, and MAR, can be considered image-to-image translation, which converts images from one domain to another. In recent years, training strategies based on unpaired images have attracted much attention. Deep Image Prior (DIP) [33] demonstrated that a randomly initialized network can serve as a hand-crafted prior for the image denoising task. However, its disadvantage is the high resource consumption of the iterative computation required for each image. Noise2Noise (N2N) [34] and Noise2Void (N2V) [35] used only noisy images to train a CNN denoiser. Although a satisfactory denoising effect can be achieved without noisy-clean image pairs, the global distribution of the noise is still required to choose the applicable loss functions. Recently, the generative adversarial network (GAN) [20] has shown great potential in image generation and representation learning, and many variations have been derived from it for different tasks. GCBD [36], proposed by Chen et al., illustrated that a GAN can be trained to estimate the noise distribution of noisy images. UIDnet [25] applied a conditional GAN (cGAN) [37] to generate clean-pseudo noisy pairs for training a denoising network. CycleGAN [27, 38] is a cyclic symmetric network consisting of two generators and two discriminators, which is mainly used for domain adaptation. Cycle-MedGANv2 [38] improved CycleGAN by introducing two new cyclic feature-based losses (the cycle-perceptual loss and the cycle-style loss) to ensure cycle consistency and was applied to the rigid MR motion artifact correction task. ISCL [24] added an extra network to CycleGAN to cooperate with the generators and estimate the noise distribution. By combining a generative model and a disentanglement network, ADN [26] constructed multiple encoders and decoders to separate the contents and artifacts in CT images and obtained results comparable to supervised learning. As the common basis of the methods mentioned above, the GAN is one of the most promising current techniques for handling the distribution of complex data, and studies of it have accumulated solid fundamental knowledge.
3 PROPOSED METHOD
3.1 Network Architecture
Inspired by the cycle consistency of CycleGAN [27], the UNAEN framework contains two modules: a forward module for artifact reduction and a backward module for artifact restoration, as shown in Fig. 1. The forward module incorporates an artifact extractor, denoted as $G_e$, which is responsible for learning the artifact distribution within the MA-corrupted MR images. In parallel, the backward module employs an artifact reconstructor, denoted as $G_r$, to restore the corresponding original input based on the output generated by the forward module. $G_e$ and $G_r$ are both generators of UNAEN. To train the generators, we employ $D_f$ and $D_b$ as discriminators in the forward and backward modules to distinguish between an MA-corrupted image and an MA-free image.
In the training process, unpaired images $\{(x_a, y)\}$ are used, where $x_a$ and $y$ represent the MA-corrupted image patch and MA-free image patch, respectively. The MA-corrupted MR image $x_a$ is fed into $G_e$ to extract the artifact map $G_e(x_a)$, which affects the texture information of the images. The forward module generates the corresponding MA-reduced image $x$ by subtracting $G_e(x_a)$ from $x_a$:
$$x = x_a - G_e(x_a) \tag{1}$$
$G_r$ is used to restore the generated $x$ and output the restored MA-corrupted image $\hat{x}_a$, ensuring that the forward module translates an instance $x_a$ into its counterpart $x$ rather than into an arbitrary instance:
$$\hat{x}_a = G_r(x) \tag{2}$$
There is a cycle consistency between $x_a$ and $\hat{x}_a$, and they are expected to be identical. Since $x$ and $y$ are unpaired and only have similar content, a forward discriminator $D_f$ is applied to distinguish the generated image $x$ from the real MA-free image $y$. To promote the reconstruction ability of $G_r$, we train a backward discriminator $D_b$ to distinguish between the original input $x_a$ and the restored MA-corrupted image $\hat{x}_a$. Therefore, the generators aim to generate samples that approximate the real data, while the discriminators aim not to be deceived by the output of the generators.
In this proposed framework, the generators and discriminators need to be trained alternately. In the inference step, only the trained $G_e$ is required. The MA-reduced images are obtained once the residual artifact maps are extracted by $G_e$ from the corresponding MA-corrupted inputs.
The structures of the generators and discriminators are shown in Fig. 2. The backbone of the generator is the Residual Channel Attention Network (RCAN) [39, 40] with a depth of 5 residual groups (RG), each of which has 5 residual channel attention blocks (RCAB). We set the number of feature channels to 64. It is worth mentioning that the long skip connection between the MA-corrupted image and the extracted MA map is not included in the structure of generator $G_e$. This differs from the general residual learning technique: residual learning focuses on the information that the source image is missing compared to the target image and supplements it, whereas UNAEN explicitly learns the redundant artifact components in the MA-corrupted image and then subtracts them. Hence, $G_e$ simply learns artifact representations. The discriminators are built with convolutional units, each consisting of a 3×3 convolutional layer and a leaky rectified linear unit (leaky ReLU) activation layer [41]. The size of the feature map is reduced by half after every two convolutional layers. All but the first unit have a batch normalization layer [42]. Similarly, the number of feature channels is set to 64 in the first convolutional layer of the discriminator and doubled after every two convolutional layers.
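The forward and backward passes described in this subsection can be illustrated with a minimal Python sketch. It is not the authors' implementation: the images are small nested lists, and `oracle_extractor` and `oracle_reconstructor` are hypothetical stand-ins for the trained artifact extractor and artifact reconstructor networks, chosen so that the subtraction step and the cycle can be checked by hand.

```python
# Toy sketch of one UNAEN cycle on nested-list "images" (values in [0, 1]).
# The extractor and reconstructor below are hypothetical stand-ins for the
# trained networks; they are not the real RCAN-based generators.

def subtract(a, b):
    """Pixel-wise a - b for two equally sized 2-D lists."""
    return [[pa - pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def add(a, b):
    """Pixel-wise a + b for two equally sized 2-D lists."""
    return [[pa + pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def forward_module(x_a, extract_artifact):
    """Explicit extraction: predict the artifact map, then subtract it."""
    artifact_map = extract_artifact(x_a)
    x = subtract(x_a, artifact_map)  # MA-reduced image
    return x, artifact_map


def backward_module(x, reconstruct):
    """Restore an MA-corrupted image from the MA-reduced image."""
    return reconstruct(x)


# Toy data: a clean patch plus a known stripe artifact.
clean = [[0.2, 0.4], [0.6, 0.8]]
stripe = [[0.1, 0.0], [0.1, 0.0]]
x_a = add(clean, stripe)


def oracle_extractor(image):
    """Stand-in extractor that already 'knows' the stripe (illustration only)."""
    return stripe


def oracle_reconstructor(image):
    """Stand-in reconstructor that puts the stripe back (illustration only)."""
    return add(image, stripe)


x, artifact_map = forward_module(x_a, oracle_extractor)
x_hat_a = backward_module(x, oracle_reconstructor)

# Cycle consistency: the restored image should match the original input.
cycle_error = max(abs(p - q)
                  for row_p, row_q in zip(x_hat_a, x_a)
                  for p, q in zip(row_p, row_q))
```

With these stand-ins the MA-reduced image equals the clean patch up to floating-point error and `cycle_error` is close to zero. At inference time only `forward_module` would be needed, which mirrors the statement that only the trained extractor is required.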
3.2 Loss Functions
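In this design, three loss functions are used for image restoration during training: the L1 loss, the Structural Similarity Index Measure (SSIM) loss [43, 44], and the adversarial loss:
$$\mathcal{L}_{1}(u, v) = \mathbb{E}\big[\lVert u - v \rVert_{1}\big] \tag{3}$$
$$\mathcal{L}_{SSIM}(u, v) = 1 - \mathrm{SSIM}(u, v) \tag{4}$$
$$\mathcal{L}_{adv}(G, D) = \mathbb{E}\big[(D(z_{G}) - 1)^{2}\big] \tag{5}$$
where $u$ and $v$ are the two images being compared, $G$ can be the artifact extractor $G_e$ or the artifact reconstructor $G_r$, $D$ is the corresponding discriminator, and $z_G$ is the image generated with $G$. SSIM is an indicator that quantifies the similarity between two digital images. Eq. (10) shows the calculation of SSIM. In addition, we use the least squares loss [45] as the adversarial loss in our model instead of the negative log-likelihood [20] to stabilize the training procedure.
To train $G_e$, the forward discriminator $D_f$ is used to classify the MA-reduced output $x = x_a - G_e(x_a)$ as an MA-free image. The adversarial loss function is as follows:
$$\mathcal{L}_{adv}(G_e, D_f) = \mathbb{E}_{x_a}\big[(D_f(x_a - G_e(x_a)) - 1)^{2}\big] \tag{6}$$
To train $G_r$, we use the backward discriminator $D_b$, which classifies the restored MA-corrupted result $\hat{x}_a = G_r(x)$ as the original MA-corrupted image $x_a$. The following adversarial loss function is used to train $G_r$:
$$\mathcal{L}_{adv}(G_r, D_b) = \mathbb{E}_{x}\big[(D_b(G_r(x)) - 1)^{2}\big] \tag{7}$$
Moreover, we adopt the cycle consistency loss to constrain the restoration performed by $G_r$. It is calculated as a weighted sum of the L1 loss and the SSIM loss between the inputs and the reconstructed images:
$$\mathcal{L}_{cyc}(G_e, G_r) = \mathcal{L}_{1}(\hat{x}_a, x_a) + \lambda_{SSIM}\,\mathcal{L}_{SSIM}(\hat{x}_a, x_a) \tag{8}$$
where $\lambda_{SSIM}$ is the weight of the SSIM loss. We set $\lambda_{SSIM}$ = 0.5 in our experiments.
The final objective function that optimizes the $G_e$ and $G_r$ networks can be represented as:
$$\mathcal{L}_{total} = \mathcal{L}_{cyc}(G_e, G_r) + \lambda_{e}\,\mathcal{L}_{adv}(G_e, D_f) + \lambda_{r}\,\mathcal{L}_{adv}(G_r, D_b) \tag{9}$$
where $\lambda_e$ and $\lambda_r$ are the weights of the adversarial losses of $G_e$ and $G_r$, respectively. Because $\mathcal{L}_{cyc}$ acts as the main training supervisor, driving consistent improvement in the network's performance and stabilizing the training process, $\lambda_e$ and $\lambda_r$ were empirically set to 0.1 to balance the loss components, which gave the most stable convergence and best performance in our experiments.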
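As a rough illustration of how the loss terms in this subsection combine, the following Python sketch computes a cycle consistency loss (L1 plus SSIM loss weighted by 0.5) and a generator objective with both adversarial weights set to 0.1, as stated in the text. Several details are assumptions rather than the authors' choices: SSIM is computed globally over the whole patch instead of over local windows, the SSIM loss is taken as one minus SSIM, the constants `c1` and `c2` use common defaults for images in [0, 1], and the discriminator scores are passed in as plain numbers. All function names are hypothetical.

```python
# Sketch of the generator objective: cycle loss + weighted adversarial losses.
# Assumptions: single-window (global) SSIM, SSIM loss = 1 - SSIM, images in [0, 1].

def _flatten(image):
    return [p for row in image for p in row]


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def l1_loss(u, v):
    """Mean absolute pixel difference between two images."""
    return _mean(abs(p - q) for p, q in zip(_flatten(u), _flatten(v)))


def global_ssim(u, v, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed once over the whole image (no sliding window)."""
    a, b = _flatten(u), _flatten(v)
    mu_a, mu_b = _mean(a), _mean(b)
    var_a = _mean((p - mu_a) ** 2 for p in a)
    var_b = _mean((q - mu_b) ** 2 for q in b)
    cov = _mean((p - mu_a) * (q - mu_b) for p, q in zip(a, b))
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def ls_adversarial_loss(scores):
    """Least squares generator loss: push discriminator scores toward 1."""
    return _mean((s - 1.0) ** 2 for s in scores)


def cycle_loss(restored, original, ssim_weight=0.5):
    """L1 loss plus weighted SSIM loss between restored and original inputs."""
    ssim_loss = 1.0 - global_ssim(restored, original)
    return l1_loss(restored, original) + ssim_weight * ssim_loss


def generator_objective(restored, original, forward_scores, backward_scores,
                        w_forward=0.1, w_backward=0.1):
    """Cycle loss plus the two adversarial terms with weights of 0.1."""
    return (cycle_loss(restored, original)
            + w_forward * ls_adversarial_loss(forward_scores)
            + w_backward * ls_adversarial_loss(backward_scores))
```

For a perfectly restored patch and discriminator scores of 1, every term vanishes. If both discriminators instead output 0, the objective equals 0.1 + 0.1 = 0.2, which shows how small the adversarial contribution is relative to the cycle term under these weights.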
3.3 Motion Simulation
We adopt the method proposed by Li et al. [46] to simulate motion in MR images. Real motion artifacts are simulated by splicing lines from multiple K-spaces. First, a group of images is generated from the original images by rotating them in specific directions and by specific angles, as done in [46]. The duration and frequency of motion for any movement pattern control the severity of the artifact. The original images and the generated images are transformed to K-space using the FFT, and K-space segments of the original image are replaced with segments from the generated images' K-spaces, according to the predefined rotation directions and angles. Finally, the corrupted K-space data is transformed back to the image domain by the inverse FFT (iFFT) to obtain the simulated MA-corrupted MR image. In the motion simulation process, we use the echo group (EG) as the minimum time unit, which contains a certain number of successive echoes, and the duration of any action is an integer multiple of EG. To simulate the motion of the patient's head, the original images are rotated 5 degrees to the left and to the right in the plane. Specifically, we use the K-space segments of the rotated images to periodically replace the K-space segments of the original image from the center lines to the edge lines.
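The K-space splicing idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the simulation code of [46]: it uses a naive standard-library DFT to stay dependency-free (in practice an FFT routine such as `numpy.fft.fft2` would be used), nearest-neighbour rotation, an unshifted K-space in which line 0 is the lowest frequency, and a caller-supplied mapping `moved_lines` from K-space line index to rotation direction. The function names and the example schedule are hypothetical.

```python
import cmath
import math


def dft_1d(seq, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t, v in enumerate(seq))
        out.append(acc / n if inverse else acc)
    return out


def dft_2d(image, inverse=False):
    """Separable 2-D DFT: transform the rows, then the columns."""
    rows = [dft_1d(row, inverse) for row in image]
    cols = [dft_1d(col, inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def rotate(image, degrees):
    """Nearest-neighbour in-plane rotation about the image centre."""
    n_rows, n_cols = len(image), len(image[0])
    cy, cx = (n_rows - 1) / 2, (n_cols - 1) / 2
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for col in range(n_cols):
            dy, dx = r - cy, col - cx
            src_r = round(cy + c * dy - s * dx)
            src_c = round(cx + s * dy + c * dx)
            if 0 <= src_r < n_rows and 0 <= src_c < n_cols:
                out[r][col] = image[src_r][src_c]
    return out


def simulate_motion(image, moved_lines, degrees=5.0):
    """Replace selected K-space lines with lines from rotated copies.

    moved_lines maps a K-space line index to +1 or -1 (rotation direction).
    """
    k_space = dft_2d(image)
    k_rotated = {+1: dft_2d(rotate(image, degrees)),
                 -1: dft_2d(rotate(image, -degrees))}
    for line, direction in moved_lines.items():
        k_space[line] = k_rotated[direction][line]
    restored = dft_2d(k_space, inverse=True)
    return [[abs(v) for v in row] for row in restored]  # magnitude image


# Hypothetical 16x16 test pattern and replacement schedule.
size = 16
ramp = [[(r + 2 * c) / (3 * (size - 1)) for c in range(size)] for r in range(size)]
corrupted = simulate_motion(ramp, {0: +1, 1: +1, 2: -1, 15: -1})
```

Passing an empty schedule returns the original image up to numerical error, while replacing every line reproduces the rotated image, so the severity of the simulated artifact is controlled entirely by which lines are replaced and how many.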
4 EXPERIMENTS AND RESULTS
4.1 Dataset Description
In this study, the fastMRI brain dataset [47, 48] was used to evaluate our method. It includes 6970 fully sampled brain MRIs (3001 at 1.5T and 3969 at 3T) collected at NYU Langone Health on Siemens scanners using T1-weighted, T2-weighted, and FLAIR acquisitions. Some of the T1-weighted acquisitions included administration of contrast agents. We randomly selected 5000 images from the T1-weighted slices acquired at 3T field strength. The matrix size of the images is 320×320.
We also conducted another experiment on the BraTS dataset [49, 50, 51], which consists of images from 369 participants with brain tumors. The contrast-enhanced T1w (T1CE) MR images from the BraTS dataset were chosen in this study. The T1CE images were acquired in the axial plane, with a matrix size of 240×240×155 and an isotropic resolution of 1.0 mm. We applied the same data processing flow and configurations as for fastMRI to simulate motion artifacts.
In our experiments, the slices without anatomical structures are discarded. All selected images are corrupted in K-space using the motion simulation algorithm (Section 3.3). Specifically, 1 EG contains 10 echoes, and the movement interval is set to 3 EG, 6 EG, and 9 EG, resulting in a K-space corrupted line ratio of 75%, 60%, and 50%, respectively. The dataset was then divided into training, validation, and test sets. The fastMRI dataset was split by image, while the BraTS dataset was split by patient. The unsupervised MAR method only requires unpaired MA-free MR images and MA-corrupted MR images, so we further divide the training set into two groups. One group contains only MA-free images as a learning target, while the other group contains only MA-corrupted images as input to the model. The validation set is used to monitor the networks' performance during training, and the test set is used to evaluate the networks after training. All images are normalized to the range of 0 to 1. To save computational resources, each image is cropped into 128×128 patches. After cropping, the numbers of training, validation, and test patches are 36000, 4500, and 4500 for fastMRI, and 48000, 6000, and 6000 for BraTS, respectively.
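The three corrupted line ratios can be reproduced with a short calculation. The text does not state how long each simulated motion lasts, so the sketch below rests on an assumption: motion-free intervals of 3, 6, or 9 EG alternate with motion periods of 9 EG. Under that hypothetical reading, the corrupted ratio is the motion duration divided by the length of one full period. The function name and the `motion_eg` default are illustrative only.

```python
def corrupted_ratio(interval_eg, motion_eg=9):
    """Fraction of K-space lines acquired during motion in one period.

    interval_eg: motion-free interval between movements, in echo groups (EG).
    motion_eg:   assumed duration of each movement in EG (not given in the text).
    """
    return motion_eg / (interval_eg + motion_eg)


ratios = {interval: corrupted_ratio(interval) for interval in (3, 6, 9)}
```

With a motion duration of 9 EG this gives 0.75, 0.60, and 0.50 for intervals of 3, 6, and 9 EG, which matches the reported 75%, 60%, and 50%. A shorter interval therefore means more frequent motion and more severe artifacts.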
4.2 Evaluation Metrics
In order to make a comprehensive comparison, we use SSIM and PSNR as the evaluation metrics in our experiments.
As mentioned in Section 3.2, SSIM can quantify the similarity of two images. It compares the brightness, contrast, and structure of the MA-reduced output $x$ and the ground truth $y$. The SSIM is in the range of [-1, 1], and a larger value represents better performance. It is defined as:
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \tag{10}$$
where $\mu$ and $\sigma$ denote the mean and standard deviation of the images, respectively ($\sigma_{xy}$ denotes the covariance of $x$ and $y$), and $c_1$ and $c_2$ are constants.
The PSNR is another widely employed image quality indicator, which represents the ratio between the maximum possible signal value and the interference noise value that affects the signal representation accuracy. It is usually measured in decibels (dB), and a higher value indicates lower distortion. PSNR can be calculated according to the following formulas:
$$\mathrm{PSNR}(x, y) = 10 \log_{10}\frac{\mathrm{MAX}^2}{\mathrm{MSE}(x, y)} \tag{11}$$
$$\mathrm{MSE}(x, y) = \frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2 \tag{12}$$
where $\mathrm{MAX}$ is the largest possible pixel value, $\mathrm{MSE}$ calculates the mean square error of two images, and $N$ is the number of pixels. It is difficult for human eyes to perceive the difference when PSNR exceeds 30 dB.
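A small Python sketch makes the PSNR definition concrete. It assumes images normalized to [0, 1], so the largest possible pixel value is 1.0, and it returns infinity for identical images, which is a convention chosen here rather than something stated in the text. The function names are illustrative.

```python
import math


def mse(x, y):
    """Mean square error between two equally sized 2-D images."""
    diffs = [(p - q) ** 2 for row_x, row_y in zip(x, y) for p, q in zip(row_x, row_y)]
    return sum(diffs) / len(diffs)


def psnr(x, y, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(x, y)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


reference = [[0.0, 0.0], [0.0, 0.0]]
noisy = [[0.1, 0.1], [0.1, 0.1]]
value_db = psnr(noisy, reference)
```

A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01 and therefore a PSNR of about 20 dB, well below the 30 dB level mentioned above.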
4.3 Implementation Details
All our experiments were implemented on a workstation with 64 GB RAM and two NVIDIA GeForce RTX 2080 Ti graphics cards. PyTorch 1.8.1 was used as the back end. Before each epoch of the training process, all MA-free and MA-corrupted image patches were shuffled. We trained our model for 50 epochs using the Adam optimizer, with the batch size set to 4. In each batch, the MA-free patches and the MA-corrupted patches fed to the networks were unpaired. The initial learning rate was set to 0.0001 and was halved every 10 epochs. The generators were trained twice every time the discriminators were trained.
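The training schedule described above can be summarized in a short Python sketch. It is a plain restatement of the stated hyperparameters, not the authors' training script: the zero-based epoch indexing and the order of discriminator and generator updates within an iteration are assumptions, and the function names are hypothetical. In PyTorch, a step decay of this kind is commonly configured with `torch.optim.lr_scheduler.StepLR`.

```python
def learning_rate(epoch, base_lr=1e-4, step=10, factor=0.5):
    """Learning rate for a zero-based epoch index under step decay."""
    return base_lr * factor ** (epoch // step)


def update_plan(num_iterations, generator_steps=2):
    """Update order per iteration: one discriminator step, then generator steps."""
    plan = []
    for _ in range(num_iterations):
        plan.append("D")
        plan.extend(["G"] * generator_steps)
    return plan


schedule = [learning_rate(epoch) for epoch in range(50)]
plan = update_plan(2)
```

Under these assumptions the learning rate starts at 1e-4, drops to 5e-5 at epoch 10, and ends at 6.25e-6 for the last ten epochs, while each discriminator update is followed by two generator updates.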
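Table 1: Ablation study on the fastMRI brain dataset.
| Methods | SSIM | PSNR (dB) |
|---|---|---|
| explicit w/ reconstructor | 0.9126 | 30.5387 |
| implicit w/ reconstructor | 0.9087 | 30.4296 |
| explicit w/o reconstructor | 0.9086 | 29.8300 |
| implicit w/o reconstructor | 0.9057 | 29.5269 |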
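Table 2: Comparison with state-of-the-art methods on the fastMRI and BraTS datasets for movement intervals of 3 EG, 6 EG, and 9 EG (PSNR in dB).
| Dataset | Methods | SSIM (3 EG) | PSNR (3 EG) | SSIM (6 EG) | PSNR (6 EG) | SSIM (9 EG) | PSNR (9 EG) |
|---|---|---|---|---|---|---|---|
| fastMRI | Before Reduction | 0.7981 | 26.6165 | 0.8824 | 30.4109 | 0.9225 | 33.4192 |
| fastMRI | UIDnet [25] | 0.8551 | 27.1392 | 0.9168 | 30.4248 | 0.9411 | 32.5677 |
| fastMRI | Cycle-MedGANv2 [38] | 0.8714 | 27.4449 | 0.9263 | 31.1473 | 0.9559 | 33.4017 |
| fastMRI | ISCL [24] | 0.8958 | 29.3085 | 0.9410 | 32.4944 | 0.9586 | 34.4717 |
| fastMRI | DR-CycleGAN [52] | 0.9066 | 30.4468 | 0.9484 | 33.2903 | 0.9621 | 34.7605 |
| fastMRI | UNAEN (Ours) | 0.9126 | 30.5387 | 0.9504 | 33.5448 | 0.9674 | 35.9265 |
| BraTS | Before Reduction | 0.7457 | 26.4940 | 0.8281 | 30.1854 | 0.8813 | 33.0922 |
| BraTS | UIDnet [25] | 0.7663 | 26.3967 | 0.8376 | 26.1034 | 0.9116 | 30.6354 |
| BraTS | Cycle-MedGANv2 [38] | 0.8613 | 25.9863 | 0.8610 | 28.1681 | 0.9684 | 34.1405 |
| BraTS | ISCL [24] | 0.8998 | 27.9572 | 0.9531 | 32.1735 | 0.9719 | 34.7336 |
| BraTS | DR-CycleGAN [52] | 0.9091 | 27.7273 | 0.9277 | 29.8630 | 0.9498 | 31.0975 |
| BraTS | UNAEN (Ours) | 0.9112 | 28.2383 | 0.9665 | 33.2732 | 0.9704 | 34.1600 |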
4.4 Illustration of Motion Artifact Extraction
Figure 3 depicts the motion artifact extractions with varying degrees of artifact severity. The (c) column shows the MA maps, which denote the error between the MA-free images in the (a) column and the corresponding MA-corrupted images in the (b) column. In the proposed method, we trained a residual channel attention network as the artifact extractor to explicitly extract residual artifact maps, which are visualized in the (d) column of Figure 3. The (e) column shows the restored images after MA reduction. The highly consistent patterns between the MA maps and extracted MA maps reveal the effectiveness of the explicit MA extraction. With a certain degree of consistency between the real MA and extracted MA, UNAEN achieved MRI MA reduction with comparable quality to MA-free images by simply subtracting the extracted MA from the MA-corrupted images.
4.5 Ablation Study
We verified the effectiveness of the explicit strategy and the artifact reconstructor with various network configurations on the fastMRI brain dataset. As shown in Table 1, the implicit methods generate MA-reduced images directly with the generator, without extracting the MA pattern, whereas the methods with the reconstructor constitute a one-cycle consistency in the network. The results show that the methods with the reconstructor achieve better metrics than those without it, with SSIM / PSNR up to 0.004 / 0.9027 dB higher, and that the explicit methods outperform the implicit ones. Besides, activating either the explicit strategy or the reconstructor alone raises the SSIM value to a comparable extent, while the reconstructor improves the PSNR value further. The combination of the explicit strategy and the reconstructor achieved the best results in the ablation experiments.
4.6 Comparison with the State-of-the-art Networks
Table 2 shows the comparison between the UNAEN and other SOTA models on two datasets with varying severity of MA. Lower SSIM and PSNR of the MA-corrupted images indicate higher severity of the MA.
The top half of Table 2 shows the experimental results on the fastMRI brain dataset. We observe that the proposed unsupervised model outperforms all compared unsupervised methods, with SSIM and PSNR values up to 0.0575 and 3.3995 dB higher than the other methods. Fig. 4 visualizes the artifact reduction effects of different models and shows the qualitative performance on three degrees of artifact severity by displaying the reduction results and the corresponding error heat maps compared to the ground truth. All five unsupervised methods we compared (UIDnet, Cycle-MedGANv2, ISCL, DR-CycleGAN, and UNAEN) successfully reduce the motion artifacts. UIDnet appears to have the weakest artifact reduction ability, and its outputs still retain significant artifact traces in the marginal region of the tissue. Similarly, Cycle-MedGANv2 generates blurry images even though it has a higher SSIM and PSNR than UIDnet. ISCL and DR-CycleGAN have improved artifact reduction performance and image quality. However, evident errors on the boundaries of distinct soft tissues are observed in their results, as shown in the error heat maps in Fig. 4. More details can be observed in Fig. 6. In contrast, UNAEN achieves higher metric values and smaller errors, and its performance gap over the other methods grows as artifact severity increases. In summary, UNAEN outperforms the other compared models in terms of overall image quality and feature details in the experiments on the fastMRI brain dataset.
The experimental results on the BraTS dataset are shown in the bottom half of Table 2. UNAEN achieves the best or second-best SSIM and PSNR, which are comparable to those of ISCL and better than those of all the other networks. In the qualitative comparison on the BraTS dataset visualized in Fig. 5, UIDnet and ISCL generate blurry images, while UNAEN, DR-CycleGAN, and Cycle-MedGANv2 generate clear ones. Specifically, DR-CycleGAN showed performance comparable to ISCL and close to UNAEN in the cases with high artifact severity, but did not maintain this advantage in mild cases, which was just the opposite of Cycle-MedGANv2. Both ISCL and UNAEN achieved high metrics on the BraTS dataset. ISCL sometimes achieved higher metric values on the BraTS dataset, particularly with mild motion artifact severity. However, the images restored by ISCL are blurry, with the detailed anatomical structures over-smoothed, as shown in Fig. 6. In contrast, the UNAEN-restored images show sharper boundaries between different tissues. This is also observed for the fastMRI dataset in the same figure. Therefore, higher metric values do not necessarily help clinical diagnosis from the perspective of visual effects.
Considering that patient movement in practice is not confined to a single plane, we further carried out experiments on inter-plane MRI motion artifact reduction. Specifically, we randomly selected the same number of images from the BraTS dataset as used in the previous in-plane experiments. Then, we simulated artifacts in the axial and sagittal planes with a movement interval of 9 EG. The artifact simulation strategy is the same as described in Section 3.3. The experimental results on the newly processed dataset are gathered in Table 3. We observe that Cycle-MedGANv2 obtains the lowest SSIM and DR-CycleGAN the lowest PSNR among the compared methods, while UNAEN maintains its advantage and achieves the best artifact reduction among these methods. To sum up, UNAEN shows overall superior performance for both in-plane and inter-plane MAR.
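Table 3: Inter-plane motion artifact reduction on the BraTS dataset (movement interval of 9 EG).
| Methods | SSIM | PSNR (dB) |
|---|---|---|
| Before Reduction | 0.7439 | 32.1020 |
| UIDnet [25] | 0.7698 | 31.4149 |
| Cycle-MedGANv2 [38] | 0.7637 | 30.4147 |
| ISCL [24] | 0.9045 | 31.5749 |
| DR-CycleGAN [52] | 0.8942 | 28.8926 |
| UNAEN (Ours) | 0.9107 | 31.6560 |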
To further compare the methods, we measured the floating point operations (FLOPs), the number of network parameters (Params), and the inference time (Time) of each to assess model efficiency. The results are shown in Table 4. The Time we report is the average inference time over 200 images with a matrix size of 128×128. We observe that UIDnet is the most efficient overall, with the second-lowest FLOPs (9.14G), the fewest Params (0.56M), and the fastest inference time (6.47 ms), although it performs the worst in motion artifact reduction. In contrast, UNAEN achieves the best motion artifact reduction performance at the cost of relatively high FLOPs and inference time.
| Methods | FLOPs | Params | Time |
|---|---|---|---|
| UIDnet [25] | 9.14G | 0.56M | 6.47 ms |
| Cycle-MedGANv2 [38] | 33.93G | 2.08M | 20.05 ms |
| ISCL [24] | 4.00G | 1.26M | 6.63 ms |
| DR-CycleGAN [52] | 17.58G | 11.14M | 9.48 ms |
| UNAEN (Ours) | 33.93G | 2.08M | 19.67 ms |
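As an illustration of how efficiency figures of this kind can be obtained, the sketch below counts the parameters and multiply-accumulate operations of a single 2-D convolution and averages the wall-clock time of a callable over a batch of inputs. All names (`conv2d_params`, `conv2d_macs`, `average_inference_ms`) and the warm-up count are hypothetical, and the stand-in `model` is an identity function rather than any of the compared networks. Note also that profiling tools differ on whether one multiply-accumulate counts as one or two FLOPs; the convention behind the reported values is not stated here.

```python
import time


def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Number of learnable parameters in one square-kernel 2-D convolution."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def conv2d_macs(in_channels, out_channels, kernel_size, out_height, out_width):
    """Multiply-accumulate operations of one 2-D convolution (bias ignored)."""
    per_output_pixel = kernel_size * kernel_size * in_channels * out_channels
    return per_output_pixel * out_height * out_width


def average_inference_ms(model, images, warmup=5):
    """Average wall-clock time per image in milliseconds.

    `model` is any callable; a few warm-up calls are made first so that
    one-off start-up costs do not dominate the measurement.
    """
    if not images:
        raise ValueError("at least one image is required")
    for image in images[:warmup]:
        model(image)
    start = time.perf_counter()
    for image in images:
        model(image)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(images)


# Hypothetical layer: 3x3 convolution, 64 -> 64 channels, 128x128 output.
layer_params = conv2d_params(64, 64, 3)
layer_macs = conv2d_macs(64, 64, 3, 128, 128)

# Timing protocol from the text: average over 200 inputs (identity stand-in).
mean_ms = average_inference_ms(lambda image: image, [0] * 200)
```

Summing these per-layer counts over a network gives its total Params and operation count; timings measured on a GPU additionally require device synchronization before reading the clock.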
5 DISCUSSION
Compared to the other methods, which rely on implicit domain transfer, UNAEN’s explicit artifact extraction approach has several advantages. UIDnet trains a cGAN [37] that adds artifacts to clean images in order to generate paired images, which are then used to train an artifact removal network under supervision. Because this training strategy is not end-to-end, errors accumulate across the two stages, which limits the ability to remove artifacts and results in the lowest SSIM and PSNR in the experiments. Consequently, significant artifact traces are retained in the image, which could lead to inaccurate surgery or therapy doses.
As another unsupervised network for domain transfer tasks, Cycle-MedGANv2 can transfer images between different styles. To generate a tighter mapping space, two symmetric generators are used to realize the implicit conversion between the MA-corrupted and the MA-free image domains. However, the experimental results demonstrate that UNAEN outperforms Cycle-MedGANv2 by a large margin, revealing the effectiveness of explicit artifact extraction over implicit domain transfer.
ISCL is a variation of CycleGAN that adds an extractor to cooperate with the generators. The generators are responsible for direct conversion between image domains, while the extractor extracts the artifacts. The experimental results in Fig. 6 and Table 2 show that ISCL can further improve the SSIM and PSNR values, but at the cost of image blurriness.
As another variation of CycleGAN, DR-CycleGAN is a network-intensive method that does not directly transfer images between domains. Instead, it uses dedicated encoders to extract artifact features and content features separately, and dedicated decoders restore the MA-reduced or original images from combinations of the extracted features. Although this strategy enhances the disentanglement of artifact and content in MRI, it has clear disadvantages. The larger number of networks makes training more complicated, and performance on data with weak artifact severity or with entangled artifacts is significantly limited, as shown in our experimental results on the BraTS dataset with =6 EG and =9 EG.
Unlike Cycle-MedGANv2, ISCL and DR-CycleGAN, UNAEN adopts only one cycle consistency; it is more stable in training than Cycle-MedGANv2 and generates clearer results than ISCL. Dropping the redundant training lets the model focus on the artifact removal process, while the explicit extraction strategy suits the artifact reduction task and improves the representation of artifacts. The experimental results demonstrate that these modifications successfully extract the residual artifacts from the MA-corrupted images and suppress the motion artifacts, yielding significantly improved metric values and higher-quality MA-reduced images.
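The explicit extraction idea with a single cycle consistency can be sketched schematically as follows. This is only an illustration of the data flow described in the paper (an extractor predicts a residual artifact map, the map is subtracted from the MA-corrupted input, and a reconstructor restores the input from the MA-reduced image). The function names, the toy extractor and reconstructor, and the use of a mean absolute error as the cycle term are assumptions; the actual networks and loss functions are defined in the method section and in the released code.

```python
def subtract(image, artifact_map):
    """Element-wise difference of two equally sized 2-D lists."""
    return [[pixel - artifact for pixel, artifact in zip(row, art_row)]
            for row, art_row in zip(image, artifact_map)]


def mean_abs_error(first, second):
    """Mean absolute difference between two equally sized 2-D lists."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(first, second):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def extraction_cycle(corrupted, extractor, reconstructor):
    """One forward pass: extract artifacts, remove them, restore the input."""
    artifact_map = extractor(corrupted)           # explicit residual artifact map
    reduced = subtract(corrupted, artifact_map)   # MA-reduced image
    restored = reconstructor(reduced)             # attempt to recover the input
    cycle_loss = mean_abs_error(corrupted, restored)  # assumed L1-style cycle term
    return reduced, restored, cycle_loss


# Toy stand-ins (hypothetical): the "artifact" is a constant offset of 0.5.
def toy_extractor(image):
    return [[0.5 for _ in row] for row in image]


def toy_reconstructor(image):
    return [[pixel + 0.5 for pixel in row] for row in image]


corrupted = [[1.0, 2.0], [3.0, 4.0]]
reduced, restored, cycle_loss = extraction_cycle(
    corrupted, toy_extractor, toy_reconstructor)
```

In this toy setting the reconstructor exactly undoes the extraction, so the cycle term is zero; in training, such a term would be minimized together with the adversarial objectives.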
UNAEN shows promising potential for correcting motion artifacts and thereby avoiding the misrepresentation of anatomical structures in the images. By reducing artifacts in MR images, UNAEN may ultimately contribute to better patient outcomes through more accurate diagnoses and treatments. In addition, the artifact extraction architecture of UNAEN could be generalized to other image quality improvement tasks, such as reducing other types of artifacts, deblurring, and denoising. These extensions will be investigated in our future work.
Despite the superior artifact reduction effect of UNAEN, we acknowledge some limitations. First, we generated brain MRI artifacts only through periodic motion patterns, whereas patient movement in real MRI measurements can be more complex and irregular. The performance of the proposed model when trained with authentic MA-corrupted and MA-free images remains to be investigated. Second, although UNAEN outperforms other state-of-the-art methods for MAR in terms of both quantitative metrics and visual quality, it is still an unsupervised learning-based method and may not match supervised methods on the evaluation metrics when paired training data are available. Therefore, the network needs further optimization to narrow the gap between these two learning paradigms.
6 CONCLUSION
In this paper, we proposed an improved GAN-based MRI motion artifact reduction network named UNAEN, which is trained in an unsupervised manner with unpaired MR images to circumvent the difficulty of obtaining paired MR images. UNAEN treats motion artifacts as a representable abnormality of MA-corrupted images and explicitly extracts and removes them without other transformations. This approach is superior to the previously used image-to-image translation methods and effectively removes artifact components from MA-corrupted images. Our experimental results show that UNAEN alleviates the problem of lacking paired MA-corrupted and MA-free images and achieves higher evaluation metrics and better visual quality than the baseline models considered. Therefore, the proposed unpaired deep learning scheme has the potential to revolutionize clinical applications of MR imaging.
References
- [1] M. Zaitsev, J. Maclaren, and M. Herbst, “Motion artifacts in MRI: A complex problem with many partial solutions,” Journal of Magnetic Resonance Imaging, vol. 42, no. 4, pp. 887–901, 2015.
- [2] A. Stadler, W. Schima, A. Ba-Ssalamah, J. Kettenbach, and E. Eisenhuber, “Artifacts in body MR imaging: their appearance and how to eliminate them,” European Radiology, vol. 17, pp. 1242–1255, 2007.
- [3] Z. Yang, C. Zhang, and L. Xie, “Sparse MRI for motion correction,” in 2013 IEEE 10th International Symposium on Biomedical Imaging, 2013.
- [4] P. Noël, R. Bammer, C. Reinhold, and M. A. Haider, “Parallel imaging artifacts in body magnetic resonance imaging,” Canadian Association of Radiologists Journal, vol. 60, no. 2, pp. 91–98, 2009.
- [5] N. White, C. Roddey, A. Shankaranarayanan, E. Han, D. Rettmann, J. Santos, J. Kuperman, and A. Dale, “PROMO: Real-time prospective motion correction in MRI using image-based tracking,” Magnetic Resonance in Medicine, vol. 63, no. 1, pp. 91–105, 2010.
- [6] M. B. Ooi, S. Krueger, W. J. Thomas, S. V. Swaminathan, and T. R. Brown, “Prospective real-time correction for arbitrary head motion using active markers,” Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, vol. 62, no. 4, pp. 943–954, 2009.
- [7] D. Atkinson, D. Hill, P. Stoyle, P. Summers, and S. Keevil, “Automatic correction of motion artifacts in magnetic resonance images using an entropy focus criterion,” IEEE Transactions on Medical Imaging, vol. 16, no. 6, pp. 903–910, 1997.
- [8] P. Batchelor, D. Atkinson, P. Irarrazaval, D. Hill, J. Hajnal, and D. Larkman, “Matrix description of general motion correction applied to multishot images,” Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, vol. 54, no. 5, pp. 1273–1280, 2005.
- [9] Z. W. Fu, Y. Wang, R. C. Grimm, P. J. Rossman, J. P. Felmlee, S. J. Riederer, and R. L. Ehman, “Orbital navigator echoes for motion measurements in magnetic resonance imaging,” Magnetic Resonance in Medicine, vol. 34, no. 5, pp. 746–753, 1995.
- [10] K. P. McGee, J. P. Felmlee, A. Manduca, S. J. Riederer, and R. L. Ehman, “Rapid autocorrection using prescan navigator echoes,” Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, vol. 43, no. 4, pp. 583–588, 2000.
- [11] J. Maclaren, B. S. Armstrong, R. T. Barrows, K. Danishad, T. Ernst, C. L. Foster, K. Gumus, M. Herbst, I. Y. Kadashevich, T. P. Kusik et al., “Measurement and correction of microscopic head motion during magnetic resonance imaging of the brain,” PLoS ONE, vol. 7, no. 11, p. e48088, 2012.
- [12] Y. Han, J. Yoo, H. H. Kim, H. J. Shin, K. Sung, and J. C. Ye, “Deep learning with domain adaptation for accelerated projection-reconstruction MR,” Magnetic Resonance in Medicine, vol. 80, no. 3, pp. 1189–1205, 2018.
- [13] P. M. Johnson and M. Drangova, “Motion correction in MRI using deep learning,” in Proceedings of the ISMRM Scientific Meeting & Exhibition, Paris, vol. 4098, 2018, pp. 1–4.
- [14] K. Sommer, T. Brosch, R. Wiemker, T. Harder, A. Saalbach, C. S. Hall, and J. B. Andre, “Correction of motion artifacts using a multi-resolution fully convolutional neural network,” in Proceedings of the 26th Annual Meeting of ISMRM, Paris, France Abstract, vol. 1175, 2018.
- [15] G. Oh, J. E. Lee, and J. C. Ye, “Unpaired MR motion artifact deep learning using outlier-rejecting bootstrap aggregation,” IEEE Transactions on Medical Imaging, vol. 40, no. 11, pp. 3125–3139, 2021.
- [16] S. Laguna, R. Schleicher, B. Billot, P. Schaefer, B. McKaig, J. N. Goldstein, K. N. Sheth, M. S. Rosen, W. T. Kimberly, and J. E. Iglesias, “Super-resolution of portable low-field MRI in real scenarios: integration with denoising and domain adaptation,” in Medical Imaging with Deep Learning, 2022.
- [17] J. Liu, H. Li, T. Huang, E. Ahn, K. Han, A. Razi, W. Xiang, J. Kim, and D. D. Feng, “Unsupervised representation learning for 3-dimensional magnetic resonance imaging super-resolution with degradation adaptation,” IEEE Transactions on Artificial Intelligence, pp. 1–14, 2024, doi: 10.1109/TAI.2024.3397292.
- [18] H. Zhou, Y. Huang, Y. Li, Y. Zhou, and Y. Zheng, “Blind super-resolution of 3D MRI via unsupervised domain transformation,” IEEE Journal of Biomedical and Health Informatics, vol. 27, no. 3, pp. 1409–1418, 2023.
- [19] C. Komninos, T. Pissas, B. Flores, E. Bloch, T. Vercauteren, S. Ourselin, L. Da Cruz, and C. Bergeles, “Intra-operative OCT (iOCT) image quality enhancement: a super-resolution approach using high quality iOCT 3D scans,” in Ophthalmic Medical Image Analysis: 8th International Workshop, OMIA 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceedings 8. Springer, 2021, pp. 21–31.
- [20] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.
- [21] D. P. Kingma and M. Welling, “Auto-Encoding Variational Bayes,” arXiv e-prints, p. arXiv:1312.6114, Dec. 2013.
- [22] A. Van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves et al., “Conditional image generation with PixelCNN decoders,” Advances in Neural Information Processing Systems, vol. 29, 2016.
- [23] L. Dinh, D. Krueger, and Y. Bengio, “NICE: Non-linear Independent Components Estimation,” arXiv e-prints, p. arXiv:1410.8516, Oct. 2014.
- [24] K. Lee and W.-K. Jeong, “ISCL: Interdependent self-cooperative learning for unpaired image denoising,” IEEE Transactions on Medical Imaging, vol. 40, no. 11, pp. 3238–3248, 2021.
- [25] Z. Hong, F. Xiaochen, T. Jiang, and J. Feng, “End-to-end unpaired image denoising with conditional adversarial networks,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 4140–4149, Apr. 2020.
- [26] H. Liao, W.-A. Lin, S. K. Zhou, and J. Luo, “ADN: Artifact disentanglement network for unsupervised metal artifact reduction,” IEEE Transactions on Medical Imaging, vol. 39, no. 3, pp. 634–643, 2020.
- [27] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
- [28] D. Tamada, M.-L. Kromrey, S. Ichikawa, H. Onishi, and U. Motosugi, “Motion artifact reduction using a convolutional neural network for dynamic contrast enhanced MR imaging of the liver,” Magnetic Resonance in Medical Sciences, vol. 19, no. 1, pp. 64–76, 2020.
- [29] S. Rai, J. S. Bhatt, and S. K. Patra, “An unsupervised deep learning framework for medical image denoising,” arXiv preprint arXiv:2103.06575, 2021.
- [30] J. Xiao, Z. Li, B. Bilgic, J. R. Polimeni, S. Huang, and Q. Tian, “SRNR: Training neural networks for super-resolution MRI using noisy high-resolution reference data,” arXiv preprint arXiv:2211.05360, 2022.
- [31] K. Pawar, Z. Chen, N. J. Shah, and G. F. Egan, “Suppressing motion artefacts in MRI using an Inception-ResNet network with motion simulation augmentation,” NMR in Biomedicine, vol. 35, no. 4, p. e4225, 2022, e4225 NBM-19-0154.R2.
- [32] M. W. Haskell, S. F. Cauley, B. Bilgic, J. Hossbach, D. N. Splitthoff, J. Pfeuffer, K. Setsompop, and L. L. Wald, “Network accelerated motion estimation and reduction (NAMER): Convolutional neural network guided retrospective motion correction using a separable motion model,” Magnetic Resonance in Medicine, vol. 82, no. 4, pp. 1452–1461, 2019.
- [33] D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9446–9454.
- [34] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila, “Noise2Noise: Learning Image Restoration without Clean Data,” arXiv e-prints, p. arXiv:1803.04189, Mar. 2018.
- [35] A. Krull, T.-O. Buchholz, and F. Jug, “Noise2Void - learning denoising from single noisy images,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2124–2132.
- [36] J. Chen, J. Chen, H. Chao, and M. Yang, “Image blind denoising with generative adversarial network based noise modeling,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 3155–3164.
- [37] M. Mirza and S. Osindero, “Conditional Generative Adversarial Nets,” arXiv e-prints, p. arXiv:1411.1784, Nov. 2014.
- [38] K. Armanious, A. Tanwar, S. Abdulatif, T. Küstner, S. Gatidis, and B. Yang, “Unsupervised adversarial correction of rigid MR motion artifacts,” in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, 2020, pp. 1494–1498.
- [39] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image super-resolution using very deep residual channel attention networks,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 286–301.
- [40] Z. Lin, P. Garg, A. Banerjee, S. A. Magid, D. Sun, Y. Zhang, L. Van Gool, D. Wei, and H. Pfister, “Revisiting RCAN: Improved training for image super-resolution,” arXiv preprint arXiv:2201.11279, 2022.
- [41] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034.
- [42] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning. PMLR, 2015, pp. 448–456.
- [43] E. M. Masutani, N. Bahrami, and A. Hsiao, “Deep learning single-frame and multiframe super-resolution for cardiac MRI,” Radiology, vol. 295, no. 3, pp. 552–561, 2020, PMID: 32286192.
- [44] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
- [45] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley, “Least squares generative adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2794–2802.
- [46] H. Li and J. Liu, “3D High-Quality Magnetic Resonance Image Restoration in Clinics Using Deep Learning,” arXiv e-prints, p. arXiv:2111.14259, Nov. 2021.
- [47] F. Knoll, J. Zbontar, A. Sriram, M. J. Muckley, M. Bruno, A. Defazio, M. Parente, K. J. Geras, J. Katsnelson, H. Chandarana, Z. Zhang, M. Drozdzal, A. Romero, M. Rabbat, P. Vincent, J. Pinkerton, D. Wang, N. Yakubova, E. Owens, C. L. Zitnick, M. P. Recht, D. K. Sodickson, and Y. W. Lui, “fastMRI: A publicly available raw k-space and DICOM dataset of knee images for accelerated MR image reconstruction using machine learning,” Radiology: Artificial Intelligence, vol. 2, no. 1, p. e190007, 2020, PMID: 32076662.
- [48] J. Zbontar, F. Knoll, A. Sriram, T. Murrell, Z. Huang, M. J. Muckley, A. Defazio, R. Stern, P. Johnson, M. Bruno, M. Parente, K. J. Geras, J. Katsnelson, H. Chandarana, Z. Zhang, M. Drozdzal, A. Romero, M. Rabbat, P. Vincent, N. Yakubova, J. Pinkerton, D. Wang, E. Owens, C. L. Zitnick, M. P. Recht, D. K. Sodickson, and Y. W. Lui, “fastMRI: An Open Dataset and Benchmarks for Accelerated MRI,” arXiv e-prints, p. arXiv:1811.08839, Nov. 2018.
- [49] B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby et al., “The multimodal brain tumor image segmentation benchmark (BRATS),” IEEE Transactions on Medical Imaging, vol. 34, no. 10, pp. 1993–2024, 2015.
- [50] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. S. Kirby, J. B. Freymann, K. Farahani, and C. Davatzikos, “Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features,” Scientific Data, vol. 4, no. 1, pp. 1–13, 2017.
- [51] S. Bakas, M. Reyes, A. Jakab, S. Bauer, M. Rempfler, A. Crimi, R. T. Shinohara, C. Berger, S. M. Ha, M. Rozycki et al., “Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge,” arXiv preprint arXiv:1811.02629, 2018.
- [52] F. Pan, Q. Fan, H. Xie, C. Bai, Z. Zhang, H. Chen, L. Yang, X. Zhou, Q. Bao, and C. Liu, “Correction of arterial-phase motion artifacts in gadoxetic acid-enhanced liver MRI using an innovative unsupervised network,” Bioengineering, vol. 10, no. 10, p. 1192, 2023.