HIGH-THROUGHPUT PREPROCESSING PIPELINE FOR RESTING-STATE FMRI

Resting-state functional magnetic resonance imaging (fMRI) examines functional connectivity between brain regions by measuring spontaneous ﬂuctuation of blood-oxygen-level dependent (BOLD) signal while subjects are at rest. Recent studies have revealed that neurodegenerative diseases are associated with abnormalities in resting-state functional connectivity. This thesis work aimed to develop a high-throughput preprocessing pipeline for minimizing spatial and temporal artifacts in resting-state fMRI data. The building blocks and cluster computing capabilities of the pipeline are discussed in detail. In an eﬀort to validate this pipeline, a seed-based analysis was performed on preprocessed data using 3 seeds placed in the posterior cingulate cortex (PCC), supplementary motor area (SMA) and inferior parietal sulcus (IPS). These 3 seeds represent the core components of the default mode network (DMN) and task-positive network (TPN), two of the most commonly examined resting-state networks. Consistent with literature, results indicate that activity in the PCC is functionally correlated to regions of the DMN and anti-correlated to regions of the TPN, whereas activities in the SMA and IPS are functionally correlated to regions of the TPN and anti-correlated to regions of the DMN. These functional connectivity patterns were consistent across scans on both group-level and subject-level. This preprocessing pipeline will support researchers and clinical collaborators of Medical Image Analysis Lab (MIAL) in their analysis of resting-state functional connectivity.


Background and Motivation
Functional Magnetic Resonance Imaging (fMRI) is traditionally used to study brain regions that are active when a task is performed. The experimental design in task-based fMRI typically involves a sequence of "rest" and "task" periods. During "task" period, subjects are asked to perform a certain task or they may be presented with some stimulus.
The "rest" period acts as a control or baseline and subjects are simply asked to be still and let their mind wander. The signal at rest is initially thought to be random. It is only recently that the study of resting-state, a constant condition without any task or stimulus, begins to emerge.
In 1990s, Bharat Biswal, then a PhD student at the Medical College of Wisconsin in Milwaukee, tried to investigate noise sources in the brain by running resting-state scans, in the hope that it would improve interpretations of the signals from tasks. But when he analyzed the data, he discovered that the left and right sensorimotor cortices had significant correlation during rest (Biswal, 2012). This implied that the brain maintained some meaningful functional activity even when subjects were not cognitively engaged. Although his work was not well received in the beginning, it demonstrated for the first time the phenomenon, in which fMRI signal collected during rest, showed functional connectivity. Functional connectivity is defined as statistical dependence between spatially remote neurophysiological events (Friston et al., 1993).
Several years after Biswal's discovery, Marcus Raichle, a neuroscientist at Washington University in St. Louis, Missouri, discovered a group of brain regions that showed temporal correlation during rest. Using positron electron tomography (PET), he noticed that the activity in certain brain regions was reduced during the performance of attention-demanding tasks. Surprisingly, these activities increased again during rest when the tasks were completed. This suggested that the brain maintained some default activity in the absence of external tasks (Raichle, 2015). This group of brain regions was later named the default mode network (DMN). Using resting-state fMRI, Michael Greicius replicated the same network by placing a seed region of interest in the posterior cingulate cortex (Greicius et al., 2003). His work demonstrated the significance of resting-state fMRI and the discovery of other resting-state networks soon followed. As shown in Figure   1.1, high level of functional connectivity between regions of known primary visual network, auditory network and higher order cognitive network has now been documented Yeo et al., 2011). By 2008, a proliferation of resting-state studies was underway. Still, the underlying mechanism of resting-state networks remains largely unclear. The detection of resting-state networks across different animal species and under various anesthesia presents a strong possibility that there is an underlying neurophysiological mechanism (Vincent et al., 2007;Shmuel and Leopold, 2008). This is supported by study that showed correlation between resting-state networks and real neural activity using simultaneous fMRI and electroencephalography (EEG) (Laufs et al., 2003). Resting-state activity is thought to play a crucial role in the development and maintenance of synaptic connections (Pizoli et al., 2011). It may even consolidate memory or prime the responses to future stimuli (Albert et al., 2009). From an energy cost perspective, the amount of energy devoted to resting-state is a compelling indication that intrinsic brain activity is of importance. The resting brain represents only 2% of the body weight, yet it accounts for 20% of the body's energy consumption (Raichle and Mintun, 2006). Task-related increases, on the other hand, only accounts for about 5% of the brains total energy consumption. Considering that resting-state consumes the vast majority of the brain's resources, resting-state fMRI may provide a more complete view of functional connectivity, whereas task-based fMRI may underestimate the size and number of functionally-linked areas (Xiong, 1998). This also means that when studying neurodegenerative diseases, resting-state fMRI may show a richer source of disease-related changes.
The simplicity of resting-state fMRI data acquisition has catalyzed its application to clinical populations. Unlike task-based fMRI, resting-state fMRI may be performed on patients with limited physical or mental ability. Resting-state fMRI has been used to identify motor areas for presurgical planning in patients with tumors affecting sensorimotor regions (Zhang et al., 2009). The high spatial resolution of resting-state fMRI has been advantageous in mapping epileptic networks for presurgical planning in epilepsy patients (Liu et al., 2009). Studies have demonstrated that resting-state functional connectivity measures may be used as biomarker to identify neurodegenerative diseases (Fox and Greicius, 2010). Using graph analysis and clustering coefficients, Alzheimer's disease (AD) patients were separated from healthy controls with a sensitivity of 72% and a specificity of 78% (Supekar et al., 2008). Resting-state functional connectivity measure in AD patients have been further shown to correlate with a clinical measure, the Mini-Mental Status Examination score (Li et al., 2002). The clinical application of resting-state functional connectivity has been the most rapidly developing area of resting-state research in the past decade. Advanced signal processing software, capable of handling large-scale dataset, is needed to fully realize the potential of resting-state fMRI in the clinical realm.

Basic MR Physics
Magnetic resonance imaging (MRI) is a non-invasive imaging modality that uses strong magnetic field and electromagnetic waves to generate three-dimensional images. Hydrogen protons can be viewed as small magnets that possess angular momentum or nuclear spin.
When exposed to a strong static magnetic field B 0 , randomly oriented hydrogen protons align with B 0 in two ways: parallel and anti-parallel. These two states of alignment are on different energy level. As there are more protons on the lower energy level, parallel to B 0 , a longitudinal magetization is created. The protons precess around the magnetic field lines at the Larmor frequency ω 0 which can be calculated using the Larmor equation: where γ is the gyromagnetic ratio.
When a radio frequency (RF) excitation pulse oscillating at the Larmor frequency is applied, energy is transferred from the RF wave to the protons. This phenomenon is called resonance. The RF pulse causes some protons to transfer to the higher energy state and align anti-parallel to B 0 , thus reducing the longitudinal magnetization. At the same time, it synchronizes the protons so that they precess in phase, creating a transversal magnetization. After the RF pulse is switched off, protons start to precess out of phase and begin to return to their lower state of energy. So the transversal magnetization decreases and the longitudinal magnetization increases again.
The time it takes for the longitudinal magnetization to recover 63% of its equilibrium magnetization is described by the longitudinal relaxation time T 1 . T 1 is also called the spin-lattice relaxation time because T 1 -relaxation is influenced by the exchange of thermal energy between the protons and the surrounding or lattice. The transversal relaxation time, T 2 , describes the time it takes for 63% of the transverse magnetization to decay. It is also called the spin-spin relaxation time because T 2 -relaxation is influenced by inhomogeneity of the external magnetic field and inhomogeneity of local magnetic fields within the tissues. Both T 1 and T 2 vary with tissue types, giving rise to tissue contrasts in images.
A 180 • RF pulse can be applied to refocus the dephasing protons. This technique is called the spin-echo technique. In gradient-echo technique, a gradient magnetic field is applied in addition to the 180 • RF pulse. After some time, the previously dephasing protons start to precess in phase, resulting in a stronger transverse magnetization and creating an echo or the MR signal. The protons then dephase again and can be refocused by another 180 • pulse and so on, each time creating a smaller echo. The time it takes for the spins to precess in phase again is called the echo time TE. And the time between two excitation pulses is called the time to repeat or TR. T 1 -weighted image can be acquired using short TR and short TE. Conversely, T 2 -weighted image can be acquired using long TR and long TE.
To examine a specific section or slice of the brain, a gradient field is superimposed on the external field. This gradient field causes protons in different slices to precess at different frequencies. By changing the frequencies of the RF pulse, different slices can be imaged. Two additional gradients, the frequency encoding and phase-encoding gradient, are used to determine the point in a slice from which the MR signal originates. After the slice-selection gradient is applied, the frequency-encoding gradient is applied along an axis of the slice, resulting in different precession frequencies along the axis. The phase-encoding gradient is then applied to alter the precession frequencies along the other axis briefly so that the protons precess at different phase angles when the phase-encoding gradient is turned off. Fourier transformation can be used to reconstruct the image based on the frequency and phase information.

BOLD fMRI
The most common method for performing fMRI uses the blood-oxygenation-level dependent (BOLD) contrast. BOLD imaging takes advantage of the differences in magnetic susceptibility between oxygenated and deoxygenated haemoglobin.
Deoxygenated haemoglobin is paramagnetic and it has unpaired electron(s) that create small distortion in local magnetic field within the tissues. Higher concentration of deoxygenated haemoglobin leads to faster decay of the transverse magnetization and thus reduces MR signal intensity. This effect is called the T 2 * -effect, which is similar to T 2 -effect, but with additional dependence on inhomogeneity in local magnetic field caused by changes in blood oxygenation.
It is well established that neuronal activity in the brain is coupled with oxygen metabolism (cerebral metabolic rate of oxygen, CMRO 2 ) and cerebral blood flow (CBF).
The coupling of metabolic and vascular events during neural activation is illustrated in  BOLD fMRI is an indirect measure of neural activity. The signal time course associated with a task or spontaneous brain activity is called haemodynamic response. A typical haemodynamic response is shown in Figure 1.3. There is an initial dip in signal intensity following neuronal activation and oxygen metabolism. The effect of cerebral blood flow then results in a positive BOLD response that lasts for about 5 to 10 seconds.
Changes in cerebral blood volume (CBV) are slow. The higher blood volume leads to increased concentration of deoxygenated haemoglobin, resulting in a signal undershoot. The haemodynamic response measured by BOLD fMRI depends on oxygenation level, blood flow and blood volume. It is much slower than the underlying neural processes. Thus, the temporal resolution of fMRI is typically on the order of seconds. This is low compared with other functional imaging modalities that have millisecond temporal resolution, such as EEG and magnetoencephalography (MEG). The primary benefit of fMRI is that it provides relatively high spatial resolution compared with PET, EEG and MEG. The spatial resolution of fMRI is typically on the order of 3 -4 mm.

Artifacts
As with most imaging modalities, fMRI data can be corrupted by artifacts. The main source of artifacts in fMRI originates from head motion, physiological noise and hardware limitations. When head motion is present, the measured intensity at a given voxel may be composed of values originating from different brain locations. To reduce motion artifacts, fMRI volumes are realigned using a rigid body transformation. This method, however, does not correct for spin history artifacts in which a particular slice no longer correspond to the same part of brain but correspond to another part that is at different degree of excitation. Spin history artifacts can cause non-linear distortions in signal intensity that are difficult to eliminate. Study has shown that even with perfect spatial realignment, head motion can bias correlations between brain regions differentially depending on their distance (Power et al., 2012). It is recommended that motion regressors be included in models of the BOLD response to correct for these artifacts.
Physiological noise is primarily driven by cardiac and respiratory cycles. It is the dominating noise source in high field fMRI. Low frequency fluctuations due to vascular and metabolic noise (<0.01 Hz) can be removed using high-pass filter. Cardiac noise (∼0.3 Hz) and respiration noise (∼1.0 Hz), are more difficult to remove because fMRI typically has long TR so these frequencies alias into lower frequencies. These artifacts can lead to reduced sensitivity and higher inter-subject variability (Birn, 2012). However, cardiac and respiration noise correction is often omitted because existing toolboxes either lack robust automated correction or require independent physiological measurements such as Electrocardiogram (ECG) and pulse oximetry.
Another dominating noise is low frequency drift caused by scanner instabilities. This is commonly removed by applying high-pass filtering. Thermal motion of electrons within the scanner can also introduce artifacts. Since thermal noise tends to be highly random across space and time, its effect can be minimized by performing spatial smoothing. The presence of artifacts in fMRI data necessitates effective preprocessing techniques to ensure the validity and power of subsequent statistical analysis. This is especially true for resting-state fMRI because resting-state functional connectivity is often derived from temporal correlations between brain regions. Inadequate preprocessing can result in spurious correlations and false inferences.

Thesis Objectives
The goal of this thesis project is to develop a preprocessing pipeline, capable of removing artifacts while retaining useful neuronal signals in resting-state fMRI, for researchers at Medical Image Analysis Lab (MIAL) and their clinical collaborators. The preprocessing pipeline will be integrated with Cloud Engine Resource for Accelerated Medial Image Computing for Clinical Applications (CERAMICCA) platform to provide distributed cluster computing and web user interface. This pipeline will facilitate future undertaking of big data studies aimed at analyzing large-scale fMRI data in the context of various neurological disorders.

Overview
The preprocessing pipeline can be divided into 3 components, namely structural,

Discard Initial Volumes
fMRI images are first converted to NIfTI format, float datatype and LAS orientation. The next step is to discard initial volumes that have been acquired before steady-state magnetization is reached. Magnetic gradients typically takes a few seconds to completely stabilize and subjects may need some time to get accustomed to scanner noise. Thus, the first 10 seconds of acquisitions are usually discarded (Soares et al., 2016). User must specify the number of volumes to discard, otherwise all volumes are retained.

Despiking
Spikes in fMRI images are removed using AFNI's 3dDespike. This method fits a smooth curveỹ to the timeseries y of each voxel using L 1 -norm and computes the residuals r.  removed and replaced with a more moderate value. Despiking has been shown to improve realignment of fMRI images (Jo et al., 2013). Also, despiking helps prevent ringing artifact that can be introduced by applying temporal filter on spike-contaminated fMRI data.

Slice Time Correction
Slice time correction is performed to temporally shift the time series of each voxel so that all slices are aligned to a common temporal origin. fMRI images are usually acquired as 2D images, slice by slice in either sequential or interleave manner. This means that adjacent parts of the brain are acquired at different times, causing slice-timing problem.
As illustrated in Figure 2  recommended that motion correction be performed before slice time correction; for subject with low to moderate movement, the preference is to perform motion correction after slice time correction. Regardless of the order, it is recommended that the order be kept consistent for all subjects in the same study (Sladky et al., 2011). An alternative method that performs slice time correction and motion correction simultaneously through four-dimensional realignments have been proposed. This method can potentially overcome the limitation of serial correction and properly compensate for the interactions between time shifts and movement (Roche, 2011). However, this is not yet implemented in major fMRI preprocessing packages.
The impact of slice time correction on resting-state fMRI is relatively minimal when compared with event-related and task-based fMRI. Users may choose to perform slice time correction by providing slice order information. Slice time correction is performed using FSL's slicetimer which takes the middle time point as reference. The timeseries of each voxel is shifted using a Hanning-windowed sinc interpolation so that all slices appear to be acquired at the same time as the reference slice. Taking the middle time point as reference has the advantage of reducing interpolation error because the maximum time shift introduced is T R 2 , compared to a maximum time shift of TR if the first time point is used as reference.

Motion Correction
To correct for spatial artifacts caused by head motion, each fMRI volume is aligned to a reference volume. Traditionally, the first or middle volume is arbitrarily selected as the reference volume. There is no consensus on which choice of reference volume is best. Here, the volume with the least amount of outliers is considered to be most stable and is thus selected as the reference volume. To calculate the number of outliers at each time point, AFNI's 3dToutcount is used to fit a trend to the time series of each voxel and compute the MAD of the residuals. Data points that are far away from the trend are considered as outliers. For every volume, the fraction of brain voxels that are considered as outliers is computed. The volume with the least amount of outlier is chosen as the reference volume.
Then FSL's mcflirt is used to perform rigid registration with 6 degrees of freedom.
In Figure 2.4, a reference volume is shown in red outline, overlaid on a volume that is acquired at a different time. Clearly, motion correction restores the precise spatial correspondence between voxels and anatomical regions that over time gets destroyed by head motion. Motion correction also provides an estimation of rotational and translational displacements in x, y and z axes. These motion parameters can later be used in nuisance regression to remove residual motion-induced fluctuations that are still present in motion-corrected images.
where ∆r and ∆t denote the backward temporal derivatives of rotational and translational displacements. Note that rotational displacements are converted from degrees to millimeters by calculating displacement on the surface of a sphere of radius 50 mm. DVARS is computed using signal intensity according to the following equation: where N represents the total number of brain voxels and I i (x, y, z) is the signal intensity at brain voxel (x, y, z) at time point i. A threshold of 0.5 mm and 0.5% are applied to FD and DVARS respectively, in order to help detect time points that have been corrupted by large motion. Identified time points can be used for motion scrubbing, a technique that censors out bad time points. Due to lack of standardization of FD and DVARS threshold, automatic motion scrubbing is not implemented. FD and DVARS are provided only for quality control.

Coregistration
After motion correction, the reference volume is coregistered with structural MRI using FSL's boundary-based registration (BBR). BBR uses white matter mask from T 1 -weighted image to create white matter boundaries that are then mapped to fMRI image using rigid registration with 6 degrees of freedom. Along each point on the white matter boundary, the difference between a pair of intensity at 2 mm distance away from either side of the boundary is used to calculate a cost function where large intensity difference is a sign of good fit. A sigmoid-like non-linear function is applied on the intensity difference to suppress the effect of extremely large differences, which are likely to be caused by outliers. The cost function also takes into consideration the sign of the intensity difference as higher intensity is expected in the grey matter in fMRI image. The working principle of BBR is illustrated in Figure 2.5. Compared to other methods provided by FSL, BBR performs substantially better, thus BBR is highly recommended for within-subject registration of fMRI to structural image (Jenkinson, 2014). To provide visual quality control, pial and white-matter surfaces from T 1 -weighted image are overlaid on coregistered and skull-stripped fMRI image as shown in Figure 2.6. Additionally, the percentage of overlap and Dice coefficient between T 1 -weighted mask and coregistered fMRI mask are computed to help assess goodness of fit. Typically, additional spatial transformation is performed to transform coregistered fMRI images into a common standard space, such as Montreal Neurological Institute (MNI) and Talairach space. This allows volumetric comparison of fMRI images between subjects. Here, this is omitted to emphasize on subject-specific preprocessing and to avoid resampling which can introduce interpolation errors. If needed, the final preprocessed fMRI images can be transformed into a standard space by taking advantage of MIAL's advanced registration pipeline. Rather than transforming fMRI image to structual or standard space, the inverse of the transformation matrix generated by BBR is applied to T 1 -weighted segmentation and parcellation images to transform structural labels defined in structural space to fMRI space.

Skull-stripping and Grand Mean Scaling
Coregistered fMRI images are skull-stripped using FSL's Brain Extraction Tool 2 (bet2) to remove skull and non-brain tissues like eyes. A binary brain mask is generated from the mean fMRI volume across time. The binary mask is then applied to all fMRI volumes. This is followed by grand mean scaling where fMRI images are scaled by a single value such that the new overall 4D mean of all brain voxels is 10000. fMRI study usually involves acquiring data from subjects over multiple sessions. Due to scanner and environmental factors, images acquired from the same subject over different sessions may be on different scales. Grand mean scaling takes care of scaling differences between sessions by removing intersession variance in global signal. Also, it helps facilitate the interpretation of the weights of covariates in general linear model analysis and allows data across subjects to be combined for Independent Component Analysis (ICA).

Nuisance Regression
The next step is to remove residual motion-induced artifacts by performing nuisance regression. This method models fMRI timeseries as a linear combination of confounding signals and residuals. Nuisance regression computes the weight of each confounding signal or regressor and subtracts it from the timeseries. Often, the 6 motion parameters estimated by motion correction are used as regressors. regressors is increasingly common. The temporal derivatives of the 6 motion parameters may be added to form a total of 12 motion-related regressors. Furthermore, the quadratic terms of these 12 motion-related regressors may be added to form a total of 24 motion-related regressors. These 24 motion-related regressors are thought likely to become a standard in the future. In fact, FSL and Human Connectome Project (HCP) use these 24 motion-related timeseries to derive features for their automatic noise classifier (Salimi-Khorshidi et al., 2014). Here, these 24 motion-related regressors are used by default, but users have the option of using 6 or 12 motion-related regressors. Figure 2.7: Example of nuisance regression where a signal is linearly mixed with 6 motion parameters (motion-induced noise). The underlying signal is recovered by regressing out the 6 motion parameters.
Besides motion regressors, users can include 3 tissue-based regressors: mean cerebrospinal fluid timeseries, mean white matter timeseries and global signal.
Conventional fMRI is primarily interested in grey matter because cerebral blood flow and volume are higher in grey matter than white matter. So, mean white matter timeseries is often regressed out as part of preprocessing. Global signal regression, however, is still very much controversial (Murphy and Fox, 2016). Therefore, it is not regressed out by default.
Regression is performed using AFNI's 3dDeconvolve and 3dTproject. Importantly, regression is performed before temporal filtering as performing nuisance regression after temporal filtering may re-introduce unwanted frequencies.

Spatial Smoothing
Smoothing is applied to average the signal over nearby voxels. This helps attenuate noise and increase signal-to-noise ratio (SNR), but it also reduces spatial resolution and may even wipe out small local activations that might be meaningful. Typical FWHM value is between 5 to 10 mm (Mikl et al., 2008).
It is important to note that raw fMRI images coming directly from scanners are inherently smoothed. The smoothness value may vary depending on scanner and acquisition parameters. Most software allows users to specify the FWHM size of smoothing kernel. A common misconception is that this FWHM value controls the final smoothness in the data when in fact it simply adds more smoothness to the data. This means that the specified FWHM value does not reflect the final smoothness in the fMRI images and thus cannot be used to appropriately calculate cluster correction thresholds.
Also, applying the same smoothing value to images acquired from different scanners can produce different results. Friedman et al. has shown that a fixed level of smoothness is needed to reduce inter-scanner variability of activation in fMRI. To avoid these problems, FWHM parameter in this pipeline is used to control the final smoothness in the data rather than the size of smoothing kernel. Here, AFNI's 3dBlurToFWHM is used to iteratively estimate smoothness in the data and apply smoothing until the fMRI image reaches the approximate final FWHM smoothness value.

Temporal Filtering
The purpose of temporal filtering is to remove frequencies that are not of interest and thereby improve the SNR. Confounding signals with known frequencies include slow drifts (<0.01 Hz), breathing (∼0.3 Hz) and heartbeat (∼1.0 Hz). Using Fourier Transform, fMRI data can be decomposed into a linear combination of sines and cosines of different frequencies. The frequency range covered by fMRI data depends on TR, with 1 2 T R (Nyquist frequency) being the highest distinguishable frequency. Frequencies higher than Nyquist frequency are aliased into lower frequencies.
fMRI tends to have a 1/f noise profile in which noise is disproportionally expressed in low frequencies. Therefore, highpass filtering is used in most studies. Using FSL's temporal filter, a highpass cut-off at 0.01 Hz is applied by default to remove low frequency noise such as slow drifts. FSL uses a local fit of a straight line to perform highpass filtering. This method has a more gradual rolloff that is preferable to the sharp rolloff of Finite Impulse Response (FIR) type of filter as it does not introduce autocorrelations into the fMRI data. Traditionally, a lowpass cut-off at 0.1 Hz is also applied as studies have shown that spontaneous low frequency fluctuations (∼<0.1 Hz) reflect spontaneous neural activity (Fransson, 2005). But there is new evidence that neuronal-related resting-state signal is present up to 0.2 Hz (Feinberg et al., 2010;Gohel and Biswal, 2015). Furthermore, lowpass filtering is rather controversial in that it has been shown to introduce spurious correlations and impose autocorrelations (Davey et al., 2013). So, lowpass filtering is not applied here by default. However, users can elect to perform lowpass filtering by specifying their preferred lowpass cut-off frequency.

Timecourse Extraction
Finally, the preprocessed fMRI images are ready for functional connectivity analysis. To expedite timeseries analysis, the mean timeseries of each region of interest (ROI) defined by FreeSurfer's labels is extracted. Additionally, user can provide a whole brain segmentation image and corresponding list of ROI labels to extract mean timeseries of ROI not defined by FreeSurfer. As shown in Figure 2.8, a visual overview of FD, DVARS and mean timeseries of all ROIs is provided for quality control.

Scheduling Component
This pipeline is designed to distribute preprocessing computations to high performance computing (HPC) clusters. To preprocess fMRI data, users only need to fill out a web form via CERAMICCA's web user interface. No installation of software is required. When user submits a request, this component comprehensively checks for the existence of input data, validity of user options and any condition that is likely to cause future failure. If the test fails, then the request is rejected. This fail-fast system is designed to avoid wasting computing resources. A successful request is submitted to MIAL's scheduler. Also, three

Introduction
To test the effectiveness of this preprocessing pipeline, a seed-based analysis is performed on an open-source fMRI dataset with high test-retest reliability. In seed-based analysis, a seed or region of interest (ROI) is selected based on priori knowledge and functional connectivity is examined by correlating the average timeseries of voxels within the ROI against timeseries of other voxels in the brain. This method produces functional connectivity maps where spatially distinct brain regions are considered functionally connected if their timeseries are highly correlated. Patterns of functional connectivity detected using seed-based analysis are highly reproducible across participants and scans (Shehzad et al., 2009). The effectiveness of this preprocessing pipeline is evaluated based on whether known functional connectivity maps can be reliably reproduced across participants and scans.
The New York University (NYU) Child Study Center (CSC) TestRetest fMRI dataset (https://www.nitrc.org/projects/nyu_trt/) is chosen because it has been reported to show functional connectivity maps of high reliability and consistency for seed-based analysis (Shehzad et al., 2009). This dataset consists of resting-scans from 25 participants at 3 different time points. Using this dataset, Shehzad et al. have shown that the functional connectivity maps associated with 3 seeds placed in posterior cingulate cortex (PCC), supplementary motor area (SMA) and inferior parietal sulcus (IPS) are consistent across participants, intersession scans (<5 months apart), intrasession scans (<1 hour apart) and multiscans (all 3 scans). Here, the same dataset is preprocessed and a similar seed-based analysis is applied to extract functional connectivity maps for comparison with previously reported results. Voxel-wise intraclass correlation coefficient (ICC) is also computed to assess the consistency of functional connectivity maps on subject-level across intersession scans, intrasession scans and multiscans.

Materials
The NYU CSC TestRetest dataset provides raw resting-state fMRI data and anonymized T 1 -weighted images of 25 right-handed native English-speaking participants. For each participants, 3 resting-scans of 197 continuuous echoplanar imaging (EPI) fMRI volumes were acquired. Scan 1 was acquired in the first session. After 5-16 months (mean 11 ± 4 months), the second session was conducted where scans 2 and 3 were acquired 45 minutes apart.

Preprocessing
The T 1 -weighted images and fMRI data were preprocessed using parameters similar to that in previous study. Specifically, the T 1 -weighted images were segmented into cerebrospinal fluid (CSF), white matter (WM) and grey matter (GM) tissues to generate tissue-based masks for timeseries extraction. To enable group analysis, the T 1 -weighted images were transformed into MNI152 (Montreal Neurological Institute) space using a 12 degreee of freedom linear affine transformation followed by a non-linear registration as implemented in FSL. The fMRI data were despiked, slice time corrected using interleave order, motion corrected to volume with least amount of outliers and coregistered with T 1 -weighted image. Next, skull-stripping and grand mean scaling were applied. Global signal, mean WM timeseries, mean CSF timeseries and 6 motion parameters were all regressed out. To facilitate comparison with published result, spatial smoothing was applied using Gaussian kernel of 6 mm FWHM rather than using the iterative method mentioned in section 2.3.8. A temporal filtering with 0.004 Hz highpass cut-off and 0.178 Hz lowpass cut-off was subsequently applied. The final fMRI data were then transformed into MNI152 space.

Functional Connectivity Analysis
To assess brain functional connectivity, the PCC, SMA and IPS were chosen as seed ROIs.
These regions represent the core components of the default mode network (DMN) and task-positive network (TPN), 2 of the most commonly examined networks in resting-state literature. Binary seed masks were created by placing a sphere of 5 mm radius centered on the MNI coordinates of the PCC (-6 -58 28), SMA (-2 10 48), and IPS (26 -58 48). The mean timeseries of each seed ROI was extracted using the seed masks. For each participant, multiple regression analysis was performed using FSL to identify voxels whose timeseries correlated with the mean timeseries of each seed ROIs. These correlation estimates were used for subsequent group-level analyses.
Group-level mixed-effects analyses were performed using FSL. Cluster-level thresholding (min Z > 2.3; cluster significance: p < 0.005, corrected) was applied to correct for multiple comparison. This produced thresholded Z-score maps of positive and negative functional connectivity maps for each seed ROI. Group-level functional connectivity maps of intersession, intrasession and multiscans were also computed to examine within-subject effects. For each participant, fixed-effects analysis was performed by concatenating the correlation estimates of scans 1 and 2. This comparison between scans 1 and 2 is defined as intersession reliability. Similarly, correlation estimates of scans 2 and 3 were concatenated for intrasession fixed-effects analysis and correlation estimates of scans 1, 2 and 3 were concatenated for multiscans fixed-effects analysis. This subject-level fixed-effects analysis was then followed by group-level mixed-effects analysis and cluster-level thresholding.
Voxel-wise ICC was computed to assess the reliability of each functional connection. ICC is a popular test-retest reliability metric that combines between-subject variance and within-subject variance to determine whether differences between subjects are measurable and whether subject activations are consistent from one scan to another.
The correlation between each voxel timeseries and each seed timeseries was computed and z-transformed. The correlation coefficients of scans 1 and 2 were combined and one-way ANOVA was applied to obtain the between-subject mean square (MSB) and within-subject mean square (MSW). Intersession ICC was then computed using Equation

Results
The group-level Z statistic maps of positive and negative functional connectivity for each seed, presented in Figure 3.1, showed substantial overlap across scans. Consistent with previous studies, the PCC showed possitive correlations with brain regions of the DMN such as the medial prefrontal cortex, lateral parietal cortex and middle temporal gyrus Non-overlapping regions, particularly regions in the lateral parietal cortex in scan 3 for IPS seed, showed Z statistic very near the applied threshold. The highlights the limitation of comparing sessions by the amount of correlated voxels that is, this approach strongly depends on the statistical threshold used to define correlation.
The consistency of group-level functional connectivity was further examined by plotting the unthresholded Z statistics and fitting a linear regression model to the datapoints. As shown in Figure 3.2, the Z statistics of scan 1 and 2 were highly correlated.
The Z statistics of scan 2 and 3 were also found to be highly correlated. Of the 3 seeds, the PCC demonstrated the highest intersession and intrasession consistency.  Importantly, the cross-scan stability of functional connectivity patterns was evident at individual level as Figure 3.3 reflected a considerable amount of voxels exhibiting ICC > 0.5. By convention, ICC > 0.5 is considered good to excellent (Caceres et al., 2009). High ICC values mean that the within-subject variability is much lower than between-subject variability. This indicates that areas of high group-level activation in one scan, not only showed group-level activation in subsequent scans but also preserved subject differentiability. Intrasession reliability is noticeably higher than intersession reliability, consistent with published result. Due to the limitation of ICC, some functional connections may appear to have low reliability or low ICC even though both within-subject and between-subject variance are low (Shehzad et al., 2009).

Discussion
The cross-scan functional connectivity maps for the PCC, SMA and IPS are consistent with previous results published on the same dataset (Figure 3.4). Some minor differences are to be expected due to differences in preprocessing and analysis steps. For instance, Shehzad et al. performed pre-whitening which was omitted here. Also, seeds were defined here by placing a sphere on the centre of reported seed coordinates rather than performing parcellation of brain regions. Another difference is the removal of motion-induced fluctuations. Here, the 6 motion parameters were regressed out, whereas the 6 motion parameters were orthogonalized with respect to the seed timeseries in the original study.
Nevertheless, the results clearly indicate that the preprocessing pipeline has been effective at removing spatial and temporal artifacts.

Chapter 4 Conclusion
In summary, a preprocesing pipeline is developed to minimize spatial and temporal artifacts in resting-state fMRI data. Effective preprocessing is crucial to the interpretation and validation of studies examining resting-state functional connectivity. The building blocks of the pipeline include despiking, slice time correction, motion correction, coregristration, nuisance regression, spatial smoothing and temporal filtering. Users can initiate preprocessing jobs via a web user interface. To handle large-scale dataset, the preprocessing pipeline is equipped with the ability to distribute computations to high performance clusters. Useful visualizations are provided for quality control.
Further work remains to be done to ensure that the preprocessed data is sufficiently free of artifacts. Automatic denoising methods are still being developed and optimized. The FMRIB's ICA-based Xnoiseifier (FIX), which performs denoising by automatically classifying ICA components into good or artefactual components, appears to be working well in high-resolution data. Preliminary analyses show that FIX greatly reduces motion-related artifacts (Smith et al., 2013). This method, however, requires good training data to perform well. Hence, it was not added into this pipeline. This decision should be revisited in the future and investigation should be made to determine if the provided sample training data works well for typical fMRI data, or better yet training data can be generated from current data pool. Another potential method to add into the pipeline is motion scrubbing. This method identifies timepoints that are irreversibly damaged by motion and subsequently removes these timepoints from timeseries analysis.
Motion-related metrics is already computed for identification of bad timepoints, but to date there is no standardized cut-off for these metrics. Other issues revovle around how best to perform motion scrubbing, as strategies range from destructive manner in which volumes at or surrounding bad timepoints are removed to non-destrucutive manner in which all volumes are kept intact and bad timepoints are ignored in timeseries analysis.
A seed-based analysis is performed on preprocessed data to investigate whether spatial and temporal artifacts are effectively removed to the extent that known functional connectivity networks can be reliably reproduced on group-level across scans. The DMN and TPN are 2 of the most commonly examined networks in resting-state fMRI literature. Therefore, the 3 core components that represent the DMN and TPN are chosen as seed ROIs: PCC, SMA and IPS. Across scans, activity in the PCC is functionally correlated to regions of the DMN and anti-correlated to regions of the TPN. In contrast, activities in the SMA and IPS are functionally correlated to regions of the TPN and anti-correlated to regions of the DMN across scans. This is consistent with literature. The stability of functional connections is also evident on individual level as a considerable amount of voxels exhibited high reliability (ICC > 0.5).
The consistency between the networks identified in this seed-based analysis and those in literature provides validation for the pipeline. This pipeline will be an useful addition to the CERAMICCA platform, supporting researchers and clinical collaborators of MIAL in their analysis of resting-state functional connectivity. It will also facilitate future undertaking of big data studies aimed at analyzing large-scale fMRI data in the context of various neurological disorders.