Mindscan - An Open-source MRI Dataset for Advancing AI Research in Brain Mapping

Aug 13, 2023

Mindscan - Synthetic Brain Images: Bridging the Gap in Brain Mapping with Generative Models to Advance AI Research

By: Paul Chris Luke

Collaborative Notice

This research paper represents a collaborative effort involving multiple contributors. The primary author and researcher, Paul Chris Luke, has played a central role in conceptualizing, designing, and executing the study outlined in this paper. Additionally, as part of their master's degree program, a junior data scientist has actively participated in the research process and contributed to specific aspects of the project.

Contributors:

Paul Chris Luke (Primary Author and Researcher)
Drici Mourad (Contributor; Master's student in Data Science and Analytics, University of Westminster)

Drici Mourad will be primarily responsible for the technical implementation aspects of the project, including Python programming, AI-related tasks, and the development of algorithms. Additionally, he will take charge of the final formatting and preparation of the paper for submission, ensuring that it meets the technical and structural requirements of academic publications.

It is important to note that while Paul Chris Luke has been responsible for the overall direction, methodology, and composition of the paper, the collaboration has enriched the study with diverse perspectives, skills, and insights. The combined effort underscores the interdisciplinary nature of AI research in the field of medical imaging and highlights the contributions of both experienced and emerging researchers.

This collaborative approach is in line with the commitment to fostering a cooperative and dynamic research environment, nurturing the growth of aspiring researchers, and advancing the collective understanding of AI applications in the realm of medical imaging.

This project embraces an open-source ethos, and the research code, datasets, and methodologies will be made available to the public on GitHub, fostering transparency, reproducibility, and wider community engagement.

Jul 13, 2023

Introduction

The integration of artificial intelligence (AI) technologies into the field of medical imaging has led to significant advancements in diagnostic accuracy and patient care. One notable application of AI lies in functional brain mapping, where computational models are employed to analyze complex neuroimaging data. However, the efficacy of these models relies fundamentally on the quality and diversity of the datasets used for training. This paper addresses a pivotal requirement in AI-driven medical imaging: the acquisition, transformation, and augmentation of datasets to enhance the capabilities of AI models in the domain of functional brain mapping.

The utilization of AI in functional brain mapping offers the potential to uncover intricate neural processes and cognitive functions. Nevertheless, the potency of AI algorithms is contingent on access to data that accurately reflects the complexity and variability of real-world scenarios. This study thus focuses on innovative strategies for enhancing data quality and expanding data variety to advance the reliability and generalizability of AI models in this domain.

This paper outlines a comprehensive approach aimed at elevating the capacity of AI models in functional brain mapping research. By systematically addressing key challenges associated with data transformation, augmentation, and utilization, this study endeavors to contribute to the refinement of AI applications in medical imaging.

The research objectives explore several facets of data-driven AI applications in functional brain mapping. First, the study examines the feasibility of converting two common MRI scan file formats, DICOM and NIfTI, into a unified, AI-compatible data repository. Second, it turns to procedural generation, assessing the viability of generating synthetic datasets that align with the statistical properties of existing data. Third, it explores the creation of anatomically precise three-dimensional (3D) models derived from these synthetic datasets.

This paper underscores the importance of domain expertise, rigorous quality control, and data diversity in ensuring the reliability and usability of transformed and generated datasets. Ethical considerations are also paramount in handling sensitive medical information responsibly.

In conclusion, this paper aspires to contribute to the burgeoning field of AI-driven medical imaging, with a specific focus on functional brain mapping. The research framework presented offers insights into data transformation, synthesis, and utilization, paving the way for future advancements in understanding the intricacies of brain function.

The subsequent sections of this paper delve into the objectives, methodology, and implications of the research, providing a comprehensive view of the contributions made in this study.

Research/Study Background

The advancement of computer vision techniques in the field of brain mapping has ushered in new avenues for understanding neural structures and functions. However, a pervasive challenge in the integration of artificial intelligence (AI) in this domain lies in the discrepancy between synthetic and real data. While synthesized training data has gained prominence across various domains, the domain gap between real and synthetic data remains a formidable obstacle, particularly within the intricate context of brain imaging.

Efforts to bridge this gap have encompassed a spectrum of strategies, ranging from data mixing to domain adaptation and domain-adversarial training. While these endeavors have shown promise in minimizing the divergence between synthetic and real data distributions, seamless generalization to real-world brain image datasets has remained elusive. This research contributes to this ongoing discourse by investigating whether synthetically generated data, when carefully crafted to reduce the domain gap, can enable AI models to generalize effectively to in-the-wild brain image datasets.
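Of the strategies named above, data mixing is the simplest to illustrate. The following is a minimal sketch, not the project's implementation: the function name `mix_datasets`, the `synthetic_fraction` parameter, and the idea of tagging each item with its origin are all assumptions made for illustration.

```python
import random


def mix_datasets(real, synthetic, synthetic_fraction, total, seed=0):
    """Draw `total` training items, about `synthetic_fraction` of them synthetic."""
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1]")
    rng = random.Random(seed)                    # seeded for reproducible splits
    n_syn = round(total * synthetic_fraction)
    n_real = total - n_syn
    if n_syn > len(synthetic) or n_real > len(real):
        raise ValueError("not enough items to sample without replacement")
    # Tag every item with its origin so results can later be broken down by source.
    batch = [(item, "real") for item in rng.sample(real, n_real)]
    batch += [(item, "synthetic") for item in rng.sample(synthetic, n_syn)]
    rng.shuffle(batch)
    return batch
```

Sweeping `synthetic_fraction` from 0 to 1 while holding `total` fixed is one way to measure how much synthetic data helps or hurts on a real held-out set.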

This paper centers on a methodology that combines a procedurally-generated parametric 3D brain model with a meticulously curated asset library. Together, these elements yield a repository of training images designed for high realism and diversity. Central to our approach is the use of generative adversarial networks (GANs) to create synthetic brain images that preserve the statistical properties and anatomical intricacies of real scans. Such fidelity, achieved without compromising individual privacy, is a pivotal enabler for advancing AI research in brain mapping.

Training machine learning systems on these synthetic brain images is an instrumental step in this research. Proficiency on brain-related tasks, such as precise landmark localization and brain region segmentation, will serve as the test of the synthetic data's utility. If this approach rivals the accuracy achieved with real data, it would also open avenues for methods that avoid the labor-intensive manual labeling associated with brain mapping studies.

This study embarks on a journey to address the pressing need for high-quality and diverse training data in AI-driven brain mapping. By uncovering the potential of synthetic data, complemented by procedural precision and GAN-based generation, this research reimagines the boundaries of data utilization in the quest to decode the complexities of neural architecture and function.

Aim and Objectives

The primary aim of this research project is to advance AI-driven research in functional brain mapping through the creation of innovative approaches for data transformation and augmentation. To accomplish this overarching goal, the following specific objectives have been identified:

a) Data Transformation for AI Training:

Investigate the feasibility of converting MRI scan file formats, particularly DICOM and NIfTI, into a harmonized and AI-friendly data library.
Evaluate the effectiveness of the transformed data library for enhancing the training process of AI models, particularly in the context of brain mapping tasks.

b) Procedural Generation of Synthetic Data:

Explore the feasibility of generating a synthetic dataset using procedural generation techniques based on the characteristics of the existing dataset.
Assess the utility and compatibility of the procedurally-generated synthetic dataset with AI training methodologies, focusing on its potential to supplement real data in training AI models.

c) Creation of 3D Models from Synthetic Data:

Investigate the technical feasibility of producing three-dimensional (3D) models derived from the procedurally-generated synthetic dataset.
Evaluate the anatomical accuracy and realism of the generated 3D models and their potential utility in enhancing AI-driven brain mapping tasks.

Through the pursuit of these objectives, this research endeavors to pave the way for the effective utilization of transformed data and procedurally-generated synthetic datasets in AI-driven brain mapping research. By addressing key questions regarding data transformation, generation, and utilization, this study aims to contribute to the advancement of AI applications in the domain of functional brain imaging.

Research Questions

Research Question 1: Is it Viable to Convert DICOM and NIfTI MRI Scan File Formats into a Consumable Data Library for AI Training?

This research question explores the viability of transforming MRI scan file formats, specifically DICOM (Digital Imaging and Communications in Medicine) and NIfTI (Neuroimaging Informatics Technology Initiative), into an AI-friendly and cohesive data library for effective AI training purposes.
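A first step toward such a library is recognizing each file's format and extracting the metadata a catalogue needs. The sketch below is an assumption-laden illustration rather than the project's converter: it uses only the Python standard library, relies on the published magic bytes of DICOM (a 128-byte preamble followed by `DICM`) and on the fixed 348-byte NIfTI-1 header layout, and the function names and returned field names are hypothetical. A real pipeline would more likely use a dedicated library for full parsing.

```python
import struct

NIFTI1_HEADER_SIZE = 348  # fixed size of a NIfTI-1 header in bytes


def sniff_format(blob: bytes) -> str:
    """Guess the container format from magic bytes alone."""
    # DICOM Part 10 files: 128-byte preamble, then the ASCII prefix "DICM".
    if len(blob) >= 132 and blob[128:132] == b"DICM":
        return "dicom"
    # NIfTI-1: magic at byte offset 344 ("n+1" single file, "ni1" header/image pair).
    if len(blob) >= NIFTI1_HEADER_SIZE and blob[344:348] in (b"n+1\x00", b"ni1\x00"):
        return "nifti1"
    # Gzip data (for example .nii.gz) must be decompressed before sniffing.
    if blob[:2] == b"\x1f\x8b":
        return "gzip"
    return "unknown"


def parse_nifti1_header(blob: bytes) -> dict:
    """Extract the fields a data catalogue typically needs from a NIfTI-1 header."""
    if len(blob) < NIFTI1_HEADER_SIZE:
        raise ValueError("need at least 348 bytes")
    # sizeof_hdr must read as 348; if not in little-endian, try big-endian.
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "i", blob, 0)[0] == NIFTI1_HEADER_SIZE:
            break
    else:
        raise ValueError("not a NIfTI-1 header")
    dim = struct.unpack_from(endian + "8h", blob, 40)       # dim[0] = number of dimensions
    datatype, bitpix = struct.unpack_from(endian + "2h", blob, 70)
    pixdim = struct.unpack_from(endian + "8f", blob, 76)    # pixdim[1..3] = voxel size
    vox_offset = struct.unpack_from(endian + "f", blob, 108)[0]
    ndim = dim[0]
    return {
        "shape": tuple(dim[1:1 + ndim]),
        "voxel_size": tuple(pixdim[1:1 + min(ndim, 3)]),
        "datatype_code": datatype,
        "bits_per_voxel": bitpix,
        "data_offset": int(vox_offset),
        "byte_order": "little" if endian == "<" else "big",
    }
```

Storing these few fields per scan, regardless of source format, gives downstream training code one uniform index to query.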

Research Question 2: Is it Possible to Procedurally-Generate a Synthetic Data Set Based on the Given Dataset?

This inquiry explores the feasibility of creating a synthetic dataset through procedural generation techniques that are informed by the characteristics of the existing dataset.
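To make the idea of procedural generation "informed by the characteristics of the existing dataset" concrete, the toy sketch below fits simple statistics to measured shape parameters and then samples new volumes from them. Everything here is an assumption for illustration: the ellipsoid stands in for a far richer parametric brain model, and the names `fit_parameter_stats` and `generate_phantom` are hypothetical. It requires NumPy.

```python
import numpy as np


def fit_parameter_stats(measured_axes):
    """Mean and standard deviation of per-scan semi-axis lengths (N x 3 array)."""
    measured_axes = np.asarray(measured_axes, dtype=float)
    return measured_axes.mean(axis=0), measured_axes.std(axis=0)


def generate_phantom(mean, std, shape=(32, 32, 32), noise=0.05, rng=None):
    """Rasterise one ellipsoid whose semi-axes are sampled from the fitted stats."""
    rng = np.random.default_rng() if rng is None else rng
    axes = np.clip(rng.normal(mean, std), 2.0, None)    # keep semi-axes >= 2 voxels
    # Voxel coordinates centred on the middle of the grid.
    grids = np.meshgrid(*[np.arange(n) - (n - 1) / 2 for n in shape], indexing="ij")
    # Points with (x/a)^2 + (y/b)^2 + (z/c)^2 <= 1 lie inside the ellipsoid.
    inside = sum((g / a) ** 2 for g, a in zip(grids, axes)) <= 1.0
    volume = inside.astype(np.float32)
    volume += rng.normal(0.0, noise, size=shape).astype(np.float32)  # additive noise
    return volume, inside
```

Because the generator also returns the `inside` mask, every synthetic volume arrives with a perfect segmentation label at no annotation cost, which is the main attraction of procedural data.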

Research Question 3: Is it Possible to Generate 3D Models of the Procedurally-Generated Synthetic Dataset?

This question explores the feasibility of creating three-dimensional (3D) models derived from the procedurally-generated synthetic dataset, assessing the potential to produce realistic and anatomically accurate representations.
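One crude way to turn a synthetic volume into a 3D surface is to emit a square face wherever an occupied voxel borders empty space. The sketch below illustrates only that idea; it is not the method this study commits to, the name `exposed_faces` is hypothetical, and smoother meshes are usually obtained with algorithms such as marching cubes. It requires NumPy.

```python
import numpy as np

# The six axis-aligned neighbour directions of a voxel.
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def exposed_faces(mask):
    """Return (voxel_index, direction) for every voxel face that borders empty space."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with an empty border so faces on the volume boundary count as exposed.
    padded = np.pad(mask, 1, constant_values=False)
    faces = []
    for d in DIRECTIONS:
        # neighbour[i, j, k] holds the padded value at (i, j, k) + d.
        neighbour = np.roll(padded, shift=tuple(-s for s in d), axis=(0, 1, 2))
        exposed = padded & ~neighbour
        for idx in np.argwhere(exposed):
            faces.append((tuple(int(i) - 1 for i in idx), d))   # undo padding offset
    return faces
```

A single isolated voxel yields 6 faces and two adjacent voxels yield 10, since the shared face is interior and is never emitted.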

Literature Review

The landscape of functional brain mapping has undergone dynamic evolution, propelled by advances in imaging technologies and the integration of artificial intelligence (AI) methodologies. This comprehensive literature review embarks on a journey through seminal contributions that have enriched our comprehension of functional brain mapping. Spanning dynamic PET imaging, unsupervised learning in functional MRI (fMRI), real-time mapping through neural interfaces, and the transformative application of synthetic data in AI research, this review underscores the synthesis of disciplines that shape the current research endeavor.

The pivotal study by Barnes et al. (1997) stands as a hallmark in functional brain mapping, marking a decisive stride in dynamic 3-D PET imaging¹. By pioneering real-time brain activity monitoring, their work established PET imaging as a cornerstone for unraveling functional brain regions' temporal and spatial dynamics. This seminal advancement in imaging methodologies has played a catalytic role in inspiring the integration of AI techniques to extract nuanced insights from intricate neural activities.

The contribution of Faisan et al. (2005) resonates deeply with our research orientation, introducing unsupervised learning techniques for mapping active brain functional MRI (fMRI) signals². The employment of hidden semi-Markov event sequence models provided a means to unearth latent patterns within fMRI data, offering a deeper understanding of functional networks' interplay during various cognitive tasks. This insight underscored the need to bridge the gap between simulated and real data domains, catalyzing our exploration of synthetic data for AI-driven research in brain mapping.

Advancements in neural interfaces brought the work of Wang et al. (2023) to the forefront, showcasing real-time functional brain mapping capabilities³. Through the integration of high-channel-count, ultra-conformal neural interfaces, their research accentuated the potential for achieving unprecedented spatiotemporal resolution in brain activity mapping. This development aligns seamlessly with our endeavor to explore three-dimensional modeling derived from procedurally-generated synthetic datasets, bridging the gap between cutting-edge imaging techniques and AI methodologies.

Equally significant is the groundbreaking study by Wood et al. (2021)⁴, which introduced the innovative application of synthetic data in AI research. While their focus was primarily on face analysis, the notion of using synthetic data to bridge domain gaps and preserve data privacy parallels our motivation in addressing sensitive health data. This reference crystallized our understanding of the potential of synthetic data to enable AI-driven insights while safeguarding individual privacy.

To enrich our exploration, Shi et al. (2019) present a novel approach to impute missing values using non-negative matrix factorization⁵. Their methodology, albeit within a different context, deepened our comprehension of imputation techniques. Informed by their insights, we explore imputation's potential to enhance data quality and normalize varied brain datasets, a critical endeavor in our pursuit of seamless AI training.
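The general idea of imputing with non-negative matrix factorization can be sketched as follows. This is a generic masked NMF with multiplicative updates, not a reproduction of the specific algorithm of Shi et al.; the function name, the default rank, and the iteration count are assumptions, and the observed values are assumed to be non-negative. It requires NumPy.

```python
import numpy as np


def nmf_impute(X, observed, rank=2, iters=500, eps=1e-9, seed=0):
    """Fill entries where `observed` is False using a low-rank non-negative fit."""
    X = np.asarray(X, dtype=float)
    M = np.asarray(observed, dtype=float)       # 1.0 where known, 0.0 where missing
    Xo = np.where(M > 0, X, 0.0)                # zero out missing cells (may hold NaN)
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + 0.1    # strictly positive initial factors
    H = rng.random((rank, X.shape[1])) + 0.1
    for _ in range(iters):
        # Multiplicative updates for the masked loss ||M * (X - WH)||^2;
        # only observed cells contribute to numerator and denominator.
        W *= (Xo @ H.T) / ((M * (W @ H)) @ H.T + eps)
        H *= (W.T @ Xo) / (W.T @ (M * (W @ H)) + eps)
    return np.where(M > 0, X, W @ H)            # keep known values, impute the rest
```

Known entries are returned unchanged; only the missing cells are replaced by the corresponding entries of the reconstruction `W @ H`, which are non-negative by construction.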

In synthesis, these contributions collectively illuminate the trajectory of functional brain mapping research. Dynamic PET imaging, unsupervised learning in fMRI, real-time neural interfaces, and the innovative application of synthetic data converge to define the multidisciplinary landscape that guides our research questions and objectives. As we navigate this intersection of imaging innovation and AI prowess, we draw inspiration from these references to embark on a journey that promises to uncover the intricacies of functional brain dynamics.

Methodology

The methodology employed in this study is devised to address the multifaceted challenges of training AI models for functional brain mapping, utilizing both real and synthetic data. The cohesive approach encompasses data acquisition, transformation, synthesis, and utilization, culminating in the training and evaluation of AI models.

Data Acquisition and Preparation

The acquisition of a diverse and comprehensive dataset forms the foundation of this study. Collaborative partnerships with esteemed medical institutions, hospitals, and research centers are leveraged to procure a substantial collection of MRI scans. These scans, encompassing diverse anatomical regions and medical conditions, are drawn from reputable open-access repositories, such as OpenNeuro⁶. The inclusion of data from such sources enhances dataset variety, allowing for improved model robustness and generalization.

Data Transformation and Imputation

DICOM and NIfTI MRI scan file formats serve as primary inputs in this study. The transformation of these formats into an AI-compatible and unified data library is achieved through meticulous preprocessing. Leveraging domain expertise, medical images undergo normalization, denoising, and voxelization. The synthesized datasets also benefit from imputation techniques inspired by Shi et al. (2019)⁵, contributing to data quality enhancement and standardization.
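The normalization step can be sketched as below. The specific recipe (percentile clipping followed by a z-score over foreground voxels) is one common choice and an assumption here, as are the function name and the default percentiles; the study does not specify which normalization it will use. It requires NumPy.

```python
import numpy as np


def normalize_volume(volume, low_pct=1.0, high_pct=99.0):
    """Clip intensity outliers, then z-score using foreground voxels only."""
    vol = np.asarray(volume, dtype=np.float64)
    lo, hi = np.percentile(vol, [low_pct, high_pct])
    vol = np.clip(vol, lo, hi)
    # Crude foreground estimate (assumption): anything above the lower clip value.
    foreground = vol > lo
    if not foreground.any():
        return np.zeros_like(vol)               # constant volume: nothing to scale
    mean = vol[foreground].mean()
    std = vol[foreground].std()
    return (vol - mean) / (std + 1e-8)
```

Computing the statistics over the foreground only keeps large empty backgrounds from dominating the mean, so scans with different fields of view end up on a comparable intensity scale.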

Synthetic Data Generation

Central to this study's innovation is the synthesis of realistic brain images using procedurally-generated parametric 3D brain models. Drawing inspiration from Wood et al. (2021)⁴, Generative Adversarial Networks (GANs) are harnessed to create synthetic brain scans. The GANs are intended to preserve the statistical properties and anatomical intricacies of real scans while safeguarding individual privacy. A meticulously curated asset library contributes to the realism and diversity of the generated images, covering the range of scenarios encountered in real-world brain imaging.
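For reference, the standard GAN training objective is reproduced below. The study does not specify which GAN variant or loss it will adopt, so this original minimax formulation is shown only as a baseline assumption. Here $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the distribution of real scans, and $p_z$ the noise prior.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The discriminator is trained to assign high probability to real scans $x$ and low probability to generated scans $G(z)$, while the generator is trained to make $D(G(z))$ as large as possible.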

AI Model Training and Evaluation

Machine learning systems are trained on both real and synthetic datasets, each curated to align with distinct facets of brain imaging. The trained models target brain-related tasks, chiefly precise landmark localization and brain region segmentation. The performance evaluation of the AI models covers accuracy, generalization, and robustness, which together serve as the key criteria for gauging the efficacy of the proposed approach.
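The study does not name specific evaluation metrics. As an illustrative assumption, the sketch below shows two measures commonly used for the tasks mentioned: the Dice coefficient for region segmentation and the mean Euclidean error for landmark localization. The function names and the `voxel_size` parameter are hypothetical. It requires NumPy.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice overlap between two binary masks: 2 * |A and B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0          # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom


def mean_landmark_error(pred_points, true_points, voxel_size=(1.0, 1.0, 1.0)):
    """Mean Euclidean distance between predicted and reference landmarks.

    Coordinates are in voxels; multiplying by `voxel_size` converts the result
    to physical units (for example millimetres).
    """
    diff = np.asarray(pred_points, dtype=float) - np.asarray(true_points, dtype=float)
    diff = diff * np.asarray(voxel_size, dtype=float)
    return float(np.linalg.norm(diff, axis=1).mean())
```

Reporting both metrics separately for models trained on real data, synthetic data, and a mixture of the two makes the effect of the synthetic data directly comparable.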

Ethical Considerations

The ethical dimensions of this research are underpinned by the principles of data privacy and responsible research conduct. Adherence to ethical guidelines and data privacy regulations safeguards patient information and ensures the secure handling of sensitive medical data.

The methodology outlined in this section brings together diverse methods, from data acquisition to AI model training. The generation of synthetic data and the combination of real and synthetic datasets underscore the interdisciplinary nature of this research. The subsequent sections set out the limitations of the study and the project timetable.

Study Limitations

As with any research endeavor, it is essential to acknowledge the inherent limitations that influence the scope, applicability, and interpretability of the findings. These limitations provide a framework for understanding the constraints and challenges faced during the course of the study. In the context of our exploration into advancing AI-driven research in functional brain mapping, several key limitations merit consideration. These limitations stem from factors such as data availability, technical complexity, and practical feasibility, underscoring the multifaceted nature of the research landscape.

1. Limited Availability of High-Quality MRI Data at Scale:

One of the primary limitations is the availability of comprehensive, high-quality MRI data on a large scale. Despite dedicated efforts to collaborate with medical institutions and research centers, acquiring an expansive MRI dataset covering diverse anatomical regions and medical conditions remains a complex undertaking. The scarcity of meticulously annotated MRI scans, in both the synthetic and real data domains, makes it harder to address the research questions posed in this study comprehensively.

2. Computing Power and Resource Intensiveness for Synthetic Data Generation:

The generation of synthetic brain images using complex techniques such as Generative Adversarial Networks (GANs) demands substantial computational resources. GAN training involves iterative optimization processes and intricate neural network architectures, necessitating high-performance hardware, such as powerful GPUs (Graphics Processing Units). The requirement for extended training times and substantial memory capacity raises concerns about accessibility for researchers with limited computing resources. Moreover, the energy consumption associated with training GANs and creating synthetic datasets warrants consideration, in line with broader concerns about the environmental impact of resource-intensive AI methodologies.

3. Complexity of Synthetic Data Realism:

While the use of synthetic data, generated through advanced techniques like Generative Adversarial Networks (GANs), holds promise for bridging the gap between real and synthetic data distributions, there remain inherent challenges in achieving complete realism. Despite efforts to preserve statistical properties and anatomical intricacies, the inherent complexity of brain imaging and neural structures presents a formidable obstacle. GAN-based synthetic data may still lack certain nuanced variations and subtleties present in real brain scans, potentially limiting the models' ability to generalize effectively in all real-world scenarios.

While this study aspires to contribute to the forefront of AI-driven research in functional brain mapping, it is crucial to recognize the constraints that shape its trajectory. The limitations stemming from the scarcity of comprehensive and high-quality MRI data, the demanding computational requirements for synthetic data generation, and the nuanced realism challenges of synthetic data underscore the need for cautious interpretation of results and pave the way for future directions in this research domain. As we navigate the intricate interplay between data transformation, synthesis, and AI-driven insights, a balanced understanding of these limitations informs the broader implications of our findings.

Project Timetable

The following is a tentative project timetable outlining the key milestones and activities to be undertaken from the initiation of the project until its completion:

August 15, 2023: Research kick-off, team alignment, and project scope definition.
September 1, 2023: Completion of data acquisition phase, including collaboration agreements with medical institutions.
September 15, 2023: Data preprocessing and transformation for AI training, including DICOM and NIfTI format conversion.
October 1, 2023: Implementation of imputation techniques to enhance data quality and consistency.
October 15, 2023: Commencement of synthetic data generation using Generative Adversarial Networks (GANs).
November 1, 2023: Evaluation and refinement of procedurally-generated synthetic dataset.
November 15, 2023: Development of AI models for brain landmark localization and region segmentation.
December 1, 2023: AI model training and performance evaluation on real and synthetic datasets.
December 15, 2023: Data analysis, result interpretation, and finalization of research findings.

Please note that this timetable is subject to adjustments based on unforeseen challenges and progress. The completion date of December 15, 2023, marks the anticipated conclusion of the project, followed by the dissemination of results, paper writing, and potential avenues for future research.

Summary:

The project proposal titled "Mindscan - Synthetic Brain Images for Advancing AI Research in Brain Mapping" presents a comprehensive endeavor aimed at enhancing the capabilities of AI models in the field of functional brain mapping. This collaborative effort, led by primary author and researcher Paul Chris Luke, delves into the integration of artificial intelligence (AI) technologies in medical imaging, with a specific focus on functional brain mapping. The study's innovative approach involves the acquisition, transformation, synthesis, and utilization of data to advance AI-driven research.

The introduction of AI methodologies in functional brain mapping has opened new avenues for understanding neural structures and functions. However, the gap between synthetic and real data remains a challenge, prompting this research to explore ways to bridge this divide effectively. The objectives of the study are structured to address key challenges associated with data transformation, augmentation, and utilization. The project aims to investigate the feasibility of transforming MRI scan file formats, procedurally generating synthetic datasets, and creating anatomically precise three-dimensional models derived from synthetic data.

The proposed methodology integrates diverse strategies, from data acquisition and transformation to the training and evaluation of AI models. Collaborative partnerships with medical institutions enable the acquisition of a comprehensive MRI dataset covering diverse anatomical regions and medical conditions. The transformation of MRI scan formats into an AI-compatible library, alongside the procedural generation of synthetic brain images using Generative Adversarial Networks (GANs), adds complexity and depth to the research. Ethical considerations are paramount, ensuring data privacy and responsible research conduct.

The study acknowledges its limitations, including the limited availability of high-quality MRI data at scale, the computational power required for synthetic data generation, and the challenge of achieving complete realism in synthetic data. These limitations guide the interpretation of findings and open avenues for future research directions.

Finally, the project timetable outlines key milestones and activities from project kick-off to completion, with a tentative end date of December 15, 2023. The timeline reflects activities such as data acquisition, transformation, synthetic data generation, AI model development, and result interpretation.

In essence, the project proposal seeks to advance AI-driven research in functional brain mapping through innovative strategies for data transformation and synthesis. By addressing critical research questions and challenges, this study aspires to contribute to the refinement of AI applications in medical imaging, ultimately enhancing our understanding of the complexities of neural architecture and function.

Keywords:

artificial intelligence, medical imaging, magnetic resonance imaging, dataset, collaborative project

References:

1. D. Barnes, G. Egan, G. O’Keefe, and D. Abbott, “Characterization of dynamic 3-D PET imaging for functional brain mapping,” IEEE Transactions on Medical Imaging, vol. 16, no. 3, pp. 261–269, June 1997, doi: 10.1109/42.585760.
2. S. Faisan, L. Thoraval, J.-P. Armspach, M.-N. Metz-Lutz, and F. Heitz, “Unsupervised learning and mapping of active brain functional MRI signals based on hidden semi-Markov event sequence models,” IEEE Transactions on Medical Imaging, vol. 24, no. 2, pp. 263–276, Feb. 2005, doi: 10.1109/TMI.2004.841225.
3. X. Wang et al., “Real-Time Functional Brain Mapping Based on High-Channel-Count, Ultra-Conformal Neural Interface,” 2023 IEEE 36th International Conference on Micro Electro Mechanical Systems (MEMS), Munich, Germany, 2023, pp. 67–70, doi: 10.1109/MEMS49605.2023.10052566.
4. E. Wood, T. Baltrušaitis, C. Hewitt, S. Dziadzio, M. Johnson, V. Estellers, T. J. Cashman, and J. Shotton, “Fake It Till You Make It: Face analysis in the wild using synthetic data alone,” arXiv:2109.15102 [cs.CV], 2021.
5. J. Shi, H. Li, Y. Zhang, and B. Zhang, “A novel approach to imputing missing values in real estate big data using non-negative matrix factorization,” IEEE Access, vol. 7, pp. 43401–43412, 2019, doi: 10.1109/ACCESS.2019.2907442.