(Working Title) EEmaGe: EEG-based Image Generation for Visual Reconstruction
Hypothesis
1) Supervision is not required to model the human visual system. 2) Excluding visual cues when extracting EEG features is ultimately required.
Abstract
Visual reconstruction from EEG has been enabled by advances in AI. Recent studies have demonstrated the feasibility of reconstructing images from EEG recordings collected in designed experiments. Nevertheless, even though breakthroughs in AI began with imitating human systems, these frameworks bear little resemblance to the human visual system. To address this challenge, this research proposes a novel framework called EEmaGe, which utilizes self-supervised learning to reconstruct images from raw EEG data. Unlike previous methods, which rely on supervised learning and data labeled with visual cues, the framework employs a self-supervised autoencoder and its downstream task to extract meaningful EEG features. The experimental results showcase the state-of-the-art performance of the framework on MSE-related metrics. As an RE2I (reconstruction from EEG to image) approach, this research has the potential to advance our knowledge of the intricacies of the human brain and to inform the development of more sophisticated AI systems that effectively mimic human visual perception.
\[\mathbf{Acronyms\ /\ Abbreviations} \\ \begin{array}{|c|c|} \hline \text{Artificial Intelligence (AI)} & \text{Machine Learning (ML)} \\ \hline \text{Reconstruction from EEG to Image (RE2I)} & \text{Electroencephalography / Electroencephalogram (EEG)} \\ \hline \text{Convolutional Neural Network (CNN)} & \text{Small-World Neural Network (SWNet)} \\ \hline \text{Generative Adversarial Network (GAN)} & \text{Brain-Computer Interface (BCI)} \\ \hline \text{Self-Supervised Learning (SSL)} & \text{functional Magnetic Resonance Imaging (fMRI)} \\ \hline \text{Near-Infrared Spectroscopy (NIRS)} & \text{Magnetoencephalography (MEG)} \\ \hline \text{Fréchet Inception Distance (FID)} & \text{Mean Squared Error (MSE)} \\ \hline \end{array}\]
Keywords
- Parallel Gradient Descent
- Conducting gradient descent while feed-forwarding, to reduce training time (one possible reading is sketched after this list).
- Electroencephalography / electroencephalogram (EEG)
- “Electroencephalography is a medical imaging technique that reads scalp electrical activity generated by brain structures. The electroencephalogram is defined as electrical activity of an alternating type recorded from the scalp surface after being picked up by metal electrodes and conductive media.” [5] (Electroencephalography names the technique; the electroencephalogram names the recorded signal.)
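The note on parallel gradient descent above is terse; one possible reading is layer-local gradient updates performed during the forward pass, so that later layers never wait for a global backward pass. Below is a minimal PyTorch sketch of that interpretation; the `LocalBlock` class, its local reconstruction head, and all hyperparameters are hypothetical illustrations, not taken from this document.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalBlock(nn.Module):
    """A layer that updates its own weights with a local loss during the
    forward pass, so no end-to-end backward pass is needed (hypothetical)."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU())
        self.aux = nn.Linear(d_out, d_in)  # local reconstruction head (illustrative)
        self.opt = torch.optim.SGD(self.parameters(), lr=1e-3)

    def forward(self, x):
        h = self.layer(x)
        # Gradient step taken while the network is still feed-forwarding:
        loss = F.mse_loss(self.aux(h), x)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return h.detach()  # detach: downstream blocks never backprop into this one

blocks = nn.ModuleList([LocalBlock(128, 64), LocalBlock(64, 32)])
x = torch.randn(8, 128)
for b in blocks:
    x = b(x)  # each block trains itself as activations flow forward
```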
I. Introduction
Objects exist regardless of whether someone perceives them. This is reflected in the distinct definitions of looking, seeing, and watching: looking is directing the eyes somewhere; seeing is perceiving what the eyes are directed at; and watching is spending time paying attention to something [4]. In other words, ‘looking’ is contained in the ‘seeing’ set, and ‘seeing’ is contained in the ‘watching’ set. The human visual system performs looking, which suggests that supervision is not required to imitate the system.
BCI, first proposed by Vidal [10], has sought the key to the human brain, an area that has yet to be conquered. If BCI research continues to evolve, disabled people can be expected to benefit by living fuller lives with others. Among BCI methodologies, EEG analysis has drawn particular attention due to its advantages: the sensors used during brain measurement are non-invasive and cost-effective. The analysis, which uses signals recording the electrical activity of the brain [5], is pervasively adopted in medical and research settings to diagnose brain diseases. Despite its effectiveness in those areas, EEG has required manual analysis by experts such as physicians and researchers [11].
AI has advanced by imitating human systems. For instance, the neural network [1] mimicked the human nervous system, as did its successors such as the CNN [2] and SWNet [3].
In line with these AI technologies, many studies have found that visual experiences can be reproduced from EEG with deep learning algorithms.
For instance, shedding light on the generation of images from feature vectors with GANs, such networks were employed to generate images from EEG signals [6, 8, 9].
The adoption of SSL has enabled the extraction of features from EEG data without supervision []. However, despite this liberation from supervision, visual cues are still employed, thereby limiting the ability to fully imitate the visual system.
Furthermore, eliminating visual cues is crucial in moving towards a more universal BCI, making this research applicable to everyday life situations.
In conclusion, this paper makes, and anticipates, the following contributions:
- EEmaGe, a novel visual-reconstruction framework that applies SSL to preclude supervision and visual cues, is proposed.
- Through this work, a step toward the generalized BCI needed for real-world applications is anticipated.
II. Related Works
A. Brain-Computer Interface
BCI has been researched for several purposes. As its name suggests, a BCI is an interface that connects the brain directly to a computer.
A few methods to measure brain signals exist, such as fMRI, NIRS, MEG, and EEG. Of these, fMRI has shown the best visual-reconstruction performance so far [].
On the other hand, EEG is non-invasive and cost-effective, making it better suited to everyday use.
EEG brain waves are categorized by frequency into four groups: beta (> 13 Hz), alpha (8-13 Hz), theta (4-8 Hz), and delta (0.5-4 Hz) [5]; some studies additionally distinguish a gamma band. While alpha waves dominate brain activity when the eyes are closed, beta waves are activated when the eyes are open.
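These bands can be isolated with standard band-pass filters. Below is a minimal SciPy sketch, assuming a (channels, samples) float array and a 1 kHz sampling rate; the 30 Hz upper cutoff for the beta band is also an assumption, since the text only gives "> 13 Hz".

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Band edges from the text; the 30 Hz beta cutoff is an assumption.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def band_split(eeg, fs=1000.0, order=4):
    """Split an EEG array of shape (channels, samples) into the classic bands."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(order, [lo, hi], btype="band", fs=fs)
        out[name] = filtfilt(b, a, eeg, axis=-1)  # zero-phase filtering
    return out

# Example: 128 channels, 4 s at an assumed 1 kHz sampling rate.
eeg = np.random.randn(128, 4000)
bands = band_split(eeg)
print({k: v.shape for k, v in bands.items()})
```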
B. EEG-Image Pair Datasets
For the task of visual reconstruction, three qualified datasets are available. However, the ThoughtViz dataset [13], which originates from Kumar et al. [14], who collected an EEG dataset for a speech-recognition task, uses images imagined by the participants. Because this dataset was collected by relying on the participants' thoughts, it contains alpha-wave activity induced by thinking, whereas beta waves dominate EEG while the eyes are open [5]. Palazzo et al. [6] collected pairs of EEG and image data from six participants. The researchers selected an ImageNet subset of forty classes with fifty images per class. Consequently, 12,000 EEG sequences (2,000 images × 6 participants) were gathered over 128 EEG channels. A few sequences were excluded during preprocessing, leaving 11,466 valid sequences in the released dataset.
C. Visual Reconstruction With AI
Palazzo et al. [6] first found that visual experience can be generated from EEG signals. Their study recorded EEG while presenting a subset of the ImageNet [7] dataset to subjects. Although this was the first demonstration that visual experience can be generated from EEG signals, the proposed method was limited to the dataset's categories since it relied on supervised learning. NeuroVision [8], which regenerates perceived images with a cProGAN, shares this reliance on supervision.
III. Methodology
A. Dataset Usage
Given that beta waves, which dominate while the eyes are open, are required to reconstruct visual stimuli as faithfully as possible, the PerceiveLab dataset [6] is well suited to the purpose of this work.
B. EEmaGe
To achieve the goal of this research, namely the exclusion of supervision from the reconstruction task, self-supervised learning is adopted for model training. The autoencoder is a representative self-supervised learning method built on neural-network architectures.
EEmaGe is an autoencoder-based model architecture that takes an input pair $(e, i)$, where $e$ is an EEG sequence and $i$ is an image. The pairs are shuffled relative to the original datasets' matching (alternatively, all possible pairs could be formed to maximize the amount of training data). The architecture comprises two autoencoders whose encoders share weights.
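A minimal PyTorch sketch of this dual-autoencoder design follows. The text specifies only that the two encoders share weights; the flattened inputs, layer widths, and the modality-specific input heads below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EEmaGe(nn.Module):
    """Sketch of the dual-autoencoder idea: two autoencoders whose encoders
    share weights. Widths, flattened inputs, and heads are assumptions."""

    def __init__(self, eeg_dim, img_dim, hidden=1024, latent=256):
        super().__init__()
        # Modality-specific input heads map both modalities to one width
        # so that the encoder trunk below can be shared.
        self.eeg_head = nn.Linear(eeg_dim, hidden)
        self.img_head = nn.Linear(img_dim, hidden)
        # Shared encoder trunk: the weight sharing described in the text.
        self.encoder = nn.Sequential(nn.ReLU(), nn.Linear(hidden, latent))
        # Modality-specific decoders reconstruct each input from the latent code.
        self.eeg_decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(), nn.Linear(hidden, eeg_dim))
        self.img_decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(), nn.Linear(hidden, img_dim))

    def forward(self, e, i):
        z_e = self.encoder(self.eeg_head(e))  # EEG latent
        z_i = self.encoder(self.img_head(i))  # image latent
        return self.eeg_decoder(z_e), self.img_decoder(z_i)
```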
Training Both autoencoders are trained jointly with a self-supervised reconstruction objective; as noted in Section IV, the loss function is MSE.
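A sketch of one joint training step under the same assumptions, using the MSE reconstruction loss noted in Section IV; the input sizes, the optimizer, and the equal weighting of the two reconstruction terms are assumptions.

```python
import torch
import torch.nn.functional as F

# Reuses the EEmaGe sketch above; input sizes are assumptions
# (e.g., 128 channels x 440 samples, 64 x 64 RGB images, flattened).
model = EEmaGe(eeg_dim=128 * 440, img_dim=3 * 64 * 64)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # optimizer choice assumed

def train_step(e, i):
    e_hat, i_hat = model(e, i)
    # Each autoencoder reconstructs its own input; equal weighting assumed.
    loss = F.mse_loss(e_hat, e) + F.mse_loss(i_hat, i)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

e = torch.randn(8, 128 * 440)    # a shuffled batch of EEG sequences
i = torch.randn(8, 3 * 64 * 64)  # a shuffled batch of images
print(train_step(e, i))
```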
Downstream Task In this research, the downstream task is defined as reconstructing images from EEG signals with an autoencoder. This autoencoder is assembled from the EEG encoder and the image decoder of EEmaGe, and inference with it alone produces the images. The task can be approached in two ways: 1) using the trained autoencoder as a foundation model as-is, and 2) fine-tuning the autoencoder to maximize its performance (which must then outperform previous research).
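Under the same sketch, assembling the downstream autoencoder amounts to chaining the trained EEG encoder with the trained image decoder (reusing the `model` instance from the training sketch above):

```python
import torch

@torch.no_grad()
def eeg_to_image(model, e):
    """Downstream inference with the EEmaGe sketch above:
    trained EEG encoder -> trained image decoder."""
    z = model.encoder(model.eeg_head(e))  # encode the EEG signal
    return model.img_decoder(z)           # decode it as an image

images = eeg_to_image(model, torch.randn(8, 128 * 440))
```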
The output images are compared with the original images using FID.
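As one possible implementation, FID can be computed with the torchmetrics package; a sketch assuming uint8 image tensors of shape (N, 3, H, W), with random tensors standing in for the real and reconstructed images:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance  # pip install torchmetrics[image]

# feature=64 keeps the sketch light; published evaluations typically use 2048.
fid = FrechetInceptionDistance(feature=64)

# Dummy stand-ins for the original and reconstructed images:
# uint8 tensors of shape (N, 3, H, W) with values in [0, 255].
real_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)   # accumulate statistics of the originals
fid.update(fake_images, real=False)  # accumulate statistics of the reconstructions
print(fid.compute())                 # lower FID = closer distributions
```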
IV. Implementation
A. Environments
A bare-metal computer is equipped with an Intel i5-10400F CPU, a GTX 1660 Ti GPU, and two Samsung 8 GB 2,666 MHz RAM modules. The versions of Python and its frameworks are as follows: Python 3.10.12, PyTorch 2.2, and TensorFlow 2.13.0. Further details are specified in the requirements.txt file in the project repository. The loss function is MSE.
B. Experimental Results
Evaluation ??
V. Conclusion
The novel framework, EEmaGe, has successfully reconstructed human vision. EEmaGe has two goals: to eliminate the need for supervision in modeling the human visual system, and to exclude visual cues from EEG feature extraction. The first goal is achieved by adopting an autoencoder architecture trained with SSL. The second goal is achieved by the downstream task, which infers the original images with the trained EEG encoder and image decoder. These goals lead to the ultimate goal of restoring human vision from any EEG signal. The possibility of swapping the image decoder in the downstream task suggests that our EEG encoder robustly extracts generalized EEG features, and that restoration of the real world would be possible if a decoder could represent everything encountered in daily life. As a result, these approaches are expected to further advance the development of generalized BCI.
References
[1] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," The Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133, 1943.
[2] K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, vol. 36, no. 4, pp. 193-202, 1980.
[3] M. Javaheripi, B. D. Rouhani, and F. Koushanfar, "SWNet: Small-world neural networks and rapid convergence," arXiv preprint arXiv:1904.04862, 2019.
[4] https://www.britannica.com/dictionary/eb/qa/see-look-watch-hear-and-listen , accessed Mar 4, 2024.
[5] M. Teplan, "Fundamentals of EEG measurement," Measurement Science Review, vol. 2, no. 2, pp. 1-11, 2002.
[6] S. Palazzo, C. Spampinato, I. Kavasidis, D. Giordano, and M. Shah, "Generative adversarial networks conditioned by brain signals," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, pp. 3430-3438, doi: 10.1109/ICCV.2017.369.
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 2009, pp. 248-255, doi: 10.1109/CVPR.2009.5206848.
[8] S. Khare et al., "NeuroVision: Perceived image regeneration using cProGAN," Neural Computing and Applications, vol. 34, no. 8, pp. 5979-5991, 2022.
[9] P. Singh, P. Pandey, K. Miyapuram, and S. Raman, "EEG2IMAGE: Image reconstruction from EEG brain signals," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10096587.
[10] J. J. Vidal, "Toward direct brain-computer communication," Annual Review of Biophysics and Bioengineering, vol. 2, no. 1, pp. 157-180, 1973.
[11] https://www.kaggle.com/competitions/hms-harmful-brain-activity-classification/overview , accessed Mar 4, 2024.
[12] B. Kaneshiro, M. Perreau Guimaraes, H. S. Kim, A. M. Norcia, and P. Suppes, EEG data analyzed in "A representational similarity analysis of the dynamics of object processing using single-trial EEG classification," Stanford Digital Repository, 2015. Available at: http://purl.stanford.edu/bq914sc3730 [Dataset]
[13] P. Tirupattur et al., "ThoughtViz: Visualizing human thoughts using generative adversarial network," Proceedings of the 26th ACM International Conference on Multimedia, 2018.
[14] P. Kumar et al., "Envisioned speech recognition using EEG sensors," Personal and Ubiquitous Computing, vol. 22, pp. 185-199, 2018.