Blinking is a useful biological signal that can gate gaze regression models to avoid the use of incorrect data in downstream tasks. Existing datasets are imbalanced not only in class frequency but also in intra-class difficulty, which we demonstrate is a barrier to curriculum learning. We therefore propose a novel curriculum augmentation scheme that addresses frequency and difficulty imbalances implicitly, which we term Curriculum Learning by Augmentation (CLbA).
Using Curriculum Learning by Augmentation (CLbA), we achieve a state-of-the-art mean Average Precision (mAP) of 0.971 using ResNet-18, up from the previous state-of-the-art mAP of 0.757 using DenseNet-121, while outcompeting Curriculum Learning by Bootstrapping (CLbB) by a significant margin with improved calibration. This new training scheme thus allows the use of smaller and more performant Convolutional Neural Network (CNN) backbones that fulfil the Nyquist criterion, achieving a sampling frequency of 102.3 Hz. This paves the way for inference of blinking in real-time applications.
Saccadic eye movements are known to serve as a suitable proxy for task prediction. In mobile eye-tracking, saccadic events are strongly influenced by head movements. Common attempts to compensate for head-movement effects either neglect saccadic events altogether or fuse gaze and head-movement signals measured by IMUs in order to simulate the gaze signal at head-level. Using image processing techniques, we propose a solution for computing saccades based on frames of the scene-camera video. In this method, fixations are first detected based on gaze positions specified in the coordinate system of each frame, and the respective frames are then stitched together. Lastly, pairs of consecutive fixations (forming a saccade) are projected into the coordinate system of the stitched image using the homography matrices computed by the stitching algorithm. The results show a significant difference in length between projected and original saccades, with an error of approximately 37% introduced by employing saccades without head-movement consideration.
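A minimal sketch of the projection step, assuming the stitching stage yields a per-frame homography (names such as H_frame_to_pano are illustrative, not from the paper):

```python
import numpy as np
import cv2

def project_fixations(fixations_xy, H_frame_to_pano):
    """Project Nx2 fixation coordinates from a frame into the stitched image."""
    pts = np.asarray(fixations_xy, dtype=np.float32).reshape(-1, 1, 2)
    projected = cv2.perspectiveTransform(pts, H_frame_to_pano)
    return projected.reshape(-1, 2)

def saccade_length(fix_a, fix_b):
    """Euclidean length of a saccade defined by two consecutive fixations."""
    return float(np.linalg.norm(np.asarray(fix_b) - np.asarray(fix_a)))

# Example: compare original vs. projected saccade length for one fixation pair.
H = np.eye(3, dtype=np.float32)          # placeholder homography
f1, f2 = (320.0, 240.0), (400.0, 260.0)  # consecutive fixations (pixels)
p1, p2 = project_fixations([f1, f2], H)
print(saccade_length(f1, f2), saccade_length(p1, p2))
```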
Virtual Reality (VR) technology has advanced to include eye-tracking, allowing novel research, such as investigating how our visual system coordinates eye movements with changes in perceptual depth. The purpose of this study was to examine whether eye tracking could track perceptual depth changes during a visual discrimination task. We derived two depth-dependent variables from eye tracker data: eye vergence angle (EVA) and interpupillary distance (IPD). As hypothesized, our results revealed that shifting gaze from near-to-far depth significantly decreased EVA and increased IPD, whereas the opposite pattern was observed when shifting from far-to-near. Importantly, the amount of change in these variables tracked closely with relative changes in perceptual depth, and supported the hypothesis that eye tracker data may be used to infer real-time changes in perceptual depth in VR. Our method could be used as a new tool to adaptively render information based on depth and improve the VR user experience.
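For illustration, the two depth-dependent variables can be derived from binocular tracker output roughly as below; the gaze direction vectors and pupil positions are assumed inputs, not the study's actual API:

```python
import numpy as np

def vergence_angle_deg(gaze_left, gaze_right):
    """Angle (degrees) between the left and right gaze direction vectors (EVA)."""
    l = np.asarray(gaze_left, float)
    r = np.asarray(gaze_right, float)
    cos_a = np.dot(l, r) / (np.linalg.norm(l) * np.linalg.norm(r))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

def interpupillary_distance(pupil_left_pos, pupil_right_pos):
    """Euclidean distance between the two pupil positions (IPD, same units as input)."""
    return float(np.linalg.norm(np.asarray(pupil_left_pos) - np.asarray(pupil_right_pos)))

# Near fixation: the eyes converge, so EVA grows and IPD shrinks.
print(vergence_angle_deg([0.05, 0, 1], [-0.05, 0, 1]))          # ~5.7 deg
print(interpupillary_distance([-0.031, 0, 0], [0.031, 0, 0]))   # 0.062 m
```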
Estimating the cognitive load of aircraft pilots is essential for monitoring them constantly to identify and overcome unfavorable situations. Presently, pilots' cognitive load is estimated by manually filling out forms, and there is no system that can estimate workload automatically. In this paper, we used eye-tracking technology for cognitive load estimation and developed a Virtual Reality dashboard that visualizes cognitive and ocular data. We undertook a flight simulation study to observe users' workload during primary and secondary task execution while flying the aircraft. We also undertook an eye-tracking study to identify appropriate 3D graph properties for developing graphs for the cognitive dashboard. We found a significant interaction between users' primary and secondary tasks. We also observed that primitives like size and color made it easier to encode numerical and nominal information. Finally, we developed a dashboard leveraging the results of the 3D graph study for estimating the pilot's cognitive load. The VR dashboard enabled visualization of the cognitive load parameters derived from the ocular data in real time. The aim of the 3D graph study was to identify the optimal information to be displayed to the participants/pilots. Apart from estimating cognitive load using ocular data, the dashboard can also visualize ocular data collected in a Virtual Reality environment.
We present an analysis of the eye tracking signal quality of the HoloLens 2’s integrated eye tracker. Signal quality was measured from eye movement data captured during a random saccades task from a new eye movement dataset collected on 30 healthy adults. We characterize the eye tracking signal quality of the device in terms of spatial accuracy, spatial precision, temporal precision, linearity, and crosstalk. Most notably, our evaluation of spatial accuracy reveals that the eye movement data in our dataset appears to be uncalibrated. Recalibrating the data using a subset of the data from our task produces notably better eye tracking signal quality.
Eye behavior has gained much interest in the VR research community as an interactive input and support for collaboration. Researchers have used head behavior and saliency to implement gaze inference models when eye-tracking is unavailable. However, these solutions are resource-demanding and thus unfit for untethered devices, and their angular accuracy is around 7°, which can be a problem in high-density informative areas. To address this issue, we propose a lightweight deep learning model that generates the probability density function of the gaze as a percentile contour. This solution allows us to introduce a visual attention representation based on a region rather than a point. In this way, we manage the trade-off between the ambiguity of a region and the error of a point. We tested our model on untethered devices with real-time performance; we evaluated its accuracy, outperforming our identified baselines (average fixation map and head direction).
With the increasing use of the Internet, people encounter a variety of news in online media and social media every day. For digital content without fact-checking mechanisms, it is likely that people perceive fake news as real when they do not have extensive knowledge about the news topic. In this paper, we study human eye movements when reading fake news and real news. Our results suggest that people regress more with their eyes when reading fake news, while the time until the first fixation in the text area of interest is not a distinguishing factor between real and fake content. Our results show that although the truthfulness of the content is not known to people in advance, their visual behavior differs when reading such content, indicating a higher level of confusion when reading fake content.
Much of the current expertise literature has found that domain-specific tasks evoke different eye movements. However, research has yet to predict optimal image exploration using saccadic information and to identify and quantify differences in search strategies between learners, intermediates, and expert practitioners. By employing LSTMs for scanpath classification, we found that saccade features over time could distinguish all groups at high accuracy. The most distinguishing features were saccade peak velocity (72%), length (70%), and average velocity (68%). These findings support the holistic theory of expert visual exploration, namely that experts can quickly process the whole scene using longer and more rapid saccades initially. The potential to integrate expertise model development from saccadic scanpath features into intelligent tutoring systems is the ultimate inspiration for our research. Additionally, this model is not confined to visual exploration in dental X-rays; rather, it can extend to other medical domains.
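A hedged sketch of the kind of LSTM scanpath classifier described; the sequence length, layer size, and the three saccade feature channels are assumptions, not the authors' configuration:

```python
import numpy as np
import tensorflow as tf

SEQ_LEN, N_FEATURES, N_CLASSES = 50, 3, 3   # saccades per scanpath, features, expertise groups

# Per-timestep features could be, e.g., peak velocity, length, average velocity.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN, N_FEATURES)),
    tf.keras.layers.Masking(mask_value=0.0),            # allow zero-padded scanpaths
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Dummy data: 32 scanpaths with integer expertise labels (learner/intermediate/expert).
X = np.random.rand(32, SEQ_LEN, N_FEATURES).astype("float32")
y = np.random.randint(0, N_CLASSES, size=32)
model.fit(X, y, epochs=1, verbose=0)
```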
This article evaluates the Low/High Index of Pupillary Activity (LHIPA), a measure of cognitive effort based on pupil response, in the context of reading. At the beginning of 2nd and 3rd grade, 107 children (8-9 y.o.) from music and general primary schools were asked to read 40 sentences with keywords differing in length and frequency while their eye movements were recorded. Sentences with low-frequency or long keywords received more attention than sentences with high-frequency or short keywords. The word frequency and length effects were more pronounced in younger children. In 2nd grade, children from the music school dwelt less on sentences with short frequent keywords than on sentences with long frequent keywords. As expected, LHIPA decreased over sentences with low-frequency short keywords, suggesting more cognitive effort at earlier stages of reading ability. This finding shows the utility of LHIPA as a measure of cognitive effort in education.
Eye-tracking is a critical source of information for understanding human behavior and developing future mixed-reality technology. Eye-tracking enables applications that classify user activity or predict user intent. However, eye-tracking datasets collected during common virtual reality tasks have also been shown to enable unique user identification, which creates a privacy risk. In this paper, we focus on the problem of user re-identification from eye-tracking features. We adapt standardized privacy definitions of k-anonymity and plausible deniability to protect datasets of eye-tracking features, and evaluate performance against re-identification by a standard biometric identification model on seven VR datasets. Our results demonstrate that re-identification goes down to chance levels for the privatized datasets, even as utility is preserved to levels higher than 72% accuracy in document type classification.
In this paper, we present a new approach for decomposing scan paths and demonstrate its utility for generating new scan paths. For this purpose, we apply the K-Means clustering procedure to the raw gaze data and then iteratively to the resulting clusters to find further sub-clusters. The clusters found at each level of the hierarchy are grouped, and the most important principal components are computed from the data contained in them. Using this tree hierarchy and the principal components, new scan paths can be generated that match the human behavior of the original data. We show that this generated data is very useful for generating new data for scan path classification but can also be used to generate fake scan paths. Code can be downloaded here https://atreus.informatik.uni-tuebingen.de/seafile/d/8e2ab8c3fdd444e1a135/?p=%2FHPCGen&mode=list.
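The decomposition and generation idea could be sketched as follows (not the released HPCGen code; cluster counts, recursion depth, and the sampling scheme are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def hierarchical_kmeans(points, k=3, depth=2, min_points=30):
    """Return leaf clusters of a recursive K-Means decomposition of gaze points."""
    if depth == 0 or len(points) < max(k, min_points):
        return [points]
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(points)
    leaves = []
    for c in range(k):
        leaves.extend(hierarchical_kmeans(points[labels == c], k, depth - 1, min_points))
    return leaves

def sample_like(cluster, n=10):
    """Generate synthetic gaze points along a cluster's principal components."""
    pca = PCA(n_components=2).fit(cluster)
    coeffs = np.random.randn(n, 2) * np.sqrt(pca.explained_variance_)
    return pca.inverse_transform(coeffs)

gaze = np.random.rand(500, 2)   # placeholder raw gaze data (x, y)
synthetic = [sample_like(c) for c in hierarchical_kmeans(gaze)]
```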
We are constantly exposed to visually rich, oftentimes cluttered, environments. Previous studies have demonstrated the negative effects of clutter on visual search behavior and various oculomotor metrics. However, little is known about the consequences of clutter on other cognitive processes, like learning and memory. In the present study, we explored the effects of scene clutter on gaze behavior during a learning task and whether these gaze patterns influenced memory performance in a later cued recall task. Using spatial density analysis, we found that a higher degree of scene clutter resulted in more dispersed gaze behavior during the learning task. Additionally, participants recalled target locations less precisely in cluttered than in uncluttered scenes during the recall task. These findings have important implications for theories linking exploratory viewing with memory performance as well as for making recommendations on how interior spaces could be better organized to facilitate daily living.
Patterns of scene exploration can be characterized by two modes of processing: an ambient mode, serving global spatial orientation in the visual environment, followed by a focal mode, facilitating a more elaborated and object-centered analysis. Previous studies suggest that shifts in these processing modes can also be identified in relation to changes in dynamic environments based on eye movement parameters. For the analysis of these changes in eye movement behavior, the coefficient K has been suggested. Here, we used a multi-step task, in which participants needed to perform a series of sub-tasks, to investigate whether shifts in ambient/focal attention can also be identified in complex task solving. While differences in eye movements and in the coefficient K were observed when switching between sub-tasks, the overall temporal dynamics in visual attention remained stable.
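For reference, the coefficient K is commonly computed as the mean difference between the z-scored fixation duration and the z-scored amplitude of the following saccade (after Krejtz et al.); a minimal sketch:

```python
import numpy as np

def coefficient_k(fix_durations, saccade_amplitudes):
    """K > 0 suggests focal, K < 0 ambient processing."""
    d = np.asarray(fix_durations[:-1], float)    # fixation i
    a = np.asarray(saccade_amplitudes, float)    # saccade following fixation i
    zd = (d - d.mean()) / (d.std() + 1e-12)
    za = (a - a.mean()) / (a.std() + 1e-12)
    return float(np.mean(zd - za))

durations = [180, 220, 600, 540, 200]   # ms (illustrative)
amplitudes = [8.0, 7.5, 1.2, 1.0]       # deg, one per inter-fixation saccade
print(coefficient_k(durations, amplitudes))
```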
Conventional eye-tracking methods require NIR-LEDs at the corners and edges of displays as references. However, extensive eyeball rotation results in the loss of reflections. Therefore, we propose imperceptible markers that can be dynamically displayed using liquid crystals. Using the characteristics of polarized light, the imperceptible markers are shown on a screen as references for eye-tracking. Additionally, the marker positions can be changed using the eyeball pose from the previous frame. The point-of-gaze (PoG) was determined using the imperceptible markers based on model-based eye gaze estimation. The accuracy of the PoG estimated using the imperceptible markers was approximately 1.69°, higher than that obtained using NIR-LEDs. Through experiments, we confirmed the feasibility and effectiveness of relocating imperceptible markers on the screen.
Modern appearance-based gaze tracking algorithms require vast amounts of training data, with images of a viewer annotated with “ground truth” gaze direction. The standard approach to obtain gaze annotations is to ask subjects to fixate at specific known locations, then use a head model to determine the location of “origin of gaze”. We propose using an IR gaze tracker to generate gaze annotations in natural settings that do not require the fixation of target points. This requires prior geometric calibration of the IR gaze tracker with the camera, such that the data produced by the IR tracker can be expressed in the camera’s reference frame. This contribution introduces a simple tracker/camera calibration procedure based on the PnP algorithm and demonstrates its use to obtain a full characterization of gaze direction that can be used for ground truth annotation.
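A minimal sketch of such a PnP-based tracker/camera calibration using OpenCV; the correspondence points, intrinsics, and frame names below are placeholders, not the paper's data:

```python
import numpy as np
import cv2

# 3D reference points expressed in the IR tracker's coordinate system (metres)
object_points = np.array([[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [0.1, 0.1, 0],
                          [0.05, 0.05, 0.05]], dtype=np.float32)
# Their observed 2D projections in the camera image (pixels)
image_points = np.array([[320, 240], [420, 242], [318, 140], [421, 138],
                         [370, 190]], dtype=np.float32)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float32)  # intrinsics
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
R, _ = cv2.Rodrigues(rvec)   # rotation: tracker frame -> camera frame

def tracker_to_camera(p_tracker):
    """Express a 3D point (e.g. a gaze origin reported by the IR tracker) in the camera frame."""
    return R @ np.asarray(p_tracker, dtype=np.float32) + tvec.ravel()
```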
Surveys are a widely used method for data collection from participants. However, responding to surveys is a time-consuming task and requires cognitive and physical effort from the participants. Eye-based interactions offer the advantages of high-speed pointing, low physical effort, and implicitness. These advantages have already been successfully leveraged in different domains, but so far have not been investigated for supporting participants in responding to surveys. In this paper, we present EyeLikert, a tool that enables users to answer Likert-scale questions in surveys with their eyes. EyeLikert integrates three different eye-based interactions considering the Midas Touch problem. We hypothesize that enabling eye-based interactions to fill out surveys offers the potential to reduce physical effort, increase the speed of responding to questions, and thereby reduce drop-out rates.
Gaze calibration is common in traditional infrared oculographic eye tracking. However, it is not well studied in visible-light mobile/remote eye tracking. We developed a lightweight real-time gaze error estimator and analyzed calibration errors from two perspectives: facial feature-based and Monte Carlo-based. Both methods correlated with gaze estimation errors, but the Monte Carlo method was more strongly associated. Facial feature associations with gaze error were interpretable, relating movements of the face to the visibility of the eye. We highlight the degradation of gaze estimation quality in a sample of children with autism spectrum disorder (as compared to typical adults), and note that calibration methods may reduce Euclidean error by 10%.
We used a gaze-contingent eye-tracking setup to investigate how peripheral vision before the saccade affects post-saccadic foveal processing. Studies have revealed robust changes in foveal processing when the target is available in peripheral vision (the extrafoveal preview effect). To further characterize the role of peripheral vision, we adopted a paradigm where an upright/inverted extrafoveal face stimulus was shown and changed orientation (invalid preview) on 50% of trials during the saccade. Invalid preview significantly reduced post-saccadic discrimination performance compared to valid preview (i.e., the preview effect). In addition, the preview face varied in eccentricity and in the amount of added noise, which affected its visibility. Face visibility was operationalized by a lateralized face identification task, run in a separate session. A mixed model analysis suggests that visibility modulated the preview effect. Overall, these findings constrain theories of how preview effects might influence perception under natural viewing conditions.
A distinctive characteristic of human driver behavior is the spatial bias of gaze allocation toward the vanishing point of the road. This behavior can be evaluated by comparing fixation maps against a spatial-bias baseline using typical metrics such as the Pearson’s Correlation Coefficient (CC) and the Kullback-Leibler divergence (KL). CC and KL penalize false positives and negatives differently, which implies that they can be affected by the characteristics of the baseline. In this paper, we analyze the use of CC and KL for the evaluation of drivers’ fixation maps against two types of spatial-bias baselines: baselines obtained from recorded fixation maps (data-based) and 2D-Gaussian baselines (function-based). Our results indicate that the use of CC can lead to misleading interpretations due to single fixations outside of the spatial bias area when compared to data-based baselines. Thus, we argue that KL and CC should be considered simultaneously under specific modeling assumptions.
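For reference, the two metrics compare a fixation map with a baseline map roughly as below; note that toolkits differ in which distribution is placed first in the KL term, so this is only one convention:

```python
import numpy as np

def pearson_cc(fix_map, baseline):
    """Pearson correlation between the two maps, treated as flattened vectors."""
    a = (fix_map - fix_map.mean()) / (fix_map.std() + 1e-12)
    b = (baseline - baseline.mean()) / (baseline.std() + 1e-12)
    return float(np.mean(a * b))

def kl_divergence(fix_map, baseline, eps=1e-12):
    """KL divergence after normalising both maps to probability distributions."""
    p = fix_map / (fix_map.sum() + eps)
    q = baseline / (baseline.sum() + eps)
    return float(np.sum(p * np.log(p / (q + eps) + eps)))

fix = np.random.rand(48, 64)                               # placeholder fixation map
base = np.exp(-np.linspace(-3, 3, 64)[None, :] ** 2)       # toy function-based bias
base = np.repeat(base, 48, axis=0)
print(pearson_cc(fix, base), kl_divergence(fix, base))
```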
Eye-tracking is a key technology for future retinal projection based AR glasses as it enables techniques such as foveated rendering or gaze-driven exit pupil steering, both of which increase the system’s overall performance. However, two of the major challenges video oculography systems face are robust gaze estimation in the presence of glasses slippage, paired with the necessity of frequent sensor calibration. To overcome these challenges, we propose a novel, calibration-free eye-tracking sensor for AR glasses based on a highly transparent holographic optical element (HOE) and a laser scanner. We fabricate a segmented HOE generating two stereo images of the eye-region. A single-pixel detector in combination with our stereo reconstruction algorithm is used to precisely calculate the gaze position. In our laboratory setup we demonstrate a calibration-free accuracy of 1.35° achieved by our eye-tracking sensor, highlighting the sensor’s suitability for consumer AR glasses.
Eye movement analysis has great potential for understanding operator behaviour in many safety-critical domains, including aviation. In addition to traditional eye-tracking measures of pilots’ visual behavior, it seems promising to incorporate machine learning approaches to classify pilots’ visual scanning patterns. However, given the multitude of pattern measures, it is unclear which are better suited as predictors. In this study we analyzed the visual behaviour of eight pilots flying different flight phases in a moving-base flight simulator. With this limited dataset, we present a methodological approach to train linear Support Vector Machine models using different combinations of attention ratio and scanning pattern features. The results show that the overall accuracy of classifying the pilots’ visual behaviour in different flight phases improves from 51.6% to 64.1% when combining attention ratio and instrument scanning sequence in the classification model.
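A hedged sketch of the modeling step, with randomly generated stand-ins for the attention-ratio and scanning-pattern features (the real feature definitions are in the paper):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X = np.random.rand(200, 10)             # e.g. AOI attention ratios + transition counts
y = np.random.randint(0, 4, size=200)   # flight phase labels

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))   # linear SVM
print(cross_val_score(clf, X, y, cv=5).mean())                # chance level ~0.25 on random data
```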
Gaze patterns are known to be highly individual, and therefore eye movements can serve as a biometric characteristic. We explore aspects of the fairness of biometric identification based on gaze patterns. We find that while oculomotoric identification does not favor any particular gender and does not significantly favor any age range, it is unfair with respect to ethnicity. Moreover, fairness concerning ethnicity cannot be achieved by balancing the training data for the best-performing model.
Eye movements in reading are known to reflect cognitive processes involved in reading comprehension at all linguistic levels, from the sub-lexical to the discourse level. This means that reading comprehension and other properties of the text and/or the reader should be possible to infer from eye movements. Consequently, we develop the first neural sequence architecture for this type of task, which models scan paths in reading and incorporates lexical, semantic, and other linguistic features of the stimulus text. Our proposed model outperforms state-of-the-art models in various tasks. These include inferring reading comprehension or text difficulty, and assessing whether the reader is a native speaker of the text’s language. We further conduct an ablation study to investigate the impact of each component of our proposed neural network on its performance.
We present a novel view on gaze event classification by redefining it as a time-to-event problem. In contrast to previous models, which consider the classification as discrete events, our redefinition allows for estimating the remaining time until the next saccade event. Therefore, we provide a feature analysis and an initial solution for compensating the latency of wearable eye-trackers built into today’s head-mounted displays. Similar to previous classifiers, we utilize oculomotor features such as velocity, acceleration, and event durations. In total, we analyze 104 extracted features of three datasets and apply different regression methods. We identify optimal window sizes for each feature and extract the importance of all extracted windows using recursive feature elimination. Afterwards, we evaluate the performance of all regressors using the previously selected features. We show that our selected regressors can predict the time-to-event better than the baseline, indicating the potential usage of time-to-event prediction of saccades.
Cloud gaming (CG) is a new approach to deliver a high-quality gaming experience to gamers anywhere, anytime, and on any device. To achieve this goal, CG requires high bandwidth, which is still a major challenge. Much existing research has focused on modeling or predicting the players’ Visual Attention Map (VAM) and allocating bitrate accordingly. Although studies indicate that both audio and video modalities influence human perception, few studies have considered audio impacts in cloud-based attention models. This paper demonstrates that the audio features in video games change the players’ VAMs in various game scenarios. Our findings indicate that incorporating game audio improves the accuracy of the predicted attention maps by 13% on average compared to the previous VAMs generated based on visual saliency by the Game Attention Model for CG. The audio impact is more evident in video games with fewer visual components or indicators on the screen.
In online lectures, showing an on-screen instructor gained popularity amid the Covid-19 pandemic. However, evidence in favor of this is mixed: instructors draw attention and may distract from the content. In contrast, signaling (e.g., with a digital pointer) provides known benefits for learners. However, the effects of signaling have only been researched in the absence of an on-screen instructor. In the present explorative study, we investigated the effects of an on-screen instructor on the division of learners' attention; specifically, on following a digital pointer signal with their gaze. The presence of an instructor led to an increased number of fixations in the presenter area. This affected neither learning outcomes nor gaze patterns following the pointer. The average distance between the learner's gaze and the pointer position predicts the student's quiz performance, independent of the presence of an on-screen instructor. This can also help in creating automated immediate-feedback systems for educational videos.
Calibration is performed in eye-tracking studies to map raw model outputs to gaze-points on the screen and improve accuracy of gaze predictions. Calibration parameters, such as user-screen distance, camera intrinsic properties, and position of the screen with respect to the camera can be easily calculated in controlled offline setups; however, their estimation is non-trivial in unrestricted, online experimental settings. Here, we propose the application of deep learning models for eye-tracking in online experiments, providing suitable strategies to estimate calibration parameters and perform personal gaze calibration. Focusing on fixation accuracy, we compare results with respect to calibration frequency, the time point of calibration during data collection (beginning, middle, end), and calibration procedure (fixation-point or smooth pursuit-based). Calibration using fixation and smooth pursuit tasks, pooled over three collection time points, resulted in the best fixation accuracy. By combining device calibration, gaze calibration, and the best-performing deep learning model, we achieve an accuracy of 2.58°, a considerable improvement over reported accuracies in previous online eye-tracking studies.
A multitude of eye-trackers are available differing in cost and usability. Accuracy and precision provide a benchmark for performance, however, parameters given by manufacturers are often recorded under perfect conditions and may not apply to lab conditions with human subjects. This preregistered study compared three remote eye-trackers under realistic lab conditions: The high-cost EyeLink 1000+, the medium-cost Tobii Pro X3-120, and the low-cost Gazepoint GP3 HD Desktop. Participants (N = 25) completed an extensive test battery twice with each device. Parameters were compared within-subjects using robust linear mixed models. The EyeLink performed better than the Tobii Pro X3-120 and GP3 HD in most categories (accuracy EyeLink: 0.436°; Tobii Pro X3-120: 1.22°; GP3 HD: 1.03°), however, with suboptimal head positions, the Tobii Pro X3-120’s and GP3 HD's accuracies equaled or exceeded the EyeLink's. Tobii Pro X3-120 and GP3 HD performed comparably. Benchmark information, obtained under various realistic recording conditions, is provided for each device.
In this work, we investigate the hypothesis that adding 3D information of the periocular region to an end-to-end gaze-estimation network can improve gaze-estimation accuracy in the presence of slippage, which occurs quite commonly for head-mounted AR/VR devices. To this end, using UnityEyes we generate a simulated dataset with RGB and depth-maps of the eye with varying camera placement to simulate slippage artifacts. We generate different noise profiles for the depth-maps to simulate depth sensor noise artifacts. Using this data, we investigate the effects of different fusion techniques for combining image and depth information for gaze estimation. Our experiments show that under an attention-based fusion scheme, 3D information can significantly improve gaze estimation and compensate well for slippage-induced variability. Our findings support augmenting 2D cameras with depth sensors for the development of robust end-to-end appearance-based gaze-estimation systems.
We present a method for skill characterisation of sonographer gaze patterns while performing routine second trimester fetal anatomy ultrasound scans. The position and scale of fetal anatomical planes during each scan differ because of fetal position, movements and sonographer skill. A standardised reference is required to compare recorded eye-tracking data for skill characterisation. We propose using an affine transformer network to localise the anatomy circumference in video frames, for normalisation of eye-tracking data. We use an event-based data visualisation, time curves, to characterise sonographer scanning patterns. We chose brain and heart anatomical planes because they vary in levels of gaze complexity. Our results show that when sonographers search for the same anatomical plane, even though the landmarks visited are similar, their time curves display different visual patterns. Brain planes also, on average, have more events or landmarks occurring than the heart, which highlights anatomy-specific differences in searching approaches.
Visualising patterns in clinicians’ eye movements while interpreting fetal ultrasound imaging videos is challenging. Across and within videos, there are differences in size and position of Areas-of-Interest (AOIs) due to fetal position, movement and sonographer skill. Currently, AOIs are manually labelled or identified using eye-tracker manufacturer specifications which are not study specific. We propose using unsupervised clustering to identify meaningful AOIs and bi-contour plots to visualise spatio-temporal gaze characteristics. We use Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to identify the AOIs, and use their corresponding images to capture granular changes within each AOI. Then we visualise transitions within and between AOIs as read by the sonographer. We compare our method to a standardised eye-tracking manufacturer algorithm. Our method captures granular changes in gaze characteristics which are otherwise not shown. Our method is suitable for exploratory data analysis of eye-tracking data involving multiple participants and AOIs.
We compare the measurement error and validity of webcam-based eye tracking to that of a remote eye tracker as well as software integration of both. We ran a study with n = 83 participants, consisting of a point detection task and an emotional visual search task under three between-subjects experimental conditions (webcam-based, remote, and integrated). We analyzed location-based (e.g., fixations) and process-based eye tracking metrics (ambient-focal attention dynamics). Despite higher measurement error of webcam eye tracking, our results in all three experimental conditions were in line with theoretical expectations. For example, time to first fixation toward happy faces was significantly shorter than toward sad faces (the happiness-superiority effect). As expected, we also observed the switch from ambient to focal attention depending on complexity of the visual stimuli. We conclude that webcam-based eye tracking is a viable, low-cost alternative to remote eye tracking.
We investigate the hypothesis that gaze-signal can improve egocentric action recognition on the standard benchmark, the EGTEA Gaze++ dataset. In contrast to prior work where gaze-signal was only used during training, we formulate a novel neural fusion approach, Cross-modality Attention Blocks (CMA), to leverage gaze-signal for action recognition during inference as well. CMA combines information from different modalities at different levels of abstraction to achieve state-of-the-art performance for egocentric action recognition. Specifically, fusing the video stream with optical flow using CMA outperforms the current state-of-the-art by 3%. However, when CMA is employed to fuse gaze-signal with video-stream data, no improvements are observed. Further investigation of this counter-intuitive finding indicates that small spatial overlap between the network’s attention-map and the gaze ground-truth renders the gaze-signal uninformative for this benchmark. Based on our empirical findings, we recommend improvements to the current benchmark to develop practical systems for egocentric video understanding with gaze-signal.
Adaptive learning systems analyse a learner's input and respond on the basis of it, for example by providing individual feedback or selecting appropriate follow-up tasks. To provide good feedback, such a system must have a high diagnostic capability. Collecting gaze data alongside the traditional data obtained through mouse and keyboard input seems to be a promising approach for this. We use the example of graphical differentiation to investigate whether and how the integration of eye tracking data into such a system can succeed. For this purpose, we analyse students' eye tracking data and gather empirical understanding about which measures are suitable as decision support for adaptation.
Real-Time Advanced Eye Movements Analysis Pipeline (RAEMAP) is an advanced pipeline to analyze traditional positional gaze measurements as well as advanced eye gaze measurements. The proposed implementation of RAEMAP includes real-time analysis of fixations, saccades, gaze transition entropy, and low/high index of pupillary activity. RAEMAP will also provide visualizations of fixations, fixations on AOIs, heatmaps, and dynamic AOI generation in real-time. This paper outlines the proposed architecture of RAEMAP.
The human gaze characteristics provide informative cues on human behavior during various activities. Using traditional eye trackers, assessing gaze characteristics in the wild requires a dedicated device per participant and therefore is not feasible for large-scale experiments. In this study, we propose a commodity hardware-based multi-user eye-tracking system. We leverage the recent advancements in Deep Neural Networks and large-scale datasets for implementing our system. Our preliminary studies provide promising results for multi-user eye-tracking on commodity hardware, providing a cost-effective solution for large-scale studies.
Eye-tracking has been used in various domains, including human-computer interaction, psychology, and many others. Compared to commercial eye trackers, eye tracking using off-the-shelf cameras has many advantages, such as lower cost, pervasiveness, and mobility. Quantifying human attention on mobile devices is invaluable in human-computer interaction. Dynamic visual stimuli, such as videos and mobile games, require higher attention than static visual stimuli such as web pages and images. This research aims to develop an accurate eye-tracking algorithm using the front-facing camera of mobile devices to identify human attention hotspots when viewing video-type content. The shortage of computational power in mobile devices becomes a challenge to achieving higher user satisfaction. Edge computing moves processing power closer to the source of the data and reduces the latency introduced by cloud computing. Therefore, the proposed algorithm will be extended with mobile edge computing to provide a real-time eye-tracking experience for users.
In this paper, we first position the current dwell selection among gaze-based interactions and its advantages against head-gaze selection, which is the mainstream interface for HMDs. Next, we show how dwell selection and head-gaze selection are used in an actual interaction situation. By comparing these two selection methods, we describe the potential of dwell selection as an essential AR/VR interaction.
Reading plays a vital role in updating researchers on recent developments in their field, including but not limited to solutions to various problems and collaborative studies between disciplines. Prior studies have found that reading patterns vary depending on the reader's level of expertise in the content of the document. We present a pilot study of eye-tracking measures during a reading task with participants across different areas of expertise, with the intention of characterizing reading patterns using both eye movement and pupillary information.
The topic of visual saliency is well-known and spans numerous disciplines. In this paper, we examine how saliency models perform on images with specific characteristics. We explore the saliency of 14 different digitized paintings, all representing the biblical scene of The Last Supper. We evaluate the performance of different saliency models, covering both traditional and deep learning approaches. For the evaluation, we use three different metrics: AUC, NSS, and CC. As ground truth, we use eye-tracking data from 35 participants. Our analysis shows that deep-learning methods predict the most salient parts of the paintings very similarly to real eye fixations. We also checked the consistency of gaze patterns among the participants.
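For reference, simplified versions of the evaluation metrics are shown below; CC is the Pearson correlation of the two maps, and the AUC variant here is a plain fixation-vs-non-fixation ROC AUC, one of several AUC definitions used in the saliency literature:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def nss(saliency, fixation_mask):
    """Normalized Scanpath Saliency: mean of the z-scored map at fixated pixels."""
    s = (saliency - saliency.mean()) / (saliency.std() + 1e-12)
    return float(s[fixation_mask.astype(bool)].mean())

def auc(saliency, fixation_mask):
    """Score the saliency map as a classifier of fixated vs. non-fixated pixels."""
    return roc_auc_score(fixation_mask.astype(int).ravel(), saliency.ravel())

# CC is simply the Pearson correlation between model map and fixation map,
# e.g. np.corrcoef(model.ravel(), gt.ravel())[0, 1].

sal = np.random.rand(60, 80)            # placeholder model prediction
fix = np.random.rand(60, 80) > 0.995    # placeholder binary fixation locations
print(nss(sal, fix), auc(sal, fix))
```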
Aircraft maintenance technicians (AMTs) play an essential role in the life-long safety of helicopters. There are two major types of operations in maintenance activity: information intake/processing and motor actions. Modeling the expertise of the AMT is the main objective of this doctoral project. Given the constraints of real-world research, mobile eye-tracking appears to be an essential tool for measuring information intake, notably concerning the use of maintenance documentation during maintenance task preparation and execution. This extended abstract presents the main research objectives, our approach and methodology, and some preliminary results.
The viability of and need for eye movement-based authentication has been well established in light of the recent adoption of Virtual Reality headsets and Augmented Reality glasses. Previous research has demonstrated the practicality of eye movement-based authentication, but there still remains room for improvement in achieving higher identification accuracy. In this study, we focus on incorporating linguistic features in eye movement-based authentication, and we compare our approach to authentication based purely on common first-order metrics across 9 machine learning models. Using GazeBase, a large eye movement dataset with 322 participants, and the CELEX lexical database, we show that the AdaBoost classifier is the best-performing model, with an average F1 score of 74.6%. More importantly, we show that the use of linguistic features increased the accuracy of most classification models. Our results provide insights on the use of machine learning models and motivate more work on incorporating text analysis in eye movement-based authentication.
We propose machine learning models that recognize states of non-concentration from eye-gaze data in order to increase productivity. The experimental results show that a Random Forest classifier with a 12 s window can distinguish the states with an F1-score of more than 0.9.
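A minimal sketch of the described setup, with illustrative window features and synthetic data standing in for the real gaze traces and labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def window_features(gaze_xy, fs=60, window_s=12):
    """Per-window summary features from an (N, 2) gaze trace sampled at fs Hz."""
    n = fs * window_s
    feats = []
    for start in range(0, len(gaze_xy) - n + 1, n):
        w = gaze_xy[start:start + n]
        step = np.linalg.norm(np.diff(w, axis=0), axis=1)   # sample-to-sample displacement
        feats.append([w[:, 0].std(), w[:, 1].std(), step.mean(), step.max()])
    return np.array(feats)

gaze = np.cumsum(np.random.randn(60 * 600, 2), axis=0)   # 10 min synthetic trace
X = window_features(gaze)
y = np.random.randint(0, 2, size=len(X))                 # placeholder concentration labels
print(cross_val_score(RandomForestClassifier(n_estimators=200), X, y,
                      cv=5, scoring="f1").mean())
```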
Fluid intelligence is considered to be the foundation of many aspects of human learning and performance. Individuals’ behavior while solving intelligence tests is therefore an important component in understanding problem-solving strategies and learning processes. We present preliminary results of a novel eye-tracking-based approach to predict participants’ decisions while solving a fluid intelligence test that utilizes semantic scanpath comparisons. Normalizing scanpaths and applying a k-NN classifier allows us to make individual predictions and combine them to predict final scores. We evaluated our proposed approach on the TüEyeQ dataset published by Kasneci et al., containing data from 315 university students who worked on the Culture Fair Intelligence Test. Our approach was able to explain 39.207% of the variance in the final score, and predictions for participants’ final scores showed a correlation of τ = 0.65759 with participants’ actual scores. Overall, the proposed method has shown great potential that can be expanded on in future research.
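A rough sketch of the normalize-and-classify step; the purely geometric resampling below stands in for the semantic scanpath comparison described in the abstract, and all parameter values are assumptions:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def normalize_scanpath(fixations_xy, n_points=20):
    """Resample an (N, 2) fixation scanpath to n_points and scale coordinates to [0, 1]."""
    f = np.asarray(fixations_xy, float)
    f = (f - f.min(axis=0)) / (np.ptp(f, axis=0) + 1e-12)
    idx = np.linspace(0, len(f) - 1, n_points)
    return np.stack([np.interp(idx, np.arange(len(f)), f[:, d]) for d in range(2)],
                    axis=1).ravel()

scanpaths = [np.random.rand(np.random.randint(5, 30), 2) for _ in range(100)]
X = np.array([normalize_scanpath(sp) for sp in scanpaths])
y = np.random.randint(0, 2, size=100)    # placeholder per-item correct/incorrect labels
print(KNeighborsClassifier(n_neighbors=5).fit(X[:80], y[:80]).score(X[80:], y[80:]))
```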
Webcam-based eye-tracking promises easy and quick data collection without the need for specific or additional eye-tracking hardware. This makes it especially attractive for educational research, in particular for modern formats such as MOOCs. However, in order to fulfill its promises, webcam-based eye tracking has to overcome several challenges, most importantly varying spatial and temporal resolutions. A further challenge, particularly in the educational domain, is that typically individual students are of interest rather than average values. In this paper, we explore whether an attention measure based on the eye movement synchronicity of a group of students can be applied to unreliably-sampled data. In doing so, we aim to reproduce earlier work showing that, on average, eye movement synchronicity can predict performance in a comprehension quiz. We were not able to reproduce the findings with unreliably-sampled data, which highlights the challenges that lie ahead for webcam-based eye tracking in practice.
This pilot study analyzes the reading patterns of 15 German students while receiving instant messages through a smartphone, imitating an online conversation. With this pilot study, we aim to test the eye-tracking setup and methodology employed, in which we analyze specifically the moment in which participants return to the reading after answering the instant messages. We explore the relationships with reading comprehension performance and differences across readers, considering individual differences regarding reading habits and multitasking behavior.
Mind wandering (MW) is defined as a shift of attention to task-unrelated internal thoughts that is pervasive and disruptive for learning performance. Current state-of-the-art gaze-based attention-aware intelligent systems are capable of detecting MW from eye movements and delivering interventions to mitigate its negative effects. However, the beneficial functions of MW and its trait-level tendency, defined as the content of MW experience, are still largely neglected by these systems. In this pilot study, we address the questions of whether different MW trait-level tendencies can be detected through off-screen fixations’ frequency and duration and blink rate during a lecture viewing task. We focus on prospective planning and creative problem-solving as two of the main MW trait-level tendencies. Despite the non-significance, the descriptive values show a higher frequency and duration of off-screen fixations, but lower blink rate, in the creative problem-solving MW condition. Interestingly, we do find a highly significant correlation between MW level and engagement scores in the prospective planning MW group. Potential explanations for the observed results are discussed. Overall, these findings represent a preliminary step towards the development of more accurate and adaptive learning technologies, and call for further studies on MW trait-level tendency detection.
Video-based online educational content has become increasingly popular. However, due to the limited communication and interaction between learners and instructors, various problems regarding learning performance have occurred. Gaze sharing techniques have received much attention as a means to address this problem; however, there still exists a lot of room for improvement. In this work-in-progress paper, we introduce some possible improvements regarding gaze visualization strategies and report the preliminary results of our first step towards our final goal. Through a user study with 30 university students, we confirmed the feasibility of the prototype system and identified future directions for our research.
Eye Tracking is getting increasingly popular in mathematics education research. However, there is a debate about the specific benefits that eye tracking offers in mathematics educational research, and about its reliability and validity to assess relevant cognitive processes. The aim of the present study is to investigate whether eye tracking allows valid investigation of teachers’ assessment competencies in comparison with retrospective think-aloud. Specifically, we are interested in the cognitive processes underlying teachers’ assessment of incorrect student solutions. For this purpose, we developed ten vignettes that display written student work on fraction problems, each including a mistake; the display of student work is followed by three potential teacher reactions. Participants have to choose the teacher reaction they consider most appropriate. The vignettes have already been validated and piloted. We are currently planning an explorative qualitative eye tracking study with pre-service and in-service teachers (N = 6). This study will provide the opportunity to explore the relation between eye tracking data and think-aloud protocols regarding cognitive processes underlying diagnostic assessment. More generally, the results will contribute to a better understanding of the interpretation of eye tracking data in teacher education research.
Educational systems are based on the teaching inputs from teachers and the learning outputs from the pupils/students. Moreover, the surrounding environment as well as the social activities and networks of the involved people have an impact on the success of such a system. However, most systems use standard teaching equipment, making it difficult to gain knowledge and insights about the fine-grained spatio-temporal activities in the classroom. On the other hand, understanding why an educational system is not as successful as expected can hardly be achieved by just generating statistics about the learning outputs based on several student-related metrics. In this paper we discuss the benefits of using eye tracking as a powerful technology for tracking people’s visual attention in order to explore the value of an educational system; however, we also take a look at the drawbacks, which come in the form of data recording, storing, and processing, and finally data analysis and visualization to rapidly gain insights into such large datasets, among many others. We argue that if eye tracking is applied in a clever way, an educational system might draw valuable conclusions to improve itself from several perspectives, be it under the light of online/remote or classroom teaching, or from the perspectives of teachers and pupils/students.
Increasingly, videogame designers harness game usability techniques to inform design decisions and reduce costs, but advanced techniques are not yet commonplace, perhaps due to a failure to see their benefits or a lack of expertise and facilities. In this paper, we present a case study that demonstrates the value of using eye-tracking to help guide the design process of a narrative game by an independent game studio. The designers of this studio had defined four options for how text should be presented to players (Horizontal, Vertical, Subtitle, and Messenger) and were deadlocked on what to choose. As part of a collaboration with a university, a within-subjects eye-tracking study was conducted with 15 participants to evaluate the options. Combining the eye-tracking data with stated user preferences, designers reached a consensus that the game in question benefits from strategies derived from messenger-style smartphone interaction design, later confirmed with statistical analysis of fixation transitions on elements of interest on screen. The use of eye-tracking broke the deadlock and helped inform a final design decision, as demonstrated by a decision-making impact analysis that describes the design process. The paper ends with a reflection on the applicability of this academia-industry model to other contexts.
This paper describes a virtual reality prototype’s game and level design that should serve as stimulus material to arouse confusion among players. In doing so, we will use the stimulus to create a system that analyzes the players’ gaze behavior and determines whether and when the player is confused or already frustrated. Consequently, this information can then be used by game and level designers to give the player appropriate assistance and guidance during gameplay. To reach our goal of creating a design that arouses confusion, we used guidelines for high-quality level design and did the exact opposite. We see the PLEY Workshop as a forum to identify and discuss the potentials and risks of the stimulus’ design and level structure. In general, the paper should provide insights into our design decisions for researchers interested in investigating gaze behavior in games.
Maintaining autonomous activities can be challenging for patients with neuromuscular disorders or quadriplegia, for whom control of joysticks for powered wheelchairs may not be feasible. Advancements in human-machine interfaces have resulted in methods to capture the intentionality of the individual through non-traditional controls and to communicate the user's desires to a robotic interface. This research explores the design of a training game that teaches users to control a wheelchair through such a device, which utilizes electromyography (EMG). The training game combines the use of EMG and eye tracking to enhance the impression of dignity while building self-efficacy and supporting autonomy for users. The system implements both eye tracking and surface electromyography, via the temporalis muscles, for gamified training and simulation of a novel wheelchair interface.
Eye movement data can be used for a variety of research in marketing, advertisement, and other design-related industries to gain interesting insights into customer preferences. However, interpreting such data can be a challenging task due to its spatio-temporal complexity. In this paper we describe a web-based tool that has been developed to provide various visualizations for interpreting eye movement data of static stimuli. The tool provides several techniques to visualize and analyze eye movement data. These visualizations are interactive and linked in a coordinated way to help gain more insights. Overall, this paper illustrates the features and functionality offered by the tool, using data recorded from transport map readers in a previously conducted experiment as a use case. Furthermore, the paper discusses limitations of the tool and possible future developments.
Day and night modes are widely used when working with any digital device, including map navigation. Many users have the switch between day and night mode set to happen automatically. However, it has not been proven whether this functionality helps improve the transfer of information between the map and the user when the lighting conditions change. This short paper aims to evaluate the influence of day and night modes on map users’ perception. User testing was realised in an eye-tracking laboratory with 43 participants. The participants were categorised by the average number of hours spent driving per week and their use of map navigation. The eye-tracking experiment focuses on how participants orient themselves in the day and night modes of the map view when the lighting conditions change. For that, the Euro Truck Simulator game environment was chosen, where the participants were guided by the map navigation in the bottom right corner of the screen. The lighting conditions in the eye-tracking laboratory were adjusted to match daytime and nighttime conditions as realistically as possible, and the map navigation was switched between day and night mode. The exploratory research suggests that using day mode during nighttime may cause disorientation and dazzle; using night mode during daytime does not cause that problem, but the user’s perception is slightly slower.
We propose a three-step concept and visual design for supporting the visual exploration of high-dimensional data in scatterplots through eye-tracking. First, we extract subsets in the underlying data using existing classifications, automated clustering algorithms, or eye-tracking. For the latter, we map gaze to the underlying data dimensions in the scatterplot. Clusters of data points that have been the focus of the viewers’ gaze are marked as clusters of interest (eye-mind hypothesis). In a second step, our concept extracts various properties from statistics and scagnostics from the clusters. The third step uses these measures to compare the current data clusters from the main scatterplot to the same data in other dimensions. The results enable analysts to retrieve similar or dissimilar views as guidance to explore the entire data set. We provide a proof-of-concept implementation as a test bench and describe a use case to show a practical application and initial results.
Gaze-based analysis of areas of interest (AOIs) is widely used in information visualisation research to understand how people explore visualisations or assess the quality of visualisations concerning key characteristics such as memorability. However, nearby AOIs in visualisations amplify the uncertainty caused by the gaze estimation error, which strongly influences the mapping between gaze samples or fixations and different AOIs. We contribute a novel investigation into gaze uncertainty and quantify its impact on AOI-based analysis on visualisations using two novel metrics: the Flipping Candidate Rate (FCR) and Hit Any AOI Rate (HAAR). Our analysis of 40 real-world visualisations, including human gaze and AOI annotations, shows that gaze uncertainty frequently and significantly impacts the analysis conducted in AOI-based studies. Moreover, we analysed four visualisation types and found that bar and scatter plots are usually designed in a way that causes more uncertainty than line and pie plots in gaze-based analysis.
The relationship between ocular metrics and factor ratings for mental workload is examined using a Bayesian statistical state-space modeling technique. During a visual search task experiment, microsaccade frequency and pupil size were observed as measures of mental workload. Individual mental workloads were measured using a six-factor NASA-TLX rating scale. The models calculated generalized temporal changes of microsaccade frequency and pupil size during tasks. The contributions of factors of mental workload were examined using effect size. In addition, a chronological analysis was introduced to detect the reactions of the metrics during tasks. The results suggest that the response selectivity of microsaccades and pupil size can be used as factors of mental workload.
A dry-electrode head-mounted sensor for visually-evoked electroencephalogram (EEG) signals has been introduced to the gamer market, and provides wireless, low-cost tracking of a user’s gaze fixation on target areas in real-time. Unlike traditional EEG sensors, this new device is easy to set up for non-professionals. We conducted a Fitts’ law study (N = 6) and found the mean throughput (TP) to be 0.82 bits/s. The sensor yielded robust performance with error rates below 1%. The overall median activation time (AT) was 2.35 s, with a minuscule difference between one and nine concurrent targets. We discuss whether the method might supplement camera-based gaze interaction, for example, in gaze typing or wheelchair control, and note some limitations, such as a slow AT, the difficulty of calibration with thick hair, and the limit of 10 concurrent targets.
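For reference, throughput follows the Shannon formulation of Fitts' law, TP = ID/MT with ID = log2(D/W + 1); the distance, width, and movement time below are illustrative values, not the study's raw data:

```python
import math

def index_of_difficulty(distance, width):
    """Shannon formulation of the index of difficulty, in bits."""
    return math.log2(distance / width + 1)

def throughput(distance, width, movement_time_s):
    """Throughput in bits per second."""
    return index_of_difficulty(distance, width) / movement_time_s

# e.g. a target 300 px away and 100 px wide, acquired in 2.44 s:
print(round(throughput(300, 100, 2.44), 2), "bits/s")   # ~0.82
```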
We experimentally tested the idea of reducing the number of buttons in a gaze-based text entry system by replacing all vowels with a single diamond character, which we call a super-vowel. It is inspired by historical optimizations of written language, such as abjads. This way, the number of items on the screen was reduced, simplifying text input and allowing the buttons to be made larger. However, the modification can also be a distractor that increases the number of errors. An experiment with 29 participants showed that, for non-standard methods of entering text, the modification slightly increases text entry speed and reduces the number of errors. However, this does not apply to the standard keyboard, a direct transformation of physical computer keyboards with a QWERTY button layout.
Gaze-aware interfaces should work on all display sizes. This paper investigates whether angular velocity or tangential speed should be kept when scaling a gaze-aware interface based on circular smooth pursuits to another display size. We also address the question of which target speed and which trajectory size feels most comfortable for the users. We present the results of a user study where the participants were asked how they perceived the speed and the radius of a circular moving smooth pursuit target. The data show that the users’ judgment of the optimal speed corresponds with an optimal detection rate. The results also enable us to give an optimal value pair for target speed and trajectory radius. Additionally, we give a functional relation on how to adapt the target speed when scaling the geometry to keep optimal detection rate and user experience.
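The two scaling options differ in which quantity of the relation v = ω·r is held fixed when the trajectory radius r changes; a minimal sketch with illustrative units:

```python
import math

def tangential_speed(angular_velocity_deg_s, radius):
    """On-screen speed (radius units per second) of a target on a circular trajectory."""
    return math.radians(angular_velocity_deg_s) * radius

def angular_velocity(tangential_speed_units_s, radius):
    """Angular velocity (deg/s around the circle) for a given on-screen speed."""
    return math.degrees(tangential_speed_units_s / radius)

# Scaling a trajectory from radius 3 to 6 (e.g. degrees of visual angle):
w = 90.0                               # deg/s around the circle
print(tangential_speed(w, 3.0))        # ~4.71 units/s on the small display
print(tangential_speed(w, 6.0))        # ~9.42 units/s if angular velocity is kept
print(angular_velocity(4.71, 6.0))     # ~45 deg/s if tangential speed is kept
```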
Interacting with a group of people requires directing the attention of the whole group, and thus requires feedback about the crowd’s attention. In face-to-face interactions, head and eye movements serve as indicators of crowd attention. However, when interacting online, such indicators are not available. To substitute this information, gaze visualizations were adapted for a crowd scenario. We developed, implemented, and evaluated four types of visualizations of crowd attention in an online study with 72 participants using lecture videos enriched with the audience’s gazes. All participants reported increased connectedness to the audience, especially for visualizations depicting the whole distribution of gaze including spatial information. Visualizations avoiding spatial overlay by depicting only the variability were regarded as less helpful, for real-time as well as for retrospective analyses of lectures. Improving our visualizations of crowd attention has potential for a broad variety of applications, in all kinds of social interaction and communication in groups.
A user’s free hands provide an intuitive platform to position and design virtual menu interfaces. We explore how the hands and eyes can be integrated in the design of hand-attached menus. We synthesise past work from the literature and derive a design space that crosses properties of menu systems with a hand and eye input vocabulary. From this, we devise three menu systems that are based on the novel concept of Look & Turn: gaze indicates menu selection, and a rotational turn of the wrist navigates the menu and manipulates continuous parameters. Each technique allows users to interact with the hand-attached menu using the same hand, while keeping the other hand free for drawing. Based on a VR prototype that combines eye-tracking and glove-based finger tracking, we discuss first insights on technical and human factors of the promising interaction concept.
Comparing the performance of new eye-tracking devices against an established benchmark is vital for identifying differences in the way eye movements are reported by each device. This paper introduces a new paired data set comprised of eye movement recordings captured simultaneously with both the EyeLink 1000—considered the “gold standard” in eye-tracking research studies—and the recently released AdHawk MindLink eye tracker. Our work presents a methodology for simultaneous data collection and a comparison of the resulting eye-tracking signal quality achieved by each device.
The oculomotor plant mathematical model (OPMM) is a dynamic system that describes the human eye in motion. In this study, we focus on an anatomically inspired homeomorphic model in which every component is a mathematical representation of a certain biological phenomenon of a real oculomotor plant. This approach estimates the internal state of the oculomotor plant from recorded eye movements. In the past, such models were shown to be useful in biometrics and gaze-contingent rendering via eye movement prediction. In previous studies, an implicit underlying assumption was that a set of parameters estimated for a certain subject should remain consistent in time and generalize to unseen data. We note a major drawback of the prior work, as it operated under this assumption without explicit validation. This work creates a quantifiable baseline for the specific OPMM where the generalizability of the model parameters is the foundational property of their estimation.
Recently, image-to-image translation (I2I) has met with great success in computer vision, but few works have paid attention to the geometric changes that occur during translation. Such geometric changes are necessary to reduce the geometric gap between domains, at the cost of breaking the correspondence between translated images and the original ground truth. We propose a novel geometry-aware semi-supervised method to preserve this correspondence while still allowing geometric changes. The proposed method takes a synthetic image-mask pair as input and produces a corresponding real pair. We also utilize an objective function to ensure consistent geometric movement of the image and mask through the translation. Extensive experiments illustrate that our method yields an 11.23% higher mean Intersection-Over-Union than current methods on the downstream eye segmentation task. The generated images show a 15.9% decrease in Fréchet Inception Distance, indicating higher image quality.
Iris-based biometric authentication is a widespread biometric modality due to its accuracy, among other benefits. Improving the resistance of iris biometrics to spoofing attacks is an important research topic. Eye tracking and iris recognition devices have similar hardware that consists of a source of infra-red light and an image sensor. This similarity potentially enables eye tracking algorithms to run on iris-driven biometrics systems. The present work advances the state-of-the-art in detecting iris print attacks, wherein an imposter presents a printout of an authentic user’s iris to a biometrics system. The detection of iris print attacks is accomplished via analysis of the captured eye movement signal with a deep learning model. Results indicate better performance of the selected approach than the previous state-of-the-art.