The current best practice for hands-free selection with Virtual and Augmented Reality (VR/AR) head-mounted displays is to aim via head-gaze and to trigger the selection via dwell time or a click. New VR and AR devices increasingly ship with integrated eye-tracking units to improve rendering and to support attention analysis and social interaction. Eye-gaze has been used successfully for human-computer interaction in other domains, primarily on desktop computers. In VR/AR systems, aiming via eye-gaze could be significantly faster and less exhausting than aiming via head-gaze.
To evaluate the benefits of eye-gaze-based interaction in VR and AR, we compared aiming via head-gaze with aiming via eye-gaze. We show that eye-gaze outperforms head-gaze in terms of speed, task load, required head movement, and user preference. We furthermore show that the advantages of eye-gaze increase with larger field-of-view (FOV) sizes.
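To make the triggering mechanism mentioned above concrete, the following is a minimal sketch of a dwell-time selector for a gaze-aimed pointer. The class name, the per-frame update interface, and the 800 ms threshold are illustrative assumptions, not details taken from the study.

```python
import time

class DwellSelector:
    """Trigger a selection once the gaze-aimed pointer has rested on the
    same target for a fixed dwell time (threshold chosen arbitrarily here)."""

    def __init__(self, dwell_s=0.8):
        self.dwell_s = dwell_s
        self.current = None   # target currently under the pointer
        self.entered = 0.0    # time at which the pointer entered it
        self.fired = False    # whether this target was already selected

    def update(self, hovered_target):
        """Call once per frame with the target under the gaze ray (or None).
        Returns the target exactly once when the dwell time elapses."""
        now = time.monotonic()
        if hovered_target is not self.current:
            self.current = hovered_target
            self.entered = now
            self.fired = False
            return None
        if (self.current is not None and not self.fired
                and now - self.entered >= self.dwell_s):
            self.fired = True
            return self.current
        return None
```

The same selector works for head-gaze and eye-gaze aiming; only the ray used for hit-testing differs.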
The relationships between eye and head movements while viewing various visual stimuli on a head-mounted display (HMD) and on a large flat display were compared. The visual sizes of the displayed images were adjusted virtually using an image processing technique. The negative correlations between horizontal head and eye movements were significant for some visual stimuli on the HMD and after some practice viewing images. In addition, the scores of two factors in the subjective assessment of viewing the stimuli correlated positively with horizontal head movements for certain visual stimuli. The results suggest that, under certain conditions, viewing tasks promote head movements and strengthen the correlational relationship between eye and head movements.
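The horizontal eye-head relationship described here can be quantified with an ordinary Pearson correlation over synchronized rotation traces. The sketch below assumes yaw angles sampled at a common rate and is an illustration, not the authors' analysis pipeline.

```python
import numpy as np

def head_eye_correlation(head_yaw, eye_yaw):
    """Pearson correlation between horizontal head and eye rotation traces.
    A negative value indicates compensatory behavior: the eyes tend to
    rotate against the direction of the head."""
    head = np.asarray(head_yaw, dtype=float)
    eye = np.asarray(eye_yaw, dtype=float)
    return np.corrcoef(head, eye)[0, 1]
```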
Most applications involving gaze-based interaction are supported by estimation techniques that find a mapping between gaze data and corresponding targets on a 2D surface. However, in Virtual and Augmented Reality (VR/AR) environments, interaction occurs mostly in a volumetric space, which poses a challenge to such techniques. Accurate point-of-regard (PoR) estimation is of particular importance to AR applications, since most known setups are prone to parallax error and target ambiguity. In this work, we expose the limitations of widely used techniques for PoR estimation in 3D and propose a new calibration procedure using an uncalibrated head-mounted binocular eye tracker coupled with an RGB-D camera to track 3D gaze within the scene volume. We conducted a study evaluating our setup on real-world data with a geometric and an appearance-based method. Our results show that accurate estimation in this setting is still a challenge, though some gaze-based interaction techniques in 3D should be possible.
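One common geometric approach to 3D PoR estimation with a binocular tracker is to intersect the two gaze rays and take the midpoint of their shortest connecting segment. The sketch below assumes ray origins and directions already expressed in a common scene coordinate frame; it illustrates the general vergence-based technique, not the calibration procedure proposed in the paper.

```python
import numpy as np

def estimate_por(p_left, d_left, p_right, d_right):
    """Midpoint of the shortest segment between the left and right gaze rays,
    used as a 3D point-of-regard estimate. Inputs are 3D ray origins and
    direction vectors in the same coordinate frame."""
    p1 = np.asarray(p_left, dtype=float)
    p2 = np.asarray(p_right, dtype=float)
    d1 = np.asarray(d_left, dtype=float); d1 /= np.linalg.norm(d1)
    d2 = np.asarray(d_right, dtype=float); d2 /= np.linalg.norm(d2)

    w0 = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b
    if np.isclose(denom, 0.0):
        return None  # (nearly) parallel rays: vergence gives no depth cue

    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    return 0.5 * ((p1 + t1 * d1) + (p2 + t2 * d2))
```

Because small angular errors in the ray directions translate into large depth errors, this naive intersection is exactly the kind of estimate that careful calibration is meant to improve.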
Several methods of gaze control of video playback were implemented in the MovEye application. Two versions of MovEye are almost ready: one for watching online movies from the YouTube service and one for watching movies stored as files on local drives. We have two goals: a social one, to help people with physical disabilities control and enrich their immediate environment; and a scientific one, to compare the usability of several gaze-control methods for video playback for both healthy and disabled users. This paper presents our gaze-control applications. Our next step will be to conduct accessibility and user experience (UX) tests with both healthy and disabled users. In the long term, this research could lead to the implementation of gaze control in TV sets and other video playback devices.
Playing music with the eyes is a challenging task. In this paper, we propose a virtual digital musical instrument, usable by both motor-impaired and able-bodied people, controlled through an eye tracker and a "switch". Musically speaking, the layout of the graphical interface is isomorphic, since the harmonic relations between notes have the same geometrical shape regardless of the key signature of the music piece. Four main design principles guided our choices, namely: (1) minimization of eye movements, especially for large note intervals; (2) use of a grid layout whose "nodes" (keys) are connected to each other through segments (employed as guides for the gaze); (3) no need for smoothing filters or time thresholds; and (4) strategic use of color to facilitate gaze shifts. Preliminary tests, also involving another eye-controlled musical instrument, have shown that the developed system allows "correct" execution of music pieces, even those characterized by complex melodies.
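The isomorphism behind design principle (2) can be captured by a grid in which each step along an axis adds a fixed pitch interval, so that any musical interval always corresponds to the same geometric offset. The sketch below uses +2 semitones per column and +7 per row as one possible choice; the concrete intervals, grid size, and MIDI numbering are assumptions, not the layout used by the authors.

```python
def isomorphic_layout(rows, cols, base_midi=48, col_step=2, row_step=7):
    """Build a grid of MIDI note numbers in which harmonic relations have a
    constant shape: moving one column to the right always adds col_step
    semitones and moving one row up always adds row_step semitones, so chord
    shapes are identical in every key."""
    return [[base_midi + r * row_step + c * col_step for c in range(cols)]
            for r in range(rows)]

# Example: the (row 0, col 0)-(row 0, col 2)-(row 1, col 0) triangle is a
# major triad (C-E-G here) and keeps that shape wherever it is translated.
grid = isomorphic_layout(rows=4, cols=8)
```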
Text entry by gazing at a virtual keyboard (also known as eye typing) is an important component of any gaze communication system. One of the main challenges for efficient communication is how to avoid unintended key selections due to the Midas touch problem. The most common selection technique by gaze is dwelling. Though easy to learn, long dwell times slow down communication, and short dwells are prone to error. Context switching (CS) is a faster and more comfortable alternative, but the duplication of contexts takes up a lot of screen space. In this paper we introduce two new CS designs using dynamic expanding targets that are more appropriate when a reduced interaction window is required. We compare the performance of the two new designs with the original CS design using QWERTY layouts as contexts. Our results with 6 participants typing with the 3 keyboards show that smaller layouts with dynamic expanding targets are as accurate and comfortable as the larger QWERTY layout, though they provide lower typing speeds.
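To make the idea of dynamic expanding targets concrete, the sketch below shows a key that smoothly grows while the gaze rests on it and shrinks back otherwise. The sizes and growth rate are illustrative assumptions, and the selection itself (performed by a context switch in the CS designs rather than by dwelling) is deliberately left out.

```python
class ExpandingKey:
    """A keyboard key that expands while the gaze rests on it and shrinks
    back otherwise. Confirming the selection (e.g., via a context switch)
    is outside the scope of this sketch."""

    def __init__(self, label, x, y, base_size=40.0, max_size=80.0,
                 growth_per_s=160.0):
        self.label = label
        self.x, self.y = x, y
        self.base_size = base_size
        self.max_size = max_size
        self.growth_per_s = growth_per_s
        self.size = base_size

    def contains(self, gx, gy):
        half = self.size / 2.0
        return abs(gx - self.x) <= half and abs(gy - self.y) <= half

    def update(self, gx, gy, dt):
        """Advance the expansion animation by dt seconds given the gaze point."""
        if self.contains(gx, gy):
            self.size = min(self.max_size, self.size + self.growth_per_s * dt)
        else:
            self.size = max(self.base_size, self.size - self.growth_per_s * dt)
```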
Gaze and head tracking, or pointing, in head-mounted displays enables new input modalities for point-select tasks. We conducted a Fitts' law experiment with 41 subjects comparing head pointing and gaze pointing using a 300 ms dwell (n = 22) or click (n = 19) activation, with mouse input providing a baseline for both conditions. Gaze and head pointing were equally fast but slower than the mouse; dwell activation was faster than click activation. Throughput was highest for the mouse (2.75 bits/s), followed by head pointing (2.04 bits/s) and gaze pointing (1.85 bits/s). With dwell activation, however, throughput for gaze and head pointing was almost identical, as was the effective target width (≈ 55 pixels; about 2°) for all three input methods. Subjective feedback rated the physical workload lower for gaze pointing than for head pointing.
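The throughput and effective-width figures quoted above follow the usual ISO 9241-9 "effective" formulation, which the sketch below reproduces in a simplified form over a single set of trials; pooling across amplitude/width conditions and participants (the mean-of-means step) is omitted, so the function is an illustration rather than the study's exact analysis.

```python
import numpy as np

def effective_throughput(amplitudes, endpoint_offsets, movement_times):
    """Effective throughput in bits/s for one set of trials.
    amplitudes: nominal target distances per trial
    endpoint_offsets: signed selection error along the task axis per trial
    movement_times: movement time per trial, in seconds"""
    We = 4.133 * np.std(endpoint_offsets, ddof=1)         # effective target width
    Ae = np.mean(amplitudes) + np.mean(endpoint_offsets)  # effective amplitude
    IDe = np.log2(Ae / We + 1.0)                          # effective index of difficulty (bits)
    return IDe / np.mean(movement_times)
```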
Gaze-assisted interaction has commonly been used in standard desktop settings. When interacting with large displays, and as new scenarios such as situationally-induced impairments emerge, gaze-based multi-modal input can be more convenient than other inputs. However, it is unknown how gaze-based multi-modal input compares to touch and mouse input. We compared gaze+foot multi-modal input to touch and mouse input on a large display in a Fitts' law experiment conforming to ISO 9241-9. In a study involving 23 participants, we found that the gaze input had the lowest throughput (2.33 bits/s) and the highest movement time (1.176 s) of the three inputs. In addition, though touch input involves the most physical movement, it achieved the highest throughput (5.49 bits/s) and the lowest movement time (0.623 s), and was the most preferred input.
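The gaze+foot condition pairs continuous gaze pointing with a discrete foot switch for confirmation. The sketch below illustrates that division of labor with a trivial event loop; the event names and the hit-testing function are hypothetical, not taken from the study's apparatus.

```python
def gaze_foot_select(events, hit_test):
    """Yield the target under the gaze each time the foot switch is pressed.
    events: an iterable of ('gaze', x, y) and ('foot_down',) tuples
    hit_test: maps a gaze point to the target under it, or None"""
    gaze = None
    for event in events:
        if event[0] == 'gaze':
            gaze = (event[1], event[2])
        elif event[0] == 'foot_down' and gaze is not None:
            target = hit_test(*gaze)
            if target is not None:
                yield target
```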
To improve the performance of an image retrieval system, this paper proposes a novel content-based image retrieval (CBIR) framework that uses eye-tracking data in an implicit relevance feedback mechanism. The proposed framework consists of three components: feature extraction and selection, visual retrieval, and relevance feedback. First, optimal image features with 70 components are extracted using a quantum genetic algorithm and principal component analysis (PCA). Second, a finer retrieval procedure based on a multiclass support vector machine (SVM) and the fuzzy c-means (FCM) algorithm retrieves the most relevant images. Finally, a deep neural network is trained to exploit the user's feedback regarding the relevance of the returned images. This information is then employed to update the retrieval point for a new retrieval round. Experiments on two databases (Corel and Caltech) show that CBIR performance can be significantly improved by the proposed framework.
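As a minimal sketch of the "reduce the features, then rank candidates with a multiclass SVM" portion of such a pipeline, the snippet below uses scikit-learn. The quantum genetic algorithm, the FCM clustering step, and the deep relevance-feedback network are not reproduced here, and the function interface is an assumption for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def rank_images(features, labels, relevant_indices, n_components=70):
    """features: (n_images, n_raw_features) array of precomputed descriptors
    labels: class labels used to train the multiclass SVM
    relevant_indices: indices of images the user implicitly marked relevant
    Returns image indices sorted from most to least relevant."""
    labels = np.asarray(labels)

    # Reduce the raw descriptors to 70 components, as in the paper's setup.
    reduced = PCA(n_components=n_components).fit_transform(features)

    # Train a multiclass SVM with probability estimates for ranking.
    svm = SVC(probability=True).fit(reduced, labels)

    # Score every image by the probability of the relevant images' class.
    target_class = labels[relevant_indices[0]]
    class_idx = list(svm.classes_).index(target_class)
    scores = svm.predict_proba(reduced)[:, class_idx]
    return np.argsort(-scores)
```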
Several studies have found gaze sharing to be beneficial for computer-mediated collaboration. Usually, a coordinate-based visualization such as a gaze cursor is used, which helps to reduce or replace deictic expressions and thus makes gaze sharing a useful addition for tasks where referencing is crucial. However, this spatial visualization limits its use to What-You-See-Is-What-I-See (WYSIWIS) interfaces and puts a ceiling on group size, as multiple object tracking becomes nearly impossible beyond dyads. In this paper, we explore and discuss the use of gaze sharing outside the realm of WYSIWIS interfaces and dyads by replacing the gaze cursor with information-based gaze sharing, in the form of a reading progress bar added to a chat interface. Preliminary results with triads show that historical and current information on attention and reading behavior is a useful addition that increases group awareness.
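As an illustration of information-based gaze sharing, the sketch below turns raw fixations over a chat message into a reading-progress value that could drive such a progress bar; the coordinate system and line geometry are assumptions, not the authors' implementation.

```python
def reading_progress(fixations, message_top, line_height, n_lines):
    """Fraction of a message's lines that received at least one fixation.
    fixations: iterable of (x, y) gaze fixation positions in interface
    coordinates; message_top, line_height, and n_lines describe the
    message's bounding box and its text lines."""
    seen_lines = set()
    for _x, y in fixations:
        line = int((y - message_top) // line_height)
        if 0 <= line < n_lines:
            seen_lines.add(line)
    return len(seen_lines) / n_lines if n_lines else 0.0
```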