Head-mounted displays (HMDs) are expected to become everyday devices. Developing interfaces for controlling the content presented on an HMD is key to driving the spread of HMD usage. To meet the need for new HMD interfaces, gaze interaction has been researched. One of the main challenges has been detecting the user’s control intention from gaze movements. Because the gaze naturally moves away from a target for short periods, cumulative gaze dwell time is considered suitable for predicting operations from natural gaze movements while browsing content. In this study, we evaluated a gaze-only content-browsing method using cumulative dwell time by comparing it with a hand-controller method. The results showed that the proposed gaze method achieves comparable completion time and usability with less physical workload than the controller method. We also found that more participants preferred the proposed gaze-based method (16 in total; gaze: 3, rather gaze: 6, neutral: 1, rather controllers: 6). These results indicate the applicability of implicit gaze interaction for browsing content on HMDs.
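As an illustration of the core mechanism described above, here is a minimal sketch of cumulative dwell-time selection that tolerates brief glances away from a target. It assumes a per-frame gaze-target report and a hypothetical dwell threshold; it is not the authors’ implementation.

```python
# Minimal sketch (not the paper's code): cumulative dwell-time selection.
class CumulativeDwellSelector:
    def __init__(self, threshold_s=1.0):
        self.threshold_s = threshold_s   # total dwell needed to trigger (assumed value)
        self.accumulated = {}            # target id -> accumulated dwell (s)

    def update(self, gazed_target, dt):
        """Call once per frame with the currently gazed target (or None)."""
        if gazed_target is not None:
            self.accumulated[gazed_target] = self.accumulated.get(gazed_target, 0.0) + dt
            if self.accumulated[gazed_target] >= self.threshold_s:
                self.accumulated[gazed_target] = 0.0
                return gazed_target      # selection event
        # Dwell is deliberately NOT reset when gaze briefly leaves the target;
        # this is what distinguishes cumulative dwell from plain dwell time.
        return None
```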
In augmented reality assistance systems for manual assembly, users are forced in various situations to move their eyes to areas away from the assembly task being performed, which can cause physical effort and mental distraction. To counteract this, we adapted, developed, and evaluated approaches to minimize such eye movements and thus reduce distractions. Eye-Gaze Attention Markers can be used to direct the user’s attention back to the current point of interest. With Eye-Gaze Following, an eye-gaze-based extension of an existing Mixed Reality Toolkit (MRTK) object-following functionality, currently relevant information can be attached to the user’s eye gaze to reduce focus shifts. For these approaches, we present the design and implementation, as well as a user study (N = 40) evaluating completion time, mental workload, eye movements, and focus shifts to determine the impact on eye-movement and distraction minimization. Furthermore, we provide guidelines for the use and improvement of the presented approaches. With the proposed approaches, we achieved improvements in eye movements and distraction minimization and identified further potential for research.
Eye tracking enables several novel ways of interacting with virtual reality (VR) games, yet research on blink interaction techniques (BITs) in VR remains scarce. This short paper introduces a preliminary design space for BITs in VR and describes an exploratory user study comparing five games designed around five BITs. Results from questionnaires indicate that the BITs were satisfactory, and qualitative feedback highlights factors influencing players’ preferences as well as the importance of considering fatigue from high blink rates and potential accessibility limitations of single-eyed interactions.
Guiding visual attention to specific parts of an interface is essential. One powerful tool for guiding attention is gaze cues, which direct visual attention in the same direction as a presented gaze. In this paper, we explored how to direct users’ visual attention on 2D screens using gaze cues from avatars and humans. For this, we conducted a lab experiment (N = 30) based on three independent variables: (1) stimulus, shown either as an avatar or a human face, (2) target direction, with a target appearing left or right of the stimulus, and (3) gaze validity, indicating whether a stimulus’ gaze was directed towards a target (valid gaze) or not (invalid gaze). Our results show that participants’ total and average fixations on a target lasted longer in the presence of the human image than the avatar stimulus when a target appeared on the right side and when a stimulus’ gaze was towards the target. Moreover, participants’ average fixation was longer on the human than the avatar stimulus when it gazed in the opposite direction from a target rather than towards it.
Mobile virtual reality (VR) provides an accessible alternative to high-end VR, but currently offers only limited interaction, hindering its usability, the variety of VR experiences it can provide, and widespread adoption. To improve interaction on mobile VR, we present a novel input solution: the Low-Fi VR Controller. The controller uses the smartphone camera to track markers and provide 6DOF input. It costs virtually nothing, as it is made of cost-effective, accessible materials. We performed a user study based on Fitts’ law to evaluate the controller's performance in selection tasks and to compare three selection activation methods (Instant, Dwell, Marker). Despite tracking issues, selection throughput with the Instant method was comparable to other ray-based selection techniques reported in other studies, at roughly 2.2 bps. Our results validate the controller as an acceptable 3D input device, and we propose avenues to improve performance and user experience with the controller in future work.
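For context, throughput in Fitts’-law selection studies is commonly computed from the effective index of difficulty and the movement time; the sketch below shows this standard formulation with made-up numbers, not code or data from the paper.

```python
import math

def fitts_throughput(effective_amplitude, effective_width, movement_time_s):
    """Throughput in bits/s from the Shannon formulation of the effective
    index of difficulty, as commonly used in ISO 9241-9 style evaluations."""
    id_e = math.log2(effective_amplitude / effective_width + 1.0)  # bits
    return id_e / movement_time_s

# Example with illustrative values: an ID_e of about 3.3 bits completed in
# 1.5 s gives roughly 2.2 bps, the range reported for the Instant method.
print(round(fitts_throughput(9.0, 1.0, 1.5), 2))
```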
The use of microgestures has improved the robustness and naturalness of subtle hand interactions. However, the conventional approach often neglects the context in which users perform microgestures. We present VibAware, a context-aware tap and swipe gesture recognition technique using bio-acoustic sensing. We use both active and passive sensing approaches to recognize finger-based microgestures while also recognizing the associated interaction contexts, including Within-Hand, Surface, and Hand Grasp. We employ accelerometers and an active acoustic transmitter to form a bio-acoustic system with multiple bandpass filter processing. Through user studies, we validate the accuracy of context-aware tap and swipe gesture recognition. We propose a context-aware microgesture recognition pipeline to enable adaptive input controls for rich and affordable hand interactions.
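As a rough illustration of the kind of multi-band processing mentioned above, the sketch below splits a bio-acoustic signal into sub-bands with Butterworth bandpass filters before feature extraction. The band edges and filter order are assumptions for illustration, not VibAware’s actual pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_bank(signal, fs, bands=((50, 200), (200, 800), (800, 3000))):
    """Return one filtered copy of `signal` per (low, high) band in Hz.
    Assumes fs is at least twice the highest band edge."""
    outputs = []
    for low, high in bands:
        sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
        outputs.append(sosfiltfilt(sos, signal))
    return np.stack(outputs)
```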
We present ExtEdge, a design and method for augmenting the visual experiences of a smartphone using electro-tactile stimulation. We aim to explore new interactions through tactile augmentation of visual information, such as virtual characters, at the edges of the screen. ExtEdge presents distributed tactile sensations through electrical stimulations from electrode arrays mounted on both edges of the smartphone. This distributed tactile stimulus fits the image at the edges of the screen and provides a more realistic presentation. We first investigated two-point discrimination thresholds in the contact areas of the grasp—middle finger, ring finger, thumb, and thenar eminence—as a performance evaluation of the distributed tactile presentation and found that our design was adequate. We then conducted a user study with some application scenarios and confirmed that ExtEdge enriches visual information.
We present OptiRing, a ring-based wearable device that enables subtle single-handed micro-interactions using low-resolution camera-based sensing. We demonstrate that our approach can work with ultra-low image resolutions (e.g., 5 × 5 pixels), which is instrumental in addressing privacy concerns and reducing computational needs. Using a miniature camera, OptiRing supports thumb-to-index finger gestures, such as stateful pinch and left/right swipes, as well as continuous 1-DOF input. We present a modeling approach that uses heuristic-based methods to identify interactions and machine learning for input gating, generalizing recognition across users and sessions. We assess this technique’s capabilities, accuracy, and limitations through a user study with 15 participants. OptiRing achieved 93.1% accuracy in gesture recognition, 99.8% accuracy for stateful pinch gestures, and a minimal number of false positives. Further, we validate OptiRing’s ability to handle continuous 1D input in a Fitts’ law study. We discuss these findings, the tradeoff between resolution and interaction accuracy, and the potential of this technology, which can help address challenges associated with optical sensing for input.
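To make the ultra-low-resolution idea concrete, here is a toy sketch of how a simple heuristic over a 5 × 5-pixel frame might flag a candidate pinch before a learned gating model accepts or rejects it. The brightness-drop heuristic and its threshold are purely illustrative assumptions, not OptiRing’s actual model.

```python
import numpy as np

def candidate_pinch(frame_5x5, baseline_mean, drop_ratio=0.6):
    """Flag a candidate pinch when the 5x5 frame darkens well below the
    open-hand baseline brightness (illustrative heuristic only)."""
    return float(np.mean(frame_5x5)) < drop_ratio * baseline_mean
```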
We investigate the potential impact of visualization placement on enhancing the user experience of outdoor augmented data tours, i.e., tours that rely on situated data visualization in augmented reality (AR). Little is known about the optimal placement of outdoor AR information visualizations. Inspired by an existing guided tour about the environmental impact of buildings, we investigated three potential placements for data visualizations: on buildings, on the floor in the user’s immediate area, and on a virtual map displayed in front of the user. For visualizations placed on the floor and on the map, we added a virtual link connecting the visualization with the corresponding real-world building. A user study with 18 participants indicates that visualizations placed on buildings are preferred, placing visualizations close to the user on a map results in the highest reading accuracy, and placing visualizations on buildings and on maps results in similar recall scores. Our study provides insight into the design of augmented data tours in outdoor environments: positioning AR visualizations close to participants seems to increase reading accuracy, and recalling information from visualizations placed on the floor seems to be more difficult than recalling information from visualizations placed on buildings.
How can we design user interfaces for augmented reality (AR) so that we can interact as simply, flexibly, and expressively as we can with a smartphone in one hand? To explore this question, we propose PalmGazer, an interaction concept integrating eye-hand interaction to establish a single-handedly operable menu system. In particular, PalmGazer is designed to support quick and spontaneous digital commands - such as playing a music track, checking notifications, or browsing visual media - through our devised three-way interaction model: hand opening to summon the menu UI, eye-hand input for selection of items, and a dragging gesture for navigation. A key aspect is that the menu remains always accessible and movable to the user, as it supports meaningful hand- and head-based reference frames. We demonstrate the concept in practice through a prototypical mobile UI with application probes, and describe technique designs specifically tailored to the application UI. A qualitative evaluation highlights the system’s interaction benefits and drawbacks, e.g., that common 2D scroll and selection tasks are simple to operate, but higher degrees of freedom may be reserved for two hands. Our work contributes interaction techniques and design insights to expand AR’s uni-manual capabilities.
Augmented reality head-worn displays (AR HWDs) of the near future will be worn all day, every day, delivering information to users anywhere and anytime. Recent research has explored how information can be presented on AR HWDs to facilitate easy acquisition without intruding on the user’s physical tasks. However, it remains unclear what users would like to do beyond passive viewing of information, and what the best ways are to interact with everyday content displayed in AR HWDs. To address this gap, our research focuses on the implementation of a functional prototype that leverages the concept of Glanceable AR while incorporating various interaction capabilities for users to take quick actions on their personal information. Rather than overwhelming users or demanding continuous attention, our system centers around the idea that virtual information should stay invisible and unobtrusive when not needed but be quickly accessible and interactable. Through an in-the-wild study involving three AR experts, our findings shed light on how to design interactions in AR HWDs to support everyday tasks, as well as how people perceive using feature-rich Glanceable AR interfaces during social encounters.
Many methods have been proposed for presenting digital information in mirror space. However, most existing methods can only display digital information in mirror space. Therefore, we study the concept of MiTAI (Mirror-Transcending Aerial Imaging), which allows for the continuous movement of digital information between mirrors and physical space. MiTAI provides a novel interactive experience combining mirrors, physical space, and digital information. In this paper, we present the first systemization of MiTAI that users can experience, as well as an initial study toward a concept evaluation of MiTAI. We describe an implementation of the system in detail, including an optical follow-up of the previous study, the design of the control system, the presentation of content, and a method for interacting with the aerial image. The implemented system provided a MiTAI viewing experience for approximately 150 attendees through a hands-on demonstration exhibit and had a strong impact on them. We also report on user feedback from the exhibition and future improvements.
3D Computer-Aided Design (CAD) users need to overcome several obstacles to benefit from the flexibility of programmatic interface tools. Besides the barriers of any programming language, users face challenges inherent to 3D spatial interaction. Scripting simple operations, such as moving an element in 3D space, can be significantly more challenging than performing the same task using direct manipulation. We introduce the concept of bidirectional programming for Constructive Solid Geometry (CSG) CAD tools, informed by interviews we performed with programmatic interface users. We describe how users can navigate and edit the 3D model using direct manipulation in the view or code editing while the system ensures consistency between both spaces. We also detail a proof-of-concept implementation using a modified version of OpenSCAD.
For avatars - virtual bodies controlled by a human - an established visualization is to display incomplete bodies, e.g., only head and torso. However, the preferred body visibility of intelligent virtual agents (IVAs) - fully computer-generated virtual humans - may differ. Additionally, the presence of IVAs can have psychological effects on users that are similar to those of a real human, e.g., facilitation or inhibition of cognitive performance. To investigate the connection between these two topics, we examine the effects of an IVA’s level of body visibility on users’ sense of social presence and task performance in virtual reality. In a within-subject user study, 30 participants solved anagram tasks in the presence of five different levels of visibility of our IVA: voice-only, mouth-only, head-only, upper body, and full body. While we could not find any differences in the task performance of the users, lower levels of visibility led to a decreased feeling of social presence. Furthermore, using eye tracking, we found that visually richer representations were looked at for a longer amount of time, but only during the explanation of the task; afterwards, the users did not pay much visual attention to the agent anymore. Finally, user preferences show that the chosen representation depends on several factors; most importantly, it should support the users without distracting them from their task.
When future passengers are immersed in Virtual Reality (VR), the resulting disconnection from the physical world may degrade their situational awareness on the road. To address this, we propose incorporating real-world cues into virtual experiences when passing specific locations. We designed two visualizations using points of interest (POIs): street names alone or combined with live street views. We compared them to two baselines, persistently displaying live cues (Always Live) or no cues (Always VR). In a field study (N=17), participants estimated their locations while exposed to VR entertainment during car rides. The results show that adding environmental cues inevitably degrades VR presence compared to Always VR. However, POI-triggered Text&Live preserves VR presence better than Always Live and attracts user attention to the road more than POI-triggered Text. We discuss situational awareness challenges for using mobile VR on the road and potential incorporation strategies across transport contexts.
Understanding how people effectively perform actions together is fundamental when designing Collaborative Mixed Reality (CMR) applications. While most studies on CMR have considered either how users are immersed in the CMR (e.g., in virtual or augmented reality) or how the physical workspace is shared by users (i.e., distributed or collocated), little is known about how their combination could influence users’ interaction in CMR. In this paper, we present a user study (n=23) that investigates the effect of the mixed reality setup on users’ immersion and spatial interaction during a joint-action task. Groups of two participants had to perform two types of joint actions while carrying a virtual rope to maintain a certain distance: (1) Gate, where participants had to pass through a virtual aperture together, and (2) Fruit, where participants had to use the rope to slice a virtual fruit moving in the CMR. Users were either in a distributed or collocated setup, and either immersed in virtual or augmented reality. Our results showed that both users’ proxemics and their subjective experience were altered by the immersion type and location setup. These results contribute to the understanding of joint action in CMR and are discussed with a view to improving the design of CMR applications.
The most intuitive mode of locomotion in VR experiences is natural walking, which is constrained by the size of the walkable physical area. Perception-based locomotion techniques such as Redirected Walking (RDW) can circumvent this limitation by artificially manipulating the user’s position or rotation in VR. In this work, we focus on integrating redirection techniques into a continuous, uninterrupted VR experience and analyze how interactions between two remote users in a shared virtual space can be used to hide discrete redirections from the user’s perception as much as possible. Our results show that discrete translations, which our subjects used to cover an average of 37.8% of their distance, were almost imperceptible during the execution of the collaborative interactions for 81% of our participants. Only 9.5% of our study participants noticed rotation gains. Based on these results, we provide guidelines on how developers can more effectively use RDW techniques in multi-user VR experiences.
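For readers unfamiliar with redirection gains, the following is a minimal sketch of how rotation and translation gains scale the user’s real movement before it is applied virtually. The gain values are illustrative assumptions; this is not the study’s implementation.

```python
def apply_rotation_gain(virtual_yaw_deg, real_yaw_delta_deg, gain=1.1):
    """Scale one frame of real head rotation before applying it to the
    virtual camera, so the virtual turn differs slightly from the real one."""
    return virtual_yaw_deg + gain * real_yaw_delta_deg

def apply_translation_gain(virtual_pos, real_step_vector, gain=1.2):
    """Scale the user's real displacement before applying it virtually."""
    return [p + gain * s for p, s in zip(virtual_pos, real_step_vector)]
```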
Semi-automated timber fabrication tasks demand the expertise and dexterity of human workers in addition to the use of automated robotic systems. In this paper, we introduce a human-robot collaborative system based on Augmented Reality (AR). To assess our approach, we conducted an exploratory user study on a head-mounted display (HMD) interface for task sharing between humans and an industrial robotic platform (N=16). In contrast to screen-based interfaces, the HMD allowed users to receive information in situ, regardless of their location in the workspace and even while their hands were occupied with tools or tasks. We analyzed the resulting open-ended, qualitative user feedback as well as the quantitative user experience data. From the results, we derived challenges to tackle in future implementations and questions that need to be investigated to improve AR-based HRI in fabrication scenarios. The results also suggest that some aspects of human-robot interaction, like communication and trust, are more prominent when implementing a non-dyadic scenario and dealing with larger robots. The study is intended as a prequel to future work on AR-based collaboration between multiple humans and industrial robots.
Multi-user VR applications have great potential to foster remote collaboration and improve or replace classical training and education. An important aspect of such applications is how participants move through the virtual environments. One of the most popular VR locomotion methods is the standard teleportation metaphor, as it is quick, easy to use and implement, and safe regarding cybersickness. However, it can be confusing to the other, observing participants in a multi-user session and, therefore, reduce their presence. The reason for this is the discontinuity of the process and, therefore, the lack of motion cues. So far, the question of how this teleport metaphor could be suitably visualized for observers has received little attention. Therefore, we implemented several continuous and discontinuous 3D visualizations for the teleport metaphor and conducted a user study to evaluate them. Specifically, we investigated them regarding confusion, spatial awareness, and spatial and social presence. Regarding presence, we found significant advantages for one of the visualizations. Moreover, some visualizations significantly reduced confusion. Furthermore, multiple continuous visualizations ranked significantly higher regarding spatial awareness than the discontinuous ones. This finding is also backed up by the users’ tracking data we collected during the experiments. Lastly, the classic teleport metaphor was perceived as less clear and was rather unpopular compared with our visualizations.
The theory of swarm control shows promise for controlling multiple objects; however, its scalability is limited by costs, such as hardware and infrastructure needs. Virtual Reality (VR) can overcome these limitations, but research on swarm interaction in VR is limited. This paper introduces a novel Swarm Manipulation interaction technique and compares it with two baseline techniques: Virtual Hand and Controller (ray-casting). We evaluated these techniques in a user study (N = 12) with three tasks (selection, rotation, and resizing) across five conditions. Overall, our results show that the Swarm Manipulation technique performed well, exhibiting significantly faster speeds than at least one of the other two techniques across most conditions for these three tasks. Furthermore, this technique notably reduced size deviations in the resizing task. However, we also observed a trade-off between speed and accuracy in the rotation task. The results demonstrate the potential of the Swarm Manipulation technique to enhance usability and user experience in VR compared to conventional manipulation techniques. In future studies, we aim to understand and improve swarm interaction, user control, and internal particle cooperation.
Touchless gesture interfaces often use cursor-based interactions, where widgets are targeted by a movable cursor and activated with a mid-air gesture (e.g., push or pinch). Continuous interactions like slider manipulation can be challenging in mid-air because users need to precisely target widgets and then maintain an ‘activated’ state whilst moving the cursor. We investigated proxemic cursor interactions as a novel alternative, where cursor proximity allows users to acquire and keep control of user interface widgets without precisely targeting them. Users took advantage of proxemic targeting, though they gravitated towards widgets when negotiating the boundaries between multiple elements. This allowed users to gain control more quickly than with non-proxemic behaviour and made it easier to move between user interface elements. We find that proxemic cursor interactions can improve the usability of touchless user interfaces, especially for slider interactions, paving the way to more comfortable and efficient use of touchless displays.
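As an illustration of proxemic acquisition, the sketch below assigns control to the widget whose proxemic radius contains the cursor and keeps that control until another widget is clearly closer. The radius and hysteresis factor are assumed values; this is not the authors’ implementation.

```python
import math

def proxemic_target(cursor, widgets, radius=120.0, current=None, hysteresis=0.8):
    """cursor: (x, y); widgets: {name: (x, y)}.  Returns the widget that
    should have control, or None if nothing is close enough."""
    def dist(name):
        wx, wy = widgets[name]
        return math.hypot(cursor[0] - wx, cursor[1] - wy)

    nearest = min(widgets, key=dist)
    if dist(nearest) > radius:
        return None                       # nothing within the proxemic radius
    if current in widgets and dist(current) <= radius:
        # keep the current widget unless the nearest one is clearly closer
        if dist(nearest) >= hysteresis * dist(current):
            return current
    return nearest
```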
Research has proposed various interaction techniques to manage the occlusion of 3D data in Virtual Reality (VR), e.g., via gradual refinement. However, tracking dynamically moving data in a dense 3D environment poses the challenge of ever-changing occlusion, especially if motion carries relevant information that is lost in still images. In this paper, we adapted two interaction modalities for Spatial Dense Dynamic Data (SDDD) from existing interaction methods for static and spatial data and evaluated them for exploring SDDD in VR in an experiment with 18 participants. Furthermore, we investigated the influence of our interaction modalities and of different levels of data density on the users’ performance in a no-knowledge task and a prior-knowledge task. Our results indicated significantly degraded performance for higher levels of density. Further, we found that our flashlight-inspired modality successfully improved tracking in SDDD, while a cutting-plane-inspired approach was more suitable for highlighting static volumes of interest, particularly in such high-density environments.
This paper presents a design space of spatial interaction techniques for multi-display visualizations. By analyzing 56 papers on multi-display tools and techniques, we identify three dimensions and their associated design choices for different types of spatial interactions using multiple displays and devices. This work aims to provide guidance for designing interactive multi-display visualizations that meet the different requirements of visual analysis and sensemaking tasks, while also inspiring future design ideas. Visualization researchers can create new multi-display visualizations by combining the design choices associated with different visualization tasks. Our design space also allows them to explore areas of spatial interaction that have not yet been utilized to their full potential, or to understand the requirements for new spatial interaction techniques for multi-display visualizations.
Timeline control is an important interaction during video watching; it helps users quickly locate or jump to a playback time, especially when watching lengthy videos. 360° video has gradually entered the public view due to its ability to present the target view in all directions, making it a compelling alternative to single-view video. At present, 360° video is mostly displayed on two-dimensional screens, and its timeline simply follows the design of normal video. However, virtual reality (VR) headsets provide a more immersive viewing experience for 360° videos, while also offering more spatial dimensions for timeline design. In this paper, we explore 360° video timeline design in VR viewing environments, where video time control has rarely been addressed. In total, we propose six design alternatives involving two variables: the shape of the timeline (two-dimensional, three-dimensional) and the interaction distance of the timeline (close to screen, close to body, attached to wrist). We evaluated their performance and user experience in a user study with 24 participants. Our results show that the usability of the 3D timeline is better than that of the 2D timeline in 360° space, and that the two within-reach timelines, close to body and attached to wrist, have advantages in performance and experience over timelines close to the screen. Based on these findings, we propose a set of design recommendations for timeline design in different application scenarios.
Visual network exploration is essential in numerous disciplines, including biology, digital humanities, and cyber security. Prior research has shown that immersive, stereoscopic 3D can enhance spatial comprehension and accuracy in exploring node-link diagrams. However, 3D graphs can present challenges, including node occlusion and edge crossings, which necessitate continual manual perspective adjustments. We introduce a virtual reality (VR) framework that assists users in navigating to optimal viewing points based on their current task, addressing these issues. The framework quantifies the perceptual quality of viewports based on graph drawing aesthetics suggested by the literature. This information is then visualized in two ways: on a spherical 3D representation surrounding the user and on a handheld 2D overview map. Users can interact with both representations to easily adjust their viewpoint. Moreover, they can interactively combine different aesthetics to discover the optimal viewing points for their specific tasks. Two qualitative evaluations involving law enforcement and biology experts demonstrate the value of our approach. Domain experts reported that the suggested viewports corresponded to their intuition and simplified the process of finding task-supportive perspectives with minimal interaction. Our approach can be incorporated into existing VR graph exploration tools, improving the initial perspective selection and reducing manual navigation.
This paper presents an exploratory study of expert behavior while analyzing spatio-temporal data in an immersive environment for the racing engineering domain. The analysis of moving-object data has become crucial in diverse fields, including sports analysis, logistics, and autonomous navigation. This data provides valuable insights into events occurring at specific points in space and time, enabling professionals to make informed decisions. For example, in motorsports, race engineers rely on spatio-temporal data and vehicle telemetry to identify key events and optimize performance. However, the conventional stacked line charts used by race engineers offer limited spatial association. Effective visualization techniques that incorporate the spatial context are essential to fully understand and interpret the complex nature of moving-object data. To address this, we propose novel immersive designs for analyzing motorsports data. Our study uses a critical task in racing engineering, comparing the cornering performance of two drivers, to document patterns of expert analysis behavior. Experts used conventional charts, situated 3D graphs, and a hybrid of both options within an Immersive Analytics environment. Results demonstrate the potential of our approach. All experts preferred the hybrid visualization combining conventional and situated approaches. In the short study period, experts found value in the situated approach for understanding differences in distances between events.
The current level of road safety for cyclists is estimated mainly based on police reports and self-reports collected through surveys. However, the former focuses on post-accident situations, while the latter is based on subjective perception and covers only road sections. This work builds the foundation for automatically assessing cyclists’ perceived safety by analyzing their head movements. In an indoor experiment (N = 12) using a Virtual Reality bicycle simulator, we discovered that perceived safety correlates with head rotation frequency and duration but not with head rotation angles. Based on this, we implemented a novel and minimalistic approach to detect head movements from sensor data of Apple AirPods and an iPhone and conducted an outdoor experiment (N = 8). Our results indicate that perceived safety correlates with head rotation frequency and duration only at uncontrolled intersections when turning left and does not necessarily apply to all situations.
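As a sketch of how head rotations might be extracted from such sensor data, the function below counts rotations and their durations by thresholding the gyroscope yaw rate. The thresholds and the overall approach are assumptions for illustration, not the paper’s algorithm.

```python
import numpy as np

def head_rotations(yaw_rate_dps, fs, rate_thresh=30.0, min_duration_s=0.2):
    """Return a list of (start_s, duration_s) for detected head rotations,
    given yaw-rate samples in deg/s at sampling rate fs (Hz)."""
    active = np.abs(np.asarray(yaw_rate_dps)) > rate_thresh
    rotations, start = [], None
    for i, is_rotating in enumerate(active):
        if is_rotating and start is None:
            start = i
        elif not is_rotating and start is not None:
            duration = (i - start) / fs
            if duration >= min_duration_s:
                rotations.append((start / fs, duration))
            start = None
    if start is not None:                      # rotation still ongoing at the end
        rotations.append((start / fs, (len(active) - start) / fs))
    return rotations
```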
Virtual reality (VR) applications enable users to perform physical exercises at home. Due to the absence of a real trainer, visual cues are typically used to provide important information about game mechanics or the current tasks. In this paper, we investigate age-related differences in the visual perception and processing of these game cues during a VR exergame. To this end, we conducted a comparative user study with nine older and nine younger adults, in which participants had to hit approaching balls of different colors. Our results show that younger participants performed better in the game than older participants, especially when a dual task with additional information was presented. This implies an effect of age on both task performance and information processing and memory. Eye-tracking data suggest that these differences are not caused by failing to see the presented visual information signs, but rather by differences in processing. Furthermore, older adults reported a higher feeling of presence and lower simulator sickness scores. These insights provide important implications for the development of VR exergames for different age groups, especially if additional game information needs to be displayed.
Thanks to advances in stand-alone Virtual Reality (VR), users can play realistic simulations of real-life sports at home. In these game simulations, players control their avatars by performing the same movements as in real life (RL) while playing against a human or AI opponent, making VR sports attractive to players. In this paper, we surveyed a popular VR table tennis game community, focusing on understanding their demographics, challenges, and experiences with skill transfer between VR and RL. Our results show that VR table tennis players are primarily men, live in Europe or Asia, and are on average 38 years old. We also found that the current state of VR technology affects the player’s experience and that players see VR as a convenient way to play matches, whereas RL is better for socialization. Finally, we identified skills, such as backhand and forehand strokes, that players perceived as transferring from VR to RL and vice versa. Our findings can serve as a valuable resource for VR table tennis game developers seeking to integrate mid-air controllers into their future projects.
There are many reasons to direct the flow of people in urban environments: avoiding overcrowding during large-scale events, minimising contacts during a pandemic, or redirecting pedestrians in case of temporary obstructions such as accidents. In practice, this is largely achieved through visual means such as signage, via physical barriers, or by police officers. These approaches have in common that they visually change the environment, which might negatively affect people’s experience. In this paper, we explore stereophonic guiding sounds as an alternative crowd flow management technique that does not visually affect an environment. We designed three different sounds to attract pedestrians, which we played back on parametric loudspeakers to localise them precisely in space. We evaluated them in a study with 16 participants, who were asked to navigate an urban environment that we projected in an immersive video environment. Our results show that this approach is generally feasible, that sounds can motivate people to take a detour, and that the type of sound affects the degree of motivation. The work reported here thus provides new insights into using stereophonic sound as a spatial user interface for controlling the flow of people in urban environments.
As technology advances, Unmanned Aerial Vehicles (UAVs) have emerged as an innovative solution to a variety of problems in many fields. Automated control of UAVs is most common in large-area operations, but UAVs may also increase the versatility of smart home setups by acting as physical helpers. For example, a voice-controlled UAV could act as an intelligent aerial assistant that is seamlessly integrated into smart home systems. In this paper, we present a novel Augmented Reality (AR)-based UAV control interface that provides high-level control by automating common UAV missions. Our approach enables users to operate a small UAV hands-free using only a small set of voice commands. To help users identify targets and understand the UAV’s intentions, targets within the user’s field of vision are highlighted in an AR interface. We evaluate our approach in a user study (n=26) regarding usability and physical and mental demand, with a focus on users’ preferences. Our study showed that the proposed control was not only accepted, but some users stated that they would use such a system at home to help with everyday tasks.
Virtual reality (VR) is a powerful tool for investigating user experience in interactive systems by simulating experiences that are challenging to obtain in real life. However, it is crucial to design VR tasks appropriately, especially when they are used to train first responders who face high cognitive workloads in extreme conditions. Poorly designed VR tasks can strain cognitive resources and hinder human performance and training transfer. Therefore, careful attention must be given to developing VR tasks that enhance cognitive processing in high-cognitive-workload scenarios. This study aims to assess the impact of different types and intensities of alerts on reaction time, comprehension of notifications, and task performance in a VR environment designed to induce various levels of cognitive load. Two experiments were conducted, including a remote experiment, in which participants had to complete a task identifying matching figures while reacting to and interpreting alerts delivered through auditory, visual, and haptic channels. Results showed that, while task performance was not affected by alert type or intensity, audio alerts were the slowest to react to, indicating higher cognitive processing for alerts of that modality.
Blind and low-vision (BLV) individuals often face challenges in attending and appreciating musical performances (e.g., concerts, musicals, opera) due to limited mobility and accessibility of visual information. However, the emergence of Virtual Reality (VR)-based musical performance as a common medium of music access opens up opportunities to mitigate these challenges and enhance musical experiences by investigating non-visual VR accessibility. This study aims to 1) gain an in-depth understanding of the experiences of BLV individuals, including their preferences, challenges, and needs in listening to and accessing various modes (audio, video, and on-site experiences) of music and musical performances and 2) explore the opportunities that VR can create for making immersive musical experiences accessible to BLV people. Using a mixed-methods approach, we conducted an online survey and a semi-structured interview study with 102 and 25 BLV participants, respectively. Our findings suggest design opportunities for making the VR space non-visually accessible to BLV individuals, enabling them to participate equally in the VR world and to further access immersive musical performances created with VR technology. Our research contributes to the growing body of knowledge on accessibility in virtual environments, particularly in the context of music listening and appreciation for BLV individuals.
Researchers commonly need to deploy specialised computer vision models on Mixed Reality devices, such as Microsoft HoloLens, to extract richer contextual information. This paper introduces an approach to alleviate the negative impacts of elevated latency caused by model inference time, network transactions, and photo capture during the use of external deep learning models. In contrast to previous methods that constrained user movements, our approach augments users’ environmental perception while preserving freedom of movement.
In virtual reality (VR) teleoperation and remote task guidance, a remote user may need to assign tasks to local technicians or robots at multiple sites. We are interested in scenarios where the user works with one site at a time, but must maintain awareness of the other sites for future intervention. We present an instrumented VR testbed for exploring how different spatial layouts of site representations impact user performance. In addition, we investigate ways of supporting the remote user in handling errors and interruptions from sites other than the one with which they are currently working, and switching between sites. We conducted a pilot study and explored how these factors affect user performance.
Many real-world factory tasks require human expertise and involvement for robot control. However, traditional robot operation requires that users undergo extensive and time-consuming robot-specific training to understand the specific constraints of each robot. We describe a user interface that supports a user in assigning and monitoring remote assembly tasks in Virtual Reality (VR) through high-level goal-based instructions rather than low-level direct control. Our user interface is part of a testbed in which a motion-planning algorithm determines, verifies, and executes robot-specific trajectories in simulation.
Touchless gesture interfaces often use cursor-based interactions, where widgets are targeted by a movable cursor and activated with a mid-air gesture. Proxemic cursor interactions are a novel alternative that facilitate faster selection without the need for direct targeting. We present an interactive demonstration exhibit that uses proxemic cursor interactions for input to a touchless public display.
Wearable keyboards have been widely explored for text input, with some mounted on the VR user’s body. Although the mounting position of a wearable keyboard could yield differences in, for example, ease of use and input speed, comparisons have not been comprehensive. We conducted two experiments to investigate the impact of mounting positions and of the user’s actions, including standing and walking. Typing performance was almost the same across positions, but some positions were affected by the user’s situation, especially while walking. One interesting finding is that impressions of a position change before and after actual use. We recommend that users not rely too much on first impressions and instead try out possible positions before use, although the differences are not large.
This paper proposes Volu-Me, which displays multi-layer images by projecting onto a layered fan rotating at high speed. In this system, each propeller section of the fan is illuminated by a high-speed projector. The rotation of the fan is measured by a sensor, and the projected image is generated in accordance with the rotation of the propeller section, which makes the system applicable to any fan. The system enables users to observe 3D layered images without wearing special equipment, and multiple users can observe them at the same time. In this paper, we implement a prototype of the system, show the results of the projections, and discuss the potential and future extensions of the system.
In video-based remote communication, the Gaze Cue, which visualizes the direction of a partner’s gaze, effectively enhances topic sharing by facilitating mutual gaze between users. However, since this Gaze Cue appears within the user’s field of view, viewers may feel obstructed by it. Additionally, when users want to share a topic, they promote topic sharing by verbally expressing their intentions, such as saying, “Look at that.” In this study, assuming VR remote tourism, we ran an experiment based on a school-tour experience. We examined the relationship between the speed of joint attention and the type of Gaze Cue switched in response to voice.
Exercise video games (exergames) can facilitate home exercising, which can help overcome some barriers to exercising. Augmented Reality (AR) headsets offer the potential to make in-home exergaming more safe, immersive, inclusive, and accessible, but there is limited research on AR for home-based exercising, let alone exergaming. We conducted semi-structured interviews as part of a participatory design approach to investigate how homes can be augmented to facilitate exercising. Thematic analysis revealed themes regarding suitable home exercises, and physical objects at home that can be augmented to facilitate different exercises. Results from our initial exploration can be used to guide the design of future AR systems for exercising and exergaming.
Spatial music offers an immersive auditory experience by orchestrating sounds, rhythms, and beats to not only envelop the listener but also dynamically move around, behind, and above them. Despite its promise, existing tools for spatial music remain inefficient, with prior research citing the lack of tailored input devices for spatializing in higher dimensions. To address these challenges, we propose a virtual reality (VR) application designed for intuitive music spatialization in 3D space. Users spatialize music by placing and dragging sound sources in VR to animate their movement. Since animating movement can be imprecise, our application incorporates auto-correction features that adjust the movement of sound objects, ensuring a smoother end result.
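One plausible form of the auto-correction mentioned above is simple trajectory smoothing; the sketch below smooths a hand-drawn 3D sound-source path with a moving average. This is an illustrative assumption, not the application’s actual algorithm.

```python
import numpy as np

def smooth_trajectory(points_xyz, window=5):
    """points_xyz: (N, 3) array of keyframed sound-source positions.
    Returns a smoothed copy using a per-axis moving average."""
    pts = np.asarray(points_xyz, dtype=float)
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(pts[:, axis], kernel, mode="same") for axis in range(3)],
        axis=1,
    )
```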
E-sports, or electronic sports, have emerged as a popular form of competition in the digital age. With growing involvement from well-known and established enterprises, e-sports has grown into a thriving industry. Becoming a professional e-athlete has become a dream for many young people. A crucial aspect of e-athletic performance is the level of cognitive functions, including reaction time, motor time, and eye-hand coordination. Both conventional and virtual reality training have been shown to improve these skills. In this study, 128 amateur e-athletes (80 males, 48 females, mean age 22.7±4.02) were randomly assigned to two experimental groups - E8 (n=32), training for eight consecutive weekdays, and E28 (n=30), training for twenty-eight consecutive weekdays - as well as two control groups, C8 (n=34) and C28 (n=32), with a similar number of men and women in each group. Differences in daily gaming time, e-sports experience, age, and initial level of reaction time and eye-hand coordination between groups were not statistically significant. Groups E8 and E28 underwent 15-minute training sessions in the VR game Beat Saber for eight and twenty-eight consecutive weekdays, respectively. The study found that VR training improved motor time and eye-hand coordination in both the E8 and E28 groups, while an improvement in reaction time was observed only in the E28 group.
The Shooting-Based Technique (SBT) is a highly interactive locomotion method in virtual reality (VR) that enables user navigation through simulated shooting actions. SBT has already been used in a commercialized product and offers users a novel experience compared to traditional techniques. However, there has been scant research or scientific investigation into SBT. Thus, to further explore the potential of SBT, we implemented it in an archery scenario and conducted an initial user study with 24 participants, comparing it against two widely used locomotion methods: steering-based joystick and teleport. Initial results indicate a preference for SBT among participants, which lays the foundation for extensive research in the future.
Fitness encompasses a diverse array of activities, including gym sessions, home workouts, and various other forms of physical exercise. The importance of proper muscle movement remains consistent across all these settings. Ensuring that people perform physical movements correctly, avoid injury, gain insights, and maintain motivation is traditionally accomplished with the help of an expert instructor. In the absence of an expert, the most common at-home training resources are books, videos, or apps, but they provide limited feedback. In this work, we introduce ARFit, an augmented reality application that uses pose-tracking technology to capture users’ exercise movements and provide feedback to ensure accurate posture.
Generative Artificial Intelligence (GenAI) has emerged as a fundamental component of intelligent interactive systems, enabling the automatic generation of multimodal media content. The continuous enhancement in the quality of Artificial Intelligence-Generated Content (AIGC), including but not limited to images and text, is forging new paradigms for its application, particularly within the domain of Augmented Reality (AR). Nevertheless, the application of GenAI within the AR design process remains opaque. This paper aims to articulate a design space encapsulating a series of criteria and a prototypical process to aid practitioners in assessing the aptness of adopting pertinent technologies. The proposed model has been formulated based on a synthesis of design insights garnered from ten experts, obtained through focus group interviews. Leveraging these initial insights, we delineate potential applications of GenAI in AR.
This paper introduces GesMessages, an innovative mid-air interactive application that uses simple gestures to manage real-time message notifications on laptops and large displays. Leveraging cameras on computers or smart devices, the application offers three distinct gestures: expanding notifications for immediate attention, hiding non-urgent messages, and deleting spam messages. We present the technical setup and system design. Additionally, we explore potential applications in context-awareness systems, contributing to gestural interaction research. Our work fosters a deeper understanding of mid-air interaction’s impact on message management and future interactive systems.
Immersive sports have emerged recently in the wake of the metaverse hype. This paper investigates an immersive setup that leverages virtual reality to recreate the kinetic and sensory nuances of dragon boating. We employ a body-centric approach to enable users to engage in authentic paddling actions, e.g., moving iron sticks on the boat against reasonably emulated water resistance. The virtual environments mirror real-world scenes, e.g., the Pearl River (Guangdong). We employed multi-modal experiences to replicate the visual, tactile, kinesthetic, and environmental intricacies of dragon boating. Our dragon boat simulator serves as groundwork for redefining the boundaries of immersive sports training and bridging the gap between virtuality and the real world.
The act of communicating with others during routine daily tasks is both common and intuitive. However, the hands- and eyes-engaged nature of present digital messaging applications makes it difficult to message someone amidst such activities. We introduce GlassMessaging, a messaging application designed for Optical See-Through Head-Mounted Displays (OST-HMDs). It facilitates messaging through both voice and manual input, catering to situations where hands and eyes are preoccupied. GlassMessaging was iteratively developed through a formative study identifying current messaging behaviors and challenges in common multitasking scenarios involving messaging.
We propose a new optical system that can present a vertical aerial image on a display. By presenting the shadow image of the aerial image on the display, the three-dimensionality and depth perception of the aerial image can be reinforced. The proposed system can efficiently utilize the luminance of the light-source display thanks to its polarization characteristics.
In this demo, we build upon VRS-NeRF to provide an implementation for real-time variable rate shading (VRS) in virtual reality (VR). We demo a recent advancement of Neural Radiance Fields (NeRF) that merges multiple pixels into one, reducing the overall rendering time. This system allows the real-time visualization of small to medium NeRF scenes. Furthermore, we demonstrate the applicability of VRS-NeRF to 2D applications and show its generalizability by providing two additional methods for determining the shading rate and an adaptive method for adjusting the step size during ray marching.
Aerial imaging, a technology that makes displayed objects appear as if they are actually present in real space, has a variety of potential applications in the entertainment and other fields. However, the viewing area is narrow, making simultaneous observation by multiple people difficult. To solve this problem, we propose a method to expand the horizontal viewing area by symmetrically combining two optical systems. In this paper, we formulate the relationship between design and viewing area, implement a prototype, and investigate the actual visibility and luminance of aerial images.
Virtual reality (VR) can introduce new perspectives and methods to existing research fields, such as the humanities. We present a VR research tool for facilitating the study of so-called ‘Written Artefacts’ (WA). The tool was created as part of an interdisciplinary research project and displays the WA in their spatial context: the ancient theatre of Miletus. We closely collaborated with researchers from the humanities and implemented different features for visualizing content, filtering information, and changing perspectives on the research data in VR. Our demo gives users access to the immersive research environment and allows them to explore the virtual theatre of Miletus and its WA.
With the rise of digital media, how we read and share information has undergone significant changes. However, existing XR reading projects still face barriers to improving the reading experience. Thus, this study proposes using XR technology to enhance reading by projecting a 3D reading UI into 3D space. Instead of simply displaying a 2D UI or media in reality, as seen in existing VR/AR reading samples, this study designs an adaptive UI environment covering four aspects: 3D layout, navigation, interactive data, and sharing. The project aims to facilitate efficient and immersive online reading by providing users with a seamless spatial reading experience. This study seeks to showcase the scalability of XR technology in online reading.
Aerial object detection is the process of detecting objects in remote sensing images, such as aerial or satellite imagery. However, due to the unique characteristics and challenges of remote sensing images, such as large image sizes and dense distributions of small objects, annotating the data is time-consuming and costly. Active learning methods can reduce the cost of labeling data and improve the model’s generalization ability by selecting the most informative and representative unlabeled samples. In this paper, we studied how to apply active learning techniques to remote sensing object detection tasks and found that traditional active learning frameworks are not well suited to them. Therefore, we designed a remote-sensing-oriented active learning framework that can more efficiently select representative samples and improve the performance of remote sensing object detection. In addition, we propose an adaptive weighting loss to further improve the generalization ability of the model in unlabeled areas. Extensive experiments conducted on the remote sensing dataset DOTA-v2.0 show that applying various classical active learning methods within the new framework achieves better performance.
The human visual system (HVS) is capable of responding in real time to complex visual environments. Predicting eye movements and visual fixations during free observation of visual scenes is a task known as scanpath prediction, which aims to simulate the HVS. In this paper, we propose a visual-transformer-based model to study the attentional processes of the human visual system when analyzing visual scenes, thereby achieving scanpath prediction. This technology has important applications in human-computer interaction, virtual reality, augmented reality, and other fields. We significantly simplify the workflow of scanpath prediction and the overall model architecture, achieving performance superior to existing methods.
The Human-Object Interaction (HOI) task aims to detect all <human, action, object> triplets in an image that exhibit an interaction. Existing methods comprise two-stage and one-stage algorithms. Recently, several Transformer-based works have been proposed; such approaches detect HOI relationships directly in an end-to-end way, making the whole pipeline simpler. For the existing HOI detection model HOTR, we aim to optimize the internal structure of its Transformer to better fit the HOI detection task. We use a residual structure to strengthen the interaction-related components in interaction detection and feed the enhanced results into the next decoder layer. As a result, the accuracy of the baseline method is improved by 1.44% mAP and 3.21% mAP on the benchmark datasets V-COCO and HICO-DET, respectively, exceeding most mainstream methods.
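Conceptually, the residual strengthening described above can be pictured as follows; this PyTorch-style sketch is an illustration of a residual enhancement block, not HOTR’s actual code, and the feature dimension is assumed.

```python
import torch.nn as nn

class ResidualInteractionBlock(nn.Module):
    """Toy residual block: interaction-related features are enhanced and
    added back to the input before being passed to the next decoder layer."""
    def __init__(self, dim=256):
        super().__init__()
        self.enhance = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, interaction_feats):
        # residual connection: enhanced features are added to the original input
        return self.norm(interaction_feats + self.enhance(interaction_feats))
```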
Image feature matching is a critical task in human-computer interaction applications such as driverless cars and collaborative medical robots, and aims to establish correspondences between two input images. To ensure the accuracy of matches, removing mismatches is of utmost importance. In recent years, machine learning has emerged as a new perspective for effective mismatch removal. However, current learning-based methods suffer from a lack of generalizability due to their heavy reliance on extensive image data for training. Consequently, handling cases with a high mismatch ratio becomes challenging. In this paper, we propose a novel approach, which we call LTM, that incorporates a triangular topology constraint into machine learning. By summarizing topology constraints around the matching points and integrating the idea of sampling, we successfully address the mismatch removal task. Notably, our method requires only a few parameters as input, enabling us to train it using just several image pairs from four sets. Remarkably, our method achieves outstanding results on diverse datasets using various machine learning techniques compared with many existing methods.
In industrial human-computer interaction, human safety is particularly important, and helmet wearing plays a crucial role in ensuring construction safety and security. In smart industrial systems, intelligent helmet detection is a fundamental task. In practical scenarios, however, safety helmets often appear as small objects in monitoring scenes, posing challenges for traditional object detection methods. This paper addresses the enhancement of small-scale helmet detection in intelligent monitoring by combining unsupervised foreground detection with an improved object detection network. Experimental results demonstrate significant improvements in detection accuracy compared to the original YOLOv5 algorithm.
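A pipeline of this kind could be assembled as in the sketch below; the background-subtraction step, the area threshold, and the `detector` callable standing in for an improved YOLOv5-style model are all assumptions, not the paper's implementation.

```python
# Hedged sketch: an unsupervised background-subtraction step proposes foreground
# regions from the surveillance stream, and only those regions are passed to the
# detector, so small helmets occupy a larger fraction of each crop.
import cv2


def foreground_crops(frame, subtractor, min_area: int = 400):
    """Yield bounding-box crops of moving foreground regions in a frame."""
    mask = subtractor.apply(frame)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h >= min_area:
            yield (x, y, w, h), frame[y:y + h, x:x + w]


# Usage sketch (detector and video_frames are hypothetical stand-ins):
# subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
# for frame in video_frames:
#     for (x, y, w, h), crop in foreground_crops(frame, subtractor):
#         helmets = detector(crop)   # detections in crop coordinates; offset by (x, y)
```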
As a promising direction for alleviating the high cost of label annotation, active learning has made remarkable progress in image recognition. Recently, some efforts have tried to bring it into further visual tasks such as object detection and human-object interaction. Despite the effectiveness of existing methods, research specifically targeting temporal scenes, which are common in real-world vision tasks such as autonomous driving, is still lacking. To fill this gap, we propose an inconsistency-based active learning method specifically for temporal object detection. We design two novel active learning strategies to ensure that the selected samples are informative and contain little redundant information in temporal scenarios: in the first stage, we adopt a temporal inconsistency strategy to avoid sampling too many redundant samples; in the second stage, we sample the most informative samples using a committee inconsistency strategy. We conduct extensive experiments on the Waymo dataset, and the results demonstrate the effectiveness of the proposed method. Specifically, we achieve the same performance as training on the whole dataset while sampling only 20% of the samples, which is much better than the entropy-based method.
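A toy version of the two-stage selection could look as follows; the concrete scoring functions (detection-count change for temporal inconsistency, confidence variance for committee disagreement) are simplifications chosen for illustration, not the paper's formulation.

```python
# Two-stage selection sketch: stage one keeps frames whose detections change a lot
# relative to the previous frame (temporal inconsistency); stage two ranks the
# survivors by disagreement among a committee of detectors.
from statistics import pvariance
from typing import Dict, List


def temporal_inconsistency(counts: Dict[str, int], frame_order: List[str]) -> Dict[str, float]:
    """Score each frame by the change in detection count w.r.t. its predecessor."""
    return {cur: abs(counts[cur] - counts[prev])
            for prev, cur in zip(frame_order, frame_order[1:])}


def committee_disagreement(committee_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each frame by the variance of mean confidences across committee members."""
    return {f: pvariance(s) for f, s in committee_scores.items()}


def two_stage_select(frame_order, counts, committee_scores, keep_stage1: int, budget: int):
    t_scores = temporal_inconsistency(counts, frame_order)
    stage1 = sorted(t_scores, key=t_scores.get, reverse=True)[:keep_stage1]
    c_scores = committee_disagreement({f: committee_scores[f] for f in stage1})
    return sorted(c_scores, key=c_scores.get, reverse=True)[:budget]


if __name__ == "__main__":
    frames = ["f0", "f1", "f2", "f3"]
    counts = {"f0": 2, "f1": 5, "f2": 5, "f3": 1}
    committee = {f: ([0.9, 0.5, 0.7] if f == "f3" else [0.8, 0.8, 0.8]) for f in frames}
    print(two_stage_select(frames, counts, committee, keep_stage1=2, budget=1))  # ['f3']
```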
To comprehensively analyze the literature on applying interactive spatial design in the health field, an investigation was conducted using literature from Web of Science. A scientific knowledge map was created using VOSviewer, considering publication year, country, keywords, and references. The analysis revealed a growing number of publications, with the UK and US leading the research. The main research areas were smart space design, interaction design and user experience, and design methodology and assessment. Citations within the references constitute the main knowledge base for applying interactive spatial design in health, effectively linking and encompassing a significant portion of research efforts. Future investigations of the relationship between space design, interactive engagement, and health should consider additional factors such as socio-economic and demographic characteristics. This inclusive approach will facilitate a more comprehensive understanding of the intricate interplay between spatial design, interactive interfaces, and their impact on well-being.
Light is a common environmental medium used to evoke mood and provide an immersive experience. In this paper, we present the design and implementation of Forestlight, a virtual reality (VR) biofeedback system that helps users take deep breaths for stress relief through interactive lighting. We envisioned that breath-based interactive lighting in a virtual forest environment might be effective in stress reduction. With Forestlight, we also wanted to explore opportunities for applying virtual reality technology in respiratory biofeedback systems. Based on the current stage of the prototype design, we conducted a user study to examine its effectiveness in stress reduction. The study suggests that biofeedback-driven interactive lighting in a VR system can serve as a persuasive technology in the domain of health self-management, and that interactive lighting combining decorative and informative aspects in a VR system can be pleasant and helpful to the user.
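For readers curious how a breath-to-light mapping might work in practice, here is a minimal sketch; the function name, the 6-breaths-per-minute target, and the mapping itself are assumptions for illustration, not Forestlight's implementation.

```python
# Illustrative breath-to-light mapping: slow, deep exhalations brighten the
# virtual forest lights, giving immediate feedback that rewards deep breathing.
import math


def light_intensity(breath_amplitude: float, breath_rate_bpm: float,
                    target_rate_bpm: float = 6.0) -> float:
    """Map normalized breath amplitude [0, 1] and breathing rate to a light
    intensity in [0, 1]; slower, deeper breathing yields brighter light."""
    rate_factor = math.exp(-abs(breath_rate_bpm - target_rate_bpm) / target_rate_bpm)
    return max(0.0, min(1.0, breath_amplitude * rate_factor))


# Example: a deep breath at the assumed 6-breaths-per-minute target gives full brightness.
print(light_intensity(breath_amplitude=1.0, breath_rate_bpm=6.0))   # 1.0
```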
In addition to public spaces such as parks and squares, the external spaces of high-rise mixed commercial and residential communities also play a role in hosting public activities. However, a preliminary survey found that these spaces are underutilized and do not provide a good experience for users. The aim of this paper is to investigate users' perception of the spatial environment and the influence of the space on users' behaviour, and to uncover the relationship between microclimate, the spatial environment interface, and behaviour. The synergistic relationship between microclimate, space, and behaviour was investigated using a combination of microclimate measurements, spatio-temporal behavioural maps (n=2098), interviews (n=161), questionnaires (n=122), and field observations. The results show that rich site facilities and a suitable microclimate are the main factors influencing human behaviour; that people perceive different microclimate elements differently; and that the level of lighting at the top of the space influences human behaviour together with the site facilities. These findings reveal the relationship between microclimate, the spatial environment interface, and behaviour, which is of great significance for improving the vitality of the external space of public buildings and promoting its healthy development.
The wide spread of artificial intelligence and spatial computing is driving the emerging development of healthful spatial interfaces (HSIs), which require new paradigms for generating relevant designs. In this paper, we present a case study exploring tools and methods for ideating HSI concepts based on bodystorming. Specifically, we set up a workshop using office vitality as the research context to investigate interactive elements in a vertical space for individual workers. We developed a real-size vertical space mockup equipped with a set of tools, as well as a co-creation workflow, to support the bodystorming activities of this workshop. Based on a pilot study with nine participants, we obtained qualitative data to reflect on the effectiveness of our workshop in facilitating design ideation for HSIs. Moreover, we found the proposed approach helped us quickly explore how HSIs could be designed and vertically arranged for increased office vitality. We conclude with implications and discussion of future work.
Our modern work environments and society often face the serious problem of workplace stress, which frequently goes undetected until it leads to numerous health challenges and negatively impacts productivity. Current research in this field highlights workplace stress reflection and awareness through collective visualization. This paper focuses on the design of BallBounce, a workplace biofeedback system that collects continuous time-domain measures of office workers' physiological data, calculates cortisol level changes by measuring skin conductance, and uses p5.js, a JavaScript variant of the Processing software, to visualize collective stress data anonymously. The system aims to decrease the physiological stress experienced by office workers in the context of Spatial User Interfaces and to spark ideas for stress-relieving measures. This work contributes to spatial interaction design for well-being through the development of an unobtrusive, pervasive early stress detection system for organizations, which can have a positive impact on both the health of employees and the collective atmosphere of enterprises.
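As a rough illustration of the signal-processing side of such a system (shown in Python for readability; the visualization itself is described as p5.js), the sketch below derives a simple stress index from skin conductance samples. The class name, window lengths, and baseline-comparison rule are assumptions, not BallBounce's implementation.

```python
# Illustrative stress index: compare a short-term skin conductance average against a
# slow-moving baseline; positive values could drive a livelier collective visualization.
from collections import deque


class StressIndex:
    def __init__(self, baseline_len: int = 300, window_len: int = 30):
        self.baseline = deque(maxlen=baseline_len)   # slow-moving tonic level
        self.window = deque(maxlen=window_len)       # recent samples

    def update(self, eda_sample: float) -> float:
        self.baseline.append(eda_sample)
        self.window.append(eda_sample)
        base = sum(self.baseline) / len(self.baseline)
        recent = sum(self.window) / len(self.window)
        # Positive values mean recent skin conductance exceeds the baseline,
        # which a visualization could map to faster or bouncier balls.
        return max(0.0, (recent - base) / base) if base else 0.0
```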