We propose a novel body-centered interaction system that makes use of a spherical camera attached to the hand. Its broad and unique field of view enables an all-in-one approach to sensing multiple pieces of contextual information in hand-based spatial interactions: (i) hand location on the body surface, (ii) hand posture, (iii) hand keypoints in certain postures, and (iv) the near-hand environment. The system uses a deep-learning approach to perform hand location and posture recognition, achieving high accuracy of 85.0% and 88.9%, respectively, after sufficient data collection and training. Our results and example demonstrations show the potential of utilizing 360° cameras for vision-based sensing in context-aware body-centered spatial interactions.
We investigate how smartphones can be used to mediate the manipulation of smartphone-based content in spatial augmented reality (SAR). A major challenge here is seamlessly transitioning a phone between its use as a smartphone and its use as a controller for SAR. Most users are already familiar with extending their hand when using a remote control. We therefore propose hand extension as an intuitive mode-switching mechanism for switching back and forth between the mobile interaction mode and the spatial interaction mode. Based on this intuitive mode switch, our technique enables the user to push smartphone content to an external SAR environment, interact with the external content, rotate-scale-translate it, and pull the content back into the smartphone, all while ensuring no conflict between mobile interaction and spatial interaction. To ensure the feasibility of hand extension as a mode switch, we evaluate the classification of extended and retracted states of the smartphone based on the phone's relative 3D position with respect to the user's head, while varying user postures, surface distances, and target locations. Our results show that a random forest classifier can classify the extended and retracted states with 96% accuracy on average.
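As a rough illustration of this classification step, the sketch below trains a random forest on the phone's 3D position expressed relative to the head, as described above. The feature layout, data, and thresholds are hypothetical placeholders, not the authors' dataset or implementation.

```python
# Minimal sketch (assumptions, not the authors' implementation): classifying
# extended vs. retracted phone states from the phone's position relative to
# the user's head using a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical training data: each row is (dx, dy, dz), the phone position in
# the head's coordinate frame; labels are 0 = retracted, 1 = extended.
X = np.random.randn(200, 3)           # placeholder features
y = (X[:, 2] > 0.25).astype(int)      # placeholder labels: far in front of head -> extended

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.2f}")
```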
We present TanGo, an always-available input modality for VR headsets that can complement current VR accessories. TanGo is an active mechanical structure mounted symmetrically on a head-mounted display, enabling three-dimensional bimanual sliding input, with each degree of freedom furnished with a brake system driven by a micro servo, generating a total of six passive resistive force profiles. TanGo is an all-in-one structure that offers rich input and output while remaining compact, balancing trade-offs between size, weight, and usability. Users can actively perform gestures such as pushing, shearing, or squeezing and receive corresponding output, while resting their hands during stationary experiences. TanGo also gives users the flexibility to switch seamlessly between virtual and real usage in augmented reality without additional effort or instruments. We demonstrate three applications to show the interaction space of TanGo, then discuss its limitations and show future possibilities based on preliminary user feedback.
Hands-free manipulation of 3D objects has long been a challenge for augmented and virtual reality (AR/VR). While many methods use eye gaze to assist with hand-based manipulations, interfaces cannot yet provide fully gaze-based 6 degree-of-freedom (DoF) manipulations in an efficient manner. To address this problem, we implemented three methods for handling rotations of virtual objects using gaze: RotBar, which maps line-of-sight eye gaze onto per-axis rotations; RotPlane, which makes use of orthogonal planes to achieve per-axis angular rotations; and RotBall, which combines a traditional arcball with an external ring to handle user-perspective roll manipulations. We validated the efficiency of each method by conducting a user study involving a series of orientation tasks along different axes with each method. Experimental results showed that users could accomplish single-axis orientation tasks significantly faster and more accurately with RotBar and RotPlane than with RotBall. For multi-axis orientation tasks, however, RotBall significantly outperformed RotBar and RotPlane in terms of speed and accuracy.
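To make the per-axis idea concrete, the sketch below shows one plausible way a gaze hit point on a plane orthogonal to the chosen axis could be converted into a rotation angle, in the spirit of RotPlane. The function name, coordinate frames, and sampling scheme are assumptions, not the paper's implementation.

```python
# Illustrative sketch: derive a per-axis rotation angle from a gaze hit point
# measured about the object center (hypothetical frames and names).
import math

def per_axis_angle(hit_point, object_center, axis="z"):
    """Angle of the gaze hit point around `axis`, measured about the object center."""
    dx = hit_point[0] - object_center[0]
    dy = hit_point[1] - object_center[1]
    dz = hit_point[2] - object_center[2]
    if axis == "z":              # rotate about z: angle in the x-y plane
        return math.atan2(dy, dx)
    if axis == "y":              # rotate about y: angle in the z-x plane
        return math.atan2(dx, dz)
    return math.atan2(dz, dy)    # rotate about x: angle in the y-z plane

# Example: rotate the object by the change in angle between two gaze samples.
a0 = per_axis_angle((0.3, 0.0, 0.0), (0.0, 0.0, 0.0), axis="z")
a1 = per_axis_angle((0.3, 0.1, 0.0), (0.0, 0.0, 0.0), axis="z")
delta_degrees = math.degrees(a1 - a0)
```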
Interaction cues inform users about potential actions to take. Tutorials, games, educational systems, and training applications often employ interaction cues to direct users to take specific actions at particular moments. Prior studies have investigated many aspects of interaction cues, such as the feedforward and perceived affordances that often accompany them. However, two less-researched aspects of interaction cues are the effects of their purpose (i.e., the type of task conveyed) and their timing (i.e., when they are presented). In this paper, we present a study that evaluates the effects of interaction cue purpose and timing on performance while learning and retaining tasks with a virtual reality (VR) training application. Our results indicate that participants retained manipulation tasks significantly better than travel or selection tasks, even though the latter were significantly easier to complete than the manipulation tasks. Our results also indicate that immediate interaction cues afforded significantly faster learning and better retention than delayed interaction cues.
Impossible spaces use self-overlapping virtual architecture to maximize the area of a virtual environment that can be explored on foot. This paper details a study exploring how users' ability to detect overlapping virtual architecture is affected when the virtual environment includes distractors that impose additional cognitive load by challenging the users. The results indicate that such distractors both increase self-reported task load and reduce users' ability to reliably detect overlaps between adjacent virtual rooms: rooms could overlap by up to 68% when distractors were present, compared to 40% when they were not.
We perform an experiment on distance perception in a large-screen display immersive virtual environment. Large-screen displays typically make direct blind walking tasks impossible, even though blind walking is a popular distance response measure in the real world and in head-mounted displays. We use a movable large-screen display to compare direct blind walking and indirect triangulated pointing with monoscopic viewing. We find that participants judged distances to be 89.4% ± 28.7% and 108.5% ± 44.9% of the actual distances in the direct blind walking and triangulated pointing conditions, respectively. However, we find no statistically significant difference between these approaches. This work adds to the limited number of studies on egocentric distance judgments with a large display wall for distances of 3-5 meters. It is the first, to our knowledge, to use direct blind walking with a large display.
Teleporting interfaces are widely used in virtual reality applications to explore large virtual environments. When teleporting, the user indicates the intended location in the virtual environment and is instantly transported, typically without self-motion cues. This project explored the cost of teleporting on the acquisition of survey knowledge (i.e., a "cognitive map"). Two teleporting interfaces were compared, one with and one without visual and body-based rotational self-motion cues. Both interfaces lacked translational self-motion cues. Participants used one of the two teleporting interfaces to find and study the locations of six objects scattered throughout a large virtual environment. After learning, participants completed two measures of cognitive map fidelity: an object-to-object pointing task and a map drawing task. The results indicate superior spatial learning when rotational self-motion cues were available. Therefore, virtual reality developers should strongly consider the benefits of rotational self-motion cues when creating and choosing locomotion interfaces.
Due to the additive light model they employ, most optical see-through head-mounted displays (OST-HMDs) provide the best augmented reality (AR) views in dark environments, where the added AR light does not have to compete against existing real-world lighting. AR imagery displayed on such devices loses a significant amount of contrast in well-lit environments, such as outdoors in direct sunlight. To compensate, OST-HMDs often use a tinted visor to reduce the amount of environment light that reaches the user's eyes, which in turn reduces contrast in the user's view of the physical environment. While these effects are well known and grounded in existing literature, formal measurements of the illuminance and contrast of modern OST-HMDs are currently missing. In this paper, we provide illuminance measurements for both the Microsoft HoloLens 1 and its successor, the HoloLens 2, under environment lighting conditions ranging from 0 to 20,000 lux. We evaluate how environment lighting impacts the user by calculating contrast ratios between rendered black (transparent) and white imagery displayed under these conditions, and evaluate how the intensity of environment lighting is affected by donning and using the HMD. Our results indicate the need for further refinement in the design of future OST-HMDs to optimize contrast in environments with illuminance values at or above those found in indoor working environments.
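For readers unfamiliar with the metric, the sketch below shows the simple full-on/full-off contrast calculation implied above: the ratio between the light measured for rendered white and rendered black (transparent) imagery under a given environment lighting condition. The numbers are hypothetical placeholders, not measurements from the paper.

```python
# Minimal sketch of a full-on/full-off contrast ratio between rendered white
# and rendered black (transparent) imagery. Values are hypothetical.
def contrast_ratio(white_reading, black_reading):
    """Contrast ratio = reading for rendered white / reading for rendered black."""
    return white_reading / black_reading

# Hypothetical readings (same units, e.g., lux at the eye) for one lighting condition:
print(contrast_ratio(white_reading=320.0, black_reading=40.0))  # -> 8.0
```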
Given human capabilities for spatial orientation and navigation, the visualization used to support the localization of off-screen targets inevitably influences visual-spatial processing, which relies on two frames of reference. It has not yet been established whether an egocentric or an exocentric frame of reference contributes more to efficient viewpoint guidance in a head-mounted augmented reality environment. This may be due to the lack of objective assessments of the attention allocation and mental workload demanded by the guidance method. This paper presents a user study investigating the effect of egocentric and exocentric viewpoint guidance on visual attention and mental workload. In parallel to a localization task, participants had to complete a divided-attention task using the oddball paradigm. During task completion, heart rate variability was measured to determine the physiological stress level. The objective assessment of mental workload was supplemented by subjective ratings using the NASA TLX. The results show that egocentric viewpoint guidance leads to the most efficient target cueing in terms of faster localization, higher accuracy, and lower self-reported workload. In addition, egocentric target cueing causes a slight decrease in physiological stress and enables faster recognition of simultaneous events, although visual attention appeared to be covertly oriented.
We present RayGraphy, a display technology that renders volumetric graphics by superimposing the trajectories of light beams in an indoor space filled with fog. Since the traditional FogScreen approach requires shaping a thin layer of fog, it can only show two-dimensional images within a narrow range close to the fog-emitting nozzle. A method that renders volumetric graphics with plasma generated by high-power lasers has also been proposed, but its operation in a public space is considered quite dangerous. The proposed system mainly comprises dozens of laser projectors arranged in a circle around a fog-filled space, and renders volumetric graphics in the fog by superimposing weak laser beams from the projectors. Compared to conventional methods, this system, which employs weak laser beams and unshaped, innocuous fog, is more scalable and safer. We aim to construct a new spatial augmented reality platform in which computer-generated images can be drawn directly in the real world. We implement a prototype consisting of 32 laser projectors and a fog machine, and we evaluate and discuss the system's performance and characteristics in experiments.
Visually Induced Motion Sickness (VIMS) plagues a significant number of individuals who use Virtual Reality (VR) systems. Although several solutions have been proposed to reduce the onset of VIMS, a reliable approach for moderating it within VR experiences has not yet been established. Here, we take an initial step toward exploring the use of controlled olfactory stimuli to reduce symptoms associated with VIMS. In this experimental study, participants perceived different olfactory stimuli while experiencing a first-person-view rollercoaster simulation using a VR Head-Mounted Display (HMD). The onset of VIMS symptoms was analyzed using both the Simulator Sickness Questionnaire (SSQ) and the Fast Motion Sickness Scale (FMS). Notable reductions in overall SSQ and FMS scores suggest that providing a peppermint aroma reduces the severity of VIMS symptoms experienced in VR. Additional anecdotal feedback and potential future studies on using controlled olfactory stimuli to minimize the occurrence of VIMS symptoms are also discussed.
Tactile effects can enhance the user experience of multimedia content. However, generating appropriate tactile stimuli without any human intervention remains a challenge. While visual or audio information has been used to automatically generate tactile effects, utilizing cross-modal information may further improve the spatiotemporal synchronization and user experience of the tactile effects. In this paper, we present a pipeline for the automatic generation of vibrotactile effects through the extraction of both visual and audio features from a video. Two neural network models are used to extract the diegetic audio content and to localize a sounding object in the scene. The outputs of these models then determine the spatial distribution and intensity of the tactile effects. To evaluate the performance of our method, we conducted a user study comparing videos with tactile effects generated by our method to both the original videos without any tactile stimuli and videos with tactile effects generated from visual features only. The study results demonstrate that our cross-modal method creates tactile effects with better spatiotemporal synchronization than the existing visual-based method and provides a more immersive user experience.
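The sketch below illustrates one plausible way the two cues could be combined into per-actuator intensities: an audio-derived loudness value is spread over an actuator grid centered on the localized sounding object. The grid layout, spreading function, and parameter names are assumptions; the upstream feature extraction (diegetic audio separation, object localization) is taken as given.

```python
# Illustrative sketch (assumptions, not the authors' pipeline): map an audio
# intensity and a visually localized object position to a vibrotactile grid.
import numpy as np

def tactile_frame(audio_rms, object_xy, grid_shape=(3, 3), sigma=0.2):
    """Spread the audio intensity over an actuator grid, centered on the
    object's normalized (x, y) position in the video frame."""
    rows, cols = grid_shape
    ys, xs = np.meshgrid(np.linspace(0, 1, rows), np.linspace(0, 1, cols), indexing="ij")
    dist2 = (xs - object_xy[0]) ** 2 + (ys - object_xy[1]) ** 2
    spatial = np.exp(-dist2 / (2 * sigma ** 2))   # spatial distribution of the effect
    return audio_rms * spatial                    # intensity per actuator

# Example: a loud sound localized in the upper-left of the frame.
print(tactile_frame(audio_rms=0.8, object_xy=(0.2, 0.1)))
```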
The combination of eye and head movements plays a major part in our visual process. The neck provides mobility for head motion but also limits the range of visual motion in space. In this paper, a robotic neck augmentation system is designed to surmount the physical limitations of the neck. In essence, it applies a visuomotor modification to the vision-neck relationship. We conducted two experiments to measure and verify the performance of the neck augmentation. The test results indicate that the system has a positive effect on augmenting visual motion. Specifically, the robotic neck can enlarge the range of neck motion to 200%, and it improves response motion, with an overall 22% reduction in response time and a 28% increase in speed.
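As a rough illustration of the kind of visuomotor gain such a system applies, the sketch below maps measured head rotation to robotic-neck rotation through a gain factor; with a gain of 2, this corresponds to the reported 200% range of motion. The function, limits, and gain are assumptions rather than the authors' controller.

```python
# Minimal sketch of an assumed visuomotor gain mapping for the robotic neck.
def neck_command(head_yaw_deg, head_pitch_deg, gain=2.0, yaw_limit=90.0, pitch_limit=45.0):
    """Map measured head rotation to robotic-neck rotation, clamped to actuator limits."""
    clamp = lambda value, limit: max(-limit, min(limit, value))
    return clamp(gain * head_yaw_deg, yaw_limit), clamp(gain * head_pitch_deg, pitch_limit)

print(neck_command(30.0, -10.0))  # -> (60.0, -20.0)
```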
Visual video analytics research, stemming from data captured by surveillance cameras, has mainly focused on traditional computing paradigms, despite emerging platforms such as mobile devices. We investigate the potential of situated video analytics, which involves the inspection of video data in the actual environment where the video was captured [14]. Our ultimate goal is to find means of exploring video data visually and effectively in situated contexts. We first investigate the performance of visual analytic tasks in situated vs. non-situated settings. We find that participants largely benefit from environmental cues for many analytic tasks. We then pose the question of how best to represent situated video data. To answer this, we explore end-users' views on how to capture such data in a design session. Through sketching, participants leveraged being situated and explored how being in situ influenced the integration of their designs. Based on these two elements, our paper argues for the need to develop novel spatial analytic user interfaces to support situated video analysis.
Today’s augmented and virtual reality (AR/VR) systems do not provide body, hand or mouth tracking without special worn sensors or external infrastructure. Simultaneously, AR/VR systems are increasingly being used in co-located, multi-user experiences, opening the possibility for opportunistic capture of other users. This is the core idea behind BodySLAM, which uses disparate camera views from users to digitize the body, hands and mouth of other people, and then relay that information back to the respective users. If a user is seen by two or more people, 3D pose can be estimated via stereo reconstruction. Our system also maps the arrangement of users in real world coordinates. Our approach requires no additional hardware or sensors beyond what is already found in commercial AR/VR devices, such as Microsoft HoloLens or Oculus Quest.
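To illustrate the stereo reconstruction step mentioned above, the sketch below triangulates a single body keypoint observed in two users' headset cameras, given their projection matrices. The camera poses, intrinsics, and pixel coordinates are hypothetical; this is a generic linear triangulation, not BodySLAM's actual pipeline.

```python
# Illustrative sketch: linear (DLT) triangulation of one 3D point from two views.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """P1, P2: 3x4 projection matrices; x1, x2: (u, v) image coordinates."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # homogeneous -> Euclidean

# Hypothetical cameras: identity intrinsics, second camera offset along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
print(triangulate(P1, P2, x1=(0.1, 0.05), x2=(-0.15, 0.05)))  # ~ (0.2, 0.1, 2.0)
```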
Smartphones secure a significant amount of personal and private information, and are playing an increasingly important role in people's lives. However, current techniques for manually authenticating to smartphones have failed in both not-so-surprising (shoulder surfing) and quite surprising (smudge attacks) ways. In this work, we propose a new technique called 3D Pattern. Our 3D Pattern technique takes advantage of pre-touch sensing, which could soon allow smartphones to sense a user's finger position at some distance from the screen. We describe and implement the technique, and evaluate it in a small pilot study (n=6) by comparing it to PIN and pattern locks. Our results show that although our prototype takes longer to authenticate, it is completely immune to smudge attacks and promises to be more resistant to shoulder surfing.
As immersive, room-scale virtual and augmented reality becomes more widely used in productive workflows, the need for fast, equally immersive text input grows. Traditional keyboard interaction in these room-scale environments is limiting because it is predominantly used while seated and because the necessary visual indicators of the hands and keys can break immersion. Pen-based VR/AR interaction presents an alternative immersive text input modality with high throughput. In this paper, we present two novel interfaces designed for a pen-shaped stylus that do not require positioning the controller within a region of space, but instead detect input from rotation and on-board buttons. Initial results show that, compared with Controller Pointing text entry techniques, Tilt-Type was slower but produced fewer errors and was less physically demanding. Additional studies are needed to validate these results.
Face-to-face communication has evolved as the most natural means of communication. However, virtual group meetings have received considerable attention as an alternative that allows multiple persons to communicate over distance, e.g., via video conferences or immersive virtual reality (VR) systems, but they incur numerous limitations and challenges. In particular, they often hinder the spatial perception of full-body language, deictic relations, or eye-to-eye contact. The differences between video conferences and immersive VR meetings still remain poorly understood. We report on a pilot study in which we compared virtual group meetings using video conferences and VR meetings with and without head-mounted displays (HMDs). The results suggest that participants feel a higher sense of presence in an immersive VR meeting, but only if an HMD is used. The usability of video conferences and of immersive VR was acceptable, whereas non-immersive VR without an HMD was not.
Design thinking is regarded as an important problem-solving skill, and it includes a prototyping stage for testing the feasibility of an idea early on. With advances in technology, virtual reality is being applied to create immersive experiences in design prototyping. While tracking physical objects and visualizing the tracking in the virtual world could enhance these immersive experiences, existing tracking devices are not easy to set up and might not scale cost-effectively in multi-user scenarios. The aim of this study is therefore to propose a framework that uses simple, low-cost hardware for tracking physical objects in a multi-user environment. A sample implementation along with demonstrations of the tracking is also presented.
The rapid growth of e-commerce is challenging parcel delivery companies to meet demand while keeping their costs low. While drivers have to meet unrealistic targets, the companies deal with high staff turnover in the last mile. Consequently, with every departure, critical last-mile experience is lost. In this paper, we introduce an augmented-reality-based system that keeps track of events during the parcel handling process for last-mile logistics. The system can register and track parcels inside the truck and project their locations to the driver in augmented form. The system, which we have integrated into a mobile application, potentially reduces processing times and accelerates the training of new drivers. We discuss first-hand observations and propose further research areas. Field tests are currently under development to verify whether the system can operate in a real-life environment.
We have developed an augmented-reality (AR) based system that keeps track of events during the parcel handling process for last-mile logistics. The system can retrieve the location of a parcel within large piles of parcels and highlight it in AR on the user's smartphone. A camera array automatically detects parcels placed manually on the shelf by an operator. New parcels are scanned, and parcel fingerprints are generated semi-automatically. The system can detect and track known parcels by their fingerprints and can further highlight a parcel's location using 3D visual cues, directly on the operator's smartphone.
A swarm is the coherent behavior that emerges ubiquitously from simple interaction rules between self-organized agents. Understanding swarms is of utmost importance in many disciplines and jobs, but they are hard to teach due to the elusive nature of the phenomenon, which requires observing events at different scales (i.e., from different perspectives) and understanding the links between them. In this article, we investigate the potential of combining a swarm of tangible, haptic-enabled robots with virtual reality to provide a user with multiple perspectives on, and interaction modalities with, the swarm, ultimately aiming at supporting the learning of emergent behaviours. The framework we developed relies on Cellulo robots and the Oculus Quest, and was preliminarily evaluated in a user study involving 15 participants. The results suggest that the framework effectively allows users to experience interaction with the swarm from different perspectives and through different modalities.
BUDI (Building Urban Designs Interactively) is an integrated 3D visualization and remote collaboration platform for complex urban design tasks. Users with different backgrounds can remotely engage in the entire design cycle, improving the quality of the end result. In BUDI, a virtual environment was designed to seamlessly expand beyond a traditional two-dimensional surface into a fully immersive three-dimensional space. Clients on various devices connect with servers for different functionalities tailored for various user groups. A demonstration with a local urban planning use-case shows the costs and benefits of BUDI as a spatial-based collaborative platform. We consider the trade-offs encountered when trying to make the collaboration seamless. Specifically, we introduce the multi-dimensional data visualization and interactions the platform provides, and outline how users can interact with and analyze various aspects of urban design.
We envision the development of a novel Virtual Equipment System to replace existing 2D interfaces in virtual reality with a set of embedded and distributed input devices. We have built a prototype that takes advantage of the user's spatial awareness, offering a set of virtual equipment relevant to the user's sensory organs. It allows users to quickly access a suite of intuitively designed interfaces using their spatial and body awareness. Our Virtual Equipment System can be standardized and applied to other extended reality devices and frameworks.
In immersive VR environments, object selection is an essential interaction. However, current object selection techniques suffer from issues of hand jitter, accuracy, and fatigue, especially when selecting nail-sized objects. Here, we present locked dwell-time-based point and tap, a novel technique designed for selecting nail-sized objects within arm's reach in a dense virtual environment. We compare locked dwell-time-based point and tap with magnetic grasp, pinch, and raycasting. Forty participants evaluated the effectiveness and efficiency of these techniques. The results show that locked dwell-time-based point and tap yielded a significantly lower task completion time and error rate. It was also the most preferred and caused the least effort among all the techniques. We also measured ease of use, ease of learning, and perceived naturalness of the techniques.
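The sketch below illustrates one plausible reading of the "lock then tap" idea: when the pointing ray dwells on the same small target for a threshold duration, that target is locked (making it robust to hand jitter), and a subsequent tap confirms the selection. The class, threshold, and update loop are assumptions, not the authors' implementation.

```python
# Minimal sketch of an assumed dwell-lock-then-tap selection state machine.
import time

class DwellLockSelector:
    def __init__(self, dwell_seconds=0.6):
        self.dwell_seconds = dwell_seconds
        self.current_target = None
        self.dwell_start = None
        self.locked_target = None

    def update(self, hovered_target):
        """Call every frame with the object currently under the pointing ray (or None)."""
        now = time.monotonic()
        if hovered_target != self.current_target:
            self.current_target = hovered_target
            self.dwell_start = now if hovered_target is not None else None
        elif self.dwell_start is not None and now - self.dwell_start >= self.dwell_seconds:
            self.locked_target = hovered_target   # target locked against hand jitter

    def tap(self):
        """Confirm selection of the locked target (e.g., on a tap gesture)."""
        selected, self.locked_target = self.locked_target, None
        return selected
```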
We present a case study on the use of mixed reality (MR) spatial computing in a fully remote classroom. We conducted a 10-week undergraduate class fully online, using a combination of traditional teleconferencing software and MR spatial computing (Magic Leap One headsets) with an avatar-mediated social interaction application (Spatial). The class culminated in a virtual poster session, using Spatial in MR to present project results, and we conducted a preliminary investigation of students' experiences via interviews and questionnaires. Students reported that they had a good experience using MR for the poster session and that they thought it provided advantages over 2D video conferencing. Particular advantages cited were a stronger sense of being in the presence of other students and instructors, an improved ability to tell where others were directing their attention, and a better ability to share 3D project content and collaborate.
A common way to perform data entry in virtual reality is still to use virtual laser pointers to select characters from a flat 2D keyboard placed in three-dimensional space. In this demo, we present a data input method that takes advantage of 3D space by interacting with a keyboard whose keys are arranged in three dimensions. Each hand is covered by a hemisphere of keys based on the QWERTY layout, allowing users to type by moving their hands in a motion similar to punching. Although the goal is to achieve a gesture more akin to tapping, current controllers and hand tracking technology do not allow such high fidelity. Thus, the presented interaction using VR controllers is closer to punching.
We present a user study exploring spatial scale perception for design tasks by simulating different levels of eye height (EH) and inter-pupillary distance (IPD) in a virtual environment. The study examined spatial scale perception at two levels, corresponding to two-year-old children and adults, in a chair scale estimation task. The aim was to provide appropriate perspectives that enable suitable estimation of virtual object scale for the target user group during the design process. We found that the disparity between the perspective taken and the target user group had a significant impact on the resulting scale of the chairs. Our key contribution is providing evidence that experiencing different spatial scale perceptions in VR has the potential to improve the designer's understanding of the end-user's perspective during the design process.
Surface gestures have been the dominant form of interaction for mobile devices with touchscreens. From our survey of current consumer mobile augmented reality (AR) applications, we found that these applications have also adopted the surface gesture metaphor as their interaction technique. However, we believe that designers should be able to utilize the affordance of the three-dimensional interaction space that AR provides. We compared surface and motion gestures as two interaction techniques in a Pokémon GO-like mobile AR game that we developed. Ten participants experienced both techniques to capture three sizes of Pokémon. The comparison examined the two types of gestures in terms of accuracy, game experience, and subjective ratings of goodness, ease of use, and engagement. We found that the motion gesture provided better engagement and game experience, while the surface gesture was more accurate and easier to use.
Virtual reality (VR) offers a boundless space for users to create, express, and explore, free from the limitations of the physical world. Teleportation is a locomotion technique that overcomes spatial constraints in a virtual environment and is a common approach for travel in VR applications. However, in a multi-user virtual environment, teleportation causes spatial discontinuity in a user's location, which may cause confusion and make it difficult to track a collaborator who keeps disappearing and reappearing around the environment. To reduce the impact of this issue, we identify requirements for designing substituted visualizations (SVs) and present four SVs of the collaborator during teleportation: hover, jump, fade, and portal.
Immersive reality technologies have been widely utilized in the area of cultural heritage, also known as virtual heritage. We present a tangible virtual reality (VR) interaction demo that allows users to walk freely in the physical space while engaging with digital and tangible objects in a “learning area”. The space setup includes stations that are used symbiotically in the virtual and physical environments; this setup maintains consistency throughout the experience. With this method, we enhance the immersive learning experience by mapping the large virtual space into a smaller physical place with a seamless transition.
In this paper, we propose an observation method for easily distinguishing the temporal changes of three-dimensional (3D) motions, along with a temporal manipulation interface. Conventional motion observation methods have several limitations when observing 3D motion data. The Direct Manipulation (DM) interface is suitable for observing the temporal features of videos; it is also suitable for daily use because it does not require learning any special operations. Our aim is to introduce DM into 3D motion observation without losing these advantages, by mapping temporal changes onto a specific vector in real 3D space.
Spatial interactions have great potential in ubiquitous environments. Physical objects, endowed with interconnected sensors, cooperate in a transparent manner to help users in their daily tasks. In our context, we qualify an interaction as spatial if it results from considering spatial attributes (location, orientation, speed…) of the user's body or of a given object used by the user. According to our literature review, despite their benefits (simplicity, concision, naturalness…), spatial interactions are not as widespread as other interaction models such as graphical or tactile ones. We believe this is due to the lack of software tools and frameworks that make the design and development of spatial interactions easy and fast. In this paper, we propose a spatial interaction modeling language named SUIL (Spatial User Interaction Language), which represents a first step towards the development of such tools.