We investigate the role of prediction in biological movement perception by comparing different representations of human movement in a virtual reality (VR) and an online experiment. Predicting movement enables quick and appropriate action by both humans and artificial agents in many situations, e.g., when objects must be intercepted. We use different predictive movement primitive (MP) models to probe the visual system for the prediction mechanism it employs. We hypothesize that MP models, originally devised to address the degrees-of-freedom (DOF) problem in motor production, might be used for perception as well.
In our study we consider object-passing movements. Our paradigm is a predictive task in which participants must discriminate movement continuations generated by MP models from the ground truth of the natural continuation. The experiment was first conducted in VR and later continued as an online experiment. We found that results transfer from the controlled and immersive VR setting, with movements rendered as realistic avatars, to a simple and COVID-19-safe online setting, with movements rendered as stick figures. In the online setting we further investigated the effect of different occlusion timings. We found that contact events during the movement might provide segmentation points that render the lead-in movement independent of the continuation and thereby make perceptual predictions much harder for subjects. We compare the different MP models by their capability to produce perceptually believable movement continuations and by their usefulness for predicting this perceptual naturalness.
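To illustrate the general mechanism being probed, the following is a minimal sketch of a basis-function movement primitive fitted to an observed lead-in trajectory and evaluated beyond the occlusion point to produce a continuation. The concrete MP models compared in the study (and any learned priors over their basis weights) are not specified here; all parameter values and the example trajectory are illustrative only.

```python
import numpy as np

def rbf_features(phase, n_basis=10, width=0.02):
    """Gaussian radial basis functions evaluated at the given phase values."""
    centers = np.linspace(0.0, 1.0, n_basis)
    return np.exp(-(phase[:, None] - centers[None, :]) ** 2 / (2.0 * width))

def fit_mp_weights(lead_in, phase_end=0.6, n_basis=10):
    """Fit one weight vector per DOF to the observed lead-in (T x DOF array)
    by ridge-regularised least squares over the basis features."""
    phase = np.linspace(0.0, phase_end, len(lead_in))
    Phi = rbf_features(phase, n_basis)
    return np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(n_basis), Phi.T @ lead_in)

def continue_movement(weights, phase_start=0.6, n_steps=40, n_basis=10):
    """Evaluate the fitted primitive beyond the occlusion point."""
    phase = np.linspace(phase_start, 1.0, n_steps)
    return rbf_features(phase, n_basis) @ weights

# Hypothetical usage: 60 observed frames of a 3-DOF wrist trajectory.
lead_in = np.cumsum(0.01 * np.random.randn(60, 3), axis=0)
continuation = continue_movement(fit_mp_weights(lead_in))
```

Note that in this simplified sketch the late-phase basis weights are only weakly constrained by the lead-in; richer MP formulations condition a learned prior on the observation instead.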
Our research might provide useful insights for applications in computer animation by showing how movements can be continued without violating the expectations of the user. Our results also contribute towards an efficient method of animating avatars by combining simple movements into complex movement sequences.
The perception of shapes and images is an important area that has been explored in previous research. While the interestingness of images has been studied, to the best of our knowledge there is no previous work on the interestingness of 3D shapes. In this paper, we study this novel problem. We collect data on how humans perceive the interestingness of 3D shapes and analyze the data to gain insights into what makes a shape interesting. We show that a function can be learned to automatically predict an interestingness score for a new shape, and we demonstrate some interestingness-guided 3D shape applications.
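The abstract does not specify the shape descriptors or the learner used; the following sketch only illustrates the prediction step, with placeholder features and an off-the-shelf regressor standing in for the learned interestingness function.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical shape descriptors (e.g., curvature histograms, symmetry measures)
# paired with mean human interestingness ratings collected for each shape.
rng = np.random.default_rng(0)
X_train = rng.random((200, 32))   # 200 shapes, 32-D descriptor each
y_train = rng.random(200)         # crowd-sourced interestingness in [0, 1]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Score a previously unseen shape from its descriptor.
new_shape_descriptor = rng.random((1, 32))
predicted_interestingness = model.predict(new_shape_descriptor)[0]
```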
In this paper we present evidence that human perception of grasp might depend most on the information retrieved during the inward latch rather than during the release of objects. This research is motivated by a number of haptic simulations and devices and is grounded in perception science. We ran a user study (n=12) with two devices: one capable of delivering compliant simulations for both grip and release (CLAW), i.e., a symmetric device; the other capable of delivering only adaptive grip simulations (CapstanCrunch), i.e., an asymmetric device. We found that both performed similarly well in terms of realism scores in a grasping task with objects of different stiffness. This similar performance occurred even though CapstanCrunch's release was delivered by a constant spring, independently of the compliance of the object. Our results provide preliminary evidence that, when simulating haptic grasp, the release might be less important, and we propose a new theory of asymmetry of grasp in haptic perception.
Ultrasonic mid-air haptic feedback enables the tactile exploration of virtual objects in digital environments. However, an object's shape and texture are perceived multimodally, commencing before tactile contact is made. Visual cues, such as the spatial distribution of surface elements, provide a critical first step in forming an expectation of how a texture should feel. When rendering surface texture virtually, its verisimilitude depends on whether these visually inferred prior expectations are met during tactile exploration. To that end, our work proposes a method in which the visual perception of roughness is integrated into the rendering algorithm of mid-air haptic texture feedback. We develop a machine learning model trained on crowd-sourced visual roughness ratings of texture images from the Penn Haptic Texture Toolkit (HaTT). We establish tactile roughness ratings for different mid-air haptic stimuli and match these ratings to our model's output, creating an end-to-end automated visuo-haptic rendering algorithm. We validate our approach by conducting a user study examining the utility of the resulting mid-air haptic feedback. This work can be used to automatically create tactile virtual surfaces where the visual perception of texture roughness guides the design of mid-air haptic feedback.
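The matching step described above can be pictured as a nearest-neighbour lookup from a visually predicted roughness value into a library of rated haptic stimuli. The sketch below assumes such a library; the stimulus parameters, rating scale, and selection rule are illustrative and not taken from the paper.

```python
# Hypothetical library of mid-air haptic stimuli with tactile roughness
# ratings established in a prior rating study (parameters are illustrative).
haptic_stimuli = [
    {"draw_speed_m_s": 8.0, "pattern_radius_mm": 10, "tactile_roughness": 0.2},
    {"draw_speed_m_s": 5.0, "pattern_radius_mm": 20, "tactile_roughness": 0.5},
    {"draw_speed_m_s": 2.0, "pattern_radius_mm": 30, "tactile_roughness": 0.8},
]

def select_haptic_stimulus(visual_roughness):
    """Pick the stimulus whose rated tactile roughness best matches the
    visually predicted roughness of the texture image."""
    return min(haptic_stimuli,
               key=lambda s: abs(s["tactile_roughness"] - visual_roughness))

# visual_roughness would come from the learned image-based model, e.g. 0.55.
stimulus = select_haptic_stimulus(0.55)
```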
Deep neural networks for video-based eye tracking have demonstrated resilience to noisy environments, stray reflections, and low resolution. However, training these networks requires a large number of manually annotated images. To alleviate the cumbersome process of manual labeling, computer graphics rendering is employed to automatically generate a large corpus of annotated eye images under various conditions. In this work, we introduce a synthetic eye image generation platform that improves upon previous work by adding features such as an active deformable iris, an aspherical cornea, retinal retro-reflection, gaze-coordinated eyelid deformations, and blinks. To demonstrate the utility of our platform, we render images reflecting the gaze distributions inherent in two publicly available datasets, NVGaze and OpenEDS. We also report on the performance of two semantic segmentation architectures (SegNet and RITnet) trained on rendered images and tested on the original datasets.
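The abstract does not state which evaluation metric is reported for the train-on-synthetic, test-on-real comparison; per-class intersection-over-union is a common choice for eye-part segmentation, so a minimal sketch is shown under that assumption.

```python
import numpy as np

def mean_iou(pred, target, n_classes=4):
    """Mean intersection-over-union over eye-part classes
    (e.g., background, sclera, iris, pupil)."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# Hypothetical usage with a predicted and ground-truth label map.
pred = np.random.randint(0, 4, (400, 640))
gt = np.random.randint(0, 4, (400, 640))
print(mean_iou(pred, gt))
```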
In theory, visual saliency in a graph should be used to draw attention to its most important component(s). Thus, saliency is commonly viewed both as a basis for predicting where graph readers are likely to look and as a core design technique for emphasizing what a reader is intended to see among competing elements in a given chart or plot. We briefly review models, metrics, and applicable theories as they pertain to graphs. We then introduce new saliency models based on perceptual and cognitive theories that, to our knowledge, have not previously been applied to models for viewing statistical graphics. The resulting frameworks can be broadly classified as bottom-up perceptual models or top-down cognitive models. We report the results of evaluating these new theory-informed approaches on gaze data collected for statistical graphs and for more general information visualizations. Interestingly, the new models fare no better than previous ones. We review the experience, noting why we expected these hypotheses to be effective, and discuss how and why their performance did not match our expectations. We suggest directions for future research that may help to clarify some of the issues raised.
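For readers unfamiliar with the bottom-up class of models mentioned above, the sketch below shows a toy center-surround contrast map, a drastic simplification of classic perceptual saliency models; it is not one of the models evaluated in the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def bottom_up_saliency(luminance, center_sigma=2.0, surround_sigma=8.0):
    """Toy bottom-up saliency map: absolute center-surround luminance
    contrast, normalised to [0, 1]. Full perceptual models add colour,
    orientation, and multi-scale channels."""
    img = luminance.astype(float)
    center = gaussian_filter(img, center_sigma)
    surround = gaussian_filter(img, surround_sigma)
    saliency = np.abs(center - surround)
    return saliency / (saliency.max() + 1e-9)
```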
Many HMD applications are improved when people perceive and act in the virtual world similarly to how they would in an equivalent real environment. However, multiple studies show that people underestimate egocentric distances in HMDs while being reasonably accurate in the real world. Some studies also show that brightness in the periphery may be one factor influencing these distance judgments. We build upon this previous work and examine whether the brightness of the entire screen affects distance judgments. Using direct blind walking, we compared a bright environment to an artificially darkened one in a between-subjects experiment. The results indicate that overall brightness has no significant effect on egocentric distance judgments. We discuss the implications of our findings in the context of previous work. Our findings may help designers of HMD software featuring dark environments (e.g., some entertainment applications).
Commodity-level virtual reality equipment is now available to all ages. To better understand how cognitive development affects people's spatial memory in virtual reality, we assess how adults (20-29 years old) and teenagers (14-17 years old) represent their spatial memory of objects in an immersive virtual environment (IVE) where height is encoded. Despite virtual reality being a favorable conduit for the study of egocentric spatial memory, prior studies have predominantly looked at objects placed at similar heights. Within a stairwell environment, participants learned the positions of nine target objects. In one condition, all objects were placed near eye height. In another, they were placed at varying heights. Our results indicate that participants' errors and latencies were similar in both environments, and across age groups. Our results have implications for the development of IVEs and the expansion of immersive technology to a more diverse, younger audience.
Mixed reality is increasingly being used for training, especially for tasks that are difficult for humans to perform, such as far distance perception. Although there has been much prior work on the perception of distance in mixed reality, the majority of it has focused on distances up to about 30 meters. The little work that has investigated distance perception beyond 30 meters has been mixed in terms of whether people tend to over- or underestimate distances in this range. In the current study, we investigated the perception of distances ranging from 25 to 500 meters in a fully mediated outdoor environment using the ZED Mini and HTC Vive Pro. To our knowledge, no work has examined distance perception over this range in mixed reality. Participants in the study verbally estimated the distances to a variety of objects. Distances were overestimated from 25 to 200 meters but underestimated from 300 to 500 meters. Overall, far distance perception in mixed reality may be biased in different directions depending on the range of distances, suggesting a need for future research in this area to support training of far distance perception.
A virtual reality (VR) system's interface determines its corresponding interaction fidelity. While there exist several naturally mapped interfaces, these have not been directly compared, in terms of presence, engagement, perceived performance, and real performance, to a more commonly used interface for interacting with virtual reality: the gamepad. Because gamepads have evolved over several decades and have widespread adoption, it is pragmatic to ask whether it is viable to import them into the design of VR experiences. To study this, we developed a first-person shooter (FPS) virtual environment and contrasted the VR experience using the Microsoft Xbox 360 controller against the HTC VIVE wand. We assessed the effect of the input device on self-reported presence, engagement, and performance, as well as on real performance in terms of accuracy and speed, two metrics relevant for our particular environment. A within-subjects FPS shooting task under accuracy and time constraints confirmed our hypotheses in favor of the VIVE wand: presence, engagement, perceived performance, and real performance were all significantly higher than in the Xbox controller condition. These results are contextualized with participants' comments in a post-experiment debrief, in which (paradoxically) several participants reported preferring the gamepad. We discuss the implications of our findings and how what we call genre fidelity can impact the design of VR experiences.
In sound localization experiments, currently used metrics for front-back confusion (FBC) analysis weight the occurring FBCs equally, regardless of their deviation from the cone of confusion. To overcome this limitation, we introduce the FBC Score. A sound localization experiment in the horizontal plane with 12 bilaterally implanted cochlear implant (CI) users and 12 normal-hearing subjects was performed to validate the method with real data. The overall FBC Rate of the CI users was twice as high as the FBC Score. For the control group, the FBC Rate was four times higher than the FBC Score. The results indicate that the FBC Rate is inflated by FBCs that show a considerable deviation from the corresponding value on the cone of confusion.
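To make the contrast between an unweighted rate and a weighted score concrete, the sketch below counts front-back confusions in the horizontal plane and weights each one by its closeness to the target's front-back mirror azimuth. The published FBC Score is not defined in the abstract; the detection rule and Gaussian weighting here are assumptions for illustration only.

```python
import numpy as np

def angular_diff(a, b):
    """Smallest absolute difference between two azimuths in degrees."""
    return np.abs((a - b + 180.0) % 360.0 - 180.0)

def fbc_rate_and_score(target_az, response_az, sigma=20.0):
    """Illustrative unweighted FBC Rate vs. a weighted score.
    Azimuths in degrees, 0 = front, 90 = right. A response counts as an
    FBC when it falls in the opposite front/back hemifield from the target;
    the weighted score assigns weight 1 on the cone of confusion and decays
    with angular deviation from the mirrored azimuth (assumed weighting)."""
    target_az = np.asarray(target_az, float)
    response_az = np.asarray(response_az, float)
    is_fbc = (np.cos(np.radians(target_az)) > 0) != (np.cos(np.radians(response_az)) > 0)
    mirrored = 180.0 - target_az            # front-back mirrored azimuth
    weights = np.exp(-(angular_diff(response_az, mirrored) / sigma) ** 2)
    return is_fbc.mean(), (is_fbc * weights).mean()

# Example: target at 30 deg (front-right), response at 140 deg (back-right).
print(fbc_rate_and_score([30.0], [140.0]))
```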
We detail the results of two experiments that empirically evaluate the efficiency of spatial selection with popular pointing-based and controller-based direct 3D interaction within participants' maximum arm's reach, or personal space, in an immersive virtual environment. In the first experiment, targets were presented in a circular configuration of 13 targets in accordance with the ISO 9241-9 Fitts' Law standard, with target distances of 5, 9, or 18 cm and target widths of 0.65, 0.85, or 1.05 cm, and with each circular configuration presented at a random distance of 60%, 75%, or 90% of the participant's maximum arm length. The targets were centered at eye level, and in a within-subjects protocol participants selected the targets using a pointer at the tip of the controller in the controller-based condition, or at the tip of the index finger of their dominant hand in the pointing-based condition. The second experiment presented each target with similar width and distance configurations as the first, but each target also varied in depth from one trial to the next, between 60%, 75%, or 90% of maximum arm reach. The selections were performed in an HTC Vive head-mounted display VR world, using the popular Leap Motion gesture tracking device or the Vive controller. The results of both experiments taken together revealed that, with commodity 3D input devices, participants were more efficient at spatial selection in the popular controller-based condition than in the pointing-based condition. Implications for VR developers and consumers are discussed in the latter sections.
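For reference, ISO 9241-9 analyses typically characterize each distance/width condition by a Fitts' index of difficulty and summarize efficiency as throughput. The abstract does not state which efficiency measure was computed; the sketch below shows the standard Shannon formulation with one of the conditions listed above and a hypothetical movement time.

```python
import numpy as np

def index_of_difficulty(distance_cm, width_cm):
    """Shannon formulation of Fitts' index of difficulty (bits)."""
    return np.log2(distance_cm / width_cm + 1.0)

def throughput(distance_cm, width_cm, movement_time_s):
    """Throughput in bits/s for one distance/width condition."""
    return index_of_difficulty(distance_cm, width_cm) / movement_time_s

# Hardest condition of the first experiment (18 cm distance, 0.65 cm width)
# selected in a hypothetical 1.2 s: ID of about 4.8 bits, ~4 bits/s throughput.
print(throughput(18.0, 0.65, 1.2))
```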
With the rise of immersive virtual reality and telepresence, it is important to understand the factors that contribute to creating an optimal user experience. In particular, there are divergent recommendations for setting camera heights in virtual contexts that facilitate telepresence. Therefore, we conducted a 2x2 mixed-design experiment with 93 college students, asking them to select their preferred camera height while varying camera placement (overhead, chest) and the presence of human avatars (present, not present). We found that while camera placement did not have a significant effect on preferred camera height, the presence of avatars (which increased height preference) and gender (women preferred lower heights) had significant effects. Our results provide evidence that factors within a virtual environment and individual differences influence users' preferences for camera height. Thus, systems designed for immersive virtual reality and telepresence should customize camera height based on these factors.
Perceptual studies on pedestrians and bicyclists have shown that highlighting the most informative features of the road user increases visibility in real-life traffic scenarios. As technological limitations on automotive lamp configurations decrease, prototypes display novel layouts driven by aesthetic design. Could these advances also be used to aid perceptual processes? We tested this question in a computer-based visual search experiment. 3D-rendered images of the frontal view of a motorbike and of a car with regular (REG) or contour-enhancing (CON) headlamp configurations were compiled into scenes simulating the crowding effect in night-time traffic. Perceptual performance in finding a target motorbike (present in 2/3 of all trials) among 28 cars of either condition was quantified through discriminability, reaction time, and eye movement measures. All measures showed a significant perceptual advantage in CON versus REG trials. The results suggest that facilitating object perception in traffic by highlighting relevant features of vehicles could increase visual performance in both speed and discriminability. Furthermore, the associated gain in eye movement efficiency may decrease fatigue. Similar methods, with varying trade-off ratios between fidelity and low-level control, could be applied to design optimal lamp configurations for traffic safety.
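Since the target was present on only 2/3 of trials, discriminability in such a search task is commonly computed with signal detection theory; the abstract does not name the exact measure, so the sketch below simply shows the standard d' computation with hypothetical response counts.

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection discriminability (d') for a yes/no search task;
    the log-linear correction avoids infinite z-scores at rates of 0 or 1."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical counts for one observer in the CON condition.
print(d_prime(hits=90, misses=10, false_alarms=8, correct_rejections=42))
```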
While digital humans are key aspects of the rapidly evolving areas of virtual reality, gaming, and online communication, many applications would benefit from using personalized, stylized digital representations of users, as these have been shown to greatly increase immersion, presence, and emotional response. In particular, depending on the target application, one may want to look like a dwarf or an elf in a heroic fantasy world, or like an alien on another planet, in accordance with the style of the narrative. While creating such virtual replicas requires stylizing the user's features onto the virtual character, no formal study has been conducted to assess the ability to recognize stylized characters. In this paper, we present a perceptual study investigating the effect of the degree of stylization on the ability to recognize an actor, and the subjective acceptability of the stylizations. Results show that recognition rates decrease as the degree of stylization increases, while acceptability of the stylization increases. These results provide recommendations for achieving good compromises between stylization and recognition, and pave the way for new stylization methods offering a trade-off between stylization and recognition of the actor.
Redirected walking (RDW) allows users to explore a virtual environment larger than the physical tracking space through real walking. When RDW is applied within certain thresholds, users do not notice it and remain immersed in the virtual environment. Many factors affect these thresholds, such as walking speed, gender, and field of view. However, redirected walking thresholds (RDTs) have never been studied when an avatar is present, and it is not known how the sense of embodiment over such an avatar affects users' RDTs.
In this paper, we present an experiment investigating the effect of the sense of embodiment (SoE) of an avatar on curvature RDTs. The SoE was manipulated by changing perspective as well as sensorimotor congruency, and was subjectively assessed using an SoE questionnaire. Results showed that perspective and movement congruency have significant effects on the SoE, and that agency, a central component of embodiment, has a negative effect on curvature RDTs. Moreover, existing results on gender effects on RDTs were reconfirmed.
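As background on what a curvature threshold controls, curvature redirection injects a small yaw rotation of the virtual scene proportional to the distance walked, so that a virtually straight path corresponds to a physical arc of some radius; the threshold is the smallest radius users do not notice. The sketch below shows that per-step relationship; the radius and step length are illustrative, not values from this study.

```python
import numpy as np

def curvature_rotation(step_length_m, radius_m):
    """Yaw rotation (radians) injected per walked step for a curvature
    gain corresponding to a physical arc of the given radius."""
    return step_length_m / radius_m

# Illustrative values: a 13 m arc radius and a 0.7 m step inject roughly
# 3 degrees of scene rotation per step.
print(np.degrees(curvature_rotation(0.7, 13.0)))
```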
The demand for cartoon animation content is on the rise, driven largely by social media, video conferencing, and virtual reality. New content creation tools and the availability of open-source game engines allow characters to be animated even by novice creators (e.g., Loomie and Pinscreen for creating characters, Hyprface and MotionX for facial motion capture, etc.). However, lighting animated characters is a skilled art form, and there is a lack of guidelines for lighting characters that express emotion in an appealing manner.
Recent perceptual research has attempted to provide user guidelines through rigorous and labor-intensive rating-scale experiments exploring many lighting parameters [Wisessing et al. 2016, 2020]. We propose an alternative approach based on the method of adjustment, similar to common lighting tools in 3D content creation software, but with the power of modern real-time graphics. Our framework allows users to interactively adjust lighting parameters and instantly assess the results on the animated characters, instead of having to wait for a render to complete.
Synthesizing motions that look realistic and diverse is a challenging task in animation. Therefore, a few generic walking motions are typically used when creating crowds of walking virtual characters, leading to a lack of variation, as motions are not necessarily adapted to each virtual character's characteristics. While some attempts have been made to create variations, it appears necessary to identify the relevant parameters that influence users' perception of such variations in order to keep a good trade-off between computational cost and realism. In this paper, we therefore investigate the ability of viewers to identify an invariant parameter of human walking called the Walk Ratio (the ratio of step length to step frequency), which has been shown to be specific to each individual and constant across speeds, but which has never been used to drive animations of virtual characters. To this end, we captured 2 female and 2 male actors walking at different freely chosen speeds, as well as at different combinations of step frequency and step length. We then performed a perceptual study to identify the Walk Ratio that was perceived as most natural for each actor when animating a virtual character, and compared it to the Walk Ratio freely chosen by the actor during the motion capture session. We found that the Walk Ratios chosen by observers were in the range of Walk Ratios reported in the literature, and that participants perceived differences between the Walk Ratios of animated male and female characters, as evidenced in the biomechanical literature. Our results provide new considerations for driving the animation of walking virtual characters using the Walk Ratio as a parameter, and might provide animators with novel means of controlling the walking speed of characters through simple parameters while retaining the naturalness of the locomotion.
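Because walking speed equals step length times step frequency, fixing the Walk Ratio fully determines both step parameters for any target speed. The sketch below shows that relationship; the Walk Ratio value and speed used in the example are illustrative, not values reported in the study.

```python
import numpy as np

def step_parameters(speed_m_s, walk_ratio):
    """Given speed v = step_length * step_frequency and the Walk Ratio
    WR = step_length / step_frequency, recover
    step_length = sqrt(WR * v) and step_frequency = sqrt(v / WR)."""
    step_length = np.sqrt(walk_ratio * speed_m_s)
    step_frequency = np.sqrt(speed_m_s / walk_ratio)
    return step_length, step_frequency

# Illustrative Walk Ratio of 0.42 m/(steps/s) at 1.4 m/s:
# roughly 0.77 m steps at about 1.83 steps per second.
print(step_parameters(1.4, 0.42))
```

This is why the Walk Ratio is attractive as an animation control: an animator can vary speed with a single parameter while step length and cadence co-vary in a biomechanically plausible way.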