Simulating the behavior of crowds of artificial entities that have humanoid embodiments has become an important element in computer graphics and special effects. However, many important questions remain in relation to the perception of social behavior and the expression of emotions in virtual crowds. Specifically, few studies have considered the role of background context in the perception of the full-body emotion expressed by sub-constituents of the crowd, i.e., individuals and small groups. In this paper, we present the results of perceptual studies in which animated scenes of expressive virtual crowd behavior were rated in terms of their valence by participants. The behaviors of a task-irrelevant crowd in the background were varied among neutral, happy, and sad in order to investigate effects on the perception of emotion from task-relevant individuals in the foreground. Effects of the task-irrelevant background on ratings of foreground characters were found, including cases in which the background accompanied negatively valenced stimuli.
Animated characters appear in applications for entertainment, education, and therapy. When these characters display appropriate emotions for their context, they can be particularly effective. Characters can display emotions by accurately mimicking the facial expressions and vocal cues that people display or by damping or exaggerating the emotionality of the expressions. In this work, we explored which of these strategies would be most effective for animated characters. We investigated the effects of altering the auditory and facial levels of expressiveness on emotion recognition accuracy and ratings of perceived emotional intensity and naturalness. We ran an experiment with emotion (angry, happy, sad), auditory emotion level (low, high), and facial motion magnitude (damped, unaltered, exaggerated) as within-subjects factors. Participants evaluated animations of a character whose facial motion matched that of an actress we tracked using an active appearance model. This method of tracking and animation can capture subtle facial motions in real-time, a necessity for many interactive animated characters. We manipulated auditory emotion level by asking the actress to speak sentences at varying levels, and we manipulated facial motion magnitude by exaggerating and damping the actress's spatial motion. We found that the magnitude of auditory expressiveness was positively related to emotion recognition accuracy and ratings of emotional intensity. The magnitude of facial motion was positively related to ratings of emotional intensity but negatively related to ratings of naturalness.
Abstract motion textures are widely applied in visual design and immersive environments such as games to imbue the environment or presentation with affect. While visual designers and artists carefully manipulate visual elements such as colour, form, and motion to evoke affect, understanding which aspects of motion contribute to this still remains a matter of designer craft rather than validated principle. We report an empirical study of how simple features of motion in 3D textures, or motionscapes, contribute to the elicitation of affect. Twelve university students were recruited to evaluate a series of 3D motionscapes. Results showed that basic motion properties, including speed, direction, path curvature, and shape, had a significant influence on affective impressions such as valence, comfort, urgency, and intensity, suggesting further directions for applications and explorations in this design space.
Designating targets, such as elementary primitives (e.g., vertices, edges, and faces) or complex objects (e.g., 3D objects and structures), to a partner in Collaborative Virtual Environments is a real challenge in many applications. In fact, the communication constraints in such environments limit the understanding of the partner's actions, which leads to wrong selections and thus to conflicting actions. Beyond these limitations, applications providing complex data to manipulate, such as molecular environments, introduce additional constraints that limit the perception of and access to the partner's working space. This paper proposes a remote designation procedure linked with a haptic attraction model for molecular deformation tasks: the partner is physically guided to the designated atom. Experimental results showed a significant improvement in performance and efficiency for the different steps of collaborative tasks (i.e., designation/selection of atoms and deformation of structures). Moreover, this haptic guidance method enables more accurate selections of atoms by the partner.
We report on an experiment testing gaze-contingent depth-of-field (DOF) for the reduction of visual discomfort when viewing stereoscopic displays. Subjective results are compelling, showing, we believe for the first time, that gaze-contingent depth-of-field significantly reduces visual discomfort. When individual stereoacuity is taken into account, objective measurements of gaze vergence corroborate previous reports, showing significant bias toward the zero depth plane, where error is smallest. As with earlier similar attempts, participants expressed a dislike of gaze-contingent DOF. Although not statistically significant, this dislike is likely attributable to the eye tracker's spatial inaccuracy as well as the DOF simulation's noticeable temporal lag.
A pervasive artifact that occurs when visualizing 3D content is the so-called "cardboarding" effect, where objects appear flat due to depth compression, with relatively little research conducted to perceptually quantify its effects. Our aim is to shed light on the subjective preferences and practical perceptual limits of stereo vision with respect to cardboarding. We present three experiments that explore the consequences of displaying simple scenes with reduced depths using both subjective ratings and adjustments and objective sensitivity metrics. Our results suggest that compressing depth to 80% or above is likely to be acceptable, whereas sensitivity to the cardboarding artifact below 30% is very high. These values could be used in practice as guidelines for commonplace depth mapping operations in 3D production pipelines.
Despite extensive usability research on virtual reality (VR) within academia and the increasing use of VR within industry, there is little research evaluating the usability and benefits of VR in applied settings. This is problematic for individuals desiring design principles or best practices for incorporating VR into their business. Furthermore, the literature that does exist often does not account for the characteristics of intended users. This shortage is problematic because individual differences have been shown to have a significant impact on performance in spatial tasks. The research presented here is an evaluation of a VR system in use at The Boeing Company, with 28 employees performing navigation and wayfinding tasks across two shading conditions (flat/smooth) and two display conditions (desktop/immersive). Performance was measured based on speed and accuracy. Individual difference factors were used as covariates. Results showed that women and those with high spatial orientation ability performed faster in smooth shading conditions, while flat shading benefited those with low spatial ability, particularly for the navigation task. Unexpectedly, immersive presentation did not significantly improve performance. These results demonstrate the impact of individual differences on spatial performance and help determine appropriate tasks, display parameters, and suitable users for the VR system.
The decoupling of eye vergence and accommodation (V/A) has been found to negatively impact depth interpretation, visual comfort, and fatigue. In this paper, we explore the hypothesis that the placement of visual cues within a scene can assist a viewer in maintaining the V/A decoupling. This effect is demonstrated through the use of a continuous depth plane that connects spatially distinct scene elements. Our experimental design enables us to make the following three contributions: (1) We show that a continuous depth element can improve the time it takes to transition visual attention in depth. (2) We observe that the subjective assessment of fatigue emerges before we detect a quantitative decline in performance. (3) We argue that stereoscopic 3D content creators may draw on visual psychophysics when making decisions about scene composition, framing, and montage.
An ongoing challenge with head-mounted eye-trackers is how to analyze the data from multiple individuals looking at the same scene. Our work focuses on static scenes. Previous approaches involve capturing a high-resolution panorama of the scene and then mapping the fixations from all viewers onto this panorama. However, such approaches are limited, as they typically restrict all viewers to observing the scene from the same stationary vantage point. We present a system which incorporates user-perspective gaze data with a 3D reconstruction of the scene. The system enables the visualization of gaze data from multiple viewers on a single 3D model of the scene instead of multiple 2D panoramas. The subjects are free to move about the scene as they see fit, which leads to more natural task performance. Furthermore, since it is not necessary to warp the scene camera video into a flat panorama, our system preserves the relative positions of the objects in the scene during the visualization process. This gives better insight into the viewers' problem-solving and search-task strategies. Our system has high applicability in complex static environments such as crime scenes and in marketing studies in retail stores.
Many people who first see a high dynamic range (HDR) display get the impression that it is a 3D display, even though it does not produce any binocular depth cues. Possible explanations of this effect include contrast-based depth induction and the increased realism due to the high brightness and contrast that makes an HDR display "like looking through a window". In this paper we test both of these hypotheses by comparing the HDR depth illusion to real binocular depth cues using a carefully calibrated HDR stereoscope. We confirm that contrast-based depth induction exists, but it is a vanishingly weak depth cue compared to binocular depth cues. We also demonstrate that for some observers, the increased contrast of HDR displays indeed increases the realism. However, it is highly observer-dependent whether reduced, physically correct, or exaggerated contrast is perceived as most realistic, even in the presence of the real-world reference scene. Similarly, observers differ in whether reduced, physically correct, or exaggerated stereo 3D is perceived as more realistic. To accommodate most observers' perception of binocular depth and realism, display technologies must offer both HDR contrast and personalized stereo.
When synthesizing images of fine objects like hair, we usually adopt sub-pixel drawing techniques to improve image quality. For this paper, we analyzed the statistical features of images of thin lines and found that the distributions of the pixel values tended to be Gaussian. A psychophysical experiment showed that images of stripes with appropriate Gaussian noise added are perceived to be finer than the original ones. We applied this perceptual property to hair rendering and developed a fast fine-hair drawing algorithm.
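As a rough illustration of the perceptual finding above (not the authors' rendering algorithm), the sketch below adds zero-mean Gaussian noise to a synthetic greyscale stripe image; the noise level sigma is a hypothetical parameter chosen for illustration.

```python
import numpy as np

def add_fineness_noise(image, sigma=0.05, rng=None):
    """Perturb a greyscale image in [0, 1] with zero-mean Gaussian noise so
    stripes read as finer, per the perceptual finding. `sigma` is illustrative,
    not a value from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0.0, 1.0)

# Example: a 256x256 vertical stripe pattern with 40 cycles across the image.
x = np.linspace(0, 1, 256)
stripes = np.tile(0.5 + 0.5 * np.sin(2 * np.pi * 40 * x), (256, 1))
finer_looking = add_fineness_noise(stripes, sigma=0.05)
```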
Recent advances in technology and the opportunity to obtain commodity-level components have made the development and use of three-dimensional virtual environments more available than ever before. How well such components work to generate realistic virtual environments, particularly environments suitable for perception and action studies, is an open question. In this paper we compare two virtual reality systems in a variety of tasks: distance estimation, virtual object interaction, a complex search task, and a simple viewing experiment. The virtual reality systems center around two different head-mounted displays, a low-cost Oculus Rift and a high-cost Nvis SX60, which differ in resolution, field-of-view, and inertial properties, among other factors. We measure outcomes of the individual tasks as well as assessing simulator sickness and presence. We find that the low-cost system consistently outperforms the high-cost system, but there is some qualitative evidence that some people are more subject to simulator sickness in the low-cost system.
Distance perception is a crucial component of many virtual reality applications, and numerous studies have shown that egocentric distances are judged to be compressed in head-mounted display (HMD) systems. Geometric minification, a technique in which the graphics are rendered with a field of view that is larger than the HMD's field of view, is one known method of eliminating this distance compression [Kuhl et al. 2009; Zhang et al. 2012]. This study uses direct blind walking to determine how minification might impact distance judgments in the Oculus Rift HMD, which has a significantly larger FOV than the HMDs used in previous minification studies. Our results show that people were able to make accurate distance judgments in a calibrated condition and that geometric minification causes people to overestimate distances. Since this study shows that minification can impact wide-FOV displays such as the Oculus, we discuss how it may be necessary to use calibration techniques that are more thorough than those described in this paper.
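For context, geometric minification can be summarized as the ratio of image-plane scales produced by rendering with a wider geometric FOV than the display's physical FOV. The sketch below is a minimal illustration of that relation under a simple pinhole model; the FOV values are hypothetical and not taken from this study.

```python
import math

def minification_gain(rendered_fov_deg, display_fov_deg):
    """Ratio of image-plane scales for a pinhole projection: rendering with a
    geometric FOV wider than the display FOV shrinks (minifies) the imagery."""
    half_r = math.radians(rendered_fov_deg) / 2.0
    half_d = math.radians(display_fov_deg) / 2.0
    return math.tan(half_d) / math.tan(half_r)  # < 1 means minification

# e.g. rendering 110 deg of content into a 100 deg display (hypothetical numbers)
print(minification_gain(110, 100))  # ~0.83
```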
The use of third-person self-avatars has become common in non-immersive virtual environments, due in part to the advantages of seeing a full body while moving around in space. However, little research has investigated how third-person self-avatars can benefit task performance in immersive virtual environments (IVEs). This paper explores a procedure that may enhance user ability to act in an IVE from the point of view of a self-avatar projected in front of the viewer. Such abilities could be crucial in applications such as remote operator control, when a third person view is necessary or is the only view available. We conducted an experiment to assess the role of dynamic visual-motor and haptic feedback with a third-person avatar on a user's ability to point to remembered targets from the location of the avatar. The dynamic condition was compared within-subjects to a static condition in which the user's movements were not tracked with the avatar but the same task was required. Overall, participants were relatively good at pointing to targets from the location of the avatar. However, participants in the dynamic condition showed pointing performance that corresponded more closely to the avatar's location than was the case in the static condition. Subjective reports of bodily self-perception were also included and showed increased self-identification, presence, and agency in the dynamic compared to the static condition.
Relying exclusively on visual information to maintain orientation while traveling in virtual environments is challenging. However, it is currently unclear how much body-based information is required to produce a significant improvement in navigation performance. In our study, participants explored unfamiliar virtual mazes using visual-only and physical rotations. Participants' ability to remain oriented was measured using a novel pointing task. While men consistently benefited from using physical rotations versus visual-only rotations (lower absolute pointing errors, configuration errors, and absolute ego-orientation errors), women did not. We discuss design implications for locomotion interfaces in virtual environments. Our findings also suggest that investigating individual differences may help to resolve apparent conflicts in the literature regarding the potential benefits of physical rotational cues for effective spatial orientation.
Research in visuo-motor coupling has shown that the matching of visual and proprioceptive information is important for calibrating movement. Many state-of-the-art virtual reality (VR) systems, commonly known as immersive virtual environments (IVEs), are created for training users in tasks that require accurate manual dexterity. Unfortunately, these systems can suffer from technical limitations that may force a de-coupling of visual and proprioceptive information due to interference, latency, and tracking error. We present an empirical evaluation of how visually distorted movements affect users' reaches to near-field targets using a closed-loop physical reach task in an IVE. We specifically examined the recalibration of movements when the visually reached distance is scaled differently than the physically reached distance. Subjects were randomly assigned to one of three visual feedback conditions during which they reached to a target while holding a tracked stylus: i) Condition 1 (-20% gain condition), in which the visual stylus appeared at 80% of the distance of the physical stylus; ii) Condition 2 (0% or no gain condition), in which the visual stylus was co-located with the physical stylus; and iii) Condition 3 (+20% gain condition), in which the visual stylus appeared at 120% of the distance of the physical stylus. In all conditions, there is evidence of visuo-motor calibration in that users' accuracy in physically reaching to the target locations improved over trials. During closed-loop physical reach responses, participants generally tended to physically reach farther in Condition 1 and closer in Condition 3 to the perceived location of the targets, as compared to Condition 2, in which participants' physical reach was more accurate to the perceived location of the target.
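A minimal sketch of how a distance gain like the ones described above might be applied to the rendered stylus, assuming the visual stylus is displaced along the line from the viewpoint to the tracked physical position; the positions and choice of origin below are illustrative, not details from the study.

```python
import numpy as np

def apply_distance_gain(physical_pos, origin, gain):
    """Scale the visual stylus position along the line from `origin`
    (e.g. the viewpoint) to the tracked physical position.
    gain = -0.20, 0.0, +0.20 correspond to the -20%, 0% and +20% conditions."""
    offset = np.asarray(physical_pos) - np.asarray(origin)
    return np.asarray(origin) + (1.0 + gain) * offset

eye = np.array([0.0, 1.5, 0.0])       # hypothetical viewpoint (metres)
stylus = np.array([0.0, 1.2, -0.5])   # tracked physical stylus position
visual_stylus = apply_distance_gain(stylus, eye, gain=-0.20)
```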
Delivering appealing virtual characters that convey personality is becoming extremely important in the entertainment industry and beyond. A theory called the 'Uncanny Valley' has been used to describe the phenomenon whereby the appearance of a virtual character can contribute to negative or positive audience reactions to that character [Mori 1970]. Since the style used to render a character strongly changes its appearance, we investigate whether a difference in render style can indirectly influence audience reaction, which we measure based on perception of personality. Based on psychology research, we first scripted original character dialogues in order to convey a range of ten typical personality types. Then, a professional actor was recruited to act out these dialogues while his face and body motion and audio were recorded. The performances were mapped onto a virtual character rendered in two styles that differ in appearance: an appealing cartoon style and an unappealing ill style (Figure 1). In our experiment, participants were asked questions about the character's personality in order for us to test whether the difference in render style causes differences in personality perception. Our results show an indirect effect of render style, whereby the cartoon style was rated as having a more agreeable personality than the ill style. This result has implications for developers interested in creating appealing virtual humans while avoiding the 'Uncanny Valley' phenomenon.
Response lag in digital games is known to negatively affect a player's game experience. Particularly with networked multiplayer games, where lag is typically unavoidable, the impact of delays needs to be well understood so that their effects can be mitigated. In this paper, we investigate two aspects of lag independently: latency (constant delay) and jitter (varying delay). We evaluate how latency and jitter each affect a player's enjoyment, frustration, performance, and experience, as well as the extent to which players can adjust to such delays after a few minutes of gameplay. We focus on a platform game where the player controls a virtual character through a world. We find that delays up to 300 ms do not impact the players' experience as long as they are constant. When jitter was added to a delay of 200 ms, however, the lag was noticed by participants more often, hindered players' ability to improve with practice, increased how often they failed to reach the goal of the game, and reduced the perceived motion quality of the character.
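To make the latency/jitter distinction concrete, the sketch below shows one simple way such delays could be injected into an input pipeline: a constant base latency plus an optional, uniformly sampled varying component. The distribution and magnitudes are illustrative assumptions, not the manipulation used in the study.

```python
import random

def apply_lag(event_time_ms, latency_ms=200.0, jitter_ms=0.0, rng=random):
    """Return the time at which an input event takes effect in the game.
    latency_ms models the constant delay; jitter_ms is the maximum extra,
    uniformly sampled varying delay."""
    extra = rng.uniform(0.0, jitter_ms) if jitter_ms > 0 else 0.0
    return event_time_ms + latency_ms + extra

# Constant 200 ms delay vs. the same delay with up to 50 ms of added jitter.
print(apply_lag(0.0, latency_ms=200.0, jitter_ms=0.0))
print(apply_lag(0.0, latency_ms=200.0, jitter_ms=50.0))
```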
When two eyes view different objects or scenes at the same time, stable binocular single vision gives way to alternations in perception. Called binocular rivalry, this beguiling phenomenon discloses the existence of inhibitory interactions between neural representations associated with the conflicting visual inputs. One strategy for learning details about this neural competition is to ascertain to what extent it is susceptible to top-down influences such as expectations engendered by one's own activity [Maruya et al. 2007]. Thanks to advances in virtual reality (VR) technology, this strategy can be implemented in the laboratory, which we have done. Specifically, we are measuring the extent to which self-generated motion (walking) biases perception during binocular rivalry between two competing visual optic flow fields (one specifying forward locomotion and the other specifying backward locomotion). Our progress to date is reported in this poster.
Modern media recommendation systems work based on the content of videos previously seen. While this may work for the habitual viewer, it is not always appropriate for many users whose tastes change based on their moods. A lot can be gathered from a person's facial expression while they are engaged in an activity; for example, by observing someone as they view a film, we can likely tell whether they enjoyed it. While a content-providing company may not have the resources to employ human observers to watch audiences (or the inclination to do so, since it is not a very appealing idea), an automatic system that does this would be more feasible. Photoplethysmography is a field in which a person's physiological details can be gathered optically without requiring any physical contact with that person. By monitoring the intensity of light on a patch of the viewer's skin, we can accurately estimate their heart rate [Poh et al. 2011]. Our system combines heart rate measurement and facial expressions to quantify how much a user enjoys a video.
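A minimal sketch of the kind of optical heart-rate estimate mentioned above: locate the dominant spectral peak of a skin-intensity trace within a plausible heart-rate band. This is a simplified stand-in; the cited approach (Poh et al.) uses a more elaborate pipeline, and all values here are illustrative.

```python
import numpy as np

def estimate_heart_rate(intensity, fs):
    """Estimate heart rate (beats per minute) from a skin-intensity trace
    sampled at `fs` Hz by locating the dominant spectral peak in a plausible
    heart-rate band (0.75-4 Hz, i.e. 45-240 bpm)."""
    signal = intensity - np.mean(intensity)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum = np.abs(np.fft.rfft(signal))
    band = (freqs >= 0.75) & (freqs <= 4.0)
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_freq

# Synthetic example: a 72 bpm (1.2 Hz) oscillation sampled at 30 Hz for 30 s.
t = np.arange(0, 30, 1 / 30)
trace = 1.0 + 0.01 * np.sin(2 * np.pi * 1.2 * t) + 0.002 * np.random.randn(len(t))
print(estimate_heart_rate(trace, fs=30))  # ~72
```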
In many medical Augmented Reality (AR) applications, doctors want the ability to place a physical object, such as a needle or other medical device, at the depth indicated by a virtual object. Often, this object will be located behind an occluding surface, such as the patient's skin. In this poster we describe efforts to determine how accurately this can be done. In particular, we used a table-mounted AR haploscope to conduct two depth-matching experiments.
Figure 1 shows our AR haploscope, which was originally designed and built by Singh [2013]. We experimented with different methods of calibrating the haploscope, with the goal of being able to place virtual AR target objects in space that can be depth-matched with an accuracy as close as possible to that of physical test targets. We eventually developed a calibration method that uses two laser levels to generate vertical fans of light (Figure 1). We shoot these light fans through the AR haploscope optics, where they bounce off the optical combiners and onto the image generators. We first set the fans parallel, in order to properly model an observer's inter-pupillary distance (IPD). Next, we cant the fans inwards, in order to model different vergence distances.
We validated this calibration with two AR depth-matching experiments. These experiments measured the effect of an occluding surface, and examined near-field reaching space distances of 38 to 48 cm. Experiment I replicated a similar experiment reported by Edwards et al. [2004], and involved 10 observers in a within-subjects design. Figure 2 shows the results. Errors ranged from -5 to +3 mm when the occluder was present, -4 to +2 mm when the occluder was absent, and observers sometimes judged the virtual object to be closer to themselves after the presentation of the occluder. We can model the strong linear effect shown in Figure 2 by considering how the observers' IPD changes as they converge to different distances. Experiment II replicated Experiment I with three experienced psychophysical observers and additional replications. The results showed significant individual differences between the observers, on the order of 8 mm, and the individual results did not follow the averaged results from Experiment I.
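The convergence geometry underlying both the canted light fans and the IPD-based model of the linear effect is the standard vergence relation for an interpupillary distance IPD and a fixation distance d:

$$\theta(d) = 2\arctan\!\left(\frac{\mathrm{IPD}}{2d}\right)$$

For an assumed IPD of 6.4 cm (an illustrative value, not a measurement from these experiments), the vergence angle falls from roughly 9.6° at 38 cm to roughly 7.6° at 48 cm, and each laser fan would be canted inward by half of this angle to model a given vergence distance.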
Overall, these experiments suggest that IPD needs to be accurately modeled at each depth, and the change in IPD with vergence needs to be tracked. Our results suggest improved calibration methods, which we will validate with additional experiments. In addition, these experiments used a highly salient occluding surface, and we also intend to study the effect of occluder salience.
A necessary part of developing effective and realistic Virtual Reality (VR) simulations is emulating perceptual sensations that occur to humans in corresponding natural environments. VR users are often seated and unable to freely move through the virtual world, therefore necessitating other means to simulate and perceive self-movement. One approach to tackle this challenge is to induce embodied illusions of self-motion ("vection") in stationary observers, typically by providing moving visual stimuli on a wide field-of-view display. While numerous stimulus parameters have been shown to affect vection [see Riecke 2011 for a review], there is little research investigating how the type of display itself might contribute. Here, we compared the vection-inducing potential as well as user experience and usability of two common displays for large-field stimulation: a passive stereoscopic projection setup and a 3D television with shutter glasses. Uncovering differences in vection between these displays would contribute to the theoretical understanding of vection and the potential relevance of different display properties, and guide the development of more immersive and effective VR setups. From a practical standpoint, this study helps to determine whether the more expensive projection system provides a benefit over the more accessible and affordable 3D television.
Redirected Walking is a technique that leverages characteristics of human perception to allow locomotion in virtual environments larger than the tracking area. Among the many redirection techniques, some depend strictly on the user's current position and orientation, while more recent algorithms also depend on the user's predicted behavior. This prediction serves as an input to a computationally expensive search to determine an optimal path. The search output is formulated as a series of gains to be applied at different stages along the path. For example, if a user is walking down a corridor, a natural prediction would be that the user will walk along a straight line down the corridor and will choose among the possible directions with equal probability. In practice, deviations from the expected virtual path are inevitable, and as a result the real-world path traversed will differ from the original prediction. These deviations can not only force the search to select a less optimal path in the next iteration, but in some cases also cause users to go out of bounds, requiring resets and causing a jarring experience for the user. We propose a method to account for these deviations by modifying the redirection gains at each update frame, aiming to keep the user on the intended predicted physical path.
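As a hypothetical illustration of per-frame gain adjustment (not the specific method proposed here), the sketch below nudges a planned rotation gain in proportion to the user's lateral deviation from the predicted physical path, so that subsequent steering pulls the user back toward it. The sign convention, correction strength, and clamp are illustrative assumptions.

```python
import numpy as np

def corrected_rotation_gain(planned_gain, user_pos, path_point, path_dir,
                            correction_strength=0.5, max_adjust=0.2):
    """Adjust a planned rotation gain based on the user's signed lateral
    deviation from a predicted physical path (given by a point on the path
    and a unit direction, both in the 2D floor plane)."""
    to_user = np.asarray(user_pos) - np.asarray(path_point)
    # Signed lateral deviation: 2D cross product with the path direction.
    lateral = path_dir[0] * to_user[1] - path_dir[1] * to_user[0]
    adjust = np.clip(correction_strength * lateral, -max_adjust, max_adjust)
    return planned_gain + adjust

# Example: user has drifted 0.3 m to one side of a path running along +y.
gain = corrected_rotation_gain(planned_gain=1.1, user_pos=[0.3, 2.0],
                               path_point=[0.0, 0.0], path_dir=[0.0, 1.0])
```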
A previous study, Young et al. [2014], explored comparisons between cost-differentiated head-mounted displays for the evaluation of various perception and action tasks. One of the comparisons in that work was an egocentric distance estimation comparison using an Nvis SX60 head-mounted display (HMD) and an Oculus Rift. This poster abstract supplements the results of that study with an additional HMD, an Nvis SX111, for the evaluation of the egocentric distance estimation task. We find that performance with the Nvis SX111 is comparable to that with the Nvis SX60 in Young et al. [2014], which was significantly more compressed than with the Oculus Rift.
The results of multiple studies conducted in the real world have demonstrated that human observers are usually very accurate in judging egocentric distance (the distance from the observer to a target) up to about 25 meters if the target is located on a continuous ground surface. Sinai et al. [1998] found that the presence of a texture discontinuity on the ground surface leads to distance underestimation. Two experiments were conducted to investigate the effect of a discontinuous ground surface on distance perception in virtual environments (VEs). The first experiment was a replication of Experiment 1 from Wu et al. [2007]. The second experiment involved the manipulation of the position of the texture boundary.
We believe that every physical entity has a set of attributes that defines the degree to which it can be used creatively. For example, the affordances, abstractness, and modular nature of Lego™ blocks allow them to take on different forms of expression that showcase varying levels of human creativity (e.g., building alphabets, creating homes, etc.). Similarly, when a DIY designer uses bottles to build houses, it projects the designer's creative skills, but at the same time it speaks about the materiality, affordances, and embodiment of the bottle, which lends itself readily to creative and novel interaction design efforts. This theory, that every entity has a set of attributes that allow it to lend itself more or less readily to creative and novel interactive design explorations, is what we call explorances, and it is the focus of our proposed work.
Communicating different emotions to individuals who are blind through tactile feedback is an active area of research. However, most work is static in nature, as different facial expressions of emotions are conveyed through a fixed set of facial features which may have meaning only to those who previously had sight. To individuals who are congenitally blind, these fixed sets of information are abstract, and little research reflects how this population can properly interpret and relate these fixed sets of signs to their own nonvisual experience. Our goal is to develop a complete system that integrates feature extraction with haptic recognition. As emotion detection through image and video analysis often fails, we emphasize active exploration of one's own facial expressions so that the movement of facial features and expressions becomes meaningful to users as they become proficient at interpreting facial expressions related to different emotions. We propose a dynamic haptic environment in which an individual who is blind can perceive the reflection of their own facial movements to better understand and explore different facial expressions on the basis of the movement of their own facial features.
A common workflow for modern comic artists is to create an inked line drawing using traditional tools. The drawing is then scanned as black-and-white line art and coloured digitally using an application such as Photoshop [Abel and Madden 2012]. The first step of digital colouring, called flatting, assigns symbolic colours to each region in an image, which a colourist can use to apply final colours and other effects. There is little direct support for flatting in software, which results in a manual, labour-intensive process for artists. Further, an object in a drawing may be split into multiple regions due to occlusions, requiring an artist to assign the same colour to each region of the object. This work describes a quantitative framework that allows occlusion cues, derived from vision research, to be computed and compared with each other. This framework is used to simplify comic flatting, allowing an artist to flat multiple regions in a single click. Simple gestural tools that can be used to add additional guidance to occlusion processing are also provided.
Haptic expression of emotions has received less attention than other modalities. Bonnet et al. [2011] combine visuo-haptic modalities to improve the recognition and discrimination of some emotions. However, few works have investigated how these modalities complement each other. For instance, Bickmore et al. [2010] highlight some non-significant tendencies of complementarity between the visual and haptic modalities.
A perceptually uniform gloss space was defined in Pellacini et al. [2000] as a reparameterization of the Ward BRDF model. Two dimensions, distinctness-of-image gloss and contrast gloss, were found to be enough to describe gloss perception, and the CIELAB lightness function was added to represent the diffuse component.
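In terms of the Ward parameters ρd (diffuse reflectance), ρs (specular reflectance), and α (roughness), the two gloss dimensions of this reparameterization are commonly written as follows (stated here for reference; the third dimension follows the CIELAB lightness function applied to the diffuse component):

$$c = \sqrt[3]{\rho_s + \tfrac{\rho_d}{2}} - \sqrt[3]{\tfrac{\rho_d}{2}}, \qquad d = 1 - \alpha,$$

where c is the contrast gloss and d is the distinctness-of-image gloss.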