Passive optical motion capture is one of the predominant technologies for capturing high-fidelity human skeletal motion, and is a workhorse in a large number of areas such as biomechanics, film and video games. While most state-of-the-art systems can automatically identify and track markers on the larger parts of the human body, the markers attached to the fingers pose unique challenges and usually require extensive manual cleanup. In this work we present a robust online method for identification and tracking of passive motion capture markers attached to the fingers of the hands. The method is especially suited for large capture volumes and sparse marker sets of 3 to 10 markers per hand. Once trained, our system can automatically initialize and track the markers, and the subject may exit and enter the capture volume at will. By using multiple assignment hypotheses and soft decisions, it can robustly recover from difficult situations with many simultaneous occlusions and false observations (ghost markers). We evaluate the method on a collection of sparse marker sets commonly used in industry and in the research community. We also compare the results with two of the most widely used motion capture platforms: Motion Analysis Cortex and Vicon Blade. The results show that our method is better at attaining correct marker labels and is especially beneficial for real-time applications.
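For orientation, the sketch below illustrates only the basic per-frame labeling subproblem that the method generalizes: matching predicted labeled positions to unlabeled observations by minimum total distance (Hungarian assignment). The paper's multi-hypothesis, soft-decision tracker is considerably more elaborate; names and thresholds here are assumptions for illustration.

```python
# Minimal sketch of single-hypothesis, per-frame marker labeling by
# minimum-cost assignment. Ghost markers and occlusions are handled only
# crudely via a distance gate; the paper keeps multiple hypotheses instead.
import numpy as np
from scipy.optimize import linear_sum_assignment

def label_markers(predicted, observed, max_dist=0.05):
    """predicted: (n_labels, 3) predicted positions of labeled markers.
    observed:  (n_obs, 3) unlabeled observations (may include ghosts).
    Returns {label_index: observation_index} for accepted matches."""
    cost = np.linalg.norm(predicted[:, None, :] - observed[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    # Reject implausible matches (occluded markers / ghost observations).
    return {r: c for r, c in zip(rows, cols) if cost[r, c] < max_dist}
```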
We present a graphical authoring tool for creating complex narratives in large, populated areas with crowds of virtual humans. With an intuitive drag-and-drop interface, our system enables an untrained author to assemble story arcs in terms of narrative events that seamlessly control either principal characters or choreographed heterogeneous crowds within the same conceptual structure. Smart Crowds allow groups of characters to be dynamically assembled and scheduled with ambient activities, while also permitting individual characters to be selected from the crowd and featured more prominently in a story with more sophisticated behavior. Our system runs at interactive rates with no pause or costly pre-computation step between creating a story and simulating it, making this approach ideal for storyboarding or pre-visualization of narrative sequences.
We present novel techniques for interactively editing the motion of an animated character by gesturing with a mobile device. Our approach is based on the notion that humans are generally able to convey motion using simple and abstract mappings from their own movement to that of an animated character. We first explore the feasibility of extracting robust sensor data with sufficiently rich features and low noise, such that the signal is predictably representative of a user's illustrative manipulation of the mobile device. In particular, we find that the linear velocity and device orientation computed from the motion sensor data are well-suited to the task of interactive character control. We show that these signals can be used for two different methods of interactively editing the locomotion of an animated human figure: discrete gestures for editing single motions, and continuous gestures for editing ongoing motions. We illustrate these techniques using various types of motion edits which affect jumps, strides and turning.
In computer graphics, simulated objects typically have two or three different representations: a visual mesh, a simulation mesh and a collection of convex shapes for collision handling. Using multiple representations requires skilled authoring and complicates object handling at run time. It can also produce visual artifacts such as a mismatch of collision behavior and visual appearance. The reason for using multiple representations has been performance restrictions in real-time environments. However, for virtual worlds, we believe that the ultimate goal must be WYSIWYS -- what you see is what you simulate, what you can manipulate, what you can touch.
In this paper we present a new method that uses the same representation for simulation and collision handling and an almost identical visualization mesh. This representation is derived directly from, and stays very close to, the visual input mesh, which does not have to be prepared for simulation but can be non-manifold, non-conforming and self-intersecting.
In this paper, we introduce an optimization-based approach for creating a simulation-driven game designed to teach players about nanotechnology. We focus our effort on Metal-Organic Frameworks (MOFs), a new class of nano-materials used for a wide variety of safety, filtering, and manufacturing tasks. In particular, we design a tool to allow users to create their own unique MOFs, introduce a new form of interactive simulation of MOF structures, and validate our new simulation model with existing offline chemical simulation techniques. We combine our new design tool and simulation technique into a simple game called Master of Filtering, designed to let players design and test brand new MOFs within an interactive game setting. Following an optimization-driven approach, we are able to generate an intuitive scoring mechanism such that higher scores in the game are positively correlated with better key chemical properties of the user-designed MOFs.
We address the long-standing problem of iteration count and time step dependent constraint stiffness in position-based dynamics (PBD). We introduce a simple extension to PBD that allows it to accurately and efficiently simulate arbitrary elastic and dissipative energy potentials in an implicit manner. In addition, our method provides constraint force estimates, making it applicable to a wider range of applications, such as those requiring haptic user feedback. We compare our algorithm to more expensive non-linear solvers and find it produces visually similar results while maintaining the simplicity and robustness of the PBD method.
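A minimal sketch of a compliance-based constraint projection of the kind described, shown for a single distance constraint between two particles; variable names and conventions are assumptions, not the paper's code. The accumulated multiplier makes the effective stiffness independent of iteration count and time step, and also yields a constraint-force estimate.

```python
# Compliant (stiffness-consistent) distance-constraint projection sketch.
# alpha is the compliance (inverse stiffness); lam is the accumulated
# Lagrange multiplier for this constraint, carried across solver iterations.
import numpy as np

def project_distance(x1, x2, w1, w2, rest, lam, alpha, dt):
    alpha_tilde = alpha / dt**2
    d = x2 - x1
    length = np.linalg.norm(d)
    n = d / length
    C = length - rest                                 # constraint value
    dlam = (-C - alpha_tilde * lam) / (w1 + w2 + alpha_tilde)
    x1 -= w1 * dlam * n                               # gradient wrt x1 is -n
    x2 += w2 * dlam * n                               # gradient wrt x2 is +n
    return x1, x2, lam + dlam                         # lam/dt^2 ~ constraint force
```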
We present a novel algorithm to extract the rotational part of an arbitrary 3 × 3 matrix. This problem lies at the core of two popular simulation methods in computer graphics, the co-rotational Finite Element Method and Shape Matching techniques. In contrast to the traditional method based on polar decomposition, degenerate configurations and inversions are handled robustly and do not have to be treated in a special way. In addition, our method can be implemented with only a few lines of code without branches which makes it particularly well suited for GPU-based applications. We demonstrate the robustness, coherence and efficiency of our method by comparing it to stabilized polar decomposition in several simulation scenarios.
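The sketch below shows an iterative rotation extraction in the spirit of the method: starting from the previous frame's rotation, the estimate is rotated toward the target matrix until the residual alignment "torque" vanishes. Quaternion conventions, tolerances and the early-out are assumptions for readability, not the paper's exact branch-free formulation.

```python
# Hedged sketch of warm-started iterative rotation extraction from a 3x3 matrix A.
import numpy as np
from scipy.spatial.transform import Rotation

def extract_rotation(A, q_prev, iters=20, eps=1.0e-9):
    q = Rotation.from_quat(q_prev)           # warm start keeps frames coherent
    for _ in range(iters):
        R = q.as_matrix()
        # rotation that would better align the columns of R with those of A
        omega = sum(np.cross(R[:, i], A[:, i]) for i in range(3))
        omega /= abs(sum(np.dot(R[:, i], A[:, i]) for i in range(3))) + eps
        angle = np.linalg.norm(omega)
        if angle < eps:
            break
        q = Rotation.from_rotvec(omega) * q   # apply the incremental rotation
    return q.as_quat()
```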
We propose an interactive sculpting system for seamlessly editing pre-computed animations of liquid, without the need for any resimulation. The input is a sequence of meshes without correspondences representing the liquid surface over time. Our method enables the efficient selection of consistent space-time parts of this animation, such as moving waves or droplets, which we call space-time features. Once selected, a feature can be copied, edited, or duplicated and then pasted back anywhere in space and time in the same or in another liquid animation sequence. Our method circumvents tedious user interactions by automatically computing the spatial and temporal ranges of the selected feature. We also provide space-time shape editing tools for non-uniform scaling, rotation, trajectory changes, and temporal editing to locally speed up or slow down motion. Using our tools, the user can edit and progressively refine any input simulation result, possibly using a library of pre-computed space-time features extracted from other animations. In contrast to the trial-and-error loop usually required to edit animation results through the tuning of indirect simulation parameters, our method gives the user full control over the edited space-time behaviors.
For real-time applications, blendshape animations are usually calculated on the CPU, which is slow, and they are therefore generally limited to only the closest level of detail for a small number of characters in a scene. In this paper, we present a GPU-based blendshape animation technique. By storing the blendshape model (including animations) on the GPU, we are able to attain significant speed improvements over CPU-based animation. We also find that by using compute shaders to decouple rendering and animation we can improve performance when rendering a crowd animation. Further gains are also made possible by using a smaller subset of blendshape expressions, at the cost of expressiveness. However, the quality impact can be minimised by selecting this subset carefully. We discuss a number of potential metrics to automate this selection.
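For context, the core per-vertex computation a blendshape evaluator performs is a weighted sum of expression deltas over the neutral pose; on the GPU this maps naturally to one compute-shader thread per vertex. The layout and names below are illustrative assumptions, shown in NumPy for brevity.

```python
# Illustrative blendshape evaluation: neutral pose plus weighted shape deltas.
import numpy as np

def evaluate_blendshapes(neutral, deltas, weights):
    """neutral: (V, 3) rest-pose vertices.
    deltas:  (S, V, 3) per-shape vertex offsets (stored once on the GPU).
    weights: (S,) animated per-frame weights (the only per-frame upload)."""
    return neutral + np.tensordot(weights, deltas, axes=1)

# Restricting evaluation to the k most significant shapes trades expressiveness
# for speed, e.g. evaluate_blendshapes(neutral, deltas[top_k], weights[top_k]).
```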
We present a new method for particle-based fluid simulation, using a combination of Projective Dynamics and Smoothed Particle Hydrodynamics (SPH). The Projective Dynamics framework allows the fast simulation of a wide range of constraints. It offers great stability through its implicit time integration scheme and is largely parallelizable, so that it can make use of modern multi-core CPUs. Yet existing work only uses Projective Dynamics to simulate various kinds of soft bodies and cloth. We are the first to incorporate fluid simulation into the Projective Dynamics framework. Our proposed fluid constraints are derived from SPH and seamlessly integrate into the existing method. Furthermore, we adapt the solver to handle the constantly changing constraints that appear in fluid simulation. We employ a highly parallel matrix-free conjugate gradient solver, and thus do not require expensive matrix factorizations.
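A matrix-free conjugate gradient solve only needs the system matrix through a matrix-vector product callback, which is convenient when constraints, and hence the matrix, change every step. Below is a generic sketch of that idea; the callback name and tolerances are assumptions, and the paper's parallelized solver is not reproduced here.

```python
# Generic matrix-free conjugate gradient: A is accessed only via apply_A(x).
import numpy as np

def conjugate_gradient(apply_A, b, x0, iters=100, tol=1e-6):
    x = x0.copy()
    r = b - apply_A(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```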
We present a system for creating a hair model that matches a user's hairstyle. The model consists of guide hair strands and is ready to be used in a real-time hair simulator. Our goal differs from most previous work, which aims to create realistic high-resolution hair for off-line applications or to create a mesh of the exterior of the hair volume for image manipulation. Our primary aim is for users to be able to put their own hairstyle into a game or other real-time application. By taking photos of the user's head from 8 views with a smartphone camera and segmenting the images with easy-to-use tools, the player obtains his/her own hair model in NVIDIA's HairWorks, a hair simulator used in many games. We show a number of results demonstrating the capabilities of our system in this paper.
A navigation mesh is a representation of a 2D or 3D virtual environment that enables path planning and crowd simulation for walking characters. Various state-of-the-art navigation meshes exist, but there is no standardized way of evaluating or comparing them. Each implementation is in a different state of maturity, has been tested on different hardware, uses different example environments, and may have been designed with a different application in mind.
In this paper, we conduct the first comparative study of navigation meshes. First, we give general definitions of 2D and 3D environments and navigation meshes. Second, we propose theoretical properties by which navigation meshes can be classified. Third, we introduce metrics by which the quality of a navigation mesh implementation can be measured objectively. Finally, we use these metrics to compare various state-of-the-art navigation meshes in a range of 2D and 3D environments.
We expect that this work will set a new standard for the evaluation of navigation meshes, that it will help developers choose an appropriate navigation mesh for their application, and that it will steer future research on navigation meshes in interesting directions.
A multi-layered environment (MLE) [van Toll et al. 2011] is a representation of the walkable environment (WE) in a 3D virtual environment that comprises a set of two-dimensional layers together with the locations where the different layers touch, which are called connections. This representation can be used for crowd simulations, e.g. to determine evacuation times in complex buildings, or for finding the shortest routes. The running times of these algorithms depend on the number of connections.
Finding an environment with the smallest number of connections is an NP-hard problem [Hillebrand et al. 2016]. Our first algorithm tackles this problem by using an integer linear program which is capable of finding the best possible solution, but naturally takes a long time. Hence, we provide two heuristics that search for MLEs with a low number of connections. One algorithm uses local search to gradually improve the found solution. The other, called the height heuristic, is very fast and gives good solutions in practical environments.
Character navigation in virtual environments is traditionally approached by first planning a path in the free portion of the environment and then employing steering behaviors that reactively adapt to constraints encountered during path following. Unfortunately, a shortcoming of this approach is that the path planning stage does not take into account locomotion behavior choices and trade-offs during the path computation. We propose an approach for incorporating the behavioral capabilities of the character in the path planning stage. The produced paths address trade-offs related to path length and navigation behavior for handling narrow passages. The proposed behavioral path planner uses a combination of clearance-based path planning and character geometry collision detection with the 3D environment in order to achieve results suitable for interactive navigation in cluttered environments. The resulting paths reflect the natural behavior of preferring paths with enough clearance for regular walking when possible, while also considering shorter paths that require a combination of collision avoidance and lateral steps to be executed.
This paper presents a mirror-like augmented reality (AR) system to display the internal anatomy of a user. Using a single Microsoft Kinect V2.0, we animate in real-time a user-specific internal anatomy according to the user's motion and we superimpose it onto the user's color map, as shown in Fig.1.e. The user can visualize his anatomy moving as if he were able to look inside his own body in real-time.
A new calibration procedure to set up and attach a user-specific anatomy to the Kinect body tracking skeleton is introduced. At calibration time, the bone lengths are estimated using a set of poses. By using Kinect data as input, the practical limitation of skin correspondence in prior work is overcome. The generic 3D anatomical model is attached to the internal anatomy registration skeleton, and warped onto the depth image using a novel elastic deformer, subject to a closest-point registration force and anatomical constraints. The noise in Kinect outputs precludes any realistic human display. Therefore, a novel filter to reconstruct plausible motions based on fixed-length bones as well as realistic angular degrees of freedom (DOFs) and limits is introduced to enforce anatomical plausibility. Anatomical constraints applied to the Kinect body tracking skeleton joints are used to maximize the physical plausibility of the anatomy motion, while minimizing the distance to the raw data. At run-time, a simulation loop is used to attract the bones towards the raw data, and skinning shaders efficiently drag the resulting anatomy to the user's tracked motion.
Our user-specific internal anatomy model is validated by comparing the skeleton with segmented MRI images. A user study is conducted to evaluate the believability of the animated anatomy.
This paper addresses the problem of recognizing human actions captured with depth cameras. Human action recognition is a challenging task as the articulated action data is high dimensional in both spatial and temporal domains. An effective approach to handle this complexity is to divide the human body into different body parts according to human skeletal joint positions, and perform recognition based on these part-based feature descriptors. Since different types of features can share some similar hidden structures, and different actions may be well characterized by properties common to all features (sharable structure) and those specific to a feature (specific structure), we propose a joint group sparse regression-based learning method to model each action. Our method can mine the sharable and specific structures among its part-based multiple features while imposing the importance of these part-based feature structures by joint group sparse regularization, in favor of discriminative part-based feature structure selection. To represent the dynamics and appearance of the human body parts, we employ part-based multiple features extracted from skeleton and depth data respectively. Then, using group sparse regularization techniques, we derive an algorithm for mining the key part-based features in the proposed learning framework. The resulting features derived from the learnt weight matrices are more discriminative for multi-task classification. Through extensive experiments on three public datasets, we demonstrate that our approach outperforms existing methods.
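To make the group-sparse selection idea concrete, the sketch below shows the block soft-thresholding (proximal) step commonly used for group-lasso-style regularization: the weights of an entire part-based feature group are either shrunk or zeroed out together, which is what drives discriminative feature-structure selection. Group definitions and the regularization weight are assumptions, not the paper's exact formulation.

```python
# Block soft-thresholding for group-sparse (l2,1-type) regularization.
import numpy as np

def group_soft_threshold(W, groups, lam):
    """W: (features, tasks) weight matrix; groups: list of row-index arrays;
    lam: regularization strength. Returns the proximally updated matrix."""
    W = W.copy()
    for g in groups:
        norm = np.linalg.norm(W[g])               # l2 norm of the whole group block
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        W[g] *= scale                             # shrink, or zero out the group
    return W
```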
We present a framework capable of automatically planning and animating dynamic multi-contact jumping trajectories for arbitrary legged characters and environments. Our contribution is the introduction of an approximate yet efficient multi-contact impulse formulation, used at the motion planning phase. We combine this criterion with a heuristic for estimating the contact locations without explicitly computing them, which breaks the combinatorial aspect of contact generation. This low-dimensional formulation is used to efficiently extend an existing ballistic motion planner to legged characters. We then propose a procedural method to animate the planned trajectory. Our approach thus results from a trade-off between physical accuracy and computational efficiency. We empirically justify this approach by demonstrating a large variety of behaviors for four legged characters in five scenarios.
Many of the existing data-driven human motion synthesis methods rely on statistical modeling of motion capture data. Motion capture data is high-dimensional time-series data; therefore, it is usually necessary to construct an expressive latent space through dimensionality reduction methods in order to reduce the computational cost of modeling such high-dimensional data and avoid the curse of dimensionality. However, different features of the motion data have intrinsically different scales, and as a result we need a strategy to scale the features of motion data during dimensionality reduction. In this work, we propose a novel method called Scaled Functional Principal Component Analysis (SFPCA) that is able to scale the features of motion data for FPCA through a general optimization framework. Our approach can automatically adapt to different parameterizations of motion. The experimental results demonstrate that our approach performs better than standard linear and nonlinear dimensionality reduction approaches in keeping the most informative motion features according to human vision judgment.
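The feature-scaling idea can be illustrated with a simple scaled PCA: each motion feature (e.g., root translation vs. joint angles) is multiplied by a per-feature scale before the decomposition so that no single parameterization dominates the latent space. The scales below are placeholders; SFPCA obtains them through its optimization framework and operates in the functional setting.

```python
# Illustrative scaled PCA: per-feature weights applied before the decomposition.
import numpy as np

def scaled_pca(X, scales, n_components):
    """X: (frames, features) motion data; scales: (features,) feature weights."""
    Xs = (X - X.mean(axis=0)) * scales
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    components = Vt[:n_components]
    latent = Xs @ components.T            # low-dimensional representation
    return latent, components
```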
Motion analysis and visualization are crucial in sports science for sports training and performance evaluation. While primitive computational methods have been proposed for simple analysis such as postures and movements, few can evaluate the high-level quality of sports players such as their skill levels and strategies. We propose a visualization tool to help visualize boxers' motions and assess their skill levels. Our system automatically builds a graph-based representation from motion capture data and reduces the dimension of the graph onto a 3D space so that it can be easily visualized and understood. In particular, our system allows easy understanding of a boxer's boxing behaviours, preferred actions, and potential strengths and weaknesses. We demonstrate the effectiveness of our system on different boxers' motions. Our system not only serves as a tool for visualization, it also provides intuitive motion analysis that can be further used beyond sports science.
Only time will tell if motion-controlled systems are the future of gaming and other industries and if mid-air gestural input will eventually offer a more intuitive way to play games and interact with computers. Whatever the eventual outcome, it is necessary to assess the ergonomics of mid-air input metaphors and propose design guidelines which will guarantee their safe use in the long run. This paper presents an ergonomic study showing how to mitigate the muscular strain induced by prolonged mid-air gesture interaction by encouraging postural shifts during the interaction. A quantitative and qualitative user study involving 30 subjects validates the setup. The simulated musculo-skeletal load values support our hypothesis and show a statistically significant 19% decrease in average muscle loads on the shoulder, neck, and back area in the modified condition compared to the baseline.
We present a new behavioural animation method that combines motion graphs for synthesis of animation and mind maps as behaviour controllers for the choice of motions, significantly reducing the cost of animating secondary characters. Motion graphs are created for each facial region from the analysis of a motion database, while synthesis occurs by minimizing the path distance that connects automatically chosen nodes. A mind map is a hierarchical graph built on top of the motion graphs, where the user visually chooses how a stimulus affects the character's mood, which in turn triggers motion synthesis. Different personality traits add more emotional complexity to the chosen reactions. Combining behaviour simulation and procedural animation leads to more empathic and autonomous characters that react differently in each interaction, shifting the task of animating a character to one of defining its behaviour.
Recent advances in scanning technology have enabled the widespread capture of 3D character models based on human subjects. Intuition suggests that, with these new capabilities to create avatars that look like their users, every player should have his or her own avatar to play video games or simulations. We explicitly test the impact of having one's own avatar (vs. a yoked control avatar) in a simulation (i.e., maze running task with mines). We test the impact of avatar identity on both subjective (e.g., feeling connected and engaged, liking avatar's appearance, feeling upset when avatar's injured, enjoying the game) and behavioral variables (e.g., time to complete task, speed, number of mines triggered, riskiness of maze path chosen). Results indicate that having an avatar that looks like the user improves their subjective experience, but there is no significant effect on how users perform in the simulation.
Virtual Reality and immersive experiences, which allow players to share the same virtual environment as the characters of a virtual world, have gained more and more interest recently. In order to conceive these immersive virtual worlds, one of the challenges is to give the characters that populate them the ability to express behaviors that support immersion. In this work, we propose a model capable of controlling and simulating a conversational group of social agents in an immersive environment. We describe this model, which has previously been validated using a regular screen setting, and we present a study measuring whether users recognized the attitudes expressed by virtual agents through the real-time generated animations of nonverbal behavior in an immersive setting. Results mirrored those of the regular screen setting, thus providing further insights for improving players' experiences by integrating them into immersive simulated group conversations with characters that express different interpersonal attitudes.
This study explores presentation techniques for a chat-based virtual human that communicates engagingly with users. Interactions with the virtual human occur via a smartphone outside of the lab in natural settings. Our work compares the responses of users who interact with an animated virtual character as opposed to a real human video character capable of displaying realistic backchannel behaviors. An audio-only interface is additionally compared with the two types of characters. The findings of our study suggest that people are more socially attracted to a 3D animated character that does not display backchannel behaviors than to a real human video character that presents realistic backchannel behaviors. People engage in conversation more, by talking for a longer amount of time, when they interact with a 3D animated virtual human that exhibits realistic backchannel behaviors, compared to communicating with a real human video character that does not display backchannel behaviors.
The aim of this study was to investigate the effect of non-player character (NPC) appearance in games - specifically, how a character's appearance affects a player's performance and their perception of the game. We ran an experiment where participants played a mobile game on a 9.7" tablet, the goal of which was to kill all of the enemy characters in the game. The enemy characters varied in how aggressive their visual appearance was. One crowd had small, green characters with no weapons and minimal armour. The other crowd had red characters that were large and wearing both weapons and armour. Both crowds had the same level of aggression in behaviour, stance and animations, as well as the same intersection-test capsule, to ensure the gameplay was balanced and both crowds were equally difficult to kill. As expected, the second crowd was perceived as highly aggressive and less friendly than the first crowd. We found no differences in the enjoyment levels of the game but, interestingly, we found that the visual appearance of the crowd had a direct effect on player performance in combat. In contrast to our hypothesis, players performed worse (i.e., were killed more often) when in combat against the characters with the less-aggressive appearance.
Animated digital characters play an important role in virtual experiences. In this work, we utilize data from a large-scale user study as training data for a generative model for producing a variety of animated smiles. Our method involves a four-stage process that samples a variety of facial expressions and annotates them with perceived happiness from the user study. The expressions are then transformed into a standardized space and used by a non-parametric classifier to predict the happiness of new smiles.
This study investigates whether there are significant differences in the gestures made by gamers and non-gamers whilst playing commercial games that employ gesture inputs. Specifically, the study focuses on testing a prototype of a multimodal capture tool that we used to obtain real-time audio, video and skeletal gesture data. Additionally, we developed an experimental design framework for the acquisition of spatio-temporal gesture data and analysed the vector magnitude of a gesture to compare the relative displacement of each participant whilst playing a game.
Planar shape morphing methods offer solutions to blend two shapes with different silhouettes. A naive method to solve the shape morphing problem is to linearly interpolate the coordinates of each corresponding vertex pair between the source and the target polygons. However, simple linear interpolation sometimes creates intermediate polygons that contain self-intersections, resulting in geometrically incorrect transformations.
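For reference, the naive baseline described above is just a per-vertex linear blend, sketched below; it is exactly this scheme whose intermediate polygons can self-intersect, which is the failure mode better morphing methods avoid. Names are illustrative.

```python
# Naive linear vertex interpolation between two corresponding polygons.
import numpy as np

def linear_morph(source, target, t):
    """source, target: (n, 2) corresponding polygon vertices; t in [0, 1]."""
    return (1.0 - t) * source + t * target
```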
We introduce Confluence, a web-based social physics game framework and development tools. It seeks to combine the strengths of graphical game engines (physics and animation) with the social simulation of social physics engines, together with convenient tools such as social network visualization, strategy analysis and in-game social rule authoring. We evaluate it by developing a game with character personalities and goals that are influenced by social norms.
The assessment of the quality of body movements in real-time is of utmost importance in exergames, that is, digital games controlled by body movements, designed for the elderly population. Given that the number of fall-related injuries and fatalities among elderly people is increasing, the ultimate goal of exergames is not only to provide fun, entertainment and exercise but also to improve postural control and balance. It is known that improving balance can reduce the number of falls among the elderly population. Real-time assessment of body movements during exergaming could be used to adapt the difficulty of the game as a function of the quality of the movements of the player, as well as to provide immediate feedback. This in turn could increase motivation to play and therefore increase the effectiveness of exergames as tools to improve balance. In a previous study we identified curvature and speed of motion trajectories as promising metrics for balance quantification using bi-dimensional force plate data [Soancatl et al. 2016]. The main aims of this study are (1) to investigate whether curvature and speed could be used to quantify balance using three-dimensional trajectories derived from whole-body movements as recorded by Kinect, and (2) to identify which body parts provide the most insight into balance. We consider measures to be suitable for balance quantification if they can differentiate between two groups (older and younger participants). This categorization can provide insight into balance control, as it is generally known that younger adults (here: younger than 60 years) have better postural control than older adults.
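As a rough illustration of the two metrics, per-frame speed and curvature of a 3D joint trajectory can be computed from finite differences of sampled positions (e.g., Kinect joint positions at a fixed rate); the smoothing and exact pipeline used in the study are not reproduced here.

```python
# Speed and curvature of a sampled 3D trajectory via finite differences.
import numpy as np

def speed_and_curvature(points, dt):
    """points: (frames, 3) positions of one body part; dt: sampling interval."""
    v = np.gradient(points, dt, axis=0)          # first derivative (velocity)
    a = np.gradient(v, dt, axis=0)               # second derivative (acceleration)
    speed = np.linalg.norm(v, axis=1)
    cross = np.cross(v, a)
    curvature = np.linalg.norm(cross, axis=1) / np.maximum(speed**3, 1e-9)
    return speed, curvature
```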
Recognizing and captioning the occurrence of virtual episodes can add descriptive capabilities to games and simulations featuring human-like agents. We introduce a captioning heuristic for multi-character animations. The input of our algorithm consists of the traces (lowest-level procedure names) of each character's animation, such as walking, running, talking, reaching, etc. To identify virtual episodes from these traces we pre-authored episode-centric trees called Core Components Trees (CCT). We compute a vagueness measure over each possible match between episode CCTs and the given trace inputs using fuzzy logic, and derive the best match to describe and caption the perceived episodes.
KINDLING sets forth to make an engaging and intuitive platform for fire evacuation analytics. By modeling the problem as a game, real or imaginary buildings can be created as game levels with a level editor and attacked by other players, providing a highly advanced worst-case scenario. Attacking players use a limited number of fires to attempt the most efficient destruction of the simulated crowd in a level designed by another player. Scores are recorded and displayed much like in any other video game. However, each attempt at the level collects data for important evacuation metrics. With this data represented as a heat-map overlay on the level, the creator can improve upon the level by adding precautionary measures to dangerous locations, such as an extinguisher to clear the way to an exit, or a fire door to hold back the spread of the fire. In this way, KINDLING provides both meaningful feedback on the evacuation safety of a real or virtual space, and evolving dynamic game-play between the level creator and other players.