We present CoLux, a novel system for measuring micro 3D motion of multiple independently moving objects at macroscopic standoff distances. CoLux is based on speckle imaging, where the scene is illuminated with a coherent light source and imaged with a camera. Coherent light, on interacting with optically rough surfaces, creates a high-frequency speckle pattern in the captured images. The motion of objects results in movement of speckle, which can be measured to estimate the object motion. Speckle imaging is widely used for micro-motion estimation in several applications, including industrial inspection, scientific imaging, and user interfaces (e.g., optical mice). However, current speckle imaging methods are largely limited to measuring 2D motion (parallel to the sensor image plane) of a single rigid object. We develop a novel theoretical model for speckle movement due to multi-object motion, and present a simple technique based on global scale-space speckle motion analysis for measuring small (5--50 microns) compound motion of multiple objects, along all three axes. Using these tools, we develop a method for measuring 3D micro-motion histograms of multiple independently moving objects, without tracking the individual motion trajectories. In order to demonstrate the capabilities of CoLux, we develop a hardware prototype and a proof-of-concept subtle hand gesture recognition system with a broad range of potential applications in user interfaces and interactive computer graphics.
Light fields are a powerful concept in computational imaging and a mainstay in image-based rendering; however, so far their acquisition required either carefully designed and calibrated optical systems (micro-lens arrays), or multi-camera/multi-shot settings. Here, we show that fully calibrated light field data can be obtained from a single ordinary photograph taken through a partially wetted window. Each drop of water produces a distorted view on the scene, and the challenge of recovering the unknown mapping from pixel coordinates to refracted rays in space is a severely underconstrained problem. The key idea behind our solution is to combine ray tracing and low-level image analysis techniques (extraction of 2D drop contours and locations of scene features seen through drops) with state-of-the-art drop shape simulation and an iterative refinement scheme to enforce photo-consistency across features that are seen in multiple views. This novel approach not only recovers a dense pixel-to-ray mapping, but also the refractive geometry through which the scene is observed, to high accuracy. We therefore anticipate that our inherently self-calibrating scheme might also find applications in other fields, for instance in materials science where the wetting properties of liquids on surfaces are investigated.
Despite significant recent progress, dense, time-resolved imaging of complex, non-stationary 3D flow velocities remains an elusive goal. In this work we tackle this problem by extending an established 2D method, Particle Imaging Velocimetry, to three dimensions by encoding depth into color. The encoding is achieved by illuminating the flow volume with a continuum of light planes (a "rainbow"), such that each depth corresponds to a specific wavelength of light. A diffractive component in the camera optics ensures that all planes are in focus simultaneously. With this setup, a single color camera is sufficient for tracking 3D trajectories of particles by combining 2D spatial and 1D color information.
For reconstruction, we derive an image formation model for recovering stationary 3D particle positions. 3D velocity estimation is achieved with a variant of 3D optical flow that accounts for both physical constraints and the rainbow image formation model. We evaluate our method with both simulations and an experimental prototype setup.
Consumer time-of-flight depth cameras like Kinect and PMD are cheap and compact, and produce video-rate depth maps in short-range applications. In this paper we apply energy-efficient epipolar imaging to the ToF domain to significantly expand the versatility of these sensors: we demonstrate live 3D imaging at over 15 m range outdoors in bright sunlight; robustness to global transport effects such as specular and diffuse inter-reflections---the first live demonstration for this ToF technology; interference-free 3D imaging in the presence of many ToF sensors, even when they are all operating at the same optical wavelength and modulation frequency; and blur-free, distortion-free 3D video in the presence of severe camera shake. We believe these achievements can make such cheap ToF devices broadly applicable in consumer and robotics domains.
We present a scalable approach for the optimization of flip-preventing energies in the general context of simplicial mappings and specifically for mesh parameterization. Our iterative minimization is based on the observation that many distortion energies can be optimized indirectly by minimizing a family of simpler proxy energies. Minimization of these proxies is a natural extension of the local/global minimization of the ARAP energy. Our algorithm is simple to implement and scales to datasets with millions of faces. We demonstrate our approach for the computation of maps that minimize a conformal or isometric distortion energy, both in two and three dimensions. In addition to mesh parameterization, we show that our algorithm can be applied to mesh deformation and mesh quality improvement.
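As background for the local/global proxy minimization mentioned above, the sketch below (a minimal, illustrative Python fragment, not the paper's algorithm or its proxy energies) shows the canonical "local" step: projecting each per-element Jacobian onto its closest rotation, with a placeholder standing in for the global linear solve.

```python
import numpy as np

def closest_rotation(J):
    """Local step of an ARAP-style local/global iteration: project a
    per-element Jacobian onto the nearest rotation (illustrative sketch,
    not the paper's proxy energies)."""
    U, _, Vt = np.linalg.svd(J)
    R = U @ Vt
    if np.linalg.det(R) < 0:          # avoid reflections
        U[:, -1] *= -1
        R = U @ Vt
    return R

# Toy local/global loop over random 2x2 Jacobians. In a real parameterization
# code, the "global" step solves a sparse linear system for vertex positions
# whose Jacobians best match the projected targets; here it is a placeholder blend.
Js = [np.random.randn(2, 2) for _ in range(5)]
for _ in range(3):
    targets = [closest_rotation(J) for J in Js]        # local step
    Js = [0.5 * (J + R) for J, R in zip(Js, targets)]  # stand-in for the global solve
```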
Many algorithms on meshes require the minimization of composite objectives, i.e., energies that are compositions of simpler parts. Canonical examples include mesh parameterization and deformation. We propose a second order optimization approach that exploits this composite structure to efficiently converge to a local minimum.
Our main observation is that a convex-concave decomposition of the energy constituents is simple and readily available in many cases of practical relevance in graphics. We utilize such convex-concave decompositions to define a tight convex majorizer of the energy, which we employ as a convex second order approximation of the objective function. In contrast to existing approaches that largely use only local convexification, our method is able to take advantage of a more global view on the energy landscape. Our experiments on triangular meshes demonstrate that our approach outperforms the state of the art on standard problems in geometry processing, and potentially provides a unified framework for developing efficient geometric optimization algorithms.
We introduce an efficient computational method for generating dense and low distortion maps between two arbitrary surfaces of the same genus. Instead of relying on semantic correspondences or surface parameterization, we directly optimize a variance-minimizing transport plan between two input surfaces that defines an as-conformal-as-possible inter-surface map satisfying a user-prescribed bound on area distortion. The transport plan is computed via two alternating convex optimizations, and is shown to minimize a generalized Dirichlet energy of both the map and its inverse. Computational efficiency is achieved through a coarse-to-fine approach in diffusion geometry, with Sinkhorn iterations modified to enforce bounded area distortion. The resulting inter-surface mapping algorithm applies to arbitrary shapes robustly, with little to no user interaction.
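For readers unfamiliar with the Sinkhorn iterations referenced above, the following is a minimal sketch of plain entropy-regularized transport in Python; the paper's diffusion-geometry costs and its bounded-area-distortion modification are not reproduced, and all names and values are illustrative.

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.05, iters=500):
    """Entropy-regularized transport plan between histograms a and b for cost
    matrix C. Plain Sinkhorn only; the paper's bounded-distortion variant is
    not reproduced here."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Tiny example: transport between two 4-bin histograms with squared cost.
a = np.full(4, 0.25)
b = np.array([0.1, 0.2, 0.3, 0.4])
C = (np.arange(4)[:, None] - np.arange(4)[None, :]) ** 2.0
P = sinkhorn(C, a, b)
print(P.sum(axis=1), P.sum(axis=0))   # marginals approximately equal a and b
```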
We introduce a new technique for real-time physically based volume sculpting of virtual elastic materials. Our formulation is based on the elastic response to localized force distributions associated with common modeling primitives such as grab, scale, twist, and pinch. The resulting brush-like displacements correspond to the regularization of fundamental solutions of linear elasticity in infinite 2D and 3D media. These deformations thus provide the realism and plausibility of volumetric elasticity, and the interactivity of closed-form analytical solutions. To finely control our elastic deformations, we also construct compound brushes with arbitrarily fast spatial decay. Furthermore, pointwise constraints can be imposed on the displacement field and its derivatives via a single linear solve. We demonstrate the versatility and efficiency of our method with multiple examples of volume sculpting and image editing.
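The brush-like, closed-form displacements described above can be illustrated with a regularized Kelvin-type fundamental solution. The sketch below writes a 3D "grab" displacement using the constants as they are commonly published for regularized Kelvinlets; treat it as an approximation of the idea rather than the paper's implementation, and note that the parameters `eps`, `mu`, and `nu` are illustrative choices.

```python
import numpy as np

def kelvinlet_grab(x, x0, f, eps=0.1, mu=1.0, nu=0.45):
    """Closed-form 'grab' brush displacement at point x for a load f applied at
    x0. Sketch of a regularized Kelvin-type fundamental solution with commonly
    published constants; not reproduced from the paper."""
    a = 1.0 / (4.0 * np.pi * mu)          # depends on shear modulus mu
    b = a / (4.0 * (1.0 - nu))            # depends on Poisson ratio nu
    r = x - x0
    r_eps = np.sqrt(r @ r + eps * eps)    # regularized distance
    return ((a - b) / r_eps) * f \
         + (b / r_eps**3) * r * (r @ f) \
         + (a * eps**2 / (2.0 * r_eps**3)) * f

# Displacement induced at a nearby point by a unit pull along +y at the origin.
print(kelvinlet_grab(np.array([0.2, 0.0, 0.0]), np.zeros(3), np.array([0.0, 1.0, 0.0])))
```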
Learning physics-based locomotion skills is a difficult problem, leading to solutions that typically exploit prior knowledge of various forms. In this paper we aim to learn a variety of environment-aware locomotion skills with a limited amount of prior knowledge. We adopt a two-level hierarchical control framework. First, low-level controllers are learned that operate at a fine timescale and which achieve robust walking gaits that satisfy stepping-target and style objectives. Second, high-level controllers are then learned which plan at the timescale of steps by invoking desired step targets for the low-level controller. The high-level controller makes decisions directly based on high-dimensional inputs, including terrain maps or other suitable representations of the surroundings. Both levels of the control policy are trained using deep reinforcement learning. Results are demonstrated on a simulated 3D biped. Low-level controllers are learned for a variety of motion styles and demonstrate robustness with respect to force-based disturbances, terrain variations, and style interpolation. High-level controllers are demonstrated that are capable of following trails through terrains, dribbling a soccer ball towards a target location, and navigating through static or dynamic obstacles.
We present a real-time character control mechanism using a novel neural network architecture called a Phase-Functioned Neural Network. In this network structure, the weights are computed via a cyclic function which uses the phase as an input. Along with the phase, our system takes as input user controls, the previous state of the character, and the geometry of the scene, and automatically produces high-quality motions that achieve the desired user control. The entire network is trained in an end-to-end fashion on a large dataset composed of locomotion such as walking, running, jumping, and climbing movements fitted into virtual environments. Our system can therefore automatically produce motions where the character adapts to different geometric environments such as walking and running over rough terrain, climbing over large rocks, jumping over obstacles, and crouching under low ceilings. Our network architecture produces higher quality results than time-series autoregressive models such as LSTMs as it deals explicitly with the latent variable of motion relating to the phase. Once trained, our system is also extremely fast and compact, requiring only milliseconds of execution time and a few megabytes of memory, even when trained on gigabytes of motion data. Our work is most appropriate for controlling characters in interactive scenes such as computer games and virtual reality systems.
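The phase-functioned weight computation can be pictured as cyclically blending a small set of control weight tensors as a function of the phase. The sketch below uses a cyclic Catmull-Rom blend over four control sets, which follows the commonly cited description of the architecture; shapes and names are illustrative, and no network evaluation or training is shown.

```python
import numpy as np

def cyclic_catmull_rom(phase, W):
    """Blend four control weight tensors W[0..3] with a cyclic Catmull-Rom
    spline of the phase in [0, 2*pi). Illustrative sketch of a phase-functioned
    weight computation, not the authors' code."""
    t = 4.0 * phase / (2.0 * np.pi)
    k = int(t) % 4
    w = t - int(t)
    y0, y1, y2, y3 = W[(k - 1) % 4], W[k], W[(k + 1) % 4], W[(k + 2) % 4]
    return (y1
            + w * 0.5 * (y2 - y0)
            + w**2 * (y0 - 2.5 * y1 + 2.0 * y2 - 0.5 * y3)
            + w**3 * (1.5 * (y1 - y2) + 0.5 * (y3 - y0)))

W = [np.random.randn(8, 8) for _ in range(4)]   # toy control weight sets
Wp = cyclic_catmull_rom(1.3, W)                 # weights used at this phase value
```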
Given a robust control system, physical simulation offers the potential for interactive human characters that move in realistic and responsive ways. In this article, we describe how to learn a scheduling scheme that reorders short control fragments as necessary at runtime to create a control system that can respond to disturbances and allows steering and other user interactions. These schedulers provide robust control of a wide range of highly dynamic behaviors, including walking on a ball, balancing on a bongo board, skateboarding, running, push-recovery, and breakdancing. We show that moderate-sized Q-networks can model the schedulers for these control tasks effectively and that those schedulers can be efficiently learned by the deep Q-learning algorithm.
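To make the scheduling idea concrete, here is a tabular Q-learning stand-in for the deep Q-network schedulers described above: a toy environment in which "running a control fragment" simply advances a discrete state. The environment, reward, and dimensions are all assumptions made for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_fragments = 6, 4            # toy sizes; a real scheduler sees rich sensory state
Q = np.zeros((n_states, n_fragments))
alpha, gamma, eps = 0.1, 0.95, 0.2

def toy_env(state, fragment):
    """Stand-in for executing one short control fragment in simulation:
    returns the next state and a reward (e.g., for staying balanced)."""
    next_state = (state + fragment) % n_states
    reward = 1.0 if next_state != 0 else -1.0      # pretend state 0 means falling
    return next_state, reward

state = 1
for _ in range(5000):
    a = rng.integers(n_fragments) if rng.random() < eps else int(np.argmax(Q[state]))
    nxt, r = toy_env(state, a)
    Q[state, a] += alpha * (r + gamma * Q[nxt].max() - Q[state, a])   # Q-learning update
    state = nxt if nxt != 0 else 1                  # "reset" after a fall
print(np.argmax(Q, axis=1))                         # fragment scheduled in each state
```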
This paper addresses the problem of offline path and movement planning for wall climbing humanoid agents. We focus on simulating bouldering, i.e. climbing short routes with diverse moves, although we also demonstrate our system on a longer wall. Our approach combines a graph-based high-level path planner with low-level sampling-based optimization of climbing moves. Although the planning problem is complex, our system produces plausible solutions to bouldering problems (short climbing routes) in less than a minute. We further utilize a k-shortest paths approach, which enables the system to discover alternative paths - in climbing, alternative strategies often exist, and what might be optimal for one climber could be impossible for others due to individual differences in strength, flexibility, and reach. We envision our system could be used, e.g. in learning a climbing strategy, or as a test and evaluation tool for climbing route designers. To the best of our knowledge, this is the first paper to solve and simulate rich humanoid wall climbing, where more than one limb can move at the same time, and limbs can also hang free for balance or use wall friction in addition to predefined holds.
We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control---thus far, the only monocular methods for such applications employed specialized RGB-D cameras. Our method's accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from monocular RGB-D approaches, such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low quality commodity RGB cameras.
This article proposes a real-time method that uses a single-view RGB-D input (a depth sensor integrated with a color camera) to simultaneously reconstruct a casual scene with a detailed geometry model, surface albedo, per-frame non-rigid motion, and per-frame low-frequency lighting, without requiring any template or motion priors. The key observation is that accurate scene motion can be used to integrate temporal information to recover the precise appearance, whereas the intrinsic appearance can help to establish true correspondence in the temporal domain to recover motion. Based on this observation, we first propose a shading-based scheme to leverage appearance information for motion estimation. Then, using the reconstructed motion, a volumetric albedo fusing scheme is proposed to complete and refine the intrinsic appearance of the scene by incorporating information from multiple frames. Since the two schemes are iteratively applied during recording, the reconstructed appearance and motion become increasingly more accurate. In addition to the reconstruction results, our experiments also show that additional applications can be achieved, such as relighting, albedo editing, and free-viewpoint rendering of a dynamic scene, since geometry, appearance, and motion are all reconstructed by our technique.
We present a convolutional neural network (CNN) based solution for modeling physically plausible spatially varying surface reflectance functions (SVBRDF) from a single photograph of a planar material sample under unknown natural illumination. Gathering a sufficiently large set of labeled training pairs consisting of photographs of SVBRDF samples and corresponding reflectance parameters is a difficult and arduous process. To reduce the amount of required labeled training data, we propose to leverage the appearance information embedded in unlabeled images of spatially varying materials to self-augment the training process. Starting from an initial approximate network obtained from a small set of labeled training pairs, we estimate provisional model parameters for each unlabeled training exemplar. Given this provisional reflectance estimate, we then synthesize a novel temporary labeled training pair by rendering the exact corresponding image under a new lighting condition. After refining the network using these additional training samples, we re-estimate the provisional model parameters for the unlabeled data and repeat the self-augmentation process until convergence. We demonstrate the efficacy of the proposed network structure on spatially varying wood, metals, and plastics, as well as thoroughly validate the effectiveness of the self-augmentation training process.
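The self-augmentation loop described above can be summarized in a few lines. The sketch below is schematic only: `train_step`, `predict`, and `render` are caller-supplied stand-ins for CNN training, SVBRDF estimation, and rendering under new lighting, and nothing here reproduces the paper's network or renderer.

```python
def self_augment(train_step, predict, render, labeled, unlabeled, rounds=3):
    """Sketch of the self-augmentation loop: train on labeled pairs, estimate
    provisional SVBRDF parameters for unlabeled photos, render them under new
    lighting to synthesize exactly labeled pairs, and retrain. The three
    callbacks are caller-supplied stand-ins, not the paper's components."""
    model = train_step(None, labeled)                   # initial approximate network
    for _ in range(rounds):
        synthetic = []
        for image in unlabeled:
            params = predict(model, image)              # provisional estimate
            new_image = render(params)                  # label is exact by construction
            synthetic.append((new_image, params))
        model = train_step(model, labeled + synthetic)  # refine and repeat
    return model
```

The key property the loop exploits is that a rendered image is exactly consistent with the provisional parameters used to render it, so every synthesized training pair is correctly labeled by construction.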
The ultimate goal of many image-based modeling systems is to render photo-realistic novel views of a scene without visible artifacts. Existing evaluation metrics and benchmarks focus mainly on the geometric accuracy of the reconstructed model, which is, however, a poor predictor of visual accuracy. Furthermore, using only geometric accuracy by itself does not allow evaluating systems that either lack a geometric scene representation or utilize coarse proxy geometry. Examples include a light field and most image-based rendering systems. We propose a unified evaluation approach based on novel view prediction error that is able to analyze the visual quality of any method that can render novel views from input images. One key advantage of this approach is that it does not require ground truth geometry. This dramatically simplifies the creation of test datasets and benchmarks. It also allows us to evaluate the quality of an unknown scene during the acquisition and reconstruction process, which is useful for acquisition planning. We evaluate our approach on a range of methods, including standard geometry-plus-texture pipelines as well as image-based rendering techniques, compare it to existing geometry-based benchmarks, demonstrate its utility for a range of use cases, and present a new virtual rephotography-based benchmark for image-based modeling and rendering systems.
Capturing a picture that "tells a story" requires the ability to create the right composition. The two most important parameters controlling composition are the camera position and the focal length of the lens. The traditional paradigm is for a photographer to mentally visualize the desired picture, select the capture parameters to produce it, and finally take the photograph, thus committing to a particular composition. We propose to change this paradigm. To do this, we introduce computational zoom, a framework that allows a photographer to manipulate several aspects of composition in post-processing from a stack of pictures captured at different distances from the scene. We further define a multi-perspective camera model that can generate compositions that are not physically attainable, thus extending the photographer's control over factors such as the relative size of objects at different depths and the sense of depth of the picture. We show several applications and results of the proposed computational zoom framework.
Traditional cinematography has relied for over a century on a well-established set of editing rules, called continuity editing, to create a sense of situational continuity. Despite massive changes in visual content across cuts, viewers in general experience no trouble perceiving the discontinuous flow of information as a coherent set of events. However, Virtual Reality (VR) movies are intrinsically different from traditional movies in that the viewer controls the camera orientation at all times. As a consequence, common editing techniques that rely on camera orientations, zooms, etc., cannot be used. In this paper we investigate key relevant questions to understand how well traditional movie editing carries over to VR, such as: Does the perception of continuity hold across edit boundaries? Under which conditions? Does viewers' observational behavior change after the cuts? To do so, we rely on recent cognition studies and the event segmentation theory, which states that our brains segment continuous actions into a series of discrete, meaningful events. We first replicate one of these studies to assess whether the predictions of such theory can be applied to VR. We next gather gaze data from viewers watching VR videos containing different edits with varying parameters, and provide the first systematic analysis of viewers' behavior and the perception of continuity in VR. From this analysis we make a series of relevant findings; for instance, our data suggests that predictions from the cognitive event segmentation theory are useful guides for VR editing; that different types of edits are equally well understood in terms of continuity; and that spatial misalignments between regions of interest at the edit boundaries favor a more exploratory behavior even after viewers have fixated on a new region of interest. In addition, we propose a number of metrics to describe viewers' attentional behavior in VR. We believe the insights derived from our work can be useful as guidelines for VR content creation.
Parameter tweaking is a common task in various design scenarios. For example, in color enhancement of photographs, designers tweak multiple parameters such as "brightness" and "contrast" to obtain the best visual impression. Adjusting one parameter is easy; however, if there are multiple correlated parameters, the task becomes much more complex, requiring many trials and a large cognitive load. To address this problem, we present a novel extension of Bayesian optimization techniques, where the system decomposes the entire parameter tweaking task into a sequence of one-dimensional line search queries that are easy for humans to perform by manipulating a single slider. In addition, we present a novel concept called crowd-powered visual design optimizer, which queries crowd workers, and we provide a working implementation of this concept. Our single-slider manipulation microtask design for crowdsourcing accelerates the convergence of the optimization relative to existing comparison-based microtask designs. We applied our framework to two different design domains: photo color enhancement and material BRDF design, and thereby showed its applicability to various design domains.
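The decomposition into single-slider queries can be sketched as follows; the line directions here are random rather than chosen by the paper's Bayesian acquisition strategy, and the "user" is replaced by an oracle callback, so this is a schematic of the interaction loop only.

```python
import numpy as np

def sequential_line_search(user_pick_on_line, dim, n_queries=10, seed=0):
    """Schematic of decomposing a high-dimensional parameter search into 1D
    slider queries: each round presents a line through the current best point
    and asks the user (here an oracle callback) for the best position on it.
    Random line directions stand in for the paper's Bayesian acquisition."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, dim)              # current best parameters
    for _ in range(n_queries):
        d = rng.normal(size=dim)
        d /= np.linalg.norm(d)
        t = user_pick_on_line(x, d)             # slider value chosen by the "user"
        x = np.clip(x + t * d, 0.0, 1.0)
    return x

# Toy "user" that prefers parameters close to 0.5 on every axis.
oracle = lambda x, d: -np.dot(x - 0.5, d) / (np.dot(d, d) + 1e-9)
print(sequential_line_search(oracle, dim=6))
```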
A major goal of research on virtual humans is the animation of expressive characters that display distinct psychological attributes. Body motion is an effective way of portraying different personalities and differentiating characters. The purpose and contribution of this work is to describe a formal, broadly applicable, procedural, and empirically grounded association between personality and body motion and apply this association to modify a given virtual human body animation that can be represented by these formal concepts. Because the body movement of virtual characters may involve different choices of parameter sets depending on the context, situation, or application, formulating a link from personality to body motion requires an intermediate step to assist generalization. For this intermediate step, we refer to Laban Movement Analysis, which is a movement analysis technique for systematically describing and evaluating human motion. We have developed an expressive human motion generation system with the help of movement experts and conducted a user study to explore how the psychologically validated OCEAN personality factors were perceived in motions with various Laban parameters. We have then applied our findings to procedurally animate expressive characters with personality, and validated the generalizability of our approach across different models and animations via another perception study.
Applications such as virtual tutors, games, and natural interfaces increasingly require animated characters to take on social roles while interacting with humans. The effectiveness of these applications depends on our ability to control the social presence of characters, including their personality. Understanding how movement impacts the perception of personality allows us to generate characters more capable of fulfilling this social role. The two studies described herein focus on gesture as a key component of social communication and examine how a set of gesture edits, similar to the types of changes that occur during motion warping, impact the perceived personality of the character. Surprisingly, when based on thin-slice gesture data, people's judgments of character personality mainly fall in a 2D subspace rather than independently impacting the full set of traits in the standard Big Five model of personality. These two dimensions are plasticity, which includes extraversion and openness, and stability, which includes emotional stability, agreeableness, and conscientiousness. A set of motion properties is experimentally determined that impacts each of these two traits. We show that when these properties are systematically edited in new gesture sequences, we can independently influence the character's perceived stability and plasticity (and the corresponding Big Five traits), to generate distinctive personalities. We identify motion adjustments salient to each judgment and, in a series of perceptual studies, repeatedly generate four distinctly perceived personalities. The effects extend to novel gesture sequences and character meshes, and even largely persist in the presence of accompanying speech. This paper furthers our understanding of how gesture can be used to control the perception of personality and suggests both the potential and possible limits of motion editing approaches.
Gaze-contingent rendering shows promise in improving perceived quality by providing a better match between image quality and the human visual system requirements. For example, information about fixation allows rendering quality to be reduced in peripheral vision, and the additional resources can be used to improve the quality in the foveal region. Gaze-contingent rendering can also be used to compensate for certain limitations of display devices, such as reduced dynamic range or lack of accommodation cues. Despite this potential and the recent drop in the prices of eye trackers, the adoption of such solutions is hampered by system latency which leads to a mismatch between image quality and the actual gaze location. This is especially apparent during fast saccadic movements when the information about gaze location is significantly delayed, and the quality mismatch can be noticed. To address this problem, we suggest a new way of updating images in gaze-contingent rendering during saccades. Instead of rendering according to the current gaze position, our technique predicts where the saccade is likely to end and provides an image for the new fixation location as soon as the prediction is available. While the quality mismatch during the saccade remains unnoticed due to saccadic suppression, a correct image for the new fixation is provided before the fixation is established. This paper describes the derivation of a model for predicting saccade landing positions and demonstrates how it can be used in the context of gaze-contingent rendering to reduce the influence of system latency on the perceived quality. The technique is validated in a series of experiments for various combinations of display frame rate and eye-tracker sampling rate.
We introduce a method for co-locating style-defining elements over a set of 3D shapes. Our goal is to translate high-level style descriptions, such as “Ming” or “European” for furniture models, into explicit and localized regions over the geometric models that characterize each style. For each style, the set of style-defining elements is defined as the union of all the elements that are able to discriminate the style. Another property of the style-defining elements is that they are frequently occurring, reflecting shape characteristics that appear across multiple shapes of the same style. Given an input set of 3D shapes spanning multiple categories and styles, where the shapes are grouped according to their style labels, we perform a cross-category co-analysis of the shape set to learn and spatially locate a set of defining elements for each style. This is accomplished by first sampling a large number of candidate geometric elements and then iteratively applying feature selection to the candidates, to extract style-discriminating elements until no additional elements can be found. Thus, for each style label, we obtain sets of discriminative elements that together form the superset of defining elements for the style. We demonstrate that the co-location of style-defining elements allows us to solve problems such as style classification, and enables a variety of applications such as style-revealing view selection, style-aware sampling, and style-driven modeling for 3D shapes.
Many approaches to shape comparison and recognition start by establishing a shape correspondence. We "turn the table" and show that quality shape correspondences can be obtained by performing many shape recognition tasks. What is more, the method we develop computes a fine-grained, topology-varying part correspondence between two 3D shapes where the core evaluation mechanism only recognizes shapes globally. This is made possible by casting the part correspondence problem in a deformation-driven framework and relying on a data-driven "deformation energy" which rates visual similarity between deformed shapes and models from a shape repository. Our basic premise is that if a correspondence between two chairs (or airplanes, bicycles, etc.) is correct, then a reasonable deformation between the two chairs anchored on the correspondence ought to produce plausible, "chair-like" in-between shapes.
Given two 3D shapes belonging to the same category, we perform a top-down, hierarchical search for part correspondences. For a candidate correspondence at each level of the search hierarchy, we deform one input shape into the other, while respecting the correspondence, and rate the correspondence based on how well the resulting deformed shapes resemble other shapes from ShapeNet belonging to the same category as the inputs. The resemblance, i.e., plausibility, is measured by comparing multi-view depth images over category-specific features learned for the various shape categories. We demonstrate clear improvements over state-of-the-art approaches through tests covering extensive sets of man-made models with rich geometric and topological variations.
We introduce a novel neural network architecture for encoding and synthesis of 3D shapes, particularly their structures. Our key insight is that 3D shapes are effectively characterized by their hierarchical organization of parts, which reflects fundamental intra-shape relationships such as adjacency and symmetry. We develop a recursive neural net (RvNN) based autoencoder to map a flat, unlabeled, arbitrary part layout to a compact code. The code effectively captures hierarchical structures of man-made 3D objects of varying structural complexities despite being fixed-dimensional: an associated decoder maps a code back to a full hierarchy. The learned bidirectional mapping is further tuned using an adversarial setup to yield a generative model of plausible structures, from which novel structures can be sampled. Finally, our structure synthesis framework is augmented by a second trained module that produces fine-grained part geometry, conditioned on global and local structural context, leading to a full generative pipeline for 3D shapes. We demonstrate that without supervision, our network learns meaningful structural hierarchies adhering to perceptual grouping principles, produces compact codes which enable applications such as shape classification and partial matching, and supports shape synthesis and interpolation with significant variations in topology and geometry.
While collections of parametric shapes are growing in size and use, little progress has been made on the fundamental problem of shape-based matching and retrieval for parametric shapes in a collection. The search space for such collections is both discrete (number of shapes) and continuous (parameter values). In this work, we propose representing this space using descriptors that have been shown to be effective for single shape retrieval. While single shapes can be represented as points in a descriptor space, parametric shapes are mapped into larger continuous regions. For smooth descriptors, we can assume that these regions are bounded low-dimensional manifolds where the dimensionality is given by the number of shape parameters. We propose representing these manifolds with a set of primitives, namely, points and bounded tangent spaces. Our algorithm describes how to define these primitives and how to use them to construct a manifold approximation that allows accurate and fast retrieval. We perform an analysis based on curvature, boundary evaluation, and the allowed approximation error to select between primitive types. We show how to compute decision variables with no need for empirical parameter adjustments and discuss theoretical guarantees on retrieval accuracy. We validate our approach with experiments that use different types of descriptors on a collection of shapes from multiple categories.
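A minimal version of the point-plus-bounded-tangent-space representation is sketched below, using finite differences of a toy descriptor; the paper's curvature- and error-driven choice between primitive types is not reproduced, and all functions and values are illustrative.

```python
import numpy as np

def tangent_primitives(descriptor, param_samples, h=1e-3):
    """Approximate the descriptor manifold of a parametric shape with
    point + bounded-tangent-space primitives: at each sampled parameter value
    store the descriptor and a finite-difference Jacobian spanning the local
    tangent space (illustrative sketch only)."""
    prims = []
    for p in param_samples:
        d0 = descriptor(p)
        J = np.stack([(descriptor(p + h * e) - d0) / h
                      for e in np.eye(len(p))], axis=1)
        prims.append((d0, J))
    return prims

def distance_to_primitives(q, prims, t_bound=0.5):
    """Distance from a query descriptor q to the manifold approximation."""
    best = np.inf
    for d0, J in prims:
        t, *_ = np.linalg.lstsq(J, q - d0, rcond=None)
        t = np.clip(t, -t_bound, t_bound)       # keep the tangent space bounded
        best = min(best, np.linalg.norm(q - (d0 + J @ t)))
    return best

# Toy 1-parameter "shape" whose descriptor traces a curve in 3D descriptor space.
desc = lambda p: np.array([np.cos(p[0]), np.sin(p[0]), 0.3 * p[0]])
prims = tangent_primitives(desc, [np.array([t]) for t in np.linspace(0, 3, 7)])
print(distance_to_primitives(np.array([1.0, 0.2, 0.1]), prims))
```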
Interactions play a key role in understanding objects and scenes for both virtual and real-world agents. We introduce a new general representation for proximal interactions among physical objects that is agnostic to the type of objects or interaction involved. The representation is based on tracking particles on one of the participating objects and then observing them with sensors appropriately placed in the interaction volume or on the interaction surfaces. We show how to factorize these interaction descriptors and project them into a particular participating object so as to obtain a new functional descriptor for that object, its interaction landscape, capturing its observed use in a spatiotemporal framework. Interaction landscapes are independent of the particular interaction and capture subtle dynamic effects in how objects move and behave when in functional use. Our method relates objects based on their function, establishes correspondences between shapes based on functional key points and regions, and retrieves peer and partner objects with respect to an interaction.
To date, material modeling in physically based computer animation has largely focused on mass and stiffness material properties. However, deformation dynamics are also strongly affected by damping properties. In this paper, we propose an interactive design method for nonlinear isotropic and anisotropic damping of complex three-dimensional solids simulated using the Finite Element Method (FEM). We first give a damping design method and interface whereby the user can set the damping properties so that motion aligned with each of a few chosen example deformations is damped by an independently prescribed amount, whereas the rest of the deformation space follows standard Rayleigh damping, or any viscous damping. Next, we demonstrate how to design nonlinear damping that depends on the magnitude of the deformation along each example deformation, by editing a single spline curve for each example deformation. Our user interface enables an art-directed and intuitive approach to controlling damping in solid simulations. We mathematically prove that our nonlinear anisotropic damping generalizes the frequency-dependent Caughey damping model, when starting from the Rayleigh damping. Finally, we give an inverse design method whereby the damping curve parameters can be inferred automatically from high-level user input, such as the amount of amplitude loss in one oscillation cycle along each of the chosen example deformations. To minimize numerical damping for implicit integration, we introduce an accurate and stable implicit integrator, which removes spurious high-frequency oscillations while only introducing a minimal amount of numerical damping. Our damping can generate effects not possible with previous methods, such as controllable nonlinear decaying envelopes whereby large deformations are damped faster or slower than small deformations, and anisotropic damping effects. We also fit our damping to videos of real-world objects undergoing large deformations, capturing their nonlinear and anisotropic damping dynamics.
Data driven models of human poses and soft-tissue deformations can produce very realistic results, but they only model the visible surface of the human body and cannot create skin deformation due to interactions with the environment. Physical simulations can generalize to external forces, but their parameters are difficult to control. In this paper, we present a layered volumetric human body model learned from data. Our model is composed of a data-driven inner layer and a physics-based external layer. The inner layer is driven with a volumetric statistical body model (VSMPL). The soft tissue layer consists of a tetrahedral mesh that is driven using the finite element method (FEM). Model parameters, namely the segmentation of the body into layers and the soft tissue elasticity, are learned directly from 4D registrations of humans exhibiting soft tissue deformations. The learned two layer model is a realistic full-body avatar that generalizes to novel motions and external forces. Experiments show that the resulting avatars produce realistic results on held out sequences and react to external forces. Moreover, the model supports the retargeting of physical properties from one avatar to another when they share the same topology.
In this paper we present a robust remeshing-free cutting algorithm on the basis of the eXtended Finite Element Method (XFEM) and fully implicit time integration. One of the most crucial points of the XFEM is that integrals over discontinuous polynomials have to be computed on subdomains of the polyhedral elements. Most existing approaches construct a cut-aligned auxiliary mesh for integration. In contrast, we propose a cutting algorithm that includes the construction of specialized quadrature rules for each dissected element without the requirement to explicitly represent the arising subdomains. Moreover, we solve the problem of ill-conditioned or even numerically singular solver matrices during time integration using a novel algorithm that constrains non-contributing degrees of freedom (DOFs) and introduce a preconditioner that efficiently reuses the constructed quadrature weights.
Our method is particularly suitable for fine structural cutting as it decouples the added number of DOFs from the cut's geometry and correctly preserves geometry and physical properties by accurate integration. Due to the implicit time integration these fine features can still be simulated robustly using large time steps. In contrast, the vast majority of existing approaches either use remeshing or element duplication. Remeshing based methods are able to correctly preserve physical quantities but strongly couple cut geometry and mesh resolution, leading to an unnecessarily large number of additional DOFs. Element duplication based approaches keep the number of additional DOFs small but fail at correct conservation of mass and stiffness properties. We verify consistency and robustness of our approach on simple and reproducible academic examples while stability and applicability are demonstrated in large scenarios with complex and fine structural cutting.
The diverse interactions between hair and liquid are complex and span multiple length scales, yet are central to the appearance of humans and animals in many situations. We therefore propose a novel multi-component simulation framework that treats many of the key physical mechanisms governing the dynamics of wet hair. The foundations of our approach are a discrete rod model for hair and a particle-in-cell model for fluids. To treat the thin layer of liquid that clings to the hair, we augment each hair strand with a height field representation. Our contribution is to develop the necessary physical and numerical models to evolve this new system and the interactions among its components. We develop a new reduced-dimensional liquid model to solve the motion of the liquid along the length of each hair, while accounting for its moving reference frame and influence on the hair dynamics. We derive a faithful model for surface tension-induced cohesion effects between adjacent hairs, based on the geometry of the liquid bridges that connect them. We adopt an empirically-validated drag model to treat the effects of coarse-scale interactions between hair and surrounding fluid, and propose new volume-conserving dripping and absorption strategies to transfer liquid between the reduced and particle-in-cell liquid representations. The synthesis of these techniques yields an effective wet hair simulator, which we use to animate hair flipping, an animal shaking itself dry, a spinning car wash roller brush dunked in liquid, and intricate hair coalescence effects, among several additional scenarios.
Many computer graphics applications use simpler yet faithful approximations of complex shapes to reliably conduct part of their computations. Some tasks, such as physical simulation, collision detection, occlusion queries or free-form deformation, require the simpler proxy to strictly enclose the input shape. While there are algorithms that can output such bounding proxies on simple input shapes, most of them fail at generating a proper coarse approximant on real-world complex shapes, which may contain multiple components and have a high genus. We advocate that, before reducing the number of primitives to describe a shape, one needs to regularize it while maintaining the strict enclosing property, to avoid any geometric aliasing that makes the decimation unreliable. Depending on the scale of the desired approximation, the topology of the shape itself may indeed have to be first simplified, to let the subsequent geometric optimization be free from topological locks.
We propose a new bounding shape approximation algorithm which takes as input an arbitrary surface mesh, with potentially complex multi-component structures, and automatically generates a bounding proxy which is tightened on the input and can match even the coarsest levels of approximation. To sustain the nonlinear approximation process that may eventually abstract both geometry and topology, we propose to use an intermediate regularized representation in the form of a shape closing, computed in real time using a new fast morphological framework designed for efficient parallel execution. Once the desired level of approximation is reached in the shape closing, a coarse, tight and bounding polygonization of the proxy geometry is extracted using an adaptive meshing scheme. Our underlying representation is both geometry- and topology-adaptive and can be optionally controlled accurately by a user, through sizing and orientation fields, yielding an intuitive brush metaphor within an interactive proxy design environment. We provide extensive experiments on various kinds of input meshes and illustrate the potential applications of our method in scenarios that benefit greatly from coarse, tight bounding substitutes to the actual high resolution geometry of the original 3D model, including freeform deformation, physical simulation and level of detail generation for rendering.
We convert a sequence of unstructured textured meshes into a mesh with incrementally changing connectivity and atlas parameterization. Like prior work on surface tracking, we seek temporally coherent mesh connectivity to enable efficient representation of surface geometry and texture. Like recent work on evolving meshes, we pursue local remeshing to permit tracking over long sequences containing significant deformations or topological changes. Our main contribution is to show that both goals are realizable within a common framework that simultaneously evolves both the set of mesh triangles and the parametric map. Sparsifying the remeshing operations allows the formation of large spatiotemporal texture charts. These charts are packed as prisms into a 3D atlas for a texture video. Reducing tracking drift using mesh-based optical flow helps improve compression of the resulting video stream.
We present FlowRep, an algorithm for extracting descriptive compact 3D curve networks from meshes of free-form man-made shapes. We infer the desired compact curve network from complex 3D geometries by using a series of insights derived from perception, computer graphics, and design literature. These sources suggest that visually descriptive networks are cycle-descriptive, i.e., their cycles unambiguously describe the geometry of the surface patches they surround. They also indicate that such networks are designed to be projectable, or easy to envision when observed from a static general viewpoint; in other words, 2D projections of the network should be strongly indicative of its 3D geometry. Research suggests that both properties are best achieved by using networks dominated by flowlines, surface curves aligned with principal curvature directions across anisotropic regions and strategically extended across sharp features and isotropic areas. Our algorithm leverages these observations in the construction of a compact descriptive curve network. Starting with a curvature-aligned quad-dominant mesh, we first extract sequences of mesh edges that form long, well-shaped and reliable flowlines by leveraging directional similarity between nearby meaningful flowline directions. We then use a compact subset of the extracted flowlines and the model's sharp-feature, or trim, curves to form a sparse, projectable network which describes the underlying surface. We validate our method by demonstrating a range of networks computed from diverse inputs, using them for surface reconstruction, and showing extensive comparisons with prior work and artist-generated networks.
We propose a novel way to capture and characterize distortion between pairs of shapes by extending the recently proposed framework of shape differences built on functional maps. We modify the original definition of shape differences slightly and prove that after this change, the discrete metric is fully encoded in two shape difference operators and can be recovered by solving two linear systems of equations. Then we introduce an extension of the shape difference operators using offset surfaces to capture extrinsic or embedding-dependent distortion, complementing the purely intrinsic nature of the original shape differences. Finally, we demonstrate that a set of four operators is complete, capturing intrinsic and extrinsic structure and fully encoding a shape up to rigid motion in both discrete and continuous settings. We highlight the usefulness of our constructions by showing the complementary nature of our extrinsic shape differences in capturing distortion ignored by previous approaches. We additionally provide examples where we recover local shape structure from the shape difference operators, suggesting shape editing and analysis tools based on manipulating shape differences.
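For background, the unmodified shape difference operators that this work extends are usually written as follows in the functional-map literature; the formulas below are stated as they commonly appear in prior work, as context only, and do not reflect the paper's modified definition.

```latex
% Background: standard shape difference operators for a functional map
% C : F(M) -> F(N), written in truncated Laplace--Beltrami eigenbases that
% are orthonormal with respect to the area-weighted inner products
% (as commonly stated in prior work, not the paper's modified definition).
\[
  D_{\mathrm{area}} = C^{\top} C ,
  \qquad
  D_{\mathrm{conf}} = \Lambda_M^{+} \, C^{\top} \Lambda_N \, C ,
\]
% where \Lambda_M and \Lambda_N are the diagonal matrices of Laplace--Beltrami
% eigenvalues and {}^{+} denotes the pseudo-inverse.
```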
Color palettes are widely used by artists to define the colors of artworks and explore color designs. In general, artists select the colors of a palette by following a set of rules, e.g., contrast or relative luminance. Existing interactive palette exploration tools explore palette spaces following limited constraints defined as geometric configurations in color space, e.g., harmony rules on the color wheel. Palette search algorithms sample palettes from color relations learned from an input dataset; however, they do not support interactive user edits or palette refinement.
We introduce in this work a new versatile formulation enabling the creation of constraint-based interactive palette exploration systems. Our technical contribution is a graph-based palette representation, from which we define palette exploration as a minimization problem that can be solved efficiently to provide real-time feedback. Based on our formulation, we introduce two interactive palette exploration strategies: constrained palette exploration, and for the first time, constrained palette interpolation. We demonstrate the performance of our approach on various application cases and evaluate how it helps users find trade-offs between concurrent constraints.
We present Playful Palette, a color picker interface for digital paint programs that derives intuition from oil paint and watercolor palettes, but extends them with digital features. A Playful Palette is a set of blobs of color that blend together to create gradients and gamuts. They can be directly manipulated to explore arrangements and harmonies. All edits are non-destructive, and an infinite history allows previous palettes to be revisited and modified, recoloring the painting. The Playful Palette design is motivated by a pilot study of how artists use paint palettes, and we evaluate the final design with a set of traditional and digital media painters to demonstrate that Playful Palette is effective both at enabling artists' color tasks, and at amplifying their creativity.
In digital image editing software, layers organize images. However, layers are often not explicitly represented in the final image, and may never have existed for a scanned physical painting or a photograph. We propose a technique to decompose an image into layers. In our decomposition, each layer represents a single-color coat of paint applied with varying opacity. Our decomposition is based on the image’s RGB-space geometry. In RGB-space, the linear nature of the standard Porter-Duff [1984] “over” pixel compositing operation implies a geometric structure. The vertices of the convex hull of image pixels in RGB-space correspond to a palette of paint colors. These colors may be “hidden” and inaccessible to algorithms based on clustering visible colors. For our layer decomposition, users choose the palette size (degree of simplification to perform on the convex hull), as well as a layer order for the paint colors (vertices). We then solve a constrained optimization problem to find translucent, spatially coherent opacity for each layer, such that the composition of the layers reproduces the original image. We demonstrate the utility of the resulting decompositions for recoloring (global and local) and object insertion. Our layers can be interpreted as generalized barycentric coordinates; we compare to these and other recoloring approaches.
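The RGB-space geometry described above is easy to probe directly: the vertices of the convex hull of an image's pixels are candidate paint colors, even when those colors never appear as pure pixels. The sketch below computes that hull with SciPy; the paper's hull simplification to a user-chosen palette size and its layer-opacity optimization are not reproduced, and the toy image is synthetic.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_palette(image_rgb):
    """Extract candidate paint colors as the vertices of the RGB-space convex
    hull of the image pixels (sketch only; hull simplification and per-layer
    opacity solving are omitted)."""
    pixels = image_rgb.reshape(-1, 3).astype(float)
    hull = ConvexHull(pixels)
    return pixels[hull.vertices]          # palette colors, possibly "hidden"

# Toy image: random convex mixtures of three paint colors.
rng = np.random.default_rng(1)
w = rng.dirichlet(np.ones(3), size=1000)
paints = np.array([[0.9, 0.1, 0.1], [0.1, 0.8, 0.2], [0.2, 0.2, 0.9]])
img = (w @ paints).reshape(10, 100, 3)
print(hull_palette(img).shape)            # a handful of extreme colors
```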
Due to the widespread use of compositing in contemporary feature films, green-screen keying has become an essential part of postproduction workflows. To comply with the ever-increasing quality requirements of the industry, specialized compositing artists spend countless hours using multiple commercial software tools, while eventually having to resort to manual painting because of the many shortcomings of these tools. Due to the sheer amount of manual labor involved in the process, new green-screen keying approaches that produce better keying results with less user interaction are welcome additions to the compositing artist’s arsenal. We found that—contrary to the common belief in the research community—production-quality green-screen keying is still an unresolved problem with its unique challenges. In this article, we propose a novel green-screen keying method utilizing a new energy minimization-based color unmixing algorithm. We present comprehensive comparisons with commercial software packages and relevant methods in literature, which show that the quality of our results is superior to any other currently available green-screen keying solution. It is important to note that, using the proposed method, these high-quality results can be generated using only one-tenth of the manual editing time that a professional compositing artist requires to process the same content having all previous state-of-the-art tools at one’s disposal.
We present a new method for decomposing an image into a set of soft color segments that are analogous to color layers with alpha channels that have been commonly utilized in modern image manipulation software. We show that the resulting decomposition serves as an effective intermediate image representation, which can be utilized for performing various, seemingly unrelated, image manipulation tasks. We identify a set of requirements that soft color segmentation methods have to fulfill, and present an in-depth theoretical analysis of prior work. We propose an energy formulation for producing compact layers of homogeneous colors and a color refinement procedure, as well as a method for automatically estimating a statistical color model from an image. This results in a novel framework for automatic and high-quality soft color segmentation that is efficient, parallelizable, and scalable. We show that our technique is superior in quality compared to previous methods through quantitative analysis as well as visually through an extensive set of examples. We demonstrate that our soft color segments can easily be exported to familiar image manipulation software packages and used to produce compelling results for numerous image manipulation applications without forcing the user to learn new tools and workflows.
We propose a computational tool for designing Kirchhoff-Plateau Surfaces---planar rod networks embedded in pre-stretched fabric that deploy into complex, three-dimensional shapes. While Kirchhoff-Plateau Surfaces offer an intriguing and expressive design space, navigating this space is made difficult by the highly nonlinear nature of the underlying mechanical problem. In order to tackle this challenge, we propose a user-guided but computer-assisted approach that combines an efficient forward simulation model with a dedicated optimization algorithm in order to implement a powerful set of design tools. We demonstrate our method by designing a diverse set of complex-shaped Kirchhoff-Plateau Surfaces, each validated through physically-fabricated prototypes.
Objects created by connecting and bending wires are common in furniture design, metal sculpting, wire jewelry, etc. Reconstructing such objects with traditional depth and image based methods is extremely difficult due to their unique characteristics such as lack of features, thin elements, and severe self-occlusions. We present a novel image-based method that reconstructs a set of continuous 3D wires used to create such an object, where each wire is composed of an ordered set of 3D curve segments. Our method exploits two main observations: simplicity - wire objects are often created using only a small number of wires, and smoothness - each wire is primarily smoothly bent with sharp features appearing only at joints or isolated points. In light of these observations, we tackle the challenging image correspondence problem across featureless wires by first generating multiple candidate 3D curve segments and then solving a global selection problem that balances between image and smoothness cues to identify the correct 3D curves. Next, we recover a decomposition of such curves into a set of distinct and continuous wires by formulating a multiple traveling salesman problem, which finds smooth paths, i.e., wires, connecting the curves. We demonstrate our method on a wide set of real examples with varying complexity and present high-fidelity results using only 3 images for each object. We provide the source code and data for our work on the project website.
We present a computational approach for designing CurveUps, curvy shells that form from an initially flat state. They consist of small rigid tiles that are tightly held together by two pre-stretched elastic sheets attached to them. Our method allows the realization of smooth, doubly curved surfaces that can be fabricated as a flat piece. Once released, the restoring forces of the pre-stretched sheets support the object to take shape in 3D. CurveUps are structurally stable in their target configuration. The design process starts with a target surface. Our method generates a tile layout in 2D and optimizes the distribution, shape, and attachment areas of the tiles to obtain a configuration that is fabricable and in which the curved up state closely matches the target. Our approach is based on an efficient approximate model and a local optimization strategy for an otherwise intractable nonlinear optimization problem. We demonstrate the effectiveness of our approach for a wide range of shapes, all realized as physical prototypes.
Curved folded surfaces, given their ability to produce elegant freeform shapes by folding flat sheets etched with curved creases, hold a special place in computational Origami. Artists and designers have proposed a wide variety of different fold patterns to create a range of interesting surfaces. The creative process, design, as well as fabrication is usually only concerned with the static surface that emerges once folding has completed. Folding such patterns, however, is difficult as multiple creases have to be folded simultaneously to obtain a properly folded target shape. We introduce string actuated curved folded surfaces that can be shaped by pulling a network of strings, thus, vastly simplifying the process of creating such surfaces and making the folding motion an integral part of the design. Technically, we solve the problem of which surface points to string together and how to actuate them by locally expressing a desired folding path in the space of isometric shape deformations in terms of novel string actuation modes. We demonstrate the validity of our approach by computing string actuation networks for a range of well-known crease patterns and testing their effectiveness on physical prototypes. All the examples in this article can be downloaded for personal use from http://geometry.cs.ucl.ac.uk/projects/2017/string-actuated/.
Slicing is the procedure necessary to prepare a shape for layered manufacturing. There are degrees of freedom in this process, such as the starting point of the slicing sequence and the thickness of each slice. The choice of these parameters influences the manufacturing process and its result: The number of slices significantly affects the time needed for manufacturing, while their thickness affects the error. Assuming a discrete setting, we measure the error as the number of voxels that are incorrectly assigned due to slicing. We provide an algorithm that generates, for a given set of available slice heights and a shape, a slicing that is provably optimal. By optimal, we mean that the algorithm generates sequences with minimal error for any possible number of slices. The algorithm is fast and flexible, that is, it can accommodate a user-driven importance modulation of the error function and allows the interactive exploration of the desired quality/time tradeoff. We demonstrate the practical importance of our optimization on several three-dimensional-printed results.
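As an illustration of the kind of optimization described here, the following is a minimal dynamic-programming sketch in Python. It assumes the shape has been discretized into horizontal voxel rows and that a hypothetical slab_error(z0, z1) callable returns the mis-assignment error of printing rows [z0, z1) as one slice; it reports, for every achievable slice count, the minimal total error. This is a sketch in the spirit of the abstract, not the paper's exact algorithm.

```python
# Minimal adaptive-slicing sketch. Z (row count) and slab_error are
# illustrative stand-ins for the paper's voxel-based error measure.

def optimal_slicings(Z, slice_heights, slab_error):
    """For every reachable slice count k, find the minimal total error of
    covering rows [0, Z) with slices whose thicknesses come from slice_heights."""
    INF = float("inf")
    # best[k][z] = minimal error to reach height z using exactly k slices
    best = [{0: 0.0}]
    results = {}
    k = 0
    while best[k]:                       # stop once no new states are reachable
        nxt = {}
        for z, err in best[k].items():
            if z == Z:
                results.setdefault(k, err)
                continue
            for h in slice_heights:
                z1 = min(z + h, Z)       # clamp the last slice at the top
                cand = err + slab_error(z, z1)
                if cand < nxt.get(z1, INF):
                    nxt[z1] = cand
        best.append(nxt)
        k += 1
    return results                       # {slice count: minimal error}

# Toy usage: a "shape" whose cross-section area varies with height.
import math
area = [abs(math.sin(0.1 * z)) for z in range(100)]
def slab_error(z0, z1):
    mean = sum(area[z0:z1]) / (z1 - z0)
    return sum(abs(a - mean) for a in area[z0:z1])   # deviation as a proxy error

print(optimal_slicings(100, [2, 5, 10], slab_error))
```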
We propose two novel contributions for measurement-based rendering of diffraction effects in surface reflectance of planar homogeneous diffractive materials. As a general solution for commonly manufactured materials, we propose a practical data-driven rendering technique and a measurement approach to efficiently render complex diffraction effects in real time. Our measurement step simply involves photographing a planar diffractive sample illuminated with an LED flash. Here, we directly record the resultant diffraction pattern on the sample surface due to a narrow-band point source illumination. Furthermore, we propose an efficient rendering method that exploits the measurement in conjunction with the Huygens-Fresnel principle to fit relevant diffraction parameters based on a first-order approximation. Our proposed data-driven rendering method requires the precomputation of a single diffraction look-up table for accurate spectral rendering of complex diffraction effects. Second, for sharp specular samples, we propose a novel method for practical measurement of the underlying diffraction grating using out-of-focus “bokeh” photography of the specular highlight. We demonstrate how the measured bokeh can be employed as a height field to drive a diffraction shader based on a first-order approximation for efficient real-time rendering. Finally, we also derive analytic solutions for a few special cases of diffraction from our measurements and demonstrate realistic rendering results under complex light sources and environments.
In this work, we introduce an extension to microfacet theory for the rendering of iridescent effects caused by thin films of varying thickness (such as oil, grease, or alcohol) on top of an arbitrarily rough base layer. Our material model is the first to produce a consistent appearance between tristimulus (e.g., RGB) and spectral rendering engines by analytically pre-integrating its spectral response. The proposed extension works with any microfacet-based model: not only on reflection over dielectrics or conductors, but also on transmission through dielectrics. We adapt its evaluation to work in multi-scale rendering contexts, and we expose parameters enabling artistic control over iridescent appearance. The overhead compared to using the classic Fresnel reflectance or transmittance terms remains reasonable enough for practical uses in production.
Adequate reflectance models are essential for the production of photorealistic images. Microfacet reflectance models predict the appearance of a material at the macroscopic level based on microscopic surface details. They provide a good match with measured reflectance in some cases, but not always. This discrepancy between the behavior predicted by microfacet models and the observed behavior has puzzled researchers for a long time. In this paper, we show that diffraction effects in the micro-geometry provide a plausible explanation. We describe a two-scale reflectance model, separating geometry details much larger than the wavelength of light from those of size comparable to the wavelength. The former scale yields the standard Cook-Torrance model; the latter is responsible for diffraction effects. Diffraction effects at the smaller scale are convolved by the micro-geometry normal distribution. The resulting two-scale model provides a very good approximation to measured reflectances.
Physically-based fur rendering is difficult. Recently, structural differences between hair and fur fibers have been revealed by Yan et al. (2015), who showed that fur fibers have an inner scattering medulla, and developed a double cylinder model. However, fur rendering is still complicated due to the complex scattering paths through the medulla. We develop a number of optimizations that improve efficiency and generality without compromising accuracy, leading to a practical fur reflectance model. A further key contribution of our model is support for both near- and far-field rendering, with smooth transitions between the two.
Specifically, we derive a compact BCSDF model for fur reflectance with only 5 lobes. Our model unifies hair and fur rendering, making it easy to implement within standard hair rendering software, since we keep the traditional R, TT, and TRT lobes in hair, and only add two scattered lobes, TTs and TRTs. Moreover, we introduce a compression scheme using tensor decomposition to dramatically reduce the precomputed data storage for scattered lobes to only 150 KB, with minimal loss of accuracy. By exploiting piecewise analytic integration, our method further enables a multi-scale rendering scheme that transitions between near and far field rendering smoothly and efficiently for the first time, leading to a 6--8× speedup over previous work.
We present a novel approach to guiding physically based particle simulations using boundary conditions. Unlike commonly used ad hoc particle techniques for adding and removing material from a simulation, our approach is principled, building on the concept of volumetric flux. Artists are provided with a simple yet powerful primitive called a fluxed animated boundary (FAB), allowing them to specify a control shape and a material flow field. The system takes care of enforcing the corresponding boundary conditions and necessary particle reseeding. We show how FABs can be used artistically or physically. Finally, we demonstrate production examples that show the efficacy of our method.
We present a novel algorithm to control the physically-based animation of smoke. Given a set of keyframe smoke shapes, we compute a dense sequence of control force fields that can drive the smoke shape to match several keyframes at certain time instances. Our approach formulates this control problem as a spacetime optimization constrained by partial differential equations. In order to compute the locally optimal control forces, we alternately optimize the velocity fields and density fields using an alternating direction method of multipliers (ADMM) optimizer. In order to reduce the high complexity of multiple passes of fluid resimulation during velocity field optimization, we utilize the coherence between consecutive fluid simulation passes. We demonstrate the benefits of our approach by computing accurate solutions on 2D and 3D benchmarks. In practice, we observe up to an order of magnitude improvement over prior optimal control methods.
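For readers unfamiliar with the alternation pattern referenced above, the following is a minimal, generic ADMM sketch in Python applied to a toy L1-regularized denoising problem. It illustrates only the x-update / z-update / dual-update structure of ADMM, not the paper's PDE-constrained spacetime objective or its fluid variables.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_denoise(a, lam=0.5, rho=1.0, iters=100):
    # Toy problem: minimize 0.5*||x - a||^2 + lam*||z||_1  subject to  x = z
    x = np.zeros_like(a); z = np.zeros_like(a); u = np.zeros_like(a)
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)   # x-update: quadratic subproblem
        z = soft_threshold(x + u, lam / rho)    # z-update: proximal step
        u = u + x - z                           # dual (scaled multiplier) update
    return x

print(admm_denoise(np.array([3.0, -0.2, 1.5, 0.1])))
```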
We present a novel data-driven algorithm to synthesize high resolution flow simulations with reusable repositories of space-time flow data. In our work, we employ a descriptor learning approach to encode the similarity between fluid regions with differences in resolution and numerical viscosity. We use convolutional neural networks to generate the descriptors from fluid data such as smoke density and flow velocity. At the same time, we present a deformation limiting patch advection method which allows us to robustly track deformable fluid regions. With the help of this patch advection, we generate stable space-time data sets from detailed fluids for our repositories. We can then use our learned descriptors to quickly localize a suitable data set when running a new simulation. This makes our approach very efficient, and resolution independent. We will demonstrate with several examples that our method yields volumes with very high effective resolutions, and non-dissipative small scale details that naturally integrate into the motions of the underlying flow.
We propose a method for converting geometric shapes into hierarchically segmented parts with part labels. Our key idea is to train category-specific models from the scene graphs and part names that accompany 3D shapes in public repositories. These freely-available annotations represent an enormous, untapped source of information on geometry. However, because the models and corresponding scene graphs are created by a wide range of modelers with different levels of expertise, modeling tools, and objectives, these models have very inconsistent segmentations and hierarchies with sparse and noisy textual tags. Our method involves two analysis steps. First, we perform a joint optimization to simultaneously cluster and label parts in the database while also inferring a canonical tag dictionary and part hierarchy. We then use this labeled data to train a method for hierarchical segmentation and labeling of new 3D shapes. We demonstrate that our method can mine complex information, detecting hierarchies in man-made objects and their constituent parts, obtaining finer scale details than existing alternatives. We also show that, by performing domain transfer using a few supervised examples, our technique outperforms fully-supervised techniques that require hundreds of manually-labeled models.
The recent success of convolutional neural networks (CNNs) for image processing tasks is inspiring research efforts attempting to achieve similar success for geometric tasks. One of the main challenges in applying CNNs to surfaces is defining a natural convolution operator on surfaces.
In this paper we present a method for applying deep learning to sphere-type shapes using a global seamless parameterization to a planar flat-torus, for which the convolution operator is well defined. As a result, the standard deep learning framework can be readily applied for learning semantic, high-level properties of the shape. An indication of our success in bridging the gap between images and surfaces is the fact that our algorithm succeeds in learning semantic information from an input of raw low-dimensional feature vectors.
We demonstrate the usefulness of our approach by presenting two applications: human body segmentation, and automatic landmark detection on anatomical surfaces. We show that our algorithm compares favorably with competing geometric deep-learning algorithms for segmentation tasks, and is able to produce meaningful correspondences on anatomical surfaces where hand-crafted features are bound to fail.
We present O-CNN, an Octree-based Convolutional Neural Network (CNN) for 3D shape analysis. Built upon the octree representation of 3D shapes, our method takes the average normal vectors of a 3D model sampled in the finest leaf octants as input and performs 3D CNN operations on the octants occupied by the 3D shape surface. We design a novel octree data structure to efficiently store the octant information and CNN features into the graphics memory and execute the entire O-CNN training and evaluation on the GPU. O-CNN supports various CNN structures and works for 3D shapes in different representations. By restraining the computations on the octants occupied by 3D surfaces, the memory and computational costs of the O-CNN grow quadratically as the depth of the octree increases, which makes the 3D CNN feasible for high-resolution 3D models. We compare the performance of the O-CNN with other existing 3D CNN solutions and demonstrate the efficiency and efficacy of O-CNN in three shape analysis tasks, including object classification, shape retrieval, and shape segmentation.
Designing and simulating realistic clothing is challenging. Previous methods addressing the capture of clothing from 3D scans have been limited to single garments and simple motions, lack detail, or require specialized texture patterns. Here we address the problem of capturing regular clothing on fully dressed people in motion. People typically wear multiple pieces of clothing at a time. To estimate the shape of such clothing, track it over time, and render it believably, each garment must be segmented from the others and the body. Our ClothCap approach uses a new multi-part 3D model of clothed bodies, automatically segments each piece of clothing, estimates the minimally clothed body shape and pose under the clothing, and tracks the 3D deformations of the clothing over time. We estimate the garments and their motion from 4D scans; that is, high-resolution 3D scans of the subject in motion at 60 fps. ClothCap is able to capture a clothed person in motion, extract their clothing, and retarget the clothing to new body shapes; this provides a step towards virtual try-on.
Rendering algorithms using Markov chain Monte Carlo (MCMC) currently build upon two different state spaces. One of them is the path space, where the algorithms operate on the vertices of actual transport paths. The other state space is the primary sample space, where the algorithms operate on sequences of numbers used for generating transport paths. While the two state spaces are related by the sampling procedure of transport paths, all existing MCMC rendering algorithms are designed to work within only one of the state spaces. We propose the first framework to provide a comprehensive connection between the path space and the primary sample space. Using this framework, we can combine mutation strategies designed for one space with those designed for the other. As a practical example, we take a combination of manifold exploration and multiplexed Metropolis light transport using our framework. Our results show that the simultaneous use of the two state spaces improves the robustness of MCMC rendering. By combining efficient local exploration in the path space with global jumps in primary sample space, our method achieves more uniform convergence as compared to using only one space.
In this manuscript, inspired by a simpler reformulation of primary sample space Metropolis light transport, we derive a novel family of general Markov chain Monte Carlo algorithms, called charted Metropolis-Hastings, that introduces the notion of sampling charts to extend a given sampling domain, making it easier to sample the desired target distribution and to escape from local maxima through coordinate changes. We further apply the novel algorithms to light transport simulation, obtaining a new type of algorithm, called charted Metropolis light transport, that can be seen as a bridge between primary sample space and path space Metropolis light transport. The new algorithms require only right inverses of the sampling functions to be provided, a property that we believe is crucial to making them practical in the context of light transport simulation.
The human visual system is sensitive to relative differences in luminance, but light transport simulation algorithms based on Metropolis sampling often result in a highly nonuniform relative error distribution over the rendered image. Although this issue has previously been addressed in the context of the Metropolis light transport algorithm, our work focuses on Metropolis photon tracing. We present a new target function (TF) for Metropolis photon tracing that ensures good stratification of photons leading to pixel estimates with equalized relative error. We develop a hierarchical scheme for progressive construction of the TF from paths sampled during rendering. In addition to the image-plane definition used in previous work, our TF can be associated with compact spatial regions. This allows us to take advantage of illumination coherence to more robustly estimate the TF while adapting to geometry discontinuities. To sample from this TF, we design a new replica exchange Metropolis scheme. We apply our algorithm in progressive photon mapping and show that it often outperforms alternative approaches in terms of image quality by a large margin.
We present the first method to efficiently predict antialiasing footprints to pre-filter color-, normal-, and displacement-mapped appearance in the context of multi-bounce global illumination. We derive Fourier spectra for radiance and importance functions that allow us to compute spatial-angular filtering footprints at path vertices for both uni- and bi-directional path construction. We then use these footprints to antialias reflectance modulated by high-resolution maps (such as color and normal maps) encountered along a path. In doing so, we also unify the traditional path-space formulation of light transport with our frequency-space interpretation of global illumination pre-filtering. Our method is fully compatible with all existing single bounce pre-filtering appearance models, not restricted by path length, and easy to implement atop existing path-space renderers. We illustrate its effectiveness on several radiometrically complex scenarios where previous approaches either completely fail or require orders of magnitude more time to arrive at similarly high-quality results.
In this work we present the first algorithm for reconstructing multi-labeled material interfaces that allows for explicit topology control. Our algorithm takes in a set of 2D cross-sectional slices (not necessarily parallel), each partitioned by a curve network into labeled regions representing different material types. For each label, the user has the option to constrain the number of connected components and genus. Our algorithm is able to not only produce a material interface that interpolates the curve networks but also simultaneously satisfy the topological requirements. Our key innovation is defining a space of topology-varying material interfaces, which extends the family of level sets in a scalar function, and developing discrete methods for sampling distinct topologies in this space. Besides specifying topological constraints, the user can steer the algorithm interactively, such as by scribbling. We demonstrate, on synthetic and biological shapes, how our algorithm opens up new opportunities for topology-aware modeling in the multi-labeled context.
Real-time, high-quality, 3D scanning of large-scale scenes is key to mixed reality and robotic applications. However, scalability brings challenges of drift in pose estimation, introducing significant errors in the accumulated model. Approaches often require hours of offline processing to globally correct model errors. Recent online methods demonstrate compelling results but suffer from (1) needing minutes to perform online correction, preventing true real-time use; (2) brittle frame-to-frame (or frame-to-model) pose estimation, resulting in many tracking failures; or (3) supporting only unstructured point-based representations, which limit scan quality and applicability. We systematically address these issues with a novel, real-time, end-to-end reconstruction framework. At its core is a robust pose estimation strategy, optimizing per frame for a global set of camera poses by considering the complete history of RGB-D input with an efficient hierarchical approach. We remove the heavy reliance on temporal tracking and continually localize to the globally optimized frames instead. We contribute a parallelizable optimization framework, which employs correspondences based on sparse features and dense geometric and photometric matching. Our approach estimates globally optimized (i.e., bundle adjusted) poses in real time, supports robust tracking with recovery from gross tracking failures (i.e., relocalization), and re-estimates the 3D model in real time to ensure global consistency, all within a single framework. Our approach outperforms state-of-the-art online systems with quality on par with offline methods, but with unprecedented speed and scan completeness. Our framework leads to a comprehensive online scanning solution for large indoor environments, enabling ease of use and high-quality results.
Today's 3D scanning pipelines can be classified into two overarching categories: offline, high accuracy methods that rely on global optimization to reconstruct complex scenes with hundreds of millions of samples, and online methods that produce real-time but low-quality output, usually from structure-from-motion or depth sensors. The method proposed in this paper is the first to combine the benefits of both approaches, supporting online reconstruction of scenes with hundreds of millions of samples from high-resolution sensing modalities such as structured light or laser scanners. The key property of our algorithm is that it sidesteps the signed-distance computation of classical reconstruction techniques in favor of direct filtering, parametrization, and mesh and texture extraction. All of these steps can be realized using only weak notions of spatial neighborhoods, which allows for an implementation that scales approximately linearly with the size of each dataset that is integrated into a partial reconstruction. Combined, these algorithmic differences enable a drastically more efficient output-driven interactive scanning and reconstruction workflow, where the user is able to see the final quality field-aligned textured mesh during the entirety of the scanning procedure. Holes or parts with registration problems are displayed in real-time to the user and can be easily resolved by adding further localized scans, or by adjusting the input point cloud using our interactive editing tools with immediate visual feedback on the output mesh. We demonstrate the effectiveness of our algorithm in conjunction with a state-of-the-art structured light scanner and optical tracking system and test it on a large variety of challenging models.
We present a benchmark for image-based 3D reconstruction. The benchmark sequences were acquired outside the lab, in realistic conditions. Ground-truth data was captured using an industrial laser scanner. The benchmark includes both outdoor scenes and indoor environments. High-resolution video sequences are provided as input, supporting the development of novel pipelines that take advantage of video input to increase reconstruction fidelity. We report the performance of many image-based 3D reconstruction pipelines on the new benchmark. The results point to exciting challenges and opportunities for future work.
The paper presents a novel three-dimensional shape acquisition and reconstruction method based on the well-known Archimedes equality between fluid displacement and the submerged volume. By repeatedly dipping a shape in liquid in different orientations and measuring its volume displacement, we generate the dip transform: a novel volumetric shape representation that characterizes the object's surface. The key feature of our method is that it employs fluid displacements as the shape sensor. Unlike optical sensors, the liquid has no line-of-sight requirements, it penetrates cavities and hidden parts of the object, as well as transparent and glossy materials, thus bypassing all visibility and optical limitations of conventional scanning devices. Our new scanning approach is implemented using a dipping robot arm and a bath of water, via which it measures the water elevation. We show results of reconstructing complex 3D shapes and evaluate the quality of the reconstruction with respect to the number of dips.
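The short Python sketch below mimics the dip measurement on a voxelized shape: for a set of dipping directions it records the displaced volume as a function of immersion depth. The voxel representation and the uniform sampling of depth levels are illustrative simplifications of the robot-arm-and-water-bath setup.

```python
import numpy as np

def dip_transform(voxels, directions, n_depths=32, voxel_volume=1.0):
    """voxels: (N, 3) array of occupied voxel centers; directions: (K, 3) unit dip axes.
    Returns a (K, n_depths) array of displaced volume as the shape is dipped
    progressively deeper along each direction (a simplified stand-in for the
    physical measurement)."""
    out = np.zeros((len(directions), n_depths))
    for k, d in enumerate(directions):
        depth = voxels @ d                      # signed coordinate along the dip axis
        levels = np.linspace(depth.min(), depth.max(), n_depths)
        out[k] = [(depth <= h).sum() * voxel_volume for h in levels]
    return out

# Toy shape: a solid sphere of radius 8 voxels.
g = np.arange(-10, 11)
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
pts = np.stack([X, Y, Z], -1).reshape(-1, 3)
sphere = pts[(pts ** 2).sum(1) <= 64]
dirs = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
print(dip_transform(sphere, dirs).shape)        # (2, 32)
```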
We present a computational approach to creating animated plushies, soft robotic plush toys specifically-designed to reenact user-authored motions. Our design process is inspired by muscular hydrostat structures, which drive highly versatile motions in many biological systems. We begin by instrumenting simulated plush toys with a large number of small, independently-actuated, virtual muscle-fibers. Through an intuitive posing interface, users then begin animating their plushie. A novel numerical solver, reminiscent of inverse-kinematics, computes optimal contractions for each muscle-fiber such that the soft body of the plushie deforms to best match user input. By analyzing the co-activation patterns of the fibers that contribute most to the plushie's motions, our design system generates physically-realizable winch-tendon networks. Winch-tendon networks model the motorized cable-driven actuation mechanisms that drive the motions of our real-life plush toy prototypes. We demonstrate the effectiveness of our computational approach by co-designing motions and actuation systems for a variety of physically-simulated and fabricated plushies.
We present an interactive design system to create functional mechanical objects. Our computational approach allows novice users to retarget an existing mechanical template to a user-specified input shape. Our proposed representation for a mechanical template encodes a parameterized mechanism, mechanical constraints that ensure a physically valid configuration, spatial relationships of mechanical parts to the user-provided shape, and functional constraints that specify an intended functionality. We provide an intuitive interface and optimization-in-the-loop approach for finding a valid configuration of the mechanism and the shape to ensure that higher-level functional goals are met. Our algorithm interactively optimizes the mechanism while the user manipulates the placement of mechanical components and the shape. Our system allows users to efficiently explore various design choices and to synthesize customized mechanical objects that can be fabricated with rapid prototyping technologies. We demonstrate the efficacy of our approach by retargeting various mechanical templates to different shapes and fabricating the resulting functional mechanical objects.
We present a computational tool for designing compliant mechanisms. Our method takes as input a conventional, rigidly-articulated mechanism defining the topology of the compliant design. This input can be either planar or spatial, and we support a number of common joint types which, whenever possible, are automatically replaced with parameterized flexures. As the technical core of our approach, we describe a number of objectives that shape the design space in a meaningful way, including trajectory matching, collision avoidance, lateral stability, resilience to failure, and minimizing motor torque. Optimal designs in this space are obtained as solutions to an equilibrium-constrained minimization problem that we solve using a variant of sensitivity analysis. We demonstrate our method on a set of examples that range from simple four-bar linkages to full-fledged animatronics, and verify the feasibility of our designs by manufacturing physical prototypes.
Telescoping structures are valuable for a variety of applications where mechanisms must be compact in size and yet easily deployed. So far, however, there has been no systematic study of the types of shapes that can be modeled by telescoping structures, nor practical tools for telescopic design. We present a novel geometric characterization of telescoping curves, and explore how free-form surfaces can be approximated by networks of such curves. In particular we consider piecewise helical space curves with torsional impulses, which significantly generalize the linear telescopes found in typical engineering designs. Based on this principle we develop a system for computational design and fabrication which allows users to explore the space of telescoping structures; inputs to our system include user sketches or arbitrary meshes, which are then converted to a curve skeleton. We prototype applications in animation, fabrication, and robotics, using our system to design a variety of both simulated and fabricated examples.
The realistic simulation of highly-dynamic elastic objects is important for a broad range of applications in computer graphics, engineering and computational fabrication. However, whether simulating flipping toys, jumping robots, prosthetics or quickly moving creatures, performing such simulations in the presence of contact, impact and friction is both time consuming and inaccurate. In this paper we present Dynamics-Aware Coarsening (DAC) and the Boundary Balanced Impact (BBI) model which allow for the accurate simulation of dynamic, elastic objects undergoing both large scale deformation and frictional contact, at rates up to 79 times faster than state-of-the-art methods. DAC and BBI produce simulations that are accurate and fast enough to be used (for the first time) for the computational design of 3D-printable compliant dynamic mechanisms. Thus we demonstrate the efficacy of DAC and BBI by designing and fabricating mechanisms which flip, throw and jump over and onto obstacles as requested.
We present novel designs for virtual and augmented reality near-eye displays based on phase-only holographic projection. Our approach is built on the principles of Fresnel holography and double phase amplitude encoding with additional hardware, phase correction factors, and spatial light modulator encodings to achieve full color, high contrast and low noise holograms with high resolution and true per-pixel focal control. We provide a GPU-accelerated implementation of all holographic computation that integrates with the standard graphics pipeline and enables real-time (≥90 Hz) calculation directly or through eye tracked approximations. A unified focus, aberration correction, and vision correction model, along with a user calibration process, accounts for any optical defects between the light source and retina. We use this optical correction ability not only to fix minor aberrations but to enable truly compact, eyeglasses-like displays with wide fields of view (80°) that would be inaccessible through conventional means. All functionality is evaluated across a series of hardware prototypes; we discuss remaining challenges to incorporate all features into a single device.
Conventional binocular head-mounted displays (HMDs) vary the stimulus to vergence with the information in the picture, while the stimulus to accommodation remains fixed at the apparent distance of the display, as created by the viewing optics. Sustained vergence-accommodation conflict (VAC) has been associated with visual discomfort, motivating numerous proposals for delivering near-correct accommodation cues. We introduce focal surface displays to meet this challenge, augmenting conventional HMDs with a phase-only spatial light modulator (SLM) placed between the display screen and viewing optics. This SLM acts as a dynamic freeform lens, shaping synthesized focal surfaces to conform to the virtual scene geometry. We introduce a framework to decompose target focal stacks and depth maps into one or more pairs of piecewise smooth focal surfaces and underlying display images. We build on recent developments in "optimized blending" to implement a multifocal display that allows the accurate depiction of occluding, semi-transparent, and reflective objects. Practical benefits over prior accommodation-supporting HMDs are demonstrated using a binocular focal surface display employing a liquid crystal on silicon (LCOS) phase SLM and an organic light-emitting diode (OLED) display.
Head-mounted displays (HMDs) often cause discomfort and even nausea. Improving comfort is therefore one of the most significant challenges for the design of such systems. In this paper, we evaluate the effect of different HMD display configurations on discomfort. We do this by designing a device to measure human visual behavior and evaluate viewer comfort. In particular, we focus on one known source of discomfort: the vergence-accommodation (VA) conflict. The VA conflict is the difference between accommodative and vergence response. In HMDs the eyes accommodate to a fixed screen distance while they converge to the simulated distance of the object of interest, requiring the viewer to undo the neural coupling between the two responses. Several methods have been proposed to alleviate the VA conflict, including Depth-of-Field (DoF) rendering, focus-adjustable lenses, and monovision. However, no previous work has investigated whether these solutions actually drive accommodation to the distance of the simulated object. If they did, the VA conflict would disappear, and we expect comfort to improve. We design the first device that allows us to measure accommodation in HMDs, and we use it to obtain accommodation measurements and to conduct a discomfort study. The results of the first experiment demonstrate that only the focus-adjustable-lens design drives accommodation effectively, while other solutions do not drive accommodation to the simulated distance and thus do not resolve the VA conflict. The second experiment measures discomfort. The results validate that the focus-adjustable-lens design improves comfort significantly more than the other solutions.
Although emerging virtual and augmented reality (VR/AR) systems can produce highly immersive experiences, they can also cause visual discomfort, eyestrain, and nausea. One of the sources of these symptoms is a mismatch between vergence and focus cues. In current VR/AR near-eye displays, a stereoscopic image pair drives the vergence state of the human visual system to arbitrary distances, but the accommodation, or focus, state of the eyes is optically driven towards a fixed distance. In this work, we introduce a new display technology, dubbed accommodation-invariant (AI) near-eye displays, to improve the consistency of depth cues in near-eye displays. Rather than producing correct focus cues, AI displays are optically engineered to produce visual stimuli that are invariant to the accommodation state of the eye. The accommodation system can then be driven by stereoscopic cues, and the mismatch between vergence and accommodation state of the eyes is significantly reduced. We validate the principle of operation of AI displays using a prototype display that allows for the accommodation state of users to be measured while they view visual stimuli using multiple different display modes.
We present a method for locally injective seamless parametrization of triangular mesh surfaces of arbitrary genus, with or without boundaries, given desired cone points and rational holonomy angles (multiples of 2π/q for some positive integer q). The basis of the method is an elegant generalization of Tutte's "spring embedding theorem" to this setting. The surface is cut to a disk and a harmonic system with appropriate rotation constraints is solved, resulting in a harmonic global parametrization (HGP) method. We show a remarkable result: that if the triangles adjacent to the cones and boundary are positively oriented, and the correct cone and turning angles are induced, then the resulting map is guaranteed to be locally injective. Guided by this result, we solve the linear system by convex optimization, imposing convexification frames on only the boundary and cone triangles, and minimizing a Laplacian energy to achieve harmonicity. We compare HGP to state-of-the-art methods and see that it is the most robust, and is significantly faster than methods with comparable robustness.
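For context, the sketch below implements the classical Tutte spring embedding of a disk-topology mesh into a convex region, which is the planar building block that the HGP construction generalizes with rotation constraints and convexification frames. It is a minimal illustration with uniform spring weights, not the paper's HGP solver.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def tutte_embedding(n_verts, faces, boundary):
    """Classical Tutte embedding: interior vertices sit at the average of their
    neighbors, boundary vertices are pinned to a convex polygon (a circle)."""
    # Build the adjacency matrix from the triangle faces.
    rows, cols = [], []
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            rows += [i, j]; cols += [j, i]
    W = sp.coo_matrix((np.ones(len(rows)), (rows, cols)),
                      shape=(n_verts, n_verts)).tocsr()
    W.data[:] = 1.0                              # uniform (spring) weights
    L = (sp.diags(np.asarray(W.sum(1)).ravel()) - W).tocsr()

    uv = np.zeros((n_verts, 2))
    t = np.linspace(0, 2 * np.pi, len(boundary), endpoint=False)
    uv[boundary] = np.c_[np.cos(t), np.sin(t)]   # pin boundary to a circle

    interior = np.setdiff1d(np.arange(n_verts), boundary)
    A = L[interior][:, interior]
    b = -L[interior][:, boundary] @ uv[boundary]
    uv[interior] = np.column_stack(
        [spla.spsolve(A.tocsc(), b[:, k]) for k in range(2)])
    return uv

# Tiny usage: a square fan around one interior vertex.
faces = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
print(tutte_embedding(5, faces, boundary=np.array([0, 1, 2, 3])))
```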
This work presents an algorithm for injectively parameterizing surfaces into spherical target domains called spherical orbifolds. Spherical orbifolds are cone surfaces that are generated from symmetry groups of the sphere. The surface is mapped to the spherical orbifold via an extension of Tutte's embedding. This embedding is proven to be bijective under mild additional assumptions, which hold in all experiments performed.
This work also completes the adaptation of Tutte's embedding to orbifolds of the three classic geometries - Euclidean, hyperbolic and spherical - where the first two were recently addressed.
The spherical orbifold embeddings approximate conformal maps and require relatively low computational times. The constant positive curvature of the spherical orbifolds, along with the flexibility of their cone angles, enables producing embeddings with lower isometric distortion compared to their Euclidean counterparts, a fact that makes spherical orbifolds a natural candidate for surface parameterization.
A variety of techniques were proposed to model smooth surfaces based on tensor product splines (e.g. subdivision surfaces, free-form splines, T-splines). Conversion of an input surface into such a representation is commonly achieved by constructing a global seamless parametrization, possibly aligned to a guiding cross-field (e.g. of principal curvature directions), and using this parametrization as domain to construct the spline-based surface.
One fundamental difficulty in designing robust algorithms for this task is that, for common types, e.g. subdivision surfaces (requiring a conforming domain mesh) or T-spline surfaces (requiring a globally consistent knot interval assignment), reliably obtaining a suitable parametrization that has the same topological structure as the guiding field remains a major challenge. Even worse, not all fields admit suitable parametrizations, and no concise conditions are known as to which fields do.
We present a class of surface constructions (T-splines with halfedge knots) and a class of parametrizations (seamless similarity maps) that are, in a sense, a perfect match for the task: for any given guiding field structure, a compatible parametrization of this kind exists and a smooth piecewise rational surface with exactly the same structure as the input field can be constructed from it. As a byproduct, this enables full control over extraordinary points. The construction is backward compatible with classical NURBS. We present efficient algorithms for building discrete conformal similarity maps and associated T-meshes and T-spline surfaces.
We propose a novel technique for computing consistent cross fields on a pair of triangle meshes given an input correspondence, which we use as guiding fields for approximately consistent quadrangulations. Unlike the majority of existing methods, our approach does not assume that the meshes share the same connectivity or even have the same number of vertices, and furthermore does not place any restrictions on the topology (genus) of the shapes. Importantly, our method is robust with respect to small perturbations of the given correspondence, as it only relies on the transportation of real-valued functions and thus avoids the costly and error-prone estimation of the map differential. Key to this robustness is a novel formulation, which relies on the previously-proposed notion of power vectors, and we show how consistency can be enforced without pre-alignment of the local basis frames in which these power vectors are computed. We demonstrate that, using the same formulation, we can compute either a quadrangulation that respects a given symmetry on a single shape or consistent quadrangulations guided by a map across a pair of shapes. We provide quantitative and qualitative comparisons of our method with several baselines and show that it both provides more accurate results and handles more general cases than existing techniques.
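A minimal sketch of the power-vector representation this formulation relies on: a cross (4-RoSy) direction θ is encoded as the complex number e^{i4θ}, so averaging and transporting directions becomes linear and insensitive to the π/2 ambiguity. The snippet below is illustrative only and omits the per-triangle basis transport.

```python
import numpy as np

def to_power_vector(theta, N=4):
    # A cross at angle theta is invariant to rotations by pi/2, so it is
    # represented by the single complex number e^{i*N*theta}.
    return np.exp(1j * N * theta)

def from_power_vector(p, N=4):
    # Return one canonical representative of the N equivalent angles.
    return np.angle(p) / N

# Smoothing/consistency becomes simple averaging in power-vector space:
angles = np.array([0.02, np.pi / 2 + 0.05, np.pi - 0.03])   # nearly the same cross
avg = to_power_vector(angles).mean()
print(from_power_vector(avg))    # close to 0: the crosses agree despite pi/2 jumps
```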
We introduce a simple and effective deep learning approach to automatically generate natural looking speech animation that synchronizes to input speech. Our approach uses a sliding window predictor that learns arbitrary nonlinear mappings from phoneme label input sequences to mouth movements in a way that accurately captures natural motion and visual coarticulation effects. Our deep learning approach enjoys several attractive properties: it runs in real-time, requires minimal parameter tuning, generalizes well to novel input speech sequences, is easily edited to create stylized and emotional speech, and is compatible with existing animation retargeting approaches. One important focus of our work is to develop an effective approach for speech animation that can be easily integrated into existing production pipelines. We provide a detailed description of our end-to-end approach, including machine learning design decisions. Generalized speech animation results are demonstrated over a wide range of animation clips on a variety of characters and voices, including singing and foreign language input. Our approach can also generate on-demand speech animation in real-time from user speech input.
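A tiny sketch of the sliding-window input construction described above, assuming phonemes are given as a per-frame label sequence. The window size and edge padding are illustrative choices, and the regressor that maps each window to mouth-pose parameters is omitted.

```python
def sliding_windows(phonemes, window=11):
    """Build one fixed-size window of phoneme labels per output frame; a
    regressor maps each window to mouth-pose parameters, letting the model see
    surrounding context and capture coarticulation (window size illustrative)."""
    r = window // 2
    padded = [phonemes[0]] * r + list(phonemes) + [phonemes[-1]] * r
    return [padded[i:i + window] for i in range(len(phonemes))]

print(sliding_windows("sil-h-eh-l-ow".split("-"), window=5))
```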
We present a machine learning technique for driving 3D facial animation by audio input in real time and with low latency. Our deep neural network learns a mapping from input waveforms to the 3D vertex coordinates of a face model, and simultaneously discovers a compact, latent code that disambiguates the variations in facial expression that cannot be explained by the audio alone. During inference, the latent code can be used as an intuitive control for the emotional state of the face puppet.
We train our network with 3--5 minutes of high-quality animation data obtained using traditional, vision-based performance capture methods. Even though our primary goal is to model the speaking style of a single actor, our model yields reasonable results even when driven with audio from other speakers with different gender, accent, or language, as we demonstrate with a user study. The results are applicable to in-game dialogue, low-cost localization, virtual reality avatars, and telepresence.
Given audio of President Barack Obama, we synthesize a high quality video of him speaking with accurate lip sync, composited into a target video clip. Trained on many hours of his weekly address footage, a recurrent neural network learns the mapping from raw audio features to mouth shapes. Given the mouth shape at each time instant, we synthesize high quality mouth texture, and composite it with proper 3D pose matching to change what he appears to be saying in a target video to match the input audio track. Our approach produces photorealistic results.
Editing audio narration using conventional software typically involves many painstaking low-level manipulations. Some state-of-the-art systems allow the editor to work in a text transcript of the narration, and perform select, cut, copy and paste operations directly in the transcript; these operations are then automatically applied to the waveform in a straightforward manner. However, an obvious gap in the text-based interface is the ability to type new words not appearing in the transcript, for example inserting a new word for emphasis or replacing a misspoken word. While high-quality voice synthesizers exist today, the challenge is to synthesize the new word in a voice that matches the rest of the narration. This paper presents a system that can synthesize a new word or short phrase such that it blends seamlessly in the context of the existing narration. Our approach is to use a text-to-speech synthesizer to say the word in a generic voice, and then use voice conversion to convert it into a voice that matches the narration. Offering a range of degrees of control to the editor, our interface supports fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and even guidance by the editor's own voice. The paper presents studies showing that the output of our method is preferred over baseline methods and often indistinguishable from the original voice.
Regression-based algorithms have been shown to be effective at denoising Monte Carlo (MC) renderings by leveraging their inexpensive by-products (e.g., feature buffers). However, when using higher-order models to handle complex cases, these techniques often overfit to noise in the input. For this reason, supervised learning methods have been proposed that train on a large collection of reference examples, but they use explicit filters that limit their denoising ability. To address these problems, we propose a novel, supervised learning approach that allows the filtering kernel to be more complex and general by leveraging a deep convolutional neural network (CNN) architecture. In one embodiment of our framework, the CNN directly predicts the final denoised pixel value as a highly non-linear combination of the input features. In a second approach, we introduce a novel, kernel-prediction network which uses the CNN to estimate the local weighting kernels used to compute each denoised pixel from its neighbors. We train and evaluate our networks on production data and observe improvements over state-of-the-art MC denoisers, showing that our methods generalize well to a variety of scenes. We conclude by analyzing various components of our architecture and identify areas of further research in deep learning for MC denoising.
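The sketch below illustrates the reconstruction step of a kernel-prediction approach: per-pixel kernel logits, which a trained CNN would output (random values stand in here), are softmax-normalized and used to average each pixel's neighborhood. The network itself and the loss are omitted; this is an assumption-laden sketch, not the paper's architecture.

```python
import numpy as np

def apply_predicted_kernels(noisy, kernel_logits):
    """noisy: (H, W, 3); kernel_logits: (H, W, k*k) raw per-pixel outputs.
    Each pixel is reconstructed as a softmax-weighted average of its k x k
    neighborhood in the noisy image."""
    H, W, _ = noisy.shape
    k = int(np.sqrt(kernel_logits.shape[-1]))
    r = k // 2
    w = np.exp(kernel_logits - kernel_logits.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                        # per-pixel softmax
    padded = np.pad(noisy, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.zeros_like(noisy)
    for dy in range(k):
        for dx in range(k):
            shifted = padded[dy:dy + H, dx:dx + W]
            out += shifted * w[..., dy * k + dx, None]
    return out

# Toy usage with random logits (a trained network would predict these).
noisy = np.random.rand(32, 32, 3)
logits = np.random.randn(32, 32, 25)                     # 5x5 kernels
print(apply_predicted_kernels(noisy, logits).shape)
```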
We describe a machine learning technique for reconstructing image sequences rendered using Monte Carlo methods. Our primary focus is on reconstruction of global illumination with extremely low sampling budgets at interactive rates. Motivated by recent advances in image restoration with deep convolutional networks, we propose a variant of these networks better suited to the class of noise present in Monte Carlo rendering. We allow for much larger pixel neighborhoods to be taken into account, while also improving execution speed by an order of magnitude. Our primary contribution is the addition of recurrent connections to the network in order to drastically improve temporal stability for sequences of sparsely sampled input images. Our method also has the desirable property of automatically modeling relationships based on auxiliary per-pixel input channels, such as depth and normals. We show significantly higher quality results compared to existing methods that run at comparable speeds, and furthermore argue that there is a clear path to making our method run at real-time rates in the near future.
Implementing Monte Carlo integration requires significant domain expertise. While simple samplers, such as unidirectional path tracing, are relatively forgiving, more complex algorithms, such as bidirectional path tracing or Metropolis methods, are notoriously difficult to implement correctly. We propose Aether, an embedded domain specific language for Monte Carlo integration, which offers primitives for writing concise and correct-by-construction sampling and probability code. The user is tasked with writing sampling code, while our compiler automatically generates the code necessary for evaluating PDFs as well as the bookkeeping and combination of multiple sampling strategies. Our language focuses on ease of implementation for rapid exploration, at the cost of run time performance. We demonstrate the effectiveness of the language by implementing several challenging rendering algorithms as well as a new algorithm, which would otherwise be prohibitively difficult.
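As an example of the bookkeeping such a compiler automates, the following Python snippet combines two hand-written sampling strategies with the balance heuristic of multiple importance sampling. This is not Aether-generated code; it simply shows, written out by hand, the kind of PDF evaluation and combination logic the language derives automatically.

```python
import random, math

def mis_estimate(f, strategies, n_per_strategy=1000):
    """Multiple importance sampling with the balance heuristic.
    `strategies` is a list of (sample, pdf) pairs: sample() draws x and pdf(x)
    evaluates that strategy's density at x."""
    total = 0.0
    for sample, pdf in strategies:
        for _ in range(n_per_strategy):
            x = sample()
            denom = sum(p(x) for _, p in strategies)     # balance heuristic weight
            total += f(x) / denom
    return total / n_per_strategy

# Toy usage: integrate f on [0, 1] with a uniform and a linear (pdf = 2x) strategy.
f = lambda x: x * x
uniform = (lambda: random.random(), lambda x: 1.0)
linear  = (lambda: math.sqrt(random.random()), lambda x: 2.0 * x)
print(mis_estimate(f, [uniform, linear]))                 # ~ 1/3
```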
Modern game engines seek to balance the conflicting goals of high rendering performance and productive software development. To improve CPU performance, the most recent generation of real-time graphics APIs provide new primitives for performing efficient batch updates to shader parameters. However, modern game engines featuring large shader codebases have struggled to take advantage of these benefits. The problem is that even though shader parameters can be organized into efficient modules bound to the pipeline at various frequencies, modern shading languages lack corresponding primitives to organize shader logic (requiring these parameters) into modules as well. The result is that complex shaders are typically compiled to use a monolithic block of parameters, defeating the design, and performance benefits, of the new parameter binding API. In this paper we propose to resolve this mismatch by introducing shader components, a first-class unit of modularity in a shader program that encapsulates a unit of shader logic and the parameters that must be bound when that logic is in use. We show that by building sophisticated shaders out of components, we can retain essential aspects of performance (static specialization of the shader logic in use and efficient update of parameters at component granularity) while maintaining the modular shader code structure that is desirable in today's high-end game engines.
Parametric surfaces are an essential modeling tool in computer aided design and movie production. Even though their use is well established in industry, generating ray-traced images adds significant cost in time and memory consumption. Ray tracing such surfaces is usually accomplished by subdividing the surfaces on the fly, or by conversion to a polygonal representation. However, on-the-fly subdivision is computationally very expensive, whereas polygonal meshes require large amounts of memory. This is a particular problem for parametric surfaces with displacement, where very fine tessellation is required to faithfully represent the shape. Hence, memory restrictions are the major challenge in production rendering. In this article, we present a novel solution to this problem. We propose a compression scheme for a priori Bounding Volume Hierarchies (BVHs) on parametric patches, that reduces the data required for the hierarchy by a factor of up to 48. We further propose an approximate evaluation method that does not require leaf geometry, yielding an overall reduction of memory consumption by a factor of 60 over regular BVHs on indexed face sets and by a factor of 16 over established state-of-the-art compression schemes. Alternatively, our compression can simply be applied to a standard BVH while keeping the leaf geometry, resulting in a compression rate of up to 2:1 over current methods. Although decompression generates additional costs during traversal, we can manage very complex scenes even on the memory restrictive GPU at competitive render times.
We propose a novel unsteady Stokes solver for coupled viscous and pressure forces in grid-based liquid animation which yields greater accuracy and visual realism than previously achieved. Modern fluid simulators treat viscosity and pressure in separate solver stages, which reduces accuracy and yields incorrect free surface behavior. Our proposed implicit variational formulation of the Stokes problem leads to a symmetric positive definite linear system that gives properly coupled forces, provides unconditional stability, and treats difficult boundary conditions naturally through simple volume weights. Surface tension and moving solid boundaries are also easily incorporated. Qualitatively, we show that our method recovers the characteristic rope coiling instability of viscous liquids and preserves fine surface details, while previous grid-based schemes do not. Quantitatively, we demonstrate that our method is convergent through grid refinement studies on analytical problems in two dimensions. We conclude by offering practical guidelines for choosing an appropriate viscous solver, based on the scenario to be animated and the computational costs of different methods.
In this paper we introduce a novel method for adaptive incompressible SPH simulation. Instead of using a scheme with a number of fixed particle sizes or levels, our approach allows continuous particle sizes. This enables us to define optimal particle masses with respect to, e.g., the distance to the fluid's surface. A required change in mass due to the dynamics of the fluid is properly and stably handled by our scheme of mass redistribution. This includes temporally smooth changes in particle masses as well as sudden mass variations in regions of high flow dynamics. Our approach guarantees low spatial variations in particle size, which is a core property in order to achieve large adaptivity ratios for incompressible fluid simulations. Conceptually, our approach allows for infinite continuous adaptivity; in practice, we achieve adaptivity ratios of up to five orders of magnitude while remaining mass preserving and numerically stable, yielding unprecedented, vivid surface detail at comparably low computational cost and moderate particle counts.
This paper presents a method for simulating water surface waves as a displacement field on a 2D domain. Our method relies on Lagrangian particles that carry packets of water wave energy; each packet carries information about an entire group of wave trains, as opposed to only a single wave crest. Our approach is unconditionally stable and can simulate high resolution geometric details. This approach also presents a straightforward interface for artistic control, because it is essentially a particle system with intuitive parameters like wavelength and amplitude. Our implementation parallelizes well and runs in real time for moderately challenging scenarios.
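A minimal 1D sketch of the wave-packet idea, assuming deep-water dispersion and a Gaussian envelope per packet: each particle carries a wavelength and amplitude, is advected at its group velocity, and the surface displacement is the sum of all packets. The per-packet parameters are illustrative; the paper tracks richer data per packet.

```python
import math

G = 9.81  # gravitational acceleration

class WavePacket:
    """One particle carrying a whole wave group (deep-water dispersion assumed)."""
    def __init__(self, x, wavelength, amplitude, width):
        self.x, self.amp, self.width = x, amplitude, width
        self.k = 2.0 * math.pi / wavelength
        self.omega = math.sqrt(G * self.k)             # deep-water dispersion relation
        self.group_speed = 0.5 * self.omega / self.k   # packets travel at group velocity

    def advect(self, dt):
        self.x += self.group_speed * dt

    def height(self, x, t):
        envelope = math.exp(-((x - self.x) / self.width) ** 2)
        return self.amp * envelope * math.cos(self.k * x - self.omega * t)

def surface_height(packets, x, t):
    return sum(p.height(x, t) for p in packets)        # displacement = sum of packets

# Toy usage: two packets with different wavelengths on a 1D domain.
packets = [WavePacket(2.0, 1.0, 0.05, 1.5), WavePacket(6.0, 3.0, 0.10, 3.0)]
for step in range(100):
    for p in packets:
        p.advect(0.01)
print([round(surface_height(packets, x, 1.0), 4) for x in range(0, 12, 2)])
```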
We present a multi-scale method for simulating incompressible gases in three dimensions with resolution variation suitable for perspective cameras and regions of importance. The dynamics is derived from the vorticity equation. Lagrangian particles are created, modified and deleted in a manner that handles advection with buoyancy and viscosity. Boundaries and deformable object collisions are modeled with the source and doublet panel method. Our acceleration structure is based on the FMM (Fast Multipole Method), but with a varying size to account for non-uniform sampling. Because the dynamics of our method is voxel-free, we can freely specify the voxel resolution of the output density and velocity while keeping the main shapes and timing.
We present a multi-species model for the simulation of gravity-driven landslides and debris flows with porous sand and water interactions. We use continuum mixture theory to describe individual phases, where each species individually obeys conservation of mass and momentum, and the species are coupled through a momentum exchange term. Water is modeled as a weakly compressible fluid and sand is modeled with an elastoplastic law whose cohesion varies with water saturation. We use a two-grid Material Point Method to discretize the governing equations. The momentum exchange term in the mixture theory is relatively stiff and we use semi-implicit time stepping to avoid associated small time steps. Our semi-implicit treatment is explicit in plasticity and preserves symmetry of force linearizations. We develop a novel regularization of the elastic part of the sand constitutive model that better mimics plasticity during the implicit solve to prevent numerical cohesion artifacts that would otherwise have occurred. Lastly, we develop an improved return mapping for sand plasticity that prevents volume gain artifacts in the traditional Drucker-Prager model.
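For reference, the snippet below implements the traditional Hencky-strain Drucker-Prager return mapping used in prior MPM sand work, i.e., the variant whose volume-gain artifacts the paper's improved mapping addresses. The parameter values in the usage line are illustrative.

```python
import numpy as np

def drucker_prager_return_mapping(eps, mu, lam, alpha):
    """Project trial Hencky (logarithmic) principal strains `eps` back onto the
    Drucker-Prager yield surface (traditional mapping, not the paper's improved
    variant). mu, lam are Lame parameters; alpha is the friction coefficient."""
    d = len(eps)
    trace = eps.sum()
    if trace > 0.0:                      # expansion: project to the cone tip
        return np.zeros_like(eps)
    dev = eps - trace / d                # deviatoric part of the strain
    dev_norm = np.linalg.norm(dev)
    dg = dev_norm + (d * lam + 2.0 * mu) / (2.0 * mu) * trace * alpha
    if dg <= 0.0:                        # inside the yield surface: purely elastic
        return eps
    return eps - dg * dev / dev_norm     # plastic flow back onto the surface

print(drucker_prager_return_mapping(np.array([-0.08, 0.02, 0.01]),
                                    mu=1e5, lam=1e5, alpha=0.6))
```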
This article introduces a programmable method for designing stationary 2D arrangements for element textures, namely textures made of small geometric elements. These textures are ubiquitous in numerous applications of computer-aided illustration. Previous methods, whether they be example-based or layout-based, lack control and can produce a limited range of possible arrangements. Our approach targets technical artists who will design an arrangement by writing a script. These scripts use three types of operators: partitioning operators for defining the broad-scale organization of the arrangement, mapping operators for controlling the local organization of elements, and merging operators for mixing different arrangements. These operators are designed so as to guarantee a stationary result, meaning that the produced arrangements will always be repetitive. We show that this simple set of operators is sufficient to reach a much broader variety of arrangements than previous methods. Editing the script leads to predictable changes in the synthesized arrangement, which allows an easy iterative design of complex structures. Finally, our operator set is extensible and can be adapted to application-dependent needs.
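A toy Python sketch of the scripting idea, with hypothetical operator names and signatures: a partitioning operator, a mapping operator, and a merging operator are composed in a two-line "script". The paper's operator set is richer and comes with stationarity guarantees that this sketch does not attempt to reproduce.

```python
import random

def grid_partition(extent, cell):
    """Partitioning operator: split the domain into square cells."""
    nx, ny = int(extent[0] / cell), int(extent[1] / cell)
    return [((i * cell, j * cell), cell) for i in range(nx) for j in range(ny)]

def scatter_in_cells(cells, count, seed=0):
    """Mapping operator: place `count` elements uniformly inside each cell."""
    rng = random.Random(seed)
    pts = []
    for (x0, y0), s in cells:
        pts += [(x0 + rng.random() * s, y0 + rng.random() * s) for _ in range(count)]
    return pts

def merge(*arrangements):
    """Merging operator: union of several arrangements."""
    return [p for a in arrangements for p in a]

# A two-line "script": coarse sparse dots merged with a fine dense layer.
coarse = scatter_in_cells(grid_partition((10.0, 10.0), 2.0), 1)
fine = scatter_in_cells(grid_partition((10.0, 10.0), 0.5), 1, seed=1)
arrangement = merge(coarse, fine)
print(len(arrangement))
```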
Example-based texture synthesis has been an active research problem for over two decades. Still, synthesizing textures with nonlocal structures remains a challenge. In this article, we present a texture synthesis technique that builds upon convolutional neural networks and statistics extracted from pretrained deep features. We introduce a structural energy, based on correlations among deep features, which captures the self-similarities and regularities characterizing the texture. Specifically, we show that our technique can synthesize textures that have structures at various scales, local and nonlocal, as well as combinations of the two.
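For readers unfamiliar with correlation-based texture statistics, the minimal numpy sketch below shows the classic Gram-matrix statistics of deep features plus a per-channel spatial autocorrelation, the kind of self-similarity signal such structural energies build on. This is an illustrative stand-in under those assumptions, not the paper's exact energy.

```python
import numpy as np

def gram_matrix(features):
    """features: (C, H, W) deep feature maps. Returns the (C, C) matrix of
    channel correlations -- the classic texture statistics of deep features."""
    C, H, W = features.shape
    F = features.reshape(C, H * W)
    return F @ F.T / (H * W)

def spatial_autocorrelation(features):
    """Per-channel circular spatial autocorrelation (via FFT), a simple
    self-similarity statistic that is sensitive to non-local structure."""
    Fhat = np.fft.fft2(features, axes=(-2, -1))
    return np.real(np.fft.ifft2(Fhat * np.conj(Fhat), axes=(-2, -1)))
```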
Image-based texture mapping is a common way of producing texture maps for geometric models of real-world objects. Although a high-quality texture map can be easily computed for accurate geometry and calibrated cameras, the quality of the texture map degrades significantly in the presence of inaccuracies. In this paper, we address this problem by proposing a novel global patch-based optimization system to synthesize the aligned images. Specifically, we use patch-based synthesis to reconstruct a set of photometrically-consistent aligned images by drawing information from the source images. Our optimization system is simple, flexible, and better suited to correcting large misalignments than other techniques such as local warping. To solve the optimization, we propose a two-step approach consisting of patch search-and-vote followed by reconstruction. Experimental results show that our approach produces higher-quality texture maps than existing techniques for objects scanned by consumer depth cameras such as the Intel RealSense. Moreover, we demonstrate that our system can be used for texture editing tasks such as hole-filling and reshuffling, as well as multiview camouflage.
We present a novel approach for image completion that results in images that are both locally and globally consistent. With a fully-convolutional neural network, we can complete images of arbitrary resolutions by filling in missing regions of any shape. To train this image completion network to be consistent, we use global and local context discriminators that are trained to distinguish real images from completed ones. The global discriminator looks at the entire image to assess whether it is coherent as a whole, while the local discriminator looks only at a small area centered on the completed region to ensure the local consistency of the generated patches. The image completion network is then trained to fool both context discriminator networks, which requires it to generate images that are indistinguishable from real ones with regard to overall consistency as well as fine details. We show that our approach can be used to complete a wide variety of scenes. Furthermore, in contrast with patch-based approaches such as PatchMatch, our approach can generate fragments that do not appear elsewhere in the image, which allows us to naturally complete images of objects with familiar and highly specific structures, such as faces.
Natural images often exhibit symmetries that should be taken into account when editing them. In this paper we present Nautilus --- a method for automatically identifying symmetric regions in an image along with their corresponding symmetry transformations. We compute dense local similarity symmetry transformations using a novel variant of the Generalised PatchMatch algorithm that uses Metropolis-Hastings sampling. We combine and refine these local symmetries using an extended Lucas-Kanade algorithm to compute regional transformations and their spatial extents. Our approach produces dense estimates of complex symmetries that are combinations of translation, rotation, scale, and reflection under perspective distortion. This enables a number of automatic symmetry-aware image editing applications including inpainting, rectification, beautification, and segmentation, and we demonstrate state-of-the-art applications for each of them.
Rendering translucent materials with physically based Monte Carlo methods tends to be computationally expensive due to the long chains of volumetric scattering interactions. In the case of strongly forward scattering materials, the problem is compounded, since each scattering interaction becomes highly anisotropic and near-specular. Various well-known approaches try to avoid the resulting sampling problem through analytical approximations based on diffusion theory. Although these methods are computationally efficient, their assumption of diffusive, isotropic scattering can lead to considerable errors when rendering forward scattering materials, even in the optically dense limit. In this paper, we present an analytical subsurface scattering model derived with the explicit assumption of strong forward scattering. Our model is not based on diffusion theory, but follows from a connection that we identified between the functional integral formulation of radiative transport and the partition function of a worm-like chain in polymer physics. The resulting model does not need a separate Monte Carlo solution for unscattered or single-scattered contributions, nor does it require ad-hoc regularization procedures. It has a single singularity by design, corresponding to the initial unscattered propagation, which can be accounted for by the extensive analytical importance sampling scheme that we provide. Our model captures the full behavior of forward scattering media, ranging from unscattered straight-line propagation to the fully diffusive limit. Moreover, we derive a novel forward scattering BRDF as a limiting case of our subsurface scattering model, which can be used in a level-of-detail hierarchy. We show how our model can be integrated into existing Monte Carlo rendering algorithms, and make comparisons to previous approaches.
Rendering explosions with self-illumination is a challenging problem. Explosions contain animated volumetric light sources immersed in animated smoke that casts volumetric shadows, which play an essential role and are expensive to compute. We propose an efficient solution that redefines this problem as rendering with many animated lights by converting the volumetric lighting data into a large number of point lights. Focusing on temporal coherency to avoid flickering in animations, we introduce a lighting grid hierarchy for approximating the volumetric illumination at different resolutions. Using this structure, we can efficiently approximate the lighting at any point inside or outside of the explosion volume as a mixture of lighting contributions from all levels of the hierarchy. As a result, we are able to capture high-frequency details of local illumination, as well as the potentially strong impact of distant illumination. Most importantly, this hierarchical structure allows us to efficiently precompute volumetric shadows, which substantially accelerates the lighting computation. Finally, we provide a scalable approach for computing the multiple scattering of light within the smoke volume using our lighting grid hierarchy. Temporal coherency is achieved by relying on continuous formulations at all stages of the lighting approximation. We show that our method is efficient and effective at approximating the self-illumination of explosions, with results visually indistinguishable from path tracing. We also show that our method can be applied to other problems involving a large number of (animated) point lights.
We present two novel unbiased techniques for sampling free paths in heterogeneous participating media. Our decomposition tracking accelerates free-path construction by splitting the medium into a control component and a residual component and sampling each of them separately. To minimize expensive evaluations of spatially varying collision coefficients, we define the control component to allow constructing free paths in closed form. The residual heterogeneous component is then homogenized by adding a fictitious medium and handled using weighted delta tracking, which removes the need for computing strict bounds of the extinction function. Our second contribution, spectral tracking, enables efficient light transport simulation in chromatic media. We modify free-path distributions to minimize the fluctuation of path throughputs and thereby reduce the estimation variance. To demonstrate the correctness of our algorithms, we derive them directly from the radiative transfer equation by extending the integral formulation of null-collision algorithms recently developed in reactor physics. This mathematical framework, which we thoroughly review, encompasses existing trackers and postulates an entire family of new estimators for solving transport problems; our algorithms are examples of such. We analyze the proposed methods in canonical settings and on production scenes, and compare to the current state of the art in simulating light transport in heterogeneous participating media.
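As background for the trackers discussed above, here is a minimal sketch of classical delta (Woodcock) tracking for sampling a free path in a heterogeneous medium with a known majorant extinction. The extinction callable and majorant value are hypothetical inputs; this is the baseline that decomposition and spectral tracking improve upon, not the paper's estimators themselves.

```python
import numpy as np

def delta_tracking_free_path(origin, direction, sigma_t, sigma_majorant, rng, t_max=np.inf):
    """Sample a free-path distance along a ray through a heterogeneous medium.
    sigma_t(x): spatially varying extinction; sigma_majorant: an upper bound on sigma_t.
    Returns the collision distance, or None if the ray escapes before t_max."""
    t = 0.0
    while True:
        t -= np.log(1.0 - rng.random()) / sigma_majorant   # tentative collision distance
        if t >= t_max:
            return None
        x = origin + t * direction
        # Accept a real collision with probability sigma_t / sigma_majorant;
        # otherwise it was a null collision and we keep marching.
        if rng.random() < sigma_t(x) / sigma_majorant:
            return t
```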
We develop a theory of volumetric density estimation which generalizes prior photon point (0D) and beam (1D) approaches to a broader class of estimators using "nD" samples along photon and/or camera subpaths. Volumetric photon mapping performs density estimation by point sampling propagation distances within the medium and performing density estimation over the generated points (0D). Beam-based (1D) approaches consider the expected value of this distance sampling process along the last camera and/or light subpath segments. Our theory shows how to replace propagation distance sampling steps across multiple bounces to form higher-dimensional samples such as photon planes (2D), photon volumes (3D), their camera path equivalents, and beyond. We perform a theoretical error analysis which reveals that in scenarios where beams already outperform points, each additional dimension of nD samples compounds these benefits further. Moreover, each additional sample dimension reduces the required dimensionality of the blurring needed for density estimation, allowing us to formulate, for the first time, fully unbiased forms of volumetric photon mapping. We demonstrate practical implementations of several of the new estimators our theory predicts, including both biased and unbiased variants, and show that they outperform state-of-the-art beam-based volumetric photon mapping by a factor of 2.4--40×.
We present a framework for designing shapes from diverse combinatorial patterns, in which the vertex 1-rings and the faces are as rotationally symmetric as possible; we define such meshes as regular. Our algorithm computes the geometry that brings out the symmetries encoded in the combinatorics, allowing designers and artists to envision and realize original meshes with great aesthetic qualities. Our method is general and applicable to meshes of arbitrary topology and connectivity, from triangle meshes to general polygonal meshes. The designer controls the result by manipulating and constraining vertex positions. We offer a novel characterization of regularity, using quaternionic ratios of mesh edges, and optimize meshes to be as regular as possible according to this characterization. Finally, we provide a mathematical analysis of these regular meshes, and show how they relate to concepts such as the discrete Willmore energy and connectivity shapes.
We propose a robust and efficient field-aligned volumetric meshing algorithm that produces hex-dominant meshes, i.e., meshes that are predominantly composed of hexahedral elements while containing a small number of irregular polyhedra. The latter are placed according to the singularities of two optimized guiding fields, which allow our method to generate meshes with an exceptionally high degree of isotropy.
The field design phase of our method relies on a compact quaternionic representation of volumetric octa-fields and a corresponding optimization that explicitly models the discrete matchings between neighboring elements. This optimization naturally supports alignment constraints and scales to very large datasets. We also propose a novel extraction technique that uses field-guided mesh simplification to convert the optimized fields into a hex-dominant output mesh. Each simplification operation maintains topological validity as an invariant, ensuring manifold output. These steps easily generalize to other dimensions or representations, and we show how they can be an asset in existing 2D surface meshing techniques.
Our method can automatically and robustly convert any tetrahedral mesh into an isotropic hex-dominant mesh and (with minor modifications) can also convert any triangle mesh into a corresponding isotropic quad-dominant mesh, preserving its genus, number of holes, and manifoldness. We demonstrate the benefits of our algorithm on a large collection of shapes provided in the supplemental material along with all generated results.
This article introduces a method that generates a hexahedral-dominant mesh from an input tetrahedral mesh. It follows a three-step pipeline similar to the one proposed by Carrier Baudoin et al.: (1) generate a frame field, (2) generate a pointset P that is mostly organized on a regular grid locally aligned with the frame field, and (3) generate the hexahedral-dominant mesh by recombining the tetrahedra obtained from the constrained Delaunay triangulation of P.
For step (1), we use a state-of-the-art algorithm to generate a smooth frame field. For step (2), we introduce an extension of Periodic Global Parameterization to the volumetric case. As compared with other global parameterization methods (such as CubeCover), our method relaxes some global constraints to avoid creating degenerate elements, at the expense of introducing some singularities that are meshed using non-hexahedral elements. For step (3), we build on the formalism introduced by Meshkat and Talmor, fill in a gap in their proof, and provide a complete enumeration of all the possible recombinations, as well as an algorithm that efficiently detects all the matches in a tetrahedral mesh.
The method is evaluated and compared with the state of the art on a database of examples with various mesh complexities, ranging from academic examples to real industrial cases. Compared with the method of Carrier Baudoin et al., our method achieves better scores on classical quality criteria for hexahedral-dominant meshes (hexahedral proportion, scaled Jacobian, etc.). The method also shows better robustness than CubeCover and its derivatives when applied to complicated industrial models.
The computation of smooth fields of orthogonal directions within a volume is a critical step in hexahedral mesh generation, used to guide placement of edges and singularities. While this problem shares high-level structure with surface-based frame field problems, critical aspects are lost when extending to volumes, while new structure from the flat Euclidean metric emerges. Taking these considerations into account, this article presents an algorithm for computing such “octahedral” fields. Unlike existing approaches, our formulation achieves infinite resolution in the interior of the volume via the boundary element method (BEM), continuously assigning frames to points in the interior from only a triangle mesh discretization of the boundary. The end result is an orthogonal direction field that can be sampled anywhere inside the mesh, with smooth variation and singular structure in the interior, even with a coarse boundary. We illustrate our computed frames on a number of challenging test geometries. Since the octahedral frame field problem is relatively new, we also contribute a thorough discussion of theoretical and practical challenges unique to this problem.
We present an approach to generate plausible acoustic effects at interactive rates in large dynamic environments containing many sound sources. Our formulation combines listener-based backward ray tracing with sound source clustering and hybrid audio rendering to handle complex scenes. We present a new algorithm for dynamic late reverberation that performs high-order ray tracing from the listener against spherical sound sources. We achieve sublinear scaling with the number of sources by clustering distant sound sources and taking relative visibility into account. We also describe a hybrid convolution-based audio rendering technique that can process hundreds of thousands of sound paths at interactive rates. We demonstrate the performance on many indoor and outdoor scenes with up to 200 sound sources. In practice, our algorithm can compute more than 50 reflection orders at interactive rates on a multicore PC, and we observe a 5x speedup over prior geometric sound propagation algorithms.
Sound generation methods, such as linear modal synthesis, can sonify a wide range of physics-based animation of solid objects, resolving vibrations and sound radiation from various structures. However, elastic rods are an important computer animation primitive for which prior sound synthesis methods, such as modal synthesis, are ill-suited for several reasons: large displacements, nonlinear vibrations, dispersion effects, and the geometrically singular nature of rods.
In this paper, we present physically based methods for simultaneous generation of animation and sound for deformable rods. We draw on Kirchhoff theory to simplify the representation of rod dynamics and introduce a generalized dipole model to calculate the spatially varying acoustic radiation. In doing so, we drastically decrease the amount of precomputation required (in some cases eliminating it completely), while being able to resolve sound radiation for arbitrary body deformations encountered in computer animation. We present several examples, including challenging scenes involving thousands of highly coupled frictional contacts.
We present a new integration algorithm for the accurate and efficient solution of stiff elastodynamic problems governed by the second-order ordinary differential equations of structural mechanics. Current methods have the shortcoming that their performance is highly dependent on the numerical stiffness of the underlying system, which often leads to unrealistic behavior or a significant loss of efficiency. To overcome these limitations, we present a new integration method based on a mathematical reformulation of the underlying differential equations, an exponential treatment of the full nonlinear forcing operator (as opposed to more standard partially implicit or exponential approaches), and the utilization of the concept of stiff accuracy, which ensures that the efficiency of the simulations is significantly less sensitive to increased stiffness. As a consequence, we are able to dramatically accelerate the simulation of stiff systems compared to established integrators and significantly increase the overall accuracy. The advantageous behavior of this approach is demonstrated on a broad spectrum of complex examples such as deformable bodies, textiles, bristles, and human hair. Our easily parallelizable integrator enables more complex and realistic models to be explored in visual computing without compromising efficiency.
We present a new method for real-time physics-based simulation supporting many different types of hyperelastic materials. Previous methods such as Position-Based or Projective Dynamics are fast but support only a limited selection of materials; even classical materials such as the Neo-Hookean elasticity are not supported. Recently, Xu et al. [2015] introduced new “spline-based materials” that can be easily controlled by artists to achieve desired animation effects. Simulation of these types of materials currently relies on Newton’s method, which is slow, even with only one iteration per timestep. In this article, we show that Projective Dynamics can be interpreted as a quasi-Newton method. This insight enables very efficient simulation of a large class of hyperelastic materials, including the Neo-Hookean, spline-based materials, and others. The quasi-Newton interpretation also allows us to leverage ideas from numerical optimization. In particular, we show that our solver can be further accelerated using L-BFGS updates (Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm). Our final method is typically more than 10 times faster than one iteration of Newton’s method without compromising quality. In fact, our result is often more accurate than the result obtained with one iteration of Newton’s method. Our method is also easier to implement, implying reduced software development costs.
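To make the L-BFGS acceleration concrete, here is a minimal numpy sketch of the standard two-loop recursion that turns stored position/gradient differences into a search direction. The variable names are hypothetical, and this is the generic textbook routine rather than the paper's solver.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Standard L-BFGS two-loop recursion.
    s_list, y_list: recent position and gradient differences (oldest first).
    Returns an approximation of -H^{-1} grad to be used as a descent direction."""
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        alphas.append(alpha)
    if s_list:  # scale by gamma = s^T y / y^T y of the most recent pair
        q *= np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * np.dot(y, q)
        q += (alpha - beta) * s
    return -q
```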
Extraction of structural lines from pattern-rich manga is a crucial step in migrating legacy manga to the digital domain. Unfortunately, it is very challenging to distinguish structural lines from arbitrary, highly structured, black-and-white screen patterns. In this paper, we present a novel data-driven approach to identify structural lines in pattern-rich manga, with no assumptions about the patterns. The method is based on convolutional neural networks. To suit our purpose, we propose a deep network model that handles the large variety of screen patterns and improves output accuracy. We also develop an efficient and effective way to generate a rich set of training data pairs. Our method suppresses arbitrary screen patterns regardless of whether these patterns are regular, irregular, tone-varying, or even pictorial, and regardless of their scale. It outputs clear and smooth structural lines even when these lines are contaminated by and immersed in complex patterns. We have evaluated our method on a large number of manga of various drawing styles. Our method substantially outperforms state-of-the-art methods in terms of visual quality. We also demonstrate its potential in various manga applications, including manga colorization, manga retargeting, and 2.5D manga generation.
Performance is a critical challenge in mobile image processing. Given a reference imaging pipeline, or even human-adjusted pairs of images, we seek to reproduce the enhancements and enable real-time evaluation. For this, we introduce a new neural network architecture inspired by bilateral grid processing and local affine color transforms. Using pairs of input/output images, we train a convolutional neural network to predict the coefficients of a locally-affine model in bilateral space. Our architecture learns to make local, global, and content-dependent decisions to approximate the desired image transformation. At runtime, the neural network consumes a low-resolution version of the input image, produces a set of affine transformations in bilateral space, upsamples those transformations in an edge-preserving fashion using a new slicing node, and then applies those upsampled transformations to the full-resolution image. Our algorithm processes high-resolution images on a smartphone in milliseconds, provides a real-time viewfinder at 1080p resolution, and matches the quality of state-of-the-art approximation techniques on a large class of image operators. Unlike previous work, our model is trained off-line from data and therefore does not require access to the original operator at runtime. This allows our model to learn complex, scene-dependent transformations for which no reference implementation is available, such as the photographic edits of a human retoucher.
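The runtime slicing-and-apply step can be illustrated with the sketch below, which trilinearly samples a hypothetical low-resolution grid of 3x4 affine color matrices at each pixel's (y, x, luminance) coordinate and applies the sampled transform to the full-resolution image. The grid itself would come from the learned network, which is not shown; the names and shapes here are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def slice_and_apply(grid, image, guide):
    """grid: (Gh, Gw, Gd, 3, 4) local affine color transforms (e.g. from a learned network);
    image: (H, W, 3) full-resolution float input in [0, 1]; guide: (H, W) luminance in [0, 1]."""
    H, W, _ = image.shape
    Gh, Gw, Gd = grid.shape[:3]
    ys, xs = np.mgrid[0:H, 0:W]
    coords = np.stack([ys.ravel() / (H - 1) * (Gh - 1),
                       xs.ravel() / (W - 1) * (Gw - 1),
                       guide.ravel() * (Gd - 1)])          # per-pixel grid coordinates
    homog = np.concatenate([image, np.ones((H, W, 1))], axis=-1)   # (H, W, 4)
    out = np.zeros_like(image)
    for i in range(3):          # output color channel
        for j in range(4):      # affine coefficient (R, G, B, bias)
            coeff = map_coordinates(grid[..., i, j], coords, order=1).reshape(H, W)
            out[..., i] += coeff * homog[..., j]
    return out
```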
We propose a deep learning approach for user-guided image colorization. The system directly maps a grayscale image, along with sparse, local user "hints", to an output colorization with a Convolutional Neural Network (CNN). Rather than using hand-defined rules, the network propagates user edits by fusing low-level cues with high-level semantic information learned from large-scale data. We train on a million images with simulated user inputs. To guide the user towards efficient input selection, the system recommends likely colors based on the input image and the current user inputs. The colorization is performed in a single feed-forward pass, enabling real-time use. Even with randomly simulated user inputs, we show that the proposed system helps novice users quickly create realistic colorizations, and offers large improvements in colorization quality with just a minute of use. In addition, we demonstrate that the framework can incorporate other user "hints" about the desired colorization, showing an application to color histogram transfer.
We propose a new technique for visual attribute transfer across images that may have very different appearance but have perceptually similar semantic structure. By visual attribute transfer, we mean transfer of visual information (such as color, tone, texture, and style) from one image to another. For example, one image could be that of a painting or a sketch while the other is a photo of a real scene, and both depict the same type of scene.
Our technique finds semantically-meaningful dense correspondences between two input images. To accomplish this, it adapts the notion of "image analogy" [Hertzmann et al. 2001] with features extracted from a Deep Convolutional Neural Network for matching; we call our technique deep image analogy. A coarse-to-fine strategy is used to compute the nearest-neighbor field for generating the results. We validate the effectiveness of our proposed method in a variety of cases, including style/texture transfer, color/style swap, sketch/painting to photo, and time lapse.
Deep compositing is an important practical tool in creating digital imagery, but there has been little theoretical analysis of the underlying mathematical operators. Motivated by finding a simple formulation of the merging operation on OpenEXR-style deep images, we show that the Porter-Duff over function is the operator of a Lie group. In its corresponding Lie algebra, the splitting and mixing functions that OpenEXR deep merging requires have a particularly simple form. Working in the Lie algebra, we present a novel, simple proof of the uniqueness of the mixing function.
The Lie group structure has many more applications, including new, correct resampling algorithms for volumetric images with alpha channels, and a deep image compression technique that outperforms that of OpenEXR.
In this article, we present a novel two-scale framework to optimize the structure and the material distribution of an object given its functional specifications. Our approach utilizes multi-material microstructures as low-level building blocks of the object. We start by precomputing the material property gamut—the set of bulk material properties that can be achieved with all material microstructures of a given size. We represent the boundary of this material property gamut using a level set field. Next, we propose an efficient and general topology optimization algorithm that simultaneously computes an optimal object topology and spatially varying material properties constrained by the precomputed gamut. Finally, we map the optimal spatially varying material properties onto the microstructures with the corresponding properties to generate a high-resolution printable structure. We demonstrate the efficacy of our framework by designing, optimizing, and fabricating objects in different material property spaces on the level of a trillion voxels, that is, several orders of magnitude higher than what can be achieved with current systems.
Additive manufacturing enables the fabrication of objects embedding metamaterials. By creating fine-scale structures, the object's physical properties can be graded (e.g., elasticity, porosity), even though a single base material is used for fabrication. Designing the fine and detailed geometry of a metamaterial while attempting to achieve specific properties is difficult. In addition, the structures are intended to fill comparatively large volumes, which quickly leads to large data structures and intractable simulation costs. Thus, most metamaterials are defined as periodic structures repeated in regular lattices. The periodicity simplifies modeling and simulation and reduces memory costs; however, it limits the ability to smoothly grade properties along free directions.
In this work, we propose a novel metamaterial with controllable, freely orientable, orthotropic elastic behavior; orthotropy means that elasticity is controlled independently along three orthogonal axes, which leads to materials that better adapt to uneven, directional load scenarios and offer a more versatile material design primitive. The fine-scale structures are generated procedurally by a stochastic process and resemble a foam. The absence of global organization and periodicity allows the free gradation of density, orientation, and stretch, leading to the controllable orthotropic behavior. The procedural nature of the synthesis process allows it to scale to arbitrarily large volumes at low memory cost.
We detail the foam structure synthesis, analyze and discuss its properties through numerical and experimental verifications, and finally demonstrate the use of orthotropic materials for the design of 3D printed objects.
Additive fabrication technologies are limited by the types of materials they can print: while the technologies are continuously improving, only a relatively small, discrete set of materials can be used in each printed object. At the same time, the low cost of introducing geometric complexity suggests the alternative of controlling the elastic material properties by producing microstructures, which can achieve behaviors that differ significantly from those of the solid printing material. While promising results have been obtained in this direction, fragility is a significant problem blocking practical applications, especially for achieving soft material properties: due to stress concentrations at thin joints, deformations and repeated loadings are likely to cause fracture.
We present a set of methods to minimize stress concentrations in microstructures by evolving their shapes. First, we demonstrate that the worst-case stress analysis problem (maximizing a stress measure over all possible unit loads) has an exact solution for periodic microstructures. We develop a new, accurate discretization of the shape derivative for stress objectives and introduce a low-dimensional parametric shape model for microstructures. This model supports robust minimization of maximal stress (approximated by an Lp norm with high p) and an efficient implementation of printability constraints. In addition to significantly reducing stresses (by a typical factor of 5X), the new method substantially expands the range of effective material properties covered by the collection of structures.
The ability to fabricate surfaces with fine control over bidirectional reflectance (BRDF) is a long-standing goal in appearance research, with applications in product design and manufacturing. We propose a technique that embeds magnetic flakes in a photo-cured resin, allowing the orientation distribution of those flakes to be controlled at printing time using a magnetic field. We show that time-varying magnetic fields allow us to control off-specular lobe direction, anisotropy, and lobe width, while using multiple spatial masks displayed by a UV projector allows for spatial variation. We demonstrate optical effects including bump maps: flat surfaces with spatially-varying specular lobe direction.
Appearance reproduction is an important aspect of 3D printing. Current color reproduction systems use halftoning methods that create colors through a spatial combination of different inks at the object's surface. This introduces a variety of artifacts to the object, especially when viewed from a closer distance. In this work, we propose an alternative color reproduction method for 3D printing. Inspired by the inherent ability of 3D printers to layer different materials on top of each other, 3D color contoning creates colors by combining inks with various thicknesses inside the object's volume. Since inks are inside the volume, our technique results in a uniform color surface with virtually invisible spatial patterns on the surface. For color prediction, we introduce a simple and highly accurate spectral model that relies on a weighted regression of spectral absorptions. We fully characterize the proposed framework by addressing a number of problems, such as material arrangement, calculation of ink concentration, and 3D dot gain. We use a custom 3D printer to fabricate and validate our results.
Sketch-based modeling provides a powerful paradigm for geometric modeling. Recent research has shown that sketch-based modeling methods are most effective when targeting a specific family of surfaces. A large and growing arsenal of sketching tools is available for different types of geometries and different target user populations. Our work augments this arsenal with a new and powerful tool for modeling complex freeform shapes by sketching sparse 2D strokes; our method complements existing approaches by enabling the generation of surfaces with complex curvature patterns that are challenging to produce with existing methods.
To model a desired surface patch with our technique, the user sketches the patch boundary as well as a small number of strokes representing the major bending directions of the shape. Our method uses this input to generate a curvature field that conforms to the user strokes and then uses this field to derive a freeform surface with the desired curvature pattern. To infer the surface from the strokes we first disambiguate the convex versus concave bending directions indicated by the strokes and estimate the surface bending magnitude along the strokes. We subsequently construct a curvature field based on these estimates, using a non-orthogonal 4-direction field coupled with a scalar magnitude field, and finally construct a surface whose curvature pattern reflects this field through an iterative sequence of simple linear optimizations.
Our framework is well suited for single-view modeling, but also supports multi-view interaction, which is necessary for modeling complex shapes, portions of which can be occluded in many views. It effectively combines multi-view inputs to obtain a coherent 3D shape, and it runs at interactive speed, allowing for immediate user feedback. We demonstrate the effectiveness of the proposed method through a large collection of complex examples created by both artists and amateurs. Our framework provides a useful complement to existing sketch-based modeling methods.
Face modeling has received much attention in the field of visual computing. In many scenarios, including cartoon characters, social media avatars, 3D face caricatures, and face-related art and design, low-cost interactive face modeling is a popular approach, especially among amateur users. In this paper, we propose a deep-learning-based sketching system for 3D face and caricature modeling. The system has a labor-efficient sketching interface that allows the user to draw freehand, imprecise yet expressive 2D lines representing the contours of facial features. A novel CNN-based deep regression network is designed for inferring 3D face models from 2D sketches. Our network fuses both CNN and shape-based features of the input sketch, and has two independent branches of fully connected layers generating independent subsets of coefficients for a bilinear face representation. Our system also supports gesture-based interactions that allow users to further manipulate initial face models. Both user studies and numerical results indicate that our sketching system helps users create face models quickly and effectively. A significantly expanded face database with diverse identities, expressions, and levels of exaggeration is constructed to promote further research and evaluation of face modeling techniques.
We present a novel approach that facilitates the creation of stylized 2D rigid body animations. Our approach can handle multiple rigid objects following complex physically simulated trajectories with collisions, while retaining a unique artistic style directly specified by the user. Starting with an existing target animation (e.g., produced by a physical simulation engine), an artist interactively draws over a sparse set of frames, and the desired appearance and motion stylization is automatically propagated to the rest of the sequence. The stylization process may also be performed as an off-line batch process from a small set of drawn sequences. To achieve these goals, we combine parametric deformation synthesis, which generalizes and reuses hand-drawn exemplars, with non-parametric techniques that enhance the hand-drawn appearance of the synthesized sequence. We demonstrate the potential of our method on various complex rigid body animations which are created with an expressive hand-drawn look using notably fewer manual interventions than traditional techniques.
We introduce Skippy, a novel algorithm for interactive 3D curve modeling from a single view. While positioning curves in space can be a tedious task, our rapid sketching algorithm allows users to draw curves in and around existing geometry in a controllable manner. The key insight behind our system is to automatically infer the 3D curve coordinates by enumerating a large set of potential curve trajectories. More specifically, we partition 2D strokes into continuous segments that land both on and off the geometry, duplicating segments that could be placed in front of or behind it, to form a directed graph. We use distance fields to estimate 3D coordinates for our curve segments and solve for an optimally smooth path that follows the curvature of the scene geometry while avoiding intersections. Using our curve design framework, we present a collection of novel editing operations allowing artists to rapidly explore and refine the combinatorial space of solutions. Furthermore, we include the quick placement of transient geometry to aid in guiding the 3D curve. Finally, we demonstrate our interactive curve design system on a variety of applications, including geometric modeling and camera motion path planning.
We present a method for constructing almost-everywhere curvature-continuous, piecewise-quadratic curves that interpolate a list of control points and have local maxima of curvature only at the control points. Our premise is that salient features of the curve should occur only at control points to avoid the creation of features unintended by the artist. While many artists prefer to use interpolated control points, the creation of artifacts, such as loops and cusps, away from control points has limited the use of these types of curves. By enforcing the maximum curvature property, loops and cusps cannot be created unless the artist intends for them to be.
To create such curves, we focus on piecewise quadratic curves, whose individual segments can each have only one maximum curvature point. We provide a simple, iterative optimization that creates quadratic curves, one per interior control point, that meet with G2 continuity everywhere except at inflection points of the curve, where the curves are G1. Despite the nonlinear nature of curvature, our curves attain local maxima of the absolute value of curvature only at interpolated control points.
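For reference, the curvature of a single quadratic (Bezier) segment has a simple closed form, which is what makes reasoning about its single curvature maximum tractable. The sketch below evaluates it for 2D control points and is illustrative only, not the paper's optimization.

```python
import numpy as np

def quad_bezier(p0, p1, p2, t):
    """Point on the quadratic Bezier segment with control points p0, p1, p2 (2D arrays)."""
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2

def quad_bezier_curvature(p0, p1, p2, t):
    """Unsigned curvature kappa(t) = |B'(t) x B''(t)| / |B'(t)|^3 for 2D points."""
    d1 = 2 * ((1 - t) * (p1 - p0) + t * (p2 - p1))   # first derivative
    d2 = 2 * (p2 - 2 * p1 + p0)                      # second derivative (constant)
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    return abs(cross) / (np.linalg.norm(d1) ** 3 + 1e-12)
```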
We present a system for efficiently editing video of dialogue-driven scenes. The input to our system is a standard film script and multiple video takes, each capturing a different camera framing or performance of the complete scene. Our system then automatically selects the most appropriate clip from one of the input takes, for each line of dialogue, based on a user-specified set of film-editing idioms. Our system starts by segmenting the input script into lines of dialogue and then splitting each input take into a sequence of clips time-aligned with each line. Next, it labels the script and the clips with high-level structural information (e.g., emotional sentiment of dialogue, camera framing of clip, etc.). After this pre-process, our interface offers a set of basic idioms that users can combine in a variety of ways to build custom editing styles. Our system encodes each basic idiom as a Hidden Markov Model that relates editing decisions to the labels extracted in the pre-process. For short scenes (< 2 minutes, 8--16 takes, 6--27 lines of dialogue) applying the user-specified combination of idioms to the pre-processed inputs generates an edited sequence in 2--3 seconds. We show that this is significantly faster than the hours of user time skilled editors typically require to produce such edits and that the quick feedback lets users iteratively explore the space of edit designs.
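The per-idiom decoding step can be pictured as a standard Viterbi pass over a hypothetical state space of "which take to use for each line of dialogue", with emissions derived from the extracted labels. The sketch below is a generic log-space Viterbi decoder under that assumed setup, not the paper's specific idiom models.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most likely state sequence for an HMM in log space.
    log_init: (S,) initial log-probabilities; log_trans: (S, S) transition log-probs;
    log_emit: (T, S) per-step emission log-probs (e.g. how well take s fits line t)."""
    T, S = log_emit.shape
    dp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    dp[0] = log_init + log_emit[0]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + log_trans          # (previous state, next state)
        back[t] = np.argmax(scores, axis=0)
        dp[t] = scores[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(dp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```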
Time slice photography is a popular effect that visualizes the passing of time by aligning and stitching multiple images of the same scene, captured at different times, into a single image. Extending this effect to video is a difficult problem, and one where existing solutions have had only limited success. In this paper, we propose an easy-to-use and robust system for creating time slice videos from a wide variety of consumer videos. The main technical challenge we address is how to align videos taken at different times with substantially different appearances, in the presence of moving objects and moving cameras with slightly different trajectories. To achieve a temporally stable alignment, we perform a mixed 2D-3D alignment, where a rough 3D reconstruction is used to generate sparse constraints that are integrated into a pixelwise 2D registration. We apply our method to a number of challenging scenarios and show that we can achieve a higher-quality registration than prior work. We propose a 3D user interface that allows the user to easily specify how multiple videos should be composited in space and time. Finally, we show that our alignment method can be applied to more general video editing and compositing tasks, such as object removal.
We propose a method for automated aerial videography in dynamic and cluttered environments. An online receding-horizon optimization formulation facilitates the planning process for novices and experts alike. The algorithm takes high-level plans as input, which we dub virtual rails, alongside interactively defined aesthetic framing objectives, and jointly solves for 3D quadcopter motion plans and associated velocities. The method generates control inputs subject to the constraints of a non-linear quadrotor model and dynamic constraints imposed by actors moving in an a priori unknown way. The output plans are physically feasible over the horizon length, and we apply the resulting control inputs directly at each time step without requiring a separate trajectory-tracking algorithm. The online nature of the method enables the incorporation of feedback into the planning and control loop and makes the algorithm robust to disturbances. Furthermore, we extend the method to include coordination between multiple drones, enabling dynamic multi-view shots typical of action sequences and live TV coverage. The algorithm runs in real time on standard hardware and computes motion plans for several drones on the order of milliseconds. Finally, we evaluate the approach qualitatively on a number of challenging shots involving multiple drones and actors, and characterize its computational performance experimentally.
Light field cameras have many advantages over traditional cameras, as they allow the user to change various camera settings after capture. However, capturing light fields requires a huge bandwidth to record the data: a modern light field camera can only take three images per second. This prevents current consumer light field cameras from capturing light field videos. Temporal interpolation at such extreme scale (10x, from 3 fps to 30 fps) is infeasible as too much information will be entirely missing between adjacent frames. Instead, we develop a hybrid imaging system, adding another standard video camera to capture the temporal information. Given a 3 fps light field sequence and a standard 30 fps 2D video, our system can then generate a full light field video at 30 fps. We adopt a learning-based approach, which can be decomposed into two steps: spatio-temporal flow estimation and appearance estimation. The flow estimation propagates the angular information from the light field sequence to the 2D video, so we can warp input images to the target view. The appearance estimation then combines these warped images to output the final pixels. The whole process is trained end-to-end using convolutional neural networks. Experimental results demonstrate that our algorithm outperforms current video interpolation methods, enabling consumer light field videography, and making applications such as refocusing and parallax view generation achievable on videos for the first time.
We present the first realistic, physically based, fully coupled, real-time weather design tool for use in urban procedural modeling. We merge designing of a 3D urban model with a controlled long-lasting spatiotemporal interactive simulation of weather. Starting from the fundamental dynamical equations similar to those used in state-of-the-art weather models, we present a novel simplified urban weather model for interactive graphics. Control of physically based weather phenomena is accomplished via an inverse modeling methodology. In our results, we present several scenarios of forward design, inverse design with high-level and detailed-level weather control and optimization, and comparisons of our method against well-known weather simulation results and systems.
We introduce a novel framework for interactive landscape authoring that supports bi-directional feedback between erosion and vegetation simulation. Vegetation and terrain erosion have strong mutual impact and their interplay influences the overall realism of virtual scenes. Despite their importance, these complex interactions have been neglected in computer graphics. Our framework overcomes this by simulating the effect of a variety of geomorphological agents and the mutual interaction between different material and vegetation layers, including rock, sand, humus, grass, shrubs, and trees. Users are able to exploit these interactions with an authoring interface that consistently shapes the terrain and populates it with details. Our method, validated through side-by-side comparison with real terrains, can be used not only to generate realistic static landscapes, but also to follow the temporal evolution of a landscape over a few centuries.
Botanical simulation plays an important role in many fields, including visual effects, games, and virtual reality. Previous plant simulation research has focused on computing physically based motion under the assumption that the material properties are known. Manually setting the spatially varying material properties of complex trees is too tedious to be practical. In this paper, we give a method to set the mass density, stiffness, and damping properties of individual tree components (branches and leaves) using a small number of intuitive parameters. Our method is rooted in the plant biomechanics literature and builds upon power laws observed in real botanical systems. We demonstrate our materials by simulating them using offline and model-reduced FEM simulators. Our parameters can be tuned directly by artists, but we also give a technique to infer them from ground-truth videos of real trees. Our materials produce tree animations that look much more similar to real trees than those of previous methods, as evidenced by our user study and experiments.
Large multi-agent systems such as crowds involve inter-agent interactions that are typically anticipatory in nature, depending strongly on both the positions and the velocities of agents. We show how the nonlinear, anticipatory forces seen in multi-agent systems can be made compatible with recent work on energy-based formulations in physics-based animation, and propose a simple and effective optimization-based integration scheme for implicit integration of such systems. We apply this approach to crowd simulation by using a state-of-the-art model derived from a recent analysis of human crowd data, and adapting it to our framework. Our approach provides, for the first time, guaranteed collision-free motion while simultaneously maintaining high-quality collective behavior in a way that is insensitive to simulation parameters such as time step size and crowd density. These benefits are demonstrated through simulation results on various challenging scenarios and validation against real-world crowd data.
Traditional Monte Carlo (MC) integration methods use point samples to numerically approximate the underlying integral. This approximation introduces variance in the integrated result, and this error can depend critically on the sampling patterns used during integration. Most of the well-known samplers used for MC integration in graphics---e.g. jittered, Latin-hypercube (N-rooks), multijittered---are anisotropic in nature. However, there are currently no tools available to analyze the impact of such anisotropic samplers on the variance convergence behavior of Monte Carlo integration. In this work, we develop a Fourier-domain mathematical tool to analyze the variance, and subsequently the convergence rate, of Monte Carlo integration using any arbitrary (anisotropic) sampling power spectrum. We also validate and leverage our theoretical analysis, demonstrating that judicious alignment of anisotropic sampling and integrand spectra can improve variance and convergence rates in MC rendering, and that similar improvements can apply to (anisotropic) deterministic samplers.
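A quick numerical experiment of the kind this analysis predicts: comparing the variance of plain random sampling against jittered (stratified) sampling for a smooth 1D integrand exposes the faster convergence rate that the sampling power spectrum explains. The sketch below is an illustrative experiment, not the paper's Fourier-domain tool.

```python
import numpy as np

def estimator_variance(f, n, sampler, trials=500, rng=None):
    """Empirical variance of the MC estimate of integral_0^1 f(x) dx with n samples."""
    rng = rng or np.random.default_rng(0)
    estimates = [np.mean(f(sampler(n, rng))) for _ in range(trials)]
    return np.var(estimates)

random_sampler = lambda n, rng: rng.random(n)
jittered_sampler = lambda n, rng: (np.arange(n) + rng.random(n)) / n   # one sample per stratum

f = lambda x: np.sin(2 * np.pi * x) ** 2   # smooth integrand, exact integral = 0.5
for n in (16, 64, 256, 1024):
    print(n, estimator_variance(f, n, random_sampler), estimator_variance(f, n, jittered_sampler))
# Random variance decays roughly like 1/N; jittered variance decays much faster for smooth integrands.
```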
In this article, we present a multi-class blue noise sampling algorithm that computes the samples as the constrained Wasserstein barycenter of multiple density distributions. An entropic regularization term yields a constrained transport plan for the optimal transport problem, removing the hard partition required by the previous Capacity-Constrained Voronoi Tessellation method. The entropic regularization term not only controls the spatial regularity of blue noise sampling, but also reduces conflicts between the desired centroids of Voronoi cells for multi-class sampling. Moreover, the adaptive blue noise property is guaranteed for each individual class, as well as for their combined class. Our method can be easily extended to multi-class sampling on a point set surface. We also demonstrate applications in object distribution and color stippling.
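Entropic regularization of optimal transport is commonly solved with Sinkhorn iterations; the minimal sketch below computes a regularized transport plan between two discrete densities. It is only the building block that such barycenter formulations rely on, not the constrained barycenter solver itself, and the parameter values are illustrative.

```python
import numpy as np

def sinkhorn_plan(a, b, C, eps=0.05, iters=500):
    """Entropic-regularized optimal transport between histograms a and b (each summing to 1)
    with cost matrix C. Returns the dense transport plan."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):        # alternate scaling to match the two marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]
```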
We present a framework to distribute point samples with controlled spectral properties using a regular lattice of tiles with a single sample per tile. We employ a word-based identification scheme to identify individual tiles in the lattice. Our scheme is recursive, permitting tiles to be subdivided into smaller tiles that use the same set of IDs. The corresponding framework offers a very simple setup for optimization towards different spectral properties. Small lookup tables are sufficient to store all the information needed to produce different point sets. For blue noise with varying densities, we employ the bit-reversal principle to recursively traverse sub-tiles. Our framework is also capable of delivering multi-class blue noise samples. It is well-suited for different sampling scenarios in rendering, including area-light sampling (uniform and adaptive), and importance sampling. Other applications include stippling and distributing objects.
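The bit-reversal principle mentioned above is easy to state in code: visiting sub-tiles in radix-2 bit-reversed order spreads consecutively used tiles far apart, which is what progressive and adaptive densities rely on. The sketch below shows the generic ordering only, not the paper's lookup tables.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of integer i (radix-2 van der Corput order)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

# Order in which 16 sub-tiles would be visited:
print([bit_reverse(i, 4) for i in range(16)])
# [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
```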
We introduce a novel parameterization for spherical distributions that is based on a point located inside the sphere, which we call a pivot. The pivot serves as the center of a straight-line projection that maps solid angles onto the opposite side of the sphere. By transforming spherical distributions in this way, we derive novel parametric spherical distributions that can be evaluated and importance-sampled from the original distributions using simple, closed-form expressions. Moreover, we prove that if the original distribution can be sampled and/or integrated over a spherical cap, then so can the transformed distribution. We exploit the properties of our parameterization to derive efficient spherical lighting techniques for both real-time and offline rendering. Our techniques are robust, fast, easy to implement, and achieve quality that is superior to previous work.
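The straight-line projection through the pivot has a simple closed form: a unit direction, viewed as a point on the unit sphere, is mapped to the second intersection of the line through the pivot with that sphere. The sketch below is a geometric illustration with hypothetical names, not the paper's sampling or integration machinery.

```python
import numpy as np

def pivot_transform(omega, pivot):
    """Map a unit direction `omega` (a point on the unit sphere) to the second
    intersection of the line through `pivot` (with |pivot| < 1) and `omega`."""
    d = pivot - omega
    t = -2.0 * np.dot(omega, d) / np.dot(d, d)
    out = omega + t * d
    return out / np.linalg.norm(out)   # renormalize against round-off

# The mapping is an involution: applying it twice returns the original direction.
w = np.array([0.0, 0.0, 1.0])
p = np.array([0.3, 0.1, 0.2])
assert np.allclose(pivot_transform(pivot_transform(w, p), p), w)
```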
We present an algorithmically efficient and parallelized domain-decomposition-based approach to solving Poisson's equation on irregular domains. Our technique employs the Schur complement method, which permits a high degree of parallel efficiency on multicore systems. We create a novel Schur complement preconditioner which achieves faster convergence and requires less computation time and memory. This domain decomposition method allows us to apply different linear solvers to different regions of the flow. Subdomains with regular boundaries can be solved with an FFT-based fast Poisson solver. We can solve systems with 1,024^3 degrees of freedom, and demonstrate the method's use for the pressure projection step of incompressible liquid and gas simulations. The results demonstrate considerable speedup over preconditioned conjugate gradient methods commonly employed to solve such problems, including a multigrid preconditioned conjugate gradient method.
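For the regular-boundary subdomains mentioned above, an FFT-based Poisson solve is straightforward. The sketch below solves the 5-point discrete Laplacian on a small periodic 2D grid (periodic boundaries chosen for brevity; a production solver would use the transform matching its boundary conditions) and is not the Schur complement machinery itself.

```python
import numpy as np

def poisson_fft_periodic(f, h=1.0):
    """Solve the discrete Poisson equation  L u = f  (5-point Laplacian, grid spacing h)
    on a periodic grid; returns the zero-mean solution."""
    n0, n1 = f.shape
    th0 = 2.0 * np.pi * np.fft.fftfreq(n0)
    th1 = 2.0 * np.pi * np.fft.fftfreq(n1)
    # Eigenvalues of the 1D 3-point stencil are (2 cos(theta) - 2) / h^2 per axis.
    lam = ((2.0 * np.cos(th0)[:, None] - 2.0) + (2.0 * np.cos(th1)[None, :] - 2.0)) / h ** 2
    fhat = np.fft.fft2(f)
    uhat = np.zeros_like(fhat)
    nonzero = lam != 0.0
    uhat[nonzero] = fhat[nonzero] / lam[nonzero]   # drop the constant (null-space) mode
    return np.real(np.fft.ifft2(uhat))
```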
We present an efficient and scalable octree-inspired fluid simulation framework with the flexibility to leverage adaptivity in any part of the computational domain, even when resolution transitions reach the free surface. Our methodology ensures symmetry, definiteness and second order accuracy of the discrete Poisson operator, and eliminates numerical and visual artifacts of prior octree schemes. This is achieved by adapting the operators acting on the octree's simulation variables to reflect the structure and connectivity of a power diagram, which recovers primal-dual mesh orthogonality and eliminates problematic T-junction configurations. We show how such operators can be efficiently implemented using a pyramid of sparsely populated uniform grids, enhancing the regularity of operations and facilitating parallelization. A novel scheme is proposed for encoding the topology of the power diagram in the neighborhood of each octree cell, allowing us to locally reconstruct it on the fly via a lookup table, rather than resorting to costly explicit meshing. The pressure Poisson equation is solved via a highly efficient, matrix-free multigrid preconditioner for Conjugate Gradient, adapted to the power diagram discretization. We use another sparsely populated uniform grid for high resolution interface tracking with a narrow band level set representation. Using the recently introduced SPGrid data structure, sparse uniform grids in both the power diagram discretization and our narrow band level set can be compactly stored and efficiently updated via streaming operations. Additionally, we present enhancements to adaptive level set advection, velocity extrapolation, and the fast marching method for redistancing. Our overall framework gracefully accommodates the task of dynamically adapting the octree topology during simulation. We demonstrate end-to-end simulations of complex adaptive flows in irregularly shaped domains, with tens of millions of degrees of freedom.
In flow visualization, vortex extraction is a long-standing and unsolved problem. For decades, scientists have developed numerous definitions that characterize vortex regions and their core lines in different ways, but none has emerged as the ultimate solution. One reason is that almost all techniques have a fundamental weakness: they are not invariant under changes of the reference frame, i.e., they are not objective. This has two severe implications: first, the result depends on the movement of the observer, and second, they cannot track vortices that are moving on arbitrary paths, which limits their reliability and usefulness in practice. Objective measures are rare, but have recently gained more attention in the literature. Instead of only introducing a new objective measure, we show in this paper how all existing measures that are based on velocity and its derivatives can be made objective. We achieve this by observing the vector field in optimal local reference frames, in which the temporal derivative of the flow vanishes, i.e., reference frames in which the flow appears steady. The central contribution of our paper is to show that these optimal local reference frames can be found by a simple and elegant linear optimization. We prove that in the optimal frame, all local vortex extraction methods that are based on velocity and its derivatives become objective. We demonstrate our approach with objective counterparts to λ2, vorticity, and Sujudi-Haimes.
Clebsch maps encode velocity fields through functions, and these functions contain valuable information about the encoded field. For example, closed integral curves of the associated vorticity field are level lines of the vorticity Clebsch map. This makes Clebsch maps useful for visualization and fluid dynamics analysis. Additionally, they can be used in the context of simulations to enhance flows through the introduction of subgrid vorticity. In this paper we study spherical Clebsch maps, which are particularly attractive for these purposes. Elucidating their geometric structure, we show that such maps can be found as minimizers of a non-linear Dirichlet energy. To illustrate our approach, we use a number of benchmark problems and apply it to numerically given flow fields. Code and a video can be found in the ACM Digital Library.
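As a point of reference, a Dirichlet-type energy for a sphere-valued map has the schematic form below; the exact functional and the constraint coupling the map to the given velocity field are as defined in the paper and are not reproduced here.

```latex
% Schematic Dirichlet energy of a sphere-valued map \psi over the flow domain M.
E(\psi) \;=\; \frac{1}{2}\int_{M} \lvert \nabla \psi \rvert^{2}\, \mathrm{d}V,
\qquad \psi(x)\in S^{3}\ \text{for all } x\in M .
```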
This paper proposes a novel framework for evaluating fluid simulation methods based on crowd-sourced user studies, which allows large numbers of opinions to be gathered robustly. The key idea for a robust and reliable evaluation is to use a reference video from a carefully selected real-world setup in the user study. By conducting a series of controlled user studies and comparing their results, we identify various factors that affect the perceptual evaluation. Our data show that the availability of a reference video makes the evaluation consistent. Building on this observation, we use the framework to compute scores of simulation methods as a visual accuracy metric. As an application of the proposed framework, a variety of popular simulation methods are evaluated.
This article addresses the relighting of outdoor and large indoor scenes illuminated by nondistant lights, which has seldom been discussed in previous work. We propose a method that requires only a single low-dynamic-range (LDR) image and a small amount of user annotation, and lets users interactively edit the illumination of a scene by moving existing lights and inserting synthetic ones. We achieve this by adopting a top-down approach that estimates the scene reflectance by fitting a diffuse illumination model to the photograph. This approach gains stability and robustness by estimating the camera, scene geometry, and light sources in sequence, and by using a confidence map, i.e., a per-pixel weight map. The results of our evaluation demonstrate that the proposed method can estimate a scene accurately enough for realistic relighting of images. Moreover, the results of our user studies show that the synthesized images are realistic enough to be almost indistinguishable from real photographs.
Light-field cameras offer new imaging possibilities compared to conventional digital cameras. However, the additional angular domain of light fields prohibits direct application of frequently used image processing algorithms, such as warping, retargeting, or stitching. We present a general and efficient framework for nonuniform light-field warping, which forms the basis for extending many of these image processing techniques to light fields. It propagates arbitrary spatial deformations defined in one light-field perspective consistently to all other perspectives by means of 4D patch matching instead of relying on explicit depth reconstruction. This allows processing light-field recordings of complex scenes with non-Lambertian properties such as transparency and refraction. We show application examples of our framework in panorama light-field imaging, light-field retargeting, and artistic manipulation of light fields.
Producing a high dynamic range (HDR) image from a set of images with different exposures is a challenging process for dynamic scenes. One category of existing techniques first registers the input images to a reference image and then merges the aligned images into an HDR image. However, the artifacts of this registration usually appear as ghosting and tearing in the final HDR images. In this paper, we propose a learning-based approach to address this problem for dynamic scenes. We use a convolutional neural network (CNN) as our learning model and present and compare three different system architectures for modeling the HDR merge process. Furthermore, we create a large dataset of input LDR images and their corresponding ground-truth HDR images to train our system. We demonstrate the performance of our system by producing high-quality HDR images from sets of three LDR images. Experimental results show that our method consistently produces better results than several state-of-the-art approaches on challenging scenes.
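As a rough illustration of what a learned merge model can look like, the following PyTorch sketch (a hypothetical direct-merge network with illustrative layer counts, not one of the three architectures compared in the paper) maps three aligned LDR exposures, stacked along the channel axis, to an HDR image.

```python
# Hypothetical CNN merge sketch: three aligned RGB exposures are concatenated
# channel-wise (9 channels) and mapped to a tone-compressed HDR image.
import torch
import torch.nn as nn

class MergeCNN(nn.Module):
    def __init__(self, in_ch=9, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 3, 3, padding=1), nn.Sigmoid(),  # output in [0, 1]
        )

    def forward(self, ldr_stack):
        # ldr_stack: (batch, 9, H, W) = three RGB exposures stacked channel-wise
        return self.net(ldr_stack)

model = MergeCNN()
dummy = torch.rand(1, 9, 128, 128)   # placeholder input exposures
hdr = model(dummy)                   # (1, 3, 128, 128)
print(hdr.shape)
```

Training such a model would minimize a loss between the (tonemapped) prediction and the ground-truth HDR image over a dataset of LDR/HDR pairs like the one the paper describes.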
We present an image downscaling technique capable of appropriately representing high-frequency structured patterns. Our method breaks conventional wisdom in sampling theory---instead of discarding high-frequency information to avoid aliasing, it controls aliasing by remapping such information to the representable range of the downsampled spectrum. The resulting images provide more faithful representations of their original counterparts, retaining visually-important details that would otherwise be lost. Our technique can be used with any resampling method and works for both natural and synthetic images. We demonstrate its effectiveness on a large number of images downscaled in combination with various resampling strategies. By providing an alternative solution for a long-standing problem, our method opens up new possibilities for image processing.
Lighting is a critical element of portrait photography, but good lighting design typically requires complex equipment and significant time and expertise. Our work simplifies this task with a relighting technique that transfers the desired illumination of one portrait onto another. The novelty of our approach to this challenging problem is the formulation of relighting as a mass-transport problem. We start from standard color histogram matching, which captures only the overall tone of the illumination, and show how the mass-transport formulation can make it dependent on facial geometry. We fit a three-dimensional (3D) morphable face model to the portrait and, for each pixel, combine the color value with the corresponding 3D position and normal. We then solve a mass-transport problem in this augmented space to generate a color remapping that achieves localized, geometry-aware relighting. Our technique is robust to variations in facial appearance and small errors in face reconstruction. As we demonstrate, this allows it to handle a variety of portraits and illumination conditions, including scenarios that are challenging for previous methods.
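The color histogram matching that serves as the starting point can be sketched as follows (NumPy, rank-based matching per channel); the geometry-aware mass-transport extension in the augmented color/position/normal space is not shown.

```python
# Rank-based per-channel histogram matching: the i-th darkest source pixel is
# mapped to the i-th darkest reference value, so the source histogram matches
# the reference histogram. Placeholder data stands in for real portraits.
import numpy as np

def match_channel(src, ref):
    order = np.argsort(src.ravel())
    # Sorted reference values, resampled to the number of source pixels.
    target = np.sort(ref.ravel())[np.linspace(0, ref.size - 1, src.size).astype(int)]
    out = np.empty(src.size)
    out[order] = target
    return out.reshape(src.shape)

def match_histograms(src_rgb, ref_rgb):
    return np.stack([match_channel(src_rgb[..., c], ref_rgb[..., c])
                     for c in range(3)], axis=-1)

src = np.random.rand(64, 64, 3)   # portrait to relight (placeholder)
ref = np.random.rand(64, 64, 3)   # portrait with the desired illumination (placeholder)
print(match_histograms(src, ref).shape)
```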
Multi-contact locomotion that uses both the hands and feet in a complex environment remains a challenging problem in computer animation. To address this problem, we present a contact graph, which is a motion graph augmented by learned feasibility predictors, namely contact spaces and an occupancy estimator, for a motion clip in each graph node. By estimating the feasibilities of candidate contact points that can be reached by modifying a motion clip, the predictors allow us to find contact points that are likely to be valid and natural before attempting to generate the actual motion for the contact points. The contact graph thus enables the efficient generation of multi-contact motion in two steps: planning contact points to the goal and then generating the whole-body motion. We demonstrate the effectiveness of our method by creating several climbing motions in complex and cluttered environments by using only a small number of motion samples.
Determining effective control strategies and solutions for high-degree-of-freedom humanoid characters has been a difficult, ongoing problem. A controller is only valid for a subset of the states of the character, known as the domain of attraction (DOA). This article shows how many states that are initially outside the DOA can be brought inside it. Our first contribution is to show how DOA expansion can be performed for a high-dimensional simulated character. Our second contribution is to present an algorithm that efficiently increases the DOA using random trees that provide denser coverage than the trees produced by typical sampling-based motion-planning algorithms. The trees are constructed offline but can be queried fast enough for near-real-time control. We show the effect of DOA expansion on getting up, crouch-to-stand, jumping, and standing-twist controllers. We also show how DOA expansion can be used to connect controllers together.
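For intuition, a vanilla RRT-style expansion loop over a toy 2D state space is sketched below; `simulate_controller` is a hypothetical stand-in for rolling out the character controller from a sampled state, and the authors' denser-coverage tree construction differs from this generic scheme.

```python
# Generic RRT-style tree expansion over a toy 2D state space (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def simulate_controller(state, target):
    """Hypothetical stand-in: step the 'character' a bounded distance toward target."""
    return state + np.clip(target - state, -0.1, 0.1)

def expand_tree(root, n_samples=200):
    nodes, parents = [root], [None]
    for _ in range(n_samples):
        sample = rng.uniform(-1.0, 1.0, size=2)                 # random state
        nearest = int(np.argmin([np.linalg.norm(n - sample) for n in nodes]))
        new_state = simulate_controller(nodes[nearest], sample)  # roll out toward it
        nodes.append(new_state)        # states reached this way can, by construction,
        parents.append(nearest)        # be steered back into the existing tree
    return nodes, parents

nodes, parents = expand_tree(np.zeros(2))
print(len(nodes))
```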
Designing a unified framework for simulating a broad variety of human behaviors has proven to be challenging. In this article, we present an approach for control system design that can generate animations of a diverse set of behaviors including walking, running, and a variety of gymnastic behaviors. We achieve this generalization with a balancing strategy that relies on a new form of inverted pendulum model (IPM), which we call the momentum-mapped IPM (MMIPM). We analyze reference motion capture data in a pre-processing step to extract the motion of the MMIPM. To compute a new motion, the controller plans a desired motion, frame by frame, based on the current pendulum state and a predicted pendulum trajectory. By tracking this time-varying trajectory, the controller creates a character that dynamically balances, changes speed, makes turns, jumps, and performs gymnastic maneuvers.
Stereoscopic 3D (S3D) movies have become widely popular in movie theaters, but the adoption of S3D at home is low even though most TV sets support S3D. It is widely believed that S3D with glasses is not the right approach for the home. A much more appealing approach is to use automultiscopic displays that provide a glasses-free 3D experience to multiple viewers. A technical challenge is the lack of native multiview content, which is required to deliver a proper view of the scene for every viewpoint. Our approach takes advantage of the abundance of stereoscopic 3D movies. We propose a real-time system that converts stereoscopic video into high-quality multiview video that can be directly fed to automultiscopic displays. Our algorithm uses a wavelet-based decomposition of stereoscopic images with per-wavelet disparity estimation. A key to our solution lies in combining Lagrangian and Eulerian approaches for both disparity estimation and novel-view synthesis, which leverages the complementary advantages of the two. The solution preserves the features of Eulerian methods, e.g., subpixel accuracy, high performance, robustness to ambiguous depth cases, and easy handling of inter-view aliasing, while maintaining the advantages of Lagrangian approaches, e.g., robustness to large disparities and the possibility of performing non-trivial disparity manipulations through both view extrapolation and interpolation. The method achieves real-time performance on current GPUs, and its design also enables an easy hardware implementation, which we demonstrate using a field-programmable gate array. We analyze the visual quality and robustness of our technique on a number of synthetic and real-world examples. We also report a user experiment that demonstrates the benefits of the technique compared to existing solutions.
When a conventional stereoscopic display is viewed without stereo glasses, image blurs, or 'ghosts', are visible due to the fusion of stereo image pairs. This artifact severely degrades 2D image quality, making it difficult to simultaneously present clear 2D and 3D contents. To overcome this limitation (backward incompatibility), here we propose a novel method to synthesize ghost-free stereoscopic images. Our method gives binocular disparity to a 2D image, and drives human binocular disparity detectors, by the addition of a quadrature-phase pattern that induces spatial subband phase shifts. The disparity-inducer patterns added to the left and right images are identical except for the contrast polarity. Physical fusion of the two images cancels out the disparity-inducer components and makes only the original 2D pattern visible to viewers without glasses. Unlike previous solutions, our method perfectly excludes stereo ghosts without using special hardware. A simple algorithm can transform 3D contents from the conventional stereo format into ours. Furthermore, our method can alter the depth impression of a real object without its being noticed by naked-eye viewers by means of light projection of the disparity-inducer components onto the object's surface. Psychophysical evaluations have confirmed the practical utility of our method.
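A simplified, single-band sketch of the disparity-inducer construction is given below (NumPy/SciPy); the multi-band decomposition, disparity-amplitude control, and projection variant described in the paper are not reproduced, and display-range clipping is ignored.

```python
# Single-band disparity-inducer sketch: take a band-pass component of a
# grayscale image, compute its horizontal quadrature (90-degree phase-shifted)
# counterpart, and add it with opposite polarity to the left and right views.
# Averaging the two views cancels the inducer and recovers the original image.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import hilbert

def disparity_inducer_pair(img, amplitude=0.1):
    band = gaussian_filter(img, 1.0) - gaussian_filter(img, 3.0)  # one band-pass subband
    quad = np.imag(hilbert(band, axis=1))                         # horizontal quadrature pattern
    left  = img + amplitude * quad     # inducer with positive polarity
    right = img - amplitude * quad     # inducer with negative polarity
    return left, right

img = np.random.rand(64, 64)           # placeholder 2D image
L, R = disparity_inducer_pair(img)
print(np.abs(0.5 * (L + R) - img).max())   # cancels up to round-off
```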
A number of consumer-grade spherical cameras have recently appeared, enabling affordable monoscopic VR content creation in the form of full 360° × 180° spherical panoramic photos and videos. While monoscopic content is certainly engaging, it fails to leverage a main strength of VR HMDs, namely stereoscopic display. Recent stereoscopic capture rigs involve placing many cameras in a ring and synthesizing an omni-directional stereo panorama that lets a user look around and explore the scene in stereo. In this work, we describe a method that takes images from two 360° spherical cameras and synthesizes an omni-directional stereo panorama with stereo in all directions. Our proposed method has a lower equipment cost than camera-ring alternatives, can be assembled from currently available off-the-shelf equipment, and is relatively small and lightweight compared to the alternatives. We validate our method by generating both stills and videos. We have conducted a user study to better understand what kinds of geometric processing are necessary for a pleasant viewing experience. We also discuss several algorithmic variations, each with its own time and quality trade-offs.
Increasing the resolution and dynamic range of digital color displays is challenging for designs constrained by cost and power specifications, which forces modern displays to trade off spatial and temporal resolution against color reproduction capability. In this work we explore the idea of joint hardware and algorithm design to balance such trade-offs. We introduce a system that uses content-adaptive and compressive factorizations to reproduce colors. Each target frame is factorized into two products of high-resolution monochromatic and low-resolution color images, which are then combined through temporal or spatial multiplexing. As our framework minimizes the error in colorimetric space, the perceived color rendition is high, and thanks to GPU acceleration the results are generated in real time. We evaluate our system with an LCD prototype that uses an LED backlight array and temporal multiplexing to reproduce color images. Our approach enables high effective resolution and dynamic range without increasing power consumption. We also demonstrate low-cost extensions to hyperspectral and light-field imaging, which are possible due to the compressive nature of our system.
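A toy illustration of the underlying factorization principle follows, using a plain rank-2 SVD in place of the paper's content-adaptive, perceptually weighted factorization, and a single global color per term in place of a low-resolution color image.

```python
# Approximate an RGB frame as a sum of two terms, each a full-resolution
# monochrome image times one color (an extreme simplification of the
# monochrome/low-resolution-color factorization).
import numpy as np

frame = np.random.rand(128, 128, 3)           # placeholder target frame
M = frame.reshape(-1, 3)                      # pixels x RGB

k = 2                                         # number of multiplexed subframes
U, s, Vt = np.linalg.svd(M, full_matrices=False)
mono  = (U[:, :k] * s[:k]).reshape(128, 128, k)   # k monochrome images (may be signed here)
color = Vt[:k]                                    # k RGB "backlight" colors

recon = sum(mono[..., i, None] * color[i] for i in range(k))
print(np.abs(recon - frame).mean())               # rank-k approximation error
```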
We present a novel method to enrich standard rigid-body impact models with a spatially varying coefficient-of-restitution map, or Bounce Map. Even state-of-the-art methods in computer graphics assume that, for a single rigid body, post- and pre-impact dynamics are related by a single global constant, namely the coefficient of restitution. We first demonstrate that this assumption is highly inaccurate, even for simple objects. We then present a technique to efficiently and automatically generate a function that maps locations on the object's surface, along with impact normals, to a scalar coefficient of restitution. Furthermore, we propose a method for two-body restitution analysis and, based on numerical experiments, estimate a practical model for combining one-body Bounce Map values to approximate the two-body coefficient of restitution. We show that our method not only improves accuracy, but also enables visually richer rigid-body simulations.
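To make the idea concrete, here is a minimal sketch of how a spatially varying restitution map could be queried during impact response; the table layout, nearest-sample lookup, and names are illustrative assumptions, and the paper's automatic construction of the map and its impact-normal dependence are not reproduced.

```python
# Querying a precomputed per-surface-sample restitution table at contact time
# and applying Newton's impact law with the looked-up coefficient.
import numpy as np

surface_points = np.random.rand(500, 3)             # surface samples of the rigid body
bounce_map = np.random.uniform(0.2, 0.9, size=500)  # restitution per sample (placeholder values)

def restitution(contact_point):
    """Coefficient at the surface sample nearest the contact point."""
    idx = np.argmin(np.linalg.norm(surface_points - contact_point, axis=1))
    return bounce_map[idx]

def normal_impact_response(v_in_normal, contact_point):
    """Newton's impact law with a spatially varying coefficient of restitution."""
    return -restitution(contact_point) * v_in_normal

print(normal_impact_response(-1.0, np.array([0.5, 0.5, 0.5])))
```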
Iterative algorithms are frequently used to resolve simultaneous impacts between rigid bodies in physical simulations. However, these algorithms lack formal guarantees of termination, which is sometimes viewed as potentially dangerous, so failsafes are used in practical codes to prevent infinite loops. We show that such failsafes are unnecessary. In particular, we study the broad class of such algorithms that are conservative and satisfy a minimal set of physical correctness properties, which encompasses recent methods like Generalized Reflections as well as pairwise schemes. We fully characterize finite termination of these algorithms. The only possible failure cases can be detected, and we describe a procedure for modifying the algorithms to provably ensure termination. We also describe the modifications necessary to guarantee termination in the presence of numerical error due to the use of floating-point arithmetic. Finally, we discuss the challenges that dissipation introduces for finite termination, and describe how dissipation models can be incorporated while retaining the termination guarantee.
This article presents a new version of the Gilbert-Johnson-Keerthi (GJK) algorithm that circumvents the shortcomings introduced by degenerate geometries. The original Johnson algorithm and Backup procedure are replaced by a distance subalgorithm that is faster and accurate to machine precision, guiding the GJK algorithm along a shorter search path in less computing time. Numerical tests demonstrate that this is indeed a more robust procedure; in particular, when the objects are found in contact, the newly proposed subalgorithm runs 15% to 30% faster than the original one. The improved performance has a significant impact on various applications, such as real-time simulations and collision avoidance systems. Altogether, the main contributions to the GJK algorithm are a faster convergence rate and reduced computational time. These improvements can easily be added to existing implementations; furthermore, engineering applications that require distance queries solved to machine precision can now be tackled using the GJK algorithm.
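For context, the support-mapping queries that every GJK iteration performs can be sketched as follows (NumPy, for convex point clouds); the distance subalgorithm that the article improves is not reproduced here.

```python
# Support mapping of a convex point set and of the Minkowski difference A - B,
# the basic query GJK iterates over while building simplices that approach the
# point of the difference closest to the origin.
import numpy as np

def support(points, d):
    """Vertex of the convex hull of `points` farthest in direction d."""
    return points[np.argmax(points @ d)]

def support_minkowski_diff(A, B, d):
    """Support point of A - B in direction d."""
    return support(A, d) - support(B, -d)

A = np.random.rand(20, 3)          # vertices of convex shape A (placeholder)
B = np.random.rand(20, 3) + 2.0    # vertices of convex shape B, translated away
d = np.array([1.0, 0.0, 0.0])
print(support_minkowski_diff(A, B, d))
```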
The typical elastic surface or curve simulation method takes a Lagrangian approach and consists of three components: time integration, collision detection, and collision response. The Lagrangian view is beneficial because it naturally allows for tracking of the codimensional manifold; however, collisions must then be detected and resolved separately. Eulerian methods are promising alternatives because collision processing is automatic, but while this is effective for volumetric objects, advection of a codimensional manifold is too inaccurate in practice. We propose a novel hybrid Lagrangian/Eulerian approach that preserves the best aspects of both views. Similar to the Drucker-Prager and Mohr-Coulomb models for granular materials, we define our collision response with a novel elastoplastic constitutive model. To achieve this, we design an anisotropic hyperelastic constitutive model that separately characterizes the response to manifold strain as well as shearing and compression in the directions orthogonal to the manifold. We discretize the model with the Material Point Method and a novel codimensional Lagrangian/Eulerian update of the deformation gradient. Collision-intensive scenarios with millions of degrees of freedom require only a few minutes per frame, and examples with up to one million degrees of freedom run in less than thirty seconds per frame.
We present a novel physics-based approach to facial animation. Contrary to commonly used generative methods, our solution computes facial expressions by minimizing a set of non-linear potential energies that model the physical interaction of passive flesh, active muscles, and rigid bone structures. By integrating collision and contact handling into the simulation, our algorithm avoids inconsistent poses commonly observed in generative methods such as blendshape rigs. A novel muscle activation model leads to a robust optimization that faithfully reproduces complex facial articulations. We show how person-specific simulation models can be built from a few expression scans with a minimal data acquisition process and an almost entirely automated processing pipeline. Our method supports temporal dynamics due to inertia or external forces, incorporates skin sliding to avoid unnatural stretching, and offers full control of the simulation parameters, which enables a variety of advanced animation effects. For example, slimming or fattening the face is achieved by simply scaling the volume of the soft tissue elements. We show a series of application demos, including artistic editing of the animation model, simulation of corrective facial surgery, or dynamic interaction with external forces and objects.
While facial capture focuses on accurately reconstructing an actor's performance, facial animation retargeting aims to transfer the animation to another character such that its semantic meaning is preserved. Because of the popularity of blendshape animation, this effectively means computing suitable blendshape weights for the given target character. Current methods either require manually created examples of matching expressions of actor and target character, or are limited to characters with similar facial proportions (i.e., realistic models). In contrast, our approach can automatically retarget facial animations from a real actor to stylized characters. We formulate the problem of transferring the blendshapes of a facial rig to an actor as a special case of manifold alignment, exploring the similarities of the motion spaces defined by the blendshapes and by an expressive training sequence of the actor. In addition, we incorporate a simple yet elegant facial prior based on discrete differential properties to guarantee smooth mesh deformation. Our method requires only sparse correspondences between characters and is thus suitable for retargeting marker-less and marker-based motion capture, as well as for animation transfer between virtual characters.
We introduce a novel approach to example-based stylization of portrait videos that preserves both the subject's identity and the visual richness of the input style exemplar. Unlike the current state-of-the-art based on neural style transfer [Selim et al. 2016], our method performs non-parametric texture synthesis that retains more of the local textural details of the artistic exemplar and does not suffer from image warping artifacts caused by aligning the style exemplar with the target face. Our method allows the creation of videos with less than full temporal coherence [Ruder et al. 2016]. By introducing a controllable amount of temporal dynamics, it more closely approximates the appearance of real hand-painted animation in which every frame was created independently. We demonstrate the practical utility of the proposed solution on a variety of style exemplars and target videos.
We introduce a novel four-view image-based hair modeling method. Given four hair images taken from the front, back, left, and right views as input, we first estimate the rough 3D shape of the hair observed in the input using a predefined database of 3D hair models, then synthesize a hair texture on the surface of the shape, from which hair growth directions are calculated and used to construct a 3D direction field in the hair volume. Finally, we grow hair strands from the scalp, following the direction field, to produce the 3D hair model, which closely resembles the hair in all input images. Our method does not require that all input images show the same hair, enabling an effective way to create compelling hair models from images of considerably different hairstyles at different views. We demonstrate the efficacy of our method using a wide range of examples.
Computer Aided Design (CAD) is a multi-billion dollar industry used by almost every mechanical engineer in the world to create practically every existing manufactured shape. CAD models are not only widely available but also extremely useful in the growing field of fabrication-oriented design because they are parametric by construction and capture the engineer's design intent, including manufacturability. Harnessing this data, however, is challenging, because generating the geometry for a given parameter value requires time-consuming computations. Furthermore, the resulting meshes have different combinatorics, making the mesh data inherently discontinuous with respect to parameter adjustments. In our work, we address these challenges and develop tools that allow interactive exploration and optimization of parametric CAD data. To achieve interactive rates, we use precomputation on an adaptively sampled grid and propose a novel scheme for interpolating in this domain where each sample is a mesh with different combinatorics. Specifically, we extract partial correspondences from CAD representations for local mesh morphing and propose a novel interpolation method for adaptive grids that is both continuous/smooth and local (i.e., the influence of each sample is constrained to the local regions where mesh morphing can be computed). We show examples of how our method can be used to interactively visualize and optimize objects with a variety of physical properties.
High-quality hand-made furniture often employs intrinsic joints that geometrically interlock along mating surfaces. Such joints increase the structural integrity of the furniture and add to its visual appeal. We present an interactive tool for designing such intrinsic joints. Users draw the visual appearance of the joints on the surface of an input furniture model as groups of two-dimensional (2D) regions that must belong to the same part. Our tool automatically partitions the furniture model into a set of solid 3D parts that conform to the user-specified 2D regions and assemble into the furniture. If the input does not admit assemblable solid 3D parts, our tool reports the failure and suggests options for redesigning the 2D surface regions so that they are assemblable. Similarly, if any parts in the resulting assembly are unstable, our tool suggests where additional 2D regions should be drawn to better interlock the parts and improve stability. To perform this stability analysis, we introduce a novel variational static analysis method that addresses shortcomings of the equilibrium method for our task. Specifically, our method correctly detects sliding instabilities and reports the locations and directions of sliding and hinging failures. We show that our tool can be used to generate over 100 joints inspired by traditional woodworking and Japanese joinery. We also design and fabricate nine complete furniture assemblies that are stable and connected using only the intrinsic joints produced by our tool.
We introduce a lightweight structure optimization approach for problems in which there is uncertainty in the force locations. Such uncertainty may arise due to force contact locations that change during use or are simply unknown a priori. Given an input 3D model, regions on its boundary where arbitrary normal forces may make contact, and a total force-magnitude budget, our algorithm generates a minimum weight 3D structure that withstands any force configuration capped by the budget. Our approach works by repeatedly finding the most critical force configuration and altering the internal structure accordingly. A key issue, however, is that the critical force configuration changes as the structure evolves, resulting in a significant computational challenge. To address this, we propose an efficient critical instant analysis approach. Combined with a reduced order formulation, our method provides a practical solution to the structural optimization problem. We demonstrate our method on a variety of models and validate it with mechanical tests.
We study the design and optimization of statically sound and materially efficient space structures constructed from connected beams. We propose a systematic computational framework for the design of space structures that incorporates static soundness, approximation of reference surfaces, boundary alignment, and geometric regularity. To tackle this challenging problem, we first jointly optimize node positions and connectivity through a nonlinear continuous optimization algorithm. Next, with nodes and connectivity fixed, we formulate the assignment of beam cross sections as a mixed-integer programming problem with a bilinear objective function and quadratic constraints, which we solve with a novel and practical alternating direction method based on linear programming relaxation. The capability and efficiency of the algorithms and the computational framework are validated on a variety of examples and comparisons.