Path guiding is a promising technique to reduce the variance of path tracing. Although existing online path guiding algorithms can eventually learn good sampling distributions given a large amount of time and samples, the speed of learning becomes a major bottleneck. In this paper, we accelerate the learning of sampling distributions by training a lightweight neural network offline to reconstruct sampling distributions from sparse samples. Uniquely, we design our neural network to perform convolutions directly on a sparse quadtree, regressing a high-quality hierarchical sampling distribution. Our approach can reconstruct reasonably accurate sampling distributions faster, allowing for efficient path guiding and rendering. In contrast to recent offline neural path guiding techniques that reconstruct low-resolution 2D images for sampling, our novel hierarchical framework enables more fine-grained directional sampling with less memory usage, effectively advancing the practicality and efficiency of neural path guiding. In addition, we take advantage of hybrid bidirectional samples including both path samples and photons, as we have found this more robust to different light transport scenarios than using only one type of sample as in previous work. Experiments on diverse testing scenes demonstrate that our approach often improves rendering results with better visual quality and lower errors. Our framework also provides a proper balance of speed, memory cost, and robustness.
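As a point of reference for how a hierarchical quadtree distribution drives directional sampling, the following sketch (a generic illustration with made-up node weights, not the paper's network or data structure) samples a point in the unit square by descending a quadtree in proportion to stored child weights and reports the corresponding density.

```python
import numpy as np

class QuadNode:
    """Quadtree node storing one non-negative weight per child; leaves have
    children = None."""
    def __init__(self, child_weights, children=None):
        self.w = np.asarray(child_weights, dtype=np.float64)  # shape (4,)
        self.children = children                               # 4 nodes or None

def sample(node, rng):
    """Descend the quadtree, picking each child with probability proportional
    to its weight; return a point in [0,1)^2 and its density on the unit square."""
    lo, size, pdf = np.zeros(2), 1.0, 1.0
    while node is not None:
        p = node.w / node.w.sum()
        c = rng.choice(4, p=p)
        pdf *= 4.0 * p[c]                       # each child covers 1/4 the area
        size *= 0.5
        lo = lo + size * np.array([c % 2, c // 2])
        node = node.children[c] if node.children else None
    u = lo + size * rng.random(2)               # uniform within the leaf cell
    return u, pdf

rng = np.random.default_rng(0)
leaf = QuadNode([1.0, 2.0, 3.0, 4.0])
root = QuadNode([1.0, 1.0, 6.0, 2.0], [leaf, leaf, leaf, leaf])
print(sample(root, rng))
```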
We present a real-time neural radiance caching method for path-traced global illumination. Our system is designed to handle fully dynamic scenes, and makes no assumptions about the lighting, geometry, and materials. The data-driven nature of our approach sidesteps many difficulties of caching algorithms, such as locating, interpolating, and updating cache points. Since pretraining neural networks to handle novel, dynamic scenes is a formidable generalization challenge, we do away with pretraining and instead achieve generalization via adaptation, i.e. we opt for training the radiance cache while rendering. We employ self-training to provide low-noise training targets and simulate infinite-bounce transport by merely iterating few-bounce training updates. The updates and cache queries incur a mild overhead---about 2.6ms on full HD resolution---thanks to a streaming implementation of the neural network that fully exploits modern hardware. We demonstrate significant noise reduction at the cost of little induced bias, and report state-of-the-art, real-time performance on a number of challenging scenarios.
High-quality denoising of Monte Carlo low-sample renderings remains a critical challenge for practical interactive ray tracing. We present a new learning-based denoiser that achieves state-of-the-art quality and runs at interactive rates. Our model processes individual path-traced samples with a lightweight neural network to extract per-pixel feature vectors. The rest of our pipeline operates in pixel space. We define a novel pairwise affinity over the features in a pixel neighborhood, from which we assemble dilated spatial kernels to filter the noisy radiance. Our denoiser is temporally stable thanks to two mechanisms. First, we keep a running average of the noisy radiance and intermediate features, using a per-pixel recursive filter with learned weights. Second, we use a small temporal kernel based on the pairwise affinity between features of consecutive frames. Our experiments show that our new affinities lead to higher quality outputs than techniques with comparable computational costs, and better high-frequency details than kernel-predicting approaches. Our model matches or outperforms state-of-the-art offline denoisers in the low-sample count regime (2--8 samples per pixel), and runs at interactive frame rates at 1080p resolution.
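To make the affinity-based filtering concrete, here is a minimal numpy sketch (with random stand-in feature maps rather than the paper's learned per-pixel features) of weighting neighborhood radiance by exp(-||f_p - f_q||^2) over a dilated window.

```python
import numpy as np

def affinity_filter(radiance, features, radius=2, dilation=1):
    """Filter noisy radiance with weights exp(-||f_p - f_q||^2) between
    per-pixel feature vectors over a (dilated) square neighborhood.
    Boundaries wrap around (np.roll) for brevity."""
    H, W, _ = radiance.shape
    out = np.zeros_like(radiance)
    wsum = np.zeros((H, W, 1))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            oy, ox = dy * dilation, dx * dilation
            shifted_r = np.roll(radiance, (-oy, -ox), axis=(0, 1))
            shifted_f = np.roll(features, (-oy, -ox), axis=(0, 1))
            d2 = np.sum((features - shifted_f) ** 2, axis=-1, keepdims=True)
            w = np.exp(-d2)                      # pairwise affinity
            out += w * shifted_r
            wsum += w
    return out / np.maximum(wsum, 1e-8)

rng = np.random.default_rng(1)
denoised = affinity_filter(rng.random((64, 64, 3)), rng.random((64, 64, 8)))
print(denoised.shape)
```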
Image-space auxiliary features such as surface normal have significantly contributed to the recent success of Monte Carlo (MC) reconstruction networks. However, path-space features, another essential piece of light propagation, have not yet been sufficiently explored. Due to the curse of dimensionality, information flow between a regression loss and high-dimensional path-space features is sparse, leading to difficult training and inefficient usage of path-space features in a typical reconstruction framework. This paper introduces a contrastive manifold learning framework to utilize path-space features effectively. The proposed framework employs weakly-supervised learning that converts reference pixel colors to dense pseudo labels for light paths. A convolutional path-embedding network then induces a low-dimensional manifold of paths by iteratively clustering intra-class embeddings, while discriminating inter-class embeddings using gradient descent. The proposed framework facilitates path-space exploration of reconstruction networks by extracting low-dimensional yet meaningful embeddings within the features. We apply our framework to the recent image- and sample-space models and demonstrate considerable improvements, especially on the sample space. The source code is available at https://github.com/Mephisto405/WCMC.
We present Bistable Auxetic Surface Structures, a novel deployable material system based on optimized bistable auxetic cells. Such a structure can be flat-fabricated from elastic sheet material, then deployed towards a desired double-curved target shape by activating the bistable mechanism of its component cells. A unique feature is that the deployed model is by design in a stable state. This facilitates deployment without the need for complex external supports or boundary constraints.
We introduce a computational solution for the inverse design of our Bistable Auxetic Surface Structures. Our algorithm first precomputes a library of bistable auxetic cells to cover a range of in-plane expansion / contraction ratios, while maximizing the bistability and stiffness of the cell to ensure robust deployment. We then use metric distortion analysis of the target surface to compute the planar fabrication state as a composition of cells that best matches the desired deployment deformation. As each cell expands or contracts during deployment, metric frustration forces the surface towards its target equilibrium state. We validate our method with several physical prototypes.
We present a computational inverse design method for a new class of surface-based inflatable structure. Our deployable structures are fabricated by fusing together two layers of inextensible sheet material along carefully selected curves. The fusing curves form a network of tubular channels that can be inflated with air or other fluids. When fully inflated, the initially flat surface assumes a programmed double-curved shape and becomes stiff and load-bearing. We present a method that solves for the layout of air channels that, when inflated, best approximate a given input design. For this purpose, we integrate a forward simulation method for inflation with a gradient-based optimization algorithm that continuously adapts the geometry of the air channels to improve the design objectives. To initialize this non-linear optimization, we propose a novel surface flattening algorithm. When a channel is inflated, it approximately maintains its length, but contracts transversally to its main direction. Our algorithm approximates this deformation behavior by computing a mapping from the 3D design surface to the plane that allows for anisotropic metric scaling within the bounds realizable by the physical system. We show a wide variety of inflatable designs and fabricate several prototypes to validate our approach and highlight potential applications.
We propose a novel method to model and fabricate shapes using a small set of specified discrete equivalence classes of triangles. The core of our modeling technique is a fabrication-error-driven remeshing algorithm. Given a triangle and a template triangle, which are coplanar and have one-to-one corresponding vertices, we define their similarity error from a manufacturing point of view as the minimum, over all rigid transformations, of the maximum of the three distances between corresponding vertex pairs. To compute the similarity error, we convert it into an easy-to-compute form. Then, a greedy remeshing method is developed to optimize the topology and geometry of the input mesh so as to minimize the fabrication error, defined as the maximum similarity error over all triangles. In addition, constraints are enforced to ensure the similarity between the input and output shapes and the smoothness of the resulting shapes. Since the fabrication error has already been accounted for during modeling, the fabrication process is straightforward. To assist users in fabricating the results manually with common materials and tools, we present a simple manufacturing solution. The feasibility and practicability of our method are demonstrated on various examples, including seven physically manufactured models that use only nine template triangles.
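Because the two triangles are coplanar, the similarity error can be read as a small 2D optimization: the minimum, over planar rigid transforms, of the maximum distance between corresponding vertices. The brute-force sketch below (a hypothetical baseline; the paper instead converts the error into an easy-to-compute form) spells the definition out.

```python
import numpy as np
from scipy.optimize import minimize

def similarity_error(tri, template):
    """min over 2D rigid transforms of max_i ||R(theta) t_i + c - v_i||,
    for coplanar triangles 'tri' and 'template' (3x2 arrays, matched vertices)."""
    tri, template = np.asarray(tri, float), np.asarray(template, float)

    def max_dist(params):
        theta, tx, ty = params
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        moved = template @ R.T + np.array([tx, ty])
        return np.max(np.linalg.norm(moved - tri, axis=1))

    # Nelder-Mead copes with the non-smooth max; restart from several angles.
    best = np.inf
    for theta0 in np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False):
        x0 = np.concatenate([[theta0], tri.mean(0) - template.mean(0)])
        res = minimize(max_dist, x0, method="Nelder-Mead")
        best = min(best, res.fun)
    return best

print(similarity_error([[0, 0], [1, 0], [0, 1]],
                       [[0.1, 0], [1.05, 0], [0, 0.95]]))
```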
We solve the task of representing free forms by an arrangement of panels that are manufacturable by precise isometric bending of surfaces made from a small number of molds. In fact we manage to solve the paneling task with surfaces of constant Gaussian curvature alone. This includes the case of developable surfaces which exhibit zero curvature. Our computations are based on an existing discrete model of isometric mappings between surfaces which for this occasion has been refined to obtain higher numerical accuracy. Further topics are interesting connections of the paneling problem with the geometry of Killing vector fields, designing and actuating isometries, curved folding in the double-curved case, and quad meshes with rigid faces that are nevertheless flexible.
We propose a novel system for portrait relighting and background replacement, which maintains high-frequency boundary details and accurately synthesizes the subject's appearance as lit by novel illumination, thereby producing realistic composite images for any desired scene. Our technique includes foreground estimation via alpha matting, relighting, and compositing. We demonstrate that each of these stages can be tackled in a sequential pipeline without the use of priors (e.g. known background or known illumination) and with no specialized acquisition techniques, using only a single RGB portrait image and a novel, target HDR lighting environment as inputs. We train our model using relit portraits of subjects captured in a light stage computational illumination system, which records multiple lighting conditions, high quality geometry, and accurate alpha mattes. To perform realistic relighting for compositing, we introduce a novel per-pixel lighting representation in a deep learning framework, which explicitly models the diffuse and the specular components of appearance, producing relit portraits with convincingly rendered non-Lambertian effects like specular highlights. Multiple experiments and comparisons show the effectiveness of the proposed approach when applied to in-the-wild images.
Photorealistic editing of head portraits is a challenging task as humans are very sensitive to inconsistencies in faces. We present an approach for high-quality intuitive editing of the camera viewpoint and scene illumination (parameterised with an environment map) in a portrait image. This requires our method to capture and control the full reflectance field of the person in the image. Most editing approaches rely on supervised learning using training data captured with setups such as light and camera stages. Such datasets are expensive to acquire, not readily available and do not capture all the rich variations of in-the-wild portrait images. In addition, most supervised approaches only focus on relighting, and do not allow camera viewpoint editing. Thus, they only capture and control a subset of the reflectance field. Recently, portrait editing has been demonstrated by operating in the generative model space of StyleGAN. While such approaches do not require direct supervision, there is a significant loss of quality when compared to the supervised approaches. In this paper, we present a method which learns from limited supervised training data. The training images only include people in a fixed neutral expression with eyes closed, without much hair or background variations. Each person is captured under 150 one-light-at-a-time conditions and under 8 camera poses. Instead of training directly in the image space, we design a supervised problem which learns transformations in the latent space of StyleGAN. This combines the best of supervised learning and generative adversarial modeling. We show that the StyleGAN prior allows for generalisation to different expressions, hairstyles and backgrounds. This produces high-quality photorealistic results for in-the-wild images and significantly outperforms existing methods. Our approach can edit the illumination and pose simultaneously, and runs at interactive rates.
The task of age transformation illustrates the change of an individual's appearance over time. Accurately modeling this complex transformation over an input facial image is extremely challenging as it requires making convincing, possibly large changes to facial features and head shape, while still preserving the input identity. In this work, we present an image-to-image translation method that learns to directly encode real facial images into the latent space of a pre-trained unconditional GAN (e.g., StyleGAN) subject to a given aging shift. We employ a pre-trained age regression network to explicitly guide the encoder in generating the latent codes corresponding to the desired age. In this formulation, our method approaches the continuous aging process as a regression task between the input age and desired target age, providing fine-grained control over the generated image. Moreover, unlike approaches that operate solely in the latent space using a prior on the path controlling age, our method learns a more disentangled, non-linear path. Finally, we demonstrate that the end-to-end nature of our approach, coupled with the rich semantic latent space of StyleGAN, allows for further editing of the generated images. Qualitative and quantitative evaluations show the advantages of our method compared to state-of-the-art approaches. Code is available at our project page: https://yuval-alaluf.github.io/SAM.
Facial structure editing of portrait images is challenging given the variety of faces, the lack of ground truth, the need to jointly adjust color and shape, and the requirement of introducing no visual artifacts. In this paper, we investigate chin editing as a case study of editing facial structures. We present a novel method that can automatically remove the double chin effect in portrait images. Our core idea is to train a fine classification boundary in the latent space of the portrait images. This boundary can be used to edit the chin appearance by manipulating the latent code of the input portrait image while preserving the original portrait features. To achieve such a fine separation boundary, we employ a carefully designed training stage based on latent codes of paired synthetic images with and without a double chin. At test time, our method handles portrait images automatically, requiring only a refinement step to correct subtle misalignment between the portrait before and after double chin editing. Our model alters the neck region of the input portrait while keeping other regions unchanged, and guarantees a plausible neck structure and consistent facial characteristics. To the best of our knowledge, this is the first effort towards an effective method for editing double chins. We validate the efficacy and efficiency of our approach through extensive experiments and user studies.
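In the spirit of latent-space editing across a learned separation boundary, a minimal sketch might look as follows; the boundary normal, encoder E, and generator G are placeholders, not the paper's trained models.

```python
import numpy as np

def edit_latent(w, boundary_normal, strength):
    """Move a latent code along the unit normal of a linear separation
    boundary; the sign of 'strength' selects which side (e.g., toward the
    'no double chin' class)."""
    n = boundary_normal / np.linalg.norm(boundary_normal)
    return w + strength * n

# hypothetical usage with a pretrained encoder E and generator G (placeholders):
#   w_edit = edit_latent(E(portrait), n_double_chin, strength=-3.0)
#   result = G(w_edit)
```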
Virtual and augmented reality (VR/AR) displays strive to provide a resolution, framerate and field of view that matches the perceptual capabilities of the human visual system, all while constrained by limited compute budgets and transmission bandwidths of wearable computing systems. Foveated graphics techniques have emerged that could achieve these goals by exploiting the falloff of spatial acuity in the periphery of the visual field. However, considerably less attention has been given to temporal aspects of human vision, which also vary across the retina. This is in part due to limitations of current eccentricity-dependent models of the visual system. We introduce a new model, experimentally measuring and computationally fitting eccentricity-dependent critical flicker fusion thresholds jointly for both space and time. In this way, our model is unique in enabling the prediction of temporal information that is imperceptible for a certain spatial frequency, eccentricity, and range of luminance levels. We validate our model with an image quality user study, and use it to predict potential bandwidth savings 7× higher than those afforded by current spatial-only foveated models. As such, this work forms the enabling foundation for new temporally foveated graphics techniques.
To peripheral vision, a pair of physically different images can look the same. Such pairs are metamers relative to each other, just as physically-different spectra of light are perceived as the same color. We propose a real-time method to compute such ventral metamers for foveated rendering where, in particular for near-eye displays, the largest part of the framebuffer maps to the periphery. This improves in quality over state-of-the-art foveation methods which blur the periphery. Work in Vision Science has established how peripheral stimuli are ventral metamers if their statistics are similar. Existing methods, however, require a costly optimization process to find such metamers. To this end, we propose a novel type of statistics particularly well-suited for practical real-time rendering: smooth moments of steerable filter responses. These can be extracted from images in time constant in the number of pixels and in parallel over all pixels using a GPU. Further, we show that they can be compressed effectively and transmitted at low bandwidth. Finally, computing realizations of those statistics can again be performed in constant time and in parallel. This enables a new level of quality for foveated applications such as remote rendering, level-of-detail and Monte-Carlo denoising. In a user study, we finally show how human task performance increases and foveation artifacts are less suspicious, when using our method compared to common blurring.
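As a simplified illustration of the kind of statistics involved, the sketch below pools first and second moments of simple oriented filter responses with Gaussian blurs; the paper's actual construction uses steerable filters and eccentricity-dependent pooling, which this toy does not reproduce.

```python
import numpy as np
from scipy.ndimage import convolve, gaussian_filter

def pooled_moments(gray, pool_sigma=8.0):
    """Local first and second moments of oriented filter responses, pooled
    with a Gaussian (whose radius would grow with eccentricity)."""
    kx = np.array([[-1.0, 0.0, 1.0]])   # crude oriented filters standing in
    ky = kx.T                            # for a steerable pyramid level
    stats = []
    for k in (kx, ky):
        r = convolve(gray, k, mode="nearest")
        mean = gaussian_filter(r, pool_sigma)                 # first moment
        var = gaussian_filter(r * r, pool_sigma) - mean ** 2  # second moment
        stats.extend([mean, var])
    return np.stack(stats, axis=-1)

rng = np.random.default_rng(2)
print(pooled_moments(rng.random((128, 128))).shape)   # (128, 128, 4)
```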
FovVideoVDP is a video difference metric that models the spatial, temporal, and peripheral aspects of perception. While many other metrics are available, our work provides the first practical treatment of these three central aspects of vision simultaneously. The complex interplay between spatial and temporal sensitivity across retinal locations is especially important for displays that cover a large field-of-view, such as Virtual and Augmented Reality displays, and associated methods, such as foveated rendering. Our metric is derived from psychophysical studies of the early visual system, which model spatio-temporal contrast sensitivity, cortical magnification and contrast masking. It accounts for physical specification of the display (luminance, size, resolution) and viewing distance. To validate the metric, we collected a novel foveated rendering dataset which captures quality degradation due to sampling and reconstruction. To demonstrate our algorithm's generality, we test it on 3 independent foveated video datasets, and on a large image quality dataset, achieving the best performance across all datasets when compared to the state-of-the-art.
When creating freeform drawings, artists routinely employ clusters of overdrawn strokes to convey intended, aggregate curves. The ability to algorithmically fit these intended curves to their corresponding clusters is central to many applications that use artist drawings as inputs. However, while human observers effortlessly envision the intended curves given stroke clusters as input, existing fitting algorithms lack robustness and frequently fail when presented with input stroke clusters with non-trivial geometry or topology. We present StrokeStrip, a new and robust method for fitting intended curves to vector-format stroke clusters. Our method generates fitting outputs consistent with viewer expectations across a vast range of input stroke cluster configurations. We observe that viewers perceive stroke clusters as continuous, varying-width strips whose paths are described by the intended curves. An arc length parameterization of these strips defines a natural mapping from a strip to its path. We recast the curve fitting problem as one of parameterizing the cluster strokes using a joint 1D parameterization that is the restriction of the natural arc length parameterization of this strip to the strokes in the cluster. We simultaneously compute the joint cluster parameterization and implicitly reconstruct the a priori unknown strip geometry by solving a variational problem using a discrete-continuous optimization framework. We use this parameterization to compute parametric aggregate curves whose shape reflects the geometric properties of the cluster strokes at the corresponding isovalues. We demonstrate StrokeStrip outputs to be significantly better aligned with observer preferences compared to those of prior art; in a perceptual study, viewers preferred our fitting outputs by a factor of 12:1 compared to alternatives. We further validate our algorithmic choices via a range of ablation studies; extend our framework to raster data; and illustrate applications that benefit from the parameterizations produced.
Vector line art plays an important role in graphic design, however, it is tedious to manually create. We introduce a general framework to produce line drawings from a wide variety of images, by learning a mapping from raster image space to vector image space. Our approach is based on a recurrent neural network that draws the lines one by one. A differentiable rasterization module allows for training with only supervised raster data. We use a dynamic window around a virtual pen while drawing lines, implemented with a proposed aligned cropping and differentiable pasting modules. Furthermore, we develop a stroke regularization loss that encourages the model to use fewer and longer strokes to simplify the resulting vector image. Ablation studies and comparisons with existing methods corroborate the efficiency of our approach which is able to generate visually better results in less computation time, while generalizing better to a diversity of images and applications.
Non-photorealistic rendering (NPR) and image processing algorithms are widely assumed to serve as a proxy for drawing. However, this assumption has not been well assessed due to the difficulty of collecting and registering freehand drawings. Alternatively, tracings are easier to collect and register, but there is no quantitative evaluation of tracing as a proxy for freehand drawing. In this paper, we compare tracing, freehand drawing, and computer-generated drawing approximation (CGDA) to understand their similarities and differences. We collected a dataset of 1,498 tracings and freehand drawings by 110 participants for 100 image prompts. Our drawings are registered to the prompts and include vector-based timestamped strokes collected via stylus input. Comparing tracing and freehand drawing, we found a high degree of similarity in stroke placement and types of strokes used over time. We show that tracing can serve as a viable proxy for freehand drawing because of similar correlations between spatio-temporal stroke features and labeled stroke types. Comparing hand-drawn content and current CGDA output, we found that 60% of drawn pixels corresponded to computer-generated pixels on average. The overlap tended to be commonly drawn content, but people's artistic choices and temporal tendencies remained largely uncaptured. We present an initial analysis to inform new CGDA algorithms and drawing applications, and provide the dataset for use by the community.
We present a novel representation of solid models for shape design. Like Constructive Solid Geometry (CSG), the solid shape is constructed from a set of halfspaces without the need for an explicit boundary structure. Instead of using Boolean expressions as in CSG, the shape is defined by sparsely placed samples on the boundary of each halfspace. This representation, called Boundary-Sampled Halfspaces (BSH), affords greater agility and expressiveness than CSG while simplifying the reverse engineering process. We discuss theoretical properties of the representation and present practical algorithms for boundary extraction and conversion from other representations. Our algorithms are demonstrated on both 2D and 3D examples.
Parametric computer-aided design (CAD) is a standard paradigm used to design manufactured objects, where a 3D shape is represented as a program supported by the CAD software. Despite the pervasiveness of parametric CAD and a growing interest from the research community, currently there does not exist a dataset of realistic CAD models in a concise programmatic form. In this paper we present the Fusion 360 Gallery, consisting of a simple language with just the sketch and extrude modeling operations, and a dataset of 8,625 human design sequences expressed in this language. We also present an interactive environment called the Fusion 360 Gym, which exposes the sequential construction of a CAD program as a Markov decision process, making it amenable to machine learning approaches. As a use case for our dataset and environment, we define the CAD reconstruction task of recovering a CAD program from a target geometry. We report results of applying state-of-the-art methods of program synthesis with neurally guided search on this task.
Given a solid 3D shape and a trajectory of it over time, we compute its swept volume - the union of all points contained within the shape at some moment in time. We consider the representation of the input and output as implicit functions, and lift the problem to 4D spacetime, where we show the problem gains a continuous structure which avoids expensive global searches. We exploit this structure via a continuation method which marches and reconstructs the zero level set of the swept volume, using the temporal dimension to avoid erroneous solutions. We show that, compared to other methods, our approach is not restricted to a limited class of shapes or trajectories, is extremely robust, and its asymptotic complexity is an order lower than standards used in the industry, enabling its use in applications such as modeling, constructive solid geometry, and path planning.
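For intuition, the swept volume admits the implicit description g(x) = min over t of f(T(t)^{-1} x), where f is the shape's signed distance and T(t) its trajectory. The brute-force time sampling below makes that definition concrete; it is exactly the kind of global search over time that the paper's spacetime continuation method is designed to avoid.

```python
import numpy as np

def sphere_sdf(p, radius=0.3):
    return np.linalg.norm(p, axis=-1) - radius

def swept_implicit(points, times=np.linspace(0.0, 1.0, 64)):
    """Brute-force swept-volume implicit: min over sampled times of the shape's
    SDF in its moving frame (here, a sphere translating along a sine arc).
    The zero level set is the swept boundary; inside, it is not a true distance."""
    vals = []
    for t in times:
        offset = np.array([t, 0.5 * np.sin(2.0 * np.pi * t), 0.0])  # trajectory
        vals.append(sphere_sdf(points - offset))
    return np.min(vals, axis=0)   # union over time = pointwise minimum

pts = np.random.default_rng(3).uniform(-1.0, 2.0, size=(5, 3))
print(swept_implicit(pts))
```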
Online reconstruction based on RGB-D sequences has thus far been restricted to relatively slow camera motions (<1m/s). Under very fast camera motion (e.g., 3m/s), the reconstruction can easily crumble even for the state-of-the-art methods. Fast motion brings two challenges to depth fusion: 1) the high nonlinearity of camera pose optimization due to large inter-frame rotations and 2) the lack of reliably trackable features due to motion blur. We propose to tackle the difficulties of fast-motion camera tracking in the absence of inertial measurements using random optimization, in particular, the Particle Filter Optimization (PFO). To surmount the computation-intensive particle sampling and update in standard PFO, we propose to accelerate the randomized search via updating a particle swarm template (PST). PST is a set of particles pre-sampled uniformly within the unit sphere in the 6D space of camera pose. Through moving and rescaling the pre-sampled PST guided by swarm intelligence, our method is able to drive tens of thousands of particles to locate and cover a good local optimum extremely fast and robustly. The particles, representing candidate poses, are evaluated with a fitness function defined based on depth-model conformance. Therefore, our method, being depth-only and correspondence-free, mitigates the motion blur impediment as (ToF-based) depths are often resilient to motion blur. Thanks to the efficient template-based particle set evolution and the effective fitness function, our method attains good-quality pose tracking under fast camera motion (up to 4m/s) at a real-time framerate without loop closure or global pose optimization. Through extensive evaluations on public datasets of RGB-D sequences, especially on a newly proposed benchmark of fast camera motion, we demonstrate the significant advantage of our method over the state of the art.
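A much-simplified sketch of the particle-swarm-template idea follows; the 6D pose space, fitness function, and rescaling schedule are toy stand-ins for the depth-model-conformance fitness evaluated on the GPU in the actual system.

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_unit_ball(n, dim=6):
    """Pre-sample the particle swarm template: points in the unit ball of the
    6D pose-increment space (3 rotation + 3 translation parameters)."""
    x = rng.normal(size=(n, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x * rng.random((n, 1)) ** (1.0 / dim)

PST = sample_unit_ball(10000)
TRUE_POSE = np.array([0.2, -0.1, 0.05, 0.3, -0.2, 0.1])  # toy ground truth

def fitness(pose):
    """Stand-in for depth-model conformance (higher is better)."""
    return -np.linalg.norm(pose - TRUE_POSE, axis=-1)

def track(center, scale=0.5, iters=20):
    """Move and rescale the pre-sampled template around the current best pose."""
    for _ in range(iters):
        candidates = center + scale * PST
        scores = fitness(candidates)
        best = candidates[np.argmax(scores)]
        if scores.max() > fitness(center):
            center, scale = best, scale * 0.7   # contract around the improvement
        else:
            scale *= 1.1                        # otherwise widen the search
    return center

print(track(np.zeros(6)))   # converges near TRUE_POSE
```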
Complex luminaires, such as grand chandeliers, can be extremely costly to render because the light-emitting sources are typically encased in complex refractive geometry, creating difficult light paths that require many samples to evaluate with Monte Carlo approaches. Previous work has attempted to speed up this process, but the methods are either inaccurate, require the storage of very large lightfields, or do not fit well into modern path-tracing frameworks. Inspired by the success of deep networks, which can model complex relationships robustly and be evaluated efficiently, we propose to use a machine learning framework to compress a complex luminaire's lightfield into an implicit neural representation. Our approach can easily plug into conventional renderers, as it works with the standard techniques of path tracing and multiple importance sampling (MIS). Our solution is to train three networks to perform the essential operations for evaluating the complex luminaire at a specific point and view direction, importance sampling a point on the luminaire given a shading location, and blending to determine the transparency of luminaire queries to properly composite them with other scene elements. We perform favorably relative to state-of-the-art approaches and render final images that are close to the high-sample-count reference with only a fraction of the computation and storage costs, with no need to store the original luminaire geometry and materials.
Neural representations have emerged as a new paradigm for applications in rendering, imaging, geometric modeling, and simulation. Compared to traditional representations such as meshes, point clouds, or volumes they can be flexibly incorporated into differentiable learning-based pipelines. While recent improvements to neural representations now make it possible to represent signals with fine details at moderate resolutions (e.g., for images and 3D shapes), adequately representing large-scale or complex scenes has proven a challenge. Current neural representations fail to accurately represent images at resolutions greater than a megapixel or 3D scenes with more than a few hundred thousand polygons. Here, we introduce a new hybrid implicit-explicit network architecture and training strategy that adaptively allocates resources during training and inference based on the local complexity of a signal of interest. Our approach uses a multiscale block-coordinate decomposition, similar to a quadtree or octree, that is optimized during training. The network architecture operates in two stages: using the bulk of the network parameters, a coordinate encoder generates a feature grid in a single forward pass. Then, hundreds or thousands of samples within each block can be efficiently evaluated using a lightweight feature decoder. With this hybrid implicit-explicit network architecture, we demonstrate the first experiments that fit gigapixel images to nearly 40 dB peak signal-to-noise ratio. Notably, this represents an increase in scale of over 1000× compared to the resolution of previously demonstrated image-fitting experiments. Moreover, our approach is able to represent 3D shapes significantly faster and better than previous techniques; it reduces training times from days to hours or minutes and memory requirements by over an order of magnitude.
Real-time rendering and animation of humans is a core function in games, movies, and telepresence applications. Existing methods have a number of drawbacks we aim to address with our work. Triangle meshes have difficulty modeling thin structures like hair, volumetric representations like Neural Volumes are too low-resolution given a reasonable memory budget, and high-resolution implicit representations like Neural Radiance Fields are too slow for use in real-time applications. We present Mixture of Volumetric Primitives (MVP), a representation for rendering dynamic 3D content that combines the completeness of volumetric representations with the efficiency of primitive-based rendering, e.g., point-based or mesh-based methods. Our approach achieves this by leveraging spatially shared computation with a convolutional architecture and by minimizing computation in empty regions of space with volumetric primitives that can move to cover only occupied regions. Our parameterization supports the integration of correspondence and tracking constraints, while being robust to areas where classical tracking fails, such as around thin or translucent structures and areas with large topological variability. MVP is a hybrid that generalizes both volumetric and primitive-based representations. Through a series of extensive experiments we demonstrate that it inherits the strengths of each, while avoiding many of their limitations. We also compare our approach to several state-of-the-art methods and demonstrate that MVP produces superior results in terms of quality and runtime performance.
This paper proposes a novel scalable image-based rendering (IBR) pipeline for indoor scenes with reflections. We make substantial progress towards three sub-problems in IBR, namely, depth and reflection reconstruction, view selection for temporally coherent view-warping, and smooth rendering refinements. First, we introduce a global-mesh-guided alternating optimization algorithm that robustly extracts a two-layer geometric representation. The front and back layers encode the RGB-D reconstruction and the reflection reconstruction, respectively. This representation minimizes the image composition error under novel views, enabling accurate renderings of reflections. Second, we introduce a novel approach to select adjacent views and compute blending weights for smooth and temporally coherent renderings. The third contribution is a supersampling network with a motion vector rectification module that refines the rendering results to improve the final output's temporal coherence. These three contributions together lead to a novel system that produces highly realistic rendering results with various reflections. The rendering quality outperforms state-of-the-art IBR or neural rendering algorithms considerably.
Time-correlated imaging is an emerging sensing modality that has been shown to enable promising application scenarios, including lidar ranging, fluorescence lifetime imaging, and even non-line-of-sight sensing. A leading technology for obtaining time-correlated light measurements is the single-photon avalanche diode (SPAD), which is extremely sensitive and capable of temporal resolution on the order of tens of picoseconds. However, the rare and expensive optical setups used by researchers have so far prohibited these novel sensing techniques from entering the mass market. Fortunately, SPADs also exist in a radically cheaper and more power-efficient version that has been widely deployed as proximity sensors in mobile devices for almost a decade. These commodity SPAD sensors can be obtained at a mere few cents per detector pixel. However, their inferior data quality and severe technical drawbacks compared to their high-end counterparts necessitate the use of additional optics and suitable processing algorithms. In this paper, we adopt an existing evaluation platform for commodity SPAD sensors, and modify it to unlock time-of-flight (ToF) histogramming and hence computational imaging. Based on this platform, we develop and demonstrate a family of hardware/software systems that, for the first time, implement applications that had so far been limited to significantly more advanced, higher-priced setups: direct ToF depth imaging, non-line-of-sight object tracking, and material classification.
In this paper, we present a new computational pipeline for designing and fabricating 4D garments as knitwear that considers comfort during body movement. This is achieved by careful control of elasticity distribution to reduce uncomfortable pressure and unwanted sliding caused by body motion. We exploit the ability to knit patterns at different elasticity levels using single-jersey jacquard (SJJ) with two yarns. We design the distribution of elasticity for a garment by physics-based computation; the optimized elasticity is then converted into instructions for a digital knitting machine by two algorithms proposed in this paper. Specifically, a graph-based algorithm is proposed to generate knittable stitch meshes that can accurately capture the 3D shape of a garment, and a tiling algorithm is employed to assign SJJ patterns on the stitch mesh to realize the designed distribution of elasticity. The effectiveness of our approach is verified through simulations and on specimens physically fabricated on knitting machines.
We present a novel workflow to design and program knitted garments for industrial whole-garment knitting machines. Inspired by traditional garment making based on cutting and sewing, we propose a sketch representation with additional annotations necessary to model the knitting process. Our system bypasses complex editing operations in 3D space, which allows us to achieve interactive editing of both the garment shape and its underlying time process. We provide control of the local knitting direction, the location of important course interfaces, as well as the placement of stitch irregularities that form seams in the final garment. After solving for the constrained knitting time process, the garment sketches are automatically segmented into a minimal set of simple regions that can be knitted using simple knitting procedures. Finally, our system optimizes a stitch graph hierarchically while providing control over the tradeoff between accuracy and simplicity. We showcase different garments created with our web interface.
In this work, we introduce KnitKit, a flexible and customizable system for the computational design and production of functional, multi-material, and three-dimensional knitted textiles. Our system greatly simplifies the knitting of 3D objects with complex, varying patterns that use multiple yarns and stitch patterns by separating the high-level design specification in terms of geometry, stitch patterns, materials or colors from the low-level, machine-specific knitting instruction generation. Starting from a triangular 3D mesh and a 2D texture that specifies knitting patterns on top of the geometry, our system generates the required machine instructions in three major steps. First, the input is processed and the KnitNet data structure is generated. This graph structure serves as an abstract interface between the high-level geometric and knitting configuration and the low-level, machine-specific knitting instructions. Second, a graph rewriting procedure is applied on the KnitNet that produces a sequence of abstract machine actions. Finally, the low-level machine instructions are generated by adapting those abstract actions to a specific machine context. We showcase the potential of this computational approach by designing and fabricating a variety of objects with complex geometries, multiple yarns, and multiple stitch patterns.
Foundation paper piecing is a popular technique for constructing fabric patchwork quilts using printed paper patterns. But, the construction process imposes constraints on the geometry of the pattern and the order in which the fabric pieces are attached to the quilt. Manually designing foundation paper pieceable patterns that meet all of these constraints is challenging. In this work we mathematically formalize the foundation paper piecing process and use this formalization to develop an algorithm that can automatically check if an input pattern geometry is foundation paper pieceable. Our key insight is that we can represent the geometric pattern design using a certain type of dual hypergraph where nodes represent faces and hyperedges represent seams connecting two or more nodes. We show that determining whether the pattern is paper pieceable is equivalent to checking whether this hypergraph is acyclic, and if it is acyclic, we can apply a leaf-plucking algorithm to the hypergraph to generate viable sewing orders for the pattern geometry. We implement this algorithm in a design tool that allows quilt designers to focus on producing the geometric design of their pattern and let the tool handle the tedious task of determining whether the pattern is foundation paper pieceable.
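For intuition, hypergraph acyclicity of this kind can be tested with the classic GYO reduction, which eliminates lonely vertices and contained hyperedges until nothing remains; the sketch below is in that spirit and is not necessarily the paper's exact leaf-plucking procedure or its sewing-order derivation.

```python
def gyo_reduction(hyperedges):
    """Classic GYO reduction: repeatedly (1) drop vertices that occur in only
    one hyperedge and (2) drop hyperedges that are empty or contained in
    another hyperedge. The hypergraph is (alpha-)acyclic iff everything is
    eliminated; the elimination order of hyperedges is recorded."""
    edges = {i: set(e) for i, e in enumerate(hyperedges)}
    order = []
    changed = True
    while changed and edges:
        changed = False
        counts = {}
        for e in edges.values():                 # rule 1: lonely vertices
            for v in e:
                counts[v] = counts.get(v, 0) + 1
        for e in edges.values():
            lonely = {v for v in e if counts[v] == 1}
            if lonely:
                e -= lonely
                changed = True
        for i in list(edges):                    # rule 2: contained/empty edges
            if not edges[i] or any(j != i and edges[i] <= edges[j] for j in edges):
                order.append(i)
                del edges[i]
                changed = True
    return len(edges) == 0, order

# faces a..e joined by three seams (hyperedges); acyclic, so an order exists
ok, order = gyo_reduction([{"a", "b"}, {"b", "c", "d"}, {"d", "e"}])
print(ok, order)
```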
We introduce a selected set of protocols inspired by the Soft Matter Physics community in order to validate Computer Graphics simulators of slender elastic structures possibly subject to dry frictional contact. Although these simulators were primarily intended for feature film animation and visual effects, they are increasingly used as virtual design tools for predicting the shape and deformation of real objects; hence the need for a careful, quantitative validation. Our tests, experimentally verified, are designed to carefully evaluate the predictability of these simulators on various aspects, such as bending elasticity, bend-twist coupling, and frictional contact. We have run a number of popular Computer Graphics codes through our benchmarks, defining a rigorous, consistent, and as fair as possible methodology. Our results show that while some popular simulators for plates/shells and frictional contact fail even on the simplest scenarios, more recent ones, as well as well-known codes for rods, generally perform well and sometimes even better than some reference commercial tools of Mechanical Engineering. To make our validation protocols easily applicable to any simulator, we provide an extensive description of our methodology, and we shall distribute all the necessary model data to be compared against.
Fiducial markers have been broadly used to identify objects or embed messages that can be detected by a camera. Primarily, existing detection methods assume that markers are printed on ideally planar surfaces. The size of a message or identification code is limited by the spatial resolution of binary patterns in a marker. Markers often fail to be recognized due to various imaging artifacts of optical/perspective distortion and motion blur. To overcome these limitations, we propose a novel deformable fiducial marker system that consists of three main parts: First, a fiducial marker generator creates a set of free-form color patterns to encode significantly large-scale information in unique visual codes. Second, a differentiable image simulator creates a training dataset of photorealistic scene images with the deformed markers, being rendered during optimization in a differentiable manner. The rendered images include realistic shading with specular reflection, optical distortion, defocus and motion blur, color alteration, imaging noise, and shape deformation of markers. Lastly, a trained marker detector seeks the regions of interest and recognizes multiple marker patterns simultaneously via inverse deformation transformation. The deformable marker creator and detector networks are jointly optimized via the differentiable photorealistic renderer in an end-to-end manner, allowing us to robustly recognize a wide range of deformable markers with high accuracy. Our deformable marker system is capable of decoding 36-bit messages successfully at ~29 fps with severe shape deformation. Results validate that our system significantly outperforms the traditional and data-driven marker methods. Our learning-based marker system opens up new interesting applications of fiducial markers, including cost-effective motion capture of the human body, active 3D scanning using our fiducial markers' array as structured light patterns, and robust augmented reality rendering of virtual objects on dynamic surfaces.
This paper provides a new avenue for exploiting deep neural networks to improve physics-based simulation. Specifically, we integrate classic Lagrangian mechanics with a deep autoencoder to accelerate elastic simulation of deformable solids. Due to the inertia effect, the dynamic equilibrium cannot be established without evaluating the second-order derivatives of the deep autoencoder network. This is beyond the capability of off-the-shelf automatic differentiation packages and algorithms, which mainly focus on the gradient evaluation. Solving the nonlinear force equilibrium is even more challenging if the standard Newton's method is to be used. This is because we need to compute a third-order derivative of the network to obtain the variational Hessian. We attack these difficulties by exploiting complex-step finite differences, coupled with reverse automatic differentiation. This strategy allows us to enjoy the convenience and accuracy of complex-step finite differences and, at the same time, to deploy complex-valued perturbations as collectively as possible to save excessive network passes. With a GPU-based implementation, we are able to wield deep autoencoders (e.g., 10+ layers) with a relatively high-dimensional latent space in real time. Along this pipeline, we also design a sampling network and a weighting network to enable weight-varying Cubature integration in order to incorporate nonlinearity in the model reduction. We believe this work will inspire and benefit future research efforts in nonlinearly reduced physical simulation problems.
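The complex-step ingredient is easy to state in isolation: for a real-analytic f, the directional derivative is Im f(x + ihv)/h with no subtractive cancellation, and applying the same step to a gradient (reverse AD in practice) yields Hessian-vector products. A standalone numpy illustration on a toy function, not the autoencoder:

```python
import numpy as np

def f(x):                 # toy scalar function (stands in for the network energy)
    return np.sin(x[0]) * np.exp(x[1]) + x[0] * x[1] ** 2

def grad(x):              # analytic gradient (stands in for reverse-mode AD)
    return np.array([np.cos(x[0]) * np.exp(x[1]) + x[1] ** 2,
                     np.sin(x[0]) * np.exp(x[1]) + 2.0 * x[0] * x[1]])

h = 1e-30                 # tiny steps are safe: no subtractive cancellation
x = np.array([0.7, -0.3])
v = np.array([1.0, 2.0])

# directional derivative of f along v via a single complex-step evaluation
df_v = np.imag(f(x + 1j * h * v)) / h
print(df_v, grad(x) @ v)  # agree to machine precision

# Hessian-vector product: complex step applied to the gradient
Hv = np.imag(grad(x + 1j * h * v)) / h
print(Hv)
```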
We introduce a new method for direct physics-based animation of volumetric curved models, represented using NURBS surfaces. Our technical contribution is the Shape Matching Element Method (SEM). SEM is a completely meshless algorithm, the first to simultaneously be robust to gaps and overlaps in geometry, be compatible with standard constitutive models and time integration schemes, support contact and frictional interactions and to preserve feature correspondence during simulation which enables editable simulated output. We demonstrate the efficacy of our algorithm by producing compelling physics-based animations from a variety of curved input models.
Median filters are a widely-used tool in graphics, imaging, machine learning, visual effects, and even audio processing. Currently, very-small-support median filters are performed using sorting networks, and large-support median filters are handled by O(1) histogram-based methods. However, the constant factor on these O(1) algorithms is large, and they scale poorly to data types above 8-bit integers. On the other hand, good sorting networks have not been described above the 7×7 case, leaving us with no fast way to compute integer median filters of modest size, and no fast way to compute floating point median filters for any size above 7×7.
This paper describes new sorting networks that efficiently compute median filters of arbitrary size. The key idea is that these networks can be factored to exploit the separability of the sorting problem - they share common work across scanlines, and within small tiles of output. We also describe new ways to run sorting networks efficiently, using a sorting-specific instruction set, compiler, and interpreter.
The speed-up over prior work is more than an order of magnitude for a wide range of data types and filter sizes. For 8-bit integers, we describe the fastest median filters for all sizes up to 25×25 on CPU, and up to 33×33 on GPU. For higher-precision types, we describe the fastest median filters at all sizes tested on both CPU and GPU.
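For reference, the classic small-support case looks like this: a well-known 19-compare-exchange network for the median of a 3×3 window (the paper's contribution lies in factoring much larger networks across scanlines and output tiles, which this sketch does not attempt).

```python
import numpy as np

def median9(p):
    """Median of 9 values via a classic 19 compare-exchange sorting network.
    Works element-wise on numpy arrays, so one call filters many pixels."""
    p = list(p)
    def cx(i, j):
        p[i], p[j] = np.minimum(p[i], p[j]), np.maximum(p[i], p[j])
    for i, j in [(1, 2), (4, 5), (7, 8), (0, 1), (3, 4), (6, 7),
                 (1, 2), (4, 5), (7, 8), (0, 3), (5, 8), (4, 7),
                 (3, 6), (1, 4), (2, 5), (4, 7), (4, 2), (6, 4), (4, 2)]:
        cx(i, j)
    return p[4]

# sanity check against np.median on random data
x = np.random.default_rng(5).random((9, 10000))
assert np.allclose(median9(x), np.median(x, axis=0))
```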
Imaging systems have long been designed in separate steps: experience-driven optical design followed by sophisticated image processing. Although recent advances in computational imaging aim to bridge the gap in an end-to-end fashion, the image formation models used in these approaches have been quite simplistic, built either on simple wave optics models such as the Fourier transform, or on similar paraxial models. Such models only support the optimization of a single lens surface, which limits the achievable image quality.
To overcome these challenges, we propose a general end-to-end complex lens design framework enabled by a differentiable ray tracing image formation model. Specifically, our model relies on the differentiable ray tracing rendering engine to render optical images in the full field by taking into account all on/off-axis aberrations governed by the theory of geometric optics. Our design pipeline can jointly optimize the lens module and the image reconstruction network for a specific imaging task. We demonstrate the effectiveness of the proposed method on two typical applications, including large field-of-view imaging and extended depth-of-field imaging. Both simulation and experimental results show superior image quality compared with conventional lens designs. Our framework offers a competitive alternative for the design of modern imaging systems.
Direct Delta Mush (DDM) is a high-quality, direct skinning method with a low setup cost. However, its storage and run-time computing costs are relatively high for two reasons: its skinning weights are 4×4 matrices instead of scalars as in other direct skinning methods, and its computation requires one 3×3 Singular Value Decomposition per vertex.
In this paper, we introduce a compression method that takes a DDM model and splits it into two layers: the first layer is a smaller DDM model that computes a set of virtual bone transformations and the second layer is a Linear Blend Skinning model that computes per-vertex transformations from the output of the first layer. The two-layer model can approximate the deformation of the original DDM model with significantly lower costs.
Our main contribution is a novel problem formulation for the DDM compression based on a continuous example-based technique, in which we minimize the compression error on an uncountable set of example poses. This formulation provides an elegant metric for the compression error and simplifies the problem to the common linear matrix factorization. Our formulation also takes into account the skeleton hierarchy of the model, the bind pose, and the range of motions. In addition, we propose a new update rule to optimize DDM weights of the first layer and a modification to resolve the floating-point cancellation issue of DDM.
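To make the two-layer structure concrete, the sketch below shows only the cheap second layer: given virtual bone transforms produced by the first layer, per-vertex positions follow from ordinary Linear Blend Skinning with scalar weights. Shapes and names here are illustrative, not the paper's implementation.

```python
import numpy as np

def lbs(rest_vertices, weights, bone_transforms):
    """Second layer: Linear Blend Skinning.
    rest_vertices: (V, 3); weights: (V, B), rows summing to 1;
    bone_transforms: (B, 3, 4) virtual-bone affine transforms from layer 1."""
    V = rest_vertices.shape[0]
    vh = np.concatenate([rest_vertices, np.ones((V, 1))], axis=1)   # (V, 4)
    per_bone = np.einsum('bij,vj->vbi', bone_transforms, vh)        # (V, B, 3)
    return np.einsum('vb,vbi->vi', weights, per_bone)               # (V, 3)

# toy usage with 2 virtual bones: identity and a small x-translation
rest = np.random.default_rng(6).random((5, 3))
w = np.array([[0.7, 0.3]] * 5)
T = np.stack([np.hstack([np.eye(3), np.zeros((3, 1))]),
              np.hstack([np.eye(3), np.array([[0.1], [0.0], [0.0]])])])
print(lbs(rest, w, T))
```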
We propose quasi-harmonic weights for interpolating geometric data, which are orders of magnitude faster to compute than state-of-the-art methods. Currently, interpolation (or skinning) weights are obtained by solving large-scale constrained optimization problems with explicit constraints to suppress oscillatory patterns, yielding smooth weights only after a substantial amount of computation time. As an alternative, our weights are obtained as minima of an unconstrained problem that can be optimized quickly using straightforward numerical techniques. We consider weights that can be obtained as solutions to a parameterized family of second-order elliptic partial differential equations. By leveraging the maximum principle and careful parameterization, we pose weight computation as an inverse problem of recovering optimal anisotropic diffusivity tensors. In addition, we provide a customized ADAM solver that significantly reduces the number of gradient steps; our solver only requires inverting tens of linear systems that share the same sparsity pattern. Overall, our approach achieves orders of magnitude acceleration compared to previous methods, allowing weight computation in near real-time.
We present a highly efficient method for interactive volumetric meshless shape deformation. Our method operates within a low-dimensional subspace of shape-aware C∞ harmonic maps, and is the first method that is guaranteed to produce a smooth locally injective deformation in 3D. Unlike mesh-based methods in which local injectivity is enforced on tetrahedral elements, our method enforces injectivity on a sparse set of domain samples. The main difficulty is then to certify the map as locally injective throughout the entire domain. This is done by utilizing the Lipschitz continuity property of the harmonic basis functions. We show a surprising relation between the Lipschitz constant of the smallest singular value of the map Jacobian and the norm of the Hessian. We further carefully derive a Lipschitz constant for the Hessian, and develop a sufficient condition for the injectivity certification. This is achieved by exploiting the special structure of the harmonic basis functions combined with a novel regularization term that pushes the Lipschitz constants further down. As a result, the injectivity analysis can be performed on a relatively sparse set of samples. Combined with a parallel GPU-based implementation, our method produces superior deformations with unique quality guarantees at real-time rates, which so far were possible only in 2D.
We extend recent advances in the numerical time-integration of contacting elastodynamics [Li et al. 2020] to build a new framework, called Injective Deformation Processing (IDP), for the robust solution of a wide range of mesh deformation problems requiring injectivity. IDP solves challenging 3D (and 2D) geometry processing and animation tasks on meshes, via artificial time integration, with guarantees of both non-inversion and non-overlap. To our knowledge IDP is the first framework for 3D deformation processing that can efficiently guarantee globally injective deformation without geometric locking. We demonstrate its application on a diverse set of problems and show its significant improvement over state-of-the-art for globally injective 3D deformation.
Physics-based differentiable rendering---which focuses on estimating derivatives of radiometric detector responses with respect to arbitrary scene parameters---has a diverse array of applications from solving analysis-by-synthesis problems to training machine-learning pipelines incorporating forward-rendering processes. Unfortunately, existing general-purpose differentiable rendering techniques lack either the generality to handle volumetric light transport or the flexibility to devise Monte Carlo estimators capable of handling complex geometries and light transport effects.
In this paper, we bridge this gap by showing how generalized path integrals can be differentiated with respect to arbitrary scene parameters. Specifically, we establish the mathematical formulation of generalized differential path integrals that capture both interfacial and volumetric light transport. Our formulation allows the development of advanced differentiable rendering algorithms capable of efficiently handling challenging geometric discontinuities and light transport phenomena such as volumetric caustics.
We validate our method by comparing our derivative estimates to those generated using the finite differences. Further, to demonstrate the effectiveness of our technique, we compare both differentiable rendering and inverse rendering performance with state-of-the-art methods.
Stochastic sampling of light transport paths is key to Monte Carlo forward rendering, and previous studies have led to mature techniques capable of drawing high-contribution light paths in complex scenes. These sampling techniques have also been applied to differentiable rendering.
In this paper, we demonstrate that path sampling techniques developed for forward rendering can become inefficient for differentiable rendering of glossy materials---especially when estimating derivatives with respect to global scene geometries. To address this problem, we introduce antithetic sampling of BSDFs and light-transport paths, which allows significantly faster convergence and can be easily integrated into existing differentiable rendering pipelines. We validate our method by comparing our derivative estimates to those generated with existing unbiased techniques. Further, we demonstrate the effectiveness of our technique by providing equal-quality and equal-time comparisons with existing sampling methods.
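A toy 1D illustration of the antithetic idea (independent of any particular BSDF): the derivative of a sharp lobe is nearly odd about its peak, so mirrored sample pairs cancel most of the variation while leaving the estimate unbiased.

```python
import numpy as np

rng = np.random.default_rng(7)

def dintegrand(u, theta, k=200.0):
    """d/dtheta of a glossy-like lobe exp(-k (u - theta)^2), modulated by a
    slowly varying 'incident radiance' L(u); nearly odd around u = theta."""
    L = 1.0 + 0.3 * u
    return L * 2.0 * k * (u - theta) * np.exp(-k * (u - theta) ** 2)

def estimate(theta, n, antithetic):
    if antithetic:
        u = theta + 0.1 * rng.standard_normal(n // 2)     # n/2 mirrored pairs
        vals = 0.5 * (dintegrand(u, theta) + dintegrand(2.0 * theta - u, theta))
    else:
        u = theta + 0.1 * rng.standard_normal(n)
        vals = dintegrand(u, theta)
    return vals.mean()

plain = [estimate(0.5, 256, antithetic=False) for _ in range(200)]
paired = [estimate(0.5, 256, antithetic=True) for _ in range(200)]
print(np.std(plain), np.std(paired))   # antithetic variance is far smaller
```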
Physically based differentiable rendering algorithms propagate derivatives through realistic light transport simulations and have applications in diverse areas including inverse reconstruction and machine learning. Recent progress has led to unbiased methods that can simultaneously compute derivatives with respect to millions of parameters. At the same time, elementary properties of these methods remain poorly understood.
Current algorithms for differentiable rendering are constructed by mechanically differentiating a given primal algorithm. While convenient, such an approach is simplistic because it leaves no room for improvement. Differentiation produces major changes in the integrals that occur throughout the rendering process, which indicates that the primal and differential algorithms should be decoupled so that the latter can suitably adapt.
This leads to a large space of possibilities: consider that even the most basic Monte Carlo path tracer already involves several design choices concerning the techniques for sampling materials and emitters, and their combination, e.g. via multiple importance sampling (MIS). Differentiation causes a veritable explosion of this decision tree: should we differentiate only the estimator, or also the sampling technique? Should MIS be applied before or after differentiation? Are specialized derivative sampling strategies of any use? How should visibility-related discontinuities be handled when millions of parameters are differentiated simultaneously? In this paper, we provide a taxonomy and analysis of different estimators for differential light transport to provide intuition about these and related questions.
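For reference, the primal quantities these questions revisit are the standard multiple importance sampling combination and its balance-heuristic weights,
\[
\langle I \rangle \;=\; \sum_{i} \frac{w_i(X_i)\, f(X_i)}{p_i(X_i)}, \qquad
w_i(x) \;=\; \frac{p_i(x)}{\sum_{k} p_k(x)},
\]
with each \(X_i\) drawn from sampling technique \(p_i\); the taxonomy examines, among other things, whether the weights and densities themselves should be differentiated or held fixed.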
Solving nonlinear systems is an important problem. Numerical continuation methods efficiently solve certain nonlinear systems. The Asymptotic Numerical Method (ANM) is a powerful continuation method that usually converges faster than Newtonian methods. ANM explores the landscape of the function by following a parameterized solution curve approximated with a high-order power series. Although ANM has successfully solved a few graphics and engineering problems, prior to our work, applying ANM to new problems required significant effort because the standard ANM assumes quadratic functions, while manually deriving the power series expansion for nonquadratic systems is a tedious and challenging task.
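Concretely (a minimal sketch of the standard ANM ansatz, in our notation), the method expands the solution branch of \(R(u, \lambda) = 0\) as a truncated power series in a path parameter \(a\),
\[
u(a) \;=\; u_0 + a\,u_1 + a^2 u_2 + \cdots + a^N u_N, \qquad
\lambda(a) \;=\; \lambda_0 + a\,\lambda_1 + \cdots + a^N \lambda_N,
\]
and collecting terms order by order yields a sequence of linear systems that all share the same tangent matrix, so a single factorization serves every order; deriving these order-by-order expansions by hand for nonquadratic systems is precisely the tedious step noted above.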
This paper presents a novel solver, SANM, that applies ANM to solve symbolically represented nonlinear systems. SANM solves such systems in a fully automated manner. SANM also extends ANM to support many nonquadratic operators, including intricate ones such as singular value decomposition. Furthermore, SANM generalizes ANM to support the implicit homotopy form. Moreover, SANM achieves high computing performance via optimized system design and implementation.
We deploy SANM to solve forward and inverse elastic force equilibrium problems and controlled mesh deformation problems with a few constitutive models. Our results show that SANM converges faster than Newtonian solvers, requires little programming effort for new problems, and delivers comparable or better performance than a hand-coded, specialized ANM solver. While we demonstrate SANM on mesh deformation problems, it is generic and potentially applicable to many other tasks.
This paper introduces a novel geometric multigrid solver for unstructured curved surfaces. Multigrid methods are highly efficient iterative methods for solving systems of linear equations. Despite the success in solving problems defined on structured domains, generalizing multigrid to unstructured curved domains remains a challenging problem. The critical missing ingredient is a prolongation operator to transfer functions across different multigrid levels. We propose a novel method for computing the prolongation for triangulated surfaces based on intrinsic geometry, enabling an efficient geometric multigrid solver for curved surfaces. Our surface multigrid solver achieves better convergence than existing multigrid methods. Compared to direct solvers, our solver is orders of magnitude faster. We evaluate our method on many geometry processing applications and a wide variety of complex shapes with and without boundaries. By simply replacing the direct solver, we upgrade existing algorithms to interactive frame rates, and shift the computational bottleneck away from solving linear systems.
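To make the role of the prolongation operator concrete, here is a minimal two-grid correction cycle in Python; this is generic multigrid machinery operating on dense NumPy arrays, assuming a prolongation matrix P is already available, and it does not reproduce the paper's intrinsic construction of P.

import numpy as np

def two_grid_cycle(A, b, x, P, omega=2.0/3.0, smooth_steps=3):
    """One two-grid cycle for A x = b, given a prolongation P (coarse -> fine)."""
    D = A.diagonal()
    for _ in range(smooth_steps):        # pre-smoothing (weighted Jacobi)
        x = x + omega * (b - A @ x) / D
    A_c = P.T @ A @ P                    # Galerkin coarse operator
    r_c = P.T @ (b - A @ x)              # restricted residual
    e_c = np.linalg.solve(A_c, r_c)      # coarse solve (recurse for a full hierarchy)
    x = x + P @ e_c                      # prolongated correction
    for _ in range(smooth_steps):        # post-smoothing
        x = x + omega * (b - A @ x) / D
    return x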
Many computer graphics applications boil down to solving sparse systems of linear equations. While the current arsenal of numerical solvers available in various specialized libraries and for different computer architectures often allows efficient and scalable solutions to image processing, modeling and simulation applications, an increasing number of graphics problems face large-scale and ill-conditioned sparse linear systems --- a numerical challenge which typically chokes both direct factorizations (due to high memory requirements) and iterative solvers (because of slow convergence). We propose a novel approach to the efficient preconditioning of such problems, which often emerge from the discretization over unstructured meshes of partial differential equations with heterogeneous and anisotropic coefficients. Our numerical approach consists in simply imposing a fine-to-coarse ordering and a multiscale sparsity pattern on the degrees of freedom, and then applying an incomplete Cholesky factorization. By further leveraging supernodes for cache coherence, graph coloring to improve parallelism, and partial diagonal shifting to remedy negative pivots, we obtain a preconditioner which, combined with a conjugate gradient solver, far exceeds the performance of existing carefully-engineered libraries for graphics problems involving bad mesh elements and/or high contrast of coefficients. We also back the core concepts behind our simple solver with theoretical foundations linking the recent method of operator-adapted wavelets used in numerical homogenization to the traditional Cholesky factorization of a matrix, providing us with a clear bridge between incomplete Cholesky factorization and multiscale analysis that we leverage numerically.
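A minimal sketch of how such a preconditioner is consumed by conjugate gradients; here apply_M_inv is a placeholder standing in for the two triangular solves with the incomplete Cholesky factor, and the ordering, supernode, coloring, and diagonal-shifting ingredients described above are not shown.

import numpy as np

def preconditioned_cg(A, b, apply_M_inv, tol=1e-8, max_iter=1000):
    """Preconditioned conjugate gradients for a symmetric positive definite A."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    z = apply_M_inv(r)                    # z = M^{-1} r, e.g. L^{-T} (L^{-1} r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = apply_M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x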
Local-global solvers such as ADMM for elastic simulation and geometry optimization struggle to resolve large rotations such as bending and twisting modes, and large distortions in the presence of barrier energies. We propose two improvements to address these challenges. First, we introduce a novel local-global splitting based on the polar decomposition that separates the geometric nonlinearity of rotations from the material nonlinearity of the deformation energy. The resulting ADMM-based algorithm is a combination of an L-BFGS solve in the global step and proximal updates of element stretches in the local step. We also introduce a novel method for dynamic reweighting that is used to adjust element weights at runtime for improved convergence. With both improved rotation handling and element weighting, our algorithm is considerably faster than state-of-the-art approaches for quasi-static simulations. It is also much faster at making early progress in parameterization problems, making it valuable as an initializer to jump-start second-order algorithms.
We present a new trainable system for physically plausible markerless 3D human motion capture, which achieves state-of-the-art results in a broad range of challenging scenarios. Unlike most neural methods for human motion capture, our approach, which we dub "physionical", is aware of physical and environmental constraints. It combines in a fully-differentiable way several key innovations, i.e., 1) a proportional-derivative controller, with gains predicted by a neural network, that reduces delays even in the presence of fast motions, 2) an explicit rigid body dynamics model, and 3) a novel optimisation layer that prevents physically implausible foot-floor penetration as a hard constraint. The inputs to our system are 2D joint keypoints, which are canonicalised in a novel way so as to reduce the dependency on intrinsic camera parameters---both at train and test time. This enables more accurate global translation estimation without generalisability loss. Our model can be fine-tuned with only 2D annotations when 3D annotations are not available. It produces smooth and physically-principled 3D motions at an interactive frame rate in a wide variety of challenging scenes, including newly recorded ones. Its advantages are especially noticeable on in-the-wild sequences that significantly differ from common 3D pose estimation benchmarks such as Human 3.6M and MPI-INF-3DHP. Qualitative results are provided in the supplementary video.
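For context, the proportional-derivative control law whose gains the network predicts takes the standard form (generic notation, not the paper's),
\[
\tau \;=\; k_p\,(q^{\mathrm{target}} - q) \;-\; k_d\,\dot{q},
\]
where \(q\) and \(\dot{q}\) are the current joint angles and velocities, \(q^{\mathrm{target}}\) is the desired pose, and the per-joint gains \(k_p, k_d\) are regressed by the neural network.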
In a conventional optical motion capture (MoCap) workflow, two processes are needed to turn captured raw marker sequences into correct skeletal animation sequences. Firstly, various tracking errors present in the markers must be fixed (cleaning or refining). Secondly, an agent skeletal mesh must be prepared for the actor/actress, and used to determine skeleton information from the markers (re-targeting or solving). The whole process, normally referred to as solving MoCap data, is extremely time-consuming, labor-intensive, and usually the most costly part of animation production. Hence, there is a great demand for automated tools in industry. In this work, we present MoCap-Solver, a production-ready neural solver for optical MoCap data. It can directly produce skeleton sequences and clean marker sequences from raw MoCap markers, without any tedious manual operations. To achieve this goal, our key idea is to make use of neural encoders concerning three key intrinsic components: the template skeleton, marker configuration and motion, and to learn to predict these latent vectors from imperfect marker sequences containing noise and errors. By decoding these components from latent vectors, sequences of clean markers and skeletons can be directly recovered. Moreover, we also provide a novel normalization strategy based on learning a pose-dependent marker reliability function, which greatly improves system robustness. Experimental results demonstrate that our algorithm consistently outperforms the state-of-the-art on both synthetic and real-world datasets.
We present a new method to capture detailed human motion, sampling more than 1000 unique points on the body. Our method outputs highly accurate 4D (spatio-temporal) point coordinates and, crucially, automatically assigns a unique label to each of the points. The locations and unique labels of the points are inferred from individual 2D input images only, without relying on temporal tracking or any human body shape or skeletal kinematics models. Therefore, our captured point trajectories contain all of the details from the input images, including motion due to breathing, muscle contractions and flesh deformation, and are well suited to be used as training data to fit advanced models of the human body and its motion. The key idea behind our system is a new type of motion capture suit which contains a special pattern with checkerboard-like corners and two-letter codes. The images from our multi-camera system are processed by a sequence of neural networks which are trained to localize the corners and recognize the codes, while being robust to suit stretching and self-occlusions of the body. Our system relies only on standard RGB or monochrome sensors and fully passive lighting and the passive suit, making our method easy to replicate, deploy and use. Our experiments demonstrate highly accurate captures of a wide variety of human poses, including challenging motions such as yoga, gymnastics, or rolling on the ground.
Motion capture is facing new possibilities brought by inertial sensing technologies, which, unlike vision-based solutions, do not suffer from occlusion and are not restricted to a limited recording volume. However, as the recorded signals are sparse and quite noisy, online performance and global translation estimation turn out to be two key difficulties. In this paper, we present TransPose, a DNN-based approach to perform full motion capture (with both global translations and body poses) from only 6 Inertial Measurement Units (IMUs) at over 90 fps. For body pose estimation, we propose a multi-stage network that estimates leaf-to-full joint positions as intermediate results. This design makes the pose estimation much easier, and thus achieves both better accuracy and lower computation cost. For global translation estimation, we propose a supporting-foot-based method and an RNN-based method to robustly solve for the global translations with a confidence-based fusion technique. Quantitative and qualitative comparisons show that our method outperforms the state-of-the-art learning- and optimization-based methods by a large margin in both accuracy and efficiency. As a purely inertial sensor-based approach, our method is not limited by environmental settings (e.g., fixed cameras), making the capture free from common difficulties such as wide-range motion space and strong occlusion.
We present a method for performing real-time facial animation of a 3D avatar from binocular video. Existing facial animation methods fail to automatically capture precise and subtle facial motions for driving a photo-realistic 3D avatar "in-the-wild" (i.e., variability in illumination, camera noise). The novelty of our approach lies in a light-weight process for specializing a personalized face model to new environments that enables extremely accurate real-time face tracking anywhere. Our method uses a pre-trained high-fidelity personalized model of the face that we complement with a novel illumination model to account for variations due to lighting and other factors often encountered in-the-wild (e.g., facial hair growth, makeup, skin blemishes). Our approach comprises two steps. First, we solve for our illumination model's parameters by applying analysis-by-synthesis on a short video recording. Using the pairs of model parameters (rigid, non-rigid) and the original images, we learn a regression for real-time inference from the image space to the 3D shape and texture of the avatar. Second, given a new video, we fine-tune the real-time regression model with a few-shot learning strategy to adapt the regression model to the new environment. We demonstrate our system's ability to precisely capture subtle facial motions in unconstrained scenarios, in comparison to competing methods, on a diverse collection of identities, expressions, and real-world environments.
While current monocular 3D face reconstruction methods can recover fine geometric details, they suffer several limitations. Some methods produce faces that cannot be realistically animated because they do not model how wrinkles vary with expression. Other methods are trained on high-quality face scans and do not generalize well to in-the-wild images. We present the first approach that regresses 3D face shape and animatable details that are specific to an individual but change with expression. Our model, DECA (Detailed Expression Capture and Animation), is trained to robustly produce a UV displacement map from a low-dimensional latent representation that consists of person-specific detail parameters and generic expression parameters, while a regressor is trained to predict detail, shape, albedo, expression, pose and illumination parameters from a single image. To enable this, we introduce a novel detail-consistency loss that disentangles person-specific details from expression-dependent wrinkles. This disentanglement allows us to synthesize realistic person-specific wrinkles by controlling expression parameters while keeping person-specific details unchanged. DECA is learned from in-the-wild images with no paired 3D supervision and achieves state-of-the-art shape reconstruction accuracy on two benchmarks. Qualitative results on in-the-wild data demonstrate DECA's robustness and its ability to disentangle identity- and expression-dependent details enabling animation of reconstructed faces. The model and code are publicly available at https://deca.is.tue.mpg.de.
We present a method for building high-fidelity animatable 3D face models that can be posed and rendered with novel lighting environments in real-time. Our main insight is that relightable models trained to produce an image lit from a single light direction can generalize to natural illumination conditions but are computationally expensive to render. On the other hand, efficient, high-fidelity face models trained with point-light data do not generalize to novel lighting conditions. We leverage the strengths of each of these two approaches. We first train an expensive but generalizable model on point-light illuminations, and use it to generate a training set of high-quality synthetic face images under natural illumination conditions. We then train an efficient model on this augmented dataset, reducing the generalization ability requirements. As the efficacy of this approach hinges on the quality of the synthetic data we can generate, we present a study of lighting pattern combinations for dynamic captures and evaluate their suitability for learning generalizable relightable models. Towards achieving the best possible quality, we present a novel approach for generating dynamic relightable faces that exceeds state-of-the-art performance. Our method is capable of capturing subtle lighting effects and can even generate compelling near-field relighting despite being trained exclusively with far-field lighting data. Finally, we motivate the utility of our model by animating it with images captured from VR-headset mounted cameras, demonstrating the first system for face-driven interactions in VR that uses a photorealistic relightable face model.
Recent facial image synthesis methods have been mainly based on conditional generative models. Sketch-based conditions can effectively describe the geometry of faces, including the contours of facial components, hair structures, as well as salient edges (e.g., wrinkles) on face surfaces but lack effective control of appearance, which is influenced by color, material, lighting condition, etc. To have more control of generated results, one possible approach is to apply existing disentangling works to disentangle face images into geometry and appearance representations. However, existing disentangling methods are not optimized for human face editing, and cannot achieve fine control of facial details such as wrinkles. To address this issue, we propose DeepFaceEditing, a structured disentanglement framework specifically designed for face images to support face generation and editing with disentangled control of geometry and appearance. We adopt a local-to-global approach to incorporate the face domain knowledge: local component images are decomposed into geometry and appearance representations, which are fused consistently using a global fusion module to improve generation quality. We exploit sketches to assist in extracting a better geometry representation, which also supports intuitive geometry editing via sketching. The resulting method can either extract the geometry and appearance representations from face images, or directly extract the geometry representation from face sketches. Such representations allow users to easily edit and synthesize face images, with decoupled control of their geometry and appearance. Both qualitative and quantitative evaluations show the superior detail and appearance control abilities of our method compared to state-of-the-art methods.
We present a framework that enables the discovery of diverse and natural-looking motion strategies for athletic skills such as the high jump. The strategies are realized as control policies for physics-based characters. Given a task objective and an initial character configuration, the combination of physics simulation and deep reinforcement learning (DRL) provides a suitable starting point for automatic control policy training. To facilitate the learning of realistic human motions, we propose a Pose Variational Autoencoder (P-VAE) to constrain the actions to a subspace of natural poses. In contrast to motion imitation methods, a rich variety of novel strategies can naturally emerge by exploring initial character states through a sample-efficient Bayesian diversity search (BDS) algorithm. A second stage of optimization that encourages novel policies can further enrich the unique strategies discovered. Our method allows for the discovery of diverse and novel strategies for athletic jumping motions such as high jumps and obstacle jumps with no motion examples and less reward engineering than prior work.
Interactively synthesizing novel combinations and variations of character movements from different motion skills is a key problem in computer animation. In this paper, we propose a deep learning framework to produce a large variety of martial arts movements in a controllable manner from raw motion capture data. Our method imitates animation layering using neural networks with the aim to overcome typical challenges when mixing, blending and editing movements from unaligned motion sources. The framework can synthesize novel movements from given reference motions and simple user controls, and generate unseen sequences of locomotion, punching, kicking, avoiding and combinations thereof, but also reconstruct signature motions of different fighters, as well as close-character interactions such as clinching and carrying by learning the spatial joint relationships. To achieve this goal, we adopt a modular framework which is composed of the motion generator and a set of different control modules. The motion generator functions as a motion manifold that projects novel mixed/edited trajectories to natural full-body motions, and synthesizes realistic transitions between different motions. The control modules are task dependent and can be developed and trained separately by engineers to include novel motion tasks, which greatly reduces network iteration time when working with large-scale datasets. Our modular framework provides a transparent control interface for animators that allows modifying or combining movements after network training, and enables iterative adding of control modules for different motion tasks and behaviors. Our system can be used for offline and online motion generation alike, and is relevant for real-time applications such as computer games.
We present a new algorithm that learns a parameterized family of motor skills from a single motion clip. The motor skills are represented by a deep policy network, which produces a stream of motions in physics simulation in response to user input and environment interaction by navigating continuous action space. Three novel technical components play an important role in the success of our algorithm. First, it explicitly constructs motion parameterization that maps action parameters to their corresponding motions. Simultaneous learning of motion parameterization and motor skills significantly improves the performance and visual quality of learned motor skills. Second, continuous-time reinforcement learning is adopted to explore temporal variations as well as spatial variations in motion parameterization. Lastly, we present a new automatic curriculum generation method that explores continuous action space more efficiently. We demonstrate the flexibility and versatility of our algorithm with highly dynamic motor skills that can be parameterized by task goals, body proportions, physical measurements, and environmental conditions.
We propose a deep videorealistic 3D human character model displaying highly realistic shape, motion, and dynamic appearance learned in a new weakly supervised way from multi-view imagery. In contrast to previous work, our controllable 3D character displays dynamics, e.g., the swing of the skirt, dependent on skeletal body motion in an efficient data-driven way, without requiring complex physics simulation. Our character model also features a learned dynamic texture model that accounts for photo-realistic motion-dependent appearance details, as well as view-dependent lighting effects. During training, we do not need to resort to difficult dynamic 3D capture of the human; instead we can train our model entirely from multi-view video in a weakly supervised manner. To this end, we propose a parametric and differentiable character representation which allows us to model coarse and fine dynamic deformations, e.g., garment wrinkles, as explicit spacetime coherent mesh geometry that is augmented with high-quality dynamic textures dependent on motion and view point. As input to the model, only an arbitrary 3D skeleton motion is required, making it directly compatible with the established 3D animation pipeline. We use a novel graph convolutional network architecture to enable motion-dependent deformation learning of body and clothing, including dynamics, and a neural generative dynamic texture model creates corresponding dynamic texture maps. We show that by merely providing new skeletal motions, our model creates motion-dependent surface deformations, physically plausible dynamic clothing deformations, as well as video-realistic surface textures at a much higher level of detail than previous state of the art approaches, and even in real-time.
A color palette is one of the simplest and most intuitive descriptors that can be extracted from images or videos. This paper proposes a method to assess the similarity between color palettes by sorting colors. While previous palette similarity measures compare only colors without considering the overall palette combination, we sort palettes to minimize the geometric distance between colors and align them to share a common color tendency. We propose dynamic closest color warping (DCCW) to calculate the minimum sum of distances between the colors of one palette and the graph connecting the colors of the other palette. We evaluate the proposed palette sorting and DCCW with several datasets and demonstrate that DCCW outperforms previous methods in terms of accuracy and computing time. We validate the effectiveness of the proposed sorting technique by conducting a perceptual study, which indicates a clear preference for the results of our approach. We also demonstrate useful applications enabled by DCCW, including palette interpolation, palette navigation, and image recoloring.
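As a point of reference only, the simplest palette distance sums, for every color, the distance to its closest color in the other palette; the following hypothetical Python baseline uses plain RGB distances and ignores the sorting and graph-based warping that distinguish DCCW.

import numpy as np

def naive_palette_distance(palette_a, palette_b):
    """Symmetric sum of closest-color distances between two (n, 3) color arrays."""
    a = np.asarray(palette_a, dtype=float)
    b = np.asarray(palette_b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)   # pairwise distances
    return d.min(axis=1).sum() + d.min(axis=0).sum()             # nearest match in each direction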
Manga inpainting fills in the pixels disoccluded by the removal of dialogue balloons or "sound effect" text. This process has long been needed by the industry for language localization and conversion to animated manga. It is mostly done manually, as existing methods (mostly for natural image inpainting) cannot produce satisfying results. Manga inpainting is trickier than natural image inpainting because of its highly abstract illustration style, which relies on structural lines and screentone patterns and thus confuses semantic interpretation and visual content synthesis. In this paper, we present the first manga inpainting method, a deep learning model that generates high-quality results. Instead of direct inpainting, we propose to separate the complicated inpainting into two major phases, semantic inpainting and appearance synthesis. This separation eases both feature understanding and, hence, the training of the learning model. A key idea is to disentangle the structural lines and screentones, which helps the network better distinguish structural line and screentone features for semantic interpretation. Both visual comparisons and quantitative experiments evidence the effectiveness of our method and justify its superiority over existing state-of-the-art methods in the application of manga inpainting.
Solving partial differential equations (PDEs) on infinite domains has been a challenging task in physical simulations and geometry processing. We introduce a general technique to transform a PDE problem on an unbounded domain to a PDE problem on a bounded domain. Our method uses the Kelvin Transform, which essentially inverts the distance from the origin. However, naive application of this coordinate mapping can still result in a singularity at the origin in the transformed domain. We show that by factoring the desired solution into the product of an analytically known (asymptotic) component and another function to solve for, the problem can be made continuous and compact, with solutions significantly more efficient and well-conditioned than traditional finite element and Monte Carlo numerical PDE methods on stretched coordinates. Specifically, we show that every Poisson or Laplace equation on an infinite domain is transformed to another Poisson (Laplace) equation on a compact region. In other words, any existing Poisson solver on a bounded domain is readily an infinite domain Poisson solver after being wrapped by our transformation. We demonstrate the integration of our method with finite difference and Monte Carlo PDE solvers, with applications in the fluid pressure solve and simulating electromagnetism, including visualizations of the solar magnetic field. Our transformation technique also applies to the Helmholtz equation whose solutions oscillate out to infinity. After the transformation, the Helmholtz equation becomes a tractable equation on a bounded domain without infinite oscillation. To our knowledge, this is the first time that the Helmholtz equation on an infinite domain is solved on a bounded grid without requiring an artificial absorbing boundary condition.
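For reference, the classical Kelvin transform in three dimensions inverts the distance to the origin and rescales the function,
\[
x \;\mapsto\; x^{*} = \frac{x}{|x|^{2}}, \qquad
(\mathcal{K}u)(y) \;=\; \frac{1}{|y|}\,u\!\left(\frac{y}{|y|^{2}}\right),
\]
and it maps harmonic functions on an unbounded exterior domain to harmonic functions on the bounded image of that domain; the factored asymptotic component described above is what removes the residual singularity at the origin of the transformed problem.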
We propose a novel Lagrangian geometric representation using segment clouds to simulate incompressible fluid exhibiting strong anisotropic vortical features. The central component of our approach is a cloud of discrete segments enhanced by a set of local segment reseeding operations to facilitate both the geometrical evolution and the topological updates of vortical flow. We build a vortex dynamics solver with the support for dynamic solid boundaries based on discrete segment primitives. We demonstrate the efficacy of our approach by simulating a broad range of challenging flow phenomena, such as reconnection of non-closed vortex tubes and vortex shedding behind a rotating object.
We propose a novel gauge fluid solver based on Clebsch wave functions to solve incompressible fluid equations. Our method combines the expressive power of Clebsch wave functions to represent coherent vortical structures and the generality of gauge methods to accommodate a broad array of fluid phenomena. By evolving a transformed wave function as the system's gauge variable enhanced by an additional projection step to enforce pressure jumps on the free boundaries, our method can significantly improve the vorticity generation and preservation ability for a broad range of gaseous and liquid phenomena. Our approach can be easily implemented by modifying a standard grid-based fluid simulator. It can be used to solve various fluid dynamics, including complex vortex filament dynamics, fluids with different obstacles, and surface-tension flow.
While modern fluid simulation methods achieve high-quality simulation results, it is still a big challenge to interpret and control motion from visual quantities, such as the advected marker density. These visual quantities play an important role in user interactions: Being familiar and meaningful to humans, these quantities have a strong correlation with the underlying motion. We propose a novel data-driven conditional adversarial model that solves the challenging and theoretically ill-posed problem of deriving plausible velocity fields from a single frame of a density field. Besides density modifications, our generative model is the first to enable the control of the results using all of the following control modalities: obstacles, physical parameters, kinetic energy, and vorticity. Our method is based on a new conditional generative adversarial neural network that explicitly embeds physical quantities into the learned latent space, and a new cyclic adversarial network design for control disentanglement. We show the high quality and versatile controllability of our results for density-based inference, realistic obstacle interaction, and sensitive responses to modifications of physical parameters, kinetic energy, and vorticity. Code, models, and results can be found at https://github.com/RachelCmy/den2vel.
We present a unified method for meshing surfaces with unconventional patterns, both periodic and aperiodic. These patterns, which have so far been studied only in the plane, comprise a small number of tiles and do not necessarily exhibit translational periodicity. Our method generalizes the de Bruijn multigrid method to the discrete setting, and thus reduces the problem to the computation of N-Directional fields on triangle meshes. We work with all cases of directional symmetries, including the little-studied odd and high N. We address the properties of such patterns on surfaces and the challenges in their construction, including order-preservation, seamlessness, duality, and singularities. We show how our method allows for the design of original and unconventional meshes that can be applied to architectural, industrial, and recreational design.
Mapping a triangulated surface to 2D space (or a tetrahedral mesh to 3D space) is an important problem in geometry processing. In computational physics, untangling plays an important role in mesh generation: it takes a mesh as input and moves the vertices to get rid of foldovers. In fact, mesh untangling can be considered a special case of mapping in which the geometry of the object is defined directly in the map space: the geometric domain is not explicit, and each element is assumed to be regular. In this paper, we propose a mapping method inspired by the untangling problem and compare its performance to the state of the art. The main advantage of our method is that the untangling aims at producing locally injective maps, which is the major challenge of mapping. In practice, our method produces locally injective maps in very difficult settings, both in 2D and 3D. We demonstrate it on a large reference database as well as on more difficult stress tests. For better reproducibility, we publish the code in Python for basic evaluation, and in C++ for more advanced applications.
This paper describes a numerical method for surface parameterization, yielding maps that are locally injective and discretely conformal in an exact sense. Unlike previous methods for discrete conformal parameterization, the method is guaranteed to work for any manifold triangle mesh, with no restrictions on the triangulation. Building on recent theory showing that the problem can be formulated as a convex optimization in which the triangulation is allowed to change, we complete the picture by introducing the machinery needed to actually construct a discrete conformal map. In particular, we introduce a new scheme for tracking correspondence between triangulations based on normal coordinates, and a new interpolation procedure based on layout in the light cone. Stress tests involving difficult cone configurations and near-degenerate triangulations indicate that the method is extremely robust in practice, and provides high-quality interpolation even on meshes with poor elements.
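For reference, the notion of discrete conformal equivalence used here rescales edge lengths by per-vertex logarithmic scale factors \(u_i\),
\[
\tilde{\ell}_{ij} \;=\; e^{(u_i + u_j)/2}\,\ell_{ij},
\]
so two meshes with the same connectivity are discretely conformally equivalent exactly when their edge lengths are related in this way; allowing the triangulation to change, as noted above, is what makes the underlying variational problem convex.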
We propose a new static high-performance mesh data structure for triangle surface meshes on the GPU. Our data structure is carefully designed for parallel execution while capturing mesh locality and confining data access, as much as possible, within the GPU's fast "shared memory." We achieve this by subdividing the mesh into patches and representing these patches compactly using a matrix-based representation. Our patching technique is decorated with ribbons, thin mesh strips around patches that eliminate the need to communicate between different computation thread blocks, resulting in consistent high throughput. We call our data structure RXMesh: Ribbon-matriX Mesh. We hide the complexity of our data structure behind a flexible but powerful programming model that helps deliver high performance by inducing load balance even in highly irregular input meshes. We show the efficacy of our programming model on common geometry processing applications---mesh smoothing and filtering, geodesic distance, and vertex normal computation. For evaluation, we benchmark our data structure against well-optimized GPU and (single and multi-core) CPU data structures and show significant speedups.
We present a novel method for authoring landscapes with flora and fauna while considering their mutual interactions. Given a terrain, climatic conditions, and species with related biological information, our algorithm outputs a steady-state ecosystem in the form of per-species density maps, their daily circuits, and a modified terrain with eroded trails. We introduce the Resource Access Graph, a new data structure that encodes both interactions between food chain levels and animals traveling between resources over the terrain. A novel competition algorithm operating on this data progressively computes a steady-state solution up the food chain, from plants to carnivores. The user can explore the resulting landscape, where plants and animals are instantiated on the fly, and interactively edit it by over-painting the maps. Our results show that our system enables the authoring of consistent landscapes where the impact of wildlife is visible through animated animals, clearings in the vegetation, and eroded trails. We provide quantitative validation with existing ecosystems and a user study with expert paleontologist end-users, showing that our system enables them to author and compare different ecosystems illustrating climate changes over the same terrain while enabling relevant visual immersion into consistent landscapes.
It is increasingly common to model, simulate, and process complex materials based on loopy structures, such as in yarn-level cloth garments, which possess topological constraints between inter-looping curves. While the input model may satisfy specific topological linkages between pairs of closed loops, subsequent processing may violate those topological conditions. In this paper, we explore a family of methods for efficiently computing and verifying linking numbers between closed curves, and apply these to applications in geometry processing, animation, and simulation, so as to verify that topological invariants are preserved during and after processing of the input models. Our method has three stages: (1) we identify potentially interacting loop-loop pairs, then (2) carefully discretize each loop's spline curves into line segments so as to enable (3) efficient linking number evaluation using accelerated kernels based on either counting projected segment-segment crossings, or by evaluating the Gauss linking integral using direct or fast summation methods (Barnes-Hut or fast multipole methods). We evaluate CPU and GPU implementations of these methods on a suite of test problems, including yarn-level cloth and chainmail, that involve significant processing: physics-based relaxation and animation, user-modeled deformations, curve compression and reparameterization. We show that topology errors can be efficiently identified to enable more robust processing of loopy structures.
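For reference, the Gauss linking integral evaluated in stage (3) is
\[
\operatorname{Lk}(\gamma_1, \gamma_2) \;=\; \frac{1}{4\pi}\oint_{\gamma_1}\!\oint_{\gamma_2}
\frac{(x_1 - x_2)\cdot\big(dx_1 \times dx_2\big)}{|x_1 - x_2|^{3}},
\]
an integer for any pair of disjoint closed curves; once the curves are discretized into line segments it becomes a double sum over segment pairs, which is exactly what the direct, Barnes-Hut, and fast multipole kernels accelerate.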
Emerging research in computer graphics, inverse problems, and machine learning requires us to differentiate and optimize parametric discontinuities. These discontinuities appear in object boundaries, occlusion, contact, and sudden change over time. In many domains, such as rendering and physics simulation, we differentiate the parameters of models that are expressed as integrals over discontinuous functions. Ignoring the discontinuities during differentiation often has a significant impact on the optimization process. Previous approaches either apply specialized hand-derived solutions, smooth out the discontinuities, or rely on incorrect automatic differentiation.
We propose a systematic approach to differentiating integrals with discontinuous integrands by developing a new differentiable programming language. We introduce integration as a language primitive and account for the Dirac delta contribution from differentiating parametric discontinuities in the integrand. We formally define the language semantics and prove correctness and closure under differentiation, allowing the generation of gradients and higher-order derivatives. We also build a system, Teg, implementing these semantics. Our approach is widely applicable to a variety of tasks, including image stylization, fitting shader parameters, trajectory optimization, and optimizing physical designs.
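A minimal example of the Dirac delta contribution the language accounts for: for a step discontinuity whose location depends on the parameter \(t \in (0, 1)\),
\[
\frac{d}{dt}\int_{0}^{1} \mathbf{1}[x > t]\, f(x)\, dx
\;=\; \frac{d}{dt}\int_{t}^{1} f(x)\, dx \;=\; -f(t),
\]
a boundary term that naive automatic differentiation of the pointwise integrand, whose derivative vanishes almost everywhere, silently drops.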
Differentiable physically-based rendering has become an indispensable tool for solving inverse problems involving light. Most applications in this area jointly optimize a large set of scene parameters to minimize an objective function, in which case reverse-mode differentiation is the method of choice for obtaining parameter gradients.
However, existing techniques that perform the necessary differentiation step suffer from either statistical bias or a prohibitive cost in terms of memory and computation time. For example, standard techniques for automatic differentiation based on program transformation or Wengert tapes lead to impracticably large memory usage when applied to physically-based rendering algorithms. A recently proposed adjoint method by Nimier-David et al. [2020] reduces this to a constant memory footprint, but the computation time for unbiased gradient estimates then becomes quadratic in the number of scattering events along a light path. This is problematic when the scene contains highly scattering materials like participating media.
In this paper, we propose a new unbiased backpropagation algorithm for rendering that only requires constant memory, and whose computation time is linear in the number of scattering events (i.e., just like path tracing). Our approach builds on the invertibility of the local Jacobian at scattering interactions to recover the various quantities needed for reverse-mode differentiation. Our method also extends to specular materials such as smooth dielectrics and conductors that cannot be handled by prior work.
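The following toy Python sketch (ours, not the rendering algorithm described above) illustrates the underlying principle of trading storage for inversion: when every step of a chain is invertible, reverse-mode differentiation can recover the intermediate states it needs on the fly instead of recording them, so memory stays constant in the chain length.

def forward(x0, params):
    """Chain of invertible affine steps x_k = a_k * x_{k-1} + b_k; only x_K is kept."""
    x = x0
    for a, b in params:
        x = a * x + b
    return x

def grads_constant_memory(x0, params):
    """Gradients of L = x_K w.r.t. every (a_k, b_k) without storing intermediates."""
    x = forward(x0, params)
    x_bar = 1.0                              # adjoint dL/dx_K for L = x_K
    grads = [None] * len(params)
    for k in range(len(params) - 1, -1, -1):
        a, b = params[k]
        x_prev = (x - b) / a                 # invert the step instead of storing x_prev
        grads[k] = (x_bar * x_prev, x_bar)   # (dL/da_k, dL/db_k)
        x_bar *= a                           # pull the adjoint back through the step
        x = x_prev
    return grads

# Two chained steps; the result matches the manual chain rule: [(1.0, 0.5), (7.0, 1.0)].
print(grads_constant_memory(2.0, [(3.0, 1.0), (0.5, -2.0)]))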
The material point method (MPM) recently demonstrated its efficacy at simulating many materials and the coupling between them on a massive scale. However, in scenarios containing debris, MPM manifests more dissipation and numerical viscosity than traditional Lagrangian methods. We have two observations from carefully revisiting existing integration methods used in MPM. First, nearby particles would end up with smoothed velocities without recovering momentum for each particle during the particle-grid-particle transfers. Second, most existing integrators assume continuity in the entire domain and advect particles by directly interpolating the positions from deformed nodal positions, which would trap the particles and make them harder to separate. We propose an integration scheme that corrects particle positions at each time step. We demonstrate our method's effectiveness with several large-scale simulations involving brittle materials. Our approach effectively reduces diffusion and unphysical viscosity compared to traditional integrators.
We propose a particle-based method to simulate thin-film fluid that jointly facilitates aggressive surface deformation and vigorous tangential flows. We build our dynamics model from the surface tension driven Navier-Stokes equation with the dimensionality reduced using the asymptotic lubrication theory and customize a set of differential operators based on the weakly compressible Smoothed Particle Hydrodynamics (SPH) for evolving pointset surfaces. The key insight is that the compressible nature of SPH, which is unfavorable in its typical usage, is helpful in our application to co-evolve the thickness, calculate the surface tension, and enforce the fluid incompressibility on a thin film. In this way, we are able to two-way couple the surface deformation with the in-plane flows in a physically based manner. We can simulate complex vortical swirls, fingering effects due to Rayleigh-Taylor instability, capillary waves, Newton's interference fringes, and the Marangoni effect on liberally deforming surfaces by presenting both realistic visual results and numerical validations. The particle-based nature of our system also enables it to conveniently handle topology changes and codimension transitions, allowing us to marry the thin-film simulation with a wide gamut of 3D phenomena, such as pinch-off of unstable catenoids, dripping under gravity, merging of droplets, as well as bubble rupture.
We present a novel Material Point Method (MPM) discretization of surface tension forces that arise from spatially varying surface energies. These variations typically arise from surface energy dependence on temperature and/or concentration. Furthermore, since the surface energy is an interfacial property depending on the types of materials on either side of an interface, spatial variation is required for modeling the contact angle at the triple junction between a liquid, solid and surrounding air. Our discretization is based on the surface energy itself, rather than on the associated traction condition most commonly used for discretization with particle methods. Our energy based approach automatically captures surface gradients without the explicit need to resolve them as in traction condition based approaches. We include an implicit discretization of thermomechanical material coupling with a novel particle-based enforcement of Robin boundary conditions associated with convective heating. Lastly, we design a particle resampling approach needed to achieve perfect conservation of linear and angular momentum with Affine-Particle-In-Cell (APIC) [Jiang et al. 2015]. We show that our approach enables implicit time stepping for complex behaviors like the Marangoni effect and hydrophobicity/hydrophilicity. We demonstrate the robustness and utility of our method by simulating materials that exhibit highly diverse degrees of surface tension and thermomechanical effects, such as water, wine and wax.
Smooth curves and surfaces can be characterized as minimizers of squared curvature bending energies subject to constraints. In the univariate case with an isometry (length) constraint this leads to classic non-linear splines. For surfaces, isometry is too rigid a constraint and instead one asks for minimizers of the Willmore (squared mean curvature) energy subject to a conformality constraint. We present an efficient algorithm for (conformally) constrained Willmore surfaces using triangle meshes of arbitrary topology with or without boundary. Our conformal class constraint is based on the discrete notion of conformal equivalence of triangle meshes. The resulting non-linear constrained optimization problem can be solved efficiently using the competitive gradient descent method together with appropriate Sobolev metrics. The surfaces can be represented either through point positions or differential coordinates. The latter enable the realization of abstract metric surfaces without an initial immersion. A versatile toolkit for extrinsic conformal geometry processing, suitable for the construction and manipulation of smooth surfaces, results through the inclusion of additional point, area, and volume constraints.
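For reference, the functional being minimized is the Willmore energy
\[
W(S) \;=\; \int_{S} H^{2}\, dA,
\]
the total squared mean curvature, minimized over surfaces confined to a fixed discrete conformal class, i.e., over meshes whose edge lengths differ from a reference only by per-vertex conformal scale factors.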
We describe a new algorithm that solves a classical geometric problem: Find a surface of minimal area bordered by an arbitrarily prescribed boundary curve. Existing numerical methods face challenges due to the non-convexity of the problem. Using a representation of curves and surfaces via differential forms on the ambient space, we reformulate this problem as a convex optimization. This change of variables overcomes many difficulties in previous numerical attempts and allows us to find the global minimum across all possible surface topologies. The new algorithm is based on differential forms on the ambient space and does not require handling meshes. We adopt the Alternating Direction Method of Multiplier (ADMM) to find global minimal surfaces. The resulting algorithm is simple and efficient: it boils down to an alternation between a Fast Fourier Transform (FFT) and a pointwise shrinkage operation. We also show other applications of our solver in geometry processing such as surface reconstruction.
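The pointwise shrinkage step mentioned above is, in its standard ADMM form, the vectorial soft-thresholding operator (generic notation, not necessarily the paper's exact variables),
\[
\operatorname{shrink}(z, \kappa) \;=\; \max\!\left(1 - \frac{\kappa}{\|z\|},\, 0\right) z,
\]
the proximal map of \(\kappa\|\cdot\|\); each iteration alternates this local update with a global linear solve, which on a regular grid is where the FFT enters.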
In this paper we study Weingarten surfaces and explore their potential for fabrication-aware design in freeform architecture. Weingarten surfaces are characterized by a functional relation between their principal curvatures that implicitly defines approximate local congruences on the surface. These symmetries can be exploited to simplify surface paneling of double-curved architectural skins through mold re-use.
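Concretely, a Weingarten surface satisfies a fixed functional relation between its principal curvatures,
\[
W(\kappa_1, \kappa_2) \;=\; 0,
\]
with familiar special cases such as constant-mean-curvature surfaces (\(\kappa_1 + \kappa_2 = \mathrm{const}\)) and surfaces of constant Gaussian curvature (\(\kappa_1 \kappa_2 = \mathrm{const}\)); in the curvature diagram this relation confines all surface points to a 1D curve, which is precisely the contraction targeted by the optimization described next.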
We present an optimization approach to find a Weingarten surface that is close to a given input design. Leveraging insights from differential geometry, our method aligns curvature isolines of the surface in order to contract the curvature diagram from a 2D region into a 1D curve. The unknown functional curvature relation then emerges as the result of the optimization. We show how a robust and efficient numerical shape approximation method can be implemented using a guided projection approach on a high-order B-spline representation. This algorithm is applied in several design studies to illustrate how Weingarten surfaces define a versatile shape space for fabrication-aware exploration in freeform architecture. Our optimization algorithm provides the first practical tool to compute general Weingarten surfaces with arbitrary curvature relation, thus enabling new investigations into a rich, but as of yet largely unexplored class of surfaces.
Given a pair of images---target person and garment on another person---we automatically generate the target person in the given garment. Previous methods mostly focused on texture transfer via paired data training, while overlooking body shape deformations, skin color, and seamless blending of garment with the person. This work focuses on those three components, while also not requiring paired data training. We designed a pose conditioned StyleGAN2 architecture with a clothing segmentation branch that is trained on images of people wearing garments. Once trained, we propose a new layered latent space interpolation method that allows us to preserve and synthesize skin color and target body shape while transferring the garment from a different person. We demonstrate results on high resolution 512 × 512 images, and extensively compare to state of the art in try-on on both latent space generated and real images.
We present a caricature generation framework based on shape and style manipulation using StyleGAN. Our framework, dubbed StyleCariGAN, automatically creates a realistic and detailed caricature from an input photo with optional controls on shape exaggeration degree and color stylization type. The key component of our method is shape exaggeration blocks that are used for modulating coarse layer feature maps of StyleGAN to produce desirable caricature shape exaggerations. We first build a layer-mixed StyleGAN for photo-to-caricature style conversion by swapping fine layers of the StyleGAN for photos to the corresponding layers of the StyleGAN trained to generate caricatures. Given an input photo, the layer-mixed model produces detailed color stylization for a caricature but without shape exaggerations. We then append shape exaggeration blocks to the coarse layers of the layer-mixed model and train the blocks to create shape exaggerations while preserving the characteristic appearances of the input. Experimental results show that our StyleCariGAN generates realistic and detailed caricatures compared to the current state-of-the-art methods. We demonstrate StyleCariGAN also supports other StyleGAN-based image manipulations, such as facial expression control.
Portraiture as an art form has evolved from realistic depiction into a plethora of creative styles. While substantial progress has been made in automated stylization, generating high quality stylistic portraits is still a challenge, and even the recent popular Toonify suffers from several artifacts when used on real input images. Such StyleGAN-based methods have focused on finding the best latent inversion mapping for reconstructing input images; however, our key insight is that this does not lead to good generalization to different portrait styles. Hence we propose AgileGAN, a framework that can generate high quality stylistic portraits via inversion-consistent transfer learning. We introduce a novel hierarchical variational autoencoder to ensure the inverse mapped distribution conforms to the original latent Gaussian distribution, while augmenting the original space to a multi-resolution latent space so as to better encode different levels of detail. To better capture attribute-dependent stylization of facial features, we also present an attribute-aware generator and adopt an early stopping strategy to avoid overfitting small training datasets. Our approach provides greater agility in creating high quality and high resolution (1024×1024) portrait stylization models, requiring only a limited number of style exemplars (~100) and short training time (~1 hour). We collected several style datasets for evaluation including 3D cartoons, comics, oil paintings and celebrities. We show that we can achieve superior portrait stylization quality to previous state-of-the-art methods, with comparisons done qualitatively, quantitatively and through a perceptual user study. We also demonstrate two applications of our method, image editing and motion retargeting.
Porous materials are common in daily life. They include granular materials (e.g. sand) that flow like a liquid when mixed with fluid, and foam materials (e.g. sponge) that deform like a solid when interacting with liquid. The underlying physics is further complicated when multiple fluids interact with porous materials, involving coupling between rigid and fluid bodies, which may follow different physics models such as Darcy's law and the multiple-fluid Navier-Stokes equations. We propose a unified particle framework for the simulation of multiple-fluid flows and porous materials. A novel virtual phase concept is introduced to avoid explicit particle state tracking and runtime particle deletion/insertion. Our unified model is flexible and stable enough to cope with multiple fluids interacting with porous materials, and it ensures consistent mass and momentum transport over the whole simulation space.
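For reference, the porous regime mentioned above is governed by Darcy's law,
\[
\mathbf{q} \;=\; -\frac{k}{\mu}\,\nabla p,
\]
which relates the filtration velocity \(\mathbf{q}\) to the pressure gradient through the permeability \(k\) and dynamic viscosity \(\mu\), while the free fluid follows the multiple-fluid Navier-Stokes equations; the unified particle framework above couples these regimes via the virtual phase concept rather than explicit state tracking.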
We assume that the viscous forces in any liquid are simultaneously local and non-local, and introduce the extended POM-POM model [McLeish and Larson 1998; Oishi et al. 2012; Verbeeten et al. 2001] to computer graphics to design a unified constitutive model for viscosity that generalizes prior models, such as Oldroyd-B, the Upper-convected Maxwell (UCM) model [Sadeghy et al. 2005], and classical Newtonian viscosity under one umbrella, recovering each of them with different parameter values. Implicit discretization of our model via backward Euler recovers the variational Stokes solver of [Larionov et al. 2017] for Newtonian viscosity. For greater accuracy, however, we introduce the second-order accurate Generalized Single Step Single Solve (GS4) scheme [Tamma et al. 2000; Zhou and Tamma 2004] to computer graphics, which recovers all prior second-order accurate time integration schemes to date. Using GS4 and our generalized constitutive model, we present a Material Point Method (MPM) for simulating various viscoelastic liquid behaviors, such as classical liquid rope coiling, buckling, folding, and shear thinning/thickening. In addition, we show how to couple our viscoelastic liquid simulator with the recently introduced non-Fourier heat diffusion solver [Xue et al. 2020] for simulating problems with phase change, such as melting chocolate and digital fabrication with 3D printing. While the discretization of heat diffusion is slightly different within GS4, we show that it can still be efficiently solved using an assembly-free Multigrid-preconditioned Conjugate Gradients solver. We present end-to-end 3D simulations to demonstrate the versatility of our framework.
We propose a novel three-way coupling method to model the contact interaction between solid and fluid driven by strong surface tension. At the heart of our physical model is a thin liquid membrane that simultaneously couples to both the liquid volume and the rigid objects, facilitating accurate momentum transfer, collision processing, and surface tension calculation. This model is implemented numerically under a hybrid Eulerian-Lagrangian framework where the membrane is modelled as a simplicial mesh and the liquid volume is simulated on a background Cartesian grid. We devise a monolithic solver to solve the interactions among the three systems of liquid, solid, and membrane. We demonstrate the efficacy of our method through an array of rigid-fluid contact simulations dominated by strong surface tension, which enables the faithful modeling of a host of new surface-tension-dominant phenomena, including objects denser than water that remain afloat, the 'Cheerios effect' where floating objects attract one another, and the surface tension weakening effect caused by surface-active constituents.
Natural hand manipulations exhibit complex finger maneuvers adaptive to object shapes and the tasks at hand. Learning dexterous manipulation from data in a brute force way would require a prohibitive number of examples to effectively cover the combinatorial space of 3D shapes and activities. In this paper, we propose a hand-object spatial representation that can achieve generalization from limited data. Our representation combines the global object shape as voxel occupancies with local geometric details as samples of closest distances. This representation is used by a neural network to regress finger motions from input trajectories of wrists and objects. Specifically, we provide the network with the current finger pose, past and future trajectories, and the spatial representations extracted from these trajectories. The network then predicts a new finger pose for the next frame as an autoregressive model. With a carefully chosen hand-centric coordinate system, we can handle single-handed and two-handed motions in a unified framework. Learning from a small number of primitive shapes and kitchenware objects, the network is able to synthesize a variety of finger gaits for grasping, in-hand manipulation, and bimanual object handling on a rich set of novel shapes and functional tasks. We also present a live demo of manipulating virtual objects in real time using a simple physical prop. Our system is useful for offline animation and for real-time applications that can tolerate a small delay.
We present a new approach for modelling musculoskeletal anatomy. Unlike previous methods, we do not model individual muscle shapes as geometric primitives (polygonal meshes, NURBS etc.). Instead, we adopt a volumetric segmentation approach where every point in our volume is assigned to a muscle, fat, or bone tissue. We provide an interactive modelling tool where the user controls the segmentation via muscle curves and we visualize the muscle shapes using volumetric rendering. Muscle curves enable intuitive yet powerful control over the muscle shapes. This representation allows us to automatically handle intersections between different tissues (muscle-muscle, muscle-bone, and muscle-skin) during the modelling and automates computation of muscle fiber fields. We further introduce a novel algorithm for converting the volumetric muscle representation into tetrahedral or surface geometry for use in downstream tasks. Additionally, we introduce an interactive skeleton authoring tool that allows the users to create skeletal anatomy starting from only a skin mesh using a library of bone parts.
This paper addresses the task of estimating spatially-varying reflectance (i.e., SVBRDF) from a single, casually captured image. Central to our method is a highlight-aware (HA) convolution operation and a two-stream neural network equipped with proper training losses. Our HA convolution, as a novel variant of standard (ST) convolution, directly modulates convolution kernels under the guidance of automatically learned masks representing potentially overexposed highlight regions. It helps to reduce the impact of strong specular highlights on diffuse components and, at the same time, hallucinates plausible content in saturated regions. Considering that the variation of saturated pixels also contains important cues for inferring surface bumpiness and specular components, we design a two-stream network to extract features from two different branches stacked by HA convolutions and ST convolutions, respectively. These two groups of features are further fused in an attention-based manner to facilitate feature selection of each SVBRDF map. The whole network is trained end to end with a new perceptual adversarial loss which is particularly useful for enhancing the texture details. Such a design also allows the recovered material maps to be disentangled. We demonstrate through quantitative analysis and qualitative visualization that the proposed method is effective at recovering clear SVBRDFs from a single casually captured image and performs favorably against the state of the art. Since we impose very few constraints on the capture process, even a non-expert user can create high-quality SVBRDFs that cater to many graphical applications.
We propose neural trace photography, a novel framework to automatically learn high-quality scanning of non-planar, complex anisotropic appearance. Our key insight is that free-form appearance scanning can be cast as a geometry learning problem on unstructured point clouds, each of which represents an image measurement and the corresponding acquisition condition. Based on this connection, we carefully design a neural network, to jointly optimize the lighting conditions to be used in acquisition, as well as the spatially independent reconstruction of reflectance from corresponding measurements. Our framework is not tied to a specific setup, and can adapt to various factors in a data-driven manner. We demonstrate the effectiveness of our framework on a number of physical objects with a wide variation in appearance. The objects are captured with a light-weight mobile device, consisting of a single camera and an RGB LED array. We also generalize the framework to other common types of light sources, including a point, a linear and an area light.
Material appearance hinges not only on material reflectance properties but also on surface geometry and illumination. The unlimited number of potential combinations between these factors makes understanding and predicting material appearance a very challenging task. In this work, we collect a large-scale dataset of perceptual ratings of appearance attributes with more than 215,680 responses for 42,120 distinct combinations of material, shape, and illumination. The goal of this dataset is twofold. First, we analyze for the first time the effects of illumination and geometry in material perception across such a large collection of varied appearances. We connect our findings to those of the literature, discussing how previous knowledge generalizes across very diverse materials, shapes, and illuminations. Second, we use the collected dataset to train a deep learning architecture for predicting perceptual attributes that correlate with human judgments. We demonstrate the consistent and robust behavior of our predictor in various challenging scenarios, which, for the first time, enables estimating perceived material attributes from general 2D images. Since our predictor relies on the final appearance in an image, it can compare appearance properties across different geometries and illumination conditions. Finally, we demonstrate several applications that use our predictor, including appearance reproduction using 3D printing, BRDF editing by integrating our predictor in a differentiable renderer, illumination design, and material recommendations for scene design.
Elastic bending of initially flat slender elements allows the realization and economic fabrication of intriguing curved shapes. In this work, we derive an intuitive but rigorous geometric characterization of the design space of plane elastic rods with variable stiffness. It enables designers to determine which shapes are physically viable with active bending by visual inspection alone. Building on these insights, we propose a method for efficiently designing the geometry of a flat elastic rod that realizes a target equilibrium curve, which only requires solving a linear program.
We implement this method in an interactive computational design tool that gives feedback about the feasibility of a design and instantly computes the geometry of the structural elements necessary to realize it. The tool also offers an iterative optimization routine that improves the fabricability of a model while modifying it as little as possible. In addition, we use our geometric characterization to derive an algorithm for analyzing and recovering the stability of elastic curves that would otherwise snap out of their unstable equilibrium shapes by buckling. We show the efficacy of our approach by designing and manufacturing several physical models that are assembled from flat elements.
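To make the linear-programming step concrete, here is a minimal sketch under simplifying assumptions: for an end-loaded planar rod in equilibrium, the bending moment is an affine function of position, m(x, y) = a + b x + c y, so the stiffness required along a sampled target curve is K_i = m(x_i, y_i) / kappa_i, which is linear in the unknown coefficients (a, b, c). The curve samples, the lower bound K_min, the min-max objective, and the assumption kappa_i > 0 are illustrative choices, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's formulation): find affine moment coefficients
# (a, b, c) that keep the required stiffness profile within fabricable bounds
# along a sampled target curve, minimizing the peak stiffness t.
import numpy as np
from scipy.optimize import linprog

def design_stiffness(x, y, kappa, K_min=1.0):
    """x, y, kappa: samples of the target curve and its curvature (kappa > 0)."""
    x, y, kappa = (np.asarray(v, dtype=float) for v in (x, y, kappa))
    n = len(x)
    A = np.column_stack([np.ones(n), x, y]) / kappa[:, None]  # rows give K_i(a, b, c)
    # Variables: [a, b, c, t]; minimize t subject to K_min <= K_i <= t.
    c_obj = np.array([0.0, 0.0, 0.0, 1.0])
    A_ub = np.vstack([np.hstack([A, -np.ones((n, 1))]),    #  K_i - t <= 0
                      np.hstack([-A, np.zeros((n, 1))])])  # -K_i <= -K_min
    b_ub = np.concatenate([np.zeros(n), -K_min * np.ones(n)])
    res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 3 + [(0, None)])
    if not res.success:
        return None  # no admissible stiffness profile realizes this curve
    a, b, c, _ = res.x
    return A @ np.array([a, b, c])  # stiffness profile K at the curve samples
```

If the program is infeasible, no stiffness profile within the bounds realizes the target curve, which loosely mirrors the feasibility criterion described above.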
Basket weaving is a traditional craft for creating curved surfaces as an interwoven array of thin, flexible, and initially straight ribbons. The three-dimensional shape of a woven structure emerges through a complex interplay of the elastic bending behavior of the ribbons and the contact forces at their crossings. Curvature can be injected by carefully placing topological singularities in the otherwise regular weaving pattern. However, shape control through topology is highly non-trivial and inherently discrete, which severely limits the range of attainable woven geometries. Here, we demonstrate how to construct arbitrary smooth free-form surface geometries by weaving carefully optimized curved ribbons. We present an optimization-based approach to solving the inverse design problem for such woven structures. Our algorithm computes the ribbons' planar geometry such that their interwoven assembly closely approximates a given target design surface in equilibrium. We systematically validate our approach through a series of physical prototypes to show a broad range of new woven geometries that is not achievable by existing methods. We anticipate our computational approach to significantly enhance the capabilities for the design of new woven structures. Facilitated by modern digital fabrication technology, we see potential applications in material science, bio- and mechanical engineering, art, design, and architecture.
We present WireRoom, a computational framework for the intelligent design of abstract 3D wire art to depict a given 3D model. Our algorithm generates a set of 3D wire shapes from the 3D model with informative, visually pleasing, and concise structures. It is achieved by solving a dynamic travelling salesman problem on the surface of the 3D model with a multi-path expansion approach. We introduce a novel explorative computational design procedure by taking the generated wire shapes as candidates, avoiding manual design of the wire shape structure. We compare our algorithm with a baseline method and conduct a user study to investigate the usability of the framework and the quality of the produced wire shapes. The results of the comparison and user study confirm that our framework is effective for producing informative, visually pleasing, and concise wire shapes.
Humans and animals can control their bodies to generate a wide range of motions via low-dimensional action signals representing high-level goals. As such, human bodies and faces are prime examples of active objects, which can affect their shape via an internal actuation mechanism. This paper explores the following proposition: given a training set of example poses of an active deformable object, can we learn a low-dimensional control space that could reproduce the training set and generalize to new poses? In contrast to popular machine learning methods for dimensionality reduction such as auto-encoders, we model our active objects in a physics-based way. We utilize a differentiable, quasistatic, physics-based simulation layer and combine it with a decoder-type neural network. Our differentiable physics layer naturally fits into deep learning frameworks and allows the decoder network to learn actuations that reach the desired poses after physics-based simulation. In contrast to modeling approaches where users build anatomical models from first principles, medical literature or medical imaging, we do not presume knowledge of the underlying musculature, but learn the structure and control of the actuation mechanism directly from the input data. We present a training paradigm and several scalability-oriented enhancements that allow us to train effectively while accommodating high-resolution volumetric models, with as many as a quarter million simulation elements. The prime demonstration of the efficacy of our example-driven modeling framework targets facial animation, where we train on a collection of input expressions while generalizing to unseen poses, drive detailed facial animation from sparse motion capture input, and facilitate expression sculpting via direct manipulation.
Animating a newly designed character using motion capture (mocap) data is a long standing problem in computer animation. A key consideration is the skeletal structure that should correspond to the available mocap data, and the shape deformation in the joint regions, which often requires a tailored, pose-specific refinement. In this work, we develop a neural technique for articulating 3D characters using enveloping with a pre-defined skeletal structure which produces high quality pose dependent deformations. Our framework learns to rig and skin characters with the same articulation structure (e.g., bipeds or quadrupeds), and builds the desired skeleton hierarchy into the network architecture. Furthermore, we propose neural blend shapes - a set of corrective pose-dependent shapes which improve the deformation quality in the joint regions in order to address the notorious artifacts resulting from standard rigging and skinning. Our system estimates neural blend shapes for input meshes with arbitrary connectivity, as well as weighting coefficients which are conditioned on the input joint rotations. Unlike recent deep learning techniques which supervise the network with ground-truth rigging and skinning parameters, our approach does not assume that the training data has a specific underlying deformation model. Instead, during training, the network observes deformed shapes and learns to infer the corresponding rig, skin and blend shapes using indirect supervision. During inference, we demonstrate that our network generalizes to unseen characters with arbitrary mesh connectivity, including unrigged characters built by 3D artists. Conforming to standard skeletal animation models enables direct plug-and-play in standard animation software, as well as game engines.
This paper introduces a novel subspace method for the simulation of dynamic deformations. The method augments existing linear handle-based subspace formulations with nonlinear learning-based corrections parameterized by the same subspace. Together, they produce a compact nonlinear model that combines the fast dynamics and overall contact-based interaction of subspace methods, with the highly detailed deformations of learning-based methods. We propose a formulation of the model with nonlinear corrections applied in the local undeformed setting, decoupling internal and external contact-driven corrections. We define a simple mapping of these corrections to the global setting, an efficient implementation for dynamic simulation, and a training pipeline to generate examples that efficiently cover the interaction space. Altogether, the method achieves an unprecedented combination of speed and contact-driven deformation detail.
The computational design of soft underwater swimmers is challenging because of the high number of degrees of freedom in soft-body modeling. In this paper, we present a differentiable pipeline for co-designing a soft swimmer's geometry and controller. Our pipeline unlocks gradient-based algorithms for discovering novel swimmer designs more efficiently than traditional gradient-free solutions. We propose Wasserstein barycenters as a basis for the geometric design of soft underwater swimmers since they are differentiable and can naturally interpolate between bio-inspired base shapes via optimal transport. By combining this design space with differentiable simulation and control, we can efficiently optimize a soft underwater swimmer's performance with fewer simulations than baseline methods. We demonstrate the efficacy of our method on various design problems such as fast, stable, and energy-efficient swimming and demonstrate applicability to multi-objective design.
Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately, and more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN, the state-of-the-art unconditional generator. We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space. We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on. We present an encoder based on our two principles that is specifically designed for facilitating editing on real images by balancing these tradeoffs. By evaluating its performance qualitatively and quantitatively on numerous challenging domains, including cars and horses, we show that our inversion method, followed by common editing techniques, achieves superior real-image editing quality, with only a small reconstruction accuracy drop.
In recent years, considerable progress has been made in the visual quality of Generative Adversarial Networks (GANs). Even so, these networks still suffer from degradation in quality for high-frequency content, stemming from a spectrally biased architecture, and similarly unfavorable loss functions. To address this issue, we present a novel general-purpose Style and WAvelet based GAN (SWAGAN) that implements progressive generation in the frequency domain. SWAGAN incorporates wavelets throughout its generator and discriminator architectures, enforcing a frequency-aware latent representation at every step of the way. This approach, designed to directly tackle the spectral bias of neural networks, yields an improvement in the ability to generate medium- and high-frequency content, including structures which other networks fail to learn. We demonstrate the advantage of our method by integrating it into the StyleGAN2 framework, and verifying that content generation in the wavelet domain leads to more realistic high-frequency content, even when trained for fewer iterations. Furthermore, we verify that our model's latent space retains the qualities that allow StyleGAN to serve as a basis for a multitude of editing tasks, and show that our frequency-aware approach also induces improved high-frequency performance in downstream tasks.
We present a fitted model of sky dome radiance and attenuation for realistic terrestrial atmospheres. Using scatterer distributions derived from atmospheric measurement data, our model considerably improves on the visual realism of existing analytical clear sky models, as well as of interactive methods that are based on approximating atmospheric light transport. We also provide features not found in fitted models so far: radiance patterns for post-sunset conditions, in-scattered radiance and attenuation values for finite viewing distances, an observer-altitude-resolved model that includes downward-looking viewing directions, as well as polarisation information. We introduce a fully spherical model for in-scattered radiance that replaces the family of hemispherical functions originally introduced by Perez et al., and which was extended for several subsequent analytical models: our model relies on reference image compression via tensor decomposition instead.
We introduce a novel transmittance model to improve the volumetric representation of 3D scenes. The model can represent opaque surfaces in the volumetric light transport framework. Volumetric representations are useful for complex scenes, and become increasingly popular for level of detail and scene reconstruction. The traditional exponential transmittance model found in volumetric light transport cannot capture correlations in visibility across volume elements. When representing opaque surfaces as volumetric density, this leads to both bloating of silhouettes and light leaking artifacts. By introducing a parametric non-exponential transmittance model, we are able to approximate these correlation effects and significantly improve the accuracy of volumetric appearance representation of opaque scenes. Our parametric transmittance model can represent a continuum between the linear transmittance that opaque surfaces exhibit and the traditional exponential transmittance encountered in participating media and unstructured geometries. This covers a large part of the spectrum of geometric structures encountered in complex scenes. In order to handle the spatially varying transmittance correlation effects, we further extend the theory of non-exponential participating media to a heterogeneous transmittance model. Our model is compact in storage and computationally efficient both for evaluation and for reverse-mode gradient computation. Applying our model to optimization algorithms yields significant improvements in volumetric scene appearance quality. We further show improvements for relevant applications, such as scene appearance prefiltering, image-based scene reconstruction using differentiable rendering, and neural representations, and compare our model to a conventional exponential one.
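As a toy illustration of the idea of a parametric family spanning the two regimes mentioned above, the following blends the classic exponential transmittance with a linear falloff in optical depth; the blend parameter gamma and the specific functional form are placeholders, not the parameterization proposed in the paper.

```python
import numpy as np

def transmittance(tau, gamma):
    """Toy parametric transmittance as a function of optical depth tau.
    gamma = 0 reproduces the classic exponential model exp(-tau);
    gamma = 1 gives a linear falloff max(0, 1 - tau), the behavior of an
    opaque surface smeared into a finite-thickness density shell.
    Intermediate gamma blends the two regimes (illustrative only)."""
    exponential = np.exp(-tau)
    linear = np.clip(1.0 - tau, 0.0, None)
    return (1.0 - gamma) * exponential + gamma * linear
```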
We present an in-depth analysis of the sources of variance in state-of-the-art unbiased volumetric transmittance estimators, and propose several new methods for improving their efficiency. These combine to produce a single estimator that is universally optimal relative to prior work, with up to several orders of magnitude lower variance at the same cost, and has zero variance for any ray with non-varying extinction. We first reduce the variance of truncated power-series estimators using a novel efficient application of U-statistics. We then greatly reduce the average expansion order of the power series and redistribute density evaluations to filter the optical depth estimates with an equidistant sampling comb. Combined with the use of an online control variate built from a sampled mean density estimate, the resulting estimator effectively performs ray marching most of the time while using rarely-sampled higher-order terms to correct the bias.
In the context of geometric acoustic simulation, one of the more perceptually important yet difficult to simulate acoustic effects is diffraction, a phenomenon that allows sound to propagate around obstructions and corners. A significant bottleneck in real-time simulation of diffraction is the enumeration of high-order diffraction propagation paths in scenes with complex geometry (e.g. highly tessellated surfaces). To this end, we present a dynamic geometric diffraction approach that consists of an extensive mesh preprocessing pipeline and complementary runtime algorithm. The preprocessing module identifies a small subset of edges that are important for diffraction using a novel silhouette edge detection heuristic. It also extends these edges with planar diffraction geometry and precomputes a graph data structure encoding the visibility between the edges. The runtime module uses bidirectional path tracing against the diffraction geometry to probabilistically explore potential paths between sources and listeners, then evaluates the intensities for these paths using the Uniform Theory of Diffraction. It uses the edge visibility graph and the A* pathfinding algorithm to robustly and efficiently find additional high-order diffraction paths. We demonstrate how this technique can simulate 10th-order diffraction up to 568 times faster than the previous state of the art, and can efficiently handle large scenes with both high geometric complexity and high numbers of sources.
Physically accurate rendering often calls for taking the wave nature of light into consideration. In computer graphics, this is done almost exclusively locally, i.e. on a micrometre scale where the diffractive phenomena arise. However, the statistical properties of light, that dictate its coherence characteristics and its capacity to give rise to wave interference effects, evolve globally: these properties change on, e.g., interaction with a surface, diffusion by participating media and simply by propagation. In this paper, we derive the first global light transport framework that is able to account for these properties of light and, therefore, is fully consistent with Maxwell's electromagnetic theory. We show that our framework is a generalization of the classical, radiometry-based light transport---prominent in computer graphics---and retains some of its attractive properties. Finally, as a proof of concept, we apply the presented framework to a few practical problems in rendering and validate against well-studied methods in optics.
With the advent of real-time ray tracing, there is an increasing interest in GPU-friendly importance sampling techniques. We present such methods to sample convex polygonal lights approximately proportional to diffuse and specular BRDFs times the cosine term. For diffuse surfaces, we sample the polygons proportional to projected solid angle. Our algorithm partitions the polygon suitably and employs inverse function sampling for each part. Inversion of the distribution function is challenging. Using algebraic geometry, we develop a special iterative procedure and an initialization scheme. Together, they achieve high accuracy in all possible situations with only two iterations. Our implementation is numerically stable and fast. For specular BRDFs, this method enables us to sample the polygon proportional to a linearly transformed cosine. We combine these diffuse and specular sampling strategies through novel variants of optimal multiple importance sampling. Our techniques render direct lighting from Lambertian polygonal lights with almost no variance outside of penumbrae and support shadows and textured emission. Additionally, we propose an algorithm for solid angle sampling of polygons. It is faster and more stable than existing methods.
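For background on the linearly transformed cosine (LTC) machinery underlying the specular strategy: a direction drawn from a clamped cosine lobe, pushed through the linear transform M and renormalized, is distributed according to the corresponding LTC. The sketch below samples unrestricted LTC directions only; restricting the sampling to the polygon, as the paper does, is not shown.

```python
import numpy as np

def sample_cosine_hemisphere(u1, u2):
    """Standard cosine-weighted hemisphere sample around the +z axis."""
    r, phi = np.sqrt(u1), 2.0 * np.pi * u2
    return np.array([r * np.cos(phi), r * np.sin(phi), np.sqrt(1.0 - u1)])

def sample_ltc_direction(M, u1, u2):
    """Sample a direction from the linearly transformed cosine defined by M:
    transform a cosine-distributed direction by M and renormalize."""
    d = sample_cosine_hemisphere(u1, u2)
    w = M @ d
    return w / np.linalg.norm(w)
```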
We explore the space of (0, m, 2)-nets in base 2 commonly used for sampling. We present a novel constructive algorithm that can exhaustively generate all nets --- up to m-bit resolution --- and thereby compute the exact number of distinct nets. We observe that the construction algorithm holds the key to defining a transformation operation that lets us transform one valid net into another one. This enables the optimization of digital nets using arbitrary objective functions. For example, we define an analytic energy function for blue noise, and use it to generate nets with high-quality blue-noise frequency power spectra. We also show that the space of (0, 2)-sequences is significantly smaller than nets with the same number of points, which drastically limits the optimizability of sequences.
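For reference, the defining property being enumerated: a base-2 (0, m, 2)-net places its 2^m points so that every dyadic elementary interval of area 2^-m contains exactly one point. The following checker, independent of the construction algorithm above, verifies that property directly.

```python
import numpy as np

def is_0m2_net_base2(points, m):
    """Check the (0, m, 2)-net property in base 2: every elementary interval
    [a/2^j, (a+1)/2^j) x [b/2^k, (b+1)/2^k) with j + k = m contains exactly
    one of the 2^m points (points are assumed to lie in [0, 1)^2)."""
    pts = np.asarray(points)
    assert pts.shape == (2 ** m, 2)
    for j in range(m + 1):
        k = m - j
        # Bin each point into its dyadic cell at resolution (2^j, 2^k).
        cells = (np.floor(pts[:, 0] * 2 ** j).astype(int),
                 np.floor(pts[:, 1] * 2 ** k).astype(int))
        counts = np.zeros((2 ** j, 2 ** k), dtype=int)
        np.add.at(counts, cells, 1)
        if not np.all(counts == 1):
            return False
    return True
```

For example, the 2^m-point Hammersley set in base 2 passes this check, whereas a uniformly random point set almost surely does not.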
We present an algorithm for producing a seamless animated loop from a single image. The algorithm detects periodic structures, such as the windows of a building or the steps of a staircase, and generates a non-trivial displacement vector field that maps each segment of the structure onto a neighboring segment along a user- or auto-selected main direction of motion. This displacement field is used, together with suitable temporal and spatial smoothing, to warp the image and produce the frames of a continuous animation loop. Our cinemagraphs are created in under a second on a mobile device. Over 140,000 users downloaded our app and exported over 350,000 cinemagraphs. Moreover, we conducted two user studies that show that users prefer our method for creating surreal and structured cinemagraphs compared to more manual approaches and compared to previous methods.
We present a learning-based method for building driving-signal aware full-body avatars. Our model is a conditional variational autoencoder that can be animated with incomplete driving signals, such as human pose and facial keypoints, and produces a high-quality representation of human geometry and view-dependent appearance. The core intuition behind our method is that better drivability and generalization can be achieved by disentangling the driving signals and remaining generative factors, which are not available during animation. To this end, we explicitly account for information deficiency in the driving signal by introducing a latent space that exclusively captures the remaining information, thus enabling the imputation of the missing factors required during full-body animation, while remaining faithful to the driving signal. We also propose a learnable localized compression for the driving signal which promotes better generalization, and helps minimize the influence of global chance-correlations often found in real datasets. For a given driving signal, the resulting variational model produces a compact space of uncertainty for missing factors that allows for an imputation strategy best suited to a particular application. We demonstrate the efficacy of our approach on the challenging problem of full-body animation for virtual telepresence with driving signals acquired from minimal sensors placed in the environment and mounted on a VR-headset.
Synthesizing graceful and life-like behaviors for physically simulated characters has been a fundamental challenge in computer animation. Data-driven methods that leverage motion tracking are a prominent class of techniques for producing high fidelity motions for a wide range of behaviors. However, the effectiveness of these tracking-based methods often hinges on carefully designed objective functions, and when applied to large and diverse motion datasets, these methods require significant additional machinery to select the appropriate motion for the character to track in a given scenario. In this work, we propose to obviate the need to manually design imitation objectives and mechanisms for motion selection by utilizing a fully automated approach based on adversarial imitation learning. High-level task objectives that the character should perform can be specified by relatively simple reward functions, while the low-level style of the character's behaviors can be specified by a dataset of unstructured motion clips, without any explicit clip selection or sequencing. For example, a character traversing an obstacle course might utilize a task-reward that only considers forward progress, while the dataset contains clips of relevant behaviors such as running, jumping, and rolling. These motion clips are used to train an adversarial motion prior, which specifies style-rewards for training the character through reinforcement learning (RL). The adversarial RL procedure automatically selects which motion to perform, dynamically interpolating and generalizing from the dataset. Our system produces high-quality motions that are comparable to those achieved by state-of-the-art tracking-based techniques, while also being able to easily accommodate large datasets of unstructured motion clips. Composition of disparate skills emerges automatically from the motion prior, without requiring a high-level motion planner or other task-specific annotations of the motion clips. We demonstrate the effectiveness of our framework on a diverse cast of complex simulated characters and a challenging suite of motor control tasks.
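The sketch below shows schematically how a style reward from such an adversarial motion prior could be combined with a task reward during RL training; the discriminator-to-reward mapping, the weights, and the single-transition interface are illustrative placeholders rather than the paper's exact formulation.

```python
import torch

def combined_reward(discriminator, state_transition, task_reward,
                    w_task=0.5, w_style=0.5):
    """Schematic reward for adversarial imitation RL (illustrative values).
    The discriminator scores whether a single state transition looks like it
    came from the motion dataset; the score is mapped to a bounded style
    reward and blended with the task reward."""
    with torch.no_grad():
        score = discriminator(state_transition)        # higher = more dataset-like
        style_reward = -torch.log(1.0 - torch.sigmoid(score) + 1e-6)
    return w_task * task_reward + w_style * style_reward.item()
```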
Despite strong demand in the game and film industry, automatically synthesizing high-quality dance motions remains a challenging task. In this paper, we present ChoreoMaster, a production-ready music-driven dance motion synthesis system. Given a piece of music, ChoreoMaster can automatically generate a high-quality dance motion sequence to accompany the input music in terms of style, rhythm and structure. To achieve this goal, we introduce a novel choreography-oriented choreomusical embedding framework, which successfully constructs a unified choreomusical embedding space for both style and rhythm relationships between music and dance phrases. The learned choreomusical embedding is then incorporated into a novel choreography-oriented graph-based motion synthesis framework, which can robustly and efficiently generate high-quality dance motions following various choreographic rules. Moreover, as a production-ready system, ChoreoMaster is sufficiently controllable and comprehensive for users to produce desired results. Experimental results demonstrate that dance motions generated by ChoreoMaster are accepted by professional artists.
In two-player competitive sports, such as boxing and fencing, athletes often demonstrate efficient and tactical movements during a competition. In this paper, we develop a learning framework that generates control policies for physically simulated athletes who have many degrees of freedom. Our framework uses a two-step approach, learning basic skills and then bout-level strategies with deep reinforcement learning, inspired by the way people learn competitive sports. We develop a policy model based on an encoder-decoder structure that incorporates an autoregressive latent variable and a mixture-of-experts decoder. To show the effectiveness of our framework, we implemented two competitive sports, boxing and fencing, and demonstrate control policies learned by our framework that can generate both tactical and natural-looking behaviors. We also evaluate the control policies with comparisons to other learning configurations and with ablation studies.
Creating agile and responsive characters from a collection of unorganized human motion data has been an important problem in constructing interactive virtual environments. Recently, learning-based approaches have successfully been exploited to learn deep network policies for the control of interactive characters. The agility and responsiveness of deep network policies are influenced by many factors, such as the composition of training datasets, the architecture of network models, and learning algorithms that involve many threshold values, weights, and hyper-parameters. In this paper, we present a novel teacher-student framework to learn time-critically responsive policies, which guarantee the time-to-completion between user inputs and their associated responses regardless of the size and composition of the motion databases. We demonstrate the effectiveness of our approach with interactive characters that can respond to the user's control quickly while performing agile, highly dynamic movements.
We present a method to estimate depth of a dynamic scene, containing arbitrary moving objects, from an ordinary video captured with a moving camera. We seek a geometrically and temporally consistent solution to this under-constrained problem: the depth predictions of corresponding points across frames should induce plausible, smooth motion in 3D. We formulate this objective in a new test-time training framework where a depth-prediction CNN is trained in tandem with an auxiliary scene-flow prediction MLP over the entire input video. By recursively unrolling the scene-flow prediction MLP over varying time steps, we compute both short-range scene flow to impose local smooth motion priors directly in 3D, and long-range scene flow to impose multi-view consistency constraints with wide baselines. We demonstrate accurate and temporally coherent results on a variety of challenging videos containing diverse moving objects (pets, people, cars), as well as camera motion. Our depth maps give rise to a number of depth-and-motion aware video editing effects such as object and lighting insertion.
Generating free-viewpoint videos is critical for immersive VR/AR experience, but recent neural advances still lack the editing ability to manipulate the visual perception for large dynamic scenes. To fill this gap, in this paper, we propose the first approach for editable free-viewpoint video generation for large-scale view-dependent dynamic scenes using only 16 cameras. The core of our approach is a new layered neural representation, where each dynamic entity, including the environment itself, is formulated into a spatio-temporal coherent neural layered radiance representation called ST-NeRF. Such a layered representation supports manipulations of the dynamic scene while still supporting a wide free viewing experience. In our ST-NeRF, we represent the dynamic entity/layer as a continuous function, which achieves the disentanglement of location, deformation as well as the appearance of the dynamic entity in a continuous and self-supervised manner. We propose a scene parsing 4D label map tracking to disentangle the spatial information explicitly and a continuous deform module to disentangle the temporal motion implicitly. An object-aware volume rendering scheme is further introduced for the re-assembling of all the neural layers. We adopt a novel layered loss and motion-aware ray sampling strategy to enable efficient training for a large dynamic scene with multiple performers. Our framework further enables a variety of editing functions, i.e., manipulating the scale and location, duplicating or retiming individual neural layers to create numerous visual effects while preserving high realism. Extensive experiments demonstrate the effectiveness of our approach to achieve high-quality, photo-realistic, and editable free-viewpoint video generation for dynamic scenes.
Color correction and color grading are important steps in film production. Recent palette-based approaches to image recoloring have shown that a small set of representative colors provide an intuitive set of handles for color adjustment. However, a single, static palette cannot represent the time-varying colors in a video. We introduce a spatial-temporal geometry-based approach to video recoloring. Specifically, its core is a 4D skew polytope with a few vertices that approximately encloses the video pixels in color and time, which implicitly defines time-varying palettes through slicing of the 4D skew polytope at specific time values. Our geometric palette is compact, descriptive, and provides a correspondence between colors throughout the video, including topological changes when colors merge or split. Experiments show that our method produces natural, artifact-free recoloring.
We present SP-GAN, a new unsupervised sphere-guided generative model for direct synthesis of 3D shapes in the form of point clouds. Compared with existing models, SP-GAN is able to synthesize diverse and high-quality shapes with fine details and promote controllability for part-aware shape generation and manipulation, yet is trainable without any part annotations. In SP-GAN, we incorporate a global prior (uniform points on a sphere) to spatially guide the generative process and attach a local prior (a random latent code) to each sphere point to provide local details. The key insight in our design is to disentangle the complex 3D shape generation task into a global shape modeling and a local structure adjustment, to ease the learning process and enhance the shape generation quality. Also, our model forms an implicit dense correspondence between the sphere points and points in every generated shape, enabling various forms of structure-aware shape manipulations such as part editing, part-wise shape interpolation, and multi-shape part composition, beyond what existing generative models offer. Experimental results, which include both visual and quantitative evaluations, demonstrate that our model is able to synthesize diverse point clouds with fine details and less noise, as compared with the state-of-the-art models.
Representing complex 3D objects as simple geometric primitives, known as shape abstraction, is important for geometric modeling, structural analysis, and shape synthesis. In this paper, we propose an unsupervised shape abstraction method to map a point cloud into a compact cuboid representation. We jointly predict the cuboid allocation, as part segmentation, and the cuboid shapes, and enforce consistency between the segmentation and the shape abstraction for self-learning. For the cuboid abstraction task, we transform the input point cloud into a set of parametric cuboids using a variational auto-encoder network. The segmentation network allocates each point into a cuboid considering the point-cuboid affinity. Without manual annotations of parts in point clouds, we design four novel losses to jointly supervise the two branches in terms of geometric similarity and cuboid compactness. We evaluate our method on multiple shape collections and demonstrate its superiority over existing shape abstraction methods. Moreover, based on our network architecture and learned representations, our approach supports various applications including structured shape generation, shape interpolation, and structural shape clustering.
A popular way to create detailed yet easily controllable 3D shapes is via procedural modeling, i.e. generating geometry using programs. Such programs consist of a series of instructions along with their associated parameter values. To fully realize the benefits of this representation, a shape program should be compact and only expose degrees of freedom that allow for meaningful manipulation of output geometry. One way to achieve this goal is to design higher-level macro operators that, when executed, expand into a series of commands from the base shape modeling language. However, manually authoring such macros, much like shape programs themselves, is difficult and largely restricted to domain experts. In this paper, we present ShapeMOD, an algorithm for automatically discovering macros that are useful across large datasets of 3D shape programs. ShapeMOD operates on shape programs expressed in an imperative, statement-based language. It is designed to discover macros that make programs more compact by minimizing the number of function calls and free parameters required to represent an input shape collection. We run ShapeMOD on multiple collections of programs expressed in a domain-specific language for 3D shape structures. We show that it automatically discovers a concise set of macros that abstract out common structural and parametric patterns that generalize over large shape collections. We also demonstrate that the macros found by ShapeMOD improve performance on downstream tasks including shape generative modeling and inferring programs from point clouds. Finally, we conduct a user study that indicates that ShapeMOD's discovered macros make interactive shape editing more efficient.
We present a guaranteed quality mesh generation algorithm for the curvilinear triangulation of planar domains with piecewise polynomial boundary. The resulting mesh consists of higher-order triangular elements which are not only regular (i.e., with injective geometric map) but respect strict bounds on quality measures like scaled Jacobian and MIPS distortion. This also implies that the curved triangles' inner angles are bounded from above and below. These are key quality criteria, for instance, in the field of finite element analysis. The domain boundary is reproduced exactly, without geometric approximation error. The central idea is to transform the curvilinear meshing problem into a linear meshing problem via a carefully constructed transformation of bounded distortion, enabling us to leverage key results on guaranteed-quality straight-edge triangulation. The transformation is based on a simple yet general construction and observations about convergence properties of curves under subdivision. Our algorithm can handle arbitrary polynomial order, arbitrarily sharp corners, feature and interface curves, and can be executed using rational arithmetic for strict reliability.
We present a new algorithm for the semi-regular quadrangulation of an input surface, driven by its line features, such as sharp creases. We define a perfectly feature-aligned cross-field and a coarse layout of polygonal-shaped patches where we strictly ensure that all the feature lines are represented as patch boundaries. To be able to consistently do so, we allow non-quadrilateral patches and T-junctions in the layout; the key is the ability to constrain the layout so that it still admits a globally consistent, T-junction-free, and pure-quad internal tessellation of its patches. This requires the insertion of additional irregular vertices inside patches, but the regularity of the final mesh is safeguarded by optimizing for both their number and their reciprocal alignment. In total, our method guarantees the reproduction of feature lines by construction, while still producing good quality, isometric, pure-quad, conforming meshes, making it an ideal candidate for CAD models. Moreover, the method is fully automatic, requiring no user intervention, and remarkably reliable, requiring few assumptions about the input mesh, as we demonstrate by batch processing the entire Thingi10K repository, with less than 0.5% of the attempted cases failing to produce a usable mesh.
We present a new approach for computing planar hexagonal meshes that approximate a given surface, represented as a triangle mesh. Our method is based on two novel technical contributions. First, we introduce Coordinate Power Fields, which are a pair of tangent vector fields on the surface that fulfill a certain continuity constraint. We prove that the fulfillment of this constraint guarantees the existence of a seamless parameterization with quantized rotational jumps, which we then use to regularly remesh the surface. We additionally propose an optimization framework for finding Coordinate Power Fields, which also fulfill additional constraints, such as alignment, sizing and bijectivity. Second, we build upon this framework to address a challenging meshing problem: planar hexagonal meshing. To this end, we suggest a combination of conjugacy, scaling and alignment constraints, which together lead to planarizable hexagons. We demonstrate our approach on a variety of surfaces, automatically generating planar hexagonal meshes on complicated meshes, which were not achievable with existing methods.
We introduce a robust and automatic algorithm to convert feature-annotated linear triangle meshes into coarse tetrahedral meshes with curved elements. Our construction guarantees that the high-order meshes are free of element inversion or self-intersection. A user-specified maximal geometrical error from the input mesh controls the faithfulness of the curved approximation. The boundary of the output mesh is in bijective correspondence to the input, enabling attribute transfer between them, such as boundary conditions for simulations, making our curved mesh an ideal replacement or complement for the original input geometry.
The availability of a bijective shell around the input surface is employed to ensure robust curving, prevent self-intersections, and compute a bijective map between the linear input and curved output surface. As necessary building blocks of our algorithm, we extend the bijective shell formulation to support features and propose a robust approach for boundary-preserving linear tetrahedral meshing.
We demonstrate the robustness and effectiveness of our algorithm by generating high-order meshes for a large collection of complex 3D models.
We propose a framework for efficient nonlinear deformable simulation with both fast continuous collision detection and robust collision resolution. We name this new framework Medial IPC as it integrates the merits of medial elastics, for an efficient and versatile reduced simulation, as well as incremental potential contact, for robust collision and contact resolution. We leverage the medial axis transform to construct a kinematic subspace. Instead of resorting to projective dynamics, we use classic hyperelastics to embrace real-world nonlinear materials. A novel reduced continuous collision detection algorithm is presented based on the medial mesh. Thanks to unique geometric properties of the medial axis and medial primitives, we derive closed-form formulations for identifying collisions between primitives within the reduced medial space. Meanwhile, the implicit barrier energy that generates the repulsion forces necessary for collision resolution is also formulated in medial coordinates. In other words, Medial IPC exploits a universal reduced coordinate for simulation, continuous self-/collision detection, and IPC-based collision resolution. Continuous collision detection also allows more aggressive time stepping. In addition, we carefully implement our system with a heterogeneous CPU-GPU deployment such that massively parallelizable computations are carried out on the GPU while the few sequential computations run on the CPU. Such an implementation also frees us from generating training poses for selecting Cubature points and pre-computing their weights. We have tested our method on complicated deformable models and collision-rich simulation scenarios. Due to the reduced nature of our system, the computation is at least one order of magnitude faster than fullspace IPC or other fullspace methods using continuous collision detection. The simulation remains high-quality as the medial subspace captures intriguing and local deformations with sufficient realism.
We present a purely geometric, time-independent deformer resolving local contacts between elastic objects, including self-collisions between adjacent parts of the same object that often occur in character skinning animation. Starting from multiple meshes in intersection, our deformer first computes the parts of the surfaces remaining in contact, and then applies a procedural displacement with volume preservation. Although our deformer processes each frame independently, it achieves temporally continuous deformations with artistic control of the bulge through a few pseudo-stiffness parameters. The plausibility of the deformation is further enhanced by anisotropically spreading the volume-preserving bulge. The result is a robust, real-time deformer that can handle complex geometric configurations such as a ball squashed by a hand, colliding lips, bending fingers, etc.
This paper proposes a novel energy-momentum conserving integration method. Adopting Projective Dynamics, the proposed method extends its unconstrained minimization for time integration into the constrained form with the position-based energy-momentum constraints. This resolves the well-known problem of unwanted dissipation of energy and momenta without compromising the real-time performance and simulation stability. The proposed method also enables users to directly control the energy and momenta so as to easily create the vivid deformable and global motions they want, which is a fascinating feature for many real-time applications such as virtual/augmented reality and games.
High-resolution fluid simulations are computationally expensive, so many post-processing methods have been proposed to add turbulent details to low-resolution flows. Guiding methods are one promising approach for adding naturalistic, detailed motions as a post-process, but can be inefficient. Thus, we propose a novel, efficient method that formulates fluid guidance as a minimization problem in stream function space. Input flows are first converted into stream functions, and a high resolution flow is then computed via optimization. The resulting problem sizes are much smaller than previous approaches, resulting in faster computation times. Additionally, our method does not require an expensive pressure projection, but still preserves mass. The method is both easy to implement and easy to control, as the user can control the degree of guiding with a single, intuitive parameter. We demonstrate the effectiveness of our method across various examples.
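The key property being exploited is that any velocity field obtained as the curl of a stream function is divergence-free by construction, so optimizing in stream-function space preserves mass without a pressure projection. A small 2D finite-difference illustration of that fact (not the paper's guiding solver):

```python
import numpy as np

def velocity_from_stream_function(psi, h=1.0):
    """2D velocity from a stream function psi sampled on a regular grid with
    spacing h, indexed as psi[y, x]:  u = d(psi)/dy,  v = -d(psi)/dx."""
    dpsi_dy, dpsi_dx = np.gradient(psi, h)  # gradients along axis 0 (y) and 1 (x)
    u, v = dpsi_dy, -dpsi_dx
    return u, v
```

With central differences the discrete mixed derivatives commute, so at interior grid nodes the recovered field is also discretely divergence-free.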
This paper aims to efficiently construct the volume of heterogeneous single-scattering albedo for a given medium that would lead to desired color appearance. We achieve this goal by formulating it as a volumetric style transfer problem in which an input 3D density volume is stylized using color features extracted from a reference 2D image. Unlike existing algorithms that require cumbersome iterative optimizations, our method leverages a feed-forward deep neural network with multiple well-designed modules. At the core of our network is a stylizing kernel predictor (SKP) that extracts multi-scale feature maps from a 2D style image and predicts a handful of stylizing kernels as a highly non-linear combination of the feature maps. Each group of stylizing kernels represents a specific style. A volume autoencoder (VolAE) is designed and jointly learned with the SKP to transform a density volume to an albedo volume based on these stylizing kernels. Since the autoencoder does not encode any style information, it can generate different albedo volumes with a wide range of appearance once training is completed. Additionally, a hybrid multi-scale loss function is used to learn plausible color features and guarantee temporal coherence for time-evolving volumes. Through comprehensive experiments, we validate the effectiveness of our method and show its superiority by comparing against state-of-the-art methods. We show that with our method a novice user can easily create a diverse set of realistic translucent effects for 3D models (either static or dynamic), without any cumbersome parameter tuning.
As a result of changing climatic conditions, wildfires have become an existential threat in many countries around the world. Their complex dynamics, paired with their often rapid progression, make wildfires a disastrous natural phenomenon that is difficult to predict and to counteract. In this paper we present a novel method for simulating wildfires with the goal of realistically capturing the combustion process of individual trees and the resulting propagation of fires at the scale of forests. We rely on a state-of-the-art modeling approach for large-scale ecosystems that enables us to represent each plant as a detailed 3D geometric model. We introduce a novel mathematical formulation for the combustion process of plants - also considering effects such as heat transfer, char insulation, and mass loss - as well as for the propagation of fire through the entire ecosystem. Compared to other wildfire simulations which employ geometric representations of plants such as cones or cylinders, our detailed 3D tree models enable us to simulate the interplay of geometric variations of branching structures and the dynamics of fire and wood combustion. Our simulation runs at interactive rates and thereby provides a convenient way to explore different conditions that affect wildfires, ranging from terrain elevation profiles and ecosystem compositions to various measures against wildfires, such as cutting down trees as firebreaks, the application of fire retardant, or the simulation of rain.
We present a neural scene graph---a modular and controllable representation of scenes with elements that are learned from data. We focus on the forward rendering problem, where the scene graph is provided by the user and references learned elements. The elements correspond to geometry and material definitions of scene objects and constitute the leaves of the graph; we store them as high-dimensional vectors. The position and appearance of scene objects can be adjusted in an artist-friendly manner via familiar transformations, e.g. translation, bending, or color hue shift, which are stored in the inner nodes of the graph. In order to apply a (non-linear) transformation to a learned vector, we adopt the concept of linearizing a problem by lifting it into higher dimensions: we first encode the transformation into a high-dimensional matrix and then apply it by standard matrix-vector multiplication. The transformations are encoded using neural networks. We render the scene graph using a streaming neural renderer, which can handle graphs with a varying number of objects, and thereby facilitates scalability. Our results demonstrate a precise control over the learned object representations in a number of animated 2D and 3D scenes. Despite the limited visual complexity, our work presents a step towards marrying traditional editing mechanisms with learned representations, and towards high-quality, controllable neural rendering.
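A minimal sketch of the lift-then-multiply idea described above: a small network encodes a transformation's parameters as a d x d matrix, which is then applied to the learned object vector by a standard matrix-vector product. The network size and names are placeholders.

```python
import torch
import torch.nn as nn

class LiftedTransform(nn.Module):
    """Encode transformation parameters as a d x d matrix and apply it to a
    learned object vector via matrix-vector multiplication (schematic)."""
    def __init__(self, param_dim, d):
        super().__init__()
        self.d = d
        self.encoder = nn.Sequential(
            nn.Linear(param_dim, 256), nn.ReLU(),
            nn.Linear(256, d * d))

    def forward(self, object_vector, transform_params):
        W = self.encoder(transform_params).view(self.d, self.d)
        return W @ object_vector  # transformed object representation
```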
Establishing a consistent normal orientation for point clouds is a notoriously difficult problem in geometry processing, requiring attention to both local and global shape characteristics. The normal direction of a point is a function of the local surface neighborhood; yet, point clouds do not disclose the full underlying surface structure. Even assuming known geodesic proximity, calculating a consistent normal orientation requires the global context. In this work, we introduce a novel approach for establishing a globally consistent normal orientation for point clouds. Our solution separates the local and global components into two different sub-problems. In the local phase, we train a neural network to learn a coherent normal direction per patch (i.e., consistently oriented normals within a single patch). In the global phase, we propagate the orientation across all coherent patches using a dipole propagation. Our dipole propagation decides how to orient each patch using the electric field defined by all previously oriented patches. This gives rise to a global propagation that is stable, as well as being robust to nearby surfaces, holes, sharp features, and noise.
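A schematic of the propagation step, under the assumption that every point of an already-oriented patch acts as a unit dipole: the candidate patch keeps or flips its coherent normals depending on the sign of their alignment with the aggregate dipole field. The weighting, falloff, and exact decision rule used in the paper may differ.

```python
import numpy as np

def dipole_field(query_points, src_points, src_normals, eps=1e-8):
    """Field at query_points induced by unit dipoles (src_points, src_normals):
    E(q) = sum_i (3 (n_i . r_hat) r_hat - n_i) / |r|^3, with r = q - p_i."""
    r = query_points[:, None, :] - src_points[None, :, :]          # (Q, S, 3)
    dist = np.linalg.norm(r, axis=-1, keepdims=True) + eps
    r_hat = r / dist
    ndotr = np.sum(src_normals[None] * r_hat, axis=-1, keepdims=True)
    field = (3.0 * ndotr * r_hat - src_normals[None]) / dist ** 3
    return field.sum(axis=1)                                       # (Q, 3)

def orient_patch(patch_points, patch_normals, oriented_points, oriented_normals):
    """Flip the whole patch if its coherent normals disagree with the field of
    the previously oriented patches (schematic decision rule)."""
    E = dipole_field(patch_points, oriented_points, oriented_normals)
    agreement = np.sum(E * patch_normals)
    return patch_normals if agreement >= 0 else -patch_normals
```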
Constrained by the limitations of learning toolkits engineered for other applications, such as those in image processing, many mesh-based learning algorithms employ data flows that would be atypical from the perspective of conventional geometry processing. As an alternative, we present a technique for learning from meshes built from standard geometry processing modules and operations. We show that low-order eigenvalue/eigenvector computation from operators parameterized using discrete exterior calculus is amenable to efficient approximate backpropagation, yielding spectral per-element or per-mesh features with similar formulas to classical descriptors like the heat/wave kernel signatures. Our model uses few parameters, generalizes to high-resolution meshes, and exhibits performance and time complexity on par with past work.
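As a rough illustration of the spectral-feature idea, the Python sketch below computes low-order eigenpairs of a mesh operator and assembles heat-kernel-signature-style per-vertex features. For brevity it substitutes a uniform graph Laplacian for the paper's DEC-parameterized operators and a dense eigensolver for a sparse one.

```python
import numpy as np

def graph_laplacian(num_vertices, faces):
    """Uniform graph Laplacian (stand-in for a learned, DEC-parameterized operator)."""
    A = np.zeros((num_vertices, num_vertices))
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            A[a, b] = A[b, a] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spectral_features(L, num_eigs=8, times=(0.1, 1.0, 10.0)):
    """Per-vertex features f_t(v) = sum_k exp(-lambda_k t) phi_k(v)^2."""
    evals, evecs = np.linalg.eigh(L)
    evals, evecs = evals[:num_eigs], evecs[:, :num_eigs]
    return np.stack([(np.exp(-evals * t) * evecs**2).sum(axis=1) for t in times], axis=1)

# Tiny example: a tetrahedron with four vertices.
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
features = spectral_features(graph_laplacian(4, faces))   # shape (4 vertices, 3 scales)
```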
Many problems in computer graphics and computer vision involve inferring a rotation from a variety of different forms of input. With the increasing use of deep learning, neural networks have been employed to solve such problems. However, the traditional representations of 3D rotations, quaternions and Euler angles, have proven problematic for neural networks in practice, producing seemingly unavoidable large estimation errors. Previous research has identified the discontinuity of the mapping from SO(3) to the quaternions or Euler angles as the source of such errors, and proposed embeddings of SO(3) as the output representation of rotation estimation networks instead. In this paper, we argue that the case against quaternions and Euler angles based on local discontinuities of the mappings from SO(3) is flawed, and instead give a different argument from the global topological properties of SO(3), one that also establishes a lower bound on the maximum error when using quaternions and Euler angles in rotation estimation networks. Extending this view, we find that rotation symmetries in the input object cause additional topological problems that even embeddings of SO(3) as the output representation cannot handle correctly. We propose the self-selecting ensemble, a topologically motivated approach in which the network makes multiple predictions and assigns weights to them. We show theoretically and experimentally that our method can be combined with a wide range of rotation representations and can handle all kinds of finite symmetries in 3D rotation estimation problems.
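At inference time, the self-selecting ensemble can be sketched as follows: the network emits several candidate rotations together with a weight per candidate, and the highest-weight candidate is selected. The quaternion output, the candidate count, and the softmax weighting below are illustrative assumptions.

```python
import numpy as np

def select_rotation(candidate_quats, weights):
    """candidate_quats: (K, 4) unit quaternions; weights: (K,) raw scores."""
    probs = np.exp(weights - weights.max())
    probs /= probs.sum()                       # softmax over ensemble members
    best = int(np.argmax(probs))               # self-selection: pick the winner
    q = candidate_quats[best]
    return q / np.linalg.norm(q)

# Example with K=4 candidates produced by a hypothetical network head.
quats = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0.7, 0.7, 0, 0], [0, 0, 0, 1.0]], float)
print(select_rotation(quats, np.array([0.1, 2.3, 0.4, -1.0])))
```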
Triangle mesh-based simulations are able to produce satisfying animations of knitted and woven cloth; however, they lack the rich geometric detail of yarn-level simulations. Naive texturing approaches do not consider yarn-level physics, while full yarn-level simulations may become prohibitively expensive for large garments. We propose a method to animate yarn-level cloth geometry on top of an underlying deforming mesh in a mechanics-aware fashion. Using triangle strains to interpolate precomputed yarn geometry, we are able to reproduce effects such as knit loops tightening under stretching. In combination with precomputed mesh animation or real-time mesh simulation, our method is able to animate yarn-level cloth in real-time at large scales.
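The mechanics-aware interpolation can be sketched as a per-triangle blend between precomputed yarn-loop geometries, keyed on the triangle's strain; the key frames and the linear blending rule below are simplified assumptions rather than the paper's exact procedure.

```python
import numpy as np

def blend_yarn_loop(strain, loop_relaxed, loop_stretched, strain_max=0.3):
    """Interpolate precomputed loop vertex positions by the triangle's strain."""
    w = np.clip(strain / strain_max, 0.0, 1.0)
    return (1.0 - w) * loop_relaxed + w * loop_stretched

# Example: a loop sampled at 5 points, tightening as the triangle stretches.
relaxed = np.array([[0, 0], [0.2, 0.4], [0.5, 0.5], [0.8, 0.4], [1, 0]], float)
stretched = np.array([[0, 0], [0.3, 0.2], [0.5, 0.25], [0.7, 0.2], [1, 0]], float)
print(blend_yarn_loop(strain=0.15, loop_relaxed=relaxed, loop_stretched=stretched))
```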
In this paper, we study physics-based cloth simulation at very high resolution - at submillimeter levels with millions of vertices - to match the perceptual precision of the human eye. State-of-the-art simulation techniques, mostly developed for unstructured triangular meshes, can hardly meet this demand due to their large computational costs and memory footprints. We argue that at such high resolutions it is more plausible to use regular meshes with an underlying grid structure, which, like high-resolution images, are highly compatible with GPU acceleration. Based on this idea, we formulate and solve the nonlinear optimization problem for simulating high-resolution wrinkles with a fast block-based descent method with reduced memory accesses. We also investigate the collision handling component of our system, whose performance benefits greatly from the grid structure. Finally, we explore various issues related to the applications of our system, including initialization for fast convergence and temporal coherence, gathering effects, inflation and stuffing models, and mesh simplification. Our system can be treated as a quasistatic wrinkle synthesis tool, run as a standalone dynamic simulator, or integrated into a multi-resolution solver as an additional component. Our experiments demonstrate the capability, efficiency, and flexibility of our system in producing a variety of high-resolution wrinkle effects.
We extend the incremental potential contact (IPC) model [Li et al. 2020a] for contacting elastodynamics to resolve systems composed of codimensional degrees of freedom in arbitrary combination. This enables a unified, interpenetration-free, robust, and stable simulation framework that couples codimension-0, 1, 2, and 3 geometries seamlessly with frictional contact. Extending the IPC model to thin structures poses new challenges in computing strain, modeling thickness, and determining collisions. To address these challenges, we propose three corresponding contributions. First, we introduce a C2 constitutive barrier model that directly enforces strain limiting as an energy potential while preserving the rest state. This provides energetically consistent strain-limiting models (both isotropic and anisotropic) for cloth that enable strict satisfaction of strain-limit inequalities with direct coupling to both elastodynamics and contact via minimization of the incremental potential. Second, to capture the geometric thickness of codimensional domains, we extend the IPC model to directly enforce distance offsets. Our treatment imposes a strict guarantee that mid-surfaces (respectively mid-lines) of shells (respectively rods) will not move closer than applied thickness values, even as these thicknesses become characteristically small. This enables us to account for thickness in the contact behavior of codimensional structures and so robustly capture challenging contacting geometries, a number of which, to our knowledge, have not been simulated before. Third, codimensional models, especially with modeled thickness, mandate strict accuracy requirements that pose a severe challenge to all existing continuous collision detection (CCD) methods. To address these limitations, we develop a new, efficient, simple-to-implement additive CCD (ACCD) method that applies conservative advancement [Mirtich 1996; Zhang et al. 2006] to iteratively refine a lower bound for deforming primitives, converging to the time of impact. In combination, these contributions enable codimensional IPC (C-IPC). We perform extensive benchmark experiments to validate the efficacy of our method in capturing intricate behaviors of thin-structure contact and resulting bulk effects. In our experiments, C-IPC obtains feasible, convergent, and so artifact-free solutions for all time steps, across all tested examples - producing robust simulations. We test C-IPC across extreme deformations, large time steps, and exceedingly close contact over all possible pairings of codimensional domains. Finally, with our strain-limit model, we confirm C-IPC guarantees non-intersection and strain-limit satisfaction for all reasonable (and well below - verified down to 0.1%) strain limits throughout all time steps.
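The ACCD component lends itself to a compact sketch. For a single point-point pair under linearized per-step displacements, the gap cannot close faster than the magnitude of the relative motion, so time can be advanced conservatively by that fraction of the current gap and the process repeated until the gap falls below a tolerance or the step completes. The constants and the point-point specialization below are illustrative; the actual method handles all primitive pairs and treats shared motion more carefully.

```python
import numpy as np

def accd_point_point(x0, x1, d0, d1, thickness=0.0, tol=1e-3, max_iter=100):
    """Return a conservative time of impact in [0, 1], or 1.0 if no contact."""
    d_rel = d0 - d1                      # only relative motion can close the gap
    max_motion = np.linalg.norm(d_rel)
    if max_motion == 0.0:
        return 1.0
    t = 0.0
    p, q = x0.copy(), x1.copy()
    for _ in range(max_iter):
        gap = np.linalg.norm(p - q) - thickness
        if gap < tol:
            return t
        # Conservative advancement: the gap cannot close faster than max_motion.
        dt = 0.9 * gap / max_motion
        if t + dt >= 1.0:
            return 1.0        # pair never gets within tol during this step
        t += dt
        p = x0 + t * d0
        q = x1 + t * d1
    return t

# Example: a point closing on a static point; true impact at t = 1/1.2 ~ 0.833.
toi = accd_point_point(np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]),
                       np.array([1.2, 0.0, 0.0]), np.array([0.0, 0.0, 0.0]))
print(toi)
```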
Manufactured parts are meticulously engineered to perform well with respect to several conflicting metrics, like weight, stress, and cost. The best achievable trade-offs reside on the Pareto front, which can be discovered via performance-driven optimization. The objectives that define this Pareto front often incorporate assumptions about the context in which a part will be used, including loading conditions, environmental influences, material properties, or regions that must be preserved to interface with a surrounding assembly. Existing multi-objective optimization tools are only equipped to study one context at a time, so engineers must run independent optimizations for each context of interest. However, engineered parts frequently appear in many contexts: wind turbines must perform well in many wind speeds, and a bracket might be optimized several times with its bolt-holes fixed in different locations on each run. In this paper, we formulate a framework for variable-context multi-objective optimization. We introduce the Pareto gamut, which captures Pareto fronts over a range of contexts. We develop a global/local optimization algorithm to discover the Pareto gamut directly, rather than discovering a single fixed-context "slice" at a time. To validate our method, we adapt existing multi-objective optimization benchmarks to contextual scenarios. We also demonstrate the practical utility of Pareto gamut exploration for several engineering design problems.
Complex deformable face-rigs have many independent parameters that control the shape of the object. A human face has upwards of 50 parameters (FACS Action Units), making conventional UI controls hard to find and operate. Animators address this problem by tediously hand-crafting in-situ layouts of UI controls that serve as visual deformation proxies, and facilitate rapid shape exploration. We propose the automatic creation of such in-situ UI control layouts. We distill the design choices made by animators into mathematical objectives that we optimize as the solution to an integer quadratic programming problem. Our evaluation is three-fold: we show the impact of our design principles on the resulting layouts; we show automated UI layouts for complex and diverse face rigs, comparable to animator handcrafted layouts; and we conduct a user study showing our UI layout to be an effective approach to face-rig manipulation, preferable to a baseline slider interface.
Parametric shapes model objects as programs that produce geometry from a few semantic degrees of freedom, called hyper-parameters. These shapes are the typical output of non-destructive modeling, CAD modeling, or rigging. However, they suffer from the core issue of being manipulated only indirectly, through a series of values rather than through the geometry itself. In this paper, we introduce an amendment process for the underlying directed acyclic graph (DAG) of a parametric shape. This amendment enables local differentiation of the shape w.r.t. its hyper-parameters, which we leverage to provide interactive direct manipulation of the output. By acting on the shape synthesis process itself, our method is agnostic to the variations of connectivity and topology that may occur in the output as the input hyper-parameters change. Furthermore, our method is oblivious to the internal logic of the DAG nodes. We illustrate our approach on a collection of examples combining the typical nodes found in modern parametric modeling packages - such as deformation, boolean, and surfacing operators - for which our method provides the user with inverse control over the hyper-parameters through a brush stroke metaphor.
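One way to picture the resulting inverse control is the damped least-squares step below: the picked surface point is differentiated with respect to the hyper-parameters (here crudely, by finite differences rather than the paper's graph amendment), and a small parameter update is taken so the point follows the brush target. evaluate_shape is a hypothetical stand-in for an arbitrary parametric-shape DAG.

```python
import numpy as np

def evaluate_shape(params):
    """Hypothetical DAG output: position of the currently picked surface point."""
    radius, height = params
    return np.array([radius, 0.0, height * 0.5])

def drag_step(params, target, step=1.0, damping=1e-3, eps=1e-5):
    p = np.asarray(params, float)
    x = evaluate_shape(p)
    # Finite-difference Jacobian dx/dparams, one column per hyper-parameter.
    J = np.stack([(evaluate_shape(p + eps * np.eye(len(p))[i]) - x) / eps
                  for i in range(len(p))], axis=1)
    # Damped least-squares (Levenberg-Marquardt style) update toward the target.
    r = target - x
    dp = np.linalg.solve(J.T @ J + damping * np.eye(len(p)), J.T @ r)
    return p + step * dp

# Example: drag the picked point toward a brush target.
print(drag_step([1.0, 2.0], target=np.array([1.3, 0.0, 1.2])))
```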
We present an algorithmic approach to designing animatronic figures - expressive robotic characters whose movements are driven by a large number of actuators. The input to our design system provides a high-level specification of the space of motions the character should be able to perform. The output consists of a fully functional mechatronic blueprint. We cast the design task as a search problem in a vast combinatorial space of possible solutions. To find an optimal design in this space, we propose an efficient best-first search algorithm that is guided by an admissible heuristic. The objectives guiding the search process demand that the design remains free of singularities and self-collisions at any point in the high-dimensional space of motions the character is expected to be able to execute. To identify worst-case self-collision scenarios for multi degree-of-freedom closed-loop mechanisms, we additionally develop an elegant technique inspired by the concept of adversarial attacks. We demonstrate the efficacy of our approach by creating designs for several animatronic figures of varying complexity.
We propose NeuMIP, a neural method for representing and rendering a variety of material appearances at different scales. Classical prefiltering (mipmapping) methods work well on simple material properties such as diffuse color, but fail to generalize to normals, self-shadowing, fibers or more complex microstructures and reflectances. In this work, we generalize traditional mipmap pyramids to pyramids of neural textures, combined with a fully connected network. We also introduce neural offsets, a novel method which enables rendering materials with intricate parallax effects without any tessellation. This generalizes classical parallax mapping, but is trained without supervision by any explicit heightfield. Neural materials within our system support a 7-dimensional query, including position, incoming and outgoing direction, and the desired filter kernel size. The materials have small storage (on the order of standard mipmapping except with more texture channels), and can be integrated within common Monte-Carlo path tracing systems. We demonstrate our method on a variety of materials, resulting in complex appearance across levels of detail, with accurate parallax, self-shadowing, and other effects.
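A NeuMIP-style query can be sketched as a two-step procedure: fetch a latent feature from a neural-texture pyramid at the level of detail implied by the filter kernel size, then decode it together with the incident and outgoing directions through a small fully connected network. The pyramid contents, decoder weights, and nearest-neighbor fetch below are placeholders; in the actual system both textures and decoder are trained jointly and lookups are interpolated.

```python
import numpy as np

rng = np.random.default_rng(1)
C, LEVELS, RES = 8, 5, 64
pyramid = [rng.normal(size=(RES >> l, RES >> l, C)) for l in range(LEVELS)]
W1 = rng.normal(scale=0.1, size=(32, C + 6))   # placeholder decoder weights
W2 = rng.normal(scale=0.1, size=(3, 32))

def fetch(level, u, v):
    tex = pyramid[level]
    i = min(int(u * (tex.shape[0] - 1)), tex.shape[0] - 1)
    j = min(int(v * (tex.shape[1] - 1)), tex.shape[1] - 1)
    return tex[i, j]          # nearest-neighbor lookup; interpolated in practice

def query_material(u, v, wi, wo, kernel_size):
    # Kernel size (in texels at the finest level) selects the pyramid level.
    level = int(np.clip(np.log2(max(kernel_size, 1.0)), 0, LEVELS - 1))
    feat = fetch(level, u, v)
    h = np.maximum(W1 @ np.concatenate([feat, wi, wo]), 0.0)   # ReLU MLP
    return W2 @ h             # RGB reflectance value

rgb = query_material(0.25, 0.75, wi=np.array([0, 0, 1.0]),
                     wo=np.array([0.3, 0.0, 0.95]), kernel_size=4.0)
```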
Layered materials exhibit a wide range of appearance, due to the combined effects of absorption and scattering at and between interfaces. Yet most existing approaches let users set the physical parameters of all layers by hand, a process of trial and error. We introduce an inverse method that provides control over the BRDF lobe properties of layered materials while automatically retrieving compatible physical parameters. Our method enables exploration of the space of layered material appearance: it lets users find configurations with nearly indistinguishable appearance, isolate grazing-angle effects, and control properties such as the color, blur, or haze of reflections.
A statistical multi-lobe approach was recently introduced to handle layered material rendering efficiently, as an alternative to expensive general-purpose approaches. However, this approach poorly supports scattering volumes, as it does not account for back-scattering and resorts to single-scattering approximations. In this paper, we address these limitations with an efficient solution based on a transfer matrix approach that leverages the properties of the Henyey-Greenstein phase function. Under this formalism, each scattering component of the stack is described by a lightweight matrix, layering operations reduce to simple matrix products, and the statistics of each BSDF lobe accounting for multiple scattering effects are obtained through matrix operators. Based on this representation, we leverage the versatility of the transfer matrix approach to efficiently handle the forward and backward scattering that occurs in arbitrary layered materials. The resulting model enables the reproduction of a wide range of layered structures embedding scattering volumes of arbitrary depth, in constant computation time and with low variance.
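Restricted to scalar energies, the transfer-matrix mechanics reduce to a few lines of Python: each layer with reflectance r and transmittance t becomes a 2x2 matrix, layering is a matrix product, and the stack's reflectance and transmittance are read off the product (inter-reflections between layers are accounted for automatically). The full model tracks per-lobe statistics such as the Henyey-Greenstein asymmetry, not just energies.

```python
import numpy as np

def layer_matrix(r, t):
    """2x2 energy transfer matrix of a layer with reflectance r, transmittance t."""
    return (1.0 / t) * np.array([[1.0, -r],
                                 [r, t * t - r * r]])

def stack_reflectance_transmittance(layers):
    """layers: list of (r, t) from top to bottom; returns (R, T) of the stack."""
    M = np.eye(2)
    for r, t in layers:
        M = M @ layer_matrix(r, t)            # layering = matrix product
    return M[1, 0] / M[0, 0], 1.0 / M[0, 0]

# Two dielectric-like interfaces around a scattering/absorbing slab.
R, T = stack_reflectance_transmittance([(0.04, 0.96), (0.10, 0.70), (0.04, 0.96)])
print(f"stack reflectance {R:.3f}, transmittance {T:.3f}")
```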
In full-color inkjet 3D printing, a key problem is determining the material configuration for the millions of voxels that a printed object is made of. The goal is a configuration that minimises the difference between the desired target appearance and the result of the printing process. So far, the techniques used to find such a configuration have relied on domain-specific methods or heuristic optimization, which allow only a limited level of control over the resulting appearance.
We propose to use differentiable volume rendering in a continuous material-mixture space, which leads to a framework that can be used as a general tool for optimising inkjet 3D printouts. We demonstrate the technical feasibility of this approach, and use it to attain fine control over the fabricated appearance, and high levels of faithfulness to the specified target.
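A toy version of this optimization loop, with finite differences standing in for analytic gradients, might look as follows: a single column of voxels mixes two hypothetical inks in a continuous ratio, a simple front-to-back compositing model renders the column, and gradient descent adjusts the per-voxel mixture until the rendered color matches a target. All constants and the forward model are illustrative.

```python
import numpy as np

INKS = np.array([[0.9, 0.9, 0.9],    # white ink albedo (placeholder)
                 [0.1, 0.8, 0.9]])   # cyan ink albedo (placeholder)

def render_column(mix, opacity=0.4):
    """Front-to-back compositing of a voxel column with continuous mixtures."""
    color, transmittance = np.zeros(3), 1.0
    for w in mix:                              # w in [0,1]: fraction of cyan
        albedo = (1.0 - w) * INKS[0] + w * INKS[1]
        color += transmittance * opacity * albedo
        transmittance *= (1.0 - opacity)
    return color + transmittance * 1.0         # white backing

def optimize(target, n_voxels=6, steps=200, lr=0.5, eps=1e-4):
    mix = np.full(n_voxels, 0.5)
    for _ in range(steps):
        grad = np.zeros(n_voxels)
        base = np.sum((render_column(mix) - target) ** 2)
        for i in range(n_voxels):              # finite-difference gradient
            bumped = mix.copy(); bumped[i] += eps
            grad[i] = (np.sum((render_column(bumped) - target) ** 2) - base) / eps
        mix = np.clip(mix - lr * grad, 0.0, 1.0)
    return mix

print(optimize(target=np.array([0.4, 0.85, 0.9])))
```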
We propose displaced signed distance fields, an implicit shape representation to accurately, efficiently, and robustly 3D-print finely detailed and smoothly curved surfaces at native device resolution. As the resolution and accuracy of 3D printers increase, accurate reproduction of such surfaces becomes increasingly realizable from a hardware perspective. However, representing such surfaces with polygonal meshes requires high polygon counts, resulting in excessive storage, transmission, and processing costs. These costs increase with print size, and can become exorbitant for large prints. Our implicit formulation simultaneously allows the augmentation of low-polygon meshes with compact meso-scale topographic information, such as displacement maps, and the realization of curved polygons, while leveraging efficient, streaming-compatible, discrete voxel-wise algorithms. Critical for this is careful treatment of the input primitives, their voxel approximation, and the displacement to the true surface. We further propose a robust sign estimation to allow for incomplete, non-manifold input, whether human-made for onscreen rendering or directly out of a scanning pipeline. Our framework is efficient in terms of both time and space. The running time is independent of the number of input polygons and of the amount of displacement, and is constant per voxel. The storage costs grow sub-linearly with the number of voxels, making our approach suitable for large prints. We evaluate our approach for efficiency and robustness, and show its advantages over standard techniques.
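Evaluating such a field at a voxel center can be sketched as: signed distance to the base surface minus the displacement sampled at the closest surface point. In the Python sketch below, a sphere stands in for the low-polygon base mesh and a procedural bump for the displacement map; both are placeholders for the paper's mesh and texture inputs.

```python
import numpy as np

def base_sdf_and_closest(p, center=np.zeros(3), radius=1.0):
    """Signed distance to a sphere and the closest point on its surface."""
    d = p - center
    dist = np.linalg.norm(d)
    closest = center + radius * d / max(dist, 1e-12)
    return dist - radius, closest

def displacement(point):           # meso-scale detail, e.g. from a texture
    return 0.05 * np.sin(20.0 * point[0]) * np.sin(20.0 * point[1])

def displaced_sdf(p):
    sd, closest = base_sdf_and_closest(p)
    return sd - displacement(closest)   # positive outside the displaced surface

# Voxel-wise evaluation over a small grid: the sign decides whether material
# is deposited at this voxel.
grid = np.linspace(-1.2, 1.2, 4)
occupancy = [[[displaced_sdf(np.array([x, y, z])) < 0 for z in grid]
              for y in grid] for x in grid]
```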
CNC machining is the leading subtractive manufacturing technology. Although it has been in use for decades, it is far from fully solved and remains a rich source of challenging problems in geometric computing. We demonstrate this for 5-axis machining of freeform surfaces, where the degrees of freedom in selecting and moving the cutting tool allow one to adapt the tool motion optimally to the surface to be produced. We aim at a high-quality surface finish, thereby reducing the need for hard-to-control post-machining processes such as grinding and polishing. Our work is based on a careful geometric analysis of curvature-adapted machining via so-called second order line contact between tool and target surface. On the geometric side, this leads to a new continuous transition between "dual" classical results in surface theory concerning osculating circles of surface curves and osculating cones of tangentially circumscribed developable surfaces. Practically, it serves as an effective basis for tool motion planning. Unlike previous approaches to curvature-adapted machining, we solve locally optimal tool positioning and motion planning within a single optimization framework and achieve curvature adaptation even for convex surfaces. This is possible with a toroidal cutter that contains a negatively curved cutting area. The effectiveness of our approach is verified on digital models, simulations, and machined parts, including a comparison to results generated with commercial software.
We present a computational framework for modeling and optimizing complex assemblies using cone joints. Cone joints are integral joints that generalize traditional single-direction joints, such as mortise and tenon joints, to support a general cone of directions for assembly. This additional motion flexibility not only reduces the risk of deadlocking for complex joint arrangements, but also simplifies the assembly process, in particular for automatic assembly by robots. On the other hand, compared to planar contacts, cone joints restrict relative part movement for improved structural stability. Cone joints can be realized in the form of curved contacts between associated parts, which have demonstrated good mechanical properties such as reduced stress concentration. To find the best trade-off between assemblability and stability, we propose an optimization approach that first determines the optimal motion cone for each part contact and subsequently derives a geometric realization of each joint to match this motion cone. We demonstrate that our approach can optimize cone joints for assemblies with a variety of geometric forms, and highlight several application examples.
High-resolution simulations can deliver great visual quality, but they are often limited by available memory, especially on GPUs. We present a compiler for physical simulation that can achieve both high performance and significantly reduced memory costs by enabling flexible and aggressive quantization. Low-precision ("quantized") numerical data types are used and packed to represent simulation states, leading to reduced memory space and bandwidth consumption. Quantized simulation allows higher-resolution simulation with less memory, which is especially attractive on GPUs. Implementing a quantized simulator that has high performance and packs the data tightly for aggressive storage reduction would be extremely labor-intensive and error-prone in a traditional programming language. To make the creation of quantized simulators practical, we have developed a new set of language abstractions and a compilation system. A suite of tailored domain-specific optimizations ensures that quantized simulators often run as fast as their full-precision counterparts, despite the overhead of encoding and decoding the packed quantized data types. Our programming language and compiler, based on Taichi, allow developers to effortlessly switch between different full-precision and quantized simulators, to explore the full design space of quantization schemes, and ultimately to achieve a good balance between space and precision. Creating quantized simulations with our system yields large benefits in terms of memory consumption and performance on a variety of hardware, from mobile devices to workstations with high-end GPUs. We can simulate at levels of resolution that were previously only achievable on systems with much more memory, such as multiple GPUs. For example, on a single GPU, we can simulate a Game of Life with 20 billion cells (8× compression per pixel), an Eulerian fluid system with 421 million active voxels (1.6× compression per voxel), and a hybrid Eulerian-Lagrangian elastic object simulation with 235 million particles (1.7× compression per particle). At the same time, quantized simulations produce physically plausible results. Our quantization techniques are complementary to existing acceleration approaches for physical simulation, such as sparse data structures, and can be combined with them for even higher scalability and performance.
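The storage idea can be illustrated with a hand-written encoder/decoder pair: several low-precision fixed-point fields are packed into one machine word and decoded before use. The 16+16-bit split and value ranges below are arbitrary choices for illustration; the point of the compiler is precisely to generate such code automatically.

```python
def encode_fixed(value, lo, hi, bits):
    """Quantize a float in [lo, hi] to an unsigned fixed-point integer."""
    q = int(round((value - lo) / (hi - lo) * ((1 << bits) - 1)))
    return max(0, min(q, (1 << bits) - 1))

def decode_fixed(q, lo, hi, bits):
    return lo + q / ((1 << bits) - 1) * (hi - lo)

def pack_velocity(vx, vy, v_max=10.0):
    """Pack a 2D velocity into a single 32-bit word (16 bits per component)."""
    return (encode_fixed(vx, -v_max, v_max, 16) << 16) | encode_fixed(vy, -v_max, v_max, 16)

def unpack_velocity(word, v_max=10.0):
    return (decode_fixed(word >> 16, -v_max, v_max, 16),
            decode_fixed(word & 0xFFFF, -v_max, v_max, 16))

word = pack_velocity(3.2, -7.5)
print(unpack_velocity(word))   # approximately (3.2, -7.5), with quantization error
```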
We introduce the first implicit time-stepping algorithm for rigid body dynamics, with contact and friction, that guarantees intersection-free configurations at every time step.
Our algorithm explicitly models the curved trajectories traced by rigid bodies in both collision detection and response. For collision detection, we propose a conservative narrow phase collision detection algorithm for curved trajectories, which reduces the problem to a sequence of linear CCD queries with minimal separation. For time integration and contact response, we extend the recently proposed incremental potential contact framework to reduced coordinates and rigid body dynamics.
We introduce a benchmark for rigid body simulation and show that our approach, while less efficient than alternatives, can robustly handle a wide array of complex scenes, which cannot be simulated with competing methods, without requiring per-scene parameter tuning.
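The reduction from curved to linear CCD can be sketched as follows: the rotational trajectory of a vertex is sampled at a few sub-steps, and between consecutive samples an ordinary linear CCD query with a minimal-separation margin is run. The margin is meant to absorb the linearization error; choosing it conservatively is the actual technical content of the narrow phase, and the point-versus-static-point case below is only illustrative.

```python
import numpy as np

def linear_ccd_point_point(p0, p1, q, separation):
    """Earliest t in [0,1] with |p(t) - q| <= separation along p0->p1, or None."""
    r0, dp = p0 - q, p1 - p0
    a, b = dp @ dp, 2.0 * (r0 @ dp)
    c = r0 @ r0 - separation * separation
    if c <= 0.0:
        return 0.0                      # already within the separation margin
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return None
    t = (-b - np.sqrt(disc)) / (2.0 * a)
    return t if 0.0 <= t <= 1.0 else None

def curved_ccd(angle, radius, q, separation=1e-2, substeps=8):
    """Vertex rotating about the origin by `angle`; returns earliest TOI in [0,1]."""
    samples = [radius * np.array([np.cos(a), np.sin(a)])
               for a in np.linspace(0.0, angle, substeps + 1)]
    for k in range(substeps):
        t = linear_ccd_point_point(samples[k], samples[k + 1], q, separation)
        if t is not None:
            return (k + t) / substeps
    return None

# Example: a vertex sweeping a half circle hits an obstacle at the top (~t = 0.5).
print(curved_ccd(angle=np.pi, radius=1.0, q=np.array([0.0, 1.0])))
```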