Inverse Kinematics (IK) systems are often rigid with respect to their input character, thus requiring user intervention to adapt them to new skeletons. In this paper we aim to create a flexible, learned IK solver applicable to a wide variety of human morphologies. We extend a state-of-the-art machine learning IK solver to operate on the well-known Skinned Multi-Person Linear (SMPL) model. We call our model SMPL-IK, and show that when integrated into real-time 3D software, this extended system opens up opportunities for defining novel AI-assisted animation workflows. For example, when chained with existing pose estimation algorithms, SMPL-IK accelerates posing by allowing users to bootstrap 3D scenes from 2D images while allowing for further editing. Additionally, we propose a novel SMPL Shape Inversion mechanism (SMPL-SI) to map arbitrary humanoid characters to the SMPL space, allowing artists to leverage SMPL-IK on custom characters. In addition to qualitative demos of the proposed tools, we present quantitative SMPL-IK baselines on the H36M and AMASS datasets. Our code is publicly available at https://github.com/boreshkinai/smpl-ik.
We present a novel self-supervised framework for learning discretization-agnostic surface parameterization of arbitrary 3D objects with both bounded and unbounded surfaces. Our framework leverages diffusion-enabled global-to-local shape context for each vertex, first to partition the unbounded surface into multiple patches using the proposed self-supervised PatchNet, and subsequently to perform independent UV parameterization of these patches by learning forward and backward UV mappings for the individual patches. Thus, our framework enables learning a discretization-agnostic parameterization at a lower resolution and then directly inferring the parameterization for a higher-resolution mesh without retraining. We evaluate our framework on multiple 3D objects from the publicly available SHREC [Lian et al. 2011] dataset and report superior and faster UV parameterization compared to conventional methods.
In this paper, we propose a parameter estimation method for the Bidirectional Curve Scattering Distribution Function (BCSDF) of hair and fur using differentiable rendering. Given fiber geometries, illumination, and a target rendered image, our method jointly estimates the optical parameters of the BCSDF so as to match the target image, taking indirect illumination into account. We develop a differentiable path-tracing renderer tailored to estimating the optical parameters of the BCSDF. To avoid getting stuck in a local minimum, we propose a two-stage estimation method: the first stage estimates the two optical parameters that affect the overall appearance of hair and fur, and the second stage then jointly estimates all the optical parameters. Experimental results demonstrate that our method can automatically estimate the optical parameters of hair and fur.
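A minimal sketch of the two-stage optimization described above, written in PyTorch and assuming a differentiable renderer; the toy `render` function, the parameter names (`absorption`, `roughness`, `cuticle_tilt`), and the choice of which two parameters dominate the appearance are illustrative assumptions, not the authors' actual renderer or BCSDF parameterization.

```python
import torch

def render(params):
    """Toy differentiable stand-in for the path tracer: maps BCSDF-like
    parameters to an image in a smooth, differentiable way."""
    base = torch.exp(-params["absorption"])  # per-channel tint
    shade = torch.sigmoid(params["roughness"]) * torch.sigmoid(params["cuticle_tilt"])
    return base.view(1, 1, 3).expand(64, 64, 3) * shade

def make_params(absorption, roughness, cuticle_tilt):
    return {
        "absorption":   torch.tensor(absorption, requires_grad=True),
        "roughness":    torch.tensor(roughness, requires_grad=True),
        "cuticle_tilt": torch.tensor(cuticle_tilt, requires_grad=True),
    }

target = render(make_params([0.2, 0.5, 0.9], 0.3, -0.2)).detach()  # target image
params = make_params([0.5, 0.5, 0.5], 0.0, 0.0)                    # initial guess

def optimize(keys, steps=300, lr=5e-2):
    opt = torch.optim.Adam([params[k] for k in keys], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(render(params), target)
        loss.backward()
        opt.step()

# Stage 1: estimate the two parameters assumed to dominate overall appearance.
optimize(["absorption", "roughness"])
# Stage 2: jointly refine all optical parameters from the stage-1 result.
optimize(list(params.keys()))
```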
Stencil art is a drawing technique for reproducing designs by repeatedly painting pigments through stencil sheets. Each stencil sheet is a connected surface held together by thin connections called bridges. Existing studies on computer-aided stencil art creation focus on choosing a set of bridges that causes the least distortion while fully connecting all islands to the sheet. Unlike previous works, we propose a novel method called OVERPAINT that sidesteps the use of bridges by overpainting pigments. Our method automatically generates an ordered set of bridge-free stencil sheets from an input image. By painting through them sequentially, one can obtain a result resembling the appearance of the input image. We demonstrate our method on several artworks and evaluate it through a user study. The results show that our method enables users to create visually precise and delicate stencil art.
Recently, differentiable volume rendering in neural radiance fields (NeRF) has gained a lot of popularity, and its variants have attained many impressive results. However, existing methods usually assume the scene is a homogeneous volume, so rays are cast along straight paths. In this work, the scene is instead a heterogeneous volume with a piecewise-constant refractive index, where a light path bends whenever it crosses an interface between different refractive indices. For novel view synthesis of refractive objects, our NeRF-based framework aims to optimize the radiance fields of the bounded volume and its boundary from multi-view posed images with refractive-object silhouettes. To tackle this challenging problem, the refractive index of the scene is first reconstructed from the silhouettes. Given the refractive index, we extend the stratified and hierarchical sampling techniques in NeRF to allow drawing samples along a curved path tracked by the eikonal equation. The results indicate that our framework outperforms the state-of-the-art method both quantitatively and qualitatively, demonstrating better performance on the perceptual similarity metric and a clear improvement in rendering quality on several synthetic and real scenes.
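A small numerical sketch of drawing samples along a curved ray governed by the eikonal-derived ray equation d/ds(n dx/ds) = ∇n. The smooth toy refractive-index field below is an assumption made so the example runs; the paper instead reconstructs a piecewise-constant index from object silhouettes.

```python
import numpy as np

def refractive_index(x):
    """Toy smooth refractive-index field (assumption for illustration only)."""
    return 1.0 + 0.3 * np.exp(-np.dot(x, x))

def grad_n(x, eps=1e-4):
    """Central-difference gradient of the index field."""
    g = np.zeros(3)
    for i in range(3):
        d = np.zeros(3); d[i] = eps
        g[i] = (refractive_index(x + d) - refractive_index(x - d)) / (2 * eps)
    return g

def march_curved_ray(origin, direction, step=0.01, n_steps=500):
    """Integrate d/ds (n dx/ds) = grad n with forward Euler steps, returning
    the sample positions that would be fed to the radiance field."""
    x = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    v = refractive_index(x) * d / np.linalg.norm(d)   # v = n * dx/ds
    samples = [x.copy()]
    for _ in range(n_steps):
        x = x + step * v / refractive_index(x)        # dx/ds = v / n
        v = v + step * grad_n(x)                      # dv/ds = grad n
        samples.append(x.copy())
    return np.array(samples)

pts = march_curved_ray(origin=[-2.0, 0.1, 0.0], direction=[1.0, 0.0, 0.0])
```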
Animations have become increasingly realistic with the evolution of Computer Graphics (CG). In particular, human models and behaviors have been represented through animated virtual humans. Gender is a characteristic related to human identification, so virtual humans assigned a specific gender are, in general, given stereotyped representations through movements, clothes, hair, and colors so that users perceive them as the designers intend. An important area of study is determining whether participants’ perceptions change depending on how a virtual human is visually presented. Findings in this area can help the industry guide the modeling and animation of virtual humans to deliver the expected impact to the public. In this paper, we reproduce, using an animated CG baby, a previous real-life perceptual study aimed at assessing gender bias toward a baby. Our results indicate that simply reporting a virtual human’s gender in text may be sufficient to create a perception of gender that affects participants’ emotional responses, suggesting that stereotyped visual and behavioral cues can be avoided.
Full-body portrait stylization has drawn attention recently. However, most methods have focused only on converting face regions. A recently proposed two-stage method expands stylization to full bodies, but its outputs are less plausible and lack robust quality in non-face regions. Furthermore, it cannot reflect diverse skin tones. In this study, we propose a data-centric solution to build a production-level full-body portrait stylization system. We construct a novel dataset preparation paradigm that effectively resolves the aforementioned problems.
Creating cartoon-style avatars has drawn growing attention recently; however, previous methods only learn face cartoonization at the 2D image level. In this paper, we propose a novel 3D generative model to translate a real-world face image into its corresponding 3D avatar with only a single style example provided. To bridge the gap between 2D real faces and 3D cartoon avatars, we leverage the state-of-the-art StyleGAN and its style-mixing property to produce a 2D paired cartoonized face dataset. We then finetune a pretrained 3D GAN with the paired data in a dual-learning mechanism to obtain the final synthesized 3D avatar. Furthermore, we analyze the latent space of our model, enabling manual control over the degree to which a style is applied. Our model is 3D-aware and can also perform attribute editing, such as smile and age, directly in the 3D domain. Experimental results demonstrate that our method can produce high-fidelity cartoonized avatars with true-to-life 3D geometry.
Virtual reality is a promising tool to study the influence of ophthalmic lenses on human perception. Independently of the actual refractive errors of the user’s eyes, vision through different types of lenses can be simulated while allowing free eye and head movements and easy measurement of behavioural parameters. For presbyopia in particular, existing lens solutions often lead to visual discomfort. Multifocal spectacle lenses have a spatially varying optical power distribution, leading to blur whose strength and orientation change across the visual field. Understanding the contribution of blur to visual discomfort and studying behavioral changes under the influence of lens blur are promising applications of a spectacle lens simulator. Here we present our framework to simulate the spherical and astigmatic blur of corrective spectacle lenses. The size and orientation of a location-dependent blur ellipse are calculated live in virtual reality using the depth information of the scene and the precalculated power distribution of the lens.
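A simplified sketch of how a per-location blur ellipse could be derived from scene depth and a sphero-cylindrical lens power. The geometric-optics defocus model (blur radius proportional to pupil diameter times dioptric error) and all parameter values are illustrative assumptions, not the calibrated model used in the framework.

```python
def blur_ellipse(depth_m, sphere_D, cylinder_D, axis_deg,
                 pupil_mm=4.0, accommodation_D=0.0):
    """Approximate semi-axes and orientation of the blur ellipse at one gaze location.

    depth_m    : distance to the scene point, read from the depth buffer (meters)
    sphere_D   : spherical power of the lens at this location (diopters)
    cylinder_D : cylindrical (astigmatic) power at this location (diopters)
    axis_deg   : orientation of the cylinder axis (degrees)
    """
    target_vergence = 1.0 / max(depth_m, 1e-3)                      # diopters
    # Dioptric defocus along the two principal meridians of the lens.
    err_1 = sphere_D - (target_vergence - accommodation_D)
    err_2 = (sphere_D + cylinder_D) - (target_vergence - accommodation_D)
    # Thin-lens approximation: blur radius ~ 0.5 * pupil diameter * |defocus|.
    semi_a = 0.5 * pupil_mm * abs(err_1)
    semi_b = 0.5 * pupil_mm * abs(err_2)
    return semi_a, semi_b, axis_deg

print(blur_ellipse(depth_m=0.5, sphere_D=1.0, cylinder_D=0.5, axis_deg=30.0))
```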
Creating 3D content generally requires highly specialized skills to design and generate models of objects and other assets by hand. We address this problem through high-quality 3D asset retrieval from multi-modal inputs, including 2D sketches, images, and text. We use CLIP as it provides a bridge to higher-level latent features. We use these features to perform multi-modality fusion to address the lack of artistic control that affects common data-driven approaches. Our approach enables multi-modal, conditional, feature-driven retrieval over a 3D asset database by combining the input latent embeddings. We explore the effects of different combinations of feature embeddings across different input types and weighting methods.
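A minimal sketch of weighted multi-modal retrieval with CLIP embeddings. The stand-in asset feature matrix, the example file name, and the simple weighted-sum fusion are assumptions for illustration; the paper explores several input combinations and weighting methods.

```python
import torch
import clip
from PIL import Image

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def embed_text(prompt):
    with torch.no_grad():
        e = model.encode_text(clip.tokenize([prompt]).to(device))
    return torch.nn.functional.normalize(e, dim=-1)

def embed_image(path):
    with torch.no_grad():
        e = model.encode_image(preprocess(Image.open(path)).unsqueeze(0).to(device))
    return torch.nn.functional.normalize(e, dim=-1)

def retrieve(queries, weights, asset_features, top_k=5):
    """Fuse per-modality embeddings by a weighted sum and rank assets by cosine
    similarity; asset_features is an (N, D) matrix of precomputed, normalized
    CLIP features for renderings of the 3D assets."""
    fused = sum(w * q for w, q in zip(weights, queries))
    fused = torch.nn.functional.normalize(fused, dim=-1)
    scores = (asset_features @ fused.T).squeeze(1)
    return scores.topk(top_k).indices

# Stand-in database of 1000 asset embeddings; "sketch.png" is a hypothetical input.
asset_features = torch.nn.functional.normalize(torch.randn(1000, 512), dim=-1)
top = retrieve([embed_text("a wooden chair"), embed_image("sketch.png")],
               weights=[0.6, 0.4], asset_features=asset_features)
```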
Light field displays represent yet another step in the continual increase of display pixel counts. Rendering realistic real-time 3D content for them with ray tracing-based methods is a major challenge even when accounting for recent hardware acceleration features, as renderers have to scale to tens or hundreds of distinct viewpoints.
To this end, we contribute an open-source, cross-platform real-time 3D renderer called Tauray. The primary focus of Tauray is using photorealistic path tracing techniques to generate real-time content for multi-view displays, such as VR headsets and light field displays; this aspect is generally overlooked in existing renderers. Both hardware-accelerated ray tracing and multi-GPU rendering are supported. We compare Tauray to other open-source real-time path tracers, such as Lighthouse 2, and show that it can meet or significantly exceed their performance.
Alpha matting is widely used in video conferencing as well as in movies, television, and social media sites. Deep learning approaches to the matte extraction problem are well suited to video conferencing due to the consistent subject matter (front-facing humans). Training-based approaches are of limited value for entertainment videos, however, where varied subjects (spaceships, monsters, etc.) may appear only a few times in a single movie: if a method of creating ground truth for training exists, one can simply use that method to produce the desired mattes. We introduce a training-free, high-quality neural matte extraction approach that specifically targets the assumptions of visual effects production. Our approach is based on the deep image prior, which optimizes a deep neural network to fit a single image, thereby providing a deep encoding of that particular image. We make use of the representations in the penultimate layer to interpolate coarse and incomplete "trimap" constraints. Videos processed with this approach are temporally consistent. The algorithm is both very simple and surprisingly effective.
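A condensed sketch of the idea in PyTorch: fit a small convolutional network to the single input frame (the deep image prior), then use its penultimate-layer features to propagate sparse trimap labels to the unknown pixels. The network architecture, the synthetic trimap, and the tiny logistic-regression propagator are illustrative choices rather than the authors' exact design.

```python
import torch
import torch.nn as nn

H, W = 64, 64
image = torch.rand(1, 3, H, W)             # the single frame to be matted
trimap = torch.full((H, W), -1)            # -1 unknown, 0 background, 1 foreground
trimap[:16, :] = 0
trimap[28:40, 28:40] = 1

# Small fully-convolutional network; features are taken before the final layer.
body = nn.Sequential(
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
head = nn.Conv2d(64, 3, 1)
z = torch.randn(1, 32, H, W)               # fixed noise input (deep image prior)

opt = torch.optim.Adam(list(body.parameters()) + list(head.parameters()), lr=1e-3)
for _ in range(300):                        # fit the network to this one image
    opt.zero_grad()
    loss = nn.functional.mse_loss(head(body(z)), image)
    loss.backward()
    opt.step()

# Propagate trimap labels through the learned (penultimate) feature space.
with torch.no_grad():
    feats = body(z)[0].permute(1, 2, 0).reshape(-1, 64)    # (H*W, 64)
labels = trimap.reshape(-1)
known = labels >= 0
clf = nn.Linear(64, 1)                      # tiny classifier on known pixels
opt = torch.optim.Adam(clf.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    logits = clf(feats[known]).squeeze(1)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels[known].float())
    loss.backward()
    opt.step()
alpha = torch.sigmoid(clf(feats)).reshape(H, W)            # soft matte estimate
```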
In recent years, ray tracing has begun to be used for real-time rendering. However, the number of rays that can be emitted in real time is limited due to the high time complexity of ray tracing. In this paper, we propose affine-transformed ray alignment as a novel method of reducing time complexity. Using affine transformations, the proposed method aligns multiple rays into a single ray while preserving the intersection points obtained from the scene, thereby speeding up ray tracing without introducing errors. We also propose applying our method to primary ray traversal, stereo rendering, and multi-frame rendering, and empirically show that these simple implementations reduce time complexity.
In this work, we introduce an approach to model topologically interlocked corrugated bricks that can be assembled in a water-tight manner (space-filling) to design a variety of spatial structures. Our approach takes inspiration from recently developed methods that utilize Voronoi tessellation of spatial domains with symmetrically arranged Voronoi sites. However, in contrast to these existing methods, we focus our attention on Voronoi sites modeled using helical trajectories, which provide corrugation and better interlocking. For symmetries, we only use affine transformations based on the Bravais lattice to avoid self-intersections. This methodology naturally results in structures that are both space-filling (owing to the Voronoi tessellation) and interlocking through corrugation (owing to the helical trajectories). The resulting brick shapes resemble a variety of pasta noodles, inspiring the names Voronoi Spaghetti and VoroNoodles.
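A small sketch of generating Voronoi sites along helical trajectories and replicating them over a simple cubic Bravais lattice; the helix radius, pitch, and phase alternation are illustrative parameters. The bricks themselves would then be the Voronoi cells of these sites, computable with e.g. scipy.spatial.Voronoi or a CAD kernel.

```python
import numpy as np

def helix_sites(center, radius=0.25, pitch=1.0, turns=2.0, samples=40, phase=0.0):
    """Sample Voronoi sites along a vertical helix centered at `center`."""
    t = np.linspace(0.0, turns * 2.0 * np.pi, samples)
    x = center[0] + radius * np.cos(t + phase)
    y = center[1] + radius * np.sin(t + phase)
    z = center[2] + pitch * t / (2.0 * np.pi)
    return np.stack([x, y, z], axis=1)

def lattice_sites(nx=3, ny=3, nz=2, spacing=1.0):
    """Replicate one helix per cell of a simple cubic Bravais lattice, alternating
    the phase so that neighboring helices interlock through corrugation."""
    all_sites = []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                center = spacing * np.array([i, j, k], dtype=float)
                phase = np.pi * ((i + j + k) % 2)
                all_sites.append(helix_sites(center, phase=phase))
    return np.concatenate(all_sites, axis=0)

sites = lattice_sites()   # feed these to a 3D Voronoi tessellation to obtain bricks
```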
Rig inversion is the problem of finding the rig parameter vector that best approximates a given input mesh. In this paper, we propose to solve this problem by first training a multi-layer perceptron to approximate the rig function, thereby obtaining a differentiable surrogate. This differentiable rig function can then be used to train a deep learning model for rig inversion.
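A compact PyTorch sketch of the two steps: (1) fit an MLP surrogate of the rig function (rig parameters to mesh vertices) on sampled rig/mesh pairs, then (2) train an inversion network through the frozen surrogate. The dimensions and the stand-in rig are placeholders, not a real production rig.

```python
import torch
import torch.nn as nn

RIG_DIM, MESH_DIM = 60, 3 * 500            # placeholder sizes
W_stand_in = torch.randn(RIG_DIM, MESH_DIM)

def true_rig(p):
    """Stand-in for evaluating the real (non-differentiable) rig."""
    return torch.tanh(p @ W_stand_in)

# Step 1: learn a differentiable surrogate of the rig function.
surrogate = nn.Sequential(nn.Linear(RIG_DIM, 256), nn.ReLU(),
                          nn.Linear(256, 256), nn.ReLU(),
                          nn.Linear(256, MESH_DIM))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for _ in range(1000):
    p = torch.randn(64, RIG_DIM)           # sampled rig parameter vectors
    opt.zero_grad()
    loss = nn.functional.mse_loss(surrogate(p), true_rig(p))
    loss.backward()
    opt.step()

# Step 2: train an inversion model through the frozen surrogate.
inverse = nn.Sequential(nn.Linear(MESH_DIM, 256), nn.ReLU(),
                        nn.Linear(256, 256), nn.ReLU(),
                        nn.Linear(256, RIG_DIM))
for q in surrogate.parameters():
    q.requires_grad_(False)
opt = torch.optim.Adam(inverse.parameters(), lr=1e-3)
for _ in range(1000):
    mesh = true_rig(torch.randn(64, RIG_DIM))   # target meshes to invert
    opt.zero_grad()
    loss = nn.functional.mse_loss(surrogate(inverse(mesh)), mesh)
    loss.backward()
    opt.step()
```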
In theory, diffusion curves promise complex color gradations for infinite-resolution vector graphics. In practice, existing realizations suffer from poor scaling, discretization artifacts, or insufficient support for rich boundary conditions. In this paper, we utilize the boundary integral equation method to accurately and efficiently solve the underlying partial differential equation.
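For reference, the underlying problem is Laplace's equation with Dirichlet color data on the diffusion curves, and a boundary integral equation method expresses the solution using only boundary quantities via Green's representation formula (with outward normal n); the specific layer-potential formulation used in the paper may differ.

```latex
\Delta u = 0 \ \text{in } \Omega, \qquad u = g \ \text{on } \Gamma \ (\text{prescribed curve colors}),
\qquad G(\mathbf{x},\mathbf{y}) = -\tfrac{1}{2\pi}\ln\lVert \mathbf{x}-\mathbf{y}\rVert,
\qquad
u(\mathbf{x}) = \int_{\Gamma}\!\left[\, G(\mathbf{x},\mathbf{y})\,\frac{\partial u}{\partial n}(\mathbf{y})
 \;-\; u(\mathbf{y})\,\frac{\partial G(\mathbf{x},\mathbf{y})}{\partial n_{\mathbf{y}}} \right]\mathrm{d}s_{\mathbf{y}},
\quad \mathbf{x} \in \Omega .
```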
A recent trend in real-time rendering is the use of new hardware ray tracing capabilities. A distance field representation is often proposed as an alternative when hardware ray tracing is deemed too costly, and the two are seen as competing approaches. In this work, we show that both approaches can work together effectively within a single ray query on modern hardware. We use hardware ray tracing where precision is most important, while avoiding its heavy cost by using a distance field when possible. Although the approach is simple, in our experiments the resulting tracing algorithm overcomes the associated overhead and allows a user-defined middle ground between the performance of distance field traversal and the improved visual quality of hardware ray tracing.
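A pseudocode-level sketch (in Python syntax) of the hybrid idea: sphere-trace a signed distance field while far from geometry, then hand the short remaining segment to a hardware ray query once the ray gets close, where precision matters most. Both `distance_field` and `hw_ray_query` are hypothetical stand-ins for engine facilities, and `switch_dist` plays the role of the user-defined middle ground mentioned above.

```python
def hybrid_trace(origin, direction, distance_field, hw_ray_query,
                 switch_dist=0.05, max_dist=1000.0, max_steps=128):
    """Sphere-trace the distance field; fall back to hardware ray tracing
    near surfaces, where the distance field's precision is insufficient."""
    t = 0.0
    for _ in range(max_steps):
        p = [origin[i] + t * direction[i] for i in range(3)]
        d = distance_field(p)                     # conservative distance to geometry
        if d < switch_dist:
            # Precise hit: hardware-trace only the short remaining segment.
            return hw_ray_query(p, direction, t_max=2.0 * switch_dist)
        t += d
        if t > max_dist:
            break
    return None                                   # miss
```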
The evaluation of material networks is a relatively resource-intensive process in the rendering pipeline. Modern production scenes can contain hundreds or thousands of complex materials with massive networks, so there is great demand for an efficient way of handling them. In this paper, we introduce an efficient method for progressively caching material nodes without overhead on rendering performance. We evaluate material networks as usual during rendering; the output values of parts of the network are then stored in a cache and reused when evaluating subsequent materials. Using our method, we can render scenes with performance equal to or better than that of rendering without caching, with only a slight difference between the images rendered with and without it.
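A toy sketch of caching intermediate node outputs during material network evaluation, keyed by node identity and its resolved input values; the node representation and the hashing scheme are assumptions made for illustration.

```python
class Node:
    """Minimal stand-in for a material-graph node."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def compute(self, **inputs):
        return self.fn(**inputs)

class NodeCache:
    """Memoizes node outputs across material evaluations, so subnetworks whose
    inputs repeat (e.g. shared texture or noise subgraphs) are computed once."""
    def __init__(self):
        self.store = {}
    def evaluate(self, node, **inputs):
        key = (node.name, tuple(sorted(inputs.items())))
        if key not in self.store:                     # compute only on a cache miss
            self.store[key] = node.compute(**inputs)
        return self.store[key]

cache = NodeCache()
noise = Node("noise", lambda u, v: (hash((round(u, 3), round(v, 3))) % 1000) / 1000.0)
a = cache.evaluate(noise, u=0.25, v=0.75)   # evaluated
b = cache.evaluate(noise, u=0.25, v=0.75)   # served from the cache
```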
We propose a computational alternative photographic process that integrates computer processing with a conventional printing method, specifically cyanotype. Cyanotype is a non-silver-halide process that has a lower environmental impact and is known for its ability to produce tri-color prints; however, the tri-color process is complex and time-consuming. To simplify this laborious printing process, our framework provides a user interface for image editing and optimization based on color simulation within the printable color gamut. We demonstrate image editing and tri-color cyanotype printing using our framework, showing that it can reduce the number of user trials and errors.
Neural radiance fields (NeRF) have recently achieved impressive results in novel view synthesis. However, previous works on NeRF mainly focus on object-centric scenarios. They suffer observable performance degradation in outward-facing and large-scale scenes due to the limited capacity of positional encoding. To narrow the gap, we explore radiance fields in a geometry-aware fashion. We estimate explicit geometry from an omnidirectional neural radiance field learned from multiple 360° images. Relying on the recovered geometry, we use an adaptive divide-and-conquer strategy to slim and fine-tune the radiance fields, further improving rendering speed and quality. Quantitative and qualitative comparisons against baselines demonstrate our superior performance in large-scale indoor scenes, and our system supports real-time VR roaming.