SA '24: SIGGRAPH Asia 2024 Technical Communications


SESSION: Art Design

Unsteady Streamline Synthesis

Streamlines tracing out a vector field are important for graphics and visualization, yet controllably arranging streamlines in unsteady flows, even in 2D, remains little studied. We therefore propose an unsteady streamline synthesis framework that creates a full set of coherent streamlines from given inputs in 2D and 3D time-varying scenarios while satisfying coverage, continuity, and controllability requirements. Our key idea is to treat the time-varying streamline arrangement as a dynamic packing of deformable anisotropic elements, so that both quality and efficiency can be achieved. Specifically, we devise a synthesis process based on inter- and intra-streamline relationships to dynamically synthesize all coherent streamlines in parallel, and we incorporate a regulation mechanism with primitive operations to adjust each streamline's continuity and the overall streamline coverage in a timely manner. Given a specification, our framework can further produce desired streamline arrangements in a controllable way for various application demands.

Hyperstroke: A Novel High-quality Stroke Representation for Assistive Artistic Drawing

Assistive drawing aims to facilitate the creative process by providing intelligent guidance to artists. Existing solutions often fail to effectively model intricate stroke details or adequately address the temporal aspects of drawing. We introduce hyperstroke, a novel stroke representation designed to capture precise fine stroke details, including RGB appearance and alpha-channel opacity. Using a Vector Quantization approach, hyperstroke learns compact tokenized representations of strokes from real-life artistic drawing videos. With hyperstroke, we propose to model assistive drawing via a transformer-based architecture, enabling intuitive and user-friendly drawing applications, which we examine in an exploratory evaluation.

Ink-Edit: Interactive color-constrained textures editing

Procedural textures are able to generate large and detailed textures on demand, at rendering time, with minimal memory usage. They are defined by parallel algorithms, which are easily integrated into modern graphics hardware. As a drawback, they may be difficult to control, particularly in an interactive and user-friendly manner. In this paper, we define an interactive tool to control the parameters of the procedural model introduced by Grenier et al. [2022], which is based on a noise vector field (Figure 1 (a), bottom and middle) and a color map (Figure 1 (a), top). In our interactive tool, the noise is controlled through a small set of parameters in the spectral domain. To control the color map, the user first prescribes the colors (inks) and their proportions (ink volumes in the final result), and then interactively adjusts the adjacency between colors. This is done from a new constrained optimal transport point of view: the colors in the map are regarded as cells of a weighted Voronoi diagram, and the noise distribution is regarded as a probability measure. The ink volumes are automatically enforced as hard constraints of an optimization process during the interactive session, while the user focuses on high-level control over the adjacency of the colors.
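For context, in the semi-discrete optimal transport view (a sketch of the standard formulation with assumed notation, not necessarily the authors' exact setup), each ink \(i\) with site \(p_i\) and weight \(w_i\) owns the power cell
\[ C_i = \{\, x : \|x - p_i\|^2 - w_i \le \|x - p_j\|^2 - w_j \ \ \forall j \,\}, \]
and the weights are adjusted during optimization so that the noise measure of each cell matches the prescribed ink volume, \(\mu(C_i) = v_i\); this is how the ink volumes act as hard constraints while the user edits color adjacency.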

Fast Leak-Resistant Segmentation for Anime Line Art

We propose a fast, leak-resistant automatic segmentation method for line art with gaps. Building on the existing trapped-ball idea, we develop a fair, prioritized flood fill that avoids artifacts and augment it with a heuristic region-merging strategy, obtaining robust results on anime line art with minimal over-segmentation and consistent performance approximately 5–10 times faster than the baseline method.
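As a rough illustration of a prioritized flood fill in general (a minimal sketch of the generic technique, not the algorithm proposed in the paper), all seed regions can be grown through a single priority queue so that no region unfairly overruns its neighbors before they reach the same area:

    import heapq
    import numpy as np

    def prioritized_flood_fill(line_mask, seeds):
        """Grow all seed regions at once through one priority queue.

        line_mask : 2D bool array, True where line-art pixels block filling.
        seeds     : list of (row, col) seed positions, one per region.
        Returns an int label map (0 = unlabeled or line pixels).
        Illustrative sketch only; gap handling in the trapped-ball spirit and
        the heuristic region merging described above are not shown here.
        """
        h, w = line_mask.shape
        labels = np.zeros((h, w), dtype=np.int32)
        heap = []
        for label, (r, c) in enumerate(seeds, start=1):
            # Priority 0: all seeds start on equal footing ("fair" growth).
            heapq.heappush(heap, (0, r, c, label))
        while heap:
            cost, r, c, label = heapq.heappop(heap)
            if labels[r, c] != 0 or line_mask[r, c]:
                continue
            labels[r, c] = label
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and labels[nr, nc] == 0:
                    # Expanding by distance from the seed keeps growth balanced,
                    # so a region cannot leak far through a small gap before its
                    # neighbor reaches the same area.
                    heapq.heappush(heap, (cost + 1, nr, nc, label))
        return labels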

Can GPTs Evaluate Graphic Design Based on Design Principles?

Recent advancements in foundation models show promising capability in graphic design generation. Several studies have started employing Large Multimodal Models (LMMs) to evaluate graphic designs, assuming that LMMs can properly assess their quality, but it is unclear whether the evaluation is reliable. One way to evaluate the quality of graphic design is to assess whether the design adheres to fundamental graphic design principles, which reflect designers' common practice. In this paper, we compare the behavior of GPT-based evaluation and heuristic evaluation based on design principles, using human annotations collected from 60 subjects. Our experiments reveal that, while GPTs cannot distinguish small details, they correlate reasonably well with human annotations and exhibit a tendency similar to heuristic metrics based on design principles, suggesting that they are indeed capable of assessing the quality of graphic design. Our dataset is available at: https://cyberagentailab.github.io/Graphic-design-evaluation/.

SESSION: Human Reconstruction & Modeling

Real-time 3D Human Reconstruction and Rendering System from a Single RGB Camera

Transforming 2D human images into 3D appearances is essential for immersive communication. In this paper, we introduce a low-cost, real-time 3D human reconstruction and rendering system that uses a single RGB camera at 28+ FPS, delivering both real-time computing speed and realistic rendering results. We use WebRTC to transmit captured images over the Internet for low-latency remote communication. In addition, we design a lighting-robust rendering approach to enhance the lighting perception and generalization ability of the model. Experimental results show that our system achieves high-quality real-time 3D human reconstruction and rendering for different people wearing various clothes, even with challenging poses. Our system uses only a common USB webcam and a consumer-level GPU while achieving real-time performance and high-fidelity results, providing a consumer-accessible immersive telepresence solution.

Intrinsic Morphological Relationship Guided 3D Craniofacial Reconstruction Using Siamese Cycle Attention GAN

Craniofacial reconstruction is essential in forensic science and has widespread applications. It is challenging due to the detailed facial geometry, complex skull topology, and nonlinear skull-face relationship. We propose a novel approach for 3D craniofacial reconstruction using a Siamese cycle attention mechanism within Generative Adversarial Networks (GANs). Benefiting from the cycle attention mechanism, our method focuses on high-frequency features and morphological connections between the skull and face. Additionally, a Siamese network preserves identity consistency. Extensive experiments demonstrate the superior accuracy and high-quality details of our approach.

Eyelid Fold Consistency in Facial Modeling

Eyelid shape is integral to identity and likeness in human facial modeling. Human eyelids are diverse in appearance with varied skin fold and epicanthal fold morphology between individuals. Existing parametric face models express eyelid shape variation to an extent, but do not preserve sufficient likeness across a diverse range of individuals. We propose a new definition of eyelid fold consistency and implement geometric processing techniques to model diverse eyelid shapes in a unified topology. Using this method we reprocess data used to train a parametric face model and demonstrate significant improvements in face-related machine learning tasks.

A Theory of Stabilization by Skull Carving

Accurate stabilization of facial motion is essential for applications in photoreal avatar construction for 3D games, virtual reality, movies, and training data collection. For the latter, stabilization must work automatically for the general population, with people of varying morphology. Distinguishing rigid skull motion from facial expressions is critical, since misalignment between skull motion and facial expressions can lead to animation models that are hard to control and cannot fit natural motion. Existing methods struggle with sparse sets of very different expressions, such as combinations of multiple units from the Facial Action Coding System (FACS). Some approaches are not robust enough, some depend on motion data to find stable points, and others make one-size-fits-all physiological assumptions that do not hold across individuals. In this paper, we leverage recent advances in neural signed distance fields and differentiable isosurface meshing to compute skull-stabilizing rigid transforms directly on unstructured triangle meshes or point clouds, significantly enhancing accuracy and robustness. We introduce the concept of a stable hull as the surface of the boolean intersection of stabilized scans, analogous to the visual hull in shape-from-silhouette and the photo hull in space carving. This hull resembles a skull overlaid with a minimal thickness of soft tissue; the upper teeth are included automatically. Our skull carving algorithm simultaneously optimizes the stable hull shape and the rigid transforms to obtain accurate stabilization of complex expressions for large, diverse sets of people, outperforming existing methods.

FacialX: A Robust Facial Expression Tracking System based on Multifaceted Expression Embedding

Recent advancements in deep learning have significantly transformed the visual effects (VFX) industry by enabling highly realistic animation of human characters through the precise capture of actors’ facial expressions. Despite these innovations, many existing methodologies remain limited by specific environmental constraints, such as camera viewpoints, necessitating extensive optimization efforts to achieve optimal performance. In response to these challenges, we propose a novel approach that not only incorporates diverse facial expressions but also accounts for variations in identity and camera viewpoint. We introduce FacialX, a versatile facial tracking tool, and demonstrate its ability to generate production-quality facial animations from monocular video while consistently maintaining robust performance across a wide range of conditions. Comparative evaluations reveal that FacialX surpasses current state-of-the-art solutions, including FACEGOOD and Apple ARKit, delivering production-quality outcomes, thereby establishing itself as a superior tool within the VFX industry.

SESSION: Human Motion & Filming

Designing a Usable Framework for Diverse Users in Synthetic Human Action Data Generation

This paper introduces SynthDa2, a synthetic data generation framework aimed at addressing data scarcity for training video-based machine learning models. Grounded in an initial user study (n=84), SynthDa2 includes both an API and a UI to accommodate technical and non-technical users. The framework leverages generative AI techniques and domain randomization to create diverse, user-customized synthetic video datasets. Case study experiments on dataset permutations demonstrate the feasibility of SynthDa2 in assessing composition impacts. While the initial user study confirmed camera angles as a popular key variation factor, experiments reveal they are insufficient alone for desired outcomes, highlighting the need for further research. A subsequent hands-on user study (n=8) further validates SynthDa2’s usability across varied users.

Mapping and Recognition of Body Movements on Another Person's Look-Alike Avatar

Research is needed to understand how users perceive the body movements of the look-alike avatar of a remote collaborator. To investigate, we captured three actors performing seven actions and mapped their body movements onto the look-alike avatar of one of the actors. Subsequently, we asked participants to identify the body movements of the actors rendered on the look-alike avatar. We found that participants were able to identify biological motions irrespective of the resemblance of the avatar to the controlling actor. The results of this study provide valuable insights into the accuracy of identifying body movements when they are mapped onto an avatar that bears no resemblance to the user. We discuss whether there are certain body movements that enable the user to accurately determine the person behind the avatar. The results have significant implications for trust and realism in collaborative virtual environments.

CAR: Collision-Avoidance Retargeting for Varied Skeletal Architectures

Motion retargeting is essential for digital human and animation applications; it typically demands strict adherence to the skeletal structure, and prior work often simplifies rigs to Mixamo-based skeletons to focus on action semantics rather than geometric differences. This paper introduces the Collision-Avoidance Retargeting (CAR) method, which does not rely on data-driven approaches and is applicable to different skeleton architectures in cold-start situations, ensuring the integrity of the target character's drivable skeleton and reducing mesh interpenetrations caused by geometric shape differences. The method has two stages. Stage one aligns bone chains and transforms motions with matrix operations. Stage two uses geometric data to refine joint rotations, preventing collisions while preserving action accuracy. Extensive experiments show that CAR outperforms existing methods in both quantitative metrics and qualitative assessments.

Inside Out 2 characters: revisiting the old and making space for the new

Inside Out 2 (2024) presented a unique set of technical and artistic challenges, requiring careful decisions on which elements to retain from the original film while embracing new technologies. As we transitioned from RenderMan's REYES to the newer RIS framework, we navigated the complicated task of preserving the iconic look, particularly for the emotions and human characters. We enhanced visual quality and accessibility for artists through advancements in our character pipeline, including reusing shading libraries, optimizing procedural workflows in Houdini, and implementing show-specific, emotion-specific hair illumination. Innovative rigging and grooming techniques were created to address technical and design challenges in character creation. Lighting techniques were refined to reflect the characters' story arcs, and we also enhanced the Hexport [Coleman et al. 2020] system to be more user-friendly and accessible, streamlining shading and lighting workflows while reducing computation time and storage costs.

FilmAgent: Automating Virtual Film Production Through a Multi-Agent Collaborative Framework

Virtual film production requires intricate decision-making processes, including scriptwriting, virtual cinematography, and precise actor positioning and actions. Remarkable progress in automated decision-making has been achieved with agent societies powered by large language models (LLMs). This paper introduces FilmAgent, a novel LLM-based multi-agent collaborative framework designed to automate and streamline the film production process. FilmAgent simulates key crew roles (directors, screenwriters, actors, and cinematographers) within a sandbox environment, integrating efficient human workflows. The process is divided into three stages: planning, scriptwriting, and cinematography. Each stage engages a team of crew agents that provides iterative feedback, verifying intermediate results and reducing errors. Our evaluation of generated videos reveals that collaborative FilmAgent significantly outperforms individual efforts in line consistency, script coherence, character actions, and camera settings. Further analysis highlights the importance of feedback and verification in reducing hallucinations, enhancing script quality, and improving camera choices. We hope this project lays the groundwork for, and shows the potential of, integrating LLMs into creative multimedia tasks.

SESSION: Image Generation and Editing

DiffOBI: Diffusion-based Image Generation of Oracle Bone Inscription Style Characters

Oracle bone inscriptions are among humankind's most valuable cultural assets, with irreplaceable archaeological and visual value. However, it remains challenging to analyze and generate oracle bone inscription-style images for visual identification and art design. In this paper, we propose DiffOBI, a novel approach for generating images in the oracle bone inscription style using diffusion models. We first construct a dataset that aligns oracle bone inscriptions, text prompts, and object images. This dataset serves as the foundation for fine-tuning ControlNet. By inputting images of various objects along with corresponding text prompts, the model generates corresponding images in the style of oracle bone inscription. To further enhance the quality of the generated images, we integrate a refinement module to refine the initial results, ensuring that they conform more closely to the original structure and norms of oracle bone inscription. This approach ensures that the generated images conform to the given input images while preserving the unique pictographic features of oracle bone inscription.

OrienText: Surface Oriented Textual Image Generation

Textual content in images is crucial in e-commerce sectors, particularly in marketing campaigns, product imaging, advertising, and the entertainment industry. Current text-to-image (T2I) diffusion models, though proficient at producing high-quality images, often struggle to incorporate text accurately onto complex surfaces with varied perspectives, such as angled views of architectural elements like buildings, banners, or walls. In this paper, we introduce the Surface Oriented Textual Image Generation (OrienText) method, which leverages region-specific surface normals as conditional input to a T2I diffusion model. Our approach ensures accurate rendering and correct orientation of the text within the image context. We demonstrate the effectiveness of the OrienText method on a self-curated dataset of images and compare it against existing textual image generation methods.

The Secrets of Penalty Functions on Edge-preserving Image Smoothing

Penalty functions have recently been applied to the optimization-based edge-preserving image smoothing problem, but selecting appropriate parameters for a penalty function to achieve the desired smoothing effect is nontrivial. In this paper, we study the influence of the penalty function on the performance of edge-preserving image smoothing from the shrinkage-rule perspective, and we uncover several interesting facts. First, the shrinkage rule of a penalty function can help determine its appropriate parameters. Second, selecting appropriate parameters for a penalty function is more important than selecting a specific penalty function, because different penalties can have very similar shrinkage rules once their parameters are tuned. Third, a simple optimization framework with appropriate parameter settings for a penalty function can achieve smoothing effects comparable to those produced by much more complex algorithms. To cope with the difficulties brought by nonconvex penalties, we propose an iterative algorithm based on local quadratic approximation.
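As a concrete example of a shrinkage rule (a standard textbook case given for illustration, not one of the specific penalties analyzed in the paper), the \(\ell_1\) penalty \(\lambda|x|\) in a separable proximal step yields soft-thresholding,
\[ \operatorname*{arg\,min}_{x}\ \tfrac{1}{2}(x - y)^2 + \lambda|x| \;=\; \operatorname{sign}(y)\,\max(|y| - \lambda,\, 0), \]
so the single parameter \(\lambda\) directly controls how strongly small variations are shrunk toward zero while large, edge-like differences are preserved.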

Improved Morphological Anti-Aliasing for Japanese Animation

The digitization of 2D animation deeply modified artists' workflows and brought many challenges to the industry. While vector-based technologies have been adopted in many fields, Japanese animation still heavily relies on raster, aliased drawings in its workflows. Post-process anti-aliasing solutions tailored to traditional Japanese animation were therefore developed based on Morphological Anti-Aliasing (MLAA). Over the past decade, MLAA has been refined and extended, especially in the context of real-time 3D rendering. In this paper, we present a revised solution that incorporates features from existing anti-aliasing techniques, tailored towards Japanese 2D animation.

Capturing Light with Robots: A Novel Workflow for Reproducing Realistic Lens Flares

Lens flares are a common optical artifact in photography and filmmaking caused by reflections and scattering of light within a camera system. The task of creating lens flares is usually solved with 2D approaches or simulation. This research compares two alternative methods of reproducing lens flares for a production-ready compositing workflow: traditional image processing and machine learning techniques. To create the dataset, a novel approach to capturing lens flares in a grid-like manner is explored. By systematically varying the position of a light source with a motion control system, flare patterns for a diverse set of lenses are captured.

SESSION: Stylization & Evaluation

StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting

We introduce StyleGaussian, a novel 3D style transfer technique that allows instant transfer of any image’s style to a 3D radiance field at 10 frames per second (fps). Leveraging 3D Gaussian Splatting (3DGS), StyleGaussian achieves style transfer without compromising its real-time rendering ability and multi-view consistency. It achieves instant style transfer with three steps: embedding, transfer, and decoding. Initially, 2D VGG scene features are embedded into reconstructed 3D Gaussians. Next, the embedded features are transformed according to a reference style image. Finally, the transformed features are decoded into the stylized RGB. StyleGaussian has two novel designs. The first is an efficient feature rendering strategy that first renders low-dimensional features and then maps them into high-dimensional features while embedding VGG features. It cuts the memory consumption significantly and enables 3DGS to render the high-dimensional memory-intensive features. The second is a K-nearest-neighbor-based 3D CNN. Working as the decoder for the stylized features, it eliminates the 2D CNN operations that compromise strict multi-view consistency. Extensive experiments show that StyleGaussian achieves instant 3D stylization with superior stylization quality while preserving real-time rendering and strict multi-view consistency. Project page: https://kunhao-liu.github.io/StyleGaussian/

A Practical Style Transfer Pipeline for 3D Animation: Insights from Production R&D

Our animation studio has developed a practical style transfer pipeline for creating stylized 3D animation, which is suitable for complex real-world production. This paper presents the insights from our development process, where we explored various options to balance quality, artist control, and workload, leading to several key decisions. For example, we chose patch-based texture synthesis over machine learning for better control and to avoid training data issues. We also addressed specifying style exemplars, managing multiple colors within a scene, controlling outlines and shadows, and reducing temporal noise. These insights were used to further refine our pipeline, ultimately enabling us to produce an experimental short film showcasing various styles.

AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data

This paper introduces an effective method for computation-efficient personalized style video generation without requiring access to any personalized video data. It reduces the necessary generation time of similarly sized video diffusion models from 25 seconds to around 1 second while maintaining the same level of performance. The method’s effectiveness lies in its dual-level decoupling learning approach: 1) separating the learning of video style from video generation acceleration, which allows for personalized style video generation without any personalized style video data, and 2) separating the acceleration of image generation from the acceleration of video motion generation, enhancing training efficiency and mitigating the negative effects of low-quality video data.

An Empirical Analysis of GPT-4V's Performance on Fashion Aesthetic Evaluation

Fashion aesthetic evaluation is the task of estimating how well the outfits worn by individuals in images suit them. In this work, we examine the zero-shot performance of GPT-4V on this task for the first time. We show that its predictions align fairly well with human judgments on our datasets, and also find that it struggles with ranking outfits in similar colors. The code is available at https://github.com/st-tech/gpt4v-fashion-aesthetic-evaluation.

SESSION: Mixed Reality & Holography

Automatic Generation of Multimodal 4D Effects for Immersive Video Watching Experiences

The recent trend of watching content using over-the-top (OTT) services pushes the 4D movie industry to seek a path to transformation. To address the issue, this paper suggests an AI-driven automatic 4D effects generation algorithm applied to a low-cost comfort chair. The system extracts multiple features using psychoacoustic analysis, saliency detection, optical flow, and LLM-based thermal effect synthesis, and automatically maps them to various sensory displays such as vibration, heat, wind, and poking. To evaluate the system, a user study with 21 participants across seven film genres was conducted. The results showed that 1) there was a general improvement with 4D effects in terms of immersion, concentration, and expressiveness, and 2) multisensory effects were particularly useful in action and fantasy movie scenes. The suggested system could be directly used in current general video-on-demand services.

Shrunken Reality: Augmenting Real-World Contexts in Real-Time on Realistic Miniature Dioramas

This study presents a new form of augmented reality (AR), referred to as shrunken reality (SR), that captures various real-world contexts in real-time using cameras and augments them onto realistic miniature dioramas, denoted as shrunken worlds (SWs), providing a distinctive experience different from existing AR technologies. We explored the mathematical and technical methods to realize SR, focusing on the processes involved in mapping and augmenting real-world contexts onto miniature, scaled-down models. By exploring the integration of physical miniaturization with digital augmentation, we investigated the potential applications of SR across various sectors, highlighting its relevance to both public services and industries. Finally, we discuss future research directions, such as increasing the number of cameras to capture comprehensive real-world contexts, utilizing deep learning models to enhance user’s immersive experience, improving the naturalness of the augmented overlays on SWs, and broadening the scope of augmentations to cover a wider range of real-world, real-time contexts.

See-Through Face Display: Enabling Gaze Communication for Any Face—Human or AI

We present See-Through Face Display, an eye-contact display system designed to enhance gaze awareness in both human-to-human and human-to-avatar communication. The system addresses the limitations of existing gaze correction methods by combining a transparent display with a strategically positioned camera. The display alternates rapidly between a visible and transparent state, thereby enabling the camera to capture clear images of the user’s face from behind the display. This configuration allows for mutual gaze awareness among remote participants without the necessity of a large form factor or computationally resource-intensive image processing. In comparison to conventional methodologies, See-Through Face Display offers a number of practical advantages. The system requires minimal physical space, operates with low computational overhead, and avoids the visual artifacts typically associated with software-based gaze redirection. These features render the system suitable for a variety of applications, including multi-party teleconferencing and remote customer service. Furthermore, the alignment of the camera’s field of view with the displayed face position facilitates more natural gaze-based interactions with AI avatars. This paper presents the implementation of See-Through Face Display and examines its potential applications, demonstrating how this compact eye-contact system can enhance gaze communication in both human-to-human and human-AI interactions.

Focal Surface Holographic Light Transport using Learned Spatially Adaptive Convolutions

Computer-Generated Holography (CGH) is a set of algorithmic methods for identifying holograms that reconstruct Three-Dimensional (3D) scenes in holographic displays. CGH algorithms decompose 3D scenes into multiple planes at different depth levels and rely on simulations of light propagating from a source plane to a target plane. Thus, for n planes, CGH typically optimizes holograms using n plane-to-plane light transport simulations, leading to major time and computational demands. Our work replaces the multiple planes with a focal surface and introduces a learned light transport model that propagates a light field from a source plane to the focal surface in a single inference. Our model leverages spatially adaptive convolution to achieve the depth-varying propagation demanded by targeted focal surfaces. The proposed model speeds up the hologram optimization process by up to 1.5x, which contributes to hologram dataset generation and the training of future learned CGH models.
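To illustrate what a spatially adaptive convolution means in this context (a minimal numpy sketch of the generic per-pixel-kernel operation; predicting the kernels from the local focal-surface depth is an assumption for illustration, and this is not the authors' network):

    import numpy as np

    def spatially_adaptive_conv(field, kernels, k=3):
        """Apply a different k x k kernel at every pixel.

        field   : (H, W) real or complex array (e.g. one light-field channel).
        kernels : (H, W, k, k) per-pixel kernels, e.g. predicted by a network
                  from the local focal-surface depth (illustrative assumption).
        """
        h, w = field.shape
        pad = k // 2
        padded = np.pad(field, pad, mode="edge")
        out = np.zeros_like(field)
        for y in range(h):
            for x in range(w):
                patch = padded[y:y + k, x:x + k]
                # Each output sample uses its own kernel, so the effective
                # propagation can vary with depth across the focal surface.
                out[y, x] = np.sum(patch * kernels[y, x])
        return out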

SESSION: Image-based Reconstruction

ODA-GS: Occlusion- and Distortion-aware Gaussian Splatting for Indoor Scene Reconstruction

In this work, we aim to address the quality degradation issues of indoor scene reconstruction using 3D Gaussian Splatting (3DGS). Existing methods enhance reconstruction quality by exploiting learned geometric priors like Signed Distance Functions (SDF), but these come with significant computational costs. We analyze the traditional 3DGS training process and identify key factors contributing to quality degradation: over-reconstruction and gradient dilution during the densification stage, and the occurrence of distorted/redundant Gaussians during the post-optimization stage. To tackle these issues, we introduce ODA-GS, a novel framework that modifies 3DGS with tailored modules. During densification, we employ occlusion-aware gradient accumulation to prevent gradient dilution and use homo-directional gradients to mitigate over-reconstruction. In the post-optimization stage, we introduce post-pruning to eliminate distorted and redundant Gaussians, thereby enhancing visual quality and reducing computational overhead. Tested on the ScanNet++ and Replica datasets, ODA-GS outperforms several baselines both qualitatively and quantitatively.

Splanting: 3D plant capture with gaussian splatting

3D capture and characterization of plant shoot architecture is a grand challenge in plant phenotyping research, made difficult by plants' intricate 3D shape, composed of thin and flat sub-structures (stems, leaves, flowers, pods, etc.). In this paper, we show that 3D Gaussian splatting is well suited for capturing 3D plant representations, which we call splants. We report a simple and fast capture procedure and 3DGS processing software tailored to foreground object capture. Splant generation worked well across plant species and growth stages. Our preliminary results point to a promising future for splant phenotyping, which we expect will lead to a dramatic increase in the use of multi-view imaging and 3D analysis in plant pathology and plant breeding research.

An Evaluation Metric for Single Image-to-3D Models Based on Object Detection Perspective

In this paper, we propose a novel evaluation metric for "single image-to-3D" models from an object detection perspective. Unlike conventional metrics that require the original 3D model or many multi-view images of the model as references, the proposed metric allows "single image-to-3D" models to be evaluated with just one input image. The proposed metric is calculated by comparing the object detection results of the input image with those of novel-view images rendered from the generated object. Therefore, this metric enables evaluations that consider semantic information and focus especially on the objects in the rendered images. The proposed metric is validated by the rank correlation coefficient between the order of its values and the order of the "single image-to-3D" models' publication dates. The experimental results show that our proposed metric enables semantically accurate evaluations using only one input image. The code of this work is available at https://github.com/EvalSingleImg23D/EvalSingleImg23D.
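A minimal sketch of the kind of comparison such a metric implies (the detector and similarity functions below are hypothetical placeholders, not the authors' implementation):

    def detection_based_score(input_image, rendered_views, detect, similarity):
        """Compare detections on the input image against detections on each
        rendered novel view and average the per-view agreement.

        detect(image)    -> hypothetical detector returning a list of
                            (class_label, confidence) pairs.
        similarity(a, b) -> hypothetical function scoring how well two
                            detection lists agree (e.g. via matched classes
                            and confidence differences), in [0, 1].
        """
        reference = detect(input_image)
        scores = [similarity(reference, detect(view)) for view in rendered_views]
        return sum(scores) / len(scores)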

A Multi-flash Stereo Camera for Photo-realistic Capture of Small Scenes

Synthesizing accurate geometry and photo-realistic appearance of small scenes is an active area of research with compelling use cases in gaming, content creation, virtual reality, and robotics. When applying scene geometry and appearance estimation techniques to robotics, we found that the narrow cone of possible viewpoints due to the limited range of robot motion and scene clutter caused current estimation techniques to produce poor quality estimates or even fail. On the other hand, in robotic applications, dense metric depth can often be measured directly using robot mounted stereo cameras and light and camera positions can be accurately controlled. Depth can provide a good initial estimate of the object geometry to improve reconstruction, while multi-illumination images can help recover appearance to facilitate relighting. In this work we demonstrate a robot-mountable multi-flash stereo camera system developed in-house to capture the necessary data for synthesizing relightable assets from small scenes with a few training views.

SESSION: Geometric Modeling and Editing

Real-Time Enveloped Mesh Editing Using Vector Graph Representation

In mesh processing, enveloping establishes a relationship between mesh and control structure deformation. Once enveloped, the mesh is edited via control structure deformation. This editing process aims for realistic and computationally efficient mesh deformation. In this work, we propose a data-driven enveloping approach designed for efficient mesh editing and producing realistic deformations. The proposed method computes geometric model deformations by leveraging the Vector Graph (VG) representation. By establishing a map between skeleton (control structure) and mesh deformations, the proposed approach achieves realistic results. Comparative analysis demonstrates qualitative parity with leading techniques, while computational efficiency remains akin to Linear Blend Skinning. Extensive experiments across diverse mesh datasets validate the proposed method, substantiated by qualitative and quantitative comparisons against state-of-the-art linear and non-linear approaches.

Designing Reconfigurable Joints

We propose a method to create reconfigurable joints, where reconfigurability entails that a set of components can be connected in multiple ways. The advantage of this type of joint is the possibility of efficiently reusing limited components for multiple purposes. In established carpentry practices, a popular reconfigurable joint geometry called Kawai Tsugite has two components that can be reconfigured in different orthogonal angles. However, the general geometric requirements for the design of reconfigurable joints have not been defined previously. We clarify the conditions for reconfigurable joints from the perspective of symmetry-based geometry repetition and provide guidelines for constructing multi-component joints and for ensuring stability. Moreover, we present a system that assists in the design of voxel-based reconfigurable joints and use it to fabricate several stable reconfigurable joints.

Towards Wave Function Collapse using Optimization with Quantum Algorithms

Quantum computing has emerged in recent years as a rapidly evolving field, presenting the potential to solve problems that are difficult for classical computers. Our work presents a new implementation of procedural generation using Wave Function Collapse (WFC), solved with the Quantum Approximate Optimization Algorithm (QAOA). The problem is formulated as a Constraint Satisfaction Problem (CSP), where the edge rules in classical WFC are encoded as linear constraints. We successfully demonstrate the generation of valid tile configurations in both 2D and 3D, showcasing the potential of quantum computing in hybrid quantum-classical systems. However, our algorithm is constrained by the limitations of current quantum computers, which prevent real-world deployment at this stage.
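To illustrate how WFC edge rules can become linear constraints (a generic encoding sketch under assumed notation, not necessarily the authors' exact formulation), let the binary variable \(x_{c,t}\) indicate that tile \(t\) is placed in cell \(c\); then
\[ \sum_{t} x_{c,t} = 1 \ \ \forall c, \qquad x_{c,t} + x_{c',t'} \le 1 \ \ \text{for adjacent cells } (c,c') \text{ and incompatible tile pairs } (t,t'), \qquad x_{c,t} \in \{0,1\}, \]
and the resulting CSP can be turned into a cost Hamiltonian that QAOA approximately minimizes.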

Enhancing Mesh Deformation Realism for Synthesizing Wrinkles

We propose a solution for generating dynamic heightmap data to simulate deformations of soft objects, with a focus on human skin. The solution utilizes mesostructure-level wrinkles and procedural textures to add static microstructure details. Beyond human skin, it offers the flexibility to mimic other material deformations during animation, such as leather and rubber. Various existing methods suffer from self-intersections and increased storage requirements when synthesizing wrinkles. Although manual intervention using wrinkle and tension maps offers control, it lacks information on principal deformation directions. Physics-based simulations can generate detailed wrinkle maps but may limit artistic control. Our research presents a procedural method to enhance the generation of dynamic deformation patterns, including wrinkles, with better control and without reliance on captured data. Incorporating static procedural patterns improves realism, and the proposed approach can be used in other application areas.

SESSION: Rendering

An Efficient and Smooth Volume Shell Mapping by Nonlinear Ray Traversal of Hierarchical Structures

Shell mapping is a method of representing microstructures on surfaces by mapping a three-dimensional texture space bijectively into a shell space. This paper introduces volumetric textures into the texture space to represent implicit surfaces and translucent materials. However, conventional approaches often generate artifacts at the boundaries of neighboring tetrahedra due to the piecewise-linear approximate mapping. To achieve smooth mapping, we present an approach that maps rays nonlinearly into the texture space, and we address efficient sampling by using hierarchical volumetric textures.

k-DOP Clipping: Robust Ghosting Mitigation in Temporal Antialiasing

Temporal antialiasing is one of the most common methods for removing aliasing artifacts in contemporary real-time rendering, based on reusing reprojected color data from previous frames. One typical issue with the method is ghosting: moving objects leaving a wake of visual trails. To mitigate this, a common technique is to validate and rectify reused history colors by comparing and clipping them to the color neighborhoods of the current frame's pixels. Previous bounding-box-based methods are only situationally effective and cause significant ghosting in less favorable circumstances. We propose using k-Discrete Oriented Polytopes ("k-DOPs") for more robust neighborhood clipping. For a 0.2 ms performance overhead, our method more reliably mitigates ghosting across scenes where previous methods have inconsistent results.
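A minimal sketch of the general k-DOP clipping idea in color space (the axis set and clipping rule are illustrative assumptions, not the paper's exact formulation): project the current pixel's neighborhood colors onto k fixed directions to form slabs, then pull the history color toward the current color just far enough to enter every slab.

    import numpy as np

    def kdop_clip(history, current, neighborhood, axes):
        """Clip a history color against a k-DOP built from neighborhood colors.

        history, current : (3,) RGB colors.
        neighborhood     : (N, 3) colors of the current pixel's neighborhood.
        axes             : (k, 3) unit directions defining the k-DOP slabs
                           (e.g. RGB axes plus diagonals; illustrative choice).
        """
        proj = neighborhood @ axes.T              # (N, k) slab projections
        lo, hi = proj.min(axis=0), proj.max(axis=0)
        direction = current - history
        t = 0.0                                   # fraction to move toward current
        for a, lo_a, hi_a in zip(axes, lo, hi):
            h = history @ a
            d = direction @ a
            if h < lo_a and d > 1e-8:
                t = max(t, (lo_a - h) / d)        # push up to the lower plane
            elif h > hi_a and d < -1e-8:
                t = max(t, (hi_a - h) / d)        # push down to the upper plane
        return history + min(t, 1.0) * direction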

Fringe Lights: Colored Penumbra in Glimpse

The recent popularity of stylized rendering calls for new approaches to help artists produce the desired look. Although some effects can be added to the rendered frames in compositing, this requires manual work at the end of the pipeline, leaving preceding stages unclear as to the impact on the final image. In this paper, we present our implementation of fringe lights in our production path tracer, Glimpse, for Netflix’s Leo. Fringe lights allow lighting artists to easily modify the appearance of the penumbra, leaving both the unoccluded and fully shadowed parts of the scene unaffected. The soft transition between these two regions can be estimated by tracing at most one additional shadow ray towards the light source, allowing separate illumination at a minimal computational overhead. We extended our light sampling and evaluation for fringe lights to account for three different shading regions and modify the light contribution accordingly. Our method offers artists the freedom to stylize the penumbra and match the desired look of the show with direct visual feedback at every stage of the pipeline.

SESSION: Fluid Simulation

Boundary Conditions of the Third Kind: Generalized Sources and Sinks in Fluid Simulations

Fluid simulations use Dirichlet or Neumann boundary conditions to create free surfaces, colliders, sources, and sinks. There is, however, a third condition: the Robin boundary condition. It is usually considered relevant only for heat or electromagnetic problems, but we found that it provides a consistent framework to replace the binary choice of Dirichlet or Neumann with a continuum. Blending these conditions enables the simulation of real-world fluid behaviours not otherwise achievable.
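For reference, the generic form of the Robin condition on a quantity \(u\) with outward normal derivative \(\partial u/\partial n\) (a sketch in standard notation, not the paper's specific variables) is
\[ \alpha\,u + \beta\,\frac{\partial u}{\partial n} = g \quad \text{on } \partial\Omega, \]
which recovers a Dirichlet condition when \(\beta = 0\), a Neumann condition when \(\alpha = 0\), and blends the two for intermediate coefficients.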

Editing Fluid Flows with Divergence-Free Biharmonic Vector Field Interpolation

Achieving a satisfying fluid animation through numerical simulation can be time-consuming and there are few practical post-processing tools for editing completed simulations. To address this challenge, we present a divergence-free biharmonic vector field interpolation method that can be used to perform smooth spatial blending between input incompressible flows. Given flow data on the boundary of a desired interpolation domain at each time step, we fill in the given domain by constructing an optimally smooth, divergence-free, boundary-satisfying vector field. We ensure smoothness using the Laplacian energy and enforce divergence constraints through Lagrange multipliers. Prior methods for this problem suffer from visible artifacts due to non-zero divergence and discontinuous velocity gradients. By then replacing the Laplacian energy with the Hessian energy we further extend our method to extrapolation in the presence of open boundaries. We demonstrate that our approach produces smooth and incompressible flows, which enables a range of natural simulation editing capabilities: copy-pasting, hole-filling, and domain extension.
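In sketch form (generic notation, not the authors' exact discretization), each time step fills the interpolation domain \(\Omega\) with the smoothest divergence-free field matching the given boundary flow:
\[ \min_{u}\ \int_{\Omega} \|\Delta u\|^2 \, dV \quad \text{subject to} \quad \nabla\cdot u = 0 \ \text{in } \Omega, \qquad u = u_{\mathrm{bdry}} \ \text{on } \partial\Omega, \]
with the divergence constraint enforced by Lagrange multipliers; swapping the Laplacian energy for the Hessian energy \(\int_{\Omega} \|\nabla\nabla u\|^2 \, dV\) gives the open-boundary extrapolation variant.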

Galerkin Method of Regularized Stokeslets for Procedural Fluid Flow with Control Curves

We present a new procedural incompressible velocity field authoring tool, which lets users design a volumetric flow by directly specifying velocity along control curves. Our method combines the Method of Regularized Stokeslets with Galerkin discretization. Based on the highly viscous Stokes flow assumption, we find the force along a given set of curves that satisfies the velocity constraints along them. We can then evaluate the velocity anywhere inside the surrounding infinite 2D or 3D domain. We also show the extension of our method to control the angular velocity along control curves. Compared to a collocation discretization, our method is not very sensitive to the vertex sampling rate along control curves and only requires a small linear system solve.
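In sketch form (generic regularized-Stokeslet structure with assumed notation, not the paper's exact discretization), the velocity induced by regularized forces \(f_j\) placed at curve points \(x_j\) is
\[ u(x) = \sum_j G_{\varepsilon}(x - x_j)\, f_j, \]
where \(G_{\varepsilon}\) is the regularized (smoothed) Stokeslet tensor with blob radius \(\varepsilon\). Imposing the prescribed velocities along the control curves yields a linear system \(A f = u_{\mathrm{target}}\) for the unknown forces; a Galerkin discretization assembles \(A\) by integrating basis functions along the curves rather than enforcing the constraints only at collocation points, which is what makes the solve insensitive to the vertex sampling rate.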