Interactive articles are an effective medium of communication in education, journalism, and scientific publishing, yet are created using complex general-purpose programming tools. We present Idyll Studio, a structured editor for authoring and publishing interactive and data-driven articles. We extend the Idyll framework to support reflective documents, which can inspect and modify their underlying program at runtime, and show how this functionality can be used to reify the constituent parts of a reactive document model—components, text, state, and styles—in an expressive, interoperable, and easy-to-learn graphical interface. In a study with 18 diverse participants, all could perform basic editing and composition, use datasets and variables, and specify relationships between components. Most could choreograph interactive visualizations and dynamic text, although some struggled with advanced uses requiring unstructured code editing. Our findings suggest Idyll Studio lowers the threshold for non-experts to create interactive articles and allows experts to rapidly specify a wide range of article designs.
We propose LipNotif, a non-contact tactile notification system that uses airborne ultrasound tactile presentation to the lips. Lips are suitable for non-contact tactile notifications because they have high tactile sensitivity comparable to the palms, are less occupied in daily life, and are constantly exposed outward. LipNotif uses tactile patterns to intuitively convey information to users, allowing them to receive notifications using only their lips, without sight, hearing, or hands. We developed a prototype system that automatically recognizes the position of the lips and presents non-contact tactile sensations. Two experiments were conducted to evaluate the feasibility of LipNotif. In the first experiment, we found that directional information can be conveyed to the lips with an average accuracy of ±11.1° over a 120° horizontal range. In the second experiment, we elicited significantly different affective responses by changing the stimulus intensity. The experimental results indicate that LipNotif is practical for conveying directions, emotions, and combinations of the two. LipNotif can be applied for various purposes, such as notifications during work, being called in a waiting room, and tactile feedback in automotive user interfaces.
Laser cutter users face difficulties distinguishing between visually similar materials. This can lead to problems, such as using the wrong power/speed settings or accidentally cutting hazardous materials. To support users, we present SensiCut, an integrated material sensing platform for laser cutters. SensiCut enables material awareness beyond what users are able to see and reliably differentiates among similar-looking types. It achieves this by detecting materials’ surface structures using speckle sensing and deep learning.
SensiCut consists of a compact hardware add-on for laser cutters and a user interface that integrates material sensing into the laser cutting workflow. In addition to improving the traditional workflow and its safety, SensiCut enables new applications, such as automatically partitioning designs when engraving on multi-material objects or adjusting their geometry based on the kerf of the identified material.
We evaluate SensiCut’s accuracy for different types of materials under different sheet orientations and illumination conditions.
Human-centered approaches to machine learning have established theoretical foundations, design principles, and interaction techniques to facilitate end-user interaction with machine learning systems. Yet general-purpose toolkits supporting the design of interactive machine learning systems are still missing, despite their potential to foster reuse, appropriation, and collaboration between different stakeholders, including developers, machine learning experts, designers, and end users. In this paper, we present an architectural model for toolkits dedicated to the design of human interactions with machine learning. The architecture is built upon a modular collection of interactive components that can be composed to build interactive machine learning workflows, using reactive pipelines and composable user interfaces. We introduce Marcelle, a toolkit for the design of human interactions with machine learning that implements this model. We illustrate Marcelle with two implemented case studies: (1) an HCI researcher conducts user studies to understand novice interaction with machine learning, and (2) a machine learning expert and a clinician collaborate to develop a skin cancer diagnosis system. Finally, we discuss our experience with the toolkit, along with its limitations and perspectives.
Existing approaches to trading off false positive versus false negative errors in input recognition are based on imprecise ideas of how these errors affect user experience that are unlikely to hold for all situations. To inform dynamic approaches to setting such a tradeoff, two user studies were conducted on how relative preference for false positive versus false negative errors is influenced by differences in the temporal cost of error recovery and by high-level task factors (time pressure, multi-tasking). Participants completed a tile selection task in which false positive and false negative errors were injected at a fixed rate and the temporal cost of recovering from each error type was varied; they then indicated a preference for one error type or the other and rated their frustration with the task. Responses indicate that the temporal costs of error recovery can drive both frustration and relative error type preference, and that participants exhibit a bias against false positive errors, equivalent to ∼1.5 seconds or more of added temporal recovery time. Several explanations for this bias were revealed, including that false positive errors impose a greater attentional demand on the user, and that recovering from false positive errors imposes a task switching cost.
Users face many challenges in keeping their personal file collections organized. While current file-management interfaces help users retrieve files in disorganized repositories, they do not aid in organization. Pertinent files can be difficult to find, and files that should have been deleted may remain. To help, we designed KondoCloud, a file-browser interface for personal cloud storage. KondoCloud makes machine learning-based recommendations of files users may want to retrieve, move, or delete. These recommendations leverage the intuition that similar files should be managed similarly.
We developed and evaluated KondoCloud through two complementary online user studies. In our Observation Study, we logged the actions of 69 participants who spent 30 minutes manually organizing their own Google Drive repositories. We identified high-level organizational strategies, including moving related files to newly created sub-folders and extensively deleting files. To train the classifiers that underpin KondoCloud’s recommendations, we had participants label whether pairs of files were similar and whether they should be managed similarly. In addition, we extracted ten metadata and content features from all files in participants’ repositories. Our logistic regression classifiers all achieved F1 scores of 0.72 or higher. In our Evaluation Study, 62 participants used KondoCloud either with or without recommendations. Roughly half of participants accepted a non-trivial fraction of recommendations, and some participants accepted nearly all of them. Participants who were shown the recommendations were more likely to delete related files located in different directories. They also generally felt the recommendations improved efficiency. Participants who were not shown recommendations nonetheless manually performed about a third of the actions that would have been recommended.
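To make the recommendation backbone concrete, here is a minimal sketch of a pairwise "manage similarly" classifier; the features, data, and labels below are illustrative placeholders rather than KondoCloud's actual feature set or training corpus.

```python
# Minimal sketch of a pairwise "manage similarly" classifier in the spirit of
# KondoCloud's recommendations. Feature names and data are illustrative
# placeholders, not the paper's actual features or labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def pair_features(a, b):
    """Turn two files' metadata into one symmetric pairwise feature vector."""
    shared_tokens = len(set(a["name_tokens"]) & set(b["name_tokens"]))
    return [
        abs(a["size_log"] - b["size_log"]),           # size difference
        float(a["ext"] == b["ext"]),                  # same extension?
        abs(a["modified_days"] - b["modified_days"]), # recency gap
        shared_tokens,                                # filename token overlap
    ]

# Placeholder training data: rows are pairwise feature vectors, labels stand in
# for crowd judgments of whether the two files should be managed similarly.
rng = np.random.default_rng(0)
X = rng.random((500, 4))
y = (X[:, 1] + 0.3 * rng.random(500) > 0.8).astype(int)   # synthetic labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("F1:", round(f1_score(y_test, clf.predict(X_test)), 2))
```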
There has been widespread adoption of IDEs and powerful tools for program analysis. However, programmers still find it difficult to conveniently analyze their code for custom patterns. Existing systems either provide inflexible interfaces or require knowledge of complex query languages and compiler internals. In this paper, we present Sporq, a tool that allows developers to mine their codebases for a range of patterns, including bugs, code smells, and violations of coding standards. Sporq offers an interactive environment in which the user highlights program elements, and the system responds by identifying other parts of the codebase with similar patterns. The programmer can then provide feedback that enables the system to rapidly infer the programmer’s intent. Internally, our system is driven by high-fidelity relational program representations and algorithms that synthesize database queries from examples. Our experiments and user studies with a VS Code extension indicate that Sporq reduces the effort needed by programmers to write custom analyses and discover bugs in large codebases.
We propose ModularHMD, a new mobile head-mounted display concept that adopts a modular mechanism and allows a user to perform ad-hoc peripheral interaction with real-world devices or people during VR experiences. ModularHMD comprises a central HMD and three removable module devices installed in the periphery of the HMD cowl. Each module has four main states: occluding, extended VR view, video see-through (VST), and removed/reused. Among different combinations of module states, a user can quickly set up the necessary HMD forms, functions, and real-world views for ad-hoc peripheral interactions without removing the headset. For instance, an HMD user can see her surroundings by switching a module into the VST mode. She can also physically remove a module to obtain a direct peripheral view of the real world. The removed module can be reused as an instant interaction device (e.g., a touch keyboard) for subsequent peripheral interactions. Users can end the peripheral interaction and revert to a full VR experience by re-mounting the module. We design ModularHMD’s configuration and peripheral interactions with real-world objects and people. We also implement a proof-of-concept prototype of ModularHMD to validate its interaction capabilities through a user study. Results show that ModularHMD is an effective solution that enables both immersive VR and ad-hoc peripheral interactions.
Music is a central instrument in video gaming to attune a player’s attention to the current atmosphere and increase their immersion in the game. We transfer the idea of scene-adaptive music to car drives and propose SoundsRide, an in-car audio augmented reality system that mixes music in real-time synchronized with sound affordances along the ride. After exploring the design space of affordance-synchronized music, we design SoundsRide to temporally and spatially align high-contrast events on the route, e.g., highway entrances or tunnel exits, with high-contrast events in music, e.g., song transitions or beat drops, for any recorded and annotated GPS trajectory by a three-step procedure. In real-time, SoundsRide 1) estimates temporal distances to events on the route, 2) fuses these novel estimates with previous estimates in a cost-aware music-mixing plan, and 3) if necessary, re-computes an updated mix to be propagated to the audio output. To minimize user-noticeable updates to the mix, SoundsRide fuses new distance information with a filtering procedure that chooses the best updating strategy given the last music-mixing plan, the novel distance estimations, and the system parameterization. We technically evaluate SoundsRide and conduct a user evaluation with 8 participants to gain insights into how users perceive SoundsRide in terms of mixing, affordances, and synchronicity. We find that SoundsRide can create captivating music experiences and positively as well as negatively influence subjectively perceived driving safety, depending on the mix and user.
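To illustrate the update-filtering idea in step 2, the following deliberately simplified sketch fuses a new time-to-event estimate with the previous one and only triggers a re-plan when the drift exceeds a cost threshold; the smoothing weight and cost value are invented parameters, not SoundsRide's.

```python
# Simplified sketch of fusing time-to-event estimates and deciding whether to
# re-plan the mix. The smoothing weight and re-plan cost are invented values.
def fuse(prev_eta, new_eta, alpha=0.6):
    """Exponentially weighted fusion of time-to-event estimates (seconds)."""
    return alpha * new_eta + (1 - alpha) * prev_eta

def needs_replan(planned_eta, fused_eta, replan_cost_s=2.0):
    """Re-compute the mix only if the planned sync point is off by > cost."""
    return abs(planned_eta - fused_eta) > replan_cost_s

planned = 30.0                            # mix currently synced to an event 30 s ahead
eta = planned
for new_estimate in [29.0, 27.5, 22.0]:   # fresh estimates as the car moves
    eta = fuse(eta, new_estimate)
    print(round(eta, 1), "re-plan" if needs_replan(planned, eta) else "keep plan")
```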
Assembly programming is challenging, even for experts. Program synthesis, as an alternative to manual implementation, has the potential to enable both expert and non-expert users to generate programs in an automated fashion. However, current tools and techniques are unable to synthesize assembly programs larger than a few instructions. We present Assuage: ASsembly Synthesis Using A Guided Exploration, a parallel interactive assembly synthesizer that engages the user as an active collaborator, enabling synthesis to scale beyond current limits. Using Assuage, users can provide two types of semantically meaningful hints that expedite synthesis and allow for exploration of multiple possibilities simultaneously. Assuage exposes information about the underlying synthesis process using multiple representations to help users guide synthesis. We conducted a within-subjects study with twenty-one participants working on assembly programming tasks. With Assuage, participants with a wide range of expertise were able to achieve significantly higher success rates, perceived less subjective workload, and preferred the usefulness and usability of Assuage over a state-of-the-art synthesis tool.
Tablets are attractive for design work anywhere, but 3D manipulations are notoriously difficult. We explore how engaging the stylus and multi-touch in concert can render such tasks easier. We introduce Bi-3D, an interaction concept where touch gestures are combined with 2D pen commands for 3D manipulation. For example, in a fast and intuitive 3D drag & drop technique, the pen drags the object on-screen while a parallel pinch-to-zoom moves it in the third dimension. In this paper, we describe the Bi-3D design space, crossing two-handed input with the degrees-of-freedom (DOF) of 3D manipulation and navigation tasks. We demonstrate sketching and manipulation tools in a prototype 3D design application, where users can fluidly combine 3D operations through alternating and parallel use of the modalities. We evaluate the core technique, bi-manual 3DOF input, against widget and mid-air baselines in an object movement task. We find that Bi-3D is a fast and practical technique for multi-dimensional manipulation of graphical objects, promising to facilitate 3D design on stylus and tablet devices.
Editing operations such as cut, copy, paste, and correcting errors in typed text are often tedious and challenging to perform on smartphones. In this paper, we present VT, a voice and touch-based multi-modal text editing and correction method for smartphones. To edit text with VT, the user glides over a text fragment with a finger and dictates a command, such as “bold” to change the format of the fragment, or the user can tap inside a text area and speak a command such as “highlight this paragraph” to edit the text. For text correction, the user taps approximately on the erroneous text fragment and dictates the new content for substitution or insertion. VT combines touch and voice inputs with language context, such as a language model and phrase similarity, to infer a user’s editing intention, which lets it handle ambiguities and noisy input signals. This is a major advantage over existing error-correction methods (e.g., iOS’s Voice Control), which require precise cursor control or text selection. Our evaluation shows that VT significantly improves the efficiency of text editing and text correction on smartphones over the touch-only method and iOS’s Voice Control. Our user studies showed that VT reduced text editing time by 30.80% and text correction time by 29.97% over the touch-only method, and reduced text editing time by 30.81% and text correction time by 47.96% over iOS’s Voice Control.
Brain-computer interfaces (BCI) are an emerging technology with many potential applications. Functional near-infrared spectroscopy (fNIRS) can provide a convenient and unobtrusive real-time input for BCI. fNIRS is especially promising as a signal that could be used to automatically classify a user’s current cognitive workload. However, the data needed to train such a classifier is currently not widely available, difficult to collect, and difficult to interpret due to noise and cross-subject variation. A further challenge is the need for significant user-specific calibration. To address these issues, we introduce a new dataset gathered from 15 subjects and a new multi-stage supervised machine learning pipeline. Our approach learns from both observed data and augmented data derived from multiple subjects in its early stages, and then fine-tunes predictions to an individual subject in its last stage. We show promising gains in accuracy in a standard “n-back” cognitive workload classification task compared to baselines that use only subject-specific data or only group-level data, even when our approach is given much less subject-specific data. Even though these experiments analyzed the data retrospectively, we carefully removed anything from our process that could not have been done in real time, because our process is targeted at future real-time operation. This paper contributes a new dataset, a new multi-stage training pipeline, results showing significant improvement compared to alternative pipelines, and discussion of the implications for user interface design. Our complete dataset and software are publicly available at https://tufts-hci-lab.github.io/code_and_datasets/. We hope these results make fNIRS-based interactive brain input easier for a wide range of future researchers and designers to explore.
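The sketch below shows the general shape of such a two-stage pipeline, pre-training on pooled multi-subject data and then fine-tuning on a small amount of subject-specific data, using synthetic features and a generic linear classifier; it is a simplification under stated assumptions, not the paper's model, features, or augmentation scheme.

```python
# Simplified sketch of a two-stage "group then individual" pipeline using
# synthetic data and a generic linear classifier; the paper's actual model,
# features, and augmentation are more involved.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_group = rng.normal(size=(2000, 16))     # pooled multi-subject features
y_group = rng.integers(0, 2, 2000)        # workload labels (e.g., n-back level)
X_subj = rng.normal(size=(60, 16))        # small subject-specific set
y_subj = rng.integers(0, 2, 60)

scaler = StandardScaler().fit(X_group)
clf = SGDClassifier(random_state=0)

# Stage 1: learn group-level structure from many subjects.
clf.partial_fit(scaler.transform(X_group), y_group, classes=np.array([0, 1]))

# Stage 2: fine-tune to the target individual with a few passes over their data.
for _ in range(5):
    clf.partial_fit(scaler.transform(X_subj), y_subj)

print(clf.predict(scaler.transform(X_subj[:5])))
```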
Data scientists have adopted a popular design pattern in programming called the fluent interface for composing data wrangling code. The fluent interface works by combining multiple transformations on a data table—or dataframe—into a single chain of expressions that produces an output. Although fluent code promotes legibility, the intermediate dataframes are lost, forcing data scientists to unravel the chain through tedious code edits and re-execution. Existing tools for data scientists do not allow easy exploration or support understanding of fluent code. To address this gap, we designed a tool called Unravel that helps data scientists explore and understand fluent code. With simple structural edits via drag-and-drop and toggle-switch interactions, data scientists can reorder and (un)comment lines in the chain. To help data scientists understand fluent code, Unravel provides function summaries and always-on visualizations highlighting important changes to a dataframe. We discuss the design motivations behind Unravel and how it helps data scientists understand and explore fluent code. In a first-use study with 14 data scientists, we found that Unravel facilitated diverse activities such as validating assumptions about the code or data, exploring alternatives, and revealing function behavior.
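For readers unfamiliar with the fluent style, the snippet below shows the kind of method-chained dataframe code Unravel targets; toggling a step off or reordering lines in the chain corresponds to the structural edits the tool exposes. The data and steps are our own illustration, not taken from the paper.

```python
# An illustrative fluent (method-chained) wrangling expression. Intermediate
# dataframes exist only inside the chain; toggling a step off (as with the
# commented line) is one of the structural edits Unravel surfaces in its UI.
import pandas as pd

df = pd.DataFrame({"city": ["A", "A", "B", "B"],
                   "year": [2020, 2021, 2020, 2021],
                   "sales": [3, 5, 4, 7]})

result = (
    df
    .query("year == 2021")              # toggle off to include all years
    .groupby("city", as_index=False)
    .agg(total=("sales", "sum"))
    # .sort_values("total", ascending=False)   # commented-out step in the chain
    .reset_index(drop=True)
)
print(result)
```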
As computing paradigms shift toward mobile and ubiquitous interaction, there is an increasing demand for wearable interfaces supporting multifaceted input in smart living environments. In this regard, we introduce a system that identifies contact fingers using vibration as a modality of communication. We investigate the vibration characteristics of the communication channels involved and simulate the transmission of vibration sequences. In the simulation, we test and refine modulation and demodulation methods to design vibratory communication protocols that are robust to environmental noise and can detect multiple simultaneous contact fingers. As a result, we encode an on-off keying sequence with a unique carrier frequency for each finger and demodulate the sequences by applying cross-correlation. We verify the communication protocols in two environments, a laboratory and a cafe, where the highest resulting accuracies were 93% and 90.5%, respectively. Our system achieves over 91% accuracy in identifying seven contact states from three fingers while wearing only two actuator rings, with the aid of a touch screen. Our findings shed light on diversifying touch interactions on rigid surfaces by means of vibratory communication.
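The following sketch illustrates the basic signal idea: each finger transmits an on-off-keyed sequence on its own carrier, and the receiver detects active fingers by cross-correlating against per-carrier templates. The sample rate, carrier frequencies, bit length, and threshold are assumptions for illustration, not the paper's protocol parameters.

```python
# Signal-level sketch of per-finger on-off keying plus cross-correlation
# demodulation. Sample rate, carriers, bit length, and threshold are assumed
# values for illustration, not the paper's protocol parameters.
import numpy as np

fs = 8000                          # receiver sample rate (Hz)
bit_len = int(0.02 * fs)           # 20 ms per bit
carriers = {"index": 300.0, "middle": 500.0, "thumb": 700.0}   # Hz per finger

def ook_signal(bits, f):
    """On-off keyed carrier: bit 1 -> carrier on, bit 0 -> silence."""
    t = np.arange(len(bits) * bit_len) / fs
    envelope = np.repeat(np.asarray(bits, dtype=float), bit_len)
    return envelope * np.sin(2 * np.pi * f * t)

# Two fingers transmit at once; their vibrations superimpose at the receiver.
received = ook_signal([1, 0, 1, 1], carriers["index"]) \
         + ook_signal([1, 1, 0, 1], carriers["thumb"])
received += 0.1 * np.random.default_rng(0).normal(size=received.size)  # noise

# Demodulate: correlate against a one-bit template per carrier and threshold.
for finger, f in carriers.items():
    template = ook_signal([1], f)
    peak = np.max(np.abs(np.correlate(received, template, mode="valid")))
    active = peak > 0.5 * np.sum(template ** 2)
    print(finger, "active" if active else "inactive")
```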
An increasingly popular way of experiencing remote places is by viewing 360° virtual tour videos, which show the surrounding view while traveling through an environment. However, finding particular locations in these videos can be difficult because current interfaces rely on distorted frame previews for navigation. To alleviate this usability issue, we propose Route Tapestries, continuous orthographic-perspective projections of scenes along camera routes. We first introduce an algorithm for automatically constructing Route Tapestries from a 360° video, inspired by the slit-scan photography technique. We then present a desktop video player interface using a Route Tapestry timeline for navigation. An online evaluation using a target-seeking task showed that Route Tapestries allowed users to locate targets 22% faster than with YouTube-style equirectangular previews and reduced the failure rate by 75% compared to a more conventional row-of-thumbnail strip preview. Our results highlight the value of reducing visual distortion and providing continuous visual context in previews for navigating 360° virtual tour videos.
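To convey the slit-scan intuition behind Route Tapestries, the sketch below stacks one pixel column per frame into a continuous strip; this is our own simplification with a fixed slit position and no orthographic correction, not the paper's full construction algorithm.

```python
# Minimal slit-scan sketch: take one pixel column per 360° frame and stack the
# columns into a continuous strip along the route. The fixed slit position and
# lack of orthographic correction are simplifications of the paper's method.
import numpy as np

def route_tapestry(frames, slit_x=None):
    """frames: iterable of HxWx3 uint8 equirectangular frames."""
    columns = []
    for frame in frames:
        h, w, _ = frame.shape
        x = w // 2 if slit_x is None else slit_x   # assumed slit at frame center
        columns.append(frame[:, x, :])
    return np.stack(columns, axis=1)               # shape: H x num_frames x 3

# Example with synthetic frames standing in for decoded video frames.
frames = [np.random.randint(0, 255, (480, 960, 3), dtype=np.uint8) for _ in range(100)]
print(route_tapestry(frames).shape)                # (480, 100, 3)
```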
We propose a new class of haptic devices that provide haptic sensations by delivering liquid stimulants to the user's skin; we call this chemical haptics. Upon absorbing these stimulants, which contain safe and small doses of key active ingredients, receptors in the user's skin are chemically triggered, rendering distinct haptic sensations. We identified five chemicals that can render lasting haptic sensations: tingling (sanshool), numbing (lidocaine), stinging (cinnamaldehyde), warming (capsaicin), and cooling (menthol). To enable the application of our novel approach in a variety of settings (such as VR), we engineered a self-contained wearable that can be worn anywhere on the user's skin (e.g., face, arms, legs). Implemented as a soft silicone patch, our device uses micropumps to push the liquid stimulants through channels that are open to the user's skin, enabling topical stimulants to be absorbed by the skin as they pass through. Our approach presents two unique benefits. First, it enables sensations, such as numbing, not possible with existing haptic devices. Second, our approach offers a new pathway, via the skin's chemical receptors, for achieving multiple haptic sensations using a single actuator, which would otherwise require combining multiple actuators (e.g., Peltier, vibration motors, electro-tactile stimulation). We evaluated our approach by means of two studies. In our first study, we characterized the temporal profiles of sensations elicited by each chemical. Using these insights, we designed five interactive VR experiences utilizing chemical haptics, and in our second user study, participants rated these VR experiences with chemical haptics as more immersive than without. Finally, as the first work exploring the use of chemical haptics on the skin, we offer recommendations to designers for how they may employ our approach for their interactive experiences.
Software developers frequently confront a recurring challenge of making code transformations—similar but not entirely identical code changes in many places—in their integrated development environments. Through formative interviews (n = 7), we found that developers were aware of many tools intended to help with code transformations, but often made their changes manually because these tools required too much expertise or effort to be able to use effectively. To address these needs, we built an extension for Visual Studio Code, called reCode. reCode improves the familiar find-and-replace experience by allowing the developer to specify a straightforward search term to identify relevant locations, and then demonstrate their intended changes by simply typing a change directly in the editor. Using programming by example, reCode automatically learns a more general code transformation and displays these transformations as before-and-after differences inline, with clickable actions to interactively accept, reject, or refine the proposed changes. In our usability evaluation (n = 12), developers reported that this mixed-initiative, example-driven experience is intuitive, complements their existing workflow, and offers a unified approach to conveniently tackle a variety of common yet frustrating scenarios for code transformations.
Adroid enables users to borrow precision and accuracy from a robotic arm when using hand-held tools. When a tool is mounted to the robot, the user can hold and move the tool directly—Adroid measures the user’s applied forces and commands the robot to move in response. Depending on the tool and scenario, Adroid can selectively restrict certain motions. In the resulting interaction, the robot acts like a virtual “jig” which constrains the tool’s motion, augmenting the user’s accuracy, technique, and strength, while not diminishing their agency during open-ended fabrication tasks. We complement these hands-on interactions with projected augmented reality for visual feedback about the state of the system. We show how tools augmented by Adroid can support hands-on making and discuss how it can be configured to support other tasks within and beyond fabrication.
We present an optimization-based approach that automatically adapts Mixed Reality (MR) interfaces to different physical environments. Current MR layouts, including the position and scale of virtual interface elements, need to be manually adapted by users whenever they move between environments, and whenever they switch tasks. This process is tedious and time consuming, and arguably needs to be automated for MR systems to be beneficial for end users. We contribute an approach that formulates this challenge as a combinatorial optimization problem and automatically decides the placement of virtual interface elements in new environments. To achieve this, we exploit the semantic association between the virtual interface elements and physical objects in an environment. Our optimization furthermore considers the utility of elements for users’ current task, layout factors, and spatio-temporal consistency to previous layouts. All those factors are combined in a single linear program, which is used to adapt the layout of MR interfaces in real time. We demonstrate a set of application scenarios, showcasing the versatility and applicability of our approach. Finally, we show that compared to a naive adaptive baseline approach that does not take semantic associations into account, our approach decreased the number of manual interface adaptations by 33%.
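As a toy illustration of the optimization idea, the sketch below scores element/anchor pairs by semantic association, task utility, and consistency with the previous layout, then solves the resulting assignment problem; the example scores, weights, and the reduction to a simple assignment are our assumptions, not the paper's full linear program.

```python
# Toy version of the layout decision: pick one physical anchor per virtual
# element to maximize semantic association + task utility + consistency.
# All names and numbers below are illustrative assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

elements = ["recipe_panel", "timer", "music_widget"]
anchors = ["fridge", "counter", "wall", "window"]

association = np.array([          # semantic element/anchor association in [0, 1]
    [0.9, 0.7, 0.3, 0.1],
    [0.4, 0.8, 0.5, 0.2],
    [0.2, 0.3, 0.6, 0.7],
])
task_utility = np.array([1.0, 0.8, 0.4])   # importance of each element right now
consistency = np.zeros_like(association)   # bonus for keeping the last placement
consistency[0, 0] = consistency[1, 1] = consistency[2, 3] = 0.2

score = task_utility[:, None] * association + consistency
rows, cols = linear_sum_assignment(-score)  # negate to maximize total score
for e, a in zip(rows, cols):
    print(f"{elements[e]} -> {anchors[a]}")
```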
We introduce FabHydro, a set of rapid and low-cost methods to prototype interactive hydraulic devices based on an off-the-shelf 3D printer and flexible photosensitive resin. We first present printer settings and custom support structures to ensure the successful printing of flexible and deformable objects. We then demonstrate two printing methods to seal the transmission fluid inside these deformable structures: the Submerged Printing process, which seals the liquid resin without manual assembly, and the Printing with Plugs method, which allows the use of different transmission fluids without modification to the printer. Following the printing methods, we report a design space with a range of 3D printable primitives, including hydraulic generators, transmitters, and actuators. To demonstrate the feasibility of our approaches and the breadth of new designs that they enable, we showcase a set of examples, from a printed robotic gripper that can be operated at a distance to a mobile phone stand that serves as a status reminder by repositioning the user’s phone. We conclude with a discussion of our approach’s limitations and possible future improvements.
Knowledge bases, such as the Google Knowledge Graph, contain millions of entities (people, places, etc.) and billions of facts about them. While much is known about entities, little is known about the actions these entities relate to. On the other hand, the Web contains abundant information about human tasks. A website for restaurant reservations, for example, implicitly knows about various restaurant-related actions (making reservations, delivering food, etc.), the inputs these actions require, and their expected output; it can also be automated to execute those actions. To harvest action knowledge from websites, we propose Etna. Users demonstrate how to accomplish various tasks on a website, and Etna constructs an action-state model of the website, visualized as an action graph. An action graph includes definitions of tasks and actions, knowledge about their start/end states, and execution scripts for their automation. We report on our experience building action-state models of many commercial websites and on use cases that leveraged them.
We contribute a novel user- and activity-independent kinematics-based regression model for continuously predicting ballistic hand movements in virtual reality (VR). Compared to prior work on end-point prediction, continuous hand trajectory prediction in VR enables an early estimation of future events, such as collisions between the user’s hand and virtual objects such as UI widgets. We developed and validated our prediction model through a user study with 20 participants. The study collected hand motion data with a 3D pointing task and a gaming task involving three popular VR games. Results show that our model achieves a low Root Mean Square Error (RMSE) of 0.80 cm, 0.85 cm, and 3.15 cm from future hand positions 100 ms, 200 ms, and 300 ms ahead, respectively, across all users and activities. In pointing tasks, our predictive model achieves an average angular error of 4.0° and 1.5° from the true landing position at 50% and 70% of the way through the movement, respectively. A follow-up study showed that the model can be applied to new users and new activities without further training.
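The sketch below conveys the general shape of such a kinematics-based regressor on synthetic 1D reaches, predicting the position 100 ms ahead from the current position, velocity, and acceleration; the features, data, and model are illustrative assumptions and differ from the paper's 3D model and datasets.

```python
# Sketch of a kinematics-based look-ahead regressor on synthetic 1D reaches.
# Real systems would use 3D positions and richer features; this only shows the
# "predict position at t + 100 ms from current kinematics" framing.
import numpy as np
from sklearn.linear_model import Ridge

dt, horizon = 0.01, 0.1                    # 100 Hz tracking, 100 ms look-ahead
steps = int(horizon / dt)
rng = np.random.default_rng(0)

# Smooth ballistic-like reach profiles (0 to 0.3 m) with a little sensor noise.
t = np.arange(0.0, 0.6, dt)
reaches = [0.3 * (3 * (t / 0.6) ** 2 - 2 * (t / 0.6) ** 3)
           + 0.002 * rng.normal(size=t.size) for _ in range(200)]

X, y = [], []
for p in reaches:
    v = np.gradient(p, dt)                 # velocity
    a = np.gradient(v, dt)                 # acceleration
    for i in range(len(p) - steps):
        X.append([p[i], v[i], a[i]])       # kinematic features now
        y.append(p[i + steps])             # position 100 ms in the future
X, y = np.array(X), np.array(y)

model = Ridge().fit(X, y)
rmse = np.sqrt(np.mean((model.predict(X) - y) ** 2))
print(f"training RMSE: {rmse * 100:.2f} cm")
```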
Design tools and research regarding laser-cut architectures have been widely explored in the past decade. However, such discussion has mostly revolved around technical and structural design questions rather than another essential element of laser-cut models — assembly — a process that relies heavily on components’ visual affordances and is therefore less accessible to blind or low vision (BLV) people. To narrow the gap in this area, we co-designed with 7 BLV people to examine their assembly experience with different laser-cut architectures. From their feedback, we proposed several design heuristics and guidelines for Daedalus, a generative design tool that can produce tactile aids for laser-cut assembly given a few high-level manual inputs. We validated the proposed aids in a user study with 8 new BLV participants. Our results revealed that BLV users can manage laser-cut assembly more efficiently with Daedalus. Building on this design iteration, we discuss implications for future research on accessible laser-cut assembly.
Unwanted clutter in a photo can be incredibly distracting. However, in the moment, photographers have so many things to consider simultaneously that it can be hard to catch every detail. Designers have long known the benefits of abstraction for seeing a more holistic view of their design. We wondered if, similarly, some form of image abstraction might be helpful for photographers as an alternative perspective or “lens” with which to see their image. Specifically, we wondered if such abstraction might draw the photographer’s attention away from details in the subject to noticing objects in the background, such as unwanted clutter. We present our process for designing such a camera overlay, based on the idea of using abstraction to recognize clutter. Our final design uses object-based saliency and edge detection to highlight contrast along subject and image borders, outlining potential distractors in these regions. We describe the implementation and evaluation of a capture-time tool that interactively displays these overlays and find that the tool helps make users more confident in their ability to take decluttered photos that clearly convey their intended story.
We present an approach to in-air design drawing based on the two-stage process common in 2D design drawing practice. The primary challenge of drawing in-air in 3D is the accuracy of users’ strokes. Beautifying or auto-correcting an arbitrary drawing in 2D or 3D is challenging due to ambiguities stemming from the many possible interpretations of a stroke. A similar challenge appears when drawing freehand on paper in the real world. 2D design drawing practice (as taught in industrial design school) addresses this by decomposing the process of creating realistic 2D projections of 3D shapes. Designers first create scaffold or construction lines. When drawing shape or structure curves, designers are guided by the scaffolds. Our key insight is that accurate industrial design drawing in 3D becomes tractable when decomposed into auto-correcting scaffold strokes, which have simple relationships with one another, followed by auto-correcting shape strokes with respect to the scaffold strokes. We demonstrate our approach’s effectiveness with an expert study involving industrial designers.
Visual symbols are the building blocks of visual communication. They convey abstract concepts like “reform” and “participation” quickly and effectively. When creating graphics with symbols, novice designers often struggle to brainstorm multiple, diverse symbols because they fixate on a few associations instead of broadly exploring different aspects of the concept. We present SymbolFinder, an interactive tool for finding visual symbols for abstract concepts. SymbolFinder molds symbol-finding into a recognition rather than recall task by introducing the user to diverse clusters of words associated with the concept. Users can dive into these clusters to find related, concrete objects that symbolize the concept. We evaluate SymbolFinder with two studies: a comparative user study demonstrating that SymbolFinder helps novices find more unique symbols for abstract concepts with significantly less effort than a popular image database, and a case study demonstrating how SymbolFinder helped design students create visual metaphors for three cover illustrations of news articles.
In this paper, we propose EIT-kit, an electrical impedance tomography toolkit for designing and fabricating health and motion sensing devices. EIT-kit contains (1) an extension to a 3D editor for personalizing the form factor of electrode arrays and electrode distribution, (2) a customized EIT sensing motherboard for performing the measurements, (3) a microcontroller library that automates signal calibration and facilitates data collection, and (4) an image reconstruction library for mobile devices for interpolating and visualizing the measured data. Together, these EIT-kit components allow for applications that require 2- or 4-terminal setups, up to 64 electrodes, and single or multiple (up to four) electrode arrays simultaneously.
We motivate the design of each component of EIT-kit with a formative study, and conduct a technical evaluation of the data fidelity of our EIT measurements. We demonstrate the design space that EIT-kit enables by showing various applications in health as well as motion sensing and control.
Electrical muscle stimulation (EMS) is an emerging technique that miniaturizes force feedback and is especially popular for untethered haptic applications, such as mobile gaming, VR, or AR. However, the actuation displayed by interactive systems based on EMS is coarse and imprecise. EMS systems mostly focus on inducing movements in large muscle groups such as legs, arms, and wrists; whereas individual finger poses, which would be required, for example, to actuate a user's fingers to fingerspell even the simplest letters in sign language, are not possible. The lack of dexterity in EMS stems from two fundamental limitations: (1) lack of independence: when a particular finger is actuated by EMS, the current runs through nearby muscles, causing unwanted actuation of adjacent fingers; and (2) unwanted oscillations: while it is relatively easy for EMS to start moving a finger, it is very hard for EMS to stop and hold that finger at a precise angle, because, to stop a finger, virtually all EMS systems contract the opposing muscle, typically achieved via controllers (e.g., PID)—unfortunately, even with the best controller tuning, this often results in unwanted oscillations. To tackle these limitations, we propose dextrEMS, an EMS-based haptic device featuring mechanical brakes attached to each finger joint. The key idea behind dextrEMS is that while the EMS actuates the fingers, it is our mechanical brake that stops the finger in a precise position. Moreover, it is also the brakes that allow dextrEMS to select which fingers are moved by EMS, eliminating unwanted movements by preventing adjacent fingers from moving. We implemented dextrEMS as an untethered haptic device, weighing only 68 g, that actuates eight finger joints independently (metacarpophalangeal and proximal interphalangeal joints for four fingers), which we demonstrate in a wide range of haptic applications, such as assisted fingerspelling, a piano tutorial, a guitar tutorial, and a VR game. Finally, in our technical evaluation, we found that dextrEMS outperformed EMS alone by doubling its independence and reducing unwanted oscillations.
We present PneuSeries, a series of modularized inflatables whose inflation and deflation propagate from stage to stage to form various shapes. The key component of PneuSeries is a bidirectional check valve that passively regulates the air flowing in/out from/to adjacent inflatables, allowing each inflatable to be inflated or deflated one by one through serial propagation. The form of the inflatable series is thus programmed by the sequential operations of a pump that pushes/pulls air in/out. In this paper, we explore the design of PneuSeries and implement working prototypes as a proof of concept. In particular, we built PneuSeries with (1) modularized cubical, cuboidal, tetrahedral, prismatic, and custom inflatables to examine their shape forming, (2) fast-assembly connectors to allow quick reconfiguration of the series, and (3) a folding mechanism to reduce irregularity of the shrunken inflatables. We also evaluated the inflation and deflation times and the flow rate of the valve, which our software uses to simulate the inflation and deflation process and to display the steps and time required for a transformation. Finally, we demonstrate example objects that show the capability of PneuSeries and its potential applications.
In this paper, we present HoloBoard, an interactive, large-format, pseudo-holographic display system for lecture-based classes. With its unique properties of immersive visual display and a transparent screen, we designed and implemented a rich set of novel interaction techniques, such as immersive presentation, role-play, and lecturing behind the scene, that are potentially valuable for lecturing in class. We conducted a controlled experimental study to compare a HoloBoard class with a normal class by measuring students’ learning outcomes and three dimensions of engagement (i.e., behavioral, emotional, and cognitive engagement). We used pre-/post-knowledge tests and multimodal learning analytics to measure students’ learning outcomes and learning experiences. Results indicated that the lecture-based class utilizing HoloBoard led to slightly better learning outcomes and a significantly higher level of student engagement. Given the results, we discuss the impact of HoloBoard as an immersive medium in the classroom setting and suggest several design implications for deploying HoloBoard in immersive teaching practices.
Virtual try-on is a promising application of computer graphics and human-computer interaction that can have a profound real-world impact, especially during this pandemic. Existing image-based works try to synthesize a try-on image from a single image of a target garment, but this inherently limits the ability to react to possible interactions. It is difficult to reproduce the change of wrinkles caused by pose and body size changes, as well as the pulling and stretching of the garment by hand. In this paper, we propose an alternative per-garment capture and synthesis workflow that handles such rich interactions by training the model with many systematically captured images. Our workflow is composed of two parts: garment capturing and clothed-person image synthesis. We designed an actuated mannequin and an efficient capturing process that collects detailed deformations of the target garments under diverse body sizes and poses. Furthermore, we propose to use a custom-designed measurement garment and capture paired images of the measurement garment and the target garments. We then learn a mapping between the measurement garment and the target garments using deep image-to-image translation, so that customers can try on the target garments interactively during online shopping. The proposed workflow requires some manual labor, but we believe the cost is acceptable given that retailers already pay significant costs to hire professional photographers, models, stylists, and editors to take promotional photographs; our method removes the need to hire these costly professionals. We evaluated the effectiveness of the proposed system with ablation studies and quality comparisons with previous virtual try-on methods, and a user study shows promising virtual try-on performance. Moreover, we demonstrate the use of our method for changing virtual costumes in video conferences. Finally, we provide the collected dataset, a garment dataset parameterized by viewing angle, body pose, and size.
Automated understanding of user interfaces (UIs) from their pixels can improve accessibility, enable task automation, and facilitate interface design without relying on developers to comprehensively provide metadata. A first step is to infer what UI elements exist on a screen, but current approaches are limited in inferring how those elements are semantically grouped into structured interface definitions. In this paper, we motivate the problem of screen parsing, the task of predicting UI elements and their relationships from a screenshot. We describe our implementation of screen parsing and provide an effective training procedure that optimizes its performance. In an evaluation comparing the accuracy of the generated output, we find that our implementation significantly outperforms current systems (by up to 23%). Finally, we show three example applications that are facilitated by screen parsing: (i) UI similarity search, (ii) accessibility enhancement, and (iii) code generation from UI screenshots.
In the context of increasingly busy lives and mobility constraints, we present a unified space-time approach to support flexible personal scheduling. We distill an analysis of the design requirements of interactive space-time scheduling into a single coherent workflow where users can manipulate a rich vocabulary of spatio-temporal parameters, and plan/explore itineraries that satisfy or optimize the resulting space-time constraints. We demonstrate our approach using a proof-of-concept mobile application that enables exploration of the inter-connected continuum between task scheduling (temporal), and multi-destination route mapping (spatial). We evaluate the application with a user study involving an itinerary reproduction task and a free-form planning task. We also provide usage scenarios illustrating the potential of our approach in various contexts and tasks. Results suggest that our approach fills an important gap between route mapping and calendar scheduling, suggesting a new research direction in personal planning interface design.
Mobile user interface summarization generates succinct language descriptions of mobile screens to convey their important contents and functionalities, which can be useful for many language-based application scenarios. We present Screen2Words, a novel screen summarization approach that automatically encapsulates essential information of a UI screen into a coherent language phrase. Summarizing mobile screens requires a holistic understanding of the multi-modal data of mobile UIs, including text, images, structure, and UI semantics, motivating our multi-modal learning approach. We collected and analyzed a large-scale screen summarization dataset annotated by human workers. Our dataset contains more than 112k language summaries across ∼22k unique UI screens. We then experimented with a set of deep models with different configurations. Our evaluation of these models, with both automatic accuracy metrics and human ratings, shows that our approach can generate high-quality summaries for mobile screens. We demonstrate potential use cases of Screen2Words and open-source our dataset and model to lay the foundation for further bridging language and user interfaces.
People often have to remove their phone from an inaccessible location, like a pocket, to view things like notifications and directions. We explore the idea of viewing such information through the fabric of a pocket using low-resolution, bright LED matrix displays. A survey confirms that viewing information on inaccessible phones is desirable, and establishes the types of pockets in garments worn by respondents and what objects are typically put in pockets. A technical evaluation validates that LED light can shine through many common garment fabrics. Based on these results, functional hardware prototypes are constructed to demonstrate different form factors of through-fabric display devices, such as a phone, a wallet, a key fob, a pen, and an earbud headphone case. A simple interaction vocabulary for viewing key information on these devices is described, and the social and technical aspects of the approach are discussed.
Pointing transfer functions remain predominantly expressed in pixels per input count, which can generate different visual pointer behaviors with different input and output devices; we show in a first controlled experiment that even small hardware differences impact pointing performance with functions defined in this manner. We also demonstrate the applicability of “hardware-independent” transfer functions defined in physical units. We explore two methods to maintain hardware-independent pointer performance in operating systems that require hardware-dependent definitions: scaling them to the resolutions of the input and output devices, or selecting the OS acceleration setting that produces the closest visual behavior. In a second controlled experiment, we adapted a baseline function to different screen and mouse resolutions using both methods, and the resulting functions provided equivalent performance. Lastly, we provide a tool to calculate equivalent transfer functions between hardware setups, allowing users to match pointer behavior with different devices, and researchers to tune and replicate experiment conditions. Our work emphasizes, and hopefully facilitates, the idea that operating systems should have the capability to formulate pointing transfer functions in physical units, and to adjust them automatically to hardware setups.
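To make the unit arithmetic concrete, the short sketch below computes the physical gain implied by a pixel-based setting and rescales it for a new mouse/display pair; the CPI and PPI values are example numbers, and this is our restatement of the conversion rather than the paper's tool.

```python
# Unit arithmetic for hardware-independent pointing gain. A function expressed
# in pixels per count yields a physical gain of
#     gain = px_per_count * mouse_cpi / display_ppi
# (display inches moved per inch of mouse movement), so preserving behavior
# across devices means rescaling px_per_count by the two resolutions.
# The CPI/PPI numbers below are example values.

def physical_gain(px_per_count, mouse_cpi, display_ppi):
    """Display distance divided by mouse distance, both in inches."""
    return px_per_count * mouse_cpi / display_ppi

def rescale_px_per_count(px_per_count, old_cpi, old_ppi, new_cpi, new_ppi):
    """Adjust a pixel-based setting so the physical gain stays the same."""
    gain = physical_gain(px_per_count, old_cpi, old_ppi)
    return gain * new_ppi / new_cpi

# A 2 px/count setting tuned on a 1000 CPI mouse and 96 PPI display...
print(physical_gain(2.0, 1000, 96))                      # ~20.8x physical gain
# ...needs ~2.86 px/count on a 1600 CPI mouse and 220 PPI display to match.
print(rescale_px_per_count(2.0, 1000, 96, 1600, 220))
```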
Video games remain largely inaccessible to visually impaired people (VIPs). Today’s blind-accessible games are highly simplified renditions of what sighted players enjoy, and they do not give VIPs the same freedom to look around and explore game worlds on their own terms. In this work, we introduce NavStick, an audio-based tool for looking around within virtual environments, with the aim of making 3D adventure video games more blind-accessible. NavStick repurposes a game controller’s thumbstick to allow VIPs to survey what is around them via line-of-sight. In a user study, we compare NavStick with traditional menu-based surveying for different navigation tasks and find that VIPs were able to form more accurate mental maps of their environment with NavStick than with menu-based surveying. In an additional exploratory study, we investigate NavStick in the context of a representative 3D adventure game. Our findings reveal several implications for blind-accessible games, and we close by discussing these.
Freehand gesture is an essential input modality for modern Augmented Reality (AR) user experiences. However, developing AR applications with customized hand interactions remains a challenge for end-users. Therefore, we propose GesturAR, an end-to-end authoring tool that enables users to create in-situ freehand AR applications through embodied demonstration and visual programming. During authoring, users can intuitively demonstrate the customized gesture inputs while referring to the spatial and temporal context. Based on a taxonomy of gestures in AR, we propose a hand interaction model that maps gesture inputs to the reactions of the AR contents. Thus, users can author comprehensive freehand applications using trigger-action visual programming and instantly experience the results in AR. Further, we demonstrate multiple application scenarios enabled by GesturAR, such as interactive virtual objects, robots, and avatars, room-level interactive AR spaces, and embodied AR presentations. Finally, we evaluate the performance and usability of GesturAR through a user study.
Text input is essential in tablet computer interaction. However, tablet software keyboards face the problem of misrecognizing unintentional touches, which affects efficiency and usability [29, 49]. In this paper, we propose TypeBoard, a pressure-sensitive touchscreen keyboard that prevents unintentional touches. TypeBoard allows users to rest their fingers on the touchscreen, which changes user behavior: on average, users generate 40.83 unintentional touches every 100 keystrokes. TypeBoard rejects these unintentional touches with an accuracy of 98.88%. A typing study showed that TypeBoard reduced fatigue (p < 0.005) and typing errors (p < 0.01), and improved typing speed on the touchscreen keyboard by 11.78% (p < 0.005). As users can touch the screen without triggering responses, we added tactile landmarks on TypeBoard, allowing users to locate keys by the sense of touch. This feature further improves typing speed, outperforming an ordinary tablet keyboard by 21.19% (p < 0.001). The results show that pressure-sensitive touchscreen keyboards can prevent unintentional touches, improving usability in many respects, such as reducing fatigue, reducing errors, and enabling touch typing on tablets.
Every day we are surrounded by spoken dialog. This medium delivers rich, diverse streams of information auditorily; however, systematically understanding dialog can often be non-trivial. Despite the pervasiveness of spoken dialog, automated speech understanding and quality information extraction remain markedly poor, especially when compared to written prose. Furthermore, compared to understanding text, auditory communication poses many additional challenges, such as speaker disfluencies, informal prose styles, and lack of structure. These concerns all demonstrate the need for an interactive system tailored specifically to speech that helps users understand and navigate the spoken language domain. While individual automatic speech recognition (ASR) and text summarization methods already exist, they are imperfect technologies; neither considers user purpose and intent nor addresses the complications induced by spoken language. Consequently, we design a two-stage ASR and text summarization pipeline and propose a set of semantic segmentation and merging algorithms to resolve these speech modeling challenges. Our system enables users to easily browse and navigate content as well as recover from errors in these underlying technologies. Finally, we present an evaluation of the system which highlights user preference for hierarchical summarization as a tool to quickly skim audio and identify content of interest to the user.
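As one concrete illustration of a segment-merging step, the sketch below merges adjacent ASR segments whose TF-IDF cosine similarity exceeds a threshold; the procedure, threshold, and example transcript are our assumptions, not the system's actual segmentation and merging algorithms.

```python
# Illustrative segment-merging pass: adjacent ASR segments are merged when
# their TF-IDF cosine similarity crosses a threshold, yielding coarser units
# for downstream summarization. Threshold and transcript are assumed examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def merge_segments(segments, threshold=0.3):
    merged = [segments[0]]
    for seg in segments[1:]:
        vec = TfidfVectorizer().fit_transform([merged[-1], seg])
        if cosine_similarity(vec[0], vec[1])[0, 0] >= threshold:
            merged[-1] = merged[-1] + " " + seg      # same topic: extend segment
        else:
            merged.append(seg)                       # topic shift: start new one
    return merged

asr_segments = [
    "the quarterly budget review starts next week",
    "the budget review will uh cover next quarter spending",
    "okay moving on let us talk about the hiring plan",
]
print(merge_segments(asr_segments))   # first two segments merge, third stays
```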
Wearable vibrotactile devices have many potential applications, including sensory substitution for accessibility and notifications. Currently, vibrotactile experimentation is done using large lab setups. However, most practical applications require standalone on-body devices and integration into small form factors. Such integration is time-consuming and requires expertise.
With the goal of democratizing wearable haptics, we introduce VHP, a vibrotactile haptics platform. It includes a low-power miniature electronics board that can drive up to 12 independent channels of haptic signals with arbitrary waveforms at a 2 kHz sampling rate. The platform can drive vibrotactile actuators including linear resonant actuators and voice coils. The control hardware is battery-powered and programmable, and has multiple input options, including serial and Bluetooth, as well as the ability to synthesize haptic signals internally. We developed current-based load sensing, allowing for unique features such as actuator auto-classification and skin-contact quality sensing. Our technical evaluations showed that the system met all our initial design criteria and improves over prior methods: it allows all-day wear, has low latency, has a battery life between 3 and 25 hours, and can run 12 actuators simultaneously.
We demonstrate unique applications that would be time-consuming to develop without the VHP platform. We show that VHP can be used in bracelet, sleeve, and phone-case form factors. The bracelet was programmed with an audio-to-tactile interface and was successfully worn for multiple days over several months by developers. To facilitate wider use of this platform, we open-source our design and plan to make the hardware widely available. We hope this work will motivate the use and study of all-day wearable vibrotactile devices.
We present situated live programming for human-robot collaboration, an approach that enables users with limited programming experience to program collaborative applications for human-robot interaction. Allowing end users, such as shop floor workers, to program collaborative robots themselves would make it easy to “retask” robots from one process to another, facilitating their adoption by small and medium enterprises. Our approach builds on the paradigm of trigger-action programming (TAP) by allowing end users to create rich interactions through simple trigger-action pairings. It enables end users to iteratively create, edit, and refine a reactive robot program while executing partial programs. This live programming approach enables the user to utilize the task space and objects by incrementally specifying situated trigger-action pairs, substantially lowering the barrier to entry for programming or reprogramming robots for collaboration. We instantiate situated live programming in an authoring system where users can create trigger-action programs by annotating an augmented video feed from the robot’s perspective and assign robot actions to trigger conditions. We evaluated this system in a study where participants (n = 10) developed robot programs for solving collaborative light-manufacturing tasks. Results showed that users with little programming experience were able to program HRC tasks in an interactive fashion and our situated live programming approach further supported individualized strategies and workflows. We conclude by discussing opportunities and limitations of the proposed approach, our system implementation, and our study and discuss a roadmap for expanding this approach to a broader range of tasks and applications.
Programming by Demonstration (PbD) is a well-known technique that allows non-programmers to describe interactivity by performing examples of the expected behavior, but it has not been extensively explored for AR. We present Rapido, a novel early-stage prototyping tool to create fully interactive mobile AR prototypes from non-interactive video prototypes using PbD. In Rapido, designers use a mobile AR device to record a video prototype to capture context, sketch assets, and demonstrate interactions. They can demonstrate touch inputs, animation paths, and rules to, e.g., have a sketch follow the focus area of the device or the user’s world-space touches. Simultaneously, a live website visualizes an editable overview of all the demonstrated examples and infers a state machine of the user flow. Our key contribution is a method that enables designers to turn a video prototype into an executable state machine through PbD. The designer switches between these representations to interactively refine the final interactive prototype. We illustrate the power of Rapido’s approach by prototyping the main interactions of three popular AR mobile applications.
Detecting emotions while driving remains a challenge in Human-Computer Interaction. Current methods to estimate the driver’s experienced emotions use physiological sensing (e.g., skin conductance, electroencephalography), speech, or facial expressions. However, these approaches require drivers to wear devices, perform explicit voice interactions, or exhibit robust facial expressiveness. We present VEmotion (Virtual Emotion Sensor), a novel method to predict driver emotions in an unobtrusive way using contextual smartphone data. VEmotion analyzes information including traffic dynamics, environmental factors, in-vehicle context, and road characteristics to implicitly classify driver emotions. We demonstrate the applicability in a real-world driving study (N = 12) to evaluate the emotion prediction performance. Our results show that VEmotion outperforms facial expressions by 29% in a person-dependent classification and by 8.5% in a person-independent classification. We discuss how VEmotion enables empathic car interfaces to sense the driver’s emotions and provide in-situ interface adaptations on the go.
Recent research showed how to import laser-cut 3D models encoded in the form of 2D cutting plans into a 3D editor (assembler3 [28]), which allows users to perform parametric manipulations on such models. In contrast to assembler3, which requires users to perform this process manually, we present autoAssembler, which performs this process automatically. AutoAssembler uses a beam search algorithm to explore possible ways of assembling plates. It uses joints on these plates to combine them into assembly candidates. It preferentially pursues candidates (1) that have no intersecting plates, (2) that fit into a small bounding box, (3) that use plates whose joints fit together well, (4) that do not add many unpaired joints, (5) that make use of constraints posed by other plates, and (6) that conform to the symmetry axes of the plates. This works for models that have at least one edge joint (finger or t-joint). In our technical evaluation, we imported 66 models using autoAssembler. AutoAssembler assembled 79% of those models fully automatically; another 18% of models required on average 2.7 clicks of post-processing, for an overall success rate of 97%.
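As a rough illustration of the search strategy, the generic beam-search skeleton below keeps only the highest-scoring partial assemblies at each step; the `expand` and `score` callbacks (which would enumerate joint-based plate placements and weight criteria (1)-(6)) are placeholders, not autoAssembler's actual code.

```python
# Generic beam search over partial solutions. For plate assembly, `expand(state)`
# would enumerate ways of adding a plate via its joints, and `score(state)` would
# combine heuristics (1)-(6): penalize intersections and unpaired joints, reward
# tight bounding boxes, good joint fit, constraint usage, and symmetry.
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")

def beam_search(initial: S,
                expand: Callable[[S], Iterable[S]],
                score: Callable[[S], float],
                beam_width: int = 20,
                max_depth: int = 50) -> S:
    frontier: List[S] = [initial]
    best = initial
    for _ in range(max_depth):
        children = [c for state in frontier for c in expand(state)]
        if not children:
            break
        children.sort(key=score, reverse=True)
        frontier = children[:beam_width]          # keep only the most promising candidates
        best = max([best, *frontier], key=score)
    return best

# Toy usage: build the 5-character string with the most 'a's by appending characters.
print(beam_search("", lambda s: [s + ch for ch in "ab"] if len(s) < 5 else [],
                  score=lambda s: s.count("a")))
```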
Despite the increasing complexity and scale of people’s online activities, browser interfaces have stayed largely the same since tabs were introduced in major browsers nearly 20 years ago. The gap between simple tab-based browser interfaces and the complexity of users’ tasks can lead to serious adverse effects – commonly referred to as “tab overload.” This paper introduces a Chrome extension called Tabs.do, which explores bringing a task-centric approach to the browser, helping users to group their tabs into tasks and then organize, prioritize, and switch between those tasks fluidly. To lower the cost of importing, Tabs.do uses machine learning to make intelligent suggestions for grouping users’ open tabs into task bundles by exploiting behavioral and semantic features. We conducted a field deployment study where participants used Tabs.do with their real-life tasks in the wild, and showed that Tabs.do can decrease tab clutter, enable users to create rich task structures with lightweight interactions, and allow participants to context-switch among tasks more efficiently.
We introduce HowToCut, an automatic approach that converts a Markdown-formatted tutorial into an interactive video that presents the visual instructions with a synthesized voiceover for narration. HowToCut extracts instructional content from a multimedia document that describes a step-by-step procedure. Our method selects and converts text instructions to a voiceover. It makes automatic editing decisions to align the narration with edited visual assets, including step images, videos, and text overlays. We derive our video editing strategies from an analysis of 125 web tutorials and apply Computer Vision techniques to the assets. To enable viewers to interactively navigate the tutorial, HowToCut’s conversational UI presents instructions in multiple formats upon user commands. We evaluated our automatically-generated video tutorials through user studies (N=20) and validated the video quality via an online survey (N=93). The evaluation shows that our method was able to effectively create informative and useful instructional videos from a web tutorial document for both reviewing and following.
Digital fabrication machines for makers have expanded access to manufacturing processes such as 3D printing, laser cutting, and milling. While digital models encode the data necessary for a machine to manufacture an object, understanding the trade-offs and limitations of the machines themselves is crucial for successful production. Yet, this knowledge is not codified and must be gained through experience, which limits both adoption of and creative exploration with digital fabrication tools. To formally represent machines, we present Taxon, a language that encodes a machine’s high-level characteristics, physical composition, and performable actions. With this programmatic foundation, makers can develop rules of thumb that filter for appropriate machines for a given job and verify that actions are feasible and safe. We integrate the language with a browser-based system for simulating and experimenting with machine workflows. The system lets makers engage with rules of thumb and enrich their understanding of machines. We evaluate Taxon by representing several machines from both common practice and digital fabrication research. We find that while Taxon does not exhaustively describe all machines, it provides a starting point for makers and HCI researchers to develop tools for reasoning about and making decisions with machines.
Despite being an exciting area with much promise for innovation in wearable interactive systems, research on interaction techniques for smart rings lacks structured knowledge and readily-available resources for designers to systematically attain such innovations. In this work, we conduct a systematic literature review of ring-based gesture input, from which we extract key results and a large set of gesture commands for ring, ring-like, and ring-ready devices. We use these findings to deliver GestuRING, our web-based tool to support the design of ring-based gesture input. GestuRING features a searchable gesture-to-function dictionary of 579 records with downloadable numerical data files and an associated YouTube video library. These resources are meant to assist the community in attaining further innovations in ring-based gesture input for interactive systems.
We present MotionRing, a vibrotactile headband that creates illusory tactile motion around the head by controlling the timing of a 1-D, 360° sparse array of vibration motors. Its unique ring shape enables symmetric and asymmetric haptic motion experiences, such as when users pass through a medium and when an object passes nearby in any direction. We first conducted a perception study to understand how factors such as vibration motor timing, spacing, duration, intensity, and head region affect the perception of apparent tactile motion. Results showed that illusory tactile motion around the head can be achieved with 12 and 16 vibration motors at angular speeds between 0.5 and 4.9 revolutions per second. We developed a symmetric and an asymmetric tactile motion pattern to enhance the experience of teleportation in VR and dodging footballs, respectively. We conducted a user study to compare the experience of MotionRing vs. static vibration patterns and visual-only feedback. Results showed that illusory tactile motion significantly improved users’ perception of directionality and enjoyment of motion events, and was most preferred by users.
X-Rings is a novel hand-mounted 360° shape display for Virtual Reality that renders objects in 3D and responds to user-applied touch and grasping force. Designed as a modular stack of motor-driven expandable rings (5.7-7.7 cm diameter), X-Rings renders radially-symmetric surfaces graspable by the user’s whole hand. The device is strapped to the palm, allowing the fingers to freely make and break contact with the device. Capacitance sensors and motor current sensing provide estimates of finger touch states and gripping force. We present the results of a user study evaluating participants’ ability to associate device-rendered shapes with visually-rendered objects as well as a demo application that allows users to freely interact with a variety of objects in a virtual environment.
In gaming, accurately rendering input signals on a display is crucial, both spatially and temporally. However, the asynchronicity between the input and output signal frequencies results in unstable responses called jitter. Recent research modeled this jitter mathematically [2]; however, the effect of jitter on human performance is unknown. In this study, we investigated the empirical effect of asynchronicity-induced jitter using a state-of-the-art high-performance mouse and monitor. In the first part, perceptual user experience under different jitter levels was examined using the ISO 4120:2004 triangle test protocol, and a jitter of over 0.3 ms could be perceived by sensitive subjects. In the second part, we measured pointing task performance for different jitter levels using the ISO 9241-9 (i.e., Fitts’ law) test, and found that pointing performance was unaffected up to a jitter of 1 ms. Finally, we recommend display and mouse combinations based on our results, which indicate the need for a higher mouse polling rate than that of the current standard 1000-Hz USB mouse.
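To give an intuition for where asynchronicity-induced jitter comes from, the toy simulation below (our illustration, not the model from [2]) measures how much the age of the most recent input sample varies across displayed frames for hypothetical mouse polling rates and a hypothetical monitor refresh rate.

```python
# Toy illustration of asynchronicity between input polling and display refresh:
# because frame presentation times do not align with poll times, the staleness
# ("age") of the latest input sample varies from frame to frame.
import numpy as np

def input_age_jitter(poll_hz: float, refresh_hz: float, n_frames: int = 10_000) -> float:
    frame_times = np.arange(1, n_frames + 1) / refresh_hz           # frame presentation times (s)
    poll_period = 1.0 / poll_hz
    last_poll = np.floor(frame_times / poll_period) * poll_period   # most recent poll before each frame
    age_ms = (frame_times - last_poll) * 1000.0                     # input staleness per frame (ms)
    return age_ms.max() - age_ms.min()                              # peak-to-peak variation (ms)

print(f"1000 Hz mouse @ 240 Hz display: ~{input_age_jitter(1000, 240):.2f} ms variation")
print(f"8000 Hz mouse @ 240 Hz display: ~{input_age_jitter(8000, 240):.2f} ms variation")
```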
Data from multifactor HCI experiments often violates the assumptions of parametric tests (i.e., nonconforming data). The Aligned Rank Transform (ART) has become a popular nonparametric analysis in HCI that can find main and interaction effects in nonconforming data, but leads to incorrect results when used to conduct post hoc contrast tests. We created a new algorithm called ART-C for conducting contrast tests within the ART paradigm and validated it on 72,000 synthetic data sets. Our results indicate that ART-C does not inflate Type I error rates, unlike contrasts based on ART, and that ART-C has more statistical power than a t-test, Mann-Whitney U test, Wilcoxon signed-rank test, and ART. We also extended an open-source tool called ARTool with our ART-C algorithm for both Windows and R. Our validation had some limitations (e.g., only six distribution types, no mixed factorial designs, no random slopes), and data drawn from Cauchy distributions should not be analyzed with ART-C.
Communication software such as Clubhouse and Zoom has evolved to be an integral part of many people’s daily lives. However, due to network bandwidth constraints and concerns about privacy, cameras in video conferencing are often turned off by participants. This leads to a situation in which people can only see each other’s profile images, which is essentially an audio-only experience. Even when switched on, video feeds do not provide accurate cues as to who is talking to whom. This paper introduces GazeChat, a remote communication system that visually represents users as gaze-aware 3D profile photos. This satisfies users’ privacy needs while keeping online conversations engaging and efficient. GazeChat uses a single webcam to track whom any participant is looking at, then uses neural rendering to animate all participants’ profile images so that participants appear to be looking at each other. We conducted a remote user study (N=16) to evaluate GazeChat in three conditions: audio conferencing with profile photos, GazeChat, and video conferencing. Based on the results of our user study, we conclude that GazeChat maintains the feeling of presence while preserving more privacy and requiring lower bandwidth than video conferencing, provides a greater level of engagement than audio conferencing, and helps people to better understand the structure of their conversation.
This paper introduces a midair balloon interface, a fast and soft interactive object in mid-air. Our approach tackles the trade-off between safety and speed by controlling a soft helium-filled balloon with external actuators and sensors. We developed a prototype system that uses airborne ultrasound phased arrays to propel a balloon and high-speed stereo cameras to track its motion. This configuration achieves both a high thrust-to-weight ratio and a body soft enough to be safe to collide with. We describe a sight-based interaction and a touch-based interaction that leverage the safety and speed of a midair balloon interface. The sight-based interaction allows the user to keep the object within reach inside their view by controlling the balloon to follow the direction of the user’s face. The touch-based interaction allows the user to manipulate the object directly with their hand, issue a command by moving a finger on its surface, and receive vibrotactile feedback produced by vibrating the balloon with amplitude-modulated ultrasound. We describe the implementation and evaluation of the prototype and explore application scenarios.
Efficient urban layout generation is an interesting and important problem in many applications dealing with computer graphics and entertainment. We introduce a novel framework for intuitive and controllable small- and large-scale urban layout editing. The key inspiration comes from the observation that cities develop in small incremental changes, e.g., a building is replaced or a new road is created. We introduce a set of atomic operations that consistently modify the city; for example, two buildings are merged, a block is split in two, etc. Our second inspiration comes from volumetric editing, such as clay manipulation, where the manipulated material is preserved. The atomic operations are used in interactive brushes that consistently modify the urban layout. The city is populated with agents. As in volume transfer, the brushes attract or repel the agents, and blocks can be merged and populated with smaller buildings. We also introduce a large-scale brush that repairs a part of the city by learning style as distributions of orientations and intersections.
Text entry is an important and frequent task on interactive devices, including augmented reality head-mounted displays (AR HMDs). In current AR HMDs, there are still two main open challenges to overcome for efficient and usable text entry: arm fatigue due to mid-air input and visual occlusion because of their small see-through displays. To address these challenges, we present iText, a hands-free technique for AR HMDs based on an imaginary (invisible) keyboard. We first show that it is feasible and practical to use an imaginary keyboard on AR HMDs. Then, we evaluated its performance and usability with three hands-free selection mechanisms: eye blinks (E-Type), dwell (D-Type), and swipe gestures (G-Type). Our results show that users could achieve average text entry speeds of 11.95, 9.03, and 9.84 words per minute (WPM) with E-Type, D-Type, and G-Type, respectively. Given that iText with E-Type outperformed the other two selection mechanisms in text entry rate and subjective feedback, we ran a third, 5-day study. Our results show that iText with E-Type can achieve an average text entry rate of 13.76 WPM with a mean word error rate of 1.5%. In short, iText enables efficient eyes-free text entry and can be useful for various application scenarios in AR HMDs.
Non-verbal behavior is essential for embodied agents like social robots, virtual avatars, and digital humans. Existing behavior authoring approaches, including keyframe animation and motion capture, are too expensive to use when there are numerous utterances requiring gestures. Automatic generation methods show promising results, but their output quality is not yet satisfactory, and it is hard to modify outputs as a gesture designer wants. We introduce a new gesture generation toolkit, named SGToolkit, which gives higher-quality output than automatic methods and is more efficient than manual authoring. For the toolkit, we propose a neural generative model that synthesizes gestures from speech and accommodates fine-level pose controls and coarse-level style controls from users. A user study with 24 participants showed that the toolkit is favorable over manual authoring, and the generated gestures were also human-like and appropriate to the input speech. The SGToolkit is platform agnostic, and the code is available at https://github.com/ai4r/SGToolkit.
The large potential of force feedback devices for interacting in Virtual Reality (VR) has been illustrated in a plethora of research prototypes. Yet, these devices are still rarely used in practice, and it remains an open challenge how to move this research into practice. To that end, we contribute a participatory design study on the use of haptic feedback devices in the automotive industry. Based on a 10-month observation process with 13 engineers, we developed STRIVE, a string-based haptic feedback device. In addition to the design of STRIVE, this process led to a set of requirements for introducing haptic devices into industrial settings, which center around a need for flexibility regarding forces, comfort, and mobility. We evaluated STRIVE with 16 engineers in five different day-to-day automotive VR use cases. The main results show an increased level of trust and perceived safety as well as further challenges towards moving haptics research into practice.
Head shapes play an important role in 3D character design. In this work, we propose SimpModeling, a novel sketch-based system that helps users, especially amateurs, easily model 3D animal-morphic heads, a prevalent kind of head in character design. Although sketching provides an easy way to depict desired shapes, it is challenging to infer dense geometric information from sparse line drawings. Recently, deep network-based approaches have been proposed to address this challenge and produce rich geometric details from very few strokes. However, while such methods reduce users’ workload, they offer less controllability over target shapes, mainly due to the uncertainty of the neural prediction. Our system tackles this issue and provides good controllability from three aspects: 1) we separate coarse shape design and geometric detail specification into two stages and provide different sketching means for each; 2) in coarse shape design, sketches are used both for shape inference and as geometric constraints to determine global geometry, and in geometric detail crafting, sketches are used for carving surface details; 3) in both stages, we use advanced implicit-based shape inference methods, which are well suited to handling the domain gap between freehand sketches and the synthetic ones used for training. Experimental results confirm the effectiveness of our method and the usability of our interactive system. We also contribute a dataset of high-quality 3D animal heads, which were manually created by artists.
Natural language interfaces (NLIs) have become a prevalent medium for conducting visual data analysis, enabling people with varying levels of analytic experience to ask questions of and interact with their data. While there have been notable improvements with respect to language understanding capabilities in these systems, fundamental user experience and interaction challenges including the lack of analytic guidance (i.e., knowing what aspects of the data to consider) and discoverability of natural language input (i.e., knowing how to phrase input utterances) persist. To address these challenges, we investigate utterance recommendations that contextually provide analytic guidance by suggesting data features (e.g., attributes, values, trends) while implicitly making users aware of the types of phrasings that an NLI supports. We present Snowy, a prototype system that generates and recommends utterances for visual analysis based on a combination of data interestingness metrics and language pragmatics. Through a preliminary user study, we found that utterance recommendations in Snowy support conversational visual analysis by guiding the participants’ analytic workflows and making them aware of the system’s language interpretation capabilities. Based on the feedback and observations from the study, we discuss potential implications and considerations for incorporating recommendations in future NLIs for visual analysis.
Camera tracking is an essential building block in a myriad of HCI applications. For example, commercial VR devices are equipped with dedicated hardware, such as laser-emitting beacon stations, to enable accurate tracking of VR headsets. However, this hardware remains costly. On the other hand, low-cost solutions such as IMU sensors and visual markers exist, but they suffer from large tracking errors. In this work, we bring high accuracy and low cost together to present MoiréBoard, a new 3-DOF camera position tracking method that leverages a seemingly irrelevant visual phenomenon, the moiré effect. Based on a systematic analysis of the moiré effect under camera projection, MoiréBoard requires neither power nor camera calibration. It can be easily made at low cost (e.g., through 3D printing) and is ready to use with any stock mobile device with a camera. Its tracking algorithm is computationally efficient and able to run at a high frame rate. Although it is simple to implement, it tracks devices with high accuracy, comparable to state-of-the-art commercial VR tracking systems.
Model-driven policymaking for epidemic control is a challenging collaborative process. It begins when a team of public-health officials, epidemiologists, and economists construct a reasonably predictive disease model representative of the team’s region of interest as a function of its unique socio-economic and demographic characteristics. As the team considers possible interventions such as school closures, social distancing, vaccination drives, etc., they need to simultaneously model each intervention’s effect on disease spread and economic cost. The team then engages in an extensive what-if analysis process to determine a cost-effective policy: a schedule of when, where and how extensively each intervention should be applied. This policymaking process is often an iterative and laborious programming-intensive effort where parameters are introduced and refined, model and intervention behaviors are modified, and schedules changed. We have designed and developed EpiPolicy to support this effort.
EpiPolicy is a policy aid and epidemic simulation tool that supports the mathematical specification and simulation of disease and population models, the programmatic specification of interventions, and the declarative construction of schedules. EpiPolicy’s design supports a separation of concerns in the modeling process and enables capabilities such as the iterative and automatic exploration of intervention plans with Monte Carlo simulations to find a cost-effective one. We report expert feedback on EpiPolicy. In general, experts found EpiPolicy’s capabilities powerful and transformative compared with their current practice.
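A minimal sketch of the Monte Carlo exploration idea follows; it uses a toy SIR model and random schedules of a single intervention, and its cost weights, parameters, and scoring rule are all illustrative assumptions rather than EpiPolicy's implementation.

```python
# Toy Monte Carlo plan search: sample random schedules for one intervention,
# simulate a simple SIR model in which the intervention reduces transmission,
# and keep the plan with the lowest combined economic and epidemic cost.
import random

def simulate_peak(schedule, days=180, beta=0.3, gamma=0.1, effect=0.6, pop=1_000_000):
    """Peak number of infectious people given a set of intervention-active days."""
    s, i, r = pop - 10.0, 10.0, 0.0
    peak = i
    for day in range(days):
        b = beta * (1.0 - effect) if day in schedule else beta
        new_inf, new_rec = b * s * i / pop, gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        peak = max(peak, i)
    return peak

def monte_carlo_plan(n_samples=500, days=180, cost_per_day=1.0, peak_weight=1e-3, seed=0):
    rng, best = random.Random(seed), None
    for _ in range(n_samples):
        start = rng.randrange(0, days - 14)
        length = rng.randrange(14, 120)
        schedule = set(range(start, min(days, start + length)))
        total_cost = len(schedule) * cost_per_day + peak_weight * simulate_peak(schedule, days)
        if best is None or total_cost < best[0]:
            best = (total_cost, start, length)
    return best

print(monte_carlo_plan())   # (combined cost, start day, duration) of the best sampled plan
```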
When entering text by voice, users may encounter colloquial insertions, inappropriate wording, and recognition errors, which make voice editing difficult. Users need to locate the errors and then correct them. In eyes-free scenarios, this select-modify mode brings a cognitive burden and a risk of error. This paper introduces neural networks and pre-trained models to understand users’ revision intention based on semantics, reducing the amount of information users must state explicitly. We present two strategies. One is to remove colloquial insertions automatically. The other is to allow users to edit by just speaking the target words, without having to say the context or the incorrect text. Accordingly, our approach can predict whether to insert or replace, which incorrect text to replace, and where to insert. We implement these strategies in SmartEdit, an eyes-free voice input agent controlled with earphone buttons. The evaluation shows that our techniques reduce the cognitive load and decrease the average failure rate by 54.1% compared to descriptive commands or re-speaking.
Interface designers often refer to UI behavior examples found in the wild (e.g., commercial websites) for reference or design inspiration. While past research has looked at retargeting interface and webpage design, limited work has explored the challenges in retargeting interactive visual behaviors. We introduce Umitation, a system that helps designers extract, edit, and adapt example front-end UI behaviors to target websites. Umitation can also help designers specify the desired behaviors and reconcile their intended interaction details with their existing UI. In a qualitative evaluation, we found evidence that Umitation helps participants extract and retarget dynamic front-end UI behavior examples quickly and expressively.
Effective haptic feedback in virtual reality (VR) is an essential element for creating convincing immersive experiences. To design such feedback, state-of-the-art VR setups provide APIs for programmatically generating controller vibration patterns. While tools for designing vibrotactile feedback keep evolving, they often require expert knowledge and rarely support direct manipulation methods for mapping feedback to user interactions within the VR environment. To address these challenges, we contribute a novel concept called Weirding Haptics, which supports fast prototyping by leveraging the user’s voice to design such feedback while manipulating virtual objects in-situ. Through a pilot study (N = 9) focusing on how tactile experiences are vocalized during object manipulation, we identify spatio-temporal mappings and supporting features needed to produce intended vocalizations. To study our concept, we built a VR design tool informed by the results of the pilot study. This tool enables users to design tactile experiences using their voice while manipulating objects, provides a set of modifiers for fine-tuning the created experiences in VR, and allows users to rapidly compare various experiences by feeling them. Results from a validation study (N = 8) show that novice hapticians can vocalize experiences and refine their designs with the fine-tuning modifiers to match their intentions. We conclude our work by discussing uncovered design implications for direct manipulation and vocalization of vibrotactile feedback in immersive virtual environments.
Tactile feedback of an object’s surface enables us to discern its material properties and affordances. This understanding is used in digital fabrication processes by creating objects with high-resolution surface variations to influence a user’s tactile perception. As the design of such surface haptics commonly relies on knowledge from real-life experiences, it is unclear how to adapt this information for digital design methods. In this work, we investigate replicating the haptics of real materials. Using an existing process for capturing an object’s microgeometry, we digitize and reproduce the stable surface information of a set of 15 fabric samples. In a psychophysical experiment, we evaluate the tactile qualities of our set of original samples and their replicas. From our results, we see that direct reproduction of surface variations is able to influence different psychophysical dimensions of the tactile perception of surface textures. While the fabrication process did not preserve all properties, our approach underlines that replication of surface microgeometries benefits fabrication methods in terms of haptic perception by covering a large range of tactile variations. Moreover, by changing the surface structure of a single fabricated material, its material perception can be influenced. We conclude by proposing strategies for capturing and reproducing digitized textures to better resemble the perceived haptics of the originals.
We present Roadkill, a software tool that converts 3D models to 2D cutting plans for laser cutting—such that the resulting layouts allow for fast assembly. Roadkill achieves this by putting all relevant information into the cutting plan: (1) Thumbnails indicate which area of the model a set of parts belongs to. (2) Parts with exposed finger joints are easy to access, thereby suggesting to start assembly here. (3) Openings in the sheet act as jigs, affording assembly within the sheet. (4) Users continue assembly by inserting what has already been assembled into parts that are immediately adjacent or are pointed to by arrows. Roadkill maximizes the number of joints rendered in immediate adjacency by breaking down models into “subassemblies.” Within a subassembly, Roadkill holds the parts together using break-away tabs. (5) Users complete subassemblies according to their labels 1, 2, 3…, following 1 -> 1 links to insert subassemblies into other subassemblies, until all parts come together. In our user study, Roadkill allowed participants to assemble layouts 2.4 times faster than layouts generated by a traditional pair-wise labeling of plates.
We propose a haptic device that alters the perceived softness of real rigid objects without requiring the objects to be instrumented. Instead, our haptic device works by restricting the user's fingerpad lateral deformation via a hollow frame that squeezes the sides of the fingerpad. This causes the fingerpad to become bulgier than it originally was—when users touch an object's surface with their now-restricted fingerpad, they feel the object to be softer than it is. To illustrate the extent of the softness illusion induced by our device, touching the tip of a wooden chopstick feels as soft as a rubber eraser. Our haptic device operates by pulling the hollow frame using a motor. Unlike most wearable haptic devices, which cover up the user's fingerpad to create force sensations, our device creates softness while leaving the center of the fingerpad free, which allows users to feel most of the object they are interacting with. This makes our device a unique contribution to altering the softness of everyday objects, creating “buttons” by softening protrusions of existing appliances or tangibles, or even altering the softness of handheld props for VR. Finally, we validated our device through two studies: (1) a psychophysics study showed that the device brings down the perceived softness of any object between 50A and 90A to around 40A (on the Shore A hardness scale); and (2) a user study demonstrated that participants preferred our device for interactive applications that leverage haptic props, such as making a VR prop feel softer or making a rigid 3D-printed remote control feel softer on its button.
Today’s touchscreen devices commonly detect the coordinates of user input through capacitive sensing. Yet, these coordinates are the mere 2D manifestations of the more complex 3D configuration of the whole hand—a sensation that touchscreen devices so far remain oblivious to. In this work, we introduce the problem of reconstructing a 3D hand skeleton from capacitive images, which encode the sparse observations captured by touch sensors. These low-resolution images represent intensity mappings that are proportional to the distance to the user’s fingers and hands.
We present the first dataset of capacitive images with corresponding depth maps and 3D hand pose coordinates, comprising 65,374 aligned records from 10 participants. We introduce our supervised method TouchPose, which learns a 3D hand model and a corresponding depth map using a cross-modal trained embedding from capacitive images in our dataset. We quantitatively evaluate TouchPose’s accuracy in touch classification, depth estimation, and 3D joint reconstruction, showing that our model generalizes to hand poses it has never seen during training and can infer joints that lie outside the touch sensor’s volume.
Enabled by TouchPose, we demonstrate a series of interactive apps and novel interactions on multitouch devices. These applications show TouchPose’s versatile capability to serve as a general-purpose model, operating independent of use-case, and establishing 3D hand pose as an integral part of the input dictionary for application designers and developers. We also release our dataset, code, and model to enable future work in this domain.
Fiber – a primitive yet ubiquitous form of material – intertwines with our bodies and surroundings, from constructing our fibrous muscles that enable our movement, to forming fabrics that intimately interface with our skin. In soft robotics and advanced materials science research, actuated fibers are gaining interest as thin, flexible materials that can morph in response to external stimuli. In this paper, we build on fluidic artificial muscle research to develop OmniFiber - a soft, line-based material system for designing movement-based interactions. We devised thin actuated fluidic fibers (outer ø < 1.8 mm) with integrated soft sensors that exhibit perceivably strong forces, up to 19 N at 0.5 MPa, and a high linear actuation speed peaking at 150 mm/s. These allow them to be flexibly woven into everyday tangible interactions, including on-body haptic devices for embodied learning, synchronized tangible interfaces for remote communication, and robotic crafting for expressivity. The design of such interactive capabilities is supported by OmniFiber’s design space, accessible fabrication pipeline, and a fluidic I/O control system to bring omni-functional fluidic fibers to the HCI toolbox of interactive morphing materials.
Facial paralysis is the most common facial nerve disorder. It causes functional and aesthetic deficits that often lead to emotional distress and affect psychosocial well-being. One form of treatment is mirror therapy, which has shown potential but has several mirror-related drawbacks that limit its effectiveness. We propose FaraPy, the first mobile augmented reality mirror therapy system for facial paralysis that provides real-time feedback and tracks user progress over time. We developed an efficient convolutional neural network to detect muscle activations and intensities as users perform facial exercises in front of a mobile device camera. Our model outperforms the state-of-the-art model on benchmark data for detecting facial action unit intensity. Our user study (n=20) shows high user satisfaction and greater preference for our interactive system over traditional mirror therapy.
In many engineering disciplines such as circuit board, chip, and mechanical design, a hardware description language (HDL) approach provides important benefits over direct manipulation interfaces by supporting concepts like abstraction and generator meta-programming. While several such HDLs have emerged recently and promise power and flexibility, they also present challenges – especially to designers familiar with current graphical workflows. In this work, we investigate an IDE approach that provides a graphical editor for a board-level circuit design HDL. Unlike GUI builders, which convert an entire diagram to code, we instead propose generating equivalent HDL from individual graphical edit actions. By keeping code as the primary design input, we preserve the full power of the underlying HDL while remaining useful even to advanced users. We discuss our concept, design considerations such as performance, and our system implementation, and report on the results of an exploratory remote user study with four experienced hardware designers.
3D position tracking on smartphones has the potential to unlock a variety of novel applications, but has not been made widely available due to limitations in smartphone sensors. In this paper, we propose ReflecTrack, a novel 3D acoustic position tracking method for commodity dual-microphone smartphones. A ubiquitous speaker (e.g., smartwatch or earbud) generates inaudible Frequency Modulated Continuous Wave (FMCW) acoustic signals that are picked up by both smartphone microphones. To enable 3D tracking with two microphones, we introduce a reflective surface that can be easily found in everyday objects near the smartphone. Thus, the microphones can receive sound from the speaker and echoes from the surface for FMCW-based acoustic ranging. To simultaneously estimate the distances from the direct and reflective paths, we propose the echo-aware FMCW technique with a new signal pattern and target detection process. Our user study shows that ReflecTrack achieves a median error of 28.4 mm in a 60 cm × 60 cm × 60 cm space and 22.1 mm in a 30 cm × 30 cm × 30 cm space for 3D positioning. We demonstrate the easy accessibility of ReflecTrack using everyday surfaces and objects with several typical applications of 3D position tracking, including 3D input for smartphones, fine-grained gesture recognition, and motion tracking in smartphone-based VR systems.
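The core of FMCW acoustic ranging can be illustrated with a short, idealized simulation (made-up parameters; this is a generic sketch, not ReflecTrack's echo-aware signal pattern or detection process): mixing the received, delayed chirp with the transmitted one yields a beat frequency proportional to the path delay, and therefore to the path length.

```python
# Idealized FMCW ranging sketch: de-chirp by mixing, then read the beat frequency.
import numpy as np

fs = 48_000                      # audio sample rate (Hz)
T = 0.05                         # chirp duration (s)
f0, B = 17_000, 4_000            # inaudible 17-21 kHz sweep (illustrative values)
c = 343.0                        # speed of sound (m/s)

t = np.arange(int(fs * T)) / fs
tx = np.cos(2 * np.pi * (f0 * t + 0.5 * (B / T) * t**2))   # transmitted chirp

true_path = 0.50                                            # path length in meters
delay = int(round(true_path / c * fs))                      # propagation delay in samples
rx = np.roll(tx, delay)                                     # received (delayed) chirp, no noise

mixed = tx * rx                                             # de-chirp by mixing
spectrum = np.abs(np.fft.rfft(mixed * np.hanning(len(mixed))))
freqs = np.fft.rfftfreq(len(mixed), 1 / fs)
beat = freqs[np.argmax(spectrum[freqs < 2_000])]            # beat tone sits at low frequency
est_path = beat * c * T / B                                 # path length estimate (m)
print(f"estimated path: {est_path:.3f} m (true {true_path} m)")
```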
In this paper, we present a method to integrate sensing capabilities into 3D printable metamaterial structures comprised of cells, which enables the creation of monolithic input devices for HCI. We accomplish this by converting select opposing cell walls within the metamaterial device into electrodes, thereby creating capacitive sensors. When a user interacts with the object and applies a force, the distance and overlapping area between opposing cell walls change, resulting in a measurable capacitance variation.
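A rough parallel-plate approximation (our illustration, not the paper's calibration model) conveys why deformation changes the measured capacitance: capacitance grows with the overlapping wall area and shrinks with the gap between the walls.

```python
# Back-of-the-envelope parallel-plate model of two opposing conductive cell walls:
# C = eps0 * eps_r * A / d. Dimensions and permittivity below are illustrative.
EPS0 = 8.854e-12  # vacuum permittivity (F/m)

def cell_capacitance(overlap_area_mm2: float, gap_mm: float, eps_r: float = 1.0) -> float:
    """Capacitance in picofarads between two parallel cell walls."""
    area_m2 = overlap_area_mm2 * 1e-6
    gap_m = gap_mm * 1e-3
    return EPS0 * eps_r * area_m2 / gap_m * 1e12  # pF

# Compressing a 10 mm x 10 mm cell from a 5 mm to a 4 mm gap raises the capacitance:
print(f"{cell_capacitance(100, 5.0):.3f} pF -> {cell_capacitance(100, 4.0):.3f} pF")
```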
To help designers create interactive metamaterial devices, we contribute a design and fabrication pipeline based on multi-material 3D printing. Our 3D editor automatically places conductive cells in locations that are most affected by deformation during interaction and thus are most suitable as sensors. On export, our editor creates two files, one for conductive and one for non-conductive cell walls, which designers can fabricate on a multi-material 3D printer. Our applications show that designers can create metamaterial devices that sense various interactions, including sensing acceleration, binary state, shear, and magnitude and direction of applied force.
Trusscillator is an end-to-end system that allows non-engineers to create human-scale, human-powered devices that perform oscillatory movements, such as playground equipment, workout devices, and interactive kinetic installations. While recent research has focused on generating mechanisms that produce specific movement paths without considering the energy required for the motion (a kinematic approach), Trusscillator supports users in designing mechanisms that recycle energy in the system in the form of oscillating mechanisms (a dynamic approach), specifically with the help of coil springs. The presented system features a novel set of tools tailored for designing the dynamic experience of the motion. These tools allow designers to focus on user-experience-specific aspects, such as motion range, tempo, and effort, while abstracting away the underlying technicalities of eigenfrequencies, spring constants, and energy. Since the forces involved in the resulting devices can be high, Trusscillator helps users fabricate from steel by picking out appropriate steel springs, generating part lists, and producing stencils and welding jigs that help weld with precision. To validate our system, we designed, built, and tested a series of unique playground equipment featuring 2–4 degrees of movement.
Virtual assistants like Google Assistant and Siri often interface with external apps when they cannot directly perform a task. Currently, developers must manually expose the capabilities of their apps to virtual assistants, using App Actions on Android or Shortcuts on iOS. This paper presents savant, a system that automatically generates task shortcuts for virtual assistants by mapping user tasks to relevant UI screens in apps. For a given natural language task (e.g., “send money to Joe”), savant leverages text and semantic information contained within UIs to identify relevant screens, and intent modeling to parse and map entities (e.g., “Joe”) to required UI inputs. Therefore, savant allows virtual assistants to interface with apps and handle new tasks without requiring any developer effort. To evaluate savant, we performed a user study to identify common tasks users perform with virtual assistants. We then demonstrate that savant can find relevant app screens for those tasks and autocomplete the UI inputs.
Mix-and-match character creation tools enable users to quickly produce 2D character illustrations by combining various predefined accessories, like clothes and hairstyles, which are represented as separate, interchangeable artwork layers. However, these accessory layers are often designed to fit only the default body artwork, so users cannot modify the body without manually updating all the accessory layers as well. To address this issue, we present a method that captures and preserves important relationships between artwork layers so that the predefined accessories adapt with the character’s body. We encode these relationships with four types of constraints that handle common interactions between layers: (1) occlusion, (2) attachment at a point, (3) coincident boundaries, and (4) overlapping regions. A rig is a set of constraints that allow a motion or deformation specified on the body to transfer to the accessory layers. We present an automated algorithm for generating such a rig for each accessory layer, but also allow users to select which constraints to apply to specific accessories. We demonstrate how our system supports a variety of modifications to body shape and pose using artwork from mix-and-match data sets.
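For illustration, the four constraint types and a rig could be represented with data structures like the following; the names and fields are hypothetical, not the paper's implementation.

```python
# Illustrative data structures for the four constraint types that relate
# accessory layers to the body artwork, and for the rig that collects them.
from dataclasses import dataclass
from enum import Enum, auto
from typing import List

class ConstraintType(Enum):
    OCCLUSION = auto()            # layer ordering relative to the body
    POINT_ATTACHMENT = auto()     # accessory pinned to a point on the body
    COINCIDENT_BOUNDARY = auto()  # accessory boundary follows a body boundary
    OVERLAPPING_REGION = auto()   # accessory region deforms with a body region

@dataclass
class Constraint:
    kind: ConstraintType
    accessory_layer: str
    body_anchor: str              # e.g., a point, boundary curve, or region identifier

@dataclass
class Rig:
    """The set of constraints that transfers body motion/deformation to an accessory."""
    accessory_layer: str
    constraints: List[Constraint]

hat_rig = Rig("hat", [Constraint(ConstraintType.POINT_ATTACHMENT, "hat", "head_top"),
                      Constraint(ConstraintType.OCCLUSION, "hat", "head")])
```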
Today’s consumer virtual reality (VR) systems offer immersive graphics and audio, but haptic feedback is rudimentary – delivered through controllers with vibration feedback – or non-existent (i.e., the hands operating freely in the air). In this paper, we explore an alternative, highly mobile and controller-free approach to haptics, where VR applications utilize the user’s own body to provide physical feedback. To achieve this, we warp (retarget) the locations of a user’s hands such that one hand serves as a physical surface or prop for the other hand. For example, a hand holding a virtual nail can serve as a physical backstop for a hand that is virtually hammering, providing a sense of impact in an airborne and uninstrumented experience. To illustrate this rich design space, we implemented twelve interactive demos across three haptic categories. We conclude with a user study from which we draw design recommendations.
The steering law reveals a linear relationship between movement time (MT) and the index of difficulty (ID) in trajectory-based steering tasks. However, it does not relate the variance or distribution of MT to ID. In this paper, we propose and evaluate models that predict the variance and distribution of MT based on ID for steering tasks. We first propose a quadratic variance model, which reveals that the variance of MT is quadratically related to ID with the linear coefficient being 0. Empirical evaluation on a new dataset and a previously collected dataset shows that the quadratic variance model accounts for between 78% and 97% of the variance of observed MT variances; it outperforms other model candidates such as linear and constant models, and adding the linear coefficient leads to no improvement in model fitness. The variance model enables predicting the distribution of MT given ID: we can use the variance model to predict the variance (or scale) parameter and the steering law to predict the mean (or location) parameter of a distribution. We evaluated six types of distributions for predicting the distribution of MT. Our investigation also shows that positively skewed distributions such as Gamma, Lognormal, Exponentially Modified Gaussian (ExGaussian), and Extreme Value outperformed symmetric distributions such as Gaussian and truncated Gaussian in predicting the MT distribution, and the Gamma distribution performed slightly better than the other positively skewed distributions. Overall, our research advances the MT prediction of steering tasks from a point estimate to variance and distribution estimates, which provides a more complete understanding of steering behavior and quantifies the uncertainty of MT prediction.
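In code, the combined prediction scheme might look like the sketch below; the coefficients are placeholders that would be fit to data, and parameterizing a Gamma distribution from the predicted mean and variance is one straightforward choice consistent with the reported findings.

```python
# Sketch: steering law predicts mean MT, the quadratic variance model predicts
# Var(MT), and a Gamma distribution is parameterized to match both moments.
from scipy import stats

def predict_mt_distribution(ID, a=0.2, b=0.12, c=0.01, d=0.002):
    mean = a + b * ID          # steering law: MT = a + b * ID (placeholder coefficients)
    var = c + d * ID**2        # quadratic variance model (linear term ~ 0)
    shape = mean**2 / var      # Gamma shape from mean and variance
    scale = var / mean         # Gamma scale
    return stats.gamma(a=shape, scale=scale)

dist = predict_mt_distribution(ID=15.0)
print(dist.mean(), dist.std(), dist.ppf(0.95))   # e.g., 95th-percentile completion time
```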
We present HelpViz, a tool for generating contextual visual mobile tutorials from text-based instructions that are abundant on the web. HelpViz transforms text instructions to graphical tutorials in batch, by extracting a sequence of actions from each text instruction through an instruction parsing model, and executing the extracted actions on a simulation infrastructure that manages an array of Android emulators. The automatic execution of each instruction produces a set of graphical and structural assets, including images, videos, and metadata such as clicked elements for each step. HelpViz then synthesizes a tutorial by combining parsed text instructions with the generated assets, and contextualizes the tutorial to user interaction by tracking the user’s progress and highlighting the next step. Our experiments with HelpViz indicate that our pipeline improved tutorial execution robustness and that participants preferred tutorials generated by HelpViz over text-based instructions. HelpViz promises a cost-effective approach for generating contextual visual tutorials for mobile interaction at scale.
People with low vision interact with smartphones using assistive technologies like screen magnifiers, which provide built-in touch gestures to pan and zoom onscreen content. These gestures are often cumbersome and require bimanual interaction. Of particular interest are panning gestures, which are issued frequently and involve 2- or 3-finger dragging. This paper aims to utilize tilt-based interaction as a single-handed alternative to built-in panning gestures. To that end, we first identified our design space from the literature and conducted an exploratory user study with 12 low-vision participants to understand key challenges. Among many findings, the study revealed that built-in panning gestures are error-prone, and most tilt-based interaction techniques are designed for sighted users, so low-vision users struggle to use them as-is. We addressed these challenges by adapting to low-vision users’ interaction behavior and propose Tilt-Explore, a new screen magnifier mode that enables tilt-to-pan. A second study with 16 low-vision participants revealed that, compared to built-in gestures, participants were significantly less error-prone with Tilt-Explore, and at lower magnification scales (e.g., <4x) they were significantly more efficient. These findings indicate Tilt-Explore is a promising alternative to built-in panning gestures.
Clinical documentation can be transformed by Electronic Health Records, yet documentation remains a tedious, time-consuming, and error-prone process. Clinicians are faced with multi-faceted requirements and fragmented interfaces for information exploration and documentation. These challenges are only exacerbated in the Emergency Department—clinicians often see 35 patients in one shift, during which they have to synthesize an often previously unknown patient’s medical records in order to reach a tailored diagnosis and treatment plan. To better support this information synthesis, clinical documentation tools must enable rapid contextual access to the patient’s medical record. MedKnowts is an integrated note-taking editor and information retrieval system that unifies the documentation and search process and provides concise, synthesized, concept-oriented slices of the patient’s medical record. MedKnowts automatically captures structured data while still allowing users the flexibility of natural language. MedKnowts leverages this structure to enable easier parsing of long notes, auto-populated text, and proactive information retrieval, easing the documentation burden.
In this paper, we present a method that makes 3D objects appear differently under different viewpoints. We accomplish this by 3D printing lenticular lenses across the curved surface of objects. By calculating the lens distribution and the corresponding surface color patterns, we can determine which appearance is shown to the user at each viewpoint.
We built a 3D editor that takes as input the 3D model and the visual appearances, i.e., the images to show at different viewpoints. Our 3D editor then calculates the corresponding lens placements and underlying color pattern. On export, the user can use ray tracing to live-preview the resulting appearance from each angle. The 3D model, color pattern, and lenses are then 3D printed in one pass on a multi-material 3D printer to create the final 3D object.
To determine the best fabrication parameters for 3D printing lenses, we printed lenses of different sizes and tested various post-processing techniques. To support a large number of different appearances, we compute the lens geometry that has the best trade-off between the number of viewpoints and the protrusion from the object geometry. Finally, we demonstrate our system in practice with a range of use cases for which we show the simulated and physical results side by side.
Touch point distribution models are important tools for designing touchscreen interfaces. In this paper, we investigate how the finger movement direction affects the touch point distribution, and how to account for it in modeling. We propose the Rotational Dual Gaussian model, a refinement and generalization of the Dual Gaussian model, to account for the finger movement direction in predicting touch point distribution. In this model, the major axis of the prediction ellipse of the touch point distribution is along the finger movement direction, and the minor axis is perpendicular to it. We also propose using projected target width and height, in lieu of nominal target width and height, to model the touch point distribution. Evaluation on three empirical datasets shows that the new model reflects the observation that the touch point distribution is elongated along the finger movement direction, and outperforms the original Dual Gaussian model in all prediction tests. Compared with the original Dual Gaussian model, the Rotational Dual Gaussian model reduces the RMSE of touch error rate prediction from 8.49% to 4.95%, and more accurately predicts the touch point distribution in target acquisition. Using the Rotational Dual Gaussian model can also improve soft keyboard decoding accuracy on smartwatches.
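One way to picture the model is as a bivariate Gaussian whose covariance is rotated into the finger movement direction. The sketch below uses placeholder coefficients and a dual-Gaussian-style variance term purely for illustration; it is not the paper's fitted parameterization.

```python
# Sketch of a movement-aligned touch-point covariance: variances are set in the
# frame aligned with the finger movement direction, then rotated into screen space.
import numpy as np

def rotational_covariance(proj_w, proj_h, angle_rad, alpha=0.01, sigma_a2=1.0):
    # Per-axis variances in the movement-aligned frame (placeholder dual-Gaussian
    # style: an absolute component plus a component growing with projected size).
    var_along = sigma_a2 + alpha * proj_w**2    # along the movement direction (major axis)
    var_across = sigma_a2 + alpha * proj_h**2   # perpendicular to the movement (minor axis)
    R = np.array([[np.cos(angle_rad), -np.sin(angle_rad)],
                  [np.sin(angle_rad),  np.cos(angle_rad)]])
    return R @ np.diag([var_along, var_across]) @ R.T

cov = rotational_covariance(proj_w=8.0, proj_h=6.0, angle_rad=np.pi / 4)
touches = np.random.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=1000)  # simulated touches
print(cov)
```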
Understanding which hand a user holds a smartphone with can help improve the mobile interaction experience. For instance, the layout of the user interface (UI) can be adapted to the holding hand. In this paper, we present HandyTrak, an AI-powered software system that recognizes the holding hand on a commodity smartphone using body silhouette images captured by the front-facing camera. The silhouette images are processed and sent to a customized, user-dependent deep learning model (CNN) to infer how the user holds the smartphone (left hand, right hand, or both hands). In a user study with ten participants, we evaluated our system on each participant’s smartphone at five possible front camera positions, under two hand positions (in the middle and skewed) and three common usage cases (standing, sitting, and resting against a desk). The results showed that HandyTrak was able to continuously recognize the holding hand with an average accuracy of 89.03% (SD: 8.98%) at a 2 Hz sampling rate. We also discuss the challenges and opportunities of deploying HandyTrak on different commodity smartphones and potential applications in real-world scenarios.
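As an illustration of the recognition pipeline's final stage (the architecture, input resolution, and layer sizes below are assumptions, not HandyTrak's actual user-dependent model), a small CNN can map a single-channel silhouette image to one of the three holding-hand classes:

```python
# Minimal illustrative 3-class classifier for silhouette images (left, right, both).
import torch
import torch.nn as nn

class HoldingHandCNN(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, 1, H, W) silhouettes
        return self.classifier(self.features(x).flatten(1))

logits = HoldingHandCNN()(torch.rand(4, 1, 64, 64))   # 4 fake silhouette frames
print(logits.argmax(dim=1))                           # predicted class per frame
```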
With portable and affordable gaze input devices being marketed to end users, gaze-based interaction is becoming increasingly popular. Unfortunately, our understanding of the dominant gaze-input task, eye pointing, remains insufficient even though a performance model has been proposed in previous work, because 1) the original model was based on a specific circular-target condition and cannot predict performance when acquiring conventional rectangular targets, and 2) it lacks an explanation grounded in the anatomical structure of the eyes. In this paper, we propose a 2D extension that accounts for more general target conditions. In two experiments, we evaluated the effectiveness of the new model and found that the index of difficulty we redefined for 2D eye pointing (IDeye) properly reflects the asymmetrical impacts of target width and height; consequently, the IDeye model predicts performance when acquiring 2D targets more accurately than Fitts’ law, regardless of the kind of saccades or eye orientations (i.e., saccadic eye movement directions) employed to acquire the desired targets. Based on these results, we provide useful implications and recommendations for gaze-based applications.
Research software is often built as prototypes that never get widespread usage and are left unmaintained after a few papers get published. To counteract this trend, we propose a method for building research software with scale and sustainability in mind so that it can organically grow a large userbase and enable longer-term research. To illustrate this method, we present the design and implementation of Python Tutor (pythontutor.com), a code visualization tool that is, to our knowledge, one of the most widely-used pieces of research software developed within a university lab. Over the past decade, it has been used by over ten million people in over 180 countries. It has also contributed to 55 publications from 35 research groups in 13 countries. We distilled lessons from working on Python Tutor into three sets of design guidelines: 1) user experience design for scale and sustainability, 2) software architecture design for long-term sustainability, and 3) designing a sustainable software development workflow within academia. These guidelines can enable a student to create long-lasting software that reaches many users and facilitates research from many independent groups.
AirConstellations supports a unique semi-fixed style of cross-device interactions via multiple self-spatially-aware armatures to which users can easily attach (or detach) tablets and other devices. In particular, AirConstellations affords highly flexible and dynamic device formations where the users can bring multiple devices together in-air — with 2–5 armatures poseable in 7DoF within the same workspace — to suit the demands of their current task, social situation, app scenario, or mobility needs. This affords an interaction metaphor where relative orientation, proximity, attaching (or detaching) devices, and continuous movement into and out of ad-hoc ensembles can drive context-sensitive interactions. Yet all devices remain self-stable in useful configurations even when released in mid-air.
We explore flexible physical arrangement, feedforward of transition options, and layering of devices in-air across a variety of multi-device app scenarios. These include video conferencing with flexible arrangement of the person-space of multiple remote participants around a shared task-space, layered and tiled device formations with overview+detail and shared-to-personal transitions, and flexible composition of UI panels and tool palettes across devices for productivity applications. A preliminary interview study highlights user reactions to AirConstellations, such as for minimally disruptive device formations, easier physical transitions, and balancing “seeing and being seen” in remote work.
HapticBots introduces a novel encountered-type haptic approach for Virtual Reality (VR) based on multiple tabletop-size shape-changing robots. These robots move on a tabletop and change their height and orientation to haptically render various surfaces and objects on-demand. Compared to previous encountered-type haptic approaches like shape displays or robotic arms, our proposed approach has an advantage in deployability, scalability, and generalizability—these robots can be easily deployed due to their compact form factor. They can support multiple concurrent touch points in a large area thanks to the distributed nature of the robots. We propose and evaluate a novel set of interactions enabled by these robots which include: 1) rendering haptics for VR objects by providing just-in-time touch-points on the user’s hand, 2) simulating continuous surfaces with the concurrent height and position change, and 3) enabling the user to pick up and move VR objects through graspable proxy objects. Finally, we demonstrate HapticBots with various applications, including remote collaboration, education and training, design and 3D modeling, and gaming and entertainment.
Today, many web sites offer third-party access to their data through web APIs. But manually encoding URLs with arbitrary endpoints, parameters, authentication handshakes, and pagination, among other things, makes API use challenging and laborious for programmers, and untenable for novices. In addition, each site offers its own idiosyncratic data model, properties, and methods that a new user must learn, even when the site manages the same common types of information as many others.
In this work, we show how working with web APIs can be dramatically simplified by describing the APIs using a standardized, machine-readable ontology. By surveying a statistical sample of web APIs, we develop a simple ontology that can effectively describe the core functionality of nearly all of them. We then present Shapir, a system that includes a graphical, form-based authoring tool for the API description, from which Shapir can automatically generate a standardized JavaScript library for accessing data on the web site as objects with readable and writeable properties. This enables programmers to access data without learning the details of each API, and indeed allows them to use the same unchanged code for multiple web sites. We then integrate Shapir with Mavo, an HTML language extension for describing web applications declaratively, to also empower plain-HTML authors to access these APIs. In our lab evaluation, we found that programmers are able to accomplish programmatic data management tasks that require multiple API requests 5.6 times faster on average using the Shapir generic library than using the popular Swagger API integration library. Using our Mavo-Shapir integration, even non-programmers were able to build functioning data management applications that access multiple web APIs in just 4 minutes.
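To give a feel for what a generated, standardized client library could look like, the following sketch reads and writes a site's data as objects with properties, so the same code can target different services described by the shared ontology. The names used here (connect, items, title, save) are hypothetical and do not reflect Shapir's actual generated API.

```typescript
// Hypothetical sketch of a generated, standardized client library.
// All function and property names are assumptions for illustration.

interface Item {
  title: string;
  body: string;
  save(): Promise<void>; // persist writeable properties back through the site's API
}

interface SiteClient {
  items(query?: { limit?: number }): Promise<Item[]>;
}

// Because every site is described with the same ontology, the same code can target
// different services by swapping only the site identifier and credentials.
async function renameFirstItem(
  connect: (site: string, token: string) => SiteClient,
  site: string,
  token: string,
): Promise<void> {
  const client = connect(site, token);
  const [first] = await client.items({ limit: 1 }); // pagination and auth assumed to be handled by the library
  if (first) {
    first.title = "Updated title";
    await first.save();
  }
}
```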
Augmenting everyday surfaces with interaction sensing capability that is maintenance-free, low-cost (∼ $1), and in an appropriate form factor is a challenge with current technologies. MARS (Multi-channel Ambiently-powered Realtime Sensing) enables battery-free sensing and wireless communication of touch, swipe, and speech interactions by combining a nanowatt programmable oscillator with frequency-shifted analog backscatter communication. A zero-threshold voltage field-effect transistor (FET) is used to create an oscillator with a low startup voltage (∼ 500 mV) and current (< 2 µA), whose frequency shifts with changes in inductance or capacitance caused by user interactions. Multiple MARS systems can operate in the same environment by tuning each oscillator circuit to a different frequency range. The nanowatt power budget allows the system to be powered directly through ambient energy sources like photodiodes or thermoelectric generators. We differentiate MARS from previous systems based on power requirements, cost, and part count and explore different interaction and activity sensing scenarios suitable for indoor environments.
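The sketch below illustrates the frequency-division idea from the receiver's perspective: each tag reserves a frequency band, and a detected backscatter frequency is resolved to a tag plus an offset within its band that could be mapped to the interaction. The band boundaries, tag names, and offset mapping are illustrative assumptions, not MARS's measured parameters.

```typescript
// Hypothetical receiver-side sketch: resolve a detected frequency to a tag and an offset.
// Band boundaries and tag names are illustrative assumptions only.

interface TagChannel {
  name: string;
  baseHz: number; // nominal oscillator frequency reserved for this tag
  spanHz: number; // width of the band reserved for this tag
}

const channels: TagChannel[] = [
  { name: "light-switch", baseHz: 50_000, spanHz: 4_000 },
  { name: "volume-slider", baseHz: 58_000, spanHz: 4_000 },
];

// A touch or swipe changes the tag's inductance or capacitance, shifting its frequency
// within its band; which band the frequency falls in identifies the tag, and the offset
// within the band could encode the interaction state (e.g., swipe position).
function resolve(frequencyHz: number): { tag: string; normalizedOffset: number } | null {
  for (const ch of channels) {
    const offset = frequencyHz - ch.baseHz;
    if (offset >= 0 && offset <= ch.spanHz) {
      return { tag: ch.name, normalizedOffset: offset / ch.spanHz };
    }
  }
  return null; // frequency outside every reserved band
}
```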
Optical illusions have been brought into recent video games to enhance gaming experiences. However, a large corpus of optical illusions remains unused, while few games incorporate illusions seamlessly. To bridge this gap, we propose a workflow to guide game designers in applying optical illusions to their video games, i.e., in making more illusion games. In particular, our workflow consists of 5 stages: (1) choosing a game object, (2) searching for a matching illusion, (3) selecting an illusion mechanic, (4) integrating the selected illusion into the game, and (5) optionally revealing the illusion. To facilitate our workflow, we provide a tag database with 163 illusions that are labeled by their in-game visual elements and desired effects. We also provide example editing interfaces of 6 illusions for game designers. We walk through our workflow and showcase 6 resulting illusion games. We implemented these 6 games (with and without illusions) and conducted a 12-participant study to gain a preliminary understanding of how illusions enhance gaming experiences. To evaluate our workflow, we invited 6 game designers and 6 experienced players to follow our workflow and design their own illusion games, among whom 3 experienced game designers completed 2-week in-depth developments. We report their games and qualitative feedback, and discuss reflections on our workflow, database, and editing interfaces.
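As a minimal sketch of stage (2), the snippet below filters a tag database by an in-game visual element and a desired effect. The record shape and example entries are assumptions for illustration; the actual database labels 163 illusions.

```typescript
// Hypothetical sketch of stage (2): querying a tag database of illusions.
// The record shape and sample entries are illustrative assumptions.

interface IllusionRecord {
  name: string;
  visualElements: string[]; // e.g., "circle", "grid", "parallel lines"
  desiredEffects: string[]; // e.g., "apparent motion", "size distortion"
}

function findIllusions(db: IllusionRecord[], element: string, effect: string): IllusionRecord[] {
  return db.filter(r =>
    r.visualElements.includes(element) && r.desiredEffects.includes(effect));
}

// Example: a designer with a circular game object who wants it to appear to move.
const sample: IllusionRecord[] = [
  { name: "Rotating Snakes", visualElements: ["circle"], desiredEffects: ["apparent motion"] },
  { name: "Ebbinghaus", visualElements: ["circle"], desiredEffects: ["size distortion"] },
];
console.log(findIllusions(sample, "circle", "apparent motion").map(r => r.name));
// ["Rotating Snakes"]
```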
Bluetooth requires device pairing to ensure security in data transmission, encumbering a number of ad-hoc, transactional interactions that require both ease of use and “good enough” security: e.g., sharing contact information or secure links to people nearby. We introduce Bit Whisperer, an ad-hoc short-range wireless communication system that enables “walk up and share” data transmissions with “good enough” security. Bit Whisperer transmits data to proximate devices co-located on a solid surface through high-frequency, inaudible acoustic signals. The physical surface has two benefits: it enhances acoustic signal transmission by reflecting sound waves as they propagate; and it makes the domain of communication visible, helping users identify exactly with whom they are sharing data without prior pairing. Through a series of technical evaluations, we demonstrate that Bit Whisperer is robust for common use-cases and secure against likely threats. We also implement three example applications to demonstrate the utility of Bit Whisperer: 1-to-1 local contact sharing, 1-to-N private link sharing to open a secure group chat, and 1-to-N local device authentication.
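To illustrate how data could be carried over inaudible tones, the sketch below encodes bytes using simple binary frequency-shift keying near the top of the audible range. The frequencies, symbol rate, and modulation scheme are assumptions for exposition; Bit Whisperer's actual acoustic encoding may differ.

```typescript
// Hypothetical sketch: encode bytes as near-ultrasonic tones with binary FSK.
// Frequencies, symbol rate, and modulation are illustrative assumptions only.

const FREQ_ZERO_HZ = 18_500; // above most adults' hearing range, below a 22.05 kHz Nyquist limit
const FREQ_ONE_HZ = 19_500;
const SYMBOL_SECONDS = 0.01; // 100 bits per second
const SAMPLE_RATE = 44_100;

// Expand a byte array into per-bit tone frequencies, most significant bit first.
function bitsToFrequencies(data: Uint8Array): number[] {
  const freqs: number[] = [];
  data.forEach(byte => {
    for (let bit = 7; bit >= 0; bit--) {
      freqs.push((byte >> bit) & 1 ? FREQ_ONE_HZ : FREQ_ZERO_HZ);
    }
  });
  return freqs;
}

// Render the tone sequence as PCM samples that could be played through a speaker
// resting on the shared surface.
function synthesize(data: Uint8Array): Float32Array {
  const samplesPerSymbol = Math.round(SYMBOL_SECONDS * SAMPLE_RATE);
  const freqs = bitsToFrequencies(data);
  const out = new Float32Array(freqs.length * samplesPerSymbol);
  freqs.forEach((f, i) => {
    for (let n = 0; n < samplesPerSymbol; n++) {
      const t = (i * samplesPerSymbol + n) / SAMPLE_RATE;
      out[i * samplesPerSymbol + n] = 0.5 * Math.sin(2 * Math.PI * f * t);
    }
  });
  return out;
}
```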