P6 – Runway

Team CAKE (#13): Connie, Angela, Kiran, Edward

Project Summary

Runway is a 3D modeling application that makes 3D manipulation more intuitive by bringing virtual objects into the real world, allowing natural 3D interaction with models using gestures.


We describe here the methods and results of a user study designed to inform the design of our 3D modelling system. As described in further detail below, our prototype provides many of the fundamental operations necessary for 3D modelling and viewing tasks, all taking advantage of gesture tracking aligned with our mid-air display. The purpose of this experiment is to evaluate the usability of our prototype, and more specifically to determine whether there are any unanticipated effects of using the stereoscopic and gesture components together. We chose this kind of user test because observing actual users interacting with our system in its intended use cases provides more useful insight into how to improve it than more artificial prototypes (like the low-fi prototype of P4) or more narrowly directed experiments (such as testing only stereoscopic perception or gesture recognition).

Implementation

Our implementation does not differ significantly from its P5 state. We spent the week fixing minor bugs in mesh handling and improving performance, since these issues were the most likely to affect the user experience.


Participants

Our three participants were all undergraduate students here at Princeton, from a variety of backgrounds. Unlike in P2 and P4, where we specifically sought out users who would be more familiar with 3D modeling applications, here we sought users with a more mundane (or unrelated) set of skills, in order to focus on the usability and intuitiveness of our system. None of our three users were intimately familiar with conventional 3D modeling software, nor were they drawn from any particular field of study (although they did know each other prior to this experiment). From this we hoped to get a wider and perhaps less experienced/professional perspective on how approachable and intuitive our system is to someone who has not had to do these sorts of tasks before.

Apparatus

Hardware used for system:

  • 120Hz 3D Stereoscopic Monitor (Asus VG278H)
  • Nvidia 3D Vision Pro Kit (USB Emitter and Wireless Shutter Glasses)
  • Leap Motion gestural controller
  • Desktop with 3D-Vision compatible graphics card

Additional Hardware for experiment:

  • iPhone for taking photos

This experiment was performed at a desk in a dorm room (this being the location of the monitor and computer).

Tasks

The tasks cover the fundamentals of navigation in 3D space, as well as 3D painting. The easiest task is translation and rotation of the camera; this allows the user to examine a 3D scene. Once the user can navigate through a scene, they may want to be able to edit it. Thus the second task is object manipulation. This involves object selection, and then translation and rotation of the selected object, thus allowing a user to modify a 3D scene. The third task is 3D painting, allowing users to add colour to objects. In this task, the user enters into a ‘paint mode’ in which they can paint faces various colours using their fingers as a virtual brush.

From user testing of our low-fi prototype, we found that our tasks were natural and understandable for the goal of 3D modelling with a 3D gestural interface. 3D modelling requires being able to navigate through the 3D space, which is our first task of camera (or view) manipulation. Object selection and manipulation (the second task) are natural functions in editing a 3D scene. Our third task of 3D painting allows an artist to add vibrancy and style to their models. Our tasks have remained the same from P5.

Procedure

We started by emailing several suitable candidates for our experiment: acquaintances (but not classmates or close friends) who were technically savvy but not extremely experienced with our type of interface. Our email contained basic information about our system, but did not describe any of its specific capabilities in detail. For each of the three participants we obtained, we first gave them our consent form and pre-experiment survey (see below for the original versions). The pre-experiment survey asked about demographic information as well as experience with stereoscopic displays, 3D modelling, and gestural interfaces. We then gave them a brief explanation and demo of how our system worked, in which we demonstrated the workflows and fundamental gestures they had at their disposal. After making sure that they were able to perceive the objects floating in front of them, they performed the calibration workflow and began the three tasks. Throughout the tasks, we had one person “coaching” them through any difficulties, offering suggestions whenever they seemed to be stuck for too long. This was necessary since it was sometimes difficult to understand the gestures from our demo alone (this is discussed in more detail below). After finishing the tasks, we performed a brief interview, asking specific questions in order to stimulate conversation and feedback about the system (questions included below).

Subject 1 preparing to paint the object magenta.

Subject 2 preparing to manipulate a vertex.

Subject 3 translating the camera view.

Test Measures

We measured mostly qualitative variables because, at this stage, heavy quantitative analysis would not be particularly helpful: we are not yet fine-tuning, but rather still gathering information about what a good interface would be.

  • Critical Incidents: We recorded several incidents that indicated both positive and negative characteristics of our system. This is the most important qualitative data we can collect because it shows us exactly how users interact with our system, and thus illustrates the benefits and drawbacks to our system.
  • Timing: The amount of time it takes for the user to complete the task. This variable works as a preliminary measure as to how intuitive/difficult each task is for the users.

The following measures were obtained through post-experiment interviews. We asked participants to rate them on a scale from 1 to 5, where 1 was the worst and 5 was the best.

  • Ease of Use User Rating: This measure was meant to evaluate how easy the users subjectively found the interface to use: what good is an interface if it’s very hard to use?
  • Difficulty with Stereoscopy User Rating: We use a 3D screen in our interface. One problem that tends to crop up with 3D screens is that they can strain the eyes and/or be hard to use. For this reason, we had the users rate how difficult it was to perceive the 3D objects at their locations in front of the screen.
  • Intuitiveness User Rating: An important measure of how good a gestural user interface is derives from the intuitiveness of the gestures used. This class of interface is called a Natural User Interface (NUI) for a reason: it should simply make sense to the user. For this reason, we included this measure in our assessment of quality.
  • Preference of Interface User Rating: In order to truly succeed, the interface we created has to be better than existing user interfaces: if no user would actually want to use it, then it clearly has problems. For this reason, we wanted to know whether the users thought the interface was useful compared to existing mouse-and-2D-monitor 3D interfaces.

Results and Discussion

First of all, from the preliminary survey, it is apparent that aside from Subject 2, the group in general had very little experience with gestural interfaces, which made the group a good set of people to test the intuitiveness of our gestures on.

For the first task of view manipulation (translation and rotation of the camera view), all the users found translation to be easy and intuitive. Subject 1 found it confusing to distinguish between the effects of gestures using fists (view manipulation) and gestures using a pointed finger (object manipulation), and attempted to use object manipulation to complete the task. However, when reminded of the difference, she completed the task using the appropriate view manipulation gestures. She did attempt to rotate continuously to achieve a large degree of rotation, which is an awkward gesture for one’s arms; after a hint, she realized that she could stop and rotate again. Subject 2 picked up the gestures more quickly and easily for both translation and rotation, though she also attempted to rotate continuously for large rotations. Subject 3 also found it a little confusing to distinguish between the fist and finger gestures at the start, and found the ability to rotate objects out of the field of view to be confusing (with regard to getting them back in view). All of the subjects reported that the interface was easy to use and intuitive; Subject 2 (who used the interface with the most ease) found the gestures to be very intuitive.

For the second task of object manipulation (rotation of the object and object deformation), all the subjects found rotation easier than in the first task, having gotten more used to the gestures. Vertex manipulation to deform the object was also grasped quickly and easily by Subjects 1 and 2; however, Subject 3 did not realize that he needed to point very close to a vertex to select it, though after selecting the vertex, manipulation was easy. Subjects 2 and 3 forgot some of the gestures and needed reminding of which gestures corresponded to which functionality. With regard to remembering gestures, Subject 1 pointed out that having one fist translate and two fists rotate was confusing.

For the third task of object painting (the user is required to color in the sides of the object, rotating it to paint the faces hidden from view), which was supposed to be the hardest task, the users surprisingly found it very intuitive and easy! Perhaps this was because the task corresponded most directly to a task you’d actually perform in the real world, like painting a model: changing the scene angle of the camera is not so much a real-world activity, and could be more confusing. Subjects 1 and 2 did not initially realize that they could not paint on faces that weren’t visible, and needed to rotate the view to see those faces and paint them.

All the users found it easy to see stereoscopically, which was a pleasant surprise, since in the past some time has been required before a user could see the stereoscopic 3D objects properly. They also all noted that the instability in detecting fists and fingers (the Leap would often detect a fist where there was a pointed finger) made the interface a bit more difficult to use. This significantly affected the difficulty of rotation, which Subject 3 found difficult enough to suggest that the learning curve might be steep enough that he would likely prefer a traditional mouse and keyboard interface for 3D modelling.

Overall, rotation seemed to be the hardest task to learn, suggesting that we need to improve our rotation gestures. However, rotation is also the gesture most affected by the instability in Leap gesture detection, which exacerbated its difficulty. Based on our experimentation with the Leap sensor, we have considered replacing our rotation gesture with a palm-orientation-based scheme. Another important issue to fix is that users commonly forgot core gestures, especially the distinction between fist and finger gestures. We commented on this issue in P4 as well, but P6 revealed it to be a very important problem; a reminder system (perhaps a sign floating in the background, or a training course) could be very helpful in mitigating it.

Appendices

Consent Form
Pre-Experiment Survey
Post-Experiment Interview
Raw Data

P3 – Runway

Team CAKE (#13) – Connie (demos and writing), Angela (filming and writing), Kiran (demos and writing), Edward (writing and editing)

Mission Statement

People who deal with 3D data have always had the fundamental problem that they are not able to interact with the object of their work/study in its natural environment: 3D. It is always viewed and manipulated on a 2D screen with a 2 degree-of-freedom mouse, which forces the user to do things in very unintuitive ways. We hope to change this by integrating a 3D display space with a colocated gestural space in which a user can edit 3D data as if it is situated in the real world.

With our prototype, we hope to solidify the gesture set to be used in our product by examining the intuitiveness and convenience of the gestures we have selected. We also want to see how efficient our interface and its fundamental operations are for performing the tasks that we selected, especially relative to how well current modelling software works.

We aim to make 3D modelling more intuitive by bringing virtual objects into the real world, allowing natural 3D interaction with models using gestures.


Our prototype consists of a ball of homemade play dough to represent our 3D model, and a cardboard 3D coordinate-axis indicator to designate the origin of the scene’s coordinate system. We use a Wizard of Oz approach to the interface, in which an assistant performs the gesture recognition and modifies the positions, orientations, and shapes of the “displayed” object. Most of the work in this prototype is the design and analysis of the gesture choices.



Because of the nature of our interface, our prototype is very unconventional. It requires two major parts: a comprehensive gesture set, and a way to illustrate the effects of each gesture on a 3D object or model (neither of which is covered by standard paper prototyping methods). For the former, we considered intuitive two-handed and one-handed gestures, and open-hand, fist, and pointing gestures. For the latter, we made homemade play dough. We spent a significant amount of time discussing and designing gestures, less time mixing ingredients for play dough, and a lot of time playing with it (including coloring it with Sriracha and soy sauce for the 3D painting task). In general, the planning was the most difficult and nuanced part, but the rest of building the system was easy and fun.

Gesture Set

We spent a considerable amount of time designing well-defined (for implementation) and intuitive (for use) gestures. In general, perspective manipulation gestures are done with fists, object manipulation gestures are done with a pointing finger, and 3D painting is done in a separate mode, also with a pointing finger. The gestures are the following (with videos below):

  1. Neutral – 2 open hands: object/model is not affected by user motions
  2. Camera Rotation – 2 fists: tracks angle of the axis between the hands, rotates about the center of the object
  3. Camera Translation – 1 fist: tracks position of hand and moves camera accordingly (Zoom = translate toward user)
  4. Object Primitives Creation – press key for object (e.g. “C” = cube): creates the associated mesh in the center of the view
  5. Object Rotation – 2 pointing + change of angle: analogous to camera rotation
  6. Object Translation – 2 pointing + change of location: analogous to camera translation when fingers stay the same distance apart
  7. Object Scaling – 2 pointing + change of distance: track the distance between fingers and scale accordingly
  8. Object Vertex Translation – 1 pointing: tracks location of tip of finger and moves closest vertex accordingly
  9. Mesh Subdivision – “S” key: uses a standard subdivision method on the mesh
  10. 3D Painting – “P” key (mode change) + 1 pointing hand: color a face whenever fingertip intersects (change color by pressing keys)
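To make the mode dispatch concrete, here is a minimal sketch of how a recognizer might map visible hand shapes to the gesture set above. The `Hand` descriptor and the `classify` rules are hypothetical simplifications for illustration, not the actual Leap Motion API:

```python
# Hypothetical simplified hand descriptor; a real recognizer would
# derive the shape from tracked finger data.
from dataclasses import dataclass

OPEN, FIST, POINTING = "open", "fist", "pointing"

@dataclass
class Hand:
    shape: str  # OPEN, FIST, or POINTING

def classify(hands):
    """Map the visible hand shapes to a mode from the gesture set above."""
    shapes = sorted(h.shape for h in hands)
    if shapes == [FIST, FIST]:
        return "camera_rotation"      # gesture 2
    if shapes == [FIST]:
        return "camera_translation"   # gesture 3
    if shapes == [POINTING, POINTING]:
        return "object_transform"     # gestures 5-7, disambiguated by motion
    if shapes == [POINTING]:
        return "vertex_translation"   # gesture 8 (or painting, in paint mode)
    return "neutral"                  # gesture 1 (two open hands) or no hands
```

In the full gesture set, two pointing fingers are further disambiguated by how they move: angle change for rotation (5), parallel motion for translation (6), and distance change for scaling (7).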


Our play dough recipe is simply salt, flour, and water in about a 1:4:2 ratio (it eventually hardens, but is sufficient for our purposes). We use a cardboard cutout to represent the x, y, and z axes of the scene (to make camera movements distinct from object manipulation). Lastly, for the sake of 3D painting, we added Sriracha and soy sauce for color. We did not include a keyboard model for selecting modes, to avoid a mess – in general, a tap on the table with a spoken intent is sufficient to model this.

To represent our system, we have an operator manually moving the object/axes and adding to/removing from/stretching/etc. the play dough as the user makes gestures.

Neutral gesture:
[kaltura-widget uiconfid="1727958" entryid="0_5cywalwv" width="260" height="206" addpermission="-1" editpermission="-1" /]

Perspective Manipulation (gestures 2 and 3):
[kaltura-widget uiconfid="1727958" entryid="0_plcklhja" width="260" height="206" addpermission="-1" editpermission="-1" /] [kaltura-widget uiconfid="1727958" entryid="0_zkjrl2oe" width="260" height="206" addpermission="-1" editpermission="-1" /]

Object Manipulation (gestures 4-8):
[kaltura-widget uiconfid="1727958" entryid="0_ectbev0x" width="260" height="206" addpermission="-1" editpermission="-1" /] [kaltura-widget uiconfid="1727958" entryid="0_3c4tjcus" width="260" height="206" addpermission="-1" editpermission="-1" /] [kaltura-widget uiconfid="1727958" entryid="0_pas35w52" width="260" height="206" addpermission="-1" editpermission="-1" /] [kaltura-widget uiconfid="1727958" entryid="0_kq3doena" width="260" height="206" addpermission="-1" editpermission="-1" /] [kaltura-widget uiconfid="1727958" entryid="0_uqb8zyft" width="260" height="206" addpermission="-1" editpermission="-1" /]

3D Painting (gesture 10):
[kaltura-widget uiconfid="1727958" entryid="0_mzod6bp3" width="260" height="206" addpermission="-1" editpermission="-1" /]

Not shown: gesture 9.

Task Descriptions

Perspective Manipulation

The fundamental operation people perform when interacting with 3D data is viewing it. To be able to understand a 3D scene, they have to be able to see all sides and parts of it. In our prototype, users can manipulate the camera location using a set of gestures that will always be available regardless of the editing mode. We allow the user to rotate and translate the camera around the scene using gestures 2 and 3, which use a closed fist; we also allow for smaller natural viewpoint adjustments by moving their head (to a limited degree).
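As an illustrative sketch (not our actual implementation), orbiting the camera around the scene amounts to rotating the camera's offset from the object's center. Here the angle argument stands in for the tracked angle of the axis between the user's fists, and a simple vertical-axis orbit is assumed:

```python
import math

def orbit_camera(cam_pos, center, angle_rad):
    """Rotate the camera position about a vertical axis through `center`."""
    # Offset of the camera from the orbit center
    x = cam_pos[0] - center[0]
    y = cam_pos[1] - center[1]
    z = cam_pos[2] - center[2]
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    # Standard rotation about the y-axis, applied to the offset
    xr = c * x + s * z
    zr = -s * x + c * z
    return (center[0] + xr, center[1] + y, center[2] + zr)
```

A quarter-turn moves a camera at (0, 0, 5) around the origin to (5, 0, 0), keeping it the same distance from the object, which is why orbiting preserves the user's sense of scale while revealing new sides of the scene.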

Object Manipulation

For 3D modelling, artists and animators often want to create a model and define its precise shape. One simple way of creating objects is to start with geometric primitives such as cubes, spheres, and cylinders (created using gesture 4) and reshape them. The user can position the object by rotating and translating it (gestures 5 and 6), or alter the mesh by scaling, translating vertices, or subdividing faces (gestures 7-9). These manipulations are a combination of single-finger pointing gestures and keyboard button presses. Note that these gestures are only available in object manipulation mode.
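The vertex-translation gesture boils down to snapping to the mesh vertex nearest the fingertip before dragging it. A minimal sketch, where the `max_dist` selection threshold is an assumed parameter (in practice it would be tuned to the tracking precision):

```python
def nearest_vertex(vertices, fingertip, max_dist=0.05):
    """Return the index of the vertex closest to `fingertip`, or None if
    no vertex lies within `max_dist` (the user must point close to a
    vertex to grab it)."""
    best_i = None
    best_d2 = max_dist * max_dist
    for i, (vx, vy, vz) in enumerate(vertices):
        d2 = ((vx - fingertip[0]) ** 2
              + (vy - fingertip[1]) ** 2
              + (vz - fingertip[2]) ** 2)
        if d2 <= best_d2:
            best_i, best_d2 = i, d2
    return best_i
```

Once a vertex index is returned, translating the vertex by the finger's displacement each frame gives the deformation behavior; returning None when the finger is far from every vertex avoids accidental edits.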

3D Painting

When rendering 3D models, we need to define a color for every point on the displayed surface. 3D artists can accomplish this by setting the colors of vertices or faces, or by defining a texture mapping from an image to the surface. In our application, we have a 3D painting mode that allows users to define the appearance of surfaces. Users select a color or a texture using the keyboard or a tablet, and then “paint” the selected color/texture directly onto the model by using a single finger as a brush.
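A minimal sketch of the paint-mode idea: color any face whose centroid falls within a brush radius of the fingertip. A real implementation would test fingertip-face intersection and skip faces hidden from view; the `radius` parameter and centroid test here are illustrative assumptions:

```python
def paint(face_centroids, face_colors, fingertip, color, radius=0.1):
    """Assign `color` (in place) to every face whose centroid lies
    within `radius` of the fingertip position."""
    r2 = radius * radius
    for i, (cx, cy, cz) in enumerate(face_centroids):
        d2 = ((cx - fingertip[0]) ** 2
              + (cy - fingertip[1]) ** 2
              + (cz - fingertip[2]) ** 2)
        if d2 <= r2:
            face_colors[i] = color
    return face_colors
```

Running this test every frame while in paint mode gives a continuous brush stroke as the finger sweeps across the surface.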

P2 – Runway

Team 13 – CAKE

Connie (task description, interface design, blog post), Angela (task analysis, interface design), Kiran (contextual inquiries, interface design), Edward (contextual inquiries, task description)
(in general, we collaborated on everything, with one person in charge of directing each major component)


3D modeling is often a highly unintuitive task because of the disparity between the 3D work space and the 2D interface used to access and manipulate it. Currently, 3D modeling is typically done on a 2D display, with a mouse that works on a 2D plane. This combination of two 2D interfaces to interact with a 3D one (with the help of keyboard shortcuts and application tools) is hard to learn and unintuitive to work with. A 3D, gesture-based interface for 3D modeling combines virtual space and physical space in a way that is much more natural and intuitive to view and interact with. Such a system would have less of a learning curve and facilitate more efficient interactions in 3D modeling.

Contextual Inquiry


Our target user group is limited to individuals who regularly interact with 3d data and representations, and who seek to manipulate it in some way that depends on visual feedback. These are the people who would find the most practical use out of our system, and who would benefit most in their workflow.

Our first user was a professor in Computer Graphics. He has been doing 3D graphics for many years and has some experience with many different programs and interfaces for manipulating 3D data. However, he stated that he didn’t personally do heavy 3D tasks anymore (at least with the interface he showed us); however, he often shows new students how to use these interfaces. He likes exploring applications of 3D graphics to the humanities, such as developing systems for archaeological reconstruction. He does not focus much on interfaces, as he believes that a good spatial sense is more useful than an accessible interface (though for the applications that he develops, he is happy to alter the interface based on user feedback).

Our second user was a graduate student in Computer Graphics who has been at Princeton for several years. He has experience in 3D modelling using Maya, and his primary research project involved designing interfaces for viewing and labelling 3D data. Because of his research and undergraduate background, he had a lot of background in basic 3D viewing manipulations and was very adept at getting the views he wanted. His priority for his interfaces was making them accessible to other, inexperienced users; this was because his research is supposed to be used for labelling objects in point clouds (which could be outsourced). In terms of likes and dislikes, he stated that he did not really get into modelling because it required too much refinement time and effort.

Our third interviewee was an undergraduate computer science student from another university. He has experience in modelling and animation in Maya, and did so as a hobby several years ago. He has taken courses in graphics and game design, and has a lot of gaming experience. However, his experience with Maya was entirely self-taught, and he acknowledged that his workflows were probably very unlike those of expert users. Because of this, his priorities were not speed or accuracy of tasks, but being able to create interesting effects and results. He was happiest when experimenting with the sophisticated, “expert” features.


We observed our first two CI subjects in their offices, with their own computers, monitors, mice, and keyboards. They were usually alone and focused on their task while using their systems. The last user did his modelling in his home or dorm room, generally alone or working with friends. We asked our subjects to show us the programs that they used to do 3D manipulation/modelling, and perform some of the tasks that they normally would. Following the master-apprentice model, we then asked them to go over the tasks again, and explain what they were doing as if they were training us to perform the same things. After this, we tried to delve a bit deeper into some of the manipulations that they seemed to take for granted (such as using the mouse to rotate and zoom) and have them try to break down how they did it, and why they might have glossed over it. Finally, we had a general discussion about their impressions about 3D manipulation in general, especially focusing on other interfaces they had used in the past and the benefits/downsides compared to their current programs.

The most common universal task was viewing the 3D models from different perspectives – each user was constantly rotating the view of the data (much less frequently panning or dollying). Another fairly common task was selecting something in the 3D space (whether a surface, a point, or a model). After selection, the two common themes were some sort of spatial manipulation of the selected object (like moving or rotating it), or non-spatial property editing of the selected object (like setting a label or changing a color). It seems that the constant rotation was helpful for maintaining a spatial sense of the object as truly 3D data, since without the movement it could simply look like a 2D picture of it (this is especially true because you can translate or zoom a single 2D image without gaining any more 3D information, whereas rotation is a uniquely 3D operation).

Essentially all of the operations we observed being performed fell into the categories mentioned above.

Our first user’s program was used for aligning several 3D scans to form a single 3D model; his operations involved viewing several unaligned scans, selecting a single one, and moving/rotating it until it was approximately aligned with another model. We noted that he could not simultaneously change views and move the scans, so he often had to switch back and forth between changing views and manipulating the object.

Our second interviewee showed us his research interface for labeling point cloud data. This application was primarily about viewing manipulations: the program showed the user a region of a point cloud, and the user would indicate whether it was a car, streetlight, tree, etc. The user only had to rotate/pan/zoom a little if necessary to see different views of the object under consideration, and sometimes select pre-segmented regions of the point cloud.

The Maya workflow was far more complex. For example, the skinning process consisted of creating a skeleton (with joints and bones), associating points on the model surface with sections of the skeleton, and then moving the joints. The first part was very imprecise, since it involved repeatedly creating joints at specific locations in space. After creating a joint, the user had to carefully drag it into the desired position (using the top, side, and front views). The user thought this was rather tedious, although the multiple views made it easy to be very precise. He did not go into much detail about the skinning process, using a smart skinning tool instead of “painting skinning weights” on the surface (which basically looked like spray-painting the surface with the mouse). Finally, joint manipulation just involved a small sphere around the joint that had to be rotated using the mouse. He also described how the advanced features generally worked (but didn’t actually show them) and described them as generally involving selecting a point and dragging it elsewhere, or setting properties of the selected object.

Task Analysis

  1. Who is going to use the system?
    Our system can be used in 3D Modelling: Game asset designers, film animators/special effects people, architectural designers, manufacturing, etc. More generally, we can use our system for basic 3D Manipulation and viewing – Graphics research, medical data examination (e.g. MRI scans), biology (examining protein/molecular models), etc. People in these fields generally are college educated (since the application fields are in research, industrial design or media), and because of the nature of the jobs they will probably have experience with software involving 3D manipulation already. They certainly must have the spatial reasoning abilities to even conceptualize the tasks, let alone perform them. It may be common for users to have more artistic/design background in non-research application contexts. We imagine that there is no ideal age for our system; because of the scope of the tasks we expect that the typical user will be teenaged or older. Additionally, our system requires two functioning eyes for the coupling of input-output spaces to be meaningful.
  2. What tasks do they now perform?
    In 3D modeling, they take 3D scans of models and put them together to form coherent 3D models of the scanned object. They also interact with 3D data in other ways (maps, point cloud data representations of real life settings), create and edit 3D objects (models for feature films and games: characters, playing fields), define interactions between 3D objects (through physics engines, for example, define actions that happen when collisions occur), and navigate through 3D point-clouds (that act as maps, of, say a living room: see floored.com).
  3. What tasks are desired?
    On a high level, the desired tasks are the same as the tasks they now perform; however, performing some of these tasks can be slow or unintuitive, especially for those who are not heavy users. In particular, manipulation of perspective is integral to all of the tasks mentioned above, and manipulating 3D space with 2D controls is not intuitive and often difficult to learn so that operations run smoothly.
  4. How are the tasks learned?
    Our first interviewee highlighted the difference between understanding an interface (getting it) and being good at using it intuitively (grokking it). The first of these involves learning how things work: right click and drag to translate, click and drag to rotate, t for translation tool, etc. These facts are learned through courses, online training and tutorials, and mentorship. The second part can only be done through lots of practice and actual usage to achieve familiarity with the 3D manipulations. Our first interviewee noted that, after gaining the spatial intuition for one type of interface, other such tasks and interfaces often become much easier to grok.
  5. Where are the tasks performed?
    Tasks are performed on a computer, typically in offices, as they are often a part of the user’s job. Animators generally work in very low lighting, to simulate a cinematic environment. People are free to stop by to ask questions, converse, etc. — which interrupts the task at hand, but does not actively harm it, just as with other computer-based tasks. Also some people might design at home for learning, or personal app or game development.
  6. What’s the relationship between user & data?
    The user wants to manipulate the data (view, edit, and create it). Currently, most 3D data is commercial, and the user is handling the data in the interests of a company (e.g. animated models for films), or for research purposes with permission given by a company (e.g. Google Streetview data). Some of it can be personally acquired, e.g. with a Kinect. Some users create the data: for example, designing game objects in video games, or 3D artwork for films.
  7. What other tools does the user have?
    Manipulation of 3D data requires tools with computing ability — with a computer, there are other interfaces, such as through the command line, to select and move objects (specifying exact locations). With animation, animators can use stop-motion animation, moving actual objects incrementally between individually photographed frames. In general, a good, high-resolution display is a must. Most users do their input via mouse and keyboard. There are some experimental interfaces that use specialized devices such as trackballs, 3D joysticks, and in-air pens. Many modellers also use physical models or proxies to approximate their design, e.g. animators might have clay models and architects might have small-scale paper designs.
  8. How do users communicate with each other?
    Users communicate with each other both online and in person (depending on how busy they are and how close they are located) for help and advice; they also report to managers with progress on their tasks. (For instance, there are developer communities for specific brands of technology, such as http://forum.unity3d.com/forum.php?s=23162f8f60e0b03682623bf37fd27a46.)
    In general, modelling has not been a heavily collaborative task. 3D modellers might have to discuss with concept artists on how to bring their ideas into the computer. Different animators might be working on different effects on the same scene in parallel, such as texturing and animating, or postprocessing effects.
  9. How often are the tasks performed?
    Animators undertake 3D manipulation jobs daily — for almost the entire day (and during this time, continually manipulate the view and select objects to create and edit the 3D data). Researchers, on average, tend to perform the tasks less frequently. The professor we interviewed rarely manipulates the 3D data himself (except for demos); the graduate student still interacts with the data daily, as his research project is to design an interface that involves 3D interaction. In terms of individual tasks, the function performed most frequently is, by far, changing the perspective. This happens essentially continuously during the course of a 3D manipulation session, almost to the point that it doesn’t feel like a separate task. Next most common is selecting points or regions, as these are necessary for most actual manipulation operations.
  10. What are the time constraints on the tasks?
    As changing perspective is such a common task, required for most operations, users do not want to spend much time on it (given that their goals are larger-scale operations). For operations which they know require a significant amount of time (e.g. physical simulations, rendering), they are willing to wait, but would certainly prefer them to be faster; they are also more willing to wait for something like a fluid simulation if they have a sense of what the end result will look like (which is a separate problem).
  11. What happens when things go wrong?
    Errors in modelling can generally be undone, as major software packages keep a change history for each individual operation. Practical difficulties may arise from the amount of computing resources required to create a high-resolution model.

Task Descriptions

  1. (Easy) Perspective Manipulation
    This task classically involves rotating, panning, and zooming the user’s view of the scene to achieve a desired perspective or to view a specific part of the scene/model. Perspective manipulation is a fundamental, frequent task for every user we observed. It seems to serve the dual purpose of preserving the user’s mental model of the 3D object, as well as presenting a view of the scene where they can see/manipulate all the points necessary to complete their more complex tasks.
    Currently, this task has a large learning curve, but once people are used to it, it becomes easy and natural; the remaining hurdle is that with current interfaces, it is not possible to change the perspective while performing another mouse-based task. With our proposed interface, we first propose to separate the two purposes of preserving spatial sense and presenting manipulable views. With a stereoscopic, co-located display, we make preserving spatial sense almost a non-issue with no learning curve, especially with head tracking. We also believe that a gestural, high degree-of-freedom input method allows for more intuitive camera control, making for a gentler learning curve. Finally, gestural input lets users perform perspective manipulation simultaneously with other operations, which is more in line with how users conceive of it (as a non-task, rather than a separate task).
  2. (Medium) Model Creation
    This task is to create a 3D mesh of some desired appearance. Depending on the geometry of the desired model, it can be created by starting off by combining and editing some simple geometry (spheres, ellipsoids, rectangular prisms), or modelled off of a reference image, from which an outline of the model can be drawn, extruded, and refined to create a model. The task involves object (vertex, edge, face) selection, creation, and movement (using perspective manipulation to navigate to the location of the point of interest), and typically involves many iterations to achieve the desired look or structure.
    Game designers and movie animators perform this task very often, and a flaw of current systems is that the creation of a 3D shape happens in a 2D environment. We anticipate that creating a 3D model in 3D space will be much more intuitive.
  3. (Hard) 3D Painting
    Color and texture on 3D models give a sense of added style and presence. Many artists use existing images to texture a model, e.g. an image of metal for the armor of a model of a medieval knight, as it is relatively easy and efficient. However, when existing images do not suffice for texturing an object, an artist can paint the desired texture onto the object. Such 3D painting is a very specialized skill, as painting onto a 3D object from a 2D plane is very different from traditional 2D painting (many artists who are skilled in 2D painting are not in 3D painting, and vice versa) and can be unintuitive. Major platforms for 3D painting project a 2D image onto the object, which can cause unexpected distortions for those unfamiliar with this mode of operation. In our interface, we intend for users to be able to paint onto an object in 3D space with their fingers.
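As a concrete illustration of the perspective-manipulation task above, the sketch below shows one way tracked input could drive an orbiting camera: hand (or head) displacement is turned into yaw/pitch angles on a sphere around the model. This is a minimal sketch under assumed conventions (orbit centered on the model origin, y-axis up); the function names and parameters are illustrative assumptions, not part of our actual implementation.

```python
# Hypothetical sketch: mapping tracked yaw/pitch/zoom to an orbiting camera.
# Conventions (origin-centered orbit, y-up) are assumptions for illustration.

import math

def orbit(yaw_deg: float, pitch_deg: float, radius: float):
    """Camera position on a sphere of the given radius around the model origin."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    x = radius * math.cos(pitch) * math.sin(yaw)  # left/right orbit
    y = radius * math.sin(pitch)                  # up/down orbit
    z = radius * math.cos(pitch) * math.cos(yaw)  # distance along view axis
    return (x, y, z)

print(orbit(0, 0, 5))  # camera straight down the z-axis: (0.0, 0.0, 5.0)
```

Continuous updates of `yaw_deg`/`pitch_deg` from the tracker would then reproduce the "non-task" feel described above, since the view follows the hands without a separate camera mode.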

Interface Design


We intend to include functionality for most basic types of object manipulation (rotation, translation, scale), which will be mapped to specific gestures, typically using both hands and arms. We also want to allow for more precise manipulation, such as selection/distortion and 3D painting, which involves gestures using specific fingers in more precise movements. To add control and selection capabilities, we hope to incorporate a tablet or keyboard, perhaps to change interaction modes or select properties such as colors and objects. Together, these will encompass a considerable portion of the functionality that basic 2D interfaces currently provide, while making the interaction more intuitive because it happens in 3D space.
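The gesture-to-operation mapping described above could be sketched as a small dispatch table. Everything here is an assumption for illustration — the gesture names, handedness, and magnitudes are placeholders, not our final gesture vocabulary.

```python
# Illustrative sketch of mapping recognized gestures to model operations.
# Gesture names and parameters are hypothetical, not the final design.

from dataclasses import dataclass

@dataclass
class Gesture:
    name: str         # e.g. "spread", "twist", "drag" (assumed labels)
    hands: int        # number of hands involved in the gesture
    magnitude: float  # normalized motion amount from the tracker

def interpret(gesture: Gesture) -> str:
    """Map a recognized gesture to a coarse model operation."""
    if gesture.hands == 2 and gesture.name == "spread":
        return f"scale by {1.0 + gesture.magnitude:.2f}"   # two-hand spread -> scale
    if gesture.hands == 2 and gesture.name == "twist":
        return f"rotate by {gesture.magnitude * 90:.0f} degrees"
    if gesture.hands == 1 and gesture.name == "drag":
        return f"translate by {gesture.magnitude:.2f} units"
    return "ignore"  # unrecognized gestures do nothing

print(interpret(Gesture("spread", 2, 0.5)))  # scale by 1.50
```

A tablet or keyboard mode switch, as mentioned above, would simply select which dispatch table is active (coarse manipulation vs. precise finger gestures).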


Easy task: Perspective Manipulation

Medium task: Model Creation

Hard task: 3D Painting

System Sketches

Top: Front view of the system, including all component devices. Middle: Tablet (or keyboard) for additional selection and control, like mode or color change. Bottom: Side view of system, showing the devices and basic form of interaction.

Top: Basic ways for the user to interact with the system using hand gestures. Bottom: More precise gestures and interactions that require greater accuracy. (Not shown: control mechanisms such as a tablet or keyboard to switch between them, or provide extra choices for functionality.)

A2 – Connie Wan (cwan)


I focused my observation on two major settings, each of which involved more than 3 individuals interacting in some way.

At first, I spent some time observing the handful of students who are in both COS426 and COS436, which happen back-to-back on Tuesdays and Thursdays in the Small and Large Auditoriums, respectively. There is no class in the Small Auditorium after COS426, but there is class before COS436 in the Large Auditorium on Thursdays only. As a result, student behaviors differed between Tuesdays and Thursdays. On Thursdays, students generally gathered in the banana gallery outside the auditoriums and either worked individually on laptops, or chatted in groups at a normal volume. Most students checked their phones more than once, and occasionally a few would wander to the bathroom or water fountain. On Tuesdays, however, students generally entered the Large Auditorium early and sat either individually or in pairs, usually with laptops out. Students spoke in quieter tones, and there was less movement: students tended to stay put once seated. Two students in particular exemplified this behavior in that they consistently talked with each other between these two classes, but in different ways depending on their location and the day of the week: they spoke more loudly and with other students in the banana gallery on a Thursday, and they sat together and chatted quietly in the Large Auditorium on a Tuesday. When sitting down, the two students chatted over their laptops and some work they both had at the time, one for graphics and one for another unknown course.

I also spent some time observing masses of students attempting to cross the northern portion of Washington Street to get to/from class. Pedestrian traffic here is pretty predictable in its patterns, peaking a couple minutes after class ends and before the next class starts. At these times, there are typically dozens of people waiting at the light at a time, on both sides. When pedestrian traffic is heaviest, students tend to stalk holes in vehicle traffic and cross during, just before, or just after red crossing lights whenever possible. At times, this causes floods of students to cross illegally, holding up vehicle traffic. Also during these times, bikers tend to have trouble avoiding pedestrians, adding to the mess (especially since bikers must take more roundabout routes to avoid stairs at the northernmost pedestrian crossing). Of particular interest were the times just after class began, when students were clearly running late. I observed a particular student run to catch a crossing light, dash across after the crossing light was already red, then slow to a brisk walk once across, continuing on at a more leisurely pace. This seemed to be a common pattern for students running late.


(Completed with Edward Zhang (edwardz))

  1. Somehow sync a mobile app with stoplights on Washington and Alexander, allowing people to check the light status at any time, and possibly to click the “wait” button from a certain distance away.
  2. Place simple security cameras with monitors (or just mirrors) outside classroom doors or in hallways. People love looking at themselves.
  3. Provide space and tables to allow for administrative activities (i.e. signing in, distributing handouts) to start outside classroom.
  4. Install iPod docks/speakers in waiting areas near classrooms that create “bubbles” of soft music that can’t be heard a few meters away.
  5. Strategically place tablets on stands/walls that have short daily math/trivia/etc. puzzles on them
  6. Program a timed phone silencer that turns the ringer on during waiting periods and off again during class based on calendar information.
  7. Rent scooters/bikes/etc. at unmanned stations around campus. Students can get and return vehicles at any station for a small fee by swiping their PUID to open the vehicle locks at the stations.
  8. Create an app with an “I’m bored” button that starts a short game/convo with someone very nearby who is also bored. Should expire 1min before class.
  9. Install a terminal in each classroom that starts a short group game between classes, which any mobile device can join within that time window (imagine in-flight entertainment).
  10. Use sensors to track the number of students that walk into buildings and classrooms, for an idea of how traffic flows and what may be improved.
  11. Create a small portable fan/vent that diffuses/absorbs/counteracts the smell of any food you are carrying.
  12. Create a lightweight app that allows people to post a relevant status (e.g. “We’re [talking about final projects]/[complaining about workload] [at the front of the classroom]/[outside the doors]”).
  13. Display “Student(s) of the Day” profiles in classrooms, letting everyone associate a name with a personality and a face.
  14. Install a large floor display with interesting effects (e.g. lights up or makes sounds where stepped on) at a popular crossroads.
  15. Install large electronic white boards on walls that students can use for notes/games/graffiti/etc.
  16. Create a mobile app that shows a map of all available outlets on campus.


I chose to prototype ideas 1 (stoplight tracking) and 7 (bike/scooter rental). I chose the first because of the problem I saw with street crossing on campus, and because of personal experience with getting caught at a red light and being late for class as a result. The latter I chose because it presents some interesting security and economics questions to explore, and because I often find myself wanting a bike without the commitment of purchase or long-term rental.

Stoplight Tracking

Below is the concept sketch of a simple app that would provide students with the status and information about the two prominent stoplights on campus.


From this, I created a paper prototype (shown below). It includes a full frame for each major screen in the sketch, along with cut-outs of the buttons and numbers, which are secured with scotch tape. This was mainly to facilitate quick changing of the screen’s state for user testing, without the need to create a new panel for each possible combination of the colors and numbers. It was made entirely using printer paper, washable marker, and ink pen, over the course of about an hour.


Bike Rental Stations

I started with a few sketches on what a single gate at a single station would look like, including both the gate itself and the interface used to rent a bike from it.

From this, I created a simple paper prototype of the interface only, which is designed to be informative and easy to understand. I used a larger sheet to represent the device itself, which would presumably be simple plastic with a place to swipe a card. The various screens themselves are separate, interchangeable sheets. The three major screens are shown below. Two screens are missing, primarily because the pricing and status format requires thought beyond the prototyping stage, as described below. Again, this prototype is made of printer paper, washable marker, and ink pen.


Aside from the basic mechanism and interface for renting and returning bikes, there are three other major components of this system that I considered while designing it: convenience, security, and pricing. Addressing convenience is mostly a matter of determining where to place rental stations; I suggest a possible configuration below. Security is mostly concerned with ensuring that bikes are not stolen or destroyed, or, if they are, that the perpetrator can be tracked down. With some careful planning, this should be possible using the information from a swiped PUID, and damages (and rental fees) can be charged to a student’s account. The actual prices require more research, but I feel a system with a small upfront fee and an additional hourly rate would be appropriate. I initially considered a 10-minute grace period of zero cost, but this may encourage too much traffic at busy periods and prevent students who actually need the bikes from finding one.
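The proposed pricing scheme (small upfront fee plus an hourly rate, billed per started hour) can be sketched as a tiny cost function. The dollar amounts below are placeholder assumptions, not researched prices.

```python
# Illustrative sketch of the proposed pricing: upfront fee + hourly rate.
# The specific amounts are placeholder assumptions pending real research.

import math

UPFRONT_FEE = 1.00  # charged on every rental (assumed)
HOURLY_RATE = 0.50  # charged per started hour (assumed)

def rental_cost(minutes: int) -> float:
    """Total cost for a rental lasting the given number of minutes."""
    hours = math.ceil(minutes / 60)  # any started hour is billed in full
    return UPFRONT_FEE + hours * HOURLY_RATE

print(rental_cost(10))  # 1.5  (even a short trip pays the upfront fee)
print(rental_cost(90))  # 2.0  (two started hours)
```

Note how the upfront fee alone discourages the very short trips that a free grace period would invite, which is why the grace period was dropped.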

Possible set of locations:

  • Dinky Station
  • Frist Campus Center
  • Friend Center
  • Dillon Gym
  • Rocky/Mathey
  • Firestone

User Testing

For the sake of user testing, I staked out a spot outside Forbes, where the stoplight on Alexander is not yet visible (about a 10-second walk away). Unfortunately, I could not do this at a time when students were actually rushing to class, as otherwise I could never have held anyone’s attention long enough to complete the test. So instead, I loitered around the parking lot during my free time and accosted students heading toward Alexander, following this general procedure:

  • (beforehand) Choose a setting for Alexander: green-high, green-low, red-high, or red-low (where high and low indicate big and small numbers, respectively).
  • Introduce the user to their shiny new iPhone 5. Establish that they are late for class.
  • Give the user about 10 seconds to take in the app.
  • Start counting down on the Alexander light and ask how they would react.
  • Change the color and/or number of Alexander and Washington and ask again how they would react.
  • Repeat for the other possible settings of color and number.
  • Collect any additional feedback.

I tested this with four individuals, choosing a different starting setting for each and varying them somewhat randomly throughout the test. I also, on a whim, flagged down a pair of friends to look at it together, to see how their interactions might differ.

The first key observation is that across the board, students understood the purpose of the app after looking at it for less than 10 seconds. This is actually in contrast to the short inquiries I made of my roommates based on the original sketch above — they took a long time to figure it out, even with the notes in blue. The addition of a title and the change in the button text are probably responsible for the difference, since they are more explicitly descriptive of their actual function.

The meat of the experiment revealed a key aspect of the users’ behavior: tunnel vision. The individual users I tested fell neatly into exactly two categories: those who used the app to decide how fast to approach the intersection, and those who used the app to repeatedly push the “wait” button. These categories were robust for each user regardless of the current status of the light — button pushers would continue to push the button even if the light was green, and the others never considered pushing the button, regardless of whether they decided they would slow down and wait for the next light. Users did realize they were expected to change their responses somehow as the settings changed; however, instead of changing what types of activities they would do, they changed the intensity of the same type of action (e.g. walking slower or faster, or pressing the button more or less). One button presser was confused by the number of questions, since he wouldn’t actually change anything and would have just continued pressing the button all the way across the intersection.

Lastly, there were a few significant holes in the way users interacted with the app. Almost all users focused explicitly on Alexander Street, ignoring Washington entirely (although one button presser went ahead and pressed buttons for both each time). This clearly came from the bias of being 30 meters away from Alexander Street, so it was not surprising. Also somewhat unsurprising is that users completely ignored the “view schedule” option. This could be due to many things: a counting-down number draws immediate attention, the schedules are not really relevant to someone heading toward the light (though perhaps they would be to someone waiting at it), and the “view schedule” button itself does not catch as much attention as the other pop-out buttons in the paper prototype.

In general, it seemed that users would be able to make fast use of this app, as it is clearly understandable and gets the point across. However, each user would tend to use it in a way that perhaps satisfies them the most, but may not be the most useful or effective way to use the app. Even so, there may not be much merit in catering the app to specialized needs or structuring it to encourage intelligent usage — in the end, it is meant to display a status (plus some helpful functionality) that users can use as they wish.