P5 – Group 16 – Epple

Group 16 – Epple

Andrew, Brian, Kevin, Saswathi

Project Summary:

Our project uses a Kinect to create an intuitive interface for controlling web cameras through body orientation.


The first task we have chosen to support with our working prototype is the easy-difficulty task of web chatting without requiring the chat partner to sit in front of the computer.  A constant problem with web chats is that users must sit in front of the web camera to carry on the conversation; otherwise, the problem of off-screen speakers arises.  With our prototype, if a chat partner moves out of the frame, the user can eliminate the problem of off-screen speakers by using intuitive head movements to change the camera view. Our second task is the medium-difficulty task of searching a distant location for a person with a web camera.  Our prototype allows users to seek out people in public spaces by using intuitive head motions to control the search of a space via web camera, just as they would in person.  Our third task is the hard-difficulty task of web chatting with more than one person on the other side of the web camera.  Our prototype allows users to web chat seamlessly with all the partners at once. Whenever the user wants to address a different web chat partner, he intuitively changes the camera view by turning his head to face the target partner.

Task changes:

Our set of tasks has not changed from P3 and P4.  Our rationale behind this is that each of our tasks was and is clearly defined and the users testing our low-fi prototype did not have complaints about the tasks themselves and saw the value behind them. The comments and complaints we received were more about the design of our prototype environment, such as placement of paper cutouts, and inaccuracies with our low-fi prototype, such as the allowance of peripheral vision and audio cues with it.

Design Changes:

The changes we decided to make to our design based on user feedback from P4 were minimal. The main types of feedback that we received, as can be seen on the P4 blog post, were issues with the design of the low-fidelity prototype that made the user experience not entirely accurate, suggestions on additional product functionality, and tips for making our product more intuitive. Some of the suggestions were not useful in terms of making changes to our design, while other suggestions were very insightful but, we decided, not essential for a prototype at this stage. For example, in the first task we asked users to keep the chat partner in view on the screen as he ran around the room. The user commented that this was a bit strange and tedious, and that it might be better to just have the camera track the moving person automatically. This might be a good change, but it changes the intended function of our system from being something that the user interacts with, as if peering into another room naturally, to more of a surveillance or tracking device. This kind of functionality change is something that we decided not to implement.

Users also commented that their usage of peripheral vision and audio cues made the low-fi prototype a bit less realistic, but that is an issue that arose due to inherent limits of a paper prototype rather than due to the design of our interface. Our new prototype will inherently overcome these difficulties and be much more realistic, as we will be using a real mobile display, and the user will only be able to see the web camera’s video feed.  The user can also actually use head motions to control the viewing angle of the camera. We did gain some particularly useful feedback, such as the suggestion that using something like an iPad would be useful for the mobile screen because it would allow users to rotate the screen to fit more horizontal or vertical space. This is something that we decided would be worthwhile if we chose to mass produce our product, but we decided not to implement this in our prototype for this class as it is not essential to demonstrate the main goals of the project.  We also realized from our low-fidelity prototype that the lack of directional sound cues from their speakers’ audio would make it hard to get a sense of which direction an off-screen speaker’s voice is coming from. We realized that implementing something like a 3D sound system or a system of providing suggestions on which way to turn the screen would be useful, but again, we decided that it was not necessary for our first prototype.

One particular thing that we have changed going from the low-fidelity prototype to this new prototype is the way that users interact with the web camera. One of the comments we got from P4 was that users felt that they didn’t get the full experience of how they would react to a camera that independently rotated while they were video chatting. We felt that this was a valid point and something that we overlooked in our first prototype as it was low-fidelity. It is also something that we felt was essential to our proof of concept in the next prototype, so in our new prototype the web camera is attached to a servo motor and rotates in front of the chat partner, as shown below.

-Web Camera on top of a servo motor:

Web Camera on Servo motor

Storyboard sketches for tasks:

Task 1- web chat while breaking the restriction of having the chat partner sit in front of the computer:

Task 1 – Chat without restriction on movement – Prototype all connected to one computer

Task 2 – searching a distant location for a person with a web camera:

Task 2 – Searching for a Friend in a public place – Prototype all connected to one computer

Task 3 – allowing web chat with more than one person on the other side of the web camera:

Task 3 – Multi-Person Webchat – Prototype all connected to one computer

Unimplemented functionality – camera rotates vertically up and down if user moves his head upwards or downwards:

Ability of camera to move up and down in addition to left and right.

Unimplemented functionality – Kinect face data is sent over a network to control the viewing angle of a web camera remotely:

Remote camera control over a network


We implemented functionality for the web camera rotation by attaching it to a servo motor that turns to a set angle given input from an Arduino.  We also implemented face tracking functionality with the Kinect to find the yaw of a user’s head and send this value as input to the Arduino through Processing using serial communication over a USB cable. The camera can turn 180 degrees due to the servo motor, and the Kinect can track the yaw of a single person’s face accurately up to 60 degrees in either direction while maintaining a lock on the person’s face. However, the yaw reading of the face is only guaranteed to be accurate within 30 degrees of rotation in either direction. Rotation of a face in excess of 60 degrees usually results in a loss of recognition of the face by the Kinect, and the user must directly face the Kinect before their face is recognized again. Therefore the camera also has a practical limitation of 120 degrees of rotation.  This is all shown in image and video form in the next section.
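The relationship between head yaw and camera angle described above can be sketched in a few lines. This is only an illustration in Python, not our Processing or Arduino code: the function and class names, the 90-degree center position, the convention that a lost face is reported as `None`, and the one-byte serial encoding are all assumptions made for the example. Only the 60-degree tracking limit and the 180-degree servo range come from the measurements above.

```python
# Limits taken from the text above; everything else is an illustrative assumption.
MAX_TRACKED_YAW_DEG = 60.0   # the Kinect usually loses the face beyond this
SERVO_CENTER_DEG = 90        # assumed "camera facing straight ahead" position


def yaw_to_servo_angle(yaw_deg):
    """Clamp head yaw to the trackable range and offset it to a 0-180 servo angle."""
    clamped = max(-MAX_TRACKED_YAW_DEG, min(MAX_TRACKED_YAW_DEG, yaw_deg))
    return int(round(SERVO_CENTER_DEG + clamped))


class CameraAim:
    """Holds the last servo angle while the Kinect has no lock on the face."""

    def __init__(self):
        self.angle = SERVO_CENTER_DEG

    def update(self, yaw_deg):
        # yaw_deg is None when face tracking is lost (hypothetical convention).
        if yaw_deg is not None:
            self.angle = yaw_to_servo_angle(yaw_deg)
        return self.angle


def to_serial_byte(angle):
    """One unsigned byte per update is enough, since angles stay within 0-180."""
    return bytes([angle])
```

With these assumptions the servo only ever sweeps between 30 and 150 degrees, which is the 120-degree practical range noted above. Holding the last angle when tracking is lost keeps the camera from snapping back to center while the user turns to face the Kinect again.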

The parts of the system that we decided to leave unimplemented for this prototype are mainly parts that we felt were not essential to demonstrate the basic concept of our idea. For example, we have a servo motor that rotates the webcam horizontally left and right, but we decided that it was not essential at this stage to have another servo motor rotating the camera vertically up and down, as it would be a similar implementation of code and usage of input signals, only in a different direction. The use cases for moving the camera up and down are also lacking, as people do not usually move vertically.  We also decided not to implement network functionality to transmit Kinect signals to the Arduino remotely at this stage. We intend to implement this functionality in a future prototype, but for the moment we feel it is nonessential, and that it is sufficient to have everything controlled by one computer and simply divide the room, potentially with a cardboard wall, to keep the Kinect side of the room and the web camera side of the room separated.  The one major Wizard-of-Oz technique that we will use when testing this prototype is thus to pretend that the user is far from the web chat partners when, in reality, they are in the same room and we are using a simple screen to separate the two sides of the interface.  This is because, again, the Kinect and the Arduino-controlled webcam will be connected to the same computer to avoid having to send signals over a network, which we have not implemented.  We will thus only pretend that the two sides of the video chat are far apart for the purpose of testing the prototype.
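For reference, one way the unimplemented network link could carry yaw readings is sketched below. This is not something we have built: the message layout (a sequence number plus the yaw as a 32-bit float) and the function names are hypothetical, chosen only to show how little data each update would need to carry.

```python
import struct

# Hypothetical wire format: sequence number (uint32) followed by yaw in degrees
# (float32), both in network byte order. This layout is an assumption.
YAW_MESSAGE = struct.Struct("!If")


def encode_yaw(seq, yaw_deg):
    """Pack one yaw reading into an 8-byte payload."""
    return YAW_MESSAGE.pack(seq, yaw_deg)


def decode_yaw(payload):
    """Unpack a payload back into (sequence number, yaw in degrees)."""
    seq, yaw_deg = YAW_MESSAGE.unpack(payload)
    return seq, yaw_deg
```

A payload like this could be sent in a single datagram, and the sequence number would let the receiving side ignore readings that arrive out of order.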

We chose to implement the core functionality of our design for this prototype. It was essential that we implement face tracking with the Kinect, as this makes up half of our design. We also implemented control of the camera via serial communication with the Arduino. We decided to only implement yaw rotation and not pitch rotation because that would require two motors, and this prototype adequately demonstrates our proof of concept with only horizontal left-right rotation. We thus chose to implement for breadth rather than depth in terms of degrees of control over the web camera.  We also worked on remote communication between the Kinect and Arduino/camera setup, but have not finished this functionality yet, and it is not necessary to demonstrate our core functionality for this working prototype.  We thus again chose to implement for breadth rather than depth at this stage in deciding that serial communication with the Arduino over a USB cable was enough.  By choosing breadth over depth, we have enough functionality with our prototype to test our three selected tasks, as all three essentially require face-tracking control of the viewing angle of a web camera.

We used the FaceTrackingVisualization sample code included with the Kinect Development Toolkit as our starting point with the Kinect code.  We also looked at some tutorial code for having Processing and Arduino interact with each other at: http://arduinobasics.blogspot.com/2012/05/reading-from-text-file-and-sending-to.html


A video of our system.  We show Kinect recognizing the yaw of a person’s face and using this to control the viewing angle of a camera.  Note that we display on the laptop a visualizer of Kinect’s face tracking, not the web camera feed itself.  Accessing the web camera feed itself is trivial through simply installing drivers:

Video of working prototype

Prototype Images:

Kinect to detect head movement

Webcam and Arduino

Kinect recognizing a face and its orientation

Kinect detecting a face that is a bit farther away



P2: Group Epple (# 16)

Names & Contribution
Saswathi Natta — Organized Interview form & Questions. Did one interview
Brian Hwang – Writing
Andrew Boik – Did one Interview. Writing
Kevin Lee – Did one Interview, Finalized document [editing, combining interview descriptions, task analysis, user group narrowing]

Problem & Solution Overview
People cannot remotely explore a space in a natural way. When people watch the feed from a web chat, they have no way to move the camera or change the angle; they only have the view that the “cameraman” decides to give them when recording.  They may send signals to the camera through keyboard controls or may have to verbally command the cameraman to change the viewing angle; however, these are terrible interfaces for controlling the view from a remote camera. The goal of our project is thus to create an interface that makes controlling remote viewing of an environment in the web chat setting more intuitive.  We aim to improve the situation by replacing mouse, keyboard, and awkward verbal commands to the cameraman with Kinect-based head tracking used to pan around the environment.  The image of the environment based on the change in head angle will then be displayed on a mobile display, which is always kept in front of the user.  This essentially gives a user control over the web camera’s viewing angle through simply moving his head.

Description of users you observed in the contextual inquiry.
Our target user group is students at Princeton who web chat with others on a routine basis.  We chose this target group because our project aims to make the web chat experience more interactive by providing intuitive controls for the web camera viewing angle; thus, students who routinely web chat are the ideal target users.

Person X

  • Education: Masters Candidate at Princeton University
  • Likes: Being able to web chat anywhere through mobile interface and easily movable camera.
  • Dislikes: Web chatting with people that are off-screen.
  • Priorities: No interruptions in the web chat, which might arise from poor connection quality or having to wait for a person to get in front of the camera.  Good video/audio quality.
  • Why he is a good candidate: X is a foreign student from Taiwan.  He routinely web chats with his family who live in Taiwan.

Person Y

  • Education: Undergraduate at Princeton University
  • Likes: multitasking while Skyping
  • Dislikes: bad connection quality
  • Priorities: keeping in touch with parents
  • Why she is a good candidate: Y is from California and communicates with her family regularly.

Person Z

  • Education: Undergraduate at Princeton University studying Economics
  • Likes: being able to see her dogs back in India. She wants to be able to talk to her whole family at once and watch her dogs run around.
  • Dislikes: the connection issues on Skype. Does not want the camera to be able to rotate all the time, for privacy reasons.
  • Priorities: time management, keeping in touch with family.
  • Why she is a good candidate: Z is from India and communicates with her family every 2 weeks via Skype.

CI interview descriptions

We arranged to visit X’s office, where he was going to web chat with his family for us to see.  He shares his office with 5 other people in a large room.  The office consists of six desks with computers and books on each.  He occupies one of the desks.  We arranged to visit Person Y in her dorm room when she was going to communicate with her parents.  The interview was conducted in a standard-issue dorm room where Y lives alone.  We interviewed Z in the student center, a public place, to talk both about how she web chats with her family and about searching for friends remotely. Before each web chat contextual inquiry interview, we asked the participants some questions to gain some background context.  We learned that Person X web chats with his family in Taiwan once a week. The web chats are just routine check-ups between the family members that can last from 15 minutes, if not much has happened in the week, to an hour, if there is an important issue.  These web chats are usually on Friday nights or the weekend, when he is least busy.  He always initiates the web chats because his family does not want to interrupt him. Person Y usually calls her parents, who live in California, multiple times per week, with each session lasting from 20 minutes to an hour. Person Z web chats with her family from her dorm room, sitting at a desk. She talks via Skype to both her parents and her grandparents, who live in the same house, in addition to her dogs. She expressed interest in being able to talk to her family all at once with a rotating camera, as well as being able to see her dogs as they run around. Person Z has also needed to find a friend in a public location such as Frist and thought that being able to check remotely would be useful, though she felt that the camera might be an invasion of privacy if people did not want to be seen, whether in a home or in a public place.

After gaining context through questions, we then proceeded with the actual web chats.  X used FaceTime on his iPhone to web chat with his family, who were also using an iPhone. Y, on the other hand, used Skype on her laptop to web chat with her parents.  At the beginning of each web chat, we briefly introduced ourselves to the web chat partners and then allowed the web chat to flow naturally while observing as per the Master-Apprenticeship partnership model.  We briefly interrupted with questions every so often to learn more about habits and approaches to tasks.  We sometimes also asked questions to understand their likes/dislikes/priorities regarding the current web chat interface, the results of which are listed with the descriptions of the users.  We found that the theme of each web chat was largely just discussion of what recently happened.  Each interview also shared an interesting common theme where the participant would most of the time engage in a one-on-one conversation with one family member at a time.  We reason that this theme exists due to the limitations of the web camera technology.  The camera provides a fixed field of view that is usually only wide enough to show one person.  To engage in intimate conversation, both chat partners need to be looking directly at each other; thus, there is no room for natural, intimate conversation with more than one family member at a time.  To deal with this, our participants instead engage in intimate conversations with each family member individually.  Indeed, at one point Person X’s father was briefly off-screen while speaking to Person X, creating a fairly awkward conversation situation.  Person X started off by speaking to his mother, then asked his mother to hand the iPhone to his father so he could speak with him.  Person Y similarly began speaking with her mother, and later the father swapped seats in front of the camera with the mother when it was his turn.
Thus, a common task that was observed across each interview was where the participant requested to speak with another member through verbal communication.  The task was then fully accomplished by the web chat partners on the other side complying with the request by ending the conversation and handing off the camera or swapping locations with another chat partner.  We reason this common task exists because there is no natural way for the participants to actually control the web camera viewing angle to focus on another person.  Instead they must break the conversation and verbally express a request to switch web chat partners.  This request can then only be completed through moving around of partners on the other side of the web chat due to the limitations of the web chat interface.

An interesting difference that we found across the interviews is that Person X largely told his father the same things that he told his mother regarding events that happened in the past week.  However, the subjects of the conversations between Person Y and her two parents differed.  We reason that this was observed because of differences in the relationships between the participants and the other chat members.  Person Y feels uncomfortable discussing certain topics with her father while being able to discuss them with her mother, and vice versa.  Person X, however, is equally comfortable talking with his parents about all matters.  Person Y also multitasked by surfing the web while chatting with her parents, while Person X did not.  This difference could have arisen because of a difference in technological capabilities, as the iPhone is a single-foreground-application device while laptops are not.  Person X, however, had a laptop in front of him but did not surf the web with it.  We reason that this is because Person X is more engaged in the web chat sessions, partly because he web chats only once a week with his family while Person Y web chats multiple times a week. Person Z also talked to her parents one at a time when web chatting, and found that she could not communicate with her dogs at all because they would not stay in front of the camera for very long. Regarding finding friends in a public location, Person Z would text a friend to ask where they were before leaving her room to meet them. She would also just walk around the building until she found them, or just sit in one location and wait for the friend to find her. This took considerable time if the friend was late or texted that they were in one location but had moved. A simple application to survey a distant room would have helped with this coordination problem.

Answers to 11 task analysis questions
1. Who is going to use system?
People who want to web chat or remotely work with others through similar means such as video conferencing will use our system.  People who want to search a distant location for a person through a web camera can also use the system.  Our user also needs to physically be able to hold the mobile viewing screen.
Background Skills:
Users will need to know how to use a computer enough to operate a web chat application, how to use a web camera, and how to intuitively turn their head to look in a different direction.

2. What tasks do they now perform?
Users currently control web chat camera viewing through:
-telling the cameraman, the person on the other end of the web chat, to move the camera to change the view onto somewhere/someone else, as with Person X.
-telling the person on the other end of the web chat to swap seats with another person, as with Person Y.
-asking who is talking when there are off-screen speakers.

3. What tasks are desired?
Instead, we would like to provide a means of web chat control through:

  • Controlling web chat camera to look for a speaker/person if he is not in view.
  • Controlling web chat camera intuitively just by turning head instead of clicking arrows, pressing keys, or giving verbal commands.

4. How are the tasks learned?

Tasks of camera control are learned through observation, trial and error, verbal communication, and perhaps looking at documentation.

5. Where are the tasks performed?
At the houses/offices of two parties that are separated by a large distance.
Meetings where video conferencing is needed.

6. What’s the relationship between user & data?
User views data as video/audio information from web camera.

7. What other tools does the user have?
Smartphone, tablet, web camera, kinect, laptop, speakers, microphone.

8. How do users communicate with each other?
They use web chat over the internet through web cameras and laptops.

9. How often are the tasks performed?
-Video conferencing is weekly for industry teams.
-People missing their friends/family will web chat weekly.

10. What are the time constraints on the tasks?
A session of chat or a meeting generally will last for around one hour.

11. What happens when things go wrong?
Confusion ensues when speakers are out of view of web camera.  This often causes requests for the speaker to repeat what was just said and readjustment of camera angle or swapping of seats in front of the camera.  This is awkward and breaks the flow of the conversation.  Instead of facing this problem constantly, our interview participants have one on one individual conversations with their web chat partners.

Think about the tasks that users will perform with your system. Describe at least 3 tasks in moderate detail:
– task 1 : Web chat while breaking the restriction of having to sit in front of the computer

  • – Allow users to walk around while their chat partner intuitively controls the camera view to keep them in view and continue the conversation
  • -Eliminate problems of off-screen speakers.
  • – current difficulty rating: difficult
  • – difficulty rating with our system: easy

– task 2 : Be able to search a distant location for a person through a web camera.

  • – Allow user to quickly scan a person’s room for the person.
  • – Can also scan other locations for the person provided that a web camera is present.
  • – current difficulty rating: difficult – impossible if you’re not actually there
  • – difficulty rating with our system: easy


– task 3 : Web chat with more than one person on the other side of the web camera.

  • – Make web chat more than just a one on one experience. Support multiple chat partners through allowing the user to intuitively change camera view to switch between chat partners without breaking the flow of the conversation.
  • – current difficulty rating: difficult
  • – difficulty rating with our system: moderate

Interface Design
Text Description:
With our system, users will be able to remotely control a camera using head motion, which is observed by a Kinect and mapped to the camera to change the view in a corresponding direction. The user will keep a mobile screen in front of them so as to always view the video feed from the camera.  This provides a more natural method of control than other webcam systems and allows a greater amount of flexibility in camera angles, as well as an overall more enjoyable experience.  Our system thus offers the sole functionality of camera control through intuitive head movement.  Current systems require either a physical device as a controller or awkward verbal commands to control a remote camera angle, while our system allows users to simply turn their heads to turn the camera, similar to how a person would turn their head in real life to view anything that is not in their field of vision. The system will essentially function like a movable window into a remote location.


Task 1 – Able to move around while video chatting

Task 2: Searching for a friend in a public place

Task 3: Talking to multiple people at once


View user would have of the mobile screen and Kinect in the background to sense user rotation

Example of user using the mobile screen as a window into a remote location

Assignment 2: Princeton Pathfinder

ELE 469/ COS 436 – Assignment 2 : Individual Design Exercise

1)      I watched the IDEO video:  My favorite parts were the examples of how you do something first and then ask for forgiveness later, such as the bike hanging. I also liked the system of combining prototypes.

2)      Observing: I sat in the Friend Center in between classes and observed students going in and out of class. What I found:

  • Some students arrive early,
    • They are already at the library, studying  and come down to class
    • Some print homework and rush around the library getting paper and stapling problem sets before rushing to class
    • Some just happen to come in and stand around outside the class room or go in early, not doing much except texting or checking their phones and computers
  • Many students arrive on time or a minute late and go straight into the lecture room
    • The 5 minutes right before class are when the halls are the most crowded
    • People meet their friends, form small groups just outside a big lecture hall and go in
    • Some people say hi in passing to friends
  • Then there are some students that arrive late
    • Some rush in, at a semi-running pace, jogging through the hallway and then quietly opening the lecture room door to not disturb the lecture
    • Some, even when late, just walk at a steady pace and walk into class
  • Some students, at the end of the class, walk out and back in if they forgot something
  • Many times students walk out in pairs or groups talking with friends
  • Sometimes, classes ended late, and if it was particularly late, some students would rush out trying to make it to their next class on time
  • When I chose specific students to observe as they waited for class:
    • One student was talking to friends outside of the classroom before they went to their respective classes
    • Another student tried to walk into the class room, but there was a lecture inside that had run over time, so they just waited outside. He looked at his phone and sat down by the window until students came out of the room
    • A third student as she was waiting for class sat down and crossed her arms and closed her eyes for a few minutes, just listening to her headphones before getting up to go into class. She took off her headphones as she walked into class and put them in her pocket.
    • A fourth girl walked toward her class room, then stopped midway to check her phone, respond to a text or maybe an email or facebook message. People swerved around her as she stood there texting. Then she looked up and put her phone away and walked into the classroom.

3)      Brainstorm:

  • 1)      More efficient way to print : Additional printers and printing stations
  • 2)      A printer outside the library with  “all clusters” access
  • 3)      App to find optimal path to get from point A to point B : Shortest path or fastest  path
  • 4)      App to record the exact time that it will take to get from point A to point B on campus:  Uses student’s average walking pace to calculate time
  • 5)      An indicator light to tell students outside if a lecture is going on in a classroom
  • 6)      An indicator light to tell when it is an optimal time to enter the classroom unnoticed
  • 7)      An application in a phone or a separate device that will alert a person if they are blocking the path for others, such as when they stop to text with their head bent
  • 8)      A button to press so that doors will open without needing to grasp the handle
  • 9)     A way to share music that different students are listening to before class
  • 10)   A booth for quick naps with a built-in alarm for tired students between classes
  • 11)   Icebreaker game app for students to make friends in class: Students can collaborate later for projects
  • 12)   An app for students to find walking buddies before or after class
  • 13)   An app to show where the nearest free food source is for a between class snack

4)      Two favorite Ideas:

  • 1)      Indicator Light for optimal time to walk in when class is already in session: This would be very useful, especially for late students wanting to walk in unnoticed
  • 2)      App to find optimal path to get from point A to point B : Shortest path or fastest path: This is what I would find most useful because my friends and I constantly wonder which path is the shortest or the fastest as we walk to and from class.

5)  Paper Prototypes:

  • 1)      Paper Prototype I did not test: The indicator light prototype ended up being two lights drawn onto a postcard that would go outside the lecture room door. I could also make a paper prototype of the detector of the in-class noise that would send a signal for when there is enough noise that a student can enter unnoticed, but that would not be part of the usual user interface
  • 2)      The paper prototype that I did test: Princeton Pathfinder, a set of index cards representing an app that will take into account a user’s walking pace and suggest the fastest or shortest or most scenic path. I also included a bike path option
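To make the difference between the "shortest" and "fastest" options concrete, here is a small sketch of how an app like this might compute them. It is only an illustration: the place names, distances, and speed factors are made up, and the use of Dijkstra's algorithm is my assumption about a reasonable approach, not part of the paper prototype.

```python
import heapq

# Hypothetical campus graph: each edge is (neighbor, distance in metres, speed factor).
# The speed factor scales the user's calibrated pace (e.g. stairs slow you down).
GRAPH = {
    "Dorm": [("Quad", 200, 1.0), ("Stairs", 120, 0.4)],
    "Quad": [("Hall", 200, 1.0)],
    "Stairs": [("Hall", 120, 0.4)],
    "Hall": [],
}


def best_path(graph, start, goal, pace_m_per_s, by="time"):
    """Dijkstra's algorithm; cost is metres (by="distance") or seconds (by="time")."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, dist, factor in graph[node]:
            step = dist if by == "distance" else dist / (pace_m_per_s * factor)
            heapq.heappush(queue, (cost + step, neighbor, path + [neighbor]))
    return None  # goal not reachable from start
```

In this made-up example the route through the stairs is shorter in metres, but for a user whose calibrated pace is 1.25 m/s the flat route through the quad takes less time, which is the distinction the two options are meant to capture.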

Paper Prototype windows

6)      User feedback:

Brian’s Suggestions :

  • Brian found that the difference between shortest path and fastest path was confusing, so I may need to remove those two choices or insert a short description of their differences that can be seen if the options are clicked on
  • He did find that the app would be very useful for him but also noted that a standard map would not include all the shortcuts
  • He suggested that the option to add in short cuts and paved or unpaved paths to the app would be a great way to make it more accurate


Victoria’s suggestions:

  • She found that the “calibrate” button being at the bottom was confusing, so moving it to a more visible location, or making a whole separate screen right after the welcome screen is the best way to have users calibrate their walking pace before selecting locations
  • She also found it un-intuitive to have a “next” button that needed to be pressed before the map was displayed. Just pressing enter should work.

Jingwen’s suggestions:

  • She found that there was too much writing for the user
  • She also commented that the app could also have a “very fast” option, that would draw a path cutting across grass and areas that are not walkways if a student needs to get to a place as fast as they physically can.

General insights:

– I found that users were a bit confused as to how to move from one window to the next

– I observed that they did not like to read text to follow directions, so the app may need to be more intuitive, with the buttons speaking for themselves

– I also observed that users were confused with needing to click the “next” button

– I should add in a feature that allows users to find the fastest possible way and input their own shortcuts