P5 – Group 16 – Epple

Group 16 – Epple

Andrew, Brian, Kevin, Saswathi

Project Summary:

Our project is to use Kinect to make an intuitive interface for controlling web cameras through using body orientation.


The first task we have chosen to support with our working prototype is the easy-difficulty task of allowing web chat without requiring the chat partner to sit in front of the computer. A constant problem with web chats is that users must sit in front of the web camera to carry on the conversation; otherwise, the problem of off-screen speakers arises. With our prototype, if a chat partner moves out of the frame, the user can bring the partner back into view through intuitive head movements that change the camera view. Our second task is the medium-difficulty task of searching a distant location for a person with a web camera. Our prototype allows users to seek out people in public spaces by using intuitive head motions to control the search of a space via web camera, just as they would in person. Our third task is the hard-difficulty task of allowing web chat with more than one person on the other side of the web camera. Our prototype allows users to web chat seamlessly with all the partners at once. Whenever the user wants to address a different web chat partner, they intuitively turn their head to change the camera view to face the target partner.

Task changes:

Our set of tasks has not changed from P3 and P4.  Our rationale behind this is that each of our tasks was and is clearly defined and the users testing our low-fi prototype did not have complaints about the tasks themselves and saw the value behind them. The comments and complaints we received were more about the design of our prototype environment, such as placement of paper cutouts, and inaccuracies with our low-fi prototype, such as the allowance of peripheral vision and audio cues with it.

Design Changes:

The changes we decided to make to our design based on user feedback from P4 were minimal. The main types of feedback that we received, as can be seen on the P4 blog post, were issues with the design of the low-fidelity prototype that made the user experience not entirely accurate, suggestions on additional product functionality, and tips for making our product more intuitive. Some of the suggestions were not useful in terms of making changes to our design, while other suggestions were very insightful, but we decided that they were not essential for a prototype at this stage. For example, in the first task we asked users to keep the chat partner in view on the screen as he ran around the room. The user commented that this was a bit strange and tedious, and that it might be better to just have the camera track the moving person automatically. This might be a good change, but it changes the intended function of our system from being something that the user interacts with, as if peering into another room naturally, to more of a surveillance or tracking device. This kind of functionality change is something that we decided not to implement.

Users also commented that their usage of peripheral vision and audio cues made the low-fi prototype a bit less realistic, but that is an issue that arose due to inherent limits of a paper prototype rather than due to the design of our interface. Our new prototype will inherently overcome these difficulties and be much more realistic, as we will be using a real mobile display, and the user will only be able to see the web camera’s video feed. The user can also actually use head motions to control the viewing angle of the camera. We did gain some particularly useful feedback, such as the suggestion that using something like an iPad would be useful for the mobile screen because it would allow users to rotate the screen to fit more horizontal or vertical space. This is something that we decided would be worthwhile if we chose to mass produce our product, but we decided not to implement this in our prototype for this class as it is not essential to demonstrate the main goals of the project. We also realized from our low-fidelity prototype that the lack of directional sound cues in the speakers’ audio would make it hard to get a sense of which direction an off-screen speaker’s voice is coming from. We realized that implementing something like a 3D sound system or a system of providing suggestions on which way to turn the screen would be useful, but again, we decided that it was not necessary for our first prototype.

One particular thing that we have changed going from the low-fidelity prototype to this new prototype is the way that users interact with the web camera. One of the comments we got from P4 was that users felt that they didn’t get the full experience of how they would react to a camera that independently rotated while they were video chatting. We felt that this was a valid point and something that we overlooked in our first prototype as it was low-fidelity. It is also something that we felt was essential to our proof of concept in the next prototype, so in our new prototype the web camera is attached to a servo motor that rotates it in front of the chat partner, as shown below.

-Web Camera on top of a servo motor:

Web Camera on Servo motor

Storyboard sketches for tasks:

Task 1- web chat while breaking the restriction of having the chat partner sit in front of the computer:

Task 1 – Chat without restriction on movement – Prototype all connected to one computer

Task 2 – searching a distant location for a person with a web camera:

Task 2 – Searching for a Friend in a public place – Prototype all connected to one computer

Task 3 – allowing web chat with more than one person on the other side of the web camera:

Task 3 – Multi-Person Webchat – Prototype all connected to one computer

Unimplemented functionality – camera rotates vertically up and down if user moves his head upwards or downwards:

Ability of camera to move up and down in addition to left and right.

Unimplemented functionality – Kinect face data is sent over a network to control the viewing angle of a web camera remotely:

Remote camera control over a network


We implemented functionality for the web camera rotation by attaching it to a servo motor that turns to a set angle given input from an Arduino.  We also implemented face tracking functionality with the Kinect to find the yaw of a user’s head and send this value as input to the Arduino through Processing using serial communication over a USB cable. The camera can turn 180 degrees due to the servo motor, and the Kinect can track the yaw of a single person’s face accurately up to 60 degrees in either direction while maintaining a lock on the person’s face. However, the yaw reading of the face is only guaranteed to be accurate within 30 degrees of rotation in either direction. Rotation of a face in excess of 60 degrees usually results in a loss of recognition of the face by the Kinect, and the user must directly face the Kinect before their face is recognized again. Therefore the camera also has a practical limitation of 120 degrees of rotation.  This is all shown in image and video form in the next section.
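The mapping from head yaw to servo angle described above can be sketched as follows. This is a minimal illustration in Python, not our actual Processing or Arduino code: the function names, the centre position of 90 degrees, the one-to-one mapping between head yaw and camera angle, and the single-byte message format are all assumptions made for the example.

```python
# Hypothetical sketch: convert a Kinect yaw reading into a servo command.
# All names and the one-byte message format are assumptions for illustration.

SERVO_CENTER_DEG = 90      # assumed: servo midpoint means "camera faces forward"
MAX_TRACKED_YAW_DEG = 60   # Kinect tends to lose the face beyond about 60 degrees


def yaw_to_servo_angle(yaw_deg: float) -> int:
    """Clamp the yaw to the trackable range and shift it into 0..180."""
    clamped = max(-MAX_TRACKED_YAW_DEG, min(MAX_TRACKED_YAW_DEG, yaw_deg))
    return int(round(SERVO_CENTER_DEG + clamped))


def encode_angle(angle: int) -> bytes:
    """Pack a servo angle into one byte, suitable for writing to a serial port."""
    if not 0 <= angle <= 180:
        raise ValueError("servo angle must be between 0 and 180 degrees")
    return bytes([angle])


if __name__ == "__main__":
    for yaw in (-75.0, -30.0, 0.0, 12.4, 60.0):
        angle = yaw_to_servo_angle(yaw)
        print(yaw, "->", angle, encode_angle(angle))
```

With these assumptions the servo only ever receives angles between 30 and 150 degrees, which matches the practical 120-degree range noted above even though the servo itself can sweep 180 degrees.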

The parts of the system that we decided to leave unimplemented for this prototype are mainly parts that we felt were not essential to demonstrate the basic concept of our idea. For example, we have a servo motor that will rotate the webcam horizontally left and right, but we decided that it was not essential, at this stage, to have another servo motor rotating the camera vertically up and down, as it is a similar implementation of code and usage of input signals, only in a different direction. The use cases for moving the camera up and down are also lacking, as people usually do not move vertically. We also decided not to implement network functionality to transmit Kinect signals to the Arduino remotely at this stage. We intend to implement this functionality in a future prototype, but for the moment, we feel it is nonessential, and that it is sufficient to have everything controlled by one computer and simply divide the room, potentially with a cardboard wall, to keep the Kinect side of the room and the web camera side of the room separated. The one major Wizard-of-Oz technique that we will use when testing this prototype is thus to pretend that the user is remotely far from the web chat partners, when in reality they are in the same room, and we are using a simple screen to separate the two sides of the interface. This is because, again, the Kinect and the Arduino-controlled webcam will be connected to the same computer to avoid having to send signals over a network, which we have not implemented. We will thus only pretend that the two sides of the video chat are far apart for the purpose of testing the prototype.
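To give a rough idea of what the unimplemented network link might carry, the sketch below packs a single yaw reading into a small fixed-size message that could be sent as a UDP datagram. This is purely hypothetical: we have not settled on a transport or message layout, and the function names and the 4-byte big-endian float format are assumptions chosen for illustration.

```python
# Hypothetical sketch of a yaw message for a future network link.
# The message layout and function names are assumptions, not a finished design.
import struct

YAW_FORMAT = "!f"  # assumed layout: one 32-bit float in network byte order


def encode_yaw(yaw_deg: float) -> bytes:
    """Pack a yaw reading (in degrees) into a 4-byte payload."""
    return struct.pack(YAW_FORMAT, yaw_deg)


def decode_yaw(payload: bytes) -> float:
    """Unpack a payload produced by encode_yaw back into degrees."""
    (yaw_deg,) = struct.unpack(YAW_FORMAT, payload)
    return yaw_deg


if __name__ == "__main__":
    message = encode_yaw(-22.5)
    print(len(message), decode_yaw(message))
```

On the receiving side, the decoded yaw would then be handed to the same servo control that the USB serial connection drives in the current prototype.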

We chose to implement the core functionality of our design for this prototype. It was essential that we implement face tracking with the Kinect, as this makes up half of our design. We also implemented control of the camera via serial communication with the Arduino. We decided to implement only yaw rotation and not pitch rotation because the latter would require two motors, and this prototype adequately demonstrates our proof of concept with only horizontal left-right rotation. We thus chose to implement for breadth rather than depth in terms of degrees of control over the web camera. We also worked on remote communication between the Kinect and the Arduino/camera setup, but have not finished this functionality yet, and it is not necessary to demonstrate our core functionality for this working prototype. We thus again chose to implement for breadth rather than depth at this stage in deciding that serial communication with the Arduino over a USB cable was enough. By choosing breadth over depth, we have enough functionality in our prototype to test our three selected tasks, as all three essentially require face-tracking control of the viewing angle of a web camera.

We used the FaceTrackingVisualization sample code included with the Kinect Development Toolkit as our starting point with the Kinect code.  We also looked at some tutorial code for having Processing and Arduino interact with each other at: http://arduinobasics.blogspot.com/2012/05/reading-from-text-file-and-sending-to.html


A video of our system.  We show Kinect recognizing the yaw of a person’s face and using this to control the viewing angle of a camera.  Note that we display on the laptop a visualizer of Kinect’s face tracking, not the web camera feed itself.  Accessing the web camera feed itself is trivial through simply installing drivers:

Video of working prototype

Prototype Images:

Kinect to detect head movement

Webcam and Arduino

Kinect recognizing a face and its orientation

Kinect detecting a face that is a bit farther away