P2: Group Epple (# 16)

Names & Contribution
Saswathi Natta – Organized interview form and questions. Did one interview.
Brian Hwang – Writing.
Andrew Boik – Did one interview. Writing.
Kevin Lee – Did one interview. Finalized document [editing, combining interview descriptions, task analysis, user group narrowing].

Problem & Solution Overview
People cannot remotely explore a space in a natural way. When people watch a video feed in a web chat, they have no way to move the camera or change its angle; they see only the view that the “cameraman” chooses to give them. They may be able to send signals to the camera through keyboard controls or verbally ask the cameraman to change the viewing angle; however, these are poor interfaces for controlling the view from a remote camera. The goal of our project is thus to create an interface that makes controlling the remote view of an environment in a web chat setting more intuitive. We aim to improve the situation by replacing mouse, keyboard, and awkward verbal commands to the cameraman with Kinect-based head tracking used to pan around the environment. The view of the environment, updated according to the change in head angle, will then be displayed on a mobile display that is always kept in front of the user. This essentially gives the user control over the web camera’s viewing angle simply by moving his or her head.

Description of users you observed in the contextual inquiry.
Our target user group is students at Princeton who web chat with others on a routine basis. We chose this target group because our project aims to make the web chat experience more interactive by providing intuitive controls for the web camera viewing angle; thus, students who routinely web chat are the ideal target users.

Person X

  • Education: Masters Candidate at Princeton University
  • Likes: Being able to web chat anywhere through mobile interface and easily movable camera.
  • Dislikes: Web chatting with people that are off-screen.
  • Priorities: No interruptions in the web chat, which might arise from poor connection quality or having to wait for a person to get in front of the camera.  Good video/audio quality.
  • Why he is a good candidate: X is a foreign student from Taiwan.  He routinely web chats with his family who live in Taiwan.

Person Y

  • Education: Undergraduate at Princeton University
  • Likes: multitasking while Skyping
  • Dislikes: bad connection quality
  • Priorities: keeping in touch with parents
  • Why she is a good candidate: Y is from California and communicates with her family regularly.

Person Z

  • Education: Undergraduate at Princeton University studying Economics
  • Likes: being able to see her dogs back in India. She wants to be able to talk to her whole family at once and watch her dogs run around.
  • Dislikes: the connection issues on Skype. For privacy reasons, does not want the camera to be able to rotate all the time.
  • Priorities: time management, keeping in touch with family.
  • Why she is a good candidate: Z is from India and communicates with her family every 2 weeks via Skype.

CI interview descriptions

We arranged to visit X’s office, where he was going to web chat with his family for us to see. He shares his office, a large room, with 5 other people. The office consists of six desks with computers and books on each, and he occupies one of the desks. We arranged to visit Person Y in her dorm room when she was going to communicate with her parents. The interview was conducted in a standard-issue dorm room where Y lives alone. We interviewed Z in the student center, a public place, to talk both about how she web chats with her family and about searching for friends remotely.

Before each web chat contextual inquiry interview, we asked the participants some questions to gain background context. We learned that Person X web chats with his family in Taiwan once a week. The web chats are routine check-ins between the family members that can last from 15 minutes, if not much has happened in the week, to an hour, if there is an important issue. These web chats are usually on Friday nights or the weekend, when he is least busy. He always initiates the web chats because his family does not want to interrupt him. Person Y usually calls her parents, who live in California, multiple times per week, with each session lasting from 20 minutes to an hour.

Person Z web chats with her family from her dorm room, sitting at a desk. Via Skype, she talks to her parents and her grandparents, who live in the same house, as well as to her dogs. She expressed interest in using a rotating camera both to talk to her family all at once and to see her dogs as they run around. Person Z has also needed to find a friend in a public location such as Frist and felt that being able to check remotely would be useful, though she felt that the camera might be an invasion of privacy for people who do not want to be seen, whether in a home or in a public place.

After gaining context through questions, we proceeded with the actual web chats. X used FaceTime on his iPhone to web chat with his family, who were also using an iPhone. Y and her parents, on the other hand, used Skype on their laptops. At the beginning of each web chat, we briefly introduced ourselves to the web chat partners and then allowed the web chat to flow naturally while observing, as per the Master-Apprenticeship partnership model. We briefly interrupted with questions every so often to learn more about habits and approaches to tasks. We also sometimes asked questions to understand the participants’ likes, dislikes, and priorities regarding the current web chat interface; the results are listed with the descriptions of the users. We found that each web chat was largely a discussion of what had recently happened.

The interviews also shared an interesting common theme: most of the time, the participant engaged in a one-on-one conversation with one family member at a time. We reason that this theme exists due to the limitations of web camera technology. The camera provides a fixed field of view that is usually only wide enough to show one person. To engage in intimate conversation, both chat partners need to be looking directly at each other; thus, there is no room for natural, intimate conversation with more than one family member at a time. To deal with this, our participants instead engage in intimate conversations with each family member individually. Indeed, at one point Person X’s father was briefly off-screen while speaking to Person X, creating a fairly awkward conversation. Person X started off by speaking to his mother, then asked his mother to hand the iPhone to his father so he could speak with him. Person Y similarly began by speaking with her mother, and later the father swapped seats with the mother in front of the camera when it was his turn.

Thus, a common task observed across the interviews was the participant verbally requesting to speak with another family member. The task was then completed by the web chat partners on the other side, who complied with the request by ending their conversation and either handing off the camera or swapping places with another chat partner. We reason that this common task exists because there is no natural way for the participants to control the web camera viewing angle to focus on another person. Instead, they must break the conversation and verbally ask to switch web chat partners. Because of the limitations of the web chat interface, this request can only be fulfilled by the partners on the other side moving around.

An interesting difference that we found across the interviews is that Person X largely told his father the same things that he told his mother regarding events of the past week, whereas the subjects of Person Y’s conversations with her two parents differed. We reason that this reflects differences in the participants’ relationships with their chat partners. Person Y feels uncomfortable discussing certain topics with her father that she can discuss with her mother, and vice versa. Person X, however, is equally comfortable talking with either parent about all matters.

Person Y also multitasked by surfing the web while chatting with her parents, while Person X did not. This difference could have arisen from a difference in technological capabilities, as the iPhone is a single-foreground-application device while laptops are not. Person X, however, had a laptop in front of him but did not surf the web with it. We reason that Person X is more engaged in his web chat sessions partly because he web chats with his family only once a week, while Person Y web chats multiple times a week.

Person Z also talked to her parents one at a time when web chatting, and she found that she could not communicate with her dogs at all because they would not stay in front of the camera for very long. To find friends in a public location, Person Z would text a friend to ask where they were before leaving her room to meet them. She would also just walk around the building until she found them, or sit in one location and wait for the friend to find her. This took considerable time if the friend was late or had texted that they were in one location but had since moved. A simple application to survey a distant room would have helped with this coordination problem.

Answers to 11 task analysis questions
1. Who is going to use the system?
People who want to web chat or work remotely with others through similar means, such as video conferencing, will use our system. People who want to search a distant location for a person through a web camera can also use the system. Users also need to be physically able to hold the mobile viewing screen.
Background Skills:
Users will need to know how to use a computer enough to operate a web chat application, how to use a web camera, and how to intuitively turn their head to look in a different direction.

2. What tasks do they now perform?
Users currently control the web chat camera view by:
-telling the cameraman (the person on the other end of the web chat) to move the camera to change the view to somewhere or someone else, as with Person X.
-telling the person on the other end of the web chat to swap seats with another person, as with Person Y.
-asking who is talking when there are off-screen speakers.

3. What tasks are desired?
Instead, we would like to provide a means of web chat control through:

  • Controlling the web chat camera to look for a speaker/person who is not in view.
  • Controlling the web chat camera intuitively just by turning the head instead of clicking arrows, pressing keys, or giving verbal commands.

4. How are the tasks learned?

Tasks of camera control are learned through observation, trial and error, verbal communication, and perhaps looking at documentation.

5. Where are the tasks performed?
At the houses/offices of two parties that are separated by a large distance.
Meetings where video conferencing is needed.

6. What’s the relationship between user & data?
User views data as video/audio information from web camera.

7. What other tools does the user have?
Smartphone, tablet, web camera, Kinect, laptop, speakers, microphone.

8. How do users communicate with each other?
They use web chat over the internet through web cameras and laptops.

9. How often are the tasks performed?
-Video conferencing is weekly for industry teams.
-People missing their friends/family will web chat weekly.

10. What are the time constraints on the tasks?
A session of chat or a meeting generally will last for around one hour.

11. What happens when things go wrong?
Confusion ensues when speakers are out of view of the web camera. This often leads to requests for the speaker to repeat what was just said, followed by readjustment of the camera angle or swapping of seats in front of the camera. This is awkward and breaks the flow of the conversation. Instead of facing this problem constantly, our interview participants have one-on-one conversations with their web chat partners.

Think about the tasks that users will perform with your system. Describe at least 3 tasks in moderate detail:
– task 1 : Web chat while breaking the restriction of having to sit in front of the computer

  • – Allow users to walk around while their chat partner intuitively controls the camera view to keep them in view and continue the conversation
  • – Eliminate problems of off-screen speakers.
  • – current difficulty rating: difficult
  • – difficulty rating with our system: easy

– task 2 : Be able to search a distant location for a person through a web camera.

  • – Allow user to quickly scan a person’s room for the person.
  • – Can also scan other locations for the person provided that a web camera is present.
  • – current difficulty rating: difficult – impossible if you’re not actually there
  • – difficulty rating with our system: easy


– task 3 : Web chat with more than one person on the other side of the web camera.

  • – Make web chat more than just a one on one experience. Support multiple chat partners through allowing the user to intuitively change camera view to switch between chat partners without breaking the flow of the conversation.
  • – current difficulty rating: difficult
  • – difficulty rating with our system: moderate

Interface Design
Text Description:
With our system, users will be able to remotely control a camera using head motion, which is observed by a Kinect and mapped to the camera so that the view changes in the corresponding direction. The user will keep a mobile screen in front of them so as to always view the video feed from the camera. This provides a more natural method of control than other webcam systems and allows greater flexibility in camera angles, as well as an overall more enjoyable experience. Our system thus offers a single function: camera control through intuitive head movement. Current systems require either a physical device as a controller or awkward verbal commands to change a remote camera angle, whereas our system allows users to simply turn their heads to turn the camera, much as a person would turn their head in real life to view anything outside their field of vision. The system will essentially function like a movable window into a remote location.
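
The mapping from head motion to camera motion described above could be sketched roughly as follows. This is only an illustrative sketch under our own assumptions, not the final design: the class name PanMapper, the gain, the pan limit, the dead zone, and the smoothing factor are all hypothetical values chosen for the example, and the head yaw angle is assumed to arrive in degrees from whatever head-tracking layer sits on top of the Kinect.

```python
from dataclasses import dataclass


@dataclass
class PanMapper:
    """Maps a tracked head yaw angle (degrees) to a camera pan angle (degrees).

    All parameter values are hypothetical placeholders for illustration.
    """
    gain: float = 1.0        # camera degrees per degree of head rotation
    max_pan: float = 90.0    # assumed mechanical limit of the camera mount
    dead_zone: float = 3.0   # head movements smaller than this are ignored
    smoothing: float = 0.5   # 0 = no smoothing; closer to 1 = slower, steadier pan
    _pan: float = 0.0        # current (smoothed) pan angle

    def update(self, head_yaw: float) -> float:
        """Return the new camera pan angle for the latest head yaw reading."""
        # Ignore small, unintentional head movements.
        if abs(head_yaw) < self.dead_zone:
            head_yaw = 0.0
        # Scale the head angle and clamp it to the camera's pan range.
        target = max(-self.max_pan, min(self.max_pan, self.gain * head_yaw))
        # Exponential smoothing so the view does not jitter with tracking noise.
        self._pan = self.smoothing * self._pan + (1.0 - self.smoothing) * target
        return self._pan


if __name__ == "__main__":
    mapper = PanMapper()
    # Simulated head yaw readings: user looks ahead, then turns right.
    for yaw in [0.0, 2.0, 40.0, 40.0, 40.0]:
        print(round(mapper.update(yaw), 2))
```

The dead zone and smoothing are included because a user holding a mobile screen is unlikely to keep their head perfectly still; without them, the remote view might shake with every small movement. Whether these particular values feel natural would have to be determined through user testing.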


Task 1: Moving around while video chatting

Task 2: Searching for a friend in a public place

Task 3: Talking to multiple people at once


View user would have of the mobile screen and Kinect in the background to sense user rotation

Example of user using the mobile screen as a window into a remote location