Lunch & Learn: Bottom-up social data collection with allourideas.org and Matthew Salganik


How cute is this kitten? Let’s vote!
(Photo: morguefile, courtesy hotblack)

In this week’s Lunch ‘n Learn on Wednesday, December 1st, Matthew Salganik, an Assistant Professor in Princeton’s Department of Sociology, presented recent research that has resulted in the creation of an open-source polling site called www.allourideas.org. One of the inspirations for Salganik’s project came from an unlikely source: the popular website www.kittenwar.com, where visitors vote on which of two randomly paired kitten photos is cuter. Given two competing choices (in this case, photos of two cute kittens), the site rapidly gathers user opinions in a way that makes it easy to track social signals. It uses a fun mechanism for gathering information, and it allows any user to easily upload his or her own kitten photos, thereby instantly entering new contestants into the competitive arena of cuteness.

Considering the popularity and broad appeal of the kittenwar site, Salganik reflected on the standard forms of data collection that have been (and still are) commonly used for gathering information in the social sciences. For many researchers, collecting information from the general population depends upon survey mechanisms that have changed little in the last century. In this traditional method of data gathering, researchers decide on the questions they want to ask their survey audience well in advance of any feedback from the actual survey. Participants either take all of the survey and have their opinions included, or none of it, since partial data is rarely considered valid for the final results. Although the mechanism for conducting surveys evolved during the 20th century from face-to-face, door-to-door polling, to random phone calls, to web-based research, this model of assessment has several unavoidable shortcomings. For example, one might ask, “What important questions might the original survey have missed?” or, “How can the final interpretation of data be made more transparent to other researchers?” Focus groups and other open discussion methods allow more flexibility in gathering input from respondents, and can reveal why respondents make certain choices, but these methods tend to be slow, expensive, and difficult to quantify. Most significantly, all are based on the same methodology as the face-to-face survey, and are merely conducted with increasingly up-to-date and scalable methods of delivery. Web-based surveys admittedly reach many more people with far less overhead than canvassing door to door did, but are such computer-based surveys really taking advantage of the unique strengths of the World Wide Web? Kittenwar.com suggested to Salganik that there was another, more intuitive way to present ideas and gather data on the web.

Using Wikipedia.org as an example, Salganik remarked upon the internet’s strength in engaging people at their own level of interest. Wikipedia, he said, has become an unparalleled information aggregation system because it is able to harvest the full amount of information that people are willing to contribute to the site. Salganik described this phenomenon as “the Fat Head vs. the Long Tail”: Wikipedia makes it possible to gather knowledge from people who have vastly different levels of commitment to improving the site. On one hand, there are those (the fat head) willing to spend days or months carefully researching and crafting entire Wikipedia entries, while others (the long tail) are content to insert a missing comma into an entry they happen to be reading at the moment. As such, Wikipedia.org is an example of what might be achieved by an application that truly understands how the internet works best. Traditional surveys can capture only a tiny segment of this range of audience participation and engagement.

So what does the intersection of kittenwar.com and Wikipedia suggest to a researcher who wants to design a 21st-century, web-native survey? Salganik’s site, www.allourideas.org, illustrates one solution: a model that takes advantage of the most essential quality of the World Wide Web, where, according to Salganik, “an unimaginable scale and granularity of data can be collected from day to day life.” The allourideas.org project, funded in part by Google.com and the Center for Information Technology Policy at Princeton University (CITP), uses the same “bottom-up” approach as kittenwar.com, paired with an algorithm developed by Salganik and his team, which consists of a single web developer and several student researchers. The result is an open-source system where “any group, anywhere, can create their own wiki survey.”

Salganik describes the www.allourideas.org website as an “idea marketplace,” designed to harvest the full amount of information that people are willing to provide on any given topic. Participants in a survey on the site are presented with a random pair of options and pick the one they favor; they are then given a second, different pair and vote again. Eventually, the most popular ideas, whether provided by the survey author(s) or submitted by any person voting on the site, can be quickly identified.
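The voting loop described above can be sketched in a few lines of Python. This is only an illustration, not the site’s actual code: the talk does not detail the algorithm Salganik’s team developed, so the sketch ranks ideas by a plain win rate, and the `PairwisePoll` class, its method names, and the sample ideas are all hypothetical.

```python
import random
from collections import defaultdict


class PairwisePoll:
    """Toy pairwise poll: show two ideas, record the winner, rank by win rate.

    Hypothetical sketch; the scoring rule is an assumption, not the
    algorithm used by allourideas.org.
    """

    def __init__(self, ideas, seed=None):
        self.ideas = list(ideas)
        self.wins = defaultdict(int)         # idea -> number of votes won
        self.appearances = defaultdict(int)  # idea -> number of votes it appeared in
        self.rng = random.Random(seed)

    def add_idea(self, idea):
        """Any voter may contribute a new idea to the pool."""
        if idea not in self.ideas:
            self.ideas.append(idea)

    def next_pair(self):
        """Return two distinct ideas chosen uniformly at random."""
        return tuple(self.rng.sample(self.ideas, 2))

    def vote(self, winner, loser):
        """Record one vote: `winner` was preferred over `loser`."""
        self.wins[winner] += 1
        self.appearances[winner] += 1
        self.appearances[loser] += 1

    def score(self, idea):
        """Fraction of its appearances that the idea won (0.0 if never shown)."""
        shown = self.appearances[idea]
        return self.wins[idea] / shown if shown else 0.0

    def ranking(self):
        """Ideas ordered from highest to lowest win rate."""
        return sorted(self.ideas, key=self.score, reverse=True)


# Example: two seed ideas from the survey authors, one contributed by a voter.
poll = PairwisePoll(["later library hours", "more bike racks"], seed=1)
poll.add_idea("cheaper laundry")
poll.vote("cheaper laundry", "more bike racks")
poll.vote("cheaper laundry", "later library hours")
poll.vote("later library hours", "more bike racks")
print(poll.ranking())
```

A raw win rate is the simplest possible scoring rule; it treats every pairing as equally informative, which a production system would likely want to refine.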

 


The homepage of www.AllOurIdeas.org

 

An early version of the site was developed for the Undergraduate Student Government (USG) at Princeton, as a mechanism to assess the most important campus issues according to Princeton students. Voting began with ideas submitted by leaders in the USG, with additional suggestions submitted by students participating in the polling. In the end, two of the top five ideas that emerged as the most important to the student population were contributed by student voters, and were not among the ideas originally suggested by the USG. The level of participation in the poll was also remarkable: 40% of the undergraduate population took part, casting nearly 40,000 votes on paired ideas and contributing 100 new ideas not thought of by the original authors of the survey. Salganik and his team concluded that using this survey tool with an audience that is already engaged in the issues being presented can greatly improve the quality of the data generated. “In the old survey method,” Salganik explained, “tons of data are left on the table.” New methods of data collection, such as allourideas.org, are by contrast inclusive, built from the bottom up, and reflect the effort, interest, and participation that engaged respondents are willing to contribute to the discussion.

Since its public release, www.allourideas.org has generated 700 new idea marketplaces and 6,000 new ideas, uploaded over the course of 400,000 votes. Users of the free web-hosted interface include Columbia University Law School, The Washington Post, and the New York City Department of Parks. Anyone with a few ideas and a target audience willing to provide feedback can make their own space for collecting and prioritizing ideas on the allourideas.org site. Results are returned to the survey authors with full transparency, including some basic demographics about the geographic location of voters, the length of participation in each individual voting session, and the pair of choices at which a participant leaves the voting. (Salganik explained that leaving a session is sometimes indicative of the voter’s perception that their only choice is between two bad ideas, although in other cases, voters leave because they feel they’ve voted enough.) Voting is anonymous, and voters are encouraged to return to vote as often as they wish.

Salganik described some of the mechanics used to keep the voting fresh and current, such as showing recently submitted ideas more frequently in the polling to give them equal footing with older ideas. The polling mechanism is designed to handle a very large number of ideas, and the more people voting, the better the results. In future releases of the code, idea pairs might even adapt to prior choices made by an individual voter. It’s important to the success of such a binary voting system, explained Salganik, that voters don’t know previous results, because that ignorance avoids the mentality of the flash opinion. The ideal group size for polling is at least 20 people, although any number of respondents can be accommodated. The poll currently being conducted by The Washington Post on reader feedback and participation is the largest to date on the site. At the time of this Lunch ‘n Learn, the poll had been open for 3 days, and had already generated more than 40,000 votes.
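One way to give newer ideas more frequent appearances is to weight the random choice of each pair by how rarely an idea has been shown so far. The sketch below is a guess at the general idea rather than the site’s actual mechanism: the talk does not specify a weighting rule, so the `1 / (1 + appearances)` weight, the `pick_pair` function, and the sample idea names are all assumptions made for illustration.

```python
import random


def pick_pair(appearances, rng=random):
    """Pick two distinct ideas, favoring those shown least often.

    `appearances` maps each idea to the number of times it has been shown.
    The weight 1 / (1 + appearances) is an assumed rule for illustration,
    not the rule used by allourideas.org.
    """
    if len(appearances) < 2:
        raise ValueError("need at least two ideas to form a pair")

    ideas = list(appearances)
    weights = [1.0 / (1 + appearances[idea]) for idea in ideas]
    first = rng.choices(ideas, weights=weights, k=1)[0]

    # Draw the second idea from the remaining ones, with the same weighting.
    rest = [idea for idea in ideas if idea != first]
    rest_weights = [1.0 / (1 + appearances[idea]) for idea in rest]
    second = rng.choices(rest, weights=rest_weights, k=1)[0]
    return first, second


# Example: a brand-new idea competes with two ideas that have been shown often.
shown = {"older idea A": 500, "older idea B": 500, "new idea": 0}
print(pick_pair(shown, random.Random(42)))
```

With weights like these, a newly submitted idea appears in almost every pair until its appearance count catches up; a gentler rule would trade faster catch-up for more variety in what voters see.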

The concept behind www.allourideas.org consists of a few basic characteristics. The site is simple. It’s powerful. It’s free. It’s also constantly improving. It proves, Salganik concluded, that when information is presented and gathered properly, there is wisdom, rather than madness, in the opinions of the crowd – and there needn’t be a cute kitten anywhere in sight.

Free “idea marketplaces” can be created by anyone on the hosted site at www.allourideas.org. If you are interested in creating a site, come prepared with a target audience and a few ideas in mind — then invite your audience to begin voting and contributing their own ideas.

allourideas.org is also an open-source project. The code is available at github.com. You can also follow the project on Twitter and on Facebook.
