Lunch & Learn: Bottom-up social data collection with allourideas.org and Matthew Salganik


How cute is this kitten? Let’s vote!
(Photo: morguefile, courtesy hotblack)

In this week’s Lunch ’n Learn on Wednesday, December 1st, Matthew Salganik, an Assistant Professor in Princeton’s Department of Sociology, presented recent research that led to the creation of an open-source polling site, www.allourideas.org. One of the inspirations for Salganik’s project came from an unlikely source: the popular website www.kittenwar.com, where visitors vote on which of two randomly paired kitten photos is cuter. By offering two competing choices (in this case, photos of two cute kittens), the site rapidly gathers user opinions in a way that makes social signals easy to track. It uses a fun mechanism for gathering information and lets any user easily upload his or her own kitten photos, instantly entering new contestants into the competitive arena of cuteness.

Considering the popularity and broad appeal of the kittenwar site, Salganik reflected on the standard forms of data collection that have been (and still are) commonly used for gathering information in the social sciences. For many researchers, collecting information from the general population depends on survey mechanisms that have changed little in the last century. In this traditional method of data gathering, researchers decide on the questions they want to ask well before receiving any feedback from the actual survey. Participants either complete the entire survey and have their opinions included, or their responses are left out altogether, since partial data is rarely considered valid for the final results. Although the mechanism for conducting surveys evolved during the 20th century from face-to-face, door-to-door polling to random phone calls and then to web-based research, this model of assessment has several unavoidable shortcomings. For example, one might ask, “What important questions might the original survey have missed?” or, “How can the final interpretation of data be made more transparent to other researchers?” Focus groups and other open-discussion methods allow more flexibility in gathering input from respondents, and can reveal why respondents make certain choices, but they tend to be slow, expensive, and difficult to quantify. Most significantly, all of these approaches rest on the same methodology as the face-to-face survey; they are merely delivered through increasingly up-to-date and scalable channels. Web-based surveys admittedly reach many more people with far less overhead than door-to-door canvassing did, but do such computer-based surveys really take advantage of the unique strengths of the World Wide Web? Kittenwar.com suggested to Salganik that there was another, more intuitive way to present ideas and gather data on the web.

Using Wikipedia.org as an example, Salganik remarked on the internet’s strength in engaging people at their own level of interest. Wikipedia, he said, has become an unparalleled information aggregation system because it is able to harvest the full amount of information that people are willing to contribute to the site. Salganik described this phenomenon as “the Fat Head vs. the Long Tail”: Wikipedia makes it possible to gather knowledge from people with vastly different levels of commitment to improving the site. On one hand, there are those (fat heads) willing to spend days or months carefully researching and crafting entire Wikipedia entries; on the other, there are those (long tails) content to insert a missing comma into an entry they happen to be reading at the moment. As such, Wikipedia.org is an example of what can be achieved by an application that truly understands how the internet works best. Traditional surveys can capture only a tiny segment of this range of audience participation and engagement.

So what does the intersection of kittenwar.com and Wikipedia suggest to a researcher who wants to design a 21st-century, web-native survey? Salganik’s site, www.allourideas.org, illustrates one solution: a model that takes advantage of the most essential quality of the World Wide Web, where, according to Salganik, “an unimaginable scale and granularity of data can be collected from day to day life.” Funded in part by Google and the Center for Information Technology Policy at Princeton University (CITP), allourideas.org uses the same “bottom-up” approach as kittenwar.com, paired with an algorithm developed by Salganik and his team (a single web developer and several student researchers). The result is an open-source system where “any group, anywhere, can create their own wiki survey.”

Salganik describes www.allourideas.org as an “idea marketplace,” designed to harvest the full amount of information that people are willing to provide on any given topic. Participants in a survey on the site are presented with a random pair of options and pick the one they favor; they are then given a second pair of different options and vote again. As votes accumulate, the most popular ideas, whether provided by the survey author(s) or submitted by anyone voting on the site, can be quickly identified.
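The pairwise voting loop can be illustrated with a small simulation. This is a minimal Python sketch, not the site’s actual algorithm: the talk did not detail how allourideas.org scores ideas, so ranking by raw win rate (the share of its pairings an idea wins) is an assumption used here as a simple stand-in. The function names, idea labels, `appeal` scores, and simulated voters who always choose the more appealing idea are all hypothetical.

```python
import random
from collections import defaultdict


def random_pair(ideas, rng):
    """Draw two distinct ideas uniformly at random."""
    first, second = rng.sample(ideas, 2)
    return first, second


def record_vote(wins, appearances, winner, loser):
    """Update the tallies after a voter picks `winner` over `loser`."""
    wins[winner] += 1
    appearances[winner] += 1
    appearances[loser] += 1


def rank_by_win_rate(wins, appearances):
    """Rank shown ideas by the share of their pairings they won (assumed score)."""
    return sorted(appearances, key=lambda idea: wins[idea] / appearances[idea], reverse=True)


def simulate(ideas, appeal, n_votes, seed=0):
    """Run `n_votes` hypothetical votes; each voter picks the higher-appeal idea."""
    rng = random.Random(seed)
    wins, appearances = defaultdict(int), defaultdict(int)
    for _ in range(n_votes):
        a, b = random_pair(ideas, rng)
        winner, loser = (a, b) if appeal[a] >= appeal[b] else (b, a)
        record_vote(wins, appearances, winner, loser)
    return rank_by_win_rate(wins, appearances)


if __name__ == "__main__":
    # Hypothetical ideas and appeal scores, for illustration only.
    ideas = ["idea A", "idea B", "idea C", "idea D"]
    appeal = {"idea A": 0.9, "idea B": 0.7, "idea C": 0.4, "idea D": 0.2}
    print(simulate(ideas, appeal, n_votes=1000))
```

Because every pairing is drawn uniformly at random, an idea’s win rate estimates how often it beats a randomly chosen competitor. Pairwise-comparison models such as Bradley–Terry are a common, more robust alternative when some ideas have been shown only a few times.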

The homepage of www.AllOurIdeas.org

An early version of the site was developed for Princeton’s Undergraduate Student Government (USG) as a mechanism for assessing which campus issues Princeton students considered most important. Voting began with ideas submitted by USG leaders, with additional suggestions submitted by students participating in the poll. In the end, two of the top five ideas were contributed by student voters and were not among the ideas originally suggested by the USG. Participation was also remarkable: 40% of the undergraduate population took part, casting nearly 40,000 votes on paired ideas and generating 100 new ideas not thought of by the survey’s original authors. Salganik and his team concluded that using this survey tool with an audience already engaged in the issues being presented can add a great deal of quality to the data generated. “In the old survey method,” Salganik explained, “tons of data are left on the table.” New methods of data collection such as allourideas.org are, by contrast, inclusive and bottom-up, and they reflect the effort, interest, and participation that engaged respondents are willing to contribute to the discussion.

Since its public release, www.allourideas.org has generated 700 new idea marketplaces and 6,000 new ideas, uploaded over the course of 400,000 votes. Users of the free web-hosted interface include Columbia University Law School, The Washington Post, and the New York City Department of Parks. Anyone with a few ideas and a target audience willing to provide feedback can create their own space for collecting and prioritizing ideas on the allourideas.org site. Results are returned to the survey authors with full transparency, including some basic demographic information about voters’ geographic locations, the length of each individual voting session, and the pair of choices at which a participant leaves the voting. (Salganik explained that leaving a session sometimes indicates the voter’s perception that the only choice is between two bad ideas, although in other cases voters leave simply because they feel they’ve voted enough.) Voting is anonymous, and voters are encouraged to return and vote as often as they wish.

Salganik described some of the mechanics used to keep the voting fresh and current, such as showing recently submitted ideas more frequently so that they stand on equal footing with older ideas. The polling mechanism is designed to handle a very large number of ideas, and the more people vote, the better the results. In future releases of the code, idea pairs might even adapt to an individual voter’s prior choices. It’s important to the success of such a binary voting system, Salganik explained, that voters don’t know previous results, because that ignorance guards against a flash-opinion mentality. Polling works best with a group of at least 20 people, although any number of respondents can be accommodated. The poll currently being conducted by The Washington Post on reader feedback and participation is the largest to date on the site: at the time of this Lunch ’n Learn, it had been open for three days and had already generated more than 40,000 votes.
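The catch-up weighting for newly submitted ideas can be sketched as biased pair selection. Salganik did not give the exact rule, so the inverse-exposure weight used here, 1 / (1 + appearances), is an assumption chosen for illustration, as are the function names and the example exposure counts.

```python
import random


def catch_up_weights(ideas, appearances):
    """Assumed rule: weight = 1 / (1 + times already shown).

    Newly submitted ideas (few appearances) get larger weights, so they are
    drawn more often until their exposure catches up with older ideas.
    """
    return [1.0 / (1 + appearances.get(idea, 0)) for idea in ideas]


def weighted_pair(ideas, appearances, rng):
    """Draw two distinct ideas, favoring under-exposed ones."""
    pool = list(ideas)
    first = rng.choices(pool, weights=catch_up_weights(pool, appearances), k=1)[0]
    pool.remove(first)  # sample the second idea without replacement
    second = rng.choices(pool, weights=catch_up_weights(pool, appearances), k=1)[0]
    return first, second


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical exposure counts: three older ideas plus one just submitted.
    appearances = {"old idea 1": 500, "old idea 2": 480, "old idea 3": 510}
    ideas = list(appearances) + ["new idea"]
    shown = {idea: 0 for idea in ideas}
    for _ in range(1000):
        for idea in weighted_pair(ideas, appearances, rng):
            shown[idea] += 1
            appearances[idea] = appearances.get(idea, 0) + 1
    print(shown)
```

Under this rule, a just-submitted idea with zero appearances is drawn far more often than ideas already shown hundreds of times, and its advantage fades as its exposure approaches theirs.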

The concept behind www.allourideas.org comes down to a few basic characteristics. The site is simple. It’s powerful. It’s free. It’s also constantly improving. It proves, Salganik concluded, that when information is presented and gathered properly, there is wisdom, rather than madness, in the opinions of the crowd, and there needn’t be a cute kitten anywhere in sight.

Free “idea marketplaces” can be created by anyone on the hosted site at www.allourideas.org. If you are interested in creating one, come prepared with a target audience and a few ideas in mind, then invite your audience to start voting and contributing their own ideas.

allourideas.org is also an open-source project. The code is available at github.com, and you can follow the project on Twitter and Facebook.

This entry was posted in Lunch & Learn, Training and Outreach.