Popularity of Data Analysis Software

Robert A. Muenchen wrote a useful book, R for SAS and SPSS Users. He also maintains a regularly updated blog entry that presents various ways of measuring the popularity or market share of data analysis software such as R, SAS, Stata, and SPSS. I think it is quite an informative read. Figure 1 is especially striking!

Posted in Data Management, Software

State Data Centers

As a Pop­u­la­tion Research Librar­ian, I some­times field ques­tions about cen­sus data for indi­vid­ual states. Rather than wad­ing through the Cen­sus Bureau page to find detailed data and pro­jec­tions on the state and local level, I often go directly to a State Data Cen­ter site. Every state has one and they are part of the Cen­sus Bureau’s State Data Cen­ter Net­work. Basi­cally, these Cen­ters pull out the rel­e­vant data for their state and make it eas­ily acces­si­ble. I find the staff to be very pro­fes­sional and will­ing to answer ques­tions by phone, at least in New Jer­sey. So…if you live in New Jer­sey and would like to know how much more densely pop­u­lated your town was in 2010 com­pared to 2000, go to the New Jer­sey State Data Cen­ter and scroll down to the “Pop­u­la­tion Den­sity by Munic­i­pal­ity: 2000 & 2010” table. Have fun explor­ing!

Posted in U.S. Census

Data Life Cycle

I have been col­lect­ing dia­grams depict­ing the con­cept of Data Life Cycle (or Data Life­cy­cle) for some time. Merriam-Webster’s online dic­tio­nary (2012) defines “life cycle” (the third mean­ing) as: “a series of stages through which some­thing passes dur­ing its lifetime.”

Data have a life cycle as well, according to the Data Sharing for Demographic Research (DSDR) (2005) booklet, Guide to Social Science Data Preparation and Archiving – Best Practice Throughout the Data Life Cycle, 3rd ed. Its diagram illustrates “the key considerations germane to archiving at each step in the data creation process” (DSDR 2005: vii). This was where I first learned about the concept.

Later on, I encountered an earlier Data Life Cycle diagram in a 2002 publication of the Consultative Committee for Space Data Systems (CCSDS 2002: 4–1). It nicely depicts the functional entities and their relationships.

The diagram by Thomas et al. (2005) was the first Data Life Cycle diagram I saw that featured any sort of feedback loop. It comes from the DDI 3.0 Structural Reform Group report.

Speaking of feedback, Humphrey’s (2006) diagram is not strictly about the Data Life Cycle, but about (empirical) knowledge creation, seen from the “data” angle.

Of course, if you run a project that collects lots of data, then the data management process repeats itself, forming a natural circle, as in this figure depicting the database management tasks of the Survey of Health, Ageing and Retirement in Europe (SHARE) (Hunkler et al. 2011).

Or, if you take a more abstract viewpoint, the Data Life Cycle becomes a complicated human endeavor, as the Interagency Working Group on Digital Data (IWGDD) (2009) expresses in this visual model.

If we focus on a specific setting where data are collected and “consumed,” then one of the appendices of the Vanderbilt University Medical Center Informatics Strategic Plan (VUMC 2005) has an interesting diagram that features four(!) circles.

I am not the first to collect such diagrams. Here is an International Association for Social Science Information Services & Technology (IASSIST) blog entry on the Digital Life Cycle, published in 2006.

If you are interested in how researchers actually interact with information, the Research Information Network (RIN) and the British Library published a report annex in 2009 titled Patterns of Information Use and Exchange. Here is one of the case studies. Arrows in the picture indicate relationships, as in “A may lead to B.” Different colors represent different types of activities, adapted from Humphrey (2006), mentioned above.

Thus, data are created. Do they live forever? No. Data do die. Michener et al. (1997) depict the normal degradation of data and metadata over time.

Representing DataONE, Michener (2011) makes the Data Life Cycle come alive either by adding stakeholders’ needs or by associating each phase with appropriate available tools.

The latest Data Life Cycle diagram I have come across so far brings us back to the beginning. In its fifth edition, ICPSR’s (2012) booklet features a Data Life Cycle diagram that is severely bent to form an almost-circle, although it is not quite closed. What were seven steps have now morphed into six phases.

Ref­er­ences

Con­sul­ta­tive Com­mit­tee for Space Data Sys­tems (CCSDS) (2002) “Rec­om­men­da­tion for Space Data Sys­tem Stan­dards. Ref­er­ence Model for an Open Archival Infor­ma­tion Sys­tem (OAIS)” CCSDS 650.0-B-1 BLUE BOOK. Avail­able here. Accessed on Jan 9, 2012.

Data Shar­ing for Demo­graphic Research (DSDR)  (2005) Guide to Social Sci­ence Data Prepa­ra­tion and Archiv­ing – Best Prac­tice Through­out the Data Life Cycle. Third Ed. ICPSR. Uni­ver­sity of Michi­gan: Ann Arbor, MI. Avail­able here. Accessed on Feb 2012.

Humphrey, Charles (2006). “e-Science and the Life Cycle of Research” Doc­u­ment avail­able here. Accessed on Jan 9, 2012.

Hun­kler, C., T. Kneip, J. Korb­macher, S. Stuck, and S. Zuber. (2011). Glimps­ing into the black­box: Data man­ag­ing and clean­ing processes. In: M. Schröder. Ret­ro­spec­tive Data Col­lec­tion in the Sur­vey of Health, Age­ing and Retire­ment in Europe. SHARELIFE Method­ol­ogy,  Mannheim: MEA. Avail­able here. Accessed on Feb 2012.

Inter­a­gency Work­ing Group on Dig­i­tal Data (IWGDD) (2009) “Appen­dix B. Dig­i­tal Data Life Cycle.” In Har­ness­ing the Power of Dig­i­tal Data for Sci­ence and Soci­ety. Report of the Inter­a­gency Work­ing Group on Dig­i­tal Data to the Com­mit­tee on Sci­ence of the National Sci­ence and Tech­nol­ogy Coun­cil. Avail­able here. Accessed on Jan 9, 2012.

Inter-University Con­sor­tium for Polit­i­cal and Social Research (ICPSR). (2012). Guide to Social Sci­ence Data Prepa­ra­tion and Archiv­ing: Best Prac­tice Through­out the Data Life Cycle (5th ed). Ann Arbor, MI. Avail­able here. Accessed on Mar 9, 2012.

International Association for Social Science Information Services & Technology (IASSIST). (2006). “Conceptualizing the Digital Life Cycle.” A blog entry submitted by Ann. Available here. Accessed on Mar 9, 2012.

Merriam-Webster Online Dic­tio­nary. (2012) “life cycle”. Avail­able here. Accessed on March 9, 2012.

Michener, William, James Brunt, John Helly, Thomas Kirchner, and Susan Stafford. (1997). “Nongeospatial Metadata for the Ecological Sciences.” Ecological Applications, Vol. 7, No. 1, pp. 330–342.

Mich­ener, William. (2011) “DataONE (Obser­va­tion Net­work for Earth): Enabling New Sci­ence by Sup­port­ing the Man­age­ment of Data Through­out its Life Cycle” A pre­sen­ta­tion at Work­shop on Research Data Life­cy­cle Man­age­ment (RDLM 2011). July 18–20, 2011. Prince­ton Uni­ver­sity, Prince­ton, NJ.

Thomas, Wendy, Aro­fan Gre­gory, Tom Piazza. (2005) “Struc­tural Reform Group Report.” Pre­sen­ta­tion. Part of a ses­sion on “An Inside View of DDI Ver­sion 3.0,” Chair Jostein Rys­se­vik. Avail­able here. Accessed on Jan 9, 2012.

Van­der­bilt Uni­ver­sity Med­ical Cen­ter (VUMC). (2005). Strate­gic Plan for VUMC Infor­mat­ics & Roadmap to 2010. Avail­able here. Accessed on Mar 9, 2012.

Posted in Data Management

Let’s Keep the Virtuous Circle Going

One way to understand the concept of the “Data Life Cycle” is to realize that there is a virtuous circle between data and research findings: new data beget new findings, and the new findings prompt new data collection, all the while deepening our understanding and enriching our knowledge.

This was the image that came to my mind when I read emails from the Mex­i­can Migra­tion Project (MMP) and the Latin Amer­i­can Migra­tion Project (LAMP) ask­ing users to post the pub­li­ca­tions based on their data. Have you done research using either project data? Then please help the projects con­tinue to col­lect and share impor­tant migra­tion data by adding your pub­li­ca­tion to the list. Visit the project pub­li­ca­tions page (MMP, LAMP) today, and let’s keep the good cir­cle going!

Note: The man­ager of the MMP project informed me that MMP has an infor­ma­tive Fre­quently Asked Ques­tions (FAQ) page avail­able for the data users. Cool!

Posted in Data

Firm Data User Support for the Fragile Families Study

For nearly 15 years now, researchers at Princeton and Columbia Universities have been collecting data from 20 cities on the lifestyles, health, and wellbeing of unmarried parents and their children. This ongoing project, known as the Fragile Families and Child Wellbeing Study, began by interviewing parents when their children were born and has continued with follow-up interviews at their children’s first, third, fifth, and ninth birthdays. Researchers are now making preparations for the 15-year wave to examine adolescent wellbeing, behavior, and peer influence, and to follow up once again with parents.

As the Frag­ile Fam­i­lies study con­tin­ues mov­ing for­ward, new find­ings are con­stantly emerg­ing from the data. The Future of Chil­dren pub­lished a vol­ume on Frag­ile Fam­i­lies sum­ma­riz­ing many of these find­ings. In addi­tion, hun­dreds of pub­li­ca­tions, includ­ing jour­nal arti­cles, books and book chap­ters, work­ing papers, and research briefs have been made avail­able for easy access on the Frag­ile Fam­i­lies pub­li­ca­tions web­site. Top­ics include fam­ily struc­ture, employ­ment and earn­ings, incar­cer­a­tion, child care, men­tal health and stress, par­ent­ing, rela­tion­ship qual­ity, race/ethnicity and nativ­ity, reli­gion, and much more. Keep your eyes peeled as these pub­li­ca­tions reveal the newest find­ings from the 9-year wave.

Researchers and Fragile Families staff members at Princeton seek to maximize the use of the rich data that have come out of the Study by making data files available for public use. Novice and experienced data users can email the FFDATA team with questions about the Study and receive help with downloading and using the various files. They can also inquire about the three-day Fragile Families data users’ workshop that will be held in July at Columbia University.

Note: Chang would like to thank the FFDATA team members for the interesting and enlightening conversation on how much they do in order to support their data users. The study recently released the 9-year wave public-use data through the OPR Data Archive.

Posted in Data

Demography Volume 48

All four issues of Demography volume 48 (2011) are now published. As a data person, I am interested in the data the authors used for their research. Below is my attempt at summarizing what I found by reading the data section of every article, except five items that did not directly rely on analyzing data (e.g., a correction, an acknowledgement, and the index).

Some of the data/project ini­tials and short names are more famil­iar to the typ­i­cal data user while oth­ers are less so. Click on the name to learn more about them. I am plan­ning to fol­low up with all the data web­sites men­tioned here for updates and new releases.

This list was gen­er­ated by run­ning a sim­ple XQuery, which read three XML input files (articles.xml, articlesData.xml, and data.xml) and gen­er­ated an HTML DIV ele­ment. The data names are linked to the most closely related web­site, and the authors are linked to the arti­cle itself via its DOI. The files are avail­able for down­load­ing here (a zip file).
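For readers who do not use XQuery, the same three-file join can be sketched in Python. This is only an illustration of the idea, not the query that produced the list: the element names, attributes, DOIs, and URLs below are all invented, and the real articles.xml, articlesData.xml, and data.xml may be structured quite differently.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stand-ins for articles.xml, articlesData.xml, and data.xml.
# Every element name, attribute, DOI, and URL here is invented for illustration.
ARTICLES_XML = """<articles>
  <article id="a1" authors="Author A" doi="10.0000/example-1"/>
  <article id="a2" authors="Author B and Author C" doi="10.0000/example-2"/>
</articles>"""

ARTICLES_DATA_XML = """<articlesData>
  <uses article="a1" data="d1"/>
  <uses article="a2" data="d1"/>
  <uses article="a2" data="d2"/>
</articlesData>"""

DATA_XML = """<data>
  <dataset id="d2" name="Dataset Y" url="https://example.org/dataset-y"/>
  <dataset id="d1" name="Dataset X" url="https://example.org/dataset-x"/>
</data>"""


def build_div(articles_xml, articles_data_xml, data_xml):
    """Join the three inputs and return one HTML DIV as a string."""
    # Index articles by id so the link file can be resolved quickly.
    articles = {a.get("id"): a for a in ET.fromstring(articles_xml).iter("article")}

    # Group article ids under the data source they use.
    articles_by_data = defaultdict(list)
    for link in ET.fromstring(articles_data_xml).iter("uses"):
        articles_by_data[link.get("data")].append(link.get("article"))

    div = ET.Element("div")
    datasets = sorted(ET.fromstring(data_xml).iter("dataset"),
                      key=lambda d: d.get("name"))
    for dataset in datasets:
        paragraph = ET.SubElement(div, "p")
        # The data name links to its most closely related website.
        name_link = ET.SubElement(paragraph, "a", href=dataset.get("url"))
        name_link.text = dataset.get("name")
        name_link.tail = " "
        article_ids = articles_by_data[dataset.get("id")]
        for position, article_id in enumerate(article_ids):
            article = articles[article_id]
            # The authors link to the article itself via its DOI.
            author_link = ET.SubElement(
                paragraph, "a", href="https://doi.org/" + article.get("doi"))
            author_link.text = article.get("authors")
            if position < len(article_ids) - 1:
                author_link.tail = ", "
    return ET.tostring(div, encoding="unicode")


html_div = build_div(ARTICLES_XML, ARTICLES_DATA_XML, DATA_XML)
print(html_div)
```

Keeping the article-to-data links in a file of their own means an article that uses several data sources, or a data source used by several articles, is recorded without repeating any author or website details.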

Feel free to let me know if you find any errors or broken links. Thanks!

  • ACS: Parrado
  • Add Health: Scharoun-Lee et al., Kusunoki and Upchurch
  • BRFSS: Bratter and Gorman
  • CAPS: Magruder
  • CAS: Swaroop and Krysan
  • CE: Zagheni
  • CLHLS: Wen and Gu
  • CPS: DeLeire et al., Parrado
  • CPS Report Card: Swaroop and Krysan
  • Chilean Birth Certificates: Torche
  • China Statistical Yearbook: Goodkind
  • Chinese Census: Ebenstein, Li et al.
  • Chitwan Valley Family Study: Bohra-Mishra and Massey
  • DHS: Bocquier et al., Case and Paxson, Magruder
  • DTR: van den Berg et al., Behrman et al.
  • Demographic Yearbook: Zheng et al.
  • ELSA: Ploubidis and Grundy, Delavande and Rohwedder
  • ENADID: Rendall et al. (1049–1058)
  • ENOE: Rendall et al. (1049–1058)
  • Ethiopian Field Experiment Data: Desai and Tarozzi
  • FAOSTAT: Lam
  • Fragile Families: Geller et al., Corman et al.
  • GGS: Perelli-Harris and Gerber
  • GSS: Wolfinger
  • HMD: Shkolnikov et al.
  • HRS: Delavande and Rohwedder
  • IFLS: Kuhn et al.
  • INSEC: Bohra-Mishra and Massey
  • IPUMS-I: Lam
  • KIDS: Cancian et al.
  • Kenyan RHC Data: Luke et al.
  • Kyrgyzstan Data: Guillot et al.
  • LPR: Behrman et al.
  • Mexican Census: Halpern-Manners
  • Mozambican Survey: Agadjanian et al.
  • NCDB: South et al.
  • NELS:88: Stange
  • NFHS: Gaudin
  • NHANES: Johnston and Lee
  • NHIS: Fuller
  • NIS: Xie and Gough
  • NLSY79: Barber and East, Dariotis et al., Brand and Davis, Vespa and Painter
  • NNCS: Swaroop and Krysan
  • NSFG: Parrado, Axinn et al., Magruder, Guzzo and Hayford
  • NUJLSOA: Takagi and Silverstein
  • NVSS: DeLeire et al.
  • OECD.Stat Extracts: Shkolnikov et al.
  • PETS: Stange
  • PSID: South et al., Grieger and Danziger
  • PUMS: Elo et al., Thomas
  • Registry Database by Statistics Norway: Kalil et al.
  • SABE: Maurer
  • SATP: Bohra-Mishra and Massey
  • SHARE: Delavande and Rohwedder
  • SIPP: Rendall et al. (481–506)
  • SSD: Zorlu and Mulder
  • Simulation: Ceballos, Diaz et al.
  • U.S. Census: McDaniel et al., Swaroop and Krysan
  • UN Data: Lam, Espenshade et al.
  • US Supreme Court Data: Stolzenberg
  • Virginia 30k: Boardman et al.
  • WDI: Zheng et al.
  • WHO Data and Statistics: Shkolnikov et al., Espenshade et al.
  • WHO Mortality Database: Rostron and Wilmoth
  • WHS: Pampel and Denney
  • WIID2: Shkolnikov et al.
  • World Bank Data: Shkolnikov et al., Lam
  • World Population Prospects: Alkema et al.

Posted in Data

2010 Census Data vs. American Community Survey

Until 2010, the U.S. conducted a decennial census consisting of a short form, completed by everyone, and a longer, more extensive form, completed by certain households, so that information about many variables was available for almost all geographies for the decennial year. As a result, we had a lot of information for the decennial year and very little for the years in between. With the advent of the American Community Survey (ACS), that has changed.

What is the ACS? It is an ongoing nationwide survey that replaces the long form. It does not actually “count” the population, but it does give information about the same variables that were available from the decennial census, averaged into 1-year, 3-year, or 5-year estimates (periods of time rather than a point in time). This means we have more information about the years between censuses, but less detail about the decennial year itself.

It will take some time to adjust to this new way of look­ing at cen­sus data and it helps to keep these impor­tant tips in mind:

  • Given the differences between the ACS and the decennial census, comparing data from the two sources is not recommended. The only valid comparison is between the 2010 short-form data and the short-form data from previous decennial censuses.
  • ACS data can be compared to ACS data. Best practice is to compare only 1-year estimates with other 1-year estimates, 3-year estimates with other 3-year estimates, and 5-year estimates with other 5-year estimates, and the time periods should not overlap. For example, comparing data from 2005–2007 with 2006–2008 is not recommended, but it is OK to compare 2005–2007 with 2008–2010.
  • Due to the nature of survey data and the sample sizes, data for the smallest geographies may only be available for the 5-year estimates.
  • Label your ACS data correctly: “2005–2007 ACS data,” not “2005,” “2006,” or “2007.”
  • Most impor­tant: pay atten­tion to “Sam­pling Errors,” espe­cially to “Mar­gin of Error,” which is pre­sented with the data.
  • Need help? Con­tact a librar­ian.
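
The comparison and labeling tips above can be sketched as a short Python check. This is a minimal illustration, not an official Census Bureau tool: the class and function names are hypothetical, the numbers in the usage note are made up, and the significance test assumes the margins of error are published at the 90 percent confidence level (so the standard error is the margin of error divided by 1.645) and that the two estimates are independent.

```python
import math
from dataclasses import dataclass

Z_90 = 1.645  # critical value for the 90 percent confidence level


@dataclass(frozen=True)
class AcsEstimate:
    """One ACS period estimate (hypothetical container for illustration)."""
    start_year: int
    end_year: int
    value: float
    moe: float  # margin of error as published with the estimate

    @property
    def span(self):
        """Number of years pooled into the estimate (1, 3, or 5)."""
        return self.end_year - self.start_year + 1

    def label(self):
        """Label the estimate by its full period, never by a single year of it."""
        if self.span == 1:
            return f"{self.start_year} ACS data"
        return f"{self.start_year}–{self.end_year} ACS data"


def comparable(a, b):
    """True if both estimates cover periods of equal length that do not overlap."""
    same_length = a.span == b.span
    no_overlap = a.end_year < b.start_year or b.end_year < a.start_year
    return same_length and no_overlap


def significantly_different(a, b):
    """Test the difference between two comparable estimates at the 90 percent level."""
    if not comparable(a, b):
        raise ValueError(f"{a.label()} and {b.label()} should not be compared")
    se_a = a.moe / Z_90
    se_b = b.moe / Z_90
    z = (a.value - b.value) / math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(z) > Z_90
```

With made-up numbers, `comparable(AcsEstimate(2005, 2007, 10000, 500), AcsEstimate(2006, 2008, 10400, 550))` is false because the periods overlap, while the same first estimate can be tested against a 2008–2010 estimate. A difference that looks large can still fail the test when the margins of error are wide, which is why the margin of error deserves the attention the tips call for.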

 

Posted in Data, U.S. Census

Data Analysis Training at Firestone Library

If you or your stu­dents need a primer or refresher on data analy­sis, Oscar Torres-Reyna, one of the data con­sul­tants at Fire­stone Library, offers free train­ing ses­sions on Fri­day after­noons.  Reg­is­tra­tion is requested. For gen­eral infor­ma­tion about Get­ting Started in Data Analy­sis, Oscar has a great web page.

Some of Oscar’s class offer­ings include:

  • Exploring data and descriptive statistics (Stata)
  • Exploring data and descriptive statistics (R)
  • Introduction to linear regression (Stata)
  • Introduction to panel data analysis (Stata)
  • Introduction to linear regression (R)
  • Introduction to panel data analysis (R)

Posted in Analysis

10,000

On the evening of Octo­ber the 3rd, the ten thou­sandth user* reg­is­tered to access the OPR Data Archive.

This is not as momen­tous as the world’s pop­u­la­tion reach­ing 7 bil­lion peo­ple, but it is a moment to cel­e­brate, nonethe­less. The user reg­is­tra­tion sys­tem went on-line in late 2003 when we had 76 users for the year. Since 2006, how­ever, the user list has grown by about 1,400 every year.

With a new pow­er­ful data­base engine and a com­pletely re-written web appli­ca­tion tak­ing advan­tage of the lat­est server tech­nol­ogy, the archive is capa­ble of serv­ing the cur­rent and future users reli­ably and rapidly.

In case you missed them, here are the most recent new and updated data releases:

  • Sur­vey of Unem­ployed Work­ers in New Jer­sey (NJUI)
  • The Mex­i­can Migra­tion Project (MMP) — 134 communities
  • The Frag­ile Fam­i­lies (FF) Year 9 Follow-up (Wave 5)

* User here means a dis­tinct email address. There may be peo­ple reg­is­tered with mul­ti­ple email addresses.

Posted in Data

More about Stata Manuals

If you are look­ing for Stata man­u­als in the Prince­ton Libraries, the best way to find them is to go to the Main Library Cat­a­log, put “Stata” (no quotes) in the “Search For” box, click on “Sub­ject Head­ing” in the “Search By” box and then click the “Search” but­ton. You will see two rel­e­vant headings:

  • Stata (choose the one with the most hits) or
  • Stata-Handbooks, Man­u­als, etc.

Once you click on those links, you will see a list of what is avail­able. Some are located in Stokes Library, some are in Firestone.

There is a heavy demand for Stata books, so the one you are seeking may be charged out. If so, you may submit a “Recall” notice requesting that it be returned within two weeks. (See the “Recall” button at the top of the Library Catalog screen.) Alternatively, you can check whether one of our Borrow Direct partner libraries has a copy available and get it from them. (See the “Borrow Direct” link at the top of the Library Catalog screen.) If you need any assistance with placing these requests, please feel free to contact Joann or another librarian and we’ll be happy to help you.

Also, if you have a suggestion for a Stata manual you would like Stokes Library to purchase, please send Joann an e-mail.

Posted in Software