Barbie: The Doll Who Will Live Forever?

Cultural commentators have had a lot to say about Greta Gerwig’s smash summer movie, but no one I’ve read has considered Gerwig and Baumbach’s clever script as a post-modern take on a classic doll story like Rachel Field’s Hitty: Her First One Hundred Years (1929). The first such story I know of was Richard Johnson’s The History of a Doll (ca. 1780). The heroine Charlotte could describe and comment upon her experiences to the reader, from being carved from a tree branch to passing through several owners’ hands. After surviving many accidents that required extensive restoration of her face and body, she was eventually burnt up in a fire. Her lack of agency is central to the action: appearing lifeless to her owners, she is as much at their mercy as if she were a servant or an enslaved person.

Barbie’s origins are more glorious than poor old Charlotte’s. The little girls on earth are caring for their baby dolls when they see her in a striped one-piece bathing suit descending from the heavens like a goddess. The little girls are so enchanted by the prospect of possessing a far more glamorous and empowering plaything that they immediately cast aside the baby dolls and heartlessly smash them to bits. A less violent version of this scenario with a fairly happy ending plays out in Brenda’s “Victoria-Bess,” in which a beautiful expensive doll rules the nursery until deposed by an even more fashionable French one. After her fickle, spoiled mistress orders the shabby former favorite thrown into the trash, a charitable relative rescues the humbled Victoria-Bess, who gratefully goes to a new owner, a poor girl recovering from surgery in the hospital.

Gerwig’s Barbie behaves less like a doll than an autonomous being that is not exactly human. While the first shot is of a Mattel doll, the subsequent footage features Margot Robbie, who flirtatiously lowers her shades and winks. What is that gesture supposed to mean? A signal not to overthink the ride on the hot pink roller coaster? But the cracks and inconsistencies reveal some interesting angles on her creator’s game.

After Barbie finds herself thinking about death and her feet flatten, she is urged to consult the oracular Weird Barbie, a victim of rough doll play, from whom she learns that there’s a patch of cellulite on her thigh (surely impossible on hard plastic) and her old owner must be messing with her. While the director acknowledges that doll play comprises savagery, she rollerblades around the possible plot implications of the Barbies being subject to the whims of Real World owners. If Weird Barbies constituted the underclass, then mobs of mangled, neglected dolls like the one led by the Bad Doll in Ian McEwan’s The Daydreamer might periodically roil Barbieland. If most girls’ nights were stopped dead by outbursts of existential angst, then the Barbies would all be in analysis and there would have to be a health care system. The truly flawless Barbies could only belong to collectors, museums or extremely meticulous kids. They would constitute the ruling class, which would disturb the benevolent, egalitarian administration of Barbieland’s vacuous perfection.

Without any memories of having been a child’s plaything, Stereotypical Barbie has to go to the Real World (Los Angeles, naturally) to seek the complete stranger who transferred anxieties to her and disrupted the rhythms of a rosy eternal now. Throughout her adventures, Barbie is not consistently perceived as either a doll or a human being: her status may fluctuate according to the situation, but her affect never changes. When she crosses the border into Venice Beach, she passes for human in spite of her outfit (which didn’t seem especially outré for La-La-Land), because she attracts attention from construction workers and a random bystander gives her shapely bottom a big smack. The Mattel suits have no trouble identifying her as the doll that has to go back in the box, yet she can run like a gazelle in the painted-on, hot pink, lace-up bell bottom pants through the corporate headquarters’ labyrinthine corridors and maze of offices.

Reunited with her owner and her owner’s daughter, Barbie returns with them to Barbieland, and together they set the things which have gone so terribly wrong back to rights, with the Mattel suits in hot pursuit. After quelling the Kens’ abortive insurgence and restoring the matriarchy with only a few gracious concessions to the rebels, Stereotypical Barbie expresses the desire to be a real girl in the Real World. She turns for help to Ruth Handler, the marketing barracuda behind the brand in her final incarnation as a sweet old bubbe who listens sympathetically over cups of tea. This stand-in for a fairy godmother cautions her creation that humans get only one exit, but ideas live forever (presumably “Girls can do anything”). If Barbie truly wishes to be flesh and blood, i.e. sentient with a vagina, she, like Dorothy Gale, has always had the power to make her dream come true. Without a dramatic wave of a wand that transforms plastic to muscle and bone (holding Ruth’s hands seems to have had something to do with it), the doll-being formerly known as Stereotypical Barbie leaves her dream house for Los Angeles, slips her flat feet into pink Birkenstocks, and is dropped off at the gynecologist’s. And that’s all, folks. No promise that she’ll live happily ever after.

For over a decade, a succession of creative teams tried to bring Barbie to the big screen, but crashed and burned. Margot Robbie was sure no one would finance the Gerwig-Baumbach script. A successful director of small-budget indie films who was ready to break the glass ceiling, Gerwig has to have known which side her bread was buttered on. One way of keeping the plate with the Mattel logo up in the air was to avoid the dark aspects which have always been present in doll stories. Her claim that the movie had to be “totally bananas” could be interpreted as a palatable but slippery justification for furiously whipping the mixture to a froth and never letting it deflate. “Totally bananas” means that the poster boys for patriarchy had to be paper tigers. The Mattel executives are more bumbling than the Keystone Cops, the Kens too disorganized to remember the all-important constitutional vote, and who could take Alan seriously? The heartbroken Stereotypical Ken had to be satisfied by the stale old Tinseltown line that the key to happiness is discovering that being yourself is better than good enough. And the paradise of Barbies? It’s a stretch to take seriously President Barbie, Dr. Barbie, Diplomat Barbie, etc. when they were brainwashed as easily as if they were bimbos (they are styled like them too). Gloria’s rousing oration has no relevance to the powerhouses of Barbieland, none of whom have offspring to complicate their lives. It’s really pitched to feminists and tired moms in the auditorium, and to me it sounded more like a prompt to cheer at a pep rally than a serious statement about the difficulties of modern women’s lives. And what would Ordinary Barbie look like? Would she really be a marketable commodity? Given the silliness of almost everyone Stereotypical Barbie meets during the film, it is hard to envision the advantages of trading one condition for the other. Writer Barbie or exhausted executive assistant?
Unlike a doll in a traditional it-narrative, Stereotypical Barbie has told audience members too little about her thoughts and feelings for them to understand her dramatic change of heart. Or has she?

With a billion dollars and counting in profits this week, Gerwig doesn’t have to apologize to anyone for any of her creative decisions. As eye-poppingly imaginative as the script and art direction were, more substantial ideas might have been mixed in with the fun for viewers to think about after they left the show. Having seen it a second time last night with a first-time viewer, I can say there’s plenty to talk about after the credits roll, but how much of it is the herky-jerky race through a landscape so packed with details that it makes your eyes bug? Perhaps the film could be compared to a very elaborate doll house presented to a young girl, which, as the Edgeworths observed in Practical Education (1798), may not be able to hold her attention long, even though she may peep inside from time to time.

A furnished baby-house [i.e. doll house] proves as tiresome to a child as a finished seat is to a young nobleman.  After peeping, for in general only a peep can be had into each apartment, after being roughly satisfied that nothing is wanting, and that consequently, there is nothing to be done.

Reading Gender in Children’s Literature Mathematically: An Award-winning Thesis

For her senior thesis, AnneMarie Caballero ’23 went through more than a thousand children’s books published during the 19th century and analyzed the pattern of topics in relation to the gender of protagonists. Titled “Gendered Topics: Boyhood and Girlhood in a Century of (Cotsen) Children’s Literature,” her project won Princeton’s Center for Digital Humanities (CDH) Senior Thesis Prize of 2023.

We are delighted that Caballero answered our interview questions about her research. An overview of her thesis, which gives us an opportunity to examine Cotsen’s collection from a new perspective, follows the interview. Caballero’s thesis will be made available via the repository of Princeton’s undergraduate senior theses and retrievable from the library online catalog in Fall 2023.


AnneMarie Caballero ’23, winner of the Center for Digital Humanities Senior Thesis Prize for her project titled “Gendered Topics: Boyhood and Girlhood in a Century of (Cotsen) Children’s Literature.” (photo courtesy of AnneMarie Caballero)

Hi AnneMarie, please tell us about yourself. What would you like our readers to know about you?

I recently graduated from Princeton’s computer science department, although I also focused on English in my coursework. In college, I was a part of our literary magazine, Nassau Literary Review, the Model United Nations team, and I worked for the computer science department as a grader. In my free time, I love to read, cook, and play volleyball (although I’m bad at the last one).

You have written a fascinating senior thesis, which applies computational literary analysis to a corpus of 19th-century children’s literature and delineates large patterns of gender and space representations in them. How did the idea of this project come to you?

The project sprang out of a desire to look at literary history with a more large-scale lens than I previously had. My junior research had examined the role of female authors in the early British novel through semantic vectorization but had a limited scope (seven authors, 35 novels). I hoped with my senior thesis to work with a larger dataset that could more comprehensively address the questions I wanted to ask.

Children’s literature was a good fit for my goal. I initially chose it because I wanted a dataset that I had requisite domain knowledge in, and also because I was hoping to work with Professor William Gleason, whose class on children’s literature I had taken. Further, this focus facilitated working with a larger dataset because children’s literature is defined by its audience. As attitudes towards children change, we see the emergence of the genre (within the English language) in the 1740s, after which works are consistently published for children. This choice of genre ensured that there would be a plethora of works available across the course of the nineteenth century.

My research question focused on the gendered nature of the nineteenth-century literary market, specifically its tendency to treat girls and boys as separate consumers. This question originated because multiple literary scholars that I had read discussed this phenomenon. I wanted to see if it could be detected by topic modeling, and, if so, what are the different topics discussed by girls’ vs. boys’ books?

Tell us about your research process. How long did it take from beginning to end? What part would you describe as the most challenging? Rewarding? Enjoyable?

I started thinking about my senior thesis and its direction the summer before my senior year. I spent the fall proposing a topic, doing initial research, and beginning the creation of my dataset. However, because of my lighter course load in the spring, much of the work was done then. I’m unsure exactly how long it took, but easily in the hundreds of hours.

The most challenging part was defining the methodology for my research question. I wanted it to be flexible enough to be explored through qualitative analysis, but for these observations to be supported by concrete quantitative metrics. Similar research I was looking at focused more heavily on the qualitative, so finding that balance was a struggle.

The most rewarding part was probably when I found out that 112 of my 125 topics were considered statistically significant by gender. It was the last step in the research process before writing my results, and it was incredibly validating to see that my hypothesis (that books written for boys vs. for girls fundamentally featured different topics) was so strongly supported by the dataset and methods.

The most enjoyable part was showing my friends the results, and getting to discuss with them the interesting gender differences that cropped up. There was a lot of joy in getting to share my research with the people who had been there for me throughout the process.

How has this project facilitated your professional growth? Aside from gaining valuable insights into 19th-century children’s literature, what else have you harvested from the project in terms of skills, experience, and understandings?

I can’t emphasize enough how much the project shaped my view of research. One of the major ways that the project facilitated my professional growth was the experience creating the dataset. While I had curated a very small dataset for my junior year research, working on this project entailed a months-long curation process that consistently caused me to question myself and my decisions. In my conclusion, I included some advice for first-time curators, such as documenting your decision process, which allows other users of the data to understand sources of bias.

I also feel much more rooted in the digital humanities as a whole. One portion of my thesis was writing a short history of computational literary analysis, and to a lesser extent, the digital humanities. Familiarizing myself with the debates around the field helped me avoid some of the shortcomings that critics have pointed out about digital humanities research, while also taking advantage of benefits like the ability to look at whole eras of literary history quantitatively.

Further, beyond project specifics, executing such a significant independent project offered so many lessons. I learned about how to scope projects properly, partially because I was certainly too ambitious at the start—I wanted to answer three research questions and only got to one. I had to become comfortable reaching out to anyone who might have the appropriate domain knowledge to answer my questions, resulting in a lengthy acknowledgments section. I learned more about data science and statistical methods, which I had not focused on as much in my coursework.

The impressive corpus of digital texts you have curated may benefit future researchers conducting digital humanities studies. Can you think of any sample questions the corpus may help address?

Very much so! As mentioned, I had other research questions I was hoping to get to, but, as the thesis was already 159 pages (before the appendix and references), I ended up cutting them due to scope. The question I was really hoping to explore, but didn’t get to, was about the value of children in the nineteenth century. In her book, Pricing the Priceless Child, Viviana Zelizer explores how, beginning in the nineteenth century, the child gains in sentimental value as their financial value decreases. I was hoping to explore that trend in literature, by linking it to literary trends like the cult of childhood, and examining how much the books in the dataset use a diction of sentiment/emotion vs. a diction of utility.

Beyond that question, which I explored fairly in depth but did not get to apply to the dataset, there are endless new avenues for questions. Especially in the curation process, I regularly stumbled across questions and topics that the dataset could significantly address from discussions of colonialism to the fairy tale subgenre.

Is there any other aspect of the project that I have not asked about and that you’d like to share with our readers?

I would be remiss not to talk about how critical a role the Cotsen Children’s Library played in the project. After I decided to look at children’s literature, I needed to find a collection of works that would suit my purpose. I explore this more in my thesis, but no existing corpus met the project requirements. Professor Gleason showed me the nineteenth-century catalogue of the Cotsen Children’s Library as a starting point, and that fundamentally shaped my project.

Out of the 1020 works included in my final dataset, 416 were directly from the catalogue and the other 604 were almost all added because of the collection—works by authors in the collection or that I found while searching for works in the collection. While curation could be exhausting (it required searching the 6000+ works from the catalogue in the HathiTrust Digital Library search bar), it also was an amazing introduction to the variety of children’s literature in the nineteenth century. I often found myself down research rabbit holes, or even at times, just being surprised by the books. In one of the books, Other Stories by E. H. Knatchbull-Hugessen, I read its very long dedication to his armchair. It felt like uncovering a secret history, although one that was often troubling, especially with its treatment of non-European cultures, race and ethnicity, and colonialism.

A one-page dedication to the author’s soothing, non-judgmental comfy chair. In Other Stories, by E. H. Knatchbull-Hugessen; with illustrations by Ernest Griset. London: George Routledge and Sons, 1880. (Cotsen 2646)

Moreover, when I was feeling particularly tired in the final weeks of my thesis, I stopped by the library, and ended up talking with the staff about my thesis. That memory was hugely encouraging as I finished my thesis and is still one of my favorite memories from my senior year.

Your work won this year’s Senior Thesis Prize from the Center for Digital Humanities. Big congratulations! What are your future career plans?

Next year, I’m working on the Atlas product, a database-as-a-service, for MongoDB, a tech company in New York. While I loved my research and was lucky enough to be accepted to Cambridge’s MPhil in the digital humanities, I ultimately wanted to take time off from school. I had a really wonderful time interning with the company last summer, and I wanted to experience working full-time for a tech company, especially as I decide if I want to go into tech long-term or explore one of my other interests. I definitely see myself returning to the digital humanities, or more generally to a job at the intersection of tech and culture.

Lastly, since you (distantly) read over a thousand children’s books to conduct your research, please tell us about your childhood reading. Did you have any favorite books or reading material? Any people or places you associate with your early reading?

I actually very recently reread several favorite children’s books! One of my all-time favorites that I think really holds up is Tamora Pierce, particularly her Wild Magic and Circle of Magic series. Her female protagonists are better-written than most of the ones I find in adult literature. There are so, so many other series I love (Little House on the Prairie, Shannon Hale’s books—which I mentioned briefly in the thesis, Nancy Drew, Cornelia Funke’s Igraine the Brave, etc.), but Tamora Pierce’s books are the ones I go back to the most.

For people, obviously my parents and my siblings played a huge part in my reading. Also, my librarians: my elementary school librarian even gave me my school’s copy of Pride and Prejudice when I left because I was the only one who ever checked it out. Oddly enough, the place I associate with early reading is the upstairs hallway in my house. There’s a bookshelf there, and I remember sitting there on the beige carpet for hours, reading a book, and when I was done, just picking another one off the shelf.

Reading Children’s Literature, Fast and Slow

Partly due to the relative scarcity of children’s literature corpora, Caballero’s project is a rare computational literary analysis (CLA) that is applied to children’s texts. In the field of digital humanities (DH), “corpus” refers to a digital collection of texts. Having curated a corpus of 19th-century English-language children’s literature herself, Caballero applies the method of topic modeling to tease out the statistical pattern of topics in relation to the gender of protagonists. The strength of Caballero’s outstanding research lies in multiple areas. First, she is not afraid of engaging in controversies about DH and in thorny challenges of children’s literature studies. Second, she makes an impressive contribution to DH by publishing a large corpus of digitized children’s literature, which will benefit future researchers. Third, by firmly grounding her statistical revelations in the concerns and findings of traditional literary criticism, the thesis carefully balances quantitative and qualitative methods, reaching nuanced conclusions that are both supported by large-scale analysis and informed by close reading of canonical works.

Chapter 1 reviews the history of CLA, which over time has succeeded in applying increasingly sophisticated computational tools such as natural language processing to literary studies, processing texts on a scale previously impossible for the solitary researcher. Caballero visits debates around the field of DH and examines, with a fine-tooth comb, critiques that are among those made by its harshest detractors. These critiques would shape the design and process of her project.

Responding to flaws that have been identified in DH scholarship, in Chapter 2 Caballero defines the scope of her data with transparency, meticulously documents how the dataset has been constructed, and makes it readily available through the HathiTrust Digital Library collection system. Caballero used A Catalogue of the Cotsen Children’s Library: The Nineteenth Century (Princeton University Library, 2019) as a guide for building the digital corpus. The catalog, consisting of two tomes that stack up to four inches high, describes over 6300 titles published during the 19th century and collected by the late donor, Lloyd E. Cotsen, ’50 and Charter Trustee Emeritus. With what I can only imagine to be mighty Princetonian tenacity, Caballero has gone through all of them, selecting English-language titles that meet her criteria of narrative texts for children.

Caballero used A Catalogue of the Cotsen Children’s Library: The Nineteenth Century (Princeton University Library, 2019) as a guide for locating texts in HathiTrust Digital Library and building the Cotsen Children’s Literature Dataset.

The curation process forced Caballero to wrestle with some fundamental questions that are beguilingly simple but fraught with rule-defying exceptions: How do you define literature? (e.g., should primers, ABC books, stories in verse, etc. be included?) How do you define children’s literature? (e.g., is the presence of a child protagonist a necessary and sufficient criterion? Must children’s books be written with a young audience in mind? What about folk tales and fables, genres that were not produced only for children, but have morphed into classical children’s literature?) By sharing her challenging decision-making process, Caballero hopes to keep future users of her dataset fully informed of the limitations of the corpus.

Caballero was able to locate 416 titles of English-language children’s literature from the catalog that are available in full text in HathiTrust Digital Library (HTDL), a major repository of digital content from research libraries. By conducting author and keyword searches, she added another 604 titles to the Cotsen Children’s Literature (CCL) Dataset. Authors that appear most frequently in the dataset include Horatio Alger (1832-1899), Mary Martha Sherwood (1775-1851), Laura Elizabeth Howe Richards (1850-1943), Mrs. Molesworth (1839-1921), Oliver Optic (1822-1897), Louisa May Alcott (1832-1888), and A. L. O. E. (pseudonym of Charlotte Maria Tucker, 1821-1893) (68).

To prepare the dataset for computational analysis, Caballero then ran the 1000+ works through the BookNLP pipeline for nearly twenty-four hours of intensive analysis. An open-source natural language processing tool, the BookNLP pipeline tracks “all of the characters appearing within a work, the number of times they’re mentioned, the names and pronouns by which they are mentioned” (91), and other entities it is capable of recognizing and tagging at scale.

Chapter 3 describes the computational analysis of the texts in terms of gender and topics. First, based on the annotations generated by the BookNLP pipeline, Caballero determined that 613 titles met the mathematical threshold for having a central protagonist (92-95), all but eleven of them having an identifiable gender, which is treated as a proxy for the gender of the intended audience. However, Caballero is quick to point out that the intended audience does not equate to the actual audience: whereas boys tend to read books with male protagonists, girls tend to cross gender boundaries and read about boys and girls (89).
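To make the idea of a “mathematical threshold” for a central protagonist more concrete, here is a minimal Python sketch. It is not Caballero’s actual criterion, which is set out in her thesis; the function name, the two cut-off values (`min_share`, `min_lead`), and the mention counts are all invented for illustration. It only shows how per-character mention counts, of the kind BookNLP reports, could be turned into a yes-or-no decision.

```python
from collections import Counter

def central_protagonist(mentions, min_share=0.3, min_lead=2.0):
    """Return the dominant character's name, or None if nobody dominates.

    min_share and min_lead are hypothetical thresholds chosen for this
    sketch; they are not the values used in the thesis.
    """
    total = sum(mentions.values())
    if total == 0:
        return None
    ranked = Counter(mentions).most_common(2)
    top_name, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # The top character must hold a large share of all mentions and
    # clearly outpace the second most-mentioned character.
    dominates = (top_count / total >= min_share
                 and top_count >= min_lead * runner_up)
    return top_name if dominates else None

# Toy mention counts, invented for illustration only.
single_hero_book = {"Tom": 40, "Ann": 3, "Papa": 6}
ensemble_book = {"Amy": 12, "Beth": 11, "Jo": 13}

print(central_protagonist(single_hero_book))  # Tom
print(central_protagonist(ensemble_book))     # None
```

Under a rule of this kind, a story with an ensemble cast would fall outside the protagonist subset, which may be one reason why only part of the corpus qualified for the gender analysis.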

The male-to-female ratio of protagonists in this subset approaches 1.9:1, with 389 titles featuring male protagonists versus 206 featuring female protagonists (99), an uncanny number that echoes the findings of studies about gender imbalance in other bodies of literature. For example, McCabe et al. (2011) analyzed gender representation in 5,618 children’s books published throughout the 20th century in the United States. Through manual coding, her team of five scholars found a male-to-female ratio of 1.9:1 in title characters, and a ratio of 1.6:1 among central characters. Data also suggested that male protagonists receive more mentions than female ones in the CCL works (101), again a pattern that is consistent with existing scholarship. Underwood et al. (2018) traced 104,000 works of English-language fiction spread over three centuries, from 1703 to 2009. Similarly using BookNLP and relying on HathiTrust Digital Library, they calculated the proportion of words used in describing female characters, and found a steady decline from the 19th century through the early 1960s.

Next, Caballero conducted topic modeling to sort the dataset into 125 clusters, each containing co-occurring words from which a topic or a theme may emerge. Among them, 112 were found to be gendered: 64 topics appeared more often in stories with male protagonists, and 48 more often in those with female protagonists, suggesting that boys’ topics are 33% more varied than girls’. The remaining 13 topics were gender-neutral. Caballero presents both macro statistical revelations and in-depth analysis of selected topics.
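For readers who like to check the arithmetic, the headline figures above can be reproduced in a few lines of Python. This is only a sanity check on the counts as reported here, not a reconstruction of Caballero’s analysis; the variable names are mine.

```python
# Counts as reported in the overview above.
male_titles, female_titles = 389, 206
total_topics, male_topics, female_topics = 125, 64, 48

protagonist_ratio = male_titles / female_titles     # male-to-female ratio
gendered_topics = male_topics + female_topics       # topics with a gender skew
neutral_topics = total_topics - gendered_topics     # topics with no skew
extra_male_share = male_topics / female_topics - 1  # surplus of boys' topics

print(f"protagonist ratio: {protagonist_ratio:.1f}:1")
print(f"gendered topics: {gendered_topics}, neutral: {neutral_topics}")
print(f"boys' topics exceed girls' by {extra_male_share:.0%}")
```

The numbers line up with the text: a ratio of roughly 1.9:1, 112 gendered topics plus 13 neutral ones, and a third more boys’ topics than girls’.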

I thought it would be interesting to pick titles from the Cotsen catalog of the 19th century and test to what extent an individual work reflects large patterns detected by machine. To build up the suspense further, of the five titles I selected from the catalog (largely based on the interest level of illustrations highlighted in the tomes), only one is in the CCL dataset; the other four are not available in HathiTrust and thus have not been “read” mathematically by computer programs.

Topic: Violence/Combat

Word cloud produced by running a statistical natural language processing toolset called MALLET. The topic of this cluster of words is labeled as Violence/Combat. Image courtesy of AnneMarie Caballero.

It should come as no surprise that the topic of violence or combat is found more often in works with male protagonists. One of the Cotsen titles I selected, The Little Deserter; Or, Holiday Sports; An Amusing Tale Dedicated to All Good Boys, epitomizes the strong connection between the topic and an intended boy audience–from the unequivocal dedication to boy readers in its subtitle, to illustrations that portray boys playing soldiers with menacing-looking toys and props.

The Little Deserter; Or, Holiday Sports; An Amusing Tale Dedicated to All Good Boys. Edinburgh: Oliver and Boyd, [1807 or 1808]. (Cotsen 7108)

If you find the scene of execution, in a book published during the Napoleonic Wars, offensively violent for 21st-century sensibilities, your reaction is justified. Here is a spoiler that may be offered as a small solace: Julius, the boy protagonist who has been blindfolded and received the death penalty, bounces back in no time and calls dibs on playing the captain in tomorrow’s game.

Miss Johnston’s name was inscribed on the front pastedown and, as shown here, the front free endpaper of the copy of The Little Deserter. (Cotsen 7108)

What makes the Cotsen copy of The Little Deserter remarkable is that it carries evidence of girls’ expansive reading interests. A “Miss Elizabeth Johnston,” likely a former owner/reader of the book, inscribed her name twice in it. As quoted in Caballero (87), Kimberley Reynolds attributes the appeal of boys’ books for girl readers to the fact that books deemed suitable for young ladies were frequently unexciting tales for cultivating good behavior.

Topic: Island (Stranding)

The topic of this cluster of words is labeled as Island (Stranding). Word cloud courtesy of AnneMarie Caballero.

“Island” is a frequent word found in multiple topics that range from Island (Stranding) to Boats (Stranding, Shipwreck) and Nation (Nationalism), linking to the traditional boys’ adventure story as well as the historical subject of colonial conquest (139-140). Both literary criticism and Caballero’s computational study confirm a gendered landscape in children’s literature, which contrasts the feminine home with the masculine away, and excludes boy characters from the home and girl characters from the away (147). The dichotomy, however, is complicated by what is referred to as “adventurous domesticity” (142), whereby protagonists attempt to reconstruct domesticity while stranded.

“Adventures of Robinson Crusoe” in The Robinson Crusoe Picture Book. George Routledge and Sons, [not after 1873]. (Cotsen 152150)

The masculine pursuit of a “home away from home” is well reflected in “Adventures of Robinson Crusoe,” a short, illustrated verse story based on Daniel Defoe’s novel. The titular castaway builds a thatched and fenced house that he can call his little home, makes furniture and clothes (he is pictured putting finishing touches to an umbrella), keeps company with his dog and cats, and domesticates a young goat and a parrot he has found on the island.

Johnny Headstrong’s Trip to Coney Island. McLoughlin Bros., 1882. (Cotsen 540)

Of the five titles I selected, Johnny Headstrong’s Trip to Coney Island is the only one that is included in the CCL dataset, thanks to digitized copies contributed by member university libraries to HathiTrust. In this verse story, Johnny’s family takes a trip to Coney Island beach. Even though his sister Sue has also joined the outing, she is rarely mentioned. At one point she is described as sitting on the wooden horse of a carousel “like a lady,” i.e., riding sidesaddle. Johnny is the protagonist and remains the center of attention (and chaos) by getting into a nonstop series of scrapes, departing the island with bandages over his nose and cheek at the end of the day. Johnny Headstrong’s adventure seems to be a quintessential bad boy’s tale, having packed into its 20 pages so many of the boys’ topics on Caballero’s list (115-6): Movement, Body of Water, Boats, Injury, Donkeys, Animal, to name a few.

“Painful Emotion (Death)” is high on the list of girls’ topics (117, 121). Word cloud courtesy of AnneMarie Caballero.

The pattern of gendered topics does not mean that a boys’ tale is devoid of all topics that are statistically prominent in girls’ stories, and vice versa. Caballero conducts a case study with two of the best-known girls’ adventure stories, Alice’s Adventures in Wonderland and Alice Through the Looking-Glass, and finds that a good portion (a quarter and nearly a half respectively) of the top 20 topics in each work are boys’ topics, such as Injury, Water, and Animal (143). Likewise, both Robinson Crusoe and Johnny Headstrong have their emotionally vulnerable moments, described in words that are frequently found in the topic Painful Emotion (Death), which is statistically a girls’ topic. A forlorn Robinson Crusoe sometimes grows “very sad,” “cries aloud,” weeps “like any child,” thinks of his father and mother, and prays to God “with many tears” (“Adventures” 1-2). Johnny’s adventure begins as he tumbles overboard, is fished out of the water, and cries as he is sent to the engine-room to dry beside the furnace fire. In one episode, he slips away and loses his Papa and sister Sue, then begins “to cry,” “big tears” running down his chubby face (Johnny Headstrong’s). In another, he accidentally strikes a boy hard with a ball and, thinking the boy would surely die, sobs with “childish fright.” Towards the end he falls off a swing, and adults have to soothe “his sobs and groans.” It is tempting to ask if there might be any correlation between how broadly appealing a children’s story is and how inclusive the work is in encompassing gendered topics.

Girls, Domesticity, and Travel

“Confidential People” in The May Blossom, or, the Princess and Her People. Illustrations by H.H. Emmerson; verses by Marion M. Wingrave. London: Frederick Warne and Co., [1881]. (Cotsen 9380)

Caballero recognizes that illustration is an essential element of the Cotsen collection, because of Lloyd Cotsen’s “passion for illustrated works that help children become independent readers” (Immel, quoted in Caballero 54). Her computational analysis handles only texts that have been OCRed, so the machine has missed about half of the fun of perusing the Cotsen collection! The May Blossom, a collection of short verses, presents an intriguing case of what the machine manages not to miss in spite of its singular focus on texts. In one of the entries, “Confidential People,” a first-person “I” shares a secret with a second-person “you”–there is no textual description of the setting of the story. In the accompanying illustration, the two characters are seated in an intimate, ornate space, surrounded by objects that well match the most frequent keywords of the topic Domestic Space, one that is found more often in works with female protagonists.

The most frequent words in the cluster for the topic Domestic Space include room, table, chair, and sit (161). Word cloud courtesy of AnneMarie Caballero.

The narrator confides that she plans to marry “a sweet little beau” and to take a honeymoon by “a coach and six horses” to Lilliput Land next year. It is striking that a story hinting at an exciting trip to a faraway fantasy land is visually represented by two girls confined in a stuffy room, a setting that is mentioned nowhere in the text. Travel (Driving, Carriage) turns out to be one of the gender-neutral topics (114), meaning it is as likely to appear in stories with a male protagonist as a female one. How does it square that domestic space is tied to girls’ stories, yet travel is not? In “Confidential People,” the girl’s narrative about travel is firmly grounded within approved gender roles. The endearingly amusing verse both adores the young narrator’s childish innocence and models an aspiration for marriage that leads to the fulfillment of traditional womanhood.

“Johnny’s First Motor Ride” in Little Tots Holiday Book: With Numerous Coloured Plates and Other Illustrations. London; New York: Frederick Warne & Co. (Cotsen 30357)

A close reading of another story that fits the topic of Travel (Driving, Carriage) invites us to consider what it means to be the central character of a story, and circles back to gender imbalance in terms of the count of female versus male protagonists as well as the proportion of words devoted to each gender. In “Johnny’s First Motor Ride,” the titular character receives a real little motor-car from his father and soon learns how to “control it with ease.” With a bonneted baby deposited in the passenger seat–possibly against the baby’s will, judging from his/her facial expression–he goes out for a ride[1]. After trying to abruptly avert a collision with Margery’s goat-chaise, however, Johnny finds his car stuck. It is at this point, two-thirds of the way through the story, that attention swerves to Margery. Described by her father as “a real clever little woman,” Margery is sympathetic, helpful, and resourceful. Even though it is not her fault that Johnny’s car malfunctioned, she does not abandon the stranded novice motorist. She sets to work “to harness the damaged motor-car to the goat-chaise,” which is pulled by “Nanny” the goat, and coaxes the hoofed “engine” to tow the modern vehicle home. “That was a real triumph for Nanny!”–the story concludes with the exclamation.

Whether by its title “Johnny’s First Motor Ride” or by the amount of text devoted to Johnny, the protagonist of the story is apparently a boy–to the machine’s mathematical “mind” at least. I can’t predict whom a human reader would identify as the central character of the story. Margery clearly shines with what she has done, even though she doesn’t receive the most mention in the story. That the credit for the successful rescue act should go to the goat implicitly imparts a self-effacing virtue expected from females. The girl character is sidelined even in a story where she is not the damsel in distress but the heroine who saves the day.

Caballero’s computational analysis of a sizeable body of 19th-century English-language children’s literature reveals a gendered landscape, tethering female characters to domesticity and the inward, freeing male characters to the wider world away from home, enlarging the gap between endorsed masculine and feminine behavior, and bundling implicit morals and values for each gender. She brings rich complexity into her project by tracing how a large-scale analysis of over a thousand works agrees with or departs from findings based on traditional literary criticism of a limited number of canonical works. It is a testament to the robustness of her study that, for the five titles from the Cotsen collection–only one of which is available in the dataset–the patterns still hold true and help us gain fresh insights into these dusty volumes.


[1] Johnny and his father break all the modern government regulations for driving an automobile. There is no publication date on the book. Let’s assume the story was written soon after the invention of the first automobile in 1886, before driver’s licenses began to be introduced at the end of the 19th century, or before the age restriction was first introduced in Pennsylvania in 1909.


Caballero, AnneMarie. Gendered Topics: Boyhood and Girlhood in a Century of (Cotsen) Children’s Literature, Princeton University, 2023.

McCabe, Janice, et al. “Gender in Twentieth-Century Children’s Books: Patterns of Disparity in Titles and Central Characters.” Gender & Society, vol. 25, no. 2, 2011, pp. 197-226.

Underwood, Ted, David Bamman, and Sabrina Lee. “The Transformation of Gender in English-Language Fiction.” Journal of Cultural Analytics, vol. 3, no. 2, 2018.


Datasets Curated by AnneMarie Caballero (the exact scope of each dataset is detailed on page 65 of her thesis):

The Cotsen Children’s Literature (CCL) Dataset (1021 items as of July 2023) [URL]

  • The subset of titles as found in A Catalogue of the Cotsen Children’s Library: The Nineteenth Century (416 items) [URL]
  • The subset of titles as found outside A Catalogue of the Cotsen Children’s Library: The Nineteenth Century (605 items) [URL]

Titles as found in Cotsen’s catalog but excluded from the CCL Dataset (123 items) [URL]

GitHub repository of the cleaning script for the CCL Dataset [URL]

(Edited by Andrea Immel)