Who is Hua Mulan?

So you think you know who Mulan is? Perhaps you know her as the feisty cross-dressing warrior, the eponymous heroine of the 1998 Disney animated film Mulan. She is the rebellious teenager who escapes the suffocating social expectations for a maiden and heads to the battle zone, where she finds peace with who she is. Or, if you are a Chinese speaker, you may have first learned about the weaver-turned-soldier from the “Ballad of Mulan,” the lyrics of a folk song first preserved in writing as early as the sixth century. In the memorable rhyming text she is the filial and brave daughter who is determined to shield her aging father from a perilous military life.

Mulan’s story is included in an advertisement booklet titled Women’s Twenty-Four Filial Exemplars in Color Pictures 女子二十四孝彩圖, published by a pharmaceutical company in Shanghai in 1941. Whereas the historic figures featured in the classic Twenty-Four Filial Exemplars were nearly all male, the booklet focuses on young Chinese girls’ and women’s filial piety. The caption emphasizes that when Mulan returns home after serving eleven years in the army, she is “apparently still a virgin” (page 7). The facing page advertises fish liver oil, said to have ingredients supplied by an American vitamin company. In Nü zi er shi si xiao cai tu. Shanghai: Xin Yi Pharmaceutical Company, 1941. (Cotsen 75832)

China’s Bravest Girl: The Legend of Hua Mu Lan, told by Charlie Chin 陳建文; illustrated by Tomie Arai 新居富枝; Chinese translation by Wang Xing Chu 王性初. Emeryville, CA: Children’s Book Press, 1993. (Cotsen 17732)

Have you ever wondered, however, what kind of Chinese girl Mulan was? Weren’t women in ancient China supposed to have their feet bound? How could Mulan have gotten away from the crippling practice? Was Mulan’s family rich or poor–and does it matter? Did Mulan really grow up in those circular communal buildings portrayed in Disney’s live-action adaptation of 2020? If not, where was her hometown?

The Legend of Mu Lan: A Heroine of Ancient China, written and illustrated by Jiang Wei 姜巍 and Gen Xing 根兴. Monterey, CA: Victory Press, 1992. (title page) (Cotsen 13496)

“Hua Mulan,” text by Haimo, illustrated by Alang Illustrations Studio. Serialized in Xiao pi pa 小枇杷, 2013, no. 1, a magazine for young learners of the Chinese language in North America. (page 8) (Cotsen 153521)

Long before it inspired Disney films, the legend of Mulan had been dramatized in plays and novels as far back as the sixteenth century, and was later retold in comic books and picture books and reenacted in Chinese movies and television series. Mulan’s malleability does not stop at changing her outfit and camouflaging her biological sex to enlist at a time of crisis. In adaptation after adaptation, the heroine amasses a growing inventory of virtues and qualities, from filial piety to chastity, bravery, superb combat skills, military acumen, humbleness, loyalty, patriotism, and selflessness. She also subtly shifts other dimensions of her identity in the service of messages embedded in the next new iteration. Let’s take a closer look at the girl in disguise.

What about footbinding?

Mulan’s military service is all the more extraordinary against the backdrop of what we assume to be a Confucian Chinese society that confines women within the domestic sphere. What if Mulan was not molded from the prototype of an ethnic Han girl immersed in the doctrine of Confucianism? Scholarship suggests that the origin of the character Mulan was foreign to the Chinese-speaking Han people (Dong 2011, 53), China’s largest ethnic group, which got its name from the Confucius-worshipping Han dynasty (202 BCE-220 CE). The ballad can be traced to the tradition of northern nomadic tribes, whose women were skilled at horse riding and archery (53). Active in the Mongolian area–to the north of the Great Wall–the nomads ran into frequent military conflicts with the Han empire, which collapsed in the year 220. The Tuoba clan of the Xianbei nomads, speakers of an Altaic language, eventually migrated south, established the Northern Wei in 386, and ruled northern China. The Xianbei rulers implemented Sinicization and assimilation policies in the fifth century, ushering in a period of cultural hybridity and social blending of Han and non-Han people that is subtly reflected in the lyrics (Dong 2011, 57; Millward 2020).

The ballad is likely set during the Northern Wei period (386-534), which predates the prevalence of footbinding among elite Han Chinese women of the Song dynasty (960-1279) by centuries. Mulan’s ruler is referred to by the Altaic title “Khan” 可汗 and the Chinese term “emperor” 天子 interchangeably. Philologist Sanping Chen (2012) proposes that “Mulan” (meaning “magnolia” in Chinese) is a Sinicized Tuoba word referring to “a large male cervid” (59) that includes stag, bull, and even unicorn, possibly explaining why, in the ballad, Mulan does not change her name (as she does in Disney films) and raises no questions among her fellow soldiers and the Khan. Mulan’s capability of transitioning from a weaver to a soldier can be rationalized by the cultural mishmash of her time. That she begins in the ballad working at a loom reflects the influence of Han cultural expectations on women’s domestic role (Dong 2011, 57-58); yet she seems to adapt to military life smoothly, as a Xianbei woman might.

A portrait of Xu Wei (徐渭, 1521-1593), who wrote the first prominent adaptation of the Ballad of Mulan into a two-act play. (Wikimedia.org)

Chinese adapters of the ballad may have been aware of Mulan’s uncertain ethnic background, but progressively shed non-Han elements from her identity and reshaped the story as one about Han people battling against invading nomadic tribes. In the first prominent dramatization of the ballad, a two-act play written by Xu Wei during the sixteenth century, the protagonist introduces herself as a descendant of a prestigious military family from the Western Han dynasty–anachronistically, she has bound feet (Xu 1984, 44-45), necessitating a suspension of disbelief in her physical mobility on the battleground. Furthermore, Xu gives her the family name Hua (44), thereby transforming Mulan (“stag/bull”) into Hua Mulan (“flower magnolia”), a conventionally female name in Chinese. She enlists under the disguise of her father’s name, yet her feminine looks nonetheless draw fellow draftees’ immediate erotic attention (47).

Later adaptations invariably portray Mulan as a Han girl. During the first half of the tumultuous twentieth century, when China faced chronic military threats particularly from Imperial Japan, Han-centric narratives co-opted the filial daughter as a patriotic, self-sacrificing role model fighting a just war to defend the nation-state (Edwards 2016, 19).

Mulan Joins the Army 木蘭從軍, illustrated by Wang Shuhui. Beijing: Zhao Hua Fine Arts Publishing House, 1953. Third edition. (page 2) (Princeton University Library 5797/1126)

Caption: All of a sudden, the northern Tujue king led his troops and horses into Chinese territory, committed acts of rape, plunder, and all sorts of atrocities, resulting in many Chinese deaths and injuries. Men, women, the old, and the young were displaced and lost their homes.

Wang Shuhui (王叔晖, 1912-1985), one of the few female lianhuanhua illustrators in twentieth-century China. In Wang Shuhui, an Enduring Giant and a Master of Chinese Painting 巨擘传世: 近现代中国画大家王叔晖, by Zhao Deyang. Beijing: Gaodeng jiaoyu chubanshe, 2018. (page 10) (Marquand Library ND1049.W3647 Z436 2018)

In Mulan Joins the Army (1953), an adaptation in lianhuanhua form (akin to comic books) illustrated by Wang Shuhui, the heroine is explicitly named as a member of the “Han army” (Wang 1953, 30 & 32) fighting the invading Tujue nomads from the north, who are described as having inflicted great trauma on Chinese people (2). Published at a time when the Sino-Japanese War (1937-45) was still clear in the rearview mirror, the indictment against the fictional Tujue recalls the Japanese wartime atrocities. The lianhuanhua is based on a Peking opera play with the same title written during the war by Ma Shaobo, a Chinese Communist Party official in charge of war propaganda (Wang 2001, title page). Ma took inspiration from multiple sources, including an earlier Peking opera play performed by the famed actor Mei Lanfang in 1912.

A cross-dressing Mulan as played by Mei Lanfang in Peking opera, premiered in March 1912. In Mei Lanfang 梅兰芳, edited by Liu Shaowu. Beijing: Beijing chu ban she, 1997. (page 59)

Since traditional Peking opera had a long history of using an all-male cast, there was a double gender-twist to Mei’s performance: he was a male actor playing the female role of Mulan on the stage, who is cross-dressing as a male in the story. Mei’s play was hugely influential, inspiring Ma Shaobo’s adaptation, which subsequently served as the basis of retellings in a Henan opera, a Huangmei opera, and lianhuanhua in the second half of the 20th century.

Although Mulan has consistently been re-imagined as a Han girl, some subtle inclusive changes can be detected in adaptations in the 1980s. In a version illustrated by Wang Zhongqing (1984), the text retains the emperor’s Altaic title “Khan,” and the image portrays the ruler with a prominent aquiline nose and eyes set somewhat deeper than for other characters—a slight symbolic hint of non-Han facial features and a nod to China’s long history of ethnic multiplicity.

Mulan’s ruler Khan as depicted in The Poem of Mulan 木兰辞, illustrated by Wang Zhongqing. Shanghai: Shanghai People’s Fine Arts Publishing House, 1984. (partial image on the 19th panel)

The enemies, a tribe from northwestern China, on horseback, waging a battle at the border of the Great Wall. In Mulan Joins the Army 木兰从軍, text by Yu Peiming, illustrated by Xiang Weiren. Shanghai: Juvenile & Children’s Publishing House, 1983. First edition. (cover and page 1) (Cotsen S-000381)

Another version, illustrated by Xiang Weiren (1983), eventually absorbs the invading nomads into the expanded Chinese identity, thereby transforming the Chinese-against-foreigner conflict into a domestic conflict. In his version of the story, the enemies are a tribe from present-day “northwestern China” (1), who, after their defeat, pledge allegiance to a benevolent Chinese emperor. The peaceful resolution folds ethnic minorities into the Chinese empire, and is aligned with the People’s Republic of China’s orthodox political ideal of integrated, harmonious inter-ethnic relations accomplished by a powerful, paternalistic central government.

To sum up, the Ballad of Mulan originated as a product of cultural exchange and amalgamation, but in adaptations by Chinese writers, the heroine is divested of her non-Han origin and participates in a Han-dominant narrative that assigns ethnic minorities as “other,” foreign, and invasive. It was not until the 1980s that adaptations attended to the non-Han as part of a multi-ethnic China, three decades after the PRC touted the ethnic unity project in the 1950s.

Is Mulan’s family rich or poor?

The ballad does not articulate Mulan’s socioeconomic background, but clues abound: her family owns a loom and livestock, and is able to purchase a horse and riding equipment. In Xu Wei’s dramatization, Mulan’s family owns a servant girl. Beginning in the 1950s, whether Mulan was portrayed as rich or poor carried political significance under the Communist regime. A concern for Mulan’s class background can be detected in Ma Shaobo’s foreword for his play. A seasoned Party member, Ma would have been fully conversant with the Communist class theory and familiar with the land redistribution campaigns that targeted rich land owners. His foreword introduces Mulan as coming from a “farming family” 农家 (Ma 1949, 1), implying a humble background but stopping short of declaring her poor. As the script reveals, Mulan, though never labelled as being well-off, is more likely to be from a property-owning family. As she explains to a less enthusiastic fellow recruit, defending one’s country is congruent with the private interest of protecting the safety and property of one’s own family. If everybody refuses to fight as a soldier, “by the time the enemy takes the world, we would lose our country and our homes! We can’t even protect our ancestral graves, farming fields, and gardens 田园, much less spend time with children and grandchildren,” Mulan points out (22; emphasis mine).

Ma did not author the second sentence, but apparently lifted it verbatim from the script of Mei Lanfang’s stage performance (Zhui yu xuan 1922, 16). In spite of Ma’s class consciousness, his adaptation preserves the economic status of Mulan’s family, because it fosters a motive to join the army by appealing to the audience’s self-interest. The play would not risk alienating members from any class that had the capacity and motivation to contribute to the war effort. The Party redirected attention to “class struggle” only after the defeat of the Japanese (Jackal 1981, 107). Literature and media from the later Mao era gradually sharpened a dichotomy between the virtuous poor peasants and proletariat and the evil rich landowners and capitalists, bolstering the moral justification for punishing the latter.

In fact, a 1950s version of Mulan presented her family as being even more affluent. This time, her story was reenacted in a Henan opera play during the Korean War. Chen Xianzhang, the main writer of the script, and Chang Xiangyu, the lead singer and actress who played Mulan, were a married couple. After the People’s Volunteer Army (PVA) of China joined the Korean War and suffered setbacks in June 1951, Chang formed an ambitious plan to donate a fighter aircraft to the PVA. She succeeded in collecting enough money to purchase a MiG-15 jet fighter by touring China between 1951 and 1952, performing the extremely popular Hua Mulan on stage up to 120 times (Jing 2000, 10). The Henan opera play suggests more strongly that Mulan is identified with the property-owning class rather than with the destitute poor. She reasons with a fellow recruit who complains about the unfairness of having to leave his parents, wife, and children behind:

There are tens of thousands of soldiers and generals on the frontier. Who do not have elders, children, farming lands, estates, and homes 田产家园? If all are attached to their homes and refuse to fight, the fire of war would have been burning at our doorsteps. (Chen and Wang 1954, 15; emphasis mine)

The phrase “farming lands, estates, and homes” expands upon the “farming fields and gardens” in Ma’s version and connotes even more wealth. Given the priority of fundraising for the Korean War, it makes sense that the script writers would see no conflict between the moneyed class and contribution to the war.

“Who do not have elders, children, farming lands and estates?” the titular character Hua Mulan (played by Chang Xiangyu 常香玉) sings in the 1956 movie version of the Henan opera. (YouTube.com)

In illustrated adaptations of these plays, Mulan’s socioeconomic class is never specified verbally. However, even though the text is noncommittal about this dimension of her family background, visual artists may have to commit to a firmer choice in depicting people and their material environment. Their illustrations show a comfortable home in accordance with what is implied in the source text. The characters’ clothing shows no patchwork, which is a common visual shorthand for poverty in lianhuanhua and the wider visual culture of Maoist China. In multiple versions Mulan either reads the military post that summons her father to the battlefront, or, as in Wang (1953), writes a letter to the marshal explaining her request for leave (38). Being female and literate in China would put Mulan among the privileged who had the wherewithal and willingness to give daughters an education.

In Mulan Joins the Army, illustrated by Wang Shuhui, Mulan is shown dressed as a scholar (page 38). (Princeton University Library 5797/1126)

After coming home and changing back into maiden’s clothing, Mulan meets her fellow soldiers in a living room that exudes affluence. In Mulan Joins the Army 木蘭從軍, illustrated by Liu Danzhai. Shanghai: Shanghai People’s Fine Arts Publishing House, 1955. (page 43) (Columbia University Library 5237.49 4942)

The most remarkable tension concerning Mulan’s economic status arises in Liu Danzhai’s (1955) picture book. Half of the scenes are set in Mulan’s home, showing a capacious dwelling tastefully furnished—including a day bed inset with large, gray-veined white marble panels (39)—and artistically decorated with painted screens, potted plants, and wooden stands made from carved tree roots (37, 43). Her house is skirted by a corridor with red latticed railings, looking out to a serene garden replete with banana leaves and other lush greenery (7, 41). Somewhere in the back the family have pens for a fat pig and a stubborn sheep, to be slaughtered to celebrate their daughter’s homecoming (35). As an artist trained in classical Chinese painting, Liu visualized in meticulous detail the idealized high culture of a rural gentry home from an unspecified bygone era.

The trouble was that Liu had to wrestle with an incongruence between his interpretation of Mulan’s family background and that made in the paratext of his picture book, which was prefaced by the historian and folklorist Gu Jiegang 顾颉刚 (1893-1980). It is unclear if the painter had access to the preface during his creative process. If he did, then he apparently chose not to bend his mind to the historian’s view, but made a small compromise. Gu prefaces the picture book with an analysis of the Ballad of Mulan, and opines that the poem reflects how the oppressive ruling class of the Western Wei dynasty forced civilians to join the army, imposing hardship especially on the elderly and the poor (Liu 1955, 3). Liu has instead portrayed what appears to be a financially solid family sending their daughter to the war, except that in Mulan’s otherwise impeccable dwelling, the pink outer layer of an exterior wall of her loom room is in disrepair at a corner, exposing several gray bricks. Is this small defect the artist’s compromise with the historian’s interpretation? It looks almost as though Liu added the detail as an afterthought, a gesture to meet Gu’s concern for the poor halfway, but this surmise can only be confirmed or refuted by consulting the artist’s and the publisher’s archives.

The exterior wall of Mulan’s loom room is shown as exposing gray bricks, the only unkempt corner in an otherwise impeccable dwelling. In Mulan Joins the Army, illustrated by Liu Danzhai. (page 5) (Columbia University Library 5237.49 4942)

Where is Mulan from?

Fujian Tulou, a UNESCO World Heritage Site, photographed by Song Xiang Lin (unesco.org)

A young Mulan chases a hen up the rooftop of the tulou. In Mulan, directed by Niki Caro. Disney, 2020. (YouTube.com)

In Disney’s live-action version Mulan (2020), the girl’s family lives in a circular building large enough to house an entire clan. Called tulou (earth buildings), such structures are the signature dwellings of Hakka Han Chinese communities, most often found in rural Fujian, China. The unique architectural style adds to the visual interest and cultural curiosity of the film, and the intended defensive function of tulou ties to the plot of the story. Does that setting make Mulan a local of Fujian Province?

As it turns out, Mulan is just as fluid when it comes to her birthplace. The original ballad cites vague geographical markers such as the Yellow River and Yan Mountain, making it difficult to pin down where the story is set (as well as giving interpreters liberty to choose). Not only has Mulan been adopted as a fictional Han girl, the popular filial daughter has also been naturalized as a historical figure, proudly claimed as a native in various regions, and recorded in local gazetteers since the Song Dynasty (960-1279) (Dong 2011, 87).

Mulan is depicted as a local of Yan’an, Shaanxi Province, wearing a wool head wrap distinct to the area. In Mulan Joins the Army 木蘭從軍, by Ma Shaobo. Shanghai: Shang za chu ban she, 1953. Revised edition. (title page) (East Asian Library 5715/7293)

In the preface to his play, Ma Shaobo (1953) names at least seven locations that, to his knowledge, had been claimed as “hometowns” of Mulan (2). The place Ma selected—Shangyi Village, Yan’an Prefecture, Shaanxi Province—was a convenient choice. The play was first written in 1943 and performed in Communist-controlled areas to “boost morale” (1953, 1). Associating Yan’an, the then headquarters of the Chinese Communist Party, with the heroine’s birthplace lent the seat of the Communist proto-state the positive light of a virtuous historical celebrity; it made the local daughter, who models patriotic behavior on a par with the standards of a Communist army soldier, more relatable to the audience. Having served its war mobilization tasks in that region, Yan’an eventually faded from later adaptations, in which Mulan is not tied to any particular locale. She is simply a Han Chinese girl and shares her identity with a much wider range of readers across China.

I Am Mulan 我是花木兰, one of the latest retellings of the Ballad of Mulan in Chinese picture books, written by Qin Wenjun 秦文君, illustrated by Yu Rong 郁蓉. Beijing: China Children’s Press & Publication Group, 2017. Available in Swedish edition Jag är Hua Mulan (Hjulet, 2021) and English edition I Am Mulan (Balestier Press, 2023), translated by Anna Gustafsson Chen and Helen Wang respectively.

Throughout the 1500 years during which Mulan’s story has been disseminated in oral, written, visual, and performance cultures, from possibly a nomadic tribe to a Chinese context to a transnational stage/screen, the girl in disguise has proved to be more pliant than required simply for the feat of cross-dressing. At the core of a Mulan in perennial transformation is a persistent readiness to offer her service. In the story, she is ready to change her appearance and to feign a male identity in the service of her family or country. In the omnipotent hands of storytellers, different dimensions of her identity are altered, be it her ethnicity, hometown, or class background, in the service of particular agendas, like boosting morale among the audience, modeling patriotism for readers, raising funds for the war effort, gracing political images, or increasing box office profits.

References

Chen, Sanping. 2012. Multicultural China in the Early Middle Ages. Philadelphia: University of Pennsylvania Press.

Chen, Xianzhang 陈宪章 and Wang Jingzhong 王景中. 1954. 花木兰: 豫剧 [Hua Mulan: a Henan Opera]. Xi’an: Chang’an shudian.

Dong, Lan. 2011. Mulan’s Legend and Legacy in China and the United States. Philadelphia: Temple University Press.

Edwards, Louise. 2016. Women Warriors and Wartime Spies of China. Cambridge: Cambridge University Press.

Jackal, Patricia Stranahan. 1981. Changes in Policy for Yanan Women, 1935-1947. Modern China 7 (1): 83-112.

Jing, Hua 荆桦. 2000. 常派艺术的铺路石: 我所知道的陈宪章先生 [The Paving Stone for the Chang School of Henan Opera Art: Mr. Chen Xianzhang as I Know Him]. Dongfang Yishu (02): 10-12.

Liu, Danzhai 刘旦宅, illus. 1955. 木兰从军 [Mulan Joins the Army], first edition. Shanghai: Shanghai renmin meishu chubanshe.

Ma, Shaobo 马少波. 1949. 木兰从军 [Mulan Joins the Army]. Shanghai: Xinhua shudian.

Ma, Shaobo 马少波. 1953. 木兰从军 [Mulan Joins the Army], revised edition. Shanghai: Shangza chubanshe.

Millward, James. 2020, September 25. “Mulan: More Hun than Han.” Los Angeles Review of Books: China Channel, accessed June 6, 2024, https://chinachannel.lareviewofbooks.org/2020/09/25/mulan-xinjiang/

Wang, Shuhui 王叔晖, illus. 1953. 木兰从军 [Mulan Joins the Army], third edition. Beijing: Zhaohua meishu chubanshe.

Wang, Shuhui 王叔晖, illus. 2001. 木兰从军 [Mulan Joins the Army], text by Yang Ying 杨英, first edition. Beijing: Renmin meishu chubanshe.

Wang, Zhongqing 王仲清, illus. 1984. 木兰辞 [The Poem of Mulan], first edition. Shanghai: Shanghai renmin meishu chubanshe.

Xiang, Weiren 项维仁, illus. 1983. 木兰从军 [Mulan Joins the Army], text by Yu Peiming 俞沛铭, first edition. Shanghai: Shaonian ertong chubanshe.

Xu, Wei 徐渭. 1984. 四聲猿: 歌代嘯 [Four Cries of a Gibbon: Songs in Place of Howls], first edition. Shanghai: Shanghai guji chubanshe.

Zhui yu xuan 綴玉軒. 1922. 木兰从军: 梅兰芳秘本 [Mulan Joins the Army: Mei Lanfang’s Private Script]. Hong Kong: Xianggang tonglehui. https://hdl.handle.net/2027/uc1.b3961359

Acknowledgement

Thanks go to Dr. Lena Henningsen (University of Heidelberg) and Dr. Emily Graf (University of Tübingen) for their insightful comments on the first draft of this post, and to Dr. Helen Wang for her invaluable editing and feedback on my second draft!

Thanks also go to Columbia University Library for making its copy of Mulan Joins the Army illustrated by Liu Danzhai, a full-color picture book rare for Chinese publishing of the 1950s, available for interlibrary borrowing.

Reading Gender in Children’s Literature Mathematically: An Award-winning Thesis

For her senior thesis, AnneMarie Caballero ’23 went through more than a thousand children’s books published during the 19th century and analyzed the pattern of topics in relation to the gender of protagonists. Titled “Gendered Topics: Boyhood and Girlhood in a Century of (Cotsen) Children’s Literature,” her project won Princeton’s Center for Digital Humanities (CDH) Senior Thesis Prize of 2023.

We are delighted that Caballero answered our interview questions about her research. An overview of her thesis, which gives us an opportunity to examine Cotsen’s collection from a new perspective, follows the interview. Caballero’s thesis will be made available via the repository of Princeton’s undergraduate senior theses and retrievable from the library online catalog in Fall 2023.

Interview

AnneMarie Caballero ’23, winner of the Center for Digital Humanities Senior Thesis Prize for her project titled “Gendered Topics: Boyhood and Girlhood in a Century of (Cotsen) Children’s Literature.” (photo courtesy of AnneMarie Caballero)

Hi AnneMarie, please tell us about yourself. What would you like our readers to know about you?

I recently graduated from Princeton’s computer science department, although I also focused on English in my coursework. In college, I was a part of our literary magazine, Nassau Literary Review, the Model United Nations team, and I worked for the computer science department as a grader. In my free time, I love to read, cook, and play volleyball (although I’m bad at the last one).

You have written a fascinating senior thesis, which applies computational literary analysis to a corpus of 19th-century children’s literature and delineates large patterns of gender and space representations in them. How did the idea of this project come to you?

The project sprang out of a desire to look at literary history with a more large-scale lens than I previously had. My junior research had examined the role of female authors in the early British novel through semantic vectorization but had a limited scope (seven authors, 35 novels). I hoped with my senior thesis to work with a larger dataset that could more comprehensively address the questions I wanted to ask.

Children’s literature was a good fit for my goal. I initially chose it because I wanted a dataset that I had requisite domain knowledge in, and also because I was hoping to work with Professor William Gleason, whose class on children’s literature I had taken. Further, this focus facilitated working with a larger dataset because children’s literature is defined by its audience. As attitudes towards children change, we see the emergence of the genre (within the English language) in the 1740s, after which works are consistently published for children. This choice of genre ensured that there would be a plethora of works available across the course of the nineteenth century.

My research question focused on the gendered nature of the nineteenth-century literary market, specifically its tendency to treat girls and boys as separate consumers. This question originated because multiple literary scholars that I had read discussed this phenomenon. I wanted to see if it could be detected by topic modeling, and, if so, which topics distinguish girls’ books from boys’ books.

Tell us about your research process. How long did it take from beginning to end? What part would you describe as the most challenging? Rewarding? Enjoyable?

I started thinking about my senior thesis and its direction the summer before my senior year. I spent the fall proposing a topic, doing initial research, and beginning the creation of my dataset. However, because of my lighter course load in the spring, much of the work was done then. I’m unsure exactly how long it took, but easily in the hundreds of hours.

The most challenging part was defining the methodology for my research question. I wanted it to be flexible enough to be explored through qualitative analysis, but for these observations to be supported by concrete quantitative metrics. Similar research I was looking at focused more heavily on the qualitative, so finding that balance was a struggle.

The most rewarding part was probably when I found out that 112 of my 125 topics were considered statistically significant by gender. It was the last step in the research process before writing my results, and it was incredibly validating to see that my hypothesis (that books written for boys vs. for girls fundamentally featured different topics) was so strongly supported by the dataset and methods.

The most enjoyable part was showing my friends the results, and getting to discuss with them the interesting gender differences that cropped up. There was a lot of joy in getting to share my research with the people who had been there for me throughout the process.

How has this project facilitated your professional growth? Aside from gaining valuable insights into 19th-century children’s literature, what else have you harvested from the project in terms of skills, experience, and understandings?

I can’t emphasize enough how much the project shaped my view of research. One of the major ways that the project facilitated my professional growth was the experience creating the dataset. While I had curated a very small dataset for my junior year research, working on this project entailed a months-long curation process that consistently caused me to question myself and my decisions. In my conclusion, I included some advice for first-time curators, such as documenting your decision process, which allows other users of the data to understand sources of bias.

I also feel much more rooted in the digital humanities as a whole. One portion of my thesis was writing a short history of computational literary analysis, and to a lesser extent, the digital humanities. Familiarizing myself with the debates around the field helped me avoid some of the shortcomings that critics have pointed out about digital humanities research, while also taking advantage of benefits like the ability to look at whole eras of literary history quantitatively.

Further, beyond project specifics, executing such a significant independent project offered so many lessons. I learned about how to scope projects properly, partially because I was certainly too ambitious at the start—I wanted to answer three research questions and only got to one. I had to become comfortable reaching out to anyone who might have the appropriate domain knowledge to answer my questions, resulting in a lengthy acknowledgments section. I learned more about data science and statistical methods, which I had not focused on as much in my coursework.

The impressive corpus of digital texts you have curated may benefit future researchers conducting digital humanities studies. Can you think of any sample questions the corpus may help address?

Very much so! As mentioned, I had other research questions I was hoping to get to, but, as the thesis was already 159 pages (before the appendix and references), I ended up cutting them due to scope. The question I was really hoping to explore, but didn’t get to, was about the value of children in the nineteenth century. In her book, Pricing the Priceless Child, Viviana Zelizer explores how, beginning in the nineteenth century, the child gains in sentimental value as their financial value decreases. I was hoping to explore that trend in literature, by linking it to literary trends like the cult of childhood, and examining how much the books in the dataset use a diction of sentiment/emotion vs. a diction of utility.

Beyond that question, which I explored fairly in depth but did not get to apply to the dataset, there are endless new avenues for questions. Especially in the curation process, I regularly stumbled across questions and topics that the dataset could significantly address, from discussions of colonialism to the fairy tale subgenre.

Any other aspect of the project I have not asked about and you’d like to share with our readers?

I would be remiss not to talk about how critical a role the Cotsen Children’s Library played in the project. After I decided to look at children’s literature, I needed to find a collection of works that would suit my purpose. I explore this more in my thesis, but no existing corpus met the project requirements. Professor Gleason showed me the nineteenth-century catalogue of the Cotsen Children’s Library as a starting point, and that fundamentally shaped my project.

Out of the 1020 works included in my final dataset, 416 were directly from the catalogue and the other 604 were almost all added because of the collection—works by authors in the collection or that I found while searching for works in the collection. While curation could be exhausting (it required searching the 6000+ works from the catalogue in the HathiTrust Digital Library search bar), it also was an amazing introduction to the variety of children’s literature in the nineteenth century. I often found myself down research rabbit holes, or even at times, just being surprised by the books. In one of the books, Other Stories by E. H. Knatchbull-Hugessen, I read its very long dedication to his armchair. It felt like uncovering a secret history, although one that was often troubling, especially with its treatment of non-European cultures, race and ethnicity, and colonialism.

A one-page dedication to the author’s soothing, non-judgmental comfy chair. In Other Stories, by E.H. Knatchbull-Hugessen ; with illustrations by Ernest Griset. London: George Routledge and Sons, 1880. (Cotsen 2646)

Moreover, when I was feeling particularly tired in the final weeks of my thesis, I stopped by the library, and ended up talking with the staff about my thesis. That memory was hugely encouraging as I finished my thesis and is still one of my favorite memories from my senior year.

Your work won this year’s Senior Thesis Prize from the Center for Digital Humanities. Big congratulations! What is your future career plan like?

Next year, I’m working on the Atlas product, a database-as-a-service, for MongoDB, a tech company in New York. While I loved my research and was lucky enough to be accepted to Cambridge’s MPhil in the digital humanities, I ultimately wanted to take time off from school. I had a really wonderful time interning with the company last summer, and I wanted to experience working full-time for a tech company, especially as I decide if I want to go into tech long-term or explore one of my other interests. I definitely see myself returning to the digital humanities, or more generally to a job at the intersection of tech and culture.

Lastly, since you (distantly) read over a thousand children’s books to conduct your research, please tell us about your childhood reading. Did you have any favorite books or reading material? Any people or places you associate with your early reading?

I actually very recently reread several favorite children’s books! One of my all-time favorites that I think really holds up is Tamora Pierce, particularly her Wild Magic and Circle of Magic series. Her female protagonists are better-written than most of the ones I find in adult literature. There are so, so many other series I love (Little House on the Prairie, Shannon Hale’s books—which I mentioned briefly in the thesis, Nancy Drew, Cornelia Funke’s Igraine the Brave, etc.), but Tamora Pierce’s books are the ones I go back to the most.

For people, obviously my parents and my siblings played a huge part in my reading. Also, my librarians: my elementary school librarian even gave me my school’s copy of Pride and Prejudice when I left because I was the only one who ever checked it out. Oddly enough, the place I associate with early reading is the upstairs hallway in my house. There’s a bookshelf there, and I remember sitting there on the beige carpet for hours, reading a book, and when I was done, just picking another one off the shelf.

Reading Children’s Literature, Fast and Slow

Partly due to the relative scarcity of children’s literature corpora, Caballero’s project is a rare computational literary analysis (CLA) that is implemented upon children’s texts. In the field of digital humanities (DH), “corpus” refers to a digital collection of texts. Having curated a corpus of 19th-century English-language children’s literature herself, Caballero applies the method of topic modeling to tease out the statistical pattern of topics in relation to the gender of protagonists. The strength of Caballero’s outstanding research lies in multiple areas. First, she is not afraid of engaging in controversies about DH and in thorny challenges of children’s literature studies. Second, she makes an impressive contribution to DH by publishing a large corpus of digitized children’s literature, which will benefit future researchers. Third, by firmly grounding her statistical revelations in the concerns and findings of traditional literary criticism, the thesis carefully balances quantitative and qualitative methods, reaching nuanced conclusions that are both supported by large-scale analysis and informed by close reading of canons.

Chapter 1 reviews the history of CLA, which over time has succeeded in applying increasingly sophisticated computational tools such as natural language processing to literary studies, processing texts on a scale previously impossible for the solitary researcher. Caballero visits debates around the field of DH and examines, with a fine-tooth comb, critiques that are among those made by its harshest detractors. This would shape the design and process of her project.

Responding to flaws that have been raised about DH scholarship, in Chapter 2 Caballero defines the scope of her data with transparency, meticulously documents how the dataset has been constructed, and makes it readily available through the HathiTrust Digital Library collection system. Caballero used A Catalogue of the Cotsen Children’s Library: The Nineteenth Century (Princeton University Library, 2019) as a guide for building the digital corpus. The catalog, consisting of two tomes that stack up to four inches high, describes over 6300 titles published during the 19th century and collected by the late donor, Lloyd E. Cotsen, ’50 and Charter Trustee Emeritus. With what I can only imagine to be mighty Princetonian tenacity, Caballero has gone through all of them, selecting English-language titles that meet her criteria of narrative texts for children.

Caballero used A Catalogue of the Cotsen Children’s Library: The Nineteenth Century (Princeton University Library, 2019) as a guide for locating texts in HathiTrust Digital Library and building the Cotsen Children’s Literature Dataset.

The curation process forced Caballero to wrestle with some of the fundamental questions that are beguilingly simple but fraught with rule-defying exceptions: How do you define literature? (e.g., should primers, ABC books, stories in verse, etc. be included?) How do you define children’s literature? (e.g., is the presence of a child protagonist a necessary and sufficient criterion? Must children’s books be written with a young audience in mind? What about folk tales and fables, genres that were not produced only for children, but have morphed into classical children’s literature?) By sharing her challenging decision-making process, Caballero hopes to keep future users of her dataset fully informed of the limitations of the corpus.

Caballero was able to locate 416 titles of English-language children’s literature from the catalog that are available in full text in HathiTrust Digital Library (HTDL), a major repository of digital content from research libraries. By conducting author and keyword searches, she added another 604 titles to the Cotsen Children’s Literature (CCL) Dataset. Authors that appear most frequently in the dataset include Horatio Alger (1832-1899), Mary Martha Sherwood (1775-1851), Laura Elizabeth Howe Richards (1850-1943), Mrs. Molesworth (1839-1921), Oliver Optic (1822-1897), Louisa May Alcott (1832-1888), and A. L. O. E. (pseudonym of Charlotte Maria Tucker, 1821-1893) (68).

To prepare the dataset for computational analysis, Caballero then ran the 1000+ works through the BookNLP pipeline for nearly twenty-four hours of intensive analysis. An open-source natural language processing tool, the BookNLP pipeline tracks “all of the characters appearing within a work, the number of times they’re mentioned, the names and pronouns by which they are mentioned” (91), and other entities it is capable of recognizing and tagging at scale.

Chapter 3 describes the computational analysis of the texts in terms of gender and topics. First, based on the annotations generated by the BookNLP pipeline, Caballero determined that 613 titles met the mathematical threshold for having a central protagonist (92-95), all but eleven of them having an identifiable gender, which is treated as a proxy for the gender of the intended audience. However, Caballero is quick to point out that the intended audience does not equate to the actual audience: whereas boys tend to read books with male protagonists, girls tend to cross gender boundaries and read about boys and girls (89).

The male-to-female ratio of protagonists in this subset approaches 1.9:1–389 titles with male protagonists versus 206 with female protagonists (99)–an uncanny number that echoes the findings of studies about gender imbalance with other bodies of literature. For example, McCabe et al. (2011) analyzed gender representation in 5,618 children’s books published throughout the 20th century in the United States. Through manual coding, her team of five scholars found a male-to-female ratio of 1.9:1 in title characters, and that of 1.6:1 among central characters. Data also suggested that male protagonists receive more mentions than female ones in the CCL works (101), again a pattern that is consistent with existing scholarship. Underwood et al. (2018) traced 104,000 works of English-language fiction spread over three centuries, from 1703 to 2009. Similarly using BookNLP and relying on HathiTrust Digital Library, they calculated the proportion of words used in describing female characters, and found a steady decline from the 19th century through the early 1960s.
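As a quick arithmetic check on the ratio quoted above, here is a minimal Python sketch that recomputes it from the two protagonist counts reported in the thesis. The variable names are my own and purely illustrative; nothing here comes from Caballero's code.

```python
# Counts of CCL titles by protagonist gender, as quoted above (Caballero 99).
male_titles = 389
female_titles = 206

# Male-to-female ratio, rounded to one decimal place.
ratio = round(male_titles / female_titles, 1)
print(f"{ratio}:1")  # prints "1.9:1"
```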

Next, Caballero conducted topic modeling to sort the dataset into 125 clusters, each containing co-occurring words from which a topic or a theme may emerge. Among them, 112 were found to be gendered: 64 topics appeared more often in stories with male protagonists, and 48 more often in those with female protagonists, suggesting that boys’ topics are 33% more varied than girls’. The remaining 13 topics were gender-neutral. Caballero presents both macro statistical revelations and in-depth analysis of selected topics.
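The topic counts above can be tallied in a few lines. This is only a sketch of the arithmetic behind the 33% figure, using hypothetical variable names of my own, and says nothing about how the topics themselves were modeled.

```python
# Topic counts as quoted above: 125 topics in total.
total_topics = 125
male_leaning = 64     # topics found more often with male protagonists
female_leaning = 48   # topics found more often with female protagonists

gendered = male_leaning + female_leaning   # topics that lean either way
neutral = total_topics - gendered          # gender-neutral topics

# Relative surplus of male-leaning over female-leaning topics, in percent.
surplus_pct = round((male_leaning - female_leaning) / female_leaning * 100)
print(gendered, neutral, surplus_pct)  # prints "112 13 33"
```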

I thought it would be interesting to pick titles from the Cotsen catalog of the 19th century and test to what extent an individual work reflects large patterns detected by machine. To build up the suspense further, of the five titles I selected from the catalog (largely based on the interest level of illustrations highlighted in the tomes), only one is in the CCL dataset, and the other four are not available in HathiTrust, thus not having been “read” by computer programs mathematically.

Topic: Violence/Combat

Word cloud produced by running a statistical natural language processing toolset called MALLET. The topic of this cluster of words is labeled as Violence/Combat. Image courtesy of AnneMarie Caballero.

It should come as no surprise that the topic of violence or combat is found more often in works with male protagonists. One of the Cotsen titles I selected, The Little Deserter; Or, Holiday Sports; An Amusing Tale Dedicated to All Good Boys, epitomizes the strong connection between the topic and an intended boy audience–from the unequivocal dedication to boy readers in its subtitle, to illustrations that portray boys playing soldiers with menacing-looking toys and props.

The Little Deserter; Or, Holiday Sports; An Amusing Tale Dedicated to All Good Boys. Edinburgh: Oliver and Boyd, [1807 or 1808]. (Cotsen 7108)

If you find the scene of execution–in a book published during the Napoleonic Wars–offensively violent for 21st-century sensibilities, you are justified in feeling so. Here is a spoiler that may be offered as a small solace: Julius, the boy protagonist who has been blindfolded and received the death penalty, bounces back in no time and calls dibs on playing the captain in tomorrow’s game.

Miss Johnston’s name was inscribed on the front pastedown and, as shown here, the front free endpaper of the copy of The Little Deserter. (Cotsen 7108)

What makes the Cotsen copy of The Little Deserter remarkable is that it carries evidence of girls’ expansive reading interests. A “Miss Elizabeth Johnston,” likely a former owner/reader of the book, inscribed her name twice in it. As quoted in Caballero (87), Kimberley Reynolds attributes the appeal of boys’ books for girl readers to the fact that books deemed suitable for young ladies were frequently unexciting tales for cultivating good behavior.

Topic: Island (Stranding)

The topic of this cluster of words is labeled as Island (Stranding). Word cloud courtesy of AnneMarie Caballero.

“Island” is a frequent word found in multiple topics that range from Island (Stranding) to Boats (Stranding, Shipwreck) and Nation (Nationalism), linking to the traditional boys’ adventure story as well as the historical subject of colonial conquest (139-140). Both literary criticism and Caballero’s computational study confirm a gendered landscape in children’s literature, which contrasts the feminine home with the masculine away, and excludes boy characters from the home and girl characters from the away (147). The dichotomy, however, is complicated by what is referred to as “adventurous domesticity” (142), whereby protagonists attempt to reconstruct domesticity while stranded.

“Adventures of Robinson Crusoe” in The Robinson Crusoe Picture Book. George Routledge and Sons, [not after 1873]. (Cotsen 152150)

The masculine pursuit of a “home away from home” is well reflected in “Adventures of Robinson Crusoe,” a short, illustrated verse story based on Daniel Defoe’s novel. The titular castaway builds a thatched and fenced house that he can call his little home, makes furniture and clothes (he is pictured as putting finishing touches to an umbrella), keeps company with his dog and cats, and domesticates a young goat and a parrot he has found on the island.

Johnny Headstrong’s Trip to Coney Island. McLoughlin Bros., 1882. (Cotsen 540)

Of the five titles I selected, Johnny Headstrong’s Trip to Coney Island is the only one that is included in the CCL dataset, thanks to digitized copies contributed by member university libraries to HathiTrust. In this verse story, Johnny’s family takes a trip to Coney Island beach. Even though his sister Sue has also joined the outing, she is rarely mentioned. At one point she is described as sitting on the wooden horse of a carousel “like a lady,” i.e., riding sidesaddle. Johnny is the protagonist and remains the center of attention (and chaos) by getting into a nonstop series of scrapes, departing the island with bandages over his nose and cheek at the end of the day. Johnny Headstrong’s adventure seems to be a quintessential bad boy’s tale, having packed into its 20 pages so many of the boys’ topics on Caballero’s list (115-6): Movement, Body of Water, Boats, Injury, Donkeys, Animal, to name a few.

“Painful Emotion (Death)” is high on the list of girls’ topics (117, 121). Word cloud courtesy of AnneMarie Caballero.

The pattern of gendered topics does not mean that a boys’ tale is devoid of all topics that are statistically prominent in girls’ stories, and vice versa. Caballero conducts a case study with two of the best-known girls’ adventure stories, Alice’s Adventures in Wonderland and Alice Through the Looking-Glass, and finds that a good portion (a quarter and nearly a half respectively) of the top 20 topics in each work are boys’ topics, such as Injury, Water, and Animal (143). Likewise, both Robinson Crusoe and Johnny Headstrong have their emotionally vulnerable moments, described in words that are frequently found in the topic Painful Emotion (Death), which is statistically a girls’ one. A forlorn Robinson Crusoe sometimes grows “very sad,” “cries aloud,” weeps “like any child,” thinks of his father and mother, and prays to God “with many tears” (“Adventures” 1-2). Johnny’s adventure begins as he tumbles overboard, is fished out of the water, and cries as he is sent to the engine-room to dry beside the furnace fire. In one episode, he slips away and loses his Papa and sister Sue, then begins “to cry,” “big tears” running down his chubby face (Johnny Headstrong’s). In another, he accidentally strikes a boy hard with a ball and, thinking the boy would surely die, sobs with “childish fright.” Towards the end he falls off a swing, and adults have to soothe “his sobs and groans.” It is tempting to ask if there might be any correlation between how broadly appealing a children’s story is and how inclusive the work is in encompassing gendered topics.

Girls, Domesticity, and Travel

“Confidential People” in The May Blossom, or, the Princess and Her People. Illustrations by H.H. Emmerson; verses by Marion M. Wingrave. London: Frederick Warne and Co., [1881]. (Cotsen 9380)

Caballero recognized that illustration is an essential element of the Cotsen collection, because of Lloyd Cotsen’s “passion for illustrated works that help children become independent readers” (Immel, quoted in Caballero 54). Her computational analysis handles only texts that have been OCRed, thus the machine has missed about half of the fun of perusing the Cotsen collection! The May Blossom, a collection of short verses, presents an intriguing case of what the machine manages not to miss in spite of its singular focus on texts. In one of the entries, “Confidential People,” a first-person “I” shares a secret with a second-person “you”–there is no textual description of the setting of the story. In the accompanying illustration, the two characters are seated in an intimate, ornate space, surrounded by objects that well match the most frequent keywords of the topic Domestic Space, one that is found more often in works with female protagonists.

The most frequent words in the cluster for the topic Domestic Space include room, table, chair, and sit (161). Word cloud courtesy of AnneMarie Caballero.

The narrator confides that she plans to marry “a sweet little beau” and to take a honeymoon by “a coach and six horses” to Lilliput Land next year. It is a striking contrast how a story that hints at an exciting trip to the faraway fantasy land is visually represented by two girls confined in a stuffy room, a setting that is mentioned nowhere in the text. Travel (Driving, Carriage) turns out to be one of the gender-neutral topics (114), meaning it is as likely to appear in stories with a male protagonist as a female one. How does it square that domestic space is tied to girls’ stories, yet travel is not? In “Confidential People,” the girl’s narrative about travel is firmly grounded within approved gender roles. The endearingly amusing verse both adores the young narrator’s childish innocence and models an aspiration for marriage that leads to the fulfillment of traditional womanhood.

“Johnny’s First Motor Ride” in Little Tots Holiday Book: With Numerous Coloured Plates and Other Illustrations. London; New York : Frederick Warne & Co. (Cotsen 30357)

A close reading of another story that fits the topic of Travel (Driving, Carriage) invites us to consider what it means to be the central character of a story, and circles back to gender imbalance in terms of the count of female versus male protagonists as well as the proportion of words devoted to each gender. In “Johnny’s First Motor Ride,” the titular character receives a real little motor-car from his father and soon learns how to “control it with ease.” With a bonneted baby deposited in the passenger seat–possibly against the baby’s will, judging from his/her facial expression–he goes out for a ride[1]. After trying to abruptly avert a collision with Margery’s goat-chaise, however, Johnny finds his car stuck. It is at this point, where the story has run two-thirds of the way towards the end, that attention swerves to Margery. Described by her father as “a real clever little woman,” Margery is sympathetic, helpful, and resourceful. Even though it is not her fault that Johnny’s car malfunctioned, she does not abandon the stranded novice motorist. She sets to work “to harness the damaged motor-car to the goat-chaise,” which is pulled by “Nanny” the goat, and coaxes the hoofed “engine” to tow the modern vehicle home. “That was a real triumph for Nanny!”–the story concludes with the exclamation.

Whether by its title “Johnny’s First Motor Ride” or by the amount of text devoted to Johnny, the protagonist of the story is apparently a boy–to the machine’s mathematical “mind” at least. I can’t predict how a human reader interprets who is the central character of the story. Margery clearly shines with what she has done, even though she doesn’t receive the most mentions in the story. That the credit for the successful rescue act should go to the goat implicitly imparts a self-effacing virtue expected from females. The girl character is sidelined even in a story where she is not the damsel in distress but the heroine who saves the day.

Caballero’s computational analysis of a sizeable body of 19th-century English-language children’s literature reveals a gendered landscape, tethering female characters to domesticity and the inward, freeing male characters to the wider world away from home, enlarging the gap between endorsed masculine and feminine behavior, and bundling implicit morals and values for each gender. She brings rich complexity into her project by tracing how a large-scale analysis of over a thousand works agrees with or departs from findings based on traditional literary criticism of a limited number of canons. It is a testament to the robustness of her study that, for the five titles from the Cotsen collection–only one of which is available in the dataset–the patterns still hold true and help us gain fresh insights into these dusty volumes.

Note

[1] Johnny and his father break all the modern government regulations for driving an automobile. There is no publication date on the book. Let’s assume the story was written soon after the invention of the first automobile in 1886, before the driver’s license began to be implemented by the end of the 19th century, or before the age restriction was first introduced in Pennsylvania in 1909.

References

Caballero, AnneMarie. Gendered Topics: Boyhood and Girlhood in a Century of (Cotsen) Children’s Literature, Princeton University, 2023.

McCabe, Janice, et al. “Gender in Twentieth-Century Children’s Books: Patterns of Disparity in Titles and Central Characters.” Gender & Society, vol. 25, no. 2, 2011, pp. 197-226.

Underwood, Ted, David Bamman, and Sabrina Lee. “The Transformation of Gender in English-Language Fiction.” Journal of Cultural Analytics, vol. 3, no. 2, 2018.

Resources

Datasets Curated by AnneMarie Caballero (the exact scope of each dataset is detailed on page 65 of her thesis):

The Cotsen Children’s Literature (CCL) Dataset (1021 items as of July 2023) [URL]

  • The subset of titles as found in A Catalogue of the Cotsen Children’s Library: The Nineteenth Century (416 items) [URL]
  • The subset of titles as found outside A Catalogue of the Cotsen Children’s Library: The Nineteenth Century (605 items) [URL]

Titles as found in Cotsen’s catalog but excluded from the CCL Dataset (123 items) [URL]

GitHub repository of the cleaning script for the CCL Dataset [URL]

(Edited by Andrea Immel)