A Fortnightly Review of
Canon/Archive: Studies in Quantitative Formalism from the Stanford Literary Lab
Franco Moretti, editor
By CHLOË HAWKEY.
LITERARY CRITICISM IS a fraught field—far from popular, often charged with elitism, yet from its supporters’ standpoint, at least, essential to the vitality of literature. For those of us who value it and seek it out, it is not really about intellectual gods handing down their rulings from on high; it is an essentially democratic, discursive pursuit, in which we have the ability to participate. We see critics, who have dedicated their lives to reading, thinking, and reckoning with the world in which we all live, simply as being in a better position than the rest of us to start that conversation. As F.O. Matthiessen, literary critic and Harvard English professor, once wrote, criticism has the ability to offer “life-giving communications between art and society.”
And so, at a time when we have ever more communication, but when art is still seen as frivolous and when society is struggling with political, environmental, and social woes more than ever, every new collection of criticism has the potential to—if not save us, then at least light our way, help us to make sense of our complex and often terrifying world by bringing us new interpretations of art, literature, and culture. The publication of Canon/Archive: Studies in Quantitative Formalism from the Stanford Literary Lab has thus caused understandable excitement among those who are capable of getting excited about literary criticism. Edited by Franco Moretti and comprising eleven of the Stanford Literary Lab’s so-called “pamphlets” (all originally published independently), the collection represents an effort on the part of the researchers to merge older styles of literary criticism with powerful computers and complex algorithms.
The work focuses on nineteenth-century literature, with brief forays into the eighteenth century, but not just on the relatively narrow canon that we all know so well: Austen, Brontë, Dickens, Shelley, Hardy. Instead, Moretti and his fellow researchers turn to the broadest archive available to them, striving to have their work represent, whenever possible, the same range of writers and books to which readers of the period would have had access. Though, to some extent, this is simply part of an effort to be as “scientific” as possible, it also represents an admirable democratic impulse, one which deserves some acknowledgment. (Interestingly, the collection includes a chapter on the question of the canon versus the archive, in which the researchers discover, through tracking the type of language used in canonical versus non-canonical texts, that what held back the long-term popularity of the latter was their indulgence in “heteroglossia,” essentially other languages: the news, politics, and aesthetics of the day. In other words, we love Austen today because she doesn’t require us to be versed in the political life of small-town, nineteenth-century England; we can still relate to her. So there is a reason why we read the books we do today, but we needed Moretti et al. to work with that larger archive on our behalf in order to explain that reason to us.)
The organizing body from which this collection of criticism arises is the Stanford Literary Lab, a phenomenon that is much more difficult to imagine existing at any university other than that incubator of Silicon Valley-style “innovation.” Founded by Moretti in 2010, it is a “research collective that applies computational criticism, in all its forms, to the study of literature.” Open to undergraduates, graduate students, and faculty at Stanford (as well as, occasionally, to faculty elsewhere), the lab is fundamentally a collaborative operation, in which, Moretti explains,[1] “the team sits together around a table—a lab table, as essential a tool as the really expensive ones—and discusses how to make sense of the results.”
As a child of the Bay Area, I’ve grown up with Silicon Valley looming large in my imagination—and not always in a good way—so the idea that this collection might demonstrate an ability to put the number-crunching, data-driven, housing-market-destroying forces of Silicon Valley to good work in revitalizing literary criticism was one that I coveted. And indeed, Canon/Archive does demonstrate the flexibility of this seemingly traditional field, does demonstrate that literary criticism is not a fossil so much as another evolving creature, capable of moving and changing with the rest of the world. This is, in itself, a worthy contribution to the field, and not one to be taken lightly.
The methods the researchers use as they apply their algorithms to literature are innovative indeed. In order to study subjects like the style of sentences, the thematic value of paragraphs, and the geographical locations of emotions in London, they use computer programs: most often, programs that determine the frequency of certain words within texts over time. Having programmed the computer to identify a certain word or a certain type of word, they feed massive amounts of text from digital archives into it and receive, in return, massive amounts of data about the recurrence of certain words or syntactical features and their trends over time. They then plot this data on graphs and charts, which, Moretti assures us in the introduction, are methods of displaying “the specific object of study of computational criticism…our ‘text’; the counterpart to what a well-defined excerpt is to close reading.” Unfortunately, what the Literary Lab researchers fail to take into account is that most of the readers of their book are hardly specialists in understanding such graphs of “big data.” For this reader, at least, the most the graphs could offer was a visual representation of the same trends over time that were already described in the text. Alas.
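For readers who think better in code than in graphs, the basic move of counting a word's frequency over time can be pictured with a toy sketch in Python. Nothing here is the Lab's own software: the function name `decade_frequencies`, the `(year, text)` input format, and the four-sentence corpus are all invented for illustration.

```python
import re
from collections import defaultdict

def decade_frequencies(texts, word):
    """Relative frequency of `word` in each decade.

    `texts` is an iterable of (year, text) pairs. This input format is an
    assumption made for the sketch, not something described in the book.
    """
    hits = defaultdict(int)    # occurrences of `word`, per decade
    totals = defaultdict(int)  # all word tokens, per decade
    for year, text in texts:
        decade = (year // 10) * 10
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(word.lower())
    # Divide by the decade's total so that decades with more text
    # do not look "louder" simply because they are bigger.
    return {d: hits[d] / totals[d] for d in sorted(totals) if totals[d]}

# Invented miniature "archive": two sentences per decade.
toy_corpus = [
    (1795, "Her modesty and integrity were praised by all."),
    (1798, "Integrity is the soul of virtue."),
    (1861, "The red door stood open on the muddy street."),
    (1865, "He walked hard and fast down the long grey road."),
]

freqs = decade_frequencies(toy_corpus, "integrity")
for decade, share in freqs.items():
    print(decade, round(share, 3))
```

The one design choice worth noticing is the division by each decade's total word count: without it, a decade that simply produced more surviving text would appear to use every word more often.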
Nevertheless, the researchers’ general strategy succeeds: they gather the data, identify the patterns in it, and then use their background in literary theory to analyze those patterns along conceptual lines. They seem quite proud of the way in which they weave their results and conceptual analysis together, the way they ground their theories in data and elucidate their data with theories: “Only from their encounter,” they assure us, “did critical knowledge arise.”
We are left with what is, in all fairness, a quite convincing movement from huge quantities of data, through patterns, to new forms of conceptual awareness. I will readily admit that I am much less skeptical now than I was 308 pages ago about the possibility of “operationalizing”—essentially measuring—literature, about our ability to pin down “solid facts” about fiction, and about the use of “corroborating” existing literary theories.
Take, for example, the chapter titled “From Keywords to Cohorts,” a study of “novelistic language” conducted by Ryan Heuser and Long Le-Khac. In it, they describe their development of Correlator, a computer program that allowed them to track the “decade-level frequencies of words.” By recording how often particular words were used within a given decade and then by matching those words with other words that were used equally often (or rarely) over time, Correlator showed which words were used together and when. This program returned the remarkable discovery that correspondence in meaning could appear out of this strictly historical-frequency data. In other words, when the researchers entered the word “integrity” into Correlator to find what words in the corpus had the most similar historical behavior, the words returned included “modesty,” “sensibility,” and “reason.” Seeking only similarity in historical use, they discovered similarity in meaning.
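The principle of finding words with similar historical behavior can be sketched in a few lines of Python. This is an illustration only: the review does not say which similarity measure Correlator uses, so the Pearson correlation below is an assumption, and the function names and per-decade figures are invented.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a flat series carries no trend to compare
    return cov / (sx * sy)

def most_similar(seed, series, top=3):
    """Rank every other word by how closely its trajectory tracks the seed's."""
    scores = [(w, pearson(series[seed], v))
              for w, v in series.items() if w != seed]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top]

# Invented frequencies per 10,000 words for nine consecutive decades.
# These numbers are made up; they are NOT data from the book.
series = {
    "integrity": [9, 8, 8, 6, 5, 4, 3, 2, 2],
    "modesty":   [7, 7, 6, 5, 4, 4, 2, 2, 1],
    "reason":    [12, 11, 10, 9, 7, 6, 5, 4, 4],
    "red":       [1, 2, 2, 3, 4, 5, 6, 7, 7],
    "hard":      [2, 2, 3, 4, 4, 5, 6, 6, 8],
}

ranked = most_similar("integrity", series, top=2)
print(ranked)
```

The code knows nothing about what any word means. It compares only the shapes of the curves, which is why the appearance of semantically related words at the top of such a ranking struck the researchers as a discovery.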
Ultimately, Heuser and Le-Khac put Correlator to use identifying and making sense of one key trend: over the course of the eighteenth and nineteenth centuries, this set of words, which they call “abstract value” words, declined as the use of concrete, descriptive words increased. They were then able to use this discovery to explain a social shift over this same period, in which an increasingly urban population rendered small-town morality irrelevant; in a world of strangers (such as nineteenth-century London), everything and everyone was unfamiliar, and intimate moral valuation became impossible.
Heuser and Le-Khac write of Correlator’s role in this process, “It took a computational method of finding language trends to discover this other group of words that, while not semantically related to the abstract value words, are historically related.” In other words, this computer program, with its ability to create and analyze huge amounts of data, enabled them to write intellectual history, to track the way people were thinking over time.
Even for something of a literary-critical traditionalist like myself, it didn’t take much convincing for me to find both the value and the excitement in Heuser and Le-Khac’s research. I was more skeptical when it came to aspects of literature more artistic, if you will, than word usage over time. Scribbled in the margins of Moretti’s introduction, next to a sentence that began “Katsma found a way of operationalizing his intuition,” I left a note that says, “Should intuition be operationalized?”—which, I confess, filled me with self-righteousness as I wrote it. But upon returning to it several days later, having read Harvard grad student Holst Katsma’s chapter on loudness in the novel, I found that I could answer in the affirmative fairly easily. Yes—one can (though perhaps “should” remains a bit strong) take a vague feeling for how a piece of literature works and find a way to measure it, or at least Katsma can.
He makes a compelling case. Beginning by describing a sense we’re all familiar with—that we hear the narrator’s and characters’ voices in a novel, even when we read it silently—Katsma goes on to find a way to measure that loudness, to make the vague and abstract concrete. Using the “speaking verbs” attached to dialogue (“they shouted,” “she said”), he divided the texts into “loud” and “neutral” dialogue excerpts, which he then subjected to a series of computational studies. The first was a “most distinctive word” test, in which a computer identified the words most frequently used within each type of dialogue. The results were surprising: “Loudness,” he writes, “showed an affinity, not to topics, but to grammatical structures.” Loud dialogue—though perhaps “intense” is more accurate—was not loud because it was about murders or fires. It was “loud” (or intense) because it included a disproportionate number of verbs, pronouns, and questions and far fewer adjectives, nouns, and prepositions—because, in other words, it avoided description in favor of action.
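A rough Python sketch may make the two steps easier to picture: sorting dialogue by its speaking verb, then asking which words set one pile apart from the other. Everything specific here is an assumption for illustration: the verb lists, the regular expression, the smoothed frequency ratio standing in for the chapter's "most distinctive word" test, and the four invented lines of dialogue.

```python
import re
from collections import Counter

# Assumed verb lists; the chapter's actual lists are not given in the review.
LOUD_VERBS = {"shouted", "cried", "screamed", "exclaimed"}
NEUTRAL_VERBS = {"said", "replied", "answered"}

# Matches: "speech" <one-word speaker> <verb>, using straight quotes.
# Real novels would need a far more forgiving pattern.
TAGGED_SPEECH = re.compile(r'"([^"]+)"\s+\w+\s+(\w+)')

def classify_dialogue(text):
    """Split quoted speech into loud and neutral by its speaking verb."""
    buckets = {"loud": [], "neutral": []}
    for speech, verb in TAGGED_SPEECH.findall(text):
        verb = verb.lower()
        if verb in LOUD_VERBS:
            buckets["loud"].append(speech)
        elif verb in NEUTRAL_VERBS:
            buckets["neutral"].append(speech)
    return buckets

def word_counts(excerpts):
    return Counter(w for s in excerpts for w in re.findall(r"[a-z']+", s.lower()))

def distinctive_words(target, other, top=3):
    """Words over-represented in `target` relative to `other`.

    Uses a ratio of add-one-smoothed relative frequencies, a simple
    stand-in for whatever statistic the chapter actually uses.
    """
    t, o = word_counts(target), word_counts(other)
    vocab = len(set(t) | set(o))
    t_total, o_total = sum(t.values()), sum(o.values())

    def ratio(w):
        return ((t[w] + 1) / (t_total + vocab)) / ((o[w] + 1) / (o_total + vocab))

    return sorted(t, key=ratio, reverse=True)[:top]

sample = (
    '"What do you want?" she shouted. '
    '"It is a fine grey morning," he said. '
    '"Who are you?" they cried. '
    '"The garden is lovely in the spring," she replied.'
)

buckets = classify_dialogue(sample)
print(distinctive_words(buckets["loud"], buckets["neutral"]))
```

Even in this tiny invented sample, the word that most distinguishes the loud excerpts is a pronoun rather than a topic word, which is the kind of result Katsma reports at scale.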
Katsma then goes on to plot the loudness of two novels, Dostoyevsky’s The Idiot and Austen’s Pride and Prejudice, on a chapter-by-chapter basis. He finds, and his graphs convincingly display, that loudness within an individual work of literature builds in a series of crescendos: louder, louder, louder, and then quieter for a while, quieter still, until it begins to grow again. If we assume, perhaps rather boldly, that these two books are representative of the archive at large, loudness thus provides, as he points out, “a means for thinking about novelistic order while maintaining an interesting distance from the plot.”
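The chapter-by-chapter curve can likewise be sketched in Python. The review does not say exactly how loudness per chapter is computed, so this sketch assumes it is the share of a chapter's dialogue that is tagged loud; the function names and the counts are invented.

```python
def loudness_by_chapter(chapters):
    """Share of loud dialogue in each chapter.

    `chapters` is a list of (loud_count, neutral_count) pairs, one per
    chapter. Treating loudness as this share is an assumption of the sketch.
    """
    return [loud / (loud + neutral) if (loud + neutral) else 0.0
            for loud, neutral in chapters]

def crescendo_peaks(values):
    """Indices of chapters that are louder than both of their neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]

# Invented counts for an eight-chapter novel; NOT figures from the book.
counts = [(1, 9), (2, 8), (4, 6), (2, 8), (1, 9), (3, 7), (5, 5), (2, 8)]

curve = loudness_by_chapter(counts)
peaks = crescendo_peaks(curve)
print(curve)
print(peaks)
```

The peaks mark the crests of the crescendos, and they can be located without knowing anything about what happens in those chapters, which is one way to read the "interesting distance from the plot."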
Indeed, it does. I confess that I found this chapter fascinating: I knew exactly what he meant when he first mentioned “loudness,” but I couldn’t imagine a way to measure it (or, as I now know to write, to operationalize it). And so, to watch as he found a way to do that and then as he entered so much text into a computer and made sense of such a veritable mountain of data—it was amazing. And as a result of the process, I do have some new sense of the way that dialogue functions in a novel, a sense that will perhaps even change how I read in some modest way.
And yet. It is impossible not to wonder, as my frequent and increasingly urgent scribblings in the margins of the book ask, to what end? What do these literary concepts, so scientifically proven, so articulately expressed, do for me, for you, for Katsma?
KATSMA SEEMS MORE aware than most of the critics in this collection that the reader actually exists, so he is less hesitant to address the way that literature affects readers. But even he fails to press ahead, fails to suggest the way that the emotional impact of a novel can move a reader—to kindness, to self-awareness, to understanding, to action.
It’s a frustrating shortcoming and ultimately the one that most stands in the way of my enthusiasm for this collection. We’re busy. If we weren’t busy before January 20, 2017, we’re certainly busy now. If a book is going to ask for our time, attention, and energy, it needs to assure us of its importance—and this book, important though it has the potential to be, utterly refuses to do that.
In his conclusion, Moretti writes, “Algorithms have changed what we study, and how we study it. Think of reading. For centuries, reading has been indispensable to the understanding of literature. In front of Figure 10.1, it is nothing. Nothing. Just to be clear, it’s not that we should stop reading books. Reading is one of life’s great pleasures.”
I might begin by pointing out that reading is not merely “indispensable to the understanding of literature.” Tape is indispensable to the wrapping of presents. Knives are indispensable to the cutting of holiday pies. Reading is the whole means by which literature exists in our world; reading is the only way we can come to know literature intimately, emotionally.
As such, reading is not some frivolous amusement, not merely a “pleasure.” Anyone who picks up a book like Canon/Archive is someone who has been terrified by a work of literature, and I don’t mean by some gruesome zombie scene, but by the sheer power of the words. Those of us who seek out literary criticism are those who have finished books too overwhelming to describe, who have been moved to tears and odd swells of emotion, who have the tendency to grope through our bookshelves and favorite libraries and bookstores on bad days. We come knocking on the doors of critics because we hope that they might have the words that we don’t have to describe the awe, or anger, or frustration we feel in the presence of those books.
We do not come for algorithms or for patterns, or even for the assurance that this newly discovered trend grew out of a historical development. We come for something at once bigger and deeper and more elusive. Lionel Trilling, that great mid-century literary critic and proponent of morality (in more of a left-liberal sense than a Christian-right sense), wrote, “The novelist goes where the law cannot go; he tells the truth where the formulations of even the subtlest ethical theorist cannot…he gives us the models or the examples by which, half-unconsciously, we make our own moral selves.” This is what we seek in reading fiction, and by extension, what we hope our best critics will elucidate for us. I wish Moretti and associates had succeeded in this—god only knows, our moral selves could use all the help they can get these days.
IN SINGIN’ IN THE RAIN, there is a moment when Cosmo Brown, the piano-playing sidekick to Gene Kelly’s character, Don, comes up with the idea of having an actress with a terrible voice lip-synch in their studio’s debut of a “talking picture.” He demonstrates lip-synching, having another friend sing while he moves his lips, then turns to Don. “Well, convincing?” he asks.
“Enchanting…” Don replies, dripping sarcasm. “What?”
Most of the time that I was reading this book, I felt like Don: Enchanting—what? This is all well and good. It’s remarkable what computers can do and equally what these students of literature can do with these computers, large digital archives, and a long list of algorithms. And some of their findings are truly very interesting: it is possible to measure loudness in a novel and to plot the change in loudness in novels over time! You can keep track of the way that abstract value words gave way to concrete descriptive words over the nineteenth century!
But always, after I smiled and granted that I was enchanted, I was left with the what? What does this mean to my life or to Franco Moretti’s? This deepens my understanding, in a very technical sense, of the text, but does it enrich the experience I have reading it? And does it enrich the life I lead once I’ve capped my pen and returned the book to the shelf?
Of course, as fans of Singin’ in the Rain know, the joke may be on me: ultimately, Cosmo’s lip-synching plan saves the musical and the love story both. It turns out to be far more than a cute trick; it opens up, for a Hollywood community set in its ways, a whole new way of moving forward. I leave that possibility open.
But, if I may: computational criticism is going to need a considerable dose of the compelling and the urgent if it hopes to achieve a level of Gene-Kelly-tap-dancing revelation.
Associate editor Chloë Hawkey studied American History and Latin at Columbia University. She lives in the San Francisco Bay Area and works as a whitewater river guide on the Rogue River in the summer months. She is the Fortnightly’s ‘American Note’ columnist; an archive of her Notes is here.
[1] In Literary Lab Pamphlet 12: Literature Measured, April 2016. Pamphlets of the Stanford Literary Lab.