15 December 2021

Ngrams, or How to Be Groovy in 1864.

Let's get a bit convoluted, shall we? Last month on the Short Mystery Fiction Society* list, Judy Penz Sheluk pointed to a blog piece she wrote about a webinar Iona Whishaw gave. Whishaw's subject was Ngrams. According to Wikipedia, "an n-gram (sometimes also called Q-gram) is a contiguous sequence of n items from a given sample of text or speech."

And what the hell does that mean, you may ask. Take a look at the diagram below. This is a Google Books ngram chart showing how often the terms crime fiction, detective fiction, mystery fiction, and noir fiction showed up in each year. More accurately, it indicates what percentage of all the pairs of words published in a given year consists of the pair you are looking for. So detective fiction was the most popular term until 2011, when crime fiction surpassed it. I would have guessed that happened decades earlier.
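For the curious, here is a rough sketch in Python of the arithmetic behind that percentage. It is only an illustration: the function names and the sample sentence are made up for this post, and Google's actual processing is certainly more involved (punctuation handling, for one, is ignored here).

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def share(text, phrase):
    """Percentage of all n-word runs in text that match phrase."""
    n = len(phrase.split())
    grams = ngrams(text, n)
    if not grams:
        return 0.0
    return 100.0 * Counter(grams)[phrase.lower()] / len(grams)


# A made-up stand-in for "everything published in one year".
sample = "detective fiction outsold crime fiction until crime fiction caught up"

print(ngrams("a groovy child", 2))         # ['a groovy', 'groovy child']
print(share(sample, "crime fiction"))      # 2 of the 9 word pairs
print(share(sample, "detective fiction"))  # 1 of the 9 word pairs
```

Because each year's count is divided by that year's total number of word pairs, a year with few books and a year with many can be compared on the same chart.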

Pretty cool?  But wait: we are just starting.  Not visible at the bottom of the screen is the fact that you can look up all the books (magazines, law codes, etc.) that contain your phrase in a given year or time period.

If you are writing historical fiction, you have just acquired an amazing new tool, thanks to Sheluk and Whishaw.

 I wrote a story earlier this year set in 1967 and I used the word groovy.  So let's see how that word does in the ngram world.  The diagram below shows the word was very popular in 1967, although it peaked in 1970.

But wait - why do we see that huge jump around 2010?  A quick click on the 2009-2011 button reveals a programming language called Groovy. And sure enough, if we make the ngram case-sensitive, Groovy becomes briefly more popular than its lowercase sibling.
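What case sensitivity changes can be shown with a toy example. This is a minimal sketch with a hypothetical helper and an invented sentence, not a description of how the ngram viewer is built.

```python
def count_word(text, word, case_sensitive=False):
    """Count whole-word matches of word in text, optionally ignoring case."""
    words = text.split()
    if not case_sensitive:
        words = [w.lower() for w in words]
        word = word.lower()
    return words.count(word)


# Invented sample mixing the programming language and the slang word.
sample = "Groovy scripts run on the JVM while groovy tunes run on vinyl and Groovy compiles"

print(count_word(sample, "groovy"))                       # 3: both spellings lumped together
print(count_word(sample, "Groovy", case_sensitive=True))  # 2: capitalized only
print(count_word(sample, "groovy", case_sensitive=True))  # 1: lowercase only
```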

But I learned something even weirder. Groovy was being used long before the flower children's parents were even born. I found this quotation from the Saturday Review, January 1864: "For a groovy parent trains a groovy child, and the groovy child must be father of a groovy man."

How hip those Victorian English dudes were, you may be saying. Alas, the anonymous writer did not mean it as a compliment. He was talking about being stuck in a rut, thinking inside the box. Very much not groovy.

I am also writing a story set in 1959 and one of the characters is socially awkward, has certain verbal tics, and can do amazing mathematical feats in his head. Today most of us amateur diagnosticians would say "he's on the autistic spectrum." But would anyone have used that term sixty years ago? We can go to ngrams again, but this reveals a weakness of the tool.

Because when I search for uses before 1960 I find publications that supposedly have that date, but were really published later.  There is a 1992 edition, for example, of a psychiatric manual which was first published in the 1950s, and Google Books can't spot the difference.  There is a similar problem with journals that were founded a long time ago.  (HathiTrust, another great free tool for historical sources, suffers from the same limitation.)

On the other hand... A few weeks ago Leigh wrote a fascinating piece here about words and concepts that started in the 1980s.  His source claimed that "eggs benedict" wasn't given that name until 1984.  Google Books Ngrams quickly found it in the Hotel St. Francis Cookbook, 1919 edition.

And now I'm hungry.  But before I head to the fridge, many thanks to Judy Penz Sheluk and Iona Whishaw for pointing out this cool tool.  You can play around with the Google Books ngram viewer here.

*I am the Society's current president and I hereby invite you to join.  It's free, but new memberships are not accepted between January 1 and May 1, so hop to it here.


  1. Rob, it was interesting watching readers debunk eggs benedict. In the 1970s, I used to enjoy Sunday brunch in New York, where eggs benedict was a feature and a fixture, and there I discovered the difference between hollandaise and béarnaise sauces. Our Elizabeth Dearborn dined upon eggs benedict a decade earlier in the 1960s. Then you, Rob, blew us all away with the 1919 reference. Really amazing.

  2. Rob, has the ngram data been normalized to account for, say, population or ease of publishing?

    1. No, but remember it is giving you a percentage of word-use in a particular year, so it is apples to apples. In other words it was just as easy to use "mystery fiction" as "detective fiction" in 1928, no matter what the population or ease of publishing was. Unless I misunderstand you?

  3. Back in Charlotte Yonge's "The Long Vacation" (1895), the heir speaks to his family of how “there never were such groovy people as you are!” And he meant stuck in their ways.
    Also, in Victorian times, "gay" meant first bright and lively, and then became a term for prostitution, as in "the gay life". (Which makes sense, because prostitutes wore bright colors and had to look very lively and fun to get customers.)
    The Lexicon Balatronicum: A Dictionary of Buckish Slang, University Wit, and Pickpocket Eloquence (1811), defines "Pig" as "a police officer. A China street pig; a Bow-street officer."
    Also, in Victorian literature and diaries, the verb "spend" often didn't have anything to do with money - it was about sex.

