FAQs

Wordsworth is a tool to help historical fiction authors avoid linguistic anachronisms. You can compare a passage from your story to a corpus of fiction from the decade you're writing about, or look up whether a specific phrase is found in fiction from that decade.

About Wordsworth's Features

What is a bigram?

A bigram (pronounced "BY-gram") is a linguistic term for a phrase of two words.

How does the Bigram Search feature work? Does it take into account that spelling has changed over the last 200 years?

The Bigram Search feature looks for a specific two-word phrase within a certain decade. If the phrase is not found, it will also look for the hyphenated and single-word versions of that phrase. (Think "tea cup" vs. "tea-cup" vs. "teacup".) This is because English from 1800-1923 tended to use more hyphenated phrases than 2010s English does. For instance, the hyphenated word "dinner-parties" appears in the 1810s corpus, but the bigram "dinner parties" does not.
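For the curious, the fallback described above can be sketched roughly as follows. This is a minimal illustration, not Wordsworth's actual code: the function name `search_bigram`, the `decade_counts` dictionary (phrase mapped to number of occurrences in one decade), and the toy counts are all assumptions made up for the example.

```python
def search_bigram(phrase, decade_counts):
    """Look up a two-word phrase in one decade's phrase counts.

    `decade_counts` is a hypothetical dict mapping lowercase phrases
    to how often they occur in that decade's corpus.
    Returns a dict of the matching forms and their counts.
    """
    words = phrase.lower().split()
    if len(words) != 2:
        raise ValueError("expected exactly two words")
    first, second = words

    # First try the phrase as typed: two words separated by a space.
    spaced = f"{first} {second}"
    if decade_counts.get(spaced, 0) > 0:
        return {spaced: decade_counts[spaced]}

    # Not found, so fall back to the hyphenated and single-word forms.
    fallbacks = [f"{first}-{second}", f"{first}{second}"]
    return {
        form: decade_counts[form]
        for form in fallbacks
        if decade_counts.get(form, 0) > 0
    }


# Toy counts, invented for illustration only.
toy_1810s = {"dinner-parties": 2, "tea cup": 1, "teacup": 4}
```

With this toy data, searching for "dinner parties" finds only the hyphenated form, searching for "tea cup" stops at the spaced form, and a phrase with no matching form returns an empty result.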

What are the Google Ngram charts and where do they come from?

The Google Books Ngram Viewer is a Google feature that graphs word and phrase frequencies over time, based on the Google Books corpus. The Ngram charts can be embedded in other websites using iframes. Wordsworth's Google Ngram charts default to pulling from Google's "English Fiction" corpus.

How does the Discount Words feature work?

This feature allows you to tell the system to ignore certain words in your passage before analyzing it. It is most useful when you already know that certain words are not found in the comparison corpus and you don't need the system to tell you that again. For instance, Mary Robinette Kowal's fantasy novel Shades of Milk and Honey is heavily influenced by the works of Jane Austen, but it features a hero named "Mr. Vincent" and a magic system known as "glamour". The words "vincent" and "glamour" do not appear in our 1810s corpus, so Kowal might find it useful to tell the system to discount those words.
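As a rough sketch of the idea (again, not Wordsworth's actual implementation), discounting amounts to dropping the listed words from the passage before checking what remains against the corpus vocabulary. The function name `unfamiliar_words`, the simple tokenizer, and the tiny vocabulary below are assumptions for illustration.

```python
import re


def unfamiliar_words(passage, corpus_vocabulary, discount_words=()):
    """Return words in `passage` that are absent from `corpus_vocabulary`.

    Words listed in `discount_words` are removed before the comparison,
    so they are never reported. Each flagged word appears once, in the
    order it first occurs in the passage.
    """
    ignored = {word.lower() for word in discount_words}

    # Simple tokenizer: lowercase letters, allowing internal ' or -.
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", passage.lower())

    # Step 1: discard the discounted words before analysis.
    tokens = [token for token in tokens if token not in ignored]

    # Step 2: flag whatever is left that the corpus does not contain.
    flagged = []
    for token in tokens:
        if token not in corpus_vocabulary and token not in flagged:
            flagged.append(token)
    return flagged


# Toy vocabulary, invented for illustration only.
toy_vocabulary = {"mr", "worked", "a", "over", "the", "parlour"}
passage = "Mr. Vincent worked a glamour over the parlour."
```

Calling `unfamiliar_words(passage, toy_vocabulary)` flags "vincent" and "glamour", while passing `discount_words=["Vincent", "glamour"]` returns an empty list.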

What are the advantages of registering for an account?

Most of Wordsworth's features are free to use without registering, but if you are frequently using the Discount Words feature, you may wish to create an account. With a free user account, you can maintain a list of words to ignore that Wordsworth will automatically leave out of search results. You will no longer need to input these Discount Words manually every time that you want to analyze a passage of your text.

About Wordsworth's Corpus

What books are in the comparison corpus?

You can view a list of all books in the corpus here.

Where have you sourced the texts for the comparison corpus?

All texts come from Project Gutenberg.

What qualifies a book for your comparison corpus?

All of the books in our corpus are works of fiction written in English between the years of 1800 and 1923. Additionally, the corpus focuses on stories that had a realistic, contemporary setting at the time they were published -- although we've waived that rule in order to include some influential works of genre fiction.

If Wordsworth's corpus focuses on realistic fiction, why does it include horror and mystery stories like Dracula and The Adventures of Sherlock Holmes?

Although Dracula is a fantasy/horror novel and the Sherlock Holmes stories are mystery fiction, they are recognizably set in late-Victorian England and offer much valuable information about language use in that era. They have also proved extremely influential in modern pop culture. Some of our users may be writing mystery or fantasy fiction themselves, and it therefore makes sense for our corpus to include some of the foundational works of those genres.

The bestselling American novel of the 1800s was Ben-Hur. Why isn't it in your corpus?

Ben-Hur is itself a work of historical fiction, taking place in Biblical times. Its language would therefore likely skew our corpus results in a way that wouldn't help someone using Wordsworth to write a story set in the 1800s. This is also why we have avoided adding other popular historical novels like Walter Scott's Ivanhoe (written 1820, set in the 1100s), Edith Wharton's The Age of Innocence (written 1920, set in the 1870s), or Charles Dickens' A Tale of Two Cities (written 1859, set in the late 1700s).

I am writing a World War II novel. Why doesn't your corpus go up to the 1940s?

As of 2019, works published through 1923 are in the public domain in the United States. Therefore, to avoid potential legal liability, we have avoided analyzing and storing word-frequency data from novels published in 1924 and beyond. As more 1920s novels enter the public domain in the years to come, we will try to add some of them to our corpus. If you feel that the current 95-year U.S. copyright term is ridiculously long, lobby your Congressional representatives.

I am writing a novel about the Tudors. Why doesn't your corpus go further back than 1800?

We chose 1800 as our start date because that's when the Google Ngram data starts. (Google may have made this choice because English spelling was much less standardized prior to the publication of Samuel Johnson's dictionary in the UK in 1755 and Noah Webster's dictionary in the USA in 1806.) Additionally, realistic contemporary prose fiction—the mainstay of Wordsworth's corpus—wasn't really a thing in the 1500s and 1600s. However, since several users have requested a corpus for earlier centuries and there wouldn't be copyright issues with that, we will continue to think about whether and how to implement this.

I've thought of another text that isn't in your corpus, but it should be. How can I add it?

Email us at marissa.wordsworth @ gmail.com and suggest it! If it fits the corpus parameters, we'll look into adding it in a subsequent release. However, we do not plan to develop a feature to allow users to update the corpus themselves.

About Wordsworth in General

How did you come up with the idea for this app?

Wordsworth is inspired by the work of Ben Schmidt, a digital-humanities professor who, in the early 2010s, wrote about trying to identify linguistic anachronisms in historical fiction screenplays by comparing them to texts written in the period in which they were set. Schmidt's program and corpus are not publicly available, so we decided to try to create an approximation of them.

Why is your app called Wordsworth? It's not geared toward poets.

We can't resist a literary pun and we told ourselves that no one else would overthink this.

I have a question that isn't answered on this page / I think I've discovered a bug on Wordsworth. How can I contact you?

Email the app's creator at marissa.wordsworth @ gmail.com, or submit an Issue on Wordsworth's GitHub page.