Word prevalence indicates word difficulty

Bookmarked Word prevalence norms for 62,000 English lemmas

Word prevalence refers to the number of people who know the word. The measure was obtained on the basis of an online crowdsourcing study involving over 220,000 people.

Word prevalence is also likely to be of interest to natural language processing researchers writing algorithms to gauge the difficulty of texts. At present, word frequency is used as a proxy of word difficulty (e.g., Benjamin, 2012; De Clercq & Hoste, 2016; Hancke, Vajjala, & Meurers, 2012). Word prevalence is likely to be a better measure, given that it does not completely reduce to differences in word frequency.
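As a rough sketch of how that might look in practice, a text could be scored by the average prevalence of its words. Everything in this snippet is my own assumption for illustration: the `text_difficulty` function, the 0-to-1 prevalence scale, the fallback value for unknown words, and the numbers in the toy table, which are invented placeholders rather than values from the published norms.

```python
# Toy prevalence table: assumed share of people who know each word (0 to 1).
# These numbers are invented placeholders, NOT values from the published norms.
TOY_PREVALENCE = {
    "the": 1.00,
    "cat": 0.99,
    "sat": 0.99,
    "on": 1.00,
    "mat": 0.97,
    "chenille": 0.60,
}


def text_difficulty(text, prevalence, unknown=0.5):
    """Return 1 minus the mean prevalence of the words in `text`.

    Words missing from the table get the `unknown` fallback prevalence
    (a hypothetical default, chosen only for this sketch).
    """
    # Naive tokenization: split on whitespace, trim punctuation, lowercase.
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    mean_known = sum(prevalence.get(w, unknown) for w in words) / len(words)
    return 1.0 - mean_known


easy = text_difficulty("The cat sat on the mat.", TOY_PREVALENCE)
hard = text_difficulty("The cat sat on the chenille.", TOY_PREVALENCE)
print(easy < hard)
```

Swapping one common word for a less widely known one raises the score, which is the basic idea; a real difficulty measure would presumably combine prevalence with other features such as frequency and sentence length.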

I dropped into this article from the table of words men are more likely to know than women and vice versa. The words are mostly specialty vocabulary – for women, textiles; for men, military and mechanical terms. I hadn’t thought much before about the gender skew of vocabulary, or what it implies to know certain specialty words over others. I’m not sure how to feel about knowing a lot more of the “women ones” despite being a science major, reading lots of military sci-fi, and not being super into clothes 😉 I’d be more interested to see gender differences for slightly less specialized words.

By Tracy Durnell

Writer and designer in the Seattle area. Freelance sustainability consultant. She/her.
