The idea for the Describing Words engine came when I was building the engine for Related Words (it’s like a thesaurus, but gives you a much broader set of related words, rather than just synonyms). While playing around with word vectors and the “HasProperty” API of ConceptNet, I had a bit of fun trying to get the adjectives which commonly describe a word. Eventually I realised that there’s a much better way of doing this: parse books!
Project Gutenberg was the initial corpus, but the parser got greedier and greedier and I ended up feeding it somewhere around 100 gigabytes of text files – mostly fiction, including many contemporary works. The parser simply looks through each book and pulls out the various descriptions of nouns.
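To give a rough feel for what "pulling out descriptions of nouns" can look like, here is a minimal sketch. It is not the actual parser: it assumes a description is simply an adjective sitting directly before a noun, and the tiny `ADJECTIVES` and `NOUNS` word lists and the `extract_descriptions` function are made up for illustration. A real parser would need proper part-of-speech tagging.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, tiny word lists for illustration only; a real parser
# would rely on a part-of-speech tagger or a far larger lexicon.
ADJECTIVES = {"old", "dark", "beautiful", "tall", "quiet"}
NOUNS = {"house", "woman", "man", "forest"}


def extract_descriptions(text):
    """Count (noun, adjective) pairs where the adjective directly precedes the noun."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = defaultdict(Counter)
    # Walk over consecutive word pairs: (previous word, current word).
    for prev, word in zip(words, words[1:]):
        if prev in ADJECTIVES and word in NOUNS:
            counts[word][prev] += 1
    return counts


sample = "The old house stood near a dark forest. A tall man left the old house."
descriptions = extract_descriptions(sample)
print(dict(descriptions["house"]))
```

Running the same function over every book and adding the counters together would give one big table of adjective counts per noun.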
Hopefully it’s more than just a novelty and some people will actually find it useful for their writing and brainstorming. One neat little thing to try is to compare two nouns which are similar but differ in some significant way. Gender is an interesting example: “woman” versus “man”, or “girl” versus “boy”. On an initial quick analysis it seems that authors of fiction are at least 4x more likely to describe women (as opposed to men) with beauty-related terms (regarding their weight, features and general attractiveness). In fact, “beautiful” is possibly the most widely used adjective for women in all of the world’s literature, which is quite in line with the general unidimensional representation of women in many other media forms. If anyone wants to do further research into this, let me know and I can give you a lot more data (for example, there are about 25000 different entries for “woman”, which is too many to show here).
The blueness of the results represents their relative frequency. You can hover over an item for a second and the frequency score should pop up. The “uniqueness” sorting is the default, and thanks to my Complicated Algorithm™, it orders the adjectives by their uniqueness to that particular noun relative to other nouns (it’s actually pretty simple). As you’d expect, you can click the “Sort By Usage Frequency” button to sort adjectives by their usage frequency for that noun.
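The exact scoring isn't spelled out here, so the following is only one plausible way to do it, not the site's actual algorithm. It assumes "uniqueness" means the share of an adjective's total uses that belong to the noun in question. The `rank_adjectives` function and the sample counts are invented for the example.

```python
from collections import Counter


def rank_adjectives(counts, noun, by="uniqueness"):
    """Rank the adjectives of `noun`.

    counts: dict mapping each noun to a Counter of adjective frequencies.
    by: "uniqueness" (assumed scoring, see above) or "frequency".
    """
    noun_counts = counts[noun]
    if by == "frequency":
        # Most frequent first; ties broken alphabetically.
        return sorted(noun_counts, key=lambda adj: (-noun_counts[adj], adj))

    # Total uses of each adjective across all nouns.
    totals = Counter()
    for adj_counts in counts.values():
        totals.update(adj_counts)

    def uniqueness(adj):
        # Fraction of this adjective's uses that describe `noun`.
        return noun_counts[adj] / totals[adj]

    return sorted(noun_counts, key=lambda adj: (-uniqueness(adj), -noun_counts[adj], adj))


# Made-up counts for illustration only.
counts = {
    "ocean": Counter({"big": 10, "briny": 3}),
    "house": Counter({"big": 30, "old": 5}),
}
print(rank_adjectives(counts, "ocean"))                  # ['briny', 'big']
print(rank_adjectives(counts, "ocean", by="frequency"))  # ['big', 'briny']
```

With this kind of scoring, a common word like "big" drops down the list because it describes lots of nouns, while a rarer word like "briny" rises because it mostly shows up next to "ocean".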
Special thanks to the contributors of the open-source MongoDB, which was used in this project.