This app takes large amounts of text and checks whether you're overusing particular words. It started as most things do these days: you run across an errant post on social media that RUINS YOUR DAY. For me, that was author Selene Kallan's Facebook repost of her Retweet. My own manuscript of nearly 90,000 words had 47 instances of "sigh" in it. Was that too many? Too few? Would other writers want to know?
So I solved this problem like I do most problems in my life: I scraped more data from the web than anyone else ever has on such a snipe hunt. Specifically, I used the Brown Corpus of American English, which has about a million words from various sources, all bundled up nicely for analysis. Very helpfully, its data can be broken down by lemma, which is basically the root form of a word, so plurals and tense changes don't get binned separately (e.g., sigh, sighs, and sighed all get counted as "sigh").
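To make the lemma idea concrete, here is a minimal sketch of how inflected forms could be folded into one bin before counting. The `LEMMAS` table and the `lemma_counts` function are hypothetical names invented for this illustration; the tool itself may compile its lemmas quite differently.

```python
import re
from collections import Counter

# Hypothetical, hand-written lemma table for illustration only.
# A real tool would rely on a full lemmatizer or the corpus's own annotations.
LEMMAS = {"sighs": "sigh", "sighed": "sigh", "sighing": "sigh"}


def lemma_counts(text):
    """Count words in `text`, folding known inflected forms into their lemma."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Words missing from the table are counted under their own spelling.
    return Counter(LEMMAS.get(word, word) for word in words)


counts = lemma_counts("She sighed. He sighs, and then they both sigh.")
print(counts["sigh"])
```

All three forms in the sample sentence land in the single "sigh" bin, which is the behavior described above.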
One note: I seemed to have a problem when I compiled the lemmas. The compilation seemed to miss the verb "be", which is about 7% of American English, which I find hilarious. I could not replicate the error, which means there may be more silliness in this tool. If you see something like that, please contact me.
Epilogue: My 47 instances of "sigh" represented 0.054% of my manuscript, as opposed to the Brown Corpus, where it is 0.051%. It turns out my heroine does not sigh too often, after all.
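For the curious, the arithmetic behind that comparison can be sketched as follows. The exact word count of 87,000 is an assumption (the manuscript is only described as nearly 90,000 words), `compare_frequency` is a hypothetical helper, and the definitions of absolute and relative difference are my guess at what those terms would most naturally mean, not a confirmed description of the tool.

```python
def compare_frequency(count, total_words, corpus_pct):
    """Compare a word's share of your text with its share of a reference corpus.

    Returns (your_pct, absolute_diff, relative_diff), where absolute_diff is in
    percentage points and relative_diff is a fraction of the corpus frequency.
    """
    your_pct = 100.0 * count / total_words
    absolute_diff = your_pct - corpus_pct
    relative_diff = absolute_diff / corpus_pct
    return your_pct, absolute_diff, relative_diff


# 47 uses of "sigh"; 87,000 is an assumed total, chosen only for illustration.
your_pct, absolute_diff, relative_diff = compare_frequency(47, 87_000, 0.051)
print(f"{your_pct:.3f}% vs 0.051%: {absolute_diff:+.3f} points, {relative_diff:+.1%} relative")
```

Under those assumptions, the manuscript sits a few thousandths of a percentage point above the corpus figure, which is a small gap in relative terms too.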
Lemma | Count | Your Frequency | Corpus Frequency | Absolute Difference | Relative Difference |
---|---|---|---|---|---|