Perplexity and burstiness are two of the most frequently cited measures in discussions of writing, whether human or AI. But what exactly are they? This article answers that question with detailed examples and an explanation of how the two measures interact.
You’ll also discover how researchers use perplexity and burstiness in AI detection. Read on to find out!
What is Perplexity?
Perplexity measures how predictable the next word in a sequence is to a language model. The lower the perplexity, the more easily the next word can be guessed. It also gives a rough indication of how fluent and easy to follow the writing is. So, how is it related to AI-generated content?
AI language models like GPT-4 are designed to mimic human writing closely, and they tend to produce text with low perplexity, which can also make that text predictable and boring. In contrast, human writing tends to have higher perplexity due to more creative language choices.
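To make the idea concrete, here is a minimal Python sketch of how perplexity can be computed once a model has assigned a probability to each word in a sentence. The probability lists are made-up illustrations, not output from GPT-4 or any real model, and the function name `perplexity` is just a label chosen for this example.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical probabilities a model might assign to each next word.
predictable = [0.9, 0.8, 0.95, 0.85]   # e.g. a common, expected phrase
surprising = [0.05, 0.1, 0.02, 0.08]   # e.g. an odd, unexpected word sequence

print(perplexity(predictable) < perplexity(surprising))  # True
```

A handy way to read the number: a model that gives every word a probability of 0.5 has a perplexity of 2, as if it were choosing between two equally likely words at each step.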
Signs of High Perplexity:
- Unexpected Word Sequence: The words are arranged oddly and unexpectedly.
- Lack of Coherence: The sentence doesn’t make sense, making it difficult for the model to predict the next word.
- Low Predictability: The model has trouble assigning a high probability to any one word as the next in the sequence because of the sentence’s nonsensical character.
Signs of Low Perplexity:
- Common Phrases: The sentence uses a common and well-understood phrase.
- Logical Structure: The words follow a logical and predictable sequence.
- High Predictability: The model finds it simple to predict the next word since each word is highly probable given the words that came before it.
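The contrast between the two lists above can be sketched with a toy bigram model, which predicts each word only from the word before it. Everything here is an assumption made for illustration: the tiny training text, the add-one smoothing, and the helper names `bigram_prob` and `perplexity`. Real language models are far larger, but the principle is the same.

```python
import math
from collections import Counter

# Toy training text; a real language model is trained on vastly more data.
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat saw the dog .").split()

vocab = set(corpus)
contexts = Counter(corpus[:-1])             # how often each word starts a bigram
bigrams = Counter(zip(corpus, corpus[1:]))  # how often each word pair occurs

def bigram_prob(prev, word):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

def perplexity(sentence):
    """Perplexity of a sentence under the toy bigram model."""
    words = sentence.split()
    pairs = list(zip(words, words[1:]))
    log_sum = sum(math.log(bigram_prob(p, w)) for p, w in pairs)
    return math.exp(-log_sum / len(pairs))

common = "the cat sat on the mat"      # familiar, logical word order
scrambled = "mat the on sat cat the"   # same words, unexpected order

print(perplexity(common) < perplexity(scrambled))  # True
```

The scrambled sentence uses exactly the same words, yet the model finds every step harder to predict, so its perplexity is higher.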
Though it is a helpful indicator of text fluency and predictability, perplexity should not be used alone to identify whether material is produced by AI or by humans.
What is Burstiness?
Burstiness describes how unevenly a text is put together: how strongly words or phrases appear in clusters rather than being spread out evenly, and how much sentence length and structure vary from one sentence to the next. For example, an article about a birthday party is bursty at the word level if it mentions “cake” many times in the beginning but hardly talks about it in the latter parts.
AI-generated content tends to have a low burstiness score. Meanwhile, human-written text usually has a higher burstiness score. If you find a piece of writing overly repetitive and dull, check its burstiness score; a low score may suggest it is AI-generated.
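One way to put a number on word clustering like the “cake” example is to look at the gaps between successive occurrences of a word. The sketch below uses a burstiness coefficient borrowed from the study of bursty event sequences, B = (sigma - mu) / (sigma + mu), where mu and sigma are the mean and standard deviation of those gaps. This is only one possible definition, not necessarily what any AI detector computes, and `word_burstiness` is a hypothetical helper name.

```python
import statistics

def word_burstiness(tokens, word):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu) of the gaps
    between successive occurrences of `word`.

    B is -1 when occurrences are perfectly evenly spaced and rises
    toward 1 as occurrences bunch together into clusters.
    """
    positions = [i for i, t in enumerate(tokens) if t == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three occurrences of the word")
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)
    return (sigma - mu) / (sigma + mu)

# "cake" appears once every ten words: evenly spread.
even = (["cake"] + ["x"] * 9) * 5
# "cake" appears four times in a row, then once much later: clustered.
clustered = ["cake"] * 4 + ["x"] * 40 + ["cake"]

print(word_burstiness(even, "cake"))           # -1.0
print(word_burstiness(clustered, "cake") > 0)  # True
```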
Signs of High Burstiness:
- Variation in Sentence Length: The text has long and short sentences.
- Mix of Structures: There are simple and more complicated sentence structures.
- Dynamic Flow: The variation creates a more engaging and dynamic flow, which keeps the reader’s interest.
Signs of Low Burstiness:
- Uniform Sentence Length: Every sentence is short and of about the same length.
- Consistent Structure: The sentences follow a similar pattern and structure.
- Repetitiveness: With little variety, the paragraph becomes boring and less interesting to read.
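Sentence-level variation like that described in the two lists above can be approximated with a very simple statistic: the standard deviation of sentence lengths divided by their mean. This is a rough sketch under that assumption, not the formula used by any particular detector; the function name and the sample texts are invented for this example.

```python
import re
import statistics

def sentence_length_burstiness(text):
    """Coefficient of variation (std / mean) of sentence lengths in words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        raise ValueError("need at least two sentences")
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform = "The dog ran home. The cat sat down. The bird flew off."
varied = ("It rained. We stayed inside all afternoon, reading old letters "
          "and drinking far too much tea. Nobody complained.")

print(sentence_length_burstiness(uniform))  # 0.0
print(sentence_length_burstiness(varied) > sentence_length_burstiness(uniform))  # True
```

A score of 0 means every sentence has the same number of words; the more the lengths differ, the higher the score.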
Some authors use burstiness deliberately as part of their writing style, since repeating words can emphasize particular key points or create dynamic shifts in the story’s rhythm. Like perplexity, though, it shouldn’t be relied upon alone to identify whether content is generated by AI or by humans. Reliable detection requires a fuller analysis that combines many metrics and factors.
Examples of Burstiness and Perplexity
Perplexity
In the table below, TechDictionary has prepared two examples, one with high perplexity and one with low perplexity, along with the reasons for each score:
| | High Perplexity | Low Perplexity |
| Examples | The Eiffel Tower, known for its role in the American Revolution, stands tall in Berlin. | The Eiffel Tower, an iconic symbol of Paris, is one of the most recognized structures in the world. |
| Reasons for the score | Incorrect facts; geographically disparate elements => Possibly an AI mistake that combines incoherent information | Based on common knowledge => Possibly AI imitating human writing |
Burstiness
Here are two standard examples of high and low burstiness and the explanation behind the judgment:
| | Low Burstiness | High Burstiness |
| Examples | Cats are popular pets. Cats are often kept in homes. Many people love cats because cats are affectionate. Cats, with their playful nature, make homes lively. It’s no wonder cats are loved. | Cats are popular pets known for their playful nature. They are often loved for being affectionate, and their distinct personalities make each one unique. |
| Reasons for the score | The word “cats” repeats multiple times within a short paragraph, and the sentences follow a similar pattern => Possibly AI-generated text | More natural flow; less word repetition => Reads more like human writing, or possibly AI imitating human writing |
How Do Perplexity And Burstiness Interact in Writing?
Perplexity and burstiness may seem like two separate matters, but they actually affect each other. Together, they keep the reader interested and shape our writing in a unique way. Low perplexity means our writing is clear and makes sense, which makes us more trustworthy to you as a reader. We aim for lower perplexity to ensure our words are easy to follow and don’t become too complicated or unpredictable.
However, we also add a burst of flavor by including surprising twists and turns that turn reading into an adventure. This gives life to AI-generated text, or any other creative work, and draws attention without causing misunderstanding.
Balancing the two factors is important not only for keeping readers engaged but also for teaching AI models to generate text that sounds more human.
To keep the message clear and the material engaging, writers, whether human or artificial, must master the connection between these two elements.
Perplexity And Burstiness in AI Detection
So, how are perplexity and burstiness relevant to AI detection? In fact, many AI detectors, such as Originality AI or GPTZero, rely on these two signals. When an AI system is taught to comprehend and produce language, it learns from a large amount of text and tries to predict the next word based on the words it has already seen. Fundamentally, it comes down to finding patterns.
AI models often produce the most probable word sequence, resulting in low perplexity. Meanwhile, human writing tends to have higher perplexity, since the word choices are more diverse and the text may contain typos or unusual phrasing.
The same applies to burstiness: AI writing tends to reuse the same words, phrases, and sentence patterns, resulting in low burstiness, the opposite of human writing.
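As a rough illustration of how the two signals might be combined, the sketch below flags a text only when both scores point the same way. The function name, the labels, and especially the cutoff values are hypothetical placeholders chosen for this example; they are not taken from Originality AI, GPTZero, or any other real detector.

```python
def flag_text(perplexity, burstiness,
              perplexity_cutoff=20.0, burstiness_cutoff=0.3):
    """Return a rough label from two scores.

    The cutoffs are made-up placeholders, not values used by any real detector.
    """
    low_perplexity = perplexity < perplexity_cutoff
    low_burstiness = burstiness < burstiness_cutoff
    if low_perplexity and low_burstiness:
        return "possibly AI-generated"
    if not low_perplexity and not low_burstiness:
        return "possibly human-written"
    # The two signals disagree, so neither one should decide alone.
    return "inconclusive"

print(flag_text(perplexity=12.0, burstiness=0.1))  # possibly AI-generated
print(flag_text(perplexity=12.0, burstiness=0.8))  # inconclusive
```

Returning "inconclusive" when the signals disagree reflects the point made earlier in the article: neither score should be relied upon alone.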
That’s why researchers also use these two criteria when training AI models. Perplexity helps them compare different AI models: the model with the lowest perplexity on the same text is the one that predicts that text best.
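A comparison like this can be sketched in a few lines of Python. The model names and the per-word probabilities below are invented for illustration; in practice, each model would score the same held-out text, and the probabilities would come from the models themselves.

```python
import math

def perplexity(token_probs):
    """Exp of the average negative log-probability per token."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

# Hypothetical probabilities two models assign to the same held-out sentence.
held_out_scores = {
    "model_a": [0.20, 0.10, 0.30, 0.25],
    "model_b": [0.40, 0.35, 0.50, 0.30],
}

results = {name: perplexity(probs) for name, probs in held_out_scores.items()}
best = min(results, key=results.get)  # lowest perplexity wins

print(best)  # model_b
```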
As for burstiness, researchers use this score to make AI writing sound as natural as possible. They feed the model texts from different genres so that it avoids repeating particular words or phrases too much.
Conclusion
Understanding the nature of perplexity and burstiness will deepen your knowledge of AI writing and AI detection. From there, you can use these two measurements to improve your AI models or AI detection tools.
If you enjoyed this article, subscribe to TechDictionary for more, and leave any questions about perplexity and burstiness in the comments.