Evaluating AI with Standardized Prompts
The reviewer uses a consistent series of prompts for each AI model, including the new Claude 3.7 Sonnet. Standardized prompts matter because they establish an accurate baseline for output: with the parameters held constant, the models can be compared directly. The prompts rely on basic rather than heavy prompt engineering, and as the reviewer notes, heavy prompt engineering becomes less important as models improve.
Eventually, writers should be able to simply communicate clearly and the models will write well, without hacks or prompt engineering. Keeping the prompts plain tests how well a model responds with minimal manipulation.
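To make the distinction concrete, here is a hypothetical illustration; the prompt text below is invented for this example and is not taken from the reviewer's actual assessment. A basic prompt simply states the task, while a heavily engineered prompt layers on role-play framing, scaffolding, and stylistic constraints.

```python
# Hypothetical prompts for illustration only; not the reviewer's actual prompts.

# A "basic" prompt: the task stated plainly, with no special tricks.
BASIC_PROMPT = (
    "Write a 300-word short story about a lighthouse keeper "
    "who finds a message in a bottle."
)

# A "heavily engineered" prompt: role-play, step-by-step scaffolding,
# and stylistic constraints layered on top of the same underlying task.
ENGINEERED_PROMPT = """You are an award-winning literary novelist.
First, silently outline the story's arc in three beats.
Then write a 300-word short story about a lighthouse keeper who finds a message in a bottle.
Use vivid sensory detail, vary your sentence length, avoid cliches, and end on an ambiguous note."""
```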
A Look at the Document Used for Measurement
The information collected is based on the reviewer's own analysis, which is, to his knowledge, the only qualitative assessment of AI models for creative writing. Here's what the assessment document looks like:
The reviewer runs a series of prompts, giving each model the exact same prompts to serve as a control. The prompts are kept relatively simple so that baseline performance can be assessed.
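As a rough sketch of how such a controlled comparison might be run programmatically, the snippet below holds the prompts fixed and varies only the model. The prompt list, model identifiers, and `query_model` helper are all placeholders; the reviewer's assessment is qualitative and not necessarily automated this way.

```python
# Minimal sketch of a standardized-prompt comparison.
# The prompts, model names, and query_model() helper are placeholders,
# not the reviewer's actual materials or tooling.

PROMPTS = [
    "Write a 300-word short story about a lighthouse keeper who finds a message in a bottle.",
    "Write a poem about leaving home, in plain language, without rhyme.",
]

MODELS = ["claude-3-7-sonnet", "model-b", "model-c"]  # placeholder identifiers


def query_model(model: str, prompt: str) -> str:
    """Placeholder for a call to the model's API; replace with a real SDK call."""
    return f"[{model} completion for: {prompt[:40]}...]"


def run_comparison() -> dict:
    # Same prompts, in the same order, for every model: the prompts act as the
    # control, so differences in output reflect the models rather than the inputs.
    results = {}
    for model in MODELS:
        results[model] = [query_model(model, prompt) for prompt in PROMPTS]
    return results
```

Holding the prompts constant in this way is what lets differences in the writing be attributed to the models themselves rather than to the inputs.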