Nearly every major news website relies on A/B testing for its headlines. Editors often struggle to come up with alternative headlines for their articles. So how can AI assist in headline testing tools?

What follows is the result of a collaboration between an editorial analytics company (smartocto) and a Dutch regional broadcaster (Omroep Brabant) to see whether ChatGPT can generate compelling alternative headlines, and whether this can be integrated into the workflow of a busy news site.

Results sneak peek:

  • ChatGPT often comes up with winning headlines
  • There were practical challenges in using this approach
  • Ultimately the final decision needs to rest with the editor

The study: compare the original with the winner

First, a baseline. To establish that, smartocto analysed nearly 9,000 articles from 57 news sites around the world published during June 2023 which had undergone conventional, editor-led A/B headline testing.

The best metric for measuring impact is click-through rate (CTR): the percentage of website visitors who clicked on the headline.

(The original headline can also be the winning headline, because we assume the original would have run if there hadn’t been a test. When calculating ‘winners’, the percentage of ‘loyalty clicks’ was taken into account: the percentage of visitors who stayed on the page for 15 seconds or more, which filters out clickbait.)
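To make the winner logic concrete, here is a minimal sketch of how CTR and the loyalty-click filter could work together. The 15-second loyalty threshold follows the text; the field names, the 50% loyalty cutoff, and the fallback behaviour are assumptions for illustration, not smartocto’s actual implementation. The sample numbers reuse the CTRs from the Breda example later in this article (11.35% vs 8.46%).

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: share of visitors who clicked the headline."""
    return clicks / impressions if impressions else 0.0

def loyalty_rate(loyal_visits: int, clicks: int) -> float:
    """Share of clickers who stayed on the page for 15 seconds or more."""
    return loyal_visits / clicks if clicks else 0.0

def pick_winner(variants, min_loyalty=0.5):
    """Pick the highest-CTR variant among those whose loyalty rate
    clears the threshold (filtering out likely clickbait).
    The 0.5 threshold is an assumed value for illustration."""
    eligible = [v for v in variants
                if loyalty_rate(v["loyal"], v["clicks"]) >= min_loyalty]
    candidates = eligible or variants  # fall back if nothing clears the bar
    return max(candidates, key=lambda v: ctr(v["clicks"], v["impressions"]))

variants = [
    {"name": "original",    "impressions": 10000, "clicks": 846,  "loyal": 700},
    {"name": "alternative", "impressions": 10000, "clicks": 1135, "loyal": 950},
]
winner = pick_winner(variants)
uplift = ctr(winner["clicks"], winner["impressions"]) / ctr(846, 10000) - 1
```

With these sample numbers the alternative wins, and the uplift works out to roughly +34%, illustrating how a single test can beat the ~19% average reported below.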


[Graph: CTR uplift from headline testing]

The graph shows that, on average, the winning headline received 19.3% more clicks than the original headline.

How did ChatGPT perform?

So, back to the first results of the ChatGPT test with Omroep Brabant. Of 40 headlines tested at that broadcaster, ChatGPT came up with the winner 23 times; the original headline won 17 times.

But it’s when we look at uplift that the result becomes interesting: while ChatGPT is worth using, it’s still better to come up with alternative headlines yourself. Winning headlines written by human editors delivered a +19.3% uplift, against +9.2% for ChatGPT’s winners.

There’s potential for further study on this, but there are three key takeaways so far:

ChatGPT offers suggestions, not solutions

Ultimately, ChatGPT is a suggestion machine, not a solution machine, and this gets to the core of what it (or any other AI tool) does. That's why we had ChatGPT generate three alternative headlines, from which the editor would choose one. In many cases these headlines needed further adjustment, for the following reasons:

  • Suggestions were factually incorrect.
  • Headlines provided were too clickbait-oriented.
  • They contained too many emotional words like ‘heartbreaking’ or ‘apocalyptic’.
  • Headlines didn't align with the Omroep Brabant brand and tone of voice.
  • ChatGPT wouldn’t give suggestions for sensitive content such as ‘rape’ or ‘murder’. OpenAI, the company behind ChatGPT, considers that too risky.

The editor is in charge

To be clear, the goal here wasn't to determine whether ChatGPT can outperform journalists in headline creation, but rather to see whether ChatGPT can be helpful in making suggestions. One good example:

Original headline: ‘Flames shoot out from the roof in a major house fire in Breda’
None of ChatGPT's alternatives was usable as-is, but one suggested adding that two houses had been evacuated. The editor combined the two: ‘Flames shoot out from the roof in a major house fire, two houses evacuated’. That headline won the test with a CTR of 11.35% against 8.46%.

Use ChatGPT as a sparring partner, much like you would with a colleague. "We were quite critical of the suggestions that ChatGPT came up with, but often we still saw a new perspective that we hadn't thought of ourselves," says Omroep Brabant editor Janneke Bosch.

Editors must pay attention to the ChatGPT prompt input

As an editor, you can also modify the prompt, which is part of your editorial responsibility and journalistic ethics: if you want shorter headlines as suggestions, specify this. It’s easy to get this wrong. For the first 20 articles tested, the prompt was very simple: "Create three alternative headlines for this article: [article URL]."

The problem was that ChatGPT 3.5 (the free version used during this study) cannot browse the internet, so its suggestions were based solely on the text of the URL itself rather than the article. Surprisingly, the output was still reasonably good. After that, smartocto’s prompt engineer stepped in and, together with Omroep Brabant, developed the more detailed prompt that is in use now.
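The exact prompt smartocto and Omroep Brabant settled on is not reproduced here, but by way of illustration only, a prompt encoding the editorial constraints mentioned in this article (length, factual accuracy, no clickbait, brand tone) might be built like this. All wording and parameters below are assumptions, and because the free ChatGPT 3.5 cannot browse, the article text is pasted into the prompt instead of a URL:

```python
# Illustrative sketch only: not the actual smartocto/Omroep Brabant prompt.
# Every instruction below is an assumed example of the kinds of constraints
# the article describes (length, no clickbait, brand tone of voice).

def build_headline_prompt(article_text: str, max_length: int = 70) -> str:
    """Assemble a prompt asking for three alternative headlines.
    The article text is inlined because GPT-3.5 cannot fetch URLs."""
    return (
        "You are a headline editor for a Dutch regional news site.\n"
        f"Write three alternative headlines of at most {max_length} characters.\n"
        "Stay factual, avoid clickbait and heavily emotional words,\n"
        "and match a sober, regional tone of voice.\n\n"
        f"Article:\n{article_text}"
    )

prompt = build_headline_prompt(
    "Flames shoot out from the roof in a major house fire in Breda; "
    "two houses were evacuated."
)

# Sending it with the openai Python client (v1-style chat API) would
# then look roughly like this:
# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# response = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": prompt}],
# )
```

Keeping the constraints in the prompt, rather than fixing headlines afterwards, is the point the article makes: the editor’s requirements are part of the question asked.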

In conclusion…

Editors are generally great at crafting alternative headlines, but if the editorial team lacks the time or energy for brainstorming alternatives, ChatGPT can be a valuable collaborator with just a few clicks.

That said, it’s far from perfect. The editors at Omroep Brabant noticed that ChatGPT often provided unusable suggestions when they wanted to express something specific in the headline, for example to emphasise that someone had been interviewed or that the article was a follow-up.

As with so much in the world of big data, the answers given are only ever as good as the questions asked. Whether or not editors feel that modifying prompts is more work than just generating alternatives through AI-free brainstorming is perhaps a personal preference - but there are A/B testing tools that can automate this task...

This story was first published on INMA.org