Newsrooms need actionable data

At smartocto we believe that data is really only useful if it’s actionable. Anything else is noise. And newsrooms are noisy enough.

So we’ve been excited to work on the News Needs Notifications project (Triple N) with Dmitry Shishkin over the past few months.

User Needs proved to be a useful content commissioning framework at the BBC World Service, so we wanted to see if we could take that framework and create actionable notifications which would allow editors to monitor the efficacy of their content as it goes through the Story Life Cycle.

Starting in September, while Dmitry, Erik and Rutger were working with newsrooms on the end-user side of the project, the data team - headed up by our CDO Ilija Susa - hunkered down and started creating the actual tool.

· · · · · · · · · · · · · · · · · · · · · · · · · · ·

The brief we gave our data team:

Create a system of notifications to enable newsrooms to see when a story is ready for a follow up, and identify the optimal way to deliver that story via the user needs approach. We also need insights at a business level, to be able to provide practical advice on topics, authors and story performance.

· · · · · · · · · · · · · · · · · · · · · · · · · · ·

things have to be trusted to be used

The challenges:

As with all new endeavours, there’s always a niggling thought: if it’s such a good idea, why is no one else doing it?

Here’s Ilija on some of the concerns they were keeping an eye on right from the start:

  1. “The first question - and our main concern - right from the beginning was accuracy. We know that editors are wary of integrating data into their workflows, and if we’re asking them to start using a tool which creates actionable notifications, the algorithm needs to be bombproof for them to trust it. So, we spent time working out how to get a formula to create a workable data set - one that would score higher than the 80% accuracy typically found in comparable work.”
  2. “Secondly, it was clear that flagging content was going to present a challenge. We needed to automatically sort content into the six user needs categories, and it turned out this hadn’t really been tried before. It’s commonplace to sort content by theme, topic or writing style (stylometry), but not by user need. An academic paper, published in Brazil, had tried to classify by format, and I found a couple of places which did it by writing style, but the accuracy was low - about 65-70%. We learned very quickly that, because there are no concrete distinctions between these categories, results will likely differ from person to person - it’s fundamentally subjective.”
  3. “Then of course, thirdly, there’s the issue of what you do with those data sets, and how you create a system of notifications when the flagging algorithm isn’t 100% accurate. It’s another layer of analysis and parameterisation on top of the classification algorithm. We knew from the start that we'd be working with different statistical, lexical, semantic AND stylometry features - that’s a bit outside the norm.”

Those are concerns from a data point of view, but we know there’s a broader issue hinted at in the first two. It’s simply that unless newsrooms can trust it to work, it’s nothing but a lovely theoretical exercise. And no one needs another of those.

Things have to be trusted to be used. So we asked the data team exactly what they’ve been up to, in order to shed a little light on the process of creating those notifications.

The algorithm and notification-building process

Read the interview here for all the technical details, or skip ahead to the summary.

OK, so Ilija, you’re going to have to talk us through this, because we’re not data scientists. How do you even start with a project like this?

Well, we started off with a zero measurement report (more on that in a bit) to get an idea of where newsrooms are currently at with their balance of news. It confirmed what seems to be a widespread trend of over-reliance on one user need - ‘update me’. In fact, our data confirmed what had been found at the BBC: in the content we surveyed, it appears around 70% of the time.
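To make that zero measurement concrete: a breakdown like the one Ilija describes can be computed from any set of flagged articles. A minimal sketch - the function name and the sample data are illustrative, not smartocto's actual code:

```python
from collections import Counter

def need_distribution(flagged_articles):
    """Share of each user need in a set of flagged articles
    (a 'zero measurement' style breakdown; the data here is made up)."""
    counts = Counter(flagged_articles)
    total = sum(counts.values())
    return {need: round(count / total, 2) for need, count in counts.items()}

dist = need_distribution(["update me"] * 7 + ["educate me"] * 2 + ["inspire me"])
# roughly the pattern described in the interview: 'update me' dominating at ~70%
```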

Now, this is something which baffles me as a non-data person: how do you categorise content into user needs? I imagine it’s much simpler if you’re trying to organise by topic, for example?

Absolutely. Topic analysis is straightforward - there are likely to be keywords in common across articles about, say, ‘soccer’ or ‘coronavirus’. But with user needs, that’s not the case. I read a research paper about a team that tried to segment along the lines of writing style and article format, but it didn’t return very high accuracy rates - and that’s a deal breaker.

I’m glad you brought that up. Why’s that the case?

If you’re relying on an algorithm to predict churn, for example, a slightly lower accuracy rate doesn’t impact the overall result, but when you’re using an algorithm as the foundation for a notification system like this one, it’s a completely different matter.

Here, if the algorithm is failing to correctly categorise articles in the first place, then the notifications we’re asking it to fire as a result won’t be accurate - and no editor in their right mind is going to go for that. Trust is key. Accuracy is essential.

So how do you go about creating that first data set, then?

We started off by automating the process to get our foundational data set - but it became apparent that this wasn’t going to work. We switched to doing it manually.

content doesn't fall neatly into a single category

What were you looking for at this stage?

First it was a case of sorting content into the correct user need. But even this wasn’t without its problems. The reality of content is that a lot of the time, it doesn’t fall neatly into a single category - there will always be content that bridges multiple user needs, which makes it difficult to categorise, but that’s a reality that we have to be able to work with. Real life isn’t always neat.

How did you solve this problem?

We approached this problem by assigning each piece of content 5 points, and asked our team to distribute those points across the user needs they thought the article in question fulfilled. Initially we did this for 20-25 articles, independently of each other, and then merged the results.
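As a rough illustration of that point-distribution step, here is how several annotators' 5-point allocations could be merged into one label distribution per article. The function name and the example numbers are hypothetical, not the team's actual tooling:

```python
from collections import Counter

def merge_annotations(annotations):
    """Merge several annotators' 5-point distributions into one
    normalised label distribution for a single article."""
    totals = Counter()
    for points in annotations:  # e.g. {"update me": 3, "educate me": 2}
        assert sum(points.values()) == 5, "each annotator distributes 5 points"
        totals.update(points)
    grand_total = sum(totals.values())
    return {need: totals[need] / grand_total for need in totals}

# Two annotators mostly agree, but one sees a stronger 'educate me' angle:
merged = merge_annotations([
    {"update me": 4, "educate me": 1},
    {"update me": 3, "educate me": 2},
])
# merged["update me"] == 0.7, merged["educate me"] == 0.3
```

Articles that bridge multiple user needs naturally end up with points spread across categories, which is exactly the messiness described above.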

Because the team is familiar with the user needs framework, we thought this would be a good indication of how easily it could be done. Generally the results agreed, but in 15% of cases different user needs were flagged - in most cases because those articles sat in multiple categories. That was still enough to keep us up at night.

What did you learn from that error rate?

It was an opportunity to check and test our findings - especially because we knew why those errors were being recorded. Bojana [one of the data team] took a sample of 30 articles from Omroep Brabant which were created after the user needs workshop and which fell into categories other than ‘update me’, and then she processed the articles in a different way to check whether the algorithm worked. With the exception of one article, all flagged correctly. And that single inaccuracy was, once again, because the article was flagged in multiple categories.

At this point we’d been working mostly with semantic analysis - with the text and transcript of articles. This is fine if you can guarantee that a certain type of article is always going to be written in the same way, but in reality that’s not always the case. If we prepared the algorithm for one client this way, it likely wouldn’t work for the next, so we had to work on a lexical approach as well. In the end we looked at something like 50 different features - things like the number of words, verbs and nouns, whether or not there was a gallery or internal image, and people’s names. So although the examples we were drawing on were semantic in scope, we ended up with something more recognisable as a lexical analysis.
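A toy version of that lexical feature extraction might look like the sketch below. The feature set is illustrative only - a real system would use an NLP library for part-of-speech counts, and the capitalisation check is only a crude stand-in for name detection:

```python
import re

def lexical_features(text, has_gallery=False, has_internal_image=False):
    """Extract a handful of illustrative lexical features from an article.
    (The real system reportedly used ~50 features, including verb/noun
    counts that would require a proper part-of-speech tagger.)"""
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "n_words": len(words),
        # Crude proxy for names/proper nouns: capitalised words.
        "n_capitalised": sum(w[0].isupper() for w in words),
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
        "has_gallery": int(has_gallery),
        "has_internal_image": int(has_internal_image),
    }

feats = lexical_features("Prime Minister Rutte visited Omroep Brabant today.",
                         has_internal_image=True)
```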

You mentioned that 70% of content typically flags under the ‘update me’ user need. How do you work with what sounds like such an unbalanced data set?

You’re right, and one of the challenges here is in fact exactly the reason why this project is so important: we were working with severely imbalanced data sets. We had to use sampling techniques to address this.

And what about building the actual notifications?

Once we worked through that foundational set, we manually flagged 1,000 articles and trained the algorithm on those. I took a couple of weeks away from the office to focus solely on this, and without those day-to-day distractions, we produced something workable and accurate.

This foundational work was the determining factor in many ways: the mechanics of building the actual notification weren’t hugely dissimilar from our existing bank of notifications, so that part was quite a fun exercise.

The user needs approach for covering the news is functional, emotional or contextual

Talk us through this diagram, Ilija. We talk a lot about six user needs, but here you’ve divided them into three main groups, not six. What’s all that about?

We discovered that - from a data point of view - the main differences (semantic and lexical) could be separated into three main categories of news content. It was much harder to make the distinction between six groups than it was between three, and in fact we found that the results were more than adequate for this initial phase of the project.

So those types are ‘functional’, ‘emotional’ and ‘contextual’ content, right? And two user needs ‘sit’ in each of those categories?

Exactly. These three groupings - though the content itself may vary considerably between the user needs - share similar enough components to be a workable framework for our algorithm. And, most importantly, from a data point of view, these three have proven sufficient - at least at this stage - to be able to identify and trigger accurate notifications.

Give us an example of the kind of notification you’ve built.

We have a notification that is something like “This article has had a good start - in the first hour it’s generated 50 000 page views - why don’t you write another article from another user needs perspective?” - and of course we’d identify which user need would be most effective to do that.

Obviously at smartocto we’ve spent a lot of time building notifications which alert users if an article has generated so many page views, so that part had already been written. The trick now is joining the dots and helping newsrooms to address those other user needs.
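To show how such a trigger might join those dots, here is a hedged sketch of a follow-up notification rule. The threshold, the wording and the use of the three groupings are all illustrative, not smartocto's production logic:

```python
def follow_up_notification(article, pageviews_first_hour, covered_groups,
                           threshold=50_000):
    """Fire a follow-up suggestion when an article takes off in its first
    hour, suggesting the content groupings ('functional', 'emotional',
    'contextual') the story has not yet covered."""
    groupings = {"functional", "emotional", "contextual"}
    if pageviews_first_hour < threshold:
        return None  # no notification: the article hasn't taken off
    missing = sorted(groupings - set(covered_groups))
    return (f"'{article}' had a good start: {pageviews_first_hour:,} page "
            f"views in its first hour. Consider a follow-up from a "
            f"{' or '.join(missing)} angle.")

msg = follow_up_notification("Flood update", 62_000,
                             covered_groups={"functional"})
```

The interesting part isn't the threshold check - that already existed - but the second half: turning a raw page-view alert into a concrete editorial suggestion.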

And, that’s the main point of this anyway, isn’t it?

Exactly. Ultimately we need to be able to create a tool which works, which serves a purpose and is useful to the newsroom. Of course there are still things which we’d prefer to change, but it’s a great start - and we know from our experience at Omroep Brabant that it’s working.

Download our 'Actionable user needs' whitepaper

Discover how a user needs project with our notifications changed the workflows, output and engagement of 3 newsrooms.

So, for those of you who prefer a quick summary:

This project isn’t an intellectual exercise or a project destined to gather dust. The purpose of this project has been to find a way to create actionable notifications, which can be guaranteed to help boost engagement in actual newsrooms. It has to work, and it has to be trusted, to be effective.

The project

  • This is a new approach. There are examples of algorithms which categorise by style or topic, but none which differentiate by user need - and that’s something which newsrooms could really benefit from.

Issues of accuracy:

  • The main obstacle to its success was in how accurately articles could be flagged into the correct user need category. If that failed, the notification would also.

Defining the right approach

  • A data set organised around the parameters of user needs isn’t impossible to create, but it is necessary to combine lexical and semantic approaches to achieve the accuracy required.
  • We have approximately an 85% accuracy rate in flagging content correctly, but the 15% ‘error’ is mostly attributable to content straddling categories and can be fixed (and has been).

Building the notifications themselves

  • We used the building blocks of our existing notification system, which are proven to work with high accuracy rates.
  • We found that although there are six distinct user needs, for the purposes of creating accurate notifications, three broad groupings of content type were sufficient.
  • Those three types are functional, emotional and contextual content.

There’s lots more to read about the Triple N project. Find out how it started and how it’s going. And if you’d like any more information about how this might help your newsroom, get in touch. We’re always happy to chat.
