State of Data Journalism 2023 results


AI / OSINT Usage

In 2023 we included a special module in the survey on the use of AI and OSINT. We found that just over one in three respondents have used AI as part of their data journalism work. For Open-Source Intelligence (OSINT), the equivalent share is one in five.

Across different types of newsrooms, AI and OSINT are used at differing rates. While AI is making its way into data journalism practices at similar levels whether the newsroom is local, national, or international in scope, there are large differences when it comes to OSINT: 23% of respondents working in international newsrooms use OSINT, against 12% of those in local newsrooms. These patterns suggest that AI is as relevant to local newsrooms as it is to larger ones, with particular benefits for reporters' productivity.


Of those who stated they have used AI, 45% have used it to search for content. This is followed by content generation (38%) and text mining (37%). Much more rarely, AI is used to perform predictive modelling (8%). That AI is used primarily for content searching suggests that data journalists are tapping into advanced algorithms to sift through large datasets, extract relevant information, or even identify patterns that might not be immediately obvious. This points to a trend where the role of AI is to augment the journalist's capabilities, allowing them to handle content navigation and generation more efficiently than with traditional methods. Content generation can include automated writing of reports on finance, sports, and elections, where data is structured and predictable. Meanwhile, text mining can help uncover insights from unstructured data like social media posts, news articles, and research papers.

Several respondents also used the “other” text field to specify that they use AI as a programming assistant. One respondent explained that “I often ask ChatGPT to broadly help me think of two strategies in code (R of Python) with the use of certain packages to answer a question, and then build on that code and finetune it. I also ask ChatGPT about errors I've gotten and what to do with them”, while another added that “GPT4 and GitHub Copilot have both increased my coding productivity significantly.”

The data highlights distinct trends in AI task distribution across various scopes. Tasks such as Data cleaning / processing, Fact-checking and content verification, and Content generation are consistently prominent across all scopes. However, there is a noticeable concentration of tasks like Image and video analysis in international and national newsrooms, while Text mining and Data visualisation are more prevalent in local newsrooms.

The most commonly used type of AI tool is generative AI, which has been used by 62% of AI users.

The biggest challenge when it comes to using AI is a limited understanding of the tool or technology (56%). More than half of respondents are concerned about bias in and ethics of AI models. A further 41% stated that a lack of dedicated time to experiment with the technology is a barrier. The challenges expressed by journalists regarding AI, such as limited understanding and concerns about bias and ethics, reflect broader concerns in society about the implications of these technologies. They point to a need for better education on AI and a cautious approach to its application, ensuring that AI assists rather than dictates journalistic work. There are also issues with unequal access to AI tools, with one respondent explaining that “Users from my country (Russia) are banned from using many of the AI tools due to sanctions and corporate policies”.


OSINT is used by nearly six out of 10 OSINT users to verify sources, images, or videos. This is closely followed by using OSINT to navigate maps, satellites, and location-based information (57%). The use of OSINT for verifying sources, images, and videos is increasingly becoming an essential component of journalism in the digital age, where misinformation and disinformation can spread rapidly online. By leveraging OSINT tools to authenticate content before publication, data journalists play a role in maintaining journalism’s credibility and trust with audiences. The use of OSINT tools to navigate and analyse maps, satellite imagery, and location-based data may also indicate a journalistic trend towards more in-depth investigative work. Such tools enable journalists to verify claims and report on events from a distance, which is particularly valuable in hard-to-reach conflict zones or regions that are inaccessible for other reasons.

Most OSINT used comes in the form of public government data and reports (72%), while news media and internet media are used by around six out of 10 OSINT users. The prevalence of public government data and reports as sources for OSINT reflects the importance of transparency and official records in journalistic research. However, the use of news media and internet media from citizens points to the growing role of user-generated content in newsgathering and storytelling.

Open Source Intelligence users provided a comprehensive overview of the challenges associated with OSINT. One respondent voiced frustration over lack of formal educational resources, stating, "There's no formal training or school to learn OSINT skills for journalists, leaving many of us to figure it out on our own." Evaluating the quality of social media data appeared as another challenge, with another participant stressing that, "Better skills are needed to understand how to do this effectively and in a timely manner." These challenges imply, as respondents suggested, that "OSINT is very time intensive". This is particularly the case for freelancers who lack a supportive network for data validation, one respondent indicated. Access to essential verification tools also posed a significant barrier, as noted by a participant who remarked, "Tools to verify information or fact-check data are not open source and often require expensive subscriptions or are locked behind paywalls or private institutions who hold the license." Concerns were also raised regarding the removal of user-generated content by tech companies, with one respondent stating, "Many big tech companies are removing sensitive data because it violates platform policies. But when this data is removed, it means it disappears for documentary evidence purposes."

Despite these challenges, respondents shared valuable insights into strategies for resource acquisition and skill enhancement. One suggestion included subscribing to OSINT resources like newsletters and following hashtags on social media. Another respondent learned by reading “the methodologies from investigations published by reputable newsrooms”. Several institutions were mentioned as providers of useful case studies, including Bellingcat, GIJN, ICFJ, and EJC.

Ethical considerations also played a role in respondents' perspectives, with an emphasis on data security and privacy. As one respondent put it, "Data security is key for privacy and digital security reasons when it comes to working with OSINT data." Another respondent talked about the balance between public interest and the right to privacy, stating, “Sometimes sensitive data is shared in a story, but this is often when the importance of the story is heavily in the public interest. Sharing sensitive data should be proportionate and justified.”

In conclusion, the findings from these survey modules underscore the growing significance of Artificial Intelligence and Open Source Intelligence in data journalism practices. AI primarily aids in content searching, generation, and text mining, enhancing journalists' capabilities. Challenges include limited understanding, bias concerns, and unequal access to AI tools, emphasising the need for education and cautious implementation.

OSINT plays a crucial role in combating online misinformation, aiding journalists in source verification and investigative work. Its significance in addressing misinformation will likely grow in the context of upcoming elections and the reporting of ongoing conflicts like those in Ukraine and Gaza. By utilising tools such as satellite data and social media analysis, journalists can report accurately on hard-to-reach locations. Showing the steps and methods behind a piece of factual reporting can also be used to promote the validity of the coverage and the trustworthiness of the media brand, as BBC Verify attempts to do.

Finally, we already see that the marriage of AI and OSINT creates a powerful toolkit for data reporting, as indicated by one respondent: “AI algorithms assist me in sifting through large datasets and identifying patterns, trends, or anomalies that may not be immediately apparent through traditional methods. For example, I use machine learning models to automate the categorization of vast amounts of information collected during OSINT processes. These models can help classify data into relevant topics, themes, or sentiment, enabling me to quickly distill insights and focus on key areas of interest.”

