Computational Journalist Francesco Marconi On How Editorial Algorithms will Transform the News Industry
Reported by Hannes Cools
Artificial intelligence news coverage is full of stories of incredible technological feats and machines that take over humans’ jobs. In reality, the threat is less dramatic but just as transformative, according to computational journalist Francesco Marconi, who predicts that the future of journalism will be driven by editorial algorithms with human editors at the helm.
Marconi previously co-led AI at the Associated Press and later served as R&D Chief at The Wall Street Journal. In 2020, he published “Newsmakers: Artificial Intelligence and the Future of Journalism”. Marconi recently spoke to us about computational journalism, his new company launched at Newlab, the skills news professionals will need, the disruptions to come to newsgathering and how AI could be the key to creating “ground truth” for the industry.
What is computational journalism? Where does the term originate from?
Broadly speaking it refers to the approach of using computational methods to source, analyze and distribute information. My point of view on computational journalism is centered around the key premise that news events can be tracked and explained with the same level of analytical rigor used by scientists to study the natural world. It has to do with building tools that detect newsworthy events in continuous flows of information.
Computational Journalism is a recent field, but one that is quickly becoming foundational for newsrooms. It developed as a response to the flood of information society is now exposed to, which presents a new challenge that cannot be solved with traditional reporting methods alone. The explosion of data from the web, sensors, mobile devices and satellites has moved us into an environment where there is too much information. As a society, we retain and produce more information now than at any previous point in human history; it takes far more effort to filter out unwanted information than to accumulate it.
Computational Journalism uses methods from data science and computer science to allow us to filter through high volumes of information in a much more effective manner. It's about contextualizing raw data by continuously tracking multiple sources and by processing and vetting it in real time. Artificial intelligence, particularly machine learning, natural language processing and natural language generation, plays an important role in this new approach. Being able to use AI to derive analytical signals allows computational journalists to ground reporting in facts and could eventually establish "ground truth".
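The kind of signal-driven filtering Marconi describes might look, in its simplest form, like the following sketch. Everything here is illustrative: the term weights, the threshold, and the function names are assumptions, not any newsroom's actual system.

```python
# Minimal sketch: derive a crude analytical signal from a stream of text
# items, keeping only those worth a human journalist's attention.
# SIGNAL_TERMS, score_item and filter_stream are hypothetical names.

SIGNAL_TERMS = {"recall": 3, "lawsuit": 2, "acquisition": 2, "layoffs": 1}

def score_item(text: str) -> int:
    """Score an item by summing the weights of signal terms it contains."""
    words = text.lower().split()
    return sum(weight for term, weight in SIGNAL_TERMS.items() if term in words)

def filter_stream(items, threshold=2):
    """Keep only items whose signal score clears the threshold."""
    return [item for item in items if score_item(item) >= threshold]

items = [
    "Company announces product recall after safety review",
    "Quarterly newsletter published",
    "Regulator files lawsuit over data practices",
]
print(filter_stream(items))  # two of the three items survive the filter
```

A real system would use learned models rather than hand-set weights, but the shape is the same: score continuously, filter aggressively, and pass the survivors to people.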
This field of journalism originated in multidisciplinary research settings at institutions like Columbia University, Stanford University and Northwestern University, among others, which began setting up Computational Journalism Labs to bring journalists, academics and technologists together. Eventually, established organizations like the Associated Press started experimenting with and adopting some of these practices in their day-to-day operations. The proliferation of AI and cloud computing, combined with their adoption by major news organizations, are major factors contributing to the dramatic rise of computational journalists in recent years. In addition, other industries beyond news are starting to use it as a business intelligence tool capable of generating reliable information for decision-making.
What is the relation between computational journalism and data journalism?
Traditional data journalism focuses on enhancing reporting by analyzing statistics that can provide deeper insight into a particular story. Computational journalism, on the other hand, looks at ways of providing context at scale, by constantly monitoring information and building tools to surface newsworthy events.
Computational journalism focuses on creating repeatable processes for sourcing data and building information systems that can be reused. To use an analogy, data journalism finds the source of water, while computational journalism builds the aqueduct so the water can constantly flow to those who need it. To be clear, both functions are crucial, and computational journalism often starts with a data journalism exploration.
The role of computational journalists is not necessarily to write or visualize data-driven stories, but to write editorial algorithms that use journalistic principles to detect and contextualize data from a diverse array of sources. Such systems need transparency and accountability that tech companies cannot offer and that’s why the role of journalists is so important when building the information platforms of the future. Editorial algorithms are coded with journalistic principles and are designed to avoid the pitfalls of machine bias.
What are good examples or case studies of computational journalism?
There are many case studies from different news organizations, but I can highlight some examples that I’ve worked on in the past.
At the Associated Press, we used natural language generation to automate part of AP's financial and sports coverage. In business news, the newswire went from covering 300 companies with human writers to covering over 4,400 companies with the help of smart machines. What's impactful is that although coverage volume increased significantly, the error rate went down. This effort started in 2013 and is one of the first computational journalism efforts done at scale by a major news organization.
At the Wall Street Journal, Talk 2020 is an AI-powered text analysis platform and search tool that allows the newsroom to access 30 years of public statements made by politicians. It pulls transcripts from speeches, rallies, interviews and press conferences. The tool allows reporters to track what candidates are saying, investigate their stances on issues over time, explore speech patterns and perform other text analysis. This project started as an internal tool; it was later made available to subscribers through the 2020 election season.
The opportunity to work at these two major news organizations inspired me to start my own company, Applied XL, which recently received a seed investment. We are building event detection systems that source data using custom algorithms rooted in editorial principles, which are constantly calibrated and tuned by experts in the loop. Our growing catalog of algorithms draws on a set of very specific newsworthiness criteria that, when triggered, dynamically generate news alerts with context. Our first vertical will focus on life sciences, where we are mining data on clinical trials, pharmaceutical industry regulation, and healthcare policy.
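A catalog of newsworthiness criteria that triggers alerts could be sketched as below. To be clear, this is a loose illustration inspired by the clinical-trials example; the criteria, record fields and thresholds are invented for the sketch and are not Applied XL's actual code.

```python
# Hypothetical sketch: a catalog of newsworthiness criteria, each a simple
# predicate over a data record; a record that trips a criterion generates an
# alert carrying the criterion name as context. All names are illustrative.

def trial_halted(record):
    return record.get("status_change") == "halted"

def large_enrollment_jump(record):
    return record.get("enrollment_delta", 0) > 500  # illustrative threshold

CRITERIA = {
    "trial halted": trial_halted,
    "enrollment surge": large_enrollment_jump,
}

def generate_alerts(record):
    """Return one context-bearing alert per criterion the record triggers."""
    return [
        f"ALERT [{name}]: trial {record['id']}"
        for name, check in CRITERIA.items()
        if check(record)
    ]

record = {"id": "NCT0001", "status_change": "halted", "enrollment_delta": 620}
print(generate_alerts(record))  # this record trips both criteria
```

Because the criteria live in a catalog rather than being hard-coded, domain experts can add, remove and tune them without touching the alerting machinery, which matches the "experts in the loop" calibration described above.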
What is a computational journalist? What are the right skills to become a computational journalist?
Computational Journalism is a multidisciplinary field that takes best practices from different areas, including data science, engineering and product development. In terms of specific skills, it's relevant to know programming languages such as Python, R and SQL (which are also used in traditional data journalism). In addition, computational journalists need to be knowledgeable about machine learning, natural language processing and natural language generation. Since artificial intelligence plays an important role in the tools developed by computational journalists, it is equally important to understand data pipelines as well as cloud services.
What is the future of computational journalism?
Computational journalism is the future of news. In fact, what we call "newsworthy" today will, over the next decades, evolve into a process exponentially more focused on finding statistical outliers and explaining them through a human perspective. Editorial algorithms will play a major role in how all news organizations source and analyze data. This doesn't mean that the future of journalism is about letting the machines run loose. It's the opposite. Computational journalists are still the ones who decide what weights, parameters and transparency principles to apply to their machine learning models. Optimal computational journalistic performance happens somewhere between finding these new data signals and having humans validate and contextualize them.
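The outlier-finding step of the workflow Marconi describes can be sketched with a basic z-score check. This is a minimal illustration, assuming a small daily metric; the two-standard-deviation threshold is a common convention chosen for the example, not an industry standard.

```python
# Minimal sketch: flag statistical outliers in a metric and surface them for
# a human journalist to validate and contextualize, never to publish directly.

from statistics import mean, stdev

def flag_outliers(values, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations
    from the mean: candidates for human review, not headlines."""
    mu, sigma = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]

# Hypothetical example: daily permit filings in a city; the final-day spike
# is the candidate signal a reporter would then investigate.
filings = [10, 11, 9, 10, 12, 10, 50]
print(flag_outliers(filings))  # → [6]
```

The important design point is in the second half of the comment: the algorithm finds the anomaly, but a person decides whether it is a story.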
Further reading:
Newsmakers: Artificial Intelligence and the Future of Journalism, by Francesco Marconi, Columbia University Press, 2020
Automating the News: How Algorithms Are Rewriting the Media, by Nick Diakopoulos, Harvard University Press, June 2019
Futureproof: 9 Rules for Humans in the Age of Automation, by Kevin Roose, Random House, January 2021
By Journalism AI (Polis LSE), March, 2021
By Reuters Institute for the Study of Journalism, January 2021