Why Editorial Must Own AI Implementation
News organisations are building AI systems as technical projects with editorial consultation. Perhaps they have it backwards.
The implementations that succeed will treat AI as an editorial capability that requires engineering support, not the reverse. This isn’t a radical position. Across industries, leading organisations have recognized that AI transformation demands deep collaboration across functions. BlackRock’s AI governance emphasizes tight collaboration between technical and talent leadership. Moderna created a Chief People and Digital Technology Officer role, explicitly combining HR and IT leadership to accelerate AI integration. These companies understand that AI touches every aspect of how work gets done, and that no single function can own it alone.
News organisations face an even stronger case for editorial ownership. In most business functions, AI operates as a knowledge tool, drawing on its training data to generate responses, draft communications, or analyze information. In editorial contexts, this approach is dangerous. When a large language model draws from its training data to create or modify journalistic content, the risks of hallucination, factual error, and inadvertent plagiarism become unacceptable. Editorial applications require AI to function as a language tool, processing and transforming content that journalists provide rather than generating knowledge from its own training. This distinction sounds simple. Implementing it is not. And the judgment required at every stage is editorial judgment.
The Current Approach and Its Limits
Retrieval-augmented generation (RAG) represents the current best practice for constraining AI to work only with verified content. Rather than letting a model access its training data, RAG systems provide specific documents, typically the archive, verified sources, and other journalistically vetted material, for the AI to draw upon.
The results have often been disappointing. Experiments where users can “talk to the archive” produce mixed outcomes at best. Publishers in Europe and the United States report that users try these systems once and don’t return. The question is whether this reflects fundamental limitations of conversational interfaces with news content, or whether the implementations themselves are inadequate.
Evidence suggests the latter. Most RAG implementations still follow a naive pattern: content feeds into a vector database, semantic search retrieves relevant chunks, and an LLM assembles them into responses. This architecture fails because it treats AI implementation as a purely technical problem. Three areas determine success, and each requires editorial judgment: input, retrieval, and output.
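To make the naive pattern concrete, here is a deliberately minimal sketch of it: chunk the content, index it, rank chunks by similarity to the query, and paste the top results into a prompt. Everything is illustrative; a real system would use an embedding model and a vector database, and the bag-of-words “embedding” here is a toy stand-in.

```python
# Toy sketch of the naive RAG pattern: chunk, index, retrieve by
# similarity, assemble a prompt. Illustrative only.
import math
from collections import Counter

def chunk(text: str, size: int = 2) -> list[str]:
    """Split an article into fixed-size sentence chunks (the naive step)."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [". ".join(sentences[i:i + size])
            for i in range(0, len(sentences), size)]

def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query: str, retrieved: list[str]) -> str:
    context = "\n".join(f"- {c}" for c in retrieved)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"
```

Notice that nothing in this pipeline encodes editorial judgment: chunk boundaries, ranking, and prompt assembly are all purely mechanical, which is exactly where the three problems below arise.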
The Input Problem
Software follows a simple rule: poor input produces poor output. But with traditional systems, errors can be traced and reproduced. Large language models are stochastic. When something goes wrong, diagnosing the cause within the processing chain becomes extremely difficult. This makes input quality paramount.
Articles written for human readers are not optimized for machine processing. Consider how journalists naturally write: “The minister announced the policy yesterday. She expects implementation by March.” For a human reader, context makes the referent clear. For a RAG system that chunks content into segments, the second sentence may end up separated from the first, and suddenly the system has no idea who “she” refers to. Semantic search compounds this problem. It finds content with similar patterns, not necessarily similar meaning. A user searching for “Minister Schmidt” won’t surface chunks containing only “she,” even if they contain the most relevant information about the minister’s actions.
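The pronoun problem can be shown in a few lines. The crude heuristic below replaces a sentence-initial pronoun with the most recently named entity before chunking; real coreference resolution needs an NLP model, but the sketch shows the kind of input preprocessing that editorial teams would need to define and verify.

```python
# Illustration: a naive chunker separates "She expects..." from the
# sentence naming the minister. resolve_pronouns is a deliberately
# crude heuristic, not production coreference resolution.
import re

ARTICLE = ("Minister Schmidt announced the policy on 12 March 2024. "
           "She expects implementation by June.")

def naive_chunks(text: str) -> list[str]:
    return [s.strip() + "." for s in text.split(".") if s.strip()]

def resolve_pronouns(sentences: list[str]) -> list[str]:
    """Replace a leading 'She'/'He' with the most recent titled name."""
    last_entity = None
    resolved = []
    for s in sentences:
        match = re.match(r"(She|He)\b", s)
        if match and last_entity:
            s = last_entity + s[match.end():]
        names = re.findall(r"(?:Minister|President)\s+[A-Z][a-z]+", s)
        if names:
            last_entity = names[-1]
        resolved.append(s)
    return resolved
```

After this pass, the second chunk reads “Minister Schmidt expects implementation by June,” so a search for “Minister Schmidt” can surface it on its own.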
Then there’s metadata, consistently undervalued in every news organisation I’ve encountered. Without robust metadata, systems cannot filter or contextualize effectively. Semantic similarity alone cannot distinguish between a 2019 article about “the current president” and a 2024 article using the same phrase. Metadata provides the context that makes intelligent retrieval possible.
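A sketch of what metadata buys you: filtering candidates by structured fields before any semantic ranking. The field names and values here are illustrative, but the point stands, and only a date filter can separate two texts that are semantically identical.

```python
# Metadata-aware pre-filtering: restrict candidates by structured
# fields before similarity ranking. Field names are illustrative.
from datetime import date

CHUNKS = [
    {"text": "The current president announced tariffs.",
     "published": date(2019, 5, 2), "topic": "trade"},
    {"text": "The current president announced tariffs.",
     "published": date(2024, 11, 8), "topic": "trade"},
]

def filter_chunks(chunks, topic=None, after=None, before=None):
    out = chunks
    if topic:
        out = [c for c in out if c["topic"] == topic]
    if after:
        out = [c for c in out if c["published"] >= after]
    if before:
        out = [c for c in out if c["published"] <= before]
    return out
```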
The Retrieval Problem
When a user asks “What did Apple announce about privacy?” the query is fundamentally ambiguous. Which announcement? Last week’s feature update, last year’s policy change, or the keynote from three years ago that established their current positioning? The user may have a specific event in mind or want a comprehensive overview. Current RAG systems don’t ask. They assume. They retrieve semantically similar chunks and synthesize them, potentially conflating announcements from different eras into a response that misrepresents the company’s evolving position.
The best approach when intent is unclear is to ask. But determining what constitutes a clear question versus an ambiguous one requires linguistic expertise. What synonyms might users employ? What different phrasings express the same intent? These are questions that language professionals are trained to answer. Retrieval also involves choosing appropriate search methods. Semantic search alone is often insufficient. Some queries need full-text search; others benefit from knowledge graphs that map relationships between entities. Knowing which approach serves which query type requires understanding both the content and how users seek information about it.
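A routing layer for these decisions might look like the following toy sketch: detect ambiguity and ask a clarifying question, otherwise pick a retrieval method per query type. The heuristics and categories are invented for illustration; in practice this classification is exactly where editorial linguistic expertise belongs.

```python
# Toy query router: clarify when ambiguous, otherwise choose a
# retrieval method. Heuristics are illustrative, not production rules.
import re

def route_query(query: str) -> str:
    has_date = bool(re.search(r"\b(19|20)\d{2}\b|last (week|year|month)", query))
    has_entity = bool(re.search(r"\b[A-Z][a-z]+\b", query))
    if '"' in query:
        return "full_text"        # quoted phrases want exact keyword search
    if has_entity and has_date:
        return "knowledge_graph"  # entity plus time: relationship lookup
    if has_entity:
        return "clarify"          # which announcement? which period?
    return "semantic"             # fall back to embedding search
```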
The Output Problem
Even with sound input and retrieval, the final synthesis step presents risks. The model may inadvertently editorialise, introducing framing or emphasis absent from the source material. It may blend facts from different time periods, creating technically accurate but contextually misleading responses. It may adopt a tone inconsistent with editorial standards.
Identifying these problems requires precisely the skills journalists possess: sensitivity to language, understanding of how framing shapes meaning, and judgment about what constitutes appropriate editorial voice.
Four Editorial Responsibilities
If AI implementation requires editorial judgment at every stage, then editorial must be embedded in the development process, not consulted after the fact. This translates to four specific responsibilities.
Content optimization. Editorial teams must own the process of making content machine-readable while preserving accuracy. This means developing standards for language model optimisation: eliminating ambiguous references, using explicit dates rather than “yesterday,” ensuring entity names appear consistently. This is translation work, converting journalism written for humans into formats that serve both human and machine readers. AI can assist with this task, but editorial must verify the results.
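One such optimisation rule, replacing relative time references with explicit dates at publish time, can be sketched in a few lines. The rule set here is a minimal illustration; the full standard, and verification of its output, would be editorial’s responsibility.

```python
# Sketch of a language-model-optimisation rule: rewrite relative
# dates as explicit ones, anchored to the publication date.
from datetime import date, timedelta

RELATIVE = {"yesterday": -1, "today": 0, "tomorrow": 1}

def normalise_dates(text: str, published: date) -> str:
    for word, offset in RELATIVE.items():
        explicit = (published + timedelta(days=offset)).strftime("on %d %B %Y")
        text = text.replace(word, explicit)
    return text
```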
Metadata governance. Metadata is as important as content itself, and editorial must treat it accordingly. This goes beyond basic tagging to include structured information about user needs addressed, topics covered, temporal context, and relationships to other content. Systems can assist with metadata generation and management, but responsibility for accuracy belongs with editorial.
Development participation. Editorial staff must be embedded in AI development teams. They bring essential expertise: understanding how questions can be phrased differently while seeking the same information, anticipating user intent, and ensuring systems reflect journalistic judgment about what matters and why. Aftonbladet in Sweden demonstrated this approach with their EU Buddy project, where editorial teams created comprehensive question-and-answer datasets that dramatically improved response quality.
Testing and feedback. Reinforcement learning with human feedback requires humans who understand what constitutes an adequate response in journalistic terms. Is this answer accurate? Appropriately nuanced? Consistent with editorial standards? Engineers cannot make these judgments, not because they lack intelligence, but because this isn’t their domain of expertise.
The Value Proposition
General-purpose AI will always be more powerful in certain dimensions than purpose-built editorial AI. The scale of systems like ChatGPT is unmatchable. But scale is not the value proposition of news organisations.
What distinguishes journalistic AI from general-purpose AI is precisely what distinguishes journalism from the undifferentiated mass of online information: editorial judgment, verified content, and linguistic precision. These qualities cannot be engineered in after the fact. They must be present from the beginning, embedded in how systems are designed, built, and refined.
The organisations that succeed with AI will be those that recognise this technology as inseparable from editorial function, not a tool that editorial uses, but an extension of editorial judgment itself. That recognition must translate into organisational structure, development processes, and accountability.
AI implementation is not an engineering problem with editorial consultation. It is an editorial problem that requires engineering capability.
This article was also published on the INMA Media Leaders blog.