AI projects are only as good as the data sources they can access, and as publishers become more aware of the opportunities they have to license their work to specific AI providers, the race is heating up to secure access contracts, and to ensure that each AI bot is better informed and more accurate than the rest.
Today, the Wikimedia Foundation, the nonprofit that oversees Wikipedia, announced new access deals with Amazon, Meta, Microsoft, Mistral AI, and Perplexity, which will give these AI projects more direct access to Wikipedia data to power their AI systems.
As per Wikimedia:
“In the AI era, Wikipedia’s human-created and curated knowledge has never been more valuable. Today, Wikipedia is among the top-ten most-visited global websites, and it is the only one to be run by a nonprofit. Global audiences view more than 65 million articles in over 300 languages nearly 15 billion times every month, and its knowledge powers generative AI chatbots, search engines, voice assistants, and more. Wikipedia remains one of the highest-quality datasets for training Large Language Models.”
Wikimedia’s Enterprise APIs facilitate commercial access to Wikipedia data, providing another form of income for the nonprofit repository.
And now, Wikimedia will be securing more of that funding from these AI projects, as the platforms look to shore up the data inputs that sustain their AI tools.
Information supply is becoming a bigger consideration, with all of the big players signing access deals with major publishers. OpenAI, for example, now has deals in place with news publishers like News Corp and Condé Nast, while it also recently signed a content licensing partnership with Disney for image generation. Meta has signed deals with several major publications, including CNN, Fox News, People and more, while xAI relies on real-time data from X to power its responses.
That need for information is what’s sparked speculation that OpenAI may look to acquire Pinterest; without an owned data source, it’s going to be increasingly difficult for these projects to go it alone and develop their own AI offerings.
That was further underlined recently when Reddit sued several major AI projects for data scraping, in an effort to protect its data assets.
Having access to trusted, vetted, verified information is crucial to ensuring the accuracy of AI answers, and that requirement is likely to price many smaller AI players out of the market as the big platforms win exclusive rights to more content.
Really, this underlines the ongoing value of journalism, and of platforms that can provide vetted data, which may well ensure that original, researched content isn’t superseded by AI generators; after all, AI tools won’t work without such inputs.
Does that mean that original, well-researched content is actually of more value in the AI era?
I mean, someone’s gotta be doing the work, right?