Get ready for AI models to potentially get a massive content infusion from two of the internet’s biggest user-generated platforms – WordPress and Tumblr.
According to reports, the companies behind these blogging giants are gearing up to sign a deal that would grant AI firms like OpenAI and Midjourney access to a treasure trove of user-created posts, images, and other content to train their artificial intelligence systems.
WordPress and Tumblr Eye Massive Content Deal with AI Firms
Now, nothing has been officially announced yet. But the reports cite internal sources and communications showing that Tumblr, in particular, has been scrambling to prepare a large data handover of user content spanning from 2014 to 2023.
And we’re not just talking public posts here. Allegedly, private blogs, deleted accounts, explicit material, and even unanswered askbox messages may be getting scooped up and packaged for the AI companies to digest and learn from.
Yikes, right? A bit unnerving to think your old emo Tumblr musings or private WordPress posts could end up being a tiny part of the next ChatGPT brain.
The parent company Automattic, which owns both WordPress.com and Tumblr, hasn’t confirmed the specific deal yet. But it did release a statement emphasizing how important it is for users to be able to opt out of having their content used for AI training purposes.
Automattic says it’s working directly with “select AI companies” under the condition that opt-out preferences are “respected” and any users who opt out later can have their past data removed from the training pools.
The company claims it already blocks typical web crawlers used by AI firms and lets users discourage search indexing, which should also deter AI content vacuums. Though it admits there’s no legal requirement for crawlers to honor those preferences…yet.
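For the curious, crawler blocking of this kind usually happens through a site's robots.txt file. Here's a minimal Python sketch of how a well-behaved crawler would check those rules before fetching a page. The rules and the example.com URL below are hypothetical and are not Automattic's actual configuration; GPTBot and CCBot are the user-agent names published by OpenAI and Common Crawl, and `is_allowed` is just an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only;
# this is NOT Automattic's real file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network call is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/post/1"  # placeholder URL
    for agent in ("GPTBot", "CCBot", "Googlebot"):
        verdict = "allowed" if is_allowed(agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

Keep in mind that robots.txt is a voluntary convention: a crawler has to choose to run a check like this, which is exactly why the missing legal requirement matters.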
Pending legislation in the EU could put more power in users’ hands over how their content gets collected and repurposed to make our new robot overlords smarter.
For now, Automattic says any partnerships will address the hot-button issues for users: proper attribution, opt-out controls, and having a real say over how your content gets disseminated and used.
Still, the idea of mass personal blog archives potentially getting fed into proprietary AI systems is definitely raising some eyebrows around data privacy and consent. Looks like the worlds of user-generated content and artificial intelligence are on a collision course, whether some bloggers like it or not.