petersuber

"#News #publishers limit #InternetArchive access due to #AI scraping concerns."
https://www.niemanlab.org/2026/01/news-publishers-limit-internet-archive-access-due-to-ai-scraping-concerns/

PS: I'm one who thinks AI training on copyrighted content is #FairUse and (separate point) even desirable in the case of academic research.
https://fediscience.org/@petersuber/113443473594224752

But this kind of training will create huge collateral damage (indirectly, through publisher action) if it diminishes the @internetarchive.

#Copyright #Journalism

petersuber

Update. It's happening. "News Publishers Are Now Blocking The Internet Archive, And We May All Regret It."
https://www.techdirt.com/2026/02/13/news-publishers-are-now-blocking-the-internet-archive-and-we-may-all-regret-it/

@mmasnick is right: "In our rush to punish #AI companies, we’re destroying public goods that serve everyone…We’re sacrificing the historical record not because of proven harm, but because publishers are worried about what might happen. That’s a hell of a tradeoff."

#Copyright #InternetArchive #Journalism #Publishers
@internetarchive

petersuber

Update. But are #publishers right to worry that #AI companies can freely scrape the #WaybackMachine in order to train their tools? No, says Mark Graham, director of the Wayback Machine.
https://www.techdirt.com/2026/02/17/preserving-the-web-is-not-the-problem-losing-it-is/

"The Wayback Machine is built for human readers. We use rate limiting, filtering, and monitoring to prevent abusive access, and we watch for and actively respond to new scraping patterns as they emerge."

#Copyright #InternetArchive #Journalism
@internetarchive