@tasket @angelfeast It's not clear to me that I'm looking at the right place. Is this the data being used by Mozilla? I'm hoping that could be resolved by more than the 10 minutes of research I spent on it. I'd like even more for it to require much less research to understand the supply chain of a product offered as a public service. I've also got lots of reasons not to give them the benefit of the doubt here.
twifkak@mas.to
Posts
-
Firefox uses on-device downloaded-on-demand ML models for privacy-preserving translation.
@tasket @angelfeast https://paracrawl.eu/moredata says "This is a release of text from Internet Archive.... The project also used CommonCrawl which is already public." Those crawls quite famously/infamously include copyrighted content. I don't see anything to suggest they filtered those datasets for public domain annotations. (Not that such an annotation would be enforceable, but it would at least be an indication of intent.)
-
@tasket Perhaps. Show me what rights they have to it.
-
@tasket It would be, much in the way that "guaranteed not to turn pink in the can" is a valid description of bad salmon [1]. A disingenuous misdirection from what people really care about in the product.
[1] I know it didn't happen. It's a good metaphor.
-
@firefoxwebdevs What do you mean "open data"? https://firefox-source-docs.mozilla.org/toolkit/components/translations/resources/01_overview.html points to https://browser.mt/ points to https://paracrawl.eu/index.php which says "We do not own any of the text from which these data has been extracted."