Thursday News: Twitter user languages, hidden costs of Getty Images, crowdsourcing book material, and Scribd’s stubborn piracy rumor


The Languages of Twitter Users – This is a pretty nifty little chart detailing the different languages used on Twitter, and how their combination has changed since 2007. Although English still represents the largest percentage, its share of the Twitter marketplace has decreased over the past seven years, and several new languages have been added to the mix over the last couple of years (including Swedish, Chinese, Polish, and Thai). –New York Times

Getty Images Allows Free Embedding, but at What Cost to Privacy? – I should have known this was too good to be true. Or, rather, that the use of these images would be far from “free.” In addition to all the normal privacy concerns you have when you use certain types of third-party content, Getty is going to be amassing a massive database of user data that it might “monetize,” which would be an indirect, but no less worrying, charge to image users. Let’s hope that Getty looks at better, less invasive, and less ethically suspect methods to profit from doing the right thing.

For one thing, given its scale and popularity, Getty Images embeds may appear on a significant number of different sites that a single user visits. That would allow Getty to correlate more information about a user’s browsing history than any single site could. That information, in turn, is subject to government requests, sales to data brokers, or even breaches or leaks.

These concerns might be mitigated by a strong privacy policy or some indication of what Getty intends to log and how it’s going to use it. Unfortunately, we’ve gotten the opposite. –EFF

Aziz Ansari Is Crowdsourcing Reddit for His New Book Modern Romance – So Aziz Ansari has set up a subreddit through which he can source material for his new book on dating and romance, especially within a social media environment. On the one hand it seems an obvious, even genius, strategy, but on the other, it’s going to be interesting to see if anyone objects to having their words used in Ansari’s book, even though he’s clearly stated that all of the responses can be used in his project. And I’ll admit that I’m getting increasingly uncomfortable with the trend of crowdsourcing intellectual property (Kindle Worlds, I’m looking at you). I mean, who’s profiting most here – contributors, individual creators, or corporations?

As Famously points out, r/modernromantics acts as a reverse AMA: Instead of answering Redditors’ queries, Ansari is asking the questions. (There used to be a joke thread posted by a different user, titled something like “Would you leave your significant other for Aziz?”, but it’s since been removed.) So far, Redditors are engaging in the kind of open-minded discussion I see on other dating subs (I frequent r/okcupid), but with far more optimism. –Huffington Post Books

Scribd, Piracy, and Why You Can’t Always Believe What You Read Online – Nate Hoffelder addresses the tenacious rumor that users of Scribd can easily pirate books downloaded from that site. Rich Meyer at Indies Unlimited has been urging authors to remove their Smashwords books from Scribd based on this rumor. Beyond the ongoing hysteria over piracy, this rumor demonstrates, as Hoffelder points out, a lack of understanding about how technology works, and specifically about how publishing technology works.

Juli [Monroe] and I both think that Scribd stores their ebooks in a folder called documents_cache. We came to this conclusion independently, and if that is where Scribd puts the ebooks then I seriously doubt the average user will be able to strip the DRM. The ebooks aren’t stored as ebooks; instead they are stored as collections of JSON, CSS, and image files. And while I can’t speak for the JSON files, the image files have DRM of some kind. –The Digital Reader