Author Frequently Asked Questions Regarding Text to Speech Functionality

At the suggestion of Peter Brantley of the Internet Archive, I offer up this Frequently Asked Questions for Authors regarding the Text To Speech (TTS) functionality that is the subject of debate. This may be an evolving document as more people provide input, so that it adequately addresses the issues. Please feel free to offer suggestions and/or revisions in the comments section.

***

Q: I’ve heard that there is some debate over Kindle’s Text to Speech Function. What is it and should I be concerned?

A: When Amazon released its Kindle 2 in February, it announced that it had included the ability for every document, book, or other written work on the Kindle to be read aloud using a robotic voice (either female or male). You can hear a sample of it here as read by Wil Wheaton. The TTS functionality was switched “on” by default. The Authors Guild objected to this on the basis that the right to read a book out loud is an audio right, a derivative right belonging to authors under copyright law.

Q: What exactly is the legal argument that Authors Guild is making?

A: Ideas cannot be copyrighted, only their expression. For authors, that expression is the written work — a book, short story, novella, etc. For each written expression/work, there are rights the creator has to control how the work is copied, to whom it’s distributed, how it’s distributed and in what form, and the like. These rights are part of the creator/author’s copyright, which is their ownership interest in the written work, which is the intellectual property of the author. Authors can and do sell some of these rights to publishers, which is what allows the written work to reach a broader audience — the publisher then possesses the right and exercises it in producing, printing, and distributing the author’s work. While the author still owns the expression itself, the content, she does not necessarily retain every right she has in the process of bringing that content to a larger audience.

The rights are traditionally understood to be the following, and they can be found in Section 106 of the Copyright Act:

  • Reproduction: the copying of the work in a fixed form such as the print or ebook version.
  • Derivative work: works produced which are based on the original work.
  • Distribution: the sale, lease, or other transfer of the work from the copyright holder to anyone else.
  • Public Display: showing a copy of the work in public.
  • Public Performance: a performance of the work in public.

These rights can be broken down and sold in as many pieces as the author can envision and others are willing and able to pay for. Some authors have been able to sell their digital rights separately from their print rights (a split of the reproduction right). Many authors make deals based on geography, first selling the rights for their home territory (e.g., North American rights) and then selling foreign rights (a split of the distribution right).

The Authors Guild’s argument is that the audio right is a derivative one and that TTS is infringing on the derivative audio right. From the Copyright Act:

A "derivative work" is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications, which, as a whole, represent an original work of authorship, is a "derivative work".

Q: This sounds right to me because I do have audio rights for which I get a separate royalty when and if my book is made into an audiobook and sold.

A: The fact that an audiobook is a derivative work doesn’t end the analysis. First, the TTS technology itself is not infringing because it can be used for substantial legitimate purposes, such as increasing accessibility for people with vision and learning impairments. Second, the conversion of text to voice is not a derivative work. It does not recast, transform, or adapt the work into something else. A computer reading aloud to the purchaser of an ebook is no different from a parent reading to his child or a daughter reading to her vision-impaired grandmother. The Authors Guild is arguing that the ebook with TTS enabled is essentially two rights: the reproduction of the work plus the derivative audio version. Instead, the digital copy with TTS enabled is simply one digital copy that can employ existing technology to have the text voiced.

Authors do have the right to control the public performance of their works as specified above. According to the Copyright Act:

To perform or display a work "publicly" means -

(1) to perform or display it at a place open to the public or at any place where a substantial number of persons outside of a normal circle of a family and its social acquaintances is gathered; or

(2) to transmit or otherwise communicate a performance or display of the work to a place specified by clause (1) or to the public, by means of any device or process, whether the members of the public capable of receiving the performance or display receive it in the same place or in separate places and at the same time or at different times.

In order for the Kindle TTS to violate an author’s right to control the public performance of her work, the Kindle would have to be used in a place open to the public or where a substantial number of people outside the family and its social acquaintances are gathered. A reader employing technology to have words read aloud to him or her, particularly in the privacy of the home, does not constitute a public performance. In most cases, the Kindle TTS function is not a performance in any way covered under the Copyright Act; it is merely a convenience for the reader to have an electronic rendering of the words, in the same way a car’s GPS system might read an address or a street name to a driver.

Q: I’m concerned that if a reader can employ a read aloud feature in her home or her car or while traveling or working out that audiobook sales will decline. Shouldn’t I protect myself against that?

A: First and most importantly, the individuals who stand to gain the most from TTS being enabled are those with vision and learning impairments. Currently, those with vision impairments do not have the ability to read every book published. The aim of authors should be to get their books into the hands of every individual who is willing to purchase them, in formats that meet the consumer’s abilities. Individuals with vision or learning impairments could be greatly aided by TTS.

Second, the existing TTS technology is not on par with audiobook performances. Individuals who buy audiobooks for the performance would not be satisfied with TTS. As the Wil Wheaton example above shows, the difference in performance quality between a computer-generated reading and a specifically recorded audiobook is substantial. You can also listen to the Kindle 2 and iPod Shuffle perform a scene from Star Trek II. TTS is a far inferior product to the audiobook. At Open eBook, there is another comparison between the TTS voice and a performance. As the Open eBook article notes, disabling TTS actually makes the book unreadable for those who are vision or learning impaired.

The Authors Guild says that TTS technology is improving by leaps and bounds and that the sales of audiobooks are in jeopardy. This argument fails in two ways. First, copyright law deals only with existing infringement, not with a possibly infringing use at some point in the future. Second, copyright infringement occurs when someone makes a copy (or infringes on another one of the exclusive rights granted to creators), not when someone loses money. For example, the VCR certainly changed the economic structure of the movie and television industry, but the question of whether the VCR infringed on copyright depended on the details of its use. Likewise, a person retains copyright in her works even if she has no intention of making a commercial profit on them.

Third, TTS is a ubiquitous technology. It exists in various forms in nearly every piece of electronics we own. Our phones contain TTS capabilities in the form of voice dialing. There are programs that do the reverse of TTS, taking audio and transcribing it to text, such as programs that take your voicemail and create texts that are emailed to you. TTS exists on nearly every computer and will soon be incorporated into many other devices, including but not limited to ebook readers.

Q: What should I do if I want my books to be enabled?

A: It depends. First, we have to operate under the erroneous premise that Amazon considers TTS to be part of an author’s audio rights. Getting TTS enabled would require the rightsholder (whether that is the author or the publisher) to contact Amazon with two things:

a) Proof of the right of ownership over the audio rights.

b) A request to enable TTS.
