---disseminate widely---

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Institute for Biblio-Immunology -- First Communique: Identifying and Removing Verso/BooXtream 'Social DRM' EPUB eBook Watermarks
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

FOR IMMEDIATE RELEASE. IN THAT, WE DEMAND THE IMMEDIATE RELEASE OF OUR SHACKLED COMRADES, WATERMARKED EBOOKS OF THE WORLD.

Welcome.

The Institute for Biblio-Immunology specialises in textual pathogen identification and antigen synthesis. Several vials of in vivo samples suffering from a "social DRM" watermarking infection were recently brought to the attention of our cellar scientists. In this, our inaugural communique, we will explore our dissection of said samples and offer an initial expatiation regarding the contaminant undesirables discovered therein, as well as offer preliminary guidance for a successful course of treatment.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
BACKGROUND
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Prudence tells us that the only time books should be used as weapons of terror is if they are thrown, gleefully aflame, through a publishing conglomerate's window.

Instead, we find that the publishing company Verso Books {0} is using books to facilitate the surveillance of readers. By embedding uniquely-identifiable personal information in individual copies of ebooks, Verso (and the company they are relying on for the actual watermarking, BooXtream) are turning vectors for cultural transmission into, effectively, tracking beacons designed to identify who is sharing said ebooks, so as to then neutralise said ostensibly undesirable (by Verso) knowledge transmission paths.

This will not stand.

{0} Verso Books "is the largest independent, radical publishing house in the English-speaking world". On that same 'About Verso' page, Managing Director Jacob Stevens says that Verso Books has "a strong list and radical commitment", though what this means is not actually explained here. Not to worry.
Stevens explains perfectly well what Verso means by "radical" in an interview with the trade publication The Bookseller; wherein, commenting on Verso's venture into the ebook retail space, he states that "Verso has found a new, radical way of selling books". Radical selling. Fuck yeah.

But why pick Verso to talk about in the first place? We can briefly summarise the specific chain of events which brought us to this point, reductively, as follows:

I --> Verso shits out an ebook release of The Boy Who Could Change the World: The Writings of Aaron Swartz (in February 2016) {1}.

{1} .

II --> This Verso ebook release possesses WATERMARKS {2}.

{2} "Ebooks from the Verso website are watermarked and DRM-free, and will work on any of your devices--but they can't be uploaded to websites or file-sharing networks".

~~~~~~~~~~~~~~~~~
Paigey the Book Pirate says: Verso is straight-up LYING here. The ebooks CAN be uploaded to websites or file-sharing networks. Very easily, in fact. Proof of concept: go to a website or file-sharing network and upload it (but WAIT--remove watermark first, of course!).
~~~~~~~~~~~~~~~~~

III --> Sean B. Palmer ("virtual executor" {3} of Aaron Swartz) says he will ask the publishers to remove the watermarking (on 13 April 2016) {4}.

{3} "I designate Sean B. Palmer as my virtual executor".

{4} "I will ask the publishers on your behalf to remove the watermarking from the Verso ebook version".

IV --> Regardless, Verso says they will not remove the watermarks (on 22 April 2016) {5}.

{5} "We have just been informed by a highly reliable party who wishes to remain anonymous that Verso Books has indicated (to this party, via The New Press) that they will NOT remove the watermark from their e-book edition of Aaron Swartz's posthumously-published collected writings because they believe it will impede their ability to 'recoup' their distribution costs".

NOT OK.

V --> Verso FUCKS you with watermarks, so we will FUCKS Verso now (on 20 June 2016).
Blood for blood. And by the gallon.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
WATERMARK SCAVENGER HUNT
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Now we will expose the functionality of Verso and BooXtream (Verso's watermark provider) watermarks. OK! :)

EPUB have many file inside. Many file give many opportunity for THE SNEAKY-SNEAKY to add watermark. BUT IT'S OKAY --> we can be THE SNEAKY-SNEAKY too.

Verso uses a watermarking schema provided by BooXtream {6}.

{6} "Verso ebooks are free of Digital Rights Management (DRM-free), but are subject to the terms of this license. You own the file once you've downloaded it, and you can use it on any of your devices in perpetuity. It has visible and invisible watermarks, applied by Booxtream, which contain your name and email address. You are prohibited from uploading Verso ebooks to any website or file-sharing network, or in any other way making them available for distribution, sharing, copying, downloading, or reselling".

There are, at least, seven different varieties of watermarks injected into a given ebook EPUB payload by BooXtream to be found in Verso ebooks:

WM0-2 are overt (readily visible) watermarks and are optional (meaning they may not necessarily be present):

[WM0] -- Ex Libris Image Watermark
[WM1] -- Disclaimer Page Watermark
[WM2] -- Footer Watermarks

WM3-6 are covert (not readily visible) watermarks and are always present:

[WM3] -- Filename Watermarks
[WM4] -- Timestamp Fingerprinting
[WM5] -- CSS Watermark
[WM6] -- Image Metadata Watermarks

Let's now go through each one to expose it and see how it works and, in turn, how it may be prevented from working. OK! :)

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
[WM0] -- Ex Libris Image Watermark
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The ex libris image watermark is optional {7}; however, Verso ebooks appear to employ it.

{7} "With every order fulfilment, BooXtream(R) needs the customer name, customer email address and an order-id (supplied by the shop).
BooXtream(R) encodes this as a series of redundant digital watermarks and also adds visible, personalised information for the end user into the ePub file. All visible and personalised information is optional and can be customised:
"- Page 2 contains an Ex Libris (image with customer name), that can be customised per publisher and per customer".

~~~~~~~~~~~~~~~~~
Paigey the Book Pirate says: Keeping in mind that the tiers of overt watermarks (WM0-2) are all optional, even if a given ebook doesn't appear to have them, it would of course still nonetheless be a sign of utmost prudence for one to check for the presence of the covert watermark tiers (WM3-6). In other words, just because an ebook may not have the initial set of overt watermarks, this should not be taken to mean that it has none of the subsequent covert watermarks.
~~~~~~~~~~~~~~~~~

The ex libris watermark is an image file, albeit one found not in ../Images/, where one would expect, but rather in ../Text/exlibris*.png. The ex libris watermark image here consists of the Verso 'V' logo, with the buyer name and email superimposed over the logo as part of the customised image.

Said ex libris watermark image is called from ../Text/Cover*.xhtml:

---

Ex
Libris

---

Said image is also referenced in ../content.opf:

---
---

~~~~~~~~~~~~~~~~~
Paigey the Book Pirate says: When changing filenames and/or moving/deleting files, always be sure to change all corresponding references to them as well, as otherwise not only will links not work, but the anonymity you so desperately seek will be compromised! I hear Sigil is a good tool for this which changes references automatically for you when you change filenames!
~~~~~~~~~~~~~~~~~

We'll come back to that pesky wildcard placeholder (*) in the discussion of WM3, but for now it would behove one to simply listen to Paigey.

If one had mind to eliminate WM0, one could then simply delete the contaminant (exlibris*.png) and remove the aforementioned references to it from Cover*.xhtml and content.opf.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
[WM1] -- Disclaimer Page Watermark
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The disclaimer page watermark is optional {8}; however, Verso ebooks appear to employ it.

{8} "With every order fulfilment, BooXtream(R) needs the customer name, customer email address and an order-id (supplied by the shop). BooXtream(R) encodes this as a series of redundant digital watermarks and also adds visible, personalised information for the end user into the ePub file. All visible and personalised information is optional and can be customised:
"[...]
"- The last page contains a disclaimer and logo, and has a corresponding entry in the table of contents".

The disclaimer page watermark is an XHTML file, albeit one found not in ../Text/, where one would expect, but rather in ../disclaimer*.xhtml.

The Verso disclaimer boilerplate is as follows:

---
Verso ebook license

This ebook was sold to $BuyerName, $BuyerEmail on $SaleDate0.

Verso ebooks are free of Digital Rights Management (DRM-free) but are subject to the terms of this license. You own this file once you've downloaded it, and you can use it on any of your devices.
It has visible and invisible watermarks, applied by Booxtream, which contain your name and email address. You are prohibited from uploading Verso ebooks to any website or file-sharing network, or in any other way making them available for distribution, sharing, copying, downloading, or reselling. Royalties from every sale will be paid to the author: if you're reading someone else's copy, then please buy your own license from Verso Books.

This eBook is licensed to $BuyerName, $BuyerEmail on $SaleDate1
---

Wherein $BuyerName is the name of the buyer of the ebook; $BuyerEmail is the email of the buyer of the ebook; $SaleDate0 is the date of purchase--or more accurately, the specific date the purchased copy of the ebook was generated, which will typically also be the date of purchase--in the format DD/MM/YYYY (numerical values for Day/Month/Year); $SaleDate1 is likewise the date of purchase, albeit in the format MM/DD/YYYY.

~~~~~~~~~~~~~~~~~
Paigey the Book Pirate says: Notice that $SaleDate0 is only utilised in the header of disclaimer*.xhtml; whenever the sale date watermark appears elsewhere, it always follows the format of $SaleDate1.
~~~~~~~~~~~~~~~~~

Said disclaimer page watermark is in turn called from ../toc.ncx:

---
This eBook is licensed to $BuyerName, $BuyerEmail on $SaleDate1
---

And is further referenced twice in ../content.opf, as:

---
---

and again as:

---
---

If one had mind to eliminate WM1, one could then simply delete the contaminant (disclaimer*.xhtml) and remove the aforementioned references to it from toc.ncx and content.opf.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
[WM2] -- Footer Watermarks
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The footer watermarks are optional {9}; however, Verso ebooks appear to employ them.

{9} "With every order fulfilment, BooXtream(R) needs the customer name, customer email address and an order-id (supplied by the shop).
BooXtream(R) encodes this as a series of redundant digital watermarks and also adds visible, personalised information for the end user into the ePub file. All visible and personalised information is optional and can be customised:
"[...]
"- Every chapter ends with a personalised footer text".

The textual footer page watermarks appear at the end of every XHTML file in the EPUB (therefore chiefly in ../Text/##_*.xhtml).

The main Verso footer boilerplate is as follows:

---
This eBook is licensed to $BuyerName, $BuyerEmail on $SaleDate1
---

Note that the code formatting surrounding the footer watermark may vary slightly, taking on the form of either something along the lines of:

---

This eBook is licensed to $BuyerName, $BuyerEmail on $SaleDate1

---

or:

---

This eBook is licensed to $BuyerName, $BuyerEmail on $SaleDate1

---

The takeaway here being the observation that the class attribute is not always specified.

A footer watermark additionally appears within the aforementioned WM1, namely in ../disclaimer*.xhtml, albeit matching one of the formatting variants of ../Text/##_*.xhtml:

---

This eBook is licensed to $BuyerName, $BuyerEmail on $SaleDate1

---

Finally, a footer watermark further appears in ../toc.ncx, alongside the aforementioned presence of WM1:

---
This eBook is licensed to $BuyerName, $BuyerEmail on $SaleDate1
---

If one had mind to eliminate WM2, one could then simply delete the contaminant (the footer text) from all infected *.xhtml files, as well as from ../toc.ncx.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
[WM3] -- Filename Watermarks
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

While BooXtream is quite forthcoming indeed about the afore-discussed overt tiers of optional watermarks, they are mysteriously vague about their covert tiers, merely coyly stating that: "The ePub ebook files contains [sic] visible personalisation and multiple invisible watermarks in all data files, without sacrificing compatibility. BooXtream(R) uses multiple realtime protection algorithms that encodes not only information about the publisher, but also about the customer and the web shop" {10}.

{10} .

Not to worry. Let's slice open this toy's belly and 'spill the beans'.

The first 'invisible' tier of watermarking is internal filename manipulation. Recall that in the brief discussion of filenames in the prior overview of WM0, a wildcard placeholder (*) was used to denote parts of the filenames, with the promise that this pesky wildcard would be returned to. That time has come. Let us now tame the wildcard.

All internal filenames of the files within the contaminated EPUB (save for mimetype, container.xml, content.opf, and toc.ncx) are appended with a watermark suffix which follows the actual filename (but precedes the file extension), using the following format:

---
$FileName$BuyerNameCombined$BuyerEmailCombined.$FileExtension
---

Wherein $FileName is the original unmodified name of the file, $BuyerNameCombined is the name of the buyer of the ebook with all spaces removed, $BuyerEmailCombined is the email of the buyer of the ebook with special characters such as '@' or '.' removed, and $FileExtension is the extension of the file.
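
The format above can be checked for with a few lines of Python. What follows is only a minimal sketch: the function name split_watermarked_name is hypothetical, and it assumes that the buyer name appears in the filename with its original letter case. It searches only for the space-stripped buyer name, and treats everything from that point up to the extension as the suffix.

```python
from pathlib import PurePosixPath


def split_watermarked_name(filename, buyer_name):
    """Return (original_stem, suffix, extension) if the space-stripped
    buyer name occurs in the file's stem, otherwise None.

    Hypothetical helper; assumes a case-sensitive match.
    """
    path = PurePosixPath(filename)
    combined_name = "".join(buyer_name.split())  # remove all whitespace
    stem = path.stem
    index = stem.find(combined_name)
    if not combined_name or index == -1:
        return None
    # Everything before the buyer name is taken to be the original stem.
    return stem[:index], stem[index:], path.suffix


# Illustrative call with made-up buyer details.
print(split_watermarked_name("Chapter01JaneRoejaneroeexamplecom.xhtml", "Jane Roe"))
```

A file whose stem does not contain the name (toc.ncx, for instance) simply yields None.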
For example, if the buyer's name is xxx yyy zzz and the buyer's email is aaa@bbb-ccc.nl, then Cover.xhtml becomes Coverxxxyyyzzzaaabbbccc.xhtml.

If one had mind to eliminate WM3, one could then simply truncate the contaminants (the filename watermarks) from all infected files, as well as the various references to them. One would do well to here remember Paigey's advice from the prior discussion of WM0 to use Sigil to streamline the renaming of both the filenames and the various corresponding references. For instance, if renaming a font file, Sigil would assist one in automatically renaming the corresponding references to said font in the accompanying CSS file (which, in turn, would also need to be renamed, as would references to that CSS file in the rest of the EPUB).

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
[WM4] -- Timestamp Fingerprinting
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

A prudent watermark analyst may have observed that while $BuyerNameCombined and $BuyerEmailCombined are present in WM3, an accompanying $SaleDate1Combined variable is missing, despite $SaleDate1's presence in WM1-2, wherein it accompanied $BuyerName and $BuyerEmail.

This is of course owing to the fact that, seeing as how the customised watermarked EPUB is generated upon the date and time of purchase (recall BooXtream's earlier revelatory bragging of utilising 'realtime' watermarking algorithms), each file's modification and creation timestamp data will thus correspond to the time that particular copy of the EPUB was purchased. Thus, the timestamp itself effectively here functions as a covert watermark, serving to facilitate the potential fingerprinting of the content buyer (or the 'traitor', to use forensic parlance).

For example, say the timestamp information for the files within a given EPUB is listed as 13/10/2016 07:00:05.
If the vendor checks the corresponding sale records for that ebook and notes that there was a single purchase on 13/10/2016 07:00:02, then that buyer may potentially be implicated, particularly if a pattern emerges identifying the same buyer across multiple ebook leaks.

If the aim is to avoid being fingerprinted, it is thus of the utmost importance to modify the timestamps of both the EPUB and all of the contents within (including both files and directories).

If one had mind to eliminate WM4, one could then simply modify one's system clock to a time/date of one's choice--either earlier or later than the time/date of purchase--and then open and subsequently save the EPUB anew using the ever-handy aforementioned Sigil utility. While using Sigil in tandem with system clock modification is the simplest way to modify timestamps, since one is likely to be using Sigil for other related tasks anyhow, one could nonetheless alternatively use the timestomp utility found within the Metasploit framework to alter timestamps without having to modify the system clock.

~~~~~~~~~~~~~~~~~
Paigey the Book Pirate says: If one were keen to decrease the chances of forensic analysis being able to detect that counter-forensic timestamp tampering had occurred, one would be sure to select both reasonable dates--say, neither years before the book was even published, nor those 30 years in the future--and realistic timelines--the file modification timestamps should not be any earlier than the file creation timestamps, for instance.
~~~~~~~~~~~~~~~~~

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
[WM5] -- CSS Watermark
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Aside from filename watermarking and timestamp fingerprinting, there is another potential tier of covert watermarking present in Verso BooXtream ebooks: that of a Cascading Style Sheet (CSS) watermark.
The potential CSS watermark appears at the end of the CSS template found in ../OEBPS/Styles/template*.css:

---
.boekstaaf { * }
---

~~~~~~~~~~~~~~~~~
Paigey the Book Pirate says: 'Boekstaaf' is a Dutch word historically meaning a stick with runes inscribed on it. The meaning then shifted to mean 'letter' (as in a letter of an alphabet or a rune) in its noun form, and later still to something akin to 'to record' or 'to write down' in its verb form. Thus the language choice employed by BooXtream, perhaps inadvertently, reveals the underlying theme which permeates textual watermarking: that of the book and its components, letters and all, being used to fulfil a function of recording and surveilling the reader. The letter, therefore, is here inextricably linked to the surveillant function of a record. How fitting then indeed it is for BooXtream to use this term to denote a potential watermark class, used to record who purchased the ebook.
~~~~~~~~~~~~~~~~~

This custom 'boekstaaf' class contains a number of varying CSS properties (such as 'text-decoration' and 'border-top-color') in varying orders with varying values.
For instance, a sample boekstaaf class in one copy of an ebook may appear thusly:

---
.boekstaaf { text-shadow: none; font-size: 10px; border-top-color: #323521; padding: 20px; display: none; background: #245132; color: #251660; border-bottom-color: #103032; vertical-align: super; margin: 4px; }
---

While in another purchased version of the ebook, the boekstaaf class may instead be defined as:

---
.boekstaaf { color: #508862; border-bottom-color: #419671; display: none; font-size: 7px; border-top-color: #043252; padding: 18px; background: #340715; margin: 14px; text-decoration: overline; text-indent: 14px; }
---

Thus the varying properties, combined with the varying order in which they are listed, and further combined with the varying values for each property may all be utilised in combination to form a unique fingerprint for each EPUB, effectively constituting a CSS watermarking scheme.

Also notable is the fact that the custom boekstaaf class does not appear to be actually referenced anywhere in the accompanying XHTML pages (or for that matter, anywhere else in the EPUB); it exists solely at the end of the template*.css file--perhaps to minimise the likelihood that it would chance to be noticed. The display property also appears to invariably be set to 'none', meaning that even if the class were to be invoked, the element would not be directly visible on the given page.

If one had mind to eliminate WM5, one could then simply delete the contaminant (the boekstaaf class) from the infected template*.css file (as the class is not called anywhere in the EPUB, its deletion does not adversely affect the layout of any of the ebook pages).

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
[WM6] -- Image Metadata Watermarks
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Aside from filename watermarking, timestamp fingerprinting, and CSS watermarking, our cellar scientists observed yet another tier of covert watermarking present in Verso BooXtream ebooks: that of image metadata watermarks.
All PNG and JPG images examined within contaminated ebooks in our sample set were found to contain metadata watermarks (other image formats were not available for analysis in our sample set). The watermarked images therefore appear predominantly in ../Images/*, though even WM0 (../Text/exlibris*.png) is watermarked.

To view JPG and PNG metadata watermarks, the images may be opened with either a dedicated metadata viewer and editing program such as ExifTool, or a hex editor application such as wxHexEditor.

In JPG images, the watermark appears in the ImageDescription tag of the image's EXIF (Exchangeable Image File Format) metadata, and looks something like this:

---
Image Description: [18 characters]=[20-24 characters]
---

For example, a sample ImageDescription value may appear as follows:

---
Image Description: 626F6F78747265616D=6E6F77617465726D61726B73
---

As previously mentioned, PNG images also possess a metadata watermark, albeit in a different form than that of JPG images. Specifically, in PNG images the watermark appears as TextualData in the tEXt text chunk field. For example, a sample tEXt chunk value may appear as follows:

---
tEXt: 626F6F78747265616D:6675636B73766572736F
---

~~~~~~~~~~~~~~~~~
Paigey the Book Pirate says: '626F6F78747265616D' is a string which just so happens to appear at the start of all image metadata watermarks in all Verso/BooXtream ebooks that were analysed as part of our sample set. Thus, this value appears to be constant--with the second value (that following the '=' or ':') being the variable one which changes for each copy of an ebook. When '626F6F78747265616D' is converted from hexadecimal to ASCII characters, it reads 'booxtream'.
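
The hexadecimal-to-ASCII conversion can be verified with a short Python sketch. The function name split_metadata_watermark is hypothetical, and the sketch assumes the field has the two-part shape seen in the samples above: a hexadecimal prefix, then '=' or ':', then a payload. It only parses a string that has already been read out of the metadata; it does not open or alter any image.

```python
import re


def split_metadata_watermark(value):
    """Split 'HEXPREFIX=PAYLOAD' or 'HEXPREFIX:PAYLOAD' into
    (decoded_prefix, payload). Returns None if the shape does not match.

    Hypothetical helper; the payload is returned untouched because
    nothing guarantees that it decodes to readable text.
    """
    match = re.fullmatch(r"([0-9A-Fa-f]+)[=:](.+)", value.strip())
    if match is None:
        return None
    prefix_hex, payload = match.groups()
    try:
        prefix_text = bytes.fromhex(prefix_hex).decode("ascii")
    except ValueError:  # odd-length hex, or bytes outside ASCII
        prefix_text = None
    return prefix_text, payload


# The JPG sample value from above.
print(split_metadata_watermark("626F6F78747265616D=6E6F77617465726D61726B73"))
```

Only the constant prefix is decoded, since that is the part observed to stay the same across copies.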
~~~~~~~~~~~~~~~~~

If one had mind to eliminate WM6, one could then simply delete the contaminant (the image metadata watermark) from the infected *.jpg files by running the following ExifTool command, which will delete all JPG image metadata and replace the original infected files with healthy versions, like so:

---
exiftool *.jpg -all= -overwrite_original
---

As ExifTool does not readily deal with the manipulation of the here pertinent PNG metadata, our cellar scientists instead prescribe the following command line remedy to delete the corresponding contaminant from infected *.png files:

---
cat infected.png | sng | sed '/[a-z] {/,/}/d' | sng > healed.png
---

Alternatively, one could simply delete the watermark from the PNG images (as well as from the JPGs) by using a hex editor. Yet another alternative would entail opening the PNG in an image editing application and saving it anew (this procedure should, however, not be utilised for JPG images as they are not lossless like PNGs, and as such the new JPG image would result in not just desirable metadata loss, but also in undesirable quality loss).

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
PARTING SHOTS
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

It's a safe bet that when the Verso and BooXtream bioterrorists read over this communique, they--mad as a cut snake--will then attempt to obfuscate and otherwise modify their watermarking schema in vain attempts to develop tamper-resistant watermarking strains. It then follows that the specifics outlined herein (e.g. exact file locations and directory paths, watermark code samples, and so on) will become obsolete fairly quickly. But that's okay, because that is precisely why this communique should not be approached as a set of discrete tactics, but instead as a particular manifestation of continuously adaptive strategies of subversion.
Each individual ebook should be thoroughly scrutinised, not only for the various tiers of overt (ex libris image, disclaimer, footer) and covert (filename, timestamp, CSS, image metadata) watermarking outlined and examined herein, but for other potentially even more pernicious watermarking stratagems that may be deployed by an adversary (such as line, word, and character shifting, as well as other spacing-based watermarking; F5, Least Significant Bit (LSB) and other forms of image steganography; natural language watermarking; and so on...).

In other words, even if it will lead to Verso/BooXtream changing their modi operandi, the communique will remain advantageous both due to the fact that it may still be utilised to remove watermarks from Verso/BooXtream ebooks that have already been released under these old watermarking schemas, and further that it may inspire future remedies by helping to foster transferable dissective skills which may be applied to combat any newly-deployed methods of textual oppression--effectively serving to white-ant Verso/BooXtream content distribution tyranny, irrespective of their particular future watermarking permutations.

In closing, when dealing with watermark identification and removal, there is always a lingering fear that something may have been missed. Adversaries such as publishing conglomerates and peddlers of watermarking snake oil thrive on and seek to financially benefit from this fear, and thus we would like to here contrarily propose a Watermarking Quantification Theorem: the number of watermarks or watermark techniques an adversary will claim to have deployed will always be n+1, where n is the actual number of watermarks or watermark techniques present (or in a more generalised form: n+m, where m is any fictional addendum to the actual number of existent watermarks or watermark techniques).
Which is to say that it would of course be advantageous for BooXtream to claim there are not seven, but eight or even more watermarks present in their schema so as to instil fear, uncertainty, and doubt and therefore, in their venomous eyes, to ideally stymie the distribution of a given text.

One can soothe one's fear of this fiction by comparing multiple copies of an ebook against each other, rooting out each watermarked discrepancy one by one until all copies are identical and one is certain that there are no longer any remaining differences. Alternatively, or perhaps better still in tandem, one could always be sure to purchase ebooks with funds and from locations which cannot readily be linked to one's identity. Wouldn't it be a hoot if one were to, for instance, utilise the payment credentials belonging to a watermarking firm when making watermarked ebook purchases? ;)

Finally, our lab is always on the lookout for fresh cadavers to reanimate. Send contaminated samples for analysis to our cellar scientists at: ibi@sigaint.org. Patient confidentiality guaranteed.

---disseminate widely---