Jan 15th update: At the request of a few authors, we’ve disabled the links to the infringing copies in the Example section.
Book publishers frequently ask us how much online piracy is impacting their revenue. Today, with the release of the first study to quantify book piracy in the U.S., we’re pleased to announce new capabilities to help book publishers answer this critical question.
Key findings of the research are listed below, and the full study can be downloaded here (.pdf).
We plan to update and add more depth to these findings regularly as we expand our anti-piracy services. If you publish books or journals and wish to see how FairShare Guardian can protect your titles, please contact us.
Key Findings
Significant numbers of pirated book downloads are taking place online, representing potential losses of $2.75-3 billion, or roughly 10% of total United States book sales.
Across the 913 books included in the study, nearly 10,000 pirated copies of each title were available for free download.
The Business and Investing, Professional & Technical and Science genres have the largest potential lost sales per title.
Source: Attributor January, 2010
Examples
Listed below are examples of pirated titles and a link to just one of the 25 sites included in the study (Copies downloaded figures are as of 01/14/10).
Arts and Photography
Architect’s Drawings by Kendra Schank Smith, 10,010 copies downloaded
Biographies & Memoirs
Dreams from My Father by Barack Obama, 2,850 copies downloaded
Business & Investing
Freakonomics by Steven Levitt, 1,132 copies downloaded
Culinary and Hospitality
Mastering the Art of French Cooking by Julia Child, 659 copies downloaded
Fiction
The Girl with the Dragon Tattoo by Stieg Larsson, 1,732 copies downloaded
Bed of Roses by Nora Roberts, 1,156 copies downloaded
Angels and Demons by Dan Brown, 8,177 copies downloaded
Science
Advanced Calculus by Wilfred Kaplan, 3,526 copies downloaded
Molecular Biology of the Cell by Bruce Alberts, 3,584 copies downloaded
Methodology
The FairShare Guardian™ service monitored piracy for 913 popular books in categories representative of the industry across the top 25 one-click hosting sites starting in October 2009 for a period of 90 days. Download .pdf copy to view list of sites.
FairShare Guardian captured the number of successful downloads completed for each of the 913 titles as reported on four file hosting sites that make the download data available (4shared.com, scribd.com, wattpad.com and docstoc.com). Across these four sites, a total of 3.2 million downloads occurred.
Across the top 25 one-click hosting sites, a total download figure of over 9 million copies was projected using the 36.4% share of one-click hosting sites that the four above-mentioned sites represent. Download .pdf copy to view share figures.
The retail value of these 9 million copies was calculated to reach $380 million. Each book’s retail price and category/genre information was collected from Amazon.com.
The 913 titles in this study represent works from publishers totaling 13.5% of the U.S. book publishing market. Projecting this $380 million value to the entire industry results in a total potential piracy figure of $2.8 billion.
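The two projections above are simple scale-ups, and can be sketched in a few lines of Python. This is only an illustration using the rounded figures quoted in this post; the variable names are ours, and the full study presumably worked from unrounded data.

```python
# Rounded inputs as quoted in the methodology (not the study's raw data).
observed_downloads = 3_200_000   # downloads reported by the four sites
four_site_share = 0.364          # their share of the top 25 one-click hosting sites
retail_value_sample = 380e6      # USD retail value of the projected copies
market_coverage = 0.135          # share of the U.S. book market behind the 913 titles

# Scale the observed downloads up to all 25 sites.
projected_downloads = observed_downloads / four_site_share

# Scale the sample's retail value up to the whole industry.
industry_value = retail_value_sample / market_coverage

print(f"Projected downloads: {projected_downloads / 1e6:.1f} million")
print(f"Industry-wide value: ${industry_value / 1e9:.1f} billion")
```

With these rounded inputs the download projection comes out at roughly 8.8 million, a little under the "over 9 million" quoted above, which suggests the published figure was computed from more precise numbers. The industry-wide value comes out at about $2.8 billion, matching the text.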
This study does not attempt to answer the question, “How many of these pirated books would have been purchased legally if piracy were not an option?” Previous piracy studies assume a one-to-one substitution, meaning all pirated material would have been purchased and thus the market value of pirated books is equal to the actual loss, though Attributor feels this is an overly optimistic assumption. This issue will be addressed in a future research phase.
The Google ad serving juggernaut appears secure even if Microsoft and Yahoo agree on a deal to merge their search advertising businesses. Our analysis of ad server calls across 75 million domains shows that both Yahoo and Microsoft lost significant share compared to our March report and now make up less than 15% of the total market.
The explosion of new ad networks in 2008 had an impact, but only one newcomer – Revenue Science – broke into the top 5. Unsurprisingly, Google’s AdSense is dominant on smaller sites while Google-owned DoubleClick leads on larger sites; however, AdSense made great strides on larger sites, leapfrogging Yahoo into 2nd place.
Read on for more details including new breakouts by content category. If you are a publisher looking to tap into the viral syndication of your content, please contact us for a free report on how your content is monetized across the Web.
Key Findings
DoubleClick and AdSense continue to dominate ad server market share, capturing 31% and 26% of Unique Users, respectively.
DoubleClick continues to own the “Head” while AdSense owns the “Tail”.
For larger sites with over 1MM Monthly Unique Users, DoubleClick has a 36% share, a nearly 3x share advantage over its nearest non-Google competitor, Yahoo. Together, AdSense and DoubleClick capture 54% of this segment.
For smaller sites with fewer than 1MM Monthly Unique Users, AdSense has a 38% share, a more than 7x share advantage over its nearest competitor, ValueClick. Together, AdSense and DoubleClick capture 60% of this segment.
Share positions for the ‘Big 3’ – DoubleClick, AdSense and Yahoo – vary widely by vertical.
DoubleClick share is strongest on Automotive sites (58%), whereas AdSense rules Blog sites (40%). Yahoo is strongest on Health sites (13%).
AdSense has gained 2 points of share on large sites while Yahoo has lost 4 points since the March report.
Overall Ad Server Market Share
Ad Server Market Share Breakdown by Site Traffic
Ad Server Market Share Breakdown for Selected Content Categories
Methodology
Attributor analyzed the ad server calls across 75 million domains as part of its October 2008 crawling operations. For a refresher on the difference between an Ad Server and an Ad Network, there is a good description here. The data was then combined with compete.com’s October 2008 traffic data and its vertical site category classification to provide the unique user and content category breakdowns. Each share total represents the sum of all ad networks owned by each company, with Google as the exception, for which DoubleClick and AdSense are displayed separately. For example, Atlas DMT share is counted within Microsoft share numbers and Advertising.com is included in AOL share numbers.
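The roll-up rule described above (sum each company's networks, but keep DoubleClick and AdSense separate) might look something like the sketch below. The mapping table, the "Other Microsoft network" entry and the numbers are hypothetical placeholders for illustration, not data from the report.

```python
from collections import defaultdict

# Hypothetical mapping from ad server to the group it is reported under.
# The Atlas DMT and Advertising.com roll-ups come from the text; Google is
# the exception, so DoubleClick and AdSense stay separate.
REPORTING_GROUP = {
    "Atlas DMT": "Microsoft",
    "Other Microsoft network": "Microsoft",  # placeholder name
    "Advertising.com": "AOL",
    "DoubleClick": "DoubleClick",
    "AdSense": "AdSense",
}

def roll_up(users_by_server):
    """Sum unique users per reporting group; unmapped servers keep their own name."""
    totals = defaultdict(int)
    for server, users in users_by_server.items():
        totals[REPORTING_GROUP.get(server, server)] += users
    return dict(totals)

# Made-up unique-user counts, for illustration only.
sample = {
    "Atlas DMT": 40,
    "Other Microsoft network": 60,
    "Advertising.com": 25,
    "DoubleClick": 310,
    "AdSense": 260,
}
rolled = roll_up(sample)
print(rolled)
```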
*** Update 12/19/08 ***
After speaking with the folks at OpenX, a popular free ad server for publishers, we discovered that we were not detecting their share correctly. We are working with them to fix this for our next release.
If you’re a publisher, recent months have not been kind. Large brand advertisers are pulling back on their spending, and depending on which analyst is talking, Internet advertising has either slowed or started to decline.
Logically, publishers are hunkering down and looking for ways to incrementally boost the audience and CPMs from their own sites.
We can help you achieve revenue growth by tapping into the audience that is already viewing your virally syndicated articles on other sites. For the first time, you can quantify your off-site audience and, by identifying how your content is monetized, access an incremental revenue stream that averaged over $150,000 annually for each publisher in our study.
See below for our key findings – for a full copy of the study including FAQs on the methodology, please download the .pdf
What we did:
Loaded the full feeds from over 100 publisher sites across a variety of content categories.
Found web-wide reuse across 30 Billion pages during Sept. ‘08, discarding identifiable licensed copies.
Eliminated any pages in which the reuse was less than 50% and less than 125 words of the original article.
Calculated the off-site page view opportunity using estimate data provided by Compete.com.
Calculated the audience multiplier for each publisher site and for 10 top categories. This is the audience opportunity expressed as a multiple of the page views on the destination site.
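As a rough illustration of the filtering and multiplier steps above, here is a minimal Python sketch. The `Copy` record, the field names and the sample numbers are hypothetical, and the filter follows one literal reading of the rule (a page is dropped only when it falls below both thresholds); the actual study may have applied it differently.

```python
from dataclasses import dataclass

@dataclass
class Copy:
    fraction_reused: float   # share of the original article found on the page (0 to 1)
    words_reused: int        # number of matching words
    monthly_page_views: int  # estimated views of the copying page

def is_substantial(copy: Copy) -> bool:
    """Keep a page unless it is below BOTH the 50% and the 125-word thresholds."""
    return not (copy.fraction_reused < 0.50 and copy.words_reused < 125)

def audience_multiplier(copies, destination_page_views: int) -> float:
    """Off-site page views expressed as a multiple of the destination site's page views."""
    off_site = sum(c.monthly_page_views for c in copies if is_substantial(c))
    return off_site / destination_page_views

# Made-up example: the second copy is filtered out (20% reuse and 80 words).
copies = [
    Copy(0.90, 400, 60_000),
    Copy(0.20, 80, 500_000),
    Copy(0.30, 200, 90_000),
]
multiplier = audience_multiplier(copies, destination_page_views=100_000)
print(multiplier)
```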
What we found:
Across all sites in the study, publishers have an untapped off-site audience that is nearly 1.5 times the size of the audience that visits their destination site.
While each category has incremental opportunity, the auto and travel content categories have the largest off-site audience.
Using a cpm of $1, 42 percent of publishers studied are missing out on up to $50k in annual ad revenue; 33 percent are missing out on up to $250k in annual ad revenue and 25 percent are missing out on more than $250k in annual ad revenue from off-site content.
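To make the revenue bands above concrete: a CPM is the price per 1,000 page views, so at $1 it takes 50 million off-site page views a year to reach $50k. The sketch below shows the arithmetic; the function names and the example traffic figure are ours, not taken from the study.

```python
def annual_ad_revenue(monthly_off_site_page_views, cpm=1.0):
    """Annual revenue in USD, where CPM is the price per 1,000 page views."""
    return monthly_off_site_page_views * 12 / 1000 * cpm

def revenue_band(annual_revenue):
    """Place an annual figure into the three bands used in the findings."""
    if annual_revenue <= 50_000:
        return "up to $50k"
    if annual_revenue <= 250_000:
        return "up to $250k"
    return "more than $250k"

# Hypothetical publisher with 10 million off-site page views per month.
example_revenue = annual_ad_revenue(10_000_000)
print(example_revenue, revenue_band(example_revenue))
```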
Want to find out your TrueAudience™?
It’s very simple (and free). Fill out a short form. One of our team members will contact you for your RSS feed and a list of licensees, after which Attributor does all the work. You can expect to receive your free report in about a week.
Just in time for tonight’s debates, our Web-wide tracking study shows that John McCain has pulled ahead online. For the first time since we started tracking the campaign in mid-July, McCain’s messages are viewed by a larger audience than Obama’s.
Please see our posts here and here to review our previous findings.
Here are our key findings (through September 24th)
Thanks to a huge post-convention bounce, McCain took the lead over Obama in overall web traffic for the first time. The online audience viewing McCain’s messages grew 61% faster than Obama’s since the conventions.
Palin’s convention speech has been viewed by an estimated 18,000,000 online viewers.
The economy is the most copied issue for both candidates. Iraq has fallen out of McCain’s top 5 issues for the first time in the campaign.
While Obama still leads in the blogosphere, his lead has shrunk nearly 43% since the conventions.
True or False? Attributor Investigates Election 2008’s Conventional Wisdom
Conventional Wisdom: TV viewership follows the same pattern as views of online content.
Attributor says . . . False
While TV is still king in terms of total audience, the web is substantial and shows different results. Online, the McCain and Palin speeches combined attracted an additional 35 million views of the speech text after the convention. Palin’s speech alone garnered almost 20 million versus Obama’s 11 million.
Conventional Wisdom: By virtue of his appeal to a younger demographic, Obama is reaching a larger audience online.
Attributor says . . . False. For the first time, McCain has surpassed Obama online.
McCain’s huge boost from his and Palin’s convention speeches now places him above Obama. Since the convention, McCain’s online audience grew 61% faster than Obama’s.
Conventional Wisdom: Americans discount negative campaigning as “politics” and don’t pay much attention to it.
Attributor says . . . False
The rapidly increasing online negative campaigning on McCain’s behalf is paying off. People are reading the attacks, and in significant numbers. In September, almost 5 million people viewed attacks on Obama, compared to the roughly 3 million people who viewed attacks on McCain, a ratio of almost 2:1.
Tracking the war of words between John McCain and Barack Obama enables us for the first time to measure which messages are picked up the most across the Web and which position statements are attracting the largest audiences.
Obama’s messages continue to draw a larger online audience, but McCain closed the gap by almost 10% in the last two weeks. Obama’s messages received an estimated 38 million page views, compared to 36 million for McCain. Two factors appear to be driving McCain’s comeback.
McCain surged in the blogosphere, which has been an area of Obama’s strength. Almost 350 new bloggers picked up McCain’s message, a 30% advantage over Obama. This translated into a 2:1 blog page view advantage across U.S. visitors over the last two weeks and ate into Obama’s overall audience lead in blogs.
McCain’s negative campaigning appears to be paying off. Attacks on Obama received an estimated 2.8 million page views August-to-date – almost 3x as many as attacks on McCain over the same time period.
Despite allegations of media favoritism, McCain’s words are featured 80% more often on news sites than Obama’s. McCain beats Obama by almost 3 to 1 on the major networks’ websites (Fox, NBC, CBS, ABC, and CNN).
McCain’s messages continue to be picked up more from his position statements (55%) than his speeches (45%). Obama’s messages are picked up more often through his speeches (67%) than his position statements (33%).
Each candidate’s position on the economy is now the most widely read across the web, with Obama’s economic position leapfrogging his Iraq policy for the first time.
In just 3 weeks, Obama’s Berlin speech has gone viral and has been viewed an estimated 2 million times – off of Obama’s official site.
True or False? Attributor Investigates Election 2008’s Conventional Wisdom
Conventional Wisdom: The news media is in love with Barack Obama and gives his messages a disproportionate amount of coverage.
Attributor says . . . False
McCain continues to trounce Obama on major news site coverage. He’s out-matching Obama almost 3 to 1 on the major networks’ websites (Fox, NBC, CBS, ABC, and CNN). When local and national newspaper and magazine websites are factored in, McCain still outdoes Obama almost 2 to 1.
Conventional Wisdom: Obama and McCain are attacking each other with negative messages at the same rate.
Attributor says . . . False
Within the last two weeks, criticism of Obama has surged, with his words being attacked more than twice as much as McCain’s. An “attack match” represents an issue or speech that is matched on a site known to be unfriendly to the candidate. We matched McCain against the 50 most trafficked liberal sites and Obama against the 50 most trafficked conservative sites to get these numbers.
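The “attack match” rule can be sketched as a simple lookup. The site lists below are placeholders (the 50 sites on each side are not named in this post), and the function names and sample matches are hypothetical.

```python
# Placeholder site lists; the study used the 50 most trafficked liberal and
# conservative sites, which are not listed in the text.
LIBERAL_SITES = {"liberal-example.org"}
CONSERVATIVE_SITES = {"conservative-example.org"}

# Each candidate's messages count as attacked when found on the opposing side's sites.
UNFRIENDLY = {"McCain": LIBERAL_SITES, "Obama": CONSERVATIVE_SITES}

def is_attack_match(candidate, site):
    """True when a candidate's words are matched on a site unfriendly to that candidate."""
    return site in UNFRIENDLY[candidate]

def attack_views(matches):
    """Sum page views of attack matches; matches is an iterable of (candidate, site, views)."""
    totals = {"McCain": 0, "Obama": 0}
    for candidate, site, views in matches:
        if is_attack_match(candidate, site):
            totals[candidate] += views
    return totals

# Made-up matches, for illustration only.
sample_matches = [
    ("Obama", "conservative-example.org", 700),
    ("Obama", "liberal-example.org", 900),   # friendly site, not an attack match
    ("McCain", "liberal-example.org", 300),
]
attack_totals = attack_views(sample_matches)
print(attack_totals)
```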
Conventional Wisdom: Americans discount negative campaigning as “politics” and don’t pay much attention to it.
Attributor says . . . False
The increased negative campaigning on McCain’s behalf appears to be paying off. Americans are reading the attacks in significant numbers. Month-to-date, there were an estimated 2.8M views of attacks on Obama compared to 1.1M views of attacks on McCain.
Conventional Wisdom: By virtue of his appeal to a younger demographic, Obama is reaching a larger audience online.
Attributor says . . . True – but it’s getting much closer
Obama’s speeches and position statements continue to attract a larger online audience due to his strength in the blogosphere. McCain’s recent gains in blogs are aiding his comeback and are likely aligned with his more aggressive online blog tactics.
The battle between John McCain and Barack Obama is a war of words. What makes this election different is how far and fast those words can go. The Internet accelerates the reach of the candidates’ messages through online news media, and blogging amplifies the voices of both sides’ supporters. Attributor is tracking how McCain and Obama’s messages move across the Internet and, for the first time, has measured the size of the online audience reading these messages.
This is the first of a series of regular insights Attributor will be sharing on the election between now and November 4th. Download a .pdf of the full study here.
What we did:
Used the Attributor platform to capture each candidate’s key issues and speeches as stated on their official web sites: www.johnmccain.com and www.barackobama.com.
Tracked the distribution of each candidate’s words across 25 Billion+ pages, including blogs and social networks, looking for unique web pages containing matches of their speeches and official position statements.
Analyzed and categorized the individual sites and pages containing the candidates’ messages.
Using Attributor’s Audience Finder™ Technology, estimated the reach of each candidate’s messages across the Internet.
Key Findings:
McCain is holding his own online. The overall audience viewing each candidate’s message was virtually the same, with each totaling around 65 million page views.
McCain’s campaign messages are more likely to be picked up from his position statements (64%) than his speeches (36%). Obama’s messages are picked up more often through his speeches (71%) than his position statements (29%).
Despite allegations of media favoritism, McCain’s words are featured 112% more often on news sites than Obama’s. In July there were over 20 million views of McCain’s messages on news sites.
Obama is dominating the blogosphere with over 12 million views of his messages occurring on blogs in July. Twice as many bloggers repeat his messages as repeat those of McCain. Obama owns a similar advantage on social networking sites like Facebook and MySpace.
True or False? Attributor Investigates Election 2008’s Conventional Wisdom
Conventional Wisdom: The news media is in love with Barack Obama and gives his messages a disproportionate amount of coverage.
Attributor says . . . False
Perhaps this is true when it comes to television or print. But when it comes to coverage of each candidate’s position statements or speeches, McCain is trouncing Obama. His lead on major network web sites (Fox, NBC, CBS, ABC and CNN) is almost 4 to 1.
Conventional Wisdom: Obama’s youthful support translates into higher support across social networks.
Attributor says . . . True
Across social networking sites like Facebook and MySpace, Obama’s messages are picked up by more than 2 to 1.
Conventional Wisdom: Obama’s grassroots efforts have resulted in a substantial lead across the blogosphere.
Attributor says . . . True
Obama has an 86% edge in bloggers who incorporate his messages into blog posts. Obama’s Berlin speech fueled a 10% jump in commentary across the Blogosphere.
Conventional Wisdom: Obama’s message is best delivered through speeches, while McCain’s strength is with his position on issues.
Attributor says . . . True
Obama’s speeches are being picked up on the Web much more than his position statements. The opposite is true for McCain.
Conventional Wisdom: By virtue of his appeal to a younger demographic, Obama is reaching a larger audience online.
Attributor says . . . False – it’s virtually a dead heat.
Our Audience Finder technology shows that Obama’s advantage across blogs and social media gives him a slight advantage over McCain overall. McCain is staying close via his reach on traditional news sites.
In July, Obama’s messages reached slightly over 65 million page views whereas McCain’s estimated page views were just short of 65 million.
Obama’s reach across the blogosphere was more than double McCain’s in July.
McCain enjoyed an 86% reach advantage on Web sites in July.
Stay tuned for regular insights on the 2008 Election from the Attributor blog.
Last week’s actions prove that publishers want a piece of the online advertising pie. After analyzing content monetization across 68 million domains, it’s clear that publishers have a huge opportunity to collect revenue directly from ad networks. (If you are like me and need a refresher on the difference between an Ad Server and an Ad Network, there is a good description here.)
What we did:
We analyzed the ad-server calls across 68 million domains captured from our January, 2008 crawling operations. The data was joined with January, 2008 unique user data from our friends at Compete to determine market share numbers.
What we found:
DoubleClick and Google dominate overall market share, capturing 35% and 34% of unique users, respectively.
DoubleClick owns the head and Google owns the tail. For sites with over 1MM monthly unique users, DoubleClick has a 48% share, a 3x advantage over 2nd place Yahoo. For sites with fewer than 100k monthly unique users, Google has an 8x share advantage over 2nd place MSN.
Professionally produced content is widely proliferated across highly trafficked, commercial sites, representing an untapped opportunity for publishers to increase their revenue through content licensing, ad revenue share or link-building.
Conclusions:
The GoogleClick combination is an ad-serving juggernaut. They should be at the top of your call list to collect a percentage of every ad dollar made off your content.
Content is proliferating all over the place – Attributor finds an average of 20 different copies for each article we track.
There is a lot of money at stake. 64% of the copies have ads on their pages and most republishing is on sites with > 1MM monthly unique users.
It’s an SEO goldmine. 57% of the copies we find do not link back to the original sites.
Stay tuned for regular reports on the pace at which articles, images and videos are spreading across the web and implications for the online content economy.
Methodology notes: This report represents a snapshot of ad server distribution in January, 2008 across 68 million domains. Fewer than 5% of the domains contained more than one ad server call – in these cases, the traffic for the domain was associated with each ad network found. We did not attempt to de-duplicate the unique user numbers.
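For readers who want to see how the multi-server rule plays out, here is a minimal Python sketch. The input shape, the function name and the sample numbers are hypothetical, and normalizing by the total attributed traffic is our assumption; the notes above do not say which denominator was used.

```python
from collections import defaultdict

def ad_server_share(domains):
    """domains: list of (unique_users, [ad servers detected on the domain])."""
    attributed = defaultdict(int)
    for unique_users, servers in domains:
        # A domain calling several ad servers counts in full toward each one;
        # no de-duplication of unique users is attempted.
        for server in set(servers):
            attributed[server] += unique_users
    # Assumption: shares are expressed against the total attributed traffic.
    total = sum(attributed.values())
    return {server: users / total for server, users in attributed.items()}

# Made-up domains, for illustration only. The third one calls two ad servers.
sample_domains = [
    (1000, ["DoubleClick"]),
    (600, ["Google"]),
    (400, ["DoubleClick", "Google"]),
]
shares = ad_server_share(sample_domains)
print(shares)
```

Because the third domain is counted twice, the attributed total (2,400) is larger than the 2,000 users actually present, which is the double counting the methodology notes acknowledge.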
If you’re a celebrity, having your image copied across the web may be a good thing – people are talking about you and reinforcing this conversation with an image that, in most cases, puts you in a flattering context. So we thought it would be fun to look at celebrity images as a means to showcase Attributor’s web-wide monitoring capabilities and the opportunities this visibility uncovers.
This study does not attempt to make light of the issues that photographers face when confronted with unauthorized use of their work – if Lane Hartwell’s images are proliferating at even a fraction of the rate of the images on Maxim’s, FHM’s and People’s hot lists, there is an obvious impact on her business.
Attributor found 2,547 copies of the images across the web.
Problem or Opportunity?
There is plenty of evidence to suggest an untapped opportunity for publishers. The facts:
Are the copying sites commercial? Yes, a whopping 73% of the copying sites had ads on their pages.
How much traffic did these sites receive? According to our friends at Compete.com, about a third of the sites containing copies of the images were visited by more than 50k people in December, 2007.
Are any of the copying sites linking back to the original site? Very few – only 13% of the copies found linked back to the original or related celebrity site.
How do copies of the images rank in search engines? Very high. In fact, of the top 10 females, a copy outranked the original image in Google search results 100% of the time.
Implications for Publishers and Content Creators
Opportunities abound to harness value from your content as it leaves your site.
First, incremental revenue through new licenses of commercial image usage is available and ready for the taking. With web-wide visibility, finding new leads and billing existing licensees gets a lot easier.
Securing links to drive increased traffic is another untapped opportunity. Link building is the backbone of SEO best practices – using Attributor, you can now increase traffic on your destination site by securing links.
Lastly, the findings are another reminder of images’ viral potential, waiting to be propelled by new viral content strategies. Implementing, measuring and optimizing these strategies requires web-wide, contextual visibility of where your content appears.
And the award goes to…Megan Fox! Best known for her heroics against the Decepticons in the movie Transformers, Megan Fox had the most copied celebrity image on the Web out of all the images of the women on Maxim’s “2007 Hot 100” and FHM’s “100 Sexiest Women 2007”, despite her official ranking of 18th and 65th on those lists, respectively.
Here’s what we did:
First, we found lists of the hottest women in 2007 from FHM and Maxim, and lists of the hottest men in 2007 from People’s “Hottest Bachelors 2007” and People’s “Sexiest Man Alive 2007”
Next, we used Attributor’s image monitoring platform to scan the web for copies of the images.
Finally, we reviewed the results, tallied the number of times each image was copied, painfully sorted through thousands of pictures of beautiful women, and categorized the type of site doing the copying.
The Results:
The top five most copied female celebrity images on the web, with the FHM and Maxim rankings shown in parentheses, are:
Megan Fox (65, 18)
Jessica Alba (1, 2)
Rihanna (Not Ranked, 8)
Halle Berry (16, 55)
Lindsay Lohan (41, 1)
Let’s not forget the men. Matt Damon will undoubtedly be pleased that he led the list of the most copied male celebrity images across both the People “Sexiest Man Alive 2007” and “Hottest Bachelors 2007” lists. His married status did not dampen Web users’ enthusiasm for his photo at all. Bachelor Matthew McConaughey, the cover photo for the “Hottest Bachelors” list, made a respectable runner-up in terms of his well-copied image.
The top five most copied male celebrity images from both People lists with official magazine rankings in parentheses are:
Matt Damon
Matthew McConaughey
Patrick Dempsey
James McAvoy
Jake Gyllenhaal
And where are these images being used most? You guessed it. Gossip sites – they represent 36% of all sites found, as publishing your own gossip site appears to be the new black. Here’s a breakdown of the sites where we found the images:
Gossip sites: 36%
Movie sites: 15%
Fan sites: 7%
Recognized domains that appear to be licensing the images: 7%
Splogs: 2%
Other: 33% (personal homepages/blogs, non-English sites)
Web advertising networks – which include those run by Google, Yahoo and MSN – do a great job presenting advertisements that are highly relevant to the content on any web page. The question is, how often does the revenue from those ads actually reach the folks who create and own the content?
Our recent study on music lyrics illustrates the magnitude of this issue very well. First some background – last April, Yahoo Music partnered with Gracenote and became the first site to publish “official” song lyrics. USA Today reported that Yahoo shares with the copyright holders the revenue from the ads that will be displayed alongside the lyrics. Just last week, MTV and AOL announced that they would also promote official lyrics on their web sites.
Why so much attention to song lyrics? It all comes down to Search. According to an Ask.com study, the term “song lyrics” was the 6th most popular search query last year.
What we did:
Loaded lyrics from the following 14 songs into Attributor in mid-September: Umbrella (Rihanna), Before He Cheats (Carrie Underwood), Big Girls Don’t Cry (Fergie), Bleed It Out (Linkin Park), Beautiful Girls (Sean Kingston), You Can’t Stop the Beat (Hairspray Soundtrack), Can’t Tell Me Nothing (Kanye West), The Pretender (Foo Fighters), Stronger (Kanye West), Shawty (Plies), I Get Money (50 Cent), Let It Go (Keyshia Cole), Ayo Technology (50 Cent) and Good Life (Kanye West).
The service then scanned billions of pages across the web to find copies of the songs.
For each song, we compared the search engine ranking of Yahoo Music’s “official” version with the copies on the Google and Yahoo search engines.
What we found:
1,524 nearly exact copies across 300 different sites
57% of the copies had ads on the pages
None of the copies contained links back to the official version at Yahoo Music
100% of Google searches ranked the copying site higher than the official version when searching with terms of the form “Song + lyrics” (e.g., “stronger lyrics”)
81% of Yahoo Searches ranked the copying site higher than the official version using the same search terms
To view the entire study and find out how much more Kanye West’s new album was copied than 50 Cent’s new release, please download the .pdf
So what can newspapers, magazines and writers do to capture full value for their original content? The first step is understanding how and where your content is being copied. With this information, you can decide how to act through Attributor:
Request a link back to your original, improving your search engine ranking.
Ask the site to deposit a percentage of the revenue they make from your content into your AdSense account.
Send a formal DMCA takedown notice – we will ensure that it gets taken down from search engines.
Last week, Eileen Naughton, Google’s director of media platforms, told the American Magazine Conference, “Don’t fear Google”. With Google’s AdSense revenues surpassing $5 billion a year, “fear” is the wrong term. How about making Google and the other search engines accountable?