Archive for October, 2007

A link is worth 1,000 words

One of my favorite parts of my job involves hypothesizing about which types or genres of content might be copied frequently and then testing my theories live.

Diving into recipe copying felt very natural for me – I am an aspiring chef who has taken more amateur cooking classes than I’d like to admit. I love to try out new recipes on my unsuspecting family, and, unlike my mother’s generation, I don’t have to pore through unwieldy, batter-splattered cookbooks to find a new way to make chicken.

As it turns out, many of the recipes I find online are copies. Jennifer Guevin of CNET wrote an in-depth piece on our study of recipe duplication online, but here are the highlights.

What we did:

  • Loaded 37,000 publicly available recipes from Epicurious.com, Allrecipes.com and RachelRaymag.com.
  • Let Attributor scan for matches or copies of the recipes.
  • Reviewed the matches and used Attributor’s % copied sorting feature to eliminate those that seemed to be derivatives, rather than copies of the original.
  • Took a random sampling of the recipes and plugged the recipe titles into Google search.

What we found:

  • Over 10,000 copies of the recipes were spread over 3,000 different sites.
  • Most were almost word-for-word copies. Across all matches, the average % copied was over 70%.
  • 57% of the sites with copied recipes had ads on their pages.
  • 60% of the sites with copied recipes failed to link back to the original recipe site.
  • For over 50% of the recipes we put into Google search, the copied recipe had a higher search rank than the original.

What it means:

Recipe sites are definitely losing out on traffic. Using a conservative methodology that excludes the search engine impact of copies outranking the original, we estimate recipe sites are losing ~13% of their monthly traffic to recipe copying.

Will the recipe sites want to send DMCA takedown notices to the 3,000 sites copying their content? Probably not.

Might they want to ask each copying site to link back to the original recipe? I think so, especially once the recipe sites realize the impact these links would have on their search engine rankings. Links are a critical part of search engine rankings, and anyone who publishes original content needs to understand that.

Another month passes . . . without Google’s Claim Your Content

For those keeping track, Google announced in April they were very close to launching a new filtering system, dubbed “Claim Your Content”. The system would give content owners automated tools to identify copyrighted material for removal. Later, in July, an attorney representing Google said they were planning to roll out Claim Your Content in September.

The industry relaxed a bit. Bloggers rejoiced. Lawyers started to look for new sources of litigation.

It promised to be a great step for the online content economy – at last, the industry would have the transparency and accountability needed to sustain the incentives of those who create and publish valuable content.

Today is the first of October, and still no word from Mountain View. And while everyone waits, some point out that Google continues to profit from sites with unauthorized copies of original content.

Is Google delaying the launch to milk even more out of its immensely profitable search engine? I doubt it. A better explanation for the delay might be that Google has realized how hard it will be to get this right.

You might ask, what’s hard about anything for Google? Here are the six reasons I think it is particularly difficult for Google to do this right:

1. Removal across Google’s main index. Everyone focuses on YouTube’s responsibilities as a hosting site, but since Google is the world’s leading search engine, shouldn’t Google also scan its broader index and remove pages carrying that video, even when they are hosted on another site? That’s not an unreasonable expectation, particularly if – as at least one analyst believes – Google is already applying digital fingerprinting to content to improve its web indexing and eliminate duplication.

2. Removal from AdSense network. Google’s AdSense is the fuel that makes much of the online economy go. So if Google removes a video from YouTube but it shows up on another site that has AdSense ads, why shouldn’t the owner expect that Google would remove those ads – as Google’s own policy promises?

3. Removal across the Web. Google has a commanding lead with YouTube, but there are hundreds of thousands of sites that host or embed videos. Without a Web-wide solution, publishers will have no visibility into content popping up on the latest social network, blog or hosting site. Unless Google can make those content claims count across the Web, no individual site has much incentive to go legit, since doing so would just give the edge to less conscientious competitors.

4. A solution for all media types. Video may get all the press, but text is still the Internet’s navigational currency. The text on your site powers your ads and search rank, but text content also supports splogs and useless made-for-AdSense pages. Images make the Web worth viewing, yet nine out of 10 Web images may involve infringement. Can Google expect to bring all types of media together in universal search results, but only let you claim your content when it is a video?

5. Available to publishers of all sizes. I’m sure Google gets this one – they practically invented the “long tail” and realize that success is not just about satisfying Viacom and Disney. But that means that any publisher large or small must be able to stake their claim and have it count.

6. Independent and unbiased. For publishers to feel confident in a content claiming system, they must believe that it works without conflict of interest. Since Google controls and monetizes most search results and puts ads across more content pages than any other ad network, what extra steps must Google take to gain the confidence of publishers that their claims will always count, and that questions of fair use will be resolved objectively?

A long list for sure, but nobody said it would be easy. There’s no doubt Google believes in the potential of the online content economy – that’s why it paid $1.65 billion for YouTube. The question is: given Google’s unique role in the content economy, is it really in a position to make this work?

What do you think is a must-have for Claim Your Content? Any predictions for its eventual roll-out date?