Guide - Duplicate Content - Myths & Facts

madoctopus · Dec 9, 2010

I see a lot of people misunderstanding the whole duplicate content thing. Some say no such thing exists and that if you start an autoblog you will get rich; others swear that Google loves unique content and you have to write everything by hand. As usual, the truth is somewhere in between.

What exactly is duplicate content
This is where most people misunderstand the whole situation. Search engines define it as two or more pages on the same site that have identical or very similar content. Some people, though, put identical or very similar content that exists on different sites in the same category. For clarity, from here on we will call duplicate content on pages of the same site "duplicate" and duplicate content on different sites "non-original". By "non-unique" or "unique" I will refer to any content that is not unique (or is unique), regardless of whether it is on the same site or on different sites.

Quality is the reason
There is no penalty in either case. Penalty means punishment. Punishment would mean search engines see you have non-unique content, decide you've been a bad boy and spank you. That is not the case. The real reason search engines have an issue with non-unique content is that they want to offer quality in their SERPs. The purpose of a search engine is to give you answers to your search.

Let's take an example situation where you search for "digital camera". What are you really searching for? If you refined the search, you could make it any of the following:

buy digital camera
digital camera shopping
digital camera store
digital camera reviews
digital camera specs
best digital camera
compact digital camera
dslr digital camera

Now, the key is to determine user intent. You may be interested in information (specs, reviews, which is the best model), or perhaps you want to buy one (buy, shopping), or maybe you have a specific type of camera in mind (compact, dslr). If you search for one of the variations in that list, the SE knows better what you're looking for. If you search for "digital camera" however, the SE doesn't know what kind of information you want. It doesn't know much about your intent, besides the fact that it is related to digital cameras.

In order to make sure you get the result you were looking for, a SE will try to include all types of results in its SERPs. You will get a few online retailers like Amazon.com, a few review sites like reviews.cNet.com, generic sites like Wikipedia, some YouTube videos in case you prefer a video over reading, etc. Basically, the SE gives the user a comprehensive result.

Now, imagine you just searched for "digital camera" and the SE returned only stores. If you want to buy, that's perfect, but if you want to read reviews you think "WTF, I don't want this stuff, I am not ready to buy". That's the mildest form of getting results that are not useful. In a way it is like being served salad, then salad, then salad again instead of a steak with salad and then dessert. Imagine we take this a step further and instead of getting 10 shopping results with different information, prices, etc. you get 10 sites that are entirely identical to each other. What's the point of getting 10 different results if all the sites have the same price and give you the same information?!

This is exactly why a SE will do its best to give you results that are different from each other. Even if you search for a very specific long tail keyword like "Canon T2i Digital Camera Review", you don't want 10 results that point to different sites having the same information. You want to read more than one opinion about that particular digital camera.
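To make the "mixed results" idea concrete, here is a minimal sketch in Python. It is only an illustration of the reasoning above, not how any real search engine works: the `diversify` function, the `kind` labels and the example hosts are all made up for this post. It takes results that are already ranked and interleaves them by type, so a page of ten stores cannot happen.

```python
from collections import defaultdict
from itertools import zip_longest


def diversify(results, limit=10):
    """Interleave ranked results by type so no single type fills the page.

    `results` is a list of (url, kind) pairs, best first. The `kind`
    labels ("store", "review", ...) are hypothetical and only for illustration.
    """
    by_kind = defaultdict(list)
    for url, kind in results:
        by_kind[kind].append(url)

    mixed = []
    # Round-robin: take the best remaining result of each kind in turn.
    for row in zip_longest(*by_kind.values()):
        mixed.extend(url for url in row if url is not None)
    return mixed[:limit]


ranked = [
    ("store-a.example", "store"),
    ("store-b.example", "store"),
    ("store-c.example", "store"),
    ("reviews-a.example", "review"),
    ("wiki.example", "reference"),
]
print(diversify(ranked, limit=4))
```

With only stores at the top of the raw ranking, the review and reference results still make it into the first four positions.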

Duplicate content (on the same site)
Since a SE will only include one result from each site in its SERPs, if you have two or more pages with the same content on your site, it has to pick one and discard the others. Of course you can get a double listing, where the SE will show a second indented URL from your site. However, the important thing to note in this case is that the SE will show a second page only if it thinks it is a good complementary result to the first one. It will never show the URL of a duplicate page as a secondary indented result because it makes no sense.

What you have to do in this case is make sure that the SE will show the URL you want. You may get duplicate pages on your site because of the tag listings in the case of blogs, or because of dynamic GET variables (e.g. product_listing.php?min_price=100&max_price=200) which actually return the same items as a different set of variables (e.g. product_listing.php?min_price=0&max_price=300). Those are not the only ways to end up with duplicate content on your site. There are countless ways. I won't go into details about what you should do to avoid having duplicate content on your site.
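One way to handle the GET variable case is to decide on a single preferred URL per listing. The sketch below is just that, a sketch: it assumes the filter parameters (`min_price`, `max_price`, `sort`) never change what the page is about, and the `FILTER_PARAMS` set and `shop.example` host are hypothetical. It strips those parameters and sorts the rest, so every filtered variant maps to the same URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: parameters that only filter or sort a listing.
FILTER_PARAMS = {"min_price", "max_price", "sort"}


def canonical_url(url):
    """Drop filter/sort parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FILTER_PARAMS
    ]
    kept.sort()
    # The fragment is dropped as well; it never reaches the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("http://shop.example/product_listing.php?min_price=100&max_price=200"))
print(canonical_url("http://shop.example/product_listing.php?max_price=300&category=cameras&min_price=0"))
```

The result is the URL you would want the SE to show; the filtered variants can then point to it or be kept out of the index.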

There are many good guides about this topic, information architecture and usability. One thing I will mention, because many people disregard it, is that you are very likely to be better off with the tag listings marked "dofollow, noindex" (as a robots meta value, that is "noindex, follow"). Obviously this is valid generally for blogs. Also, it helps to add a description of about 100-150 words on your category pages or any listing pages, so even if you do end up with most of the posts in the listing being the same, that description will make the page a bit different. Most importantly, have different page titles (for pagination you can add a "page 2" after the title). As I said, I won't go more into it here. Information architecture is however a very important aspect when it comes to both SEO and usability.
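As a rough illustration of the two tips above (noindex on tag listings, distinct titles on paginated pages), here is a small Python helper. The `listing_head` name and its parameters are hypothetical; in a real blog this logic would live in your theme or a plugin. The only fixed part is the robots meta value `noindex, follow`, which is a standard directive.

```python
from html import escape


def listing_head(title, page=1, is_tag_listing=False):
    """Return the <head> lines for a category or tag listing page."""
    if page > 1:
        # Give every paginated page its own title.
        title = f"{title} - page {page}"
    lines = [f"<title>{escape(title)}</title>"]
    if is_tag_listing:
        # Keep the page out of the index but let crawlers follow its links.
        lines.append('<meta name="robots" content="noindex, follow">')
    return lines


for line in listing_head("Digital cameras", page=2, is_tag_listing=True):
    print(line)
```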

Non-original content (on different sites)
If you use non-original content on your site (e.g. you "stole" it from some other site) and you don't rank, it means you're penalized, right? NO! If you've been following through, you should already know why you don't rank. If the source site (from where you stole the content) already ranks with that content, there is no way that you will rank for the same keywords. Since the keywords in this piece of text are one of the ranking factors, the other major factor being backlinks, if you don't have more and/or better targeted backlinks than the source, you won't rank for any keyword. Bottom line is that if the source ranks for a keyword, you won't rank for it, or you will rank way below the source site (source ranks on #5, you rank on #250).

That means, in order to rank with non-original content, you have to beat all the other sites (that use the same content) with backlinks. This is a really bad idea, because unless you post this non-original content on a domain with high trust rank and authority and at the same time build strong and/or many relevant backlinks, you will never outrank the source site, or whatever site has the same content and does a better job at link building and trust rank growth.

If you have a 1 month old site with a relatively small number of backlinks and about 100 pages, and Wikipedia steals all 100 pages and publishes them on wikipedia.org, you can bet your right arm they will rank #1 for any keyword related to those pages and you won't rank anywhere. They don't even need external backlinks towards those specific pages; their trust rank and domain authority are enough.

Some content is inherently non-original
The SE engineers naturally realized that some content will get copied a lot on many sites. It is the natural life of such content. Think about product specifications, news, press releases, quotes, etc. A digital camera has certain specifications released by the manufacturer. While you can change a word here and there, most of it stays the same. News gets published on a bunch of sites, especially short news. If Barack Obama said something, you have to quote him as he said it. You can't change his words.

Mashups
An interesting breed of sites with non-original content are the mashup sites. They basically construct a page with content on a given topic, by combining small excerpts from multiple sites that cover that topic. This, while resulting in non-original content is actually very useful for a visitor because it gives him a collection of summaries about the topic of interest.

Imagine you want to buy a digital camera. You could manually search for reviews on a SE, then search for specs, then search for stores and try to find the best price. It's time consuming. Instead you could go to AlaTest.com or TestFreaks.com, which are mashups of reviews, and get all the info you need in one place. It's comprehensive, you can go to the source site to read more in depth if you wish, and you save time.

The content of mashups is not a perfect copy of some specific page of another site, but a collection of fragments from multiple sites. There is a big difference between this and how most people implement autoblogs: by copying the excerpt of one specific post only from its RSS feed.

What you should remember from this is that if you want to build autoblogs, you should think mashups instead. It is significantly more difficult to implement a mashup system but if you do it right, it is worth every minute.

Spun content
Spun content, if done well, can look original enough to make it in the SERPs. Of course, the "if done well" is the key. Most people don't invest the proper time to manually write complex, multi-level seeds. They just replace words with their synonyms, which means the number of good (original enough) spins is very low. Since they are eager and have a "get rich quick" mentality, they will generate way more spins than optimal, reducing the uniqueness of ALL spins (results). If you write a complex seed article and spin it within the optimal limits however, this technique can give you a lot of content for a fraction of what you'd pay somebody to write it.

A few things to keep in mind regarding spun content:

It is extremely difficult to develop an algorithm and software that accurately detect non-original content across the entire Web. It would require extreme processing power that not even Google has, because you would literally have to compare every web page to every other page on the web. However, there are shortcuts that, though not great, can be implemented using much less processing power. They generally rely on some sort of statistical natural language processing (generally using N-gram patterns). Using such an approach, it is actually extremely easy and not resource-intensive to detect content resulting from too many spins.
It is relatively easy to detect incorrect grammar. Most content is not perfectly grammatical, since not even native English writers are 100% grammatically correct on the web. However, when a very large percentage of the text on a site contains grammar mistakes, there is a high statistical chance that something is fishy.
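To show why N-gram approaches catch lazy spins so cheaply, here is a minimal sketch in Python. It is my own simplification, not what any search engine is confirmed to run: it breaks each text into overlapping 3-word sequences ("shingles") and measures the overlap with Jaccard similarity. The function names and the choice of n=3 are assumptions for illustration.

```python
import re


def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "the camera takes sharp photos in low light"
spun = "the camera takes crisp photos in low light"
print(similarity(original, spun))
```

Swapping a single word for a synonym only breaks the shingles that contain that word, so the rest of the text still overlaps. A seed that only does synonym replacement produces spins that share a lot of shingles with each other, which is exactly the low uniqueness described above.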

What do you want from your content?
When trying to decide whether to use non-original content or write original content, you should think about what you require from that content. If you need content for a high quality site with a lot of traffic (a.k.a. potential customers), you obviously want content that converts. If you want content for your link network/wheel/pyramid/spherical-cube-with-5-edges and you don't need those sites to convert but just sit at the bottom of the "food chain", feel free to use non-original content. Just remember that mashups, mixed, spun or any type of "randomization" (for lack of a better term) is much better than just scraping the excerpt from an RSS feed or stealing the entire article as-is from an article directory.

By the way, "spherical-cube-with-5-edges" is a new, extremely powerful linking scheme that I developed. It relies on principles of quantum mechanics and the theory of relativity. Yeah, I'm just messing with you, genius... I smell some melted brain there.

Going black hat
WARNING: This is ILLEGAL. It is copyright infringement. Anyway, I bet some of you already do it, so I'll tell you how to do it properly.

Have you ever stolen articles from article directories or even from the source sites? Many people do it. Funny thing is that besides being illegal because you're stealing somebody's work, it doesn't help them much either. That's because their timing is wrong.

They go and search for "my super duper keyword" on EzineArticles or whatever, pick 20-30 articles and dump them onto their new blog. They remove any link from the article (resource box) and place their own links in the body of the article. Then they repeat the process. Maybe if they are more skilled when it comes to coding, they build an automated system to do it while they sleep. Regardless of how they do it, it is not too efficient. If you've been paying attention you should know why. They basically copy the article as-is, the full article and as I explained way above, end up competing against the source site (e.g. EzineArticles) solely in the backlinks arena. Good luck with that, you will need it. If that article is old, a SE is pretty sure you're not the source. If it got picked up by other sites and published on them you don't compete just with the source site, but the others too. You're screwed! The link you get from that page won't be entirely useless, but not too helpful either. There is one thing you can do however: monitor new sites that pop up in the SERPs for your keywords (note I said new sites not new pages of old sites) and also new articles on article directories (make a script to monitor them or their RSS feeds). Steal the content as usual, then throw some links to that page quickly. Don't build crappy links though. No nofollow crap or profiles that take long to get indexed by the SE.

The goal is to get that new page indexed fast and make the SE also find the links towards it pretty fast. How fast? If you can get indexed in 15min you're game. Works up to a few days too, just not as well. If you get indexed before the source site and the SE picks up the backlinks too, you have a chance of actually ranking higher. Otherwise, you won't rank higher than the source but you will outrank the other sites that steal the article later and are too lazy to build links to it.

Ideally, you don't want to rank higher than the source because people might complain. Heck, it doesn't even matter if you rank in the first 200 positions. But if you do it like this, the links from that stolen article on your site are more valuable.

There are some things you have to keep in mind:

If your entire site has content built like this, it won't do too great. Not even when it comes to the backlink value. Google will simply see you're "fishy" and not love you much. However, if you mix this technique with some properly spun content, some mashup-style pages, etc., Google won't have an easy time figuring out if it's a spam-blog or a normal site.
It's not really worth doing it if you do it by hand and/or with your own domains. Unless you automate the whole thing and use junk sites (e.g. WPMU hosts, free blog platforms), it is too much effort for too little gain.

That's all folks! I hope it makes things more clear for you and helps you build better strategies. When you make your 1st million dollars don't forget to send me a bottle of Jack Daniels.

GreyWolf · Dec 9, 2010

Nice guide madoctopus. Hopefully it'll help people understand that duplicate content is only an issue under a few circumstances. For duplicate articles on separate websites it really doesn't matter unless you're doing article submissions. It isn't an SEO problem so much as the better article directories require unique content in order to be accepted. As far as the search engines are concerned it isn't a problem unless it's duplicate content on subpages of a single website.

There was a discussion earlier today in the thread SEO myths and reality, with antx16, covering the same topic.

Since this thread is pretty much targeted specifically to the topic of duplicate content, I'll quote my replies from the other thread here. I think they can add a little to this thread as well.

GreyWolf said:
The idea of a duplicate content penalty goes back to the days before google when webmasters would create multiple identical landing pages on a site with different filenames and titles targeting different keywords. At the time it was a good SEO tactic, but it quickly became abused and the SEs began treating it similar to keyword stuffing. People with SEO websites, SEO blogs, Guru ebooks, and other instruction sites started telling people that multiple pages on your website must have unique content.

Then article directories started becoming popular, and webmasters started submitting articles for backlinks. It was common to submit the same article to multiple sites, and even just take an existing article, change the backlinks and submit it again. In order to maintain a higher quality level many directory sites would reject articles that aren't unique. The article directory FAQs, writing blogs, making money ebooks and instruction websites targeting article writing started telling people they need to have unique content.

Somehow these separate issues became combined in people's minds. People posting on forums, writing blogs, making 'get rich quick' ebooks, etc started combining those two different situations into a single idea that 'you have to have unique content for everything'. You really only absolutely need unique content if you're submitting to article directory sites that demand it.

There are many legitimate reasons for duplicate content to exist and even be necessary. So, search engines don't really care about duplicate content on unrelated websites. The duplicate content penalty everyone worries about only applies to duplicate content on the same website. Whether you need to use unique content or duplicate content depends on what you want to do with your website and what visitors you're targeting.

Unique content is still a good idea, but not when it reaches the point of becoming unreadable garbage. If you want to have a popular site that people have a reason for returning to then it's a good idea to include a lot of unique content. But many sites composed almost entirely of duplicate content also do very well. Autoblogs are a perfect example that duplicate content isn't a SEO killer for a website. Press releases are a great example that duplicate content isn't a SEO killer for articles.

GreyWolf said:
Read my above post again. The duplicate content penalty isn't for content on separate websites.

If you scrape a post and the SE finds your website and the original author's website then it will index both pages for the same keywords. The site that has better overall SEO will have their page ranking higher. The one that doesn't rank as well will be due to other factors, not because it's duplicate content. If you don't have unique content then you might be up against more competition for keywords in that content, but that isn't really a penalty.

If you have multiple pages on your own website that all use the same content, then as google crawls your site and finds multiple pages with duplicate content, it'll choose one page as the most relevant and ignore the others. Unless you're using a canonical link tag (rel="canonical"), google will make its own decision which is the best page. Whatever choice it makes will probably not have the effect you were hoping for when you created the pages. Instead of having one page ranking well, you'll have multiple pages ranking poorly.
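If you'd rather make that choice yourself, a rough way to find the candidates is to group your own pages by content and point each group at one preferred URL. The Python below is only a sketch under simple assumptions: it treats pages as duplicates only when their text is identical after lowercasing and collapsing whitespace, and it prefers the shortest URL. The function names and `site.example` URLs are made up; the `<link rel="canonical">` element itself is the standard tag.

```python
import hashlib
from collections import defaultdict


def fingerprint(text):
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def canonical_tags(pages):
    """Map each duplicate URL to a canonical link tag.

    `pages` is a dict of url -> body text. Pages without a duplicate
    are left out of the result.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)

    tags = {}
    for urls in groups.values():
        if len(urls) < 2:
            continue
        # Assumption: the shortest URL is the one you want to rank.
        preferred = min(urls, key=lambda u: (len(u), u))
        for url in urls:
            tags[url] = f'<link rel="canonical" href="{preferred}">'
    return tags


pages = {
    "http://site.example/camera": "Great  camera",
    "http://site.example/camera?ref=tag": "great camera",
    "http://site.example/about": "About us",
}
for url, tag in canonical_tags(pages).items():
    print(url, "->", tag)
```

Exact matching misses near-duplicates, so treat it as a first pass rather than a complete check.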

Even if other websites have the same content... even if their sites are already indexed for that content... that same content on one of your pages will still be unique when compared to the other pages on your website. For your own SEO efforts the keyword value of that content will apply to that single page of your site. How well you rank compared to other similar websites will depend on how much better your overall SEO and SEP is than theirs.

GreyWolf said:
{Another important point is that}... there's a difference between duplicate content and copyright infringement. This might also explain how the idea of duplicate content penalty has expanded well beyond what it actually is.

Algorithms can't really distinguish between duplicate content and copyright infringement, so there's really no way for the crawlers to tell the difference. But if you're using copyrighted content and someone files a complaint to google then your pages can be removed from the results.

This isn't really a duplicate content penalty though. It's a copyright infringement penalty, and it will only happen if google is given notification by the copyright holder. The bulk of the content people talk about when discussing duplicate content does not fall into this category.

GreyWolf said:
The reality of scraping is that most duplicate content doesn't fall into the category of copyright infringement. When dealing with copyrights, most of the time reusing scraped content falls within the realm of fair use. Even if fair use doesn't apply, most of the time no one will ever file a complaint.

In any case it isn't a duplicate content penalty.

{If you're using the scraped content to post articles on blogs and other pages just to create relevant backlinks to your main website }... then that insulates you {somewhat} even from a copyright infringement penalty. Even if one of the articles did contain copyrighted content, (and someone filed a complaint to google), all that would happen is that one page would get removed from the serp. Since it isn't a page on your site all that would mean is you lost that single backlink. All the rest of the articles you scraped and reposted will still be out there working for you.

madoctopus · Dec 9, 2010

@GreyWolf: yeah, i started to write a reply to SEO myths and reality thread, but it ended up pretty big

So I decided to start a new thread.

As I said, I think that copying content as-is is not the best way to go. I have sites where I added non-original content and while previously they were (the original content) indexed in 1 day, after posting a bunch of non-original posts it started to take days to get indexed. Also, I don't think relying solely on such content to build backlinks is a good idea.

GreyWolf · Dec 9, 2010

madoctopus said:
@GreyWolf: yeah, i started to write a reply to SEO myths and reality thread, but it ended up pretty big So I decided to start a new thread.

As I said, I think that copying content as-is is not the best way to go. I have sites where I added non-original content and while previously they were (the original content) indexed in 1 day, after posting a bunch of non-original posts it started to take days to get indexed. Also, I don't think relying solely on such content to build backlinks is a good idea.

I don't disagree with you that unique content is usually going to be better for your site. Both onsite content as well as offsite articles.

You have a good point about using unique content. Depending on what you're doing it can help your site a lot. But that isn't really because of any kind of duplicate content penalty, but more of a quality issue. You're definitely better off creating good quality sites with good quality unique content.

That doesn't mean you should just create garbage because it's unique. You're better off using duplicate content than you are with unreadable garbage content. When people get so worried about duplicate content that they start spinning articles to the point of becoming gibberish, the quality is lost.

Whether you're better off with unique content or duplicate content really depends on what you're doing. People that get so consumed with the idea of a duplicate content penalty can end up creating content that does them more harm than if they had just not worried about it.

madoctopus · Dec 9, 2010

@GreyWolf: yeah, it is certainly not a penalty and works well for link building. but, from my experience it helps with the link juice and quality as you say if you mix and mash up the content.

regarding spinning i built a system that generates articles of up to 1000 words. it is based on about 60,000 words (the seed text), logic rules to build up the paragraphs and sentences and a madlib approach (has a database with facts behind it). I should get 10,000-40,000 articles out of it and most of them are 50% or higher unique. Also, any resulting spin is more than 90% grammatically correct, it is coherent and guess what, it is also useful to the reader. Basically a natural language generation system without the fluff (AI, math, n-grams database, etc).

tutzor · Dec 9, 2010

madoctopus said:
regarding spinning i built a system that generates articles of up to 1000 words. it is based on about 60,000 words (the seed text), logic rules to build up the paragraphs and sentences and a madlib approach (has a database with facts behind it). I should get 10,000-40,000 articles out of it and most of them are 50% or higher unique. Also, any resulting spin is more than 90% grammatically correct, it is coherent and guess what, it is also useful to the reader. Basically a natural language generation system without the fluff (AI, math, n-grams database, etc).

That's impressive, send me the source code

pasdoy · Dec 9, 2010

tutzor said:
That's impressive, send me the source code

lol, good luck

xbox360gurl70s · Dec 9, 2010

Thanks and rep given. I can tell you put hours into getting this report done. Kudos to your efforts, it was a very informative read for me. I've been gaming the system for a long time, but the fresh insights from your report do me well.

philly3 · Dec 9, 2010

This is VERY useful! Glad to see someone posting content that can really help a lot of BHW users! Thanks given

hamd01 · Dec 9, 2010

I hate to be a stick in the mud, as it's really great that people contribute articles such as this..... BUT....

NOBODY knows! Fact! If they did know they would be on a beach retired. You cannot say yes or no when it comes to Google. You certainly can't determine how Google treats non-unique content.

IMO the safest bet is to use unique content, as you know 100% that Google won't penalize you for it. By using anything less you run the risk.

For example, take a look at this interview with a senior Google developer. He clearly stresses unique products, services and content. To quote him word-for-word: "It comes down to creating unique content on each page".

http://www.toprankblog.com/2010/06/google-maile-ohye-ses/

Bottom line, nobody knows how Google works...and the best information we have from Google staff is that UNIQUE CONTENT is crucial.

incomefast · Dec 9, 2010

Really useful post, Thanks. This type of post will really make BHW useful for all.

GreyWolf · Dec 9, 2010

hamd01 said:
I hate to be a stick in the mud, as it's really great that people contribute articles such as this..... BUT....

NOBODY knows! Fact! If they did know they would be on a beach retired. You cannot say yes or no when it comes to Google. You certainly can't determine how Google treats non-unique content.

IMO the safest bet is to use unique content, as you know 100% that Google won't penalize you for it. By using anything less you run the risk.

For example, take a look at this interview with a senior Google developer. He clearly stresses unique products, services and content. To quote him word-for-word: "It comes down to creating unique content on each page".

http://www.toprankblog.com/2010/06/google-maile-ohye-ses/

Bottom line, nobody knows how Google works...and the best information we have from Google staff is that UNIQUE CONTENT is crucial.

Well you might be quoting him "word-for-word", but you're applying it out of context to what was said.

Here's the statement you're referring to.

What about for e-commerce sites that are extremely large with lots of products? We've seen sites with fewer items and more information on each product outranking us.

That's what we would expect from Mayday. Users don't care if your site has many items, they care about descriptive content. They don't want to see content that is just a title and an image. It comes down to creating unique content on each page. We crawl what we want to keep in the index and we keep in the index what users want to see in the search results. The drop off you are noticing is because we are focusing on content-rich pages and less on sites that are just tons of pages without value. If you link content up and bring the link structure up, we'll crawl that more often. But at the end of the day, more content matters.

That reply is in regards to websites that are providing more images and less content. What it's saying is that product images on webpages also need to have descriptive content. Product pictures with just a title are not enough; the page also needs content.

You're jumping right at the idea, "It comes down to creating unique content...", but your mind is leaving out the "...on each page." part. You're also disregarding that it's in reference to e-commerce sites that use too many pictures with little to no descriptive content. The most important statement in that reply is, "...we are focusing on content-rich pages and less on sites that are just tons of pages without value."

Any time you hear or read something from the Google PR dept, they always say unique content. It's like a PR catch phrase, they use it no matter what, even if all they mean is your site needs content.

The google PR dept will release statements worded in such a way to imply people need to build sites with tons of unique content. But all they're really saying is you need to build content rich sites, and each page needs to have different content from the other pages on your site. Google will never come right out and say it, but it's very possible to create a unique content rich website with many pages full of value even if the same content exists on other websites as well.

Most importantly though... the statements in that article are referring to unique content on your website. Even if you take that to mean unique content that exists nowhere else on the web, it still doesn't mean your backlinks all have to come from sites with unique content as well. In other words you can still post duplicate articles across all the web2.0 sites you can. Create blogs and profiles every place that will let you and post the same well written article and include backlinks on all those sites with relevant keyword anchor text. All those duplicate articles will still give you very relevant backlinks that will help your site move up in the serps.

So instead of worrying about having absolutely unique content, webmasters would be better off focused on creating sites that are content rich and full of pages that provide value to your visitors. Each page needs to be unique, but you can still create that even if using duplicate content. It just can't be duplicated between the internal pages.

If you're building a whitehat site then it's a good idea to have unique content that sets you apart from everyone else. It gives visitors a reason to keep returning to your site. But the most important part, unique or not is to have lots of content rich pages that will make visitors find value in your website.

It's been stated many times before...
If you want to have a site that google likes, then create a site that visitors will like. That's what google is trying to find.

madoctopus · Dec 9, 2010

It's your opinion but I disagree with you. You don't have to know the entire algorithm that Google uses. Understanding a single combination of factors and what result it gives is enough to do something useful.

All the information I included in my guide is based on observations, statements made by Google and other SEs, research papers and talks by search engineers and natural language researchers. I just derived a conclusion and an explanation from all that and thought about how I could use all this information in a useful manner.

As long as it is technically impossible to detect non-original content and as GreyWolf said, detect copyright infringement, and since some content is inherently non-original, things look pretty clear to me. If you also read a bit about NLP and stuff like that things become a bit more clear.

About the penalty... there is no such thing. Google said it themselves. And if you still think there is a penalty, you should read my guide again, because I think I explained it there pretty well.

If you think copying content from other sites doesn't work, look at some of those crappy splogs that actually rank with stolen content. You can also look at alatest.com and testfreaks.com. While the first one employs some natural language processing, feature extraction and natural language generation techniques (which are really not easy to develop), testfreaks.com just dumps excerpts from review sites on a page, adds a specs list from a database and a price comparison, and adds a comment form so they actually have a chance to get some original content from users, and that's about it.

hamd01 said:
I hate to be a stick in the mud, as it's really great that people contribute articles such as this... BUT...

NOBODY knows! Fact! If they did know they would be on a beach retired. You cannot say yes or no when it comes to Google. You certainly can't determine how Google treats non-unique content.

IMO the safest bet is to use unique content, as you know 100% that Google won't penalize you for it. By using anything less you run the risk.

For example, take a look at this interview with a senior Google developer. He clearly stresses unique products, services and content. To quote him word-for-word: "It comes down to creating unique content on each page".

http://www.toprankblog.com/2010/06/google-maile-ohye-ses/

Bottom line, nobody knows how Google works...and the best information we have from Google staff is that UNIQUE CONTENT is crucial.

vickygarg · Dec 9, 2010

It was really informational, though I have a question, I have a wordpress blog and the tags are ******** and are even indexed in google? Should I nofollow them and is there a plugin for it? Thanks

madoctopus · Dec 9, 2010

vickygarg said:
It was really informational, though I have a question, I have a wordpress blog and the tags are ******** and are even indexed in google? Should I nofollow them and is there a plugin for it? Thanks

Depends on how different the tag listings are. You can try it and see if you get an increase in traffic in the next 30-60 days. I did it on a site that had many tags and traffic increased by about 50% just from that. That doesn't guarantee it will improve your traffic or SERPs, but it is worth giving it a try.

Update:
Do not nofollow them. I said "noindex,follow", so that SEs will follow the links on them but not index them.

The correct way to use tags is to add a description to each tag (at least 100-150 words) and make sure it gets displayed on the tag listing page. You may have to tinker with the theme files to get it displayed, as most themes don't display tag or category descriptions. Also, do not use tags that would result in listings having the same posts. For example, if you have a blog about WP themes, do not use two tags such as "black" and "dark", as you will likely end up with the same posts listed for both tags. Also, reduce the number of words in the excerpts shown in tag and category listings. 50 words is enough. More than 100 is too much. Under no circumstances should you display the full post body in the listings. Also, if the title of the tag listings is something like "Posts tagged <tagname>", remove the "Posts tagged" part and leave just the tag name. If your blog is targeted at one specific thing (e.g. WP themes), you can add that after the tag name so you get something like "<tagname> wordpress themes" (e.g. black wordpress themes).
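
The "black" vs "dark" problem is easy to check for if you can export which posts sit under which tag. Below is a minimal sketch in Python, not a WordPress plugin: the `tag_posts` data, the function names and the 0.8 threshold are all made up for illustration, and how you export tag-to-post IDs from your own blog is up to you. It flags pairs of tags whose listings contain mostly the same posts.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of posts two tags have in common (0.0 = none, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def overlapping_tags(tag_posts: dict, threshold: float = 0.8) -> list:
    """Return (tag_a, tag_b, overlap) for tag pairs whose listings mostly match.

    The threshold is an arbitrary assumption, not a value any SE has published.
    """
    flagged = []
    for (tag_a, posts_a), (tag_b, posts_b) in combinations(sorted(tag_posts.items()), 2):
        overlap = jaccard(set(posts_a), set(posts_b))
        if overlap >= threshold:
            flagged.append((tag_a, tag_b, round(overlap, 2)))
    return flagged


# Hypothetical data: tag name -> IDs of the posts listed under that tag.
tag_posts = {
    "black": {1, 2, 3, 4, 5},
    "dark": {1, 2, 3, 4, 5},
    "minimal": {2, 6, 7},
}

for tag_a, tag_b, overlap in overlapping_tags(tag_posts):
    print(f"{tag_a} / {tag_b}: overlap {overlap}")
```

Any pair it reports is a candidate for merging into one tag, or for dropping one of the two.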

aj113 · Dec 19, 2010

Very informative opening post, and although I'm no expert, I'd like to add my views: I think that it's important not to get bogged down with semantics. Whether there is a 'duplicate content penalty' or not is a moot point. Ultimately, all that matters is whether non-original content on your site hampers your SERP results in any way, and for any reason. If it does, then the simplest and safest option is to use original content (IMHO).

madoctopus said:
.....

Spun content........they will generate way more spins than optimal, reducing the uniqueness of ALL spins (results). If you write a complex seed article and spin it within the optimal limits however, this technique can give you a lot of content and for a fraction of the cost of what you'd pay somebody to write it for....

Members may be interested to know that SpinnerChief (free software) has a unique 'spun article similarity percentage' function. After spinning articles, you can compare them to each other as opposed to simply checking them against the original article.

This can save hours of wasted time by solving non-unique problems with your spun content BEFORE publishing, indexing, creating backlinks, pinging etc.
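
For anyone who wants to see what a spin-to-spin comparison could look like, here is a rough sketch in Python. It is not how SpinnerChief computes its percentage (that is not documented in this thread); it simply assumes a common approach, comparing overlapping three-word sequences ("shingles") between every pair of spins. The function names and the 50% limit are hypothetical.

```python
from itertools import combinations


def shingles(text: str, size: int = 3) -> set:
    """Lower-cased word sequences of the given length taken from the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity_percent(a: str, b: str, size: int = 3) -> float:
    """Percentage of shingles two texts share (Jaccard similarity * 100)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return 100.0 * len(sa & sb) / len(sa | sb)


def too_similar(spins: list, limit: float = 50.0) -> list:
    """Compare every spin against every other spin, not just the seed article.

    Returns (index_a, index_b, percent) for pairs above the limit.
    The limit is an arbitrary assumption for illustration.
    """
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(spins), 2):
        score = similarity_percent(a, b)
        if score > limit:
            flagged.append((i, j, round(score, 1)))
    return flagged


# Hypothetical spins of the same seed sentence.
spins = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox leaps over the lazy dog",
    "a fast auburn fox hops across one sleepy hound",
]
print(too_similar(spins, limit=30.0))
```

The point of checking pairs is the one made above: two spins can each look different enough from the original and still be near copies of each other.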

GreyWolf · Dec 19, 2010

aj113 said:
...Whether there is a 'duplicate content penalty' or not is a moot point. Ultimately, all that matters is whether non-original content on your site hampers your SERPS results in any way - and for any reason....

If people think there is a duplicate content penalty, then they think the second part of that statement will always be the case.

IF non-original content does hamper your position in the SERPs, then of course you do need to do something. The point is that it doesn't ALWAYS negatively affect your site's position.

Sometimes totally original content is the best solution.
Sometimes spun content is the best solution.
Sometimes duplicate content is the best solution.

It's best to have a good understanding of what's really happening regarding unique content vs duplicate content in order to make the best decision on any particular project. So really it isn't a moot point after all.

edit-
here's another thread discussing this topic from a few months ago.
No Such thing as duplicate content - Revealed

aj113 · Dec 19, 2010

GreyWolf said:
... The point is that it doesn't ALWAYS negatively affect your sites position....

Yes, that was my point. Put simply, if unoriginal content is affecting your SERP results, change it for original content. Conversely, if you are sailing along merrily with unoriginal content with no adverse effects, then clearly you need do nothing about it. That is what I meant when I said the point was moot regarding a dupe content penalty.

madoctopus · Dec 21, 2010

If you want to do tests, don't use made-up keywords with absolutely no competition or results for them. Instead, use a combination that has no exact matches but is built from existing keywords that have competition and results. Something like "Canon XFGT52 Digital Camera", where the "XFGT52" model obviously doesn't exist. And don't test with 1-page sites. Build an actual site with 10 pages or so.

Even so, you can't be sure if the results are meaningful.

789abc · Dec 22, 2010

thanks for the info. now i am going to spin an article and submit to 10 dofollow article directories.
