What is duplicate content and why you should avoid it

Information spreads across the web at close to the speed of light - sharing, quoting, copying and re-using content is now a reflex action. The result is 'duplicate content'. What many people don't know is that it can also kill a website's search ranking: if you have duplicate content on your site, Google will rank you lower or push you further back in its search results.

If you’ve just cloned parts of your content from somewhere else, instead of achieving a front row seat in search results, Google relegates you to the back stalls. Or drops you in a bin outside the back exit!


This article explains the pitfalls of duplicate content, how to avoid them, and the moves you can make to get the most out of your website.

What’s wrong with duplicate content then? How do you avoid it?

Google likes credibility. It likes to deliver quality websites with quality information to the people who are searching. That's Google's product - and its mission.
Websites with good credibility generate their own original content. It may seem easy, and sometimes even relevant, to use someone else's work on your site. Unfortunately, Google recognises copied content, deems the site to be of lower quality and penalises your exposure.

Google does not want to send people who are searching for quality information to sites deemed to be of lower value. Google recognises copied content and will not reward websites that have it.

More reasons why Google hates duplicate content

As people copied information across the web, search engines became confused about which page had priority and greater authority, yet every duplicate still got registered in their indexes. This badly eroded the quality and credibility of Google's search results.

The problem was heavily compounded as growing numbers of blog websites filled with the same duplicated articles ballooned across the web, polluting Google's index.

In essence, it could be claimed that half the content on the web was genuine 'customer-facing' information, while the other half was there simply to manipulate Google and help websites gain greater exposure through 'Black Hat' techniques. (Black Hat: unethically presenting content in different visual or non-visual ways to manipulate search engines.)

A solution was needed, and Google's 'duplicate content penalty' was born. Introduced as part of Google's Panda algorithm update, it caused big drops in exposure and traffic as many websites caught out with duplicate content literally vanished from Google's search results.

Google's Panda algorithm continues to evolve and get smarter. If your website uses exact or partial duplicate content, this could already be holding you back, or be a ticking time bomb waiting for the next Google update.


To summarise the main points of duplicate content:
  1. Duplicate content is when the same piece of information exists on more than one web page (or website)
  2. You can avoid it by ensuring all your website page information is unique and original. If it's sourced from elsewhere, take the time to rewrite and refocus it
  3. Duplication should be avoided as it erodes the relevance of your web pages and message, confuses search engines and results in lost exposure and less website traffic.

Why search engines punish duplicate content, summarised:
  1. When you search Google, the results you see aren't the web - they're Google's representation of the web: a list of websites Google has carefully selected for you. The list is built by Google's bots (programs) following website links, reviewing the pages they find and adding them to the search index, so you are seeing the websites with the highest credibility scores
  2. Duplication makes it hard to define the content author, to evaluate the website, and provide a quality score to that page
  3. Duplication creates confusion about which web pages to include/discard from Google’s index
  4. Where duplicated content can't be removed from your website, there are best-practice methods (depending on the type of duplication) for dealing with it and avoiding or reducing any negative impact on your website - we address these later in this article

Duplicate content can exist in many shapes and forms and for different business reasons.
Here’s how to identify and manage them.


Content or blog posts - offsite

You may have received a syndicated article that will also be published on other websites, re-used a third-party article with permission, or even published your own article both on an industry website and on your own website.

If your goal here is to gain Google exposure, keep in mind Google will give ownership credit only to one version of the article or block of content. If that’s not your version, you’re wasting your time, as it’s likely you will lose exposure.

However if you believe this content is relevant and beneficial to your website visitors, then there’s a method that will keep both your customers and search engines happy.

It's a tag called rel="canonical". Placed on the web page that carries the offending content, it guides Google and Bing to the location of the original article and avoids any penalties or negative scores being placed against your website.


<link href="http://www.other-website.com.au/original-version-of-page/" rel="canonical" />

As expected, there are a few twists:

  1. Let's say you have published your articles on third-party sites for recognition and exposure, and also included the same articles on your own website. Since adding a canonical tag to the third-party site may not be possible, you will likely need to place a canonical tag on your own website's article page telling Google the third-party site is the original (see the sketch after this list). Why would you do this? Because if the third-party site is credible, its version of your article will provide greater brand exposure through its readership, and the article should still gain exposure in Google based on the site's and the article's originality, relevance and quality.
  2. You’ve purchased a few domain names and published the same site and content on each. If they are serving unique regions, then ensure your content is unique, otherwise you may want to consider using a canonical tag, changing these sites to a landing page, or simply turning them off and redirecting all traffic back to the main site.
  3. Duplicate product information on ecommerce websites is a particular concern. Many product-only websites struggle because they have very little unique content. At a minimum, it's recommended that your main products have their content rewritten and extended. This is time consuming, but it will help set you apart from the crowd and put you on the path to being unique and a step above the rest.
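For point 1, here's a rough sketch of what that cross-domain canonical could look like on your own article page, using made-up addresses - the tag sits between the <head> tags and points at the industry site's original:

<link href="http://www.industry-website.com.au/original-version-of-article/" rel="canonical" />

Google treats the canonical tag as a strong hint rather than a strict command, but in practice it usually consolidates ranking credit to the version you nominate.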

Content or blog posts - onsite


There are a number of things that can result in onsite duplication. These include:
  1. Thin content - pages on your website with very little unique information
  2. Content-heavy page elements (large blocks of text) repeated across more than one page - for example blog lists or article summaries - often compounded by point 1
  3. Product or service pages with very similar descriptions and related information
  4. Sorting, filtering and pagination of articles or ecommerce products
  5. A 'same page' URL that can be viewed in both upper case and lower case - especially with Windows hosting
  6. Any of your web pages being accessible both with and without the WWW.


When it comes to search engines and content, always ensure you are promoting the correct single source.


What to do if your pages are similar in theme and keywords

Where numerous pages on your website are quite similar in theme and keywords, combining these pages into one stronger, higher-quality page is beneficial (to both your audience and your Google exposure).
Once finalised, you then need a '301 redirect' that sends people (and search engines) trying to access the old pages to the new one.

If you're not confident tackling this yourself, take the time to compile a complete list, test all the web page addresses to make sure they work, and then forward the list to your web specialist. It should take them no more than 30 minutes to action.

If you have a .htaccess file on your website here is the example code:

# 301 redirect old pages to the new page
Redirect 301 /old-page2 http://website.com.au/new-page
Redirect 301 /old-page3 http://website.com.au/new-page


What to do if two pages on your site have similar information

Should you have two pages on your site with similar information, consider extending and refining each page with more unique and relevant information. This will increase the relevance of each page and reduce duplication. Examples could be separate pages for men's and boys' boots, or a specific camera that is also sold on numerous other websites.


Once your per-page content is detailed enough (some claim 500+ unique words should be your goal), review it carefully. If you're repeating large blocks of content - like testimonials, blog summaries or product lists - on more than one page, do your best to reduce the word count of these elements so they don't compete with and pollute the unique content on each page.

While your content may be unique, if there's a need to sort, filter and display articles or products over numerous pages, this can create varied page addresses that are simply sorted versions of the same content. In this case it's important to tell search engines which page is the 'original' page.
Here's an example of the same page being seen by search engines as three pages, not one:

  • www.mywebsite.com.au/engine-parts/ba-ford-head-gasket-745.php?s=price
  • www.mywebsite.com.au/engine-parts/ba-ford-head-gasket-745.php?print
  • www.mywebsite.com.au/engine-parts/ba-ford-head-gasket-745.php?kword=ford&src=forum

Our friend the canonical tag is ideal for such situations.

<link href="http://www.mywebsite.com.au/engine-parts/ba-ford-head-gasket-745.php" rel="canonical" />

How to nominate an ‘original’ page

Allowing your website to be accessed both with and without the WWW prefix, and your pages to be viewed in upper- and lower-case versions, will result in twins, triplets and so on of the same pages in Google's index.


To avoid confusion about which page is the original, a few things can be done:

  1. Your website system (CMS) or hosting should provide a URL rewrite/redirect option. This automatically matches incoming requests and uses a 301 redirect to push them all to the original version, be it upper case, lower case, or with/without WWW.
  2. Register for Google Webmaster Tools and set your preferred domain to either the with-WWW or without-WWW version

While many expect to see a WWW, it's a personal preference. And although domain names are not case sensitive, from a search engine's perspective it's always safest to use all lower-case letters in your website addresses and links.

rel="canonical" can be used to enforce the promoted URL; however, a 301 redirect performed automatically on the server is the best solution to this problem.

Using URL rewrite in IIS (Windows) or Apache (Linux) allows your website to automatically redirect a given page to the correct version, be it all lower case or your preferred domain, meaning with or without the WWW prefix.
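As a minimal sketch, assuming an Apache server with mod_rewrite enabled and a made-up domain, the .htaccess rules below 301 redirect every non-WWW request to the WWW version:

# Force the www version of the domain
RewriteEngine On
RewriteCond %{HTTP_HOST} ^website\.com\.au$ [NC]
RewriteRule ^(.*)$ http://www.website.com.au/$1 [R=301,L]

Forcing all lower-case addresses on Apache is a little trickier, as it generally relies on a RewriteMap (such as int:tolower) defined at the server level rather than in .htaccess, so check with your host or web specialist if you need it.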

If you have launched a new website or migrated to a new domain name, a variation of this can also be used to send old website pages to the new ones.
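Here's an illustrative .htaccess variation for a full domain move, again using placeholder domain names - every old address is sent to the matching path on the new domain:

# Redirect every page on the old domain to the new domain
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?old-website\.com\.au$ [NC]
RewriteRule ^(.*)$ http://www.new-website.com.au/$1 [R=301,L]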

How to block Google from showing a page or following links

Depending on the complexity of your requirements, there may be scenarios where you:

  1. Can’t or don’t want to use a canonical tag to promote the original source to Google
  2. Need to block Google from showing one of your web pages in its search results
  3. Want to block admin or other private parts of your website from search engines
  4. Don’t want Google to follow links to a given web page
  5. Have exhausted other options.

If your Content Management System allows you to modify these values per page, or gives you access to the code, you can tell all search engine robots not to index a page and not to follow the links on it.
The 'meta' tag below is placed between the <head> tags of the page in question.

<head>
<meta name="robots" content="noindex, nofollow" />
</head>


You can achieve a similar outcome using the robots.txt file located in the root folder of your website. The rules below stop all compliant search engines from crawling: 1. a duplicate article, 2. the WordPress admin area, 3. internal PDFs.

User-agent: *
Disallow: /blog/duplicate-article.html
Disallow: /wp-admin/
Disallow: /pdfs/


Both of these methods have worked well in blocking Google from indexing blog posts that used syndicated articles also shown on many other websites.
This lets you promote the articles and knowledge through your newsletter and build credibility, while keeping the duplicated information out of the index and avoiding search engine penalties against your website.

Get the most from your meta titles and descriptions

We've addressed the most critical duplicate content issues on and off your website and presented solutions. However, one of the most important and most often missed culprits is the pair of "meta title" and "meta description" tags hidden in your pages and used by Google in its search results.

These tags need to be unique and in line with that page's content, while at the same time attractive and appealing to your audience. If they are left empty or generic, Google's algorithm decides what to show, which is less than ideal.

Here's an example of how this impacts a leading brand's Google listing:

(Screenshot: the brand's search result listing)

There is hardly any information about oven sizes or features; instead, Google has filled the gap with default 'Filter this list' style gobbledegook.
This is a lost opportunity to attract a customer by connecting and communicating key information.

Size-wise, your page title should not exceed 70 characters, and try to keep your meta description under 155 characters. For more info, read our title and meta description post.
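As an illustration only (the brand, product and wording below are invented), a unique title and description pairing within those limits might look like this in the page's <head>:

<head>
<title>60cm Pyrolytic Wall Ovens - Sizes, Features & Prices | MyBrand</title>
<meta name="description" content="Compare our 60cm pyrolytic wall ovens: capacities, cooking functions, dimensions and prices, plus delivery and installation options Australia-wide." />
</head>

Every page on the site would get its own variation rather than repeating one generic title and description throughout.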

A word of caution: while sending the right signals to Google will result in a more search-friendly, focused website, when using redirects, meta robots or robots.txt always double-check the entries and confirm that your pages still work. Applying these incorrectly can result in Google blocking your site pages from its index altogether.

Once you get the hang of it, it’s really quite simple and the long-term rewards are excellent.

What hides underneath or alongside a web page design can cause great pain or great gain. Using a third-party tool to review and validate that everything is okay is a safe bet.

If you would like more information, or assistance in taking these essential measures to get the most out of your website, feel free to trial our system, join our tips and trends newsletter, or get in touch.
