Canonicalization – Why Is Getting It Wrong So Bad?

Though it is quite a big word to pronounce, its definition is quite simple:

“Canonicalization is the process of choosing the right URL from several options.”

For example, https://www.sitepoint.com/ (with ‘www’)

versus https://kilianvalkhof.com/ (notice: without ‘www’)

A server isn’t necessarily a single physical machine: several servers can reside on one machine, and a single server can be handled by several machines, all working together to respond to a constantly changing load of requests. This flexibility is one reason the same site can end up answering at more than one address.

Which one works better – www or non-www?

Either can work; what matters is that you pick one as your canonical location, and the secret lies in sticking with it. Consistency not only keeps your website looking the same everywhere it appears, it also keeps things coherent for users and search engines. That means always linking to the chosen domain and always sharing links on that domain, which isn’t a tough job if you’re using relative URLs for internal links.
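As a quick sketch of why relative URLs make this easy (using Python’s standard library; the example.org domains are placeholders, not real sites), the same relative link resolves correctly against whichever domain served the page, so only absolute links need to spell out the canonical domain:

# The same relative link resolves against whichever domain served the page.
from urllib.parse import urljoin

for base in ("https://example.org/blog/post/", "https://www.example.org/blog/post/"):
    print(urljoin(base, "../about"))
# https://example.org/blog/about
# https://www.example.org/blog/about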

You can have both, but it is important to be coherent and consistent about which one is the official domain, since that is the canonical name: all absolute links should use it, even when you’re working on the other domain. HTTP happily lets both domains serve the expected pages, so it’s up to you to choose one of them as the canonical one.

There are basically two techniques to keep the non-canonical domain working while still pointing at the canonical one:

Using HTTP 301 redirects

In this case, you configure the server receiving the HTTP requests (most likely the same server answers for both the www and non-www URLs) to respond to any request for the non-canonical domain with an HTTP 301 response, redirecting the browser from the non-canonical URL to its canonical equivalent. For example, if you’ve chosen to use only non-www URLs as the canonical type, then redirecting all www URLs to the equivalent URL without the www is a good way to go about it!

Example:

1) The server receives a request for http://www.example.org/whaddup while the canonical domain is just example.org

2) The server answers with a 301 code and the header Location: http://example.org/whaddup

3) The client follows the redirect and issues a new request to the canonical domain: http://example.org/whaddup
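To make that exchange concrete, here is a minimal sketch of the redirect logic using only Python’s standard library. In practice you would normally set this up in your web server’s configuration (Apache, nginx, and the like) rather than in application code; the host name and port below are assumptions chosen for illustration.

from http.server import BaseHTTPRequestHandler, HTTPServer

CANONICAL_HOST = "example.org"  # assumption: the domain chosen as canonical

class CanonicalRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        host = self.headers.get("Host", "").split(":")[0]
        if host != CANONICAL_HOST:
            # Non-canonical domain (e.g. www.example.org): answer 301 and
            # point the client at the equivalent canonical URL.
            self.send_response(301)
            self.send_header("Location", f"http://{CANONICAL_HOST}{self.path}")
            self.end_headers()
        else:
            # Canonical domain: serve the page as usual (stubbed here).
            body = b"Hello from the canonical domain\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

HTTPServer(("", 8000), CanonicalRedirectHandler).serve_forever()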

Using <link rel="canonical">

You can add a special HTML <link> element to a page to indicate its canonical address. It has no impact on the human reader of the page, but it tells search engine crawlers where the page actually lives. That way, search engines avoid indexing the same page several times, which could lead to it being considered duplicate content or spam, and could even get the page removed from search engine result pages or lowered in rank. When you add such a tag, you can serve the same content for both domains while telling search engines which URL is the canonical one.
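For example, every copy of the page, whether served from the www or the non-www domain, would carry the same tag in its <head>. The helper below is a hypothetical sketch of how you might render it; CANONICAL_ORIGIN and the /whaddup path are placeholders, not part of any real site:

CANONICAL_ORIGIN = "https://example.org"  # placeholder: the chosen canonical domain

def canonical_link(path: str) -> str:
    # Render the <link rel="canonical"> tag to place in the page's <head>.
    return f'<link rel="canonical" href="{CANONICAL_ORIGIN}{path}">'

print(canonical_link("/whaddup"))
# <link rel="canonical" href="https://example.org/whaddup">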

The dreadful consequences of bad canonicalization:

Suppose you’ve opened a blog at http://www.ohmyblog.com/, but it is also live at http://ohmyblog.com/ and at http://www.ohmyblog.com/index.html

So what does that lead to? People can find the home page at all three versions! Visitors won’t notice the difference, but search engines will: Googlebot treats the three URLs as three different pages, and that hurts SEO in two ways:

First, you lose link authority. If visitor 1 lands on ‘www.ohmyblog.com’ and links to that page, visitor 2 lands on ‘ohmyblog.com’ and links to that URL, and visitor 3 links to ‘www.ohmyblog.com/index.html’, Googlebot sees three different home pages and applies one ‘vote’ to each. Those three links could have sent three authority signals to the new website’s single homepage; split into three weaker votes for three different pages, they dilute link authority and drag the rank down.

Second, search engines don’t crawl your website as thoroughly as they should. Search engines allocate a limited amount of resources to each crawl, and while no one knows the exact budget, it isn’t safe to assume Googlebot will wander around a site until it has found every page; in reality, it eventually gives up and leaves. That’s where everything starts going haywire, because that crawl budget should be spent on unique pages rather than on duplicate versions of the same page.
