Understanding Google Panda algorithm updates
In the last blog post, we discussed about Google Penguin updates. Today, let’s try to understand Google Panda update. The first release of Google Panda was in February 2011. While penguin updates went after back links, the main purpose of panda updates are to watch the quality of website content.I believe by far the biggest affect in search results were made by Panda updates. Panda updates fights scraper sites and content farms. A scraper essentially copies content from various websites using web scraping. Web scraping can be done by simple manual copy and paste or to the extend of using sophisticated web scraping software. On the other hand, content farms are websites with large volume of low quality content often copied from other web sites or being copied in multiple pages of the same webs site. Content Farms usually spend efforts to understand popular search keywords and spin out content based on those phrases.Thus Panda update was also called Farmer Update :)
Unlike Penguin updates, Google Panda updates are more frequent. It is expected that the algorithm has seen close to thirty refreshes. As mentioned about quality of content is the main focus of these updates. However quality doesn’t confine to just the written quality of content. Some of the major factors that could determine the quality of a website are –
Quality of written content
Are you an authority about the topic. Is your content original? Will a reader benefit out of the written content you have put in your website? For example, this will be of utmost importance in case of e-commerce sites where there is a tendency to put minimal, duplicate or manufacture wording as product descriptions. These are the factors that determine the quality of written content.
Are you using the same content in different web pages of your site (even if with minimal wording changes), then you are risking of getting affected by Panda. One example where this is affected unexpectedly is blogs. For example if you are using both tags and categories in your blog, this could be considered as a duplicate. But don’t worry, techniques like using noindex tags will help overcome this issue. Another method is to use canonical tags to tell Google which page to index. Duplicate content is not just limited to your website, it could be outside duplicate content too. For example – micro-sites or your own blogs where you either tend copy paste the same content or make minimal changes. Here is a good Moz article that describes various duplicate possibilities in the eye of Google.
Let’s take an example to understand better the above to points. Consider the book – Half Girl Friend in Amazon or Flipkart. As you can see the product description is kept different in both. Both are different from the description Chetan Bhagat gave in his website. Also we could reach the product page in Amazon or Flipkart in multiple navigational paths or URLs (like category, search direct etc.); but google indexes like the main product page.
Ad to content ratio
One of the elements used by the algorithm to test the quality is number of ads shown. While there are no optimal number, its best to be judicious while selecting the number of ads shown. While there are no optimal ratio for this, keeping the number of ads to minimal always keep you safe. It’s expected that algorithm has really matured to understand the ideal duplicate and ad to content figures.
Technical aspects that differentiate the quality of a website including performance metrics such as page loading time, UX aspects like navigational ease & design and factors like proper re-directs, 404 pages and so on.
Finally, here is a Google Webmaster central blog post on what Google thinks a high quality website mean?
While the overall check of panda updates was on quality of the websites, third version of panda updates (specifically 3.3 onward) focused also on few other aspects. One improvement was around local search results. This update concentrated on improving on showing relevant search results from a proximity of user/content point of view. Another improvement was related to link evaluation(usage of link characteristics like wording, # of similar links etc to understand the content of the linked page). This was taken to an altogether different level with penguin updates. Freshness of the website was also considered in this update.
Google panda updates starting 3.0 were very regular that it became difficult to specifically focus on each update. Industry experts even opined that the dead Google Dance is back with these regular refreshes. Now the panda refreshes are tracked based on consecutive numbers rather than following a versioning system. From late 2013, Google started to include panda updates with index algorithms; thus becoming part of normal algorithm changes Google makes through-out the year. Panda 4.0 was released in May this year. This update specifically improved on how Google views authority and originality of articles in the web. Here is a good case study on this from Search Engine Land. Another affect out of the Panda 4.0 was on how PR websites are ranked. Many PR websites were affected and had to re-define their publishing guidelines. The latest one was 4.1 or 27th Panda update in September 2014 which is expected to help small websites with original content. Also those websites doing recovering actions are expected to benefit from this update.
One thing we can understand by studying about these updates and their history is that all SEO practices revolve around these updates (like link building, content marketing and so on…)