Before you start thinking about in-depth marketing strategies to get your site in front of more people, building links to increase your domain authority, improving your product descriptions to increase relevancy, or any other time-consuming marketing tasks, make sure that search engines can find and understand your web pages.

You can have the best website in the world, selling the best products, but if search engines like Google can’t find, read and understand your site content then your pages aren’t going to appear in search results. For anyone running an online business it is well worth learning the basics of how Google works, as it will make it easier to understand how and why things can go wrong.

Does your site have any issues?

There is a lot of technology behind Google’s almost faultless ability to provide us with exactly what we are searching for, but when it comes to organic search results (i.e. results that don’t include any paid services) the process can be broken down into four main areas.

Crawling
You may have read or heard about search engine Crawlers, Bots or Spiders. These are all terms for the same type of computer program that companies like Google use to find content on the internet. Google’s crawler is called Googlebot. A search engine crawler will visit a web page and then visit every other page that is linked to from that page. This process is repeated for every new page encountered, so crawlers very quickly jump from website to website, spreading out across the web like spiders. One way to look at crawling is that your web pages are like train stations and Google follows the tracks linking each station together.

Crawling is a very effective way for Google to find new content, but Google also relies on website owners to submit maps of their websites (sitemap files) to help it get started and to help it understand which pages are more important than others.
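As an illustration, a sitemap is a simple XML file (usually named sitemap.xml) listing the pages you want Google to know about. The URL, date and priority value below are purely illustrative:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- One entry per page; priority hints at relative importance -->
    <loc>https://www.example.com/products/leather-iphone-case</loc>
    <lastmod>2024-01-15</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>
```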

Crawling problems can occur in several ways.

A very common problem is that spiders get accidentally blocked from crawling pages. There are valid reasons why you may not want Google to crawl all pages on your site. ‘Shopping Cart’ pages for example add no value to search results and you probably don’t want your site admin area log-in page to appear in the results either.

Robots.txt File

Crawlers can be blocked by placing instructions to search engines within a file called robots.txt, which is hosted alongside your website files (all websites should have a robots.txt file). Sometimes developers make mistakes when adding instructions to this file, or test code can be left in the file by mistake when a website launches.
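For example, a typical robots.txt file for an online store might look like the sketch below. The paths shown are illustrative; they depend entirely on how your own site is structured:

```text
# Applies to all crawlers; keep them out of pages that add no value to search
User-agent: *
Disallow: /cart/
Disallow: /admin/

# Beware: a single stray slash blocks the entire site. A line like the
# following, left in by mistake at launch, would stop all crawling:
# Disallow: /

Sitemap: https://www.example.com/sitemap.xml
```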

NoFollow Tags

Another way to instruct Google not to follow any links that appear on a specific web page is to place an instruction on the web page itself.
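This instruction is a robots meta tag placed in the page’s head section, for example:

```html
<!-- Tells crawlers not to follow any of the links on this page -->
<meta name="robots" content="nofollow">

<!-- Alternatively, applied to a single link rather than the whole page -->
<a href="https://www.example.com/some-page" rel="nofollow">Some page</a>
```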

Orphaned Pages

A problem that happens on some sites that rely heavily on an on-site search feature for visitors to find content/pages is that there is no navigation that links all of the site pages together so Google can’t find some pages. Essentially it is like having a train station with no tracks linking it to other stations. These pages are referred to as orphaned pages. A good example of this would be a car parking website where you can search using your location or postcode to find a car park but there are no navigation links that would get you to that page.

Indexing
Once Google has found all of the pages it can via crawling and by reading your sitemap.xml file, the next stage is to read and understand those pages. Google does this by taking a copy of the pages and storing them on their systems. This process is called ‘Indexing’. You will often hear people refer to ‘searching the web’ when they use Google, but when you search you are actually searching the Google ‘Index’ rather than the web itself. This is a very important fact to remember, as Google doesn’t always decide to include all of the pages that they find when crawling.

Google may decide not to include pages from your site within their index if the content on those pages is ‘Thin’ or ‘Duplicate’.

Thin Content

Thin content occurs when you have multiple pages on your site that are very similar and have no depth of content. An example would be an eCommerce website that lists products with very short (i.e. one-sentence) descriptions. From Google’s perspective these pages will look virtually the same and provide no real value to their search results.

Duplicate content

Duplicate content occurs when the same text content is used on multiple web pages. This could be content duplicated solely on your site or content that is contained across more than one website.

An example of duplicate content across your own site is where you might have multiple products using the same product description with just a slight variation in the product:

Leather iPhone Case Red
Leather iPhone Case Blue
Leather iPhone Case Black
Leather iPhone Case Pink

Duplicate content across more than one site can happen when retailers list products using the manufacturer’s product descriptions, or when one site copies another.

‘Thin Content’ and ‘Duplicate Content’ are the biggest challenges that most online stores face when listing their products.

Technical issues

Technical issues can also cause pages not to be added to the Google index. In a similar way to crawling, it is possible to instruct Google not to index certain pages on your website. Ideally you want Google’s index to contain just the important pages created by your website, such as product and content pages. Many other pages, such as search results pages or checkout pages, should be blocked from being indexed.


Sometimes these instruction tags can be added to the wrong pages or even all pages of the site by accident causing your site to effectively disappear from the index.
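The instruction tag in question is the robots meta tag with a ‘noindex’ value, for example:

```html
<!-- Keeps this page out of Google's index. If a site template adds this
     tag to every page by accident, the whole site drops out of the index -->
<meta name="robots" content="noindex">
```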

Penalties
Where there is money to be made online, some people will always try to beat the system and trick their way to the top of search results. In the past, one way of tricking Google was to try to increase the relevancy of a web page by repeating the same keywords over and over. This obviously didn’t look natural to a site visitor, so some people then started to write the keywords in white text against a white background. Tricks like this haven’t been effective for gaming Google for a very long time, and it is very likely that Google would quickly identify this happening and refuse to add your site to their index.

Penalties can also be applied to domains that have tried to build lots of links to their website in a bid to improve their rankings. It is not easy to build links; in fact, the best way is to earn them by creating and publishing really good content. The shortcut that some people try is to build links from sites like ‘SEO link directories’. These should be avoided at all costs.

Ranking
When a user enters a search term into the Google search page, Google’s aim is to provide the best quality, most useful results that it can. This is by far the most complex part of the process, with hundreds of factors being taken into account. A much more expansive description of this process is covered in my previous post, ‘Understanding What Google Wants’.

Displaying Results
The final stage is for Google to display their results. If you have done a good job of making your site accessible and interesting enough for Google to list in their results then ensure that you also take every opportunity to grab the attention of the searcher by making your listing stand out.

You will most likely have already taken the time to make your page titles as relevant as possible, but don’t forget that it is ultimately a person who will decide whether or not to click, so also take the time to make your titles and descriptions appealing.

Write your title and description as if they were a magazine advert rather than just information about your product.


Hopefully you will have a good idea of the number of pages your website should have indexed. If you don’t, you can get a rough estimate by adding together the number of active products that you have listed, the number of blog posts and the number of content pages (T&Cs page, privacy policy, about us page, etc.), then adding 50% to that number.
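The estimate above can be sketched as a few lines of code. This is only a rule of thumb; the 50% headroom is the rough allowance described above for category pages, tag pages and other pages your platform generates automatically:

```python
def estimate_expected_pages(products: int, blog_posts: int, content_pages: int) -> int:
    """Rough estimate of how many pages should appear in Google's index.

    Sums the main page types, then adds 50% headroom for pages the
    platform generates automatically (categories, tags, archives, etc.).
    """
    base = products + blog_posts + content_pages
    return int(base * 1.5)

# e.g. a store with 400 products, 30 blog posts and 5 content pages
print(estimate_expected_pages(400, 30, 5))  # 652
```

If Search Console reports a number far from this estimate in either direction, that is the cue to dig further, as described below.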

An indication that you may have a crawling or indexing problem can be found by comparing the number of pages that you think should be in the index with the number of pages that Google is telling you it has indexed. The most accurate indication of this is reported within your Google Search Console account under Google Index > Index Status. If you don’t have easy access to Search Console then you can carry out a site search by entering your domain name into a standard Google search in the following format: ‘site:yourdomain.com’.

If the number of pages being reported is much lower than you expected, this indicates that there may be a problem: Google either can’t find your pages or is deciding not to index them.

If the number is much higher, it is possible that you have a technical issue where your website is creating web addresses that are being indexed when they should have been blocked. A good example is where a product search box on a website creates an unlimited number of web pages that should be blocked from being crawled/indexed. We once had a client who had just 400 products on their site, but Google was aware of more than 8 million pages.

If you experience a sudden drop in traffic to your website, the very first thing that you should do is check that you aren’t blocking search engines from crawling your site. You can do this by creating a Google Search Console account and using Google’s robots.txt testing tool. If you aren’t able to test your site yourself then you should get help as soon as possible to find out why your web traffic has disappeared.