How do Search Engines work? Crawling, Indexing & Ranking

When I started out as a blogger, I had no idea about anything related to blogging, let alone how search engines work. There were very few reliable resources I could learn from.

I’m not defending myself, but the point I want to make is that I lacked knowledge. I didn’t appreciate the importance of understanding how search engines work until it was too late.

Had I taken the time to dig around the internet & learn more about it, I would’ve been a very different blogger. Back then, very few people were even talking about it.

This ends today. In this post, I’m going to share exactly what you need to know about search engines to grow as a blogger.

Let’s get into it.

Why should you care about how search engines work?

Search engines have come a long way & still have a long way to go. Unlike in the dot-com days, the internet is no longer considered a bubble. Heck, it’s a necessity.

Imagine a world without the internet. You can’t make payments, you can’t book cabs, you can’t take those office meetings, everything would shut down.

93% of online experiences begin with a search engine, & in more than 8 out of 10 cases, that search engine is Google.

Google handles around 5.5 billion searches every single day. People are constantly looking for information to either solve their daily life problems or make an informed decision. This means there’s a huge chance for your blog to get a ton of traffic every single day.

Now, it’s not that straightforward to get traffic from search engines like Google & Bing, thanks to the drastic evolution they have gone through over the years.

Not too long ago, just a couple of years back, it was much easier to get traffic from Google. It’s getting tougher day by day because more & more people (4.6 billion internet users) are searching, & there’s a huge amount of content that Google has to sort through to find the relevant information for the end-user.

That need for ‘relevant’ information is what makes Google stricter with its rankings, & it’s getting tougher to get traffic from Google, or any search engine for that matter.

To rank the search results based on relevancy, Google uses over 200 ranking factors or signals.

Here’s a 10,000-foot overview of how search engines work, by none other than Matt Cutts himself:

Since it’s this complex to even be eligible to compete for rankings, it’s important to learn how search engines work. Because if you don’t know how it works, how are you going to make it work for you?

So, let’s get to it right away.

Understanding the three major components of search engines

I don’t intend to say that these are the only components of search engines. If you’re a nerd who wants more, check out this whitepaper from the days when Google was a college project called BackRub.

These three, however, are the only ones we should care about:

#1 Web Crawler

To serve the right results, a search engine needs information stored in advance that it can evaluate. That’s the job of web crawlers, which gather information from the billions of webpages Google crawls every single day.

A web crawler is software that follows links: from one page to the next, & the next, & so on. The web crawler never stops crawling webpages. Picture it as a robot whose only job is to dig through the web, find links & add them to Google’s database.

When you search on Google, you’re searching a fraction of what the web has to offer. Google keeps adding more & more links to its database so that when users search for something, the search engine can serve results from that database.
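The link-following loop described above can be sketched in a few lines of Python. This is a toy illustration, not how Googlebot actually works; the `fetch` function is a stand-in for a real HTTP client with politeness delays & robots.txt checks:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl: fetch a page, queue its links, repeat.

    `fetch(url)` returns the page's HTML (or None); in real life this
    would be an HTTP GET with rate limiting & robots.txt handling.
    """
    queue, seen, index = deque([seed_url]), {seed_url}, {}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        index[url] = html          # bring the page back to the "server"
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:  # follow the links one by one
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index
```

Real crawlers also deduplicate content, respect crawl budgets & revisit pages on a schedule, but the core idea is this queue of discovered links.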

Picture a trader maintaining a stock of grains in his godown (warehouse) so that he can trade it for money when customers ask for it. The trader keeps the stock full based on demand.

Here, the trader is our beloved Google, & the end-users are you & I. The stock present in the godown is just a fraction of what the whole world has to offer. The web is so huge that, by some estimates, Google has managed to crawl only around 4% of it in over two decades of operation.

The only job of the web crawler is to keep crawling new URLs & adding them to the database, called the index. Google begins with a set of known pages & a list of web addresses from past crawls. Then the crawler follows the links one by one.

The crawler also uses the sitemaps that you, as a website owner, should share with Google so that it can send crawlers to look for new information. These crawlers jump from one link to the next & bring information about those webpages back to Google’s servers.
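A sitemap is just an XML file listing the URLs you want crawled, optionally with the date each was last modified. A minimal example (the URLs & dates are placeholders) looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2021-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/how-search-engines-work/</loc>
    <lastmod>2021-01-20</lastmod>
  </url>
</urlset>
```

You can submit it through Google Search Console, or point crawlers at it with a `Sitemap: https://example.com/sitemap.xml` line in your robots.txt.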

Now, who takes care of organizing the information that’s flowing in at this rate?

Let’s check that out.

#2 Index

This is where the information gets organized for faster retrieval when serving the SERPs.

Organizing the web (Source: Google)

It’s been the mission of Google to organize the web since the beginning. The index acts as a central filing system. This is where all the information about the crawled webpages is stored & served when a search is made by the end-user.

What information is stored in the index? To serve the right search results to the end-users, Google needs to know & understand the webpage. That can be done by looking at the keywords the webpage has.

This doesn’t mean mentioning keywords will help you rank faster. Google looks at the overall content & tries to decipher the meaning of the webpage.

The heading tags, meta description & title tag are the highlights Google looks at to get a gist of the webpage. You should use those tags as labels to make it easier for Google to understand your webpages, nothing more than that.

Using keywords doesn’t guarantee that your page will rank, let alone stuffing a webpage with keywords. It just helps Google understand the page, nothing more. The system is designed to continuously organize the information that the web crawler brings in.

Once the end-user makes a search, the index finds the relevant information (& the corresponding webpages) & serves it to the end-user as SERPs.
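Conceptually, the index works like an inverted index: a map from each keyword to the pages that contain it, so lookups don’t have to scan every page. Here’s a toy version (the documents are made up for illustration):

```python
from collections import defaultdict

def build_index(pages):
    """Map every word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = index.get(words[0], set()).copy()
    for word in words[1:]:
        result &= index.get(word, set())  # intersect per keyword
    return result
```

Google’s real index also stores positions, tags, link data & much more, but the keyword-to-pages mapping is the core idea.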

As I’ve mentioned earlier, Google has a search index containing trillions of webpages. You couldn’t find what you’re looking for without Google sorting those pages by relevance.

Who sorts that? The algorithm.

Let’s talk about that now.

#3 Algorithm

The index provides a bunch of blue links from the database, but it’s the algorithm that personalizes those results based on several factors like the type of keyword, location, the language of the search & more.

The algorithm is designed to do this sorting every time someone searches on Google. Remember how many times users search on Google every single day? 5.5 billion searches on average.

The algorithm ranks the search results fetched from the index based on several factors & makes it relevant for that user.
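As a mental model, ranking boils down to scoring each candidate page on several weighted signals & sorting by that score. The signals & weights below are entirely made up for illustration; Google’s real factors & weights are not public:

```python
# Hypothetical signals & weights -- illustration only, not Google's real factors.
WEIGHTS = {"relevance": 0.5, "authority": 0.3, "freshness": 0.2}

def score(page):
    """Combine a page's signal values (0.0 to 1.0) into one ranking score."""
    return sum(WEIGHTS[signal] * page[signal] for signal in WEIGHTS)

def rank(pages):
    """Sort candidate pages fetched from the index, best first."""
    return sorted(pages, key=score, reverse=True)
```

Personalization then amounts to adjusting those signal values (or weights) per query, location & language before sorting.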

The results I get for a keyword in India would be different for the same keyword in the UK or Australia. “What’s the weather” is a classic example of this.

Google keeps updating the algorithm (more than twice per day on average). Since billions of users make billions of searches per day, the system has to stay efficient so that the algorithm can deliver highly relevant results. The efficiency of the algorithm is ultimately measured by relevance.

The algorithm is so complex that each query weighs each factor differently. To keep the quality bar high, Google maintains Quality Rater Guidelines, which human raters use to evaluate the quality of search results.

That’s all you have to know about the algorithm. At least to begin with. This is how the search engine delivers results to the end-user.

Now that you know about this, let me go a little further & share the types of SEO. This is to ensure that you optimize your content for the right need.

Basics of SEO: Optimize the webpage for better indexing

The working of search engines is pretty simple: gather information in advance, organize it, & serve it when asked, based on relevance.

With that said, here are some search engine optimization techniques:

On-page SEO

Remember I mentioned various tags that tell the web crawler what the page is about? Those practices are a part of on-page SEO.

On-page SEO is a factor that can make or break your plan to get organic traffic from search engines.

Here are some on-page SEO techniques to help you win the game:

  1. Produce high-quality content that addresses the search intent of the end-user
  2. Optimize title & meta tags to help search engines understand your pages
  3. Optimize the headings for a high (or at least ever-improving) CTR
  4. Optimize images for traffic from Google Images
  5. Structure the URLs for better readability & SEO standards
  6. Add internal links for higher session time & a lower bounce rate
  7. Link out to authoritative external sources to build trust
  8. Run SEO audits to keep everything in check
  9. Optimize content to retain the user on the page
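Several of the items above live directly in your page’s HTML. A minimal skeleton (the titles, text & filenames are placeholders) looks like this:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Title tag: the headline shown in the SERPs -->
  <title>How Do Search Engines Work? Crawling, Indexing &amp; Ranking</title>
  <!-- Meta description: the snippet shown under the title -->
  <meta name="description" content="A beginner-friendly look at how search engines crawl, index &amp; rank webpages.">
</head>
<body>
  <!-- One H1 stating the topic, subheadings in H2s -->
  <h1>How Do Search Engines Work?</h1>
  <h2>Web Crawler</h2>
  <!-- Alt text helps images rank in Google Images -->
  <img src="crawler-diagram.png" alt="Diagram of a web crawler following links">
  <!-- Descriptive internal link anchor text -->
  <a href="/on-page-seo/">On-page SEO guide</a>
</body>
</html>
```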

On-page SEO is a very deep & advanced topic, so I shall cover it in an upcoming dedicated post. For now, let’s keep it basic.

Off-page SEO

Off-page SEO is everything you do outside of your blog to bring traffic to your blog. It can be guest blogging, social media posts or YouTube videos, etc.

The idea is to make more & more people aware of your blog, so that they either visit it directly or follow the links you leave in other places.

The problem with this approach is that most people end up spamming these sources just to get traffic. The better approach is to repurpose your content for each platform, so that it not only brings traffic from that source but also gets a fresh look.

Plenty of people indulge in black-hat techniques, only to end up getting banned by search engines.

Technical SEO

This is for the nerds who want full control of their blogs/websites. Technical SEO is the advanced practice of keeping your blog SEO-friendly under the hood.

This includes:

  1. Optimizing robots.txt
  2. Creating sitemaps (including for subdomains, if any)
  3. Optimizing the URL structure & homepage to pass more link juice to internal or deep pages
  4. Adding structured data (schema markup) to leverage search engine features like rich results
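Structured data is usually added as a JSON-LD block in the page’s head. A minimal Article markup (the headline, name & date are placeholders) looks like this:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How Do Search Engines Work?",
  "author": { "@type": "Person", "name": "Your Name" },
  "datePublished": "2021-01-20"
}
</script>
```

You can validate markup like this with Google’s Rich Results Test before relying on it.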

Webmaster guidelines: Make the most out of search engines

Google has to go through a lot of pages & organize them. There’s a huge cost to maintaining this massive library of data. On top of that, Google has to maintain quality for the end-users.

Google has faced a lot of criticism for irrelevant results on its SERPs. The search engine is still not perfect, but it’s very strict about enforcing whatever quality standards it has.

Here are some basic guidelines to be in the good books of Google:

#1 Google should be able to find your pages

You can’t expect your pages to appear in Google search if the search engine can’t find them in the first place. You have to be discoverable.

  1. Submit sitemaps to Google so that it can crawl your webpages & keep its index up to date.
  2. Use robots.txt to instruct search engines not to crawl the infinite spaces of your blog (like endless filter or calendar URLs) & waste the crawl budget.
  3. Don’t spam with internal links. Even if Google can handle a large number of links per page, it’s not ideal for a user to wade through them. It doesn’t help.
  4. Support the If-Modified-Since HTTP header. It lets search engines ask whether the content has changed since the last crawl, so your server can reply with 304 Not Modified instead of the full page.
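On the server side, honoring If-Modified-Since is a small conditional. Here’s a framework-agnostic sketch of the logic using only Python’s standard library (your web framework likely handles this for you):

```python
from email.utils import format_datetime, parsedate_to_datetime

def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a conditional GET.

    `last_modified` is a timezone-aware datetime of the page's last edit;
    `if_modified_since` is the raw header value the crawler sent, if any.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        since = parsedate_to_datetime(if_modified_since)
        if last_modified <= since:
            # Nothing changed: tell the crawler to reuse its cached copy
            return 304, headers
    # Changed (or first visit): send the full page
    return 200, headers
```

The 304 path saves bandwidth on both sides & lets the crawler spend its budget on pages that actually changed.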

#2 Help Google understand your webpages

  1. Create content that the end-users would find useful. It’s high time that we, the content creators, focus on search intent rather than keywords.
  2. Optimize title & alt tags for the search engines to understand. Remember, stuffing keywords is a red flag.
  3. All your assets should be crawlable so that they can be indexed for relevant keywords. Once they start getting indexed, you can update the content & improve the CTR & rankings for better results.
  4. Optimize the homepage so crawlers can discover & understand your content. The most important pages of your blog should be visible by default; add them to the top menu for ease of access.
  5. Add rel="nofollow" or rel="sponsored" to advertisement links in your HTML to instruct search engines not to pass credit through them.
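Those rel attributes go directly on the links in your HTML (the URLs below are placeholders):

```html
<!-- A paid/advertisement link: don't pass ranking credit -->
<a href="https://advertiser.example.com/product" rel="sponsored">Sponsored product</a>

<!-- A link you don't want to vouch for, e.g. in user comments -->
<a href="https://unknown.example.com/page" rel="nofollow">User-submitted link</a>
```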

#3 Create user friendly pages

  1. The whole point of blogging is to give a great user experience to the end-users. UX is a ranking factor. Give your users just that.
2. Use a readable font size & good contrast between text & background colors.
3. Use more text than images. If you do use images, give them proper alt tags so that the crawler can understand them & index them in Google Images.
  4. Optimize the speed by using CDNs, lighter themes, faster web hosting, etc. Use tools like Page speed insights to determine your score.
5. Have mobile-friendly themes. Use AMP so your pages not only load lightning-fast but are also highly readable on mobile screens. Use this tool to check for mobile-friendliness.

TL;DR: Final thoughts

This brings us to the end of the post. If you’ve made it this far, you’re awesome. I wanted to share this post because this knowledge is crucially important, especially for beginners.

I’ve seen so many questions from people who put a lot of effort into getting traffic from Google without understanding the basics first. How search works is one of the basics a lot of beginners fail to pay attention to.

This post is for them. If you want to start a blog or already have but don’t get traffic from Google, this post is very important. If you know someone who’s looking for this information, share this post with them.

It doesn’t cost anything to share. I also have a daily-ish newsletter, absolutely free, where I share extended versions of the posts on this blog along with a roundup of top posts from across the web.

You can get in touch with me on Twitter or Instagram if you have any specific questions. I’d love to hear from you.

I’ll see you in the next one.