
When you open your search engine and enter a query, you’re not searching the entirety of the internet. You’re only searching what Google, or whichever search engine you prefer, has indexed.

Google has said the web contains more than 130 trillion pages, and its index holds hundreds of billions of them. For any given query, Google may need to consider thousands (or even millions!) of potential results. On top of that, it has to do all this in a fraction of a second.

So, how do Google and other search engines accomplish this? Here, we’ll walk you through how search engines discover URLs, what they do when they find them, and why all this is important for ranking.

What is a search engine and how does it work?

A search engine is a software program that helps answer people’s queries and find different types of information online.

Search engines view and organise the information they find in three main steps:

1. Crawling

In this first step, the search engine collects as much information as it can about your web page. There are hundreds of billions (or perhaps trillions) of web pages to be crawled, so how often your page is crawled depends on many factors, such as its popularity and your site’s structure.

2. Indexing

Next, if the search engine concludes the page will be of value to searchers (such as by providing original content or answering a query), it stores it within a massive digital library. When the page is indexed, it will be categorised based on all the information collected during the crawl step. This allows the search engine to retrieve it quickly during search.

Each time your page is crawled, it may be reindexed and potentially recategorised if you have added new content, keywords, or other optimisation elements.

3. Ranking

Finally, search engines dynamically rank web pages in response to search queries. They aim to provide the most relevant search result for any possible query a user might have.

How do search engines crawl?

Search engines crawl using bots called web crawlers (also known as spiders or search engine bots). As these bots encounter hyperlinks on a web page, they may follow them and collect information about the pages they lead to.

You can imagine these bots as spiders crawling through the links across your site to access every corner of it, collecting information and studying how the pages are interconnected. This is one of the reasons it’s so important to have a good internal linking structure across your website.
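To make the spider analogy concrete, here’s a minimal breadth-first crawler in Python. It’s a sketch, not how any real search engine is built: the site is a hypothetical in-memory dictionary (`SITE`) standing in for live HTTP requests, and `crawl` and `LinkExtractor` are illustrative names. Notice that `/orphan`, which no other page links to, is never discovered.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "/blog" against the current page URL.
                self.links.append(urljoin(self.base_url, href))


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: visit a page, queue the links it contains, repeat."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page missing or fetch failed
            continue
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical site held in memory instead of fetched over HTTP.
SITE = {
    "https://example.com/": '<a href="/blog">Blog</a> <a href="/about">About</a>',
    "https://example.com/blog": '<a href="/blog/post-1">Post 1</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/blog/post-1": "<p>No links here.</p>",
    "https://example.com/orphan": "<p>Nothing links to this page.</p>",
}

print(crawl("https://example.com/", SITE.get))
```

Real crawlers layer scheduling and prioritisation on top of this basic loop, but the core idea is the same: pages are only found if something links to them (or they are submitted some other way).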

How do search engines find new pages?

Search engines typically find new pages through hyperlinks on pages they have already seen. These hyperlinks may be on other websites (backlinks) or be internal links on your own website.

They may also discover pages through the sitemap you provide. A sitemap is a blueprint of your website: a file that tells search engines where the important pages of your site are.
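Sitemaps are usually XML files following the sitemaps.org protocol: a `<urlset>` element containing one `<url>` entry per page, each with a `<loc>` (the page address) and an optional `<lastmod>` date. Here’s a small Python sketch that generates one; the URLs and dates are hypothetical placeholders, and `build_sitemap` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for a list of (url, last_modified_date_or_None) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:  # <lastmod> is optional in the protocol
            ET.SubElement(url, "lastmod").text = lastmod
    ET.indent(urlset)  # pretty-print for readability (Python 3.9+)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/blog/post-1", None),
]))
```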

You can also ask Google directly to crawl a page through Google Search Console. When you inspect a URL with the URL Inspection tool and request indexing, the URL is added to Google’s crawl queue, although this doesn’t guarantee it will be indexed.

What do search engines see when they crawl my webpage?

Text

Search engines try to understand the content through the language on the page (including keywords). They understand text partly through context, so it is important to consider which query your text is addressing.

Page Structure

Crawlers use subheadings and other structural elements to gather further information about how the content is organised and what the page may contain.

Code associated with non-text elements

You can also describe non-text elements, such as videos and images, in the page code. For example, you can describe an image to search engines by adding descriptive text in its alt attribute.
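If you want to check your own pages, a short script can flag images with missing or empty alt text. This sketch uses Python’s built-in `html.parser`; the sample markup and the `images_missing_alt` helper are hypothetical. An empty `alt=""` is a legitimate choice for purely decorative images, so treat the output as a list to review rather than a list of errors.

```python
from html.parser import HTMLParser


class ImageAltAudit(HTMLParser):
    """Records the src and alt attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.images = []  # list of (src, alt) pairs; alt is None if absent

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            self.images.append((attributes.get("src"), attributes.get("alt")))


def images_missing_alt(html):
    """Return the src of each image whose alt text is missing or blank."""
    audit = ImageAltAudit()
    audit.feed(html)
    return [src for src, alt in audit.images if not alt or not alt.strip()]


page = """
<img src="/img/red-running-shoes.jpg" alt="Pair of red running shoes on a track">
<img src="/img/IMG_0042.jpg">
<img src="/img/divider.png" alt="">
"""
print(images_missing_alt(page))
```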

Metadata

These are additional snippets of text associated with your web page that describe its contents. Title tags and meta descriptions are common types of metadata that help search engines understand what’s on your web page.
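The title and meta description sit in the page’s `<head>`, as a `<title>` element and a `<meta name="description">` tag. This Python sketch pulls both out of a page, roughly the way a crawler might read them; the sample page, `MetadataExtractor` and `read_metadata` are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Captures the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:  # title text may arrive in several chunks
            self.title += data


def read_metadata(html):
    parser = MetadataExtractor()
    parser.feed(html)
    return {"title": parser.title.strip(), "description": parser.description}


page = """<html><head>
<title>Red Running Shoes | Example Store</title>
<meta name="description" content="Lightweight red running shoes for road and trail.">
</head><body><h1>Red Running Shoes</h1></body></html>"""

print(read_metadata(page))
```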

Structured data markup

Structured data is a standardised, machine-readable format that helps Google and other search engines understand what’s on your page even more reliably.
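A common way to add structured data is a JSON-LD block using the schema.org vocabulary. The sketch below builds one for a hypothetical blog post with Python’s `json` module; `Article`, `Person`, `headline`, `author` and `datePublished` are schema.org terms, while the values and the `article_jsonld` helper are placeholders. The resulting `<script type="application/ld+json">` tag would go in the page’s HTML.

```python
import json


def article_jsonld(headline, author, date_published):
    """Wrap schema.org Article data in a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,  # ISO 8601 date
    }
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


print(article_jsonld("Choosing Running Shoes", "Jane Doe", "2024-01-15"))
```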

Page Experience – Rendering

Crawlers will also attempt to render your page and see how (and what) loads. This can indicate what kind of page experience a visitor might have, how much information they’ll see, and how quickly. The more accessible the information is to the bot, the more likely it is to be served in search.

What is indexing?

Once a page has been crawled, the search engine can index it. If the page is considered valuable to users, it will be added to a massive database designed to serve search queries.

As you can imagine, the more completely the search engine can categorise and “understand” the information, the more likely the page is to be served in relevant searches.
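One classic data structure behind fast retrieval is an inverted index, which maps each word to the pages that contain it, so a query only has to look up its words instead of scanning every page. This is a simplified Python sketch over a hypothetical three-page site; real search indexes record much more about each page and aren’t necessarily built exactly this way.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def lookup(index, query):
    """Return the pages that contain every word in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())  # keep only pages that have this word too
    return results


PAGES = {
    "/running-shoes": "Lightweight running shoes for road and trail",
    "/trail-guide": "A beginner's guide to trail running",
    "/about": "About our small family-run shop",
}

index = build_index(PAGES)
print(lookup(index, "running shoes"))  # {'/running-shoes'}
```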

Will my web pages always be indexed?

Not necessarily. Search engines aim to index the best quality pages that are likely to be the most relevant for searches. If your page is unlikely to be served in a search, it’s unlikely to be indexed.

Why might pages not be indexed?

The page is new

The search engine may not have seen it yet. This is common with new content.

Not enough text

Sometimes search engines cannot understand the purpose of a page if there is not much text, or if the text is not relevant to any question or query.

Low quality content

Sometimes the content will simply be assessed as not high enough quality to justify indexing.

Duplicate content

If content is too similar to content on another site, or even to another page on your own website, search engines may treat it as duplicate content and choose not to index that page. This is why archive pages or blog author pages are often not indexed.
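How similar is “too similar”? Search engines don’t publish their exact methods, but a well-known technique for spotting near-duplicates is shingling: break each text into overlapping runs of words and measure the overlap with Jaccard similarity. This Python sketch uses three-word shingles and made-up sample text; it illustrates the idea, not any search engine’s actual check.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences ("shingles") in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(text_a, text_b, k=3):
    """Jaccard similarity of two texts' shingle sets: 1.0 = identical, 0.0 = no overlap."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = "Our red running shoes are lightweight, breathable and built for long distances on the road."
near_copy = "Our red running shoes are lightweight, breathable and built for long distances on the track."
unrelated = "Opening hours: Monday to Friday, nine until five."

print(round(similarity(original, near_copy), 2))  # 0.86
print(similarity(original, unrelated))            # 0.0
```

Changing a single word leaves most shingles shared, which is why lightly reworded copies still score as near-duplicates.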

Bad rendering

If the page’s rendering is bogged down by JavaScript elements that prevent the search engine from seeing your content, it can certainly affect indexing. Likewise, if a page takes too long to load, search engines may not index or rank it well because of the poor user experience.

Other technical issues

For example, URLs that are broken or that redirect elsewhere will not themselves be indexed. If an indexed page starts returning 4xx or 5xx errors persistently, there’s also a chance that it could be de-indexed.
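HTTP status codes are how a server tells a crawler what happened when it requested a URL. The sketch below groups them roughly as described above and flags a URL that keeps failing. The function names are illustrative, and the three-crawl `threshold` is a hypothetical number chosen for the example; search engines don’t publish how long they wait before dropping a page.

```python
def classify_status(status):
    """Roughly group an HTTP status code by what it means for indexing."""
    if 200 <= status < 300:
        return "ok"            # content can be evaluated for the index
    if 300 <= status < 400:
        return "redirect"      # the target URL, not this one, is the candidate
    if 400 <= status < 500:
        return "client error"  # e.g. 404 Not Found, a broken link
    if 500 <= status < 600:
        return "server error"  # e.g. 503 Service Unavailable
    return "unknown"


def persistently_failing(history, threshold=3):
    """True if each of the last `threshold` crawls returned a 4xx or 5xx code.

    `threshold` is a hypothetical cut-off for illustration only.
    """
    recent = history[-threshold:]
    return len(recent) == threshold and all(
        classify_status(s) in ("client error", "server error") for s in recent
    )


print(persistently_failing([200, 200, 503, 503, 503]))  # True
print(persistently_failing([200, 503, 200, 503]))       # False
```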

How does a search engine rank results?

Search engines use complicated algorithms to present the most useful results in search. These algorithms are designed to:

Serve the most relevant results

Of all the items indexed, search engines aim to rank the ones most relevant to the searcher’s query.

Interpret search intent

Search engines are not just trying to answer your question, but to understand exactly what it is you want to know. There is often intent behind search terms that search engines aim to understand and address.

Tailor the results to the searcher

Search engines may offer different results depending on your location – such as if you’re looking for local services – or even based on your previous search history.

It’s important to note that these algorithms shift and change as search engines refine their systems and processes, in order to improve how they understand intent and parse content.
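Modern ranking algorithms combine a great many signals and aren’t public, but a simple textbook example of relevance scoring is TF-IDF: a page scores higher when it uses the query’s words often (term frequency) and those words are rare across the collection (inverse document frequency). This Python sketch ranks three hypothetical pages with an illustrative `rank` function; it shows the idea of relevance, not how Google or any other engine actually ranks results.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, pages):
    """Score each page against the query with basic TF-IDF, best match first."""
    docs = {url: Counter(tokenize(text)) for url, text in pages.items()}
    n_docs = len(docs)
    scores = {}
    for url, counts in docs.items():
        total_terms = sum(counts.values()) or 1  # avoid dividing by zero on empty pages
        score = 0.0
        for term in set(tokenize(query)):
            # Document frequency: how many pages contain the term at all.
            df = sum(1 for other in docs.values() if term in other)
            if df == 0:
                continue  # term appears nowhere, so it contributes nothing
            tf = counts[term] / total_terms
            idf = math.log(n_docs / df)  # rarer terms weigh more
            score += tf * idf
        scores[url] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


PAGES = {
    "/running-shoes": "Running shoes for road running. Our running shoes are light.",
    "/trail-guide": "A guide to trail running for beginners.",
    "/about": "About our shop.",
}

# /running-shoes first, then /trail-guide, then /about
for url, score in rank("running shoes", PAGES):
    print(f"{score:.3f}  {url}")
```

Here “running” appears on two of the three pages, so it counts for less than “shoes”, which appears on only one.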
