It’s probably obvious to everyone, but when we perform online searches, we’re looking at the past.
Published in the past
This is probably the most obvious factor: for search engines to display results, they have to know about them… which means they have previously crawled them and their sources. Of course, this discovery latency can be reduced by crawling and indexing faster, or by using mechanisms such as PubSubHubbub.
Reducing this latency is one of the challenges search engines face. When it’s small enough, search can become prospective: the engine keeps track of what the user is looking for and sends them new results as they appear. This is what grep.io does, for example.
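To make the idea of prospective search concrete, here is a minimal sketch. It assumes a toy matching rule (every query term must appear in the new document) and hypothetical names (`ProspectiveIndex`, `subscribe`, `publish`); it is not how grep.io or any real engine is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ProspectiveIndex:
    # Maps a user name to the set of lowercase terms they are watching for.
    # Both the class and its methods are hypothetical, for illustration only.
    standing_queries: dict = field(default_factory=dict)

    def subscribe(self, user, query):
        """Remember the query so it can be matched against future documents."""
        self.standing_queries[user] = set(query.lower().split())

    def publish(self, document_text):
        """Return the users whose query terms all appear in the new document."""
        words = set(document_text.lower().split())
        return sorted(
            user
            for user, terms in self.standing_queries.items()
            if terms <= words  # every query term is present in the document
        )


index = ProspectiveIndex()
index.subscribe("alice", "feed api")
index.subscribe("bob", "search engine")
notified = index.publish("A new Feed API for everyone")
```

The query is stored before the document exists, which is the reverse of a classic search: the document is what "searches" the queries when it is published.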
Many signals used by search engines are external: they’re not pulled from the documents themselves, but from other documents or from user actions. For example, the PageRank algorithm relies on the links pointing to each document, which it treats as votes. These votes, too, are from the past.
If a new document is published now, it’s unlikely to rank high until other documents link to it and people select it from search results pages more often than its neighbors.
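The effect can be seen with a simplified PageRank computed by power iteration. This is only a sketch: the link graph, the page names, the damping factor of 0.85 and the fixed iteration count are all assumptions made for the example, and real engines combine many more signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts with the small share that ignores links entirely.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: "new" has just been published and nobody links to it yet.
links = {"a": ["b"], "b": ["a"], "c": ["a", "b"], "new": ["a"]}
ranks = pagerank(links)
```

Here `new` links out to `a` but receives no votes, so it keeps only the baseline share and scores below `a` and `b`, whatever its content is worth.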
This is another challenge for search engines: surfacing behavioral trends faster, as well as providing different time scales for these signals.
For example, when searching for “Feed API”, search engines could obviously show the page which has the most links with this term over the last 10 years… or the page which received the most links in the last 24 hours, or even the largest acceleration (number of recent links compared to its average) in the last day.
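These three views of the same link data can be sketched as follows. The function name `link_scores`, the input format (a list of timestamps at which inbound links appeared) and the definition of acceleration as "links in the last 24 hours divided by the average per 24 hours over the history" are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def link_scores(link_times, now,
                history=timedelta(days=3650), window=timedelta(hours=24)):
    """Score a page from the times at which inbound links to it appeared."""
    in_history = [t for t in link_times if now - history <= t <= now]
    recent = [t for t in in_history if now - t <= window]
    total = len(in_history)
    # Average number of links per window over the whole history.
    average_per_window = total / (history / window)
    # Recent links compared to that average; 0.0 when the page has no links.
    acceleration = len(recent) / average_per_window if average_per_window else 0.0
    return {"total": total, "recent": len(recent), "acceleration": acceleration}


now = datetime(2024, 1, 1)
# Hypothetical data: an established page, and a page that appeared today.
established = link_scores([now - timedelta(days=d) for d in range(10, 1010)], now)
fresh = link_scores([now - timedelta(hours=h) for h in range(1, 6)], now)
```

Ranking by `total` favors the established page, while ranking by `recent` or `acceleration` favors the fresh one: the same signal, read at different time scales, gives different results.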
There can’t be only one
Google’s current domination of this market makes us think that there’s only a single approach to building a search engine. Not only should we challenge the business model, the interface and the experience, but there are also ways to challenge the signals and how they’re interpreted, in order to provide other results to users.