Elasticsearch distance scoring

09 Dec 2016, Fabian Steeg | 🏷 lobid-organisations 

In the beta of our organisations directory, we sorted results by distance to the user if they shared their location. Adrian noticed that this yields confusing results, since the distance sorting completely overrides the relevance ranking. A quick look at the fabulous Elasticsearch documentation revealed that what we actually want is to score results by distance, not sort by distance.

The basic solution was straightforward: implement a scoring function that scores the results returned by the regular query based on their distance to the user’s location. We did encounter one issue though: not all our organisations have geo coordinates, and Elasticsearch gives those with a missing location.geo field a perfect distance score, so they rank as if they were right at the user’s location.
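As a rough illustration of scoring by distance rather than sorting, here is a minimal sketch of a request body that wraps a regular query in a `function_score` query with a decay function on `location.geo`. The post does not say which decay function or parameters we used, so the `gauss` function, the `query_string` query, the `100km` scale and the helper name `distance_scored_query` are all assumptions made for the example.

```python
def distance_scored_query(text, lat, lon, scale="100km"):
    """Build a request body that scores hits by distance to (lat, lon).

    Hypothetical helper: the decay function and the scale are
    illustrative choices, not taken from the original setup.
    """
    return {
        "query": {
            "function_score": {
                # The regular query still decides which documents match
                # and provides the base relevance score.
                "query": {"query_string": {"query": text}},
                "functions": [
                    {
                        # Score falls off smoothly as the distance between
                        # location.geo and the origin grows.
                        "gauss": {
                            "location.geo": {
                                "origin": {"lat": lat, "lon": lon},
                                "scale": scale,
                            }
                        }
                    }
                ],
                # Combine distance and relevance instead of replacing one
                # with the other.
                "boost_mode": "multiply",
            }
        }
    }


body = distance_scored_query("bibliothek", 50.93, 6.95)
```

With `boost_mode` set to `multiply`, a distant but highly relevant result can still outrank a nearby but weak match, which is the behaviour that plain distance sorting lacks.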

Elasticsearch might provide a way to specify a value for missing fields in the future, but in the meantime, there’s a documented workaround that works fine. For our specific setup though, the fact that the workaround includes a script meant some extra work: we run Elasticsearch in embedded mode and have Groovy scripting disabled. So instead of adding a simple line to the JSON, we implemented the script as a native script in Java. It’s registered when creating the Node.
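To make the effect of the missing field concrete, here is a small standalone model in Python. It is not the workaround script and not our Java native script. It only mimics the documented Gaussian decay formula and the "missing field scores 1" behaviour, and shows what changes once missing locations get a low fallback score. The function names, the 100 km scale and the `0.1` fallback value are assumptions chosen for the example.

```python
import math


def gauss_decay(distance_km, scale_km=100.0, decay=0.5):
    """Gaussian decay: 1.0 at distance 0, `decay` at `scale_km`."""
    sigma_sq = -scale_km ** 2 / (2.0 * math.log(decay))
    return math.exp(-(distance_km ** 2) / (2.0 * sigma_sq))


def distance_score(distance_km, missing_score=1.0):
    """Distance score for one document.

    `distance_km` is None when the document has no coordinates.
    The default `missing_score` of 1.0 models the problematic
    behaviour; a low value models the fix (hypothetical value).
    """
    if distance_km is None:
        return missing_score
    return gauss_decay(distance_km)


nearby = distance_score(50.0)                    # organisation 50 km away
default = distance_score(None)                   # no coordinates, perfect score
fixed = distance_score(None, missing_score=0.1)  # no coordinates, penalised
```

In this model, `default` beats `nearby`, so an organisation without coordinates outranks one that is actually close to the user, while `fixed` falls below it.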

Once again I’m impressed with all the stuff Elasticsearch offers. It’s a fantastic tool.

Comments? Feedback? Just add an annotation with hypothes.is.