Grim's web corner

Notes, essays and ramblings

Sorting things, rank-aggregation (beginner's approach)

Last year, a friend of Xavier Van de Woestyne asked him a question for which he had some inklings but lacked precise answers. Given his professional commitments, he requested that I provide a succinct response on his behalf. This article seeks to outline the issue concisely and offer a considered solution. As with many problems, knowing the name of the problem makes it much easier to find a solution. The goal of the response I provided (and, by extension, of this article) was to offer a simple solution that could be easily integrated into an existing database. For the sake of clarity (and conciseness) in the article, performance considerations are not taken into account.

# open Article_lib.Rank_aggregation ;;
# #install_printer pp_set ;;

Context

Let’s imagine a list of products where users can vote using up and down options. How might we go about sorting these results in an intelligent manner? Here is an example of our data source: we have the name of a product, an amount of voters and a result. The result is the percentage calculated between the number of positive votes and the number of negative votes :

# dataset ;;
- : data list =

Name		Voters	Result	Upvotes	Downvotes

Product A	  100	32.0	   32	   68
Product B	 1000	89.0	  890	  110
Product C	    2	100.0	    2	    0
Product D	 3000	51.0	 1530	 1470
Product E	  759	69.0	  524	  235
Product F	  590	20.0	  118	  472
Product G	  100	25.0	   25	   75

This kind of modelling of data by expressing the popularity of an entry is fairly common; it's the kind of representation on which the sorting of Reddit messages is based, for example.

Problem sketch

Our exercise is relative. We want to provide a function capable of sorting products in a relevant way. Obviously, it is difficult to give a systematic definition of relevance for any context, but let's look at this example. Let's say I want to sort products "from most popular to least popular" (in descending order of popularity). I could very naively sort my products using the result, which is the percentage calculated on the basis of up/down votes.

# sort (fun a b -> Float.compare b.result a.result) ;;
- : data list =

Name		Voters	Result	Upvotes	Downvotes

Product C	    2	100.0	    2	    0
Product B	 1000	89.0	  890	  110
Product E	  759	69.0	  524	  235
Product D	 3000	51.0	 1530	 1470
Product A	  100	32.0	   32	   68
Product G	  100	25.0	   25	   75
Product F	  590	20.0	  118	  472

We can argue that our result is objectively good because the product C which received 100% positive votes is in first position. There are many cases in which this answer would be acceptable. However, at a time when recommendation engines are building streams of consumption (like Netflix, Spotify, etc.), the organisation of this kind of vote raises some interesting questions, for example:

We can arbitrarily consider that in the case of the first question, product C ranks lower than product B, which has many more positive votes. On the other hand, in the second question, even though product D has far more positive votes, the large number of negative votes mitigates against its ranking. This kind of question raises a set of more general questions about the relationship between upvotes and downvotes and is broadly referred to as rank aggregation problems which is widely documented in the statistical and Bayesian interpretation literature.

The problem is also extremely well summarized in the introduction of the article "How Not To Sort By Average Rating" by Evan Miller.

Some bizarre solutions

In drafting his question, Xavier suggested a few avenues, a little strange, but which illustrate the versatility of the problem (the first proposal being that using only the percentage). All the solutions relied on changes in weighting and didn’t lead to results that one might instinctively consider valid. However, I would like to share with you his latest proposal, which I found quite original.

Firstly, weighting slices are defined. The idea is to allow you to use the percentage only if you are in the same slice.

let slice_of data = 
  let p = result data |> Float.to_int in
  if p = 100 then 8 else p / 8

We assume that if we have a perfect score (100), we project the result into the last slice, 8, so that the score can be compared with the results between ~90 and 100 (inclusive). Now we can refine our heuristic so that it only produces a fine comparison when scores are present in the same slice:

# sort (fun a b -> 
    let slice_a = slice_of a and slice_b = slice_of b in
    if Int.equal slice_a slice_b then
      let a = a.result *. 
        (Float.of_int a.amount_votes *. (a.result /. 100.0))
      and b = b.result *. 
        (Float.of_int b.amount_votes *. (b.result /. 100.0)) in
      Float.compare a b
    else Int.compare slice_b slice_a) ;;
- : data list =

Name		Voters	Result	Upvotes	Downvotes

Product B	 1000	89.0	  890	  110
Product C	    2	100.0	    2	    0
Product E	  759	69.0	  524	  235
Product D	 3000	51.0	 1530	 1470
Product A	  100	32.0	   32	   68
Product G	  100	25.0	   25	   75
Product F	  590	20.0	  118	  472

The result seems correct, however, the calculation of the pivot to define the slices seems a little arbitrary (mainly because this approach is problematic where there are strong proximities with the boundaries of a slice) and it would be interesting to see what the big ad sites offer.

Some preliminary conclusions

In the light of our few small experiments, we can quickly draw a number of interesting pre-conclusions. To avoid products with too many (potentially negative) votes, i.e. 10,000 positive votes against 20,000 negative votes, we probably need to weight the positive votes more heavily (in other words, penalise the negative votes) by means of exponential curves.

In our example of slices, the indexing gave some exponential differentiation, but in a very ad hoc way. To break these ad hoc encodings, I reread certain chapters of the book Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference.

The Shopify approach

The book provides an extremely precise answer to a sorting problem based on reviews used at Shopify (a toolbox offering advanced functionalities for implementing e-commerce platforms), and here is a fairly free implementation. Firstly, we define a function to assign a score to an input (a product). We will then use the comparison of these scores to sort our dataset.

let shopify_score data = 
  let (total_up, total_down) = up_down data in
  let up = total_up |> succ |> Float.of_int 
  and down = total_down |> succ |> Float.of_int in 
  let total = up +. down in
  let base = up /. total 
  and coef = 1.65 *. sqrt (up *. down /. (Float.pow total 2.0 *. (total +. 1.0))) in
  base -. coef

Now we can use our score to order our dataset. As you can see, the result seems reasonable:

# sort (fun a b -> 
    let a = shopify_score a
    and b = shopify_score b in
    Float.compare b a) ;;
- : data list =

Name		Voters	Result	Upvotes	Downvotes

Product B	 1000	89.0	  890	  110
Product E	  759	69.0	  524	  235
Product D	 3000	51.0	 1530	 1470
Product C	    2	100.0	    2	    0
Product A	  100	32.0	   32	   68
Product G	  100	25.0	   25	   75
Product F	  590	20.0	  118	  472

To (try to) confirm the relevance of our result, we can slightly modify the number of positive votes for our `product C', to see how it grows in the list of reviews:

# sort ~dataset:(update "Product C" ~up:20 ~down:0) 
   (fun a b -> 
      let a = shopify_score a
      and b = shopify_score b in
      Float.compare b a) ;; 
- : data list =

Name		Voters	Result	Upvotes	Downvotes

Product C	   20	100.0	   20	    0
Product B	 1000	89.0	  890	  110
Product E	  759	69.0	  524	  235
Product D	 3000	51.0	 1530	 1470
Product A	  100	32.0	   32	   68
Product G	  100	25.0	   25	   75
Product F	  590	20.0	  118	  472

From my point of view, which is clearly not that of an expert in "presenting ordered data", I find the results rather convincing, and they provide a good basis for sorting, which could be altered by taking into account temporal data (for example) or additional criteria. It's possible to go much further, but if we take the origin of this response in context, providing a function capable of ordering product lists according to reviews (up/down vote) taking into account the number of votes, I found this solution to be very useful. One can quickly fall down the rabbit hole (for example, through the Binomial proportion confidence interval), discovering increasingly sophisticated sorting solutions, and one of the challenges of this exercise was to remain concise while seamlessly integrating with an existing sorting function.

Conclusion

Sorting data based on reviews is a complex task that likely requires advanced domain knowledge. For instance, the heuristic would probably differ significantly if you were ordering products versus posts on a social network (like Reddit). However, it's an enjoyable challenge (that lends itself quite well to literate programming).

In fact, before settling on the Shopify approach, I was very intrigued by the methods used by Reddit to calculate message scores. It’s a fascinating topic that led me to several interesting resources (for example, this article describing how comments are sorted, written by the author of XKCD, Randall Munroe), which I've included in the bibliography (even though they didn't serve as the basis for this article).

This is the end of this very brief article, which sketches out a somewhat naive solution to what is potentially a much more complex problem. If you have any suggestions for plug and play approaches similar to the one I’ve presented, I’d be delighted to hear them and possibly update this article.

Bibliography

  1. Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference
    • Cameron Davidson-Pilon
  2. Binomial proportion confidence interval
    • Wikipedia
  3. Reddit's new comment sorting system
    • Randall Munroe
  4. Deriving the Reddit Formula
    • Evan Miller
  5. How Not To Sort By Average Rating
    • Evan Miller
  6. Ranking Items With Star Ratings
    • Evan Miller