Oct 13 2008

Selecting random, weighted records from MySQL

Your web application has become big and evil enough to display marketing messages. You want to show your users a message whenever they login. However the Marketing department gave you 3 different messages and wants you to show them at a 70%-20%-10% ratio, meaning:70% of users will see message 1, 20% will see message 2 and the remainder (10%) will see message 3.

You’ve also been tasked to create the back-end of the message distribution and want to create the simplest query that addresses the Marketing requirements. And off you go:

mysql> describe foo;
+------------+---------+------+-----+---------+-------+
| Field      | Type    | Null | Key | Default | Extra |
+------------+---------+------+-----+---------+-------+
| id         | int(11) | NO   |     | NULL    |       |
| grp        | int(11) | YES  |     | NULL    |       |
| likelihood | int(11) | YES  |     | NULL    |       |
| message    | text    | YES  |     | NULL    |       |
+------------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)

This is your table that holds the messages. They are grouped by ‘grp’ and the ‘likelihood’ is expressed in numbers from 0-100 (0=message is not shown at all, 100=message is shown all the time).

You add some test records:

mysql> select * from foo order by id;
+----+------+------------+---------------------+
| id | grp  | likelihood | message             |
+----+------+------------+---------------------+
|  1 |    1 |         70 | Marketing Message 1 |
|  2 |    1 |         20 | Marketing Message 2 |
|  3 |    1 |         10 | Marketing Message 3 |
|  4 |    2 |         90 | Sale Offer 1        |
|  5 |    2 |         10 | Sale Offer 2        |
+----+------+------------+---------------------+
5 rows in set (0.00 sec)

And you craft the query that selects a weighted, random message from your table:

mysql> select message from foo where grp=2 order by likelihood * rand() desc limit 1;
+--------------+
| message      |
+--------------+
| Sale Offer 1 |
+--------------+
1 row in set (0.00 sec)

You implement it and get a raise for being a smart guy. And they lived happily ever after …

5 Responses to “Selecting random, weighted records from MySQL”

  • Buddy Says:

    This approach has a problem in that the output doesn’t match the expected frequencies. For example, you 90/10 split will actually occur about 95%/5%. Once you get three or more candidate rows, the numbers skew even more. I don’t know enough statistics to explain, just using empirical data. For example, try (1, 2, 5, 12), you would expect the first to occur 5%, but it’s like 0.5% or less.

    If someone is paying for ad time, they are going to be peeved if the frequencies don’t come out right.

    I’ve seen complex and statistically correct solutions involving summing all the weights and picking a random value in between, but I’m hoping there’s a way to fix the weighting formula (e.g. rand() / likelihood ASC – gives better results).

  • Buddy Says:

    Formula is:

    ORDER BY -LOG(1.0 – RAND()) / likelihood

    For safety use likelihood > 0.0 in WHERE clause.

    See http://use.perl.org/~bart/journal/33630 for excellent analysis by someone way smarter than me.

    It works on all distributions I’ve tested.

  • Tobias Says:

    Buddy – thanks for the update. I’ll reference your comment in the original text.

    Cheers – Tobias

  • How To Randomly Select From A Record Based On Weight (PHP, MySQL) | Alvin Poh's Blog Says:

    […] got this code from a brilliant comment here, and it’s a simple formula that allows you to randomly select from a database. The difference […]

Leave a Reply