I run into this problem from time to time, and I've finally put enough energy into it that I want
to commit it to memory through the blog. The most common occurrence of this in my world is when
performing a search using something like Lucene or Sphinx. Here's what usually happens:
I perform a search using my engine of choice. Sphinx, for example, will give me back a hash of
results formatted like this:
Array
(
[matches] => Array
(
[10546] => Array
(
[weight] => 3
[attrs] => Array
(
[feed_id] => 624
[published_at] => 1213713545
)
)
[14154] => Array
(
[weight] => 3
[attrs] => Array
(
[feed_id] => 583
[published_at] => 1213801410
)
)
...
)
[time] => 0.000
...
)
Since I do not store my full-text fields in my index, I then have to take the IDs from the
matches array and query the database to get at my data. The query is simple:
SELECT * FROM items WHERE id IN (10546, 14154, …);
Unfortunately, that query will not necessarily return the rows in the order I provided.
This is an easy fix if I'd like to sort my results on a column in my database, since I can
just add a quick ORDER BY clause on say, the date the item was created. But if I
searched the database for results sorted by relevance, the database really doesn't know
anything about how to sort that way. I need a way to tell MySQL to sort the results in
the order I've supplied my IDs, which leads us to at least two functions: FIND_IN_SET
and FIELD.
Both functions work the same way: Given a string (or column) as the first argument, they
return the 1-based index — or zero if not found — of the argument in either the list of additional
arguments (FIELD) or the second argument, a comma-delimited string (FIND_IN_SET). For
my purposes, then, my query looks like this:
SELECT * FROM items WHERE id IN (10546, 14154, …) ORDER BY FIELD(id, 10546, 14154, …);
SELECT * FROM items WHERE id IN (10546, 14154, …) ORDER BY FIND_IN_SET(id, '10546,14154,…');
Both functions will return the same results, sorted the same way. I went so far as to profile
both, too, just to make sure I was using the best function. Granted, when I profiled this using
MySQL's built-in profiler, my query was pretty simple (as it should be, in my opinion). Your
mileage may vary:
+----------+------------+------------------------------+
| Query_ID | Duration | Sorting result | Query |
+----------+------------+------------------------------+
| 1 | 0.01572400 | 0.001129 | FIND_IN_SET |
| 2 | 0.00241400 | 0.00127 | FIELD |
| 3 | 0.00194700 | 0.00114 | FIND_IN_SET |
| 4 | 0.00193100 | 0.001121 | FIELD |
| 5 | 0.00192300 | 0.001116 | FIND_IN_SET |
| 6 | 0.00192100 | 0.001101 | FIELD |
+----------+------------+------------------------------+
Note that the query result is a hybrid between show profiles and show profile for query [id],
and I've ripped out the actual query because all I care about is the sorting function being used (the query
is the same otherwise). The duration on the whole was lower in each subsequent query because of MySQL's query cache,
but since I really wasn't interested in that anyway — just the sorting duration — it doesn't really matter.
The profiling I did was ridiculously contrived, but it did go to show me, based on the miniscule differences, that
it most likely will not matter which function I use (although FIND_IN_SET ultimately won out over my
exhaustive six-query profile).