When looking for something in an array of values, it is very tempting to use in_array(). After all, that’s what the name says. However, searching through an array, even with best-case search algorithms, will never be faster than a single index lookup, which is where isset() comes in. With isset(), you can use one operation to see if a value exists, provided those values exist as keys. PHP arrays are hash tables, so a key lookup takes roughly constant time on average, no matter how many elements the array holds.
So, instead of something like this:
```php
$exclude = array(1, 4, 6, 8);

for ($i = 0, $size = count($data); $i < $size; $i++) {
    // in_array() scans $exclude from the start on every call
    if (in_array($data[$i]['id'], $exclude)) {
        // do something
    }
}
```
do something like:
```php
// Store the values as keys so they can be looked up directly
$exclude[1] = true;
$exclude[4] = true;
$exclude[6] = true;
$exclude[8] = true;

for ($i = 0, $size = count($data); $i < $size; $i++) {
    if (isset($exclude[$data[$i]['id']])) {
        // do something
    }
}
```
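If the exclusion list already exists as a plain list of values, it doesn't have to be retyped key by key. Here is a minimal sketch, assuming the values are integers or strings (the only types PHP accepts as array keys). The `$data` rows and the `$matched` variable are made up for illustration.

```php
<?php
// Hypothetical sample data; only the 'id' field matters here.
$data = array(
    array('id' => 1, 'name' => 'a'),
    array('id' => 2, 'name' => 'b'),
    array('id' => 4, 'name' => 'c'),
    array('id' => 5, 'name' => 'd'),
);

// Turn the list of values into keys. Every key maps to true, which
// matters because isset() returns false for a key whose value is null.
$exclude = array_fill_keys(array(1, 4, 6, 8), true);

$matched = array();
foreach ($data as $row) {
    if (isset($exclude[$row['id']])) {
        $matched[] = $row['id']; // stands in for "do something"
    }
}

print_r($matched);
```

array_fill_keys() keeps the intent explicit (a set of keys that all map to true). array_flip() would also work, but it maps each value to its old index instead.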
So does this make a difference? Let’s write a little benchmark script.
```php
#!/usr/bin/php
<?php

// The array we will search through
$haystack = array();
for ($i = 0; $i < 1000; $i++) {
    $haystack[] = rand(0, 1000);
}

// The values we will search for
$needles = array();
for ($i = 0; $i < 1000; $i++) {
    $needles[] = rand(0, 1000);
}

for ($i = 0; $i < 1000; $i++) {
    foreach ($needles as $needle) {
        // Result is discarded; only the cost of the search matters
        if (in_array($needle, $haystack));
    }
}
```
We fill two arrays with 1000 random integers. One is the haystack – what we will search through. The other is the list of needles – we want to search for each one. For each needle, we look for it in the haystack. Then, we repeat this 1000 times.
Executing this, the script takes around 37 seconds:
```
% time ./bench.php

real    0m37.400s
user    0m37.282s
sys     0m0.068s
```
Now, let’s change the last for() loop to this:
```php
for ($i = 0; $i < 1000; $i++) {
    // Swap keys and values so the haystack values become keys
    $tmp = array_flip($haystack);
    foreach ($needles as $needle) {
        if (isset($tmp[$needle]));
    }
}
```
The new output:
```
% time ./bench.php

real    0m0.778s
user    0m0.764s
sys     0m0.008s
```
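Execution time drops from around 37 seconds to under 0.8 seconds, roughly a 48-fold speedup, even though array_flip() rebuilds the lookup table on every pass of the outer loop.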
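To try the comparison without the shell's time command, both approaches can be timed inside one script. The following is a rough sketch, not the benchmark used above: it makes a single pass instead of 1000, uses mt_rand() with a fixed seed so runs are repeatable, and the variable names are my own. It also counts hits, which confirms that both lookups find the same needles. Absolute timings will vary by machine and PHP version.

```php
<?php
mt_srand(42); // fixed seed so repeated runs use the same data

$haystack = array();
$needles = array();
for ($i = 0; $i < 1000; $i++) {
    $haystack[] = mt_rand(0, 1000);
    $needles[] = mt_rand(0, 1000);
}

// Approach 1: a linear search of the haystack for every needle.
$start = microtime(true);
$hitsInArray = 0;
foreach ($needles as $needle) {
    if (in_array($needle, $haystack)) {
        $hitsInArray++;
    }
}
$timeInArray = microtime(true) - $start;

// Approach 2: flip once, then do one key lookup per needle.
$start = microtime(true);
$lookup = array_flip($haystack); // values become keys; duplicates collapse
$hitsIsset = 0;
foreach ($needles as $needle) {
    if (isset($lookup[$needle])) {
        $hitsIsset++;
    }
}
$timeIsset = microtime(true) - $start;

printf("in_array: %d hits in %.6f s\n", $hitsInArray, $timeInArray);
printf("isset:    %d hits in %.6f s\n", $hitsIsset, $timeIsset);
```

The cost of array_flip() is included in the second timing, so the comparison stays fair when the lookup table has to be built from an existing list.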