[ Perl tips index ]
[ Subscribe to Perl tips ]
In a previous tip we used the Benchmark module to compare different
methods of searching for an item in a list. In this tip we'll provide a
basic introduction to Benchmark.pm. To learn more about the module read
its documentation and look at the resources at the end of this tip.
Perl is fast. Most of the time the code you write will usually be fast enough. Further, it is possible to waste more time benchmarking a few approaches to a problem than you can ever save in the life of that program.
As a result, it is usually best to just write the code rather than worrying about optimisations. In fact most of the time you can leave thinking about optimisations until you've found that speed is an issue. Once it is an issue though, make sure you use Perl's profiling tools to make sure you're optimising the right code before you start benchmarking. We provide some information on this in our Profiling tip: http://perltraining.com.au/tips/2005-07-04.html
If you have a lot of time to waste however, Benchmark.pm is a great way
to prove to your colleagues that your approach is faster than theirs.
Benchmarking allows us to measure code execution time. If we measure two or more ways of solving the same problem then we can compare the execution times to determine which one is the faster. The fastest code is also the most efficient.
Benchmarking gives us information on how well our code will scale. For example searching for an element in a small list is pretty fast, no matter how we do it. However if we repeat that search hundreds or thousands of times, then we may discover small differences between the execution times of each approach. These small differences combine to be large differences if there are enough executions.
The Benchmark.pm module comes standard with Perl and provides a great
interface for a number of comparison options. This tip does not cover all
that Benchmark can do.
Benchmark provides two kinds of subroutines. You can use timethis,
timeit and countit to examine the speed of a single code
snippet. Alternately you can use timethese and cmpthese to run
a number of code snippets. cmpthese prints a nice comparison
chart at the end.
This example compares using grep and foreach to find a random item in
a list. If the first argument to cmpthese is zero then Benchmark will
run each test for at least 3 CPU seconds each, this is a good default. We
can alternately specify the number of times we want it to run with a
positive integer, or a different number of CPU seconds with a negative
integer. The second argument in our example is a hash reference of label =>
subroutine reference pairs.
use strict;
use Benchmark qw(cmpthese);
# A big, sorted list, with 10702 elements.
my @list = (1..10000, 'a'..'zz');
cmpthese(0, {
# Looking for a random item in @list with grep
grep => sub {
# A random element in the list
my $random = $list[rand @list];
my $found = grep { $_ eq $random } @list;
},
# Looking for a random item in @list linearly (with foreach)
foreach => sub {
# A random element in the list
my $random = $list[rand @list];
my $found;
foreach (@list) {
if($_ eq $random) {
$found = 1;
last;
}
}
},
});
Running the above code gives us:
Rate grep foreach
grep 374/s -- -42%
foreach 641/s 72% --
These results can be interpreted as follows. The grep code snippet
can be run 374 times a second, while the foreach loop ran 641 times per
second. foreach was 72% faster than grep for this kind of data.
If this seems strange, look at the code - you'll see that we're cheating.
grep has to search the whole of @list for each call, because it
returns a count of all the occurrences for which the search block is
true. Our foreach loop on the other hand, stops after the item is
found; which on average means only searching through half of the loop.
If we change our foreach loop to be less efficient by removing the
call to last we get the following results:
Rate foreach grep
foreach 305/s -- -19%
grep 378/s 24% --
which looks more fair.
Calling a subroutine in Perl is quite costly when compared to more basic
operations such as addition and subtraction. This cost can dwarf the time
required for tests which don't do much, thus Benchmark allows us to pass in
strings for our code snippets. In the below code we change our subroutine
references to single quoted strings using q{}.
# WARNING: this snippet is buggy
use strict;
use Benchmark qw(cmpthese);
# A big, sorted list, with 10702 elements.
my @list = (1..10000, 'a'..'zz');
cmpthese(0, {
# Looking for a random item in @list with grep
grep => q{
# A random element in the list
my $random = $list[rand @list];
my $found = grep { $_ eq $random } @list;
},
# Looking for a random item in @list linearly (with foreach)
foreach => q{
# A random element in the list
my $random = $list[rand @list];
my $found;
foreach (@list) {
if($_ eq $random) {
$found = 1;
last;
}
}
},
});
Running the above code gives us:
Rate foreach grep
foreach 821975/s -- -13%
grep 945324/s 15% --
Woah! Does this mean that subroutine references were slowing our code down
so dramatically? Actually no. What's happened here is quite subtle. When
we declared @list with my we made it a lexical
variable. Due to the way Benchmark evaluates the code contained in
strings our lexical variables are unaccessible to these snippets, instead
Benchmark assumes the code snippets refer to package variables such as
@main::list instead. Unfortunately we don't have a @main::list so
the code snippets search an empty list many times per second. Declaring
@list with our instead solves the problem:
use strict;
use Benchmark qw(cmpthese);
# A big, sorted list, with 10702 elements.
our @list = (1..10000, 'a'..'zz');
cmpthese(0, {
# Looking for a random item in @list with grep
grep => q{
# A random element in the list
my $random = $list[rand @list];
my $found = grep { $_ eq $random } @list;
},
# Looking for a random item in @list linearly (with foreach)
foreach => q{
# A random element in the list
my $random = $list[rand @list];
my $found;
foreach (@list) {
if($_ eq $random) {
$found = 1;
last;
}
}
},
});
This code yields:
Rate grep foreach
grep 375/s -- -40%
foreach 621/s 66% --
which is very comparable to our earlier results.
In all of these examples, we've found that a short-circuiting foreach
statement is faster than grep. Is this meaningful? Can we say that
foreach is definitely faster than grep? The answer is no.
The tests above do not consider the behaviour where the element is not in
the list. We also don't test other comparison operators, would there
be a difference if we were comparing with a regular expression rather than
eq?
Generalising from the results of a single set of benchmark comparisons is
not always possible. The only thing we can assert from these results is
that when searching for existing numbers and short strings in a medium
length list using eq; foreach is faster. Thus, when you're working
on benchmarks for your own code, make sure you use appropriate data and
comparisons for your domain.
use strict will not complain about
attempts to use package variables that are undefined. Running cmpthese
with 1 for its first argument will help you test this faster.
Benchmark returns seem too big, something's wrong. Ask
yourself whether you think it's likely that Perl really can run foreach
over such a big list 821,975 times every second.
http://www.oreilly.com/catalog/9780596527242/index.html Chapter 6: Benchmarking Perl. Mastering Perl by brian d foy, O'Reilly Publishers, 2007.
(If you can't wait for the book to be published (due really soon) you can see a draft of this chapter at: http://www252.pair.com/comdog/mastering_perl/Chapters/05.benchmarking.html )
Beware of Benchmark by Abigail, Open Source Developers' Conference 2004, (link currently not available at this time).
[ Perl tips index ]
[ Subscribe to Perl tips ]
This Perl tip and associated text is copyright Perl Training Australia. You may freely distribute this text so long as it is distributed in full with this Copyright noticed attached.
If you have any questions please don't hesitate to contact us:
| Email: | contact@perltraining.com.au |
| Phone: | 03 9354 6001 (Australia) |
| International: | +61 3 9354 6001 |
Copyright 2001-2012 Perl Training Australia. Contact us at contact@perltraining.com.au