[ Perl tips index ]
[ Subscribe to Perl tips ]
Big-O notation is a mathematical construct used to describe algorithmic complexity. In particular it is allows efficiency comparisons of different algorithms for large data sets.
Typical orders are (in order of desirability):
O(1) also CO(log N)O((log N)c)O(N)O(N log N)O(N2) and O(Nc)N2: doubling N will quadruple
execution time, N3: doubling N increases execution time by a factor
of 8. For large N, O(N2) and above rapidly become unusable.
O(n!)N. 5! = 120.
It is an interesting property of Big-O notation that if a function can be written as a sum of other functions, then only the fastest growing function matters. This is because the growth in that function will rapidly dwarf the growth in the others. For example:
cN + dN2 => O(N2) for any constants c and d
N4 + N! => O(N!)
Understanding the basics of how to calculate Big-O notation can help us identify bad choices in our code and spot likely causes of slow execution time. For example consider the following attempt to find all the elements in list1 which are also in list2:
my @list1 = (10 .. 100);
my @list2 = (50 .. 200);
# Walk over @list1
foreach my $element (@list1) {
# Check in @list2 to see if it's there
foreach my $item (@list2) {
if($item eq $element) {
print "$item is in both lists\n";
}
}
}
If @list1 and @list2 are small, this code works very quickly.
However, if we use:
my @list1 = (10 .. 10000);
my @list2 = (5000 .. 2000000);
then the code takes a very long time. Let's call the size of @list1
N and the size of @list2 M then what we're doing is:
for each N
for each M
do something
You can see here that this means we walk over @list1 once, but for
each item we also walk over the whole of @list2. This means we're
dealing with O(N M). As the smaller list approaches the same size as
the larger, this approaches O(N2) which suggests we should investigate
whether a faster possibility exists.
We can rewrite the above code as follows:
my @list1 = (10 .. 100); # length N
my @list2 = (50 .. 200); # length M
# Create a hash of all the values in @list1: O(N)
my %in_list1;
foreach my $element (@list1) {
$in_list1{$element} = ();
}
# Walk over each element in @list2 and compare against the hash
# O(M)
foreach my $item (@list2) {
if(exists $in_list1{$item}) {
print "$item is in both lists\n";
}
}
Now we can see that the end result is O(N+M) - we walk over @list1
once, and we walk over @list2 once. This is an O(N) solution so we
know it's good. Note that we're achieving this trade-off by using extra
memory to store our %in_list1 hash. For very large N this needs to
be considered.
The above style problem comes up regularly, even though it may not quite
look as simple as this. For example, imagine that you have a flat file
containing addresses for businesses, and you wish to display a subset of
those based on a list of suburbs. You could go through the whole file for
each suburb (O(N M)) or you could adapt the above solution to store
either the addresses or suburbs in a hash (depending on your requirements)
and go through each only once. The same style solution can also be used
to find the complement of a list.
Big-O notation does not predict how fast something will run, but rather
gives us information about how well it will scale. For example doubling
the size of N for a O(N2) problem will quadruple the program
execution time regardless of whether the function complexity is actually
O(0.5 N2) or O(10000 N2).
Constants only matter when comparing functions of different complexities.
For example O(100*N) is slower than
O(N2 /100) until N is greater
than 10,000.
Mathematical coverage for Big-O notation can be found on Wikipedia at: http://en.wikipedia.org/wiki/Big_O_notation.
There are also some great notes on Big-O notation on perlmonks.org
[ Perl tips index ]
[ Subscribe to Perl tips ]
This Perl tip and associated text is copyright Perl Training Australia. You may freely distribute this text so long as it is distributed in full with this Copyright noticed attached.
If you have any questions please don't hesitate to contact us:
| Email: | contact@perltraining.com.au |
| Phone: | 03 9354 6001 (Australia) |
| International: | +61 3 9354 6001 |
Copyright 2001-2010 Perl Training Australia. Contact us at contact@perltraining.com.au