Memoizing

[ Perl tips index ]
[ Subscribe to Perl tips ]

Imagine that you have a subroutine that is a function in the mathematical sense. A given set of inputs will always result in the same output. A good example may be a function that determines if a number is prime, or one that picks the largest number from a list. These functions depend only upon their inputs, and produce only their return values.

If our functions take a long time to calculate their results, then we can use a hash to remember the results, so we don't have to calculate them next time:

        # Calculate the Fibonacci number for a given input
        
        my %fib_results;                # our cache
        sub fibonacci {
                my $n = shift;
                # See if we've already cached our result
                if(exists $fib_results{$n}) {
                        return $fib_results{$n};
                }
                # Base cases
                if($n == 0 || $n == 1) {
                        $fib_results{$n} = $n;
                        return $n;
                }
                # Calculate the result
                my $result =  fibonacci($n - 1) 
                              +
                              fibonacci($n - 2);
                # Store result and return answer
                $fib_results{$n} = $result;
                return $result;
        }

This version is many times faster than one which doesn't remember the results, however this is achieved at a loss to readability. Instead of being able to look at the function and immediately understand how it works, we have to wade through all of the caching work. Further, there is a risk that our caching itself may be introducing bugs. For example if we accidently wrote:

                # Base cases
                if($n == 0 || $n == 1) {
                        $fib_results{$n} = 1;   # Oops!
                        return $n;
                }

then we'll get some very odd results (fibonacci(0) should return 0)..

Fortunately there is a Perl module Memoize that comes standard with Perl which does this caching work for us. Using it allows us to write:

        use Memoize;
        memoize('fibonacci');
        # Then later...
        # Calculate the Fibonacci number for a given input
        sub fibonacci {
                my $n = shift;
                # Base cases
                if($n == 0 || $n == 1) {
                        return $n;
                }
                # Calculate our result
                return fibonacci($n - 1) 
                       +
                       fibonacci($n - 2);
        }

Now the clutter from caching is gone and it's easy to see that the code is correct. Further, if we ever want to remove the caching for any reason we can just comment out the call to memoize()

Extra features

Memoize accepts the following options where required:

INSTALL
Memoize works by hiding the original version of your function by installing its cached version under the same name. If you don't want this functionality you can use the INSTALL option to specify a new name for the cached code:
        memoize('fibonacci', INSTALL => 'new_fibonacci');
NORMALIZER
Memoize caches its results based on the arguments passed in. If you have a function which utilises optional arguments which are populated by defaults if empty, then this can cause Memoize to cache functionally equivalent calls multiple times. To solve this you can provide memoize with a normalize subroutine which tells Memoize which argument lists should be treated as identical.

This is also important if you are passing references to a memoized function.

        memoize('some_subroutine', NORMALIZER => 'some_sub_normalizer');

Caveats

State changes
Memoize should only be used on subroutines which always return the same thing for the same inputs. For example mathematical functions like calculating sequences, or determining where chess pieces can move. If you memoize a subroutine which uses program, environment, external or temporal state data, then the results for the first call to that subroutine will be correct but all later calls may receive out of date information.
Side effects
Memoizing a subroutine which has side-effects will result in those side-effects only occurring the first time the subroutine is run. In all later calls, only the return value will be returned.
Simple programs
Using Memoize can slow your program down in some cases. This is because the time used to look up the value in the cache is less than the time it would have taken to calculate it. If you're uncertain as to whether caching will speed up your code, you can always use Benchmark.

[ Perl tips index ]
[ Subscribe to Perl tips ]


This Perl tip and associated text is copyright Perl Training Australia. You may freely distribute this text so long as it is distributed in full with this Copyright noticed attached.

If you have any questions please don't hesitate to contact us:

Email: contact@perltraining.com.au
Phone: 03 9354 6001 (Australia)
International: +61 3 9354 6001

Valid XHTML 1.0 Valid CSS