Perl's Taint and Locale


Perl has a great security mechanism called taint mode, enabled with the -T command-line switch. A program running under taint must not allow data from outside the program to affect other things outside the program (at least not unintentionally). All incoming data is therefore considered tainted and must be cleaned before it can be used in such operations.

Under taint, Perl throws an exception if tainted data is used to open a file for writing, change into a directory, execute a system command, invoke a shell, or perform other similar operations. This was covered in one of our earlier Perl tips (http://perltraining.com.au/tips/2005-11-11.html) and in more depth in our Perl Security course. The Perl Security course notes can be downloaded from http://perltraining.com.au/courses/perlsec.html.
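As a small illustration of what "tainted" means in practice, the sketch below uses tainted() from the core Scalar::Util module and the ${^TAINT} variable to report whether a value is tainted. The describe() helper is a hypothetical name made up for this example. Run it with perl -T to see environment data reported as tainted; without -T, nothing is tainted.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper for this example: describes whether a value is tainted.
sub describe {
    my ($label, $value) = @_;
    return sprintf "%s is %s", $label, tainted($value) ? "tainted" : "clean";
}

my $external = $ENV{PATH} // '';   # comes from outside the program
my $internal = "constant";         # a literal inside the program

# ${^TAINT} is true when Perl was started with -T (or -t).
print "Taint mode is ", (${^TAINT} ? "on" : "off"), "\n";
print describe('$ENV{PATH}', $external), "\n";
print describe('literal',    $internal), "\n";
```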

Perl's locale pragma allows us to change how Perl performs string comparisons and regular expression matches based upon your regional settings. It answers questions such as which characters count as letters, and in what order they sort.

Even in regular English, we use more letters than A-Za-z, particularly where a word has been borrowed from another language, such as café. Locale settings can let Perl know that 'é' comes before 'f' in the alphabet for string comparisons, and allow a regular expression such as /\w+/ to match words with accented characters, such as naïve or über. Locale settings can also influence how dates are displayed and how numbers are formatted.
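The effect on string comparison can be seen with a short sketch that sorts the same list twice, once with Perl's default code point ordering and once with use locale in effect. The word list is made up for illustration, and the locale-aware order depends entirely on the LC_COLLATE setting of the machine it runs on, so only the default order is predictable.

```perl
use strict;
use warnings;

my @words = qw(banana Apple cherry apple Banana);

# Default comparison is by code point, so uppercase sorts before lowercase.
my @by_codepoint = sort @words;

my @by_locale;
{
    use locale;                 # comparisons in this block follow LC_COLLATE
    @by_locale = sort @words;   # order depends on the system's locale
}

print "code point order: @by_codepoint\n";
print "locale order:     @by_locale\n";
```

In the C or POSIX locale the two lines will typically match; in many other locales, words differing only in case sort next to each other.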

In order for locale settings to work correctly, they need to be configured on your system. While a detailed discussion is beyond the scope of this Perl tip, this is usually done in the Regional and Language Options control panel in Windows, or using the LC_* and LANG environment variables in Unix. See perldoc perllocale for more details.

When we use locale, it is imperative that we also consider the security aspects (which are covered further in perldoc perllocale). Locale settings come from outside your program, and on some systems they can be set or edited by users. Certain parts of the locale are therefore considered tainted, in particular the built-in character classes such as \w and \s, and their complements \W and \S. Using these in a regular expression on any data, including data that originated inside your program, taints whatever the regular expression captures.

This happens because a sneaky user might edit their locale to suggest that certain punctuation characters belong to classes in which they shouldn't be. This could then be used to make your extra-privileged program do naughty things. For example, consider a program which runs with higher privileges and accepts from the user the name of a file to delete:

        # only accept filenames containing word characters and dots
        my ($safe_filename) = ($filename =~ m/^([\w.]+)$/);

        if(defined $safe_filename) {
                # Allow our user to delete items from the trash.
                unlink("/trash/$safe_filename") or die $!;
        }

This is considered safe when taint is enabled and locale is not in use, even if run under suidperl. But consider the case where the program's caller is able to change their locale settings so that \w allows a single extra character, the forward slash:

        \w = [A-Za-z0-9_/]

Now our user could potentially enter a file name of ../etc/passwd and attempt to remove an important file on our system! That's certainly not what we wanted!
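To make the risk concrete without actually altering a locale, the sketch below simulates the spoofed class with an explicit pattern that also admits the forward slash, and compares it with the intended pattern. Both patterns and the check() helper are assumptions made for this illustration; nothing is deleted, the code only prints the path that unlink would have received.

```perl
use strict;
use warnings;

# Stand-ins for [\w.] as intended, and as seen under a spoofed locale
# where \w also matches "/". These are simulations, not real locale lookups.
my $intended_class = qr/^([A-Za-z0-9_.]+)$/;
my $spoofed_class  = qr/^([A-Za-z0-9_\/.]+)$/;

# Hypothetical helper: returns the path that would be unlinked,
# or undef if the file name is rejected by the pattern.
sub check {
    my ($pattern, $filename) = @_;
    my ($captured) = $filename =~ $pattern;
    return defined $captured ? "/trash/$captured" : undef;
}

for my $name ('notes.txt', '../etc/passwd') {
    for my $case (['intended', $intended_class], ['spoofed', $spoofed_class]) {
        my ($label, $pattern) = @$case;
        my $path = check($pattern, $name);
        printf "%-8s %-14s => %s\n", $label, $name, $path // 'rejected';
    }
}
```

With the intended pattern ../etc/passwd is rejected; with the spoofed one it yields /trash/../etc/passwd, which the operating system resolves to /etc/passwd.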

To safely untaint data while using locale you have two options. The first is that you can turn locale off just for that section. For example:

        use locale;

        {
                no locale;      # Disable locale settings for this block.

                my ($safe_filename) = ($filename =~ m/^([\w.]+)$/);

                if(defined $safe_filename) {
                        unlink("/trash/$safe_filename") or die $!;
                }
        }

Alternatively, you can be explicit about which characters you're accepting by writing out the full character class:

        use locale;

        # only accept filenames containing standard word characters and dots
        my ($safe_filename) = ($filename =~ m/^([A-Za-z0-9_.]+)$/);

        if(defined $safe_filename) {
                unlink("/trash/$safe_filename") or die $!;
        }
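If the same check is needed in several places, the explicit character class can be wrapped in a small function. The sketch below is one possible shape; untaint_filename() is a hypothetical name, not part of any module, and it uses \z rather than $ so that a trailing newline is not silently accepted.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a cleaned copy of $name if it consists only
# of ASCII letters, digits, underscores and dots; otherwise returns undef.
sub untaint_filename {
    my ($name) = @_;
    return unless defined $name;

    # Explicit ranges are written out in full, so the match does not
    # depend on what the locale claims \w contains.
    my ($clean) = $name =~ m/^([A-Za-z0-9_.]+)\z/;
    return $clean;
}

for my $candidate ('report.txt', '../etc/passwd') {
    my $clean = untaint_filename($candidate);
    print defined $clean ? "accept: $clean\n" : "reject: $candidate\n";
}
```

Note that this class still accepts names such as . and .., so a real program may want further checks before acting on the result.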

Note that use locale is itself lexically scoped, lasting until the end of the enclosing block or file.

Using locale and taint together causes another interesting issue: data originating from inside your program can still end up being tainted. For example, if you perform case operations on a string with lc(), lcfirst(), uc() or ucfirst() (or the less common \l, \L, \u and \U escapes), then the result will be tainted. This is because tainted data, the locale, is used in determining the results of these functions, and taint is contagious.
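One way to observe this is to check values with tainted() from Scalar::Util after applying locale-dependent operations to a literal string. The sketch below is meant to be run with perl -T; the variable names are made up for the example, and the report it prints will differ depending on whether taint mode is on and on the Perl version in use.

```perl
use strict;
use warnings;
use locale;
use Scalar::Util qw(tainted);

my $internal = "Hello World";            # a literal, never tainted itself
my $lowered  = lc $internal;             # case mapping consults the locale
my ($word)   = $internal =~ /^(\w+)/;    # \w consults the locale too

printf "taint mode: %s\n", ${^TAINT} ? "on" : "off";
for my $item (['$internal', $internal], ['$lowered', $lowered], ['$word', $word]) {
    my ($label, $value) = @$item;
    printf "%-10s %s\n", $label, tainted($value) ? "tainted" : "clean";
}
```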

In conclusion, Perl's locale features are a great way to ensure that programs can operate correctly across different regions and locations. By enabling Perl's use locale pragma we can write regionally-aware programs and modules that match and sort words, and display dates, numbers and prices according to our local settings. However, in taint-aware programs we need to remember that locale settings cannot always be trusted, and alter our coding style appropriately.

Further reading

http://search.cpan.org/perldoc - Perl locale documentation.

http://perltraining.com.au/courses/perlsec.html - Perl Security course (notes available for download).



This Perl tip and associated text is copyright Perl Training Australia. You may freely distribute this text so long as it is distributed in full with this copyright notice attached.

If you have any questions please don't hesitate to contact us:

Email: contact@perltraining.com.au
Phone: 03 9354 6001 (Australia)
International: +61 3 9354 6001
