Perl has a great security mechanism called taint. A program running under taint must not allow data from outside your program to affect other things outside your program (at least not unintentionally). Thus all incoming data is considered tainted and must be cleaned.
Attempting to open a file for writing, change into a directory, execute a system command, invoke a shell or perform other similar operations while using tainted data causes Perl to throw an exception. This was covered in one of our earlier Perl tips, http://perltraining.com.au/tips/2005-11-11.html, and in more depth in our Perl Security course. (The Perl Security course notes can be downloaded from http://perltraining.com.au/courses/perlsec.html.)
Perl's locale pragma allows us to change how Perl performs
string comparisons and regular expressions based upon our regional
settings. It answers questions such as which characters count as
letters and in what order strings should sort.
Even in regular English, we use more letters than A-Za-z, particularly
where the word has been borrowed from other languages, such as
café. Locale settings can allow Perl to know that 'é' comes before
'f' in the alphabet in string comparisons, and allow a regular
expression such as /\w+/ to match words with accented characters,
such as naïve or über. Locale settings can also influence how
dates are displayed, and how numbers are formatted.
In order for locale settings to work correctly, they need to be
configured on your system. While a detailed discussion is beyond
the scope of this Perl tip, this is usually done in the Regional
and Language Options control panel in Windows, or using the
LC_* and LANG environment variables in Unix. See
perldoc perllocale for more details.
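As a minimal sketch of what this looks like in practice, the following compares a default sort with a sort performed under use locale. The word list is made up for illustration, and the result depends entirely on the locale configured on your system, so no output is shown. It also assumes that the encoding of the source file matches the encoding of your locale.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);

# Adopt whatever locale the environment (LC_*, LANG) specifies.
setlocale(LC_ALL, '');

# Illustrative word list; any words with accented letters will do.
my @words = ('cafe', 'café', 'caff', 'Zebra');

# Outside "use locale", sort compares strings by their raw character values.
my @default_order = sort @words;

my @locale_order;
{
    use locale;    # comparisons inside this block follow the locale's collation
    @locale_order = sort @words;
}

print "default: @default_order\n";
print "locale:  @locale_order\n";
```

If both lines print the same order, your system is probably using the C (or POSIX) locale, in which case use locale makes no visible difference.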
It is imperative that when we use locale we also consider the
security aspects (which are covered further in perldoc perllocale).
Locale settings come from outside of your program, and on some systems
these can be set or edited by users. Thus, certain parts of the locale are
considered tainted, in particular the built-in character classes such as
\w and \s, and their complements \W and \S.
Using these on any data, including data from inside your program,
taints any data returned by the regular expression.
This happens because a sneaky user might edit their locale to suggest that certain punctuation characters are included in classes in which they shouldn't be. This could then be used to make your extra-privileged program do naughty things. For example, consider a program which runs with higher privileges and accepts a file name for deletion from the user:
# only accept filenames containing word characters and dots
my ($safe_filename) = ($filename =~ m/^([\w.]+)$/);
if(defined $safe_filename) {
    # Allow our user to delete items from the trash.
    unlink("/trash/$safe_filename") or die $!;
}
This is considered safe under taint and no locale, even if run under
suidperl. But consider the case where the program caller is able to change
their locale settings so that \w allows a single extra character, the
forward-slash:
\w = [A-Za-z0-9_/]
Now our user could potentially enter a file name of ../etc/passwd and attempt to remove an important file on our system! That's certainly not what we wanted!
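The tainting that guards against this can be observed directly. The sketch below uses tainted() from the core Scalar::Util module to inspect the capture from the same pattern, with and without locale in effect. The variable names are invented for this illustration, and the script needs to be run with taint mode switched on for anything to be reported as tainted.

```perl
# Run with taint mode enabled, for example: perl -T taint_demo.pl
# (the file name is hypothetical)
use strict;
use warnings;
use Scalar::Util qw(tainted);

my $internal = 'hello';    # a literal from inside the program, not tainted

my ($plain, $with_locale);

# No locale in effect here, so \w does not depend on the locale.
($plain) = ($internal =~ m/^([\w.]+)$/);

{
    use locale;
    # Under locale, \w depends on the locale's character classification,
    # so perllocale says the capture is tainted when taint mode is on.
    ($with_locale) = ($internal =~ m/^([\w.]+)$/);
}

printf "taint mode:        %s\n", ${^TAINT} ? 'on' : 'off';
printf "no locale capture: %s\n", tainted($plain)       ? 'tainted' : 'clean';
printf "locale capture:    %s\n", tainted($with_locale) ? 'tainted' : 'clean';
```

Without -T, every line reports clean, because Perl only tracks taint when taint mode is enabled.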
To safely untaint data while using locale you have two options. The first is that you can turn locale off just for that section. For example:
use locale;
{
    no locale; # Disable locale settings for this block.
    my ($safe_filename) = ($filename =~ m/^([\w.]+)$/);
    if(defined $safe_filename) {
        unlink("/trash/$safe_filename") or die $!;
    }
}
Or you can be explicit about which characters you're accepting with a full character class:
use locale;
# only accept filenames containing standard word characters and dots
my ($safe_filename) = ($filename =~ m/^([A-Za-z0-9_.]+)$/);
if(defined $safe_filename) {
    # Use the untainted copy, not the original tainted $filename.
    unlink("/trash/$safe_filename") or die $!;
}
It should be noted that use locale itself is lexical in scope,
lasting until the end of the current file or block.
Using locale and taint together causes another interesting issue. Data
originating from inside your program can still end up being tainted. For
example, if you perform case operations on a string with
lc(), lcfirst(), uc() and ucfirst() (or the less common
\l, \L, \u and \U) then that string will be tainted. This is
because tainted data - the locale - is used in determining the results of
these functions, and taint is contagious.
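A small sketch of this effect, again using tainted() from Scalar::Util and assuming the script is run with taint mode enabled. The string and variable names are illustrative only. The last case shows that wrapping the operation in a no locale block avoids the taint for data that was clean to begin with.

```perl
# Run with taint mode enabled, for example: perl -T case_demo.pl
# (the file name is hypothetical)
use strict;
use warnings;
use Scalar::Util qw(tainted);
use locale;

my $greeting = 'Hello World';    # a literal from inside the program

# Case operations consult the locale, so their results pick up its taint.
my $lowered      = lc $greeting;
my $interpolated = "\U$greeting";

# Switching locale off for the operation keeps clean data clean.
my $lowered_no_locale;
{
    no locale;
    $lowered_no_locale = lc $greeting;
}

for my $pair (
    [ 'lc under locale' => $lowered ],
    [ '\U under locale' => $interpolated ],
    [ 'lc, no locale'   => $lowered_no_locale ],
) {
    my ($label, $value) = @$pair;
    printf "%-16s %s\n", $label, tainted($value) ? 'tainted' : 'clean';
}
```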
In conclusion, Perl's locale features are a great way to ensure that
programs can operate correctly across different regions and locations.
By enabling Perl's use locale pragma we can write regionally-aware
programs and modules that match and sort words, and display dates,
numbers and prices according to our local settings. However, in
taint-aware programs we need to be aware that locale settings
cannot always be trusted, and alter our coding style appropriately.
http://search.cpan.org/perldoc - Perl locale documentation.
http://perltraining.com.au/courses/perlsec.html - Perl Security course (notes available for download).
This Perl tip and associated text is copyright Perl Training Australia. You may freely distribute this text so long as it is distributed in full with this copyright notice attached.
Copyright 2001-2012 Perl Training Australia. Contact us at contact@perltraining.com.au