HANDLE ERRORS WITH WARN AND DIE
Perl provides two routines to handle different error levels in your
programs: warn and die. Both routines issue a message to the standard error
file handle, STDERR. In addition, die causes the program to exit (or
immediately terminates an eval).
Warn is used when a nonfatal error occurs and you wish to inform the
user of the problem. For example, if a user cannot open a specified file,
you might use warn to display the error.
Die is used when a fatal error occurs because it will terminate the
program without returning. For example, dividing by zero might generate an
error message using die.
Here are examples of die and warn in action:
open LOG, ">log" or die "Can't create log file 'log': $!, stopped";
warn "Salary seems excessive\n" if ($salary > 1000000.00);
If the string you want to print does not end in a new line, then the
current script name, line number, and input file line number (if any) are
appended to the message. For this reason, you should end the die or warn
message with something appropriate, such as ", stopped" or ", warning".
If you don't supply any error message, warn and die use the current
value of the $@ variable for the message. If $@ doesn't contain any text,
then warn prints "Warning: Something's wrong", and die prints "Died". Both
messages are followed by the file and line number message.
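Because die immediately terminates an eval and leaves its message in $@, callers can trap fatal errors. Here is a minimal sketch:

```perl
use strict;
use warnings;

# die inside eval does not exit the program; eval yields undef
# and the message (with " at FILE line N" appended) lands in $@
my $result = eval { die "Bottom is zero, stopped" };
my $error  = $@;
print "Caught: $error" if !defined $result;
```

Since the die message does not end in a newline, Perl appends the file and line information before the message reaches $@.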
Warn and die make it easy to return appropriate diagnostics in exception
conditions.
USE THE CARP MODULE TO TRACK BUGS
Perl provides module writers with a set of tools that identifies the
location of errors in the caller's program. The Carp module provides routines
that work like warn and die but report the error location in the caller's
code rather than in the module itself.
The Carp module has four routines: carp, cluck, croak, and confess. Carp
works like warn and croak works like die, but both report the file and line
of the code that called your subroutine rather than the file and line where
carp or croak was invoked.
So, if a caller invokes your routine Check(), and Check() calls carp(),
the message will point to where Check() was called, not to the line inside
Check() that called carp(). This helps your module's users identify where
to look in their code for the error.
If your module accepts a value that will be used as a divisor in an
operation, it could check that value to ensure it's not zero. Here is an
example of carp and croak:
use Carp;
divide(2,0);
sub divide
{
my ($top, $bottom) = @_;
croak "Bottom is ZERO!\n" if ($bottom == 0);
return $top / $bottom;
}
Calling divide with a value for the second parameter of zero will return
the file and line of the caller rather than the location in the divide
routine. This is the desired behavior since the caller, not divide, is
really at fault.
Using cluck and confess will generate a full stack trace rather than
just the file and line of the caller. Cluck is not exported by default, so
you must import it explicitly (use Carp qw(cluck)); most people do not want
full stack traces on warnings. Here is an example:
use Carp;
b(2,0);
sub b { return c(@_); }
sub c { return divide(@_); }
sub divide
{
my ($top, $bottom) = @_;
confess "Bottom is ZERO!\n" if ($bottom == 0);
return $top / $bottom;
}
The carp module makes it easy for module developers to return useful
diagnostic information to module users.
USE BLESS TO TURN REFERENCES INTO OBJECTS
Perl's object model is based on references that know what package they
belong to as specified by the bless function. Stated simply, an object is
a reference that knows what package it belongs to; a class is a package
(a collection of data and routines); a method is a subroutine in a
package.
The bless function accepts one or two parameters. The first is the
reference to be turned into an object. The second optional parameter is the
package name (or class name, if you prefer). In order to support
inheritance (making a new class based on an existing one), you should always use
the two-parameter form of bless.
Typically, the reference being blessed points to an anonymous hash.
However, a reference to any type of basic data structure can be turned into
an object by blessing it into a package.
Here are a few examples:
package Chicken;
sub hatch
{
my $class = shift;
my $self = { First => 'Chicken' };
return bless($self, $class);
}
package Egg;
sub lay
{
my $class = shift;
my $self = { First => 'Egg' };
return bless($self, $class);
}
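To see why the two-parameter form matters, here is a short sketch; the Rooster subclass is invented for illustration:

```perl
use strict;
use warnings;

package Chicken;
sub hatch
{
    my $class = shift;
    my $self = { First => 'Chicken' };
    return bless($self, $class);   # blessed into whatever class was requested
}

package Rooster;
our @ISA = ('Chicken');            # Rooster inherits hatch() from Chicken

package main;
my $bird = Rooster->hatch;         # $class is 'Rooster', not 'Chicken'
print ref($bird), "\n";           # prints "Rooster"
```

Had hatch() used the one-parameter form bless($self), the object would always be blessed into Chicken, and the inherited call would silently produce the wrong class.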
Perl's support for object-oriented programming begins with the bless
function and the specification of a package for a reference.
GET FILE INFORMATION USING THE FILE TEST OPERATORS
Perl's file test operators are unary operators that return specific
information about a file, including whether the file exists, the file size,
and whether the file is a plain file or a directory. The file test
operators can also return the modification (-M), access (-A), and inode change
(-C; creation time on Windows) times for the file.
The file test operators can be used on either a file name or an open
file handle. The -M, -A, and -C times are expressed in fractional days,
measured relative to when the program started. A positive value refers to a
time before the script started, while a negative value indicates a time
after the script started.
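A few of the most common tests, sketched against a scratch file created just for the demonstration:

```perl
use strict;
use warnings;

# create a small scratch file so the tests have something to examine
my $file = 'testfile.tmp';
open my $fh, '>', $file or die "Can't create $file: $!\n";
print $fh "hello\n";
close $fh;

my $exists   = -e $file;   # does the file exist?
my $is_plain = -f $file;   # is it a plain file (not a directory)?
my $is_dir   = -d $file;   # is it a directory?
my $size     = -s $file;   # size in bytes
print "plain file of $size bytes\n" if $exists && $is_plain;

unlink $file;              # clean up the scratch file
```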
Here's an example that prints the time of modification for each file
specified on the command line:
for (@ARGV)
{
print "$_: ", scalar localtime(time - 24*60*60 * -M $_), "\n";
}
The following program prints the size of each specified file:
for (@ARGV)
{
print "$_: ", -s $_, "\n";
}
Combining these with printf generates a simple directory listing:
for (@ARGV)
{
printf "%s %10d %-s\n", scalar localtime(time - 24*60*60 * -M $_), -s $_, $_;
}
Perl's file test operators make it simple to retrieve and process
information about files.
CREATE ABBREVIATION LISTS WITH TEXT::ABBREV
When parsing commands, it's sometimes helpful to be able to abbreviate a
command. You can use the Text::Abbrev module to find the shortest unique
word to describe the command.
If you provide Text::Abbrev with a list of words, the module will return
a list of all possible unambiguous abbreviations. The abbreviations will
be unique with regard to the other words in the list. The returned list
is in the form of a hash, with the abbreviation as the key and the
original word as the value of each element.
You can check the abbreviation list against the input command by looking
up the word in the generated hash. If a corresponding hash entry exists,
then the command is valid and the hash value is the full-text command
word. If the hash entry does not exist, the command is either invalid or
ambiguous.
Here is some example Text::Abbrev code:
use Text::Abbrev;
%h = abbrev qw(left right forward backward lift backdoor);
for (sort keys %h)
{
print "$_ = $h{$_}\n";
}
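Looking up a user's input in the generated hash might be sketched like this (the command set here is invented):

```perl
use strict;
use warnings;
use Text::Abbrev;

# build the abbreviation table for a hypothetical command set
my %commands = abbrev qw(quit query save);

# 'sa' is a unique prefix, so it resolves to 'save'; 'q' is shared
# by quit and query, so it is absent from the hash (ambiguous)
my $resolved  = $commands{'sa'};
my $ambiguous = exists $commands{'q'};
print "sa means $resolved\n" if defined $resolved;
```

An absent hash entry covers both failure cases at once: the input is either unknown or ambiguous.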
Using Text::Abbrev simplifies the work required for unique command
processing.
USE TEXT::SOUNDEX FOR INEXACT NAME MATCHING
The soundex algorithm is designed to map names to a four-character value
based on the sound of the word when spoken by an English speaker. The
idea is to be able to enhance name matching to include "fuzzy matches" with
names that sound similar but are spelled differently (e.g., Smith and
Smythe).
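For instance, Smith and Smythe reduce to the same code; a quick sketch:

```perl
use strict;
use warnings;
use Text::Soundex;

my $smith  = soundex('Smith');    # S530
my $smythe = soundex('Smythe');   # S530 -- same code, so the names "match"
print "$smith $smythe\n";
```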
To use the soundex tool, simply pass in a list of strings and the code
returns a list of soundex codes. The routine will accept a single string
and return a single code, or you can pass in a list and get a list of
codes back.
Here is a code example:
use Text::Soundex;
for (@ARGV)
{
$code = soundex $_;
print "$_ = $code\n";
}
Soundex will work not only on names but also on any text elements. Keep
in mind that it maps an arbitrarily large set of words into a finite
space (only four characters). This means that the algorithm will begin to
degrade rapidly if it's used on a vast set of very similar words.
Almost any application that has to match names can greatly benefit from
the soundex algorithm. A system that allows fuzzy matches against names
is a great advantage for customer support.
EXPAND TABS TO A VARIABLE NUMBER OF SPACES
Perl is exceptionally good at text processing jobs. One common task it
performs is translating tab characters into an appropriate number of
spaces to maintain text alignment.
The standard method for converting tabs into spaces is to count the
number of characters to the left of the tab, and then replace the tab with a
sufficient number of spaces to move the position to the next even
multiple of the tab stop value. With Perl, you can do this quite easily.
Use the substitution pattern-matching operator to find the first tab on
the line. Then use the length function to count the number of nontab
characters before the tab. Replace the tab with an appropriate number of
spaces by calculating the remainder after dividing the length by the tab stop
value.
Although it's tempting to use the /g modifier with the substitution
operator, it will not work: each replacement changes the column positions of
everything that follows, so you must restart the match from the beginning of
the line each time to measure the length of the newly substituted text.
The /e substitution modifier specifies that the right-hand side of the
substitution is code rather than a simple string. This allows the
necessary calculations to be completed without variable substitution in the
replacement string.
Here is an example:
# expand tabs to spaces
# accept tabstop on command line or default to 4
$tabstop = $ARGV[0] + 0 || 4;
while (<STDIN>)
{
1 while (s/^(.*?)(\t+)/$1 . ' ' x ($tabstop * length($2) - length($1) % $tabstop)/e);
print;
}
Perl's ability to process text using regular expressions makes tasks
like expanding tabs very simple.
EXPAND EXPRESSIONS INSIDE STRINGS
Sometimes string expressions need to be more complex than a simple
scalar value allows. Perl allows you to create arbitrarily complex expressions
inside strings and print their values. This can be particularly handy
when using here documents.
There are only three special characters that Perl recognizes inside a
double-quoted string: $, @, and \. However, recall Perl's dereferencing
syntax, which uses curly braces around the expression. For example:
$aref = [ 3, 2, 1 ];
print "Countdown: @{$aref}";
Combining this with an inline anonymous array gives the syntax for
embedding arbitrarily complex expressions. Simply put the anonymous array
constructor inside the braces:
print "Countdown: @{[3, 2, 1]}";
If the expression results in a single scalar value, you can use the
scalar version, ${\( ... )}, as follows:
print "The current index is $i, the next index is ${\($i + 1)}";
You are not limited to just hard-coded values. You can use any
expression, including function calls. You can also use this technique to great
advantage in here documents:
print "Today is ${\scalar localtime}";
print "YTD earnings = ${\AddCommas($YTDEarnings)}";
SendMail(@targetEmails, <<EOT);
From: admin@localhost
To: ${\join ', ', @targetEmails}
Date: ${\scalar gmtime}
The following people are disk hogs:
@{[grep {DiskUsage($_) > 1000000000} @users]}
- The Admin
EOT
Perl's extended syntax for reference variables makes it easy to embed
complex expressions into double-quoted strings and here documents.
BUILD QUERY ENGINES IN PERL
Perl's ability to parse text can be greatly enhanced if the user is
allowed to specify queries directly in Perl's own language. This technique
allows complex processing of log files and other program output that goes
well beyond what could be done using simple regular expressions.
Below is a program that will perform arbitrarily complex queries on a
comma-separated value (CSV) file such as those produced by spreadsheet
programs. It assumes that the first line of the file contains a list of field
names and that the remaining lines are data. For simplicity, the program
reads its input from standard input:
use strict;
use Text::ParseWords;
# get header line
my $header = <STDIN>;
# get field names
my @fieldNames = quotewords(",", 0, $header);
# strip leading & trailing spaces (if any), replace internal spaces with underscores
@fieldNames = map {s/^\s+//; s/\s+$//; s/\s+/_/g; $_} @fieldNames;
my @fields; # where field data will be stored
# create access functions
for (my $i = 0; $i < @fieldNames; $i++)
{
no strict 'refs';
my $name = $fieldNames[$i];
eval "sub $name () { \$fields[$i] }"; # create access subroutine
*{lc $name} = *{uc $name} = $name; # make upper and lower case aliases
}
# compile user's query
my $code = "sub Query { " . join(" and ", @ARGV) . " }";
eval $code.1 or die "Error: $@\nIn query string: $code\n";
# print the header line (field names)
print $header;
# process each line
while (<STDIN>)
{
@fields = quotewords(",", 0, $_);
print if Query();
}
Using Perl's ability to compile and execute code on the fly makes it
easy to write and use powerful filter programs.
BUILD A SIMPLE WEB SERVER WITH THE HTTP::DAEMON MODULE
When your project needs a simple interface, consider building a Web
server into your application. Building a basic Web server is fairly simple in
Perl using the HTTP::Daemon module. This module does most of the heavy
lifting associated with getting requests and sending replies.
Below is the code for a complete Web server. It sends a response
indicating the type of request, the path requested, the request headers, and the
current time:
use HTTP::Daemon;
use HTTP::Status;
use HTTP::Response;
my $daemon = HTTP::Daemon->new(LocalAddr => 'localhost', LocalPort => 1234)
or die "Can't create listening socket: $!\n";
print "Server open at ", $daemon->url, "\n";
while (my $connection = $daemon->accept)
{
print Identify($connection) . " Connected\n";
while (my $request = $connection->get_request)
{
print Identify($connection) . " Requested ${\$request->method} ${\$request->url->path}\n";
if ($request->method eq 'GET')
{
$response = HTTP::Response->new(200);
$response->content(<<EOT);
Simple Web server
You requested path ${\$request->url->path}
using protocol ${\$request->protocol}
via method ${\$request->method}
Your header information was:
${\join ' ', split(/\n/, $request->headers_as_string())}
I'm a simple server so that's all you are going to get!
Generated ${\scalar localtime}
EOT
$connection->force_last_request; # allows us to service multiple clients
$connection->send_response($response);
}
else
{
$connection->send_error(RC_FORBIDDEN);
}
}
print "${\$connection->peerhost}:${\$connection->peerport} Closed\n";
$connection->close;
undef($connection);
}
sub Identify
{
my ($conn) = @_;
return "${\$conn->peerhost}:${\$conn->peerport}";
}
By simply replacing the code of the innermost while loop, you can make
your Web server do anything you want. For example, you could return the
status of your program, or get a picture from a Web camera.
Adding a Web interface to your Perl script is straightforward when you
use the HTTP::Daemon module.
INSERTING COMMAS IN TEXT LISTS
Perl is an excellent tool for processing text data. However, sometimes
you need to format a multi-item list following proper English syntax
rules. The basic rules are as follows:
1. Single-item lists require no processing.
2. Two-item lists require that the word "and" be inserted between the
items.
3. Three-item and longer lists require a comma between each element and
the word "and" before the last element.
4. If one or more items in the list already contain a comma, then a
semicolon should be used as the separator, rather than a comma.
Turning these rules into Perl is a simple process. Below is a routine
that will return a properly formatted string given a list of one or more
items (an array):
sub EnglishCommas
{
@_ == 0 and return ''; # nothing ventured, nothing gained
@_ == 1 and return $_[0]; # just return the item
@_ == 2 and return join ' and ', @_; # just add 'and'
my $sep = grep(/,/, @_) ? '; ' : ', '; # determine separator
return join $sep, @_[0..($#_-1)], "and $_[-1]"; # return the separated list
}
Passing this routine each of the following input lists produces the corresponding output shown below:
Input:
* 'Dilbert'
* 'Bilbo', 'Frodo'
* 'The butcher', 'the baker', 'the candlestick maker'
* 'Pimento cheese', 'peanut butter and jelly', 'egg salad', 'bacon, lettuce, and tomato'
Output:
* Dilbert
* Bilbo and Frodo
* The butcher, the baker, and the candlestick maker
* Pimento cheese; peanut butter and jelly; egg salad; and bacon, lettuce, and tomato
As shown, generating lists that follow complex English grammar rules is
simple using just the basic Perl functions.
USE A HASH TO IDENTIFY UNIQUE ELEMENTS
When you need to find the unique elements in a list, consider using a
hash to identify them.
For instance, perhaps you have several e-mail lists, and you need to
send a message to everyone on each list. However, there is a good chance
that the lists have overlapping addresses, but you don't want to send
duplicate e-mails to the same address.
You can use the following code to process the lists and determine only
the unique elements:
@admin = qw(bob joe fred bill tim);
@operators = qw(bob fred tim travis nancy);
@powerusers = qw(fred bill jane sally roger);
@users = qw(tom john ralph nancy alex kelly);
print "Original users: @{[sort(@admin, @operators, @powerusers, @users)]}\n\n";
%before = ();
@unique = sort grep { ! $before{$_}++ } @admin, @operators, @powerusers, @users;
print "Unique users: @unique\n";
This program outputs the following:
Original users: alex bill bill bob bob fred fred fred jane joe john kelly nancy nancy ralph roger sally tim tim tom travis
Unique users: alex bill bob fred jane joe john kelly nancy ralph roger sally tim tom travis
If you need to process each element as it is seen rather than wait until
the entire list is available, you can check and process elements inside
a for loop, as follows (use the same lists of users as before):
print "Unique users:\n";
%before = ();
for $user (@admin, @operators, @powerusers, @users)
{
next if ($before{$user}++);
# do whatever processing here
}
Using a hash is an easy way to find the unique elements of a list.
FIND NON-DUPLICATE ELEMENTS IN A LIST
When you need to compare two lists and show the differences between the
two lists, consider using a hash to identify and remove common elements.
By using a hash to identify the elements in one list, it's easy to
identify which elements from the second list are common to the first.
The following code compares two arrays and prints those elements from
the second array that aren't in the first array.
@a1 = qw(one two three four five six);
@a2 = qw(one three five seven nine);
@found{@a1} = ();
for $item (@a2)
{
print "$item\n" if (! exists $found{$item});
}
An example of when this sort of code might come in handy is when you
need to compare a list of images against a list of products for a Web site.
A further enhancement is to test for duplicate items in the second list.
By adding the following line to the bottom of the for loop, you'll
eliminate duplicate items in list two:
$found{$item} = 1;
Finding the unique items in one list but not in another is easy when you
use a hash.
STORING MULTIPLE VALUES PER HASH ENTRY
When you need to store multiple values for each key in a hash, store an
array reference instead of a scalar value.
If you're storing data that has text for a key, a hash is the obvious
choice. You can only store one scalar value in a hash element. However, an
array reference is a scalar value and can therefore be stored in a hash
element.
For example, consider the case of storing the zip codes associated with
a city. Since a single city may have multiple zip codes, you might
consider storing a reference to an array of all applicable zip codes as the
value of the hash element. Look at this example:
while (<DATA>)
{
chomp;
($zip, $state, $city) = split / /;
push @{$zipcodes{"$city, $state"}}, $zip;
}
for $city (sort keys %zipcodes)
{
print "$city: @{$zipcodes{$city}}\n"
}
__DATA__
40502 KY LEXINGTON
40503 KY LEXINGTON
40504 KY LEXINGTON
40505 KY LEXINGTON
40511 KY LEXINGTON
40513 KY LEXINGTON
40514 KY LEXINGTON
40515 KY LEXINGTON
40516 KY LEXINGTON
40517 KY LEXINGTON
40202 KY LOUISVILLE
40213 KY LOUISVILLE
40214 KY LOUISVILLE
40215 KY LOUISVILLE
40217 KY LOUISVILLE
40220 KY LOUISVILLE
40222 KY LYNDON
40241 KY LYNDON
40242 KY LYNDON
By always storing an array reference, even when there is only one
element, you can simplify the code. Otherwise, you would have to test the hash
element to see if it is a reference or a scalar and then act on it
appropriately.
Use an array reference to store multiple values in a single hash
element.
PASSING PARAMETERS IN HASHES
If your subroutine has a number of optional parameters, consider passing
those parameters as key-value pairs in a hash.
For instance, suppose you have a subroutine that accepts a URL and some
optional parameters such as a debug flag, a timeout, a maximum depth, and
a verbosity level. In a traditional procedural programming language, you
might expect to see something like this:
func(url, debug, timeout, depth, verbose);
In this model, to pass a value for the depth parameter, you must pass a
value for debug and timeout parameters. In Perl, you can combine all of
the optional parameters into a hash, and only pass those that interest
you.
Here's the same function in Perl, with optional parameters passed in a
hash:
func("http://www.builder.com", debug=>1, verbose=>0, depth=>2);
sub func
{
my ($url, %params) = @_;
print "URL: $url\n";
for $item (sort keys %params)
{
print "$item: $params{$item}\n";
}
}
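One common refinement, sketched here as my own assumption rather than taken from the original, is to merge the caller's pairs over a hash of defaults so that unspecified options get sensible values:

```perl
use strict;
use warnings;

sub func
{
    my ($url, %params) = @_;
    # defaults chosen for illustration; the caller's pairs override
    # them because later keys win in a hash assignment
    my %defaults = (debug => 0, timeout => 30, depth => 1, verbose => 0);
    %params = (%defaults, %params);
    return %params;
}

my %got = func("http://www.builder.com", depth => 2);
print "depth=$got{depth} timeout=$got{timeout}\n";   # depth=2 timeout=30
```

Because a hash assignment keeps the last value seen for each key, listing the defaults first means every option the caller omits falls back automatically.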
Using the hash parameter passing method allows you to add new parameters
without changing existing code. This method is also useful when you don't
need or want to know what the parameters are, such as when a "pass-through"
function wraps another function or object. Your code can allow the caller
to pass in parameters that will be sent along to the wrapped function
without your code ever knowing what they are.
Passing parameters to subroutines is much easier using hashes.
MATCHING ACROSS LINES IN REGULAR EXPRESSIONS
The ability to have regular expressions match across line boundaries can
come in handy.
Perl provides two modifiers, /s and /m, that have to do with how regular
expressions match line boundaries. The /s modifier allows the period (.)
special character to match a newline (which it ordinarily does not).
The /m modifier changes the meaning of the caret (^) and dollar sign ($)
special characters. Normally, caret (^) matches at the beginning of the
string and dollar sign ($) would match the end of the string. With /m in
effect, caret (^) and dollar sign ($) also match next to each embedded newline.
Note that these two modifiers aren't mutually exclusive. One changes the
behavior of the period (.) character, and the other changes the behavior
of the caret (^) and dollar sign ($) characters. Both can be used
together if needed.
The following code does a poor job of removing XML tags from a document:
undef $/; # slurp entire file at once
($file = <STDIN>) =~ s/<.*?>//gs;
print $file;
If you were looking for POD documentation markers, you could use the /m
option to have caret (^) match at the beginning of each line.
undef $/; # slurp entire file at once
$file = <STDIN>;
while ($file =~ /^=(\w+)(.*)$/gm)
{
print "command: $1, options: $2\n";
}
When /m is in effect, you can use the \A and \Z special characters to
match the beginning or end of the string, respectively. When /m is not in
effect, \A and caret (^) match at the same point, as do dollar sign ($)
and \Z.
When you need to match across line boundaries, consider whether /s, /m,
or both would best serve your needs.
HOW TO CREATE FILTERS USING PERL
A common use for Perl is to create filters. A filter is a program that
reads data, processes it, and outputs the processed information. Typical
filter behavior is to accept filenames on the command line or to read from
standard input if no files are specified.
Perl provides the special construct "while (<>)" for this case. This
tells Perl: If there are no command line parameters, read from standard
input until the end of the file. If there are command line parameters, assume
each one is a filename, and open and read each in turn until the end of the
last file.
Perl doesn't automatically expand wildcards in the filenames. If your
shell program doesn't do this for you, you must do it explicitly at the
start of the program. The default Windows command interpreter, for example,
doesn't expand file patterns on the command line.
The following is a basic shell for a filter program. It includes the
necessary steps to expand file patterns on the command line. This filter is
a poor man's grep. It accepts a pattern and, optionally, a list of files
on the command line and prints lines that match from the files it
processes.
$pattern = shift || die "No pattern\n";
@ARGV = map {glob} @ARGV; # expand file patterns
while (<>)
{
print "$ARGV ($.): $_" if (/$pattern/o);
close ARGV if eof;
}
Note the use of the $ARGV variable in the print statement; Perl stores
the name of the current file in this variable. Perl also sets the special
variable "$." to the current line number.
Filters that read from files or from standard input make it easy to
chain simple functions together to form a complex program on the command
line.
KEEPING LOGS OF SCRIPT ACTIVITIES
When Perl scripts are used as management tools, it's often necessary to
keep a log of what the script has done. Also, scripts are sometimes run
simultaneously on different machines or in separate processes on the same
machine. File locking becomes critical to maintaining a consistent log
file.
Here's a function to write a journal entry to a log file:
sub Journal
{
my ($msg) = @_;
open LOG, ">>$logDir/$logBaseName.log" or die "Can't open log file: $!\n";
flock LOG, 2; # lock exclusive
seek LOG, 0, 2; # reposition to end in case
# someone appended while we waited
chomp $msg; # remove the trailing newline
# tab indent extra lines in the message
$msg =~ s/\n/\n\t/g;
print LOG scalar(localtime), ": $msg\n";
flock LOG, 8; # unlock
close LOG;
}
This subroutine expects that the $logDir and $logBaseName variables have
been set by the main program. It opens the log file for append or
creates it as needed. It then locks the file for exclusive access to prevent
two processes from writing to it at the same time. Once the lock is acquired,
the script seeks to the end of the file in case another process appended to
it while we waited for the lock. It then writes the message, unlocks the
log file, and closes it. Every message written to the log file is prefixed
with the current date and time.
Keeping activity logs makes good sense for scripts that run unattended
or that make modifications to servers or configurations.
SCHEDULING TASKS BY MINUTES WITH WINDOWS' SCHEDULER
Windows NT and 2000 support scheduling tasks to run at a later time,
which can be handy for systems tasks such as daily backups. However, the
Windows scheduler's smallest granularity is daily tasks; it doesn't allow
you to schedule tasks in minute or hour increments. A few lines of Perl can
make up for this deficiency.
The following code uses the Windows scheduler to implement a task
scheduling tool that lets you schedule tasks in terms of minutes. The code
does this by reading the current time, adding the appropriate number of
minutes, and then calling the command line tool "at" to schedule the task.
# schedule something to run in a certain number of minutes
if ($#ARGV != 1)
{
print "Usage: $0 minutes program\n";
exit 1;
}
# find out what time it will be in x minutes
($sec, $min, $hour) = localtime(time + $ARGV[0]*60);
$when = sprintf "%d:%02d", $hour, $min; # zero-pad the minutes for "at"
print `at $when $ARGV[1]`;
Now create a batch file to run the Perl script, or compile it to an
executable. Then use it in batch files or other programs to create a
recurring event that happens every few minutes.
Perl is an excellent language for creating simple tools to augment the
existing functions of the Windows OS.
PROCESSING CONFIGURATION INFO WITH BUILT-IN FACILITIES
You can make your program more flexible by using configuration options.
An easy way to process configuration information is to use Perl's
built-in facilities.
There are two simple ways to process configuration information. The
first method processes one line at a time and allows only simple value
settings. The second method allows any valid Perl code to be used.
Some very simple code will allow basic var = value type settings. Blank
lines and Perl style comments alone on a line are supported.
while(<CONFIG>)
{
chomp;
next if (/^\s*#/); # ignore comments
next if (/^\s*$/); # ignore blank lines
if (/^\s*(.*?)\s*=\s*(.*?)\s*$/)
{
$prefs{uc $1} = $2;
}
else
{
warn "Can't understand config line: $_\n";
}
}
Here's an example of the type of configuration information that could be
read.
# server to use
server = www.builder.com
ip address = 10.0.1.1
If you want to give your configuration file a well-developed language,
you can easily use Perl. To process the file, use the do command:
do "program.cfg";
If you want your configuration settings in a separate name space (or
package), you can use the following code:
{ package Config; do "program.cfg"; }
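Here is a self-contained sketch of the package trick; the config file name and the settings in it are invented for the demonstration. Because do compiles the file in whatever package is current, the settings land in Config:: rather than polluting main::.

```perl
use strict;
use warnings;
no warnings 'once';   # the Config:: variables are assigned inside the do'd file

# write a tiny Perl-syntax config file for the demonstration
open my $fh, '>', 'program.cfg' or die "Can't write config: $!\n";
print $fh "\$server = 'www.builder.com';\n\$port = 8080;\n";
close $fh;

# the leading ./ avoids relying on '.' being in @INC (removed in Perl 5.26)
{ package Config; do "./program.cfg"; }

print "server: $Config::server, port: $Config::port\n";
unlink 'program.cfg';
```

Note that strict does not leak into the do'd file, so the config file can use plain $server = ... assignments without declarations.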
Now it's up to you to select which process configuration method best
fits your needs.
USE RECURSIVE PROCESSING TO APPLY ACTIONS TO NUMEROUS FILES
If you want to apply an action to all of the files in a directory and
all of its subdirectories, the File::Find module provides a perfect
solution.
The File::Find module exports a simple find routine, which accepts a
code reference and a list of directories. The code reference is called for
each entry (file or directory) in the specified directories and their
subdirectories.
Before find calls the code reference, it changes to the directory being
visited. The variable $File::Find::dir is set to the current directory
path. The $File::Find::name variable contains the full path of the file or
directory being visited, while $_ contains the base name.
The following example simply prints the directory structure:
use File::Find;
push @ARGV, '.' if (! @ARGV);
find \&display, @ARGV;
sub display
{
print $File::Find::name, -d $_ ? "/ <DIR>\n" : "\n";
}
The File::Find module also has a routine, finddepth, which is guaranteed
to visit entries in depth-first order (i.e., it visits the files in a
directory before the directory itself). The normal find routine visits each
directory before its contents.
Perl's File::Find module makes it easy to recursively visit all files in
a subdirectory tree.
LEARN AN EASIER WAY TO PARSE FILENAME COMPONENTS
Parsing filename components--directory, filename, and extension--can be
a daunting task. The File::Basename module makes this task simple.
File::Basename exports three important routines: basename, dirname, and
fileparse. The basename routine returns the filename portion, which is
everything after the last directory separator. Dirname returns the path
without the filename, which is everything before the last directory
separator. Fileparse returns a list of three elements: the filename, the
path, and the extension, in that order.
Since the definition of what a file extension is varies from application to application, fileparse accepts a regular expression pattern to match an extension. This pattern is matched against the basename portion of
the name to determine which part is the extension.
In this case, we use '\..*' as our extension matching regular expression. This says that everything after the first dot in the basename portion is the file extension.
The following code demonstrates each of these functions:
use File::Basename;
$path = "/some/dir/path/test.tar.gz";
$base = basename($path);
$dir = dirname($path);
print "base: $base, dir: $dir\n";
($base, $dir, $ext) = fileparse($path, '\..*');
print "dir: $dir, base: $base, ext: $ext\n";
The fileparse routine returns a list of three elements. Perl's list
assignment syntax makes it easy to assign these three elements to the
variables $base, $dir, and $ext, respectively.
Parsing filename components is much simpler when you use the
File::Basename module.
CREATE PERSISTENT LOCAL VARIABLES BY USING AN ENCLOSING BLOCK
Perl doesn't directly support the concept of a persistent variable that
is local to a subroutine like C and C++ do with static variables.
However, you can create a static local variable in Perl with a little
magic.
In Perl, a variable exists as long as something refers to it, such as a
subroutine. The trick is to hide the variable so that only the appropriate
routines can see it. To do this, enclose the variable(s) and the accessing
subroutine(s) inside a block, like this:
BEGIN
{
    my $counter = 1;

    sub IncCounter
    {
        return $counter++;
    }

    sub DecCounter
    {
        return $counter--;
    }
}
It's always a good idea to make the enclosing block a BEGIN block to
ensure that the initialization occurs before the access routines are
called.
If the variables don't need to be initialized, a plain code block is
sufficient.
Unlike C or C++, you can enclose several routines inside the block, and
all of them will have access to the shared local persistent variable or
variables.
Even though Perl doesn't directly support persistent local variables,
you see that it's easy to create them by using an enclosing block.
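Here's a runnable version of this pattern with a short usage demonstration
(the IncCounter/DecCounter names mirror the sketch above):

```perl
use strict;
use warnings;

# Enclosing BEGIN block: $counter is visible only to the
# two subroutines defined inside the same block.
BEGIN
{
    my $counter = 1;

    sub IncCounter { return $counter++; }
    sub DecCounter { return $counter--; }
}

# Each call sees the same persistent $counter.
my @results = (IncCounter(), IncCounter(), DecCounter(), IncCounter());
print "@results\n";   # 1 2 3 2
```

Because nothing outside the block can name $counter, the subroutines are
the only code that can read or change it.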
MIX SHELL COMMANDS WITH PERL IN ONE FILE
One of the great things about Perl is the ability to mix shell commands
and Perl code in the same file. Perl's Shell module makes this possible.
The Shell module also demonstrates the use of the AUTOLOAD function for
defining functions as they're needed.
The Shell module doesn't actually define any of the shell commands as
functions. Instead, it uses Perl's AUTOLOAD mechanism to pass any undefined
function calls to the operating system shell as commands.
When you "use" the Shell module, you can specify a list of commands on
the "use" line. Doing so automatically defines them as functions that
Shell exports, which lets you call them without parentheses and makes the
resulting program look more like a shell script or batch file.
For example, compare the following two identical calls:
This code:
use Shell;
copy('*.pl', 'backup');
is equivalent to:
use Shell qw(copy);
copy '*.pl', 'backup';
The second example looks more like a batch file, while the first looks
more like a traditional programming language.
The following example combines shell commands with Perl file test
operators. It checks two things: that the passed name is a file and that
it has been updated in the last week. If so, it copies the file to the
backup directory.
use Shell qw(copy);
copy $ARGV[0], 'backup' if (-f $ARGV[0] && -M $ARGV[0] <= 7);
By combining the processing power of Perl with standard operating system
commands, you get a powerful scripting tool. The Shell module makes any
operating system command available as though it were a built-in Perl
function.
ACCESS OPERATING SYSTEM ENVIRONMENT VARIABLES
Sometimes access to the operating system's environment is essential to
the operation or configuration of a program. Perl provides two different
ways to access environment variables. You can use the %ENV hash that is
built into Perl, or you can use the Env module to access the
environment.
Both methods provide two-way access to the environment, which means that
any changes to the Perl variable change the environment variable. These
changes will be passed on to any child processes but aren't preserved
after your script exits.
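The following sketch shows the child-process side of that: PERL_TIP_DEMO
is a made-up variable name, $^X is the currently running perl binary, and
the backtick quoting assumes a Unix-style shell.

```perl
use strict;
use warnings;

# Change the environment through %ENV...
$ENV{PERL_TIP_DEMO} = "hello from parent";

# ...and a child process inherits the change.
my $child_sees = `$^X -e 'print \$ENV{PERL_TIP_DEMO}'`;
print "child saw: $child_sees\n";
```

Once the script exits, the parent shell's environment is untouched; the
change existed only in the script and its children.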
Using the %ENV hash is straightforward. The key of the %ENV hash is the
environment variable name; the hash value is the environment variable
value.
The following code displays the contents of the PATH environment
variable:
print "path is $ENV{PATH}\n";
This code creates (or overwrites) the variable PerlVar in the
environment:
$ENV{PerlVar} = "Perl Tech Tips";
To remove an environment variable, use delete. This example removes the
TEMP environment variable:
delete $ENV{TEMP};
The second way to access environment variables is via the Env module.
The Env module creates a Perl scalar or array variable for each
environment variable. You can specify particular variables to be imported,
or you can import all environment variables. If you specify variable
names, they don't need to exist yet as environment variables.
To access only the path variable, use the following:
use Env qw(PATH);
print "The path is $PATH\n";
If you don't specify the type of variable, it's assumed to be scalar. If
you specify a variable as an array (e.g., use Env qw(@PATH);), it's
split and joined automatically using $Config::Config{path_sep} as the
delimiter.
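For example (a small sketch; Env and Config are standard modules, and the
PATH variable is assumed to be set):

```perl
use Env qw(@PATH);   # ties @PATH to the PATH environment variable
use Config;          # %Config holds the platform's path separator

# Each directory in the search path is one element of @PATH.
print "first entry: $PATH[0]\n";
print scalar(@PATH), " entries, joined by '$Config{path_sep}'\n";
```

Pushing a directory onto @PATH updates the real PATH environment variable,
with the separator inserted automatically.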
This code creates (or overwrites, if it already exists) the variable
PerlVar in the environment:
use Env qw(PerlVar);
$PerlVar = "Perl Tech Tips";
Perl provides two different ways to access the environment: the %ENV
hash and the Env module. Depending on your needs and personal coding
preferences, you can use either method to access and alter environment
variables.
USE GETOPT::LONG TO PROCESS COMPLEX COMMAND LINE OPTIONS
Perl provides a variety of ways to process command line options. For
simple programs, Getopt::Std or even hand-coded tests will suffice.
However,
for more complex option processing, consider using Getopt::Long.
This example shows the basic usage of Getopt::Long. (For a complete
description, see the module's documentation.)
use Getopt::Long;
# declare default values for variables
$verbose = 0;
$all = 0;
$more = -1; # so we can detect both -more and -nomore
$diam = 3.1415;
@libs = ();
%flags = ();
$debug = -1; # test for -debug with no argument (0)
# process options from command line
# verbose will be incremented each time it appears
# either all, everything or universe will set $all to 1
# more can be negated (-nomore)
# diameter expects a floating point argument
# lib expects a string and can be repeated (pushing onto @libs)
# flag expects a key=value pair and can be repeated
# debug will optionally accept an integer (or 0 by default)
GetOptions('verbose+' => \$verbose,
'all|everything|universe' => \$all,
'more!' => \$more,
'diameter=f' => \$diam,
'lib=s' => \@libs,
'flag=s' => \%flags,
'debug:i' => \$debug);
# display resulting values of variables
print <<EOS;
Verbose: $verbose
All: $all
More: $more
Diameter: $diam
Debug: $debug
Libs: @{[ join ', ', @libs ]}
Flags: @{[ join "\n\t\t", map { "$_ = $flags{$_}" } keys %flags ]}
Remaining: @{[ join ', ', @ARGV ]}
(ARGV contents)
EOS
The basic syntax is to pass GetOptions a list of option specifiers. Each
option specifier consists of the text to match and a reference to a
variable to set. The text can include an optional list of aliases
separated by | (vertical bar).
An option followed by + doesn't take an argument; instead, it increments
the variable by one each time it appears. An option followed by ! doesn't
take an argument either; however, it can be negated by preceding it with
"no" (e.g., -nomore for the "more" option). The associated variable is set
to 1 if the option appears on the command line or 0 if the negated option
appears.
An option followed by = requires an argument of type string (s), integer
(i), or floating point (f). An option followed by : takes an optional
argument that defaults to 0 or an empty string. If the associated variable
is an array, the option can appear multiple times, and the values are
pushed onto the array. If the variable is a hash, a key=value pair is
expected and inserted into the hash.
By default, GetOptions ignores case when matching option names and allows
options to be abbreviated to the shortest unique string (e.g., -m for
-more, but -di and -de are required for diameter and debug, respectively).
Here's an example command line and the resulting output:
perl getoptlong.pl -l=abc -l def -f a=b -f b=c -ev -de 5 -nomore arg
Verbose: 0
All: 1
More: 0
Diameter: 3.1415
Debug: 5
Libs: abc, def
Flags: a = b
b = c
Remaining: arg
(ARGV contents)
Perl provides a rich set of tools for handling command line arguments.
Getopt::Long provides complex option processing with minimal setup.
USE TEXT::METAPHONE FOR "SOUNDS LIKE" TEXT MATCHING
The Metaphone Algorithm provides an English phonetic pronunciation hash
for input words. In other words, it generates a code based on the way a
word sounds rather than how it's spelled. This makes "fuzzy" matches
possible between words that sound similar but are spelled differently
(e.g., Smith vs. Smythe). The Metaphone Algorithm is similar in concept
to Soundex but is much richer in dealing with pronunciation.
The Text::Metaphone module is simple to use. You pass in a word to be
encoded, and it returns the encoded result. You can optionally limit the
length of the encoded result by passing a maximum phoneme length as the
second parameter. It's generally accepted that a four-character code is
optimal for matching most names and English words.
The following example reads one or more words from the command line and
returns the Metaphone hash for each one:
use Text::Metaphone;
for (@ARGV)
{
print "$_ = " . Metaphone($_) . "\n";
}
Because Metaphone maps all words into a relatively small hash space, it
will begin to break down if presented with a large list of similar words.
Any application that has to match text can benefit from the Metaphone
Algorithm. A system that allows "fuzzy" text matches would be helpful for
customer support, dictionary lookups, Web searching, or genealogy.
The Metaphone Algorithm makes it easy to match words that sound alike
but are spelled differently.
excerpt from perlretut
· no modifiers (//): Default behavior. '.' matches any character
except "\n". "^" matches only at the beginning of the string and
"$" matches only at the end or before a newline at the end.
· s modifier (//s): Treat string as a single long line. '.' matches
any character, even "\n". "^" matches only at the beginning of the
string and "$" matches only at the end or before a newline at the
end.
· m modifier (//m): Treat string as a set of multiple lines. '.'
matches any character except "\n". "^" and "$" are able to match
at the start or end of any line within the string.
· both s and m modifiers (//sm): Treat string as a single long line,
but detect multiple lines. '.' matches any character, even "\n".
"^" and "$", however, are able to match at the start or end of any
line within the string.
Most of the time, the default behavior is what is wanted, but "//s" and
"//m" are occasionally very useful. Even if "//m" is being used, the start
of the string can still be matched with "\A", and the end of the string
can still be matched with the anchors "\Z" (which matches at the end or
before a trailing newline, like "$") and "\z" (which matches only at the
very end).
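The behavior of these modifiers is easy to verify with a short test
string (a minimal sketch):

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Default: ^ anchors only at the very beginning of the string.
my @default = $text =~ /^(\w+)/g;     # matches only "first"

# //m: ^ also anchors right after every newline.
my @multi = $text =~ /^(\w+)/mg;      # matches "first" and "second"

# //s: . matches "\n" too, so .* spans the whole string.
my ($whole) = $text =~ /^(.*)$/s;

print "default: @default\n";
print "multi:   @multi\n";
print "whole:   $whole\n";
```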
Using pos:
while ($x =~ /(\w+)/g) {
print "Word is $1, ends at position ", pos $x, "\n";
}
Perl tricks
Negate regexp: see these two articles:
- http://groups.google.fr/group/comp.lang.perl.misc/browse_frm/thread/898dfe2929fe4eaf/581f3aa882a3e17f?hl=fr#581f3aa882a3e17f
- http://groups.google.com/group/comp.lang.perl.misc/browse_thread/thread/cf66e7281514182f/7af7898218075b5b?q=negate+regex&rnum=3#7af7898218075b5b
complex map function: http://groups.google.fr/group/comp.lang.perl.misc/browse_frm/thread/576452e3c80bc537/134baa5068f6250f?q=map&rnum=7#134baa5068f6250f
Remove a module with a script: modrm.pl
====