PerlTips

C-t transposes characters (Emacs and readline key binding).

"When a regexp can match"

· Principle 0: Taken as a whole, any regexp will be matched at the
earliest possible position in the string.

· Principle 1: In an alternation "a|b|c...", the leftmost alternative
that allows a match for the whole regexp will be the one used.

· Principle 2: The maximal matching quantifiers "?", "*", "+" and
"{n,m}" will in general match as much of the string as possible
while still allowing the whole regexp to match.

· Principle 3: If there are two or more elements in a regexp, the
leftmost greedy quantifier, if any, will match as much of the
string as possible while still allowing the whole regexp to match.
The next leftmost greedy quantifier, if any, will try to match as
much of the string remaining available to it as possible, while
still allowing the whole regexp to match. And so on, until all the
regexp elements are satisfied.
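The four principles can be checked with a few one-line matches. This is a minimal sketch; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Principle 0: the match starts at the earliest possible position.
my ($first) = "xbay" =~ /(a|b)/;            # 'b' at offset 1 beats the 'a' at offset 2

# Principle 1: the leftmost alternative that works wins, even if a later one is longer.
my ($alt) = "abc" =~ /(ab|abc)/;             # 'ab', not 'abc'

# Principle 2: greedy quantifiers take as much as they can.
my ($greedy) = "<b>bold</b>" =~ /<(.+)>/;    # 'b>bold</b'

# Principle 3: the leftmost greedy quantifier is satisfied first.
my ($left, $right) = "aaaa" =~ /(a+)(a+)/;   # 'aaa' and 'a'

print "$first | $alt | $greedy | $left,$right\n";
```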

Using pos:
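As a minimal sketch (the sample string is made up), pos() reports the offset just past the last m//g match on a string, and assigning to it moves the starting point of the next match, which \G can then anchor to:

```perl
use strict;
use warnings;

my $text = "cat bat rat";

# In scalar context, m//g resumes where the previous match ended;
# pos() returns that offset (just past the last match).
my @ends;
while ($text =~ /\wat/g) {
    push @ends, pos($text);      # 3, 7, 11
}

# pos() is also assignable: restart the scan at offset 4 and anchor there with \G.
# The /c flag keeps pos() unchanged if the match fails.
pos($text) = 4;
my $word;
if ($text =~ /\G(\w+)/gc) {
    $word = $1;                  # 'bat'
}
my $after = pos($text);          # 7

print "ends: @ends; word: $word; pos now: $after\n";
```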


Perl tricks



Negate regexp: see these two articles:
#http://groups.google.fr/group/comp.lang.perl.misc/browse_frm/thread/898dfe2929fe4eaf/581f3aa882a3e17f?hl=fr#581f3aa882a3e17f
#http://groups.google.com/group/comp.lang.perl.misc/browse_thread/thread/cf66e7281514182f/7af7898218075b5b?q=negate+regex&rnum=3#7af7898218075b5b
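Without restating the linked threads, here is a sketch of two common ways to "negate" a regexp, with made-up sample lines: negate the match operator itself, or, when only a pattern can be supplied, build the negation into the pattern with a negative lookahead.

```perl
use strict;
use warnings;

my @lines = ("error: disk full", "all good", "warning: low memory");

# 1. Negate the operator: !~ is true when the pattern does NOT match.
my @clean = grep { $_ !~ /error/ } @lines;

# 2. Negate inside the pattern: from the start of the string,
#    'error' must not appear anywhere (/s lets . cross newlines).
my $no_error = qr/\A(?!.*error)/s;
my @clean2 = grep { /$no_error/ } @lines;

print join(", ", @clean), "\n";
```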

complex map function: http://groups.google.fr/group/comp.lang.perl.misc/browse_frm/thread/576452e3c80bc537/134baa5068f6250f?q=map&rnum=7#134baa5068f6250f

Removing a module with a script: modrm.pl
====


HANDLE ERRORS WITH WARN AND DIE

Perl provides two routines to handle different error levels in your programs: warn and die. Both routines issue a message to the standard error file handle, STDERR. In addition, die causes the program to exit (or immediately terminates an eval).

Warn is used when a nonfatal error occurs and you wish to inform the user of the problem. For example, if a user cannot open a specified file, you might use warn to display the error.

Die is used when a fatal error occurs: it does not return, but terminates the program (or the enclosing eval). For example, an attempt to divide by zero might be reported with die.

Here are examples of die and warn in action:

 open LOG, ">log" or die "Can't create log file 'log': $!, stopped";
 warn "Salary seems excessive\n" if ($salary > 1000000.00);

If the string you want to print does not end in a newline, then the current script name, line number, and input file line number (if any) are appended to the message. For this reason, you should end the die or warn message with something appropriate, such as ", stopped" or ", warning".

If you don't supply any error message, warn and die use the current value of the $@ variable (typically left by a previous eval), with "\t...caught" or "\t...propagated" appended, respectively. If $@ is empty, warn prints "Warning: something's wrong" and die prints "Died". Both messages are followed by the file and line number information.

Warn and die make it easy to return appropriate diagnostics in exception conditions.
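A small sketch of the newline rule and of catching both kinds of message; the messages are illustrative. die is caught with eval, and warn is captured through a $SIG{__WARN__} handler instead of going to STDERR.

```perl
use strict;
use warnings;

# No trailing newline: die appends " at FILE line N."
my $line = __LINE__; eval { die "Disk full" };
my $with_location = $@;

# Trailing newline: the message is used exactly as written.
eval { die "Disk full\n" };
my $bare = $@;

# warn does not stop the program; a handler can collect the warnings.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    warn "Salary seems excessive, warning";
}

print "die:  $with_location";
print "die:  $bare";
print "warn: $warnings[0]";
```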


USE THE CARP MODULE TO TRACK BUGS

Perl provides module writers with a set of tools that identify the location of errors in the caller's program. The Carp module works like warn and die but reports information about the caller rather than the module.

The Carp module provides four routines: carp, cluck, croak, and confess. The routines carp and croak report the file and line information of the caller rather than the file and line that called carp or croak.

So, if you have a routine Check() that calls carp(), then carp() will report that the error occurred where Check() was called, not where carp() was called. This helps you identify where to look in your code for the error.

If your module accepts a value that will be used as a divisor in an operation, it could check that value to ensure it's not zero. Here is an example of carp and croak:

 package MyMath;
 use Carp;

 sub divide
 {
         my ($top, $bottom) = @_;
         croak "Bottom is ZERO!" if ($bottom == 0);
         return $top / $bottom;
 }

 package main;

 MyMath::divide(2,0);    # croak reports this line

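Because divide lives in a different package from its caller, calling it with zero as the second argument makes croak report the file and line of the caller rather than the location inside the divide routine. This is the desired behavior since the caller, not divide, is really at fault. (When every call involved is in the same package, croak cannot tell whom to blame and prints a full backtrace instead.)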

Using cluck and confess will generate a full stack trace rather than just the file and line of the caller. If you want to use cluck, you have to explicitly import it since most people do not want full stack traces on warnings. Here is an example:

 use Carp;
 b(2,0);

 sub b { return c(@_); }
 sub c { return divide(@_); }
 sub divide
 {
         my ($top, $bottom) = @_;
         confess "Bottom is ZERO!" if ($bottom == 0);

         return $top / $bottom;
 }
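The sketch below puts croak and cluck in a separate package, so Carp has someone to blame; the package name MathUtil and its two routines are hypothetical. Leaving the trailing newline off the message lets Carp append the caller's location cleanly.

```perl
use strict;
use warnings;

package MathUtil;               # hypothetical module package
use Carp qw(croak cluck);       # cluck is not exported by default

sub safe_divide {
    my ($top, $bottom) = @_;
    # No trailing newline: Carp appends "at FILE line N." for the caller's line.
    croak "Bottom is zero" if $bottom == 0;
    return $top / $bottom;
}

sub noisy_divide {
    my ($top, $bottom) = @_;
    cluck "Dividing $top by $bottom";    # a warning plus a full stack backtrace
    return safe_divide($top, $bottom);
}

package main;

# croak blames this line (in package main), not the croak line inside MathUtil.
my $line = __LINE__; my $ok = eval { MathUtil::safe_divide(1, 0); 1 };
my $err  = $@;
print "croak: $err";

# Capture cluck's warning instead of letting it go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };
my $result = MathUtil::noisy_divide(6, 3);
print "result: $result\ncluck: $warnings[0]";
```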

The Carp module makes it easy for module developers to return useful diagnostic information to module users.


USE BLESS TO TURN REFERENCES INTO OBJECTS

Perl's object model is based on references that know what package they belong to as specified by the bless function. Stated simply, an object is a reference that knows what package it belongs to; a class is a package (a collection of data and routines); a method is a subroutine in a package.

The bless function accepts one or two parameters. The first is the reference to be turned into an object. The second optional parameter is the package name (or class name, if you prefer). In order to support inheritance (making a new class based on an existing one), you should always use the two-parameter form of bless.

Typically, the reference being blessed points to an anonymous hash. However, a reference to any type of basic data structure can be turned into an object by blessing it into a package.

Here are a few examples:

 package Chicken;

 sub hatch
 {
      my $class = shift;
      my $self = { First => 'Chicken' };
      return bless($self, $class);
 }

 package Egg; 

 sub lay
 {
      my $class = shift;
      my $self = { First => 'Egg' };
      return bless($self, $class);
 }
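To show why the two-argument form matters for inheritance, here is a sketch with a hypothetical subclass Bantam that inherits Chicken's constructor, plus a hypothetical accessor named first. Because hatch blesses into $class, the inherited constructor returns a Bantam object.

```perl
use strict;
use warnings;

package Chicken;

sub hatch {
    my $class = shift;                  # 'Chicken', or a subclass name
    my $self  = { First => 'Chicken' };
    return bless($self, $class);        # two-argument form respects subclasses
}

sub first { my $self = shift; return $self->{First} }

package Bantam;                         # hypothetical subclass for illustration
our @ISA = ('Chicken');                 # inherit hatch() and first()

package main;

my $hen    = Chicken->hatch;
my $bantam = Bantam->hatch;             # inherited constructor, blessed into Bantam

printf "hen is a %s, first: %s\n", ref($hen), $hen->first;
printf "bantam is a %s, first: %s\n", ref($bantam), $bantam->first;
```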

Perl's support for object-oriented programming begins with the bless function and the specification of a package for a reference.


GET FILE INFORMATION USING THE FILE TEST OPERATORS

Perl's file test operators are unary operators that return specific information about a file, including whether the file exists, the file size, and whether the file is a plain file or a directory. The file test operators can also return the modification (-M), inode change (-C), and last access (-A) times for the file. (-C is often described as the creation time, but on Unix it is the inode change time.)

The file test operators can be used on either a file name or an open file handle. The modification, inode change, and last access times are expressed in fractional days, relative to when the program started (the $^T variable). Positive numbers refer to time before the script started, while a negative value indicates time since the script started.

Here's an example that prints the time of modification for each file specified on the command line:

 for (@ARGV)
 {
         print "$_: ", scalar localtime($^T - 24*60*60 * -M $_), "\n";    # -M is relative to $^T
 }

The following program prints the size of each specified file:

 for (@ARGV)
 {
         print "$_: ", -s $_, "\n";
 }

Combining these with printf generates a simple directory listing:

 for (@ARGV)
 {
        printf "%s %10d %-s\n", scalar localtime($^T - 24*60*60 * -M $_), -s $_, $_;
 }
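A self-contained sketch of the operators on a scratch file created with File::Temp (so no existing files are assumed), including the conversion of -M back to epoch seconds:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the example does not depend on existing files.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;                            # no newline translation, so the size is predictable
print {$fh} "hello\n";
close $fh or die "close: $!";

my $size     = -s $path;                # 6 bytes
my $is_plain = -f $path ? 1 : 0;        # 1: a plain file
my $is_dir   = -d $path ? 1 : 0;        # 0: not a directory

# -M is the age in days relative to $^T (when the script started);
# converting it back gives the modification time in epoch seconds.
my $mtime = $^T - 24 * 60 * 60 * (-M $path);

printf "%s %d bytes, plain=%d, dir=%d, modified %s\n",
    $path, $size, $is_plain, $is_dir, scalar localtime($mtime);
```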

Perl's file test operators make it simple to retrieve and process information about files.


CREATE ABBREVIATION LISTS WITH TEXT::ABBREV

When parsing commands, it's sometimes helpful to be able to abbreviate a command. You can use the Text::Abbrev module to find the shortest unique word to describe the command.

If you provide Text::Abbrev with a list of words, the module will return a list of all possible unambiguous abbreviations. The abbreviations will be unique with regard to the other words in the list. The returned list is in the form of a hash, with the abbreviation as the key and the original word as the value of each element.

You can check the abbreviation list against the input command by looking up the word in the generated hash. If a corresponding hash entry exists, then the command is valid and the hash value is the full-text command word. If the hash entry does not exist, the command is either invalid or ambiguous.

Here is some example Text::Abbrev code:

 use Text::Abbrev;

 %h = abbrev qw(left right forward backward lift backdoor);

 for (sort keys %h)
 {
          print "$_ = $h{$_}\n";
 }
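The lookup described above can be wrapped in a small helper. This is a sketch; expand_command is a hypothetical name, and undef stands for "invalid or ambiguous".

```perl
use strict;
use warnings;
use Text::Abbrev;

my %command = abbrev qw(left right forward backward lift backdoor);

# Hypothetical helper: expand whatever the user typed, or return undef.
sub expand_command {
    my ($input) = @_;
    return exists $command{$input} ? $command{$input} : undef;
}

for my $typed (qw(le li f backw back l)) {
    my $full = expand_command($typed);
    print "$typed => ", (defined $full ? $full : 'invalid or ambiguous'), "\n";
}
```

Note that "l" and "back" are rejected because they are prefixes of two commands each.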

Using Text::Abbrev simplifies the work required for unique command processing.


USE TEXT::SOUNDEX FOR INEXACT NAME MATCHING

The soundex algorithm is designed to map names to a four-character value based on the sound of the word when spoken by an English speaker. The idea is to be able to enhance name matching to include "fuzzy matches" with names that sound similar but are spelled differently (e.g., Smith and Smythe).

To use the soundex tool, simply pass in a list of strings and the code returns a list of soundex codes. The routine will accept a single string and return a single code, or you can pass in a list and get a list of codes back.

Here is a code example:

 use Text::Soundex;

 for (@ARGV)
 {
         $code = soundex $_;
         print "$_ = $code\n";
 }

Soundex will work not only on names but also on any text elements. Keep in mind that it maps an arbitrarily large set of words into a finite space (only four characters). This means that the algorithm will begin to degrade rapidly if it's used on a vast set of very similar words.

Almost any application that has to match names can greatly benefit from the soundex algorithm. A system that allows fuzzy matches against names is a great advantage for customer support.


EXPAND TABS TO A VARIABLE NUMBER OF SPACES

Perl is exceptionally good at text processing jobs. One common task it performs is translating tab characters into an appropriate number of spaces to maintain text alignment.

The standard method for converting tabs into spaces is to count the number of characters to the left of the tab, and then replace the tab with enough spaces to move the position to the next multiple of the tab stop value. With Perl, you can do this quite easily.

Use the substitution pattern-matching operator to find the first tab on the line. Then use the length function to count the number of nontab characters before the tab. Replace the tab with an appropriate number of spaces by calculating the remainder after dividing the length by the tab stop value.

Although it's tempting to use the /g modifier with the substitution operator, it will not work. You must repeat the entire pattern match each time so that you know the length of the string using the newly substituted spaces.

The /e substitution modifier specifies that the right-hand side of the substitution is code rather than a simple string. This allows the necessary calculations to be completed without variable substitution in the replacement string.

Here is an example:

 # expand tabs to spaces
 # accept tabstop on command line or default to 4
 $tabstop = $ARGV[0] + 0 || 4;

 while (<STDIN>)
 {
     # replace the first run of tabs with enough spaces to reach the next tab stop
     1 while(s/^(.*?)(\t+)/$1 . ' ' x ($tabstop * length($2) - length($1) % $tabstop)/e);

     print;
 }
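The same substitution, wrapped in a hypothetical expand_tabs subroutine and checked against the core Text::Tabs module, whose expand function does the same job using its exported $tabstop variable:

```perl
use strict;
use warnings;
use Text::Tabs;      # core module; exports expand(), unexpand() and $tabstop

# The regex approach from the tip, as a subroutine.
sub expand_tabs {
    my ($line, $ts) = @_;
    1 while $line =~ s/^(.*?)(\t+)/$1 . ' ' x ($ts * length($2) - length($1) % $ts)/e;
    return $line;
}

$tabstop = 4;                        # tab width used by Text::Tabs
my $input  = "a\tbc\t\tdef\tg";
my $mine   = expand_tabs($input, 4);
my ($core) = expand($input);         # list context: one line in, one line out

print "[$mine]\n[$core]\n";
```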

Perl's ability to process text using regular expressions makes tasks like expanding tabs very simple.


EXPAND EXPRESSIONS INSIDE STRINGS

Sometimes string expressions need to be more complex than a simple scalar value allows. Perl allows you to create arbitrarily complex expressions inside strings and print their values. This can be particularly handy when using here documents.

There are only three special characters that Perl recognizes inside a double-quoted string: $, @, and \. However, recall Perl's syntax for dereferencing, which puts braces around the reference expression. For example:

 $aref = [ 3, 2, 1 ];
 print "Countdown: @{$aref}";

Combining this with an inline anonymous array results in the syntax for embedding arbitrarily complex expressions. Simply put the anonymous array inside the braces:

 print "Countdown: @{[3, 2, 1]}";

If the expression results in a single scalar value, you can use the scalar version ${\()} as follows:

 print "The current index is $i, the next index is ${\($i + 1)}";

You are not limited to just hard-coded values. You can use any expression, including function calls. You can also use this technique to great advantage in here documents:

 print "Today is ${\scalar localtime}";

 # AddCommas, SendMail and DiskUsage stand for your own helper functions
 print "YTD earnings = ${\AddCommas($YTDEarnings)}";

 SendMail(@targetEmails, <<EOT);
 From: admin\@localhost
 To: ${\join ', ', @targetEmails}
 Date: ${\scalar gmtime}

 The following people are disk hogs:
 @{[grep {DiskUsage($_) > 1000000000} @users]}

 - The Admin
 EOT
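A self-contained sketch of both forms, using made-up values instead of the hypothetical helper functions above:

```perl
use strict;
use warnings;

my @items = (3, 2, 1);
my $i     = 5;

# @{[ ... ]}: any list expression, interpolated like an array (joined with $", a space by default)
my $countdown = "Countdown: @{[ reverse 1 .. 3 ]}";

# ${\ ... }: a single scalar expression
my $next = "The next index is ${\ ($i + 1)}";

# The same tricks work in here documents.
my $report = <<"EOT";
Items: @{[ map { $_ * 10 } @items ]}
Count: ${\ scalar @items}
EOT

print "$countdown\n$next\n$report";
```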

Perl's extended syntax for reference variables makes it easy to embed complex expressions into double-quoted strings and here documents.


BUILD QUERY ENGINES IN PERL

Perl's ability to parse text can be greatly enhanced if the user is allowed to specify queries directly in Perl's own language. This technique allows complex processing of log files and other program output that goes well beyond what could be done using simple regular expressions.

Below is a program that will perform arbitrarily complex queries on a comma-separated value (CSV) file such as those produced by spreadsheet programs. It assumes that the first line of the file contains a list of field names and that the remaining lines are data. For simplicity, the program reads its data from standard input and takes the query conditions from the command line:

 use strict;
 use Text::ParseWords;

 # get header line
 my $header = <STDIN>;

 # get field names
 my @fieldNames = quotewords(",", 0, $header);

 # strip leading & trailing spaces (if any), replace internal spaces with underscores
 @fieldNames = map {s/^\s+//; s/\s+$//; s/\s+/_/g; $_} @fieldNames;

 my @fields; # where field data will be stored

 # create access functions
 for (my $i = 0; $i < @fieldNames; $i++)
 {
     no strict 'refs';

     my $name = $fieldNames[$i];

     eval "sub $name () { \$fields[$i] }";     # create access subroutine
     *{lc $name} = *{uc $name} = $name; # make upper and lower case aliases
 }

 # compile user's query
 my $code = "sub Query { " . join(" and ", @ARGV) . " }";
 eval "$code; 1" or die "Error: $@\nIn query string: $code\n";   # true only if the query compiled

 # print the header line (field names)
 print $header;

 # process each line
 while (<STDIN>)
 {
     @fields = quotewords(",", 0, $_);
     print if Query();
 }

Using Perl's ability to compile and execute code on the fly makes it easy to write and use powerful filter programs.


BUILD A SIMPLE WEB SERVER WITH THE HTTP::DAEMON MODULE

When your project needs a simple interface, consider building a Web server into your application. Building a basic Web server is fairly simple in Perl using the HTTP::Daemon module. This module does most of the heavy lifting associated with getting requests and sending replies.

Below is the code for a complete Web server. It sends a response indicating the type of request, the path requested, the request headers, and the current time:

 use HTTP::Daemon;
 use HTTP::Status;
 use HTTP::Response;

 my $daemon = HTTP::Daemon->new(LocalAddr => 'localhost', LocalPort => 1234) or die "Can't start server: $!\n";

 print "Server open at ", $daemon->url, "\n";

 while (my $connection = $daemon->accept)
 {
     print Identify($connection) . " Connected\n";
     while (my $request = $connection->get_request)
     {
         print Identify($connection) . " Requested ${\$request->method} ${\$request->uri->path}\n";
         if ($request->method eq 'GET')
         {
             my $response = HTTP::Response->new(200);
             $response->header('Content-Type' => 'text/plain');
             $response->content(<<EOT);
 Simple Web server

 You requested path ${\$request->uri->path} using protocol ${\$request->protocol} via method ${\$request->method}

 Your header information was:
 ${\$request->headers_as_string}

 I'm a simple server so that's all you are going to get!

 Generated ${\scalar localtime}

 EOT
             $connection->force_last_request; # close after this response so the loop can accept the next client
             $connection->send_response($response);
         }
         else
         {
             $connection->send_error(RC_FORBIDDEN)
         }
     }
     print "${\$connection->peerhost}:${\$connection->peerport} Closed\n";
     $connection->close;
     undef($connection);
 }

 sub Identify
 {
     my ($conn) = @_; 

     return "${\$conn->peerhost}:${\$conn->peerport}";
 }

By simply replacing the code of the innermost while loop, you can make your Web server do anything you want. For example, you could return the status of your program, or get a picture from a Web camera.

Adding a Web interface to your Perl script is straightforward when you use the HTTP::Daemon module.


INSERTING COMMAS IN TEXT LISTS

Perl is an excellent tool for processing text data. However, sometimes you need to format a multi-item list following proper English syntax rules. The basic rules are as follows:

1. Single-item lists require no processing.

2. Two-item lists require that the word "and" be inserted between the items.

3. Three-item and longer lists require a comma between each element and the word "and" before the last element.

4. If one or more items in the list already contain a comma, then a semicolon should be used as the separator, rather than a comma.

Turning these rules into Perl is a simple process. Below is a routine that will return a properly formatted string given a list of one or more items (an array):

 sub EnglishCommas
 {
     @_ == 0 and return '';  # nothing ventured, nothing gained
     @_ == 1 and return $_[0];   # just return the item
     @_ == 2 and return join ' and ', @_;   # just add 'and'

     my $sep = grep(/,/, @_) ? '; ' : ', ';   # determine separator
     return join $sep, @_[0..($#_-1)], "and $_[-1]";   # return the separated list
 }

Passing each of the following input lists to this routine produces the corresponding output shown below:

 Input:

 * 'Dilbert' 

 * 'Bilbo', 'Frodo'

 * 'The butcher', 'the baker', 'the candlestick maker'

 * 'Pimento cheese', 'peanut butter and jelly', 'egg salad', 'bacon, lettuce, and tomato'

 Output:

 * Dilbert

 * Bilbo and Frodo

 * The butcher, the baker, and the candlestick maker

 * Pimento cheese; peanut butter and jelly; egg salad; and bacon, lettuce, and tomato
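A quick self-check runs the routine over the four sample inputs; note that the last one switches to semicolons because one of its items already contains commas.

```perl
use strict;
use warnings;

sub EnglishCommas
{
    @_ == 0 and return '';
    @_ == 1 and return $_[0];
    @_ == 2 and return join ' and ', @_;

    my $sep = grep(/,/, @_) ? '; ' : ', ';   # semicolons if any item already has a comma
    return join $sep, @_[0 .. $#_ - 1], "and $_[-1]";
}

my @cases = (
    ['Dilbert'],
    ['Bilbo', 'Frodo'],
    ['The butcher', 'the baker', 'the candlestick maker'],
    ['Pimento cheese', 'peanut butter and jelly', 'egg salad', 'bacon, lettuce, and tomato'],
);
my @out = map { EnglishCommas(@$_) } @cases;
print "$_\n" for @out;
```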

As shown, generating lists that follow complex English grammar rules is simple using just the basic Perl functions.


USE A HASH TO IDENTIFY ELEMENTS

When you need to find the unique elements in a list, consider using a hash to identify them.

For instance, perhaps you have several e-mail lists, and you need to send a message to everyone on each list. However, there is a good chance that the lists have overlapping addresses, but you don't want to send duplicate e-mails to the same address.

You can use the following code to process the lists and determine only the unique elements:

 @admin = qw(bob joe fred bill tim);
 @operators = qw(bob fred tim travis nancy);
 @powerusers = qw(fred bill jane sally roger);
 @users = qw(tom john ralph nancy alex kelly);

 print "Original users: @{[sort(@admin, @operators, @powerusers, @users)]}\n\n";
 %before = (); 
 @unique = sort grep { ! $before{$_}++ } @admin, @operators, @powerusers, @users;

 print "Unique users: @unique\n";

This program outputs the following:

Original users: alex bill bill bob bob fred fred fred jane joe john kelly nancy nancy ralph roger sally tim tim tom travis

Unique users: alex bill bob fred jane joe john kelly nancy ralph roger sally tim tom travis

If you need to process each element as it is seen rather than wait until the entire list is available, you can check and process elements inside a for loop, as follows (use the same lists of users as before):

 print "Unique users:\n";
 %before = ();
 for $user (@admin, @operators, @powerusers, @users)
 {
      next if ($before{$user}++);
      # do whatever processing here
 }
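The same idiom fits in a reusable subroutine. In this sketch (unique is a hypothetical name), the list is not sorted, so the first occurrence of each element keeps its original position:

```perl
use strict;
use warnings;

# Keep the first occurrence of each element, in original order.
sub unique {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @admin     = qw(bob joe fred);
my @operators = qw(bob fred tim);
my @unique    = unique(@admin, @operators);
print "@unique\n";
```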

Using a hash is an easy way to find the unique elements of a list.


FIND NON-DUPLICATE ELEMENTS IN A LIST

When you need to compare two lists and show the differences between the two lists, consider using a hash to identify and remove common elements.

By using a hash to identify the elements in one list, it's easy to identify which elements from the second list are common to the first.

The following code compares two arrays and prints those elements from the second array that aren't in the first array.

 @a1 = qw(one two three four five six);
 @a2 = qw(one three five seven nine);

 @found{@a1} = ();

 for $item (@a2)
 {
    print "$item\n" if (! exists $found{$item});
 }

An example of when this sort of code might come in handy is when you need to compare a list of images against a list of products for a Web site.

A further enhancement is to test for duplicate items in the second list. By adding the following line to the bottom of the for loop, you'll eliminate duplicate items in list two:

 $found{$item} = 1;
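Combining the comparison and the duplicate check gives a small subroutine. This is a sketch; missing_from_first is a hypothetical name, and a duplicate "seven" was added to the second list to show the enhancement at work.

```perl
use strict;
use warnings;

# Return the items of @$second that are not in @$first, each reported once.
sub missing_from_first {
    my ($first, $second) = @_;
    my %found;
    @found{@$first} = ();                  # hash slice: create the keys, values undef
    my @missing;
    for my $item (@$second) {
        next if exists $found{$item};
        push @missing, $item;
        $found{$item} = 1;                 # suppress duplicates within the second list
    }
    return @missing;
}

my @missing = missing_from_first([qw(one two three four five six)],
                                 [qw(one three five seven nine seven)]);
print "@missing\n";
```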

Finding the unique items in one list but not in another is easy when you use a hash.


STORING MULTIPLE VALUES PER HASH ENTRY

When you need to store multiple values for each key in a hash, store an array reference instead of a scalar value.

If you're storing data that has text for a key, a hash is the obvious choice. You can only store one scalar value in a hash element. However, an array reference is a scalar value and can therefore be stored in a hash element.

For example, consider the case of storing the zip codes associated with a city. Since a single city may have multiple zip codes, you might consider storing a reference to an array of all applicable zip codes as the value of the hash element. Look at this example:

 while (<DATA>)
 {
 chomp;

 ($zip, $state, $city) = split / /; 

 push @{$zipcodes{"$city, $state"}}, $zip;
 }

 for $city (sort keys %zipcodes)
 {
 print "$city: @{$zipcodes{$city}}\n"
 }

 __DATA__
 40502 KY LEXINGTON
 40503 KY LEXINGTON
 40504 KY LEXINGTON
 40505 KY LEXINGTON
 40511 KY LEXINGTON
 40513 KY LEXINGTON
 40514 KY LEXINGTON
 40515 KY LEXINGTON
 40516 KY LEXINGTON
 40517 KY LEXINGTON
 40202 KY LOUISVILLE
 40213 KY LOUISVILLE
 40214 KY LOUISVILLE
 40215 KY LOUISVILLE
 40217 KY LOUISVILLE
 40220 KY LOUISVILLE
 40222 KY LYNDON
 40241 KY LYNDON
 40242 KY LYNDON

By always storing an array reference, even when there is only one element, you can simplify the code. Otherwise, you would have to test the hash element to see if it is a reference or a scalar and then act on it appropriately.
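A sketch of the same technique with use strict in effect, using three of the records above inline instead of a __DATA__ section. push autovivifies the array reference on first use, and every value can be read the same way whether it holds one ZIP code or several:

```perl
use strict;
use warnings;

my @records = ('40222 KY LYNDON', '40241 KY LYNDON', '40202 KY LOUISVILLE');

my %zipcodes;
for my $record (@records) {
    my ($zip, $state, $city) = split ' ', $record;
    push @{ $zipcodes{"$city, $state"} }, $zip;   # creates the array ref on first use
}

for my $city (sort keys %zipcodes) {
    my $count = @{ $zipcodes{$city} };            # array in scalar context: element count
    print "$city ($count): @{ $zipcodes{$city} }\n";
}
```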

Use an array reference to store multiple values in a single hash element.


PASSING PARAMETERS IN HASHES

If your subroutine has a number of optional parameters, consider passing those parameters as key-value pairs in a hash.

For instance, suppose you have a subroutine that accepts a URL and some optional parameters such as a debug flag, a timeout, a maximum depth, and a verbosity level. In a traditional procedural programming language, you might expect to see something like this:

 func(url, debug, timeout, depth, verbose);

In this model, to pass a value for the depth parameter, you must pass a value for debug and timeout parameters. In Perl, you can combine all of the optional parameters into a hash, and only pass those that interest you.

Here's the same function in Perl, with optional parameters passed in a hash:

 func("http://www.builder.com", debug=>1, verbose=>0, depth=>2);

 sub func
 {
     my ($url, %params) = @_; 
     print "URL: $url\n";
     for $item (sort keys %params)
     {
         print "$item: $params{$item}\n";
     }
 }
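Named parameters combine naturally with defaults. In this sketch the default values and the check for unknown names are illustrative assumptions, not part of the original function: later pairs in a hash list override earlier ones, so the caller's values replace the defaults.

```perl
use strict;
use warnings;

# Hypothetical defaults for the optional parameters named in the text.
my %DEFAULTS = (debug => 0, timeout => 30, depth => 1, verbose => 0);

sub func
{
    my ($url, %params) = @_;

    # Reject misspelled names instead of silently ignoring them.
    my @unknown = grep { !exists $DEFAULTS{$_} } keys %params;
    die "Unknown parameter(s): @unknown\n" if @unknown;

    my %opt = (%DEFAULTS, %params);      # caller's values override the defaults
    return "$url depth=$opt{depth} timeout=$opt{timeout} debug=$opt{debug}";
}

print func("http://www.builder.com", depth => 2), "\n";
```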

Using the hash parameter passing method allows you to add new parameters without changing existing code. This method is also useful when you don't need or want to know what the parameters are, for example when your function is a "pass-through" wrapper for another function or object: the caller can pass in parameters that your code sends along to the wrapped function without ever knowing what they are.

Passing parameters to subroutines is much easier using hashes.


MATCHING ACROSS LINES IN REGULAR EXPRESSIONS

The ability to have regular expressions match across line boundaries can come in handy.

Perl provides two modifiers, /s and /m, that have to do with how regular expressions match line boundaries. The /s modifier allows the period (.) special character to match a newline (which usually it would not).

The /m modifier changes the meaning of the caret (^) and dollar sign ($) special characters. Normally, caret (^) matches only at the beginning of the string, and dollar sign ($) matches only at the end of the string (or just before a newline at the very end). With /m in effect, caret (^) also matches right after any newline, and dollar sign ($) also matches right before any newline.

Note that these two modifiers aren't mutually exclusive. One changes the behavior of the period (.) character, and the other changes the behavior of the caret (^) and dollar sign ($) characters. Both can be used together if needed.

The following code does a poor job of removing XML tags from a document:

 undef $/;     # slurp entire file at once
 ($file = <STDIN>) =~ s/<.*?>//gs;
 print $file;

If you were looking for POD documentation markers, you could use the /m option to have caret (^) match at the beginning of each line.

 undef $/;     # slurp entire file at once

 $file = <STDIN>;

 while ($file =~ /^=(\w+)(.*)$/gm)
 {
      print "command: $1, options: $2\n";
 }

When /m is in effect, you can still use the \A and \Z assertions to match the beginning or end of the string, respectively (\Z also matches before a final newline; \z matches only at the very end). When /m is not in effect, \A and caret (^) match at the same point, as do dollar sign ($) and \Z.
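A short sketch on a made-up POD-like string shows each modifier on its own and \A keeping its meaning under /m:

```perl
use strict;
use warnings;

my $doc = "=head1 NAME\nfoo\n=cut\n";

# Without /s, '.' stops at newlines; with /s it crosses them.
my ($no_s)   = $doc =~ /NAME(.*)cut/;      # no match: undef
my ($with_s) = $doc =~ /NAME(.*)cut/s;     # "\nfoo\n="

# With /m, ^ matches at the start of every line.
my @commands = $doc =~ /^=(\w+)/mg;        # ('head1', 'cut')

# \A still means "start of string" under /m.
my @first_only = $doc =~ /\A=(\w+)/mg;     # ('head1')

print defined $no_s ? "matched\n" : "no match without /s\n";
print "commands: @commands\n";
```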

When you need to match across line boundaries, consider whether /s, /m, or both would best serve your needs.


HOW TO CREATE FILTERS USING PERL

A common use for Perl is to create filters. A filter is a program that reads data, processes it, and outputs the processed information. Typical filter behavior is to accept filenames on the command line or to read from standard input if no files are specified.

Perl provides the special construct "while (<>)" for this case. This tells Perl: if there are no command line parameters, read from standard input until the end of the file. If there are command line parameters, treat each one as a filename, and open and read each file in turn until the end of the file.

Perl doesn't automatically expand wildcards in the filenames. If your shell program doesn't do this for you, you must do it explicitly at the start of the program. The default Windows command interpreter, for example, doesn't expand file patterns on the command line.

The following is a basic shell for a filter program. It includes the necessary steps to expand file patterns on the command line. This filter is a poor man's grep. It accepts a pattern and, optionally, a list of files on the command line and prints lines that match from the files it processes.

 $pattern = shift || die "No pattern\n";
 @ARGV = map { glob($_) } @ARGV;    # expand file patterns
 while (<>)
 {
     print "$ARGV ($.): $_" if (/$pattern/o);
     close ARGV if eof;              # reset $. for the next file
 }

Note the use of the $ARGV variable in the print statement; Perl stores the name of the current file in this variable. Perl also sets the special variable "$." to the current line number.
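The filter loop can be exercised without a shell. In this sketch the two scratch files and their contents are made up, and @ARGV is filled in directly (as the shell or glob would) rather than expanded from a pattern. Thanks to close ARGV, the second file's line numbers start again at 1.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Two scratch files standing in for command-line arguments (hypothetical data).
my $dir = tempdir(CLEANUP => 1);
my %content = ('a.txt' => "apple\nbanana\n", 'b.txt' => "cherry\napple pie\n");
for my $name (sort keys %content) {
    open my $out, '>', "$dir/$name" or die "Can't write $name: $!";
    print {$out} $content{$name};
    close $out;
}

my $pattern = 'apple';
my @hits;
{
    local @ARGV = map { "$dir/$_" } qw(a.txt b.txt);   # the file list a shell would pass
    while (<>) {
        push @hits, "$ARGV ($.): $_" if /$pattern/;
        close ARGV if eof;                             # restart $. for the next file
    }
}
print @hits;
```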

Filters that read from files or from standard input make it easy to chain simple functions together to form a complex program on the command line.


KEEPING LOGS OF SCRIPT ACTIVITIES

When Perl scripts are used as management tools, it's often necessary to keep a log of what the script has done. Also, scripts are sometimes run simultaneously on different machines or in separate processes on the same machine. File locking becomes critical to maintaining a consistent log file.

Here's a function to write a journal entry to a log file:

 use Fcntl qw(:flock SEEK_END);    # LOCK_EX, LOCK_UN and SEEK_END constants

 sub Journal
 {
      my ($msg) = @_;

      open LOG, ">>$logDir/$logBaseName.log" or die "Can't create log file: $!\n";

      flock LOG, LOCK_EX;     # lock exclusive
      seek LOG, 0, SEEK_END;  # reposition to end in case
                              # someone appended while we waited

      chomp $msg;             # remove a trailing newline

      # tab indent extra lines in the message
      $msg =~ s/\n/\n\t/g;

      print LOG scalar(localtime) . ": $msg\n";

      flock LOG, LOCK_UN;     # unlock

      close LOG;
 }

This subroutine expects the $logDir and $logBaseName variables to have been set by the main program. It opens the log file for appending, creating it if needed. It then locks the file for exclusive access to prevent two processes from writing to it at the same time. Once the lock is obtained, the script seeks to the end of the file in case another process wrote to it while this one was waiting for the lock. It then writes the message, unlocks the log file, and closes it. Every message written to the log file is prefixed with the current date and time.
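A variant of the same routine can be tested in isolation. This sketch assumes a hypothetical signature that takes the log path as an argument instead of the $logDir and $logBaseName globals, and it uses a lexical filehandle; closing the handle also releases the lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);
use File::Temp qw(tempdir);

# Hypothetical variant of Journal: journal($path, $msg)
sub journal {
    my ($path, $msg) = @_;
    open my $log, '>>', $path or die "Can't open $path: $!\n";
    flock $log, LOCK_EX or die "Can't lock $path: $!\n";
    seek $log, 0, SEEK_END;                 # another process may have appended while we waited
    chomp $msg;
    $msg =~ s/\n/\n\t/g;                    # indent continuation lines
    print {$log} scalar(localtime) . ": $msg\n";
    close $log or die "Can't close $path: $!\n";   # also releases the lock
    return;
}

my $path = tempdir(CLEANUP => 1) . "/activity.log";
journal($path, "backup started");
journal($path, "backup finished\n2 files copied");

open my $in, '<', $path or die "Can't read $path: $!";
my @lines = <$in>;
close $in;
print @lines;
```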

Keeping activity logs makes good sense for scripts that run unattended or that make modifications to servers or configurations.


SCHEDULING TASKS BY MINUTES WITH WINDOWS' SCHEDULER

Windows NT and 2000 support scheduling tasks to run at a later time, which can be handy for system tasks such as daily backups. However, the Windows scheduler's smallest granularity is daily tasks; it doesn't allow you to schedule tasks in minute or hour increments. A few lines of Perl can make up for this deficiency.

The following code uses the Windows scheduler to implement a task scheduling tool that allows you to assign tasks in terms of minutes. The code does this by reading the current time, adding the appropriate number of minutes, and then calling the command line tool "at" to schedule the task.

 # schedule something to run in a certain number of minutes

 if ($#ARGV != 1)
 {
      print "Usage: $0 minutes program\n";
      exit 1;
 }

 # find out what time it will be in x minutes
 ($sec, $min, $hour) = localtime(time + $ARGV[0]*60);
 $when = sprintf "%02d:%02d", $hour, $min;    # at expects HH:MM

 print `at $when $ARGV[1]`;

Now create a batch file to run the Perl script or compile it to an executable. Then use it in batch files or other programs to create a recurring event that happens every few minutes.

Perl is an excellent language for creating simple tools to augment the existing functions of the Windows OS.


PROCESSING CONFIGURATION INFO WITH BUILT-IN FACILITIES

You can make your program more flexible by using configuration options. An easy way to process configuration information is to use Perl's built-in facilities.

There are two simple ways to process configuration information. The first method processes one line at a time and allows only simple value settings. The second method allows any valid Perl code to be used.

Some very simple code will allow basic var = value type settings. Blank lines and Perl style comments alone on a line are supported.

 while(<CONFIG>)
 {
      chomp;
      next if (/^\s*#/);     # ignore comments
      next if (/^\s*$/);     # ignore blank lines
      if (/^\s*(.*?)\s*=\s*(.*?)\s*$/)
      {
           $prefs{uc $1} = $2;
      }
      else
      {
           warn "Can't understand config line: $_\n";
      }
 }

Here's an example of the type of configuration information that could be read.

 # server to use
 server = www.builder.com

 ip address = 10.0.1.1
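To try the line-by-line parser on the sample above without creating a file, this sketch wraps it in a hypothetical read_config subroutine and feeds it an in-memory filehandle:

```perl
use strict;
use warnings;

# The line-by-line parser from above, reading from any filehandle.
sub read_config {
    my ($fh) = @_;
    my %prefs;
    while (<$fh>) {
        chomp;
        next if /^\s*#/;                    # comment lines
        next if /^\s*$/;                    # blank lines
        if (/^\s*(.*?)\s*=\s*(.*?)\s*$/) {
            $prefs{uc $1} = $2;             # keys are case-insensitive
        }
        else {
            warn "Can't understand config line: $_\n";
        }
    }
    return \%prefs;
}

# An in-memory filehandle stands in for the real config file.
my $text = "# server to use\nserver = www.builder.com\n\nip address = 10.0.1.1\n";
open my $cfg, '<', \$text or die "Can't open in-memory config: $!";
my $prefs = read_config($cfg);

print "$_ => $prefs->{$_}\n" for sort keys %$prefs;
```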

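If you want your configuration file to use a full-featured language, you can simply use Perl. To process the file, use do FILE:

 do "./program.cfg";    # "./" matters: a bare file name is looked up in @INC instead
 die "Error in program.cfg: $@" if $@;

If you want your configuration settings in a separate name space (or package), you can use the following code. The package name is arbitrary; avoid Config, which is already taken by Perl's core Config module:

 { package ProgramConfig; do "./program.cfg"; }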

Now it's up to you to select the configuration-processing method that best fits your needs.


USE RECURSIVE PROCESSING TO APPLY ACTIONS TO NUMEROUS FILES

If you want to apply an action to all of the files in a directory and all of its subdirectories, the File::Find module provides a perfect solution.

The File::Find module exports a simple find routine, which accepts a code reference and a list of directories. The code reference is called for each entry (file or directory) in the specified directories and their subdirectories.

Before find calls the code reference, it changes into the directory that contains the entry being visited. The variable $File::Find::dir is set to that directory's path. The $File::Find::name variable contains the full path of the file or directory being visited, while $_ contains its base name.

The following example simply prints the directory structure:

 use File::Find;   

 push @ARGV, '.' if (! @ARGV);   

 find \&display, @ARGV;   

 sub display   
 {   
      print $File::Find::name, -d $_ ? "/ <DIR>\n" : "\n";    
 }   

The File::Find module also has a routine, finddepth, which is guaranteed to return objects in post-order (that is, it returns the files in a directory before it returns the directory itself). The normal find routine returns each directory before its contents. Within a directory, both routines return entries in whatever order the file system reports them.
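The following sketch builds a small temporary directory tree (its layout is hypothetical) and walks it with finddepth. The output shows that each directory's contents are reported before the directory itself. Paths are printed relative to the temporary root.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Build a small throwaway tree: ./a.txt, ./sub/b.txt, ./sub/deeper/
my $root = tempdir(CLEANUP => 1);
make_path(File::Spec->catdir($root, 'sub', 'deeper'));
for my $file (File::Spec->catfile($root, 'a.txt'),
              File::Spec->catfile($root, 'sub', 'b.txt')) {
    open(my $fh, '>', $file) or die "Can't create $file: $!";
    close $fh;
}

# finddepth reports a directory only after everything inside it
my @visited;
finddepth(sub {
    # $File::Find::name is the full path; show it relative to the root
    (my $rel = $File::Find::name) =~ s/^\Q$root\E/./;
    push @visited, -d $_ ? "$rel/" : $rel;
}, $root);

print "$_\n" for @visited;
```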

Perl's File::Find module makes it easy to recursively visit all files in a subdirectory tree.


LEARN AN EASIER WAY TO PARSE FILENAME COMPONENTS

Parsing filename components--directory, filename, and extension--can be a daunting task. The File::Basename module makes this task simple.

File::Basename exports three important routines: basename, dirname, and fileparse. The basename routine returns the filename portion, which is everything after the last directory separator. Dirname returns the path without the filename, which is everything before the last directory separator. Fileparse returns a list of three elements in this order: filename, directory path, and extension (suffix).

Since the definition of what a file extension is varies from application to application, fileparse accepts a regular expression pattern to match an extension. This pattern is matched against the basename portion of the name to determine which part is the extension.

In this case, we use '\..*' as our extension-matching regular expression. It says that everything from the first dot onward in the basename portion is the file extension, so test.tar.gz has the extension .tar.gz.

The following code demonstrates each of these functions:

 use File::Basename;   

 $path = "/some/dir/path/test.tar.gz";   

 $base = basename($path);   
 $dir = dirname($path);   
 print "base: $base, dir: $dir\n";   

 ($base, $dir, $ext) = fileparse($path, '\..*');   
 print "dir: $dir, base: $base, ext: $ext\n";   

The fileparse routine returns a list of three elements. A list assignment makes it easy to assign them to the variables $base, $dir, and $ext, in that order. For the path above, this gives test, /some/dir/path/ (with a trailing slash, unlike dirname), and .tar.gz.
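The short sketch below uses the same path. It shows the return order of fileparse and how the suffix pattern changes the result. The qr/\.[^.]*/ pattern, which matches only the last extension, is an addition for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname fileparse);

my $path = "/some/dir/path/test.tar.gz";

# fileparse returns (filename, directory, suffix), in that order
my ($name,  $dir, $ext)  = fileparse($path, qr/\..*/);     # suffix from the first dot
my ($name2, undef, $last) = fileparse($path, qr/\.[^.]*/); # only the last extension

print "basename: ", basename($path), "\n";   # whole filename
print "dirname:  ", dirname($path), "\n";    # no trailing slash
print "fileparse: name=$name dir=$dir ext=$ext\n";
print "last extension only: name=$name2 ext=$last\n";
```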

Parsing filename components is much simpler when you use the File::Basename module.


CREATE PERSISTENT LOCAL VARIABLES BY USING AN ENCLOSING BLOCK

Before version 5.10, which added state variables, Perl didn't directly support a persistent variable that is local to a subroutine, the way C and C++ do with static variables. However, in any version of Perl you can create a static local variable with a little magic.

In Perl, a variable exists as long as something still refers to it, such as a subroutine that uses it. The remaining task is to hide the variable so that only the appropriate routines can see it. To do this, enclose the variable(s) and the accessing subroutine(s) inside a block, like this:

 BEGIN   
 {   
      my $counter = 1;   

      sub IncCounter
      {
           return $counter++;
      }

      sub DecCounter
      {   
           return $counter--;   
      }   
 }   

It's always a good idea to make the enclosing block a BEGIN block to ensure that the initialization occurs before the access routines are called. If the variables don't need to be initialized, a plain code block is sufficient.

Unlike C or C++, you can enclose several routines inside the block, and all of them will have access to the shared local persistent variable or variables.
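The sketch below shows the enclosing-block technique in use. Next to it is the state keyword that Perl 5.10 added for the single-subroutine case. State is a swapped-in alternative, not part of the original tip, and next_id is a hypothetical name.

```perl
use strict;
use warnings;
use feature 'state';    # state variables need Perl 5.10 or later

# The enclosing-block technique: both subs share one private $counter
BEGIN {
    my $counter = 1;

    sub IncCounter { return $counter++; }
    sub DecCounter { return $counter--; }
}

# Alternative for a single subroutine: a state variable is initialized once
sub next_id {
    state $id = 100;
    return $id++;
}

my @seen = (IncCounter(), IncCounter(), DecCounter(), IncCounter());
my @ids  = (next_id(), next_id(), next_id());

print "@seen\n";    # 1 2 3 2
print "@ids\n";     # 100 101 102
```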

Even though older versions of Perl don't directly support persistent local variables, you can see that it's easy to create them by using an enclosing block.


MIXING SHELL COMMANDS WITH PERL IN ONE FILE

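One of the great things about Perl is the ability to mix shell commands and Perl code in the same file. Perl's Shell module makes this possible. Shell was dropped from the core distribution in Perl 5.14, so on newer versions you need to install it from CPAN.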

The Shell module also demonstrates how an AUTOLOAD subroutine can supply functions as they're needed.

The Shell module doesn't actually define any of the shell commands as functions in advance. Instead, it relies on Perl's AUTOLOAD mechanism to pass any call to an undefined function to the operating system shell as a command.
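To illustrate the mechanism, here is a minimal AUTOLOAD dispatcher in a hypothetical MiniShell package. This is not the Shell module's code. It only shows how calls to undefined functions can be turned into shell commands. Its argument quoting assumes a POSIX shell on a Unix-like system.

```perl
use strict;
use warnings;

package MiniShell;    # hypothetical stand-in, not the real Shell module

our $AUTOLOAD;

# Perl calls AUTOLOAD for any undefined function in this package;
# $AUTOLOAD holds the fully qualified name that was called.
sub AUTOLOAD {
    (my $cmd = $AUTOLOAD) =~ s/.*:://;    # strip the package name
    return if $cmd eq 'DESTROY';         # never run object cleanup as a command
    # Single-quote each argument for a POSIX shell (assumes a Unix-like system)
    my @args = map { (my $q = $_) =~ s/'/'\\''/g; "'$q'" } @_;
    return `$cmd @args`;                 # run the command, return its output
}

package main;

my $greeting = MiniShell::echo('hello', 'world');
my $quoted   = MiniShell::echo("it's");
print $greeting, $quoted;
```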

When you use the Shell module, you can list commands on the use line. Shell then exports these as functions, which lets you call them without parentheses and makes the resulting program look more like a shell script or batch file.

For example, compare the following two equivalent calls:

This code:

 use Shell;   

 copy('*.pl', 'backup');   

is equivalent to:

 use Shell qw(copy);   

 copy '*.pl', 'backup';   

The second example looks more like a batch file, while the first looks more like a traditional programming language.

The following example combines shell commands with Perl file test operators. It checks two things: that the name passed on the command line is a plain file, and that the file was modified within the last week. If both are true, it copies the file to the backup directory.

 use Shell qw(copy);   

 copy $ARGV[0], 'backup' if (-f $ARGV[0] && -M $ARGV[0] <= 7);   

Combining the processing power of Perl with standard operating system commands produces a powerful scripting tool. The Shell module makes any operating system command available as though it were a built-in Perl function.


ACCESSING OPERATING SYSTEM ENVIRONMENT VARIABLES

Sometimes access to the operating system's environment is essential to the operation or configuration of a program. Perl provides two different ways to access environment variables. You can use the %ENV hash that is built into Perl, or you can use the Env module to access the environment.

Both methods provide two-way access to the environment, which means that any changes to the Perl variable change the environment variable. These changes will be passed on to any child processes but aren't preserved after your script exits.

Using the %ENV hash is straightforward. The key of the %ENV hash is the environment variable name; the hash value is the environment variable value.

The following code displays the contents of the PATH environment variable:

 print "path is $ENV{PATH}\n";

This code creates (or overwrites) the variable PerlVar in the environment:

 $ENV{PerlVar} = "Perl Tech Tips";

To remove an environment variable, use delete. This example removes the temp environment variable:

 delete $ENV{temp};   
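The sketch below shows that a change to %ENV is visible to a child process and that delete removes the variable. PERL_TIP_DEMO is a made-up name. The child is simply another copy of the running perl ($^X), started with a list-form pipe open so no shell quoting is involved.

```perl
use strict;
use warnings;

# Create (or overwrite) a variable; PERL_TIP_DEMO is a made-up name
$ENV{PERL_TIP_DEMO} = "Perl Tech Tips";

# Start a child perl ($^X) with a list-form pipe open and read its output
open(my $child, '-|', $^X, '-e', 'print $ENV{PERL_TIP_DEMO}')
    or die "Can't start child perl: $!";
my $seen = do { local $/; <$child> };
close $child;

# Remove the variable again
delete $ENV{PERL_TIP_DEMO};
my $gone = !exists $ENV{PERL_TIP_DEMO};

print "child saw: $seen\n";
print "deleted: ", ($gone ? 'yes' : 'no'), "\n";
```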

The second way to access environment variables is via the Env module. The Env module creates Perl scalar or array variables for each environment variable.

You can specify particular variables to be imported, or you can import all environment variables. If you specify the variable names, they don't need to exist yet as environment variables.

To access only the PATH variable, use the following:

 use Env qw(PATH);   

 print "The path is $PATH\n";   

If you don't specify the type of variable, it's assumed to be scalar. If you specify a variable as an array (e.g., use Env qw(@PATH);), it's split and joined automatically using $Config::Config{path_sep} as the delimiter.

This code creates (or overwrites, if it already exists) the variable PerlVar in the environment:

 use Env qw(PerlVar);

 $PerlVar = "Perl Tech Tips";
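The sketch below uses the Env module's scalar and array forms together. PERL_TIP_DEMO and /opt/demo/bin are hypothetical values. The push onto @PATH is written back to $ENV{PATH}, joined with $Config{path_sep}.

```perl
use strict;
use warnings;
use Config;
use Env qw(@PATH $PERL_TIP_DEMO);

# A scalar imported from Env is tied to $ENV{PERL_TIP_DEMO} (made-up name)
$PERL_TIP_DEMO = "Perl Tech Tips";

# An array imported from Env is split and joined on $Config{path_sep}
push @PATH, '/opt/demo/bin';    # hypothetical directory

print "PERL_TIP_DEMO: $ENV{PERL_TIP_DEMO}\n";
print "separator: $Config{path_sep}\n";
print "last PATH entry: $PATH[-1]\n";
```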

Perl provides two different ways to access the environment: the %ENV hash and the Env module. Depending on your needs and personal coding preferences, you can use either method to access and alter environment variables.


USE GETOPT::LONG TO PROCESS COMPLEX COMMAND LINE OPTIONS

Perl provides a variety of ways to process command line options. For simple programs, Getopt::Std or even hand-coded tests will suffice. However, for more complex option processing, consider using Getopt::Long.

This example shows the basic usage of Getopt::Long. (For a complete description, see the module's documentation.)

 use Getopt::Long;

 # declare default values for variables
 $verbose = 0;
 $all     = 0;
 $more    = -1;       # so we can detect both -more and -nomore
 $diam    = 3.1415;
 @libs    = ();
 %flags   = ();
 $debug   = -1;       # test for -debug with no argument (0)

 # process options from command line
 # verbose will be incremented each time it appears
 # either all, everything or universe will set $all to 1
 # more can be negated (-nomore)
 # diameter expects a floating point argument
 # lib expects a string and can be repeated (pushing onto @libs)
 # flag expects a key=value pair and can be repeated
 # debug will optionally accept an integer (or 0 by default)
 GetOptions('verbose+'                => \$verbose,
            'all|everything|universe' => \$all,
            'more!'                   => \$more,
            'diameter=f'              => \$diam,
            'lib=s'                   => \@libs,
            'flag=s'                  => \%flags,
            'debug:i'                 => \$debug)
     or die "Error in command line arguments\n";

 # display resulting values of variables
 print <<EOS;
 Verbose:        $verbose
 All:            $all
 More:           $more
 Diameter:       $diam
 Debug:          $debug
 Libs:           @{[ join ', ', @libs ]}
 Flags:          @{[ join "\n\t\t", map { "$_ = $flags{$_}" } keys %flags ]}

 Remaining:      @{[ join ', ', @ARGV ]}
   (ARGV contents)
 EOS

The basic syntax is to pass a list of option specifiers, each paired with a reference to the variable to set. The specifier text can include an optional list of aliases separated by vertical bars (|).

An option followed by + doesn't take an argument; instead, it will increment the variable by one each time it appears. An option followed by ! doesn't take an argument; however, it can be negated by preceding it with no (e.g., -nomore for the "more" option). The associated variable is set to 1 if the option appears on the command line or 0 if the negated option appears.

An option followed by = requires an argument of type string (s), integer (i), or floating point (f). An option followed by : takes an optional argument that defaults to 0 or an empty string. If the associated variable is an array, the option can appear multiple times, and the values are pushed onto the array. If the variable is a hash, a key=value pair is expected and inserted into the hash. By default, GetOptions ignores case when matching option names and allows options to be abbreviated to the shortest unique string (e.g., -m for -more, but -di and -de are required for diameter and debug, respectively).

Here's an example command line and the resulting output:

 perl getoptlong.pl -l=abc -l def -f a=b -f b=c -ev -de 5 -nomore arg   

 Verbose:        0   
 All:            1   
 More:           0   
 Diameter:       3.1415   
 Debug:          5   
 Libs:           abc, def   
 Flags:          a = b   
                b = c   

 Remaining:      arg   
   (ARGV contents)   
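You can check the option handling without touching the real command line. The sketch below feeds the sample command line above to GetOptionsFromArray, which Getopt::Long exports on request. It uses lexical variables under strict; the option specifications are otherwise unchanged.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# The example command line from above, held in an array instead of @ARGV
my @args = qw(-l=abc -l def -f a=b -f b=c -ev -de 5 -nomore arg);

my ($verbose, $all, $more, $diam, $debug) = (0, 0, -1, 3.1415, -1);
my (@libs, %flags);

GetOptionsFromArray(\@args,
    'verbose+'                => \$verbose,
    'all|everything|universe' => \$all,
    'more!'                   => \$more,
    'diameter=f'              => \$diam,
    'lib=s'                   => \@libs,
    'flag=s'                  => \%flags,
    'debug:i'                 => \$debug,
) or die "Error in command line arguments\n";

# Options are removed from @args; only non-option arguments remain
print "verbose=$verbose all=$all more=$more diameter=$diam debug=$debug\n";
print "libs=@libs flags=",
      join(',', map { "$_=$flags{$_}" } sort keys %flags), " left=@args\n";
```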

Perl provides a rich set of tools for handling command line arguments. Getopt::Long provides complex option processing with minimal setup.


USE TEXT::METAPHONE FOR "SOUNDS LIKE" TEXT MATCHING

The Metaphone Algorithm provides an English phonetic pronunciation hash for input words. In other words, it generates a code based on the way a word sounds rather than how it's spelled. This makes "fuzzy matches" possible between words that sound similar but are spelled somewhat differently (e.g., Smith vs. Smythe). The Metaphone Algorithm is similar in concept to Soundex but handles pronunciation rules much more thoroughly.

The Text::Metaphone module is simple to use. You pass in a word to be encoded, and it returns the encoded result. You can optionally limit the length of the encoded result by passing a maximum phoneme length as the second parameter. It's generally accepted that a four-character code is optimal for matching most names and English words.

The following example reads one or more words from the command line and prints the Metaphone hash for each one:

 use Text::Metaphone;   

 for (@ARGV)   
 {   
     print "$_ = " . Metaphone($_) . "\n";   
 }   

Because Metaphone maps all words into a relatively small hash space, it will begin to break down if presented with a large list of similar words.

Any application that has to match text can benefit from the Metaphone Algorithm. A system that allows "fuzzy" text matches would be helpful for customer support, dictionary lookups, Web searching, or genealogy.

The Metaphone Algorithm makes it easy to match words that sound alike but are spelled differently.


excerpt from perlretut

 ·  no modifiers (//): Default behavior.  '.' matches any character
           except "\n".  "^" matches only at the beginning of the string and
           "$" matches only at the end or before a newline at the end.

 ·   s modifier (//s): Treat string as a single long line.  '.' matches
           any character, even "\n".  "^" matches only at the beginning of the
           string and "$" matches only at the end or before a newline at the
           end.

 ·   m modifier (//m): Treat string as a set of multiple lines.  '.'
           matches any character except "\n".  "^" and "$" are able to match
           at the start or end of any line within the string.

 ·   both s and m modifiers (//sm): Treat string as a single long line,
           but detect multiple lines.  '.' matches any character, even "\n".
           "^" and "$", however, are able to match at the start or end of any
           line within the string.
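Here is a quick sketch, using a two-line string, of how //s and //m change what "^" and "." can match:

```perl
use strict;
use warnings;

my $x = "There once was a girl\nWho programmed in Perl\n";

# 1 = the pattern matched, 0 = it did not
my %match = (
    '^Who'        => ($x =~ /^Who/       ? 1 : 0),  # ^ only at string start
    '^Who/s'      => ($x =~ /^Who/s      ? 1 : 0),  # //s does not change ^
    '^Who/m'      => ($x =~ /^Who/m      ? 1 : 0),  # //m: ^ at any line start
    'girl.Who'    => ($x =~ /girl.Who/   ? 1 : 0),  # . skips "\n"
    'girl.Who/s'  => ($x =~ /girl.Who/s  ? 1 : 0),  # //s: . matches "\n"
    'girl.Who/m'  => ($x =~ /girl.Who/m  ? 1 : 0),  # //m does not change .
    'girl.Who/sm' => ($x =~ /girl.Who/sm ? 1 : 0),
);

printf "%-12s %s\n", $_, $match{$_} for sort keys %match;
```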

       Most of the time, the default behavior is what is wanted, but "//s" and
       "//m" are occasionally very useful.  If "//m" is being used, the start
       of the string can still be matched with "\A" and the end of string can
       still be matched with the anchors "\Z" (matches both the end and the
       newline before, like "$"), and "\z" (matches only the end):
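Here is a sketch of the \A, \Z, and \z anchors under //m, on the same kind of two-line string:

```perl
use strict;
use warnings;

my $x = "There once was a girl\nWho programmed in Perl\n";

my @checks = (
    [ '/^Who/m',   $x =~ /^Who/m   ? 'yes' : 'no' ],  # ^ with //m: start of any line
    [ '/\AWho/m',  $x =~ /\AWho/m  ? 'yes' : 'no' ],  # \A: start of string only
    [ '/girl$/m',  $x =~ /girl$/m  ? 'yes' : 'no' ],  # $ with //m: end of any line
    [ '/girl\Z/m', $x =~ /girl\Z/m ? 'yes' : 'no' ],  # \Z: end, or before final "\n"
    [ '/Perl\Z/m', $x =~ /Perl\Z/m ? 'yes' : 'no' ],
    [ '/Perl\z/m', $x =~ /Perl\z/m ? 'yes' : 'no' ],  # \z: very end of string only
);

printf "%-10s %s\n", @$_ for @checks;
```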

Using pos:

 $x = "cat dog house";   # 3 words
 while ($x =~ /(\w+)/g) {
     print "Word is $1, ends at position ", pos $x, "\n";
 }
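pos is most useful together with the \G anchor and the /gc modifiers, which resume matching where the last match ended. The following tokenizer sketch illustrates this. Its token names (ID, NUM, OP) are hypothetical.

```perl
use strict;
use warnings;

# \G anchors each match at pos($input); /c keeps pos() from resetting
# when an alternative fails, so the next elsif can try the same spot.
my $input = "x = 42 + y7";
my @tokens;
pos($input) = 0;
while (pos($input) < length $input) {
    if    ($input =~ /\G\s+/gc)           { next }                    # skip blanks
    elsif ($input =~ /\G(\d+)/gc)         { push @tokens, "NUM($1)" }
    elsif ($input =~ /\G([A-Za-z]\w*)/gc) { push @tokens, "ID($1)" }
    elsif ($input =~ /\G(\S)/gc)          { push @tokens, "OP($1)" }  # any other char
}
print "@tokens\n";
```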

Perl tricks

Negating a regexp: see these two threads:
  1. http://groups.google.fr/group/comp.lang.perl.misc/browse_frm/thread/898dfe2929fe4eaf/581f3aa882a3e17f?hl=fr#581f3aa882a3e17f
  2. http://groups.google.com/group/comp.lang.perl.misc/browse_thread/thread/cf66e7281514182f/7af7898218075b5b?q=negate+regex&rnum=3#7af7898218075b5b
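This sketch does not summarize the linked threads. It shows two common ways to negate a regex in Perl: negating the match operator with !~, and using a negative lookahead inside the pattern.

```perl
use strict;
use warnings;

my @lines = ('foo bar', 'bar baz', 'qux');

# 1. Negate the match operator: keep lines that do NOT contain "foo"
my @no_foo = grep { $_ !~ /foo/ } @lines;

# 2. Negate inside the pattern with a negative lookahead; useful when
#    only a regex (not code) can be supplied
my $not_foo  = qr/^(?!.*foo)/s;
my @no_foo2  = grep { /$not_foo/ } @lines;

print "@no_foo\n";
print "@no_foo2\n";
```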

complex map function: http://groups.google.fr/group/comp.lang.perl.misc/browse_frm/thread/576452e3c80bc537/134baa5068f6250f?q=map&rnum=7#134baa5068f6250f
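This sketch does not restate that thread. It illustrates a few common map patterns: returning two values per element to build a hash, returning an empty list to filter, and the map/sort/map chain known as the Schwartzian transform.

```perl
use strict;
use warnings;

my @words = qw(pear fig banana kiwi);

# Two values per element: build a hash of word => length
my %length_of = map { ($_ => length $_) } @words;

# Zero or one value per element: map can filter like grep
my @long = map { length($_) > 3 ? ($_) : () } @words;

# map/sort/map (Schwartzian transform): sort by length, then alphabetically,
# computing each length only once
my @by_length = map  { $_->[1] }
                sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }
                map  { [ length($_), $_ ] } @words;

print "$_=$length_of{$_}\n" for sort keys %length_of;
print "long: @long\n";
print "by length: @by_length\n";
```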

Removing a module with a script: modrm.pl