A brief Perl primer


Roles of Perl

Perl is useful for a number of things. Thanks to a strong developer community, there is a central repository called CPAN through which software modules for tasks from fuzzy logic to talking to Oracle are distributed. A rich (if cryptic) sublanguage for regular expressions (regexes) handles pattern matching, transformation, and extraction, and provides powerful text manipulation capabilities. Perl datatypes are free-form and automatically sized to meet needs. Perl's syntax is very loose, allowing easy migration from many different languages. Integration with other languages is fairly easy, using low-level tools such as libperl or XS, or using modules such as Inline::Java. Perl is portable -- pure Perl code generally runs unchanged across Unices, Windows, MacOS, and other platforms. Memory management in Perl is very easy -- the Perl garbage collector handles most tasks for you, only needing help to break circular references. Perl is also semi-interpreted (source is compiled to an internal form before it runs), giving it better performance than purely interpreted languages.

Styles of Perl Programming

#!/usr/bin/perl
print "Hello, world!\n";


Perl started life inspired by C, shell, various Unix utilities such as sed and awk, and some other languages, and has continued to borrow useful/beautiful constructs from others throughout its evolution. It has been said that Perl is a language in which people can write their own language, and this is equally valid as criticism and praise. It is very possible to use Perl primarily as a better shell script language than shell -- shell programming is subject to the whims of various Unix vendors, not just in the details of the shell, but also in how sed, awk, and the like work. Portable shell programming is very difficult. Many programmers have reported being happy moving from languages such as LISP to Perl. I don't know LISP well enough to comment on this :) Due to the popularity of C, C++, and Java, many people's Perl resembles the structure and flow of those languages. Perl is, in my eyes, a better C than C, and similarly for those other languages, and you can see a lot of C in my Perl. There are, however, advantages to picking up Perl idioms over time. Unlike a lot of these other languages, you don't need to know much Perl to do something useful with it. Perl is also capable of Object-Oriented Programming, with most CPAN modules offering only object interfaces.

Perl Data Types

Perl has 5 essential data types.

The accumulator

Perl has a special scalar that's used by many functions if no argument is provided to them. It has the name $_. The Perl idiom while(<FILEHANDLE>) { ... } assigns the result of each readline() to $_ for each iteration of the loop. print, without arguments, prints out the accumulator. Perl is very variadic, which might bother C folk. Still, the use of this idiom can result in less code on a line, reducing visual clutter. Other useful functions and constructs that can use the accumulator include: split, chomp, (regular expressions), map, grep, foreach
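
Here is a small sketch of the accumulator at work. The sub name field_names and the "name:value" line format are made up for illustration; the point is only that foreach, chomp, the bare regex, split, and grep all fall back to $_ when nothing else is named.

```perl
use strict;
use warnings;

# Hypothetical helper: takes lines like "name:value\n" and returns the names.
sub field_names {
    my @lines = @_;              # copy, so chomp does not alter the caller's data
    my @names;
    foreach (@lines) {           # no loop variable, so each element lands in $_
        chomp;                   # chomp with no argument works on $_
        next unless /:/;         # a bare regex is matched against $_
        my @fields = split /:/;  # split with no string argument splits $_
        push @names, $fields[0];
    }
    return @names;
}

my @names = field_names("alice:1\n", "no separator here\n", "bob:2\n");
print "$_\n" foreach @names;             # prints alice, then bob

my @a_names = grep { /^a/ } @names;      # grep sets $_ to each element in turn
print "starts with a: @a_names\n";       # prints: starts with a: alice
```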

references

Perl doesn't have pointers (they don't work well with garbage collection). Instead, Perl has references, which are almost as powerful, and considerably less dangerous. Scalars hold references. To take a reference to something, prefix it with a backslash (\), and to dereference, prefix it with the symbol ($, @, or %) for the type you expect to get back from it (it thus should have a double prefix). In some cases (such as passing through multiple references at once), it's necessary to be more verbose and wrap the reference in braces after the symbol for the type you desire, using the syntax (symbol){reference}, nesting as needed. This last tip shouldn't be needed too often.
$a = "Foo";
$b = \$a;
print $$b . "\n";
$$b = "Bar";
print "Hey! Weird way to do it gives " . ${$b} . "\n";
print "Now A is $a\n";
This gives the output
Foo
Hey! Weird way to do it gives Bar
Now A is Bar
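
The same rules apply to arrays and hashes. The following is a minimal sketch, not the only way to write it: the variable names and the count_items helper are invented for the example, and it also uses the arrow form (->), which the text above does not cover but which is standard Perl.

```perl
use strict;
use warnings;

my @colors = ("red", "green", "blue");
my %ages   = (pat => 30, andrew => 25);

my $colors_ref = \@colors;     # reference to an array
my $ages_ref   = \%ages;       # reference to a hash

# Dereference by prefixing the symbol for the type you expect back.
my @copy  = @$colors_ref;      # the whole array
my $first = $$colors_ref[0];   # one element of the array
my $age   = $$ages_ref{pat};   # one value from the hash

# The arrow form is usually easier to read for single elements.
my $second = $colors_ref->[1];

# A reference stored inside another structure needs the brace form.
my %data  = (colors => $colors_ref);
my @again = @{ $data{colors} };

# Hypothetical helper: count the elements behind an array reference.
sub count_items {
    my ($array_ref) = @_;
    return scalar @{$array_ref};
}

print "first=$first second=$second age=$age count=", count_items($colors_ref), "\n";
# prints: first=red second=green age=30 count=3
```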

definedness

Variables can have the special value undef. This can be used to catch things that don't have a value yet or have had their value removed. The operators defined and undef are used to test and set this value. Notice that undefining a value in a hash does not remove its key. Use delete to remove both. With warnings enabled (the -w switch or use warnings), Perl will complain when an undefined value is used in certain ways.
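
To make the difference between undef and delete concrete, here is a short sketch. The hash contents and the key_state helper are made up for illustration; exists is the builtin that tests whether a key is present at all.

```perl
use strict;
use warnings;

my %stock = (apples => 3, pears => 0);

undef $stock{apples};     # the value becomes undef, the key stays
delete $stock{pears};     # key and value both go away

# Hypothetical helper: describe the state of one key in a hash.
sub key_state {
    my ($hash_ref, $key) = @_;
    return "missing"   unless exists  $hash_ref->{$key};
    return "undefined" unless defined $hash_ref->{$key};
    return "defined";
}

foreach my $key (qw(apples pears)) {
    print "$key: ", key_state(\%stock, $key), "\n";
}
# prints:
# apples: undefined
# pears: missing
```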

truth

Perl's notion of truth is simple. All strings are true, apart from the empty string and the string "0". The number 0 is false, and all other numbers are true. Things not defined are false.
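
A quick way to check these rules is to run a few values through a condition. This is only a sketch; truth_of is a made-up helper. Note that strings such as "0.0" and "00" are true even though they would be zero if used as numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: report how Perl treats a value in a condition.
sub truth_of {
    my ($value) = @_;
    return $value ? "true" : "false";
}

my @samples = (
    [ 'undef',        undef ],
    [ 'empty string', ""    ],
    [ 'number 0',     0     ],
    [ 'string "0"',   "0"   ],
    [ 'string "0.0"', "0.0" ],
    [ 'string "00"',  "00"  ],
    [ 'number 42',    42    ],
);

foreach my $sample (@samples) {
    my ($label, $value) = @$sample;
    print "$label is ", truth_of($value), "\n";
}
# The first four lines report false, the last three report true.
```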

Subroutines, scopes, and other flow control

Perl has subroutines, and a number of other flow control mechanisms. The more useful/commonly used of them are described below. The C keywords continue and break are replaced by next and last, respectively. They also can take an argument, which, if the loop is labeled, lets them specify which loop they're talking about. This lets you break out of nested loops without a lot of ugly logic.
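
As a sketch of labeled loops, the hypothetical find_cell sub below searches a grid (an array of array references, invented for this example) and uses the label ROW so that next and last act on the outer loop from inside the inner one.

```perl
use strict;
use warnings;

# Hypothetical helper: return the (row, column) of the first cell equal
# to $target, or an empty list if there is none. A row is abandoned as
# soon as an undefined cell is seen.
sub find_cell {
    my ($grid, $target) = @_;
    my @found;
    ROW: foreach my $r (0 .. $#{$grid}) {
        foreach my $c (0 .. $#{ $grid->[$r] }) {
            next ROW if !defined $grid->[$r][$c];   # skip the rest of this row
            if ($grid->[$r][$c] == $target) {
                @found = ($r, $c);
                last ROW;                           # leave both loops at once
            }
        }
    }
    return @found;
}

my @grid = ([1, 2, 3], [4, undef, 5], [6, 7, 8]);
my ($row, $col) = find_cell(\@grid, 7);
print "found 7 at row $row, column $col\n";   # prints: found 7 at row 2, column 1
```

Without the label, last would only leave the inner loop, and a flag variable would be needed to stop the outer one.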

Scopes

Without additional qualification, all variables in Perl are global. There are two types of scoping, done with the qualifiers my and local. my is equivalent to C's scoping (except with garbage collection, so it's safe to return something made with my). local saves the old value of the variable away, and arranges for it to be restored when the current block exits. Generally, use of local should be discouraged unless it's being used to modify Perl's builtin globals for just a moment. There is another scoping mechanism that's almost exclusively used by the OO facilities that we won't discuss here. Note that the my and local keywords can be positioned flexibly, such as in the variable slot of foreach.
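
The difference between my and local is easiest to see when one sub calls another. The sketch below is written under use strict, so the global is declared with our (an assumption of this example, not something the text above covers); all sub and variable names are made up. The last sub shows the recommended use of local, on the builtin $" (the separator used when an array is interpolated into a string).

```perl
use strict;
use warnings;

our $level = "global";              # a global (package) variable

sub show_level { return $level; }   # always reads the global

sub with_my {
    my $level = "my";               # a new private variable; show_level cannot see it
    return show_level();
}

sub with_local {
    local $level = "local";         # temporarily replaces the global's value
    return show_level();            # the called sub sees the temporary value
}

sub csv_line {
    local $" = ",";                 # restored automatically when the sub returns
    return "@_";
}

print with_my(), "\n";              # prints: global
print with_local(), "\n";           # prints: local
print show_level(), "\n";           # prints: global (the old value is back)
print csv_line(1, 2, 3), "\n";      # prints: 1,2,3
```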

Subroutines

Perl starts execution at the top of a program (outside a subroutine) and progresses downwards. It's possible (and definitely recommended) to organize programs through subroutines. Subroutines receive their arguments through the @_ array (a cousin to $_, which has its own set of array-oriented functions that use it if nothing else is specified). Variadic functions are natural and easy in Perl. Unfortunately, if you want named, checked parameters, you need to manage that yourself. Recursion is safe in Perl (don't forget to use my). Here's a sub that takes two arguments, does subtraction, and returns the result:
print my_subtract(10,3) . "\n";

sub my_subtract
{
my ($base, $decrement) = @_;
if(!( defined($base) && defined($decrement)))
	{die "my_subtract() passed bad arguments!\n";}
return($base - $decrement);
}

That script prints 7. Note that there's no restriction on where subroutines can be in your program -- all global subroutines declared this way are defined when a Perl script is compiled, before it is run. It's possible to take a reference to a sub or a code block.
$foo = \&my_subtract;

print &$foo(3,1) . "\n";
It is possible to return several values from a function. However, note that returning an array or a hash flattens it and returns its elements one-by-one. If you're only returning one array/hash, it's best to return it as the last value -- otherwise it's best to return a reference.
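
The flattening behaviour is easy to trip over, so here is a minimal sketch with two made-up subs: one returns two arrays directly, the other returns references to them.

```perl
use strict;
use warnings;

# Both arrays are flattened into a single list; the caller cannot tell
# where one ends and the other begins.
sub flat_pair {
    my @odd  = (1, 3, 5);
    my @even = (2, 4);
    return (@odd, @even);
}

# Returning references keeps the two arrays separate.
sub ref_pair {
    my @odd  = (1, 3, 5);
    my @even = (2, 4);
    return (\@odd, \@even);
}

my @all = flat_pair();
print scalar(@all), " values came back flattened\n";   # prints: 5 values came back flattened

my ($odd_ref, $even_ref) = ref_pair();
print "odd: @$odd_ref; even: @$even_ref\n";            # prints: odd: 1 3 5; even: 2 4
```

Returning references to variables made with my is safe here, because the data stays alive for as long as something still refers to it.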

OO in Perl

Objects in Perl are handled through scalars, and implemented through a special namespace mechanism. Typically, an object is declared in a separate file, which is loaded through the use keyword. For further examples in this section, below is an object declaration that we'll assume lies in AutonDemo.pm
#!/usr/bin/perl -w

package AutonDemo;

sub new
{
my $self = # This is taking a reference to an anonymous hash
        { # This is the syntax for initializing hashes.
        reads => 0,
        writes => 0,
        value => undef,
        karma => 2
        };
bless $self;
return $self;
}

sub getvalue
{
my $self = shift; # Shifts @_
$self->{reads}++;
if(rand(10) < 3)
        {
        print "You feel ill\n";
        $self->{karma}--;
        }
if($self->{karma} < 0)
        {$self->{value} = 0;}
return $self->{value};
}

sub setvalue
{
my($self,$value) = @_;
$self->{writes}++;
if(rand(10) < 1)
        {
        print "You feel safe\n";
        $self->{karma}++;
        }
$self->{value} = $value;
}

sub report
{
my $self = shift; # Fetch the object; without this, $self is not set here
print "Value is " . $self->{value} . "\n";
print "Reads/Writes " . $self->{reads} . " " . $self->{writes} . "\n";
print "Karma is " . $self->{karma} . "\n";
}
1;
This class implements a logged variable. Here's a regression test for that class
#!/usr/bin/perl -w
use AutonDemo;

$foo = AutonDemo::new();

$foo->setvalue(4);
foreach $UNUSED (0 .. 10)
        {
        print $foo->getvalue . "\n";
        }
$foo->report();


Regexes in Perl

Regular expressions are an important feature in Perl. Unfortunately, they're also hard to read (in any language). There are many places you may have used some subset or relative of the regular expression language. DOS and shell wildcards are cousins to the language, offering a minute subset of what it is capable of. C offers POSIX regular expressions, unfortunately with a cumbersome API. Sed and Awk implement variants on POSIX regular expressions. Perl's regexes are the result of gradual expansion on POSIX standard regexes over several years, and have proved popular enough that Perl-style regexes have been backported to C (via the pcre package) and into some other languages (such as Python). Perl regexes are normally specified via enclosing forward slashes (/), and applied via the =~ operator. This syntax is used for matching. Perl regexes are also capable of substitution, using the same operator but prepending an s to the first slash and using three slashes in total (s/pattern/replacement/). Parentheses inside Perl regexes are used to capture content (escape any parentheses that you're trying to match with a backslash). This content is stored in the variables $1, $2, and upwards, each corresponding to the position of one of the sets of parentheses. If a regex is used on its own, without the =~ operator, it is applied to the accumulator.
$foo = "My name is Pat, I think";
$foo =~ /is\s([^,]+)/;
$name = $1;
print "I found the name $name\n";
$foo =~ s/$name/Andrew/;
print $foo . "\n";
Note that if a match or a substitution fails, it will return false. It's possible to add modifiers after the closing slash of a regex. Perl's regex style is documented in the perlre manpage. If you're not all that familiar with regexes, it might be helpful to find a book about them. O'Reilly makes a good one; alternatively, O'Reilly's Programming Perl has a section on them that's entirely Perl-centric.
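
Since a failed match returns false, it is worth testing the result before trusting $1 and friends. Here is a sketch of that, together with two common modifiers (i to ignore case, g to act on every occurrence). The subs parse_setting and shout_perl and the "key = value" line format are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a "key = value" line into a lowercased key
# and a trimmed value. Returns an empty list if the line does not match.
sub parse_setting {
    my ($line) = @_;
    if ($line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/) {
        return (lc($1), $2);     # $1 and $2 are only read after a successful match
    }
    return;
}

# Hypothetical helper: replace every "perl", in any case, with "PERL".
sub shout_perl {
    my ($text) = @_;
    $text =~ s/perl/PERL/gi;     # g: every occurrence, i: ignore case
    return $text;
}

my ($key, $value) = parse_setting("  Color =  dark blue  ");
print "$key => $value\n" if defined $key;            # prints: color => dark blue

print shout_perl("Perl is fun. perl is terse."), "\n";
# prints: PERL is fun. PERL is terse.
```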

Cool Perl tricks

Perl has a number of cool, quirky, and useful features

My recommended style

My style is, to a certain degree, the style accepted by the Perl community at large.

Example of a HERE document:
$foo = <<EOHERE;
Meow
This is multiline
I think

EOHERE
print $foo;

Resources

You may find the following helpful

Weaknesses of Perl

Perl has a few weaknesses