Perl Notes (5.10+) - Linux | Key: <required>, [optional] | Full function reference: http://perldoc.perl.org/index-functions.html
Principles: There's more than one way to do it! / do what I mean!
GENERAL
=======
use <string>;
- Pragmas, interpreted at compile time, before the script runs
- Use 'strict' and 'warnings' to enforce good syntax
- Also used to import libraries
use feature ':5.10';
- Enables features from Perl 5.10 (including say)
use utf8;
- Declares that the script's own source code is UTF-8 encoded (does not affect file or terminal I/O)
- Generally pragmas in lowercase, user libs start with uppercase
Simple template:
#!/usr/bin/perl
use Modern::Perl 2011;
use autodie;
#!/path/to/perl/bin (usually /usr/bin/perl or /usr/local/bin/perl)
- Add on first line to select interpreter
- Statements terminated by semicolon
- Function parentheses are generally optional, but omitting them can change how nested function calls are parsed
AMOUNT CONTEXT
==============
Perl has three amount contexts in which functions and variables are used, which can alter behaviour.
Void - Call function with no return value assignment
Scalar Context (sigil $) - Assign to scalar, pass to func expecting scalar
List Context (sigil @) - Assign to arrays, use in a list, pass to func expecting a list
- Lists propagate list context to expressions they contain
So any single scalar value (even one held in an array / hash) is accessed with $; the brackets that follow ([] or {}) show whether it is being taken out of an array or a hash
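A minimal sketch of amount context in action. The variable names are made up for illustration; localtime is a standard built-in that happens to behave differently per context.

```perl
use strict;
use warnings;

my @fruits = ('apple', 'banana', 'cherry');

# List context: the array yields its elements.
my @copy = @fruits;

# Scalar context: the same array yields its element count.
my $count = @fruits;

# Parentheses on the left-hand side impose list context,
# so $first receives the first element rather than the count.
my ($first) = @fruits;

# localtime returns a list of fields in list context
# and a human-readable string in scalar context.
my @parts = localtime(0);
my $stamp = localtime(0);

print "count=$count first=$first\n";
```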
VALUE CONTEXT
=============
Perl has three value contexts in which functions and variables are used, which can alter behaviour
Numeric - Non-numeric strings evaluate as 0 (a leading number is used if present, e.g. '3 apples' gives 3)
- To force, add 0 to variable using + operator
String
- To force, concatenate empty string '' to variable using . operator
Boolean
- To force, add double negation to variable using the ! operator
Using string or numeric operators or conditional statements will define the current context
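The forcing idioms above can be sketched as follows. The values are arbitrary; the only assumption beyond the notes is the `no warnings 'numeric'` block, used to silence the warning that a partly numeric string would otherwise raise.

```perl
use strict;
use warnings;

my $text = '42';

my $number  = $text + 0;      # numeric context: 42
my $string  = $number . '';   # string context: '42'
my $boolean = !!$text;        # boolean context: true (1)

# A string with a leading number keeps that number. Under 'use warnings'
# this would warn that the value isn't numeric, so warnings are
# switched off for just this block.
my $partial;
{
    no warnings 'numeric';
    $partial = '3 apples' + 0;   # 3
}

# The operator, not the operand, picks the kind of comparison.
my $numeric_equal = ('1.0' == 1)   ? 1 : 0;   # 1: compared as numbers
my $string_equal  = ('1.0' eq '1') ? 1 : 0;   # 0: compared as strings
```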
TERMINAL
=======
perl -v
- Check version
perl <script>
- Run perl script
CPANMINUS (CPAN client for downloading modules / dependencies)
(Comprehensive Perl Archive Network)
sudo apt-get install cpanminus
- Install client
cpanm [Module::Name]
- Install module
echo 'export PERL_CPANM_OPT="--local-lib=~/perl5"' >> ~/.bashrc
- Allow use of ~/perl5 for modules without errors (export is needed so cpanm sees the variable; takes effect in new shells)
cpanm --info [Module::Name]
- Get info on a module
MANUAL INSTALL OF MODULES
=====
cd <expanded directory>
perl Makefile.PL
make
make test
make install
* PERLDOC *
===========
sudo apt-get install perl-doc
- Install perl doc
perldoc perldoc
- View perldoc info
perldoc perltoc
- View documentation index
perldoc perlfaq
- View FAQ
perldoc <module|core>
- View module information
perldoc perlop
- View operators info
perldoc perlsyn
- View syntax info (declarations, statements, loops)
perldoc perldiag
- View warning messages info
perldoc perlfunc
- View functions
perldoc -f <function>
- View function use info
perldoc -q <keyword>
- Search FAQ
perldoc -v <var>
- Lookup builtin variables
* VARIABLES *
=============
- Identifier limited by char set, must start with letter or underscore and may not have spaces
- Variable value held in variable container
- Container cannot change type (scalar, array, hash)
- Value can
- Scalar
- Value = undef or number or string or reference to another variable
- Numbers
- Integer: 12, 34, 76
- Float: 0.03, 1.234, 3.1415
- Scientific: 1.34e12, 6.22e2
- Binary: 0b10101010, 0b11100, 0b010111
- Octal: 032424, 0123, 03
- Hexadecimal: 0x20, 0xA4, 0x12
- Underscore can be used as a formatting separator, ignored in calculations (e.g. 1_000_000)
- Declared but undefined scalar variables contain undef, which evaluates as false in boolean context
- my $<identifier> = <value>
- Defines a scalar (in current scope)
- Identified with $<varName>
- Weakly typed (treated as number / string dependant on context)
- Warnings will be raised if a string is treated as a number when not possible
- Single quoted string is literal (except \' for a quote in the string and \\ for a backslash, e.g. at end of string)
- Can also use q<char>Stuff here<sameChar>; to avoid need for escaping
- Double quoted string allows interpolation of variables and control characters
- Can also use qq<char>Stuff here<sameChar>; to avoid need for escaping
- heredoc syntax (quote determines single / double behavior - default is double)
my $var = <<<'|"><string><'|">; //e.g. my $var = <<"END";
dqwjdqoiw
qwidqwpodkqwpo
<sameString> //This MUST NOT be indented (unless the <<~ form is used, 5.26+)
- Single characters from unicode sets can be identified using \x{<hexCode>}
- with use charnames ':full'; can also use \N{<FULL NAME>}
- Full string definition operators set
Customary   Generic    Meaning           Interpolates
''          q{}        Literal           no
""          qq{}       Literal           yes
``          qx{}       Command           yes*
            qw{}       Word list         no
//          m{}        Pattern match     yes*
            qr{}       Pattern           yes*
            s{}{}      Substitution      yes*
            tr{}{}     Transliteration   no (but see below)
<<EOF                  here-doc          yes*
* unless the delimiter is ''.
- Booleans
- No boolean type
- undef / 0 / "" / "0" evaluates false, else true
- Usually functions return 1 as true and "" as false
- Array (array of scalars, Zero indexed, ordered)
- Identified with @<varName>
- Create (in current scope) using comma delimited scalars in brackets
e.g. @myArray = (1, 2, 3, "four"); //Can have trailing comma on last element
- To retrieve or set (scalar) element use form $<arrName>[<elementKey>]
- Use negative elementKeys to count from end of array
- To retrieve or set (list of) elements use form @<arrName>[<elementKeys>]
- $#<varName>
- Identifies the last index (i.e. -1 if the array is empty, 0 if one element)
- Can use to change size of array
e.g. $#myArray = 5 //will make the array extend / shrink to 6 elements
- scalar @myArray
- Returns length of array
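A short sketch of the array access forms above, using a throwaway array (the names are illustrative only):

```perl
use strict;
use warnings;

my @letters = ('a', 'b', 'c', 'd');

my $first = $letters[0];       # 'a'  (single element, so $ sigil)
my $last  = $letters[-1];      # 'd'  (negative index counts from the end)
my @pair  = @letters[1, 2];    # ('b', 'c')  (slice, so @ sigil)

my $last_index = $#letters;         # 3
my $length     = scalar @letters;   # 4

$#letters = 1;                 # shrink the array to ('a', 'b')
my $shrunk = scalar @letters;  # 2
```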
- Hash (array of scalars, associative array, unordered)
- Identified with %<varName>
- Create (in current scope) using comma delimited key => scalars in brackets
e.g. %myHash = ('key1' => 4, 'key2' => 'smeg', 'monkey' => 6);
- Can actually replace => with commas (synonym, then must have even elements)
- To retrieve or set (scalar) element use form $<hashName>{<elementKey>}
- To retrieve or set (list of) elements use form @<hashName>{<elementKeys>}
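A sketch of the hash access forms, again with made-up data. exists and delete are standard built-ins that are not covered in these notes; they are included here because they come up as soon as keys are looked up or removed.

```perl
use strict;
use warnings;

my %age = (alice => 30, bob => 25, carol => 41);

my $bob = $age{bob};                  # single element: 25
$age{dave} = 19;                      # add (or overwrite) an element
my @some = @age{'alice', 'carol'};    # hash slice: (30, 41)

# exists tests for a key without creating it.
my $has_erin = exists $age{erin} ? 1 : 0;   # 0

delete $age{bob};                     # remove a key and its value

# Hashes are unordered, so sort the keys for a stable result.
my @names = sort keys %age;           # ('alice', 'carol', 'dave')
```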
- Lists
- Not a variable; an ephemeral value which can be assigned to arrays and hashes
- (), ('one', 1, 'two', '2'), ('one' => 1, 'two' => 2) are all lists
- Lists can not be nested, they would be flattened
- Can concatenate into lists
e.g. ('value', 1, @list, (4, 5))
- () on RHS = undef in scalar context and an empty list in list context
- () on LHS imposes list context
- References
- Perl's way of making more complex data structures
- A reference is a scalar that points at an array or hash; needed because arrays and hashes can normally only store scalar values
- Represent with pre-backslash
e.g. $arrayReference = \@array;
- Dereference with braces (or $$ if not ambiguous)
e.g. ${ $scalarReference } / $$scalarReference
- To get reference array values
- ${ $arrayReference }[<key>]
- $arrayReference->[<key>]
- $arrayReference->[<key>][<key>] (for array of array references)
- To get reference hash values
- ${ $hashReference }{<key>}
- $hashReference->{<key>}
- $hashReference->{<key>}{<key>} (for hash of hash references)
- Can use [] to delimit anonymous arrays and {} for anonymous hashes
e.g.
my %account = (
"number" => "31415926",
"opened" => "3000-01-01",
"owners" => [
{
"name" => "Philip Fry",
"DOB" => "1974-08-06",
},
{
"name" => "Hubert Farnsworth",
"DOB" => "2841-04-09",
},
],
);
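- $aref2 = $aref1, copies by reference NOT VALUE
- Use $aref2 = [@{$aref1}]; #for arrays (shallow copy, nested references are still shared)
- Use $href2 = {%{$href1}}; #for hashes (shallow copy, nested references are still shared)
- Use ref <scalar> to check whether a value is a reference (returns the type, e.g. 'ARRAY' or 'HASH', or '' if it is not a reference)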
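A sketch of dereferencing and copying, using a cut-down version of the account structure. The three access forms are meant to be equivalent; the variable names are illustrative only.

```perl
use strict;
use warnings;

my %account = (
    number => '31415926',
    owners => [
        { name => 'Philip Fry' },
        { name => 'Hubert Farnsworth' },
    ],
);

my $href = \%account;

# Three equivalent ways to reach the same nested value.
my $name_a = ${ $href }{owners}[1]{name};
my $name_b = $href->{owners}->[1]->{name};
my $name_c = $href->{owners}[1]{name};   # arrows between subscripts are optional

# Copying the reference shares the data; copying the contents does not.
my $aref1 = [1, 2, 3];
my $alias = $aref1;           # same array
my $copy  = [ @{$aref1} ];    # new array with the same elements
push @{$aref1}, 4;

my $alias_len = scalar @{$alias};   # 4
my $copy_len  = scalar @{$copy};    # 3

my $type = ref $href;               # 'HASH'
my $none = ref 'plain string';      # '' (not a reference)
```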
- Scalars, arrays and hashes can have the same variable name (but best to avoid of course)
- Scalars and arrays can be interpolated in double quoted strings (escape $ and @ with \); a whole hash (%name) is not interpolated, though single elements such as $hash{key} are
- Using $/@/% determines scalar or list context which will alter the effect of operations
e.g. my @array = 'scalar'; //('scalar')
e.g. my $scalar = (1, 2, 3); //3 - the last scalar in list
e.g. my $scalar = @array; //sets $scalar to length of array
e.g. $array[3] = @array; //Will set as length of @array
my @<arrayName> = %<hashName>
- Will create array with all key / values from hash as alternating values
$_ //The default scalar variable (like using 'it')
- Auto created when iterating over an array / hash (can also be explicitly defined)
- Many functions will use this var as arg if arg omitted
- To avoid overwrite within a block use a named lexical loop variable (e.g. foreach my $item ...) or local $_; lexical my $_ only existed in Perl 5.10 to 5.22
@_ //The default array variable (like using 'them')
- Available to subs automatically, holds the arguments passed; shift and pop use it when their argument is omitted inside a sub
@ARGV array holds arguments passed at command line (outside of functions)
- e.g. perl <script> these are arguments //@ARGV = ('these', 'are', 'arguments')
- Outside of subs, shift and pop use @ARGV when their argument is omitted
$0 holds name of currently executing script
* OPERATORS *
=============
- The usual candidates (numbers): =, +, -, *, /, % (modulus), ** (exponent)
- Increment / decrement and compound assignment operators (numbers): ++, --, +=, -=, /= and *=
- Concatenation (strings): .
- Comparison (numbers): <, >, <=, >=, ==, !=, <=>
- Comparison (strings): lt, gt, le, ge, eq, ne, cmp
- String multiplier (strings): x
- Range Operator (numbers): .. e.g. 1 .. 10 -> (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
- Control Characters (only work in double quoted strings)
- \n = new line
- \t = tab
- \r = carriage return
- \f = formfeed
- \b = backspace
- triple-dot: ... (5.12+) is parsed as a complete placeholder statement; throws an 'Unimplemented' exception if it is reached at run time
Special -X file checks
e.g. if (-e $fileName) ....
-r File is readable by effective uid/gid.
-w File is writable by effective uid/gid.
-x File is executable by effective uid/gid.
-o File is owned by effective uid.
-R File is readable by real uid/gid.
-W File is writable by real uid/gid.
-X File is executable by real uid/gid.
-O File is owned by real uid.
-e File exists.
-z File has zero size (is empty).
-s File has nonzero size (returns size in bytes).
-f File is a plain file.
-d File is a directory.
-l File is a symbolic link.
-p File is a named pipe (FIFO), or Filehandle is a pipe.
-S File is a socket.
-b File is a block special file.
-c File is a character special file.
-t Filehandle is opened to a tty.
-u File has setuid bit set.
-g File has setgid bit set.
-k File has sticky bit set.
-T File is an ASCII text file (heuristic guess).
-B File is a "binary" file (opposite of -T).
-M Script start time minus file modification time, in days.
-A Same for access time.
-C Same for inode change time (Unix, may differ for other platforms)
* GENERAL FUNCTIONS *
=====================
defined <variable>
- Returns true if value is not undef, else false
- 0 and '' are defined (they are false, but they are not undef)
fc <variable>
- Casefolding (for making an argument case insensitive in things like custom sorts)
- Need use feature 'fc'; / Perl v5.16 to use this :'(
looks_like_number <var> //From Scalar::Util
- Returns true if value would be treated as a number by Perl
scalar <array|hash>
- Treat value as a scalar
print <expression>
- Standard Output (evaluates arguments as list)
e.g. print 3, @array, 'three'; //3ArrayElementsthree
length <scalar>
- Length of value of variable
substr <value>, <offset>[, <length>[, <replacement>]]
- Return substring (zero indexed)
- Use negative start value to count back from end of string
- Can also use to manipulate strings
e.g. substr($a, 11, 4) = "Perl"; //Replaces the <length> characters starting at <offset> with the assigned string
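A sketch of the substr forms above on an arbitrary string. The lvalue form and the four-argument form are shown side by side since they make the same change.

```perl
use strict;
use warnings;

my $text = 'Hello, World';

my $word = substr($text, 7, 5);   # 'World'
my $tail = substr($text, -5);     # 'World' (negative offset counts from the end)

# Lvalue form: assign to the selected range.
my $lvalue = $text;
substr($lvalue, 7, 5) = 'Perl';   # $lvalue is now 'Hello, Perl'

# Four-argument form: same replacement, and it returns what was replaced.
my $four_arg = $text;
my $removed  = substr($four_arg, 7, 5, 'Perl');   # $removed is 'World'
```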
uc <string>
- Returns string in uppercase
int (<scalar>)
- Trim fractional part of scalar value
say <scalar(s)>
- Standard output followed by newline
sort [<block|subName>] <list>
- Returns the sorted list; default is string comparison (digits before letters, upper before lower case)
- Can be redefined (return -1 for $a before $b, 1 for $a after $b and 0 for equality):
e.g. Sort the keys of a hash by their values, highest first
sort {
if ($type{$b} > $type{$a}) { return 1; }
if ($type{$b} < $type{$a}) { return -1; }
return 0;
} keys %type
OR could use <=> operator ( / cmp operator for strings)
- Can also define function then use that for sort
e.g. sort <functionIdentifier> <array>
- e.g. case insensitive array value sort
@sortedArray = sort { fc($a) cmp fc($b)} @arrayToSort
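A sketch comparing the default sort with custom comparison blocks. The data is made up; lc is used for the case-insensitive sort as an assumption-free stand-in for fc on Perls older than 5.16.

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

my @as_strings = sort @numbers;                 # (1, 10, 100, 9): string order
my @ascending  = sort { $a <=> $b } @numbers;   # (1, 9, 10, 100)
my @descending = sort { $b <=> $a } @numbers;   # (100, 10, 9, 1)

# Keys of a hash ordered by their values, highest first.
my %type = (apple => 3, pear => 1, plum => 2);
my @by_value = sort { $type{$b} <=> $type{$a} } keys %type;   # (apple, plum, pear)

# Case-insensitive string sort.
my @folded = sort { lc($a) cmp lc($b) } ('banana', 'Cherry', 'apple');
# (apple, banana, Cherry)
```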
keys <array|hash>
- Return list of keys (for an array, its indices; needs 5.12+)
pop @<identifier>
- Removes and returns last element of array
push @<identifier>, <value>[, ...];
- Append values to end of array
shift @<identifier>
- Removes and returns first element of array
my $var = shift;
- Example of shift with no argument: works against @_ inside a sub (or @ARGV outside one)
unshift @<identifier>, <value>[, ...];
- Insert new elements at the beginning of the array
splice(@<identifier>, <startIndex>, <count>, <valueToAdd>[, ...]);
- Removes and returns the specified slice of the array, replacing it with the listed values
join(<delimiter>, <list>)
- Joins the elements into a single string, separated by <delimiter>
reverse(<list>)
- Returns list in reverse order
scalar reverse(<list>)
- Returns the concatenated list elements with their characters in reverse order
map { #Do stuff with $_ } @array
- Applies block to all elements of array, returning new list
grep { #condition based on $_ } @array;
- Applies filter to array, returning list of the elements for which the condition is true
- Can be used to search arrays
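A sketch of map and grep on a small range of numbers. The scalar-context use of grep (counting matches) is standard behaviour, shown here because it is the usual way to search an array.

```perl
use strict;
use warnings;

my @numbers = (1 .. 6);

my @squares = map  { $_ * $_ }    @numbers;   # (1, 4, 9, 16, 25, 36)
my @evens   = grep { $_ % 2 == 0 } @numbers;  # (2, 4, 6)

# In scalar context grep returns how many elements matched.
my $even_count = grep { $_ % 2 == 0 } @numbers;   # 3
my $has_four   = grep { $_ == 4 } @numbers;       # 1, i.e. true

# map, grep and join chain naturally, read right to left.
my $csv = join(',', map { $_ * 10 } grep { $_ > 4 } @numbers);   # '50,60'
```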
chomp <scalar|list>
- Removes a trailing $/ (the input record separator, newline by default) from a scalar or from every scalar in a list
- Returns number of characters removed
stat [<arg>]
- Returns 13 element list of file properties
- See: http://perldoc.perl.org/functions/stat.html
use Encode qw(decode encode);
my $from_utf8 = decode('utf8', $data);
my $to_latin1 = encode('iso-8859-1', $string);
- Example switching between character sets (decode: bytes to Perl characters, encode: characters to bytes)
* FILE OPERATIONS *
===================
open <filehandle>, [<mode>,] <filename>
- Attempts to open <filename>, assigns to <filehandle>
- <filehandle> is a scalar identifier
- This will fail if file permissions do not permit access
- Write / append modes will attempt to create the file if it does not exist
- Use redirect operators as <mode> for overwrite / append, e.g.
- open($overwriteHandle, '>', 'file.txt'); //Overwrite contents
- open($amendHandle, '>>', 'file.txt'); //Append to contents
(Note that second and third arguments can be concatenated, e.g. '>file.txt', but the three argument form is safer)
- Also have
- +< read / write (will not create)
- +> read / write (will create / truncate)
- +>> read / write (append)
- Returns true or false, with the error in $!
- To use a particular character set add a layer to the mode, e.g. '<:utf8' (or the stricter '<:encoding(UTF-8)')
<<filehandle>> in scalar context will return a line from file
- e.g. $line = <$amendHandle>;
readline <filehandle>
- Return line
<<filehandle>> in list context will return a list of all (remaining) lines from file
- e.g. @lines = <$amendHandle>;
- OR
for my $line (<$amendHandle>) {
#Loop every line in $amendHandle
}
print <filehandle> <string>;
- Write a string to an opened file
close <filehandle>
- Guess what this does
- Perl will automatically do this at end of script / when the same filehandle is reopened
eof <filehandle>
- Returns true or false dependent on being at end of file (useful in while loop conditions)
Smallest way to read line by line:
while (<$<fileHandle>>) {
#Do stuff with $_
}
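A sketch that ties the open modes and line reading together. It assumes the core module File::Temp to get a scratch file, so no real path is touched; the handle names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() returns an open handle and a path; UNLINK removes the
# file again when the script exits.
my ($tmp_handle, $path) = tempfile(UNLINK => 1);
close $tmp_handle;

# Overwrite mode: creates or truncates the file.
open(my $out, '>', $path) or die "Cannot write $path: $!";
print {$out} "first\n";
print {$out} "second\n";
close $out or die "Cannot close $path: $!";

# Append mode: keeps the existing contents.
open(my $append, '>>', $path) or die "Cannot append to $path: $!";
print {$append} "third\n";
close $append or die "Cannot close $path: $!";

# Read line by line; chomp strips the trailing newline from each line.
my @lines;
open(my $in, '<', $path) or die "Cannot read $path: $!";
while (my $line = <$in>) {
    chomp $line;
    push @lines, $line;
}
close $in;
```

Using a named lexical ($line) rather than $_ in the while condition avoids clobbering $_ for any code called inside the loop.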
- STDIN, STDOUT and STDERR are all available globally
- To read single line
my $var = <STDIN>;
- To wait for enter
<STDIN>;
- Reading from <> reads from the files named in the arguments to the Perl script call if present, otherwise from STDIN. Use ctrl-D to finish input
* USER DEFINED FUNCTIONS *
==========================
To define:
sub <functionName> {
#statements
}
To use:
<functionName>();
- Old Perl required & before function call so may still see this around
- Parentheses are actually optional if the sub has already been declared
return <var>;
- Return value to calling code (stop executing sub)
return <var> if wantarray;
- wantarray flag for if function was called in list context
- Can pass as many scalar arguments to function call as we like as a list
- Scalar = list of one
- Hash of N = list of 2N
- Passed scalars will then be available within the sub via the @_ array
- If you don't expressly use the return statement, the sub returns the result of the last statement.
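A sketch of three common ways to handle @_ and return values. The sub names (add, greet, min_max) are hypothetical and exist only for this example.

```perl
use strict;
use warnings;

# Positional arguments unpacked from @_ in one list assignment.
sub add {
    my ($left, $right) = @_;
    return $left + $right;
}

# Named arguments: the caller passes key/value pairs, which arrive
# flattened in @_ and are reassembled into a hash. A default for
# 'greeting' is listed first so the caller can override it.
sub greet {
    my %args = (greeting => 'Hello', @_);
    return "$args{greeting}, $args{name}!";
}

# wantarray is true when the sub was called in list context.
sub min_max {
    my @sorted = sort { $a <=> $b } @_;
    return wantarray ? ($sorted[0], $sorted[-1]) : $sorted[0];
}

my $sum       = add(2, 3);              # 5
my $message   = greet(name => 'Fry');   # 'Hello, Fry!'
my ($lo, $hi) = min_max(4, 9, 1);       # (1, 9)
my $only_min  = min_max(4, 9, 1);       # 1
```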
* CONTROL STATEMENTS *
======================
- Comparisons have scalar context
- Comparisons shortcircuit (they will only evaluate until the boolean result is known)
- 0 values, '0' and '', empty lists and undef evaluate to false
- If statements:
if (<comparison>) {
# Do stuff if <comparison> == true
}
<command> if <comparison>; //Postfix form
- If-else statements:
if (<comparison>) {
# Do stuff if <comparison> == true
} else {
# Do stuff if <comparison> == false
}
- If-elsif-else statements
if (<comparison>) {
# Do stuff if <comparison> == true
} elsif (<comparison2>) {
# Do stuff if <comparison2> == true
# Can have 1...* elsif blocks
} else {
# Do stuff if all above comparisons == false
}
- Unless statements:
unless (<comparison>) {
# Do stuff if <comparison> == false
}
<command> unless <comparison>; //Postfix form
- Unless-else statements:
unless (<comparison>) {
# Do stuff if <comparison> == false
} else {
# Do stuff if <comparison> == true
}
- Ternary statement / operator (may be nested)
(<comparison> ? <runIfTrue> : <runIfFalse>)
- While statements:
while (<comparison>) {
# Do stuff / loop while <comparison> == true
}
- While statement using each
while ( my($key, $value) = each %hash) {
#Do stuff with each key and value
}
- If using <> / <STDIN> then need to type Ctrl/z in Windows or Ctrl/d in Linux to finish entry
- Until statements:
until (<comparison>) {
# Do stuff / loop while <comparison> == false
}
- Do-While statements:
do {
#Do stuff, repeat while <comparison> == true
} while (<comparison>);
- Do-Until statements:
do {
#Do stuff, repeat while <comparison> == false
} until (<comparison>);
- For loops:
for (<varDeclaration>; <comparison>; <executeOnLoop>) {
#Do stuff, while <comparison> == true
#var should be declared with my to make it local to the loop
}
- Can actually replace for with foreach
- Must use lexical iterator to avoid collisions
- Foreach loops:
foreach my $<identifier> ( @<identifier>|<list> ) {
#Do stuff, $<identifier> will be populated with each value in turn
}
- Can actually replace foreach with for
- Use (0 .. $#array) to get array indices
- If the iterator is not defined $_ will be auto populated instead
e.g.
foreach (@array) {
print $_;
}
- Single statement shorthand
- print $_ foreach @array;
- Loop control
- Name the loop with an allcaps label, then can use next <label> and last <label> to skip to next or break loop
e.g.
LABEL: foreach (@array) {
#Do stuff
next LABEL if <condition>;
last LABEL if <condition>;
}
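A sketch of labelled loop control with nested loops, where the label matters because next and last would otherwise act on the inner loop. The data and the ROW label are made up.

```perl
use strict;
use warnings;

my @rows = ([1, 2, 3], [4, -1, 6], [7, 8, 9]);
my @seen;

ROW: foreach my $row (@rows) {
    foreach my $value (@{$row}) {
        next ROW if $value < 0;   # abandon the rest of this row
        last ROW if $value > 7;   # leave both loops entirely
        push @seen, $value;
    }
}
# @seen is now (1, 2, 3, 4, 7)
```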
* SYSTEM CALLS *
================
- When a program ends a 16-bit status word is returned, highest 8 bits contain the return code
- Usually return code of 0 means no errors
system <list>
- Use to call another program; the list holds the command and its arguments
- The returned value (also stored in $?) is the status word
- Use $? >> 8 to shift out the non return code bits
`<command>`
- Run command on command line (returns output as a list of lines or a single string dependent on context)
exit [<int>]
- Exit script, returning int (0-255), default 0
die <list|string>
- Stop script with optional exit message(s)
- Can use $! in string to output error if used in form: <failed function> or die "this is error: $!";
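A sketch of reading a child's return code. To stay self-contained it runs perl itself via $^X (the path of the running interpreter) as the child program; it assumes a Unix-like shell for the backtick call and a perl path without spaces.

```perl
use strict;
use warnings;

# List form of system: command followed by its arguments, no shell involved.
my $status    = system($^X, '-e', 'exit 3');
my $exit_code = $status >> 8;   # 3
my $same_code = $? >> 8;        # 3 ($? holds the same status word)

# Backticks capture what the command prints to standard output.
my $output = `$^X -e "print 42"`;   # '42'

# The usual pattern for fatal errors: <call> or die "...: $!"
# (not executed here, since it would stop the script on failure):
#   open(my $fh, '<', 'missing.txt') or die "Cannot open: $!";
```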
UNPACKING ARGUMENTS
- @_ should be unpacked inside function it has been passed to
- Can use 'shift' (no arg) to auto shift into vars one by one
- Best = require hash to be passed then assign @_ to a hash
* REGULAR EXPRESSIONS *
=======================
- Match regex: =~ (opposite is !~)
- e.g. ('aabb' =~ m/aab+/) //true
- Used in scalar context returns true or false, in list context returns list of captures
- Can capture subexpressions using () around part of regex, this will make captured string available as var
- i.e. 'aabbc' =~ m/(aab+)(c)/ would yield $1 = 'aabb', $2 = 'c'
- Add i to make case insensitive
- Add x to allow spaces and comments in regex (which will not be interpreted)
- Can also name captures (by including '?<name>' in subexpression parenthesis start) which will store them in the %+ hash
e.g. @a = 'first second' =~ /(?<first>\w*)\s(?<last>\w*)/
- Would yield $+{first} = 'first', $+{last} = 'second'
- Can replace delimiters (/) with any other character
- Replace regex: =~
- e.g. $var =~ s/regex/replacestring/;
- Will replace matches with specified string in variable
- Can use sub-expressions here too, i.e. capture in regex then use $1, $2 or %+ etc in replacestring
- Add g to replace all occurrences
- Add i to make case insensitive
- Add x to allow spaces and comments in regex (which will not be interpreted)
- Can replace delimiters (/) with any other character
- or: A series of expressions separated by 'or' is evaluated left to right until one returns a true value
- So can be used to join comparisons and functions e.g. ($var eq $var2) or ($var[2] == 7) or print 'monkeypoo';
- and: Evaluates the chain of expressions left to right, but stops at the first one that returns a false value
- e.g. open (LOG, "log.file") and print "Logfile is open!\n";
- Can assign using list notation
e.g. my ($time, $hours, $minutes, $seconds) = ($1, $2, $3, $4);
- When using subexpressions Perl gives a shortcut:
my ($time, $hours, $minutes, $seconds) = ($string =~ /((\d{1,2}):(\d{2}):(\d{2}))/);
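A sketch pulling the matching, capturing and substitution notes together. The strings are made up; named captures need Perl 5.10 or later.

```perl
use strict;
use warnings;

my $string = 'Started at 9:05:30 today';

# List context: the match returns its captures.
my ($time, $hours, $minutes, $seconds) =
    $string =~ /((\d{1,2}):(\d{2}):(\d{2}))/;
# $time '9:05:30', $hours '9', $minutes '05', $seconds '30'

# Named captures are stored in the %+ hash.
my ($first, $last);
if ('Hubert Farnsworth' =~ /(?<first>\w+)\s+(?<last>\w+)/) {
    ($first, $last) = ($+{first}, $+{last});
}

# Substitution with /g and /i; s/// returns the number of replacements.
my $text  = 'Cat sat. The cat ran.';
my $count = ($text =~ s/cat/dog/gi);   # 2; $text is 'dog sat. The dog ran.'

# /x allows whitespace and comments inside the pattern.
my $is_date = '2841-04-09' =~ /
    ^ \d{4}     # year
    - \d{2}     # month
    - \d{2} $   # day
/x ? 1 : 0;
```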
* MODULES *
===========
- Can be included in another Perl file
- uses .pm extension
- Same syntax as .pl files
- Must return a true value at end
export PERL5LIB=/path/to/modules:$PERL5LIB
- Add directory of modules to include path
require FOLDER::FILE
- Include the FILE in FOLDER in defined PERL5LIB directory from above
PERL PACKAGES / NAMESPACES
- Namespaces in which subroutines can be declared
- Default package is 'main'
- Subroutines with same name in different packages do not conflict
- Function from name space called as NAME::SPACE::funcName()
package NAME::SPACE;
- Switch namespace / package
- :: used as namespace separator
PACKAGE::functionName()
- Example call
MODULES VS PACKAGES
OBEY (otherwise things will become rather confusing)
- A Perl script (.pl file) must always contain exactly zero package declarations.
- A Perl module (.pm file) must always contain exactly one package declaration, corresponding exactly to its name and location.
e.g. module Demo/StringUtils.pm must begin with package Demo::StringUtils.
* CGI (Common gateway interface) *
==================================
Commonly used to run Perl with an http server.
Header settings:
Header                   Description
Content-type: String     A MIME string defining the format of the file being returned. Example is Content-type:text/html
Expires: Date String     The date the information becomes invalid. This should be used by the browser to decide when a page needs to be refreshed. A valid date string should be in the format 01 Jan 1998 12:00:00 GMT.
Location: URL String     The URL that should be returned instead of the URL requested. You can use this field to redirect a request to any file.
Last-modified: String    The date of last modification of the resource.
Content-length: String   The length, in bytes, of the data being returned. The browser uses this value to report the estimated download time for a file.
Set-Cookie: String       Set the cookie passed through the string
ENV variables:
CONTENT_TYPE      The data type of the content. Used when the client is sending attached content to the server. For example file upload etc.
CONTENT_LENGTH    The length of the query information. It's available only for POST requests
HTTP_COOKIE       Returns the set cookies in the form of key & value pairs.
HTTP_USER_AGENT   The User-Agent request-header field contains information about the user agent originating the request. It is the name of the web browser.
PATH_INFO         The extra path information that follows the CGI script name in the URL.
QUERY_STRING      The URL-encoded information that is sent with GET method request.
REMOTE_ADDR       The IP address of the remote host making the request. This can be useful for logging or for authentication purposes.
REMOTE_HOST       The fully qualified name of the host making the request. If this information is not available then REMOTE_ADDR can be used to get the IP address.
REQUEST_METHOD    The method used to make the request. The most common methods are GET and POST.
SCRIPT_FILENAME   The full path to the CGI script.
SCRIPT_NAME       The name of the CGI script.
SERVER_NAME       The server's hostname or IP Address
SERVER_SOFTWARE   The name and version of the software the server is running.
Get request from CGI
my $query = CGI->new; //Requires use CGI; (new CGI is the older indirect form)
Get value from request
$query->param('<index>');
Get list of parameter names in request
$query->param;
Force useful debug info:
use CGI::Carp qw(fatalsToBrowser);
REFERENCES
http://qntm.org/files/perl/perl.html
http://perldoc.perl.org/perlvar.html
http://onyxneon.com/books/modern_perl/modern_perl_letter.pdf
http://www.cpan.org/modules/index.html
http://www.tutorialspoint.com/perl/perl_cgi.htm
==========
- regex are delimited by / e.g. /i(\sa)?m\sa\sregex/
- Perl Compatible Regular Expressions (PCRE) of course!
- Using /g in scalar context remembers where the last match ended, so a loop can step through successive matches
- Regexes are greedy (will match max string possible)
- Use .*? instead of .* for a non-greedy (minimal) match
split(<regex>, <scalar>, [limit])
- Return list of scalars from splitting given scalar by regex
- Limit value caps the number of fields returned; the unsplit remainder of the string is kept in the last field
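A sketch of split with and without a limit, on a made-up colon-separated record. The ' ' pattern is a documented special case of split that also skips leading whitespace.

```perl
use strict;
use warnings;

my $record = 'fry:x:1000:1000:Philip Fry';

my @fields  = split(/:/, $record);      # 5 fields
my @limited = split(/:/, $record, 3);   # ('fry', 'x', '1000:1000:Philip Fry')

# A single-space string splits on runs of whitespace and ignores
# leading whitespace; trailing empty fields are dropped.
my @words = split(' ', '  one  two three ');   # ('one', 'two', 'three')

# join reverses a split on a fixed delimiter.
my $rejoined = join(':', @fields);
```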