2/12/2013 - 4:42 PM

Perl Notes (5.10+) - Linux | Key: <required>, [optional] | Full function reference:

Principles: There's more than one way to do it! / do what I mean!


use <string>;
  - Pragmas, interpreted at compile time before the script runs
  - Use 'strict' and 'warnings' to enforce good syntax
  - Also used to import libraries

use feature ':5.10';
  - Enables features from Perl 5.10 (including say)

use utf8;
  - Tells Perl that the script's own source code is UTF-8 encoded

- Generally pragmas in lowercase, user libs start with uppercase

Simple template:
  use Modern::Perl 2011;
  use autodie;

#!/path/to/perl/bin (usually /usr/bin/perl or /usr/local/bin/perl)
  - Add on first line to select interpreter

- Statements terminated by semi-colon

- Function parentheses are generally optional but may break function combinations if not used


Perl has three amount contexts in which functions and variables are used, which can alter behaviour.

  Void - Call function with no return value assignment
  Scalar Context (sigil $) - Assign to scalar, pass to func expecting scalar
  List Context (sigil @) - Assign to arrays, use in list, pass to func expecting list
    - Lists propagate list context to expressions they contain

  So any single scalar value (even one held in an array / hash) is accessed with $; the brackets used ([] or {}) show what type of container the scalar is being taken out of


Perl has three value contexts in which functions and variables are used, which can alter behaviour
  Numeric - Non-numeric strings evaluate as 0
    - To force, add 0 to variable using + operator
  String
    - To force, concatenate empty string '' to variable using . operator
  Boolean
    - To force, apply double negation to variable using the ! operator (!!)

Using string or numeric operators or conditional statements will define the current context
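
A minimal sketch of the amount and value contexts described above. The variable names are illustrative only, and the warning is silenced locally because '42abc' is only partly numeric.

```perl
use strict;
use warnings;

my @items = ('a', 'b', 'c');
my $count   = @items;        # scalar context: number of elements
my ($first) = @items;        # list context: first element

my $text = '42abc';
my $number;
{
    no warnings 'numeric';   # a partly numeric string would otherwise warn
    $number = $text + 0;     # numeric context: leading digits are used
}
my $string  = 7 . '';        # string context
my $boolean = !!$text;       # boolean context

print "$count $first $number $string $boolean\n";   # 3 a 42 7 1
```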

perl -v
  - Check version

perl <script>
  - Run perl script

CPANMINUS (CPAN client for downloading modules / dependencies)
(Comprehensive Perl Archive Network)

  sudo apt-get install cpanminus
    - Install client

  cpanm [Module::Name]
   - Install module

  echo 'export PERL_CPANM_OPT="--local-lib=~/perl5"' >> ~/.bashrc
    - Allow use of ~/perl5 for modules without errors (export makes the variable visible to cpanm)

  cpanm --info [Module::Name]
    - Get info on a module

  Manual install (without cpanm):
    cd <expanded directory>
    perl Makefile.PL
    make
    make test
    make install

  sudo apt-get install perl-doc
    - Install perl doc

  perldoc perldoc
    - View perldoc info

  perldoc perltoc
    - View documentation index

  perldoc perlfaq
    - View FAQ

  perldoc <module|core>
    - View module information

  perldoc perlop
    - View operators info

  perldoc perlsyn
    - View symbolic operators / syntax

  perldoc perldiag
    - View warning messages info

  perldoc perlfunc
    - View functions

  perldoc -f <function>
    - View function use info

  perldoc -q <keyword>
    - Search FAQ

  perldoc -v <var>
    - Lookup builtin variables


- Identifier limited by char set, must start with letter or underscore and may not have spaces
- Variable value held in variable container
  - Container cannot change type (scalar, array, hash)
  - Value can

- Scalar
  - Value = undef or number or string or reference to another variable
  - Numbers
    - Integer: 12, 34, 76
    - Float: 0.03, 1.234, 3.1415
    - Scientific: 1.34e12, 6.22e2
    - Binary: 0b10101010, 0b11100, 0b010111
    - Octal: 032424, 0123, 03
    - Hexadecimal: 0x20, 0xA4, 0x12
    - Underscore can be used as a formatting separator, ignored by calculations (e.g. 1_000_000)
    - Declared but undefined scalar variables contain undef, which evaluates as false in boolean context
  - my $<identifier> = <value>
    - Defines a scalar (in current scope)
  - Identified with $<varName>
  - Weakly typed (treated as number / string dependent on context)
  - Warnings will be raised if a string is treated as a number when not possible
  - Single quoted string is literal (except \' within the string and \\ for a literal backslash, e.g. at the end of the string)
    - Can also use q<char>Stuff here<sameChar>; to avoid need for escaping
  - Double quoted string allows interpolation of variables and control characters
    - Can also use qq<char>Stuff here<sameChar>; to avoid need for escaping
  - heredoc syntax (quote determines single / double behavior - default is double)
      my $var = <<['|"]<string>['|"];
      ...lines of text...
<sameString> //This MUST NOT be indented
  - Single characters from unicode sets can be identified using \x{<hexCode>}
    - with use charnames ':full'; can also use \N{<FULL NAME>}
  - Full string definition operators set
    Customary  Generic   Meaning          Interpolates
    ''         q{}       Literal          no
    ""         qq{}      Literal          yes
    ``         qx{}      Command          yes*
               qw{}      Word list        no
    //         m{}       Pattern match    yes*
               qr{}      Pattern          yes*
               s{}{}     Substitution     yes*
               tr{}{}    Transliteration  no (but see below)
    <<EOF                here-doc         yes*
    * unless the delimiter is ''.

- Booleans
  - No boolean type
  - undef / 0 / "" / "0" evaluates false, else true
  - Usually functions return 1 as true and "" as false

- Array (array of scalars, Zero indexed, ordered)
  - Identified with @<varName>
  - Create (in current scope) using comma delimited scalars in brackets
    e.g. @myArray = (1, 2, 3, "four"); //Can have trailing comma on last element
  - To retrieve or set (scalar) element use form $<arrName>[<elementKey>]
    - Use negative elementKeys to count from end of array
  - To retrieve or set (list of) elements use form @<arrName>[<elementKeys>]
  - $#<varName> 
    - Identifies the last populated index (i.e. -1 if unset, 0 if one element)
    - Can use to change size of array
        e.g. $#myArray = 5 //will make the array extend / shrink to 6 elements
  - scalar @myArray 
    - Returns length of array
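
A short sketch of the array forms above (element access, slice, last index, resizing). The array contents are the same illustrative values used in the notes.

```perl
use strict;
use warnings;

my @myArray = (1, 2, 3, "four");

my $last    = $myArray[-1];        # negative index counts from the end
my @pair    = @myArray[0, 1];      # slice: a list of elements
my $length  = scalar @myArray;     # number of elements
my $lastIdx = $#myArray;           # last populated index

$#myArray = 5;                     # extend to 6 elements (new ones are undef)
my $extended = scalar @myArray;

print "$last | @pair | $length | $lastIdx | $extended\n";   # four | 1 2 | 4 | 3 | 6
```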

- Hash (array of scalars, associative array, unordered)
  - Identified with %<varName>
  - Create (in current scope) using comma delimited key => scalars in brackets   
    e.g. %myHash = ('key1' => 4, 'key2' => 'smeg', 'monkey' => 6);
      - Can actually replace => with commas (synonym, then must have even elements)
  - To retrieve or set (scalar) element use form $<hashName>{<elementKey>}
  - To retrieve or set (list of) elements use form @<hashName>{<elementKeys>}
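
A short sketch of the hash forms above, reusing the example hash from the notes. The added key 'key3' is only there to show assignment.

```perl
use strict;
use warnings;

my %myHash = ('key1' => 4, 'key2' => 'smeg', 'monkey' => 6);

my $single = $myHash{key2};                 # one scalar element
my @slice  = @myHash{'key1', 'monkey'};     # hash slice: list of values
$myHash{key3} = 'new';                      # add (or overwrite) an element

# Hashes are unordered, so sort the keys for predictable output
for my $key (sort keys %myHash) {
    print "$key => $myHash{$key}\n";
}
```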

- Lists
  - Not a variable; an ephemeral value which can be assigned to arrays and hashes
  - (), ('one', 1, 'two', '2'), ('one' => 1, 'two' => 2) are all lists
  - Lists cannot be nested, they would be flattened
  - Can concatenate into lists
    e.g. ('value', 1, @list, (4, 5))
  - () on RHS = undef in scalar context and an empty list in list context
  - () on LHS imposes list context

- References
  - Perl's way of making more complex data structures
  - A scalar that refers to an array or hash, needed because arrays and hashes can normally only store scalar values
  - Represent with pre-backslash
    e.g. $arrayReference = \@array;
  - Dereference with braces (or $$ if not ambiguous)
    e.g. ${ $scalarReference } / $$scalarReference
  - To get reference array values
    - ${ $arrayReference }[<key>]
    - $arrayReference->[<key>]
    - $arrayReference->[<key>][<key>] (for array of array references)
  - To get reference hash values
    - ${ $hashReference }{<key>}
    - $hashReference->{<key>}
    - $hashReference->{<key>}{<key>} (for hash of hash references)
  - Can use [] to delimit anonymous arrays and {} for anonymous hashes
      my %account = (
        "number" => "31415926",
        "opened" => "3000-01-01",
        "owners" => [
          {
            "name" => "Philip Fry",
            "DOB"  => "1974-08-06",
          },
          {
            "name" => "Hubert Farnsworth",
            "DOB"  => "2841-04-09",
          },
        ],
      );
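  - $aref2 = $aref1 copies the reference, NOT the values
    - Use $aref2 = [@{$aref1}]; #for arrays (shallow copy)
    - Use $href2 = {%{$href1}}; #for hashes (shallow copy)
  - Use ref <var> to check whether a value is a reference (returns '' if not, else the type, e.g. 'ARRAY' / 'HASH')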
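
A minimal sketch of taking, copying and dereferencing references as described above. The nested structure is a cut-down version of the account example. All variable names are illustrative.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;            # reference to the named array
my $alias = $aref;              # second reference to the SAME array
my $copy  = [ @{$aref} ];       # new anonymous array holding the same values

$aref->[0] = 'changed';         # visible through $alias, not through $copy

my %account = (
    number => '31415926',
    owners => [ { name => 'Philip Fry' }, { name => 'Hubert Farnsworth' } ],
);
my $secondOwner = $account{owners}[1]{name};   # arrow is optional between subscripts

my $kind = ref $aref;           # 'ARRAY'

print "$alias->[0] $copy->[0] $kind $secondOwner\n";   # changed 1 ARRAY Hubert Farnsworth
```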

- Scalars, arrays and hashes can have the same variable name (but best to avoid of course)
- Scalars and arrays can be interpolated in double quoted strings (escape $ and @ with \); whole hashes are not interpolated, only their individual elements
- Using $/@/% determines scalar or list context which will alter the effect of operations
  e.g. my @array = 'scalar'; //('scalar')
  e.g. my $scalar = (1, 2, 3); //3 - the last scalar in list
  e.g. my $scalar = @array; //sets $scalar to length of array
  e.g. $array[3] = @array; //Will set as length of @array

my @<arrayName> = %<hashName>
  - Will create array with all key / values from hash as alternating values

$_ //The default scalar variable (like using 'it')
  - Auto populated when iterating over an array / hash (can also be explicitly assigned)
  - Many functions will use this var as arg if arg omitted
  - To avoid overwriting it within a block, use a named lexical (e.g. my $line) in the while / foreach condition (my $_ was experimental and removed in Perl 5.24)

@_ //The default array variable (like using 'them')
  - Available to subs automatically (holds the arguments); functions such as shift use this var if arg omitted inside a sub

@ARGV array holds arguments passed at command line (outside of functions)
  - e.g. perl <script> these are arguments //@ARGV = ('these', 'are', 'arguments')
  - Outside subs, functions such as shift use this var if arg omitted

$0 holds name of currently executing script


- The usual candidates (numbers): =, +, -, *, /
- Increment / shorthand assignment operators (numbers): ++, --, +=, -=, /= and *=
- Concatenation (strings): .
- Comparison (numbers): <, >, <=, >=, ==, !=, <=>
- Comparison (strings): lt, gt, le, ge, eq, ne, cmp
- String multiplier (strings): x
- Range Operator (numbers): .. e.g. 1 .. 10 -> (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
- Control Characters (only work in double quoted strings)
  - \n = new line
  - \t = tab
  - \r = carriage return
  - \f = formfeed
  - \b = backspace
- triple-dot: ... will be parsed as a complete (placeholder) statement / throws an exception if reached during run

Special -X file checks
  e.g. if (-e $fileName) ....
  -r  File is readable by effective uid/gid.
  -w  File is writable by effective uid/gid.
  -x  File is executable by effective uid/gid.
  -o  File is owned by effective uid.
  -R  File is readable by real uid/gid.
  -W  File is writable by real uid/gid.
  -X  File is executable by real uid/gid.
  -O  File is owned by real uid.
  -e  File exists.
  -z  File has zero size (is empty).
  -s  File has nonzero size (returns size in bytes).
  -f  File is a plain file.
  -d  File is a directory.
  -l  File is a symbolic link.
  -p  File is a named pipe (FIFO), or Filehandle is a pipe.
  -S  File is a socket.
  -b  File is a block special file.
  -c  File is a character special file.
  -t  Filehandle is opened to a tty.
  -u  File has setuid bit set.
  -g  File has setgid bit set.
  -k  File has sticky bit set.
  -T  File is an ASCII text file (heuristic guess).
  -B  File is a "binary" file (opposite of -T).
  -M  Script start time minus file modification time, in days.
  -A  Same for access time.
  -C  Same for inode change time (Unix, may differ for other platforms)
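
A small sketch of a few -X checks against a scratch file. File::Temp (a core module) is used here only so the example has a file to test; the 6 byte size assumes Linux line endings.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $fileName) = tempfile(UNLINK => 1);   # scratch file, removed at exit
print {$fh} "hello\n";
close $fh or die "Cannot close $fileName: $!";

my $exists = -e $fileName ? 1 : 0;   # file exists
my $size   = -s $fileName;           # size in bytes (6 on Linux)
my $isDir  = -d $fileName ? 1 : 0;   # a plain file is not a directory

print "exists=$exists size=$size dir=$isDir\n";
```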


defined <variable>
  - Returns true if value is not undef, else false
  - 0 and '' are defined (although both evaluate as false)

fc <variable>
	- Casefolding (for making an argument case insensitive in things like custom sorts)
	- Need use feature 'fc'; / Perl v5.16 to use this :'(
looks_like_number <var> //From Scalar::Util
  - Returns true if value would be treated as a number by Perl
scalar <array|hash>
  - Forces scalar context (e.g. scalar @array gives the array length)

-- alpha

print <expression>
  - Standard Output (evaluates arguments as list)
    e.g. print 3, @array, 'three'; //3ArrayElementsthree

length <scalar>
  - Length of value of variable

substr <value>, <offset>[, <length>[, <replacement>]]
  - Return substring (zero indexed)
  - Use negative start value to count back from end of string
  - Can also use to manipulate strings
      e.g. substr($a, 11, 4) = "Perl"; //Length states how many characters from start to replace with assigned string

uc <string>
  - Returns string in uppercase

int (<scalar>)
    - Trim fractional part of scalar value

say <scalar(s)>
    - Standard output followed by newline

sort [<block|functionIdentifier>] <list>
 - Returns the sorted list (default is string order: digits before letters, upper before lower case)
 - Can be redefined (return -1 for $a before $b, 1 for $a after $b and 0 for equality):
   e.g. Sort the keys of a hash by their values (descending)
     sort {
       if ($type{$b} > $type{$a}) { return 1; }
       if ($type{$b} < $type{$a}) { return -1; }
       return 0;
     } keys %type
   OR could use <=> operator ( / cmp operator for strings)
  - Can also define function then use that for sort
    e.g. sort <functionIdentifier> <array>
  - e.g. case insensitive array value sort
    @sortedArray = sort { fc($a) cmp fc($b) } @arrayToSort;
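
A brief sketch of the sort variants above using <=> and cmp. The %type contents are invented for illustration, and lc stands in for fc so the example also runs on Perls older than 5.16.

```perl
use strict;
use warnings;

my %type  = (apple => 3, pear => 10, fig => 1);
my @mixed = (10, 9, 100);

my @byValueDesc = sort { $type{$b} <=> $type{$a} } keys %type;   # keys by value, largest first
my @numeric     = sort { $a <=> $b } @mixed;                     # numeric order
my @default     = sort @mixed;                                   # default string order
my @anyCase     = sort { lc($a) cmp lc($b) } ('b', 'A', 'c');    # case insensitive

print "@byValueDesc | @numeric | @default | @anyCase\n";
# pear apple fig | 9 10 100 | 10 100 9 | A b c
```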

keys <array|hash>
  - Return list of keys (indices for an array)

pop @<identifier>
  - Extracts and removes last element of array

push @<identifier>, <value>[, ...];
  - Append values to end of array

shift @<identifier>
  - Removes and returns first element of array

my $var = shift
  - Example of using shift with no arg, which works against @_ inside a sub (or @ARGV outside one)

unshift @<identifier>, <value>[, ...];
  - Insert new elements at the beginning of the array

splice(@<identifier>, <startIndex>, <count>, <valueToAdd>[, ...]);
 - Removes and returns the specified slice of the array and replaces it with the listed values

join(<delimiter>, <array>)
  - Makes array into scalar value

reverse(<list>)
  - Returns list in reverse order

scalar reverse(<list>)
  - Returns concatenated list characters in reverse order

map { #Do stuff with $_ } @array
  - Applies function to all elements of array, returning new array

grep { #condition based on $_ } @array;
  - Applies filter to array, returning new array
  - Can be used to search arrays
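
A short sketch combining map, grep and join as described above. The number list is illustrative.

```perl
use strict;
use warnings;

my @numbers = (1 .. 6);

my @squares = map  { $_ * $_ }     @numbers;   # transform every element
my @even    = grep { $_ % 2 == 0 } @numbers;   # keep elements passing the test
my $found   = grep { $_ == 4 }     @numbers;   # scalar context: number of matches
my $joined  = join(', ', @even);               # list into one string

print "@squares | $joined | $found\n";   # 1 4 9 16 25 36 | 2, 4, 6 | 1
```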

chomp <scalar|list>
  - Remove trailing $/ (the input record separator, newline by default) from scalar or all scalars in list
  - Returns number of characters removed

stat [<arg>]
  - Returns 13 element list of file properties
  - See: perldoc -f stat

my $from_utf8 = decode('utf8', $data);
my $to_latin1 = encode('iso-8859-1', $string);
  - Example switching between character sets (requires use Encode;)


open <filehandle>, [<mode>,] <filename>
  - Attempts to open <filename>, assigns to <filehandle>
  - <filehandle> is a scalar identifier
  - This will fail if file permissions do not permit access
  - Will attempt to create file if it does not exist (write / append modes)
  - Use redirect operators as the mode for overwrite / append, e.g.
    - open($overwriteHandle, '>', 'file.txt'); //Overwrite contents
    - open($amendHandle, '>>', 'file.txt'); //Append to contents
    (Note that the mode and filename can be concatenated into one argument, e.g. '>>file.txt')
    - Also have
      - +< read / write (will not create)
      - +> read / write (will create / truncate)
      - +>> read / write (append)
  - Returns true on success, or false with the error in $!
  - To use a particular character set use e.g. '<:utf8' as the mode

<<filehandle>> in scalar context will return a line from file
  - e.g. $line = <$amendHandle>;

readline <filehandle>
  - Return line

<<filehandle>> in list context will return a list of all remaining lines from file
  - e.g. @lines = <$amendHandle>;
  - OR 
      for $line (<$amendHandle>) {
        #Loop every line in $amendHandle
      }

print <filehandle> <string>;
  - Write a string to an opened file

close <filehandle>
  - Guess what this does
  - Perl will automatically do this at end of script / on a new open of the same filehandle

eof <filehandle>
  - Returns true or false dependent on being at end of file (useful in while loop conditions)

Smallest way to read line by line:
  while (<$<fileHandle>>) {
    #Do stuff with $_
  }

- STDIN, STDOUT and STDERR are all available globally
  - To read single line
    my $var = <STDIN>;
  - To wait for enter
    <STDIN>;

- Reading from <> reads from STDIN or (default if present) from files named in arguments to Perl script call. Use ctrl-D to finish input
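
A minimal sketch of writing, appending and reading back a file with the three-argument open. The file lives in a temporary directory created with File::Temp (core module); the file name and handle names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);      # scratch directory, removed at exit
my $file = "$dir/file.txt";

open(my $out, '>', $file) or die "Cannot write $file: $!";
print {$out} "first\n", "second\n";    # overwrite mode
close($out) or die "Cannot close $file: $!";

open(my $append, '>>', $file) or die "Cannot append to $file: $!";
print {$append} "third\n";             # append mode
close($append) or die "Cannot close $file: $!";

open(my $in, '<', $file) or die "Cannot read $file: $!";
my @lines;
while (my $line = <$in>) {             # one line per iteration
    chomp $line;
    push @lines, $line;
}
close($in);

print join('|', @lines), "\n";         # first|second|third
```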


To define:
  sub <functionName> {
    #Do stuff
  }

To use:
  <functionName>([<arguments>]);

  - Old Perl required & before function call so may still see this around
  - Brackets are actually optional

return <var>;
  - Return value to calling code (stop executing sub)

return <var> if wantarray;
  - wantarray is true if the function was called in list context

- Can pass as many scalar arguments to function call as we like as a list
  - Scalar = list of one
  - Hash of N = list of 2N
- Passed scalars will then be available within the sub via the @_ array
- If you don't expressly use the return statement, the sub returns the result of the last statement. 
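
A small sketch of the points above: arguments arriving in @_, wantarray, and the implicit return value. The sub names minMax and doubleIt are hypothetical.

```perl
use strict;
use warnings;

sub minMax {
    my @values = sort { $a <=> $b } @_;              # arguments arrive in @_
    return ($values[0], $values[-1]) if wantarray;   # called in list context
    return "$values[0]..$values[-1]";                # called in scalar context
}

sub doubleIt {
    my ($x) = @_;
    $x * 2;          # no return statement: value of the last statement is returned
}

my ($min, $max) = minMax(4, 9, 2);   # list context
my $range       = minMax(4, 9, 2);   # scalar context
my $doubled     = doubleIt(21);

print "$min $max $range $doubled\n";   # 2 9 2..9 42
```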


- Comparisons have scalar context
- Comparisons shortcircuit (they will only evaluate until the boolean result is known)
- 0 values, '0' and '', empty lists and undef evaluate to false

- If statements:

  if (<comparison>) {
      # Do stuff if <comparison> == true
  }

  <command> if <comparison>; //Postfix form

- If-else statements:

  if (<comparison>) {
      # Do stuff if <comparison> == true
  } else {
      # Do stuff if <comparison> == false
  }

- If-elsif-else statements

  if (<comparison>) {
      # Do stuff if <comparison> == true
  } elsif (<comparison2>) {
      # Do stuff if <comparison2> == true
      # Can have 1...* elsif blocks
  } else {
      # Do stuff if all above comparisons == false
  }

- Unless statements:

  unless (<comparison>) {
      # Do stuff if <comparison> == false
  }

  <command> unless <comparison>; //Postfix form

- Unless-else statements:

  unless (<comparison>) {
      # Do stuff if <comparison> == false
  } else {
      # Do stuff if <comparison> == true
  }

- Ternary statement / operator (may be nested)

  (<comparison> ? <runIfTrue> : <runIfFalse>)

- While statements:

  while (<comparison>) {
      # Do stuff / loop while <comparison> == true
  }

- While statement using each

  while ( my($key, $value) = each %hash) {
    #Do stuff with each key and value
  }

  - If using <> / <STDIN> then need to type Ctrl/z in Windows or Ctrl/d in Linux to finish entry

- Until statements:

  until (<comparison>) {
      # Do stuff / loop while <comparison> == false
  }

- Do-While statements:

  do {
    #Do stuff, repeat while <comparison> == true
  } while (<comparison>);

- Do-Until statements:

  do {
    #Do stuff, repeat while <comparison> == false
  } until (<comparison>);

- For loops:

  for (<varDeclaration>; <comparison>; <executeOnLoop>) {
    #Do stuff, while <comparison> == true
    #var should be declared with my to make local only
  }

  - Can actually replace for with foreach
  - Must use lexical iterator to avoid collisions

- Foreach loops:

  foreach my $<identifier> ( @<identifier>|<list> ) {
    #Do stuff, $<identifier> will be populated with each value in turn
  }

  - Can actually replace foreach with for
  - Use (0 .. $#array) to get array indices
  - If the iterator is not defined $_ will be auto populated instead
      foreach (@array) {
        print $_;
      }
  - Single statement shorthand
    - print $_ foreach @array;

- Loop control
  - Name the loop with an allcaps label, then can use next <label> and last <label> to skip to next or break loop
      LABEL: foreach (@array) {
        #Do stuff
        next LABEL if <condition>;
        last LABEL if <condition>;
      }
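
A short sketch of labelled loop control with nested loops, where next and last act on the outer loop. The data and the label name ROW are illustrative.

```perl
use strict;
use warnings;

my @rows = ([1, 2, 3], [4, -1, 6], [7, 8, 9]);
my @seen;

ROW: foreach my $row (@rows) {
    foreach my $value (@{$row}) {
        next ROW if $value < 0;     # abandon the rest of this row
        last ROW if $value > 7;     # stop both loops entirely
        push @seen, $value;
    }
}

print "@seen\n";   # 1 2 3 4 7
```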


- When script execution ends a status word is returned, highest 8 bits (out of 16) contain return code
- Usually return code of 0 means no errors

system <list>
  - Use to call another program. List command line parts as separate elements
  - Returned value (also stored automatically in $?) is the status word
  - Use $? >> 8 to shift out non-return-code bits

`<command>` (or qx{<command>})
  - Run command on command line (returns output in list or string dependent on context)

exit [<int>]
  - Exit script, returning int (0-255), default 0

die <list|string>
    - Stop script with optional exit message(s)
    - Can use $! in string to output error if used in form: <failed function> or die "this is error: $!";
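
A minimal sketch of reading a child's return code and capturing output. It assumes a Linux shell with echo available; $^X (the path of the running perl) is used only as a convenient child program.

```perl
use strict;
use warnings;

# Run a child process that exits with code 3
my $status = system($^X, '-e', 'exit 3');
die "Could not run child: $!" if $status == -1;
my $exitCode = $status >> 8;        # high 8 bits hold the return code

# Backticks capture the command's standard output
my $output = `echo hello`;
chomp $output;

print "exit=$exitCode output=$output\n";   # exit=3 output=hello
```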


- @_ should be unpacked inside function it has been passed to
- Can use 'shift' (no arg) to auto shift into vars one by one
- Best = require hash to be passed then assign @_ to a hash
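
A sketch of the two unpacking styles above: positional arguments via shift, and named arguments by assigning @_ to a hash. The sub names, keys and default value are hypothetical.

```perl
use strict;
use warnings;

sub add {
    my $left  = shift;      # shift with no arg works on @_ inside a sub
    my $right = shift;
    return $left + $right;
}

sub greet {
    # Defaults first; the caller's key / value pairs override them
    my %args = (greeting => 'Hello', @_);
    return "$args{greeting}, $args{name}!";
}

my $sum     = add(2, 3);
my $message = greet(name => 'Fry');
my $custom  = greet(name => 'Leela', greeting => 'Hi');

print "$sum | $message | $custom\n";   # 5 | Hello, Fry! | Hi, Leela!
```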


- Match regex: =~ (opposite is !~)
  - e.g. ('aabb' =~ m/aab+/) //true
  - Used in scalar returns true or false, in list returns list of matches
  - Can capture subexpressions using () around part of regex, this will make captured string available as var
    - i.e. 'aabbc' =~ m/(aab+)(c)/ would yield $1 = 'aabb', $2 = 'c'
  - Add i to make case insensitive
  - Add x to allow spaces and comments in regex (which will not be interpreted)
  - Can also name captures (by including '?<name>' in subexpression parenthesis start) which will store them in the %+ hash
      e.g. @a = 'first second' =~ /(?<first>\w*)\s(?<last>\w*)/
        - Would yield $+{first} = 'first', $+{last} = 'second'
  - Can replace delimiters (/) with any other character
- Replace regex: =~ 
  - e.g. $var =~ s/regex/replacestring/;
  - Will replace matches with specified string in variable
  - Can use sub-expressions here too, i.e. capture in regex then use $1, $2 or %+ etc in replacestring
  - Add g to replace all occurrences
  - Add i to make case insensitive
  - Add x to allow spaces and comments in regex (which will not be interpreted)
  - Can replace delimiters (/) with any other character
- or: A series of statements separated by 'or' runs left to right until one works (returns a true value), then stops
  - So can be used to join comparisons and functions e.g. ($var eq $var2) or ($var[2] == 7) or print 'monkeypoo';
- and: Evaluates the chain of statements left to right, but stops as soon as one of them doesn't work (returns a false value)
  - e.g. open (LOG, "log.file") and print "Logfile is open!\n";
- Can assign using list notation
    e.g. my ($time, $hours, $minutes, $seconds) = ($1, $2, $3, $4);
    - When using subexpressions Perl gives a shortcut:
      my ($time, $hours, $minutes, $seconds) = ($string =~ /((\d{1,2}):(\d{2}):(\d{2}))/);
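
A short sketch pulling together the match, capture and substitution forms above. The input strings are invented for illustration.

```perl
use strict;
use warnings;

my $string = 'Started at 9:05:30 today';

# A match in list context returns the captured groups
my ($time, $hours, $minutes, $seconds) = ($string =~ /((\d{1,2}):(\d{2}):(\d{2}))/);

# Named captures are stored in %+
my ($firstWord, $lastWord);
if ('first second' =~ /(?<first>\w+)\s(?<last>\w+)/) {
    ($firstWord, $lastWord) = ($+{first}, $+{last});
}

# Substitution on a copy, /g replaces every occurrence
(my $masked = $string) =~ s/\d/#/g;

# /i makes the match case insensitive
my $matched = ('AABB' =~ m/aab+/i) ? 1 : 0;

print "$time $hours $minutes $seconds | $firstWord $lastWord | $masked | $matched\n";
# 9:05:30 9 05 30 | first second | Started at #:##:## today | 1
```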


- Can be included in another Perl file
- uses .pm extension
- Same syntax as .pl files
- Must return a true value at end

export PERL5LIB=/path/to/modules:$PERL5LIB
  - Add directory of modules to include path

require FOLDER::FILE
  - Include the FILE in FOLDER in defined PERL5LIB directory from above


- Namespaces in which subroutines can be declared
- Default package is 'main'
- Subroutines with same name in different packages do not conflict
- Function from name space called as NAME::SPACE::funcName()

package NAME::SPACE
  - Switch namespace / package
  - :: used as namespace separator

  - Example call

  OBEY (otherwise things will become rather confusing)
  - A Perl script (.pl file) must always contain exactly zero package declarations.
  - A Perl module (.pm file) must always contain exactly one package declaration, corresponding exactly to its name and location. 
    e.g. module Demo/ must begin with package Demo::StringUtils.
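
A sketch showing that same-named subs in different packages do not conflict. Both packages are placed in one file purely so the example is self-contained (each would normally live in its own .pm file, per the rules above); Demo::NumberUtils and the describe sub are hypothetical.

```perl
use strict;
use warnings;

{
    package Demo::StringUtils;
    sub describe { return 'string utils' }
}

{
    package Demo::NumberUtils;
    sub describe { return 'number utils' }   # same sub name, different namespace
}

package main;   # back in the default package

my $one = Demo::StringUtils::describe();
my $two = Demo::NumberUtils::describe();

print "$one / $two\n";   # string utils / number utils
```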

* CGI (Common gateway interface) *

Commonly used to run Perl with an http server.

Header settings:
Content-type: String  A MIME string defining the format of the file being returned. Example is Content-type:text/html
Expires: Date String	The date the information becomes invalid. This should be used by the browser to decide when a page needs to be refreshed. A valid date string should be in the format 01 Jan 1998 12:00:00 GMT.
Location: URL String	The URL that should be returned instead of the URL requested. You can use this field to redirect a request to any file.
Last-modified: String	The date of last modification of the resource.
Content-length: String	The length, in bytes, of the data being returned. The browser uses this value to report the estimated download time for a file.
Set-Cookie: String		Set the cookie passed through the string

ENV variables:
CONTENT_TYPE  The data type of the content. Used when the client is sending attached content to the server. For example file upload etc.
CONTENT_LENGTH	The length of the query information. It's available only for POST requests
HTTP_COOKIE		Return the set cookies in the form of key & value pair.
HTTP_USER_AGENT	The User-Agent request-header field contains information about the user agent originating the request. It is the name of the web browser.
PATH_INFO		The path for the CGI script.
QUERY_STRING	The URL-encoded information that is sent with GET method request.
REMOTE_ADDR		The IP address of the remote host making the request. This can be useful for logging or for authentication purpose.
REMOTE_HOST		The fully qualified name of the host making the request. If this information is not available then REMOTE_ADDR can be used to get the IP address.
REQUEST_METHOD	The method used to make the request. The most common methods are GET and POST.
SCRIPT_FILENAME	The full path to the CGI script.
SCRIPT_NAME	The name of the CGI script.
SERVER_NAME	The server's hostname or IP Address
SERVER_SOFTWARE	The name and version of the software the server is running.
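
A minimal sketch of a CGI response built by hand from the environment variables above, without the CGI module. The sub name buildResponse is hypothetical, and the query string is split naively with no URL-decoding, so treat it as an illustration rather than production parsing.

```perl
use strict;
use warnings;

sub buildResponse {
    my %env    = @_;
    my $method = $env{REQUEST_METHOD} // 'GET';
    my $query  = $env{QUERY_STRING}   // '';

    # Split "a=1&b=2" into key / value pairs (no URL-decoding in this sketch)
    my %params = map { my ($k, $v) = split /=/, $_, 2; ($k => $v // '') } split /&/, $query;

    my $body = "Method: $method\n";
    $body .= "$_ = $params{$_}\n" for sort keys %params;

    # Header line(s), then a blank line, then the body
    return "Content-type: text/plain\n\n" . $body;
}

print buildResponse(%ENV);
```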

Get request from CGI
  $query = new CGI;

Get value from request
  $value = $query->param('<name>');

Get array of indexes in GET array
  @names = $query->param;

Force useful debug info:
  use CGI::Carp qw(fatalsToBrowser);


- regex are delimited by / e.g. /i(\sa)?m\sa\sregex/
    - Perl Compatible Regular Expressions (PCRE) of course!
    - Using /g will remember how far through the string the last match was when used in a loop for capture
    - Regexes are greedy (will match max string possible)
        - Use .*? instead of .* to reduce effect

split(<regex>, <scalar>, [limit])
    - Return list of scalars from splitting given scalar by regex
    - Limit value will limit max number of elements; the unsplit remainder of the string is kept in the last element
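
A brief sketch of split with and without a limit. The record string is illustrative.

```perl
use strict;
use warnings;

my $record = 'one, two,three ,four';

my @all     = split(/\s*,\s*/, $record);      # split on commas with optional spaces
my @limited = split(/\s*,\s*/, $record, 2);   # at most 2 elements, remainder kept whole

print join('|', @all), "\n";       # one|two|three|four
print join('|', @limited), "\n";   # one|two,three ,four
```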