If your web browser formats this page incorrectly, try viewing the page source. In netscape 4, click the view menu, then click page source. #!/usr/bin/perl -w __END__ examples of how to program in perl. The first line of a perl script should be '#!/usr/bin/perl -w', as above; and the file should have execute permission. But that is probably not important if you run the script with 'perl -w [script_name]' (-w means to display compiler warning messages) '__END__' tells perl that this is the end of the program, so perl assumes that everything after this is a comment, and perl ignores everything after this. I think '__END__' is supposed to be on a line by itself, like above. In a real program, you would not want '__END__' on the second line, like above. # This line is a comment because it begins with '#', like in bash or sh. Every command in perl ends with a semicolon, ';', like in C. At the beginning of a program you might want to use: use strict; use English; use integer; # 'Strict' tells perl to do more careful error checking. With 'strict', any data in a subroutine must be specifically declared as local or global. 'strict' may have no effect if there are no blocks or subroutines. But I find it easier to not use 'strict', and let most variables be global without bothering to declare them global. 'English' tells perl to recognize long english names of predefined variables, like $PROCESS_ID for $$ 'integer' tells perl to treat numbers as integers instead of as floating point numbers. 'use integer' remains in effect until the end of the block, including blocks within the block. If it is at the beginning of the program, it will be in effect in all blocks. Maybe Perl uses it as if it occurred at the beginning of the block, no matter where it occurs in the block. Most programming languages have many types of data, like strings, floats, and signed integers. In perl there is only one kind of data, and it is called a scalar. 
A scalar is more like a data structure than a data type; it includes information about whether or not the scalar has been initialized, and what format the data is in, as well as the actual data. There are several possible data formats, and these data formats are probably the usual data types like string, float, etc. The good news is that perl chooses the data format for you and converts between formats automatically so you never have to worry about it. The bad news is that if you doubt that perl is choosing the best format, there is little you can do to change it, and perl will not tell you which format it is using. Also note that 'name' means sort of the same thing as 'scalar'; compiler error messages might say somehing like 'missing name', in which case name would probably but not always be a scalar. The defined() function tells whether or not a scalar has been initialized. Or you can use -w on the command line or in the first line of the perl script, and perl will warn you any time you use a scalar which has not been initialized. This is very useful for debugging, because bugs often involve the use of uninitialized data. print($A); # if $A is not initialized, this will give a warning If we are using -w and if we do something with the output of a function which returns undef, there may or may not be a warning print(open(A,'); # reads whole file into array. Note that if the # parenthesis were left out, it would only read one line print($#B); # displays '-1'. Since @B is not defined, there is no last # string in the array. 
($D1, $D2, $D3, $D4, $D5) = @Data;  # convert an array to separate scalars
(undef, undef, $D) = @Data;         # extract one scalar from an array
$A = 5;
(undef,undef,undef,$A) = (1,2,3);   # no data is written to $A; $A
                                    # will be undef; $A does not keep its
                                    # original value of 5

%NameOfHash=($NameOfStringA,$StringA,$NameOfStringB,$StringB,
             $NameOfStringC,$StringC);
%NameOfHash=('NameOfStringA',$NameOfHash{'NameOfStringA'},
             'NameOfStringB',$NameOfHash{'NameOfStringB'},
             'NameOfStringC',$NameOfHash{'NameOfStringC'},
             'NameOfStringD',$NameOfHash{'NameOfStringD'});
# Note that perl may change the order of the hash
%A=('a','wer','b','foo','c','asd','d','iop');
# a is the name of wer, b is the name of foo, etc
print($A{'d'},$A{'b'},$A{'c'});     # displays iopfooasd

predefined data:

$_ or $ARG
    default data, assumed anywhere you leave out the input or output.
    For example, in the condition of a while loop, '<>' by itself is
    treated as '$_ = <ARGV>'.
ARGV and $ARGV
    ARGV is the default input filehandle, the one '<>' reads from if you
    leave out the filehandle. $ARGV is the name of the file which is
    currently being read through ARGV.
@ARGV
    command line parameters. $ARGV[0] is the first parameter.
    $#ARGV is the number of command line parameters minus one.
@_
    parameters passed to the current subroutine
%ENV
    environment parameters of the current process. Any changes will apply
    to the current process and to future child processes, but not to the
    parent process or to past child processes.
$0 or $PROGRAM_NAME
    name of the current perl program or script, or the name of the link
    to it. This is the name of the file with the perl code in it.
$$ or $PROCESS_ID
    the id number of the current process, like what ps displays. It is a
    positive integer assigned by the operating system, and it is often
    much larger than 256. It is often used to make the names of temporary
    files, so that the temporary files of this process will not have the
    same names as the temporary files of other processes which are running
    at the same time.
$EXECUTABLE_NAME
    This is usually 'perl'.
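As a small illustration of the predefined data listed above, here is a sketch which builds a temporary file name from $$ and counts parameters with the $# notation. The sub names temp_file_name and count_parameters, and the use of /tmp, are made up for this example; they are not part of Perl.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: build a temporary file name which includes the
# process id, so two processes running at once get different names.
sub temp_file_name {
    my ($Prefix) = @_;
    return "/tmp/$Prefix.$$";
}

# Hypothetical helper: $#Parameters is the index of the last item,
# so add one to get the number of items.
sub count_parameters {
    my (@Parameters) = @_;
    return $#Parameters + 1;
}

print("program name:    $0\n");
print("process id:      $$\n");
print("temporary file:  ", temp_file_name('example'), "\n");
print("parameter count: ", count_parameters(@ARGV), "\n");
print("PATH is:         ",
      (defined($ENV{'PATH'}) ? $ENV{'PATH'} : '(not set)'), "\n");
```

Run it as 'perl -w script a b c' and the parameter count should be 3; the other lines depend on your system.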
# Like in many other programming languages, # a typical perl command consists of name of output, equals sign, # name of function, open parenthesis, list of inputs seperated by commas, # close parenthesis, semicolon. For example: $OutputString=substr($InputString,1,2); # Perl allows you to be a sloppy programmer. If you leave out things like # quote marks and parentheses, then perl will try to guess # where they go. # Some functions do not return any data, or maybe you just do not care # what data is returned by the function. If you leave out the name of output # and equals sign, you will lose the data returned by the function; but # other than that the command will still work. Actually all functions # return data; if nothing else functions return 0 for failed and 1 for # succeeded. For example, print is almost always used without name of output # or equals sign. But you can do: $A=print(); # Since no inputs are given to print, it does not display anything. Since # print is not likely to fail, # that command is a stupid way to do $A=1; # But if you are printing to a file, then there is a small chance it will # fail. # The following command also works: substr($A,1,1); # That command is totally useless; there is no point in running substr() # if you are not going to save the data it returns. Or does perl # automatically put the data in $_? # It is safe to leave out the semicolon at the end of # the last command in the program, and it is safe to leave out the semicolon # just before a }, but every other semicolon is needed. # Perl uses semicolons to seperate commands, like in C. Thus you can # put comments in the middle of commands like this: print # comment # more comment "hello\n"; # But I cannot imagine why you would want to. I think that makes the program # hard to understand. # Commas are used to seperate the inputs, for when you have a function # which takes more than one input. Commas are as important as semicolons. 
# Quote marks are tricky; some times quote marks are needed and sometimes # not. For example: $A="AAAA"; # you might get away with skipping the quote marks, but you # should not try it because AAAA might be interpreted as the # name of a function, or maybe a filehandle. $B="123"; # quote marks not needed because a number will not be interpreted # as the name of a function or filehandle. The man pages seem to hint that # if quote marks are used, then perl will treat it as a string, converting # it to a number as needed; and if quote marks are not used, perl will treat # it as a number, converting it to a string as needed. $A="$B"; # quote marks not needed. The official recommendation is to skip # the quote marks in this situation because it speeds up the program, # especially if $A and $B are numbers. $A="$B\n"; # quote marks needed, I think. $A="123,456"; # quote marks needed because number contains a comma # Parentheses usually begin just after the name of the function and # end just before the semicolon. Thus it is usually obvious where the # parentheses go, and so it is usually safe to leave them out, and so perl # programs and help files usually do not include the parentheses. # However, if you use complicated combinations of commands, then parentheses # become tricky. # Including all the quote marks and parentheses makes a perl program more # precise, so if you are paranoid about bugs or just obsessive-compulsive, # you might want to always use parentheses and quote marks. # You should always remember that the parentheses and quote marks got left out. # When you look at a perl program without parentheses and quote marks, # you should see the parentheses and quote marks in your mind, # and you should know where to put the parentheses and quote # marks in case they are ever needed. 
# For example, you might need parentheses when you nest one function # inside another function: print substr($InputString,2,4),"\n"; # parentheses needed # You can add spaces if you think that makes your program look prettier. # For example, the following two commands are the same: $OutputString=substr($InputString,1,2); $OutputString = substr ( $InputString , 1 , 2 ) ; # The parentheses are not needed in either example; but I included the # parentheses to show that you can put spaces around parentheses. perl allows you to put a function in a function like this: $A = function1(function2($B)); which is the same as: $C = function2($B); $A = function1($C); except that the first version does not use or change $C perl also allows you to do this: $A = function1($C = function2($B)); which is the same as the previous example Perl subs are reentrant. That means a sub can call itself. Just make sure you do not have an infinite nesting, where it calls itself, which calls itself, which calls itself, which calls itself, etc. # There is more than one way to do everything in perl. That is good, because # you can adapt perl to your own personal programming style. But it is bad, # because if you look at a program written by someone else, they will have # written it in their own personal style, and you will not be able to # understand it! To run perl interactively, type the following at the linux prompt: perl -de 42 If you want to call a short perl script from a sh script, it may be simpler, faster, and use less disk space to use the perl command line instead of a script file. For example, I once wrote a sh script with: unzip -l $1 | perl -ne \ 'chomp;m/^.{7} ..-..-.. ..:.. / and print(" \"$_\" \"\"")' That uses perl to throw away some of the lines of output from unzip, and to reformat the rest of the lines into a format appropriate for dialog. I think -e needs to be the last option; -ew did not work. You can make filters which do just about anything this way. 
To convert unix text to DOS text by inserting a carriage return character before every newline character: perl -ne 's/\n/\r\n/g;print $_' To convert DOS text to unix text by removing every carriage return character: perl -ne 's/\r//g;print $_' However, it may be faster to use sed for simple filters. # math: $Answer=$A+$B; # addition $Answer=$A-$B; # subtraction $Answer=$A*$B; # multiplication $Answer=$A/$B; # division # When you write a program in perl, you treat numbers as strings, and perl # automagically converts strings to floating point numbers as needed. # If ALL the numbers in a routine or program are integers, then you can # speed up the program and increase the accuracy by adding the command: use integer; This seems to imply that if you do not use 'use integer;', then all numbers are floats. But some functions like remainders and file sizes seem to require integers, so functions like those may use integers even if you do not use 'use integer;'. big integer bugs: Perl version 5.004_04 built for i386-linux, the version from red hat linux 5.0, has bugs in dealing with integers larger than 2147483647, 0x7FFFFFFF, 2 gigabytes, about two billion. Some integers are 32 bits, and some are 64 bits; some are signed, and some are unsigned. Somehow, perl decides when to use signed and when to use unsigned; and when to use 32 bits and when to use 64 bits; and most important of all, perl tries to remember which are which. Some functions check what kind of integers we have and act appropriately; but some functions just assume that all integers are 32 bit signed. Thus problems occur when integers are more than 2147483647, or less than -2147483648. For example: use integer; $A=2147483648; if ( $A < 0 ) { print("$A is less than 0\n") } # displays 2147483648 is less than 0 '2147483648' is too big to be a signed integer, so perl correctly stores it as an unsigned integer. 
But the less than function, which is probably the same as the subtraction function, incorrectly assumes that all integers are signed, and thinks that $A is -2147483648, which is less than 0. The print function correctly recognizes that $A is an unsigned integer, not a signed integer, and displays $A as '2147483648'. You may be able to work around this if you are sure an integer is unsigned, or if you are sure an integer will never be less than 0: $MostSignificantBits = 0; if ( $A < 0 or $A > 0x7FFFFFFF ) { $MostSignificantBits = $MostSignificantBits + 1; $LeastSignificantBits = $A & 0x7FFFFFFF; } else { $LeastSignificantBits = $A } now you have split an unsigned integer into two signed integers. This might be useful or it might not; it depends on what you wanted to do with the unsigned integer. To recombine the parts: if ( $MostSignificantBits == 0 ) { $A = $LeastSignificantBits } elsif ( $MostSignificantBits == 1 ) { no integer; $A = $LeastSignificantBits | 0x80000000 } else { ??? } If all else fails, do not use 'use integer', and Perl will convert your large integers into floating point numbers. For example: use integer; $A=5000000000; { no integer; $A = $A * 2; } print("$A\n"); # displays 10000000000 These things correctly handle large integers: print(), automatic float-integer conversions. These things do not handle large integers correctly: add, subtract, multiply, divide, remainder, less than, greater than. You can also work around the large integer bugs by using the module /usr/lib/perl5/Math/BigInt.pm like this: use Math::BigInt; $I = new Math::BigInt '256'; print(($I * $I * $I * $I),"\n"); I do not understand it, but it works. When you give Perl a number, you usually type the number into your computer. Anything you type into your computer is a string, so Perl has to convert this string to a number. 
If the number string is NOT in quotes, then Perl converts the number string to a number when Perl loads the Perl script, using the following rules: If the number string begins with 0x, Perl interprets the number string as hexadecimal. If the number string begins with 0 but not 0x, then Perl interprets the number string as octal. Otherwise, Perl interprets the number string as decimal. However, if the number string is in quotes, then Perl treats the number string as a string. Perl will not convert the number string to a number unless you use the number string in a number function. If you do use the number string in a number function, then Perl will interpret the number string as decimal, even if it begins with 0 or 0x, unless you use oct(). Thus: $A=0755;$B='0755'; # for number functions, $A is 493 and $B is 755 # for string functions, $A is '493' and $B is '0755' $A=0x78;$B='0x78'; # for number functions, $A is 120 and $B is 0 # for string functions, $A is '120' and $B is '0x78' Maybe numbers in hexadecimal are integers by default, while numbers in decimal are floats by default. $Remainder = $StartingNumber % $NumberToDivideBy; remainder seems to imply that we are using integers. What happens if we calculate the remainder of a float? $Output = $Input | $MaskOrTemplate; # bit by bit boolean logic OR $Output = $Input & $MaskOrTemplate; # bit by bit boolean logic AND '&' means 'and', like in 'joe & co.' also like '&&' is command 'and': 'command1 && comand2;' is almost the same as 'command1 and command2;' '|' means 'or', like a description of the command line options of a program, where it might say 'option1 | option2', meaning you can use option1 or option2, but not both or neither also like '||' is command 'or': 'command1 || command2;' is the almost same as 'command1 or command2;' bit by bit boolean logic means the first bit of one input and the first bit of the other input are compared and the result is put in the first bit of the output, etc. 
AND means that if both bits are 1, the result is 1; if either or both bits are 0, the result is 0. OR means that if either or both bits are 1, the result is 1; if both bits are 0, the result is 0. AND and OR seem to imply that $Input should be a 32 bit integer. What happens if $Input is an integer of more or less than 32 bits, or a float or a string? $MaskOrTemplate is usually given in hexadecimal; you need 8 hexadecimal digits to make 32 bits. AND and OR do not care if $Input is a signed or unsigned integer. If 'use integer;' is in effect, $Output will be a signed integer. If 'no integer;' is in effect, $Output will be an unsigned integer. (see comments on big integer bugs above) For example: $A = -1; { no integer; $A = $A | 0x00000000; } print("$A"); # displays 4294967295 '| 0x00000000' does not actually change $A; the 32 bits of $A stay the same. But $A is changed from a signed integer to an unsigned integer. $OutputString=substr($InputString,$StartPosition, $NumberOfBytesToExtract); # substr() extracts a substring from the input string, like basic MID$() # For $StartPosition, 0 is the first byte, 1 is the second byte, etc; # -0 is the same as 0; # and -1 is the last byte, -2 is the next to last byte, etc. # If $NumberOfBytesToExtract is negative, then it gives the number of # characters to skip at the end: -1 means skip 1 byte at end (stop 2 bytes # from end), -2 means skip 2 bytes at end (stop 3 bytes from end), etc. # -0 is the same as 0, extracts 0 bytes. # If $NumberOfBytesToExtract is left out, # returns everything from $StartPosition to the end of $InputString. # Note that for $StartPosition, -1 means the last byte; for # $NumberOfBytesToExtract, -1 means the next to last byte. 
$Input=1234567890;
$Output=substr($Input,0,1);   # returns first character, $Output='1'
$Output=substr($Input,-4,1);  # extracts 1 character, starting 4 from end,
                              # $Output='7'
$Output=substr($Input,0,-5);  # start with first byte and skip the last
                              # five bytes, $Output='12345'
$Output=substr($Input,8,-1);  # output is '9'

$CombinedString=$String1.$String2;
# period combines two strings. If String1 is '12' and String2 is '34',
# then $CombinedString is '1234'. Same as
$CombinedString="$String1$String2";

$LengthOfInputString=length($InputString);

$Character=chr($ASCIInumber);
# converts ASCII number to the corresponding character
# the opposite of chr() is ord()
$ASCIInumber=ord($Character);
# converts character to ASCII number
# if $Character is more than one character, ord() uses just the first
# character

$DecimalString=hex($HexadecimalString);
# convert hexadecimal string to decimal string.
# for example, would convert 'af' to '175'.
$DecimalString=oct($OctalString);
# convert octal string to decimal string
# If $OctalString begins with '0x', oct will assume it is hexadecimal,
# and oct will do the same thing as hex.

$IntegerString=int($InputString);  # convert string to integer string
print int '$2.78';  # displays 0
print int '2.78';   # displays 2
print int 'n1n2';   # displays 0
print int '1n2n';   # displays 1

$RandomNumber=rand($Maximum);
# generates a random number from 0 up to, but not including, $Maximum.
# The number is usually not a whole number; use int() if you need one.

$Output=$WhatToRepeat x $NumberOfTimesToRepeatIt;
$Output='x' x 80;   # $Output is 80 x's.

$OutputString=join($Seperator,$String1,$String2,$String3);
# like $OutputString="$String1$Seperator$String2$Seperator$String3"

$OutputString=reverse($String1,$String2,$String3,$String4);
# When the output goes to a scalar, like here, reverse() combines the
# strings and reverses the order of the characters. When the output goes
# to a list, as in print, reverse() reverses the order of the strings
# and leaves the characters in each string alone:
print reverse '123456';             # displays 123456
print reverse('123','456','789');   # displays 789456123
print scalar(reverse('123456'));    # displays 654321

$PositionOfSearchString=index($InputString,$SubStringToSearchFor,
  $PositionToStartSearching);
# searches $InputString for $SubStringToSearchFor, and returns the position
# of the first character of $SubStringToSearchFor.
If $SubStringToSearchFor # is not found, returns -1 # If $PositionToStartSearching is left out, assumes 0 and starts searching # at the first character of $InputString. print index('ifoad','i'); # displays 0 print index('ifoad','o'); # displays 2 # note that most perl functions return 0 or undef if they fail, and # any other number or string if they succeed. But index() returns -1 # if it fails and 0 or more if it succeeds. This makes it tricky to # test index() to see if it succeeded. For example: $A='x';$B='xxxx'; if(index($B,$A)){print("there is a $A in $B")} else{print("there is no $A in $B")} # displays there is no x in xxxx # add '!=-1' and it works $A='x';$B='xxxx'; if(index($B,$A)!=-1){print("there is a $A in $B")} else{print("there is no $A in $B")} # displays there is a x in xxxx $PositionOfSearchString=rindex($InputString,$SubStringToSearchFor, $PositionToStartSearching); # rindex is like index, except it searches backwards, and if # $PositionToStartSearching is not given, it starts at the last character # in the string. Thus it returns the last occurrence of # $SubStringToSearchFor print rindex('ifoad','i'); # displays 0 @PiecesOfInputString = split($SeperatorString,$InputString); @A=split('/','/foo/bar/ugh'); # @A = ('', 'foo', 'bar', 'ugh') # note that $SeperatorString is not included in any of the pieces. 
# also note that the first piece is '', because $InputString began
# with $SeperatorString

The separator string can be a simple string in quotes or a pattern in
slashes:
@A = split(/a/,'fdsadnadsa ');  # @A = ('fds','dn','ds',' ')
But if the separator string or pattern is a space, then slashes and quotes
have different meanings:
@A = split(/ /,' a  1 2');      # @A = ('','a','','1','2')
                                # (there are two spaces between a and 1)
The next example is a special case which splits on both spaces and
newlines, and it throws away any piece which is ''
@A = split(' '," foo\nbar");    # @A = ('foo','bar')
You can mark the separator string with quotes or slashes, but if you use
both, then the inner pair is part of the separator string.
@A = split(/' '/,"a' 'b");      # @A = ('a','b')

$LastCharacter=chop($InputString);
# remove the last character of $InputString.
# Note that this changes $InputString! The main use of
# chop() is to remove the newline on the end of a string returned by
# readline, as in
print("What is your name?\n");
$YourName=<STDIN>;
chop($YourName);
# Some possible variations are:
$YourNameNoNewline=substr($YourName,0,-1);
$YourNameNoNewline=substr($YourName,0,-1)
  if substr($YourName,-1,1) eq "\n";   # safer
chomp($YourName);
# chomp removes the newline at the end of the string if there is one,
# and does nothing if there is not, so it is safer than chop
# If you are a Wizard of Perl with an uncontrollable urge to make your
# programs incomprehensible to mere mortals,
# you can combine readline() and chop() with
chop($YourName=<STDIN>);

$ExitCodeX256 = system($CommandToExecute);
$ExitCode = system($CommandToExecute) >> 8;
system() runs an external program, and returns the exit code of that
program with some other information, like which signal killed it.
To get just the exit code, integer divide by 256 or shift right 8 places.
If you do not want to wait for the other program to exit, or if you
want to run the other program in the background, you have to use fork.
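Here is a sketch of how the number returned by system() can be split up by shifting right 8 places. The sub name decode_status is made up for this example. The child program is the same perl which runs the script ($^X), told to exit with code 3; I am assuming a Unix-like system.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: split the number returned by system() into
# the exit code (high byte) and the low byte with the other information.
sub decode_status {
    my ($Status) = @_;
    my $ExitCode = $Status >> 8;     # same as integer division by 256
    my $LowByte  = $Status & 0xFF;   # signal and core dump information
    return ($ExitCode, $LowByte);
}

# $^X is the perl which is running this script; the child just exits
# with code 3. Giving system() a list runs the program without a shell.
my $Status = system($^X, '-e', 'exit 3');
my ($ExitCode, $LowByte) = decode_status($Status);
print("status=$Status exit code=$ExitCode low byte=$LowByte\n");
```

If the child runs and exits normally, this should display an exit code of 3 and a low byte of 0.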
system() does not flush the output buffers, so in the following commands:
print("foo\n");
system("bar");
# The output of bar may be displayed BEFORE 'foo'. This typically happens
# when standard output is redirected to a file or a pipe.
# If you want 'foo' to be displayed before the output of bar,
# use $OUTPUT_AUTOFLUSH or $|
print("foo\n");
$OUTPUT_AUTOFLUSH = 1;
$OUTPUT_AUTOFLUSH = 0;
system("bar");

$A = system($CommandToExecute) & 0xFFFF;
$ExitCode = $A >> 8;
$CoreDump = $A & 0x80;  # 0 for no core dump, 128 (not 1) for core dumped
$Signal = $A & 0x7F;    # the signal which killed the process
# The perl man page says that if system() fails totally and there is no
# exit code, the exit code will be 255, no core dump and no signal.
The perl man page says to do '$A = system() & 0xFFFF'; this implies that
system returns more than exit code, core dump, and signal; but it does
not say what else is returned. Maybe the extra information is only
available on some operating systems; or maybe there is no additional
information, but there might be additional information in future versions
of perl.

In a perl program which used system() to run another perl program, the
child aborted when it received SIGPIPE, and the parent perl program
thought the exit code was 141, the signal was 0, and no core dump. The
man page says the signal is supposed to be the signal which killed the
process, I know the process was killed by SIGPIPE, so why was the signal 0?
A plausible explanation is that system() ran the command through a shell,
and the shell reported the death of its child as exit code 128 plus the
signal number: SIGPIPE is signal 13, and 128 + 13 = 141. In that case the
process which system() waited for was the shell, which exited normally.

$WhatItLinksTo = readlink($NameOfSymLink);
readlink() finds out what a soft link is linked to.

$SuccessOrFailure = symlink($WhatItLinksTo,$NameOfLink);
symlink() makes a soft link.

$NumberOfFilesSuccessfullyChanged = chmod(0644,'foo');
$NumberOfFilesSuccessfullyChanged = chmod($Permissions,$File1,$File2,$File3);
$NumberOfFilesSuccessfullyChanged = chmod($Permissions,@LotsaFiles);
chmod() changes the permissions of files. It is usually easiest to give
file permissions as an octal number, because in octal the file permissions
are a three digit number, one digit for each set of three permissions.
1 for execute permission, 2 for write permission, 4 for read permission, and add these numbers together for combinations of permissions. For example, a common permission is 0644: 6 equals read plus write, so owner has read and write permission. 4 equals read, so group has read permission, and 4 again for other, so others also have read permission. If the permissions are 0644 and you change them to 0644, is that a successful change, or a failure to change? Remember that Perl thinks 0644 is an octal number, but Perl thinks '0644' is a decimal number. $Perms = 0644; chmod($Perms, 'foo'); # 0644 is octal (right) $Perms = '0644'; chmod($Perms, 'foo'); # 0644 is decimal (wrong) $Perms = '0644'; chmod(oct($Perms), 'foo'); # 0644 is octal (right) If your number for the permissions is too large, chmod() uses the low bytes and ignores the high bytes; this is not an error. $NumberOfFilesSuccessfullyChanged = chown(0,0,'foo'); $NumberOfFilesSuccessfullyChanged = chown($UID,$GID,$File1,$File2,$File3); $NumberOfFilesSuccessfullyChanged = chown($UID,$GID,@LotsaFiles); chown() changes the owner and group of files. $UID is the user number, 0 for root, etc. $GID is the group number, 0 for root, etc. If a file has setuid permission, perl 5.004_04 chown() removes the setuid permission; I think that is a bug, but it may be an undocumented feature. $NumberOfFilesSuccessfullyChanged = utime(939723901,939723901,'foo'); $NumberOfFilesSuccessfullyChanged = utime($TimeOfLastAccess, $TimeOfLastChange,$File1,$File2,$File3); $NumberOfFilesSuccessfullyChanged = utime($TimeOfLastAccess, $TimeOfLastChange,@LotsaFiles); utime() changes the time of last access and the time of last change of a file. The time is the time since the beginning of Unix, 00:00 January 1, 1970 GMT; the unit of time is seconds; thus the time is a large number time() and stat() return an appropriate number. 
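Here is a sketch which sets the times of a file with utime() and reads them back with stat(). The sub name set_and_read_times is made up for this example, and the temporary file comes from the File::Temp module, which I am assuming is installed (it comes with current versions of Perl).

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

# Hypothetical helper: set both times of a file, then read them back.
# Returns (time of last access, time of last change) as given by stat().
sub set_and_read_times {
    my ($File, $Time) = @_;
    utime($Time, $Time, $File) == 1 or die("utime failed on $File: $!");
    my @Info = stat($File);
    return ($Info[8], $Info[9]);   # items 8 and 9 are the two times
}

# Make an empty temporary file which is deleted when the program ends.
my ($FH_Temp, $TempName) = tempfile(UNLINK => 1);
close($FH_Temp);

my ($Access, $Change) = set_and_read_times($TempName, 939723901);
print("access=$Access change=$Change\n");
print(scalar(gmtime($Change)), " GMT\n");   # the same time in readable form
```

Both numbers displayed should be the number which was given to utime().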
($DeviceNumberOfFilesystem,$Inode,$TypePermissions,$NumberOfHardLinks,
 $UID,$GID,$DeviceID,$SizeOfFile,$TimeOfLastAccess,
 $TimeOfLastChange)=stat('dev');
@FileInfo=lstat($NameOfFile);
stat() and lstat() return information about a file. If the file is a
soft link, stat() returns information about the file the soft link is
linked to; lstat() returns information about the soft link.

$TypePermissions is the file type and permissions as a single number.
$Permissions = $TypePermissions & 0x0FFF;
$Type = ($TypePermissions & 0xF000) >> 12;

$DeviceID is 0 unless the file is a block or character device; it is the
device major number times 256 plus the minor number. If you write
$DeviceID as a four digit hexadecimal number, the first two digits are the
major number, the last two digits are the minor number.

$SizeOfFile is the size of a normal file. For a directory, it is the
number of bytes used to store directory information about the files in the
directory. For a link, it is the size of the string which says what file
the link is linked to. For a device, $SizeOfFile is 0.

If you do not need all the information returned by stat(), you do not have
to take it all:
(undef,undef,$TypePermissions)=stat($NameOfFile);
If you want to use the permissions to set the permissions of a file
with chmod(), you can use $TypePermissions; you do not have to convert the
permissions with the file type into the permissions without the file type.

$OutputString = pack($Template,@InputData);
@OutputData = unpack($Template,$InputString);

If a number fits into one byte (8 bits), pack it with 'c' or 'C'.
$A = pack("c",75);   # same as $A = chr(75)
For pack(), c is the same as C. Numbers from 128 to 255 are unsigned,
numbers from -1 to -128 are signed, and for numbers from 0 to 127 it does
not matter.
For unpack(), c means that bytes from 0x80 to 0xFF should be converted
into numbers from -128 to -1; C means that bytes from 0x80 to 0xFF should
be converted into numbers from 128 to 255.
For bytes from 0x00 to 0x7F there is no difference between unpack "c" and
unpack "C".
If a number fits into two bytes (16 bits), pack it with 's' or 'S'.
If a number fits into four bytes (32 bits), pack it with 'l' or 'L'.
($FileSize,$Permissions,$UID,$GID) = unpack('LSCC',$DataStructure);
@BunchOfBytes = unpack("C*",$Input);  # split $Input into an array of bytes

If there are not enough bytes in the input, unpack() returns nothing;
probably it returns undef:
$A=unpack('L','ABC');
# 'L' means 4 bytes, but input string is only 3 bytes, so nothing is
# unpacked.

'unpack(%' calculates checksums:
$Checksum = unpack("%32L*","$Input\0\0\0");
'%' means to calculate a checksum,
'32' means a 32 bit checksum,
'L' means to extract a string of four bytes (32 bits) and calculate the
    checksum on those four bytes,
'*' means to calculate checksums on additional four byte pieces and add
    those checksums to the total checksum,
'\0\0\0' is there in case the length of $Input is not evenly divisible
    by four.
Without '*', the checksum would be for the first four bytes only, not the
whole input string. If 'L' was changed to 'S' or 'C' then there would be
more calculations to do, which would be slower. Without the '\0\0\0', if
$Input was 14 bytes, the checksum would be calculated on the first 12
bytes, and the last 2 bytes would not be checksum protected. If 'L' was
'S' only one '\0' would be needed; if 'L' was 'C' no '\0' would be needed.

On a little-endian computer such as the i386, less significant bytes are
put into the output string before more significant bytes:
print pack('L',0x41424344);   # displays DCBA

$NewSizeOfArray = push(@Array,$NewItem);
$NewSizeOfArray = push(@Array,$NewItem1,$NewItem2);
$NewSizeOfArray = push(@Array,@NewItems);
push() adds one or more items to the end of an array. There are several
other ways to do this, but the man page says this is the most efficient.

$LastItemInArray = pop(@Array);
pop() removes the last item in an array and returns it; returns undef if
the array is empty.
pop() makes the array smaller, because it removes the last item, but if the array is already empty the size of the array does not change. @ScalarsRemoved = splice(@InputArray, $FirstScalarToRemove, $NumberOfScalarsToRemove, @ScalarsToPutInArray); splice() replaces some scalars in an array with other scalars. In reality, we rarely replace scalars; usually we remove scalars or insert scalars. To insert some scalars into an array: splice(@InputArray, $InsertBeforeThisScalar, 0, @ScalarsToInsert); To remove some scalars from an array: @ScalarsRemoved = splice(@InputArray, $FirstScalarToRemove, $NumberOfScalarsToRemove); To get rid of the last scalars in an array: @ScalarsRemoved = splice(@Array,$FirstScalarToRemove); $SuccessOrFailure=open(FH_FileHandleName,"MODE $FileName"); $SuccessOrFailure=open($FH_FileHandleNameName,"MODE $FileName"); # open file $FileName for reading or writing and assign filehandle # FH_FileHandle to it. Filehandles are often given names in all capitals # with underlines, like 'OUTPUT_FILE'; I prefer names beginning with # 'FH_' so I can remember that it is a filehandle. # The following filehandles are open automatically: STDIN, STDOUT, STDERR If MODE is '>', the file is opened for output, if the file does not exist it is created, if the does exist its size is reset to zero. If MODE is '>>', the file is opened for output, if the file does not exist it is created, if the file does exist it will be appended to. If MODE is '<' or nothing, the file will be opened for input. If MODE is '+<', the file will be opened for input and output. There does not have to be a space after the mode. However, in rare situations you may have a file with strange characters in the name. Then the space is important. For example, open(FH,'>>f') could mean to append to a file named 'f', or it could mean to overwrite a file named '>f'. In these rare situations the space is needed. 
I suggest you always use a space between the mode and the filename, and do not worry about when it is needed and when it is not needed. On the subject of opening files with bizarre names, you can not open a file if the first character of the name is a space, but you might be able to use sysopen(). open(FH_mtab,'< /etc/mtab'); open(FH_log,">> $LogFileName"); # right open(FH_log,>> $LogFileName); # wrong, quotes are needed You can open more than one file handle to the same file, or you can duplicate an existing file handle with '&': # use one of the following open()s; # use > if the second open() is >; use >> if the second open is >> #open(FH_1,'> o'); #open(FH_1,'>> o'); print(FH_1 "1"); # use one of the following open()s #open(FH_2,'> o'); # file o will contain 124 #open(FH_2,'> & FH_1'); # file o will contain 124 #open(FH_2,'>& FH_1'); # file o will contain 3124 #open(FH_2,'>> o'); # file o will contain 3124 #open(FH_2,'>> & FH_1'); # file o will contain 124 #print open(FH_2,'>> & FH_1'); # displays 1, so the open() succeeded #open(FH_2,'>>&FH_1'); # file o will contain 3124 #open(FH_2,'>>& FH_1'); # file o will contain 3124 #open(FH_2,'>> &FH_1'); # file o will contain 124 print(FH_1 "2"); print(FH_2 "3"); print(FH_1 "4"); I think it is particularly bizarre the way if you use '&' to duplicate a file handle, and you have a space between the arrows and the '&', then whatever you write to the new file handle disappears into a black hole. In my opinion, this is a bug in Perl. Actually, it does not disappear into a black hole; it was written to files named '& FH_1' and '&FH_1'. Also note that it appears that if you have two file handles to the same file, then whatever you write to the first file handle is saved in memory and not written to the file until the program is done. If the program writes a lot of output to the first file handle, then this could use a lot of memory; and this might be undesirable if the program runs for a long period of time. 
I have not seen comparable behavior in shell scripts, so I am guessing that this is a bug/feature of Perl, not of the kernel. This behavior is slightly different if you have more than one file handle to standard output: print(STDOUT "1"); print("2"); # use one of the following open()s #open(FH_2,'> -'); # displays 12345678 #open(FH_2,'>& STDOUT'); # displays 51234678 #open(FH_2,'>> -'); # displays 12345678 #open(FH_2,'>>& STDOUT'); # displays 51234678 print(STDOUT "3"); print("4"); print(FH_2 "5"); print("6"); print(STDOUT "7"); print("8"); Duplicating a file handle and opening more than one file handle to the same file are the same thing. Personally, I think it is simpler to open a second file handle to the same file, so I prefer to do it that way; but there are some situations where a sub may know the file handle, but not the name of the file. But maybe what you really want is more than one name for the same file handle, like this: open(FH_1,'> o'); print(FH_1 "1"); $FH_2 = 'FH_1'; print(FH_1 "2"); print($FH_2 "3"); print(FH_1 "4"); and file 'o' will contain '1234' The man page hints that file handles are numbers or integers, but Perl does not allow you to treat file handles as numbers. The compiler rejects: FH_2 = FH_1; and: if ( FH_2 == FH_1 ) {} The compiler does allow: open(FH_2,FH_1); but it fails; it returns undefined; it does not open FH_2; you can not write anything to FH_2. Actually, Perl thinks that says to create a file handle named 'FH_2' to read from a file named 'FH_1'; since there is no such file, open() fails. But there is a function fileno() which returns the file descriptor number of a file handle. I think that if you open one filehandle to a file, and then open another filehandle to the file, then the first filehandle is buffered and not written to the file until the second filehandle is closed; but if you duplicate a filehandle, then the data from both filehandles goes to the file right away. 
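The guess above about buffering can be tested with a small experiment. The following is only a sketch: it assumes perl 5.6 or later (for three-argument open() and filehandles stored in 'my' variables), and the sub name two_handles and the scratch file from File::Temp are made up for the demonstration. Each filehandle has its own buffer, which is written to the file when the buffer fills, when the handle is closed, or after every print() if autoflush is turned on.

```perl
use strict;
use warnings;
use IO::Handle;                 # provides the autoflush() method
use File::Temp qw(tempfile);    # scratch file for the demonstration

# Write "1", "3", "4" through two append-mode handles to the same file
# and return what the file contains afterwards.
sub two_handles {
    my ($flush_each_print) = @_;
    my ($tmp_fh, $file_name) = tempfile(UNLINK => 1);
    close($tmp_fh);

    open(my $fh_1, '>>', $file_name) or die "open failed: $!";
    open(my $fh_2, '>>', $file_name) or die "open failed: $!";
    if ($flush_each_print) {
        $fh_1->autoflush(1);
        $fh_2->autoflush(1);
    }
    print($fh_1 "1");
    print($fh_2 "3");
    close($fh_2);        # writes out the buffer of $fh_2
    print($fh_1 "4");
    close($fh_1);        # writes out the buffer of $fh_1

    open(my $in, '<', $file_name) or die "open failed: $!";
    my $contents = <$in>;
    close($in);
    return $contents;
}

my $buffered   = two_handles(0);
my $unbuffered = two_handles(1);
print("buffered: $buffered, unbuffered: $unbuffered\n");
# displays buffered: 314, unbuffered: 134
```

Without autoflush, the "1" waits in the buffer of the first handle until that handle is closed, so it lands after the "3". With autoflush, the data reaches the file in the order of the print()s.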
What happens if you open two read filehandles to the same file?

If you make STDIN and STDOUT duplicates of some other file handles, then that redirects standard input and output. The redirection of standard output and input applies to all child processes created after the redirection.
open(MODEM,"+<",$MODEM_DEVICE);
open(STDOUT, ">& MODEM");
open(STDIN, "<& MODEM");
system('sz file');

$NA = open(FH, "| foo37");
print("$NA\n");
$NA = close(FH);
if ( ! defined($NA) ) { print("undefined\n") }
print("\$NA= $NA\n");
print("\$\!= $!\n");
print(($?/256),"\n");
# output is:
127
foo
$NA=
$!=
37
# foo37 is a shell script as follows:
echo foo
exit 37
# open | means to open a pipe to a process, which implies that we are going to write data to the file handle. We could have done open(FH, "foo37 |") if we wanted to read data from the file handle.
open() returned 127, which is the process id of the child process which was created. 'foo' is the output of foo37. close() set $? to the exit code of the child process times 256; so dividing $? by 256 gives 37, the exit code of foo37. close() returned false for failure because the exit code of the child process was not 0. $! is 0 because other than the nonzero exit code, close() was successful.
$NA and $! are displayed as nothing instead of as 0 because they were used as strings: perl's false value is '' when used as a string and 0 when used as a number, and $! used as a string is the error message, which is empty when the error number is 0. $NA is not undef or it would have displayed 'undefined'.
Note that the output of the child process was displayed, and appears to be part of the parent process's output. Probably the child process's STDOUT goes to the parent process's STDOUT. Also note that the child process did not execute right away; the parent process displayed the process id before the child process displayed 'foo'.
It may be that perl does not allow the child process to execute until close() waits for the child process to finish; but more likely perl has no control over the execution of the child process; and it depends on how the kernel allocates processor time slices. Do not use >, <, etc for file mode when opening a pipe. For example, to read from or write to a compressed file: open(InputFile, "cat inputfile.gz | gzip --uncompress |"); open(OutputFile, "| gzip -9 > outputfile.gz"); Filehandles appear to be strings, and usually shorter strings are faster than long strings, so maybe it would be faster to do: $FileHandleFromInputFile = 'a'; $FileHandleToOutputFile = 'b'; open($FileHandleFromInputFile,"< $NameOfInputFile"); open($FileHandleToOutputFile, "> $NameOfOutputFile"); $SuccessOrFailure = sysopen($FileHandle,$FileName,$Mode,$Permissions); use Fcntl;sysopen($FileHandle_DataPipe,'data',O_CREAT|O_WRONLY); sysopen() is an alternate open() function which allows more modes. In order to use the special modes for sysopen, you have to include 'use Fcntl;'; the special modes are listed but not explained in /usr/lib/perl5/i386-linux/5.00404/Fcntl.pm, /usr/lib/perl5/5.6.0/i386-linux/Fcntl.pm, or something like that. The modes correspond to the modes for the open() system call, and are sort of explained in 'man 2 open'. I suppose sysopen() is the same as syscall(open,...). Note that you can combine two or more modes with '|'. The permissions are optional; the permissions are used only if sysopen() is creating the file, and if you do not provide permissions default permissions are used. $SuccessOrFailure=close(FH_FileHandleName); $SuccessOrFailure=eof(FH_FileHandleName); # $SuccessOrFailure will be 1 if # at the end of the file, or if the file has not been opened. $SuccessOrFailure=print(FH_FileHandleName $WhatToPrint1,$WhatToPrint2); # Note that there is a space and no comma between FH_FileHandleName # and $WhatToPrint1. If FH_FileHandleName is not given, assumes STDOUT. 
print("What is your name?\n"); # displays the string. \n is newline.
# How can perl tell if the first input is what to print, or a filehandle?
print STDOUT; # STDOUT is a filehandle, displays nothing
print "STDOUT"; # STDOUT is not a filehandle, displays STDOUT
$A=STDOUT;print $A; # STDOUT is not a filehandle, displays STDOUT
$A="STDOUT";print $A; # STDOUT is not a filehandle, displays STDOUT
$A="STDOUT";print $A $A; # first STDOUT is a filehandle, second STDOUT is
                         # not a filehandle; displays STDOUT
print() returns 1 if it succeeded, and undef if it did not succeed. I think it is impossible to fail when writing to STDOUT or STDERR. However, if some error occurs part way through writing, like the disk is full, then print() will not return the error right away. Suppose there is 10K of free space on the disk, and you use print() to write 1K at a time. After 10 print()s, the disk will be full. But you might do 12 print()s before you get an error. This is because print() writes to a buffer, and perl writes the buffer to the disk later; perl says print() succeeded, but actually perl does not know if print() succeeded, and will not know until later, and so errors are reported late.
Also note that if you use print() to write a certain amount of data to the buffer, perl may write a different amount of data from the buffer to the file. If you want to know when the disk is full, or if you want to control how much data is written at a time, like if you are writing to a device which requires some exact block size, then use syswrite() instead of print().
Remember that print() returns 1 if it succeeded and undef if it failed; but syswrite() returns the number of bytes written, which may be less than you asked for, and returns undef if there was an error. So test the result of syswrite() with defined(), and compare it to the number of bytes you wanted to write.
syswrite() requires a length in perl 5.0; in perl 5.6 the length can be skipped and syswrite() will write everything.
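Since syswrite() may write fewer bytes than requested, it is usually called in a loop. The following is a minimal sketch, not a definitive implementation; the sub name write_all and the scratch file are made up for the demonstration, and it assumes perl 5.6 or later. It uses the optional fourth parameter of syswrite(), which is the position in the string to start writing from.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);    # scratch file for the demonstration

# Write all of $data to $fh with syswrite(), trying again after a
# partial write. Returns the number of bytes written, or undef if
# syswrite() reported an error (the reason is then in $!).
sub write_all {
    my ($fh, $data) = @_;
    my $total  = length($data);
    my $offset = 0;
    while ($offset < $total) {
        my $written = syswrite($fh, $data, $total - $offset, $offset);
        return undef unless defined($written);
        $offset += $written;
    }
    return $offset;
}

my ($fh, $file_name) = tempfile(UNLINK => 1);
my $count = write_all($fh, "x" x 4096);
defined($count) or die "write failed: $!";
close($fh) or die "close failed: $!";
my $size = -s $file_name;
print("wrote $count bytes, file size is $size\n");
```

If you stay with print(), checking the result of close() is one way to catch an error which was reported late, because close() writes out whatever is still in the buffer.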
printf(FILEHANDLE $Pattern,$Item1,$Item2,$Item3,etc);
%u means unsigned integer, %.nf means print with n decimal places
printf("%u %u %u %u\n",.0000001,.4999999,.5000001,.9999999);
printf("%u %u %u %u\n",(.0000001+.5),(.4999999+.5),(.5000001+.5),
  (.9999999+.5));
printf("%.0f %.0f %.0f %.0f\n",.0000001,.4999999,.5000001,.9999999);
printf("%.2f %.2f %.2f %.2f\n",.00000000001,.00499999999,.00500000001,
  .00999999999);
printf("%.2f %.2f %.2f %.2f\n",(.00000000001+.005),(.00499999999+.005),
  (.00500000001+.005),(.00999999999+.005));
# output:
0 0 0 0
0 0 1 1
0 0 1 1
0.00 0.00 0.01 0.01
0.01 0.01 0.01 0.01
# observe that %f rounds numbers, while %u throws away the decimal, %u can
# be forced to round by adding .5 first, or you can use %.0f.

To read a line of text from an open file handle:
$Line = <FH_FileHandleName>; # the line which was read is $Line
<FH_FileHandleName>; # short for $_ = <FH_FileHandleName>;
<STDIN>; # wait until the user presses enter
$A = <>; # since the name of the file handle is not given, read from the
         # files named on the command line, or from STDIN if there are none
$B = 'FH_FileHandleName'; $A = <$B>; # variable file handle
@EveryLineFromFile=<FH_FileHandleName>; # read every line from a file.
                                        # this may use a lot of memory.
A line read by <> will end with a newline character; except if the file does not end with newline, then the last line read from the file will not end with newline. Usually you use chomp() to remove the newline.
A file is often read like this:
open(FH,"< $FileName");
while ( <FH> ) {
  chomp($_);
  print("$_\n");
}
close(FH);
The while block runs once for each line in the file.
If you changed it to
while ( $A = <FH> ) ...
then the compiler would give a warning that <FH> might be 0.
If you changed it to
while ( defined($A = <FH>) ) ...
then there would not be a warning. I guess that means that perl would rather <> to $_ than to $A or anything else. No, it probably means that perl automatically interprets while(<FH>) as while(defined($_ = <FH>)).
This is because <> might read '0' from the file if the last character in the file was '0' and the next to last character was newline; perl usually interprets '0' as failure, but reading '0' from a file is not a failure to read from the file.
The man page says that readline() is the same as <>, but I have found that the compiler ignores readline(); '$A=readline(FH);' is treated as a comment. I guess this is a bug in Perl. So do not use readline(). If you really want a function name for <>, you could create a sub.

$NumberOfBytesActuallyRead=read(FH_InputFile,$OutputString,
  $NumberOfBytesToRead,$StartingOutputStringPosition);
# reads a specific number of bytes from a file.
# The first byte read from the file goes into position
# $StartingOutputStringPosition of $OutputString, the second byte into
# the next position, etc. Usually $StartingOutputStringPosition is not
# given, in which case 0 is assumed.
# The last byte read from the file becomes the last byte of $OutputString,
# which might make $OutputString longer or shorter than it was before.
I tried leaving out $NumberOfBytesToRead in the hope that it would read until end of file, but the compiler said that was an error. If $NumberOfBytesActuallyRead is 0, that means we are at end of file. If we try to open a file but open() fails, and then we try to read from the file, then read() returns undef.
print read(STDIN,$A,6);
waits for the user to press 6 keys. Note that each key is displayed right after it is pressed. The Perl program does not get any keys until the user presses enter. Enter counts as a key. So this waits until the user presses enter after pressing five keys. This is not caused by Perl; a terminal normally collects a whole line and does not pass keystrokes to the program until you press enter.
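Because read() needs a byte count, reading a whole file means calling it in a loop until it returns 0. The following is a minimal sketch under stated assumptions: the sub name read_whole_file and the chunk size of 16 are made up, and it assumes perl 5.6 or later. Passing length($contents) as the starting position makes each chunk land at the end of what has been read so far.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);    # scratch file for the demonstration

# Read a whole file in fixed-size chunks with read(), appending each
# chunk to the end of $contents.
sub read_whole_file {
    my ($file_name, $chunk_size) = @_;
    open(my $fh, '<', $file_name) or die "cannot open $file_name: $!";
    binmode($fh);
    my $contents = '';
    while (1) {
        my $count = read($fh, $contents, $chunk_size, length($contents));
        defined($count) or die "read failed: $!";
        last if $count == 0;      # 0 bytes read means end of file
    }
    close($fh);
    return $contents;
}

my ($out, $name) = tempfile(UNLINK => 1);
print($out "0123456789" x 5);
close($out) or die "close failed: $!";
my $data = read_whole_file($name, 16);
print(length($data), " bytes read\n");     # displays 50 bytes read
```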
# To set the current file position, which controls which byte from the file # is read next, use $SuccessOrFailure=seek(FH_FileHandleName,$NewFilePosition,MODE); # if MODE is 0, then the new file position is $NewFilePosition # if MODE is 1, then the new file position is the old file position plus # $NewFilePosition # if MODE is 2, then the new file position is the last byte in the file # plus $NewFilePosition. $NewFilePosition should be a negative number. # If you are only reading one character, you could use $Character=getc(FH_FileHandleName); Just as print() writes to a buffer, while syswrite() writes directly to the file; read() reads from a buffer while sysread() reads directly from the file. Since read() reads from a buffer, perl reads ahead and stores the data in the buffer. If you do not want perl to read data before your program is ready for it, or if you want to control how much data is read at a time, then use sysread() instead of read(). Perhaps you have one program reading from another program, and you do not want one program to think it has sent 2K of data, while the other program thinks it has received 78 bytes; or maybe you are reading from a device which requires an exact block size. But you can not use read(), seek(), or eof() on the same filehandle as sysread(). There is a sysseek() for use with sysread(), but there is no syseof(). The only way to detect end of file with sysread() is to keep doing sysread() until it returns 0 as the number of bytes read. You do not have to use sysopen() with syswrite(); you can open the file with open() and write data with syswrite() $SuccessOrFailure=chdir($DirectoryName); # if no directory name given, changes to home directory It does not matter whether or not the directory name ends with '/' $SuccessOrFailure=mkdir($DirectoryName,0755); # create directory. 
'0755' is the permissions, rwxr-xr-x $SuccessOrFailure=rmdir($DirectoryName); # delete directory @ListOfFileNames = glob('*.c'); finds files which match sh pattern *.c in the current directory. If no files match, glob() returns nothing. $SuccessOrFailure = rename($OldFileName,$NewFileName); rename file. If there is already a file named $NewFileName, it will be overwritten. $NumberOfThingsDeleted=unlink($FileName); $NumberOfThingsDeleted=unlink('notes','notes~'); $NumberOfThingsDeleted=unlink(@LotsaFiles); unlink; # same as unlink($_) # delete file, link, device, pipe $SuccessOrFailure=symlink($LinkTo,$NameOfLink); # create symbolic link print("block device\n") if -b "$FileName"; print("character device\n") if -c "$FileName"; print("directory\n") if -d "$FileName"; # it does not matter if $FileName # ends with '/'. if $FileName is '', the directory does not exist. print("exists\n") if -e "$FileName"; # but if $FileName exists, and if # it is a soft link, and if what it links to does not exist; then # -e will say that $FileName does not exist. print("normal file\n") if -f "$FileName"; print("soft link\n") if -l "$FileName"; $FileAge=-M "$FileName"; print("pipe\n") if -p "$FileName"; print("Readable\n") if -r "$FileName"; print("size is more than zero\n") if -s "$FileName"; $FileSize=-s "$FileName"; print("Writable\n") if -w "$FileName"; print("Executable\n") if -x "$FileName"; print("size is zero\n") if -z "$FileName"; # if you want a lot of information about a file, # it may be easier to use stat() Is a soft link to a normal file considered a normal file? If you have a soft link to a normal file, and you do -e to see if the soft link exists, it tells you if the file exists. If the soft link exists, but the file it links to does not exist, then -e will say the soft link does not exist. Therefore, I would assume that for most of the tests, if you test a soft link, it actually tests the file which the soft link links to. 
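The question about soft links can be checked directly. The following is only a sketch, assuming a Unix-like system where symlink() works; the file names and the scratch directory are made up for the demonstration. It tests one soft link which points to a normal file and one which points to nothing.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);   # scratch directory, removed at exit
my $file     = "$dir/real_file";
my $link     = "$dir/link_to_file";
my $dangling = "$dir/dangling_link";

open(my $fh, '>', $file) or die "open failed: $!";
print($fh "data\n");
close($fh);
symlink($file, $link)              or die "symlink failed: $!";
symlink("$dir/missing", $dangling) or die "symlink failed: $!";

my %result = (
    link_is_link        => (-l $link)     ? 1 : 0,
    link_is_normal_file => (-f $link)     ? 1 : 0,
    dangling_is_link    => (-l $dangling) ? 1 : 0,
    dangling_exists     => (-e $dangling) ? 1 : 0,
);
for my $test (sort keys %result) { print("$test: $result{$test}\n"); }
# displays:
# dangling_exists: 0
# dangling_is_link: 1
# link_is_link: 1
# link_is_normal_file: 1
```

So -f follows the soft link to the normal file, while -l looks at the soft link itself, even when what it links to does not exist.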
-l tests the soft link itself, and all the others test the file which the soft link links to.

if ($String1 eq $String2) {
  print('strings are the same');
} elsif ($String1 ne $String2) {
  print('strings are different');
} else {
  print('strings are neither same nor different???!!!');
}
# Note no semicolons except within {}
# arithmetic comparisons             string comparisons
# <    less than                     lt
# >    greater than                  gt
# <=   less than or equal to         le
# >=   greater than or equal to      ge
# ==   equal                         eq
# !=   not equal                     ne
# <=>  compare                       cmp
# compare returns -1 if less than, 0 if equal, 1 if greater than
# compare returns -1 if first string comes first in alphabetical order
# ('a' cmp 'b') # -1
# ('a' cmp 'a') # 0
# ('b' cmp 'a') # 1
# '.' comes before '0', '0' comes before '1', '1' comes before 'A',
# 'A' comes before 'AA', 'AA' comes before 'B', 'B' comes before 'a'

$A='d';
if ($A eq 'a') {print 'a';}
elsif ($A eq 'b') {print 'b';}
elsif ($A eq 'c') {print 'c';}
else {print 'unknown';}

You can have multiple tests in an if by using 'and' or 'or'. Also, you can put commands before the test, but note that the commands must end with a comma, not a semicolon. For example:
if ( $A = substr($B,0,1), $A eq ' ' or $A eq "\t" ) { $B = substr($B,1) }
That example would remove one leading space or tab from $B. It would probably make more sense to change 'if' to 'while', and then it would remove any number of leading spaces and tabs. Note the test is really just a command, and the comma is separating commands in a list of commands.
'or' means to execute the following command if the previous command failed; that means that the first test is checked; if the first test passed, the total test passed and the second test is never checked; if the first test failed, the second test is checked and the result of the second test is the result of the total test. This also applies to 'if', 'while', and 'until'.
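The rule that the second test is never checked when the first test already decides the answer can be seen by recording which tests run. This is a small sketch; the sub name run_test and the array @called are made up for the demonstration. The last two lines show the difference between the string comparison cmp and the arithmetic comparison <=> when they are used with sort().

```perl
use strict;
use warnings;

my @called;      # records which tests actually ran
sub run_test {
    my ($name, $result) = @_;
    push(@called, $name);
    return $result;
}

@called = ();
my $or_result = (run_test('first', 1) or run_test('second', 1));
my @or_calls = @called;          # only 'first' ran

@called = ();
my $and_result = (run_test('first', 0) and run_test('second', 1));
my @and_calls = @called;         # only 'first' ran

my @by_string = sort { $a cmp $b } (10, 9, 100);   # 10 100 9
my @by_number = sort { $a <=> $b } (10, 9, 100);   # 9 10 100
print("@or_calls | @and_calls | @by_string | @by_number\n");
# displays first | first | 10 100 9 | 9 10 100
```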
In the following example:
if ( 3 > 7 and 4 > 9 or 5 > 3 ) { print('ok') } # displays ok
it looks to me like it should check if 3 > 7, find that is not true, then 'and' should tell it to skip everything else since the previous part was not true, and it should not display ok. But that is not what happens. It does display ok. The reason is that 'and' is always evaluated before 'or' (it has higher precedence), so perl reads the test as ((3 > 7 and 4 > 9) or 5 > 3). The moral of the story is, use parentheses when mixing 'and' and 'or':
if ( 3 > 7 and (4 > 9 or 5 > 3) ) { print('ok') } # does not display ok

if ( $A ) { something } # does something if $A is not 0, '0', '', or undef
# So you could have at the beginning of a program:
$No = 0; $Yes = 1;
# and later
if ( something ) { $SomethingHappened = $Yes }
# and at the end
if ( $SomethingHappened ) { ... }
# and the last line is easy to read; it looks like plain english

for ('a','b','c') {
  next if $_ eq 'b';
  print $_;
}
# displays ac
# for is followed by a list of data, and then by a list of commands.
# The list of commands is executed for each item in the list of data,
# with $_ set to the current datum. next means to stop running the list
# of commands and go on to the next datum. Note the curly brackets around
# the list of commands. Note that 'for' does not take a semicolon outside
# of the curly brackets.

$A=1;
while($A<5) {
  print $A;
  $A=$A+1;
  print $A;
}
# displays 12233445
# if the first line is $A=5, the while loop does not execute at all.
# note curly brackets, and no semicolon outside curly brackets.
# note this will not work with 'use strict', because $A is not specifically
# declared as local or global. If 'use strict' is not used, $A is assumed to
# be global because it is not declared local, and it works.
# The following is an example from man:
LINE: while (<STDIN>) {
  next LINE if /^#/; # discard comments
  ...
}

$A = 4;
until ( $A > 5 ) {
  print $A;
  $A = $A + 1;
  print $A;
}
# displays 4556
# if the first line is $A=6, the until loop does not execute at all.
until is the same as 'while ( ! ', while not. Or you can think of it as while is the same as 'until ( ! ', until not.
The test in the ( ) is evaluated before the commands in the { }. Sometimes the test tests the output of the commands, and there is no point in doing the test until after the commands have run. In this case you can put the commands before the test in the ( ) like this:
until ( command1, command2, $A = command3, $A == $CorrectAnswer ) {}
Note the commands end with a comma instead of a semicolon! That looks kind of funny to me, so I prefer to do:
$A = $WrongAnswer;
until ( $A == $CorrectAnswer ) {
  command1;
  command2;
  $A = command3;
}
Note that I set $A to the wrong answer before the until command, because otherwise it would have been uninitialized the first time it was tested, and if I had set it to the correct answer, the loop would not have run at all. You could also do:
do {} until ()

while(){} and until(){} can have last and next. next means to go to the beginning and do the test again and continue from there. last means to go to wherever you go when the test fails. If you have while or until blocks within other while or until blocks, next and last refer to the inner block. But if you use a label, next and last refer to the labeled block.
L_ABlock: while ( $A > 0 ) {
  until ( $B < 0 ) {
    next L_ABlock if $C == -1;
    if ( $C == 10 ) { last }
  }
}
for(){} can probably have last and next too.

while can have no test at all. In this case the loop runs until a last command is run. For example:
$A = 0;
while () {
  $Answer = function($A);
  last if $Answer == $CorrectAnswer;
  $A = $A + 1;
}
or:
while () {
  $A = <STDIN>;
  if ( $A eq "\n" ) { last }
}
But if while is changed to until, then the compiler calls it an error.
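Here is a small sketch of labeled next and last which can be run as is. It uses for(){} blocks, which also confirms that for(){} accepts next and last. The label L_Row and the variable names are made up for the demonstration. The second part shows that do {} until () runs the commands once before the first test.

```perl
use strict;
use warnings;

my @pairs;
L_Row: for my $row (1 .. 3) {
    for my $column (1 .. 3) {
        next L_Row if $column > $row;   # go on to the next $row
        last L_Row if $row == 3;        # leave both loops
        push(@pairs, "$row$column");
    }
}
print("@pairs\n");      # displays 11 21 22

# do {} until () runs the block once before the first test.
my $count = 0;
do { $count = $count + 1 } until ( $count >= 3 );
print("$count\n");      # displays 3
```

Without the label, next and last would only affect the inner for block.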
# defining a subroutine or function sub example_sub { print("first parameter of subroutine/function is $_[0]\n"); print("second parameter of subroutine/function is $_[1]\n"); my ($A,$B)=@_; # make local copies of data passed to sub print("$A$B\n"); return('done'); } $A=&example_sub('a','b'); print("subroutine/function returned $A\n"); # That displays the following: first parameter of subroutine/function is a second parameter of subroutine/function is b ab subroutine/function returned done # note that inside the sub, @_, $_[0], etc, is the data passed to the sub # note '&' as the first character of the sub name when the sub is called, # but not when the sub is defined. # I think names of subs are usually in lower case, like 'get_line' sub x { $_[0]=$_[0]+1;} &x($A); # $A is changed. use my in sub to make # local copies of data if you do not want to change global data. $A='ma';$B='mb'; print($A,$B); &s; print($A,$B); sub s {my $A='sa';local $B='sb';if(0==0){print($A,$B);}} # displays mambsasbmamb # man says that my data is only in current block; local data is in current # block plus blocks called from the current block. This example shows that # blocks which are inside the current block are considered part of the # current block. # if you intend to test to see if a function succeeded, make sure your # function returns something appropriate. 
sub s {print};print(&s); # displays 1, function succeeded sub s {print;return};print(&s); # displays nothing, function failed sub s {print;return(3)};print(&s); # displays 3, function succeeded sub s {print};if(&s){print("yes!")} # displays yes!, function succeeded sub s {print;return};if(&s){print("yes!")} # displays nothing, function # failed sub s {print;return(-1)};if(&s){print("yes!")} # displays yes!, function # succeeded # in perl 5.6 (and probably later, and probably not before) you can pass a # filehandle to a sub, but you cannot use the filehandle from the sub's # array of parameters, so you have to copy the filehandle from the sub's # array of parameters to a local variable before using the filehandle, like # this: sub fooprint { print($_[0] "$_[1] array variable\n"); # DOES NOT WORK!! my $Filehandle = $_[0]; print($Filehandle "$_[1] my variable\n"); # works } &fooprint(STDOUT,'filehandle in no quotes, stored in'); &fooprint('STDOUT','filehandle in single quotes, stored in'); # if you pass two or more arrays to a sub like this: &foobar(@array1,@array2); # the arrays are merged into one array. To keep the arrays seperate, you # could do this: &foobar($#array1,@array1,@array2); # or you could use references like this: @array1 = ('array 1 item 1','array 1 item 2'); @array2 = ('array 2 item 1','array 2 item 2'); sub foobar { # there are three ways of accessing the arrays. all three should compile # the same, the only difference is how much you type and the readability # of the source code. print("${$_[0]}[0]\n"); print("$_[1]->[0]\n"); print("$_[0][1]\n"); } &foobar(\@array1,\@array2); $SuccessOrFailure=opendir(DH_Directory,$DirectoryName); $EntryInDirectory=readdir(DH_Directory); # gets the name of the next file or whatever is in the directory $SuccessOrFailure=rewinddir(DH_Directory); # like close the directory and open it again; the next readdir will get # the first entry again. 
$SuccessOrFailure=closedir(DH_Directory);

$ReturnOrExitCode=do($PerlScriptFileName); # execute a perl script
$SuccessOrFailure=sleep($NumberOfSeconds); # wait for some seconds
$ExitCode=system($Command); # run a non perl command or program
eval($PerlCommands); # execute a string as if it were some perl commands
# Note that if the string contains the command exit, the whole program
# will exit; but if the string contains the command die, then the eval
# function will abort, and the program will continue.

do($FileName); # includes another file in the current program.
# Do not put $FileName in single quotes, or perl will look for a file
# which is really named '$FileName'.
#!/usr/bin/perl -w
do('i');
$y = 'ypassedtodo';
&foo;
print("$x\n");
__END__
file i is:
sub foo {
  print("$y\n");
  $x = 'xpassedfromdo';
}
note that subroutines included with do() can read and write global variables in the main program.

goto LABEL;
LABEL:
# LABEL does not need a semicolon after it because it is not a command.

$NumberOfMatches=grep(Pattern,$Input1,$Input2,$Input3);
@MatchesFromArrayName=grep(Pattern,@ArrayName);
print(grep(/^f/,'foo','hi','fee','arrgh')); # displays foofee
print(grep(!/^f/,'foo','hi','fee','arrgh')); # displays hiarrgh
# Note that the pattern begins and ends with a slash, and no quote marks.
# The slashes are not part of the pattern, but are required.
# I think that the pattern is treated as if it were in double quotes, so
# backslash and dollar sign substitutions are performed.

$SuccessOrFailure = ($StringToBeSearched =~ m/$PatternToSearchFor/);
print('hi'=~m/f/); # displays nothing ('', empty string)
print('hi'=~m/h/); # displays 1 (string 1 or integer 1?)
# note that m// returns '' if the match failed, not 0, not undef.
print('match') if ('hi'=~m/h/); # displays 'match'
print('match') if ('hi'=~m/x/); # displays nothing
print('no match') if (! 'hi'=~m/h/); # displays nothing
print('no match') if (! 'hi'=~m/x/); # WRONG!
displays nothing print('no match') if ('' eq 'hi'=~m/h/); # displays nothing print('no match') if ('' eq 'hi'=~m/x/); # displays 'no match' 'foo/bar/gloop' =~ m|(.*)/(.*)|;print("$1 $2\n"); # displays foo/bar gloop 'foo/bar/gloop' =~ m|(.*?)/(.*)|;print("$1 $2\n"); # displays foo bar/gloop # note that the first '.*' matches as much as possible, unless we add '?', # in which case the first '.*' matches as little as possible. In other # words, the '/' in the pattern matched more than one place # in the string, and perl's default is to take the last match. $A='test';$B='^t.*t$';print('it matches') if ($A=~m/$B/); # displays it matches $SuccessOrFailure=m/Pattern/; # short form of pattern matching, searches $_ $SuccessOrFailure = ($StringToBeSearchedAndMaybeChanged =~ s/$PatternToSearchFor/$ReplacePatternWithThis/); # Note that s/// will only replace the first match, # unless you use the 'g' option $NumberOfSubstitutions = ($StringToBeSearchedAndMaybeChanged =~ s/$PatternToSearchFor/$ReplacePatternWithThis/g); $A = 'sheddher'; print($A =~ m/he/);print($A); # 1sheddher print($A =~ s/he/ha/);print($A); # 1shaddher print($A =~ s/he/ha/g);print($A); # 2shaddhar # (s/pattern/replace-with/) is the same as ( $_ =~ s/pattern/replace-with/) Do not use quotes within m// or s///. m// and s/// imply double quotes; if you add double quotes, the quotes are part of the string. # Perl patterns are not the same as sh patterns; Perl patterns are like # grep patterns. In Perl, the pattern 'a' will # match anything that has an 'a' in it, like 'can', 'ant', 'cola', and 'a'. # This is like the sh pattern '*a*'. The sh pattern 'a' is like the perl # pattern '^a$'. '^' matches the beginning of the string; '$' matches the # end of the string. # The perl equivalent of sh '*' is '.*'. The perl equivalent of sh '?' is '.'. # In perl patterns, control characters, punctuation marks, and just about # everything that is not a letter or number has a special meaning. 
If you # actually want to search for one of these characters, put a backslash before # it. For example, to search for four consecutive slashes, use the perl # pattern '\/\/\/\/', otherwise known as leaning toothpick syndrome. # So the sh pattern 'foo*' would be the same as the perl pattern '^foo', # and also the same as the perl pattern '^foo.*$'. $_=''; m/\/; print $`;print "\n"; # data before the data which matched print $&;print "\n"; # data which matched print $';print "\n"; # data after the data which matched print $1;print "\n"; # data from the first set of parentheses print $+;print "\n"; # data from the last set of parentheses # displays the following: history.html history.html die("All done!\n"); # display message and exit the program. # If the message does not include \n for a newline, then the line number # will be displayed. die("error "); # displays error at [program name] line ### exit($ExitCode); # exit program and set exit code # But if I run a perl script from a sh script, then the sh script always thinks the perl script gave an exit code of 0. Is this a bug in perl 5.004_04? This also happens in bash and ash scripts, but not if I run the perl script from the bash prompt, and not if I run the perl script from another perl script with system(). chdir '/usr/spool/news' or die "Can't cd!\n"; # Note two commands seperated by 'or'. The first command is executed. If the # first command succeeds, the second command is ignored. If the first # command fails, the second command is executed. Probably failure means # the command returned 0, nothing, or undefined; probably anything else # is interpreted as success. 
# unless does the same thing, except it is backwards:
die "Can't cd!\n" unless chdir '/usr/spool/news';
# 'and' executes the second command if the first command succeeded
chdir($NewCurrentDirectory) and print("current directory changed\n");
# if does the same thing except it is backwards
print("current directory changed\n") if chdir($NewCurrentDirectory);
# You can combine commands with 'and' and 'or'.
print('1') and system('true') or print('2'); # displays 12
print('1') and system('false') or print('2'); # displays 1
You can do a whole program this way to create a program which aborts as soon as a command fails, but there will be no error messages, so the user will not know where the program failed. Remember that 'and' and 'or' expect functions to return 0, '', or undef if failed and anything else if success; system() usually returns 0 for success so the meanings of 'and' and 'or' are reversed for system(); watch out for other functions which do not return 0 for failure, like index(). For a complicated block of ands and ors you may have to use parentheses. You can usually make a program shorter by using 'and' and 'or' instead of using if () {} else {}, but you may have to use if () {} else {} to do really complicated things.

$OutputString = sprintf($FormatString,@InputData);
sprintf() formats one or more data into a string, like C printf(). For example:
($sec,$min,$hour,$day,$mon,$year) = localtime;
$mon = $mon + 1; # localtime counts months from 0, so January is 0
until ( $year < 100 ) { $year = $year - 100 }
print(
sprintf('%02u:%02u:%02u %02u/%02u/%02u',$hour,$min,$sec,$mon,$day,$year),
"\n");
# displays 12:59:55 04/03/00

$SuccessOrFailure = exec($ExternalCommand)
runs an external command, like system, except that when the external command is done it does not return to the perl program. In other words, exec() ends the perl program and starts some other program. But if the exec fails, then the perl program does not end. So it is a good idea to put an error handler after exec().
You do not need to check to see if exec() failed; if the error handler is
running, you know the exec() failed. For example:
exec("foo");
print(STDERR "exec failed\n");
exit(1);
The bad news is that if you are using perl -w, then perl will warn you about
that; it will say something like:
Statement unlikely to be reached at ... line ...
(Maybe you meant system() when you said exec()?)
Ignore the warning; I think it is correct to use an error handler. Or you
could do
exec() or &errorNoExec;
where sub errorNoExec {} is the error handler; perl does not warn you about
that.

$PID = fork
does a fork. In other words, it clones the current program process, so that
it becomes two processes. These two processes are exactly alike; they are the
same program, executing at the same point; they have separate memory, but
their memory blocks have the same contents; the only difference between the
two is that in one $PID is 0, and in the other $PID is something other than 0.
The one in which $PID is not 0 is the parent process; it has the same process
id as the original program; and it has $PID set to the process id of the
child process. The child process has $PID set to 0; it is a copy of the
parent process, with a new process id.

Immediately after executing the fork, the two processes are doing the same
thing. I cannot imagine why you would want two processes doing the same
thing; you probably want the two processes to do different things. So after
doing a fork you probably want to test $PID, and do one thing if $PID is 0,
and do something else if $PID is not 0. If the fork failed, $PID will be
undef; it would not hurt to check for that too. For example:
$NA = fork;
if ( ! defined($NA) ) { print(STDERR "fork failed\n"); exit(1) }
if ( $NA == 0 ) {
  # child does this
  exec("foo") or &errorNoExec;
}
# parent does this
print("process id of child is $NA\n");

Often when you use fork, you want the child process to be a completely
different program, so you often use exec() after fork to end the current
program and start some other program.

I wrote a perl program which forked, and the parent exited before the child;
and it seemed like the prompt did not come back until the child exited; it
seemed like the parent would not exit until after the child exited. But I
wrote a simple perl program to test this; it forked, the parent exited right
away, the prompt came back right away, and the child exited later. So I do
not know.

If a program has to do two things at once, then it is probably a good idea to
fork into two processes, and then one process can do one thing and the other
process can do the other thing. For example, if you are writing a terminal
program, then the terminal program has to read data from standard input and
write it to the modem, and at the same time it has to read data from the
modem and write it to standard output. But the terminal program does not know
if the next byte of data will appear at the modem or at standard input, so
what happens if it is waiting for data at the modem, and data appears at
standard input? If you have more than one source of input, and you wait for
data at one input, then the other source of input must be able to interrupt
the wait. But if you fork the program, then you can have one process waiting
for each input, and you do not need to bother with interrupts. Your terminal
program could have one process reading data from standard input and writing
it to the modem; and one process reading data from the modem and writing it
to standard output.
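The idea of two processes doing different jobs can be sketched as follows. This is only an illustration under some assumptions: a pipe stands in for the modem so the example can run anywhere, and the handle names READER and WRITER are made up. The parent also calls waitpid() to collect the child when it is done.

```perl
#!/usr/bin/perl -w
use strict;

# A pipe stands in for the modem in this sketch (an assumption). It is
# opened before the fork, so both processes share the same pipe.
pipe(READER, WRITER) or die "pipe failed: $!\n";

my $PID = fork;
if ( ! defined($PID) ) { die "fork failed: $!\n" }

if ( $PID == 0 ) {
    # child does this: write one line to the pipe, then exit
    close(READER);
    print(WRITER "hello from child $$\n");
    close(WRITER);
    exit(0);
}

# parent does this: read what the child wrote
close(WRITER);
my $Line = <READER>;
close(READER);
print("parent read: $Line");

# wait for the child to finish and collect its exit code
waitpid($PID, 0);
print("child exit code: ", $? >> 8, "\n");
```

Each process closes the end of the pipe it does not use; otherwise the reader might never see end of file.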
Note that the terminal program should probably open the modem before the
fork, so the modem would only be opened once, because if both processes
opened the modem independently, then the second open might block the first
open; the child should not compete with the parent for standard input, so the
child should read the modem and the parent should read standard input; and to
exit the program you have to exit both processes, so the parent should kill
the child before exiting.

If you were writing a BBS communications program instead of a terminal
program then fork would not work so well, because the process which was
writing data to the modem would have to watch both standard input and the
modem, because sometimes it would write a response to the modem in response
to a prompt it read from the modem, and sometimes it would write something to
the modem because it read that from standard input.

If a perl script forks, and if both processes try to read from standard
input, and if there is some data at standard input, then sometimes the parent
gets the data and sometimes the child gets the data. It appears to be random.
If only one of the processes tries to read from standard input, then the
process which reads standard input will get the data from standard input.

If you are at the command prompt, and you run a perl script, and the perl
script does fork(), the command prompt comes back after the parent exits. It
does not matter whether or not the child has exited. If the child has not
exited, and if the child tries to read standard input, then the next line you
type might go to the shell, or it might go to the child.

If you need to write data and watch for SIGPIPE, you can not do that with two
processes because SIGPIPE always goes to the process which is writing data.

The man pages for perl say that it is important to harvest zombies.
This means that when a child process exits, it does not disappear
completely; it stays in the process table as a zombie, doing nothing, until
the parent process collects its exit status with wait() or waitpid().

signals

If your perl script is getting signals, and you want to control how it
responds to signals, at the beginning you can put 'use sigtrap;', although
setting %SIG works without it. Then for each signal you want to trap, set
$SIG{signal name} to a reference to the signal handler subroutine. For
example, to use a subroutine named 'signalTrap' as the signal handler to trap
SIGPIPE: '$SIG{PIPE} = \&signalTrap;'. Then create the signal handler
subroutine:
sub signalTrap {
  if ( $_[0] eq 'PIPE' ) { $SigPipe = 1 }
}
# Note that $_[0] of the signal handler subroutine is set to the name of the
# signal. If you are trapping more than one signal, you only need one signal
# handler subroutine.

Perl already has a default response to some signals, usually to abort the
program. If you set up a signal handler for a signal, perl will run your
signal handler. In my tests, what happened after my signal handler ran
depended on the signal. For SIGPIPE, perl did NOT run the default response
after my signal handler. For SIGINT, the program DID abort after my signal
handler ran. SIGINT is the signal sent to your program if the user presses
control-c while your program is running. So if your program traps SIGINT,
then when the user presses control-c, your signal handler will run, and then
the program may still abort.

A signal could interrupt your program at any time; your program could be
doing anything when it is interrupted by a signal, including running some
external program or calling some external library function. Most things have
no problem being interrupted, but a few things do not like to be interrupted.
In particular, there may be some external library functions which may crash
if they are interrupted and the interrupt handler calls the same function.
Bugs like these are very unpredictable and hard to detect and fix. Therefore,
an interrupt handler should do as little as possible before returning to
whatever was interrupted. Ideally, the interrupt handler should set some
global variable to indicate that the interrupt occurred; and the main program
should check this variable frequently, and run the real interrupt handler and
reset the variable if an interrupt has occurred. I think it is safe for an
interrupt handler to display error messages and exit.

I do not know any way to interrupt a program when there is data waiting at
standard input or a serial port. But you could fork a program into two
processes, and one process could read from input into a shared memory buffer,
and the second process could check the buffer for data, or the first process
could send a signal and interrupt the second process.
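The advice about setting a global variable in the handler and checking it from the main program might look like the following sketch. It makes some assumptions: it uses SIGUSR1 because a program can easily send that to itself for a demonstration, and the names signalFlag, handleUsr1, and $GotUsr1 are made up.

```perl
#!/usr/bin/perl -w
use strict;

# Flag variable: set by the signal handler, checked by the main program.
my $GotUsr1 = 0;

# The signal handler does as little as possible: it only sets the flag.
sub signalFlag { $GotUsr1 = 1 }
$SIG{USR1} = \&signalFlag;

# Hypothetical "real" handler, called from the main program, where it is
# safe to do more work.
sub handleUsr1 {
    print("SIGUSR1 was received\n");
}

# For the demonstration, the program sends the signal to itself.
kill('USR1', $$);

# Main loop: check the flag often, reset it, and run the real handler.
my $Handled = 0;
for my $Step (1..3) {
    if ( $GotUsr1 ) {
        $GotUsr1 = 0;
        handleUsr1();
        $Handled++;
    }
    # ... the normal work of the program would go here ...
}
print("handled $Handled signal(s)\n");
```

The flag is reset before the real handler runs, so a signal which arrives while the real handler is working is not lost.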
If you do not know what signals your program is getting, the following code
will tell you:
use sigtrap;
$SIG{HUP} = \&signalTrap;
$SIG{INT} = \&signalTrap;
$SIG{QUIT} = \&signalTrap;
$SIG{ILL} = \&signalTrap;
$SIG{TRAP} = \&signalTrap;
$SIG{IOT} = \&signalTrap;
$SIG{BUS} = \&signalTrap;
$SIG{FPE} = \&signalTrap;
$SIG{KILL} = \&signalTrap;   # SIGKILL can not be trapped, so this has no effect
$SIG{USR1} = \&signalTrap;
$SIG{SEGV} = \&signalTrap;
$SIG{USR2} = \&signalTrap;
$SIG{PIPE} = \&signalTrap;
$SIG{ALRM} = \&signalTrap;
$SIG{TERM} = \&signalTrap;
$SIG{CHLD} = \&signalTrap;
$SIG{CONT} = \&signalTrap;
$SIG{STOP} = \&signalTrap;   # SIGSTOP can not be trapped, so this has no effect
$SIG{TSTP} = \&signalTrap;
$SIG{TTIN} = \&signalTrap;
$SIG{TTOU} = \&signalTrap;
$SIG{URG} = \&signalTrap;
$SIG{XCPU} = \&signalTrap;
$SIG{XFSZ} = \&signalTrap;
$SIG{VTALRM} = \&signalTrap;
$SIG{PROF} = \&signalTrap;
$SIG{WINCH} = \&signalTrap;
$SIG{IO} = \&signalTrap;
$SIG{PWR} = \&signalTrap;
sub signalTrap {
  print(STDERR "SIG$_[0] received\n");
}

To make a connection to an internet socket, like a client connecting to a
server, you need the socket domain, the socket type, the socket protocol, the
name or number of the remote computer, the port, and a file/socket/whatever
handle. It would be nice to have a function where you give the data in a
convenient format, and the function does everything. There is no such
function, although you could create a sub. You need to use two functions,
socket() and connect(), and the data must be given in inconvenient formats,
so you must use other functions to convert the data.

The socket domain, the socket type, and the socket protocol are integers. But
you should give the socket domain, the socket type, and the socket protocol
by name because the code is easier to understand, and the code is more
portable because different operating systems may use different numbers, but
will use the same names. For the socket domain and the socket type, you can
give the name and perl will automatically convert to the number.
For the socket protocol, you have to use the function getprotobyname() to
convert the name to the number. The socket domain is usually PF_INET, so if
you do not know the socket domain, assume PF_INET. The socket type is usually
SOCK_STREAM, so if you do not know the socket type, assume SOCK_STREAM. The
socket protocol is usually tcp, so if you do not know the socket protocol,
assume tcp. The man page says getprotobyname() works differently in a list
context, so if you nest getprotobyname() in socket(), as in
socket(A,B,C,getprotobyname('tcp')), then it might not work.

use Socket;
$SocketProtocol = getprotobyname('tcp');
socket(SOCKET_HANDLE, PF_INET, SOCK_STREAM, $SocketProtocol)
  or die "socket failed: $!\n";
connect(SOCKET_HANDLE,
  pack_sockaddr_in($Port, inet_aton($NameOrNumberOfRemoteComputer)))
  or die "connect failed: $!\n";
# read from and/or write to SOCKET_HANDLE,
# same as reading from or writing to a file handle
close(SOCKET_HANDLE);
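Since there is no single function which does everything, here is a sketch of the kind of sub you could create to wrap socket() and connect(). The name connectTcp is made up, and the sub assumes the usual choices described above (PF_INET, SOCK_STREAM, tcp); it is one possible way to do it, not the only one.

```perl
#!/usr/bin/perl -w
use strict;
use Socket;

# connectTcp is a hypothetical helper, not a built-in function.
# It takes the name or number of the remote computer and the port,
# and returns a connected socket handle. It dies if anything fails.
sub connectTcp {
    my ($Host, $Port) = @_;

    # scalar context, so getprotobyname() returns only the number
    my $Protocol = getprotobyname('tcp');
    defined($Protocol) or die "unknown protocol tcp\n";

    # convert the name or dotted number to a packed address
    my $Address = inet_aton($Host)
        or die "can not find host $Host\n";

    socket(my $Handle, PF_INET, SOCK_STREAM, $Protocol)
        or die "socket failed: $!\n";
    connect($Handle, pack_sockaddr_in($Port, $Address))
        or die "connect to $Host port $Port failed: $!\n";

    return $Handle;
}

# Example of use (the host and port are placeholders):
#   my $Socket = connectTcp('localhost', 7);
#   print($Socket "hello\n");
#   close($Socket);
```

The returned handle can be read from and written to like a file handle, and should be closed with close() when you are done.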