
Perl Information Center Tutorials - File Handling
These tutorials were written to help you get a quick but thorough understanding of Perl: the scope of the language as well as its specific capabilities.


Files
Perl offers more functions that deal with files, I/O, and directory management than any other capability it supports. This wide variety of file functions is supplemented by some of the most terse (does more in fewer words - that's good!) syntax of any language, making file I/O operations extremely easy.

    • Open/Close
     close, flock, open, select
    • Random Access   
     eof, getc, read, seek, tell
    • Properties
     chmod, unlink, rename, stat, truncate
    • Print/Write
     print, printf, write, format

Opening/Closing Files
In Perl a file has to be opened and given a filehandle before the content can be accessed or before information can be written to the file. There are three modes in which a file can be opened - read-only, overwrite, or append, as shown in this sample code:

     open(INFILE,  "<input.txt");     close INFILE;    # read only
     open(OUTFILE, ">output.txt");    close OUTFILE;   # overwrite
     open(LOGFILE, ">>append.txt");   close LOGFILE;   # append
     

Note that for read only, the < in the quotes is optional, so that "input.txt" is acceptable.

Perl will close all files when the program ends, but a file can also be closed explicitly as follows:

     close INFILE;

Closing a file explicitly is sometimes required to reset some of Perl's special variables, such as $. (line counter).
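
The samples above do not check whether open succeeded. Here is a minimal sketch of the same three modes with error checking. It swaps in the three-argument form of open with lexical filehandles ($out, $log, $in), which is a different style from the bareword filehandles used in this tutorial. The file name and the scratch directory are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory so the sketch does not touch real files.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/append.txt";    # hypothetical file name

# Overwrite mode: creates the file, or empties it if it already exists.
open(my $out, '>', $path) or die "Cannot open $path for writing: $!";
print $out "first line\n";
close($out) or die "Cannot close $path: $!";

# Append mode: new data goes after whatever is already there.
open(my $log, '>>', $path) or die "Cannot open $path for appending: $!";
print $log "second line\n";
close($log) or die "Cannot close $path: $!";

# Read-only mode: the file must already exist.
open(my $in, '<', $path) or die "Cannot open $path for reading: $!";
my @lines = <$in>;
close($in);

print "Read ", scalar(@lines), " lines\n";
```

When open fails it returns false and puts the reason in $!, so the "or die" stops the program with a useful message instead of silently reading nothing.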

Reading From Files
In Perl there are three basic ways to read a file, depending on how much data you want to read at a time.

  • line - <> operator
    Once a file is opened, the <> operator is used with a filehandle to read one or more lines of text from a file, as in these examples:
         $line = <INFILE>;          # scalar context - reads 1 line
         @lines = <INFILE>;         # list context - reads all lines, \n included
         @lines[0..5] = <INFILE>;   # list context - reads all remaining lines, keeps the first six
         $/=undef;$var = <INFILE>;       # read all lines as a single string
         $var = do { local $/; <INFILE> };  # safer way to read all lines as string
         

    The while loop is often used to walk through a file, one line at a time, with action being taken on each line.

         open(INFILE,  "<input.txt");
         while (<INFILE>) {               # each line assigned to $_
             print "Line $. : $_";        # $. special variable counts lines
         }
         close INFILE;
         
  • character - getc function
    With the getc function, a single character can be acquired. This is typically used to capture keyboard input, rather than getting data from a file, but it works for either.
         $var = getc(STDIN);    # from STDIN (typically the keyboard)
         $var = getc(IN);       # from a file
         

  • fixed length - read function
    Some files, particularly databases, are divided into records which contain a fixed number of characters per record (newlines are not used, especially not as a record separator). To capture a fixed number of characters, Perl provides the read function. It places the characters in a scalar variable and returns the number of characters actually read.
         $count = read(IN, $buffer, $number_characters);
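
To make the getc and read functions concrete, here is a small sketch that reads a file of fixed-length records. The record length of 4, the sample data, and the variable names are assumptions made for illustration. It uses lexical filehandles and a scratch file from File::Temp rather than a real data file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small scratch file of three 4-character records, no newlines.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "AAAABBBBCCCC";
close($fh);

open(my $in, '<', $path) or die "Cannot open $path: $!";

# getc returns one character at a time.
my $first = getc($in);

# Go back to the start of the file before reading whole records.
seek($in, 0, 0) or die "seek failed: $!";

# read fills $buffer and returns how many characters it actually read;
# it returns 0 at end of file, which ends the loop.
my $record_length = 4;    # assumed record size
my $buffer;
my @records;
while (read($in, $buffer, $record_length)) {
    push @records, $buffer;
}
close($in);

print "First character: $first\n";
print "Records: ", join(",", @records), "\n";
```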
         

Printing to a File
There are three functions which provide the ability to write to files, each providing different formatting capabilities.

  • print
    The print function allows writing of data to a file. Strings are written as defined by the program, whereas numbers use Perl default formatting. The print function is used with a filehandle followed by a list of variables (no comma between the two).

         print OUTFILE $text;          # write one line of text to file
         print OUTFILE @array;         # write array into file
         print OUTFILE $var1, $var2;   # write 2 variables to the file
         

  • printf
    The printf function works like the print function, except that it takes a format string as its first argument, which controls how each member of the list is printed.

  • write
    The write function allows the user to define multi-line formats, suitable for printing reports, particularly those which contain multiple lines using a common format. A separate tutorial on the write function and on creating formats is available, but here's a short example which defines a one-line report format that prints three variables. The program then assigns values to each variable, opens the file 'myreport.txt', and uses the write function to print the report to it.
        format OUTFILE =
        Test: @<<<<<<<< @||||| @>>>>>
              $x,       $y,    $z
        .
        $x = "dog"; $y = "cat";  $z = "pig";
        open(OUTFILE, ">myreport.txt");   # the format is looked up by the filehandle's name
        write OUTFILE;
        close OUTFILE;
    
        OUTPUT (exact spacing depends on the field widths in the picture line):
        Test: dog       cat    pig
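
The printf function has no example above, so here is a minimal sketch of it writing a two-column report to a file. The item names, prices, and column widths are made up for illustration, and a scratch file from File::Temp stands in for a real report file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($out, $path) = tempfile(UNLINK => 1);

# Hypothetical data: item names and prices.
my %price = (apple => 0.5, melon => 3);

# %-8s left-justifies a string in 8 columns; %6.2f prints a number
# right-justified in 6 columns with 2 decimal places.
for my $item (sort keys %price) {
    printf $out "%-8s %6.2f\n", $item, $price{$item};
}
close($out) or die "Cannot close $path: $!";

# Read the report back to show what was written.
open(my $in, '<', $path) or die "Cannot open $path: $!";
my @report = <$in>;
close($in);
print @report;
```

As with print, there is no comma between the filehandle and the format string.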
    

File Position
Perl keeps track of the current position in a file, which is where the next read or write will take place. The tell function returns that position and the seek function can be used to reset the position to anywhere within the file.

     $position = tell IN;
     $success = seek (IN, $position, $action);   # returns 1 on success, 0 on failure

The $action tells Perl how to use $position (0 for an absolute position from the start of the file, 1 for relative to the current position, and 2 for relative to EOF).
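
Here is a short sketch of seek and tell working together on a ten-character scratch file. It uses the SEEK_SET, SEEK_CUR, and SEEK_END constants from the standard Fcntl module, which are named equivalents of the 0, 1, and 2 values described above. The sample data and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);             # SEEK_SET (0), SEEK_CUR (1), SEEK_END (2)
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "0123456789";
close($fh);

open(my $in, '<', $path) or die "Cannot open $path: $!";

# Absolute: move to offset 4, then read 3 characters.
seek($in, 4, SEEK_SET) or die "seek failed: $!";
read($in, my $chunk, 3);
my $after_read = tell($in);      # position just past the characters read

# Relative to EOF: move to 2 characters before the end.
seek($in, -2, SEEK_END) or die "seek failed: $!";
read($in, my $tail, 2);

# Relative to the current position: rewind, then skip ahead 2.
seek($in, 0, SEEK_SET) or die "seek failed: $!";
seek($in, 2, SEEK_CUR) or die "seek failed: $!";
my $pos = tell($in);
close($in);

print "chunk=$chunk after_read=$after_read tail=$tail pos=$pos\n";
```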

File Deletion/Properties
Other functions are provided to help manage files. The unlink function will delete a file, chmod will set the permission properties, and stat will return a 13-element list of file properties (including file size and modification date).

     unlink "input.txt";              # deletes the file
     chmod 0755, "input.txt";         # sets permissions for the file (mode is octal)
     @properties = stat ($filename);  # returns 13-element list of file properties
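
The following sketch runs these functions, plus rename, against a scratch file and checks their return values. The file names are hypothetical, and the -e file test used at the end to confirm the deletion is not covered in this tutorial.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $old = "$dir/input.txt";      # hypothetical file names
my $new = "$dir/renamed.txt";

open(my $out, '>', $old) or die "Cannot create $old: $!";
print $out "hello";              # 5 bytes, no newline
close($out) or die "Cannot close $old: $!";

# chmod returns the number of files it changed.
chmod(0644, $old) == 1 or die "chmod failed: $!";

# In the list stat returns, element 7 is the size in bytes
# and element 9 is the last-modified time.
my ($size, $mtime) = (stat($old))[7, 9];

rename($old, $new) or die "rename failed: $!";

# unlink returns the number of files it deleted.
my $deleted = unlink($new);
my $still_there = -e $new ? 1 : 0;

print "size=$size deleted=$deleted still_there=$still_there\n";
```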

More on the FileHandle Operator
The <> operator is used in the examples above. Typically a filehandle (or variable containing a filehandle) is placed in the operator, which returns a single line when used in a scalar context and which returns all lines of the file when used in a list context - as shown in the next two examples.

     $a = <IN>;    # returns a single line (scalar context)
     @a = <IN>;    # returns all lines (list context)

If the content of the <> operator is not a filehandle, then Perl treats the content as a pattern to be globbed (returning all filenames which fit the pattern), as in the following example.

     while ( <*.txt>) {   # <> operator puts filenames matching *.txt into $_
        print ;           # prints each filename
     }

Here are code snippets showing various ways in which the filehandle operator may be used.

     while (<>)             # returns 1 line at a time
     print <>               # prints all remaining lines (print supplies list context)
     for ( <*.txt> )        # returns filename each loop
     for (`dir *.txt /b`)   # returns filename each loop (uses the Windows dir command)
     for (glob '*.txt')     # returns filename each loop

If the <> is empty then Perl takes one of two actions. If the special array @ARGV (contains the command line arguments) is not empty, its contents are assumed to be filenames, whose content is read by the <> operator.

In this next example the entire content of both files is printed as a result of using the empty <> operator.

     @ARGV = ('input1.txt','input2.txt');
     while (<>) {        # feed one line at a time to $_
         print;          # print $_ by default
     }

If the special array @ARGV is empty, then input is read from STDIN.
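
To see the empty <> operator walk through @ARGV, here is a self-contained sketch that creates two small scratch files and then reads both through <>. The file names and contents are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create two scratch files, two lines each.
my $dir   = tempdir(CLEANUP => 1);
my @names = ("$dir/input1.txt", "$dir/input2.txt");
for my $i (0 .. $#names) {
    open(my $out, '>', $names[$i]) or die "Cannot create $names[$i]: $!";
    print $out "file ", $i + 1, " line $_\n" for 1 .. 2;
    close($out) or die "Cannot close $names[$i]: $!";
}

# The empty <> operator opens each name in @ARGV in turn
# and removes it from the array as it goes.
@ARGV = @names;
my @seen;
while (my $line = <>) {
    chomp $line;
    push @seen, $line;
}

print "$_\n" for @seen;
print "Names left in \@ARGV: ", scalar(@ARGV), "\n";
```

Because <> removes each filename from @ARGV as it opens the file, the array is empty once the loop finishes.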

File Functions Reference
Here's a quick reference of the available file functions, in alphabetical order.

Several of these functions (print, stat, and unlink among them) operate on $_ when no argument is given.

  • chmod - changes file permissions
        chmod LIST                      
        chmod (0755, 'a.txt', 'bt.txt') 
        
  • close - close an open file
        close FILEHANDLE
        
  • eof - test for end of file
        eof FILEHANDLE      # without FILEHANDLE, applies to last file read
        
  • flock - lock file
        flock FILEHANDLE, OPERATION  # LOCK_SH, LOCK_EX, LOCK_UN, LOCK_NB
        
  • format - declares a format for writing to a file
        format Name = 
        FORMLIST
        # comment    # FORMLIST comments must have # in column 1
        .            # must be in column 1
        
  • getc - get the next character from a file
        getc FILEHANDLE
    
        $result = getc $INFILE
        
  • open - open a file
        open FILEHANDLE, FILENAME
        
  • print - output a list to a filehandle
        print FILEHANDLE LIST      # no comma after FILEHANDLE
        
  • printf - output a formatted list to a filehandle
        printf FILEHANDLE FORMAT, LIST
        
  • read - read specified number of characters from a file
        read FILEHANDLE, SCALAR, LENGTH, OFFSET
        
  • rename - change a filename
        rename OLDNAME, NEWNAME
        
  • seek - reposition file pointer for random access I/O
        seek FILEHANDLE, POSITION, WHENCE
        
  • select - reset default output or do I/O multiplexing
        select FILEHANDLE
        
  • stat - return file attributes/status
        stat FILEHANDLE
        stat EXPR
    
        ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
           $atime,$mtime,$ctime,$blksize,$blocks)
               = stat($filename);
        
  • tell - get current seekpointer on a filehandle
        tell FILEHANDLE
        
  • truncate - shorten a file
        truncate FILEHANDLE, LENGTH
        
  • unlink - delete a list of files
        unlink LIST
    
        unlink <*.txt>
        unlink ("a.txt", "b.txt", "c.txt")
        
  • write - write a formatted record to a FILEHANDLE
        write FILEHANDLE
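
Of the functions listed above, flock and eof are the only ones not shown in use earlier in this tutorial. Here is a minimal sketch of both, using the LOCK_EX and LOCK_SH constants from the standard Fcntl module. The scratch file stands in for a shared log file, and the entry text is made up. Locks taken with flock are only honored by other programs that also call flock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);            # imports LOCK_SH, LOCK_EX, LOCK_UN, LOCK_NB
use File::Temp qw(tempfile);

# Scratch file standing in for a shared log (hypothetical).
my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

# Writer: take an exclusive lock before appending.
open(my $log, '>>', $path) or die "Cannot open $path: $!";
flock($log, LOCK_EX) or die "Cannot lock $path: $!";
print $log "entry 1\n";
print $log "entry 2\n";
close($log) or die "Cannot close $path: $!";   # closing also releases the lock

# Reader: a shared lock allows other readers while blocking
# cooperating writers that ask for LOCK_EX.
open(my $in, '<', $path) or die "Cannot open $path: $!";
flock($in, LOCK_SH) or die "Cannot lock $path: $!";
my @entries;
until (eof($in)) {               # eof is true once no data remains
    my $line = <$in>;
    chomp $line;
    push @entries, $line;
}
close($in);

print scalar(@entries), " entries\n";
```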
        

If you have any suggestions or corrections, please let me know.