UNIX/CGI/Perl
It's very common for web sites to generate web pages on the fly, using
information in a server database to create and display HTML pages on
demand - dynamic HTML. The pages are created by a program that runs on
the server. Since VB does not work in UNIX, and since most servers run UNIX,
VB programmers are out of luck.
Fortunately, UNIX does support the very popular freeware language "Perl",
which I call the "QBasic of UNIX". Perl is an interpreted language, like
QBasic, with which you can
write programs to run on UNIX servers and which you can call from your web
pages. Perl is functionally similar to QBasic and the basics of Perl are
fairly easy to learn. This part of the tutorial briefly covers UNIX and Perl.
UNIX Overview
Fact is, most web servers use the UNIX operating system.
If your ISP operates from a UNIX machine,
then you cannot use Microsoft Windows/NT products on the server. Either
you find a new ISP who offers an NT server, or it's time to pick up a few new
programming skills.
One quick comment - Perl has been ported to the PC. I develop
my scripts in Win98, then transfer them to my UNIX server. You do not
have to have UNIX locally to develop your scripts. I also test out the
scripts on the UNIX environment using the Telnet utility that comes with
Win98 (remote access to the UNIX server from my PC).
When UNIX is running on a computer it presents the user with
a command line (just like DOS) from which commands may be typed in for
immediate execution. Just like DOS, you can type in single commands,
create batch files, or initiate programs. If you prefer a graphical
interface, there is also a program called XWindows which gives a UNIX
machine an interface similar to the one Microsoft Windows/NT gives to PCs.
While many of you might have a server (UNIX machine) of your own, most
of you reading this tutorial will be working with a remote server - one
that your ISP maintains for you. Fortunately, you can access the remote
server (where your web site is located) using a variety of software
applications. A very common application called Telnet comes free with
Win9x/NT. Telnet works from within a DOS window. Just type in "telnet"
at the DOS prompt.
When accessing a server remotely, you would use Telnet to create a
connection to the server. Once connected, Telnet presents you with
a window on your home PC which lets you interact with the server just as
though you were sitting right in front of it. You will notice a delay
between your entries and the response of the UNIX server due to the
phone/modem connection, but otherwise everything is done in real time.
The UNIX operating system is designed to allow multiple people to log on at
the same time, each operating in their own virtual computer space - unable to
see or affect the other users that are also logged on to the same machine.
Just like with PC programs that you're familiar with, the output of a
program running on the UNIX server can be sent to the screen or to files (or
to printers if your ISP allows it, but most don't for security reasons).
Input is made via the mouse or keyboard (Telnet supports both). In UNIX
terminology the screen is referred to as STDOUT and the keyboard is referred
to as STDIN (just like in DOS, although the terminology is not used as much
in DOS).
In the next section, we'll talk about basic UNIX commands which you can
run from the Telnet prompt.
UNIX Commands
Despite its differences, a UNIX machine accepts commands that are essentially
the same as those that a PC DOS machine will accept. Mostly, the commands
have to do with file and directory manipulations, as well as with basic
text manipulations.
However, whereas DOS uses only COMMAND.COM to supply the familiar C: prompt,
there are several common equivalents in UNIX to COMMAND.COM. There are three
of these "shells" (Bourne, C, and Korn), as they are called, that are most
often used. The shells are generally similar to one another, although there
are interface and command differences.
Which one you have on your server depends on what your ISP chose. Generally, the
choice of a shell is not critical because you will be writing your web
server programs in Perl or C rather than the batch language that is specific
to the particular shell that you use to interface to the UNIX operating
system. I'll have more to say on this in the CGI section.
Below are the most common UNIX commands that you might use during a Telnet
session. Remember that these are UNIX commands - not Telnet commands. Telnet
is just a program that connects you to the UNIX server for remote access.
You'll see that these commands are mostly associated with manipulating
files on the server.
While you can use UNIX commands to edit/delete/move files directly on the
server, it is usually more convenient to manipulate the files on your PC
and then copy them over to the UNIX server. Passwords are almost always
required to establish a Telnet connection to protect the server contents
from malicious acts from unauthorized visitors.
One thing to note - UNIX is case-sensitive! If you're following examples
on how to use UNIX be sure to follow the case of the examples.
Common UNIX commands:
- clear: clear screen
- cd: change directory
- ls: list directory content
- pwd: print working directory (the current directory)
- mkdir: make directory
- rmdir: remove directory
- cp: copy a file
- mv: move a file
- rm: remove a file
- man: manual (lookup documentation on a command)
- whatis: short description of a command
- cat: concatenation (display the contents of a file on the screen)
- more: more (view a file a page at a time)
- vi: vi text editor
- pico: pico text editor
UNIX also supports redirection (the <, > and >> operators) and uses the . and ..
notation to describe the current and parent directories.
Finally, there is the issue of file permissions. On a file-by-file basis,
UNIX sets read, write and execute permissions separately for three classes
of user: the file's owner, the owner's group, and everyone else. In general,
web servers are set up such that all executable files must be placed in a
directory called CGI-BIN. While not a requirement of UNIX, it is a practice
followed to prevent uncontrolled use of programs.
From a programmer's viewpoint the key is that programs must all be located
in a particular directory (but not data files) and the execution of those
programs can be controlled/limited in a variety of ways. This is critical
in that web servers would otherwise be susceptible to hackers who would plant
programs of their own on the server - programs which could do uncontrolled
damage to the files on the server. During your Telnet sessions you can set
the security for your executable files. Typically, a "chmod 755 filename"
command is used, which lets the owner read, write and execute the file while
everyone else may only read and execute it. You can get more information
on the chmod command by using the UNIX "man" command.
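To make the 755 in that command less mysterious, here is a small Perl sketch
that decodes a three-digit mode into the rwx notation that "ls -l" displays.
The subroutine name mode_to_string is made up for this illustration. Each
digit is the sum of 4 (read), 2 (write) and 1 (execute), and the digits are
given in the order owner, group, everyone else.

```perl
#!/usr/bin/perl
# Decode a three-digit chmod mode such as "755" into rwx notation.
# mode_to_string is a hypothetical helper written for this example.
sub mode_to_string {
    my ($mode) = @_;
    my $result = "";
    foreach my $digit (split(//, $mode)) {
        my $bits = int($digit);
        $result .= ($bits & 4) ? "r" : "-";   # 4 = read
        $result .= ($bits & 2) ? "w" : "-";   # 2 = write
        $result .= ($bits & 1) ? "x" : "-";   # 1 = execute
    }
    return $result;
}

print mode_to_string("755"), "\n";   # prints: rwxr-xr-x
```

Perl also has a built-in chmod function of its own, which takes the mode as
an octal number, for example chmod 0755, "filename".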
CGI Scripts
This brings us to the concept of CGI - Common Gateway Interface. When a
web browser sends a request to a web server, the request may simply ask
the server to return an existing web page. Often, though, satisfying the
request requires that the server execute a program (contained in the CGI-BIN
directory) which creates the web page that is to be returned.
You may have heard of the phrase "CGI scripts" but you need to be aware that
CGI is a specification for how the server and the executable program exchange
data. CGI is not a computer language and there is no such thing as
a CGI script - just programs which comply with the CGI specification.
Even so, the phrase CGI scripts is commonly used and simply means any
computer program which can be used to return data to a web server.
Here's the important part - any language (C, VB, Java, Perl, QBasic,
VBScript, or even batch commands of the OS shell) can be used as a CGI
script, provided that the language will run on the server.
For us VB programmers the downside is that VB, QBasic and VBScript have
not been ported to UNIX. They work only on Microsoft servers. And, since
most servers are UNIX-based, it's pretty likely that most of you will not
be able to use your VB/QBasic/VBScript knowledge to create interactive
web sites.
That's the case for me. My ISP uses a UNIX server running the Apache web
server. I could probably find an ISP that supports an NT server but that
means I'd have to change ISPs - something I don't have the time nor
inclination to do.
So, my choices are basically down to two languages - C or Perl, both of
which are available for use on UNIX servers. I've chosen Perl for reasons
that I list below.
Perl Overview
As a long-time VB programmer I never knew much about Perl. Now, I've
found that it is the de facto standard for writing CGI scripts. It has taken
over that role for a few basic reasons:
- It is interpreted, just like QBasic, thus allowing for rapid evaluation
of scripts (programs) without going through the compile stage.
- It is free, just like QBasic. Unlike QBasic, Perl is in a continuing
phase of development.
- It has very powerful text handling capabilities. Not that Perl can do
anything I can't write in VB or QBasic, but the built-in Perl commands can
do things in 1 or 2 lines that VB might take tens of lines to complete!
The downside is that Perl can be very cryptic looking. It was not written
with English in mind!
In the few weeks since I decided to incorporate dynamic page generation
on my web site (i.e., creating web pages on demand from a database on my
web site server) I've been able to understand Perl well enough to write
the Perl programs that do what I needed.
Perl is simple enough, and close enough to QBasic, that learning the basics
of Perl was pretty simple. There are many web site tutorials which provided
all the instruction I needed to write simple Perl programs. However, don't be
misled. Perl is too complex a language to be learned fully in a few
short weeks. Becoming an expert with Perl will take much longer but I've
found that even as a fairly new beginner you can write productive CGI
scripts!
Without further ado, let's look at a very simple Perl program:
#!/usr/bin/perl
$name="Hello World";
print $name;
Except for the header line, this program looks amazingly like a QBasic
program, doesn't it? The first line is a standard header for UNIX scripts.
Each language has its own header, and this one simply tells UNIX where to
find the Perl executable. Line 2 gives a value to a string variable and
Line 3 prints the string variable. Perl can really be this simple!
Here's a very important point about this script, one that applies to all
Perl programs. The Perl print statement goes, by default, to the screen.
If you ran this program in a Telnet session, "Hello World" would be displayed
on your screen.
If you called this CGI (Perl) script using a web page (I'll explain how later),
you would get a web page back with those same words! Here's the explanation
of how it works:
- The web browser sends a request to the web server to run the CGI script
- The web server receives the request and tells UNIX to run the script
- UNIX executes the script and redirects the output of the script
to the web server - i.e., the STDOUT is changed from the screen to a
data stream that goes to the web server
- The web server sends the redirected output to the web browser
- The web browser displays the results
This is pretty much how all web servers work. A link on a web page can
ask the web server for an existing web page or it can
ask the web server to run a CGI script which in turn will generate a web page
on the fly! All of the big commercial web sites work on this basis.
I've not yet explained one important piece of information - that the output
of the CGI script (the Perl program) must be in a format such that the web
browser knows what to do with it. For example, consider this simple HTML
page:
<HTML>
<Head>
</Head>
<Body>
<H1>Hello World!</H1>
</Body>
</HTML>
The sample Perl program I showed above would actually have to print all of
these lines to be compliant with the specifications for an HTML document.
In practice a browser can get by on much less, but creating the extra HTML
tags is not difficult and in all of my Perl scripts I include the full set of
HTML tags in my output.
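As a rough sketch of what that looks like, the Hello World program can be
extended to print the whole page. One detail not covered in the text above:
a CGI script has to print a "Content-type" header line followed by a blank
line before the HTML itself, so the web server knows what kind of data it is
passing along. The subroutine name build_page is just a name chosen for this
example.

```perl
#!/usr/bin/perl
# Sketch of a CGI script that returns the complete HTML page shown above.
# build_page is a hypothetical helper name chosen for this example.
sub build_page {
    my ($message) = @_;
    my $page = "Content-type: text/html\n\n";   # header, then a blank line
    $page .= "<HTML>\n<Head>\n</Head>\n<Body>\n";
    $page .= "<H1>$message</H1>\n";
    $page .= "</Body>\n</HTML>\n";
    return $page;
}

# Everything printed to STDOUT is what the web server sends to the browser.
print build_page("Hello World!");
```

Run from a Telnet session, this simply lists the header and the HTML lines on
the screen, which is a handy way to check a script before calling it from a
web page.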
In case you haven't realized it by now, creating dynamic web pages means
you not only have to learn a new language such as Perl but you have to
understand HTML programming as well. This shouldn't be a problem for most
web masters because learning HTML was a pre-requisite for creating our
sites. The only new thing to learn is how to use Perl to create the web
page content.
I do not cover HTML programming in this tutorial but you will have
to master it before you can be proficient at creating dynamic web pages.
There are many online sources for learning the skills. I suggest the
Web Developer's Virtual Library.
It has a wide variety of tutorials covering various aspects of web site
programming.
Perl Language
And finally we come to the fun part - coding! The intent with this section
is to provide you with a quick overview of the basic commands which Perl
provides, including some short snippets of code to show you how they work.
With study you should be able to use this tutorial to write some programs
of your own, but you will want to supplement this section by
reading some of the other online tutorials about Perl. At the end of this
section I provide a sample Perl program which shows how the following
commands can be tied together to create a useful program.
Syntax comments
- Perl is case sensitive. $Foo and $FOO are different variable names.
- Variables start with $. $string = "Gary Beene"
- Arrays start with @. @string = ("one", "two", 3, 4, "dog")
- Arrays can contain mixed types of data
- Associative arrays are supported. These are pairs of values, where
a value is retrieved by using the first value of the pair as the index.
- Associative arrays start with %. %alldata = ("one", 1, "two", 2)
- Perl statements end with ;. print "Hello";
- Subroutines start with &.
- Notice the different brackets for elements: $string[0] for arrays, $alldata{"one"} for associative arrays
- Comment lines begin with #.
Operations
- ++$a will increment $a by 1
- --$a will decrement $a by 1
- $a = 5 ** 10 gives 5 raised to the power of 10
- $a = $b . $c concatenates $b and $c
- $a = $b x 5 makes $a become $b repeated 5 times
- $a += $b adds $b to $a
- $a -= $b subtracts $b from $a
- $a .= $b appends $b onto $a (see string printing below for an alternative)
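The operators above can be tried out with a few lines like these. This is
only a sketch with made-up variable names. It also uses "my" to declare each
variable, which is optional in simple scripts but keeps variables from
clashing in larger ones.

```perl
#!/usr/bin/perl
# Try out the operators from the list above.
my $n = 2 ** 10;       # exponentiation: 1024
++$n;                  # increment: 1025
$n -= 25;              # subtract: 1000
$n += 5;               # add: 1005
my $s = "ab" . "cd";   # concatenation: "abcd"
$s .= "!";             # append: "abcd!"
my $r = "ab" x 3;      # repetition: "ababab"
print "$n $s $r\n";    # prints: 1005 abcd! ababab
```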
Arrays
- @data = (1, 2, "three") puts 3 values into the array @data
- $data[2] refers to the third (0,1,2) position of the array
- push (@data, "dog") adds the value "dog" to the end of the array
- pop (@data) removes the last value of the array
- ($a, $b) = ($c, $d) is same as $a=$c and $b=$d
- ($a, $b) = @data assigns first two values of array @data to $a and $b
- $#data gives the largest index value of the array @data
- $a = @data assigns the length of the array to $a
- $a = "@data" assigns all the elements of @data to $a as one string, separated by spaces
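Here is a short sketch that exercises the array statements listed above, plus
an associative array filled from a parenthesized list of pairs. The variable
names are made up for the example.

```perl
#!/usr/bin/perl
# Arrays and an associative array in action.
my @data = (1, 2, "three");
push(@data, "dog");               # @data is now (1, 2, "three", "dog")
my $last = pop(@data);            # removes and returns "dog"
my ($first, $second) = @data;     # 1 and 2
my $count = @data;                # number of elements: 3
my $top = $#data;                 # largest index: 2
my $joined = "@data";             # "1 2 three"

my %alldata = ("one", 1, "two", 2);
print "$last $count $top ($joined) $alldata{'two'}\n";
# prints: dog 3 2 (1 2 three) 2
```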
File Handling
- open (INFO,"filename") to open a file for input using the handle INFO
- @lines = <INFO> to read the entire file into the array @lines
- close(INFO) to close the file
- open (INFO, ">filename") for output
- open (INFO, ">>filename") to append
- open (INFO, "<filename") for input (the "<" is optional)
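The sketch below strings these file operations together: it writes a file,
appends to it, then reads it back into an array. The file name "demo.txt" is
made up, and the "|| die" checks are an addition of mine so the script stops
with a message if a file cannot be opened.

```perl
#!/usr/bin/perl
# Write, append to, and read back a small file.
# "demo.txt" is a made-up file name for this example.
my $file = "demo.txt";

open(INFO, ">$file") || die "Cannot write $file: $!";
print INFO "first line\n";
close(INFO);

open(INFO, ">>$file") || die "Cannot append to $file: $!";
print INFO "second line\n";
close(INFO);

open(INFO, $file) || die "Cannot read $file: $!";
my @lines = <INFO>;      # one array element per line of the file
close(INFO);

print "Read ", scalar(@lines), " lines, the last is: $lines[1]";
unlink($file);           # remove the demo file again
```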
String printing
- print '$Hello' (single quotes) prints the six characters: $Hello
- print "$Hello" (double quotes) prints the value of a variable called $Hello
- print "Hi there, $myname, how are you?" inserts the value of the variable $myname into the string
- print @lines to print the entire array (one long string unless the elements contain newline characters)
- print INFO "Hello" prints the word "Hello" to the file opened as INFO
- "\n" is a newline (carriage return / line feed)
- "\t" is a TAB
- print <<XXX; on one line prints everything that follows as-is, up to a line containing only XXX (will recognize embedded variables)
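To see the difference between the quoting styles side by side, here is a
small sketch. The variable names are made up for the example.

```perl
#!/usr/bin/perl
# Single quotes, double quotes and the <<XXX form compared.
my $myname = "Gary";
my $plain  = 'Hi there, $myname';    # single quotes: no substitution
my $filled = "Hi there, $myname";    # double quotes: variable is substituted
my $block  = <<XXX;
Name:\t$myname
Done
XXX
print "$plain\n$filled\n$block";
```

The first line printed still contains the characters $myname, the second
contains Gary, and the block prints exactly as typed, with the TAB and the
variable filled in.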
Boolean
- scalar is TRUE if not a null string
- scalar is TRUE if not zero
- $a==$b tests if $a is numerically equal to $b
- $a!=$b tests if $a is not numerically equal to $b
- $a eq $b tests if $a is string-equal to $b
- $a ne $b tests if $a is not string-equal to $b
- ($a && $b) tests if $a AND $b is true
- ($a || $b) tests if $a OR $b is true
- !($a) tests if $a is false
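The difference between the numeric and the string tests is easy to trip
over, so here is a small sketch with made-up values. It also shows one case
the list above does not mention: the string "0" counts as false, just like
the number 0.

```perl
#!/usr/bin/perl
# Numeric versus string comparison, and what counts as true.
my $x = "10";
my $y = "10.0";
my $same_number = ($x == $y) ? "yes" : "no";   # numerically equal
my $same_string = ($x eq $y) ? "yes" : "no";   # different characters

# Collect the test values that Perl treats as true.
my @true_values = ();
foreach my $value ("", 0, "0", "abc", 5) {
    push(@true_values, $value) if ($value);
}
print "$same_number $same_string @true_values\n";   # prints: yes no abc 5
```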
Control Structures
- foreach - walks through an array
@food = ("apple", "pear", "peach");
foreach $morsel (@food)
{
print $morsel;
}
- for - executes a block of statements while an expression is True
for ($i=1; $i<10; ++$i)
{
print "$i\n";
}
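- while or until - executes a block of statements while an expression is true (while) or until it becomes true (until)
while ($a ne "stop")
{
$a = <STDIN>;
chomp($a); # remove the trailing newline before comparing
}
--- or ---
while ($line = <INFO>)
{
$a = <STDIN>;
}
--- or ---
do
{
$a = <STDIN>;
chomp($a); # remove the trailing newline before comparing
}
while ($a ne "stop");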
- if - executes a block once if an expression is true
if ($a)
{
print "True";
}
--- or ---
if ($a)
{
print '$a is True';
}
elsif ($b)
{
print '$b is True';
}
else
{
print 'none were True';
}
Matching / Substitution
- /xxx/ is a string between slashes and is called a Regular Expression
- $string =~ /the/ is True if "the" is in the variable $string
- $string !~ /the/ is True if "the" is NOT in the variable $string
- Special characters between the slashes affect how the matching is tested
- $string =~ /^x/ tests for x at the start of the string
- $string =~ /x$/ tests for x at the end of the string
- $string =~ /./ tests for any single character
- $string =~ /t.e/ tests for t and e separated by any one character
- $string =~ /^$/ tests for a string with nothing in it
- $string =~ /[a-z]/ tests for any one lower case letter
- $string =~ /[a-zA-Z]/ tests for any one letter of either case
- $string =~ s/dog/cat/ replaces dog with cat first time it appears in the string
- $string =~ s/dog/cat/gi replaces dog with cat anywhere in the string, case insensitive
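A few of these tests and substitutions, applied to a made-up sentence, look
like this in practice. Treat it as a sketch to experiment with rather than a
complete tour of regular expressions.

```perl
#!/usr/bin/perl
# Matching and substitution on a sample sentence.
my $string = "The dog chased the other Dog";
my $has_the  = ($string =~ /the/)  ? 1 : 0;   # 1: lower case "the" appears
my $starts_t = ($string =~ /^T/)   ? 1 : 0;   # 1: the string begins with T
my $ends_dog = ($string =~ /dog$/) ? 1 : 0;   # 0: it ends with "Dog", capital D

my $once = $string;
$once =~ s/dog/cat/;       # replaces the first match only
my $all = $string;
$all =~ s/dog/cat/gi;      # replaces every match, ignoring case
print "$once\n$all\n";
```

The first line printed still ends in "Dog", while in the second both animals
have become "cat".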
Sample Perl Program
This section on Perl contained some pretty terse information, but an example
will help show how the commands can be fit together to create useful
scripts. Here's some actual code that I use at my web site. It receives a
search string from a web browser and compares each record in a database to
see if it should be printed out.
In this example I've left off the top and bottom parts of my script, which
are where I've put the HTML print statements. Those parts were very long
and I just wanted to show the heart of the Perl routine. In this example,
I have received a string called $value which represents a search string to
find in my data base, a text file called "sites.txt".
open(INFO, "sites.txt");
@lines = <INFO>;
foreach $record (@lines) {
    # Fields within a record are separated by ;;;
    @elements = split(/;;;/, $record);
    if ($value eq 'all') {
        # Search string 'all' prints every record
        print "<tr><td><a href=\"$elements[0]\">"
            . "<b>$elements[1]</b></a>"
            . "<td align=center>$elements[6]<td>...$elements[8]";
    }
    else {
        # Otherwise print only records whose field 5 matches
        if ($elements[5] eq $value) {
            print "<tr><td><a href=\"$elements[0]\">"
                . "<b>$elements[1]</b></a>"
                . "<td align=center>$elements[6]<td>...$elements[8]";
        }
    }
}
close (INFO);
In this example, I open the file and read it all into an array called @lines,
with one record per element in the array. Then I split up each record into
an array called @elements (I used ;;; to separate each field in my database).
I then test whether the incoming search string is 'all' or whether it matches
position 5 of the array. In either case, if the test is True the information
is printed out.
Even from this short script you can see the HTML tags that
I use to format the printed values so that a web browser will interpret the
output of my script as an HTML file.
Last, but not Least
I sort of snuck in the comment that my CGI script received a variable $value
which had in it the search string supplied by the web server. This needs a
bit more explanation. When a web browser sends a request to the web server
to execute a CGI script, the request contains quite a bit of information.
Of interest to a CGI script writer is that the additional information sent
by the web browser is passed to the CGI script. Some of the data is easily
accessible through UNIX environment variables, which are set by the
web server. However, some of the information sent by the web browser is
included in a simple encoded (specially formatted) string which the
CGI script must decode.
There are many sites on the web which provide free Perl scripts, including
the code to extract the information from the encoded string. I didn't show
it in this tutorial, but you will have to locate that code and include it
in your Perl scripts. It's a little more complicated than I've indicated
but is well within the capabilities of a beginning Perl programmer.
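To give a flavor of what that decoding code does, here is a minimal sketch,
not the code from my site. It assumes the browser's data arrives in the
QUERY_STRING environment variable, which is the case when the script is
called with a link or a form using the GET method, and that the string looks
like "value=all&name=Gary+Beene". The subroutine name parse_query is made up
for this example.

```perl
#!/usr/bin/perl
# Sketch: decode a string such as "value=all&name=Gary+Beene" into an
# associative array. parse_query is a hypothetical helper name.
sub parse_query {
    my ($query) = @_;
    my %form = ();
    foreach my $pair (split(/&/, $query)) {
        next if $pair eq "";                          # skip empty pieces
        my ($name, $value) = split(/=/, $pair, 2);
        $value = "" unless defined($value);
        foreach ($name, $value) {
            tr/+/ /;                                  # + stands for a space
            s/%([0-9a-fA-F]{2})/chr(hex($1))/ge;      # %XX is a character code in hex
        }
        $form{$name} = $value;
    }
    return %form;
}

# The web server sets QUERY_STRING before it runs the script.
my $query = defined($ENV{QUERY_STRING}) ? $ENV{QUERY_STRING} : "";
my %form = parse_query($query);
my $value = $form{"value"};    # the search string used in the sample program
```

Data sent by a form using the POST method arrives on STDIN instead and needs
slightly different handling, which is one reason many people use a ready-made
library such as the CGI.pm module rather than writing the decoder themselves.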