I certainly am a Linux advocate. I use it on my laptop, and several of my iSeries clients have it on their iSeries. I started out first with a dual boot laptop, but now my machine only boots to Linux. I do use Windows 98 (using Win4Lin from Netraverse), but I have it available only so I can test my Web application delivery to Internet Explorer.
I'm the Java guy, or so I try to be when I present at COMMON, write articles and books, and do consulting. I use Linux as my development platform. And even though I have WebSphere Studio Application Developer (WSAD, the new Java GUI from IBM), I prefer to use the Linux standard editor--vi. Actually, that's vim, or vi improved. I use Ant to automate my compiles and to do other things like creating JavaDocs, FTPing files, running unit tests, and signing applets. For source control, I use the open-source Concurrent Versions System (CVS), which is also used to track the code for most open-source products themselves. But there's another technology, one with a Unix heritage, that I've been using heavily lately, and that is the subject of this month's column--regular expressions.
Today, regular expressions (which, you'll see, are anything but regular) are used all over the place, including on non-Unix platforms. Regular expressions are used in JavaScript in Internet Explorer and Netscape Navigator. They are used in Jakarta Struts (the leading Java-based Web application framework). The Apache Foundation's Jakarta project has a Java package called Jakarta ORO that provides regular expressions for JDK 1.2 and 1.3. In fact, regular expressions are so heavily used that Sun saw fit to add support for regular expressions to JDK 1.4.
On the Linux side, regular expressions can be used from the vi editor, the ubiquitous Perl programming language, and the Unix sed utility.
Regular Expressions in Five Minutes
Regular expressions are arguably a complete language. They consist of a string of special characters interspersed with sets of characters that are used as a mask against strings in files and HTML entry fields. The regular expression engine compares a line of text with your regular expression mask. The engine can either simply return a Boolean saying whether your text string matched the mask or, optionally, update characters in that string. The following, for instance, is a regular expression that can be used to check phone numbers.
/^\(\d\d\d\) \d\d\d-\d\d\d\d$/
That regular expression can be used in JavaScript to test an input field:
// The function name is illustrative.
function validatePhone(phoneNo) {
    var phoneRE = /^\(\d\d\d\) \d\d\d-\d\d\d\d$/;
    if (phoneNo.match(phoneRE)) {
        return true;
    } else {
        alert("The phone number entered is invalid!");
        return false;
    }
}
But that regular expression expects a space after the area code (if given) and a hyphen between the exchange and the four-digit number. The following accepts an optional area code (with optional parentheses), a three-digit exchange with one space or no space after the area code, and a four-digit number with a single space, a hyphen, or no space between it and the exchange:
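One pattern that fits that description is sketched here in JavaScript. Treat it as an assumption rather than the only way to write it: the names flexiblePhoneRE and isFlexiblePhone are made up for illustration, and this version insists that the parentheses around the area code come as a pair.

```javascript
// Sketch only: one pattern matching the description in the text.
//   ((\(\d\d\d\)|\d\d\d) ?)?  optional area code, with or without
//                             parentheses, then an optional space
//   \d\d\d                    three-digit exchange
//   [ -]?                     optional space or hyphen
//   \d\d\d\d                  four-digit number
var flexiblePhoneRE = /^((\(\d\d\d\)|\d\d\d) ?)?\d\d\d[ -]?\d\d\d\d$/;

// Hypothetical helper: returns true when the whole string is a phone number.
function isFlexiblePhone(phoneNo) {
    return flexiblePhoneRE.test(phoneNo);
}

console.log(isFlexiblePhone("(555) 123-4567")); // true
console.log(isFlexiblePhone("555 123 4567"));   // true
console.log(isFlexiblePhone("1234567"));        // true
console.log(isFlexiblePhone("(555 123-4567"));  // false
```

Because the area code is wrapped in its own optional group, a seven-digit local number passes as readily as a full ten-digit one.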
Regular expressions have a number of special characters in them to control how the mask works. The caret (^), for instance, if at the beginning of the mask, says to match the following mask from the beginning of the string. The dollar sign ($) says to match the preceding mask at the end of the string. The escape-d, identified with the backslash (\) and the lowercase letter d, says to match a digit. The vertical bar symbol (|) is the regular expression Boolean "or" character. The caret (^), if used as the first character inside square brackets rather than at the beginning of the mask, is the Boolean "not" character. The forward slash (/) is the commonly used delimiter for the complete mask. It can be replaced with another character if necessary--say, for instance, if you are validating a URL, which itself contains forward slashes.
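To make those control characters concrete, here is a small JavaScript sketch. The variable names and sample strings are invented for illustration. One caveat: JavaScript literals always use the forward slash as the delimiter, so for the URL case the sketch builds the mask from a string with the RegExp constructor, which plays the role that an alternate delimiter plays in sed, vi, or Perl.

```javascript
// ^ and $ anchor the mask to the start and end of the string.
var threeDigits = /^\d\d\d$/;
console.log(threeDigits.test("123"));   // true
console.log(threeDigits.test("a123"));  // false: something precedes the digits
console.log(/\d\d\d/.test("a123"));     // true: unanchored, matches anywhere

// | is the "or" character.
var yesOrNo = /^(yes|no)$/;
console.log(yesOrNo.test("no"));        // true
console.log(yesOrNo.test("maybe"));     // false

// ^ as the first character inside square brackets means "not".
var noDigits = /^[^0-9]+$/;
console.log(noDigits.test("abc"));      // true
console.log(noDigits.test("ab1"));      // false

// Building the mask from a string avoids escaping every / in a URL.
var urlRE = new RegExp("^https?://[^/ ]+/");
console.log(urlRE.test("http://www.example.com/index.html")); // true
```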
If this is your first exposure to regular expressions, I'm sure I've lost you. Just be aware that regular expressions are cryptic yet very powerful. You could do the same checks with code, but your code would become lengthy and far more error-prone than regular expressions.
Updating with Regular Expressions
But regular expressions can do more than simply check strings. The language supports updating strings as well. As I mentioned earlier, I use vim as my Java editor rather than an editor from IBM's Eclipse Java GUI product (WDSc, WSSD, WSAD), even though I make money training people to use IBM's Java IDE. One of my biggest reasons for using vim is regular expressions. Regular expressions give vi what is essentially scan/replace but with superior capabilities. For instance, as I wrote this document in vi, I noticed that I had uppercased the first letters of the string "regular expression." To change them to lowercase, I used the following in vim:
:%s/\(\sRegular\)\(\sExpression\)/\L\1\L\2/g
Let me explain that vi command. The percent symbol (%) says to operate on the whole file. The s says to substitute. The search mask "\(\sRegular\)\(\sExpression\)" is followed by a replacement mask of "\L\1\L\2," which says to lowercase (with the backslash-L) the string that matches the first parenthetical expression (\(\sRegular\), as identified with the shorthand backslash-1 notation) and then the string that matches the second (\(\sExpression\), backslash-2). The trailing g says the replace operation is to be global, so it applies to every match on a line rather than only the first. Such obscure syntax often scares programmers away from using regular expressions, but, more often than not, you can go mining the Internet for regular expressions that fit your need and, after playing around with them for a while, you begin to really appreciate the mini-language.
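The same capture-and-rewrite idea carries over to other regular expression engines. As a rough JavaScript sketch (the function name lowercaseTerm is hypothetical), a replacement function can stand in for vim's lowercase escape, since JavaScript's replacement strings have no equivalent of it:

```javascript
// Lowercase "Regular Expression" wherever it follows whitespace,
// mirroring the two parenthetical groups used in the vim command.
function lowercaseTerm(text) {
    return text.replace(/(\sRegular)(\sExpression)/g, function (whole, first, second) {
        // first and second are the two captured groups
        return first.toLowerCase() + second.toLowerCase();
    });
}

console.log(lowercaseTerm("A Regular Expression is a mask."));
// prints: A regular expression is a mask.
```

As in the vim version, the leading whitespace requirement means the phrase is left alone when it starts a line.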
Sed Again
Although this example use of regular expressions within vi was contrived, I regularly use the same strategy when I do my Java programming. I became so accustomed to the use of regular expressions that I wanted a way of globally replacing Java code in all source files of my app. That's where the Unix sed utility comes into play. The sed utility takes an input file and runs all its text through a regular expression. What I do is write a quick shell script that runs files in a directory (or directories) recursively through sed. I used the following, for instance, to convert a client's JavaServer Pages from the syntax of JSP 0.91 to JSP 1.1:
# JSP 0.91 to JSP 1.1 Converter
#
# create="yes/no" to blank
# <%@ import to <%@ page import
# <%@ isErrorPage= to <%@ page isErrorPage=
# type="com. to class="com.
# > to />
#
# Four substitutions are incomplete in this listing and are
# left commented out rather than guessed at:
#   sed 's/
#   sed 's/
#   sed 's/>/>/g'
#   sed 's/>/g'
mkdir -p convertedjsp
for jsp in *.jsp
do
    sed -e 's/create="no"//g' \
        -e 's/create="yes"//g' \
        -e 's/type="com\./class="com./g' \
        -e 's/<%@ import/<%@ page import/g' \
        -e 's/<%@ isErrorPage=/<%@ page isErrorPage=/g' \
        "$jsp" > "convertedjsp/$jsp"
done
# end of sed script
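The underlying technique, running text through an ordered list of substitutions, is not tied to sed. As a hedged sketch in JavaScript, covering only the substitutions spelled out in the script's comments (the names substitutions and convertJsp are invented here, and the sample input is made up):

```javascript
// Each entry pairs a search mask with its replacement text.
var substitutions = [
    [/create="(yes|no)"/g, ""],
    [/type="com\./g, 'class="com.'],
    [/<%@ import/g, "<%@ page import"],
    [/<%@ isErrorPage=/g, "<%@ page isErrorPage="]
];

// Apply every substitution, in order, to one page's source text.
function convertJsp(source) {
    return substitutions.reduce(function (text, pair) {
        return text.replace(pair[0], pair[1]);
    }, source);
}

console.log(convertJsp('<%@ import="java.util.*" %>'));
// prints: <%@ page import="java.util.*" %>
```

Keeping the rules in a list makes it easy to add or reorder substitutions without touching the loop that applies them.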
Note, however, that I've recently begun to use Perl, with its integrated support for regular expressions, rather than shell scripts and sed. Here's a handy Perl script that I use to verify my regular expressions (regardless of where I will be using them--Java, JavaScript, Perl, or otherwise):
use strict;
while (<>) {
    chomp;
    # replace the regular expression with the
    # one you want to test
    if (/^\(\d\d\d\) \d\d\d-\d\d\d\d$/) {
        print "Matched: |$`<$&>$'|\n";
    } else {
        print "No match.\n";
    }
}
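If Perl isn't handy, a similar tester can be approximated in JavaScript. This is only a sketch: showMatch is a hypothetical name, and it assumes the mask is passed without the g flag so that exec always starts from the beginning of the line. It rebuilds the text before and after the match from the match position, which is what Perl's special variables provide in the script above.

```javascript
// Report what matched, with the matched text set off by < and >.
function showMatch(re, line) {
    var m = re.exec(line);
    if (m === null) {
        return "No match.";
    }
    var before = line.slice(0, m.index);               // text preceding the match
    var after = line.slice(m.index + m[0].length);     // text following the match
    return "Matched: |" + before + "<" + m[0] + ">" + after + "|";
}

console.log(showMatch(/\d\d\d-\d\d\d\d/, "call 555-1234 now"));
// prints: Matched: |call <555-1234> now|
console.log(showMatch(/\d\d\d-\d\d\d\d/, "no digits here"));
// prints: No match.
```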
Struttin' My Regular Expressions Stuff
I've been using regular expressions in JavaScript for a while now, but, with the advent of Struts 1.1, I now use them in my server-side Java Web applications. Struts 1.1 added the ability to use declarative edits for HTML input fields. The declarations are placed in an XML file called validator.xml. The following is a validator.xml snippet that declares edits for the input form called visits:
Note that Struts will automatically edit the qualified fields on the server. But Struts will also, as an option, add JavaScript code to the JSP input form that performs the same regular expression edits in the browser that Java performs on the server. Selecting that client-side edit option improves responsiveness because invalid input is caught without a round trip to the server. And you don't even have to write the JavaScript code.
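To give a feel for what "declarative" means here, the following JavaScript sketch declares a mask per field and lets one generic routine apply them. This is not Struts code or anything Struts generates. The field names phone and zip, the messages, and the names visitsRules and validateForm are all assumptions made up for illustration.

```javascript
// Declarations: one mask and one message per input field.
var visitsRules = {
    phone: {
        mask: /^\(\d\d\d\) \d\d\d-\d\d\d\d$/,
        message: "The phone number entered is invalid!"
    },
    zip: {
        mask: /^\d\d\d\d\d$/,
        message: "The ZIP code entered is invalid!"
    }
};

// Generic edit routine: returns the messages for every field that fails.
function validateForm(rules, values) {
    var errors = [];
    Object.keys(rules).forEach(function (field) {
        var value = values[field] === undefined ? "" : String(values[field]);
        if (!rules[field].mask.test(value)) {
            errors.push(rules[field].message);
        }
    });
    return errors;
}

console.log(validateForm(visitsRules, { phone: "(555) 123-4567", zip: "12345" }).length); // 0
console.log(validateForm(visitsRules, { phone: "555-1234", zip: "12345" }).length);       // 1
```

The point of the design is that adding a new edit means adding a declaration, not writing another block of checking code.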
Regular Expressions, Linux, and Windows
This column may not have talked you into loading Linux at your shop, but I hope it persuaded you to look into using regular expressions for your Web applications. If you are dabbling with Linux, it behooves you to use regular expressions, if not in vi, at least with shell scripts via the sed utility or in Perl programs. Java programmers should look into using Struts (for far more than just the benefit of regular expressions). As I said earlier, regular expressions are directly supported in JDK1.4. But don't wait until you are using JDK1.4--you can use Jakarta's ORO package today with JDK1.2 and 1.3.
If you want to learn more about regular expressions, try the following books. Each has several chapters on them: JavaScript: The Definitive Guide, 4th Edition by David Flanagan, O'Reilly; Learning Perl, 3rd Edition by Randal Schwartz and Tom Phoenix, O'Reilly; or if you really want to get into depth, Mastering Regular Expressions, 2nd Edition by Jeffrey Friedl, O'Reilly.
Don Denoncourt is the co-author of Understanding Web Hosting on Linux, along with Barry Kline. He can be reached by email at
MC Press Online