Re: [perl] For the benefit of Mr. Baisley


From: Wayne E Baisley (baisley@alumni.rice.edu)
Date: Sat Oct 14 2000 - 21:26:48 PDT


Maggie, you're beautiful! Of course, if I were doing it, for heavy-
handed, yet subtle, comic effect, I would have "embellished" Gore's by
adding a letter or two to make whoppergrams. Which is why I'm not a
cartoonist. Or stand-up comedian. Or the list-monitors' pet.

Cheers,
Wayne

To get with the techno-nerd program, I thought I'd share a little bit
of perl I wrote to modify user login scripts, a year or so ago. This
probably belongs on a perl list, so I thought it best to FoRK it. ;-}

The situation was that we had 20-30 thousand files with code like this

if [ -e /usr/local/etc/fermi.upsII.shrc ]
then
  if [ -r /usr/local/etc/fermi.upsII.shrc ]
  then
    . /usr/local/etc/fermi.upsII.shrc
  fi
else
  if [ -e /usr/local/etc/fermi.shrc ]
  then
    if [ -r /usr/local/etc/fermi.shrc ]
    then
      . /usr/local/etc/fermi.shrc
    fi
  fi
fi

[yeah, it's ugly. I didn't write it!]

that we wanted to replace with new stuff, with many variations in
spacing, intervening comments, echo commands, and so on, not to mention
both shell families (sh and csh). So, I picked an approach that took 50 different
known code fragments and turned them into regular expressions.
Whenever an expression matched, that chunk of code got commented out
(using ': #' which works universally), and the new boilerplate setup
code got prefixed to the file. Worked marvelously, to my great relief.
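
A minimal sketch of that comment-out-and-prefix step, assuming the file is already in a string. The helper name doctor_script, the sample text, and the hand-written $re (standing in for one of the generated expressions) are all made up for illustration; the real script linked at the end may well do this differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): comment out the
# first chunk of $text that matches $re, then prefix the new boilerplate.
sub doctor_script {
    my ($text, $re, $boilerplate) = @_;

    # Leave the file alone unless the old fragment is present.
    return $text unless $text =~ $re;

    # @- and @+ hold the start and end offsets of the last match.
    my ($start, $end) = ($-[0], $+[0]);
    my $chunk = substr($text, $start, $end - $start);

    # Prefix every non-empty line of the chunk with ': # ', the
    # commenting prefix the post describes.
    $chunk =~ s/^(?=.)/: # /mg;

    # Splice the commented-out chunk back in place of the original.
    substr($text, $start, $end - $start, $chunk);
    return $boilerplate . $text;
}

# Sample input and a hand-written stand-in for a generated pattern.
my $old = "echo hi\nif [ -r /etc/foo ]\nthen\n  . /etc/foo\nfi\necho bye\n";
my $re  = qr/\nif\s+\[\s+-r\s+\/etc\/foo\s+\]\s+then\s+\.\s+\/etc\/foo\s+fi/;
my $new = doctor_script($old, $re, "# new setup goes here\n");
print $new;
```

The pattern starts with a newline, as the generated ones do, so the match begins at the end of the preceding line. The lookahead in the substitution skips that empty first line.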

Here's the code that generates the regular expressions:

# patternize does the heavy lifting in doctoring the
# pattern-matching strings.
#
# This routine does two different things to the input string:
# 1) It quotes all of the metacharacters (and trims the string).
# 2) It generalizes certain known character sequences.
#
# In several places, we generalize things like parentheses and brackets,
# or single and double quotes, in such a way that they won't necessarily
# balance or match. This is perfectly fine, because the shell will have
# been enforcing those rules. We can afford to match impossible
# supersets. In the "worst" case, we'll end up overriding a chunk of
# code that wasn't working.
#
# The generalizations are quite involved in places, and can be
# fragile, since they can easily interfere with each other.
#

sub patternize

{

# Grab the input pattern string. Localize $_ so the caller's copy
# is not clobbered.

local $_ = $_[0];

# Trim all leading and trailing newlines.

s/^\n*//s;
s/\n*$//s;

# Quote all of the regular expression metacharacters.

s/\./\\./gm;
s/\(/\\(/gm;
s/\)/\\)/gm;
s/\[/\\[/gm;
s/\|/\\|/gm;
s/\{/\\{/gm;
s/\^/\\^/gm;
s/\*/\\*/gm;
s/\?/\\?/gm;
s/\$/\\\$/gm;
s/\+/\\+/gm;

# Allow for an optional trailing semicolon. Another pattern mangling
# step below will allow for embedded semicolons, but that won't cover
# single-line patterns.

s/$/;?/gm;

# Allow for intervening echo commands. This consists of the command
# itself, plus a string we'll call ECHOJUNK for now, possibly surrounded
# by single or double quote marks (we're not overly picky about their
# matching or being balanced), possibly followed by a semicolon.

s/\n/\n(echo(\\s+-n)*\\s+('|")?ECHOJUNK('|")?;?)*\n/gm;

# Turn new-lines into a pattern to match arbitrary whitespace.

s/\n/\\s*/gm;

# Turn arbitrary whitespace into a pattern to match mandatory whitespace.

s/\s+/\\s+/gm;

# Collapse the results of the previous steps into appropriate patterns.

s/\\s\*\\s\+/\\s+/gm;
s/\\s\+\\s\*/\\s+/gm;
s/\\s\*\\s\*/\\s*/gm;

# Finish our handling of intervening echo commands. Move the optional
# leading whitespace inside the parentheses for the whole optional echo
# string. Expand the ECHOJUNK placeholder to match anything but control
# characters which terminate lines, various quote marks, and the
# semicolon.

s/\\s\*\(echo\(\\s\+/\(\\s*echo\(\\s+/gm;
s/ECHOJUNK/[^\\n\\r\\f\\e`'";]*/gm;

# Generalize the file test options to match all of the usual cases:
# if -a, if -e, if -f, if -r, if -s, and if -x.

s/if\\s\+\\\[\\s\+-[aefrsx]\\s\+/if\\s\+\\\[\\s\+-[aefrsx]\\s\+/gm;
s/if\\s\+\\\(\\s\+-[aefrsx]\\s\+/if\\s\+\\\(\\s\+-[aefrsx]\\s\+/gm;

# Make sure comment lines in patterns don't bleed to the following
# line(s).

s/#(\\s\+)*/#[ \\t]*/gm;

# Generalize the sourcing of certain files (/usr/local/etc/fermi.*,
# /usr/local/etc/setpath.*, /usr/local/etc/setups.*, and
# $SETUPS_DIR/setups.*) to allow either the . or source command.
# Besides reducing the number of patterns needed, this also catches
# bash variants.

s/(source|\\\.)(\\s\+\/usr\/local\/etc\/fermi\\\.\w+)/(\\.|source)$2/gm;
s/(source|\\\.)(\\s\+\/usr\/local\/etc\/setpath\\\.\w+)/(\\.|source)$2/gm;
s/(source|\\\.)(\\s\+\/usr\/local\/etc\/setups\\\.\w+)/(\\.|source)$2/gm;
s/(source|\\\.)(\\s\+\\\$SETUPS_DIR\/setups\\\.\w+)/(\\.|source)$2/gm;

# Generalize the sourcing of the users' .shrc files, similar to the
# previous.

s/(source|\\\.)(\\s\+(\\\$HOME|~\w*)\/\\\.shrc)/(\\.|source)$2/gm;

# Make the spacing optional in "if (", "( -", and " )" strings, and
# generalize to match either parentheses or brackets (we don't care if
# they don't match).

s/if\\s\+\\/if\\s\*\\/gs;
s/\\(\[|\()\\s\+-/(\\[|\\()\\s*-/gs;
s/\\s\+(]|\\\))/\\s*(]|\\))/gs;

# Generalize certain strings to match bash, ksh or sh.

s/=\\s\+"bash"\\s\+/!?=\\s+"(ba|k)?sh"\\s+/gm;

# Generalize anded tests to match "-a" or "] && ["

s/\\s\+-a\\s\+/(\\s+-a\\s+|\\s*(]|\\))\\s+&&\\s+(\\[|\\()\\s*)/gs;

# Allow intervening comments, but disallow a trailing comment.

s/;\?/;\?(\\s*#[^\\n]*\\n)*/gs;
s/\(\\s\*#\[\^\\n]\*\\n\)\*$//s;

# Generalize to match fermi(.upsII) and setups(II). Also match either
# shell extension in the setups(II).(c)sh and setpath.(c)sh files.

s/(\/fermi)(\\\.upsII)?(\\\.)/$1(\\.upsII)?$3/gs;
s/(\/setups)(II)?(\\\.)(c?sh)/$1(II)?$3c?sh/gs;
s/(\/setpath)(\\\.)(c?sh)/$1$2c?sh/gs;

# Match either fi or endif to end an if block

s/\\s\*(fi|endif);\?/\\s*(fi|endif);?/gs;

# Generalize /usr/local/etc to match /usr/local/etc, /fnal/ups/etc, and
# /afs/(.)fnal.gov/ups/etc

s#/usr/local/etc/#(/usr/local/etc/|/fnal/ups/etc/|/afs/\.?fnal\.gov/ups/etc/)#gs;

# Allow for ~/.shrc and ~username/.shrc forms. (\w* rather than \w?,
# so that usernames longer than one character match.)

s#~\w*/\\\.shrc#~\\w*/\\.shrc#gs;

# Let semicolons be followed by optional blanks and tabs

s/;\?/[ \\t]*;?/gm;

# Put the leading newline back on the pattern, with optional blanks and
# tabs.

s/^/\\n[ \\t]*/s;

# Match any trailing blanks and tabs.

s/$/[\\t ]\*/s;

# All done.

$_;

}
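
To see the two-phase idea (quote, then generalize) in isolation, here is a stripped-down sketch that keeps only the metacharacter quoting and the whitespace step. The name mini_patternize and the sample strings are hypothetical and not part of the script above. It skips every other generalization (echo commands, comments, semicolons, path variants), so treat it as an illustration rather than a replacement.

```perl
use strict;
use warnings;

# Hypothetical reduced version of patternize: quote, then generalize.
sub mini_patternize {
    local $_ = $_[0];

    # Trim leading and trailing newlines.
    s/^\n+//;
    s/\n+$//;

    # Phase 1: backslash-quote the regex metacharacters in one pass.
    s/([.()\[\]|{}^*?\$+\\])/\\$1/g;

    # Phase 2: any run of whitespace (newlines included) in the
    # fragment becomes "one or more whitespace characters".
    s/\s+/\\s+/g;

    return $_;
}

# A known code fragment, written the way it appears in one login script.
my $fragment = <<'END';
if [ -r /usr/local/etc/fermi.shrc ]
then
  . /usr/local/etc/fermi.shrc
fi
END

my $pattern = mini_patternize($fragment);
print "$pattern\n";

# The same code with different spacing and indentation.
my $variant = "if [  -r /usr/local/etc/fermi.shrc ]\nthen\n\t. /usr/local/etc/fermi.shrc\nfi\n";
print "variant matches\n" if $variant =~ /$pattern/;
```

Quoting has to come first. Otherwise the backslashes and plus signs introduced by the whitespace step would themselves get quoted.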

See the whole thing at:
http://www.fnal.gov/docs/products/template_home/my_convert_usr_local.pl

And some explanation at:
http://www.fnal.gov/docs/products/template_home/convert_usr_local.html



This archive was generated by hypermail 2b29 : Sat Oct 14 2000 - 21:32:14 PDT