Grammar

3: Gre Grammar

About this chapter. This chapter describes the `gre' grammar. If you're familiar with the Perl programming language, you can skim this chapter quickly. Otherwise you'll want to spend a little time here.

The `gre' grammar is quite similar to the Perl grammar. The main difference is that `gre' lets you extend the language with new commands, using the `cmd' syntax (see Cmd).

3.1: Parsing and execution phases

`gre' proceeds in two stages: parsing and execution. During the first phase, `gre' commands are analysed and stored in an internal format that can be executed quickly. `gre' does this in the same way as other interpreted programs (e.g. Perl and Awk) do, by constructing an execution tree. (The theory of language parsing is beyond the scope of this manual; see any good computing science reference library for more information.)

As discussed previously (see System startup file), `gre' first parses some system files, and then parses the user's commandfile. The latter may be standard-input, of course.

Execution begins once parsing is done. (Perl experts will note that Perl has the facility to do some execution before parsing is finished. `gre' doesn't need that since it lacks the Perl object-oriented feature.)

Some notes on efficiency.

For most applications, parsing takes negligible time. On a 300MHz Pentium-II machine, `gre' parses/executes about 7000 lines per second, as indicated by timing of code produced by the following Perl script:
   #!/usr/bin/perl -w
   $n = 2e4;  # gives 40,000 lines to parse/execute
   @v = ("\$abc", "\$cdef", "\$abcd", "\$x", "\$yyy", "\$SS33");
   for ($i = 0; $i < $n; $i++) {
       # Each output line assigns to one randomly chosen variable from another.
       print $v[(rand()*6)%6], " = ", $v[(rand()*6)%6], " + 1e-8;\n";
       print $v[(rand()*6)%6], " = ", $v[(rand()*6)%6], " - 1e-8;\n";
   }
Since typical programs have fewer than a thousand lines (3-line user programs aren't uncommon!), parsing is normally accomplished in times that are comparable to that taken to scroll text in a text editor.

The speed of `gre' execution is comparable to that of Perl for Perl-like commands. The design goal was that it be at most 10 times slower than other interpreted languages such as Perl and Gawk. However, benchmarks indicate that performance is much better than that: `gre' is only about 30 percent slower than Perl on Perl-like code.

However, very few `gre' programs consist solely of Perl-like code. (If you want Perl, use Perl!) The core `gre' commands, e.g. `read columns' and `draw contour', can deal with arbitrary data sizes, and perform complicated calculations. For any `gre' program that takes more than a second or so to run, such core commands consume the lion's share of the execution time. The core commands are written in C++ code and run at machine speed, or roughly 30 times faster than they would if they were written in an interpreted language like Perl.

3.2: Command syntax

The `gre' grammar is closely patterned on those of the C and Perl programming languages. Commands (or statements) in `gre' end in semicolons; multiple commands are permitted on one line in a file, and a command may extend over any number of lines. Comments may be inserted at the end of any line in a commandfile, following the "sharp" or "hash" character `#'.

Example:
   something to start;           # not a real `gre' command  ...
   one thing; leads to another;  # ... and neither are these!
   

3.3: Reserved Words in Gre Language

Certain words may not be used within the syntax of new commands. These special words fit into two categories: (1) "reserved words" which are part of the `gre' grammar itself and (2) names of built-in functions.

  • Words reserved for the grammar: `and', `bool', `cmd', `columns', `do', `else', `eq', `execute', `for', `foreach', `if', `last', `my', `ne', `not', `or', `package' (see Package), `return', `unless', `until', `use' (see Use), and `while'.

  • Words used in built-in functions: `abs', `acos', `acosh', `asin', `asinh', `atan', `atanh', `ceil', `chop', `cos', `cosh', `close', `die', `erf', `erfc', `exit', `exp', `exp10', `floor', `index', `int', `length', `ln', `log10', `open', `min', `max', `print', `pow', `pt_to_cm', `read', `rewind', `sin', `sinh', `sqrt', `split', `sprintf', `sub', `substr', `sw_alpha', `sw_beta', `sw_density', `sw_freezing_temperature', `sw_lapse_rate', `sw_potential_temperature', `sw_salinity', `sw_spice', `system', `tan', `tanh', `text_ascent', `text_descent', `text_width', `unlink', `user_to_cm_x', `user_to_cm_y', and `warn'.

You may not use any of the above words in a new command of your own making. (It is also a good idea to avoid using words which match names of Perl functions, because a future version of `gre' may well emulate that function.)

Below is an example of the error you'll see if you try to use the name of a built-in function inappropriately. The example shows an attempt to create a new command called `substr' using the `cmd' mechanism (see Cmd). This is an error because the word `substr' conflicts with the built-in function of that name. For example, the code
   # Cannot use word 'substr' in a user command.
   cmd substr ($) {
       print "$_[0]\n";
   }

generates several errors, the first of which is
   parse error at a.gre line 2.
        cmd substr ($) {

A similar error would arise if `substr' were replaced by the name of any other built-in function, or any of the words reserved for the grammar.

3.4: Variables

3.4.1: User Variables

Variables may contain either numerical values or text values. Variables may be scalars or single-indexed vectors. The individual elements of a vector are free to be numerical or textual, independent of other elements.

The syntax for variables is very similar to that of the Perl language: variable names have a dollar sign as the first character.

Example:
   $unity = 1;                 # Store number in scalar
   $message = "`gre' is cool"; # Store text

The contents to the right of the equals-sign may be any mathematical expression, using Perl-like (or C-like, if you prefer) notation.

You should not use an underscore as the first character after the dollarsign in variables of your own construction, because such names are pre-defined by `gre'. It is legal to modify these built-in variables, but sometimes that makes no sense. For example, `$_version' is a variable that gives the version number of the `gre' program you are using -- you could modify it but I don't know why you'd want to. On the other hand, `$_x' and `$_y' are the vectors storing the x- and y-columns of data. You might well want to modify these values, to create a curve to plot.

Any variables named in double-quoted strings are evaluated at run-time and inserted into the string, as is done in the Perl language (see Variable Interpolation In Quoted Strings).

Vector variables are indexed using square-bracket notation, with the first index being 0, as in most modern languages such as C and Perl. The construct `$#variable', where `$variable' is a variable, returns the index of the highest-numbered element in the variable if it is a vector, or returns `-1' if it is a scalar. Note that in this special case the sharp-character `#' does not indicate a comment.

Example:
   $list[0] = 0;
   $list[1] = 1;
   print "Highest index is: ", $#list, "\n";

In this example, the `print' command outputs the message `Highest index is: 1'. (The vector has two elements, so its highest index is 1.) By now, Perl users will recognize the close correspondence between `gre' and Perl, as in the `print' command used here (see Print).

Individual indices may mix numeric and text values, e.g.

Example:
   $x[0] = 0;
   $x[1] = "zero";

Variable interpolation into strings is somewhat restricted, compared to Perl. Interpolation is only done if the index is a numerical constant or a variable value, not if it is an expression:

Example:
   $x[0] = 0;
   $x[1] = 1;
   $i = 0;
   print "$x[0]\n";    # Legal
   print "$x[$i]\n";   # Legal, yielding same as above
   print "$x[$i+1]\n"; # *** Illegal -- expression in index ***

You may use vector variables and scalar variables with identical names. For example, it is legal to have both `$a' (a scalar) and `$a[0]' (a vector element).

Assigning to a user vector may be done in several ways. The most common way when dealing with data is with the `split' command. To assign without having to read data, you may assign to a list, as in

Example:
   @depths = (1, 10, 100);
   print "$depths[0] $depths[1] $depths[2] (expect '1 10 100')\n";

Test file `tst-variables.gre':

Test file `tst-variables2.gre':

3.4.2: System Variables

3.4.2.1: Scalar system variables

`$_' (may also be vector), `$0' (name of command file, also stored in `$_command_file'), `$_arrow_size', `$_command_file' (name of command file, also stored in `$0'), `$_time' (date and time when this Gre job was started), `$_exit_status', `$_fontsize', `$_graylevel', `$_missingvalue', `$_symbolsize', `$_linewidth', `$_linewidth_axis', `$_linewidth_symbol', `$_xmargin', `$_xsize', `$_xleft', `$_xright', `$_ytop', `$_ybottom', `$_ysize', `$_eof', and `$_version'.

3.4.2.2: Vector system variables

`$_' (used as in Perl, read by the `<>' operator), `$_x' (holds x data), `$_y' (holds y data), `$_z' (holds z data, normally for contouring), `$_u' (holds u data, normally for arrow plots), and `$_v' (holds v data, normally for arrow plots).

In addition, if the command-line arguments used in invoking `gre' contained extra items beyond the name of the command file, these items are stored in the vector `$ARGV' (as in Perl). For example, the `gre' code (in `foo.gre' say)
   for ($i = 0; $i <= $#ARGV; $i++) {
       open(IN, $ARGV[$i]) or die;
       # ... do stuff with this file
       close(IN);
   }
invoked with the command
   gre foo.gre station1.dat station2.dat station3.dat
will do "stuff" on each of the indicated data-files.

3.5: Mathematics

The syntax of mathematical expressions in `gre' is identical to that in Perl, very close to Gawk, and quite close to C and FORTRAN.

Test file `tst-math.gre':

3.5.1: Builtin mathematical constants

`gre' has a few mathematical constants built in as variables:

   `$_pi' = 3.14...  # geometrical quantity Pi
   `$_e'  = 2.72...  # base of natural logarithms

3.5.2: Precision used in calculations

Calculations in `gre' are done in double-precision. This yields about 15 digits of precision. For example,
   printf "%.30f\n", 1/6;
prints
   0.166666666666666657414808128124

The column data are stored in single-precision, not double-precision. In practice this seldom makes any difference since data are seldom reported to 15 digit precision. (If enough users ask, the `gre' author will switch to double-precision for the column data also.)
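
The difference between the two precisions can be seen outside `gre'. The following is a minimal sketch in Python (used here only for illustration; it is not `gre' code, and the helper name `to_single' is an assumption of this sketch). It rounds a double-precision value to single precision, much as would happen to a value stored in the column data:

```python
import struct

def to_single(value):
    """Round a double-precision float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

double_sixth = 1 / 6
single_sixth = to_single(double_sixth)

print("%.30f" % double_sixth)  # double precision: about 15 good digits
print("%.30f" % single_sixth)  # single precision: about 7 good digits
```

The two printed values agree only in roughly the first seven significant digits, which is the practical cost of single-precision storage.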

3.5.3: Efficiency considerations

Constant arithmetic expressions are computed at compile time, not at run time, so you needn't worry about simplifying such expressions even if they are within a loop that is executed many times. Thus, you won't suffer a runtime performance degradation by using `2/3' in an expression instead of `0.666', and you'll get better accuracy (unless you are patient enough to type the 15 or so digits of accuracy used in Gre).

3.5.4: Underline convention in numerical constants

For convenience, it is permitted to use an underline character (`_') within numerical constants, to separate "thousands" groupings. Thus you can write `$x=1_000_000;' instead of `$x=1000000;', which is quite a lot harder to read. All `_' characters in constants are skipped. Thus you could, if you wanted, write a constant as `1_00_00'. However, `gre' will issue a compile-time warning for such constructs, because it wants the underscores to be spaced at intervals of three digits. The number will be accepted even if a warning is given. (Perl users will note that `gre' is pickier than Perl, since Perl only issues this type of warning if the warning flag is turned on.)
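
The grouping rule can be sketched as follows, in Python rather than `gre'. This is only an illustration under stated assumptions: it handles integer constants only, the function name `parse_constant' is hypothetical, and the exact test that `gre' applies before warning is assumed to be "a leading group of one to three digits followed by groups of exactly three".

```python
import re

# Assumed rule for this sketch: a leading group of 1-3 digits, then
# groups of exactly three digits, each preceded by one underscore.
WELL_GROUPED = re.compile(r"^\d{1,3}(_\d{3})+$")

def parse_constant(text):
    """Return (value, warn): the integer value with underscores skipped,
    and whether the underscore spacing would merit a warning."""
    value = int(text.replace("_", ""))
    warn = "_" in text and WELL_GROUPED.match(text) is None
    return value, warn

print(parse_constant("1_000_000"))  # (1000000, False)
print(parse_constant("1_00_00"))    # (10000, True)
```

In both cases the value is accepted; only the second would draw a warning.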

3.5.5: Convention on division of constants

One important difference from C (and FORTRAN) is that all constants are considered to be floating point in `gre'. In this regard, `gre' follows the convention used in Perl, Awk, and other modern interpreted languages. Thus the statement
   $x = 1/2;
assigns the value `0.5' to the variable. In FORTRAN and in C, the lack of decimal points would force integer arithmetic and the result would be `0'. As a test of the utility of the FORTRAN convention, ask 10 programmers to indicate the value of `-1/2', and see if they all agree on whether it is `-1' or `0'. Note that the `floor()' function may be used to convert floating-point values into integer values (see Builtin functions that take one argument). This function, borrowed from C, produces values that 10 programmers would probably agree on!
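
The `-1/2' puzzle can be tried in any language that offers both kinds of rounding. Here is a small sketch in Python (not `gre'), whose `/' operator also does floating-point division; it is offered only to show the two candidate answers side by side:

```python
import math

quotient = -1 / 2            # floating-point division, as in gre and Perl

print(quotient)              # -0.5
print(math.floor(quotient))  # -1: rounds toward minus infinity
print(math.trunc(quotient))  # 0: rounds toward zero
```

Flooring always rounds toward minus infinity, which is why its answer is easy to agree on; truncation toward zero is the other common convention and gives the competing answer.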

3.5.6: Operator precedence

Within mathematical expressions, the usual operator precedence is used. For example, multiplications are done before additions. The order of precedence is, from lowest to highest: (1) addition (`+') and subtraction (`-'); (2) multiplication (`*') and division (`/'); and (3) exponentiation (`**'). Operators of equal precedence, such as addition and subtraction, are done in left-to-right order. Parentheses may be used to override these operator precedence rules.
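
As a quick check of these rules, here is a sketch in Python (not `gre'), which happens to use the same operator symbols and the same relative precedence for the cases shown. It is an illustration of the rules as stated above, not a test of the `gre' parser itself.

```python
# Exponentiation binds tighter than multiplication, which binds
# tighter than addition: 2 + 3 * 4 ** 2 is 2 + (3 * 16).
default_order = 2 + 3 * 4 ** 2

# Parentheses override the default order.
forced_order = ((2 + 3) * 4) ** 2

# Operators of equal precedence group left to right: (10 - 4) - 3.
left_to_right = 10 - 4 - 3

print(default_order, forced_order, left_to_right)  # 50 400 3
```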

3.5.7: Comparison operators

Along with the normal algebraic operators, various operators are provided to compare numerical values. For example, `($a < $b)' is an expression which has the value `1' if variable `$a' is less than variable `$b', and zero otherwise. The comparison operators are mostly used in conditional and looping statements. They are: `<' (less than), `<=' (less than or equal to), `>' (greater than), `>=' (greater than or equal to), `==' (equal to), and `!=' (not equal to).

Logical operations, also called boolean operations, are done with `||' or `or' (for logical "or"), `&&' or `and' (for logical "and") and `!' or `not' (for logical negation).

The question-operator, borrowed from C-like languages, has the form `expr_test ? expr_true : expr_false'. This construct reduces to a single value, equal to `expr_true' if the test-expression `expr_test' is true, or to `expr_false' otherwise. Thus one might say `$min = ($a < $b) ? $a : $b;' to assign to `$min' the value of `$a' or `$b', whichever is smaller. The expressions after the `?' character may be numerical or textual (and needn't be of the same type); `gre' will appropriately assign a type to the result depending on whether the "true" or "false" branch is selected by the test value.

As in most other C-like languages, logical expressions are evaluated from left to right, with evaluation stopping as soon as the value can be determined. For example, consider the expression
   0 && die "will not print\n";

(Aside: since the expression is surely zero regardless of the return value for the `die' command, the `die' command will not be executed.)

The same rule applies to the "or" operator. In fact, it is very common (and highly recommended) to use this in checking for whether files could be opened, e.g.
   open(IN, "input_file") || die "Cannot open file 'input_file'\n";

Test file `tst-string-comp.gre':

3.5.8: Mathematical subroutines

Calling mathematical subroutines is conventional, with parentheses around the arguments in almost all cases. Note that `gre' departs from Perl in this. Perl lets you omit the parentheses in cases where the parser can figure it out based on precedence rules. That's okay if you know the rules, but pretty confusing otherwise!

The "in almost all cases" phrase indicates that the parentheses are optional in `die' and a few other commands, covering cases in which almost no Perl programmers would use parentheses. Thus, you can write
   die "Cannot handle negative values" unless $x > 0;
instead of
   die("Cannot handle negative values") unless $x > 0;

However, `gre' won't understand
   $x = sin 0;  # broken
when you mean
   $x = sin(0);  # okay
The reason is simply that the former (Perl-ish) convention has always been confusing to the author, who would have to run a test program to figure out what value would be printed by the following
   print sin 0 + 3.14159;  # Illegal in Gre but ok in Perl
Granted, Perl experts can parse the above statement easily, and predict the answer (pop quiz: is it zero or Pi?), but users of other languages would have to guess, so `gre' has chosen to disallow such syntax, and to require that sine, cosine, and other subroutines be called with parentheses.

3.5.8.1: Builtin Functions that take no arguments

  • `rand()' returns a pseudo-random number uniformly distributed between 0 and 1. See also `srand(x)'.

    This subroutine is based on the system routine `drand48', and therefore the pseudo-random sequence will be the same each time `gre' is invoked; to get a different sequence use `srand($x)' where `$x' is an integer.

  • `rand_gaussian()' returns a pseudo-random number from a Gaussian distribution with mean of 0 and standard deviation of 1. See also `srand(x)'.

    To create a random number `$r' with a mean of `$M' and a standard-deviation of `$S', use
       $r = $S * rand_gaussian() + $M;
    

    This subroutine is based on the system routine `drand48', and therefore the pseudo-random sequence will be the same each time `gre' is invoked; to get a different sequence use `srand($x)' where `$x' is an integer.

  • `version()' returns the version number of `gre' in numeric form. For example, `version()' returns `0.0115' for version `0.1.15' of `gre'. The value in double-dotted textual form (e.g. `0.1.15') is available in the `$_version' builtin variable.
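
The scaling rule given above for `rand_gaussian()' (multiply by the desired standard deviation, then add the desired mean) can be checked numerically. The sketch below is in Python, not `gre'; Python's `random.gauss' stands in for `rand_gaussian()' and `random.seed' for `srand(x)', and it uses a different generator than `drand48', so the actual numbers will not match those from `gre'. The variable names and the sample size are choices made for this sketch.

```python
import random
import statistics

random.seed(12345)  # fixed seed, so the sequence repeats from run to run

target_mean, target_sd = 10.0, 2.0  # play the roles of $M and $S
samples = [target_sd * random.gauss(0.0, 1.0) + target_mean
           for _ in range(20000)]

# The sample statistics should land close to the requested values.
print(round(statistics.mean(samples), 1))
print(round(statistics.stdev(samples), 1))
```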

Test file `tst-random.gre':

3.5.8.2: Builtin Functions that take one argument

  • `abs(x)' returns absolute value of `x'.
  • `acos(x)' returns inverse cosine of `x', in radians.
  • `acosh(x)' returns inverse hyperbolic cosine of `x'.
  • `asin(x)' returns inverse sine of `x', in radians.
  • `asinh(x)' returns inverse hyperbolic sine of `x'.
  • `atan(x)' returns inverse tangent of `x', in radians.
  • `atanh(x)' returns inverse hyperbolic tangent of `x'.
  • `ceil(x)' returns smallest integer which is not smaller than `x'.
  • `cm_to_pt(x)' returns the value in PostScript "points" corresponding to the indicated value in centimeters.
  • `cos(x)' returns cosine of `x', where `x' is in radians.
  • `cosh(x)' returns hyperbolic cosine of `x'.
  • `exp(x)' returns 'e' raised to indicated power.
  • `exp10(x)' returns 10 raised to indicated power.
  • `erf(x)' returns error-function of indicated argument, i.e. `2/sqrt(pi)*integral from 0 to x of exp(-t*t) dt'.
  • `erfc(x)' returns complementary error-function of indicated argument, i.e. `1-erf(x)'.
  • `floor(x)' returns largest integer which is not larger than `x'.
  • `int(x)' returns integer part of `x', e.g. `2==int(2.1)' and `-2==int(-2.1)' both apply.
  • `ln(x)' returns natural logarithm, that is log base e.
  • `log10(x)' returns log base 10.
  • `open()' opens a file for reading.
  • `print' prints something (see Print).
  • `printf' prints something (see Printf).
  • `pt_to_cm(x)' is opposite of cm_to_pt.
  • `rewind()' rewinds a file.
  • `sin(x)' returns sine of `x', where `x' is in radians.
  • `sinh(x)' returns hyperbolic sine of `x'.
  • `sqrt(x)' returns square root of `x'.
  • `srand(x)' sets seed for random number, based on integer `x', returning 1. See also `rand()' and `rand_gaussian()'.
  • `tan(x)' returns tangent of `x', where `x' is in radians.
  • `tanh(x)' returns hyperbolic tangent of `x'. For example, here is how you might show a graph of the hyperbolic tangent:
       # Graph tanh(x) for x ranging from -3 to 3.
       $i = 0;
       for ($x = -3; $x <= 3; $x += 0.1) {
           $_x[$i] = $x;
           $_y[$i] = tanh($x);
           $i++;
       }
       set y axis -1 1 1 0.1;
       set x axis -3 3 1;
       set x name "X";
       set y name "tanh(X)";
       draw curve;
    

  • `length("string")' returns the number of characters in the string. (Perl users note: this subroutine requires an argument; unlike Perl, it does not default to an argument of `$_' if called with no argument.)

  • `text_ascent("string")' returns the ascent, in centimeters, for the present font. Actually, the argument is ignored and this subroutine returns the ascent for the font (i.e. the maximal ascent for any character that can be drawn). `gre' consults an internal table of PostScript font metrics to do the calculation.
  • `text_descent("string")' returns the descent, in centimeters, for the present font. Actually, the argument is ignored and this subroutine returns the descent for the font (i.e. the maximal descent for any character that can be drawn). `gre' consults an internal table of PostScript font metrics to do the calculation.
  • `text_width("string")' returns the width, in centimeters, that the indicated string would occupy if drawn in present font size.
  • `user_to_cm_x(xuser)' returns page x-location of the indicated x-value. That is, it returns the distance from the indicated point, if it were plotted, to the left-hand side of the page.
  • `user_to_cm_y(yuser)' returns page y-location of the indicated y-value. That is, it returns the distance from the indicated point, if it were plotted, to the bottom side of the page.

3.5.8.3: Builtin Functions that take two arguments

  • `atan2(x,y)' returns arctangent of slope `y/x'.
  • `max(x1,x2)' returns `x1' or `x2', whichever is larger.
  • `min(x1,x2)' returns `x1' or `x2', whichever is smaller.
  • `pow(x,y)' returns `x' raised to power `y', just as the expression `x**y' does.

Test file `tst-builtin.gre':

3.5.8.4: Builtin Functions that take more than two arguments

  • `unlink("file1", ...)' deletes indicated file(s), and returns number of files successfully deleted.

3.5.8.5: Subroutines for physical properties of seawater

The subroutines for physical properties of seawater have names starting with `sw' (for SeaWater). The salinity unit, `PSU', is 'practical salinity unit', basically a synonym for PPT (parts per thousand). The pressure unit, `dbar', is decibar (roughly corresponding to the pressure increase for one meter of depth increase).

  • `sw_alpha(S, T, p)' -- seawater thermal expansion coefficient [1/degC] in terms of salinity [PSU], in-situ temperature T [degC] and pressure p [dbar], by numerically differentiating the sw_density() function with a 1e-5 degC first-difference.
  • `sw_beta(S, T, p)' -- seawater haline contraction coefficient [1/PSU] in terms of salinity [PSU], in-situ temperature T [degC] and pressure p [dbar], by numerically differentiating the sw_density() function with a 1e-5 PSU first-difference.
  • `sw_density(S, T, p)' -- seawater density [kg/m3] in terms of salinity [PSU], in-situ temperature T [degC] and pressure p [dbar]
  • `sw_freezing_temperature(S, p)' -- seawater freezing temperature [degC] in terms of salinity [PSU] and pressure p [dbar]
  • `sw_lapse_rate(S, T, p)' -- seawater lapse-rate [degC/m] in terms of salinity [PSU], in-situ temperature T [degC] and pressure p [dbar]
  • `sw_potential_temperature(S, T, p, pref)' -- seawater potential temperature [degC] in terms of salinity [PSU], in-situ temperature T [degC], in-situ pressure p [dbar] and reference pressure pref [dbar]
  • `sw_salinity (R, T, p)' -- seawater salinity [PSU] in terms of conductivity ratio R [unitless], in-situ temperature T [degC] and pressure p [dbar]
  • `sw_spice (S, T, p)' -- `spice' function as defined by Jackett and McDougall (1985) Deep-Sea Research 32, 1195-1207, in terms of salinity [PSU], in-situ temperature T [degC] and pressure p [dbar]
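
The first-difference technique mentioned for `sw_alpha' and `sw_beta' can be sketched as follows, in Python rather than `gre'. Everything here is an assumption made for illustration: `toy_density' is a hypothetical linear equation of state with made-up coefficients (it is NOT the formula behind `sw_density()'), the pressure argument is carried along but unused, and the usual sign conventions are assumed (expansion coefficient is minus the relative density change per degree; contraction coefficient is the relative density change per PSU).

```python
def toy_density(S, T, p):
    """Hypothetical linear equation of state [kg/m3]; NOT gre's sw_density()."""
    rho0, T0, S0 = 1027.0, 10.0, 35.0
    return rho0 * (1.0 - 1.7e-4 * (T - T0) + 7.6e-4 * (S - S0))

def alpha(S, T, p, dT=1e-5):
    """Thermal expansion coefficient [1/degC] by forward first-difference."""
    rho = toy_density(S, T, p)
    return -(toy_density(S, T + dT, p) - rho) / (dT * rho)

def beta(S, T, p, dS=1e-5):
    """Haline contraction coefficient [1/PSU] by forward first-difference."""
    rho = toy_density(S, T, p)
    return (toy_density(S + dS, T, p) - rho) / (dS * rho)

print(alpha(35.0, 10.0, 0.0))  # close to the 1.7e-4 built into toy_density
print(beta(35.0, 10.0, 0.0))   # close to the 7.6e-4 built into toy_density
```

Because the toy density is linear, the first difference recovers the built-in coefficients almost exactly; with a realistic, nonlinear density function the result would depend weakly on the step size.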

Test file `tst-sw.gre':

3.5.8.6: Subroutines for map projections

   proj_lonlat_to_x($lon, $lat)
   proj_lonlat_to_y($lon, $lat)

The above subroutines are provided to convert world-coordinate (lon/lat) pairs to geometrical coordinates, according to a map projection specified by the `set map projection' command (see Set map projection).

In the argument list, `lon' appears before `lat', although a geographer might have expected the reverse order. Longitudes are measured positive to the east, so that degrees-west are indicated by negative `lon' values.

`proj_lonlat_to_x($lon, $lat)' returns the x-component of the indicated ($lon, $lat) pair after it has passed through the appropriate map projection transformation routine. The y-component is returned by the corresponding subroutine `proj_lonlat_to_y($lon, $lat)'.

3.5.8.7: User-defined Subroutines

The `sub' command is used to create user-defined subroutines. For more details see Sub.

3.6: Operations on text

3.6.1: Entering text

Text is entered within double-quotes, as in the following examples.
   set x name "Temperature";
   draw label "Case A" at 3 3 cm;
   draw title "Asymptotic temperature dependence";

Text strings may be simple, as above, or they may contain ``interpolated'' components, resulting from system commands or from substitution of the values of variables. These are covered in the next few sections.

`Gre' uses PostScript fonts, and can handle both unaccented English text and accented non-English European text properly. (It cannot yet handle other languages, e.g. oriental languages.) The default ``font encoding'' vector (see any book on PostScript fonts) used in `gre' is the ISO-Latin-1 vector. This permits both unaccented and accented characters.

The rule for entering accented text is simple: if you can enter it in your text editor, `gre' will be able to draw it! It's up to you to determine how to enter the accents. Most text editors permit this. Since many users will prefer the Emacs editor, a few words about that are in order.

Consider the task of inserting French text with the Emacs text-editor. There are several ways of doing this (and you may wish to consult your emacs info manual). A method that works in emacs-19 up to current emacs-20 versions uses the emacs `iso-transl.el' package by putting the following in your `~/.emacs' file:
   (require 'iso-transl)
   (iso-transl-set-language "French")
   (standard-display-european t)
Loading this package defines three ways of entering the non-ASCII printable characters with codes above 127: the prefix `C-x 8', or the `Alt' key, or a dead accent key. For example, you can enter uppercase A-umlaut as `C-x 8 " A' or `Alt-" A' (if you have an Alt key) or `umlaut A' (if you have an umlaut/diaeresis key).

A more recently introduced method is to enter the mode which allows quick insertion of iso-latin-1 characters. Do the Emacs command `M-x iso-accents-mode' (either manually, or in a hook that's done automatically). Now, suppose the x-axis is to represent temperature. All you'd have to do is type in the command
   set x name "Temp'erature"
As you type, the quote mark will disappear, and reappear as an accent on the `e'. Then `gre' will recognize this accented `e', and it will draw the accent on the axis label.

If, for any reason, you desire the so-called ``standard'' PostScript font-encoding vector, use the `set font encoding' command (see Set font encoding).

3.6.2: System calls in quoted strings

When quoted strings are evaluated at run-time, `gre' checks to see whether they contain pairs of the back-tic character (i.e. ``'). This character is then interpreted as it would be in a unix shell: whatever text lies within the back-tic pair is sent to the operating system, and the result of the evaluation is then inserted into the string. (The trailing newline is removed from the system output.) Thus, you can print the date with the command
   print "the date is `date`\n";

3.6.3: Variable interpolation in quoted strings

When a scalar variable is found within a string enclosed in double quotes, `gre' "interpolates" the variable value into the string. For example, here is how to print the decimal value of the fraction 1/2:

Example:
   $x = 1/2;
   print "$x equals 0.5\n";

Interpolation into quoted strings is done for both numerical and text variables.

To prevent interpolation, you may ``protect'' the dollar-sign by preceding it with a backslash. This is useful when you'd like to use dollar-signs in calls to the operating system, e.g. in the following example, the tenth and eleventh columns of the file `wide_file.dat' are plotted as x and y respectively.
   open(IN, "gawk '{print \$10,\$11}' wide_file.dat |") or die;
   read columns IN x=1 y=2;

3.6.4: Textual comparisons -- simple method

Quoted character strings may not be compared using `==' and `!='. Neither may variables which contain strings be compared in this way. Instead, textual items, whether quoted-string constants or variables, are compared for equality and inequality with the operators `eq' and `ne' respectively. Variable interpolation is done before the strings are compared. Also, quoted strings may be compared to variables. If the variable is of string type, the expected results are achieved. If the variable is of numeric type, then the numeric value is converted to a string before the comparison -- this is almost always a bad idea, because the format may differ from what you expect it to be (e.g. in number of trailing zeros after the decimal point, etc).

Example:
   $x = "hello";
   die "failed comparison 1\n" if ($x ne "hello");
   die "failed comparison 2\n" if ("$x" ne "hello");

3.6.5: Textual comparisons -- regular expression method

As in Perl, a facility is provided for comparing textual items, using the powerful and familiar unix ``regular expression'' notation. The regular expressions are written inside forward slashes (i.e. `/expression/'), and are matched against the builtin variable `$_'. The item in slashes is given the value 1 if the expression matches `$_' and 0 if not. For example, the following scans the input file, and prints all lines with the three-character sequence "gre" on them:
   while(<>) { print if /gre/; }

In unix parlance, the regular expressions are ``extended,'' meaning for example that subitems are enclosed in raw parentheses, as for example `an(item)here', not in backslashed parentheses such as `an\(item\)here'. It is important to emphasize that this unix-based definition of ``extended'' is quite different from the meaning the word has in Perl. In fact, all regular expressions in Perl are ``extended'' in the unix definition.

Normally, pattern-matching obeys the case ('a' versus 'A') used in the regular expression, so that in the example above, a line consisting of the upper-case sequence `GRE' would not match the lower-case pattern. Case-independent pattern matching may be done by putting the character `i' after the closing slash. Thus, the following program matches lines with `gre', `Gre', `GRe', etc.
   while(<>) { print if /gre/i; }

The contents of the regular expression may include constant items (`/gre/') and/or variables (`/$program/'). At present, `gre' `compiles' the regular expressions at runtime, each time it executes them. Eventually `gre' will pre-compile fixed regular expressions (ones not containing variables). On a 300 MHz linux pentium, this will save about 50 microseconds per runtime evaluation of a regular expression.

The syntax of regular expressions is discussed in another section (see Regular Expression Syntax).

Test file `tst-match.gre':

3.6.6: Concatenation of textual items with dot operator

Two textual items may be linked using the dot operator. The items may be quoted strings or textual variables, e.g.
   $hw = "hello" . " world";
   print "$hw\n";
   print $hw . "\n";
   
If any of the items is numeric, then it will first be converted to a textual form in the default output format (`%.14g', in terms of the C i/o function `printf'), e.g.
   print "following is 100 ..." . 10 * 10, "\n";
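
To see what the `%.14g' format does to various numbers, here is a sketch in Python (not `gre'), whose `%' operator follows the same C `printf' conventions. The helper name `to_text' is an assumption of this sketch.

```python
def to_text(value):
    """Format a number with the C-style "%.14g" conversion."""
    return "%.14g" % value

print("following is 100 ..." + to_text(10 * 10))  # following is 100 ...100
print(to_text(0.50))   # 0.5               (trailing zero dropped)
print(to_text(1 / 3))  # 0.33333333333333  (14 significant digits)
print(to_text(1e20))   # 1e+20             (switches to exponent form)
```

The second and fourth lines show why comparing a converted number against a hand-typed string such as "0.50" or "100000000000000000000" is fragile.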

3.6.7: Extraction of substrings

Extraction of substrings is done with the `substr' command, which matches the corresponding command in Perl. For example, the string `BC' can be created by `substr("ABCDE", 1, 2)'. The source string is the first parameter. The next parameter indicates an offset of 1 character from the beginning of the string and the third parameter indicates to extract 2 characters. For details see Substr.
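
The offset-and-length convention can be sketched in Python (not `gre') as a one-line helper. The helper covers only the non-negative offset case described above; any other behavior of the real `substr' is outside this sketch.

```python
def substr(source, offset, length):
    """Return `length` characters of `source`, skipping the first `offset`."""
    return source[offset:offset + length]

print(substr("ABCDE", 1, 2))  # BC
```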

3.6.8: Modifying strings with `s///' operator

Partial support is provided for Perl-like string modification. There are two forms, one for modifying the `$_' variable, the second for modifying any other variable.

In the explanation here, the string delimiter will be taken to be forward-slash (i.e. `/'), but any punctuation character will do just as well. Thus, one might use
   $date =~ s:04/:Apr :;
to convert a date from a numeric format to a named month.

3.6.8.1: Modifying `$_'

As in Perl, `gre' lets you modify the builtin string `$_' using the `s///' operator. (Recall that `$_' is read in with the `<>' operator.) Between the first two slashes is an input to be searched for in `$_', and between the second and third slash is the replacement. Two options are allowed after the final slash. The option `i' means to ignore the case in the input. The option `g' means to do the change globally, i.e. repeatedly on this input line.

For example, consider the data file (named `test-s.dat'):
   hello
   goodbye
If this is used with the commandfile
   open(IN, "test-s.dat") or die "cannot open test-s.dat";
   while(<IN>) {
       s/l/L/g;
       s/o/O/;
       print;
   }
it will yield the output
   heLLO
   gOodbye
Note that the `g' option changed both lower-case `l' characters in `hello' to upper-case, but without it only the first of the two `o' characters in `goodbye' was changed to upper-case.

It is not permitted to have the first string be blank. That is,
   s//something/;
is not allowed. (It is permitted in Perl, and it matches to the beginning of the string. However, the `gre' author couldn't find this to be a documented feature of Perl, and so left it out of `gre'.)

It is permitted, and quite common, to have the second string be blank. This means to delete the indicated string, e.g.
   s/junk//;
will delete the first occurrence of the string `junk' and
   s/junk//g;
will delete all occurrences of it.

The `s///' operator has a return value, which is the number of times the target string was encountered. Normally this is ignored, although the following might make sense.
   $num = 0;
   while(<>) {
       $num += s/junk//;
       print;
   }
   print "Replaced $num occurrences\n";
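
The effect of the `g' option and the return value can be modelled in Python with `re.subn', which returns both the new string and a replacement count. This is a sketch of the behaviour described above, not the `gre' implementation; the function name `substitute' and its keyword arguments are invented for the example.

```python
import re

def substitute(text, pattern, replacement, global_flag=False, ignore_case=False):
    """Mimic s/pattern/replacement/ with optional `g' and `i' options.

    Returns a pair: (new_text, number_of_replacements).
    """
    flags = re.IGNORECASE if ignore_case else 0
    # For re.subn, count=0 means "replace every match"; count=1 means first only.
    count = 0 if global_flag else 1
    return re.subn(pattern, replacement, text, count=count, flags=flags)

line = "junk and more junk"
first_only = substitute(line, "junk", "")
everywhere = substitute(line, "junk", "", global_flag=True)
print(first_only)   # (' and more junk', 1)
print(everywhere)   # (' and more ', 2)
```

Summing the second element of each result over all input lines gives a running total, much as the `$num' counter does.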
   

3.6.8.2: Modifying other variables

The Perl-like `=~' syntax is supported. For example,
   $s = "hello";
   $s =~ s/l/L/g; print "$s\n";
prints `heLLo', while
   $s = "hello";
   $s =~ s/l/L/;  print "$s\n";
prints `heLlo' since only the first instance will be changed.

Test file `tst-substitute.gre':

3.6.9: Printing into strings with `sprintf'

As in C-like languages, the `sprintf' function provides the ability to create strings using the formatting capabilities of the `printf' command. The syntax is e.g.
   $a = sprintf("Pi is %.3f to three decimal places", $_pi);
   print "$a\n";

For more on printing formats see Printf.

3.6.10: Regular expression syntax

Regular expressions may be used in various contexts.

`gre' handles regular expressions in a way that is similar to Perl, but a few things (namely, non-consuming items such as `\b') are not available.

For more on regular expressions, consult the online Perl documentation (type `man perlre' at the unix prompt) or any of a number of texts on the unix operating system.

Regular expressions in `gre' are of the ``extended'' type, as opposed to the ``basic'' type. The hallmark is that backslashes are not used to protect parentheses, e.g. `/foo(.*)bar/' may be used to match to the string ``fooOKAYbar,'' whereas in ``basic'' notation the expression would be written as `/foo\(.*\)bar/'.

The following is a brief summary of how regular expressions work.

  1. The usual POSIX regular expressions are handled. Briefly, the rules are as follows.

    `normal character'
    A normal character (not in the list below) matches to itself. Example: `gre' matches the character `g' followed by the character `r' followed by the character `e'. That is, it matches the string `gre'.
    `[]'
    Enclose a list of alternative characters. Example: `[gG]re' matches both `gre' and `Gre'.
    `()'
    Enclose an item that is to be remembered for later use. Several "remember" items may be present. The first one is stored in a variable named `$1', the second in `$2', etc. Example: `f(.*)bar' will match to the string `foobar', and `$1' will become defined as the string `oo'.
    `.'
    Matches to any character.
    `*'
    Match the item to the left, 0 or more times.
    `+'
    Match the item to the left, 1 or more times.
    `?'
    Match the item to the left, 1 or 0 times.
    `{n}'
    Match the item to the left, exactly n times.
    `{n,}'
    Match the item to the left, at least n times.
    `{n,m}'
    Match the item to the left, at least n but not more than m times.
  2. In addition to the above standard POSIX behavior, `gre' also follows some convenient conventions invented by Perl, as in the following list.

    `\s'
    Match a whitespace character (space, tab, or newline). Beware: `\s*' matches to any amount of whitespace -- including a zero amount -- so you'll likely wish to use `\s+' to match to whitespace separating non-whitespace items.
    `\S'
    Match a single nonwhitespace character.
    `\d'
    Match a single digit character.
    `\D'
    Match a single non-digit character.
    `\w'
    Match a single "word" character (alphanumeric plus "_"). Thus, `\w+' matches to a word, and ends on either whitespace or punctuation. (As with `\s*', the form `\w*' also matches a zero amount.)
    `\W'
    Match a single non-word character, e.g. space and punctuation symbols.
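
Several of the items listed above can be tried out in Python, whose `re' module treats these particular constructs the same way for the inputs shown. Python's engine is not a POSIX one, so this is an illustration of the notation and not a statement about how `gre' is built; the sample strings are invented.

```python
import re

# `()' remembers what it matched; Python's group(1) plays the role of `$1'.
match = re.search(r"f(.*)bar", "foobar")
remembered = match.group(1)
print(remembered)  # oo

# `[]' lists alternative characters.
both_cases = [bool(re.search(r"[gG]re", word)) for word in ("gre", "Gre", "GRE")]
print(both_cases)  # [True, True, False]

# `\s+' matches whitespace separating non-whitespace items.
fields = re.split(r"\s+", "10   20\t30")
print(fields)  # ['10', '20', '30']

# `\d' matches a digit, and `+' repeats the item 1 or more times.
numbers = re.findall(r"\d+", "x=12, y=345")
print(numbers)  # ['12', '345']

# `{n,m}' bounds the repeat count: between 2 and 3 copies of `a'.
bounded = [re.fullmatch(r"a{2,3}", s) is not None for s in ("a", "aa", "aaa", "aaaa")]
print(bounded)  # [False, True, True, False]
```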

3.7: Flow of Control

The usual C-like constructs are available for controlling the flow of a `gre' program, such as `if', `else', `do', `while' and `for'. In addition, the Perl-like constructs `unless' (opposite to `if'), `until' (opposite to `while'), and `foreach' are available.

Blocks within conditional and looping constructs are enclosed within curly braces. (C users should note that these braces are required even if the block contains only a single command; Perl users will find this familiar.)

Example:
   if (1) {
       print "This will be printed.\n";
   } else {
       print "This will not be printed.\n";
   }

White space is optional in `gre'. In the above, there is no need for the spaces around the parentheses in the `if' statement. Similarly, the opening brace could be put on a separate line if desired. The choice is merely aesthetic. In this manual, the spacing shown above will be used throughout, with an opening brace placed on the same line as conditional/loop tests, and with the code indented by four spaces.

Details:

3.7.1: Conditional statements

Conditional statements are of two varieties: (1) the positive-style `if' statement and (2) the negative-style `unless' statement. Each variety has two forms, one (``block'') form for controlling execution of a block of commands, the other (``trailing'') form for controlling execution of a single command. The block form is conventional in many programming languages and the trailing form is provided mainly by `gre' and Perl.

3.7.1.1: `if' statements

The block form of `if' has syntax
   if expr {true_block} else {false_block}
where `expr' may be any valid expression and the blocks may contain multiple commands.

The `else' part is optional.

Some users prefer to put the expression in parentheses, while others don't.

   if ($x<10) { ... # some prefer this
   if $x<10 { ...   # others prefer this

Example:
   if ($x < 0) {
       print "x is negative.\n";
   } else {
       print "x is zero or positive ...";
       print " with square root = ", sqrt($x), "\n";
   }

The trailing form of `if' has syntax
   single_cmd if (expr);

where `single_cmd' is any valid single-line command, including built-in commands such as `print' and `return', user-defined commands see Cmd, and mathematical expressions, including assignment to variables. There is no `else' clause in a trailing-if command.

Example:
   print "X is positive\n" if ($x > 0);
   die "Cannot handle negative x\n" if ($x < 0);
   $x = 0 if ($x < 0);     # Prevent x from being negative

Users familiar with C-like languages may find this form confusing because it puts the test at the end of the sentence, but Perl users may find it convenient for simple actions/tests.

3.7.1.2: `unless' statements

The `unless' syntax is analogous to the `if' syntax, except that the `expr' test expression is negated. Thus an example of the block form is
   unless ($x<0) {
       print "x is zero or positive\n";
   } else {
       print "x is negative\n";
   }

and an example of the `trailing' form is
   print "x is negative\n" unless ($x>=0);

Test file `tst-if.gre':

3.7.2: Loops

Loops have four forms, with reserved words `do', `while', `for', and `foreach'. The forms are almost identical to those in C-like languages such as Perl and awk.

3.7.2.1: `while' and `until' loops

Syntax:
   while expr { commands }

The command(s) within the braces are executed repeatedly, for as long as the test expression, `expr', is true. C/C++ users should note that the braces around the block are required, even if the block contains only a single command. Many programmers prefer to surround the expression with parentheses but these aren't required. (That is, `while ($x>0) {...' and `while $x>0 {...' are equivalent.)

Example:
   print "Numbers from 0 to 10:\n";
   $i = 0;
   while ($i <= 10) {
       print " $i\n";
       $i = $i + 1;
   }

The reverse of a `while' loop is an `until' loop, e.g.
   $i = 10;
   until ($i == 0) {
       $i--;
   }
   die "Error: i should be zero\n" if ($i != 0);

In both styles of loop (and pretty much everywhere that `gre' evaluates ``truth''), the test expression may be either a numerical expression, a string expression, or a file-reading command.

  • Numerical expressions are considered to be true if they evaluate to non-zero. The fragment above is a typical example.

  • String expressions are considered to be true if they are non-blank and do not translate to a zero value. (Thus, `"0.0"' evaluates to false, since it represents zero, but `"a0.0"' evaluates to true, because it is a nonblank string which does not translate to zero numerically.)

  • A special type of character string is the file-reading handle. This has the form `<IN>', where `IN' is the name of a file-handle opened by a command such as `open(IN, "name");' see Open. If no file-handle name is provided, that is if `<>' is written, then input from the standard input file (normally, the keyboard) is assumed -- this is equivalent to `<STDIN>'.

    The file-handle operator returns a character string equal to the next line of the indicated file. This string includes the newline at the end of the line, so that it will only be blank when the end of the file has been reached. Also, because the newline is attached, it will never represent zero numerically, so that reading a line containing e.g. `0.0' still makes the operator evaluate to true.

    There is an additional rule with regard to file-reading in `while' and `do-while' loops. If the input is not assigned specifically to a variable, then it is implicitly assigned to the variable `$_'. (This is not done in any other command, which might seem odd but that's the way Perl does it, and `gre' copies the established convention.) Since `print' will print out the value of `$_' by default, here is how you might print the contents of a file named `input_file':

       open(IN, "input_file");
       while(<IN>) {
           print;
       }
    

    and here is how you might count the number of lines in a file provided on standard-input:

       $lines = 0;
       while(<>) { $lines++; }
       print "$lines\n";
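
The truth rules listed above can be summarised in a short Python sketch. It is a model built only from the cases stated in this section; the function name `gre_truth' is hypothetical, and the exact set of strings that `gre' regards as numeric (here: plain decimal and exponent forms with no surrounding whitespace) is an assumption.

```python
import re

# Assumed shape of a numeric string: optional sign, digits with optional
# decimal point, optional exponent, and nothing else (no whitespace).
_NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")

def gre_truth(value):
    """Return True if `value` would count as true under the rules above."""
    if isinstance(value, (int, float)):
        return value != 0          # numbers: true if non-zero
    if value == "":
        return False               # blank string: false (e.g. end of file)
    if _NUMBER.fullmatch(value) and float(value) == 0.0:
        return False               # string that translates to zero: false
    return True                    # any other non-blank string: true

for sample in (1, 0, "0.0", "a0.0", "", "0.0\n"):
    print(repr(sample), gre_truth(sample))
```

The last sample is the interesting one: a line read from a file still carries its newline, so `"0.0\n"' counts as true even though `"0.0"' does not.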
    

Test file `tst-while.gre':

3.7.2.2: `do-while' and `do-until' loops

Syntax:
   do { commands } while (expr);
The indicated commands are performed as long as the test value evaluates to 'true'. Because the test value is evaluated at the bottom of the loop, the loop is always done at least once, in contrast to the `while' loop, e.g.
   print "Numbers from 0 to 10:\n";
   $i = 0;
   do {
       print " $i\n";
       $i = $i + 1;
   } while ($i <= 10);
The reverse of a `do-while' loop is a `do-until' loop, e.g.
   $i = 10;
   do {
       $i--;
   } until ($i == 0);
   die "Error: i should be zero\n" if ($i != 0);
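
Python has no `do-while' statement, but the bottom-of-loop test is easy to imitate, which makes the contrast with `while' explicit. The following is a sketch only; both function names are invented, and each returns the list of values the loop body saw.

```python
def while_loop(i, limit):
    """Test at the top: the body may run zero times."""
    seen = []
    while i <= limit:
        seen.append(i)
        i += 1
    return seen

def do_while_loop(i, limit):
    """Test at the bottom: the body always runs at least once."""
    seen = []
    while True:
        seen.append(i)
        i += 1
        if not (i <= limit):
            break
    return seen

print(while_loop(0, 3), do_while_loop(0, 3))      # [0, 1, 2, 3] [0, 1, 2, 3]
print(while_loop(100, 3), do_while_loop(100, 3))  # [] [100]
```

The two loops agree when the test is true on entry and differ only when it is false from the start.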

Test file `tst-do-while.gre':

3.7.2.3: Trailing `while' and `until'

A convenient form of `while' is the "trailing" form, in which a single command is repeated while a condition is true. For example, the following code prints the number 10:

Example:
   $i = 0;
   $i++ while ($i<10);
   print "$i\n";

The reverse is the "trailing until" form, in which a single command is repeated until a condition is true. For example, the following code prints the number 10:

   $i = 0;
   $i++ until ($i==10);
   print "$i\n";

3.7.2.4: `for' loop

Syntax:
   for (initial_statement; test_value; repeated_statement) {
       # commands to execute while test_value is nonzero
   }

Before the loop body is executed, the `initial_statement' is performed. Normally it is an initialization of some type. Then, if the `test_value' is nonzero, the contents of the loop are executed once. After the contents have been executed, the statement denoted `repeated_statement' is executed (often this is an increment of a counter), and the loop may repeat again, if `test_value' still evaluates to a nonzero value, e.g.
   print "Numbers from 0 to 10:\n";
   for ($i = 0; $i <= 10; $i += 1) {
       print " $i\n";
   }
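
The order in which the three parts of a `for' loop are used can be spelled out as an equivalent `while' loop. The Python sketch below does this with callables; the helper `c_style_for' and the small counting example are made up for illustration.

```python
def c_style_for(initial, test, repeated, body):
    """Run a C-style for loop built from four callables."""
    initial()            # performed once, before anything else
    while test():        # checked before every pass, including the first
        body()           # the contents of the loop
        repeated()       # performed after each pass, before the next test

state = {}
trace = []
c_style_for(
    initial=lambda: state.update(i=0),
    test=lambda: state["i"] < 3,
    repeated=lambda: state.update(i=state["i"] + 1),
    body=lambda: trace.append(state["i"]),
)
print(trace)       # [0, 1, 2]
print(state["i"])  # 3
```

Note that the counter ends one step past the last value the body saw, because the repeated statement runs before the final, failing test.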

Test file `tst-for.gre':

3.7.2.5: `foreach' loop

Syntax:
   foreach $x (value_1, value_2, ... value_n) { ... }  # style 1
   foreach $x (@array_name) { ... }                    # style 2

The loop is executed repeatedly, with the value of the variable denoted `$x' in this example being set in turn to each of the values in the list (style 1) or the array (style 2), e.g.
   print "Some prime numbers are:\n";
   foreach $a (3, 5, 7, 11, 13, 17, 19) {
       print "  $a\n";
   }

`foreach' loops are very useful for processing a list of data files. Sometimes you'll know the list in advance, e.g.
   foreach $file ("a.dat", "b.dat", "c.dat") {
       open(FILEHANDLE, $file);
       # Do something with the data in this file
       close FILEHANDLE;
   }
   

and other times you'll have to "glob" the list see Glob, e.g.
   foreach $file (glob("*.dat")) {
       open(FILEHANDLE, $file);
       read columns FILEHANDLE x=1 y=2;
       draw curve;
       close FILEHANDLE;
   }
   

and still other times you might want to have the values specified on the commandline, e.g.
   foreach $opt (@ARGV) { print "Option '$opt'\n"; }

The above could have also been written
   for ($i = 0; $i <= $#ARGV; $i++) {print "Option '$ARGV[$i]' \n";}
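
That the two styles visit the same items in the same order can be checked with a small Python sketch. Here the variable `last' plays the role of `$#ARGV' (the index of the final element); the function names and the sample argument list are hypothetical.

```python
def by_value(args):
    """Visit each item directly, as `foreach' does."""
    return ["Option '%s'" % arg for arg in args]

def by_index(args):
    """Visit each item through a counter, as the `for' form does."""
    last = len(args) - 1   # index of the final element
    out = []
    i = 0
    while i <= last:
        out.append("Option '%s'" % args[i])
        i += 1
    return out

sample = ["-v", "-o", "plot.ps"]
print(by_value(sample) == by_index(sample))  # True
```

With an empty list, `last' is -1 and the indexed loop body never runs, which matches the value-based loop.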

Test file `tst-foreach.gre':

(c) 1997-2000 Dan E. Kelley, email Dan.Kelley@Dal.Ca