Plum Hall, Inc.

Plum Hall is a world leader in making compiler validation suites and software for the programmer and software development community.

The Plum Hall Validation Suite for C™

The Plum Hall Validation Suite for C is the industry's favorite set of C programs for testing and evaluating C language compilers (including interpreters). It's ideal for your organization if you evaluate, support, or implement C compilers.

If you are responsible for the evaluation of three competing compilers, you can test each compiler with The Plum Hall Validation Suite for C and obtain a detailed inventory of bugs and "features" of each compiler. Because this suite was written in close association with the development of the ANSI/ISO C Standard, it is less likely to overlook significant areas of the language, when compared to internally-developed compiler-comparison benchmarks. And, The Plum Hall Validation Suite for C has no business association with any compiler vendor; it provides an informed but unbiased evaluation tool.

You may be responsible for internal support for several different C compilers. The Plum Hall Validation Suite for C can be used to provide your user community with detailed portability information like "Compiler A lacks such-and-such features and has the following special peculiarities..."

To the compiler implementer, The Plum Hall Validation Suite for C provides a collection of carefully chosen test cases -- currently over 56,000 lines of C source, with new sections added regularly under Maintenance. It can supplement internally-developed test suites with a fresh perspective outside the internal "cultural tradition" and has been written with a concern for the modern testing criterion of "positive marginal utility". In other words, each test case has a specific rationale for its inclusion. The Plum Hall Validation Suite for C is not "bulked up" with redundant examples.

The Plum Hall Validation Suite for C was written by specialists in the C language who are well-versed and recognized as major contributors to the development and evolution of the ANSI/ISO C Standard.

Each section of the suite is keyed to the corresponding section of the Standard, and Plum Hall is committed to keeping The Plum Hall Validation Suite for C the most authoritative validator for the ANSI/ISO C Standard.

PRODUCT OVERVIEW

Each section of The Plum Hall Validation Suite for C builds upon the correctness established by the previous sections. This description will explain how each section of the suite works, and what assumptions are made about previous sections.

CONFORM - This section tests basic conformance to the C language Standard. By configuring the defs.h file appropriately, a compiler can be checked for conformance to the full ANSI/ISO Standard, or compatibility with several earlier features of C.

EXIN - The EXecutive INterpreter is a script language processor. When it is built and passes its own test set, the script processor is used as a basic tool in subsequent sections of the suite.

COVER - This section uses EXIN scripts to generate self-checking C programs that test coverage of all permutations of operators and data types.

LIMITS - EXIN scripts are used to determine the size of certain compile time limits, e.g., significant length of identifiers or how deeply include files may be nested.

EGEN - The Expression GENerator is a test program, written in C, which generates self-checking expressions of arbitrary complexity. It is the tool used by the STRESS section.

STRESS - Since it is impossible to test all possible legal expressions, a sampling approach is taken. Under the control of EXIN scripts, EGEN is used to generate complex self-checking expressions. These can be completely random, or under the control of an expression template.

CONFORM

The CONFORM section of the Suite tests a compiler for conformance to the ANSI/ISO C Standard and consists of five C programs that test all of the required features of the language, preprocessor, and libraries.

The first two programs, ENVIRON and LANG, test the basic language and preprocessor. They are organized according to the section numbers of the ANSI/ISO Standard Document. PREC1 and PREC2 test operator precedence. All C language operators are tested in combination with all others. LIB tests the C library.

Each sub-section of the Standard has a corresponding function in these programs. Each program uses utility routines for checking that two integers are equal (iequals), that two addresses match (aequals), etc.

Errors are reported by writing a message of the form:

	     ERROR in c35.c line 234: (12) != (13)

Each program prints a summary of the form:

	     ***** 2 errors found in LANG *****
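The checking routines described above can be pictured with a minimal sketch. The names iequals, aequals, report and Filename appear in the suite's sample output and generated programs; the bodies, return values, error counter, and the wording of the address message below are assumptions made for illustration and are not the suite's actual source.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the suite's utility routines. */
const char *Filename = "c35.c";   /* name printed in error messages */
static int Nerrors = 0;           /* running error count (assumed)  */

/* Compare two ints; report a mismatch in the format shown above.
   Returns 1 on a match and 0 on a mismatch. */
int iequals(int line, int actual, int expected)
{
    if (actual == expected)
        return 1;
    printf("ERROR in %s line %d: (%d) != (%d)\n",
           Filename, line, actual, expected);
    ++Nerrors;
    return 0;
}

/* Compare two addresses; the message wording is an assumption. */
int aequals(int line, const void *actual, const void *expected)
{
    if (actual == expected)
        return 1;
    printf("ERROR in %s line %d: addresses differ\n", Filename, line);
    ++Nerrors;
    return 0;
}

/* Print the summary line and return the error count. */
int report(const char *program)
{
    printf("***** %d errors found in %s *****\n", Nerrors, program);
    return Nerrors;
}
```

A test function would call iequals(__LINE__, expression, expected) once per check and finish with report("LANG"), so that every failure is traceable to a source line.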

CONFORM/ERRTESTS

The Standard requires that diagnostic messages be produced for a source file which violates a syntax or constraint rule. The "error-tests" portion of CONFORM contains little C source files, each of which violates one such rule. Each file is to be compiled in the test environment, and the list of errors is to be checked against the "checklist" files provided in this section.

An example, which violates the syntax rules of 3.5, Declarations:


          /* 3.5      Syn SYNTAX-MANDATORY */
          /* all sc and types precede the declarator  */
          main() {
            { int i, int j; }
          }

CONFORM/CAPACITY

The Standard requires that each conforming implementation demonstrate that it can meet the translation-time capacity limits. One such program is provided; successfully compiling and executing CAPACITY.C will satisfy this requirement of the Standard (according to the ANSI/ISO Standard, the vendor is entitled to substitute a different demonstration, if desired).

CONFORM/EXPRTEST

Roughly 1 Megabyte of machine-generated expression-testing code is provided as part of the CONFORM tests. These tests have themselves been tested on many different machine sizes and architectures. They provide greater certainty that the code generator produces accurate, conforming results.

EXIN

Once a compiler has passed the CONFORM section of The Plum Hall Validation Suite for C, it can be assumed that the compiler handles all of the syntax and semantics of the C language. The next step is to build EXIN (the EXecutive INterpreter) and have it pass its own test suite.

EXIN is a script processing language, and is used for many of the more advanced tests in the Suite. The language processed by EXIN is inspired by sh and csh from the UNIX operating system. Source code for EXIN is included as part of the suite. The program has been ported to a wide variety of environments. (If the environment does not provide a mechanism for invoking a sub-process and returning to the parent process, then EXIN can be used to generate test files, but not to control their compilation and execution.)

EXIN takes its input from a file specified on the command line, and processes one line of input at a time. The syntax is similar to the C language. There is one data type, a text string, which can be evaluated numerically by built-in operators. There is high-level control flow (for loops, while loops, if blocks, and switches). Other programs can be executed (such as the compiler under test). EXIN can write text to files, and is commonly used to generate C programs.

A few examples of the EXIN language are presented below. When the symbol # is seen, the rest of the line is a comment. Comments will be used to explain what the examples do.

	
	# The primitive data type is a string of characters. Here
	# a variable 'var' is assigned the string.
	set var = "this is a string"
	echo $var
	# This will print 'this is a string'

	# The 'for' loop is a basic iteration structure. The form is
	# for <name> in <list of words>
	# The variable <name> represents each word in turn.
	for i in a b c d
	     for j in 1 2 3 4
	          echo $i$j
	     end
	end
	# This loop prints all combinations of [a-d][1-4].

    	# It is also possible to create arrays of words. The index
    	# selects a word from the array (e.g. $array[1] produces
    	# 'one').
    	set array = "one two three"

    	# There are also numerical loops. This loop prints
    	#     one
    	#     two
    	#     three
    	for i = 1 to 3
             echo $array[$i]
    	end

    	# The final example shows a primitive form of C program
    	# generation.  The symbol '>' means to redirect the output
    	# to a specific file ... the one named immediately
    	# afterwards.  The symbol '>>' means to redirect, but rather
    	# than creating a new file, append to an existing one.

    	set COMPILE = "cc -c pgm.c"
    	set LINK = "cc -o pgm pgm.o"
    	set RUN = pgm
    	echo 'main() { ' > pgm.c
    	for vars in "int i=0;" "long j=1;" "short k=2;"
             echo "    $vars" >> pgm.c
    	end
    	for vars in i j k
             echo "    printf(\"%ld \", (long)$vars);" >> pgm.c
    	end
    	echo "}" >> pgm.c
    	$COMPILE
    	$LINK
    	$RUN
    	# End of example

The example produces this C program:

    main() {
        int i=0;
        long j=1;
        short k=2;
        printf("%ld ", (long)i);
        printf("%ld ", (long)j);
        printf("%ld ", (long)k);
    }

It then compiles, links, and executes the program. This is a very simple example, but it illustrates the principle by which EXIN scripts are used to generate and test C programs.

These examples present only a small part of the facilities of EXIN. It is a very powerful language, which is used throughout the Suite to automate long sequences of testing.

COVER

Once built, EXIN can be used to run the scripts in the COVER section. These scripts generate exhaustive coverage of simple expressions in the C language. "Data sets" are collections of data declarations and initializations used in the generation of a self-checking C program. The "scalar" data set, for example, contains declarations for:

    char, unsigned char, signed char
    short, unsigned short
    int, unsigned int
    long, unsigned long
    float, double, long double

At the heart of the COVER section is an EXIN script which, given two data sets and a C language operator, generates all possible permutations. C operators (both unary and binary) can be covered with this script. There are options to declare the variables as either auto or static. Other data sets can be added as needed, but the current list of data sets is:

    NAME        DATA TYPES
    scalar      scalar data types
    pscalar1    pointers to scalar data types
    pscalar2    pointers to pointers to scalar data types
    union       unions of scalar types
    punion      pointers to unions of scalar types
    struct      structure members
    pstruct1    pointers to structure members
    pstruct2    pointers to structs with pointers to structs
    array1      one dimensional arrays of scalar types
    array2      two dimensional arrays of scalar types
    bits        bitfields
    pbits       pointers to bitfields

The COVER section contains scripts which allow the generation of C programs which check all possible permutations of operators and data sets.

These scripts can be restarted at any point. Each program generated by the COVER scripts reports errors of the form:

    auto scalar plus auto scalar at line 234: (12) != (13)

Each program also prints a summary of the form:

    ***** 2 errors in auto scalar plus auto scalar *****

Here is an example of the kind of program generated by the COVER scripts. The data sets were chosen as scalar vs. scalar, and the operator is "plus" (binary +).


    #include "types.h"
    main() {
        extern char *Filename;
        auto CHAR Ac = 7;
    #ifdef ANSI
        auto SCHAR Asc = 8;
    #endif
        auto SHORT As = 9;
        auto INT Ai = 10;
        auto UCHAR Auc = 11;
        auto USHORT Aus = 12;
        auto UINT Aui = 13;
        auto LONG Al = 14;
        auto ULONG Aul = 15;
        auto FLOAT Af = 16;
        auto DOUBLE Ad = 17;
    #ifdef ANSI
        auto LDOUBLE Ald = 18;
    #endif
        /* a second distinct data set would go here */

        Filename =  " auto scalar plus auto scalar ";
        iequals( __LINE__, Ac + Ac, 14 );
        iequals( __LINE__, Ac + Ac, 14 );
    #ifdef ANSI
        iequals( __LINE__, Ac + Asc, 15 );
        iequals( __LINE__, Asc + Ac, 15 );
    #endif
        iequals( __LINE__, Ac + As, 16 );
        iequals( __LINE__, As + Ac, 16 );
        iequals( __LINE__, Ac + Ai, 17 );
        iequals( __LINE__, Ai + Ac, 17 );
        iequals( __LINE__, Ac + Auc, 18 );
        iequals( __LINE__, Auc + Ac, 18 );
        iequals( __LINE__, Ac + Aus, 19 );
        iequals( __LINE__, Aus + Ac, 19 );
        iequals( __LINE__, Ac + Aui, 20 );
        iequals( __LINE__, Aui + Ac, 20 );
        lequals( __LINE__, Ac + Al, 21L);
        lequals( __LINE__, Al + Ac, 21L);
        lequals( __LINE__, Ac + Aul, 22L);
        lequals( __LINE__, Aul + Ac, 22L);
        dequals( __LINE__, Ac + Af, 23.);
        dequals( __LINE__, Af + Ac, 23.);

        /* .... excerpted from a 400 line file .... */

    }
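The pairing logic behind a program like the one above can be sketched in C. This is an illustration only: the suite performs this step with EXIN scripts, and the struct var type, the function names, and the restriction to int-valued variables are assumptions of the sketch. It also emits every ordered pair in simple row order, which differs from the interleaved order of the excerpt.

```c
#include <stdio.h>
#include <string.h>

/* One entry of a hypothetical integer-only data set: a variable
   name and the value it is initialized to. */
struct var { const char *name; int value; };

/* Write one self-checking line for `a + b` into buf; returns the
   number of characters needed (as snprintf does). */
static int emit_plus(char *buf, size_t size,
                     const struct var *a, const struct var *b)
{
    return snprintf(buf, size, "iequals( __LINE__, %s + %s, %d );\n",
                    a->name, b->name, a->value + b->value);
}

/* Emit a check for every ordered pair drawn from the two data
   sets. Returns the number of lines generated, or -1 if buf is
   too small. */
int cover_plus(char *buf, size_t size,
               const struct var *set1, size_t n1,
               const struct var *set2, size_t n2)
{
    size_t used = 0, i, j;
    int lines = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (i = 0; i < n1; i++) {
        for (j = 0; j < n2; j++) {
            int n = emit_plus(buf + used, size - used,
                              &set1[i], &set2[j]);
            if (n < 0 || (size_t)n >= size - used)
                return -1;          /* output would be truncated */
            used += (size_t)n;
            lines++;
        }
    }
    return lines;
}
```

With a data set containing Ac = 7 and As = 9, the generated text includes the line iequals( __LINE__, Ac + As, 16 ); which matches the corresponding check in the excerpt.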

LIMITS

The purpose of the LIMITS section of the Suite is to determine the value of certain compile time limits. The ANSI/ISO standard specifies a set of "minima maxima" that a conforming implementation must meet (see 2.2.4.1).

This section contains a set of scripts that determine what the actual values of these limits are (above or below the minimum requirement).
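One way to picture such a probe is a generator for a tiny test program whose outcome depends on a limit. The sketch below is a hypothetical illustration in C (the suite uses EXIN scripts, and the function name make_ident_probe is invented here). It writes a program declaring two variables whose names share their first prefix_len characters; a compiler that treats more than prefix_len characters as significant accepts it, while one that does not would typically diagnose a redeclaration.

```c
#include <stdio.h>
#include <string.h>

/* Write a probe program into buf. Returns the length written,
   or -1 if prefix_len is unusable or buf is too small. */
int make_ident_probe(char *buf, size_t size, size_t prefix_len)
{
    char stem[256];
    int n;

    if (prefix_len == 0 || prefix_len >= sizeof stem)
        return -1;
    memset(stem, 'x', prefix_len);      /* shared prefix: xxx...  */
    stem[prefix_len] = '\0';

    /* The two names differ only in the character after the stem. */
    n = snprintf(buf, size,
                 "int main(void) {\n"
                 "    int %sa = 1;\n"
                 "    int %sb = 2;\n"
                 "    return %sa == %sb;\n"
                 "}\n",
                 stem, stem, stem, stem);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

A driving script could call this with increasing values of prefix_len, compile each result, and record the largest value that is still accepted.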

EGEN

EGEN is the Expression GENerator. Since it is impossible to test all possible C language expressions, the Suite provides this tool for generating complex expressions, and code to check that the right answer is calculated.

After passing the previous sections of the Suite, a compiler should be trustworthy in calculating the results of simple expressions. EGEN relies on this to generate self-checking expressions of arbitrary complexity. Each complex expression has its value calculated from the simpler components that make it up. For example, a compiler generating code for the expression:

    (a*b) + (c*d)

might have an error in keeping track of multiple registers and get the wrong answer. But calculated as:

    temp1 = a*b;
    temp2 = c*d;
    temp1+temp2

the right answer is more likely, given that expressions of this complexity have been exhaustively tested in the COVER section. By decomposing a complex expression into simpler pieces, EGEN expects to get the "right answer" and use that to check the compiler's result on the full complex expression. EGEN can also be invoked in a debugging mode that shows how each value in the decomposed expression was calculated.
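The decomposition idea can be shown with a small hand-written C sketch. The function names are hypothetical, and the use of volatile temporaries (to encourage the compiler to store each product before the final addition) is an assumption of this sketch, not something the suite is stated to do.

```c
/* Value of the full expression, as the compiler chooses to
   evaluate it. */
long whole(long a, long b, long c, long d)
{
    return (a * b) + (c * d);
}

/* The same value built from simpler pieces, each of a complexity
   already covered by earlier tests. */
long decomposed(long a, long b, long c, long d)
{
    volatile long temp1 = a * b;
    volatile long temp2 = c * d;
    return temp1 + temp2;
}

/* Returns 1 when the two evaluations agree, 0 otherwise. */
int check_sum_of_products(long a, long b, long c, long d)
{
    return whole(a, b, c, d) == decomposed(a, b, c, d);
}
```

A disagreement between the two functions would point at the code generated for the combined expression, since each piece on its own has already been checked.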

EGEN requires several pieces of information, including a data set and an expression template.

The EGEN data set is a text file which describes the variables to be used in generating the expressions. Several data sets are provided with the STRESS section, and others can be created as desired.

The template is a list of operators or special tokens that specify the kind of expression to be generated. For example:

    +		- binary plus operator
    -		- binary minus operator
    *		- binary multiply operator
    neg		- unary minus operator
    ()		- parentheses for grouping
    @		- EGEN randomly selects an operator
    {list}	- EGEN randomly selects an operator from the list

The expression templates can contain all C language unary or binary operators, as well as the special operators @, (), and {}.

An example of an EGEN command line is:

    egen -R23 -Dintegers -10 (+) * (-)

This sets the random number seed to 23, uses the data set defined in the file integers, and generates 10 self-checking statements of the form:

    (variable + variable) * (variable - variable)

EGEN randomly selects a variable from the data set for each operand position, and tracks what the final value should be. Given the command line:

    egen -Dinteger -10 {+= -= *=} @

EGEN would generate 10 statements of the form

    variable OP1 variable OP2 variable

where each variable is randomly chosen from the data set, OP1 is randomly chosen from the set {+= -= *=}, and OP2 is randomly chosen as any C operator. EGEN generates code for the expression, code to check the result of the expression, and code to check the results of any side-effects.
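How a template such as (+) * (-) might be instantiated can be sketched in C. Everything here is an assumption made for illustration: the struct var type, the function names, the all-long data set, and in particular the small linear congruential generator, which stands in for whatever random source EGEN really uses so that a given seed reproduces the same statements.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical data-set entry: variable name and current value. */
struct var { const char *name; long value; };

/* Small portable generator; not EGEN's actual algorithm. */
static unsigned long next_rand(unsigned long *state)
{
    *state = (*state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return *state >> 16;
}

/* Instantiate (variable + variable) * (variable - variable) once.
   The expected value is computed from the two simpler pieces.
   Returns the length written, or -1 on error. */
int gen_statement(char *buf, size_t size,
                  const struct var *set, size_t n,
                  unsigned long *seed)
{
    const struct var *v[4];
    long sum, diff;
    int i, len;

    if (n == 0)
        return -1;
    for (i = 0; i < 4; i++)
        v[i] = &set[next_rand(seed) % n];   /* random operand */

    sum  = v[0]->value + v[1]->value;       /* left operand of *  */
    diff = v[2]->value - v[3]->value;       /* right operand of * */
    len = snprintf(buf, size,
                   "lequals( __LINE__, (%s + %s) * (%s - %s), %ldL);\n",
                   v[0]->name, v[1]->name, v[2]->name, v[3]->name,
                   sum * diff);
    return (len < 0 || (size_t)len >= size) ? -1 : len;
}
```

Calling gen_statement ten times with the same seed variable would correspond roughly to the -10 count on the command line; starting again from the same seed reproduces the same ten statements.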

Here is a real example from the output of:


    egen -Dinteger -4 {+= -=} {neg com} ( @ )


    main() {
        extern char *Filename;
        int true = 1, false = 0;
        auto unsigned int ui;
        static unsigned int *pui;
        auto int i;
        static int *pi;
        auto short s;
        static short *ps;
        auto char c;
        static char *pc;
        auto unsigned long ul;
        static unsigned long *pul;
        auto long l;
        static long *pl;
        register int rint1;
        register int rint2;

        ui = 3;
        pui = &ui;
        i = 10;
        pi = &i;
        s = 13;
        ps = &s;
        c = 20;
        pc = &c;
        ul = 65000;
        pul = &ul;
        l = 130000;
        pl = &l;
        rint1 = 1;
        rint2 = 2;

        Filename = "main";
        iequals( __LINE__, rint2 -= - (ui < c), 3);
        iequals( __LINE__, rint2, 3);

        iequals( __LINE__, *pi += - (s >>= ui), 9);
        iequals( __LINE__, *pi, 9);
        iequals( __LINE__, s, 1);

        iequals( __LINE__, rint1 +=  ~ (true ? *pc : *ps), -20);
        iequals( __LINE__, rint1, -20);

        lequals( __LINE__, *pl -= - (*pc /= *pui), 130006L);
        lequals( __LINE__, *pl, 130006L);
        iequals( __LINE__, *pc, 6);

        report(__FILE__);
    }

STRESS

The STRESS section is a collection of EXIN scripts and data sets for EGEN. In the example below, the atest.gen data set has declarations for all C integral types. The script is intended to be run "forever" (e.g., in the background on a multi-tasking operating system). Periodically it can be checked to see if any compiler errors have been detected.


    # INTX4 - this script generates expressions of
    # complexity 4 (4 operators) in a continuous loop.
    # It is intended to be run in the background, with
    # results being checked periodically.
    # It is invoked as "exin intx4 -<number>"
    # The number is a random number seed.
    # Each generated file is limited to 5 statements
    # because they tend toward 0 if too many are
    # calculated into 1 file

    # $$<variable> does a rescan after expansion

    # Configuration section:
    set FILE = 'test$SEED'
    set COMPILE = "cc -c $FILE.c"
    set COMPILE_OK = '$error == 0'
    set LINK = 'cc $FILE.o -o $FILE'
    set LINK_OK = '$error == 0'
    set RUN = '$FILE'
    set RUN_OK = '$error == 0'
    set CLEANUP = 'rm $FILE.*'
    set LOGFILE = 'intx4.log'

    if ($# != 1)
        echoerr "syntax: exin intx4 -<number>"
        exit 1
    end

    set SEED = $1
    while (1)
        egen -R$SEED -Datest.gen -5 @ @ @ @ > $$FILE.c
        $$COMPILE
        if ($$COMPILE_OK)
            $$LINK
            if ($$LINK_OK)
                $$RUN >> $LOGFILE
                if ($$RUN_OK)
                    $$CLEANUP
                end
            end
        end
        # each time through, the SEED += 1
        set SEED = $eval($SEED + 1)
    end
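For readers more familiar with C than with EXIN, one pass of the loop above can be sketched as follows. This is a rough analogue and not part of the suite: the function names and the runner callback are hypothetical, the "./" prefix on the run command assumes a POSIX-style shell, and the standard library function system is assumed to be usable for invoking sub-processes.

```c
#include <stdio.h>
#include <stdlib.h>

/* A runner executes one command line and returns 0 on success. */
typedef int (*runner_fn)(const char *cmd);

/* Real runner: hand the command to the host shell. */
int run_system(const char *cmd) { return system(cmd); }

/* Dry-run runner: print the command instead of executing it. */
int run_echo(const char *cmd) { puts(cmd); return 0; }

/* One pass: generate, compile, link, run, clean up. Returns 0 if
   every step succeeds, else the number of the first failing step.
   As in the script, files are left in place when a step fails. */
int stress_pass(long seed, runner_fn run)
{
    char cmd[128];

    snprintf(cmd, sizeof cmd,
             "egen -R%ld -Datest.gen -5 @ @ @ @ > test%ld.c", seed, seed);
    if (run(cmd) != 0) return 1;
    snprintf(cmd, sizeof cmd, "cc -c test%ld.c", seed);
    if (run(cmd) != 0) return 2;
    snprintf(cmd, sizeof cmd, "cc test%ld.o -o test%ld", seed, seed);
    if (run(cmd) != 0) return 3;
    snprintf(cmd, sizeof cmd, "./test%ld >> intx4.log", seed);
    if (run(cmd) != 0) return 4;
    snprintf(cmd, sizeof cmd, "rm test%ld.*", seed);
    if (run(cmd) != 0) return 5;
    return 0;
}
```

Passing run_echo shows the five command lines for a given seed without executing anything; a continuous run would call stress_pass(seed, run_system) in a loop, incrementing seed each time.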