An Eclectic Sather Tutorial

Benedict Gomes

If you have any questions about this page, or wish to make contributions or suggestions, contact me at gomes@icsi.berkeley.edu

This is a reformatted texinfo document. You can get to the HTML table of contents by clicking on any section heading. This section provides a FAQ-like index into this tutorial. If you are reading a printed version of this document, the latest HTML version is at: http://www.icsi.berkeley.edu/~gomes/Sather/sather-tutorial.html

Questions and Answers

Environment

Typing Issues

Value Types

Bound Routines

Libraries

Other Stuff

Introduction

This document introduces some of the trickier concepts in Sather, particularly concepts that might not be familiar to a C++ or Smalltalk programmer. This document is somewhere between a tutorial and the language spec. It is not a guided tour of the language; rather, it explains (with examples) certain aspects of the language that might not be obvious from reading the language spec. I basically started writing this in lieu of sending replies to questions on the net. If you find this information helpful, let me (gomes@icsi.berkeley.edu) know! Contributions are always welcome.

Who's working on what?
Library (1.0.7)
The Sather Home Page

Acknowledgements

This document incorporates ideas and feedback from David Stoutamire, David Bailey, Jerome Feldman, Michael Philippsen and others. Portions of this document will become a part of the 1.1 manual. This is a texinfo document, and will eventually be available as on-line GNU info files, as well as in textual and html form.

What is Sather?

This document has grown long enough that some introduction seems obligatory. This section is largely from the manual and will eventually be changed to reflect what is actually in this document.

Sather is an object-oriented language that supports highly efficient computation, powerful abstractions for encapsulation and code reuse, a flexible development environment, and constructs for improving code correctness. It has statically-checked strong typing, multiple inheritance, explicit subtyping which is independent of implementation inheritance, parameterized types, dynamic dispatch, iteration abstraction, higher-order routines and iters, garbage collection, exception handling, assertions, preconditions, postconditions, and class invariants.

Data structures in Sather are constructed from objects, each of which has a specific concrete type that determines the operations that may be performed on it. The implementation of concrete types is defined by textual units called classes.

Abstract types specify a set of operations without providing an implementation and correspond to sets of concrete types. Sather programs consist of classes and abstract type specifications. Each Sather variable has a declared type which determines the types of objects it may hold.

Classes define the following features: object attributes which make up the internal state of objects, shared and constant attributes which are shared by all objects of a type, routines which perform operations on objects, and iters which encapsulate iteration. Features may be declared private to allow only the class in which they appear to access them. Accessor routines are automatically defined for reading object, shared, and constant attributes and for writing object and shared attributes. The set of non-private routines and iters in a class defines the interface of the corresponding type. Abstract types directly specify their interfaces. Routine and iter definitions consist of statements, and these are constructed from expressions. There are special literal expressions for boolean, character, string, integer, and floating point objects.

Development Environment

The development environment consists of an incremental Sather-to-C compiler (written in Sather), emacs and gdb support, and a tcl-based code browser. A fair number of libraries are available, as well as interfaces to other systems such as Tcl. A native compiler for Sather-K, a variant of Sather 1.0, is also available from Karlsruhe in Germany. For a how-to introduction to writing your first program, please see Philippsen's tutorial, which is available with the Sather distribution and is also pointed to by:

http://icsi.berkeley.edu/~gomes/Sather/sather-top.html

in HTML form.

The compiler has been designed to permit the relatively easy development of a Sather interpreter. The compiler spits out virtual machine instructions which are translated into C; an interpreter for Sather must provide an interpreter for this abstract machine language. Resource constraints have not permitted its development. There is an unsupported interpreter for a subset of the language which does things differently (it does not make use of the abstract machine form); it is available in the distribution. If you are interested in working on the interpreter please contact sather-bugs or davids, both @icsi.berkeley.edu

The 1.0 Compiler

The 1.0 compiler is available by anonymous ftp from ftp://icsi.berkeley.edu/pub/sather/, as are binaries for various platforms. The Sather-K compiler is also available from this site. For information about the platforms to which the compiler has been ported, the FAQ etc., please see the main sather page:

http://icsi.berkeley.edu/Sather

The compiler options are documented in a man page under the Doc subdirectory of the distribution. There is also a document that describes the internal structure of the compiler available from

http://icsi.berkeley.edu/Sather/ps/compiler.ps.gz

Module Files

The compiler options (documented in the man page for the compiler: man cs) allow the user to specify a list of sather, C, "module" files and other compiler flags. Files of different types are distinguished by their suffixes (`.sa', `.c', `.module', etc.). Module files are collections of compiler options which behave exactly as if they were typed in on the command line. Module files have no further semantic meaning. This permits the usage of a single uniform, unix-like syntax whether specifying options on the command line or in a module file.

Module files are used to partition and organize the Sather library. Associated with each subdirectory of the Sather library is a module file, which just lists the sather and C files in that directory, as well as any modules contained within that directory. For instance, the System module lists the files in the system subdirectory and `Socket.module'. `Socket.module', in turn, lists the files in the socket sub-directory. Note that path-names in a module file are relative to the location of the module file.
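
As a rough illustration of the layout described above, a module file for a small directory might look like the following sketch. Every file, directory and module name here is invented for the example; the only things taken from the text are the conventions themselves: the module lists the sather and C files of its directory, nested modules are named by a path relative to the module file, and "--" starts a comment.

```text
-- Shapes.module (hypothetical): lists everything in this directory
shape.sa
circle.sa
fast_area.c
-- A nested module, named relative to the location of Shapes.module
Render/Render.module
```

Since a module file behaves as if its contents had been typed on the command line, mentioning Shapes.module in a compile should be equivalent to naming each of these files directly.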

The current convention is that all related files should be placed in a single directory and provided with a module file. Using the code from those classes should then just be a matter of mentioning its module file. The Sather compiler automatically includes the top-most module of the library, `Library.module', which is found in the Library subdirectory of the distribution. This module in turn includes the rest of the sather library. If you do not wish to use the standard Library.module, you will need to change the value of the environment variable SATHER_LIBRARY to point to a different module.

setenv SATHER_LIBRARY "/u/gomes/Sather1/SatherCopy/Browser/mylibrary.module"

You can use this method to avoid loading the standard Sather libraries, but don't forget to define the built-in classes (INT, BOOL, MUTEX etc.) or the compiler will complain.

Has clauses

If you notice -has clauses in some module files, here is why they exist: in order to expedite parsing, the compiler permits the specification of which classes are defined in each file. For instance,

my_math.sa -has my_math.sa MY_MATH_CLASS_FOO MY_MATH_CLASS_BAR 

However, "has" clauses are not required by the compiler and users should not use them.

Comments

You can put comments in a module file by using the standard Sather syntax "--". Module files also accept an extended comment syntax of the form

(*  my multi
 line 
comment
*) 

Since the module file is displayed by the browser, these comments can be used to describe the nature of a particular module in the Library.

Module files with different suffixes

Module files which do not have the suffix .module must be specified with the compiler option -com <command file name>. This permits the use of the older style of `.com' files, which may still be necessary on some systems with file name restrictions.

Over-riding library files

It is not particularly easy to over-ride just a single file in the library. The straightforward way to do it is to copy over the entire library and modify the file. However, as a byproduct of the current behaviour of the compiler (having to do with -has clauses), it sometimes suffices to just copy the one file you want to fix and then mention this amended file explicitly during your compile (without a -has clause). The compiler may find your version. This is clearly a hacky solution, but it often works and can save you some trouble. If it does work, it will work correctly (i.e. it will either choose your version or complain that there were two definitions - it will never silently ignore your explicitly specified version).

Debugging Sather

You can basically use gdb for symbolic debugging. David took a great deal of trouble to make sure that the generated C names look as much like the sather names as possible. Classes turn into structs with obvious feature mappings. You can usually use the "info" command (eg. info locals) to get the actual names, and the mapping is usually quite clear, since gdb under emacs will take you to the right source location (if you compile with the -debug flag).

Furthermore, I find that programming in Sather is a very different experience from programming in C. Far fewer bugs make it through the compilation phase, and with checking turned on, those bugs are easily detected - very often just void references, or array bounds checks which you can locate by running gdb (even without the debug flag) and looking at the stack trace at the point where the error occurs. The gdb command "where" will give you the stack trace. I believe that gdb has a facility for name unmangling, given some table, which C++ may use. If so, it might not be too much work to hook into it. If you are interested in undertaking this task, please contact sather-bugs@icsi.berkeley.edu. Since Sather stores type information along with each object, there is the potential for a far more sophisticated debugging environment/object browser than would be possible with C++.

One more point to note: if, for some reason, you do have to debug the generated C, do not turn on the debug flag. Instead, use: -C_flag -g.

Useful commands for debugging with gdb

(gdb) info locals
Prints out all the locals currently visible
(gdb) print self
Prints out the pointer to the current self 
Eg: $16 = (struct FMULTIMAPSTRINT_struct *) 0x57ed8
(gdb) print *self
Prints out the value of self
$17 = {header = {tag = 4, destroyed = 0}, hsize = 3, n_targets = 50, 
  asize = 17, arr_part = {{t2 = 0x0, t1 = 0x0}}}
(gdb) print *self->arr_part@3
Prints out a string/array of three elements
Eg: $15 = {"gar"}
(gdb) bt
Print out the current contents of the stack.
(gdb) up
Go up a level in the stack. You normally start out too low (in some
abort routine). Keep going up until you hit a sather line
(gdb) down 
Go down a stack level
(gdb) break
Set a breakpoint (easier to do from within emacs)
(gdb) break sather_main
Set a break point at the sather main.
(gdb) step
Go into the next line/call
(gdb) next
Go through the next line/call
(gdb) finish
Execute till end of current function.
(gdb) continue
Continue till the next break point 

Emacs provides good support for debugging with gdb. Type M-x gdb in emacs and supply the sather executable. Going upwards to the correct line will take you to the sather code as soon as you hit a sather line. You can set breakpoints by hitting "C-x Space" in the sather code.

TAB does symbol name completion and ESC-? provides a list of completions (useful for finding mangled function and variable names). There is currently some work being done to provide name demangling under gdb.

COMING SOON!!!!

Claudio Fleiner and I have created a tool that will allow you to quite cheaply look at all your sather variables (with the sather variable names), including stack frames, from within gdb. With a higher cost in compile time, you can also look at these data structures graphically. The option will be made available sometime after 1.0.8 (since this was developed during the 1.0.8 pre-release, we will wait till after 1.0.8 is released to merge the compiler changes with our CVS source tree).

The Sather Emacs Mode

There is an emacs editing mode available for Sather (started by Steve Omohundro, but largely written and currently maintained by Kevin K. Lewis, lewikk@aud.alcatel.com), under the Emacs directory of the distribution. The `.elc' files are byte-compiled, faster versions of the `.el' files. The documentation for the emacs mode is available at the beginning of the file `sather.el'. Effective use of code highlighting requires you to set up some variables as described in the beginning of `sather.el'.

The Browser

Documentation for the sather browser (which is written in tcl and distributed under the Browser subdirectory of the distribution) is available at

http://icsi.berkeley.edu/~gomes/Sather/bs.html

This documentation includes recent bug-fixes and coming attractions.

Concrete Classes

This section illustrates how a simple class FOO may be defined. Attributes with varying degrees of privacy are shown, along with a create routine that creates a new instance of this class.

class FOO is
   -- A concrete class i.e. a class whose features are
   -- all implemented

   attr a: INT;
      -- This is implemented by the two implicit routines
      -- a: INT   [the public reader routine]
      -- a(new_value: INT)   [the public assignment routine]
      -- Since assignment is syntactic sugar for a call to a routine of
      -- one argument,
      --   foo.a := b is syntactic sugar for foo.a(b)

   private attr b: FLT;
      -- This is implemented by the two implicit routines
      -- private b: FLT   [the private reader routine]
      -- private b(new_value: FLT)   [the private assignment routine]

   readonly attr c: CHAR;
      -- This is implemented by the two implicit routines
      -- c: CHAR   [the public reader routine]
      -- private c(new_value: CHAR)   [the private assignment routine]

   readonly attr g: INT;

   attr e: STR;
      -- Declared so that the create routine below can set it

   create: SAME is
      -- Create routine, invoked by calling #FOO (# is syntactic
      -- sugar for create routines)
      -- SAME, the returned type, refers to the current class i.e. FOO
      res ::= new;  -- The ::= is used to combine 
                             -- the declaration of "res" with its assignment

      -- Set the attributes to various literal values.
      res.a := 5;
      res.b := 3.2;
      res.c := 'd';
      res.g := 42;
      res.e := "abcd";
      return(res);
   end;

   create(a_value: INT): SAME is
     -- Another create routine that takes an integer argument.
     -- We can overload routines, provided that they differ either in
     -- the number of arguments, presence or absence of return types
     -- or the concrete type of the arguments (i.e. we can't overload
     -- a routine that takes two abstract types)
     res ::= #;  -- This calls the create routine above. Recall that
                 -- the # sign is syntactic sugar for the 
                 -- create  routine.
                 -- Since we are in FOO, we don't have to specify the class
                 -- Otherwise, we would have to say #FOO
                 -- The ::= is used to combine the 
                -- declaration of "res" with its assignment
     res.a := a_value;  -- Set the a attribute to the correct value
     return(res);
   end;
end;

class TEST_FOO is
   
   main is
        p: FOO := #FOO;   -- Create an instance of foo
                          -- Equivalent to calling the first create routine
        q  ::= #FOO(3);   -- Another instance is created, using
                          -- the second create routine which takes the 
                          -- argument 3.
                          -- The type of the variable "q" is inferred from 
                          -- the rhs. The ::= is used to combine declaration
                          --  and assignment
        #OUT+p.a+"\n";    -- Will print out 5
        #OUT+q.a+"\n";    -- Will print out 3 (the value we set it to)
        #OUT+q.g+"\n";    -- Will print out 42
        -- #OUT+q.b+"\n"; ILLEGAL! the compiler will complain 
                          -- The attribute b is private

        q.a := 9;         -- Set the value of the attribute "a" of q
        --    q.g := 53;  ILLEGAL! q.g is readonly.  This
                          -- means that the writer routine for g is private.
      
   end;
end;

Each attribute declaration really translates into two routines.

  attr a: INT;

is sugar for

  a: INT;                   -- Used to get the value of a

  a(value: INT);            -- Used to set the value of "a"

If access is restricted the underlying routines are typed accordingly.

  private attr a: INT;

is syntactic sugar for

  private a: INT;          -- Used to get the value of a within FOO

  private a(value: INT);   -- Used to set the value of "a" within FOO

And ...

  readonly attr a: INT;

is syntactic sugar for

  a: INT;                  -- Publicly used to get the value of a

  private a(value: INT);   -- Used to set the value of "a" within FOO

The := sign is just syntactic sugar for the second function. In fact, you can call any routine with one argument using the := syntactic sugar, though it almost never makes sense except for attribute access.
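
To make the sugar concrete, here is a small sketch in which the reader and writer routines are written by hand instead of being generated from an attr. The class METER, its feature names and the test class are all made up for illustration. Under the rule just described, the := in main is simply a call to the one-argument routine level(INT).

```sather
class METER is
   -- Hypothetical class: a hand-written reader/writer pair that plays
   -- the role of "attr level: INT"

   private attr stored: INT;

   level: INT is
      -- Reader routine
      return(stored);
   end;

   level(new_level: INT) is
      -- Writer routine; silently ignores negative values
      if new_level >= 0 then stored := new_level; end;
   end;

   create: SAME is
      res ::= new;
      return(res);
   end;
end;

class TEST_METER is
   main is
      m ::= #METER;
      m.level := 7;        -- sugar for m.level(7)
      m.level(9);          -- the same kind of call, written out explicitly
      #OUT+m.level+"\n";   -- should print 9
   end;
end;
```

Writing the pair by hand like this is one of the few cases where using := on a routine other than a generated attribute writer seems reasonable, since the caller still sees what looks like an ordinary attribute.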

Abstract Types

This section illustrates abstract type definitions and the type conformance rule that governs subtyping.

Abstract Type Definitions

$MY_ABSTRACT_TYPE illustrates an abstract type. FOO and BAR are subtypes. Below, we will illustrate how the abstract type may be used.

type $MY_ABSTRACT_TYPE is
   -- Definition of an abstract type.  Any concrete class that subtypes
   -- from this abstract class must provide the two routines
   -- ukridge and mulliner

    ukridge: INT;

    mulliner: CHAR;

end;

class FOO < $MY_ABSTRACT_TYPE is
    -- A concrete class, which supports the $MY_ABSTRACT_TYPE interface.
    -- It is said to subtype from $MY_ABSTRACT_TYPE

   attr ukridge: INT;
    -- This generates 
    -- ukridge: INT      [reader routine]
    -- ukridge(val: INT)  [assignment routine]
    -- It is the reader routine that implements the "ukridge"
    -- feature that is required by $MY_ABSTRACT_TYPE

   private attr b: FLT;

   readonly attr mulliner: CHAR;

   attr e: STR;
    -- Declared so that the create routine below can set it

   create: SAME is
     -- Create routine, invoked by calling #FOO (# is syntactic
     -- sugar for the create routine)
     -- SAME, the returned type, refers to the current class i.e. FOO
      res ::= new;
      res.ukridge := 5;
      res.b := 3.2;
      res.mulliner := 'd';
      res.e := "abcd";
      return(res);
   end;

   create(ukridge_value: INT): SAME is
     -- We can overload routines, provided that they differ either in
     -- the number of arguments, presence or absence of return types
     -- or the concrete type of the arguments (i.e. we can't overload
     -- a routine that takes two abstract types)
     res ::= #;  -- This calls the create routine above. Recall that
                 -- the # sign is syntactic sugar for create.  The class is
                 -- assumed to be the current class. The type of res is inferred
     res.ukridge := ukridge_value; 
                 -- Set the "ukridge" attribute to the value passed in
     return(res);
   end;

end;

Type Conformance

Sather's subtyping rule is sometimes called contravariance and sometimes called conformance, a source of some confusion in the (way too frequent) wars about the subject, usually phrased as covariance vs. contravariance. The rule is quite simple, and a bit counter-intuitive at first.

        type $HIGHER is ... some definition end;     
        type $HIGH < $HIGHER is ... some definition ... end;
        type $LOW < $HIGH is ... some definition ... end;
        type $SUPER is
            rout(a: $HIGH): $HIGH;
        end;

The question now is: what are the legal signatures for "rout" in any subtype of $SUPER, say $SUB? The Sather rule says that the arguments to rout in $SUB must be either the same as, or supertypes of, the arguments in $SUPER (contravariant). The return value must be either the same as, or a subtype of, the return value in $SUPER (covariant). Hence, the following signatures are all legal choices.

        type $SUB < $SUPER is   rout(a: $HIGH): $HIGH;     end;
        type $SUB < $SUPER is   rout(a: $HIGH): $LOW;      end;
        type $SUB < $SUPER is   rout(a: $HIGHER): $HIGH;   end;
        type $SUB < $SUPER is   rout(a: $HIGHER): $LOW;    end;

The following signature is ILLEGAL, because the argument type $LOW is a subtype (not a supertype) of $HIGH

    type $SUB < $SUPER is    rout(a: $LOW): <any return value>;     end;

In practice, the argument types are almost always the same type (invariant, which is the C++ rule). There are certain unusual situations in which contravariance of arguments is useful. The return types are quite often subtypes.

The reason for this rule is that it is THE ONLY STATICALLY TYPESAFE RULE (invariance, as in C++, is a special case of this rule). Pure covariance is NOT statically typesafe. This is a bit hard to understand. To see why, suppose we had defined a class SUB in the ILLEGAL covariant manner

        class SUB < $SUPER is 
                rout(arg: $LOW): $HIGH;

and we have a variable "abstract_super"

        abstract_super: $SUPER := #SUB;

Since "abstract_super" can hold any subtype of $SUPER, it can actually be instantiated to some SUB object. Now suppose we call the routine "rout" on the variable "abstract_super".

        actual_high: $HIGH;   -- holds an object that is a $HIGH but not a $LOW
        res: $HIGH := abstract_super.rout(actual_high);

From looking at the signature of the declared type of "abstract_super",

        type $SUPER is .... rout(arg: $HIGH): $HIGH

this looks perfectly ok, since $SUPER::rout can take arguments of type $HIGH. But the call actually goes to the routine in SUB

        SUB::rout(arg: $LOW): $HIGH;

And it is illegal to call this routine with a $HIGH as argument (it is not a subtype of $LOW)! There is no way to detect this problem at compile-time. However, a run-time check may be used.

Covariance vs. Contravariance.

Periodically a co vs. contra-variance argument flares up on the net. The arguments are usually very involved and (imho) not worth following. They often revolve around whether cows are truly herbivores or some such detail of the animal kingdom. Below, I will attempt to briefly describe the technical aspect of the argument - the relative typesafety of both systems.

First some terms. This argument is purely about argument types (there is no disagreement about return types). Hence, the term contravariance is often used instead of conformance. The basic argument arises because Eiffel follows the covariant rule for arguments while Sather is contravariant (and other languages like C++ are invariant). At issue is whether both rules are statically typesafe.

You will frequently hear claims by the Eiffel people that their system uses the more natural covariance rule and is still statically typesafe due to a "closed-system" checking algorithm, which is not yet implemented in the Eiffel compilers (someone correct me if I'm wrong). And the conformance sceptics will say - not possible - that's undecidable (I can give you a somewhat handwavy, but basically correct argument to this effect). The two sides mean slightly different things by "type-safe", and I believe that the contravariant side is clearer. The truth is that the closed system type checker promised by Eiffel is conservative. This means that perfectly legal function calls may be rejected by the algorithm because it cannot prove that the call is typesafe. The undecidability argument shows that, however good the algorithm, since the problem is inherently undecidable, it will always be possible (quite easy, actually) to find perfectly legal function calls that will be rejected. What this can lead to is groups of people just turning off the type safety check, since they believe that their program is ok, even though the system cannot prove it. If this happens, all static typesafety is lost.

With contravariance, on the other hand, any legal function call (based just on the signatures, and not on some arbitrary property of the computation involved) will be accepted by the compiler and will be typesafe. If covariant behaviour is required, it must be obtained by a typecase within the routine body. This clearly indicates the only points at which a type violation may occur within a program.
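
A minimal sketch of that idiom follows. It assumes the $HIGH, $LOW and $SUPER declarations from the Type Conformance section, repeated here with empty bodies so that the fragment stands alone; the class SUB and the error string are illustrative only. The signature stays conformant, and the narrowing to $LOW happens in exactly one visible place.

```sather
type $HIGH is end;
type $LOW < $HIGH is end;

type $SUPER is
   rout(a: $HIGH): $HIGH;
end;

class SUB < $SUPER is
   rout(a: $HIGH): $HIGH is
      -- Legal signature: same argument type as in $SUPER.
      -- The covariant requirement ("really a $LOW") is checked here.
      typecase a
      when $LOW then
         -- Within this branch "a" is known to be a $LOW
         return(a);
      else
         -- The only point at which the type violation can surface
         raise "SUB::rout expects a $LOW";
      end;
   end;
end;
```

A caller holding a $SUPER can still pass any $HIGH; whether to raise, return a default, or handle the general case in the else branch is a design decision left to the author of SUB.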

Parametrized Abstract Classes

All Sather classes may be parametrized by any number of type parameters. Each type parameter may have an optional type bound; this forces any actual parameter to be a subtype of the corresponding type bound. Given the following definitions,

abstract class $A{T < $BAR} is
  foo(b:T): T;
end;

type $BAR is

end;

class BAR < $BAR is

end;

we may then instantiate an abstract variable a:$A{BAR}. BAR instantiates the parameter T and hence must be under the type bound for T, namely $BAR. If a type-bound is not specified then a type bound of $OB is assumed.

Why have typebounds?

The purpose of the type bound is to permit type checking of a parametrized class over all possible instantiations. Note that the current compiler does not do this, thus permitting some possibly illegal code to go unchecked until an instantiation is attempted.

How are different parametrizations related?

It is sometimes natural to want a $LIST{MY_FOO} < $LIST{$MY_FOO}. Sather, however, specifies no subtyping relationship between various parametrizations. The reason is contravariance. Consider the case where the parameter type is used to specify an argument type in a routine

abstract class $LIST{T} is
   append(element: T);
end;

abstract class $POLYNOMIAL is 

end;

abstract class $SQUARE < $POLYNOMIAL is

end;

a: $LIST{$POLYNOMIAL};
b: $LIST{$SQUARE};

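We may wish the type of b to be a subtype of the type of a, i.e. $LIST{$SQUARE} < $LIST{$POLYNOMIAL}, so that we can pass a list of squares to any routine that can deal with a list of polynomials. However, this would require append(element: $SQUARE) to conform to append(element: $POLYNOMIAL), and contravariance does not permit that: the argument type in the subtype would be narrower than the argument type in the supertype. Such a routine could otherwise append an arbitrary polynomial to what is really a list of squares.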

Supertyping

What is the rationale behind supertyping clauses, and how are they used?

You define supertypes of already existing types. The supertype can only contain routines that are found in the subtype i.e. it cannot extend the interface of the subtype.

        type $IS_EMPTY{T} > FLIST{T}, FSET{T} is
           is_empty: BOOL;
        end;

The need for supertyping clauses arises from our definition of type bounds in parametrized types. Any instantiation of the parameters of a parametrized type must be a subtype of those type bounds. You may, however, wish to create a parametrized type which is instantiated with existing library code which is not already under the type bound you want. For instance, suppose you want to create a class FOO, whose parameters must support both is_eq and is_lt. One way to do this is as follows:

class BAR{T}  is
        -- Library class that you can't modify
   is_eq(o: T): BOOL;
   is_lt(o: T): BOOL;
end;
 
type $MY_BOUND{T} > BAR{T} is
    is_eq(o: T): BOOL;
    is_lt(o: T): BOOL;
end;
 
class FOO{T < $MY_BOUND{T}} is
   some_routine is
        -- uses the is_eq and is_lt routines on objects of type T
        a,b,c: T;
        if (a < b or b = c) then
                ..
        end;
  end;
end;

Thus, supertyping provides a convenient method of parametrizing containers, so that they can be instantiated using existing classes. An alternative approach is the structural conformance rule that Sather-K uses.

Value Classes

This section illustrates the declaration of value types using a simple version of CPX. We also describe the benefits of value classes and when you might consider using them. Similar semantics can be obtained using immutable reference classes - we show how with a reference version of CPX. Finally, we discuss the aliasing problems that can arise in FSTR and similar classes when not using value semantics for efficiency reasons.

What are Value Classes

At a fundamental level, value classes define objects which, once created, never change their value. A variable of a value type may only be changed by re-assigning to that variable. When we wish to only modify some portion of a value object (one attribute, say), we are compelled to reassign the whole object. Hence, an attribute attr a: INT of a value class has a setting routine called a(new_a_value: INT): SAME, which returns a new value object in which the attribute a has the value new_a_value. This contrasts with a reference class, in which the setting routine for a similar attribute would have the signature a(new_a_value: INT). This aspect of value classes causes much confusion when one is first introduced to Sather; a chief goal of this section is to help dispel this confusion.

For experienced C programmers the difference between value and reference classes is similar to the difference between structs (value types) and pointers to structs (reference types). Because of that difference, reference objects can be referred to from more than one variable (aliased). Value objects can not.

The basic types mentioned (except arrays) are value classes. Reference objects must be explicitly allocated with new. Variables have the value void until an object is assigned to them. Void for reference objects is similar to a null pointer in C. Void for value objects means that a predefined value is assigned (0 for INT, `\0` for CHAR, false for BOOL, and 0.0 for FLT). Accessing a void value object will always work. Accessing a void reference object will usually be a fatal error.

Value Class Example

We illustrate the use of value classes through a commonly used example: the complex class CPX. The version shown here is a much simplified version of the library class CPX{T}. The key point to note is the manner in which attribute values are set in the create routine.

value class CPX  is
   -- Complex numbers.

   readonly attr real,imag: FLT;
        -- Real and imaginary parts.
   
   create(re,im:FLT):SAME is 
      -- Create a new value object
      -- The following whole routine can be concisely written as:
      -- return(real(re).imag(im));  
      -- For the sake of clarity, we expand the steps involved
      res: SAME;         
         -- Declare a new CPX variable, 
         -- initialized with real, imag = 0.0
      res := res.real(re);  
         -- The attribute setting routine for 
         -- real is real(FLT): SAME which takes an argument
         -- of type FLT and returns (conceptually) a new CPX number
         -- i.e. res now holds a new value.
      res := res.imag(im);   
         -- The new "res" has the correct value
         -- for the real part.  Now we set the imaginary part and
         -- reassign to res
       return(res);        
         -- Return the result
    end;
   
   is_eq(c: SAME): BOOL is 
      -- Return true if this object is equal to "c"
      return (real=c.real and imag=c.imag)
   end;

   plus(c:SAME):SAME is
      -- The sum of self and `c'.
      -- res is a new CPX object 
      -- # is syntactic sugar for the create routine shown above
      -- We create a new complex number whose real part is the sum
      -- of self' real part and c's real part and likewise with the
      -- imaginary part
      res ::= #(real+c.real,imag+c.imag); 
      -- We could also write res: CPX := #CPX(real+c.real,imag+c.imag);
      -- However, # refers to the create in this context i.e. CPX::create
      -- The return value of create is a CPX, therefore the type of res is
      -- also CPX. Type inference magic!
      return(res);
    end;

   str:STR is
      -- A string representation of self of the form "1.02+3.23i".
      buf:STR;
      if imag >= 0.0 then 
         buf:= real.str + ("+") + imag.str + "i";
      else buf:= real.str + ("-") + (-imag).str + "i" end;
      return buf 
    end;

end;

The complex class may then be used in the following manner.

        b: CPX := #(2.0,3.0);
        d: CPX := #(4.0,5.0);
        c: CPX := b+d;

The key point to note is the assignment of the attributes of the value class. The attribute imag can be viewed as syntactic sugar for the two routines

        imag(new_imag_value: FLT): SAME
                -- Setting attribute routine
        imag: FLT
                -- Getting attribute routine

The reason that the setting routine returns SAME is that we cannot really modify a value object. Rather, changing an attribute generates a new value; the setting routine, imag(FLT):SAME, returns this new value.

If CPX were a regular reference class, the attribute access routines would be

        imag(new_imag_value: FLT);
                -- Setting attribute routine
        imag: FLT
                -- Getting attribute routine
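
To make the difference concrete, here is a minimal sketch of how the two kinds of setting routine look from the caller's side. It assumes two hypothetical classes that are not part of the library: VCPX, a value class, and RCPX, a reference class with a trivial create routine. Both are assumed to declare a plain "attr imag: FLT", so that the setting routine is public (with readonly it would be private, as in CPX above).

        v: VCPX;                 -- Value object: exists at once, imag = 0.0
        w: VCPX := v.imag(2.0);  -- The setting routine returns a new value
           -- v.imag is still 0.0, while w.imag is 2.0

        r: RCPX := #RCPX;        -- Reference object: must be allocated
        s: RCPX := r;            -- s and r now refer to the same object
        r.imag := 2.0;           -- Sugar for r.imag(2.0); modifies in place
           -- s.imag is now 2.0 as well, since s is an alias for r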

For a more elaborate example, please consult the section on nil and void (see section Nil and void).

Value Class Pros and Cons

Value classes have several advantages. One virtue is their immutable semantics, which makes aliasing bugs impossible. You can also get the same effect by creating an immutable reference class (for example, look at the STR class) in which every modification makes a new copy. Value classes have no heap management overhead, they don't use space to store a tag, and the absence of aliasing makes more C compiler optimizations possible. Furthermore, since value objects are stored on the stack, they do not need to be garbage collected. For a small class like CPX, all these factors combine to give a significant win over a reference class implementation. Balanced against these positive factors in using a value class is the overhead that some C compilers introduce in passing the entire object on the stack. This problem is worse in value classes with many attributes.

Unfortunately the efficiency of a value class appears directly tied to how smart the C compiler is; "gcc" is not very bright in this respect.

My rules of thumb for creating value classes

Nil and void

One complexity in using value classes is the meaning of void, which is usually used to indicate a non-existent object. void is perfectly clear in the case of reference objects, where it is implemented by the NULL pointer. However, a value object exists as soon as it is declared (initialized to all zero values), and is never non-existent. Sather's solution is to say that a value object is void if it has this initial, all zero value. This introduces its own problems, since we may well want to use the all-zero value as a legitimate value (for instance, we frequently want to make use of INTs and FLTs with values of zero!).

To illustrate this discussion, we start with a simple example. The same class CPX used above is re-implemented, this time with an abstract type $CPX, below which are a reference implementation REF_CPX and a value implementation VAL_CPX.

        type $CPX is
           -- For the purposes of illustration, we have an abstract
           -- class that might be implemented as either a value or 
           -- reference class
           real: FLT;
           imag: FLT;
           plus(arg: $CPX): $CPX;
        end;

        class REF_CPX < $CPX is
          readonly attr real: FLT;  -- Make the reader publicly available
          readonly attr imag: FLT;

          create(r,i: FLT): SAME is
            res ::= new;
            res.real := r; 
            res.imag := i;      
            return(res);
          end;

          plus(arg: $CPX): REF_CPX is
            -- By the contravariant rule:
            -- The argument(s) of plus must either be of same type
            -- as $CPX or some supertype of $CPX. 
            -- Using a supertype is very rarely useful.
            -- The return value can be either of type $CPX or some 
            -- subtype of $CPX.
            new_real: FLT := real+arg.real;
            new_imag: FLT := imag+arg.imag;
            res: REF_CPX := #REF_CPX(new_real,new_imag);
            return(res);
          end;
        end;

        value class VAL_CPX < $CPX is
             -- Same as the CPX class mentioned previously 
          readonly attr real,imag: FLT;

          create(re,im:FLT):SAME is 
              -- More concise version of earlier create
             return(real(re).imag(im));
          end;
   
          plus(c:$CPX): VAL_CPX is
             res: VAL_CPX := #(real+c.real,imag+c.imag); 
             return(res);
          end;
        end;

As you can see, there are only subtle differences in the way the value and the reference classes are created. However, what if we want to perform the void test on the variables below?

        a: VAL_CPX;
        b: REF_CPX;
        #OUT+void(a);    -- "true"
        #OUT+void(b);    -- "true"

We would like both calls to return true, since neither variable has been properly initialized, and Sather's definition of void achieves exactly this. However, problematic situations can arise, since there will doubtless be occasions when you want to use a zero-valued complex number!

        a: VAL_CPX := #VAL_CPX(0.0,0.0);
        b: REF_CPX := #REF_CPX(0.0,0.0);
        #OUT+void(a);   -- Returns "true"!
        #OUT+void(b);   -- Returns "false"

This prints true for a and false for b, because a happened to be set to our void value! The reference class has no such confusion.

The birth of nil

To get around this problem, we can provide a user-defined nil value for the value class - nil will be an unused value which will signify a non-existent value object. In the case of the FLT class, a good nil value is "NaN".

        value class VAL_CPX < $NIL{VAL_CPX}, $CPX is

           is_nil: BOOL is
                -- A complex number is nil if both real
                -- and imaginary parts are nil
               return(imag.is_nil and real.is_nil);
           end;

           nil: VAL_CPX is
             -- Return the "nil" value, which consists of 
             -- a complex number with two nil FLT components
             return(#(FLT::nil,FLT::nil))
           end;
        end;

In this case, we make use of nil FLT values to signify a nil complex number. The FLT class itself uses "NaN" (not a number) to signify a nil value. We subtype from the class $NIL to indicate that VAL_CPX provides the two routines is_nil and nil.

     type $IS_NIL is
        is_nil: BOOL; -- Return true if this object is nil
     end;
     type $NIL{T} < $IS_NIL is
        nil: T;    -- The actual nil value
     end;

Hence, when checking to see whether an object is non-existent we can now do the following:

         object_does_not_exist(a: $CPX): BOOL is
            -- Returns true if "a" is a non-existent object.
           typecase a
           when $IS_NIL then return(a.is_nil);
                -- When a is a subtype of $IS_NIL
           else return(void(a)) end;
        end;

This is a fairly standard idiom; you will find code similar to this, particularly in parametrized classes such as FMAP etc, where the nil value is used to indicate empty spots in the hash table.

        a: VAL_CPX := VAL_CPX::nil;
        b: REF_CPX;
        #OUT+object_does_not_exist(a);   -- Returns "true"
        #OUT+object_does_not_exist(b);   -- Returns "true"
        
        c: VAL_CPX := #VAL_CPX(0.0,0.0);
        d: REF_CPX := #REF_CPX(0.0,0.0);
        #OUT+object_does_not_exist(c);   -- Returns "false"
        #OUT+object_does_not_exist(d);   -- Returns "false"

We now need to always initialize value classes when we declare them to the nil value (it would be nice if the language did this automatically, but this introduces other complications). This solution suffers from another problem; for some classes such as INT, there is no single value that can be set aside to signal nil, since we would then not be able to use that value.

Please note that the $IS_NIL type does not yet exist, but will shortly.

Immutable Reference Classes

It is possible to get the same semantics as a value class by defining an immutable reference class. Immutable reference objects return a copy of the object whenever an operation might modify the object. An immutable class is not a Sather construct. Rather, it is a property of a reference class interface.

Immutable CPX

Value semantics can be achieved for reference classes by making them immutable. All operations that would most naturally be defined to modify the original object instead return a new object with the appropriate modification.

We can imagine defining a "square" operation in REF_CPX as follows:

        square is
            r ::= real*real-imag*imag;
            i ::= 2.0*real*imag;
            real := r;
            imag := i;
        end;

However, our complex numbers would then behave unexpectedly.

       a: REF_CPX := #REF_CPX(3.0,4.0);
       c: REF_CPX := a;                -- c is (3.0,4.0)
       a.square;                       -- c's value has also changed to (-7,24)

The resulting value of a is as we expect (-7.0,24.0). The value of c, however, has changed to be (-7.0,24.0). The reason, of course, is that c points to the a object.

The REF_CPX class we defined earlier was immutable. To keep it that way, square should return a new object instead of modifying the original.

        square: REF_CPX is
            r ::= real*real-imag*imag;
            i ::= 2.0*real*imag;
            return(#REF_CPX(r,i));
        end;

       a: REF_CPX := #REF_CPX(3.0,4.0);
       c: REF_CPX := a;                 -- c is (3.0,4.0)
       a := a.square;                   -- c's value is still (3.0,4.0);

STR and FSTR

The class STR is an example of such an immutable reference class in the library - a reference class with value semantics. We generally expect value semantics for strings i.e.

        a: STR := "Beetle ";
        b: STR := a;
        a := a+"dung";

At the end of this example, we would like a to hold "Beetle dung" and b to hold "Beetle". However, were STR a reference class with aliasing, b might well be modified along with a. There are two possible solutions. The obvious one is to make STR a value class. However, strings can be extremely large (in the compiler, whole files are held in strings), and should definitely not be passed on the stack, which is how value objects are currently implemented. The other choice is to make an immutable reference class, where every modification generates a copy. However, this copying is inefficient. So we have two types of strings. The basic class STR is an immutable reference class and is the type of string literals. We also have the more efficient class FSTR, which is not immutable. Using FSTR can therefore result in aliasing bugs.

To explain this further, we consider the plus operation in FSTR and in STR. (These versions are simplified to explain the point; the library versions are more general and efficient.)

        
... From STR

    plus(s: STR):SAME is
        -- A new string obtained by appending `s' to self.
        -- This routine is actually from  STR::append(STR) in the library.

        -- If either self or s is void, return the other
        if void(self) then return s; end;
        if void(s) then return self; end;
        selfsize::=asize;            -- Determine the size of self 
        ssize::=s.asize;             -- and of the argument s
        r::=new(selfsize+ssize);     -- Allocate a new string for result
        r.acopy(self);               -- Copy self 
        r.acopy(selfsize,s);         -- and the argument into the new string
        return r;                    -- return the new string
    end;

        
... From FSTR

    plus(s:SAME):SAME is
    -- Append the string `s' to self and return it. 
        -- r will hold the result
        r:SAME;
        l ::= s.length;
        -- If the argument would fit into our left over space,
        -- then let the result be self
        if (loc + l < asize) then
            r:=self;
        else
        -- Otherwise, make the result be a new string
        -- that is twice the (length of self+length of argument)
        -- (this is the technique of amortized doubling which
        -- reduces the total number of allocations necessary 
        -- to incrementally build up an object to a particular
        -- size)
            r :=new(2*(asize+l));
            -- Set the end pointer of the string
            -- Then copy 
            if (~void(self)) then
              -- If self is not void, copy it into the new string
              r.loc := loc;
              r.acopy(self);      
               -- Mark the old string as destroyed.
               -- This helps prevent aliasing bugs - if someone
               -- tries to access the old string and destroy
               -- checking is on, then an error will occur.
               -- We do this because we still would like to 
               -- avoid aliasing behaviour for strings.
              SYS::destroy(self);   -- The old one should never be used now.
            end;
        end;
        -- "r" now holds the original string 
        --  and space enough to copy the argument into it.
        -- If the argument string has a size of 0, just return the
        -- current result
        if (l = 0) then return r; end;
        -- Set the new location
        r.loc := r.loc + l;
        -- Copy the argument into the result
        r.acopy(r.loc-l,s);
        return r;
    end;

Both STRs and FSTRs are meant to behave in roughly the same manner, but FSTRs can exhibit aliasing bugs as shown below.

        a: STR := #STR;         -- A new string
        a := a+"Beetle";        -- Append the string "Beetle" to a
        b: STR := a;            -- "b" now points to "a"
        c: STR := a+" Dung";
        #OUT+c;                 -- Outputs "Beetle Dung"
        #OUT+b;                 -- Outputs "Beetle"  
                                -- "b" has not been modified by aliasing
                                -- because the append operation returned 
                                -- a new string.

        -- To avoid seeing the effect of amortized doubling, we
        -- first create a large amount of space for "a"
        a: FSTR := #FSTR;     -- Create an empty FSTR, which starts out with 
                              -- space for 16 chars
        a := a+"Beetle";      
        b: FSTR := a;         
        c: FSTR := a+" Dung";
        #OUT+c;               -- Outputs "Beetle Dung"
        #OUT+b;               -- Outputs "Beetle Dung"  
                              -- "b" has been modified
                              -- because the append operation modified
                              -- the original string.
        

Amortized Doubling further complicates matters

The presence of such unexpected side effects can be even more tricky in reality, since it is usually not obvious to the user when exactly a new string will be allocated (due to the amortized doubling). It is possible for a program that works fine with "n" characters to break when we add or delete a few characters.

        a: FSTR := #FSTR("A test");     -- 6 chars
        a := a+" of";                   -- Amortized doubling kicks in
                                        -- "a" now has space for 18 characters
        b: FSTR := a;                   -- "b" is aliased to "a"
        a := a+"123456789012";          -- Add on another 12 characters - this 
                                        -- won't fit into "a"'s 18 chars and 
                                        -- so a new string will be 
                                        -- allocated and returned.
        #OUT+a;                         -- "A test of123456789012"
        #OUT+b;                         -- "A test of";   
                                        -- The original "b" is left unchanged.

However, if we change this program slightly, aliasing suddenly rears its ugly head!

        ... First 3 lines are the same ...
        a := a+"123456";               -- Add on another 6 characters - this 
                                       -- now fits into the space "a" has 
                                       -- left over.
                                       -- We just modify the original
                                       -- string.
        #OUT+a;                        -- "A test of123456"
        #OUT+b;                        -- "A test of123456";  
                                        -- b is changed due to aliasing!
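
Both outcomes follow from the test "loc + l < asize" in the simplified FSTR plus routine shown earlier. The trace below is a worked sketch only; it assumes that #FSTR("A test") starts out with asize = 6 and loc = 6, which the library version may not do.

        a+" of"           loc + l = 6 + 3  = 9,   not < 6
                          -> new string with asize = 2*(6+3) = 18, loc = 9
        a+"123456789012"  loc + l = 9 + 12 = 21,  not < 18
                          -> new string; b still refers to the old one
        a+"123456"        loc + l = 9 + 6  = 15,  < 18
                          -> appended in place; b sees the change

A difference of a few characters in the appended text is therefore enough to switch between the two behaviours.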

The destroy check allows us to catch some of these problems. If destroy checking is on, reading or modifying b should generate a run-time error.

Using FSTR

While the preceding discussion may make FSTRs seem dangerous, in practice a couple of simple rules will prevent any problems.

Value Class History

Value classes were initially proposed to generalize the notion of basic types such as INTs and FLTs. They were particularly relevant to the use of Sather on machines which had special arithmetic types (FLTs with differing precisions etc.). Later, however, the value semantics came to be emphasized; value classes acquired attributes and were allowed to be placed under abstract types.

Bound Routines

Bound routines are Sather's equivalent of function pointers. Like everything else in Sather, they must be strongly typed. The type of a bound routine most closely resembles the type of a parametrized class, with the possible addition of a return type.

Bound Routine Example

In the following example, we define a bound routine that takes an INT as an argument and returns an INT.

class BAR is

  create: SAME is return(new) end;

  foo(a: INT): INT is
     return(a+5) 
  end;

end;

   main is
      a: BAR := #BAR;
      br: ROUT{INT}:INT := #ROUT(a.foo(_));
      result_of_calling_br: INT := br.call(9);
      #OUT+result_of_calling_br;
   end;

The variable br is typed as a bound routine which takes an integer as argument and returns an integer. The routine foo, which is of the appropriate type, is then assigned to br. The routine associated with br may then be invoked by the built in function call. Just as we would when calling the routine foo, we must supply the integer argument to the bound routine.

        result_of_calling_br: INT := br.call(9)

is equivalent to:

        result_of_calling_br: INT := a.foo(9);

which returns the value 14.

Binding some arguments

When a bound routine is created, it can preset some of the values of the arguments. For example:

class BAR is

  create: SAME is return(new) end;
    -- A trivial create routine - just returns a new instance.

  foo(a: INT,b: INT): INT is
     return(a+b) 
  end;

end;

class TEST_BR is

   main is
        a: BAR := #BAR;  -- Call the create routine of BAR. 
        -- Using type inference, we could also write a ::= #BAR
        -- No preset arguments
        br1: ROUT{INT,INT}: INT := #ROUT(a.foo(_,_));
        br1_res: INT := br1.call(11,15);

        -- Preset the first argument of foo to 53
        br2: ROUT{INT}:INT := #ROUT(a.foo(53,_));
        br2_res: INT := br2.call(9);

        #OUT+br1_res+","+br2_res;
   end;
end;

In the example above, br2 binds the first argument of foo to 53 and the second argument is left unbound. This second argument will have to be supplied by the caller of the bound routine. br1 binds neither argument, and hence the caller must supply both arguments. The result of calling br1 should be 26 and the result of calling br2 should be 62.
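
As a further sketch in the same style, the second argument can be preset instead of the first. The names br3 and br3_res are introduced here purely for illustration; the lines are meant to sit inside the main routine of TEST_BR above, after a has been created.

        -- Preset the second argument of foo to 7,
        -- leaving the first argument unbound
        br3: ROUT{INT}:INT := #ROUT(a.foo(_,7));
        br3_res: INT := br3.call(3);   -- Equivalent to a.foo(3,7)

Since foo simply adds its arguments, br3_res should be 10. The position of the underscore determines which argument the caller of the bound routine supplies.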

Leaving self unbound

Bound routines are often used to apply a function to arbitrary objects of a particular class. For this usage, we need the self argument to be unbound. In the following example we will make use of the plus routine from the INT class.

... from the INT class ...
   plus(arg: INT): INT is 
      ... definition of plus 
   end;
...

   main is
       -- Leaving self and the argument unbound
        plusbr1: ROUT{INT,INT}:INT := #ROUT(_:INT.plus(_));
        br1res: INT := plusbr1.call(9,10); -- 19

        -- Binding self, but leaving the argument unbound
        plusbr2: ROUT{INT}:INT := #ROUT(3.plus(_));
        br2res: INT := plusbr2.call(15);   -- 18

        -- Binding the argument but leaving self unbound
        plusbr3: ROUT{INT}:INT := #ROUT(_:INT.plus(9));
        br3res: INT := plusbr3.call(11);   -- 20
        #OUT+br1res+","+br2res+","+br3res; -- 19,18,20
   end;

In the above example, plusbr1 leaves both self and the argument to plus unbound. Note that we must specify the type of self when creating the bound routine, otherwise the compiler cannot know which class the routine belongs to (the type could also be an abstract type that defines that feature in its interface). plusbr2 binds self to 3, so that the only argument that need be supplied at call time is the argument to plus. plusbr3 binds the argument of plus to 9, so that the only argument that need be supplied at call time is self for the routine.

Bound Routine Usage

Just as is the case in C, there will be programmers who find bound routines indispensable and others who will hardly ever touch them. Since Sather's bound routines are strongly typed, much of the insecurity associated with function pointers (that I felt when using C, at least!) disappears.

Applicative Bound Routines

Bound routines are generally useful when you want to write "apply"-like routines in a container class, which will work on a collection of data items. A good set of such routines may be found in the ARRAY{T} class, some examples of which are shown below. As usual, the elt! iter returns consecutive elements of the container.

from ARRAY{T} ...
     some(test:ROUT{T}:BOOL):BOOL is
        -- True if some element of self satisfies `test'. 
        -- Self may be void.
        loop if test.call(elt!) then return true end end;
        return false 
      end;

    every(test:ROUT{T}:BOOL):BOOL is
        -- True if every element of self satisfies `test'.
        -- Self may be void.
        loop if ~test.call(elt!) then return false end end; 
        return true 
    end;

These routines may be used thus:

     class MAIN is

        main is
          a ::= #ARRAY(|0.0,1.0,3.0,5.0|);
          br ::= #ROUT(gt_four(_));
          #OUT+a.every(br);  -- Prints false: not all elements are
                -- greater than four
          #OUT+a.some(br);  -- Prints true: one element is > 4.0
        end;

        gt_four(arg: FLT): BOOL  is return(arg > 4.0) end;
          -- Return true if the argument is greater than four

     end;

Menu Structures

Another common use of function pointers is in the construction of a set of choices. They may be used, for instance, to create a MENU class which associates various menu entries with bound routines. (This corresponds to the Command pattern from the "Gang of Four" book, Design Patterns.)

class MENU is
   
   private attr menu_choices: FMAP{STR,ROUT};
      -- Hash table mapping from strings to bound routines
   create: SAME is return(new) end;
   
   add_menu_item(name: STR, function: ROUT) is 
      -- Add a menu item to the hash table, indexed by its name
      menu_choices := menu_choices.insert(name,function);
   end;

   private user_selects(m: STR) is
      -- Perform an action when the user selects a particular menu item
      -- Look up the bound routine in the hash table, and call it.
      routine: ROUT := menu_choices.get(m);
      routine.call;
   end;
   
   run is
      -- In a loop, get user input, if it is not the word "done"
      -- then call the user_selects routine.
      loop
         #OUT+">";
         val ::= IN::get_line.str;
         if (val = "done") then
            break!
         else user_selects(val); end;
      end;
   end;

end;

class MAIN is
   
   main is
      m: MENU := #MENU;
      m.add_menu_item("hello",#ROUT(print_hello));
      m.add_menu_item("why",#ROUT(print_why));
      m.run;
   end;
   
   print_hello is  #OUT+"Hello there yourself!\n" end;
   
   print_why is #OUT+"My existance is a cosmic mystery\n" end;
   
end;

This generates the following session:

ttyp0 icsib78:~/tmp>a.out
>hello
Hello there yourself!
>why
My existance is a cosmic mystery
>hello
Hello there yourself!
>done
ttyp0 icsib78:~/tmp>

A very similar usage may be found in the GET_OPT class, which is located in `Contrib/gomes/get_opt.sa' of the Sather distribution. GET_OPT is capable of parsing a set of command line options in the standard unix form -<keyword1> text -<keyword2> text2. The class associates a bound routine with each keyword, and calls that bound routine with "text" as its argument. If the bound routine expects an argument of a different type (one of INT, FLT or BOOL), it tries to convert the string to a value of the appropriate type.

Higher Order Functions

Bound routines may also be composed to form higher order functions. Our example takes two bound routine predicates (bound routines that return a boolean value) as arguments and returns another bound routine which is the result of ORing the results of the two predicates.

class MY_TEST is
    -- Include the class definition just for clarity

   make_or(bp1: ROUT{INT}:BOOL, bp2: ROUT{INT}:BOOL): ROUT{INT}:BOOL is
      -- bp1 and bp2 are bound routine predicates that take ints as 
      -- arguments and return bools
      res: ROUT{INT}: BOOL;
      -- Leave the INT argument unbound; preset the two predicates
      res := #ROUT(stub_for_or(_,bp1,bp2));
      return(res);
   end;

   stub_for_or(arg: INT, bp1,bp2: ROUT{INT}:BOOL): BOOL is
      bp1_res: BOOL := bp1.call(arg);
      bp2_res: BOOL := bp2.call(arg);
      return(bp1_res or bp2_res);
   end;
        
-- A higher order function may then be developed as follows.
-- Suppose we have the two predicates,
   gt_four(a: INT): BOOL is return(a > 4) end;

   lt_two(a: INT): BOOL is return(a < 2) end;

-- which have been assigned to bound routines
   main is
      bp1: ROUT{INT}:BOOL := #ROUT(gt_four(_));
      bp2: ROUT{INT}:BOOL := #ROUT(lt_two(_));
      or_bp: ROUT{INT}:BOOL := make_or(bp1,bp2);
      #OUT+bp1.call(5);  -- Output is "true"
      #OUT+bp2.call(3);  -- "false"
      #OUT+or_bp.call(1);     -- "true"
      #OUT+or_bp.call(5);     --  "true"
      #OUT+or_bp.call(3);     --  "false"
      -- The predicates may also be directly defined using the INT class
      -- bp1: ROUT{INT}: BOOL :=  #ROUT(_:INT.gt(4));
      -- bp2: ROUT{INT}: BOOL :=  #ROUT(_:INT.lt(2));
   end;
   
end;
      

Using bound routines, we have created a test for whether a number is either greater than 4 or less than two.
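
The same stub technique extends to other combinators. As a sketch, the hypothetical routines make_not and stub_for_not below (not part of any library) could be added to MY_TEST to negate a predicate.

   make_not(bp: ROUT{INT}:BOOL): ROUT{INT}:BOOL is
      -- Return a predicate that is true exactly when bp is false.
      -- The INT argument is left unbound; bp itself is preset.
      return(#ROUT(stub_for_not(_,bp)));
   end;

   stub_for_not(arg: INT, bp: ROUT{INT}:BOOL): BOOL is
      return(~bp.call(arg));
   end;

With bp1 bound to gt_four as in main, make_not(bp1).call(5) should be false and make_not(bp1).call(3) should be true.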

Exceptions

Sather 1.0 has fully implemented exception handling. An exception may be raised using the raise statement:

    foo is
        raise("foo not implemented. This is a test");
    end;

The raise statement may contain any arbitrary expression which returns some object. The type of the object determines the class of the exception. Exceptions may be caught by simply protecting for the appropriate type. In the protect clause, the exception may be referred to by the built-in variable "exception". In fact, you can look at the tail half of the protect as a typecase on the exception object.

        protect
            foo;
        when STR then #OUT+exception+"\n" end;

Exceptions are passed to higher contexts until a handler is found and the exception is caught. The compiler provides a default handler for string exceptions, which is why most of the exceptions raised are string exceptions.
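
Since the handler part of a protect behaves like a typecase, it can list several types and finish with an else branch. The following is a minimal sketch; it assumes that foo might raise either a STR or an INT, which is an assumption made only for this illustration.

        protect
            foo;
        when STR then #OUT+"String exception: "+exception+"\n";
        when INT then #OUT+"Error code: "+exception+"\n";
        else #OUT+"Some other exception\n";
        end;

Within each when branch, "exception" has the type named in that branch, so the STR branch can print it directly.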

As a more complex example, here's an exception from the string cursor class

        class STR_CURSOR_EX < $STR is
          readonly attr s: STR_CURSOR;
          readonly attr error_type: STR;

          create(s: STR_CURSOR,m: STR): SAME is
             res ::= new;
             res.s := s;
             res.error_type := m;
             return(res);
         end;

         str: STR is
           res: STR := "Error type:\n"+error_type+"\n";
           res := res + "String cursor location:"+s.current_loc+"\n";
           return(res);
         end;
       end;

In the STR_CURSOR class we may then raise the exception as follows:

        advance_one_char is
          if (current_loc = buf.size -1) then
             raise(#STR_CURSOR_EX(self,"Attempt to read past the end of buf"));
          end;
        end;

We may then catch this exception by:

        protect
           my_string_cursor.advance_one_char;
        when STR_CURSOR_EX then  #OUT+exception.str+"\n"; end;

Note that the exception object in this case has a whole bunch of other information that we might make use of - the exact location in the string cursor, for instance. This information may be used to report the error in a manner that is suitable to the client.

Pros and Cons

Exceptions are very convenient, but create a significant overhead and should be used only for error states. Using them for ordinary control flow is tempting, but should be avoided. For instance, in the STR_CURSOR class, we could make use of exceptions for parsing. Testing whether a bool exists could be done as follows:

        test_bool: BOOL is
           protect 
              current_state ::= save_state;
              b ::= get_bool;
              restore_state(current_state);
           when STR_CURSOR_EX then return(false); end;
           return(true);
        end;

However, this is a much higher overhead way of accomplishing the basic test.

Alternatives

The other alternative is to use a sticky error flag in the class, as is done by IEEE exceptions and the current FILE classes. This has problems, such as the fact that the outermost error is logged, not the most immediate one, and it is very easy to forget to check the flag. However, it has low overhead and would be suitable for use in the previous test_bool case.

Problems and Bugs

This section is about some common problems and bugs. This is from Doc/Bugs of the distribution, with some added comments.

Send comments to "sather-bugs@icsi.berkeley.edu".

Major functionality omissions:

Bugs of libraries and library support:

Bugs affecting checking:

Bugs affecting invocation/environment:

Known pSather bugs:

  • Garbage collection might need to be turned off (until we get our custom collector)

Enums

This was from H. Klawitter, in response to a question on how one would code up the notion of months in Sather.

Mainly there are two possibilities: using a nifty feature of the Sather spec, or really implementing enumerations on one's own, which is not difficult, but is a bit of work to write.

Easy Approach

    class MONTHS
    is
        const jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec;
    end; -- MONTHS
    
    class USING_MONTHS
    is
        include MONTHS;
    
        ...
          if month = dec then self.prepare_for_christmas end;
        ...
    end; -- USING_MONTHS
    

See also the Sather 1.0 Spec, Page 11f. However, this kind of enum has disadvantages, since the values are not distinguished from INTs. Moreover, you can't force special values to special names. In this case, jan would be equal to zero, and you usually do not want this.

Better Approach

If you want to have full type control, you have to write a value class like this:

    value class ENUM
        -- This class is meant to be included into classes, which should behave 
        -- like enumerations, and provides the basic comparison routines.
    is
        readonly attr int:INT;
        -- This attribute is also used for transforming an enumeration object into
        -- an INT-value.
    
        is_eq(i:SAME):BOOL is return int=i.int end;
        is_neq(i:SAME):BOOL is return int/=i.int end;
    
    end; -- ENUM
    
    value class MONTH
    is
        include ENUM;
    
        create(i:INT):MONTH
            -- Makes a MONTH out of an INT.
            pre i>=1 and i<=12
        is
            return int(i)
        end;
    
        -- Now the enumeration "itself".
        jan:MONTH is return int(1) end;
        feb:MONTH is return int(2) end;
        ...
        dec:MONTH is return int(12) end;
    
    end; -- MONTH
    

    You can use this class in the following way:

        month: MONTH;
        month := MONTH::jan;           -- How to use the "jan" name.
        birth:MONTH := month.jan;      -- This does the same job.
        month := #MONTH(3);            -- conversion INT->MONTH
        if month = MONTH::feb then ... end;
        if month.int = 3 then ... end; -- conversion MONTH->INT
    

    The disadvantages are obvious: there is much code to write. Also, MONTH should be subtyped under $IS_EQ{MONTH}, and probably under many more types.
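
    A sketch of what that subtyping declaration might look like follows. It assumes that $IS_EQ{MONTH} requires only a routine is_eq(MONTH):BOOL, which the included ENUM code supplies as long as its is_eq is declared to return BOOL. Only the class header differs from the MONTH class above:

        value class MONTH < $IS_EQ{MONTH}
        is
            include ENUM;   -- is_eq(i:SAME):BOOL becomes is_eq(i:MONTH):BOOL here

            -- create, jan, feb, ... exactly as in the MONTH class above
        end; -- MONTH

    With this header, a MONTH can be used wherever the library expects an argument of type $IS_EQ{MONTH}.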

    The Sather Library

    This section has no meat as yet. The Sather library is available with the distribution and consists of a reasonable set of container classes as well as a user interface library based on Tcl/Tk. Please look at the documentation available under the Gui subdirectory of the distribution. Before too long we hope to have better documentation of the library as well.

    Concept Index

    a

  • Abstract Types
  • Abstract Types - Covariance
  • Abstract Types - Parametrized Abstract Classes
  • Abstract Types - Supertyping
  • Abstract Types - Type Conformance
  • Acknowledgements
  • Amortized Doubling and its complications

    b

  • Bound Routine - More Complex Usage
  • Bound Routine Example
  • Bound Routines
  • Bound Routines - Applicative Uses
  • Bound Routines - binding some arguments
  • Bound Routines - Forming Higher Order Functions
  • Bound Routines - Leaving self unbound
  • Bound Routines - Menu Structures
  • Browser

    c

  • Compiler - Distribution and Documentation
  • Compiler - Has clauses
  • Compiler - Module Comments
  • Compiler - Module files with different suffixes
  • Compiler - Organizing Using Module Files
  • Compiler - Problems and Bugs
  • Compiler - Over-riding library files
  • Concrete Classes
  • Conformance
  • Contravariance

    d

  • Debugging Sather
  • Development Environment

    e

  • Emacs Mode
  • Enum
  • Exceptions

    f

  • Features
  • FSTR and its problems

    g

  • gdb - Use with sather

    h

  • Higher Order Functions
  • How are different parametrizations related?

    l

  • Library
  • Library files - over-riding individual files

    m

  • Module Comments
  • Module Files
  • Multi-Line Module Comments

    n

  • Nil
  • nil - why it exists

    p

  • Parameterization - Typebounds
  • Parametrization - Abstract Classes
  • Parametrizations - subtyping relationships
  • Protect

    r

  • Raise

    s

  • Subtyping - Covariance
  • Supertyping

    t

  • Tools - Browser
  • Tools - Emacs Mode
  • Tools - gdb
  • Typebounds

    v

  • Value Class - Pros and Cons
  • Value Class Definition
  • Value Class Example
  • Value Class History
  • Value Classes
  • Value Classes - when to use
  • Value Semantics - Immutable CPX
  • Value semantics - STR and FSTR
  • Value Semantics - via Immutable Reference Classes
  • void