
8  ML Basis system

This section describes the ML Basis system (MLBs) used in MLton. While the Modules level of Standard ML provides a sophisticated language for programming-in-the-large, it is difficult, if not impossible, to accomplish a number of routine namespace management operations that are necessary for programming-in-the-very-large, when (parts of) a program draws upon multiple libraries provided by different vendors. The ML Basis system is a simple, yet powerful, approach that builds upon the programmer's intuitive notion (and the Definition of SML's formal notion) of the top-level environment (a basis). The system has been designed to be a natural extension of Standard ML; the Formal Specification of MLBs is given in the style of the Definition.

The following subsections describe the key features provided by MLBs.

8.1  Syntax and semantics

An .mlb (ML Basis) file describes a library or program. An .mlb file contains a "basis declaration," defined by the following grammar:

basdec ::= basis basid = basexp (and basid = basexp)*
  | open basid1 ··· basidn
  | local basdec in basdec end
  | basdec [;] basdec
  | structure strid [= strid] (and strid [= strid])*
  | signature sigid [= sigid] (and sigid [= sigid])*
  | functor funid [= funid] (and funid [= funid])*
  | path.sml
  | path.mlb
  | ann "ann" in basdec end
 
basexp ::= bas basdec end
  | basid
  | let basdec in basexp end

Nested SML-style comments (enclosed with (* and *)) are ignored (but #line directives of Section 10.1 are recognized).

Conceptually, a basis file is elaborated starting in an empty basis, and each basis declaration produces a basis as a result. Basis expressions and basis identifiers allow binding a basis to a name; this, in turn, allows fine-grained specification of dependencies without the need for additional .mlb files. Local declarations provide name hiding. Sequencing of basis declarations merges the bases. Structure, signature, and functor declarations bind a module in the current basis.
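As a sketch of basis bindings, the following .mlb file names two component bases and states that parser.sml depends only on the Basis Library and on Util, without a separate .mlb file for each component. The file names app.mlb, util.sml, parser.sml, and main.sml are hypothetical:

```sml
(* app.mlb: hypothetical program built from three source files *)
$(MLTON_ROOT)/basis/basis.mlb

(* Util denotes only the bindings produced by util.sml *)
basis Util = bas util.sml end

(* parser.sml is elaborated with the Basis Library plus Util in scope *)
basis Parser =
  let
    open Util
  in
    bas parser.sml end
  end

(* Merge both component bases into the current basis, then elaborate main.sml *)
open Util Parser
main.sml
```

Binding Util and Parser does not by itself add their bindings to the current basis; only the final open does.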

References to SML source files cause the file to be elaborated in the "current" basis. References to other ML basis files cause the basis denoted by that ML basis file to be imported. Recall that an ML basis file is elaborated in an empty basis; hence, no bindings from the "current" basis are available to the imported basis file. Because .mlb files are elaborated in the empty basis, each one need only be elaborated (and evaluated) once, and the semantics of MLBs are such that the results of elaborating (and evaluating) a .mlb file are cached. Thus, any observable effects due to evaluation are not duplicated if the .mlb file is referred to multiple times.
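The caching behavior can be observed with a small, hypothetical program. Three files are shown together below: hello.sml performs a top-level print, hello.mlb packages it, and main.mlb (the program) refers to hello.mlb twice. Under the semantics just described, the message should be printed only once when the compiled program runs, because the second reference reuses the cached result:

```sml
(* hello.sml: evaluation has an observable effect *)
val () = print "hello.sml evaluated\n"

(* hello.mlb *)
$(MLTON_ROOT)/basis/basis.mlb
hello.sml

(* main.mlb: the program; hello.mlb is referenced twice *)
hello.mlb
hello.mlb
```

By contrast, an SML source file listed twice in one .mlb file is elaborated twice in the current basis (either.mlb in Section 8.2 relies on this for either-infixes.sml), so listing hello.sml itself twice would print the message twice.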

Paths can be relative or absolute. Relative paths are relative to the directory containing the .mlb file. Paths may include path variables and are expanded according to a path map; see Section 8.3 for more details. Unquoted paths may include alphanumeric characters and the symbols - and _, along with the arc separator / and the extension separator (.). More complicated paths, including paths with spaces, may be included by enclosing the path in double quotes ("). A quoted path is lexed as an SML string constant.
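The path forms above can be combined in one file. In this sketch every file and directory name except the Basis Library path is hypothetical; the quoted path is required only because it contains a space:

```sml
(* Path variable, expanded through the path map (Section 8.3) *)
$(MLTON_ROOT)/basis/basis.mlb

(* Absolute path *)
/opt/sml-libs/json/json.mlb

(* Relative paths, resolved against the directory containing this .mlb *)
../common/common.mlb
src/util.sml

(* Quoted path: lexed as an SML string constant, so spaces are allowed *)
"src/third party/parser.sml"
```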

Finally, annotations allow a library author to control options that affect the elaboration of SML source files; see Section 8.5 for more details.

8.2  Examples

We demonstrate how to accomplish some common tasks:
Complete program: Suppose your complete program consists of the files file1.sml, ..., filen.sml, which depend upon libraries lib1.mlb, ..., libm.mlb.
(* import libraries *)
lib1.mlb
...
libm.mlb

(* program files *)
file1.sml
...
filen.sml
The bases denoted by lib1.mlb, ..., libm.mlb are merged (bindings of names in later bases take precedence over bindings of the same name in earlier bases), producing a basis in which file1.sml, ..., filen.sml are elaborated, possibly adding additional bindings to the basis.
Export filter: Suppose you only want to export certain structures, signatures, and functors from a collection of files.
local
  file1.sml
  ...
  filen.sml
in
  (* export filter here *)
  functor F
  structure S
end
While file1.sml, ..., filen.sml may declare top-level identifiers in addition to F and S, such names are not accessible to programs and libraries that import this .mlb.
Export filter with renaming: Suppose you want an export filter, but want to rename one of the modules.
local
  file1.sml
  ...
  filen.sml
in
  (* export filter, with renaming, here *)
  functor F
  structure S' = S
end
Note that functor F is an abbreviation for functor F = F, which simply exports an identifier under the same name.
Import filter: Suppose you only want to import a functor F from one library and a structure S from another library.
local
  lib1.mlb
in
  (* import filter here *)
  functor F
end
local
  lib2.mlb
in
  (* import filter here *)
  structure S
end
file1.sml
...
filen.sml
Import filter with renaming: Suppose you want to import a structure S from one library and another structure S from another library.
local
  lib1.mlb
in
  (* import filter, with renaming, here *)
  structure S1 = S
end
local
  lib2.mlb
in
  (* import filter, with renaming, here *)
  structure S2 = S
end
file1.sml
...
filen.sml
Since the Modules level of SML is the natural means for organizing program and library components, MLBs provide convenient syntax for renaming Modules level identifiers. However, note that .mlb files elaborate to full bases, including top-level types and values (with their infix status), in addition to structures, signatures, and functors. For example, suppose you wished to extend the Standard ML Basis Library with an ('a, 'b) either datatype corresponding to a disjoint sum. The type and some operations should be available at the top level, and a signature and structure should provide the complete interface.

We assume that the main implementation is given by two files, either-sigs.sml and either-strs.sml:
either-sigs.sml:
signature EITHER_GLOBAL =
  sig
    datatype ('a, 'b) either = Left of 'a | Right of 'b
    val &  : ('a -> 'c) * ('b -> 'c) -> ('a, 'b) either -> 'c
    val && : ('a -> 'c) * ('b -> 'd) -> ('a, 'b) either -> ('c, 'd) either
  end

signature EITHER =
  sig
    include EITHER_GLOBAL
    val isLeft  : ('a, 'b) either -> bool
    val isRight : ('a, 'b) either -> bool
    ...
  end
either-strs.sml:
structure Either : EITHER =
  struct
    datatype ('a, 'b) either = Left of 'a | Right of 'b
    fun f & g = fn x =>
      case x of Left z => f z | Right z => g z
    fun f && g = fn x =>
      ((Left o f) & (Right o g)) x
    fun isLeft x = ((fn _ => true) & (fn _ => false)) x
    fun isRight x = (not o isLeft) x
    ...
  end
structure EitherGlobal : EITHER_GLOBAL = Either
Two additional files contain the infix directives (either-infixes.sml) and a declaration to import the top-level types and values (either-open.sml):
either-infixes.sml:
infixr 3 & &&
either-open.sml:
open EitherGlobal
The extension is delivered via either.mlb:
either.mlb:
local
  (* import Basis Library *)
  basis.mlb

  either-sigs.sml
  either-infixes.sml
  either-strs.sml
in
  signature EITHER
  structure Either
  either-infixes.sml
  either-open.sml
end
A client that imports either.mlb will have access to neither EITHER_GLOBAL nor EitherGlobal, but will have access to the type either and the values & and && (with infix status) in the top-level environment. Note that either-infixes.sml is listed twice: the first occurrence gives & and && infix status while either-strs.sml is elaborated, but local limits the scope of that directive to the local part, so a second occurrence is needed to export the infix status. Although the repetition is unfortunate, it is preferable to repeat either-infixes.sml in either.mlb rather than require every client of either.mlb to also import either-infixes.sml.
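From a client's point of view, either, Left, Right, & and && then behave like top-level bindings of the Basis Library. A hypothetical client (client.mlb and client.sml are illustrative names, shown together below) might look as follows. The client imports the Basis Library itself, since either.mlb imports it only locally:

```sml
(* client.mlb *)
$(MLTON_ROOT)/basis/basis.mlb
either.mlb
client.sml

(* client.sml: & and && are already infixr 3 here *)
val describe : (int, string) either -> string =
  Int.toString & (fn s => "\"" ^ s ^ "\"")

val () = print (describe (Left 42) ^ "\n")            (* prints 42 *)
val () = print (describe (Right "forty-two") ^ "\n")  (* prints "forty-two" *)

(* The complete interface remains available through the structure Either *)
val leftOne : (int, unit) either = Left 1
val () = print (Bool.toString (Either.isLeft leftOne) ^ "\n")  (* prints true *)
```

Because EitherGlobal is a transparent ascription of Either, the top-level either and Either.either are the same type, so values built with the top-level Left can be passed to Either.isLeft.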

8.3  Path maps

As noted above, paths can be either relative or absolute. However, using a fixed relative or absolute path to a library makes it difficult to move either the client or the library. Hence, MLton allows path variables to appear in paths in the form $(VAR). The mapping from path variables to paths is initialized by reading two configuration files: a system-wide one and a user-specific one. The system-wide configuration file is read from /usr/lib/mlton/mlb-path-map. The user-specific configuration file is read from .mlton/mlb-path-map in the user's home directory (which must be given by the HOME environment variable).

The format of an mlb-path-map file is a sequence of lines; each line consists of two whitespace-delimited tokens. The first token is a path variable VAR and the second token is the path to which the variable is mapped. The path may include path variables, which are recursively expanded. Configuration files are processed from top to bottom, system-wide before user-specific; later mappings take precedence over earlier mappings. The system-wide configuration file makes the following path variables available:
    
LIB_MLTON_DIR /usr/lib/mlton
MLTON_ROOT $(LIB_MLTON_DIR)/sml
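For instance, a user who keeps a code generator's library in a personal directory could add two entries to ~/.mlton/mlb-path-map. The variable names GEN_ROOT and GEN_LIB and the directories are hypothetical; the comment at the top of this sketch shows the assumed path-map lines, and the .mlb file below it then names the library without hard-coding its location:

```sml
(* Assumed lines in ~/.mlton/mlb-path-map (hypothetical directories):
 *
 *   GEN_ROOT /home/alice/src/gen
 *   GEN_LIB  $(GEN_ROOT)/lib
 *
 * GEN_LIB refers to GEN_ROOT, which is expanded recursively. *)
$(MLTON_ROOT)/basis/basis.mlb
$(GEN_LIB)/gen-lib.mlb
client.sml
```

Moving the generator then requires changing only the GEN_ROOT line of the path map, not every .mlb file that uses it.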

8.4  Available libraries


$(MLTON_ROOT)/basis/basis.mlb
     The Standard ML Basis Library (see Section 9).
$(MLTON_ROOT)/basis/basis-1997.mlb
     The (deprecated) 1997 specification of the Standard ML Basis Library.
$(MLTON_ROOT)/basis/mlton.mlb
     The MLton structure and signatures (see Section 10.2).
$(MLTON_ROOT)/basis/sml-nj.mlb
     The SMLofNJ structure and signature (see Section 10.3).
$(MLTON_ROOT)/basis/unsafe.mlb
     The Unsafe structure and signature (see Section 10.4).
$(MLTON_ROOT)/mlyacc-lib/mlyacc-lib.mlb
     Modules used by parsers built with mlyacc.
$(MLTON_ROOT)/cml/cml.mlb
     Concurrent ML, a library for message-passing concurrency (see $(MLTON_ROOT)/cml/README).

There are a number of specialized ML Basis files for importing fragments of the Basis Library that cannot be expressed within SML.
$(MLTON_ROOT)/basis/pervasive-types.mlb
     The top-level types and constructors of the Basis Library (see Section 9.1).
$(MLTON_ROOT)/basis/pervasive-exns.mlb
     The top-level exception constructors of the Basis Library (see Section 9.2).
$(MLTON_ROOT)/basis/pervasive-vals.mlb
     The top-level values of the Basis Library, without infix status (see Section 9.3).
$(MLTON_ROOT)/basis/overloads.mlb
     The top-level overloaded values of the Basis Library, without infix status (see Section 9.4).
$(MLTON_ROOT)/basis/equal.mlb
     The polymorphic equality = and inequality <> values, without infix status.
$(MLTON_ROOT)/basis/infixes.mlb
     The infix declarations of the Basis Library.
$(MLTON_ROOT)/basis/pervasive.mlb
     The entire top-level environment of the Basis Library, with infix status.
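As a minimal sketch of using one of these libraries, the hypothetical program below (prog.mlb and prog.sml, shown together) imports mlton.mlb alongside the Basis Library so that prog.sml can refer to MLton.isMLton, a boolean value in the MLton structure of Section 10.2:

```sml
(* prog.mlb *)
$(MLTON_ROOT)/basis/basis.mlb
$(MLTON_ROOT)/basis/mlton.mlb
prog.sml

(* prog.sml: the MLton structure is in scope only because of mlton.mlb *)
val () =
  print (if MLton.isMLton then "compiled by MLton\n" else "not MLton\n")
```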

8.5  Annotations

Annotations are a mechanism that allows a library author to control options that affect the elaboration of SML source files. Conceptually, a basis file is elaborated in a default annotation environment (just as it is elaborated in an empty basis). The ann "ann" in basdec end declaration merges the annotation ann with the "current" annotation environment for the elaboration of basdec. To allow for future expansion, "ann" is lexed as a single SML string constant. To conveniently specify multiple annotations, the following derived form is provided:

basdec derived form:
ann "ann" ("ann")+ in basdec end   ==>   ann "ann" in ann ("ann")+ in basdec end end
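For example, the following two declarations are equivalent: the first uses the derived form and the second is its expansion. They are alternatives and are not meant to appear in the same file; gen.sml is a hypothetical file name:

```sml
(* Derived form *)
ann "warnMatch false" "sequenceUnit true" in
  gen.sml
end

(* Expansion of the derived form above *)
ann "warnMatch false" in
  ann "sequenceUnit true" in
    gen.sml
  end
end
```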

In the explanation below, for annotations that take a boolean argument {true|false}, the first value listed is the default annotation.
allowExport {false|true}
 
If true, allow the _export expression form of Section 6.2 to appear in imported source files.

allowImport {false|true}
 
If true, allow the _import expression form of Section 6.1 to appear in imported source files.

forceUsed
 
Force all identifiers in the basis denoted by the body of the ann to be considered used; use in conjunction with warnUnused true.

sequenceUnit {false|true}
 
If true, then in the sequence expression (e1; e2), it is a type error if e1 is not of type unit. This can be helpful in detecting curried applications that are mistakenly not fully applied. To silence spurious errors, you can use ignore e1.

warnMatch {true|false}
 
Report nonexhaustive and redundant matches.

warnUnused {false|true}
 
Report unused identifiers.
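The following sketch illustrates the kind of bug that sequenceUnit true is meant to catch, together with the ignore idiom for deliberately discarded results. The function names report and reportCount are hypothetical; the file is assumed to be listed in an .mlb inside ann "sequenceUnit true" in ... end, and the buggy variant is kept in a comment because that annotation rejects it:

```sml
(* Buggy: List.app is not fully applied, so e1 has type
 * string list -> unit rather than unit.  Under the default
 * (sequenceUnit false) this compiles and prints only "done";
 * under sequenceUnit true it is a type error.
 *
 *   fun report (xs : string list) : unit =
 *     (List.app print; print "done\n")
 *)

(* Intended: e1 is a full application of type unit *)
fun report (xs : string list) : unit =
  (List.app print xs; print "done\n")

(* Deliberately discarding a non-unit result: ignore has type 'a -> unit *)
fun reportCount (xs : string list) : int =
  (ignore (List.map print xs); length xs)
```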

8.5.1  Examples

We demonstrate how to use annotations in two common scenarios.

Programs that automatically generate source code can often produce nonexhaustive matches, relying upon invariants of the generated code to ensure that the matches never fail. A programmer may wish to elide the nonexhaustive match warnings from this code, in order that legitimate warnings are not missed in a flurry of false positives. To do so, the programmer simply annotates the generated code with the warnMatch false annotation:
local
  $(GEN_ROOT)/gen-lib.mlb

  ann "warnMatch false" in
    foo.gen.sml
  end
in
  signature FOO
  structure Foo
end
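For concreteness, foo.gen.sml might contain something like the following hypothetical generated code. Its match is nonexhaustive, which would be reported under the default warnMatch true:

```sml
(* foo.gen.sml: hypothetical generator output *)
signature FOO =
  sig
    val decode : int -> string
  end

structure Foo : FOO =
  struct
    (* The generator only ever produces calls with 0 or 1, so this match is
     * deliberately nonexhaustive; decode 2 would raise Match at run time. *)
    fun decode 0 = "zero"
      | decode 1 = "one"
  end
```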
Standard ML libraries can be delivered via .mlb files. Authors of such libraries should strive to be mindful of the ways in which programmers may choose to compile their programs. For example, although the defaults for sequenceUnit and warnUnused are false, periodically compiling with these annotations defaulted to true can help uncover likely bugs. However, a programmer is unlikely to be interested in unused modules from an imported library, and the behavior of sequenceUnit true may be incompatible with some libraries. Hence, a library author may choose to deliver a library as follows:
ann
  "sequenceUnit true"
  "warnMatch true"
  "warnUnused true" "forceUsed"
in
  local
    file1.sml
    ...
    filen.sml
  in
    functor F1
    ...
    signature S1
    ...
    structure SN
    ...
  end
end
The annotations sequenceUnit true and warnMatch true have the obvious effect on elaboration. The annotations warnUnused true and forceUsed work in conjunction: the former warns about any identifiers that do not contribute to the exported modules, and the latter prevents warnings about exported modules that are not used in the remainder of the program. Many of the available libraries listed in Section 8.4 are delivered with these annotations.