Next: 25. TkSGML System integration Up: III. System Integration Previous: III. System Integration   Contents   Index

24. Catalogs and System identifiers

Catalog files are used by the sgml widget's parser to create system identifiers for external entities. This chapter describes the format of catalog files and the syntax of the generated system identifiers.


24.1 Catalog files

The entity manager generates a system identifier for every external entity using catalog entry files in the format defined by ``SGML Open Technical Resolution TR9401:1995''. The entity manager will give an error if it is unable to generate a system identifier for an external entity. Normally if the external identifier for an entity includes a system identifier then the entity manager will use that as the effective system identifier for the entity; this behaviour can be changed using OVERRIDE or SYSTEM entries in a catalog entry file.

A catalog entry file contains a sequence of entries in one of the following forms:

The delimiters can be omitted from the sysid provided it does not contain any white space. Comments are allowed between parameters, delimited by -- as in SGML.

The environment variable SGML_CATALOG_FILES contains a list of catalog entry files. The list is separated by colons under Unix and by semicolons under MS-DOS and Windows. These files are searched after any catalog entry files specified using the -m option, and after the catalog entry file called catalog in the same place as the document entity. If this environment variable is not set, a system-dependent list of catalog entry files is used. Catalog entry files are not restricted to being files: the name of a catalog entry file is interpreted as a system identifier.
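The separator rule above can be applied from a Tcl script before the parser is started. The following is only a sketch: the helper name catalogPathSeparator and the two catalog paths are hypothetical, and it assumes that a value set through Tcl's env array is visible to the entity manager.

```tcl
# Hypothetical helper: pick the list separator for SGML_CATALOG_FILES.
# Colon under Unix, semicolon under MS-DOS and Windows.
proc catalogPathSeparator {platform} {
    if {$platform eq "unix"} {
        return ":"
    }
    return ";"
}

# Illustrative catalog locations; replace with real ones.
set catalogs [list /usr/local/lib/sgml/catalog ./project.cat]

# tcl_platform(platform) is "unix" or "windows" in current Tcl releases.
set env(SGML_CATALOG_FILES) \
    [join $catalogs [catalogPathSeparator $tcl_platform(platform)]]
```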

A match in one catalog entry file will take precedence over any match in a later catalog entry file. A more specific matching entry in one catalog entry file will take priority over a less specific matching entry in the same catalog entry file. For this purpose, the order of specificity is (most specific first):

SYSTEM entries;
PUBLIC entries;
DELEGATE entries ordered by the length of the prefix, longest first;
ENTITY, DOCTYPE, LINKTYPE and NOTATION entries.
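The specificity order within one catalog entry file can be illustrated by a small sort. This is a sketch, not the entity manager's actual algorithm: it assumes that each matching entry is represented as a Tcl list of the form {KEYWORD parameter sysid}, and the procedure names entryRank and compareEntries are made up for the example.

```tcl
# Rank an entry: lower rank means more specific.
# The second element breaks ties between DELEGATE entries (longest prefix first).
proc entryRank {entry} {
    set kind [string toupper [lindex $entry 0]]
    switch -- $kind {
        SYSTEM   { return [list 0 0] }
        PUBLIC   { return [list 1 0] }
        DELEGATE { return [list 2 [expr {-[string length [lindex $entry 1]]}]] }
        default  { return [list 3 0] }
    }
}

# Comparison command for lsort: most specific entry sorts first.
proc compareEntries {a b} {
    foreach {ka la} [entryRank $a] break
    foreach {kb lb} [entryRank $b] break
    if {$ka != $kb} {
        return [expr {$ka - $kb}]
    }
    return [expr {$la - $lb}]
}

# Illustrative matches from a single catalog entry file.
set matches {
    {ENTITY doc doc.sgml}
    {DELEGATE "-//ACME//" acme.cat}
    {PUBLIC "-//ACME//DTD Memo//EN" memo.dtd}
    {DELEGATE "-//ACME//DTD" acme-dtd.cat}
    {SYSTEM memo.dtd local/memo.dtd}
}
set ordered [lsort -command compareEntries $matches]
```

The first element of ordered is the entry that would win under the rules above.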


24.2 System identifiers

There are two kinds of system identifier: formal system identifiers and simple system identifiers. A system identifier that does not start with < will always be interpreted as a simple system identifier. A simple system identifier will always be interpreted either as a filename or as a URL.


24.2.1 Formal system identifiers

Formal system identifiers are based on the System Identifier facility defined in ``ISO/IEC 10744 (HyTime) Technical Corrigendum 1'', Annex D. A system identifier that is a formal system identifier consists of a sequence of one or more storage object specifications. The objects specified by the storage object specifications are concatenated to form the entity.

A storage object specification consists of an SGML start-tag in the reference concrete syntax followed by character data content. The generic identifier of the start-tag is the name of a storage manager. The content is a storage object identifier which identifies the storage object in a manner dependent on the storage manager. The start-tag can also specify attributes giving additional information about the storage object. Numeric character references are recognized in storage object identifiers and attribute value literals in the start-tag. Record ends are ignored in the storage object identifier as with SGML.

A system identifier will be interpreted as a formal system identifier if it starts with a < followed by a storage manager name, followed by either > or white-space; otherwise it will be interpreted as a simple system identifier. A storage object identifier extends until the end of the system identifier or until the first occurrence of < followed by a storage manager name, followed by either > or white-space.
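The rule that separates formal from simple system identifiers can be sketched in Tcl as follows. The procedure name sysidKind is hypothetical, the list of storage manager names is passed in by the caller, and comparing names case-insensitively is an assumption made for this example.

```tcl
# Classify a system identifier as "formal" or "simple".
# managers: list of known storage manager names in lower case (caller-supplied).
proc sysidKind {sysid managers} {
    # Formal: "<", a name, then ">" or white-space, where the name is a
    # known storage manager. Case-insensitive comparison is an assumption.
    if {[regexp {^<([^>\s]+)[>\s]} $sysid -> name]
            && [lsearch -exact $managers [string tolower $name]] >= 0} {
        return formal
    }
    return simple
}

set managers {osfile url}
puts [sysidKind "<osfile>memo.sgml" $managers]
puts [sysidKind "memo.sgml" $managers]
puts [sysidKind "<note>memo.sgml" $managers]
```

The last call reports a simple system identifier because note is not a storage manager name in the supplied list, even though the identifier starts with <.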

The following storage managers are available:

In addition, user-defined storage managers can be used to extend the range of possible storage objects. See section 24.3 for additional details.

Attributes must be provided within the start-tag that specifies the storage manager. The following attributes are supported:

24.2.2 Simple system identifiers

A simple system identifier is interpreted as a storage object identifier with a storage manager that depends on where the system identifier was specified: if it was specified in a storage object whose storage manager was url or if the system identifier looks like an absolute URL in a supported scheme, the storage manager will be url; otherwise the storage manager will be osfile. The storage manager attributes are defaulted as for a formal system identifier. Numeric character references are not recognized in simple system identifiers.
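The choice between url and osfile can be written down as a short decision procedure. This is a sketch under stated assumptions: the procedure name simpleSysidManager is hypothetical, and http is used as the only supported URL scheme purely for illustration, since the set of supported schemes is not listed here.

```tcl
# Decide which storage manager handles a simple system identifier.
# referrerManager: storage manager of the storage object in which the
# system identifier was specified ("" if unknown).
proc simpleSysidManager {sysid referrerManager} {
    # Assumption: only http:// counts as an absolute URL in a supported scheme.
    if {$referrerManager eq "url" || [regexp -nocase {^http://} $sysid]} {
        return url
    }
    return osfile
}

puts [simpleSysidManager "chapter1.sgml" osfile]
puts [simpleSysidManager "chapter1.sgml" url]
puts [simpleSysidManager "http://example.org/memo.dtd" osfile]
```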

24.2.3 Encodings

Encodings can be specified, for example, as the value of the encoding attribute in system identifiers. For interoperability with SP-based systems, the environment variable SP_ENCODING can be set to specify an encoding.

Encoding names are case insensitive. The following named encodings are available:

The following additional encodings are supported under Windows 95, 98, ME, Windows 2000 and Windows NT:

24.3 User-defined storage managers

The range of predefined storage managers can be extended by user-written storage managers to implement new classes of storage objects.

For example, an ftp storage manager could be defined to retrieve documents from an ftp server. After implementing an ftp storage manager with a name of FTP, entities could be loaded from an ftp server by using a formal system identifier like <FTP>ftp.epc.de/sample1.xml.

To be able to use formal system identifiers that refer to user defined storage managers, the following steps have to be performed:

  1. A storage manager that conforms to the requirements for user defined storage managers must be implemented.
  2. The storage manager must be registered with the sgml widget so that formal system identifiers with the corresponding storage objects are passed to the storage manager.

User defined storage managers are implemented as Tcl scripts that are called by the parser when an entity with the appropriate formal system identifier is loaded. Basically, a storage manager is a Tcl procedure that is called with a varying number of arguments.

  proc xyz { cmd arg1 args } {
    # storage manager implementation
  }

The first argument cmd that is passed to a storage manager procedure is a command keyword that specifies the required operation. The number and the semantics of the remaining arguments depend on the command keyword cmd. The following commands must be supported by every storage manager:

load      Open the entity that is specified in arg1 for reading and return a unique token to identify the open entity.
save      Open the entity that is specified in arg1 for writing and return a unique token to identify the open entity.
close     Close the open entity that is identified by the token in arg1.
read      Read data from the open entity identified by the token in arg1 and return the data. Return an empty string if the end of entity (Ee) has been encountered.
write     Write the data contained in args to the open entity identified by the token in arg1.
guess     Return a boolean value indicating whether the simple system identifier in arg1 may be handled by this storage manager.
resolve   Resolve the system identifier in args relative to the base system identifier in arg1 and return the result, or an empty string if the system identifier cannot be resolved.
writable  Return a boolean value indicating whether the storage object identified by arg1 can be written.

Typically, a storage manager will implement a switch construct to handle the different command keywords.
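As an illustration of the command keywords listed above, here is a minimal sketch of a storage manager that keeps its storage objects in memory. Everything specific to it is an assumption made for the example: the name memsm, the ::memsm namespace, the mem: prefix tested by guess, the treatment of args as a list whose elements are concatenated on write, and the trivial resolve behaviour. Registering the procedure with the sgml widget is not shown.

```tcl
# Hypothetical in-memory storage manager (illustrative only).
namespace eval ::memsm {
    variable store      ;# storage object identifier -> content
    variable open       ;# "$token,id" and "$token,pos" for open entities
    variable counter 0  ;# used to build unique tokens
    array set store {}
    array set open {}
}

proc memsm {cmd arg1 args} {
    upvar #0 ::memsm::store store ::memsm::open open ::memsm::counter counter
    switch -- $cmd {
        load {
            if {![info exists store($arg1)]} {
                error "memsm: no such storage object \"$arg1\""
            }
            set token memsm[incr counter]
            set open($token,id) $arg1
            set open($token,pos) 0
            return $token
        }
        save {
            set token memsm[incr counter]
            set open($token,id) $arg1
            set store($arg1) ""
            return $token
        }
        close {
            array unset open $arg1,*
            return
        }
        read {
            # Return all remaining data; the next call returns "" (end of entity).
            set id $open($arg1,id)
            set data [string range $store($id) $open($arg1,pos) end]
            set open($arg1,pos) [string length $store($id)]
            return $data
        }
        write {
            # Assumption: args is a list of data chunks to append.
            append store($open($arg1,id)) [join $args ""]
            return
        }
        guess {
            # Assumption: identifiers starting with "mem:" belong to this manager.
            return [string match mem:* $arg1]
        }
        resolve {
            # Assumption: identifiers are flat names, so the base in arg1 is ignored.
            return [lindex $args 0]
        }
        writable {
            return 1
        }
        default {
            error "memsm: unknown command \"$cmd\""
        }
    }
}
```

Called directly from a script, the procedure can be exercised like this:

```tcl
set t [memsm save notes]
memsm write $t "<p>hello</p>"
memsm close $t

set t [memsm load notes]
set first [memsm read $t]    ;# the stored data
set second [memsm read $t]   ;# empty string: end of entity
memsm close $t
```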

Once a storage manager has been implemented, it must be registered with the sgml widget to be useable.

It is possible to replace the built-in storage managers (with the exception of osfile) by user defined storage managers.


TkSGML Reference Manual