This file describes changes for version 6. For changes prior to version 6.0, see changes5.html. For changes prior to version 5.0, see history.html.
The following errors were reported for version 6.0.1, and have been cleared except where otherwise noted:
6.0.1/001 | When a template is called recursively to obtain a default value for one of its own parameters (i.e. within <xsl:param>), the wrong result may be returned. This is because tail recursion is invoked when it should not be. (Bug also present in 5.5 and earlier releases).
6.0.1/002 | An array bound exception will occur when processing a document with a stylesheet that uses more than 100 namespace URIs or namespace prefixes. Present since 6.0
6.0.1/003 | When a key is defined with match="@*", nothing will be retrieved. The problem also applies to some other patterns that can match attributes, for example match=" name | @name ". (Possibly present in 5.5 and earlier releases - unconfirmed)
6.0.1/004 | The extension functions saxon:set-user-data() and get-user-data() do not work correctly with the TinyTree model. They may also fail with the standard tree model if the context node is an attribute or namespace. This is because the code relies on a one-to-one mapping of XPath nodes to Java objects. (Present since 6.0)
6.0.1/005 | Not a bug.
6.0.1/006 | When attribute value templates are used in the attributes of xsl:sort, for example ascending="{$asc}", then the values used are those that apply the first time the sort occurs; if subsequent sorts have different values for the parameters, these are ignored. This is true even if the subsequent sort takes place in a later transformation using the same PreparedStyleSheet. (Also applies to 5.5 and earlier releases).
6.0.1/007 | saxon:output and other Saxon extension elements do not allow the xsl:extension-element-prefixes attribute to appear on the extension element itself. (Present since 6.0)
6.0.1/008 | An attempt to access the last processing instruction in the source document using xsl:value-of, xsl:copy, etc, will fail if the data part of the processing instruction is zero length. The failure occurs with the Microsoft JVM but not with JDK 1.3. (Present since 6.0)
6.0.1/009 | Running a transformation using the Transformer.getInputContentHandler() method fails saying that the same NamePool must be used for the StyleSheet and the source document. (Present since 6.0)
6.0.1/010 | The code that searches for an xml-stylesheet processing instruction displays unintended trace information on System.err.
6.0.1/011 | When xsl:apply-imports is called and there is no explicit imported template rule to invoke, Saxon does a no-op; the correct action is to invoke the built-in template rule for the current node. (Bug present in all previous releases).
6.0.1/012 | If the value attribute to xsl:number is not an integer, Saxon truncates it towards zero rather than rounding it as specified. (Bug present in all previous releases).
6.0.1/013 | With the TinyTree model, selecting a namespace node using //e/namespace::n doesn't work. Selecting all namespace nodes using namespace::* is OK. (Present since 6.0)
6.0.1/014 | An array bound check failure may occur in routine com.icl.saxon.tinytree.TinyElementImpl.makeAttributeNodeFS() when searching for the last attribute node in the document. (Present since 6.0)
Integration with FOP has been restored. Saxon now works with FOP version 0_15_0.
NamePools: I have changed the approach, so that instead of making a copy of the stylesheet name pool for each transformation, the name pool is now shared (which means its updating methods are now synchronized, to ensure thread-safety). This shouldn't affect most users, unless you are manipulating NamePools explicitly. It is still possible to have multiple name pools, but you now need to organise any copying yourself if this is what you want to do. For 99% of users, it should be possible to ignore NamePools entirely and just leave the system to use the single default name pool all the time.
The following changes are for conformance with the (imminent) XSLT 1.0 errata:
The following errors were reported for version 6.0, and have been cleared except where otherwise noted:
6.0/001 | When xsl:copy-of is used to copy attributes with no namespace prefix, and the owning element has a default namespace declaration (xmlns="xyz"), then an invalid prefix is generated for the attributes.
6.0/002 | The PreparedStyleSheet object is not serially reusable. A new NamePool needs to be allocated each time it is used.
6.0/003 | A performance bug: in the match pattern row[id=1234] the predicate is not recognized as a boolean predicate, therefore the pattern matching code determines the position of the row relative to its siblings on the assumption that it needs this information. If there are a large number of <row> siblings this gives a severe performance hit.
6.0/004 | The function-available() function returns false for a method that exists but that requires one or more arguments.
6.0/005 | The element-available() function crashes (with a diagnostic print of the name pool contents) if the supplied name is one that is not used in the stylesheet and is not a known XSL or Saxon instruction.
6.0/006 | With the TinyTree tree model, finding the descendants of a node that has neither descendants nor following-siblings produces incorrect results.
6.0/007 | DTDGenerator won't compile: no name pool is supplied to RuleManager
6.0/008 | In the SQL sample application, the last row is not written to database. (This reported bug has not yet been investigated)
Warning messages (issued typically when a node matches more than one template rule) are now limited in number: only the first 25 are displayed.
In Saxon 5.5, I introduced a change that allows a result-tree-fragment to be implicitly converted to a node-set. I did this in anticipation of changes in XSLT 1.1, and to allow interoperability with MSXML3. However, Microsoft have now withdrawn this facility and conform fully to the XSLT 1.0 rules, so in order to protect Saxon's reputation for 100% conformance, I have decided to withdraw the facility too. It can still be used, however, if the stylesheet specifies version="1.1". For more details, see Conformance.
The following errors are cleared in version 6.0:
5.5.1/001 | When xsl:copy-of is used to make a copy of an element node that has no attributes or namespace declarations of its own, the namespace nodes inherited from its ancestor elements are not copied to the result tree. (Present since 5.5)
5.5.1/002 | In some Java environments (ServletExec) the current method for dynamic loading of classes fails. The fix to this detects this failure and reverts to the simple pre-JDK 1.2 method.
5.5.1/003 | When <xsl:namespace-alias> is used, Saxon uses the new (result-prefix) prefix and the new URI in the output. A careful reading of the spec suggests that it should use the old (stylesheet-prefix) prefix with the new URI. (The term "result-prefix" is thus a misnomer).
5.5.1/004 | An ArrayIndexOutOfBounds exception occurs if the match pattern "@comment()" (or "@text()" or "@processing-instruction()") is used in an xsl:template rule. Such a pattern is meaningless (it will never match any nodes) but entirely legal.
5.5.1/005 | Saxon does not report an error if two sibling <xsl:with-param> elements specify the same parameter name.
5.5.1/006 | Where conflicting <xsl:strip-space> and <xsl:preserve-space> elements occur in the stylesheet, Saxon gives greater weight to the priority of the pattern than to its import precedence. So <xsl:strip-space elements="ns:item"> in an imported stylesheet will incorrectly override <xsl:preserve-space elements="ns:*"> in the importing stylesheet.
5.5.1/007 | A null pointer exception can occur in the AElfred parser when attempting to access an XML file using a URL, if the resource accessed by the URL is found but its encoding is unknown.
5.5.1/008 | A null pointer exception can occur when evaluating a variable reference within the arguments to an extension function that is called within the predicate of a filter expression.
5.5.1/009 | When running in forwards-compatible mode, Saxon incorrectly rejects XSL elements that contain an attribute other than those defined in XSLT 1.0.
5.5.1/010 | When xsl:copy is applied to an attribute, text node, comment, or processing instruction, the content of the xsl:copy element should be ignored. It isn't.
5.5.1/011 | When output to a DOM Node is requested in the TrAX API, this is ignored if an output method is specified in an xsl:output element of the stylesheet. The output is sent to the standard output stream instead. The xsl:output element should be ignored.
5.5.1/012 | When a top-level element such as xsl:output is used within a template, it is reported as an error. This happens even when processing in forwards-compatible mode (e.g. when version="1.1"). In this case fallback processing (xsl:fallback) should be invoked.
5.5.1/013 (not yet fixed) | When the first argument to the document() function is a result tree fragment, Saxon takes the Base URI (for resolving the URI if it is relative) as if the argument were a string. The intention of the specification, though not clearly stated, is that the Base URI should be calculated as if the argument were a node-set. That is, if the argument is $tree and $tree is defined by <xsl:variable name="tree">doc.xml</xsl:variable>, then the Base URI should be that of the xsl:variable element, not that of the element containing the call on the document() function.
Added support for two new output encodings on xsl:output: iso-8859-2 and cp1250.
Added two attributes to xsl:output (not yet available in saxon:output):
Added a new extension function saxon:showNodeSet(). It takes a single argument that is a node-set, produces a diagnostic print of the node-set on System.err, and returns an empty string.
Added an extension function saxon:getContext() to get the context object. Only really intended for diagnostic use.
Added an option to choose the tree implementation (see below): -ds for the standard tree, as used in previous releases, -dt for the "tinytree" which is new to this release. The tinytree is the default: it takes up less memory, is faster to build, and generally appears to perform better in most circumstances.
The -a command-line option, which causes the source document to be processed using the stylesheet identified from its xml-stylesheet processing instruction, now uses the same logic as the getAssociatedStylesheets() method in the TrAX interface. This means multiple (cascading) stylesheets are now supported. However, embedded stylesheets (identified by href="#id" in the xml-stylesheet processing instruction) are not supported at this release.
There have been a great many internal changes, but relatively few that impact directly on the high-level transformation API. In particular, if you only use TrAX interfaces, there are no changes. Otherwise, the main points to note are:
This release adds support for pluggable character sets: if you specify xsl:output encoding="class-name", class-name should be a class that implements com.icl.saxon.output.PluggableCharacterSet. The class must provide two methods, one that determines whether a given character is present in the character set, and one that gives the name of the encoding to be used by the Java VM for translating Unicode characters into a file with this encoding.
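The two responsibilities described above can be sketched roughly as follows. This is only an illustration: the nested interface CharacterSetSketch is a stand-in written for this example, and the method names inCharset and getEncodingName are assumptions, so check the actual com.icl.saxon.output.PluggableCharacterSet interface for the real signatures before writing a class against it.

```java
public class Latin1CharacterSetSketch {

    /** Hypothetical stand-in for the real Saxon interface (names are assumed). */
    public interface CharacterSetSketch {
        /** True if the character can be written directly in this encoding. */
        boolean inCharset(int ch);

        /** Name of the encoding the Java VM should use when writing the file. */
        String getEncodingName();
    }

    /** Example character set covering only the ISO 8859-1 range. */
    public static class Latin1 implements CharacterSetSketch {
        public boolean inCharset(int ch) {
            // ISO 8859-1 maps exactly the code points 0x00 to 0xFF.
            return ch >= 0 && ch <= 0xFF;
        }

        public String getEncodingName() {
            return "ISO-8859-1";
        }
    }

    public static void main(String[] args) {
        CharacterSetSketch cs = new Latin1();
        System.out.println(cs.getEncodingName() + " contains U+00E9: " + cs.inCharset(0xE9));
        System.out.println(cs.getEncodingName() + " contains U+20AC: " + cs.inCharset(0x20AC));
    }
}
```

A serializer would typically use the first method to decide whether a character must be written as a character reference, and the second to open the output writer.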
To use free-standing XPath expressions and patterns from a Java application, you now need to supply a StaticContext object when parsing the expression. This object handles the resolution of variable names, namespace prefixes, and function names occurring within the expression. For convenience the StandaloneContext object is provided for this purpose. This class allows namespace prefixes to be declared so they can be used in an expression. It also allows external functions to be called (but not functions defined in your XSLT stylesheet). It does not allow the expression or pattern to contain references to variables.
These details should only affect you if you access intimate internal interfaces or use the Saxon source code.
There are two big changes to the internals of Saxon at this release: a new implementation of the tree structure, and a new system for handling names.
I have introduced an alternative tree implementation (called "tinytree"). This is designed to reduce the number of Java objects created: the tree is sliced vertically rather than horizontally, so instead of having one Java object per node, there is one Java array for each property of the nodes, with an entry in the array for each node. The effect is to greatly reduce the Java memory management overheads. The existing tree structure remains available, and is always used for the stylesheet tree. It is also currently always used for the intermediate result tree created when saxon:output next-in-chain is used.
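To make the "one array per property" idea concrete, here is a minimal sketch of such a layout. The class ArrayTreeSketch and its three arrays (kind, depth, name code) are invented for illustration and are not the actual tinytree fields; the point is only that a node is an integer index, and that navigation becomes a scan over arrays.

```java
import java.util.Arrays;

public class ArrayTreeSketch {
    public static final int ELEMENT = 1;
    public static final int TEXT = 3;

    // One array per node property; the index is the node number in document order.
    private int[] nodeKind = new int[4];
    private int[] depth = new int[4];
    private int[] nameCode = new int[4];
    private int size = 0;

    /** Appends a node in document order and returns its node number. */
    public int addNode(int kind, int nodeDepth, int code) {
        if (size == nodeKind.length) {
            // Grow all property arrays together so they stay aligned.
            int newLength = size * 2;
            nodeKind = Arrays.copyOf(nodeKind, newLength);
            depth = Arrays.copyOf(depth, newLength);
            nameCode = Arrays.copyOf(nameCode, newLength);
        }
        nodeKind[size] = kind;
        depth[size] = nodeDepth;
        nameCode[size] = code;
        return size++;
    }

    /** Descendants are the following nodes, up to the first one that is not deeper. */
    public int countDescendants(int node) {
        int count = 0;
        for (int i = node + 1; i < size && depth[i] > depth[node]; i++) {
            count++;
        }
        return count;
    }

    public int getNodeKind(int node) {
        return nodeKind[node];
    }

    public int getNameCode(int node) {
        return nameCode[node];
    }
}
```

However many nodes are added, the sketch allocates only three arrays (plus their occasional replacements when growing), which is where the saving in memory management comes from.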
To select the standard tree structure, use -ds on the command line. To select the "tinytree" structure, use -dt. The default is -dt. You can also select the tree structure using a method on the Controller class.
The tinytree is smaller than the standard tree, as the name suggests, and it is also faster to build. However, it may be slower to navigate. So if you have a small document that is built once in memory and used repeatedly, the standard tree implementation is probably better. In other cases, however, the tinytree usually wins.
I have made radical changes to the way names are managed. Previously, the NamePool object contained a pool of names, but its only real purpose was to avoid the memory overhead of storing each name many times. Now, Saxon takes advantage of the NamePool to avoid storing references to Name objects on the tree at all: instead it stores a "namecode": an integer which can be used to identify the name within the NamePool.
A namecode has 4 bits unused, 8 bits representing the prefix, and 20 bits acting as a pointer to an entry in the namepool containing the local name and namespace URI. Two names are therefore equal if the namecodes are the same in the bottom 20 bits. The value in these 20 bits is also referred to as the fingerprint of the name.
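The bit layout just described can be illustrated with a few lines of masking and shifting. This is a sketch under the stated layout (top 4 bits unused, next 8 bits for the prefix, bottom 20 bits for the fingerprint); the class NameCodeSketch and its method names are hypothetical and are not part of the NamePool API.

```java
public class NameCodeSketch {
    public static final int FINGERPRINT_MASK = 0xFFFFF; // bottom 20 bits
    public static final int PREFIX_SHIFT = 20;          // prefix sits above the fingerprint
    public static final int PREFIX_MASK = 0xFF;         // 8 bits of prefix index

    /** Packs a prefix index and a fingerprint into one int; the top 4 bits stay zero. */
    public static int makeNameCode(int prefixIndex, int fingerprint) {
        return ((prefixIndex & PREFIX_MASK) << PREFIX_SHIFT) | (fingerprint & FINGERPRINT_MASK);
    }

    /** The fingerprint identifies the local name and namespace URI. */
    public static int fingerprint(int nameCode) {
        return nameCode & FINGERPRINT_MASK;
    }

    public static int prefixIndex(int nameCode) {
        return (nameCode >> PREFIX_SHIFT) & PREFIX_MASK;
    }

    /** Two names are equal if their fingerprints match, whatever prefix was used. */
    public static boolean sameName(int nameCodeA, int nameCodeB) {
        return fingerprint(nameCodeA) == fingerprint(nameCodeB);
    }
}
```

For example, two name codes built from the same fingerprint but different prefix indexes compare as the same name, which matches the XPath rule that names are compared by namespace URI and local name rather than by prefix.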
All searching for objects by name is now done by comparing fingerprints; no string comparisons are involved. Fingerprints are used not only for matching names used in XPath expressions to refer to the source document, they are also used for all matching of names within a stylesheet, for example variable names, template names, mode names, key names, and decimal format names.
The name pool is also used for storing namespace declarations: each prefix/URI pair is allocated a namespace code, and all manipulation of namespace nodes in the tree is done using these integer codes.
A consequence of this is that all documents used in a transform must use the same NamePool. This has some implications on the Java API. With simple use of the API, you needn't worry about name pools, they will be taken care of automatically. However, if you are operating a continuously running service in which both source documents and stylesheets are cached in memory, you may need to exercise some care to specify the right NamePool when each document is built.
The model is further complicated by multi-threading. Rather than have synchronization problems with multiple threads updating the same NamePool, the NamePool used to build the stylesheet is copied (imported) into the NamePool used to build the source document, before parsing of the source document starts. When you use the transform() method to parse and transform an InputSource, this happens automatically. However, if you want to build the document yourself, and transform it using transformDocument() (which allows you to run more than one transformation on the same document), then you must manage the NamePool merging yourself. The system does include checks that the NamePools for the stylesheet and source document are compatible, though these are not completely foolproof.
The use of namecodes rather than String names has affected many internal interfaces, and some of these are interfaces that are also exposed externally. For example, the ParameterSet object which is used to pass parameters from a calling template to a called template can also be used to supply global parameters to the Transformer. The parameters in a parameter set are now identified by an integer fingerprint rather than a string name. You can get the integer fingerprint from the NamePool using the getFingerprint() method; alternatively use the TrAX method addParameter(), which still takes the name as a String.
The Emitter interface has also changed to use name codes; if you have written your own Emitter, the code will have to be modified.
The classes and interfaces used in Saxon for manipulating collections of attributes now implement the SAX2 Attributes interface.
The standard XPath functions have been extensively revised. The main change, apart from tidying up the code, is that the functions are now responsible for evaluating their own arguments, which enables some optimisation, especially when the arguments are node-sets: they can now be evaluated using knowledge of the data type required. For example, the not() function now stops as soon as the first node in the argument node-set is found.
Some of the little-used methods on the NodeInfo interface have been moved as static methods to a separate helper class, com.icl.saxon.om.Navigator. This enables the code of these methods to be independent of the particular tree implementation.
The delayed evaluation of path expressions now works as follows: on the first two occasions that a path expression is evaluated, it navigates the source tree. On the third occasion, it saves the resulting node-set in memory. On subsequent uses, the result is retrieved from memory. This approach is designed to balance time against memory usage.
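The policy described above (navigate twice, save on the third evaluation, then reuse) can be sketched as a small wrapper. The class LazyPathResultSketch is hypothetical and uses modern Java generics and lambdas for brevity; it is not how the Saxon expression classes are written, only an illustration of the counting rule.

```java
import java.util.List;
import java.util.function.Supplier;

public class LazyPathResultSketch<T> {
    private final Supplier<List<T>> navigate; // walks the source tree each time it is called
    private int evaluations = 0;
    private List<T> saved = null;             // filled in on the third evaluation

    public LazyPathResultSketch(Supplier<List<T>> navigate) {
        this.navigate = navigate;
    }

    public List<T> evaluate() {
        if (saved != null) {
            // Fourth and later uses: no navigation, just return the remembered result.
            return saved;
        }
        evaluations++;
        List<T> result = navigate.get();
        if (evaluations >= 3) {
            // Third use: the expression is evidently reused, so spend memory to save time.
            saved = result;
        }
        return result;
    }
}
```

An expression that is evaluated only once or twice therefore never pays the memory cost, while one that is evaluated many times navigates the tree at most three times.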
The optimisation of "//name" as "/descendant::name" (which is possible when there are no predicates) wasn't working in 5.5 (or for a while before that), causing an unnecessary sort. This has been corrected. In addition, the first time "//name" is used for a particular document, the results are now saved, and all subsequent uses of "//name" for the same document retrieve the results from memory. This means that the traditional assumption that "//name" is inefficient may no longer always be true.
A Sequencer class has been introduced for allocating globally-unique sequence numbers. There are two such sequences, one for document numbers, and one for node numbers. By default, two sequencers are created when Saxon is loaded, and remain in use until it is unloaded. However, it is now possible to reset the sequence numbering if required, either to prevent running out of numbers in a long-running server, or to ensure repeatability of the value of generate-id(). The result of generate-id() depends on the document number, and you can restart the sequence of document numbers by calling controller.setDocumentSequencer(new com.icl.saxon.om.Sequencer()). It is the caller's responsibility to ensure that this does not cause two documents that are in use at the same time to have the same number. The node sequence number is used when sorting nodes into document order, and when eliminating duplicates in a union operation. You can similarly allocate a new sequence using controller.setNodeSequencer().
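The effect of replacing a sequencer can be sketched as below. Everything in this example is a stand-in: the nested Sequencer and Host classes and their method names are invented for illustration and do not reproduce the API of com.icl.saxon.om.Sequencer or the Controller. The sketch only shows why installing a fresh sequencer restarts the numbering and hence makes values derived from it repeatable.

```java
public class SequencerSketch {

    /** Hypothetical sequencer: hands out increasing numbers, safely across threads. */
    public static class Sequencer {
        private int next = 0;

        public synchronized int nextNumber() {
            return next++;
        }
    }

    /** Hypothetical host playing the role of the controller. */
    public static class Host {
        private Sequencer documentSequencer = new Sequencer();

        /** Installing a new sequencer restarts document numbering from zero. */
        public void setDocumentSequencer(Sequencer sequencer) {
            this.documentSequencer = sequencer;
        }

        public int newDocumentNumber() {
            return documentSequencer.nextNumber();
        }
    }

    public static void main(String[] args) {
        Host host = new Host();
        System.out.println(host.newDocumentNumber());
        System.out.println(host.newDocumentNumber());
        host.setDocumentSequencer(new Sequencer());
        // After the reset the numbering starts again, so a document still in use
        // from before the reset could now share its number with a new one.
        System.out.println(host.newDocumentNumber());
    }
}
```

The final comment in main() is the hazard the text warns about: after a reset it is up to the caller to ensure that no document numbered before the reset is still in use.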
Added an optimization for recursive processing of a node-set: the predicate "[position() > 1]" is now recognized and handled specially, allowing pipelined execution and reducing memory requirements.
Removed getAttributeValue(Name), replaced it with getAttributeValue(String uri, String localName). This is more efficient: in many cases it removes the need to construct the Name object and then take it apart. Attributes can also be found using the integer fingerprint of the name.
The Name class is no longer used for holding expanded names; it now serves merely as a container for a couple of static methods for name validation.
NameTest and its subclasses have been reorganised. There is a new class NodeTest which is a subclass of Pattern; it performs the test on node-type and node-name supporting a node-test in XPath. This test is context-free. As well as replacing the NameTest class, it also replaces NodeTypePattern and NamedNodePattern. The NodeTest is now used on a Step, and on an Axis, replacing the previous combination of a NameTest and a node type. These tests are also used in testing which nodes are candidates for whitespace stripping.
The interface between the Step and Axis classes and the expression parser has been much simplified.
Michael H. Kay
8 December 2000