Florent Georges

XSLT 2.0 extensions for Saxon

Introduction

This is a set of extensions for XSLT 2.0, developed for the Saxon processor, version 9. For now, there is a URIResolver that can go through a proxy (in particular a proxy that requires authentication) and a function to send HTTP requests.

Each extension is implemented in its own Java class, with a single dispatching class for all extensions. That makes it easy to see which functions are meant to be used from XSLT, and easy to reference them from XSLT code. This class is org.fgeorges.xslt.Exslt2. It contains only public static methods. So to use the extensions, just declare the correct namespace and use the functions:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ex="java:org.fgeorges.xslt.Exslt2" ... >
   ...
   <xsl:sequence select="ex:http-send(...)"/>

Of course, the JAR must be in your classpath...

Note the Proxy URI Resolver is not an extension, strictly speaking. You don't use it from within the stylesheet, but rather you configure Saxon to use it to resolve HTTP accesses (for instance by doc() or document()). See below.

Download

Everything you could need is in the following archive: fgeorges-0.1.zip. There is the JAR file, the documentation (this page), the Java source files, the complete XSLT samples and the Javadoc. To install, you just have to put the JAR in the classpath, depending on how you actually invoke Saxon (you can have a look at the shell script I wrote for myself to launch Saxon from the command line).

Proxy URI Resolver

The class org.fgeorges.xslt.HttpProxyUriResolver is an implementation of the JAXP interface URIResolver. A lot of resources can be retrieved from a stylesheet via the HTTP protocol. Unfortunately, many sites are behind a proxy, so the connection used to retrieve those resources has to be configured properly.

Java provides a standard way to configure the proxy host name and port number, by setting the properties http.proxyHost and http.proxyPort respectively. But there is no equivalent standard property to set the credentials for a proxy (credentials are not always needed, but they are increasingly required, especially within large organizations).

This class gives you this ability by replacing Saxon's standard resolver. When it encounters an HTTP request, it configures the connection with the right credentials. If the resource is not accessed through HTTP, the resolver falls back to Saxon's standard mechanism. The credentials are set up via the properties fgeorges.httpProxyUser and fgeorges.httpProxyPwd.
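
As an illustration of the idea only, here is a minimal Java sketch of one way proxy credentials can be supplied from those two properties, using the standard java.net.Authenticator class. This is not the actual code of HttpProxyUriResolver; the class name ProxyAuthSketch and its methods are made up for the example.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical sketch: NOT the implementation of HttpProxyUriResolver.
public class ProxyAuthSketch {

    /** Reads the two properties named in the text; returns null when either is unset. */
    public static PasswordAuthentication credentialsFromProperties() {
        String user = System.getProperty("fgeorges.httpProxyUser");
        String pwd = System.getProperty("fgeorges.httpProxyPwd");
        if (user == null || pwd == null) {
            return null;
        }
        return new PasswordAuthentication(user, pwd.toCharArray());
    }

    /** Installs a JVM-wide authenticator that answers proxy challenges only. */
    public static void install() {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                if (getRequestorType() == Authenticator.RequestorType.PROXY) {
                    return credentialsFromProperties();
                }
                return null; // challenges from origin servers are not handled here
            }
        });
    }

    public static void main(String[] args) {
        // Same effect as -Dfgeorges.httpProxyUser=user -Dfgeorges.httpProxyPwd=password
        System.setProperty("fgeorges.httpProxyUser", "user");
        System.setProperty("fgeorges.httpProxyPwd", "password");
        install();
        System.out.println(credentialsFromProperties().getUserName());
    }
}
```

The sketch only answers proxy challenges, so credentials meant for the proxy are never offered to a target server.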

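So you have to adapt the way you launch Saxon to add the JAR to the classpath and to set both properties (in addition to the two standard properties for the proxy's host name and port number). The example below uses ; as the classpath separator, as on Windows; on Unix-like systems use : instead. For instance: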

> java -cp "${SAXON_DIR}/saxon9.jar;${EXT_DIR}/fgeorges-0.1.jar" \
    -Dhttp.proxyHost=host -Dhttp.proxyPort=8080 \
    -Dfgeorges.httpProxyUser=user -Dfgeorges.httpProxyPwd=password \
    net.sf.saxon.Transform \
    -r org.fgeorges.xslt.HttpProxyUriResolver \
    -o out.xml doc.xml style.xsl

My shell script for Saxon supports setting those options more easily (both the standard proxy settings and the authentication extension). You are then able to use the following (equivalent to the above command):

> saxon --proxy=user:password@host:8080 \
    -o out.xml doc.xml style.xsl
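
If you run transformations from Java rather than from the command line, a URIResolver is registered through the standard JAXP method TransformerFactory.setURIResolver(). The following sketch shows the general shape of a resolver that handles only HTTP and HTTPS addresses and returns null otherwise, which in JAXP tells the processor to use its own resolution. The class HttpOnlyResolverSketch is hypothetical and is not the code of HttpProxyUriResolver; treating HTTPS the same way as HTTP is an assumption of the sketch.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of a resolver that only takes care of HTTP(S) resources.
public class HttpOnlyResolverSketch implements URIResolver {

    /** Resolves href against base when a base is given. */
    private static URI absolute(String href, String base) {
        return (base == null || base.isEmpty())
                ? URI.create(href)
                : URI.create(base).resolve(href);
    }

    /** True when href, resolved against base, is an http or https URI. */
    public static boolean isHttp(String href, String base) {
        try {
            String scheme = absolute(href, base).getScheme();
            return "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI
        }
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        if (!isHttp(href, base)) {
            return null; // let the processor apply its default mechanism
        }
        URI target = absolute(href, base);
        try {
            // A real resolver would configure proxy credentials at this point.
            URLConnection conn = target.toURL().openConnection();
            return new StreamSource(conn.getInputStream(), target.toString());
        } catch (IOException e) {
            throw new TransformerException("Cannot fetch " + target, e);
        }
    }

    public static void main(String[] args) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setURIResolver(new HttpOnlyResolverSketch());
        // A file: URI is not handled by the sketch, so resolve() returns null.
        System.out.println(new HttpOnlyResolverSketch().resolve("doc.xml", "file:/tmp/style.xsl"));
    }
}
```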

HTTP & HTTPS

Warning: Although this extension has been useful for a few years, it is no longer maintained. It has now evolved into the similar http:send-request() function, part of the EXPath project. If you are considering using it, you are strongly advised to have a look at the EXPath HTTP Client instead.

This extension allows you to make an HTTP request from an XPath expression within your XSLT stylesheet. You just have to call ex:http-send() with the right parameters, and you then get the result of the request as value of the function.

ex:http-send()

ex:http-send($request as node(), $uri as xs:string) as element()

$uri is the target URI the HTTP request will be sent to. $request must be an element node or a document node with a single element. The name of the element is not relevant. It represents the HTTP request, and looks like:

<http-request method="post" mime-type="text/xml" charset="utf-8">
   <header name="Header-Name">...</header>
   <header name="Header2-Name">...</header>
   <body>
      The textual value of body will be the payload of the HTTP request...
   </body>
</http-request>

The attribute method is the HTTP method (for instance get, post or delete). This is get by default. The attribute mime-type is the MIME type of the request. This is text/xml by default. The attribute charset is the encoding of the request, by default utf-8.

You can also set the credentials for the target server, with the attributes user and password. This sets credentials conforming to HTTP Basic Authentication. If you set the properties for the proxy credentials (as explained in the previous section), they will also be used to go through the proxy.
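
For reference, HTTP Basic Authentication simply sends the user name and password, joined by a colon and Base64-encoded, in an Authorization header. The following Java sketch shows how such a header value is built in general; it does not claim to be how the extension does it internally, and the class name BasicAuthSketch is made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: builds the value of an "Authorization" header for Basic auth.
public class BasicAuthSketch {

    /** Returns "Basic " followed by base64(user ":" password). */
    public static String headerValue(String user, String password) {
        String pair = user + ":" + password;
        byte[] bytes = pair.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
        System.out.println(headerValue("Aladdin", "open sesame"));
    }
}
```

Because Base64 is an encoding and not encryption, the password is effectively sent in clear text unless the request goes over HTTPS.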

The result of the function is an element with the following format:

<http-response code="200">
   <message>OK</message>
   <header name="Header-Name">...</header>
   <header name="Header-x-Name">...</header>
   <body>
      The textual value of body was the payload of the HTTP response...
   </body>
</http-response>

It is important to understand that HTTP carries text in the body of both requests and responses. So if you want to send and/or receive XML, for instance to query a Web service via SOAP, you will have to serialize or parse the XML. This is shown in the examples below.
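
To make the distinction between an XML tree and its textual form concrete, here is a small Java sketch of the two conversions using only standard JAXP classes. It is an illustration of the general idea, independent of Saxon and of this extension; the class XmlTextSketch and its method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: text <-> tree conversions, as needed around an HTTP body.
public class XmlTextSketch {

    /** Parses the text of an HTTP body into a DOM tree. */
    public static Document parse(String text) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(text)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Serializes a DOM tree to text, ready to be used as an HTTP body. */
    public static String serialize(Document doc) {
        try {
            // A Transformer created without a stylesheet performs an identity copy.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot serialize document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b>hi</b></a>");
        System.out.println(doc.getDocumentElement().getLocalName());
        System.out.println(serialize(doc));
    }
}
```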

Simple, complete eXist samples

Here are two complete samples sending requests to the REST interface of a running eXist database (eXist is a native XML database, see http://exist-db.org/). You do not need to know eXist in order to understand the samples. All you need to know is: 1/ authentication to eXist is done by HTTP Basic Authentication (which is supported by this extension), 2/ to remove a document from the database, one has to send an HTTP DELETE request to eXist, and 3/ to upload a document, one has to send an HTTP PUT request.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ex="java:org.fgeorges.xslt.Exslt2"
                exclude-result-prefixes="ex"
                version="2.0">

   <!-- Parameters: the user (default to admin), the password
        (required) and the document to delete. -->
   <xsl:param name="user" select="'admin'"/>
   <xsl:param name="pwd" required="yes"/>
   <xsl:param name="doc" select="'/db/tests/test.xml'"/>

   <!-- The URI of the eXist instance -->
   <xsl:variable name="exist-uri" select="
       'http://localhost:8080/exist/rest'"/>

   <!-- The HTTP request -->
   <xsl:variable name="request">
      <http-request
          method="delete"
          user="{ $user }"
          password="{ $pwd }"/>
   </xsl:variable>

   <!-- The main template -->
   <xsl:template match="/" name="initial">
      <xsl:variable name="res" select="
          ex:http-send($request, concat($exist-uri, $doc))"/>
      <xsl:choose>
         <xsl:when test="substring($res/@code, 1, 1) eq '2'">
            <success>
               <xsl:value-of select="$res/message"/>
            </success>
         </xsl:when>
         <xsl:otherwise>
            <failure>
               <xsl:value-of select="$res/message"/>
            </failure>
         </xsl:otherwise>
      </xsl:choose>
   </xsl:template>

</xsl:stylesheet>

The above stylesheet sends an HTTP DELETE to an eXist database running on the same machine. Then it checks that everything went fine (HTTP codes starting with '2' mean success).

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ex="java:org.fgeorges.xslt.Exslt2"
                exclude-result-prefixes="ex"
                version="2.0">

   <!-- Parameters: the user (default to admin), the password
        (required) and the document to upload. -->
   <xsl:param name="user" select="'admin'"/>
   <xsl:param name="pwd" required="yes"/>
   <xsl:param name="doc" select="'/db/tests/test.xml'"/>

   <!-- The URI of the eXist instance -->
   <xsl:variable name="exist-uri" select="
       'http://localhost:8080/exist/rest'"/>

   <!-- The HTTP request -->
   <xsl:variable name="request">
      <http-request method="put" user="{ $user }" password="{ $pwd }">
         <body>
            <xsl:copy-of select="unparsed-text('exist-rest-put.xsl')"/>
         </body>
      </http-request>
   </xsl:variable>

   <!-- The main template -->
   <xsl:template match="/" name="initial">
      <xsl:variable name="res" select="
          ex:http-send($request, concat($exist-uri, $doc))"/>
      <xsl:choose>
         <xsl:when test="substring($res/@code, 1, 1) eq '2'">
            <success>
               <xsl:value-of select="$res/message"/>
            </success>
         </xsl:when>
         <xsl:otherwise>
            <failure>
               <xsl:value-of select="$res/message"/>
            </failure>
         </xsl:otherwise>
      </xsl:choose>
   </xsl:template>

</xsl:stylesheet>

The above stylesheet sends an HTTP PUT to an eXist database running on the same machine. Then it checks that everything went fine (HTTP codes starting with '2' mean success). Note that it uses unparsed-text() instead of doc() to access the document, so that the file is not parsed and its raw text is obtained instead.

Complete Google contacts sample

This sample uses the Google APIs to access your contact information, aka your address book, on your GMail or Google Apps account.

The Google APIs provide a simple REST API: you just need to send an HTTP POST request with parameters encoded in application/x-www-form-urlencoded (that means the request body looks like: param1=value1&param2=value2, with a bit of escaping). You first need to use the Authentication API to get an authentication token, that you'll pass to every call of other APIs. Then you can use the Contact API to get the data of all your contacts, then a second call to get the data of all the groups your contacts belong to.
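
As a side illustration of the application/x-www-form-urlencoded format, here is a short Java sketch that builds such a request body from name/value pairs with the standard java.net.URLEncoder class. The class FormEncodingSketch is hypothetical and unrelated to the extension's code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: builds a body like param1=value1&param2=value2.
public class FormEncodingSketch {

    /** Encodes every name and value, then joins the pairs with '&'. */
    public static String encode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            body.add(enc(p.getKey()) + "=" + enc(p.getValue()));
        }
        return body.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("Email", "your.email@gmail.com");
        params.put("service", "cp");
        // prints "Email=your.email%40gmail.com&service=cp"
        System.out.println(encode(params));
    }
}
```

Note that URLEncoder writes a space as +, whereas the XPath function encode-for-uri() writes it as %20; both forms are understood in form-encoded bodies.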

Before showing the whole stylesheet, here is what the three requests should look like (more exactly, what the elements representing the three HTTP requests should look like). Here is the authentication call (indented for readability, but there should not be any line break in the actual body):

<http-request method="post" mime-type="application/x-www-form-urlencoded">
   <body>Email=your.email%40gmail.com&amp;Passwd=xxx&amp;service=cp
       &amp;source=fgeorges.org-contacts-1&amp;accountType=GOOGLE</body>
</http-request>

The feed retrieval call has the same form for contacts and for groups; only the endpoint URI differs:

<http-request method="get">
   <header name="Authorization">GoogleLogin auth=xxx</header>
</http-request>

Finally, this is the whole stylesheet. Run it by applying it to any XML document or with the initial template contacts, and setting both parameters account and pwd.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ex="java:org.fgeorges.xslt.Exslt2"
                xmlns:goog="http://www.fgeorges.org/ns/xslt/google"
                xmlns:saxon="http://saxon.sf.net/"
                exclude-result-prefixes="xs ex goog saxon"
                version="2.0">

   <!--
       Google information about authentication and contacts API:
       http://code.google.com/apis/accounts/docs/AuthForInstalledApps.html
       http://code.google.com/apis/contacts/developers_guide_protocol.html
   -->

   <xsl:output indent="yes"/>

   <!-- The account to use (the address email) -->
   <xsl:param name="account" as="xs:string" required="yes"/>
   <!-- The associated password, required -->
   <xsl:param name="pwd"     as="xs:string" required="yes"/>

   <!--
       Utility: check for error in HTTP response.
   -->
   <xsl:function name="goog:check-error">
      <!-- the HTTP response element -->
      <xsl:param name="response" as="element()"/>
      <!-- message in case of error -->
      <xsl:param name="message" as="xs:string"/>
      <xsl:variable name="code" select="xs:integer($response/@code)"/>
      <xsl:if test="$code lt 200 or $code gt 299">
         <xsl:sequence select="
             error((), concat($message, ': ', $response/message))"/>
      </xsl:if>
   </xsl:function>

   <!--
       The authentication parameters, as simple param elements.
   -->
   <xsl:function name="goog:auth-params" as="element(param)+">
      <!-- the email (user account) -->
      <xsl:param name="email" as="xs:string"/>
      <!-- the password -->
      <xsl:param name="pwd" as="xs:string"/>
      <!-- $email can be abbreviated if @gmail.com -->
      <xsl:variable name="full-email" select="
          if ( contains($email, '@') ) then
            $email
          else
            concat($email, '@gmail.com')"/>
      <!-- the param elements -->
      <param name="Email">
         <xsl:value-of select="$full-email"/>
      </param>
      <param name="Passwd">
         <xsl:value-of select="$pwd"/>
      </param>
      <param name="source">fgeorges.org-contacts-1</param>
      <param name="service">cp</param>
      <param name="accountType">
         <xsl:value-of select="
             if ( ends-with($full-email, '@gmail.com') ) then
               'GOOGLE'
             else
               'HOSTED_OR_GOOGLE'"/>
      </param>
   </xsl:function>

   <!--
       Authenticates to the Google server, and returns the
       authentication token.
   -->
   <xsl:function name="goog:auth-token" as="xs:string">
      <!-- the email (user account) -->
      <xsl:param name="email" as="xs:string"/>
      <!-- the password -->
      <xsl:param name="pwd" as="xs:string"/>
      <!-- the endpoint -->
      <xsl:variable name="endpoint" as="xs:string" select="
          'https://www.google.com/accounts/ClientLogin'"/>
      <!-- the http request element -->
      <xsl:variable name="request">
         <http-request method="post" mime-type="application/x-www-form-urlencoded">
            <body>
               <xsl:for-each select="goog:auth-params($email, $pwd)">
                  <xsl:value-of select="@name"/>
                  <xsl:text>=</xsl:text>
                  <xsl:value-of select="encode-for-uri(.)"/>
                  <xsl:if test="position() ne last()">
                     <xsl:text>&amp;</xsl:text>
                  </xsl:if>
               </xsl:for-each>
            </body>
         </http-request>
      </xsl:variable>
      <!-- send the request and get the response -->
      <xsl:variable name="response" select="ex:http-send($request, $endpoint)"/>
      <!-- was the request ok? -->
      <xsl:sequence select="goog:check-error($response, 'Error while login')"/>
      <!-- get the auth token in the response -->
      <xsl:sequence select="
          substring-after(
            tokenize($response/body, '&#10;')
              [substring-before(., '=') eq 'Auth'],
            '=')"/>
   </xsl:function>

   <!--
       Get a simple feed content.
       
       Send to the right endpoint (regarding the feed), a simple HTTP
       GET, with the right HTTP header for authorization (defined by
       Google).  Then check the response, and if everything was ok,
       parse the XML result.
   -->
   <xsl:function name="goog:get-feed" as="element()">
      <!-- the authentication token -->
      <xsl:param name="auth" as="xs:string"/>
      <!-- the feed name -->
      <xsl:param name="feed" as="xs:string"/>
      <!-- the endpoint -->
      <xsl:variable name="endpoint" as="xs:string" select="
          concat('https://www.google.com/m8/feeds/',
                 $feed,
                 '/default/full?max-results=1000')"/>
      <!-- the http request element -->
      <xsl:variable name="request">
         <http-request method="get">
            <header name="Authorization">
               <xsl:text>GoogleLogin auth=</xsl:text>
               <xsl:value-of select="$auth"/>
            </header>
         </http-request>
      </xsl:variable>
      <!-- send the request and get the response -->
      <xsl:variable name="response" select="ex:http-send($request, $endpoint)"/>
      <!-- was the request ok? -->
      <xsl:sequence select="goog:check-error($response, concat('Error while getting ', $feed))"/>
      <!-- get the response as an xml element -->
      <xsl:sequence select="saxon:parse($response/body)/*"/>
   </xsl:function>

   <!--
       Main template: authenticates, then gets contacts and groups.
   -->
   <xsl:template name="contacts">
      <contacts-and-groups>
         <!-- the authentication token -->
         <xsl:variable name="auth" select="goog:auth-token($account, $pwd)"/>
         <!-- the contacts -->
         <xsl:sequence select="goog:get-feed($auth, 'contacts')"/>
         <!-- the groups -->
         <xsl:sequence select="goog:get-feed($auth, 'groups')"/>
      </contacts-and-groups>
   </xsl:template>

</xsl:stylesheet>
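
The token extraction done at the end of goog:auth-token above (split the response body into lines, keep the line whose name is Auth, return what follows the equals sign) can be restated in Java as follows. This is only a sketch of the same logic; the class AuthTokenSketch is hypothetical and the response body used in main is made-up sample data, not a real server response.

```java
// Hypothetical sketch: same logic as the tokenize()/substring-after() expression above.
public class AuthTokenSketch {

    /** Returns the value of the "Auth" line in a name=value body, or null if absent. */
    public static String authToken(String body) {
        for (String line : body.split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0 && line.substring(0, eq).equals("Auth")) {
                // trim() drops a stray carriage return if lines end with CR LF
                return line.substring(eq + 1).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String sample = "SID=aaa\nLSID=bbb\nAuth=ccc\n"; // made-up sample body
        System.out.println(authToken(sample)); // prints "ccc"
    }
}
```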

Complete SOAP sample

Here is a complete stylesheet showing how to use this extension to call a Web service by sending it a SOAP message:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns:saxon="http://saxon.sf.net/"
                xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
                xmlns:wsx="http://www.webservicex.net"
                xmlns:ex="java:org.fgeorges.xslt.Exslt2"
                exclude-result-prefixes="xs xsi saxon soap wsx ex"
                version="2.0">

   <!-- The result is text -->
   <xsl:output method="text"/>

   <!-- To serialize with saxon:serialize() -->
   <xsl:output name="default" indent="yes" omit-xml-declaration="yes"/>

   <!-- The Web service endpoint -->
   <xsl:param name="endpoint" as="xs:string" select="
       'http://www.webservicex.net/WeatherForecast.asmx'"/>

   <!-- The SOAP envelope -->
   <xsl:variable name="soap-request">
      <soap:Envelope>
         <soap:Header/>
         <soap:Body>
            <wsx:GetWeatherByPlaceName>
               <wsx:PlaceName>NEW YORK</wsx:PlaceName>
            </wsx:GetWeatherByPlaceName>
         </soap:Body>
      </soap:Envelope>
   </xsl:variable>

   <!-- The element representing the HTTP request -->
   <xsl:variable name="http-request">
      <http-request method="post" mime-type="text/xml" charset="utf-8">
         <header name="SOAPAction">http://www.webservicex.net/GetWeatherByPlaceName</header>
         <body>
            <xsl:value-of select="saxon:serialize($soap-request, 'default')"/>
         </body>
      </http-request>
   </xsl:variable>

   <!-- The main template -->
   <xsl:template match="/" name="initial">
      <!-- Send the HTTP request and get the result back -->
      <xsl:variable name="http-resp" select="ex:http-send($http-request, $endpoint)"/>
      <!-- Check for error in the HTTP layer -->
      <xsl:if test="$http-resp/number(@code) ne 200">
         <xsl:sequence select="
             error((), $http-resp/concat('HTTP error: ', @code, ' ', message))"/>
      </xsl:if>
      <!-- Parse the HTTP response as an XML document -->
      <xsl:variable name="soap-resp" select="saxon:parse($http-resp/body)"/>
      <!-- Apply templates to the SOAP's payload -->
      <xsl:apply-templates select="$soap-resp/soap:Envelope/soap:Body/*/*"/>
   </xsl:template>

   <!-- Handle the payload -->
   <xsl:template match="wsx:GetWeatherByPlaceNameResult">
      <xsl:text>Place: </xsl:text>
      <xsl:value-of select="wsx:PlaceName"/>
      <xsl:text>&#10;</xsl:text>
      <xsl:apply-templates select="wsx:Details/*"/>
   </xsl:template>

   <!-- Handle a single forecast -->
   <xsl:template match="wsx:WeatherData[*]">
      <xsl:text>  - </xsl:text>
      <xsl:value-of select="wsx:Day"/>
      <xsl:text>:&#09;</xsl:text>
      <xsl:value-of select="wsx:MinTemperatureC"/>
      <xsl:text> - </xsl:text>
      <xsl:value-of select="wsx:MaxTemperatureC"/>
      <xsl:text>&#10;</xsl:text>
   </xsl:template>

</xsl:stylesheet>

When you run the above stylesheet, you should get the following result:

Place: NEW YORK
  - Sunday, March 30, 2008:     2 - 11
  - Monday, March 31, 2008:     9 - 19
  - Tuesday, April 01, 2008:    7 - 13
  - Wednesday, April 02, 2008:  2 - 11
  - Thursday, April 03, 2008:   2 - 12
  - Friday, April 04, 2008:     6 - 13
  - Saturday, April 05, 2008:   3 - 14

This is a real, complete example that formats the result besides preparing the request, but the interesting parts are really the global variable $http-request and the call to ex:http-send().