EXPath

From FGiki

About my EXPath project...

Process

Some notes about the process itself...

  • A cool tool is the W3C's HTML diff service; it could be of help while manually annotating diffs in an XML diff version.

HTTP Client

  • Would it be interesting to use the W3C's HTTP vocabulary in RDF? At least as a base? Or just follow the XProc HTTP representation?
  • It would be nice to have a simplified way to submit form fields. For now this is too complex. That could be done in standard XSLT or XQuery by writing a function that transforms a form representation into an http:request element. Something like:
<form action="http://localhost:8080/something/"
      method="post"
      enctype="multipart/form-data">
   <field name="amount" value="30.00 EUR"/>
   <field name="file"   src="dir/logo.png"/>
</form>

into:

<http:request href="http://localhost:8080/something/" method="post">
   <http:multipart content-type="multipart/form-data"
                   boundary="tHis-b0undary-iS-not-in-Content">
      <http:header
          name="Content-Disposition"
          value='form-data; name="amount"'/>
      <!-- text/plain, really?  or just nothing?-->
      <http:body content-type="text/plain">30.00 EUR</http:body>
      <http:header name="Content-Disposition"
                   value='form-data; name="file"; filename="logo.png"'/>
      <http:body content-type="image/png"
                 src="file:/absolute/path/dir/logo.png"/>
   </http:multipart>
</http:request>
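
The transformation above could be sketched in XQuery as follows. This is only a minimal sketch under several assumptions: the http namespace URI, the helper names local:media-type and local:field-to-part, the fixed boundary string and the tiny extension-to-media-type table are all illustrative, and the attribute names (content-type, src) follow the example above rather than any released version of the HTTP Client module. It only covers multipart/form-data.

```xquery
xquery version "1.0";

(: Assumption: namespace URI of the EXPath HTTP Client module. :)
declare namespace http = "http://expath.org/ns/http-client";

(: Hypothetical helper: guess a media type from the file extension. :)
declare function local:media-type($src as xs:string) as xs:string
{
   if ( ends-with($src, '.png') ) then 'image/png'
   else if ( ends-with($src, '.xml') ) then 'application/xml'
   else 'application/octet-stream'
};

(: Hypothetical helper: turn one field into a Content-Disposition
   header followed by a body. :)
declare function local:field-to-part(
   $field as element(field),
   $base  as xs:string
) as element()+
{
   let $name := string($field/@name)
   let $src  := string($field/@src)
   let $file := tokenize($src, '/')[last()]
   return
      if ( exists($field/@src) ) then (
         (: a file field: refer to the file, resolved against $base :)
         <http:header name="Content-Disposition"
                      value='form-data; name="{ $name }"; filename="{ $file }"'/>,
         <http:body content-type="{ local:media-type($src) }"
                    src="{ resolve-uri($src, $base) }"/>
      )
      else (
         (: a plain field: the value becomes the body content :)
         <http:header name="Content-Disposition"
                      value='form-data; name="{ $name }"'/>,
         <http:body content-type="text/plain">{ string($field/@value) }</http:body>
      )
};

(: Transform a form representation into an http:request element. :)
declare function local:form-to-request(
   $form as element(form),
   $base as xs:string
) as element(http:request)
{
   <http:request href="{ $form/@action }" method="{ $form/@method }">
      <http:multipart content-type="{ $form/@enctype }"
                      boundary="tHis-b0undary-iS-not-in-Content">
      {
         for $field in $form/field
         return local:field-to-part($field, $base)
      }
      </http:multipart>
   </http:request>
};

local:form-to-request(
   <form action="http://localhost:8080/something/"
         method="post"
         enctype="multipart/form-data">
      <field name="amount" value="30.00 EUR"/>
      <field name="file"   src="dir/logo.png"/>
   </form>,
   'file:/absolute/path/')
```

The base URI is passed explicitly so that the relative src of a file field can be resolved without depending on the static base URI of the query.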

EXPath Packaging

The goal is to define a packaging format for XML technologies, in particular XPath, XSLT, XQuery and XProc, in order to package standalone applications, libraries, and extensions.

Could the W3C Web Applications WG's Widgets 1.0: Packaging and Configuration candidate recommendation be of some inspiration?

TODO: An idea would be to use RDDL as a starting point for humans for each module, pointing to documentation (which URIs to use, etc.), manuals, resources, etc. It could maybe be used instead of the package descriptor if it contains enough meta-information. That would be a good way to encourage (force) people to provide such a human-readable description.

Processor support

  • Saxon XSLT: resolve xsl:import URI through XML Catalogs
  • Saxon XQuery: resolve import "at" hints through XML Catalogs. Is it possible to use catalogs without hints, only on the namespace (unique as this is an XQuery module)?
  • eXist: resolve modules via their namespace, once bundled in a JAR file, put in a directory, and some config in conf.xml
  • MarkLogic: everything is based on the "at" hint. If it contains a scheme, that's an error, and there is no way to plug in a resolver! (see Application Developer's Guide, section 6)
  • Zorba: what does it use? Namespace URIs? "At" hints?
  • MXQuery: what does it use? Namespace URIs? "At" hints?
  • XQilla: quite vague. It seems possible to resolve namespace URIs by setting objects in C++ (see one thread and its second half). It is not clear what happens with "at" hints.

Package names

Packages should be named using a URI. This is a simple means to uniquely identify a package, or a list of packages. How packages get activated is outside the scope of the spec, but we could imagine that an XML IDE would provide a way to select the packages to activate for a specific scenario, or that a web server container would activate packages on a per-web-application basis.

Ideas

  • Debug facilities: for instance a function that returns its parameter but, under some conditions, also logs it somewhere. For instance, with the Google Contacts example from Balisage, one might want to log the contact list retrieved from Google, the contact list formatted for the following processing, the formatted list in ODF format, and so on. It is a bit as if those corresponded to XProc step ports and one wanted to log some of the ports.
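
Such a debug function could be sketched on top of the standard fn:trace function, which returns its first argument unchanged and sends it to an implementation-defined trace output. The $debug flag and the name local:log are assumptions made for the illustration; a real module would probably read the condition from some configuration instead.

```xquery
xquery version "1.0";

(: Assumption: logging is switched on and off by a global flag. :)
declare variable $debug as xs:boolean := true();

(: Returns $value unchanged.  When debugging is on, the value is also
   sent to the processor's trace output, tagged with $label. :)
declare function local:log($label as xs:string, $value as item()*) as item()*
{
   if ( $debug ) then trace($value, $label) else $value
};

(: Each intermediate result is logged as if it were a port. :)
let $contacts  := local:log('raw contacts',
                     <contacts><contact>Jane</contact></contacts>)
let $formatted := local:log('formatted',
                     <list>{ string($contacts/contact) }</list>)
return
   $formatted
```

Because local:log returns its argument, it can be wrapped around any expression without changing the result of the query.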

Project structure

The idea is to define standard project structures, to enable standard tools to automatically package a library or a webapp as a XAR or XAW file.

An idea for an example (for a tutorial or a presentation, and as a real project): a serializer, for instance to serialize XML to highlighted HTML. It would use XSpec, provide a library, and provide a webapp to enable its use without installing it. The distribution would contain a XAR file for the lib and a XAW file for the webapp (including the XAR file).

Misc

Functional Style

It would be nice to keep functional concepts in mind while defining modules. Dimitre advises reading Purely Functional Data Structures, by Okasaki (seems very interesting!).

Servlet

Have a look at the related URI rewriting feature of several XML databases (for instance MarkLogic and eXist).

Implementations:

  • one standalone implementation using Saxon and Calabash, to be deployed in any servlet container (Tomcat, Jetty, JBoss, Glassfish...)

What's in eXist?

If you execute the following query in the server (put it in the database, say in /db/app-server.xq, check content type = application/xquery, then access it with a browser through the REST API: http://localhost:8080/exist/rest/db/app-server.xq):

xquery version "1.0";

import module namespace req = "http://exist-db.org/xquery/request";

declare option exist:serialize "method=xml media-type=application/xml";


<info>
   <exists>{ req:exists() }</exists>
   {
      for $attr in req:attribute-names()
         return
            <attribute name="{ $attr }">{ req:get-attribute($attr) }</attribute>
   }
   <context-path>{ req:get-context-path() }</context-path>
   {
      for $c in req:get-cookie-names()
         return
            <cookie-value name="{ $c }">{ req:get-cookie-value($c) }</cookie-value>
   }
   <data>{ req:get-data() }</data>
   <effective-uri>{ req:get-effective-uri() }</effective-uri>
   {
      for $h in req:get-header-names()
         return
            <header name="{ $h }">{ req:get-header($h) }</header>
   }
   <hostname>{ req:get-hostname() }</hostname>
   <method>{ req:get-method() }</method>
   {
      for $p in req:get-parameter-names()
         return
            <parameter name="{ $p }">{ req:get-parameter($p, ()) }</parameter>
   }
   <path-info>{ req:get-path-info() }</path-info>
   <query-string>{ req:get-query-string() }</query-string>
   <remote-addr>{ req:get-remote-addr() }</remote-addr>
   <remote-host>{ req:get-remote-host() }</remote-host>
   <remote-port>{ req:get-remote-port() }</remote-port>
   <server-name>{ req:get-server-name() }</server-name>
   <server-port>{ req:get-server-port() }</server-port>
   <servlet-path>{ req:get-servlet-path() }</servlet-path>
   <uri>{ req:get-uri() }</uri>
   <url>{ req:get-url() }</url>
</info>

then you get the following result:

<info>
    <exists>true</exists>
    <context-path>/exist</context-path>
    <data/>
    <effective-uri>/exist/rest/db/app-server.xq</effective-uri>
    <header name="Host">localhost:8080</header>
    <header name="User-Agent">Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5</header>
    <header name="Accept">text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8</header>
    <header name="Accept-Language">en-us,en;q=0.5</header>
    <header name="Accept-Encoding">gzip,deflate</header>
    <header name="Accept-Charset">ISO-8859-1,utf-8;q=0.7,*;q=0.7</header>
    <header name="keep-alive">300</header>
    <header name="Connection">keep-alive</header>
    <header name="Cache-Control">max-age=0</header>
    <hostname>0:0:0:0:0:0:0:1%0</hostname>
    <method>GET</method>
    <path-info/>
    <query-string/>
    <remote-addr>0:0:0:0:0:0:0:1%0</remote-addr>
    <remote-host>0:0:0:0:0:0:0:1%0</remote-host>
    <remote-port>50213</remote-port>
    <server-name>localhost</server-name>
    <server-port>8080</server-port>
    <servlet-path>/db/app-server.xq</servlet-path>
    <uri>/exist/rest/db/app-server.xq</uri>
    <url>http://localhost:8080/exist/rest/db/app-server.xq</url>
</info>

What's in MarkLogic?

If you execute the following query in the server (put it in the HTTP app server dir, say in Docs/tmp/app-server.xqy, then access it with a browser through the HTTP app server: http://localhost:8000/tmp/app-server.xqy):

xquery version "1.0-ml";

<info>
   <request>
      {
        let $body := xdmp:get-request-body()
        let $kind := typeswitch ( $body )
                       case document-node()  return 'doc'
                       case element()        return 'element'
                       case binary()         return 'binary'
                       case text()           return 'text'
                       case empty-sequence() return 'empty'
                       default               return 'unknown'
          return
            <body kind="{ $kind }">{ $body }</body>
      }
      <client-address>{ xdmp:get-request-client-address() }</client-address>
      <client-certificate>{ xdmp:get-request-client-certificate() }</client-certificate>
      {
        if ( xdmp:get-request-client-certificate() ) then
          <client-certificate-extracted> {
            xdmp:x509-certificate-extract(
              xdmp:get-request-client-certificate()
            )
          }
          </client-certificate-extracted>
        else
          ()
      }
      {
        for $field in xdmp:get-request-field-names()
          return
            <field name="{ $field }">
               <value>{ xdmp:get-request-field($field) }</value>
               <content-type>{ xdmp:get-request-field-content-type($field) }</content-type>
               <filename>{ xdmp:get-request-field-filename($field) }</filename>
            </field>
      }
      {
        for $header in xdmp:get-request-header-names()
          return
            <header name="{ $header }">{ xdmp:get-request-header($header) }</header>
      }
      <method>{ xdmp:get-request-method() }</method>
      <path>{ xdmp:get-request-path() }</path>
      <protocol>{ xdmp:get-request-protocol() }</protocol>
      <url>{ xdmp:get-request-url() }</url>
      <username>{ xdmp:get-request-username() }</username>
   </request>
   <session> {
      for $field in xdmp:get-session-field-names()
        return
          <field name="{ $field }">{ xdmp:get-session-field($field) }</field>
   }
   </session>
   <response>
      <code>{ xdmp:get-response-code() }</code>
      <encoding>{ xdmp:get-response-encoding() }</encoding>
   </response>
</info>

then you get the following result:

<info>
   <request>
      <body kind="empty"/>
      <client-address>172.16.208.1</client-address>
      <client-certificate/>
      <header name="Host">www.drkm.org:8000</header>
      <header name="User-Agent">Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5)
         Gecko/20091102 Firefox/3.5.5</header>
      <header name="Accept">text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8</header>
      <header name="Accept-Language">en-us,en;q=0.5</header>
      <header name="Accept-Encoding">gzip,deflate</header>
      <header name="Accept-Charset">ISO-8859-1,utf-8;q=0.7,*;q=0.7</header>
      <header name="Keep-Alive">300</header>
      <header name="Connection">keep-alive</header>
      <header name="Cookie">__utma=244509904.1063682111.1259110880.1259110880.1259110880.1;
         __utmz=244509904.1259110880.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
         /cq:session-id=169ed2c076bdebc0</header>
      <header name="Authorization">Digest username="admin", realm="public",
         nonce="4bb808196bc1c1c6d140abf5e0ccca56", uri="/tmp/app-server.xqy",
         response="a215c8e983090a7f64f0a3a33d229f61", opaque="7f29e1f7bb5ca677", qop=auth,
         nc=0000008b, cnonce="2526a3ab45e90ea0"</header>
      <method>GET</method>
      <path>/tmp/app-server.xqy</path>
      <protocol>http</protocol>
      <url>/tmp/app-server.xqy</url>
      <username>admin</username>
   </request>
   <session/>
   <response>
      <code>200 OK</code>
      <encoding>UTF-8</encoding>
   </response>
</info>

What's in Java Servlets?

From the Servlet 2.5 specification, section SRV.3.4 "Request Path Elements":

requestURI = contextPath + servletPath + pathInfo

request-uri = context-path + servlet-path + path-info

Context Path        /catalog

Servlet Mapping     Pattern: /lawn/*
                    Servlet: LawnServlet

Servlet Mapping     Pattern: /garden/*
                    Servlet: GardenServlet

Servlet Mapping     Pattern: *.jsp
                    Servlet: JSPServlet

/catalog/lawn/index.html      ContextPath: /catalog
                              ServletPath: /lawn
                              PathInfo:    /index.html

/catalog/garden/implements/   ContextPath: /catalog
                              ServletPath: /garden
                              PathInfo:    /implements/

/catalog/help/feedback.jsp    ContextPath: /catalog
                              ServletPath: /help/feedback.jsp
                              PathInfo:    ()
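
The decomposition above can be illustrated with a small XQuery function. It is only a sketch: the name local:split-path is hypothetical, the servlet path is given by the caller rather than looked up in the servlet mappings, and only path-prefix mappings such as /garden/* are handled (not extension mappings such as *.jsp).

```xquery
xquery version "1.0";

(: Split a request URI so that
   request-uri = context-path + servlet-path + path-info.
   Returns the empty sequence if the URI is not under the servlet path. :)
declare function local:split-path(
   $request-uri  as xs:string,
   $context-path as xs:string,
   $servlet-path as xs:string
) as element(path)?
{
   let $prefix := concat($context-path, $servlet-path)
   let $rest   := substring($request-uri, string-length($prefix) + 1)
   return
      (: the rest must be empty or start a new path segment, so that
         /catalog/gardening is not taken to be under /catalog/garden :)
      if ( starts-with($request-uri, $prefix)
           and ( $rest eq '' or starts-with($rest, '/') ) ) then
         <path>
            <context-path>{ $context-path }</context-path>
            <servlet-path>{ $servlet-path }</servlet-path>
            <path-info>{ $rest }</path-info>
         </path>
      else
         ()
};

local:split-path('/catalog/garden/implements/', '/catalog', '/garden')
```

For the second row of the table, the path-info element contains /implements/, as in the servlet spec.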

What I would suggest

IN PROGRESS...

The request:

[1]
<request path="/some/page" method="post">
   <auth method="basic" username="user">
      ... [[ basic, or digest stuff ]] ...
      <basic
         sesame="YWRtaW46YWRtaW5hZG1pbg=="
         password="..."/> <!-- because it is already in the sesame and,
                               well, because this is basic auth... -->
      <digest
         username="admin"
         realm="public"
         nonce="4bb808196bc1c1c6d140abf5e0ccca56"
         uri="/tmp/app-server.xqy"
         response="a215c8e983090a7f64f0a3a33d229f61"
         opaque="7f29e1f7bb5ca677"
         qop="auth"
         nc="0000008b"
         cnonce="2526a3ab45e90ea0"/>
      <oauth ...???/>
   </auth>
   <uri>http://www.host.org:8000/myapp/some/page</uri>
   <context-path>/myapp</context-path>
   <header name="..." value="..."/>
   <multipart content-type="multipart/alternative"
              boundary="yoyoyoYOYO123465798YOYOyoyoy">
      <body content-type="text/plain"/>
      <body content-type="text/html">
         <html>
            ...
         </html>
      </body>
      <body content-type="application/xml"/>
   </multipart>
</request>

[2]
text { 'Hello, world!' }

[3]
document { <hello>World!</hello> }

For now, the request:

<srv:request servlet="srv-xsl-3" path="/...-xsl/3/" method="get">
   <srv:uri>http://localhost:8090/tools/servlex/my-webapp/...-xsl/3/</srv:uri>
   <srv:context-root>/tools/servlex/my-webapp</srv:context-root>
   <srv:path>
      <srv:part>/</srv:part>
      <srv:match name="something">...</srv:match>
      <srv:part>-xsl/3/</srv:part>
   </srv:path>
   <srv:header name="host" value="localhost:8090"/>
   <srv:header name="user-agent" value="Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5"/>
   <srv:header name="accept" value="text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"/>
   <srv:header name="accept-language" value="en-us,en;q=0.5"/>
   <srv:header name="accept-encoding" value="gzip,deflate"/>
   <srv:header name="accept-charset" value="ISO-8859-1,utf-8;q=0.7,*;q=0.7"/>
   <srv:header name="keep-alive" value="300"/>
   <srv:header name="connection" value="keep-alive"/>
   <srv:header name="cache-control" value="max-age=0"/>
</srv:request>

The deployment descriptor:

<web-app xmlns="http://expath.org/ns/webapp"
         xmlns:my1="http://www.mycorp.com/proj/my-app/servlet-1"
         xmlns:my2="http://www.mycorp.com/proj/my-app/servlet-2"
         name="http://www.fgeorges.org/example/my-webapp"
         abbrev="my-webapp"
         version="0.1">
   <title>My simple web application</title>
   <context>
      <param name="webmaster" value="webmaster@mycorp.com"/>
      <param name="collection" value="http://www.mycorp.com/dataset"/>
   </context>
   <page>
      <url pattern="/(.+)\.html"/>
   </page>
   <servlet name="srv-xq-1">
      <xquery function="{http://www.mycorp.com/proj/my-app/servlet-1}servlet"/>
      <!-- OR xquery function="my1:servlet"/-->
      <url pattern="/catalog/(.+)">
         <match group="1" name="something"/>
      </url>
      <param name="catalog" value="string value"/>
   </servlet>
   <servlet name="srv-xq-2">
      <xquery file="servlet-2.xq"/>
      <url pattern="/xq2/(.+)">
         <match group="1" name="something"/>
      </url>
   </servlet>
   <servlet name="srv-xsl-1">
      <xslt function="my1:servlet">
         <import-uri>http://www.mycorp.com/proj/my-app/servlet-1.xsl</import-uri>
      </xslt>
      <url pattern="/(.+)-xsl/1/">
         <match group="1" name="something"/>
      </url>
      <param name="catalog" as="element()">
         <catalog date="2009-12-10">
            <name>yo</name>
         </catalog>
      </param>
   </servlet>
   <servlet name="srv-xsl-2">
      <xslt template="my2:servlet">
         <import-uri>http://www.mycorp.com/proj/my-app/servlet-2.xsl</import-uri>
      </xslt>
      <url pattern="/(.+)-xsl/2/">
         <match group="1" name="something"/>
      </url>
   </servlet>
   <servlet name="srv-xsl-3">
      <xslt file="servlet-3.xsl"/>
      <url pattern="/(.+)-xsl/3/">
         <match group="1" name="something"/>
      </url>
   </servlet>
   <dependencies>
      <package name="http://www.fgeorges.org/google-apis/1.0"/>
      <package name="http://www.functx.org/1.0"/>
   </dependencies>
   <session timeout="30"/>
   <!-- TODO: Define a way to declaratively map exception types to
        error handling...  Basically, this needs the same infos as for
        an xsl:catch, but dispatching to a page or a servlet instead
        of a sequence ctor. -->
   <error-page code="404">
      <location>/404.html</location>
   </error-page>
</web-app>
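
How a url element of the descriptor could be applied to a request path can be sketched as follows, using only XQuery 1.0 functions. The function name local:url-matches and the shape of the result (match elements in no namespace) are assumptions, as is the decision to anchor the pattern so that it must match the whole path.

```xquery
xquery version "1.0";

declare namespace web = "http://expath.org/ns/webapp";

(: Hypothetical helper: if $path matches the pattern of $url, return
   one match element per web:match child, containing the text of the
   corresponding regex group.  Otherwise return the empty sequence. :)
declare function local:url-matches(
   $url  as element(web:url),
   $path as xs:string
) as element(match)*
{
   (: assumption: the pattern has to match the whole path :)
   let $regex := concat('^', $url/@pattern, '$')
   where matches($path, $regex)
   return
      for $m in $url/web:match
      return
         <match name="{ $m/@name }">{
            (: replace the whole path by the content of one group :)
            replace($path, $regex, concat('$', $m/@group))
         }</match>
};

local:url-matches(
   <url xmlns="http://expath.org/ns/webapp" pattern="/catalog/(.+)">
      <match group="1" name="something"/>
   </url>,
   '/catalog/yo')
```

With the path /catalog/yo, group 1 is yo, which is the value a servlet would find under the name something.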

Actually, instead of the url element (with its pattern attribute and match subelements), use the IETF URI Templates (or see this one, just to be sure to have a URI that works).

Proposed interface:

my:servlet(
 $req  as element(srv:request),
 $ctxt as element(srv:servlet),
 $app  as element(srv:application)
)
as node()+

(: Return is an element srv:response, followed by one node per body
   part, each either an element, a text node, or a binary node.  Each
   can be embedded in a document node.  Normally, there is only one
   part in a server response.

   Request body content can be retrieved by srv:xxx-xxx($req). :)

<srv:servlet name="servlet-1">
   <srv:url-param>
      <srv:name>var1</srv:name>
      <srv:name>var2</srv:name>
   </srv:url-param>
   <srv:param name="something" value="string value"/>
   <srv:param name="catalog">
      <catalog>
         <name>spring</name>
         <kind>clothes</kind>
      </catalog>
   </srv:param>
</srv:servlet>

<srv:application name="http://www.fgeorges.org/example/my-webapp"
                 abbrev="my-webapp">
   <srv:param name="webmaster" value="webmaster@mycorp.com"/>
   <srv:param name="collection" value="http://www.mycorp.com/dataset"/>
   <srv:param name="data">
      <specific>
         <level>info</level>
         <other>data</other>
      </specific>
   </srv:param>
</srv:application>
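
A servlet conforming to the proposed interface could look like the following XQuery library module. This is only a sketch: the srv namespace URI is not given on this page, so the one used here is a placeholder, and the status attribute on srv:response as well as the content of the body are purely illustrative.

```xquery
xquery version "1.0";

module namespace my1 = "http://www.mycorp.com/proj/my-app/servlet-1";

(: Assumption: placeholder namespace URI for the srv prefix. :)
declare namespace srv = "http://expath.org/ns/servlet";

declare function my1:servlet(
   $req  as element(srv:request),
   $ctxt as element(srv:servlet),
   $app  as element(srv:application)
) as node()+
{
   (: first the response element; its status attribute is hypothetical :)
   <srv:response status="200"/>,
   (: then a single body part, embedded in a document node :)
   document {
      <catalog servlet="{ $ctxt/@name }">
         <item>{
            string($req/srv:path/srv:match[@name eq 'something'][1])
         }</item>
         <webmaster>{
            string($app/srv:param[@name eq 'webmaster'][1]/@value)
         }</webmaster>
      </catalog>
   }
};
```

The function reads the URL match from the request, its own name from the servlet context and a parameter from the application context, so it exercises all three arguments of the interface.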

Simple sample:

in MarkLogic:
http://localhost:8000/myapp/catalog/yo
 /myapp
 /catalog/yo
   + url match: 1=yo

in Servlex (only the context root changes):
http://localhost:8084/tools/servlex/myapp/catalog/yo
 /tools/servlex/myapp
 /catalog/yo
   + url match: 1=yo

Sample of a simple request for a GET:
<request path="/catalog/yo" method="get">
  <uri>http://localhost:8084/tools/servlex/myapp/catalog/yo</uri>
  <context-root>/tools/servlex/myapp</context-root>
  <servlet-path>/catalog</servlet-path>
  <path-info>/yo</path-info>
  <header name="..." value="..."/>
</request>

Servlet definition

A servlet is a component that takes a request and a context as input, and provides a response as output. The request is represented by an element srv:request and a sequence of zero or more request bodies. The context is represented by an element srv:application and an element srv:servlet. The response is represented by an element srv:response and a sequence of zero or more response bodies.

A servlet can be implemented using one of various technologies. Each kind of servlet has its own rules for receiving requests and providing responses. The available servlet kinds are:

  • an XPath function (provided by an XQuery library module, a stylesheet, or any other implementation-specific means);
  • an XSLT named template;
  • an XQuery main module;
  • an XSLT stylesheet;
  • an XProc pipeline (and a step type?).

Session management

The webapp module will have to include a session management facility. For more information about session management over HTTP for web applications, see this article and this one (very high-level, about session management in general and the session life cycle in Java EE), this article (about session tracking in general and in Java EE in particular) and this chapter (Session Tracking, from the book Java Servlet Programming, about sessions in Java EE). And of course, see the Java EE servlet spec (version 2.5 and version 3.0).

TODO: See emails with Adam ("Thoughts of WebApp Module", for session stuff but also other stuff).

Function library

The module must provide a library of XPath functions to be used by the servlets (i.e. in XSLT, XQuery or XProc). This library will contain two different kinds of functions: functions that provide a facility that would not be possible without them, and convenience functions that could be written directly in XSLT, XQuery and/or XProc (but that are convenient to not have to write again and again, or that would be more efficient that way). For this last kind of function, an implementer can of course choose to implement some of them as real components, or in any language he/she wants. Some functions:

  • simple functions web:get-param and web:get-params to get the value of a named parameter in the request (see below)
  • implement the logic in XHTML Media Types, i.e. get the request as param and return the correct media type to use for XHTML depending on the Accept header
  • ...
<xsl:function name="web:get-params" as="xs:string*">
   <xsl:param name="request"    as="element(web:request)"/>
   <xsl:param name="param-name" as="xs:string"/>
   <xsl:sequence select="$request/web:param[@name eq $param-name]/string(@value)"/>
</xsl:function>

<xsl:function name="web:get-param" as="xs:string?">
   <xsl:param name="request"    as="element(web:request)"/>
   <xsl:param name="param-name" as="xs:string"/>
   <!-- TODO: generate an error if more than one value? -->
   <xsl:sequence select="web:get-params($request, $param-name)[1]"/>
</xsl:function>
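
The XHTML media type function mentioned in the list above could be sketched in XQuery as follows. This is a deliberate simplification: it only checks whether application/xhtml+xml is listed in the Accept header and ignores q-values and wildcards. The namespace bound to the web prefix, the function name local:xhtml-media-type and the representation of headers as web:header elements with name and value attributes are assumptions.

```xquery
xquery version "1.0";

(: Assumption: the namespace bound to the web prefix. :)
declare namespace web = "http://expath.org/ns/webapp";

(: Return the media type to use when serving XHTML for this request. :)
declare function local:xhtml-media-type(
   $request as element(web:request)
) as xs:string
{
   (: header names are case-insensitive, and there can be several :)
   let $accept := string-join(
                     $request/web:header[lower-case(@name) eq 'accept']/@value,
                     ',')
   return
      if ( contains($accept, 'application/xhtml+xml') ) then
         'application/xhtml+xml'
      else
         'text/html'
};

local:xhtml-media-type(
   <web:request>
      <web:header name="accept"
                  value="text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"/>
   </web:request>)
```

With the Accept header sent by the browser in the samples on this page, the function returns application/xhtml+xml; a client that does not list it gets text/html.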

Setup at deployment

An interesting idea is to provide a way to run steps (interactive or not) at the deployment of a webapp, e.g. by running a component or a set of servlets. See the following discussion on the MarkLogic mailing list (especially the end of the thread).
