XRel - Introduction

This page is a digest of Chapter 1 of Fabrizio Bisi's Masters Thesis, titled "XRel: Analysis of the XDuce Data Types and Implementation of the Related Algorithms". See the XRel home page for more details.

Motivation

Web Services look to become the foundation of e-commerce. The first key enabling technology for Web Services is the use of a standard data format, XML, along with the use of standard protocols like HTTP and SOAP. These standards allow for the interaction of services written in different languages and executed in different environments. The second key technology is orchestration. Orchestration is when a Web Service, as well as performing local computation, can also interact with other Web Services to prepare its answer. For instance, a travel-agent Web Service might interact with an airline Web Service and a hotel Web Service in order to complete its booking. Orchestration means that Web Services must be considered as interacting processes which exchange messages in complicated protocols, rather than just as remote functions which receive input and give back output. Orchestration standards have not yet been finalised; however, existing proposals (Biztalk, HighWire, Bpml, Bpel4ws) are mostly based on the pi calculus formalism.

The pi-calculus is a widely studied formalism based on message passing for describing and reasoning about distributed and concurrent systems; born in early 90s, it had an immediate success in the academic world for its mobility features that potentially made suitable for many distributed applications, such as phone mobile networks. Today also business world looks at it with interest. The evidence of this interest is in the fact that it can be found together with XML-processing in new commercial products. Notable, they are found together in Microsoft Biztalk Server 2000, a product used to create Web Services. The orchestration language used in Biztalk is called XLang, and it is based on the asynchronous version of the pi-calculus; all Biztalk messages are XML documents. XML and the pi-calculus are also found together in Highwire, an ongoing project currently under development at Microsoft. Highwire is based on the explicit fusion version of the pi-calculus; it uses terms in linear logic as its data, uses cut-elimination for exchange of data, and argues that linear logic is a generalisation of XML.

The fusion group at Bologna also wishes to integrate pi-calculus and XML, to make a formalism of this integration, and to implement it. We have chosen to use a different model for XML - namely, the tree regular expressions of Hosoya's XDuce. (These also form the foundation of XQuery, CDuce and XTatic). We believe that this approach will yield an XML-pi-calculus which is easier for programmers to understand than Highwire, and which is easier to formalise and implement. The fusion group's project has been carried out in two parallel dissertations. The first, by Paolo Milazzo, gives a Java implementation of the pi-calculus but without any XML. My dissertation gives a Java implementation of tree regular expressions but without any pi-calculus. An integration of the two implementations is currently in progress.

XDuce

XDuce (pronounced "transduce") is a statically typed programming language for XML processing, developed by Haruo Hosoya and Benjamin C. Pierce. Its basic data values are XML documents and its types (so-called regular expression types) directly correspond to document schemas. It also provides a flexible form of regular expression pattern matching integrating conditional branching, tag checking and subtree extraction, as well as dynamic typechecking. The language is described at the XDuce home page and in particular in the following paper:

"XDuce: A Statically Typed XML Processing Language", 2003
 (Hosoya H., Pierce B. C.)
 www.kurims.kyoto-u.ac.jp/~hahosoya/papers/xduce-toit.ps

XRel

XRel is a small tool I've written to study the algorithms of XDuce; I've called it XRel (XML Regular Expression Language). XRel incorporates the two main features of XDuce - tree regular expressions types and tree regular expression pattern matching - for doing pattern matching with XML documents.

XRel is a test-harness for the XDuce algorithms, and a porting of these algorithms from ML to Java. Note that it lacks recursive functions: the reason is that it's our intention to discard it as a stand-alone tool in favour of an integration with Hipi. (Hipi is a concurrent language based on pi-calculus and developed as master thesis by Paolo Milazzo at the university of Bologna.) Because Hipi has functions and all the other basic constructs of a programming language these features are not repeated in XRel.

Despite its minimality (it has not much more than a typeswitch statement with which it can make pattern matching) all the algorithms of XDuce are still implemented: it has subtyping, type inference, static checks on types and the pattern matching algorithm.

Original Contributions of my thesis

My thesis is mainly an analysis of the XDuce language and a re-implementation of tree regular expression algorithms.

It also introduces a novel and simpler ambiguity-check algorithm, for checking whether a tree regular expression would match some value in more than one way, that is based on the notion of weak ambiguity instead of the strong one used by Hosoya. Ambiguity is defined in section 3.3.3; my algorithm is presented in section 4.3.3.

Another contribution is to note an error in the formalisation of the difference algorithm given both in "Validation and Boolean operations for Attribute-Element Constraints" and "Boolean Operations and Inclusion Test for Attribute-Element Constraints (Extended Abstract)", and to present a corrected algorithm.

A further contribution of this thesis is to provide an up-to-date snapshot of the language and especially of the algorithms, because the work-in-progress condition of XDuce makes the documentation of the algorithms difficult.