Without a doubt, web services have hit the mainstream and have demonstrated their value. They were designed largely to address interoperability and distributed computing and both goals were realised by layering technologies over existing implementation environments.

Service interfaces are formally specified using the language- and platform-agnostic Web Services Description Language (WSDL), and invocation occurs by sending an XML payload over HTTP to a particular endpoint (URI).

Because these technologies were designed to operate within the world wide web, web services have not only changed the way that applications are built, but more importantly, they have enabled entirely new classes of solutions. For the first time we have foundations that enable the construction of applications which allow participation from individuals and computing systems in different organisations, operating within a diverse set of IT environments.

As game changing as they are though, web services aren't always enough. Sure, if you are offering a stock ticker service then you likely needn't address anything beyond platform independence and remote invocation; however, if you are building enterprise content management (ECM) services and applications you must leverage the paradigms and tools that are coarsely categorised as service oriented.

For example, an insurance claims processing application that aggregates content and data from a variety of secure information systems and coordinates regulated processes over that information requires a richer infrastructure in which it is developed and deployed.

While there are many elements of service orientation that could be applied to this complex solution space, only a handful stand out as the foundation of service oriented architectures (SOA) for ECM. At the core, the discussion centres on how we construct and deploy content management web services and how we evolve the environment in which they are deployed. This is not simply a technological change; it involves some fundamental paradigm shifts as well.

ECM agility

Enterprise content management systems have been indisputably successful over the last fifteen to twenty years, demonstrating significant ROI and enabling solutions that were previously impossible to achieve. While steadfastly dependable, in many cases the systems have become rather large and difficult to upgrade: the average time between major releases across the largest ECM vendors can be measured in years rather than months, and customer upgrades to new releases require lengthy planning cycles. SOAs afford an opportunity to change this.

Web services that are decoupled from other services, as well as from the content management system itself, will modularise ECM capabilities, allowing fundamental units to be independently upgraded. For example, a newly optimised web service that generates a PDF rendition from a document checked into the repository can be tested and deployed without changing services such as check-out and check-in, avoiding the expensive and time consuming, yet often mandatory, recertification of these core ECM library services. This fluidity, however, does not come for free simply by offering web service based interfaces; the design of the web services for content management must deliberately focus on meeting this objective.

For example, the mechanics of recording audit history upon document check-in must be sufficiently isolated so that potential issues with a new rendition generating service do not interfere with activities so crucial in regulated environments.
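One way to picture this isolation is the minimal Python sketch below. It is an illustration only, not any vendor's design: the names CheckInService, failing_rendition_hook and audit_trail are hypothetical, and a plain in-memory list stands in for the real audit store. The audit entry is written first, and a failure in a follow-on step such as rendition generation is logged rather than allowed to abort the check-in.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("ecm.sketch")


class CheckInService:
    """Hypothetical check-in service that keeps auditing apart from optional steps."""

    def __init__(self, audit_trail: List[str],
                 post_checkin_hooks: List[Callable[[str], None]]) -> None:
        self.audit_trail = audit_trail
        self.post_checkin_hooks = post_checkin_hooks

    def check_in(self, document_id: str, user: str) -> None:
        # Record the audit entry before any optional processing runs.
        self.audit_trail.append(f"{user} checked in {document_id}")
        for hook in self.post_checkin_hooks:
            try:
                hook(document_id)
            except Exception:
                # A failing hook is reported but never undoes or blocks the audit entry.
                logger.exception("post check-in hook failed for %s", document_id)


def failing_rendition_hook(document_id: str) -> None:
    # Stand-in for a newly deployed rendition service that misbehaves.
    raise RuntimeError(f"could not render {document_id}")


audit_trail: List[str] = []
service = CheckInService(audit_trail, [failing_rendition_hook])
service.check_in("doc-42", "alice")
print(audit_trail)
```

The design choice here is simply ordering plus containment: the regulated activity completes on its own, and everything else is treated as replaceable.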

Planning for agility does not apply only to the construction of the web services themselves; it also involves anticipating the incremental evolution of the services infrastructure. The environment in which web services operate can be as simple as an application server that services HTTP requests, or it may include a vast array of other components such as single sign-on, message queues, UDDI directories and an enterprise service bus (ESB). Content management web services will be deployed within ecosystems that range over the entire spectrum from basic to sophisticated and will be expected to continue functioning as the services infrastructure is incrementally upgraded.

When initially deployed, for example, ECM services may be invoked with a username and password for authentication; however, an enterprise SOA upgrade may allow connection to the same ECM services with a single sign-on token. Again, the ECM services must be designed to anticipate such environmental changes.
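As a rough sketch of what anticipating such a change could look like, the Python below puts authentication behind a small gateway so that a new credential type can be registered without touching the services that depend on it. Every name here (UsernamePassword, SsoToken, AuthenticationGateway, the sample users and tokens) is a hypothetical stand-in, and the validators are deliberately naive.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class UsernamePassword:
    username: str
    password: str


@dataclass(frozen=True)
class SsoToken:
    token: str


class AuthenticationGateway:
    """Maps each credential type to a validator that returns the authenticated user."""

    def __init__(self) -> None:
        self._validators: Dict[type, Callable[[Any], str]] = {}

    def register(self, credential_type: type, validator: Callable[[Any], str]) -> None:
        self._validators[credential_type] = validator

    def authenticate(self, credential: Any) -> str:
        validator = self._validators.get(type(credential))
        if validator is None:
            raise ValueError(f"unsupported credential type: {type(credential).__name__}")
        return validator(credential)


# Illustrative in-memory stores; a real deployment would delegate to the
# enterprise security infrastructure instead.
PASSWORDS = {"alice": "s3cret"}
SSO_SESSIONS = {"tok-123": "alice"}


def validate_password(credential: UsernamePassword) -> str:
    if PASSWORDS.get(credential.username) != credential.password:
        raise PermissionError("invalid username or password")
    return credential.username


def validate_sso(credential: SsoToken) -> str:
    if credential.token not in SSO_SESSIONS:
        raise PermissionError("unknown single sign-on token")
    return SSO_SESSIONS[credential.token]


gateway = AuthenticationGateway()
gateway.register(UsernamePassword, validate_password)  # initial deployment
gateway.register(SsoToken, validate_sso)               # added after the SOA upgrade
```

In this sketch the ECM services only ever call gateway.authenticate, so the move to single sign-on is a registration step rather than a rewrite.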

The core WS standards for ECM

Security and content transport are two fundamental concerns for ECM systems, and both are sufficiently addressed by recognised web service standards. Security, both at the point of access and while content is in transit, is addressed through WS-Security.

WS-Security extends SOAP to address message integrity and confidentiality and also defines a mechanism for the inclusion of security tokens within the SOAP envelope. WS-Security does not provide a security solution; rather, it defines a means by which web services and security solutions are to cooperate in the SOA. To ensure that security is soundly addressed, it must be an infrastructure component with which all business critical applications, including SOA ready ECM systems, interact.
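To make the idea of a security token inside the SOAP envelope concrete, the following Python sketch builds a SOAP 1.1 envelope whose header carries a WS-Security UsernameToken. It uses only the standard library; the function name build_envelope and the sample credentials are assumptions for illustration, and a plain-text password like this would in practice be combined with a digest, encryption or transport-level security.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
WSSE_NS = ("http://docs.oasis-open.org/wss/2004/01/"
           "oasis-200401-wss-wssecurity-secext-1.0.xsd")

# Use readable prefixes when the tree is serialised.
ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("wsse", WSSE_NS)


def build_envelope(username: str, password: str) -> ET.Element:
    """Return a SOAP envelope with a UsernameToken in its Security header."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    security = ET.SubElement(header, f"{{{WSSE_NS}}}Security")
    token = ET.SubElement(security, f"{{{WSSE_NS}}}UsernameToken")
    ET.SubElement(token, f"{{{WSSE_NS}}}Username").text = username
    ET.SubElement(token, f"{{{WSSE_NS}}}Password").text = password
    # The body would carry the actual ECM request; it is left empty here.
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    return envelope


envelope = build_envelope("alice", "s3cret")
print(ET.tostring(envelope, encoding="unicode"))
```

The token travels in the header, separate from the request in the body, which is what lets the security infrastructure inspect it without understanding the ECM operation itself.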

Enterprise content management necessarily involves the transport of all types of unstructured content, and while the content metadata is well suited for transport within the XML based SOAP envelope, the binary files must receive special attention. Two approaches have emerged as dominant and must be supported by ECM web services: base64 encoding and the SOAP Message Transmission Optimisation Mechanism (MTOM).

Base64 is inefficient, inflating binary content by roughly a third; however, its simplicity and ubiquitous availability make it suitable in situations where the encoded payloads are small and infrequently passed (e.g. human based processes such as expense report submissions). MTOM, created to optimise the transmission of SOAP messages, addresses the efficient transfer of binary content by serialising it as MIME multipart content and is the transfer mechanism of choice for batch uploads (e.g. automated document scanning) or large binary files (e.g. CAD drawings).
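The cost of base64 is easy to check. The short Python sketch below computes the encoded size from the rule that every three input bytes become four output characters, with padding, and compares it against the standard library encoder. The helper name base64_encoded_size and the 300,000-byte sample payload are illustrative choices.

```python
import base64


def base64_encoded_size(raw_bytes: int) -> int:
    """Size in characters of padded base64 output for raw_bytes input bytes."""
    # Each group of up to 3 input bytes is emitted as 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


payload = bytes(300_000)  # stand-in for a scanned document
encoded = base64.b64encode(payload)

# The formula agrees with what the encoder actually produces.
assert len(encoded) == base64_encoded_size(len(payload))

overhead = len(encoded) / len(payload) - 1
print(f"{len(payload)} bytes -> {len(encoded)} characters ({overhead:.0%} larger)")
```

For an occasional expense receipt this overhead is negligible; multiplied across a batch of scanned documents or a large drawing, it is the reason to prefer a binary-safe transfer such as MTOM.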

Finally, to ensure platform independence, the WS-Interoperability basic profile (WS-I BP) specification, which tightly constrains the use of SOAP in support of interoperability, should be supported. WS-I BP also places more stringent requirements on the use of WSDL, eliminating certain ambiguities that could present interoperability challenges. Providing ECM web services that meet the requirements of WS-I BP 1.1 will help ensure that they can be deployed on a variety of platforms and be consumed by applications running on similarly varied platforms.

The application developer for ECM in the SOA

The developer of the content rich application does not fit a single profile. While historically most have been programmers (Java or .Net) who are at least one level removed from the business user whose concerns they are addressing, recent innovations are making application development directly available to the business analyst.

Consider first the business analyst. This developer will use technologies such as business process management (BPM) tools or portal systems to build and configure the applications they construct. These individuals will use tools which have graphical, drag and drop construction environments to compose pieces of functionality to serve a targeted need.

An ECM services provider must serve this developer by providing web services designed with the right level of granularity and by presenting those services in such a way that the composite application development environments can utilise them. Supplying this user with a service (described through WSDL) that imports a document and its associated metadata into the repository through a single invocation is much better than supplying individual services that create an object in the repository, upload the content to that object, set the metadata for that content and finally share the object with other ECM system users.
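The difference in granularity can be sketched in a few lines of Python. The fine-grained operations below mirror the four steps just described, and import_document is the coarse-grained operation that a composition tool would actually be offered. InMemoryRepository and all method names are hypothetical stand-ins for a real ECM repository, used only to keep the example runnable.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepositoryObject:
    object_id: str
    content: bytes = b""
    metadata: Dict[str, str] = field(default_factory=dict)
    shared_with: List[str] = field(default_factory=list)


class InMemoryRepository:
    """Toy repository exposing the fine-grained operations."""

    def __init__(self) -> None:
        self._objects: Dict[str, RepositoryObject] = {}

    def create_object(self) -> str:
        object_id = f"obj-{len(self._objects) + 1}"
        self._objects[object_id] = RepositoryObject(object_id)
        return object_id

    def upload_content(self, object_id: str, content: bytes) -> None:
        self._objects[object_id].content = content

    def set_metadata(self, object_id: str, metadata: Dict[str, str]) -> None:
        self._objects[object_id].metadata.update(metadata)

    def share(self, object_id: str, users: List[str]) -> None:
        self._objects[object_id].shared_with.extend(users)

    def get(self, object_id: str) -> RepositoryObject:
        return self._objects[object_id]


def import_document(repo: InMemoryRepository, content: bytes,
                    metadata: Dict[str, str], share_with: List[str]) -> str:
    """Coarse-grained operation: one invocation performs all four steps."""
    object_id = repo.create_object()
    repo.upload_content(object_id, content)
    repo.set_metadata(object_id, metadata)
    repo.share(object_id, share_with)
    return object_id


repo = InMemoryRepository()
claim_id = import_document(repo, b"%PDF-1.4 ...", {"type": "claim"}, ["bob"])
print(repo.get(claim_id))
```

A business analyst wiring services together in a graphical tool sees one box to drop into a process instead of four that must be sequenced correctly.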

On the other hand, there is still a need for the traditional programmer, and properly enabling them should be a concern of the services provider. The service oriented, distributed computing model of today is not fundamentally in conflict with object oriented programming; however, the common approach of using an IDE to generate proxies from a WSDL definition severely limits the usefulness of the generated classes, reducing their functionality to that of facilitating message transfers.

Instead of forcing this development paradigm, the ECM services provider should provide certain high-value capabilities in the form of client-side libraries. In ECM, for example, uploading an HTML page that includes images requires that the HTML file be processed client-side to determine which image files should be included in the content transfer. While careful consideration should be given before including a capability that requires a client-side library, ECM requirements call for the utilisation of this valuable approach.
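A minimal sketch of that client-side step, using only the Python standard library, might look as follows. The names ImageReferenceCollector and local_image_files are hypothetical, and the rule applied (keep each distinct img source that has no URL scheme or host, on the assumption that it refers to a file sitting next to the page) is a simplification of what a production library would need.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urlparse


class ImageReferenceCollector(HTMLParser):
    """Collects the src attribute of every img tag in the order encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:
                self.sources.append(value)


def local_image_files(html: str) -> List[str]:
    """Return the distinct relative image paths referenced by the page."""
    collector = ImageReferenceCollector()
    collector.feed(html)
    collector.close()
    local: List[str] = []
    for src in collector.sources:
        parsed = urlparse(src)
        # Skip absolute URLs; they are not files to bundle with the upload.
        if not parsed.scheme and not parsed.netloc and src not in local:
            local.append(src)
    return local


page = """
<html><body>
  <img src="images/logo.png">
  <img src="http://example.com/remote.gif"/>
  <img src="images/logo.png">
  <img alt="no source">
</body></html>
"""
print(local_image_files(page))
```

This is work that cannot sensibly be pushed to the server, since only the client can see the local files, which is the argument for shipping it as a library alongside the service.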

The complex world of ECM will benefit greatly from its adaptation to a service oriented architectural style. Large, complex systems will be more effectively deployed, configured and maintained, and their functionality will be composed with non-ECM capabilities more easily.

Application of service orientation concepts to the ECM architecture must be carefully thought out and deliberately applied so as to meet the specific needs of the content centric enterprise. Service orientation is not simply about web services, interoperability and distributed computing; it also enables a more fluid, adaptable IT ecosystem, empowers a new breed of application developer and offers a vast array of standards-based technologies to build upon.

John O'Melia is vice president, content management and archiving (CMA) at EMC.