BBC World Cup 2010 dynamic semantic publishing

Jem Rayfield | 10:00 UK time, Monday, 12 July 2010

The World Cup 2010 website is a significant step change in the way that content is published. From first using the site, the most striking changes are the horizontal navigation and the larger-format, high-quality video. As you navigate through the site it becomes apparent that this is a far deeper and richer use of content than can be achieved through traditional CMS-driven publishing solutions.

The site features 700-plus team, group and player pages, which are powered by a high-performance dynamic semantic publishing framework. This framework facilitates the publication of automated metadata-driven web pages that are light-touch, requiring minimal journalistic management, as they automatically aggregate and render links to relevant stories.

[Image: eng_595.jpg]

Dynamic aggregation examples include the team, group and player pages, each of which automatically gathers the stories, blogs, profiles, images, video and statistics relevant to its concept.

The underlying publishing framework does not author content directly; rather, it publishes data about the content - metadata. The published metadata describes the World Cup content at a fairly low level of granularity, providing rich content relationships and semantic navigation. By querying this published metadata we are able to create dynamic page aggregations for teams, groups and players.

The foundation of these dynamic aggregations is a rich ontological domain model. The ontology describes entity existence, groups and relationships between the things/concepts that describe the World Cup. For example, "Frank Lampard" is part of the "England Squad" and the "England Squad" competes in "Group C" of the "FIFA World Cup 2010".

The ontology also describes journalist-authored assets (stories, blogs, profiles, images, video and statistics) and enables them to be associated with concepts within the domain model. Thus a story with an "England Squad" concept relationship provides the basis for a dynamic query aggregation on the England Squad page: "All stories tagged with England Squad".

This diagram gives a high-level overview of the main architectural components of this domain-driven, dynamic rendering framework.

[Diagram: diagram_595.png - high-level overview of the main architectural components]

The journalists use a web tool, called 'Graffiti', for the selective association - or tagging - of concepts to content. For example, a journalist may associate the concept "Frank Lampard" with the story "Goal re-ignites technology row".

In addition to the manual selective tagging process, journalist-authored content is automatically analysed against the World Cup ontology. A natural language and ontological determiner process automatically extracts World Cup concepts embedded within a textual representation of a story. The concepts are moderated and, again, selectively applied before publication. Moderated, automated concept analysis improves the depth, breadth and quality of metadata publishing.

Journalist-published metadata is captured and made persistent for querying using the Resource Description Framework (RDF) metadata representation and triple store technology. An RDF triplestore and SPARQL approach was chosen over traditional relational database technologies because the metadata needs to be interpreted with respect to an ontological domain model. The high-level goal is that the domain ontology allows for intelligent mapping of journalist assets to concepts and queries. The chosen triplestore provides forward-chaining reasoning, so inferred statements are automatically derived from the concepts journalists explicitly apply. For example, if a journalist selects and applies the single concept "Frank Lampard", the framework infers and applies concepts such as "England Squad", "Group C" and "FIFA World Cup 2010" (as generated triples within the triple store).
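As a rough illustration of that forward-chaining behavior, the sketch below uses Apache Jena's generic rule reasoner rather than the BBC's BigOWLIM store, and all of the URIs, property names and the rule itself are invented for the example:

```java
// A minimal sketch of the forward-chaining idea described above. The BBC
// platform uses BigOWLIM; this illustration uses Apache Jena's generic rule
// reasoner instead, and every URI, property and rule here is invented.
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;
import org.apache.jena.util.PrintUtil;

public class InferenceSketch {
    public static void main(String[] args) {
        String ns = "http://example.org/worldcup/";   // hypothetical ontology namespace
        PrintUtil.registerPrefix("wc", ns);

        Model base = ModelFactory.createDefaultModel();
        Property memberOf   = base.createProperty(ns, "memberOf");
        Property competesIn = base.createProperty(ns, "competesIn");
        Resource lampard = base.createResource(ns + "frank_lampard");
        Resource england = base.createResource(ns + "england_squad");
        Resource groupC  = base.createResource(ns + "group_c");

        // Domain model plus a single explicit journalist tag: "Frank Lampard".
        base.add(lampard, memberOf, england);
        base.add(england, competesIn, groupC);

        // Forward-chaining rule: a squad member also competes in the squad's group.
        String rules = "[squadGroup: (?p wc:memberOf ?squad) "
                     + "(?squad wc:competesIn ?group) -> (?p wc:competesIn ?group)]";
        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        reasoner.setMode(GenericRuleReasoner.FORWARD);

        InfModel inferred = ModelFactory.createInfModel(reasoner, base);
        // The derived triple now sits alongside the explicit ones, so queries
        // for "Group C" content will also find Frank Lampard stories.
        System.out.println(inferred.contains(lampard, competesIn, groupC)); // true
    }
}
```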

This inference capability makes both the journalist tagging and the triple store powered SPARQL queries simpler and indeed quicker than a traditional SQL approach. Dynamic aggregations based on inferred statements increase the quality and breadth of content across the site. The RDF triple approach also facilitates agile modeling, whereas traditional relational schema modeling is less flexible and also increases query complexity.


Our triple store is deployed across multiple data centers in a resilient, clustered, performant and horizontally scalable fashion, allowing future expansion for additional ontologies and indeed linked open data (LOD) sets.

The triple store is abstracted behind a Java/Spring/CXF, JSR 311-compliant REST service. The REST API is accessible over HTTPS with an appropriate certificate. The API is designed as a generic façade onto the triplestore, allowing RDF data to be re-purposed and re-used across the BBC. This service orchestrates SPARQL queries and ensures that results are dynamically cached across data centers using memcached, with a low one-minute time-to-live (TTL) expiry.
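For a flavor of the pattern (not the BBC's actual code), here is a minimal JSR 311 resource that runs a SPARQL query against the store and caches the serialized result in memcached for one minute. The paths, endpoint URL, query and the spymemcached client are all assumptions:

```java
// Illustrative only: a minimal JSR 311 (JAX-RS) resource that fronts a SPARQL
// endpoint and caches query results in memcached with a 60-second TTL. The
// paths, endpoint URL, query and the spymemcached client are assumptions,
// not the BBC's actual implementation.
import java.net.InetSocketAddress;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;

import net.spy.memcached.MemcachedClient;

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;

@Path("/concepts")
public class ConceptResource {

    private static final String SPARQL_ENDPOINT = "http://triplestore.local/sparql";
    private static final int TTL_SECONDS = 60;   // the low one-minute TTL described above

    private final MemcachedClient cache;

    public ConceptResource() throws Exception {
        cache = new MemcachedClient(new InetSocketAddress("memcached.local", 11211));
    }

    @GET
    @Path("{id}")
    @Produces("application/sparql-results+xml")
    public String assetsForConcept(@PathParam("id") String id) {
        String key = "concept:" + id;
        String cached = (String) cache.get(key);
        if (cached != null) {
            return cached;                                   // cache hit, no SPARQL issued
        }
        String sparql = "SELECT ?asset WHERE { ?asset "
                + "<http://example.org/worldcup/about> "
                + "<http://example.org/worldcup/" + id + "> }";
        QueryExecution qe = QueryExecutionFactory.sparqlService(SPARQL_ENDPOINT, sparql);
        try {
            String xml = ResultSetFormatter.asXMLString(qe.execSelect());
            cache.set(key, TTL_SECONDS, xml);                // expires after one minute
            return xml;
        } finally {
            qe.close();
        }
    }
}
```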

All RDF metadata transactions sent to the API for CRUD operations are validated against the associated ontologies before any persistence operations are invoked. This validation process ensures that the RDF conforms to the underlying ontologies and keeps the data consistent. The validation libraries used include Jena Eyeball. The API also performs content transformations between the various flavors of RDF, such as N3 or XML RDF, and exposes a number of RDF views on the data.
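As an illustration of that serialization-conversion step, a minimal sketch using Apache Jena might look like this (the BBC's own transformation code is not shown here):

```java
// A minimal sketch of the serialization conversion mentioned above (N3 in,
// RDF/XML out) using Apache Jena; purely illustrative.
import java.io.StringReader;
import java.io.StringWriter;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class RdfConversionSketch {
    public static String n3ToRdfXml(String n3) {
        Model model = ModelFactory.createDefaultModel();
        model.read(new StringReader(n3), null, "N3");   // parse the N3 input
        StringWriter out = new StringWriter();
        model.write(out, "RDF/XML");                    // serialize the same graph as RDF/XML
        return out.toString();
    }
}
```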

Automated XML sports stats feeds from various sources are delivered to and processed by the BBC. These feeds are now also transformed into an RDF representation. The transformation process maps feed supplier ids onto corresponding ontology concepts, aligning external provider data with the RDF ontology representation within the triple store. Sports stats for matches, teams and players are aggregated inline and served dynamically from the persistent triple store.
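The id-alignment step could be sketched as follows; the supplier id, URIs and property names are invented, and in the real system the mapping would be driven by the ontology itself:

```java
// Sketch of the id-alignment step described above: a feed supplier's player id
// is mapped to the corresponding ontology URI before the stats are written as
// RDF. The supplier id, URIs and property names are invented for illustration.
import java.util.Map;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

public class StatsFeedMapper {

    private static final String WC = "http://example.org/worldcup/";

    // In the real system this lookup would itself be driven by the ontology.
    private static final Map<String, String> SUPPLIER_ID_TO_CONCEPT =
            Map.of("player-10432", WC + "frank_lampard");

    public static Model toRdf(String supplierId, int goalsScored) {
        Model m = ModelFactory.createDefaultModel();
        Resource player = m.createResource(SUPPLIER_ID_TO_CONCEPT.get(supplierId));
        player.addLiteral(m.createProperty(WC, "goalsScored"), goalsScored);
        return m;
    }
}
```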

The following "Frank Lampard" player page includes dynamic sports stats data served via SPARQL queries from the persistent triple store:

[Image: frank_595.jpg - Frank Lampard player page with dynamic stats]
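For illustration, a query of roughly this shape could back such a statistics panel; the endpoint, prefixes and property names below are invented rather than taken from the actual World Cup ontology:

```java
// Illustrative only: the kind of SPARQL query that might back a player
// statistics panel. The endpoint, prefixes and property names are invented.
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

public class PlayerStatsQuery {
    public static void main(String[] args) {
        String sparql =
              "PREFIX wc: <http://example.org/worldcup/>\n"
            + "SELECT ?match ?goals ?assists WHERE {\n"
            + "  wc:frank_lampard wc:statsFor ?stat .\n"
            + "  ?stat wc:match ?match ; wc:goals ?goals ; wc:assists ?assists .\n"
            + "}";
        QueryExecution qe = QueryExecutionFactory.sparqlService(
                "http://triplestore.local/sparql", sparql);
        try {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.printf("%s: %s goals, %s assists%n",
                        row.get("match"), row.get("goals"), row.get("assists"));
            }
        } finally {
            qe.close();
        }
    }
}
```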


The dynamic aggregation and publishing page-rendering layer is built using a Zend PHP and memcached stack. The PHP layer requests an RDF representation of a particular concept or concepts from the REST service layer, based on the audience's URL request. If an "England Squad" page request is received by the PHP code, several RDF queries are invoked over HTTPS against the REST service layer below.

The render layer will then dynamically aggregate several asset types (stories, blogs, feeds, images, profiles and statistics) for a particular concept such as "England Squad". The resultant view and RDF are cached with a low TTL (1 minute) at the render layer for subsequent requests from the audience. The PHP layer dynamically renders views based on HTTP headers, providing content-negotiated HTML and/or RDF for each and every page.

To make use of the significant amount of existing static news infrastructure (Apache servers, HTTP load balancers and gateway architecture), all HTTP responses are annotated with appropriately low (1 minute) cache-expiry headers. This HTTP caching increases the scalability of the platform and also allows content delivery network (CDN) caching if demand requires.
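The front-end stack here is PHP and Apache, but the one-minute expiry idea itself is simple; expressed with the JSR 311 API used elsewhere in this stack, a hedged sketch looks like this:

```java
// A small sketch of the cache-expiry pattern: the actual BBC front end is
// PHP/Apache, but the same one-minute max-age idea, expressed with the
// JSR 311 API, looks roughly like this.
import javax.ws.rs.core.CacheControl;
import javax.ws.rs.core.Response;

public final class CacheHeaders {

    /** Wraps an entity in a response that downstream caches and CDNs may hold for 60s. */
    public static Response withOneMinuteTtl(String entity) {
        CacheControl cc = new CacheControl();
        cc.setMaxAge(60);
        return Response.ok(entity).cacheControl(cc).build();
    }
}
```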

This dynamic semantic publishing architecture has been serving millions of page requests a day throughout the World Cup, with continually changing OWL-reasoned semantic RDF data. The platform currently serves an average of a million SPARQL queries a day, with a peak RDF transaction rate of hundreds of player statistics per minute. Cache expiry at all layers within the framework is one minute, providing a dynamic, rapidly changing domain- and statistic-driven user experience.

The development of this new high-performance dynamic semantic publishing stack is a great innovation for the BBC as we are the first to use this technology on such a high-profile site. It also puts us at the cutting edge of development for the next phase of the Internet, Web 3.0.

So what's next for the platform after the World Cup? There are many engaging expansion possibilities: extending the World Cup approach throughout the sport site, making BBC assets geographically 'aware', or aligning news stories to BBC programs. This is all still to be decided, but one thing we are certain of is that this technological approach will play a key role in the creation, navigation and management of index pages and pages for over 12,000 athletes at the London 2012 Olympics.


Jem Rayfield is Senior Technical Architect, BBC News and Knowledge. Read the previous post on the Internet blog that covers the BBC World Cup website, The World Cup and a call to action around Linked Data.


Metadata is data about data - it describes other data. In this instance, it provides information about the content of a digital asset. For example, a World Cup story may include metadata that describes which football players are mentioned within the text of the story. The metadata may also describe the team, group or organization associated with the story.

IBM LanguageWare Natural-language and ontological analysis platform.

RDF is based upon the idea of making statements about concepts/resources in the form of subject-predicate-object expressions. These expressions are known as triples in RDF terminology. The subject denotes the resource; the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, representing the notion "Frank Lampard plays for England" in RDF as a triple, the subject is "Frank Lampard", the predicate is "plays for" and the object is "England Squad".

SPARQL (pronounced "sparkle") is an RDF query language. Its name is a recursive acronym (i.e. an acronym that refers to itself) that stands for SPARQL Protocol and RDF Query Language.

BigOWLIM A high-performance, scalable, resilient triplestore with robust OWL reasoning support.

LOD The term Linked Open Data is used to describe a method of exposing, sharing, and connecting data via dereferenceable URIs on the Web.

JAVA Object-oriented programming language developed by Sun Microsystems.

Spring Rich JAVA framework for managing POJOs, providing facilities such as inversion of control (IoC) and aspect-oriented programming.

Apache CXF JAVA web services framework for JAX-WS and JAX-RS.

JSR 311 Java standard specification API for RESTful web services.

Memcached Distributed memory caching system (deployed across multiple data centers).

Jena Eyeball JAVA RDF validation library for checking ontological issues with RDF

N3 Shorthand textual representation of RDF designed with human readability in mind.

XML RDF XML representation of an RDF graph.

XML (Extensible Markup Language) is a set of rules for encoding documents and data in machine-readable form

Zend Open-source PHP engine and framework, facilitating common programming patterns such as model-view-controller.

PHP Hypertext Preprocessor, a general-purpose scripting language used to create dynamic web pages.

CDN A content delivery network or content distribution network (CDN) is a collection of computers usually hosted within Internet Service Provider hosting facilities. The CDN servers cache local copies of content to maximize bandwidth and reduce requests to origin servers.

OWL Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies.

    Comments (1)

    Patrice Neff Jul 13, 2010

    I like. I've been playing with semantic data and really like to hear that it can be used for real world scenarios. And the backend infrastructure with REST services, caching, etc. sounds similar to what we do at Memonic.

    Web Services are NOT Distributed Objects

    I wrote an article for IEEE Internet Computing about the many misconceptions that still exist about the fundamentals of web services, titled 'Web Services are NOT Distributed Objects: Common Misconceptions about Service Oriented Architectures'. It deals with the confusion between web services and distributed objects, between web services and RPC, the fact that there are more than just HTTP bindings, the relation between web services and web servers, (un)reliable web services, and why debugging web services is hard but not impossible.

    The article is still in 'draft mode', but should be relatively stable. I would very much appreciate feedback and comments, but keep in mind that the audience of the article is the general IT professional, not the web service specialist.


    UPDATE!!

    The text below was the first draft of this article. Many comments improved the article before it was published.

    The full and final text of this article is now available from this page.

    I am leaving the draft article online at this page to show the evolution of the article.


    Web Services are not Distributed Objects: Common Misconceptions about Service Oriented Architectures

    Werner Vogels
    Dept. of Computer Science, Cornell University
    vogels@cs.cornell.edu

    Web services are frequently described as the new incarnation of distributed object technology. This is a serious misconception, made by people from industry and academia alike, and this misconception seriously limits a broader acceptance of the true web services architecture. Even though the architects of distributed systems and internet systems alike have been vocal about the fact that these technologies hardly have any relationship, it appears to be difficult to dispel the myth that they are tied together. In this article I revisit the differences between web services and distributed objects in an attempt to make it clear that web services are an internet-style distributed systems technology that does not rely on, or require, any form of distributed object technology.

    Unfortunately, the mix-up about web services and distributed object systems is not the only misconception that is commonly heard. There are at least a dozen other popular statements about web services that are partially incorrect or just plain wrong. This article also contains clarifications of a number of these common misconceptions about web services.

    Misconceptions

    When I visited the WWW 2003 Conference, Peter M. asked me:

    “Don’t you think that web services will fail, just like all the other distributed object technologies people have tried to build?”

    Peter is a smart and gifted Internet architect, but this statement baffled me. How is it possible that someone like Peter still views web services as distributed object technology? Peter is not alone in his stubbornness in continuing to address web services as distributed objects. Many developers, architects, managers and academics still see web services as the next episode in the continued saga of distributed object technologies such as CORBA, DCOM and RMI. Web services and distributed objects systems are both distributed systems technologies, but that is where the common ground ends. They have no real relation to each other, except maybe for the fact that web services are now sometimes deployed in areas where in the past the application of distributed objects has failed. If we look for relationships within the distributed technology world it is probably more appropriate to associate web services with messaging technologies, as they share a common architectural view, but address different types of applications.

    Web services are based on XML documents and document exchange, and as such one could call the technological underpinning of web services document-oriented computing. Exchanging documents is a very different concept from requesting the instantiation of an object, requesting the invocation of a method on the specific object instance, receiving the result of that invocation back in a response, and after a number of these exchanges, releasing the object instance.

    This misconception does not stand by itself; there are about a dozen similar statements that I frequently encounter that fall more or less into the same category. Popular ones are "Web services are just RPC for the Internet" and "You need HTTP to make web services work". Below I will try to address a number of the more popular misconceptions. First, however, I will try to establish what a web service actually is in its purest, minimalist form. I believe much of the confusion comes from press and vendor hype, which lacks the technical depth needed to make people understand the real concepts. Of course, the political bickering among standards bodies such as W3C, OASIS, and WS-I doesn’t help to clarify the simple, interoperable nature of web services.

    Minimal web services defined

    To realize why most of these issues are misconceptions, it is necessary to cut through all of the hype we have seen in the press and from vendors. If we bring back web services to a minimalist core there are three components that make up a web service:

    • The Service. This is a software component that is capable of processing an XML document it has received through some combination of transport and application protocols. How this software component is constructed, whether object-oriented techniques have been used, if it operates as a stand-alone process, is part of a web or application server, or is merely a thin layered front-end for a massive enterprise application is not of any importance. The only requirement for a Service Process is that it is capable of processing certain well-defined XML documents.
    • The Document.  The XML document that is sent to a service to be processed is the keystone of a web service, as it contains all the application-specific information. The documents a web service can process are described using an XML schema, and two processes that are engaged in a web services conversation need to have access to the same description to make sure that the documents that are exchanged can be validated and interpreted. This information is commonly described using the Web Services Description Language (WSDL).
    • The Address. Also called a port-reference, this is a protocol binding combined with a network address that can be used to access the service. This reference basically identifies where the service can be found when a particular protocol (e.g. TCP or HTTP) is used.

    In principle this should be enough to build a web service, but in practice at least one more component is added to it:

    • The Envelope. This is a message encapsulation protocol that ensures that the XML document to be processed is clearly separated from other information the two communicating processes may want to exchange. This allows, for example, routing and security information to be added to the message without the need to modify the XML document. The protocol that is used for almost all web services is SOAP, which originally stood for “Simple Object Access Protocol.” This naming was a mistake as the protocol has nothing to do with accessing objects, and since the SOAP 1.2 specification [6] the protocol name is now used without expanding the acronym. The SOAP message itself, also called the ‘soap-envelope’, is XML and consists of two possible elements: a soap-header, in which all the system information is kept, and a soap-body, which contains the XML document that is to be processed by the web service.

    These 4 components are all that it takes to use a web service. Whether you use your text editor to construct a SOAP message to send in an email, or use an automatically generated proxy-client from within your favorite programming language, this is all that is needed to make it work.
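    To make the minimalist picture concrete, here is a small, purely illustrative sketch that builds such an envelope-plus-document with the standard Java SAAJ API (javax.xml.soap, bundled with Java SE 8); the namespace, element names and payload are invented:

    ```java
    // A minimal sketch of the Envelope-plus-Document combination described above,
    // using the standard SAAJ API. The namespace, element names and payload are
    // invented purely for illustration.
    import javax.xml.namespace.QName;
    import javax.xml.soap.MessageFactory;
    import javax.xml.soap.SOAPBody;
    import javax.xml.soap.SOAPElement;
    import javax.xml.soap.SOAPMessage;

    public class SoapEnvelopeSketch {
        public static void main(String[] args) throws Exception {
            String ns = "http://example.org/orders";              // hypothetical namespace
            SOAPMessage message = MessageFactory.newInstance().createMessage();

            // The soap-body carries the XML document the service will process;
            // routing and security information would go in the soap-header.
            SOAPBody body = message.getSOAPBody();
            SOAPElement order = body.addChildElement(new QName(ns, "PurchaseOrder", "po"));
            order.addChildElement(new QName(ns, "item", "po")).addTextNode("ISBN-0596002785");
            order.addChildElement(new QName(ns, "quantity", "po")).addTextNode("1");

            message.writeTo(System.out);                          // the complete soap-envelope
        }
    }
    ```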

    The document-oriented distributed computing world of web services is all about the design of the documents you want to exchange. Protocols and addresses are necessary only as glue to get the documents to the right places. Although web services are centered around documents, this does not mean these documents are targeted to be read by humans. The goal of web services is to enable machine-to-machine communication at the same scale, and using the same style of protocols, as the human-interface-centered World Wide Web.

    Not a misconception: web services are really simple.

    At its core, web services technology is really simple; the only thing it does is use standard Internet protocols to move XML documents between service processes. This simplicity guarantees that its primary goal of interoperability can be achieved.

    The simplicity also means that many of the more complex distributed applications cannot be easily built without adding other technologies to the basic web services. Over time we will see that the issues the vendors are now bickering about, such as reliability, transactions and asynchronous processing, will become reality in an interoperable manner. The process around security extensions, for example, gives us reasonable hope that vendors are capable of reaching agreement on a set of interoperable primitives.

    On the other hand the process around reliable messaging has many of the distributed system specialists scared to death. In an attempt to preempt the release of the reliable messaging specification by IBM, Microsoft, BEA, and Tibco [3], a consortium led by Sun Microsystems and Oracle published a reliable messaging specification [2] that was little more than a cut-and-paste effort from the reliability section of ebXML. This was clearly a specification that was released too early under vendor-political pressure, as the specification was ambiguous in many places, incomplete in others, and riddled with errors throughout the document. Any company implementing this specification would end up with a very unreliable system, and as such this specification is a disservice to the community. If there is one threat to web services succeeding at large scale, it will be vendor politics.

    As I described in the previous section, document-oriented computing centers around the design of the document, the rest of the web service glue is just support technology to get the document to the right place in the right manner. In contrast with the simplicity of the basic web service technology, the documents can be extremely rich and complex. For example, a web services system I have worked on for the US Air Force publishes flight plans that can easily be up to a megabyte in size. Encoding these rich documents in XML ensures that the documents are extensible at predefined places without breaking any of the existing document consumers.

    Misconception #1: Web services are just like distributed objects

    Given the strong similarities between web services and distributed objects, it is understandable why the misconception exists that they are the same thing. After all, both have some sort of description language, both have well-defined network interactions, and both have a similar mechanism for registering and discovering available components. What contributes to the misconception that these are similar technologies is that many tool vendors provide simple object-oriented techniques for implementing web services, which give them the appearance of distributed objects. A number of these vendors have a long history in selling distributed object technology and as such have a strong interest in molding web services such that they appear to be a next step in the evolution of distributed object systems.

    A first thing to realize however is that the current state of web services technology is very limited compared to distributed object systems. The latter is a well-established technology with very broad support, strong reliability guarantees, and many, many support tools and technologies. For example, web services toolkit vendors have only just started to look at the reliability and transactional guarantees that distributed object systems have supported for years.

    An important aspect at the core of distributed object technology is the notion of the object life cycle: objects are instantiated by a factory upon request, a number of operations are performed on the object instance, and sometime later the instance will be released or garbage collected. A special case is the singleton object, which does not go through the instantiate/release cycle. But in both cases the object is identified through a reference, and this reference can be passed around between processes to provide a unique mechanism to access the object. Objects frequently contain references to other objects, and distributed object technology comes with extensive reference management techniques to support correct object lifetime management.

    This notion of object reference is essential; without it there is no distributed object system. It is also important to realize that with this reference the caller has a mechanism to return to the same object over and over again and thus access the same state. Distributed object systems enable stateful distributed computing. The state of the object is accessed through a well-defined interface that is typically described in an interface definition language (IDL).

    Web services have none of the characteristics of distributed object systems. There is no notion of an object, object reference, factories or life cycle. There is no notion of an interface with methods, data structure serialization or reference garbage collection. The only technology web services have is XML documents and document encapsulation.

    With a bit of a stretch, one could force an analogy between a web service and a singleton object. However, such a singleton object would need to be very restrictive to make the comparison work. At the basic level web services cannot offer any of the state-full distributed computing facilities that distributed objects systems support as basic functionality.

    The difference between the two technologies is also obvious when we look at how information flows between client and server or producer and consumer. In the distributed object system the richness of the information flow is encapsulated in the interfaces an object supports, but in a web services system, the richness of the information flow comes from the design of the XML documents that are passed around.

    Another important difference between these two technologies is in the style of distributed computing that they enable. Distributed object systems enable what is often called stateful computing; the remote object on the server can contain data and state that the client can operate on during the lifetime of the object. If a reference to an object is handed to a different application, that process will encounter the same state when accessing the referenced object. Web services however have no notion of state, and they fall into the category of distributed system techniques that enable stateless computing. In web services the state of the interaction is contained within the documents that are exchanged. Whether a service is ever able to be truly stateless is disputable; if a web service document includes a customer identification number, which the service then uses to retrieve customer information (state) from a database, does this still constitute statelessness? Identifying stateless versus stateful distributed components should be seen as a way of categorizing technologies, more than strict architectural guidance. In the context of this categorization distributed objects and web services are in opposite camps.

    At the basic level web services have no notion of a relationship between two service invocations at the same service or at related services. The distributed systems that can be built without identifying relationships between components in a computation are very limited and as such one of the first advanced web service specifications that was released dealt with Coordination [1]. This enables multiple services and their consumers to establish a context for their interaction. It is a misconception to see this context as a weak form of object references, as it references an ongoing conversation and does not reference any state at the services.

    Distributed object technology is very mature and robust, especially if you restrict its usage to those environments which it has been designed for: the corporate intranet with often homogenous platforms and predictable latencies. The strength of web services is in the internet-style distributed computing, where interoperability and support for heterogeneity in terms of platforms and networks are essential. Over time web services will need to incorporate some of the basic distributed systems technologies that also underpin distributed object systems, such as guaranteed, in-order, exactly-once message delivery. It is unlikely however that web services can simply adapt the technology used in the distributed object systems to achieve the same properties.

    There are two known approaches in which web services and distributed object technologies can work together. First, there is the approach of wrapping certain objects from an object system, such as J2EE, with a web service. This has of course its limitations and cannot be done for just any object. See Steve Vinoski’s article on interaction models [5] to learn more about this approach. A second approach that can be observed is to use web service protocols such as SOAP as the transport layer for the distributed object system. This is sometimes used to tunnel object specific interactions over HTTP. It is however a poor man’s choice as alternative solutions such as GIOP are better suited for that interaction pattern.

    Misconception #2: Web services are just RPC for the Internet

    RPC provides a network abstraction for the remote execution of procedure calls in a programming language. It provides mechanisms for identifying a remote procedure, for deciding which arguments to the procedure are 'in' arguments and as such need to be provided to the remote procedure at invocation time, and which arguments are 'out' arguments and need to be presented to the caller at completion time. It also includes extensive mechanisms for handling errors at both the runtime and the programming level.

    Web services in their basic form provide only a networking abstraction for the transfer of XML documents, and the processing of these documents by a remote service entity. Web services have a notion of ‘actor’ or ‘role’ that identifies the service that should consume the document, but there are no predefined semantics associated with the content of the XML document sent to the service.

    An RPC-style interaction could be implemented using pairs of SOAP messages and a transport such as HTTP. One would use certain fixed rules for encoding the arguments in an XML document and rules for returning the results to the caller.

    The original web service architects assumed that this would be a popular form of using web services and even included a specific encoding in the SOAP specification, called RPC/encoded, to help with the encoding of data types. However, in the SOAP 1.2 specification this encoding has become optional and tool builders are no longer required to implement it; preference is given to the document/literal encoding.

    Even though we like to look at web services as just XML document processors, this doesn’t help the developers who need to build web services and web service clients. Tool vendors will do their best to provide infrastructure that allows traditional procedure calls to be applied to simple web services. For example, Microsoft’s Web Service Enhancements 2.0 toolkit provides a set of object types that can be used to implement a request/response style interaction, where the programming infrastructure tries to interpret the document for the programmer. The toolkit also provides the programmer with a similar set of types that offers simple but powerful support for receiving the raw XML documents.

    Internet-wide RPC has failed to succeed in the past, and web services are not going to be of much help in solving the issues surrounding wide-area RPC. There is no magic in the web services infrastructure that can suddenly overcome what excellent protocol architects were not able to achieve with DCE/RPC or GIOP. Even though web services may solve some of the interoperability issues, they do not solve, for example, the issue that synchronous interaction over a wide area is not scalable, or that versioning procedure interfaces at large scale is extremely difficult.

    Misconception #3: Web Services need HTTP

    Web services are transport-agnostic, meaning that they can be accessed over any type of transport or application protocol. The SOAP protocol, which describes the web service message format, can be used such that messages are transported over HTTP, but can also be used such that messages go over plain TCP and UDP. There are bindings where the messages flow over SMTP by encapsulating SOAP message in an e-mail message, or over a traditional messaging infrastructure such as MQ-Series or JMS. A core scenario of the web services architecture is the case where a message flows over different transport types before it reaches its destination.

    For example, a SOAP request is delivered to an enterprise gateway using HTTP. The gateway then uses a load balancing mechanism to pick one of the nodes of a server farm to process this request and uses a persistent TCP connection to forward the incoming document. In another case a purchase order encapsulated in a SOAP message is delivered using an e-mail message addressed to order-processing@cheapcomputers.com over an SMTP transport. The receiving server will take the SOAP content, encapsulate it in a JMS message and insert it into the order processing workflow system, which might be based on traditional message queuing. The service that actually consumes the SOAP request may not be determined until the message has visited a few intermediate processors that determine whether this is a request that is entitled to ‘gold’ priority treatment and some auditing has taken place. Eventually the requesting process (remember, web services are intended for computer-to-computer conversations, no humans involved) will receive an email message with a confirmation or rejection of the order.

    Even though the web service architecture has been developed with this transport independence in mind, it is true that the majority of the web services in use at this moment run over HTTP. One of the reasons for this is that most of the early web services toolkits made use of the existing infrastructure that the major web servers Apache, IBM WebSphere and Microsoft IIS offered. By leaving the parsing of requests and dispatching of messages to the web server, it was possible to abstract all of the grind of web services away using web-server add-ons such as Axis or ASP.NET. These extensions automatically generate the WSDL for the web service and provide simple service exercising tools, making it a great environment for prototyping and learning web services.

    A second reason for the popularity of implementing web services using HTTP is more strategic. In contrast to the period of the dot.com boom, most enterprise software projects currently require a short-term return on investment. This forces most of the production web service projects to focus on improving the access to the corporate data and services for partners and customers, without requiring too much new infrastructure. The first place this is possible is by using the web servers that are already functioning as front ends to J2EE infrastructure. This approach has become rather successful and should be seen as the first step in the path to a deeper integration of web services in the enterprise.

    There are people suggesting that the main reason for tunneling web service messages through HTTP is to bypass firewalls. If this were indeed the reason, it would be a dangerous approach that would seriously weaken a site’s security, and one should only do this in combination with extensive content-based filtering of the HTTP flows.

    Misconception #4: Web services need web servers

    There has been some discussion that maybe we should actually drop the ‘web’ from web services, as it leads to more confusion and does not contribute to a clear view of the world. This is already becoming obvious in such terms as service-oriented architectures, service-oriented integration, or services bus. None of these enterprise concepts use the term ‘web’, as they do not rely on any web technologies such as HTTP, or web servers.

    There are quite a few toolkits that allow you to develop and integrate web services without the need for a web server infrastructure. Examples are Simon Fell’s PocketSoap, Systinet’s WASP, IBM’s Emerging Technologies Toolkit and Microsoft’s WSE. Enterprise integration systems such as Artix and DocSOAP also provide web-server-independent web service development.

    As explained in the previous section, there has been an initial set of web services that have exploited the application-server functionality of web servers. But now that the initial business case has been made, and a wider choice of transports is required, most systems will move away from implementation inside web servers.

    In the past months a high-profile debate has taken place about applying the principles of REST to web services architectures. REST encompasses some of the techniques that make web infrastructures scalable. There is a lot of value in this debate about the web principles, particularly with respect to resource identification and operation visibility, but it is quickly becoming irrelevant for the bigger picture of web services, given that transport independence is surpassing the importance of the ‘web’ part of web services. The REST principles are relevant for the HTTP binding, and for the web server parsing of resource names, but are useless in the context of TCP or message queue bindings where the HTTP verbs do not apply.

    Misconception #5: Web services are reliable because they use TCP

    TCP is a protocol that guarantees reliable, in-order delivery of messages, so it would appear that web services, if they make use of TCP, can achieve the same guarantees. First of all, the guarantee of reliability is only partially true for TCP programming, as there are a few scenarios under which a message cannot be completely delivered to the remote peer while the local participant has already closed the connection and will not be notified of this error.

    What is more important to realize is that document and message routing for web services provides for the use of intermediaries. In the presence of network, node, and component failures there are quite a few scenarios possible under which the initial delivery of the document to the first station was successful, but where the document will never reach its final destination and thus never gets processed by the service.

    The type of reliability that is important for web services and distributed systems in general is that of end-to-end reliability. There are a lot of established techniques for achieving this type of reliability, and we will see in the coming year whether they can simply be applied to web services or whether new technology is needed. In general, reliability is achieved through the retransmission of messages, but these retransmissions also require you to weed out duplicate messages in case the message was not really lost. Estimating timeouts, etc. in a heterogeneous network such as the internet is not trivial.
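    As a purely illustrative sketch of the duplicate-weeding half of that approach (and nothing more), a receiver can remember the ids of the messages it has already processed:

    ```java
    // A deliberately simplified sketch of the duplicate-weeding half of
    // retransmission-based reliability: every message carries an id, and the
    // receiver remembers which ids it has already processed. Real specifications
    // (WS-ReliableMessaging and friends) add acknowledgements, ordering and
    // timeout estimation; none of that is shown here.
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    public class DuplicateWeedingReceiver {

        private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

        /** Returns true if the document was processed, false if it was a duplicate. */
        public boolean onMessage(String messageId, String document) {
            if (!processedIds.add(messageId)) {
                // The sender retransmitted something we already handled: drop it
                // (and, in a real system, re-send the acknowledgement).
                return false;
            }
            process(document);
            return true;
        }

        private void process(String document) {
            System.out.println("processing document: " + document);
        }
    }
    ```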

    Frequently when you build reliable distributed systems you would like to let some information flow back about the state of the service request processing, such that the producer of the document can take local actions. Giving feedback about the arrival of the document, about the consumption of the document by the service, and the completion of the processing of the request makes building these systems easier.

    In addition to reliability we would also like to make sure that, if the producers care about it, their messages will be consumed by the service in the order they were sent. This puts more stress on the reliability system, because if messages get lost other messages may need to be delayed until the lost message is retransmitted.

    None of these guarantees are new; they have been around for years and have been made to work in all sorts of distributed systems, such as distributed objects systems and multi-party fault-tolerant systems. Web services will need these technologies also, but until they’re added web services should be considered unreliable, whether they use TCP or not.

    Misconception #6: Web services debugging is impossible

    As web services enable the internet-scale type of distributed computing, where frequently the parties in the conversation will belong to different organizations, web service developers and those who have to deploy them are confronted with a whole new set of problems that cannot be handled with traditional debugging and monitoring tools. The federated nature of web services means that most of these new challenges are introduced by not 'owning' both ends of the wire.  Two of the most prominent challenges facing users are cross-vendor interoperability and WSDL versioning.

    Even though traditional tools are of little help with these problems, new web services diagnostic tools such as SOAPscope [4] are emerging to address the development and deployment challenges of web services.  SOAPscope is unique in that it focuses on 'watching' the wire, logging the traffic and providing a suite of functionality to detect and resolve these federation-related and other potential problems.

    The wide variety of web services toolkits being used to develop both web service clients and servers means that it is becoming more common for different toolkits to operate at each end of a SOAP interaction. Each of these toolkits may have interpreted the specification somewhat differently, leading to potential interoperability problems. When a client encounters an obscure error from a server, how does the developer diagnose the problem when they have no access to the code running at the server? The solution available to the developer is to focus on the SOAP traffic on the wire and the WSDL contract between services.

    Tools such as SOAPscope offer several capabilities to help understand and fix interoperability problems. SOAPscope has, for example, 'resend' and 'invoke' features that allow testing 'what-if' scenarios against a server to isolate problem requests. The 'viewing' capabilities allow better understanding of the SOAP messages by visualizing the request at higher levels of abstraction than raw XML. And, to maximize interoperability, the WSDL Analysis detects and helps resolve potential interoperability problems prior to deploying a service.

    Another challenge of web services, which will become increasingly common, is caused by change and versioning of web services. A small change to the XML document specification in the WSDL contract at the server can easily break existing clients. Clients may not even be aware that the document specification has changed, or how to fix their client to accommodate the change. A web service client may start to receive 'faults' from the server which indicate a problem but are seldom useful for resolving the issue. Tools such as SOAPscope can inspect the XML document specification in the current WSDL and compare it with the specification used to create the client.

    These new debugging and deployment tools, which combine historical data with a real-time view of the web service interaction, are extremely powerful aids for web service developers.

    Summary

    There are many misconceptions about web services technology. This is mainly caused by the fact that web service technology is still evolving, even at the most basic level. Many vendors, trade magazines and venture capitalists have already tagged web services as a technology that will trigger a new wave of applications, enabled by federated interoperability. This early exposure has resulted in many incomplete and incorrect publications, frequent releases of toolkits with little or no architectural vision, and different standardization bodies fighting for the right to control the standards underpinning web services. Add to this that many of the vendors who jumped on board to promote web services have a vested stake in web and application servers and/or distributed object technologies, and promote web services only in the context of their flagship technologies.

    This has become fertile ground for many misconceptions. In this article I hope to have clarified a few of those common misconceptions that are important for those who have to reason about web services at the architectural level. It is important that we invest significant effort in education about the unique nature of web services to undo the damage that some of the hype reporting has created.

    References

    1. Felipe Cabrera, et al., Web Services Coordination (WS-Coordination), Joint Specification by BEA, IBM and Microsoft, August 2002, http://www-106.ibm.com/developerworks/library/ws-coor/
    2. Colleen Evans, et al., Web Services Reliability (WS-Reliability) Ver 1.0, Joint Specification by Fujitsu, NEC, Oracle, Sonic Software, and Sun Microsystems, January 2003, http://developers.sun.com/sw/platform/technologies/ws-reliability.html
    3. Christopher Ferris and David Langworthy, editors, Web Services Reliable Messaging Protocol (WS-ReliableMessaging), Joint Specification by BEA, IBM, Microsoft and Tibco  March 2003, http://www-106.ibm.com/developerworks/webservices/library/ws-rm/
    4. Mindreef – Web Services Diagnostics, http://www.mindreef.com
    5. Steve Vinoski, Web Services Interaction Models, Part 1: Current Practice, IEEE Internet Computing,  Vol. 6 No 3, pp 89-91, May/June 2002
    6. World Wide Web Consortium, SOAP Version 1.2 Part 0: Primer, June 2003 http://www.w3.org/TR/soap12-part0/
    Posted by Werner Vogels at August 26, 2003 02:26 PM

    Amazon's SOA strategy: 'just do it'

    June 26th, 2006

    Posted by Joe McKendrick @ 7:55 am

    Categories: Business ROI, Case Studies, General, Vendor Watch, Web Services

    At last week’s  Gartner Enterprise Architecture Summit, Werner Vogels, vice president, worldwide architecture and CTO at Amazon.com, provided some sage advice for SOA and Web service implementers: Detailed advanced planning is nice, but your best bet is to just get out there and start doing it. And keep it simple — very simple.

    This may sound breezy for an operation supporting one million partners and 60 million customers, but thriving at such levels calls for as much flexibility as possible. In Vogels’ keynote, summarized here in SearchWebServices, the Amazon CTO said the online retail giant actually purchased a mainframe to handle transaction loads in 1999, a move he soon regretted. Vogels said the mainframe did not provide the scalability and flexibility required for its growing transaction volumes. (Of course, that was 1999, and the current line of zSeries mainframes are better configured for SOA — but that’s the subject of another post.)

    The solution Amazon arrived at within the next couple of years was to build Web services that could form a transaction layer to handle online business applications, while shielding the retailer’s databases. "We were doing SOA before it was a buzzword," Vogels said.

    "Service orientation works," he said. "We never could have built [Amazon's Linux blade server] platform without service orientation."

    Amazon’s management style is that when a development team creates a Web service, that team is responsible for the testing, ongoing maintenance, and upgrading of that service. Vogels’ philosophy is "you build it; you own it."

    Vogels said he also admonishes Amazon developers to keep services as simple as possible, and don’t get attached to any one technology or standard. And, though Amazon is known for its REST-based services, Vogels sidesteps any simmering controversy. It doesn’t matter if a partner uses REST or SOAP, he pointed out. "Our developers don’t care if it’s REST or SOAP. It’s all about customers," he said.

    Amazon Architecture

    This is a wonderfully informative Amazon update based on Joachim Rohde's discovery of an interview with Amazon's CTO. You'll learn about how Amazon organizes their teams around services, the CAP theorem of building scalable systems, how they deploy software, and a lot more. Many new additions from the ACM Queue article have also been included.

    Amazon grew from a tiny online bookstore to one of the largest stores on earth. They did it while pioneering new and interesting ways to rate, review, and recommend products. Greg Linden shared his version of Amazon's birth pangs in a series of blog articles.

    Site: http://amazon.com

    Information Sources

    • Early Amazon by Greg Linden
    • How Linux saved Amazon millions
    • Interview Werner Vogels - Amazon's CTO
    • Asynchronous Architectures - a nice summary of Werner Vogels' talk by Chris Loosley
    • Learning from the Amazon technology platform - A Conversation with Werner Vogels
    • Werner Vogels' Weblog - building scalable and robust distributed systems

      Platform

    • Linux
    • Oracle
    • C++
    • Perl
    • Mason
    • Java
    • Jboss
    • Servlets

      The Stats

    • More than 55 million active customer accounts.
    • More than 1 million active retail partners worldwide.
    • Between 100-150 services are accessed to build a page.

      The Architecture

    • What is it that we really mean by scalability? A service is said to be scalable if, when we increase the resources in a system, performance increases in proportion to the resources added. Increasing performance in general means serving more units of work, but it can also mean handling larger units of work, such as when datasets grow.

    • The big architectural change that Amazon made was to move from a two-tier monolith to a fully-distributed, decentralized, services platform serving many different applications.
    • Started as one application talking to a back end. Written in C++.
    • It grew. For years the scaling efforts at Amazon focused on making the back-end databases scale to hold more items, more customers, more orders, and to support multiple international sites. In 2001 it became clear that the front-end application couldn't scale anymore. The databases were split into small parts, and around each part they created a services interface that was the only way to access the data.
    • The databases became a shared resource that made it hard to scale-out the overall business. The front-end and back-end processes were restricted in their evolution because they were shared by many different teams and processes.
    • Their architecture is loosely coupled and built around services. A service-oriented architecture gave them the isolation that would allow building many software components rapidly and independently.
    • Grew into hundreds of services and a number of application servers that aggregate the information from the services. The application that renders the Amazon.com Web pages is one such application server. So are the applications that serve the Web-services interface, the customer service application, and the seller interface.
    • Many third party technologies are hard to scale to Amazon size. Especially communication infrastructure technologies. They work well up to a certain scale and then fail. So they are forced to build their own.
    • Not stuck with one particular approach. In some places they use JBoss/Java, but they use only servlets, not the rest of the J2EE stack.
    • C++ is used to process requests. Perl/Mason is used to build content.
    • Amazon doesn't like middleware because it tends to be a framework and not a tool. If you use a middleware package you get locked into the software patterns it has chosen. You'll only be able to use its software, so if you want to use different packages you won't be able to. You're stuck. One event loop for messaging, data persistence,
      AJAX, etc. Too complex. If middleware were available in smaller components, more as a tool than a framework, they would be more interested.
    • The SOAP web stack seems to want to solve all the same distributed systems problems all over again.
    • Offer both SOAP and REST web services. 30% use SOAP. These tend to be Java and .NET users and use WSDL files to generate remote object interfaces. 70% use REST. These tend to be PHP or PERL users.
    • With either SOAP or REST, developers can get an object interface to Amazon. Developers just want to get the job done. They don't care what goes over the wire.
    • Amazon wanted to build an open community around their services. Web services were chosen because they're simple. But that's only on the perimeter. Internally it's a service-oriented architecture. You can only access the data via the interface. It's described in WSDL, but they use their own encapsulation and transport mechanisms.
    • Teams are Small and are Organized Around Services
      - Services are the independent units delivering functionality within Amazon. It's also how Amazon is organized internally in terms of teams.
      - If you have a new business idea or problem you want to solve you form a team. Limit the team to 8-10 people because communication is hard. They are called two-pizza teams: the number of people you can feed with two pizzas.
      - Teams are small. They are assigned authority and empowered to solve a problem as a service in anyway they see fit.
      - As an example, they created a team to find phrases within a book that are unique to the text. This team built a separate service interface for that feature and they had authority to do what they needed.
      - Extensive A/B testing is used to integrate a new service. They see what the impact is and take extensive measurements.
    • Deployment
      - They create special infrastructure for managing dependencies and doing a deployment.
      - The goal is to have all the right services deployed on a box. All application code, monitoring, licensing, etc. should be on a box.
      - Everyone has a home grown system to solve these problems.
      - Output of deployment process is a virtual machine. You can use EC2 to run them.
    • Work From the Customer Backwards to Verify a New Service is Worth Doing
      - Work from the customer backward. Focus on value you want to deliver
      for the customer.
      - Force developers to focus on value delivered to the customer instead of building technology first and then figuring how to use it.
      - Start with a press release of what features the user will see and work backwards to check that you are building something valuable.
      - End up with a design that is as minimal as possible. Simplicity is the key if you really want to build large distributed systems.
    • State Management is the Core Problem for Large Scale Systems
      - Internally they can deliver infinite storage.
      - Not all that many operations are stateful. Checkout steps are stateful.
      - Most recent clicked web page service has recommendations based on session IDs.
      - They keep track of everything anyway so it's not a matter of keeping state. There's little separate state that needs to be kept for a session. The services will already be keeping the information so you just use the services.
    • Eric Brewer's CAP Theorem or the Three properties of Systems
      - Three properties of a system: consistency, availability, tolerance to network partitions.
      - You can have at most two of these three properties for any shared-data system.
      - Partitionability: divide nodes into small groups that can see other groups, but they can't see everyone.
      - Consistency: if you write a value and then read the value, you get the same value back. In a partitioned system there are windows where that's not true.
      - Availability: may not always be able to write or read. The system will say you can't write because it wants to keep the system consistent.
      - To scale you have to partition, so you are left with choosing either high consistency or high availability for a particular system. You must find the right overlap of availability and consistency.
      - Choose a specific approach based on the needs of the service.
      - For the checkout process you always want to honor requests to add items to a shopping cart because it's revenue producing. In this case you choose high availability. Errors are hidden from the customer and sorted out later.
      - When a customer submits an order you favor consistency because several services--credit card processing, shipping and handling, reporting--are simultaneously accessing the data.

      Lessons Learned

    • You must change your mentality to build really scalable systems. Approach chaos in a probabilistic sense, accepting that things will mostly work well. Traditional systems present a perfect world where nothing goes down, and then we build complex algorithms (agreement technologies) on top of that perfect world. Instead, take it for granted that stuff fails; that's reality, embrace it. For example, go more with a fast-reboot and fast-recover approach. With a decent spread of data and services you might get close to 100%. Create self-healing, self-organizing, lights-out operations.

    • Create a shared-nothing infrastructure. Infrastructure can become a shared resource for development and deployment with the same downsides as shared resources in your logic and data tiers. It can cause locking, blocking and deadlock. A service-oriented architecture allows the creation of a parallel and isolated development process that scales feature development to match your growth.

    • Open up your system with APIs and you'll create an ecosystem around your application.

    • The only way to manage a large distributed system is to keep things as simple as possible. Keep things simple by making sure there are no hidden requirements and hidden dependencies in the design. Cut technology to the minimum you need to solve the problem you have. It doesn't help the company to create artificial and unneeded layers of complexity.

    • Organizing around services gives agility. You can do things in parallel because the output is a service. This allows fast time to market. Create an infrastructure that allows services to be built very fast.

    • There are bound to be problems with anything that produces hype before real implementation.

    • Use SLAs internally to manage services.

    • Anyone can very quickly add web services to their product. Just implement one part of your product as a service and start using it.

    • Build your own infrastructure for performance, reliability, and cost-control reasons. By building it yourself you never have to say you went down because it was company X's fault. Your software may not be more reliable than others', but you can fix, debug, and deploy much quicker than when working with a third party.

    • Use measurement and objective debate to separate the good from the bad. I've been to several presentations by ex-Amazoners and this is the aspect of Amazon that strikes me as uniquely different and interesting from other companies. Their deep-seated ethic is to expose real customers to a choice, see which one works best, and make decisions based on those tests.

      Avinash Kaushik calls this getting rid of the influence of the HiPPOs, the highest paid people in the room. This is done with techniques like A/B testing and web analytics. If you have a question about what you should do, code it up, let people use it, and see which alternative gives you the results you want.

    • Create a frugal culture. Amazon used doors for desks, for example.

    • Know what you need. Amazon has a bad experience with an early recommender system that didn't work out: "This wasn't what Amazon needed. Book recommendations at Amazon needed to work from sparse data, just a few ratings or purchases. It needed to be fast. The system needed to scale to massive numbers of customers and a huge catalog. And it needed to enhance discovery, surfacing books from deep in the catalog that readers wouldn't find on their own."

    • People's side projects, the ones they pursue because they are interested, are often the ones where you get the most value and innovation. Never underestimate the power of wandering where you are most interested.

    • Involve everyone in making dog food. Go out into the warehouse and pack books during the Christmas rush. That's teamwork.

    • Create a staging site where you can run thorough tests before releasing into the wild.

    • A robust, clustered, replicated, distributed file system is perfect for read-only data used by the web servers.

    • Have a way to rollback if an update doesn't work. Write the tools if necessary.

    • Switch to a deep services-based architecture (http://webservices.sys-con.com/read/262024.htm).

    • Look for three things in interviews: enthusiasm, creativity, competence. The single biggest predictor of success at Amazon.com was enthusiasm.

    • Hire a Bob. Someone who knows their stuff, has incredible debugging skills and system knowledge, and most importantly, has the stones to tackle the worst high pressure problems imaginable by just leaping in.

    • Innovation can only come from the bottom. Those closest to the problem are in the best position to solve it. Any organization that depends on innovation must embrace chaos. Loyalty and obedience are not your tools.

    • Creativity must flow from everywhere.

    • Everyone must be able to experiment, learn, and iterate. Position, obedience, and tradition should hold no power. For innovation to flourish, measurement must rule.

    • Embrace innovation. In front of the whole company, Jeff Bezos would give an old Nike shoe as "Just do it" award to those who innovated.

    • Don't pay for performance. Give good perks and high pay, but keep it flat. Recognize exceptional work in other ways. Merit pay sounds good but is almost impossible to do fairly in large organizations. Use non-monetary awards, like an old shoe. It's a way of saying thank you, somebody cared.

    • Get big fast. The big guys like Barnes and Noble are on your tail. Amazon wasn't even the first, second, or third bookstore on the web, but their vision and drive won out in the end.

    • In the data center, only 30 percent of staff time spent on infrastructure issues is related to value creation; the remaining 70 percent is devoted to the "heavy lifting" of hardware procurement, software management, load balancing, maintenance, scalability challenges and so on.

    • Prohibit direct database access by clients. This means you can make your service scale and be more reliable without involving your clients. This is much like Google's ability to independently distribute improvements in their stack to the benefit of all applications.

    • Create a single unified service-access mechanism. This allows for the easy aggregation of services, decentralized request routing, distributed request tracking, and other advanced infrastructure techniques.

    • Making Amazon.com available through a Web services interface to any developer in the world free of charge has also been a major success because it has driven so much innovation that they couldn't have thought of or built on their own.

    • Developers themselves know best which tools make them most productive and which tools are right for the job.

    • Don't impose too many constraints on engineers. Provide incentives for some things, such as integration with the monitoring system and other infrastructure tools. But for the rest, allow teams to function as independently as possible.

    • Developers are like artists; they produce their best work if they have the freedom to do so, but they need good tools. Have many support tools that are of a self-help nature. Support an environment around the service development that never gets in the way of the development itself.

    • You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service.

    • Developers should spend some time with customer service every two years. There they'll actually listen to customer service calls, answer customer service e-mails, and really understand the impact of the kinds of things they do as technologists.

    • Use a "voice of the customer," which is a realistic story from a customer about some specific part of your site's experience. This helps managers and engineers connect with the fact that we build these technologies for real people. Customer service statistics are an early indicator if you are doing something wrong, or what the real pain points are for your customers.

    • Infrastructure for Amazon, like for Google, is a huge competitive advantage. They can build very complex applications out of primitive services that are by themselves relatively simple. They can scale their operation independently, maintain unparalleled system availability, and introduce new services quickly without the need for massive reconfiguration.

    Resty

    Resty is a tiny script wrapper for curl. It provides a simple, concise shell interface for interacting with REST services. Since it is implemented as functions in your own shell and not in its own command environment you have access to all the powerful shell tools, such as perl, awk, grep, sed, etc. You can use resty in pipelines to process data from REST services, and PUT or POST the data right back. You can even pipe the data in and then edit it interactively in your text editor prior to PUT or POST.

    Cookies are supported automatically and stored in a file locally. Most of the arguments are remembered from one call to the next to save typing. It has pretty good defaults for most purposes. Additionally, resty allows you to easily provide your own options to be passed directly to curl, so even the most complex requests can be accomplished with the minimum amount of command line pain.

    Quick Start

    You have curl, right? Okay.

      curl http://github.com/micha/resty/raw/master/resty > resty
    
    

    Source the script before using it. (You can put this line in your ~/.bashrc file if you want, or just paste the contents of the resty script right in there. Either way works.)

      . resty
    
    

    Set the REST host to which you will be making your requests (you can do this whenever you want to change hosts, anytime).

      resty http://127.0.0.1:8080/data
    
    

    Make some HTTP requests.

      GET /blogs.json
      PUT /blogs/2.json '{"title" : "updated post", "body" : "This is the new."}'
      DELETE /blogs/2
      POST /blogs.json '{"title" : "new post", "body" : "This is the new new."}'
    
    

    Usage

      resty                                   # prints current request URI base
      resty <remote>                          # sets the base request URI
      GET [path] [-Z] [curl opts]             # does the GET request 
      DELETE [path] [-Z] [curl opts]          # does DELETE request 
      PUT [path] [data|-V] [-Z] [curl opts]   # does PUT request
      POST [path] [data|-V] [-Z] [curl opts]  # does POST request
    
      Options:
    
      -V            Edit the input data interactively in 'vi'. (PUT and POST
                    requests only, with data piped to stdin.)
      -Z            Raw output. This disables any processing of HTML in the
                    response.
    
    

    Request URI Base

    The request URI base is the URI on which the eventual request URIs are based. Specifically, it is a URI that may contain the * character one or more times. The * will be replaced with the path parameter in the GET, POST, PUT, or DELETE request as described above.

    For example:

      resty 'http://127.0.0.1:8080/data*.json'
    
    

    and then

      GET /5
    
    

    would result in a GET request to the URI http://127.0.0.1:8080/data/5.json.

    If no * character is specified when setting the base URI, it's just added onto the end for you automatically.

    URI Base History

    The URI base is saved to an rc file (~/.resty/host) each time it's set, and the last setting is saved in an environment variable ($_resty_host). The URI base is read from the rc file when resty starts up, but only if the $_resty_host environment variable is not set. In this way you can make requests to different hosts using resty from separate terminals, and have a different URI base for each terminal.

    If you want to see what the current URI base is, just run resty with no arguments. The URI base will be printed to stdout.
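
    For example (the printed base shown here is illustrative, assuming the host set in the quick start above):

      resty
      http://127.0.0.1:8080/data*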

    The Optional Path Parameter

    The first argument to the HTTP verbs (GET, POST, PUT, and DELETE) is always an optional URI path. This path must always start with a / character. If the path parameter is not provided on the command line, resty will just use the last path it was provided with. This "last path" is stored in an environment variable ($_resty_path), so each terminal basically has its own "last path".
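
    A minimal illustration of the "last path" behaviour (the resource path here is hypothetical):

      GET /blogs/5.json            # sets the last path to /blogs/5.json
      GET                          # no path given, so /blogs/5.json is requested again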

    URL Encoding Of Path Parameter

    Resty will always URL encode the path (see http://www.blooberry.com/indexdot/html/topics/urlencoding.htm), except for slashes. (Slashes in path elements need to be manually encoded as %2F.) This means that the ?, =, and & characters will be encoded, as well as some other problematic characters. See the query string howto below for the way to send query parameters in GET requests.
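
    For example (the paths here are hypothetical), characters that would normally start a query string end up encoded into the path instead:

      GET '/Something?foo=bar'       # the ? and = are encoded, so no query string is sent
      GET /Something -d foo=bar -G   # send a real query string via curl options instead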

    POST/PUT Requests and Data

    Normally you would probably want to provide the request body data right on the command line like this:

      PUT /blogs/5.json '{"title" : "hello", "body" : "this is it"}'
    
    

    But sometimes you will want to send the request body from a file instead. To do that you pipe in the contents of the file:

      PUT /blogs/5.json < /tmp/t
    
    

    Or you can pipe the data from another program, like this:

      myprog | PUT /blogs/5.json
    
    

    Or, interestingly, as a filter pipeline with jsawk:

      GET /blogs/5.json | jsawk 'this.author="Bob Smith";this.tags.push("news")' | PUT
    
    

    Notice how the path argument is omitted from the PUT command.

    Edit PUT/POST Data In Vi

    With the -V option you can pipe data into PUT or POST, edit it in vi, save the data (using :wq in vi, as normal) and the resulting data is then PUT or POSTed. This is similar to the way visudo works, for example.

      GET /blogs/2 | PUT -V
    
    

    This fetches the data and lets you edit it, and then does a PUT on the resource. If you don't like vi you can specify your preferred editor by setting the EDITOR environment variable.

    Errors and Output

    For successful 2xx responses, the response body is printed on stdout. You can pipe the output to stuff, process it, and then pipe it back to resty, if you want.

    For responses other than 2xx the response body is dumped to stderr.

    In either case, if the content type of the response is text/html, then resty will try to process the response through either lynx, html2text, or, finally, cat, depending on which of those programs are available on your system.
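
    A small sketch of separating the two streams with ordinary shell redirection (the paths are hypothetical):

      GET /blogs/5.json > body.json      # a 2xx response body lands on stdout
      GET /missing.json 2> error.txt     # a non-2xx response body lands on stderr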

    Raw Output (-Z option)

    If you don't want resty to process the output through lynx or html2text you can use the -Z option, and get the raw output.
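
    For example (hypothetical path):

      GET /blogs/5 -Z              # print the body as-is, with no lynx/html2text processing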

    Passing Command Line Options To Curl

    Anything after the (optional) path and data arguments is passed on to curl.

    For example:

      GET /blogs.json -H "Range: items=1-10"
    
    

    The -H "Range: items=1-10" argument will be passed to curl for you. This makes it possible to do some more complex operations when necessary.

      POST -v -u user:test
    
    

    In this example the path and data arguments were left off, but -v and -u user:test will be passed through to curl, as you would expect.

    Here are some useful options to try:

    • -v verbose output, shows HTTP headers and status on stderr
    • -j junk session cookies (refresh cookie-based session)
    • -u <username:password> HTTP basic authentication
    • -H <header> add request header (this option can be added more than once)
    • -d/-G send query string parameters with a GET request (see below)

    Query Strings For GET Requests

    Since the path parameter is URL encoded, the best way to send query parameters in GET requests is by using curl's command line arguments. For example, to make a GET request to /Something?foo=bar&baz=baf you would do:

      GET /Something -d foo=bar -d baz=baf -G
    
    

    This sends the name/value pairs specified with the -d options as a query string in the URL.

    Per-Host/Per-Method Curl Configuration Files

    Resty supports a per-host/per-method configuration file to help you with frequently used curl options. Each host (including the port) can have its own configuration file in the ~/.resty directory. The file format is

      GET [arg] [arg] ...
      PUT [arg] [arg] ...
      POST [arg] [arg] ...
      DELETE [arg] [arg] ...
    
    

    Where the args are curl command line arguments. Each line can specify arguments for that HTTP verb only, and all lines are optional.

    So, suppose you find yourself using the same curl options over and over. You can save them in a file and resty will pass them to curl for you. Say this is a frequent pattern for you:

      resty localhost:8080
      GET /Blah -H "Accept: application/json"
      GET /Other -H "Accept: application/json"
      ...
      POST /Something -H "Content-Type: text/plain" -u user:pass
      POST /SomethingElse -H "Content-Type: text/plain" -u user:pass
      ...
    
    

    It's annoying to add the -H and -u options to curl all the time. So create a file ~/.resty/localhost:8080, like this:

    ~/.resty/localhost:8080

      GET -H "Accept: application/json"
      POST -H "Content-Type: text/plain" -u user:pass
    
    

    Then any GET or POST requests to localhost:8080 will have the specified options prepended to the curl command line arguments, saving you from having to type them out each time, like this:

      GET /Blah
      GET /Other
      ...
      POST /Something
      POST /SomethingElse
      ...
    
    

    Sweet! Much better.

    Exit Status

    Successful requests (HTTP response with 2xx status) return zero. Otherwise, the first digit of the response status is returned (i.e., 1 for 1xx, 3 for 3xx, 4 for 4xx, etc.). This is because the exit status is an 8-bit integer---it can't be greater than 255. If you want the exact status code you can always just pass the -v option to curl.
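
    For example, you can check the exit status from the shell as usual (the path is hypothetical):

      GET /blogs/5.json > /dev/null
      echo $?                      # 0 for a 2xx response, 4 for a 4xx response, and so on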

    Working With JSON

    JSON REST web services require some special tools to make them accessible and easily manipulated in the shell environment. The following are a few scripts that make dealing with JSON data easier.

    • Jsawk can be used to process and filter JSON data from and to resty, in a shell pipeline. This takes care of parsing the input JSON correctly, rather than using regexes and sed, awk, perl or the like, and prints the resulting output in correct JSON format, as well.

      GET /blogs.json | jsawk -n 'out(this.title)' # prints all the blog titles

    • The included pp script will pretty-print JSON for you. You just need to install the JSON perl module from CPAN or you can use pypp if you have python 2.6 installed.

      GET /blogs.json | pp # pretty-prints the JSON output from resty