httpie - Python-powered HTTP CLI for humans
Wynn Netherland posted this 2 days ago

Although cURL is great, we’re always looking for great console tools for working with HTTP. Jakub Roztocil has released HTTPie. Built on Requests, HTTPie provides a clean command line interface for HTTP requests:

http PATCH api.example.com/person/1 X-API-Token:123 name=John email=john@example.org

PATCH /person/1 HTTP/1.1
User-Agent: HTTPie/0.1
X-API-Token: 123
Content-Type: application/json; charset=utf-8

{"name": "John", "email": "john@example.org"}

I appreciate the colored terminal output:

HTTPie output

Source on GitHub.

Web API Versioning Smackdown

Tuesday, 25 October 2011

A lot of bits have been used over on the OpenStack list recently about versioning the HTTP APIs they provide.

This over-long and rambling post summarises my current thoughts on the topic, both as background for that discussion, as well as for review in the wider community.

The Warm-up: Software vs. Web Versioning

Developers are used to software versioning; e.g., for every release, you bump an identifier. There are usually major versions, minor versions, and sometimes things like package identifiers.

This fine level of granularity is useful to both developers and users; each of these things has precise semantics that helps in figuring out compatibility and debugging.

For example, on my Fedora box, I can do:

cloud:~> yum -q list installed httpd
Installed Packages
httpd.x86_64    2.2.17-1.fc14    @updates

… and I’ll know that Apache httpd version 2.2.17 is installed, and it’s the first package of that version for Fedora 14.

This lets me know that any modules I want to use with the server will need to work with Apache 2.2; and, that if there are security bugs found in httpd 2.2.15, I’m safe. Furthermore, when I install software that depends upon Apache, it can specify a specific version — and even packaging — to require, so that if it wants to avoid specific bugs, or require specific features, it can.

These are good and useful things to use software versioning for; it’s evolved into best practice that’s pretty well-understood. See, for example, Fedora’s package versioning guidelines.

However, they don’t directly apply to versioning on the Web. While there are similar use cases — e.g., maintaining compatibility, enabling debugging, dependency control — the mechanisms are completely different.

For example, if you throw such a version identifier into your URI, like this:

http://api.example.com/v2.2.17-1.fc14/things/foo

then every time you make a minor change to your software, you’ll be minting an entire new set of resources on the Web;

http://api.example.com/v2.2.17-2.fc14/things/foo

Moreover, you’ll need to still support the old ones for old clients, so you’ll have a massive footprint of URIs to support. Now consider what this does to caches in the middle; they have to maintain duplicates of the same thing — because it’s unlikely that foo has changed, but it can’t be sure — and your cache hit rate goes down.

Likewise, anybody holding onto a link from the previous version of the API has to decide what to do with it going forward; while they can guess that there’ll be compatibility between the two versions, they can’t really be sure, and they’ll still need to be rewriting a bunch of APIs.

In other words, just sticking software versions into Web URLs removes a lot of the value we get from using HTTP, and if you do this, you might as well be using a ‘dumb’ RPC protocol.

So what does work, on the Web?

The answer is that there is no one answer; there are lots of different mechanisms in HTTP to meet the goals that people have for versioning.

However, there is an underlying principle to almost any kind of versioning on the Web; not breaking existing clients.

The reasoning is simple; once you publish a Web API, people are going to start writing software that relies upon it, and every time you introduce a change, you introduce the potential to break them. That means that changes have to happen in predictable and well-understood ways.

For example, if you start using the Foo HTTP header, you can’t change its semantics or syntax afterwards. Even fixing bugs in how it works can be tricky, because clients will start to work around the bugs, and when you change things, you break the workarounds.

In other words, good mechanisms are extensible, so that you can introduce change without wiping the slate clean, and it means that any change that doesn’t fit into an extension needs to use a new identifier, so it doesn’t confuse clients expecting the old behaviour.

So, if you want to change the semantics of that Foo header, you can either take advantage of extensibility (if it allows it; see the Cache-Control header’s extensibility policy for a great example), or you have to introduce another header, e.g., Foo2.
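
For instance, a response can mix a standard directive with an extension directive, and caches that don’t understand the extension simply ignore it (the extension directive name below is made up purely for illustration):

Cache-Control: max-age=3600, example-ext=foo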

This approach extends to lots of other things, whether they be media types, URI parameters, or potentially URIs themselves (see below).

Because of this, versioning is something that should not take place often, because every time you change a version identifier, you’re potentially orphaning clients who “speak” that language.

The fundamental principle is that you can’t break existing clients, because you don’t know what they implement, and you don’t control them. Instead, you need to turn a backwards-incompatible change into a compatible one.

This implies that API versioning absolutely cannot be tied to software versioning in any way; doing so will needlessly limit (and often break) your clients, and generally upset people.

There’s an interesting effect to observe here, by the way; this approach to versioning is inherently non-linear. In other words, every time you mint a new identifier, you’re minting a fundamentally new thing, whether it be a HTTP header, a format identified by a media type, or a URI. You might as well use “foo” and “bar” as “v1” and “v2”. In some ways, that’s preferred, because people read so much into numbers (especially when there are decimal points involved).

The tricky part, as we’ll see in a bit, is what identifiers you nominate to pivot interoperability around.

An Aside: Debugging with Product Tokens

So, if you don’t put minor version information into URIs, media types and other identifiers, how do you debug when you have an implementation-specific problem? How do you track these minor changes?

HTTP’s answer to this is product tokens. They appear in things like the User-Agent, Server and Via headers, and allow software to identify itself, without surfacing minor versioning and packaging information into the protocol’s “core” identifiers (whether it’s a URI, a media type, an HTTP header, or whatever).

These sorts of versions are free — or even encouraged, modulo the security considerations — to contain fine-grained identifiers for what version, package, etc. of software is running. It’s what they’re for.

The Main Event: Resource Versioning

All of that said, the question remains of how to manage change in your Web application’s interface. These changes can be divided into two rough categories; representation format changes and resource changes.

Representation format changes have been covered fairly well by others (e.g., Dave), and they’re both simple and maddeningly complex. In a nutshell, don’t make backwards-incompatible changes, and if you do, change the media type.

JSON makes this easier than XML, because it has both a simpler metamodel, as well as a default mustIgnore rule.

Resource changes are what I’m more interested in here. This is doing things like adding new methods, changing the URIs that clients use (including query parameters and their semantics), and so forth.

Again, many (if not most) changes to resources can be accommodated by turning them into backwards-compatible changes. For example, rather than bumping a version when you want to modify how a resource handles query parameters, you mint a new, sibling resource with a different name that takes the alternate query parameters.
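
For example (hypothetical URIs), rather than changing how an existing search resource interprets its parameters, you might leave it alone and add a sibling resource with the new behaviour:

http://api.example.com/things/search?q=foo
http://api.example.com/things/filtered_search?term=foo&lang=en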

However, there comes a time when you need to “wipe the slate clean.” Perhaps it’s because your API has become overburdened with such add-on resources, or you’ve got some new insights into your problem that benefit from a fresh sheet. Then, it’s time to introduce a new API version (which again, shouldn’t happen often). The question is, “how?”

In this Corner: URI Versioning

The most widely accepted way to version the resources of a Web API currently is in the URI. A typical example might be:

http://api.example.com/v1/things/foo

Here, the first path segment is a major version identifier, and when it changes, everything under it does as well. Therefore, the client needs to decide what version of the API it wants to interact with; there isn’t any correlation between URIs in v1 and v2, for example.

So, even if you have:

http://api.example.com/v2/things/foo

There isn’t necessarily any correlation between the two URIs. This is important, because it gives you that clean slate; if there were correlation between v1 and v2 URIs, you’d be tying your hands in terms of what you could do in v2 (and beyond).

You can see evidence of this in lots of popular Web APIs out there; e.g., Twitter and Yahoo.

However, it’s not necessary to have that version number in there. Consider Facebook; their so-called old REST API has been deprecated in favour of their new Graph API. Neither has “v1” or “v2” in them; rather, they just use the hostname to name space the different interfaces (“api.facebook.com” vs. “graph.facebook.com”). Old clients are still supported, and new clients can get new functionality; they just called their new version something less boring than “v2”.

Fundamentally, this is how the Web works, and there’s nothing wrong with this approach, whether you use “v1” and “v2” or “foo” and “bar” — although I think there’s less confusion inherent in the latter approach.

The Contender: HATEOAS

However, there is one lingering concern that gets tied up into this; people assume — very reasonably — that when you document a set of URIs and ship them as a version of an interface, clients can count on those URIs being useful.

This violates a core REST principle called “Hypertext As The Engine of Application State”, or HATEOAS for short.

RESTafarians have long searched for signs of HATEOAS in Web APIs, and Roy has lamented its absence in the majority of them.

Tying your clients into a pre-set understanding of URIs tightly couples the client implementation to the server; in practice, this makes your interface fragile, because any change can inadvertently break things, and people tend to like to change URIs over time.

In a HATEOAS approach to an API, you’d define everything in terms of media types (what formats you accept and produce) and link relations (how the resources producing those representations are related).

This means that your first interaction with an interface might look like this:

GET / HTTP/1.1
Host: api.example.com
Accept: application/vnd.example.link_templates+json

HTTP/1.1 200 OK
Content-Type: application/vnd.example.link_templates+json
Cache-Control: max-age=3600
Connection: close

{
  "account": "http://accounts.example.com/{account_id}",
  "server": "/servers/{server_id}",
  "image": "https://images.example.com/{image_id}"
}

Please don’t read too much into this representation; it’s just a sketch. The important thing is that the client uses information from the server to dynamically generate URIs at runtime, rather than baking them into the implementations.
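
As a toy illustration (not from the post; the entry-point URL and relation names simply follow the sketch above, and a real client would use a proper RFC 6570 URI Template library), a client might fetch the link-template document and expand a template at runtime instead of hard-coding URIs:

import json
import urllib.request

entry = "http://api.example.com/"
req = urllib.request.Request(
    entry, headers={"Accept": "application/vnd.example.link_templates+json"}
)
with urllib.request.urlopen(req) as resp:
    templates = json.load(resp)

# Expand the "server" template for a particular id; the client only needs to know
# the relation name and the template variable, never the URI layout itself.
server_url = templates["server"].replace("{server_id}", "42")
print(server_url)   # e.g. /servers/42, relative to the entry point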

All of the semantics are baked into those link relations — they should probably be URIs if they’re not registered, by the way — and in the formats produced. URIs are effectively semantic-free.

This gives a LOT of flexibility in the implementation; the client can choose which resources to use based upon the link relations it understands, and changes are introduced by adding new link relations, rather than new URIs (although that’s likely to be a side effect). The URIs in use are completely under control of the server, and can be arranged at will.

In this manner, you don’t need a different URI for your interface, ever, because the entry point is effectively used for agent-driven content negotiation.

The downsides? This approach requires clients to make requests to discover URIs, and not to take shortcuts. It’s therefore chatty — a fairly damning condemnation.

However, notice the all-important Cache-Control header in that response; it may be chatty without caching, but if the client caches, it’s not that bad at all.

The main issues with going HATEOAS for your API, then, are the requirements it places upon clients. If client-side HTTP tools were more widely capable, this wouldn’t be a big deal, but currently you can only assume a very low-level, bare HTTP API without caching, so it does place a lot of responsibility on your client developers’ shoulders — not a good thing, since there are usually many more of them than there are server-side.

So, there are arguments for and against HATEOAS, and one could say the trade-offs are somewhat balanced; both are at least reasoned positions. However, there’s one more thing…

Enter Extensibility

Extensibility and Versioning are the peanut butter and jelly of protocol engineering. Sure, my kids’ cohort in Australian primary schools are horrified by this combination, but stay with me.

OpenStack has an especially nasty extensibility problem; they allow vendors to add pretty much arbitrary things to the protocol, from new resources to new representations, as well as extensions inside their existing formats.

Allowing such freedom with “baked-in” URIs is hard. You have to carve out extension prefixes to avoid collisions, and then hope that that’s good enough. For example, what if an API uses URIs like this:

http://api.example.com/users/{userid}

and HP wants to add a new subresource to the users collection? Does it become

http://api.example.com/users/hp

? No, that’s bad, because then no userid can be “hp”, and special cases are evil, especially when they’re under the control of others.

You could do:

http://api.example.com/users/ext/hp

and special-case only one thing, “ext”, but that’s pretty nasty too, especially when you can still potentially add “hp” to any point in the URI tree.

Instead, if you take a HATEOAS approach, you push extensibility into link relations, so that you have something like:

GET / HTTP/1.1
Host: api.example.com
Accept: application/vnd.example.link_templates+json

HTTP/1.1 200 OK
Content-Type: application/vnd.example.link_templates+json
Cache-Control: max-age=3600
Connection: close

{
  "users": "http://api.example.com/users/{userid}",
  "hp-user-stuff": "http://api.example.com/users/{userid}/stuff"
}

Now, the implementation has full control over the URIs used for extensions, and it’s responsible for avoiding collisions. All that HP (or anyone else wanting an extension) has to do is mint a new link relation type, and describe what it points to (using existing or new media types).

This isn’t the whole extensibility story, of course; format extensions are independent of URIs, for example. However, the freedom of extensibility that taking a HATEOAS approach gives you is too good to pass up, in my estimation.

The key insight here, I think, is that URIs are used for so many things — persistent identifiers, cache keys, bases for relative resolution, bookmarks — that overloading them with versioning and extensibility information as well makes them worse for all of their various purposes. By pushing these concerns into link relations and media types using HATEOAS, you end up with a flexible, future-proof system that can evolve in a controllable way, without giving up the benefits of using HTTP (never mind REST).


Filed under: Cloud, HTTP, Protocol Design

nginx: HttpLimitReqModule


Synopsis

This module allows you to limit the number of requests for a given session or, as a special case, from a single address.

Limiting is done using the "leaky bucket" method.

Example Configuration

http {
    limit_req_zone  $binary_remote_addr  zone=one:10m   rate=1r/s;

    ...

    server {

        ...

        location /search/ {
            limit_req   zone=one  burst=5;
        }
    }
}

Directives

Syntax: limit_req_log_level info|notice|warn|error

Default: warn

Context: http

Controls the log level of rejected requests. Delayed requests are logged at the next less severe level; for example, when limit_req_log_level is set to "error", delayed requests are logged at "warn".


Syntax: limit_req_zone $session_variable zone=name_of_zone:size rate=rate

Default: none

Context: http

The directive defines a zone that stores session state. The session key is determined by the given variable. Example of usage:

limit_req_zone  $binary_remote_addr  zone=one:10m   rate=1r/s;

In this case, 10 MB is allocated for a zone called "one", and the average request rate for this zone is limited to 1 request per second.

The sessions are tracked per-user in this case, but note that instead of the variable $remote_addr, we've used the variable $binary_remote_addr, reducing the size of the state to 64 bytes. A 1 MB zone can hold approximately 16000 states of this size.

The speed is set in requests per second or requests per minute. The rate must be an integer, so if you need to specify less than one request per second, say, one request every two seconds, you would specify it as "30r/m".

Syntax: limit_req zone=zone burst=burst [nodelay]

Default: none

Context: http, server, location

The directive specifies the zone (zone) and the maximum burst size (burst). If the request rate exceeds the rate configured for the zone, excess requests are delayed so that they are processed at the configured speed. Requests are delayed as long as their number does not exceed the burst size; beyond that, the request is terminated with a "Service unavailable" (503) error. By default, the burst is zero.

For example, the directive

limit_req_zone  $binary_remote_addr  zone=one:10m   rate=1r/s;

    server {
        location /search/ {
            limit_req   zone=one  burst=5;
        }
    }

allows a user no more than 1 request per second on average, with bursts of no more than 5 requests. If delaying excess requests within the burst limit is not desired, you should use the nodelay option:

            limit_req   zone=one  burst=5  nodelay;
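
As a quick sanity check (a sketch, assuming nginx is running locally with the example configuration above and that the third-party Python "requests" package is installed), you can fire a burst of requests and watch 503s appear once the burst allowance is exhausted:

import requests

for i in range(10):
    r = requests.get("http://localhost/search/")
    # With rate=1r/s and burst=5 nodelay, roughly the first handful succeed and
    # the rest return 503 until the rate catches up.
    print(i, r.status_code)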
 

 

Testing for SSL-TLS (OWASP-CM-001) - OWASP


This article is part of the OWASP Testing Guide v3. The entire OWASP Testing Guide v3 can be downloaded here.


Due to historic export restrictions of high grade cryptography, legacy and new web servers are often able and configured to handle weak cryptographic options.

Even if high grade ciphers are normally used and installed, some server misconfiguration could be used to force the use of a weaker cipher to gain access to the supposed secure communication channel.

The http clear-text protocol is normally secured via an SSL or TLS tunnel, resulting in https traffic. In addition to providing encryption of data in transit, https allows the identification of servers (and, optionally, of clients) by means of digital certificates.

Historically, there have been limitations set in place by the U.S. government to allow cryptosystems to be exported only for key sizes of, at most, 40 bits, a key length which could be broken and would allow the decryption of communications. Since then, cryptographic export regulations have been relaxed (though some constraints still hold); however, it is important to check the SSL configuration being used to avoid putting in place cryptographic support which could be easily defeated. SSL-based services should not offer the possibility to choose weak ciphers.

Technically, cipher determination is performed as follows. In the initial phase of a SSL connection setup, the client sends the server a Client Hello message specifying, among other information, the cipher suites that it is able to handle. A client is usually a web browser (most popular SSL client nowadays), but not necessarily, since it can be any SSL-enabled application; the same holds for the server, which needs not be a web server, though this is the most common case. (For example, a noteworthy class of SSL clients is that of SSL proxies such as stunnel (www.stunnel.org) which can be used to allow non-SSL enabled tools to talk to SSL services.) A cipher suite is specified by an encryption protocol (DES, RC4, AES), the encryption key length (such as 40, 56, or 128 bits), and a hash algorithm (SHA, MD5) used for integrity checking. Upon receiving a Client Hello message, the server decides which cipher suite it will use for that session. It is possible (for example, by means of configuration directives) to specify which cipher suites the server will honor. In this way you may control, for example, whether or not conversations with clients will support 40-bit encryption only.
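
As a rough sketch of this idea in Python (not part of the guide; host and port are placeholders, and modern OpenSSL builds may not even ship export-grade suites, in which case set_ciphers() itself fails), you can offer the server only weak suites and see whether it completes the handshake:

import socket
import ssl

host, port = "www.example.com", 443

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

try:
    ctx.set_ciphers("EXP:LOW")   # OpenSSL cipher string for export/low-grade suites
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            print("Weak cipher accepted:", tls.cipher())
except (ssl.SSLError, OSError) as exc:
    print("No weak cipher negotiated (or none available locally):", exc)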

The large number of available cipher suites and quick progress in cryptanalysis make judging an SSL server a non-trivial task. These criteria are widely recognised as a minimum checklist:

  • SSLv2, due to known weaknesses in protocol design
  • Export (EXP) level cipher suites in SSLv3
  • Cipher suites with symmetric encryption algorithm smaller than 128 bits
  • X.509 certificates with RSA or DSA key smaller than 1024 bits
  • X.509 certificates signed using MD5 hash, due to known collision attacks on this hash
  • TLS Renegotiation vulnerability[1]

While there are known collision attacks on MD5 and known cryptanalytical attacks on RC4, their specific usage in SSL and TLS doesn't allow these attacks to be practical, and SSLv3 or TLSv1 cipher suites using RC4 and MD5 with a key length of 128 bits are still considered sufficient[2].

The following standards can be used as reference while assessing SSL servers:

  • NIST SP 800-52 recommends U.S. federal systems to use at least TLS 1.0 with ciphersuites based on RSA or DSA key agreement with ephemeral Diffie-Hellman, 3DES or AES for confidentiality and SHA1 for integrity protection. NIST SP 800-52 specifically disallows non-FIPS compliant algorithms like RC4 and MD5. An exception is U.S. federal systems making connections to outside servers, where these algorithms can be used in SSL client mode.
  • PCI-DSS v1.2 in point 4.1 requires compliant parties to use "strong cryptography" without precisely defining key lengths and algorithms. The common interpretation, partially based on previous versions of the standard, is that at least a 128-bit key cipher should be used, with no export-strength algorithms and no SSLv2[3].
  • SSL Server Rating Guide has been proposed to standardize SSL server assessment and currently is in draft version.

The SSL Server Database can be used to assess the configuration of publicly available SSL servers[4] based on the SSL Rating Guide[5].

In order to detect possible support of weak ciphers, the ports associated with SSL/TLS wrapped services must be identified. These typically include port 443, which is the standard https port; however, this may change because a) https services may be configured to run on non-standard ports, and b) there may be additional SSL/TLS wrapped services related to the web application. In general, service discovery is required to identify such ports.

The nmap scanner, via the “-sV” scan option, is able to identify SSL services. Vulnerability Scanners, in addition to performing service discovery, may include checks against weak ciphers (for example, the Nessus scanner has the capability of checking SSL services on arbitrary ports, and will report weak ciphers).


Example 1. SSL service recognition via nmap.

[root@test]# nmap -F -sV localhost

Starting nmap 3.75 ( http://www.insecure.org/nmap/ ) at 2005-07-27 14:41 CEST
Interesting ports on localhost.localdomain (127.0.0.1):
(The 1205 ports scanned but not shown below are in state: closed)

PORT      STATE SERVICE         VERSION
443/tcp   open  ssl             OpenSSL
901/tcp   open  http            Samba SWAT administration server
8080/tcp  open  http            Apache httpd 2.0.54 ((Unix) mod_ssl/2.0.54 OpenSSL/0.9.7g PHP/4.3.11)
8081/tcp  open  http            Apache Tomcat/Coyote JSP engine 1.0

Nmap run completed -- 1 IP address (1 host up) scanned in 27.881 seconds
[root@test]# 


Example 2. Identifying weak ciphers with Nessus. The following is an anonymized excerpt of a report generated by the Nessus scanner, corresponding to the identification of a server certificate allowing weak ciphers (see underlined text).

 https (443/tcp)
 Description
 Here is the SSLv2 server certificate:
 Certificate:
 Data:
 Version: 3 (0x2)
 Serial Number: 1 (0x1)
 Signature Algorithm: md5WithRSAEncryption
 Issuer: C=**, ST=******, L=******, O=******, OU=******, CN=******
 Validity
 Not Before: Oct 17 07:12:16 2002 GMT
 Not After : Oct 16 07:12:16 2004 GMT
 Subject: C=**, ST=******, L=******, O=******, CN=******
 Subject Public Key Info:
 Public Key Algorithm: rsaEncryption
 RSA Public Key: (1024 bit)
 Modulus (1024 bit):
 00:98:4f:24:16:cb:0f:74:e8:9c:55:ce:62:14:4e:
 6b:84:c5:81:43:59:c1:2e:ac:ba:af:92:51:f3:0b:
 ad:e1:4b:22:ba:5a:9a:1e:0f:0b:fb:3d:5d:e6:fc:
 ef:b8:8c:dc:78:28:97:8b:f0:1f:17:9f:69:3f:0e:
 72:51:24:1b:9c:3d:85:52:1d:df:da:5a:b8:2e:d2:
 09:00:76:24:43:bc:08:67:6b:dd:6b:e9:d2:f5:67:
 e1:90:2a:b4:3b:b4:3c:b3:71:4e:88:08:74:b9:a8:
 2d:c4:8c:65:93:08:e6:2f:fd:e0:fa:dc:6d:d7:a2:
 3d:0a:75:26:cf:dc:47:74:29
 Exponent: 65537 (0x10001)
 X509v3 extensions:
 X509v3 Basic Constraints:
 CA:FALSE
 Netscape Comment:
 OpenSSL Generated Certificate
 Page 10
 Network Vulnerability Assessment Report 25.05.2005
 X509v3 Subject Key Identifier:
 10:00:38:4C:45:F0:7C:E4:C6:A7:A4:E2:C9:F0:E4:2B:A8:F9:63:A8
 X509v3 Authority Key Identifier:
 keyid:CE:E5:F9:41:7B:D9:0E:5E:5D:DF:5E:B9:F3:E6:4A:12:19:02:76:CE
 DirName:/C=**/ST=******/L=******/O=******/OU=******/CN=******
 serial:00
 Signature Algorithm: md5WithRSAEncryption
 7b:14:bd:c7:3c:0c:01:8d:69:91:95:46:5c:e6:1e:25:9b:aa:
 8b:f5:0d:de:e3:2e:82:1e:68:be:97:3b:39:4a:83:ae:fd:15:
 2e:50:c8:a7:16:6e:c9:4e:76:cc:fd:69:ae:4f:12:b8:e7:01:
 b6:58:7e:39:d1:fa:8d:49:bd:ff:6b:a8:dd:ae:83:ed:bc:b2:
 40:e3:a5:e0:fd:ae:3f:57:4d:ec:f3:21:34:b1:84:97:06:6f:
 f4:7d:f4:1c:84:cc:bb:1c:1c:e7:7a:7d:2d:e9:49:60:93:12:
 0d:9f:05:8c:8e:f9:cf:e8:9f:fc:15:c0:6e:e2:fe:e5:07:81:
 82:fc
 Here is the list of available SSLv2 ciphers:
 RC4-MD5
 EXP-RC4-MD5
 RC2-CBC-MD5
 EXP-RC2-CBC-MD5
 DES-CBC-MD5
 DES-CBC3-MD5
 RC4-64-MD5
 The SSLv2 server offers 5 strong ciphers, but also 0 medium strength and 2 weak "export class" ciphers.
 The weak/medium ciphers may be chosen by an export-grade or badly configured client software. They only offer a limited protection against a brute force attack
 Solution: disable those ciphers and upgrade your client software if necessary.
 See http://support.microsoft.com/default.aspx?scid=kben-us216482
 or http://httpd.apache.org/docs-2.0/mod/mod_ssl.html#sslciphersuite
 This SSLv2 server also accepts SSLv3 connections.
 This SSLv2 server also accepts TLSv1 connections.
 
 Vulnerable hosts
 (list of vulnerable hosts follows)


Example 3. Manually audit weak SSL cipher levels with OpenSSL. The following will attempt to connect to Google.com with SSLv2.

[root@test]# openssl s_client -no_tls1 -no_ssl3 -connect www.google.com:443
CONNECTED(00000003)
depth=0 /C=US/ST=California/L=Mountain View/O=Google Inc/CN=www.google.com
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 /C=US/ST=California/L=Mountain View/O=Google Inc/CN=www.google.com
verify error:num=27:certificate not trusted
verify return:1
depth=0 /C=US/ST=California/L=Mountain View/O=Google Inc/CN=www.google.com
verify error:num=21:unable to verify the first certificate
verify return:1
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIDYzCCAsygAwIBAgIQYFbAC3yUC8RFj9MS7lfBkzANBgkqhkiG9w0BAQQFADCB
zjELMAkGA1UEBhMCWkExFTATBgNVBAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJ
Q2FwZSBUb3duMR0wGwYDVQQKExRUaGF3dGUgQ29uc3VsdGluZyBjYzEoMCYGA1UE
CxMfQ2VydGlmaWNhdGlvbiBTZXJ2aWNlcyBEaXZpc2lvbjEhMB8GA1UEAxMYVGhh
d3RlIFByZW1pdW0gU2VydmVyIENBMSgwJgYJKoZIhvcNAQkBFhlwcmVtaXVtLXNl
cnZlckB0aGF3dGUuY29tMB4XDTA2MDQyMTAxMDc0NVoXDTA3MDQyMTAxMDc0NVow
aDELMAkGA1UEBhMCVVMxEzARBgNVBAgTCkNhbGlmb3JuaWExFjAUBgNVBAcTDU1v
dW50YWluIFZpZXcxEzARBgNVBAoTCkdvb2dsZSBJbmMxFzAVBgNVBAMTDnd3dy5n
b29nbGUuY29tMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQC/e2Vs8U33fRDk
5NNpNgkB1zKw4rqTozmfwty7eTEI8PVH1Bf6nthocQ9d9SgJAI2WOBP4grPj7MqO
dXMTFWGDfiTnwes16G7NZlyh6peT68r7ifrwSsVLisJp6pUf31M5Z3D88b+Yy4PE
D7BJaTxq6NNmP1vYUJeXsGSGrV6FUQIDAQABo4GmMIGjMB0GA1UdJQQWMBQGCCsG
AQUFBwMBBggrBgEFBQcDAjBABgNVHR8EOTA3MDWgM6Axhi9odHRwOi8vY3JsLnRo
YXd0ZS5jb20vVGhhd3RlUHJlbWl1bVNlcnZlckNBLmNybDAyBggrBgEFBQcBAQQm
MCQwIgYIKwYBBQUHMAGGFmh0dHA6Ly9vY3NwLnRoYXd0ZS5jb20wDAYDVR0TAQH/
BAIwADANBgkqhkiG9w0BAQQFAAOBgQADlTbBdVY6LD1nHWkhTadmzuWq2rWE0KO3
Ay+7EleYWPOo+EST315QLpU6pQgblgobGoI5x/fUg2U8WiYj1I1cbavhX2h1hda3
FJWnB3SiXaiuDTsGxQ267EwCVWD5bCrSWa64ilSJTgiUmzAv0a2W8YHXdG08+nYc
X/dVk5WRTw==
-----END CERTIFICATE-----
subject=/C=US/ST=California/L=Mountain View/O=Google Inc/CN=www.google.com
issuer=/C=ZA/ST=Western Cape/L=Cape Town/O=Thawte Consulting cc/OU=Certification Services Division/CN=Thawte Premium Server CA/emailAddress=premium-server@thawte.com
---
No client certificate CA names sent
---
Ciphers common between both SSL endpoints:
RC4-MD5         EXP-RC4-MD5     RC2-CBC-MD5
EXP-RC2-CBC-MD5 DES-CBC-MD5     DES-CBC3-MD5
RC4-64-MD5
---
SSL handshake has read 1023 bytes and written 333 bytes
---
New, SSLv2, Cipher is DES-CBC3-MD5
Server public key is 1024 bit
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : SSLv2
    Cipher    : DES-CBC3-MD5
    Session-ID: 709F48E4D567C70A2E49886E4C697CDE
    Session-ID-ctx:
    Master-Key: 649E68F8CF936E69642286AC40A80F433602E3C36FD288C3
    Key-Arg   : E8CB6FEB9ECF3033
    Start Time: 1156977226
    Timeout   : 300 (sec)
    Verify return code: 21 (unable to verify the first certificate)
---
closed


Example 4. Testing supported protocols and ciphers using SSLScan.

SSLScan is a free command line tool that scans an HTTPS service to enumerate what protocols (it supports SSLv2, SSLv3 and TLS1) and what ciphers the HTTPS service supports. It runs on both Linux and Windows (OSX not tested) and is released under an open source license.

[user@test]$ ./SSLScan --no-failed mail.google.com
                   _
           ___ ___| |___  ___ __ _ _ __
          / __/ __| / __|/ __/ _` | '_ \
          \__ \__ \ \__ \ (_| (_| | | | |
          |___/___/_|___/\___\__,_|_| |_|

                  Version 1.9.0-win
             http://www.titania.co.uk
 Copyright 2010 Ian Ventura-Whiting / Michael Boman
    Compiled against OpenSSL 0.9.8n 24 Mar 2010

Testing SSL server mail.google.com on port 443

  Supported Server Cipher(s):
    accepted  SSLv3  256 bits  AES256-SHA
    accepted  SSLv3  128 bits  AES128-SHA
    accepted  SSLv3  168 bits  DES-CBC3-SHA
    accepted  SSLv3  128 bits  RC4-SHA
    accepted  SSLv3  128 bits  RC4-MD5
    accepted  TLSv1  256 bits  AES256-SHA
    accepted  TLSv1  128 bits  AES128-SHA
    accepted  TLSv1  168 bits  DES-CBC3-SHA
    accepted  TLSv1  128 bits  RC4-SHA
    accepted  TLSv1  128 bits  RC4-MD5

  Prefered Server Cipher(s):
    SSLv3  128 bits  RC4-SHA
    TLSv1  128 bits  RC4-SHA

  SSL Certificate:
    Version: 2
    Serial Number: -4294967295
    Signature Algorithm: sha1WithRSAEncryption
    Issuer: /C=ZA/O=Thawte Consulting (Pty) Ltd./CN=Thawte SGC CA
    Not valid before: Dec 18 00:00:00 2009 GMT
    Not valid after: Dec 18 23:59:59 2011 GMT
    Subject: /C=US/ST=California/L=Mountain View/O=Google Inc/CN=mail.google.com
    Public Key Algorithm: rsaEncryption
    RSA Public Key: (1024 bit)
      Modulus (1024 bit):
          00:d9:27:c8:11:f2:7b:e4:45:c9:46:b6:63:75:83:
          b1:77:7e:17:41:89:80:38:f1:45:27:a0:3c:d9:e8:
          a8:00:4b:d9:07:d0:ba:de:ed:f4:2c:a6:ac:dc:27:
          13:ec:0c:c1:a6:99:17:42:e6:8d:27:d2:81:14:b0:
          4b:82:fa:b2:c5:d0:bb:20:59:62:28:a3:96:b5:61:
          f6:76:c1:6d:46:d2:fd:ba:c6:0f:3d:d1:c9:77:9a:
          58:33:f6:06:76:32:ad:51:5f:29:5f:6e:f8:12:8b:
          ad:e6:c5:08:39:b3:43:43:a9:5b:91:1d:d7:e3:cf:
          51:df:75:59:8e:8d:80:ab:53
      Exponent: 65537 (0x10001)
    X509v3 Extensions:
      X509v3 Basic Constraints: critical
        CA:FALSE      X509v3 CRL Distribution Points: 
        URI:http://crl.thawte.com/ThawteSGCCA.crl
      X509v3 Extended Key Usage: 
        TLS Web Server Authentication, TLS Web Client Authentication, Netscape Server Gated Crypto      Authority Information Access: 
        OCSP - URI:http://ocsp.thawte.com
        CA Issuers - URI:http://www.thawte.com/repository/Thawte_SGC_CA.crt
  Verify Certificate:
    unable to get local issuer certificate


Renegotiation requests supported


Example 5. Testing common SSL flaws with ssl_tests

ssl_tests (http://www.pentesterscripting.com/discovery/ssl_tests) is a bash script that uses sslscan and openssl to check for various flaws - SSL version 2, weak ciphers, md5withRSAEncryption, SSLv3 Force Ciphering Bug/Renegotiation.



[user@test]$ ./ssl_test.sh 192.168.1.3 443
+++++++++++++++++++++++++++++++++++++++++++++++++
SSL Tests - v2, weak ciphers, MD5, Renegotiation
by Aung Khant, http://yehg.net
+++++++++++++++++++++++++++++++++++++++++++++++++

[*] testing on 192.168.1.3:443 ..

[*] tesing for sslv2 ..
[*] sslscan 192.168.1.3:443 | grep Accepted  SSLv2
    Accepted  SSLv2  168 bits  DES-CBC3-MD5
    Accepted  SSLv2  56 bits   DES-CBC-MD5
    Accepted  SSLv2  40 bits   EXP-RC2-CBC-MD5
    Accepted  SSLv2  128 bits  RC2-CBC-MD5
    Accepted  SSLv2  40 bits   EXP-RC4-MD5
    Accepted  SSLv2  128 bits  RC4-MD5


[*] testing for weak ciphers ...
[*] sslscan 192.168.1.3:443 | grep  40 bits | grep Accepted
    Accepted  SSLv2  40 bits   EXP-RC2-CBC-MD5
    Accepted  SSLv2  40 bits   EXP-RC4-MD5
    Accepted  SSLv3  40 bits   EXP-EDH-RSA-DES-CBC-SHA
    Accepted  SSLv3  40 bits   EXP-DES-CBC-SHA
    Accepted  SSLv3  40 bits   EXP-RC2-CBC-MD5
    Accepted  SSLv3  40 bits   EXP-RC4-MD5
    Accepted  TLSv1  40 bits   EXP-EDH-RSA-DES-CBC-SHA
    Accepted  TLSv1  40 bits   EXP-DES-CBC-SHA
    Accepted  TLSv1  40 bits   EXP-RC2-CBC-MD5
    Accepted  TLSv1  40 bits   EXP-RC4-MD5

[*] sslscan 192.168.1.3:443 | grep  56 bits | grep Accepted
    Accepted  SSLv2  56 bits   DES-CBC-MD5
    Accepted  SSLv3  56 bits   EDH-RSA-DES-CBC-SHA
    Accepted  SSLv3  56 bits   DES-CBC-SHA
    Accepted  TLSv1  56 bits   EDH-RSA-DES-CBC-SHA
    Accepted  TLSv1  56 bits   DES-CBC-SHA


[*] testing for MD5 certificate ..
[*] sslscan 192.168.1.3:443 | grep MD5WithRSAEncryption

[*] testing for SSLv3 Force Ciphering Bug/Renegotiation ..
[*] echo R | openssl s_client -connect 192.168.1.3:443 | grep DONE
depth=0 /C=DE/ST=Berlin/L=Berlin/O=XAMPP/OU=XAMPP/CN=localhost/emailAddress=admin@localhost
verify error:num=18:self signed certificate
verify return:1
depth=0 /C=DE/ST=Berlin/L=Berlin/O=XAMPP/OU=XAMPP/CN=localhost/emailAddress=admin@localhost
verify return:1
RENEGOTIATING
depth=0 /C=DE/ST=Berlin/L=Berlin/O=XAMPP/OU=XAMPP/CN=localhost/emailAddress=admin@localhost
verify error:num=18:self signed certificate
verify return:1
depth=0 /C=DE/ST=Berlin/L=Berlin/O=XAMPP/OU=XAMPP/CN=localhost/emailAddress=admin@localhost
verify return:1
DONE


[*] done

Check the configuration of the web servers which provide https services. If the web application provides other SSL/TLS wrapped services, these should be checked as well.

Example: The following registry path in Microsoft Windows 2003 defines the ciphers available to the server:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Ciphers\

When accessing a web application via the https protocol, a secure channel is established between the client (usually the browser) and the server. The identity of one (the server) or both parties (client and server) is then established by means of digital certificates. In order for the communication to be set up, a number of checks on the certificates must be passed. While discussing SSL and certificate based authentication is beyond the scope of this Guide, we will focus on the main criteria involved in ascertaining certificate validity: a) checking if the Certificate Authority (CA) is a known one (meaning one considered trusted), b) checking that the certificate is currently valid, and c) checking that the name of the site and the name reported in the certificate match. Remember to keep your browser up to date: CA certificates expire too, and each browser release ships a renewed set of CA certificates. It is also important to update the browser because more and more web sites require ciphers stronger than 40 or 56 bits.

Let’s examine each check more in detail.

a) Each browser comes with a preloaded list of trusted CAs, against which the certificate signing CA is compared (this list can be customized and expanded at will). During the initial negotiations with an https server, if the server certificate relates to a CA unknown to the browser, a warning is usually raised. This happens most often because a web application relies on a certificate signed by a self-established CA. Whether this is to be considered a concern depends on several factors. For example, this may be fine for an Intranet environment (think of corporate web email being provided via https; here, obviously all users recognize the internal CA as a trusted CA). When a service is provided to the general public via the Internet, however (i.e. when it is important to positively verify the identity of the server we are talking to), it is usually imperative to rely on a trusted CA, one which is recognized by all the user base (and here we stop with our considerations; we won’t delve deeper in the implications of the trust model being used by digital certificates).

b) Certificates have an associated period of validity, therefore they may expire. Again, we are warned by the browser about this. A public service needs a temporally valid certificate; otherwise, it means we are talking with a server whose certificate was issued by someone we trust, but has expired without being renewed.

c) What if the name on the certificate and the name of the server do not match? If this happens, it might sound suspicious. For a number of reasons, this is not so rare to see. A system may host a number of name-based virtual hosts, which share the same IP address and are identified by means of the HTTP 1.1 Host: header information. In this case, since the SSL handshake checks the server certificate before the HTTP request is processed, it is not possible to assign different certificates to each virtual server. Therefore, if the name of the site and the name reported in the certificate do not match, we have a condition which is typically signalled by the browser. To avoid this, IP-based virtual servers must be used. [2] and [3] describe techniques to deal with this problem and allow name-based virtual hosts to be correctly referenced.
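
As a minimal sketch of these three checks in Python (not part of the guide; host and port are placeholders), the standard ssl module can pull the server certificate and expose the issuing CA, the validity period, and the subject, while an untrusted CA, an expired certificate, or a name mismatch simply makes the handshake below fail with a verification error:

import socket
import ssl

host, port = "www.example.com", 443

ctx = ssl.create_default_context()   # trusts the platform CA list and checks the host name

with socket.create_connection((host, port)) as sock:
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        cert = tls.getpeercert()

print("Issuer: ", cert.get("issuer"))                                 # relates to check a)
print("Valid:  ", cert.get("notBefore"), "->", cert.get("notAfter"))  # relates to check b)
print("Subject:", cert.get("subject"))                                # relates to check c)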


Examine the validity of the certificates used by the application. Browsers will issue a warning when encountering expired certificates, certificates issued by untrusted CAs, and certificates which do not match namewise with the site to which they should refer. By clicking on the padlock which appears in the browser window when visiting an https site, you can look at information related to the certificate – including the issuer, period of validity, encryption characteristics, etc.

If the application requires a client certificate, you probably have installed one to access it. Certificate information is available in the browser by inspecting the relevant certificate(s) in the list of the installed certificates.

These checks must be applied to all visible SSL-wrapped communication channels used by the application. Though this is the usual https service running on port 443, there may be additional services involved depending on the web application architecture and on deployment issues (an https administrative port left open, https services on non-standard ports, etc.). Therefore, apply these checks to all SSL-wrapped ports which have been discovered. For example, the nmap scanner features a scanning mode (enabled by the -sV command line switch) which identifies SSL-wrapped services. The Nessus vulnerability scanner has the capability of performing SSL checks on all SSL/TLS-wrapped services.

Examples

Rather than providing a fictitious example, we have inserted an anonymized real-life example to stress how frequently one stumbles on https sites whose certificates are inaccurate with respect to naming.

The following screenshots refer to a regional site of a high-profile IT company.

Warning issued by Microsoft Internet Explorer. We are visiting an .it site and the certificate was issued to a .com site! Internet Explorer warns that the name on the certificate does not match the name of the site.


SSL Certificate Validity Testing IE Warning.gif


Warning issued by Mozilla Firefox. The message issued by Firefox is different – Firefox complains because it cannot ascertain the identity of the .com site the certificate refers to because it does not know the CA which signed the certificate. In fact, Internet Explorer and Firefox do not come preloaded with the same list of CAs. Therefore, the behavior experienced with various browsers may differ.


SSL Certificate Validity Testing Firefox Warning.gif

Examine the validity of the certificates used by the application at both server and client levels. The usage of certificates is primarily at the web server level; however, there may be additional communication paths protected by SSL (for example, towards the DBMS). You should check the application architecture to identify all SSL protected channels.

Tools

  • Vulnerability scanners may include checks regarding certificate validity, including name mismatch and time expiration. They usually report other information as well, such as the CA which issued the certificate. Remember that there is no unified notion of a “trusted CA”; what is trusted depends on the configuration of the software and on the human assumptions made beforehand. Browsers come with a preloaded list of trusted CAs. If your web application relies on a CA which is not in this list (for example, because you rely on a self-made CA), you should take into account the process of configuring user browsers to recognize the CA.
  • The Nessus scanner includes a plugin to check for expired certificates or certificates which are going to expire within 60 days (plugin “SSL certificate expiry”, plugin id 15901). This plugin will check certificates installed on the server.
  • Vulnerability scanners may include checks against weak ciphers. For example, the Nessus scanner (http://www.nessus.org) has this capability and flags the presence of SSL weak ciphers (see example provided above).
  • You may also rely on specialized tools such as SSL Digger (http://www.mcafee.com/us/downloads/free-tools/ssldigger.aspx), or – for the command line oriented – experiment with the openssl tool, which provides access to OpenSSL cryptographic functions directly from a Unix shell (may be already available on *nix boxes, otherwise see www.openssl.org).
  • To identify SSL-based services, use a vulnerability scanner or a port scanner with service recognition capabilities. The nmap scanner features a “-sV” scanning option which tries to identify services, while the nessus vulnerability scanner has the capability of identifying SSL-based services on arbitrary ports and to run vulnerability checks on them regardless of whether they are configured on standard or non-standard ports.
  • In case you need to talk to a SSL service but your favourite tool doesn’t support SSL, you may benefit from a SSL proxy such as stunnel; stunnel will take care of tunneling the underlying protocol (usually http, but not necessarily so) and communicate with the SSL service you need to reach.
  • Finally, a word of advice. Though it may be tempting to use a regular browser to check certificates, there are various reasons for not doing so. Browsers have been plagued by various bugs in this area, and the way the browser will perform the check might be influenced by configuration settings that may not be evident. Instead, rely on vulnerability scanners or on specialized tools to do the job.


Doing HTTP Caching Right: Introducing httplib2
By Joe Gregorio
February 01, 2006

You need to understand HTTP caching. No, really, you do. I have mentioned repeatedly that you need to choose your HTTP methods carefully when building a web service, in part because you can get the performance benefits of caching with GET. Well, if you want to get the real advantages of GET then you need to understand caching and how you can use it effectively to improve the performance of your service.

This article will not explain how to set up caching for your particular web server, nor will it cover the different kinds of caches. If you want that kind of information I recommend Mark Nottingham's excellent tutorial on HTTP caching.

First you need to understand the goals of the HTTP caching model. One objective is to let both the client and server have a say over when to return a cached entry. As you can imagine, allowing both client and server to have input on when a cached entry is to be considered stale is obviously going to introduce some complexity.

The HTTP caching model is based on validators, which are bits of data that a client can use to validate that a cached response is still valid. They are fundamental to the operation of caches since they allow a client or intermediary to query the status of a resource without having to transfer the entire response again: the server returns an entity body only if the validator indicates that the cache has a stale response.

One of the validators for HTTP is the ETag. An ETag is like a fingerprint for the bytes in the representation; if a single byte changes the ETag also changes.

Using validators requires that you already have done a GET once on a resource. The cache stores the value of the ETag header if present and then uses the value of that header in later requests to that same URI.

For example, if I send a request to example.org and get back this response:

HTTP/1.1 200 OK
Date: Fri, 30 Dec 2005 17:30:56 GMT
Server: Apache
ETag: "11c415a-8206-243aea40"
Accept-Ranges: bytes
Content-Length: 33286
Vary: Accept-Encoding,User-Agent
Cache-Control: max-age=7200
Expires: Fri, 30 Dec 2005 19:30:56 GMT
Content-Type: image/png 

-- binary data --

Then the next time I do a GET I can add the validator in. Note that the value of ETag is placed in the If-None-Match: header.

GET / HTTP/1.1
Host: example.org
If-None-Match: "11c415a-8206-243aea40"

If there was no change in the representation then the server returns a 304 Not Modified.

HTTP/1.1 304 Not Modified 
Date: Fri, 30 Dec 2005 17:32:47 GMT

If there was a change, the new representation is returned with a status code of 200 and a new ETag.

HTTP/1.1 200 OK
Date: Fri, 30 Dec 2005 17:32:47 GMT
Server: Apache
ETag: "0192384-9023-1a929893"
Accept-Ranges: bytes
Content-Length: 33286
Vary: Accept-Encoding,User-Agent
Cache-Control: max-age=7200
Expires: Fri, 30 Dec 2005 19:30:56 GMT
Content-Type: image/png 

-- binary data --
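
A rough sketch of that exchange using Python 3's standard library (http.client; the host is just a placeholder, and a real cache would of course store the body alongside the validator):

import http.client

# First request: remember the validator.
conn = http.client.HTTPConnection("example.org")
conn.request("GET", "/")
resp = conn.getresponse()
body = resp.read()
etag = resp.getheader("ETag")
conn.close()

# Later request: send the validator back; a 304 means our stored copy is still good.
conn = http.client.HTTPConnection("example.org")
conn.request("GET", "/", headers={"If-None-Match": etag} if etag else {})
resp2 = conn.getresponse()
resp2.read()
if resp2.status == 304:
    print("Not Modified -- reuse the cached body")
else:
    print("Changed -- new ETag:", resp2.getheader("ETag"))
conn.close()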
 

While validators are used to test if a cached entry is still valid, the Cache-Control: header is used to signal how long a representation can be cached. The most fundamental of all the cache-control directives is max-age. This directive asserts that the cached response can be only max-age seconds old before being considered stale. Note that max-age can appear in both request headers and response headers, which gives both the client and server a chance to assert how old they like their responses cached. If a cached response is fresh then we can return the cached response immediately; if it's stale then we need to validate the cached response before returning it.

Let's take another look at our example response from above. Note that the Cache-Control: header is set and that a max-age of 7200 means that the entry can be cached for up to two hours.

HTTP/1.1 200 OK
Date: Fri, 30 Dec 2005 17:32:47 GMT
Server: Apache
ETag: "0192384-9023-1a929893"
Accept-Ranges: bytes
Content-Length: 33286
Vary: Accept-Encoding,User-Agent
Cache-Control: max-age=7200
Expires: Fri, 30 Dec 2005 19:30:56 GMT
Content-Type: text/xml
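
A toy version of that freshness rule (not httplib2's actual code): a cached entry is fresh while its age is under max-age, and otherwise it has to be revalidated with its validators before being reused.

import time

def is_fresh(stored_at, max_age):
    """stored_at: epoch seconds when the response was cached; max_age: seconds."""
    return (time.time() - stored_at) < max_age

# An entry cached 3000 seconds ago with max-age=7200 is still fresh:
print(is_fresh(time.time() - 3000, 7200))   # True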

There are lots of directives that can be put in the Cache-Control: header, and the Cache-Control: header may appear in both requests and/or responses.

Request directives:

Directive                Description
no-cache                 The cached response must not be used to satisfy this request.
no-store                 Do not store this response in a cache.
max-age=delta-seconds    The client is willing to accept a cached response that is delta-seconds old without validating.
max-stale=delta-seconds  The client is willing to accept a cached response that is no more than delta-seconds stale.
min-fresh=delta-seconds  The client is willing to accept only a cached response that will still be fresh delta-seconds from now.
no-transform             The entity body must not be transformed.
only-if-cached           Return a response only if there is one in the cache. Do not validate or GET a response if no cache entry exists.

Response directives:

Directive                Description
public                   This can be cached by any cache.
private                  This can be cached only by a private cache.
no-cache                 The cached response must not be used on subsequent requests without first validating it.
no-store                 Do not store this response in a cache.
no-transform             The entity body must not be transformed.
must-revalidate          If the cached response is stale it must be validated before it is returned in any response. Overrides max-stale.
max-age=delta-seconds    The response may be cached and considered fresh for delta-seconds without validating.
s-maxage=delta-seconds   Just like max-age, but it applies only to shared caches.
proxy-revalidate         Like must-revalidate, but only for proxies.

Let's look at some Cache-Control: header examples.

Cache-Control: private, max-age=3600

If sent by a server, this Cache-Control: header states that the response can only be cached in a private cache for one hour.

Cache-Control: public, must-revalidate, max-age=7200

The included response can be cached by a public cache and can be cached for two hours; after that the cache must revalidate the entry before returning it to a subsequent request.

Cache-Control: must-revalidate, max-age=0

This forces the client to revalidate every request, since a max-age=0 forces the cached entry to be instantly stale. See Mark Nottingham's Leveraging the Web: Caching for a nice example of how this can be applied.

Cache-Control: no-cache

This is pretty close to must-revalidate, max-age=0, except that a client could use a max-stale header on a request and get a stale response. The must-revalidate will override the max-stale property. I told you that giving both client and server some control would make things a bit complicated.

So far all of the Cache-Control: header examples we have looked at are on the response side, but they can also be added on the request too.

Cache-Control: no-cache

This forces an "end-to-end reload," where the client forces the cache to reload its cache from the origin server.

Cache-Control: min-fresh=200

Here the client asserts that it wants a response that will be fresh for at least 200 seconds.

Vary

You may be wondering about situations where a cache might get confused. For example, what if a server does content negotiation, where different representations can be returned from the same URI? For cases like this HTTP supplies the Vary: header. The Vary: header informs the cache of the names of all the headers that might cause a resource's representation to change.

For example, if a server did do content negotiation then the Content-Type: header would be different for the different types of responses, depending on the type of content negotiated. In that case the server can add a Vary: accept header, which causes the cache to consider the Accept: header when caching responses from that URI.

Date: Mon, 23 Jan 2006 15:37:34 GMT
Server: Apache
Accept-Ranges: bytes
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Cache-Control: max-age=7200
Expires: Mon, 23 Jan 2006 17:37:34 GMT
Content-Length: 5073
Content-Type: text/html; charset=utf-8

In this example the server is stating that responses can be cached for two hours, but that responses may vary based on the Accept-Encoding and User-Agent headers.

When a server successfully validates a cached response, using for example the If-None-Match: header, then the server returns a status code of 304 Not Modified. So nothing much happens on a 304 Not Modified response, right? Well, not exactly. In fact, the server can send updated headers for the entity that have to be updated in the cache. The server can also send along a Connection: header that says which headers shouldn't be updated.

Some headers are by default excluded from the list of headers to update. These are called hop-by-hop headers and they are: Connection, Keep-Alive, Proxy-Authenticate, Proxy-Authorization, TE, Trailers, Transfer-Encoding, and Upgrade. All other headers are considered end-to-end headers.

HTTP/1.1 304 Not Modified
Content-Length: 647
Server: Apache
Connection: close
Date: Mon, 23 Jan 2006 16:10:52 GMT
Content-Type: text/html; charset=iso-8859-1

...

In the above example Date: is not a hop-by-hop header nor is it listed in the Connection: header, so the cache has to update the value of Date: in the cache.

While a little complex, the above is at least conceptually nice. Of course, one of the problems is that we have to be able to work with HTTP 1.0 servers and caches which use a different set of headers, all time-based, to do caching and out of necessity those are brought forward into HTTP 1.1.

The older cache control model from HTTP 1.0 is based solely on time. The Last-Modified cache validator is just that, the last time that the resource was modified. The cache uses the Date:, Expires:, Last-Modified:, and If-Modified-Since: headers to detect changes in a resource.

If you are developing a client you should always use both validators if present; you never know when an HTTP 1.0 cache will pop up between you and a server. HTTP 1.1 was published seven years ago so you'd think that at this late date most things would be updated. This is the protocol equivalent of wearing a belt and suspenders.

Now that you understand caching you may be wondering if the client library in your favorite language even supports caching. I know the answer for Python, and sadly that answer is currently no. It pains me that my favorite language doesn't have one of the best HTTP client implementations around. That needs to change.

Introducing httplib2, a comprehensive Python HTTP client library that supports a local private cache that understands all the caching operations we just talked about. In addition it supports many features left out of other HTTP libraries.

HTTP and HTTPS
HTTPS support is available only if the socket module was compiled with SSL support.
Keep-Alive
Supports HTTP 1.1 Keep-Alive, keeping the socket open and performing multiple requests over the same connection if possible.
Authentication
The following three types of HTTP Authentication are supported: Basic, Digest, and WSSE. These can be used over both HTTP and HTTPS.
Caching
The module can optionally operate with a private cache that understands the Cache-Control: header and uses both the ETag and Last-Modified cache validators.
All Methods
The module can handle any HTTP request method, not just GET and POST.
Redirects
Automatically follows 3XX redirects on GETs.
Compression
Handles both compress and gzip types of compression.
Lost Update Support
Automatically adds back ETags into PUT requests to resources we have already cached. This implements Section 3.2 of Detecting the Lost Update Problem Using Unreserved Checkout.
Unit Tested
A large and growing set of unit tests.
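
A short usage sketch (the URL is a placeholder): the directory passed to Http() acts as the private cache, and a repeated GET is either answered from that cache or revalidated using the stored ETag/Last-Modified validators.

import httplib2

h = httplib2.Http(".cache")
resp, content = h.request("http://example.org/", "GET")

# Second request: served from the local cache, or revalidated with If-None-Match /
# If-Modified-Since and refreshed on a 304.
resp2, content2 = h.request("http://example.org/", "GET")
print(resp2.fromcache)   # True if the response came out of the local cache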

See the httplib2 project page for more details.

Next time I will cover HTTP authentication, redirects, keep-alive, and compression in HTTP and how httplib2 handles them. You might also be wondering how the "big guys" handle caching. That will take a whole other article to cover.


Linked Cache Invalidation


After designing and deploying Cache Channels, it quickly became apparent that one Web cache invalidation mechanism wasn’t able to cover the breadth of use cases.

In a nutshell, Cache Channels trades off immediacy for reliability; that is, while cache invalidations don’t take place right away (there’s a 10-30 second window), you know that they’ll be respected, because of how the protocol is designed.

That’s great if you need to (for example) invalidate news articles with months of TTL because the legal department is freaking out, but not so good if you want your users’ changes to be reflected on the pages they’re looking at, while still keeping your cache efficient.

Linked Cache Invalidation is another approach, with different properties. Briefly, it allows you to declare the relationships between resources, so that when one changes (because of a person POSTing a blog comment, for example), the cache knows enough to invalidate the related resources. Not perfectly reliable, but great when you’re working with certain kinds of content.
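
For instance (an illustrative sketch using the "inv-by" and "invalidates" link relations from the draft; the URLs are made up), the cached page for a blog post can declare that it is invalidated by changes to its comments resource:

HTTP/1.1 200 OK
Cache-Control: max-age=3600
Link: </posts/42/comments>; rel="inv-by"

and the response to the POST that adds a comment can point back at the pages the cache should now drop:

HTTP/1.1 201 Created
Location: /posts/42/comments/7
Link: </posts/42>; rel="invalidates"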

So, a couple of years ago I coded this up at Yahoo! and eventually got our Squid implementation Open Sourced, both as a few patches (now sitting on Squid 2.7 HEAD) and as a “helper process” to do the behind-the-scenes accounting and invalidation.

Then, a funny thing happened. I was on the programme committee of WS-REST 2010, where Mike Kelly and Michael Hausenblas had submitted a paper called “Using HTTP Link: Header for Gateway Cache Invalidation.” Needless to say, I had a bit of a chuckle and started talking to Mike K. Essentially, they’d created the same system; great minds think alike.

I’m blogging about this now because, finally, Mike and I have submitted an Internet-Draft to register the link relations and explain how it works. I would note that the inv-maxage Cache-Control directive is not implemented in Squid yet, so that implementation will only work with gateway caches; it can’t be relied upon to work with proxy caches.

If you caught my “Stupid Caching Tricks” talk at Velocity last year, I mentioned LCI near the end. It has now been in production in a few parts of Yahoo! for a while, and the feedback is pretty positive. While it isn’t the last word in invalidation systems, I think it’s a really nice balance. Check out the draft for more details.

seriously - Objective-C HTTP library

Save
seriously

The Objective-C HTTP library that Apple should have created, seriously.

Seriously
---------
The iPhone needs a better way to make HTTP requests, specifically calls to
REST web services. Seriously mixes Blocks with NSURLConnection &
NSOperationQueue to do just that. It also will automatically parse the JSON
response into a dictionary if the response headers are set correctly.

Install
-------
Just drag the files from the "src" directory into your project. You can also try
using the included "Seriously.framework" file.

Parse JSON EXAMPLE
------------------
    NSString *url = @"http://api.twitter.com/1/users/show.json?screen_name=probablycorey";

    [Seriously get:url handler:^(id body, NSHTTPURLResponse *response, NSError *error) {
        if (error) {
            NSLog(@"Got error %@", error);
        }
        else {
            NSLog(@"Look, JSON gets parsed into a dictionary");
            NSLog(@"%@", [body objectForKey:@"profile_background_image_url"]);
        }
    }];

Simple Queue Example
--------------------
    NSArray *urls = [NSArray arrayWithObjects:
                     @"http://farm5.static.flickr.com/4138/4744205956_1f08ae40e3_o.jpg",
                     @"http://farm5.static.flickr.com/4123/4744238252_d11d0df5a3_b.jpg",
                     @"http://farm5.static.flickr.com/4097/4743596319_50cce97d80_o.jpg",
                     @"http://farm5.static.flickr.com/4099/4743581287_7c50529b36_o.jpg",
                     @"http://farm5.static.flickr.com/4123/4743587437_78f0906e8a_o.jpg",
                     @"http://farm5.static.flickr.com/4136/4743562971_d5f5c6d5b1_o.jpg",
                     @"http://farm5.static.flickr.com/4073/4744205142_be44e64ab7_o.jpg",
                     nil];

    // By default the operation queue will only run 3 requests at a time.
    // ([Seriously request:options:handler:] returns an NSOperation you can keep
    //  if you need to cancel the request.)
    for (NSString *url in urls) {
        [Seriously request:url options:nil handler:^(id body, NSHTTPURLResponse *response, NSError *error) {
            NSLog(@"got %lu (%@)", (unsigned long)[urls indexOfObject:url], url);
        }];
    }

Why Are You Using Blocks?
-------------------------
Welcome to the future dude!

TODO
----
- Document
- Add XML parsing
- Add more options for NSOperationQueue management

RFC 5861 - HTTP caching - stale-if-error / stale-while-revalidate

Save
The stale-if-error HTTP Cache-Control extension allows a cache to return a stale response when an error -- e.g., a 500 Internal Server Error, a network segment, or DNS failure -- is encountered, rather than returning a "hard" error. This improves availability.

The stale-while-revalidate HTTP Cache-Control extension allows a cache to immediately return a stale response while it revalidates it in the background, thereby hiding latency (both in the network and on the server) from clients.
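
In practice, both are just extra response Cache-Control directives alongside the usual freshness information; the lifetimes below are purely illustrative:

Cache-Control: max-age=600, stale-while-revalidate=30, stale-if-error=86400

This says: the response is fresh for ten minutes, may be served stale for up to another 30 seconds while it is refreshed in the background, and may be served stale for up to a day if the origin is returning errors.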

Comments (1)

Patrice Neff Jul 9, 2010

I recommend we start using a real proxy (Squid, Varnish or Traffic Server) and then start using these extension headers for some responses. Squid definitely implements them, not yet sure about the other two.

Two HTTP Caching Extensions

Save

Two HTTP Caching Extensions

We use caching extensively inside Yahoo! to improve scalability, latency and availability for back-end HTTP services, as I’ve discussed before.

However, there are a few situations where the plain vanilla HTTP caching model doesn’t quite do the trick. Rather than come up with one-off solutions to our problems, we tried going in the other direction: finding the most general solution that still met our needs, in the hope of meeting others’ as well. Here are two of them (with specs and implementation).

stale-while-revalidate

The first problem you’ve got when you rely on HTTP caching for performance is simple: what happens when the cache is stale? If fresh responses come back in a small number of milliseconds (as they usually do in a well-tuned cache), while stale ones take 200ms or more (as anything that has to run back-end code often does), users will notice (as will your execs).

The naïve solution is to pre-fetch things into cache before they become stale, but this leads to all sorts of problems; deciding when to pre-fetch is a major headache, and if you don’t get it right, you’ll overload your cache, the network, or your back-end systems, if not all three.

A more elegant way to do this is to give the cache permission to serve slightly stale content, as long as it refreshes things in the background.

Above, request #1 is served from a fresh cache, as per normal. When the cache becomes stale and stale-while-revalidate is in effect, request #2 will kick off an asynchronous request back to the origin server, while still being served from cache as if it were still fresh (as #3 is, because it’s still inside the stale-while-revalidate “window”). Assuming that the cache is successfully updated, #4 gets served fresh from cache, because that’s what it is now.

So, in a nutshell, stale-while-revalidate hides back-end latency from your clients by taking some liberty with freshness (which you control). See the stale-while-revalidate Internet-Draft for more information.
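
The cache-side decision boils down to something like this simplified sketch (not Squid's actual code; real caches also collapse concurrent refreshes, honour must-revalidate, and so on):

import threading

def serve(entry, max_age, swr_window, refresh):
    """Decide how to answer a request for a cached entry.

    entry.age is the stored response's current age in seconds;
    refresh() fetches a fresh copy from the origin and updates the cache.
    """
    if entry.age <= max_age:
        return entry.response                    # fresh: a normal cache hit
    if entry.age <= max_age + swr_window:
        threading.Thread(target=refresh, daemon=True).start()  # revalidate in background
        return entry.response                    # serve stale right away
    return refresh()                             # too stale: wait for the origin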

stale-if-error

The other issue we had was when services go down. In many cases, it’s preferable not to show users a “hard” error, but instead to use slightly stale content, if it’s available. Stale-if-error allows you to do this — again, in a way that’s controllable by you.

For example, Yahoo! Tech has a number of modules on its front page that are sourced from services. If a back-end service has a glitch, in many cases it’s better to show news (for example) that’s a few minutes old, rather than have a blank space on the page. Stale-if-error makes this possible.
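
The cache-side logic here is the mirror image: try the origin first, and only fall back to the stale copy if the fetch fails and the response is still inside the stale-if-error window. Another simplified sketch (again, not Squid's actual code; OriginError stands in for whatever your fetch layer raises):

class OriginError(Exception):
    """Placeholder for whatever your fetch layer raises on 5xx / network errors."""

def serve_with_stale_if_error(entry, max_age, sie_window, fetch):
    if entry.age <= max_age:
        return entry.response            # still fresh
    try:
        return fetch()                   # stale: go back to the origin
    except OriginError:
        if entry.age <= max_age + sie_window:
            return entry.response        # mask the failure with stale content
        raise                            # window exceeded: surface the error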

Again, see the stale-if-error Internet-Draft for details.

A Word About Cache-Control

People who have looked at these often comment on their requirement for a Cache-Control header; they often just want to be able to configure their cache manually, rather than go around modifying HTTP headers. In fact, we got this request from so many people that we did add this capability to the implementation (see below).

That said, my preference is for the Cache-Control extensions, and I always strongly encourage people to use them. Why? Because, while it’s easy for an admin to go into a cache and change things, you then have decoupled the URIs (services) from their metadata; if the services change, it isn’t obvious that some cache configuration somewhere may have to change as well. Additionally, if you have multiple clients caching your data, you then have to go out and remember where all of them are (chances are, you’ll miss one), and configure each. Not good practice.

Status

Both of these extensions are documented and, in my mind, pretty stable; the I-D’s have expired, but AFAICT all I need to do is double-check things, re-submit them and request publication (as Informational RFCs). I’m going to wait a little while to see if anybody has some feedback that I can incorporate.

We also have an implementation of both in Squid, coded by Henrik. Currently, there’s a changeset sitting on 2.HEAD, but hopefully it’ll get incorporated into 2.7. Note that that changeset doesn’t have support for the Cache-Control extensions, only for the squid.conf directives that control these mechanisms; when the drafts start progressing, that should change.

The intent here is to make these features available to anyone who wants them; we don’t want to maintain private Squid extensions, and Squid isn’t the only interesting cache in the world. Enjoy, thanks again to Henrik and Yahoo!, and again I’d love any feedback you have.

RFC5861: HTTP Stale Controls

Save

RFC5861: HTTP Stale Controls

On a bit of a roll, RFC5861: HTTP Stale Controls has (finally) been published as an Informational RFC.

As discussed before in “Two HTTP Caching Extensions,” these are very useful ways to hide latency and errors from your end users. While they’re most useful in HTTP gateway caches (a.k.a. reverse proxy caches / accelerators), very latency-sensitive sites might find them useful as well when working with “normal” proxy caches.

Both are implemented in Squid 2.7. Not only does Squid respect both response Cache-Control directives, but it also allows you to tweak its behaviour using the stale-while-revalidate and max-stale refresh_pattern options. Squid 3.2 should have them when it’s released, and I understand that Apache Traffic Server will have stale-while-revalidate available soon as well.
