Kjetil Kjernsmo, http://folk.uio.no/kjekje/#cache-survey,
Department of Informatics, University of Oslo, Norway. @KKjernsmo
The HTTP standard and the Internet infrastructure have good caching facilities; rather than speculate on whether they can be used, we should find out.
Support for it is not as good as I hoped, but not as bad as I feared.
Please check that you're doing it right!
From RFC 7234:
The goal of caching in HTTP/1.1 is to significantly improve performance by reusing a prior response message to satisfy a current request.
Freshness lifetime: the number of seconds that the response may be used without contacting the origin server.
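As an illustration, here is a minimal sketch of how a cache might derive an explicit freshness lifetime from response headers. It follows the order of precedence in RFC 7234, section 4.2.1: `s-maxage` (shared caches only), then `max-age`, then `Expires` minus `Date`. The function name and the assumption that headers arrive as a dict with lower-cased field names are illustrative, not taken from the survey.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, shared=False):
    """Explicit freshness lifetime in seconds, or None if the response has none.

    Sketch of the precedence in RFC 7234, section 4.2.1. `headers` is assumed
    to be a dict with lower-cased field names (an illustrative choice).
    """
    # Parse Cache-Control into {directive: value}.
    directives = {}
    for part in headers.get("cache-control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip().strip('"')
    # s-maxage only applies to shared caches and overrides max-age there.
    for key in (("s-maxage", "max-age") if shared else ("max-age",)):
        value = directives.get(key, "")
        if value.isdigit():  # delta-seconds must be a non-negative integer
            return int(value)
    # Fall back to Expires minus Date.
    if "expires" in headers and "date" in headers:
        try:
            expires = parsedate_to_datetime(headers["expires"])
            date = parsedate_to_datetime(headers["date"])
            return max(0, int((expires - date).total_seconds()))
        except (TypeError, ValueError):
            # Invalid dates such as "Expires: 0" mean "already expired"
            # (RFC 7234, section 5.3); this sketch also lumps an
            # unparsable Date in here as a simplification.
            return 0
    return None  # no explicit lifetime; a heuristic may apply
```

For example, `freshness_lifetime({"cache-control": "public, max-age=3600"})` gives 3600, and a response carrying only `Date` and `Expires` one hour apart gives the same value.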
RFC 7234 allows heuristics to be used to estimate the freshness lifetime.
Also suggests a useful heuristic based on Last-Modified.
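The Last-Modified heuristic in RFC 7234, section 4.2.2 uses a fraction of the time since the resource was last modified, with 10% given as a typical setting. The sketch below measures that interval against the response's `Date` header; the function name, the default fraction and the dict-of-lower-cased-headers input are assumptions for illustration. Note that the RFC forbids heuristics when an explicit expiration time is present, so this would only be consulted when no explicit lifetime exists.

```python
from email.utils import parsedate_to_datetime


def heuristic_freshness(headers, fraction=0.1):
    """Heuristic freshness lifetime in seconds based on Last-Modified.

    Sketch of the heuristic suggested in RFC 7234, section 4.2.2: a fraction
    (typically 10%) of the time elapsed since Last-Modified, measured here
    against the Date header. Returns None when no estimate can be made.
    """
    try:
        date = parsedate_to_datetime(headers["date"])
        last_modified = parsedate_to_datetime(headers["last-modified"])
        elapsed = (date - last_modified).total_seconds()
    except (KeyError, TypeError, ValueError):
        return None  # missing or unparsable dates: no heuristic estimate
    # A Last-Modified in the future (clock skew) yields zero freshness.
    return int(max(0.0, elapsed) * fraction)
```

A resource last modified ten days before the response was generated would, with the 10% fraction, be considered fresh for one day (86400 seconds).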
We might also use RDF data or machine learning to estimate.
Gather as many different hosts as we can, send HTTP requests to them, and record the relevant data.
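A data-collection step of this kind could look roughly like the sketch below, using only Python's standard library. The list of headers kept, the function names `record` and `probe`, and the plain GET without content negotiation are all assumptions for illustration; the survey's actual tooling and header selection are not described here.

```python
import urllib.error
import urllib.request

# Header fields kept for each response. This selection is an assumption
# made for illustration, not the survey's actual list.
CACHE_HEADERS = ("cache-control", "expires", "date", "last-modified",
                 "etag", "age", "vary", "pragma", "server")


def record(status, headers):
    """Keep the status code and the caching-related header fields."""
    fields = {}
    for name, value in headers.items():
        name = name.lower()
        # Repeated fields are combined into one comma-separated value.
        fields[name] = f"{fields[name]}, {value}" if name in fields else value
    return {"status": status,
            **{h: fields[h] for h in CACHE_HEADERS if h in fields}}


def probe(url, timeout=10):
    """Send one GET request and record the response (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return record(resp.status, resp.headers)
    except urllib.error.HTTPError as err:  # error responses still carry headers
        return record(err.code, err.headers)
    except OSError as err:  # DNS failures, refused connections, timeouts
        return {"status": None, "error": str(err)}
```

Given a hypothetical list `urls`, `{url: probe(url) for url in urls}` yields one record per host, which can then be tabulated.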
Classify resources into
To decide where to go, we used:
Got a list of 3117 unique hosts and made 7745 HTTP requests
We had 2965 successful responses
Successful response defined as
|Predicate|Number of occurrences|
Another 1822 requests to check if conditional requests were actually supported
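One way to check whether a server actually supports conditional requests is to repeat a request with the validators from an earlier response (`If-None-Match` from `ETag`, `If-Modified-Since` from `Last-Modified`) and see whether it answers 304 Not Modified. The sketch below does this; the names and the three-way classification are assumptions, and the criteria the survey used to call an implementation "faulty" are not stated here. A 200 is not automatically wrong, since the resource may genuinely have changed between the two requests.

```python
import urllib.error
import urllib.request


def conditional_headers(first):
    """Validators for a revalidation request, built from the lower-cased
    header fields of an earlier response."""
    cond = {}
    if "etag" in first:
        cond["If-None-Match"] = first["etag"]
    if "last-modified" in first:
        cond["If-Modified-Since"] = first["last-modified"]
    return cond


def classify(status):
    """Assumed interpretation: 304 means the validators were honoured,
    200 means they were ignored (the full body was sent again)."""
    if status == 304:
        return "supported"
    if status == 200:
        return "ignored"
    return "other"


def check_conditional(url, first, timeout=10):
    """Repeat a request with validators and classify the answer."""
    cond = conditional_headers(first)
    if not cond:
        return "no validators"
    req = urllib.request.Request(url, headers=cond)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        # urllib treats 304 Not Modified as an error and raises HTTPError.
        return classify(err.code)
```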
Found 85 faulty implementations
|thin 1.6.0 codename Greek Yogurt|
|Apache/2.2.9 (Win32) PHP/5.2.6|
|Apache/2.4.10 (Unix) mod_fcgid/2.3.9|
|Apache/2.2.17 (Unix) mod_wsgi/3.3 Python/2.6.6|
|Virtuoso/07.10.3211 (Linux) i686-generic-linux-glibc212-64 VDB|
|Apache/2.2.24 (Unix) mod_ssl/2.2.24 OpenSSL/0.9.8y|
We can do a statistical test to see if certain headers occur significantly more frequently.
Pearson's χ² test with simulated p-value (based on 10000 replicates)
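A simulated p-value avoids relying on the χ² approximation when some expected counts are small. The sketch below draws random contingency tables with the same row and column totals as the observed one and reports the fraction whose statistic is at least as large, using the same (1 + hits) / (B + 1) formula and tie tolerance as R's `chisq.test(x, simulate.p.value = TRUE, B = 10000)`. Whether the survey used R is not stated here; R draws its tables with Patefield's algorithm, whereas this sketch shuffles labels, which gives the same null distribution but is slow in pure Python for thousands of observations. Function names are illustrative.

```python
import random
import sys


def chi2_statistic(table):
    """Pearson's chi-squared statistic for an r x c table of counts.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def simulated_p_value(table, replicates=10000, seed=None):
    """Monte Carlo p-value with both margins fixed.

    Random tables are drawn by shuffling column labels against fixed row
    labels, which keeps row and column totals unchanged.
    """
    rng = random.Random(seed)
    observed = chi2_statistic(table)
    # One (row, column) label pair per observation.
    row_labels = [i for i, row in enumerate(table)
                  for count in row for _ in range(count)]
    col_labels = [j for row in table
                  for j, count in enumerate(row) for _ in range(count)]
    # Same tolerance R uses so that ties are not lost to rounding.
    threshold = observed * (1 - 64 * sys.float_info.epsilon)
    n_cols = len(table[0])
    hits = 0
    for _ in range(replicates):
        rng.shuffle(col_labels)
        sim = [[0] * n_cols for _ in table]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        if chi2_statistic(sim) >= threshold:
            hits += 1
    return observed, (hits + 1) / (replicates + 1)
```

For a table of made-up counts such as `[[30, 10], [15, 25]]` (rows: two groups of responses, columns: header present or absent), `simulated_p_value(table, seed=1)` returns the statistic and its simulated p-value.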