Ian Stirling | 2 Sep 13:48

Not quite wwwoffle - for mobiles and low bandwidth use.

I've used wwwoffle for some years, and it 'just works' for me.

So, naturally, when I realised a need for a possibly related bit of
software and got no results from Google, I started wondering if anyone
had thought of, or knows of, an implementation or proof of concept with
wwwoffled.

Basically, it's a two part web proxy to drastically reduce web usage 
bandwidth.
One part resides on a mobile device with a (usually) poor bandwidth
link, but a relatively large amount of storage, that may occasionally be
plugged into a high speed network.

The other part is on a server, connected via a fast connection to the
internet.

To quote a page I wrote describing this:

-------------------------------------

"This is a brief page describing a web proxy optimised for use on 
devices with a reasonable amount of persistent storage, and very limited
bandwidth.

Once, each page linked to a subpage of contents, which remained static, 
and could be easily refreshed if it changed based on dates in the HTTP 
headers.

Now, this is the case in the minority of popular sites. Most sites now 
have a substantial fraction of pages with some non-static content.

As an example of this, consider http://www.ebay.com/index.html.

Over a 15 minute period, the size stayed constant at around 66K, but the
content was different most times it was loaded.

Simply compressing this page using advanced compression techniques 
provides a useful compression - taking the page to 15K.

A very simple test using diff and gzip, however, revealed that the
variation between pages is quite small.

This means that if the user clicks 'reload' and the proxy simply
compresses the page, the user needs to download 15K.

If, however, the user-agent and the proxy act in concert, this can be 
reduced to under 0.5K. (split on "<", count the compressed differences).

This is done by the user-agent caching the pages it downloads, then 
informing the proxy of which version of the page it has.

The proxy then simply sends the compressed differences between the 
previous and current version.

Other optimisations:

     * Comparing pages, and ensuring that any page has in fact changed
before downloading, as many servers misreport pages as changed when they
have not.
     * Convert all jpegs to progressive, and initially only download the 
first 'scan' of the image, which is 1/8th the size or so. Allow the user 
to download the remainder of the file for full resolution by clicking on 
it. "

-------------------------
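
To make the 'split on "<", count the compressed differences' test from
the quoted page concrete, here is a rough Python sketch of how I'd expect
such a measurement to look. The function names (tokens, make_delta,
apply_delta, compressed_size) and the copy/insert delta format are my own
inventions for illustration, and zlib stands in for whatever compressor
ends up being used.

```python
import difflib
import json
import zlib


def tokens(html):
    """Split on "<" but keep the delimiter, so "".join(tokens(html)) == html."""
    parts = html.split("<")
    return [parts[0]] + ["<" + p for p in parts[1:]]


def make_delta(old, new):
    """Describe `new` as copies of token ranges from `old` plus literal text."""
    a, b = tokens(old), tokens(new)
    ops = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(["copy", i1, i2])
        elif tag in ("replace", "insert"):
            ops.append(["insert", "".join(b[j1:j2])])
        # "delete" needs no op: those old tokens are simply never copied
    return ops


def apply_delta(old, ops):
    """Rebuild the new page from the cached old page and the delta."""
    a = tokens(old)
    out = []
    for op in ops:
        if op[0] == "copy":
            out.append("".join(a[op[1]:op[2]]))
        else:
            out.append(op[1])
    return "".join(out)


def compressed_size(data):
    """Size in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))
```

Comparing compressed_size(new_page.encode()) with
compressed_size(json.dumps(make_delta(old_page, new_page)).encode())
for two fetches of the same URL gives the "whole page" versus
"differences only" figures.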

As I understand it, the easiest way to implement this (over wwwoffle) 
would be with a special protocol.

Basically, the user-side part sends:
* I want http://www.ebay.com/ and have the version timestamped
1188733136, with the hash f879f6ff876f8f...

The server side sends:
* Here is the page content <compressed page>, you must be confused, I 
don't have that timestamp.

Or
* That page has not changed.

Or
* Here is the diff between the timestamped version and the page you 
requested <compressed diff> the hash of the whole page is <sha-256>.

User-side then applies the diff to its local copy and checks the hash of
the result. If it matches, it stores the page and serves it to the local
browser.

If it doesn't match, it requests the page without compression.
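
A minimal sketch of that exchange in Python, to show the three possible
replies and the hash check. Everything here is assumed for illustration:
the message field names, JSON-plus-zlib as the wire format, and the
server keeping a per-URL history keyed by timestamp. The delta is a
deliberately crude stand-in (shared prefix and suffix, send the middle),
not a real diff.

```python
import hashlib
import json
import zlib


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_delta(old, new):
    # Stand-in delta: keep the shared prefix and suffix, send only the middle.
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and old[-1 - s] == new[-1 - s]:
        s += 1
    return {"prefix": p, "suffix": s, "middle": new[p:len(new) - s]}


def apply_delta(old, delta):
    tail = old[len(old) - delta["suffix"]:] if delta["suffix"] else ""
    return old[:delta["prefix"]] + delta["middle"] + tail


def pack(message):
    """Hypothetical wire format: JSON, then zlib."""
    return zlib.compress(json.dumps(message).encode("utf-8"), 9)


def unpack(blob):
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def server_reply(request, history, current):
    """`history` maps timestamp -> page text previously sent for this URL."""
    old = history.get(request["timestamp"])
    if old is None or sha256_hex(old) != request["hash"]:
        # "You must be confused, I don't have that timestamp."
        return {"type": "full", "page": current}
    if old == current:
        return {"type": "unchanged"}
    return {"type": "diff", "delta": make_delta(old, current),
            "hash": sha256_hex(current)}


def client_handle(reply, cached):
    """Return the page to serve, or None if it must be fetched again in full."""
    if reply["type"] == "full":
        return reply["page"]
    if reply["type"] == "unchanged":
        return cached
    page = apply_delta(cached, reply["delta"])
    return page if sha256_hex(page) == reply["hash"] else None
```

The None return is the "doesn't match" case above: the user side would
throw the result away and ask for the page again.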

Obviously this is single user only.

To make it multiuser would require also storing pages in the form 
site/D598387453.HASH, and some complex expiry protocol to make sure that 
the same pages get expired per-user on the remote and local side.

Any thoughts?

