1 Aug 2012 16:58

## Statistics Project Update #6

```
I've been making great progress on the network simulator, and the
probes are due to be deployed in the next build. (1409) I'd like to
give another shot at explaining what this project is doing. I'll start
from the basics.

Freenet is a medium for censorship-resistant communication. An
installation of Freenet is called a node. Each node connects with a
limited number of other nodes. When nodes are connected, they are
called peers. Every node communicates with the rest of the network
solely through its peers.

Each node has some amount of storage space reserved for a datastore.
Freenet allows inserting data into and fetching data from the
datastore made up of all the individual nodes' datastores. Freenet can
be thought of as a distributed, encrypted storage space.

Freenet must be able to determine which nodes to store data on, and
later be able to find that data again. The process of finding a piece
of data, or a place to store it, is called routing. Because nodes
connect with a limited number of peers and communicate only with them,
routing is very difficult because it must be done with only locally
available information.

This is where math comes in. In graph theory, there is a type of
network called a small-world network. A small-world network contains
relatively short routes between any two nodes. Some types of
small-world networks are especially interesting (and useful!) because
they allow finding short routes with only locally available
information. [1][2]

Here's the concept: all nodes have a location, unrelated to
geographical location, which is a number in the range [0, 1): 0
(inclusive) to 1 (exclusive). Every request has an inherent ideal
location which it is routed towards. Nodes route requests by giving
them to the peer whose location is closest to that ideal location.
However, in order for this to be effective, the network must have very
specific characteristics.

Locations can be thought of as wrapped around a circle: 0 at one
point, approaching 1 as it goes around, then wrapping back to 0. 0.3
is 0.2 away from 0.5, and 0.9 is 0.2 away from 0.1. This distance
between peers' locations is called the connection's link length. On
average, nodes must have many connections with shorter link lengths,
and a few connections with longer link lengths. One can think of this
as being able to quickly make large leaps on the location circle and
also make small adjustments.

We in the Freenet community have suspected for a long time that
problems with the distribution of link lengths throughout the network
were interfering with routing. A few months ago, we assembled a rough
measurement of the link length distribution in the network, and the
distribution differed from the ideal. [3] In my previous update,
[4][5] I found that simulation results support this theory: compared
to the ideal, a network with the measured link length distribution
appears to have significant routing problems.

Depending on the network security level, a node can connect with
untrusted peers, called strangers. In an attempt to improve link
length distribution, nodes perform something called path folding.
Oskar Sandberg's paper "Searching in a Small World" gives a
theoretical basis. [6] Freenet does path folding like so: when a (CHK)
fetch succeeds, the endpoint can return an offer to connect (its
"opennet node reference") along with the requested data. As the data
travels backwards along the route to return to the node which made the
request, a node along the way can accept the offered connection, and
the two become peers. To protect the anonymity of the endpoint, the
node which accepted the connection removes the offer to connect, and
the next node on the way back can add its own.

I'm currently working on a network simulator. My goal is to be able to
start the simulator on a network with almost any link length
distribution, and by simulating Freenet's path folding get a link
length distribution similar to what we measure in the real network. If
I'm able to do that, I can try to make changes to improve the link
length distribution reached by the path folding, and suggest similar
changes in Freenet's behavior. We now have a more flexible network
simulator, which means we can more easily simulate the effects of
changes in addition to theorizing about them. We will be able to use
the probes to measure the network-wide effects of changes, instead of
just relying on anecdotal reports as we do currently. The probes will
also be useful for suggesting areas in which the approximations made
by the simulator are too far from the conditions of the real network.

I made a mistake in the chart in my previous update. What I labeled
"Found" should be "Found in cache or datastore" and what I labeled
"Not found" should be "Found in datastore." A corrected chart is here: [7]

[1] http://www.cs.cornell.edu/home/kleinber/nat00.pdf
[2] http://www.cs.cornell.edu/home/kleinber/swn.pdf
[3]