There's a very simple concept that most of the world embraces in everything
from supply chain management, to personnel allocations, to personal
We call it *slack*
What Dan is talking about amounts to introducing slack into distributed
models. Particularly this version of the definition of slack:
": lacking in completeness, finish, or perfection "
Which is a more realistic version of computation in a universe with
propagation delay (finite speed of light). But it also introduces a concept
similar to anyone familiar with ropes. You can't tie a knot without some
slack. (computation being an exercise in binary sequence knot making).
Finishing a computation is analogous to pulling the rope taunt.
-=-=- [email protected] -=-=-
On Apr 13, 2012, at 5:53 AM, Eugen Leitl
> Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
> Tuesday, March 6, 2012 at 9:15AM
> The argument for a massively multicore future is now familiar: while
> speeds have leveled off, device density is increasing, so the future is
> chips with hundreds and thousands of cores. That’s the inexorable logic
> behind our multicore future.
> The unsolved question that lurks deep in the dark part of a
> is: how on earth are we to program these things? For problems that
> embarrassingly parallel, we really have no idea. IBM Research’s David
> has an idea. And it’s radical in the extreme...
> Grace Hopper once advised “It's easier to ask for forgiveness than it
> get permission.” I wonder if she had any idea that her strategy for
> with human bureaucracy would the same strategy David Ungar thinks will
> us tame the technological bureaucracy of 1000+ core systems?
> You may recognize David as the co-creator of the Self programming
> inspiration for the HotSpot technology in the JVM and the prototype model
> techniques to build user interfaces. Now he’s applying that same
> zeal to solving the multicore problem.
> During a talk on his research, Everything You Know (about Parallel
> Programming) Is Wrong! A Wild Screed about the Future, he called his
> “anti-lock or “race and repair” because the core idea is that the
> we’re going to be able to program the new multicore chips of the future
> sidestep Amdhal’s Law and program without serialization, without locks,
> embracing non-determinism. Without locks calculations will obviously be
> wrong, but correct answers can be approached over time using techniques
> A thread that, instead of responding to user requests, repeatedly
> a cached value according to some strategy, and recomputes that value from
> inputs, in case the value had been inconsistent. Experimentation with a
> prototype showed that on a 16-core system with a 50/50 split between
> and fresheners, fewer than 2% of the queries would return an answer that
> been stale for at least eight mean query times. These results suggest
> tolerance of inconsistency can be an effective strategy in circumventing
> Amdahl’s law.
> During his talk David mentioned that he’s trying to find a better name
> “anti-lock or “race and repair” for this line of thinking. Throwing
> into the name game, I want to call it Ask For Forgiveness Programming
> based on the idea that using locks is “asking for permission”
> not using locks along with fresheners is really “asking for
> think it works, but it’s just a thought.
> No Shared Lock Goes Unpunished
> Amdahl’s Law is used to understand why simply having more cores won’t
> for a large class of problems. The idea is that any program is made up of
> serial fraction and a parallel fraction. More cores only helps you with
> parallel portion. If an operation takes 10 seconds, for example, and one
> second of it is serial, then having infinitely many cores will only help
> make the parallelizable part faster, the serial code will always take
> second. Amdahl says you can never go faster than that 10%. As long as
> code has a serial portion it’s impossible to go faster.
> Jakob Engblom recounts a similar line of thought in his blog:
> They also had the opportunity to test their solution [for parallel
> Erlang] on a Tilera 64-core machines. This mercilessly exposed any
> scalability limitations in their system, and proved the conventional
> that going beyond 10+ cores is quite different from scaling from 1 to
> two key lessons they learned was that no shared lock goes unpunished, and
> data has to be distributed as well as code.
> It seems that for all big system parallelization efforts turn into a
> for locks and the splitting up of code and data into units that can run
> parallel without having to synchronize. The “upper limit” of this
> clearly a system with no synchronization points at all.
> Sherlock Holmes says that when you have eliminated the impossible,
> remains, however improbable, must be the truth, so the truth is: removing
> serialization is the only way to use all these cores. Since
> is the core serial component to applications, we must get rid of
> A Transaction View Isn’t as Strong as it Once Was
> Getting rid of locks/synchronization may not be the radical notion it
> was. Developments over the last few years have conditioned us to deal
> more ambiguity, at least for distributed systems.
> Through many discussions about CAP, we’ve come to accept a
> view of databases as a way to preserve availability in the face of
> partitions. Even if a read doesn’t return the last write, we know it
> eventually and that some merge logic will make them consistent once
> The idea of compensating transactions as a way around the performance
> problems due to distributed coordination has also become more familiar.
> Another related idea comes from the realm of optimistic concurrency
> that lets multiple transactions proceed in parallel without locking and
> checks at commit time if a conflict requires a rollback. The application
> gets to try again.
> And for some time now memcache has supported a compare-and-set operation
> allows multiple clients to avoid writing over each other by comparing
> As relaxed as these methods are, they all still require that old enemy:
> Your Answers are Already Wrong
> The core difficulty with abandoning synchronization is coming to terms
> the notion that results may not be correct at all times. It’s a
> love certainty, so even considering we could get wrong answers for any
> of time is heresy.
> David says we need to change our way of thinking. The idea is to get
> that are “good enough, soon enough.” Get wrong answers quickly, but
> still right enough to be useful. Perfect is the enemy of the good and
> answers simply take too long at scale.
> David emphasises repeatedly that there’s a fundamental trade-off
> correctness and performance. Without locks operations happen out of
> which gives the wrong answer. Race conditions happen. To get correct
> we effectively add in delays, making one thing wait for another, which
> performance, and we don’t want to kill performance, we want to use all
> But consider an aspect of working on distributed systems that people
> like to think about: your answers are already always wrong.
> Unless you are querying a read-only corpus or use a global lock, in
> distributed systems any answer to any query is potentially wrong, always.
> This is the hardest idea to get into your brain when working in
> systems with many cores that experience simultaneous updates. The world
> stops. It’s always in flux. You can never assume for a single moment
> a stable frame of reference. The answer from a query you just made could
> wrong in the next instant. A query to see if a host is up, for example,
> ever be assumed right. That host maybe up or down in the next instant and
> your code won’t know about it until it finds a conflict.
> So are we really that far away from accepting that all answers to queries
> could be wrong?
> Lessons from Nature
> One strategy for dealing with many cores is to move towards biological
> instead of mathematical models, where complicated behaviours emerge
> global determinism. Bird flocks, for example, emerge from three simple
> avoid crowding, steer towards average heading of neighbors, steer
> average position of neighbors. No pi-calculus required, it works without
> synchronization or coordination. Each bird is essentially its own thread,
> simply looks around and makes local decisions. This is a more cellular
> automaton view of the world.
> Race and Repair - Mitigating Wrong Results
> The idea is that errors created by data races won’t be prevented, they
> be repaired. A calculation made without locks will be wrong under
> updates. So why not use some of our many cores to calculate the right
> in the background, in parallel, and update the values? This approach:
> Uses no synchronization.
> Tolerates some wrong answers.
> Probabilistically fixes the answers over time.
> Some obvious questions: how many background threads do you need? What
> should values be recalculated? And how wrong will your answers be?
> To figure this out an experiment was run and described in Inconsistency
> Robustness for Scalability in Interactive Concurrent‑Update In-Memory
> Cubes, which test updates on a complicated spreadsheet.
> With locking the results were correct, but scalability was limited.
> locking, results were usually wrong. Both results are as might be
> And when they added freshener threads they found:
> Combining the frequency with the duration data suggests that in a
> system with a 50/50 split between workers and fresheners, fewer than 2%
> the queries would return an answer that had been stale for at least eight
> mean query times.
> I found this result quite surprising. I would have expected the answers
> wrong more of the time. Could this actually work?
> The simple idea of Race and Repair is open to a lot of clever
> Breadcrumbs are one such innovation that attempts to be smarter about
> values need recalculating. Meta-data is attached on a value indicating
> entity is recalculating the value or has changed a dependent value such
> this value is now out of date. Any entity that might want to use this
> can wait until a “valid” value is calculated and/or not to insert a
> calculated value if it is out of data. This narrows the window of time in
> which errors are introduced. It’s a mitigation.
> There are endless variations of this. I can imagine remembering
> that used a value and then publishing updated values so those
> can be rerun. The result is a roiling event driven sea that is
> bubbling with updates that are trying to bring values towards
> but probably never quite getting there.
> Probabilistic Data Structures
> Another area David has researched are hashtables that can be inserted
> without synchronization. This would allow entries to be added to a
> from any number of cores without slowing down the entire system with a
> Naive insertion into a hashtable in parallel will mess up pointers, which
> either result in the loss of values or the insertion of values, but
> possible to work around these issues and create a lock free hashtable.
> His presentation goes into a lot of detail on how this might work. He
> light-weight locking schmes like CAS because these are still a choke
> and there’s a penalty for atomic instructions under contention. They
> He thinks there’s a big research opportunity in probabilistic data
> that work without synchronization and that work with mitigation.
> Is this the Future?
> This is just a light weight introduction. For more details please read
> the papers and watch all the videos. But I think it’s important to talk
> and think about how we might make use of all these cores for traditional
> programming tasks, though the result may be anything but traditional.
> A bad experience on one project makes me somewhat skeptical that human
> will ever be comfortable in accepting wrong answers as the norm. My
> experience report was on an event driven system with a large number of
> that could generate events so fast that events had to be dropped. Imagine
> city undergoing an earthquake and a huge sensor net spewing out change
> to tell the world what’s going on in real-time.
> The original requirement was events could never be dropped, ever, which
> a certain amount of sense when the number of sensors was small. As the
> of sensors expands it’s simply not possible. My proposal was to drop
> and have background processes query the actual sensors in the background
> that the database would be synced over time. Very much like the proposals
> It was a no go. A vehement no go. Titanic arguments ensued. Management
> everyone on down simply could not accept the idea that their models would
> out of sync with the sensors (which of course they were anyway). My
> solution took a year to implement and radically changed everything, but
> was simpler than trying to convince people to deal with uncertainty.
> So there might be some convincing to do.
> Related Articles
> Everything You Know (About Parallel Programming) Is Wrong!: A Wild
> About the Future (slides, video)
> Renaissance Project
> On Hacker News
> David Ungar: It is Good to be Wrong
> We Really Don't Know How To Compute!
> Harnessing emergence for manycore programming: early experience
> integrating ensembles, adverbs, and object-based inheritance
> Inconsistency Robustness for Scalability in Interactive
> In-Memory MOLAP Cubes
> fonc mailing list
> [email protected]