From: David Goehrig <dave-VzFlwLIqKc6rG/iDocfnWg <at> public.gmane.org>
Subject: Re: Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
Newsgroups: gmane.comp.lang.smalltalk.fonc
Date: Friday 13th April 2012 15:34:08 UTC
There's a very simple concept that most of the world embraces in everything
from supply chain management, to personnel allocations, to personal
relationships. 

We call it *slack*

What Dan is talking about amounts to introducing slack into distributed
models.  Particularly this version of the definition of slack:

": lacking in completeness, finish, or perfection "

Which is a more realistic version of computation in a universe with
propagation delay (finite speed of light). But it also introduces a concept
familiar to anyone who has worked with ropes: you can't tie a knot without
some slack (computation being an exercise in binary sequence knot making).
Finishing a computation is analogous to pulling the rope taut.

Dave





-=-=- [email protected] -=-=-

On Apr 13, 2012, at 5:53 AM, Eugen Leitl wrote:

> 
> http://highscalability.com/blog/2012/3/6/ask-for-forgiveness-programming-or-how-well-program-1000-cor.html
> 
> Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
> 
> Tuesday, March 6, 2012 at 9:15AM
> 
> The argument for a massively multicore future is now familiar: while
> clock speeds have leveled off, device density is increasing, so the
> future is cheap chips with hundreds and thousands of cores. That’s the
> inexorable logic behind our multicore future.
> 
> The unsolved question that lurks deep in the dark part of a programmer’s
> mind is: how on earth are we to program these things? For problems that
> aren’t embarrassingly parallel, we really have no idea. IBM Research’s
> David Ungar has an idea. And it’s radical in the extreme...
> 
> Grace Hopper once advised “It's easier to ask for forgiveness than it is
> to get permission.” I wonder if she had any idea that her strategy for
> dealing with human bureaucracy would be the same strategy David Ungar
> thinks will help us tame the technological bureaucracy of 1000+ core
> systems?
> 
> You may recognize David as the co-creator of the Self programming
> language, inspiration for the HotSpot technology in the JVM and the
> prototype model used by Javascript. He’s also the innovator behind using
> cartoon animation techniques to build user interfaces. Now he’s applying
> that same creative zeal to solving the multicore problem.
> 
> During a talk on his research, Everything You Know (about Parallel
> Programming) Is Wrong! A Wild Screed about the Future, he called his
> approach “anti-lock” or “race and repair” because the core idea is that
> the only way we’re going to be able to program the new multicore chips of
> the future is to sidestep Amdahl’s Law and program without serialization,
> without locks, embracing non-determinism. Without locks calculations will
> obviously be wrong, but correct answers can be approached over time using
> techniques like fresheners:
> 
>    A thread that, instead of responding to user requests, repeatedly
> selects a cached value according to some strategy, and recomputes that
> value from its inputs, in case the value had been inconsistent.
> Experimentation with a prototype showed that on a 16-core system with a
> 50/50 split between workers and fresheners, fewer than 2% of the queries
> would return an answer that had been stale for at least eight mean query
> times. These results suggest that tolerance of inconsistency can be an
> effective strategy in circumventing Amdahl’s law.
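> 
> To make the freshener idea concrete, here is a minimal sketch in Python;
> the shared dictionaries, the recompute function, and the thread counts are
> illustrative, not taken from the paper:
> 
>     import threading, time, random
> 
>     # Illustrative shared state: a derived value cached without any locking.
>     inputs = {"a": 1, "b": 2}
>     cache = {"sum": 3}            # may go stale when inputs change
> 
>     def recompute_sum():
>         return inputs["a"] + inputs["b"]
> 
>     def worker():
>         # Serves "queries" and applies updates without ever taking a lock;
>         # it reads the cached value as-is, stale or not.
>         for _ in range(1000):
>             inputs[random.choice("ab")] = random.randint(0, 9)  # racy update
>             _ = cache["sum"]                                    # possibly stale read
> 
>     def freshener():
>         # Repeatedly picks a cached value and recomputes it from its inputs,
>         # repairing staleness probabilistically rather than preventing it.
>         for _ in range(1000):
>             cache["sum"] = recompute_sum()
>             time.sleep(0)  # yield so workers and fresheners interleave
> 
>     threads = [threading.Thread(target=worker) for _ in range(2)] + \
>               [threading.Thread(target=freshener) for _ in range(2)]
>     for t in threads: t.start()
>     for t in threads: t.join()
>     print("final cached sum:", cache["sum"], "true sum:", recompute_sum())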
> 
> During his talk David mentioned that he’s trying to find a better name
> than “anti-lock” or “race and repair” for this line of thinking. Throwing
> my hat into the name game, I want to call it Ask For Forgiveness
> Programming (AFFP), based on the idea that using locks is “asking for
> permission” programming, so not using locks along with fresheners is
> really “asking for forgiveness.” I think it works, but it’s just a
> thought.
> 
> No Shared Lock Goes Unpunished
> 
> Amdahl’s Law is used to understand why simply having more cores won’t
> save us for a large class of problems. The idea is that any program is
> made up of a serial fraction and a parallel fraction. More cores only
> help you with the parallel portion. If an operation takes 10 seconds,
> for example, and one second of it is serial, then having infinitely many
> cores will only help you make the parallelizable part faster; the serial
> code will always take one second. Amdahl says your runtime can never drop
> below that one second, the 10% that is serial. As long as your code has a
> serial portion it’s impossible to go faster than that.
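> 
> The arithmetic behind that limit is easy to check; a quick sketch of
> Amdahl’s formula in Python:
> 
>     # Amdahl's Law: speedup(N) = 1 / (s + (1 - s) / N), s = serial fraction.
>     def speedup(serial_fraction, cores):
>         return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)
> 
>     s = 0.1  # 1 second of a 10-second job is serial
>     for n in (1, 8, 64, 1000, 1_000_000):
>         print(f"{n:>9} cores: {speedup(s, n):5.2f}x speedup")
>     # Even a million cores only creep toward, and never reach, a 10x speedup,
>     # because the one-second serial part never shrinks.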
> 
> Jakob Engblom recounts a similar line of thought in his blog:
> 
>    They also had the opportunity to test their solution [for parallel
> Erlang] on a Tilera 64-core machine. This mercilessly exposed any
> scalability limitations in their system, and proved the conventional
> wisdom that going beyond 10+ cores is quite different from scaling from 1
> to 8… The two key lessons they learned were that no shared lock goes
> unpunished, and data has to be distributed as well as code.
> 
>    It seems that all big system parallelization efforts turn into a hunt
> for locks and the splitting up of code and data into units that can run
> in parallel without having to synchronize. The “upper limit” of this
> process is clearly a system with no synchronization points at all.
> 
> Sherlock Holmes says that when you have eliminated the impossible,
> whatever remains, however improbable, must be the truth, so the truth is:
> removing serialization is the only way to use all these cores. Since
> synchronization is the core serial component of applications, we must get
> rid of synchronization.
> 
> A Transaction View Isn’t as Strong as it Once Was
> 
> Getting rid of locks/synchronization may not be the radical notion it
> once was. Developments over the last few years have conditioned us to
> deal with more ambiguity, at least for distributed systems.
> 
> Through many discussions about CAP, we’ve come to accept a
> non-transactional view of databases as a way to preserve availability in
> the face of partitions. Even if a read doesn’t return the last write, we
> know it will eventually and that some merge logic will make them
> consistent once again.
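> 
> One simple form of such merge logic is last-writer-wins on a per-key
> timestamp; a sketch, with the replica layout assumed for illustration:
> 
>     # Each replica stores {key: (value, timestamp)}; merging keeps the newer
>     # write for every key, so replicas converge once they exchange state.
>     def merge(replica_a, replica_b):
>         merged = dict(replica_a)
>         for key, (value, ts) in replica_b.items():
>             if key not in merged or ts > merged[key][1]:
>                 merged[key] = (value, ts)
>         return merged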
> 
> The idea of compensating transactions as a way around the performance
> problems due to distributed coordination has also become more familiar.
> 
> Another related idea comes from the realm of optimistic concurrency
> control that lets multiple transactions proceed in parallel without
> locking and then checks at commit time if a conflict requires a rollback.
> The application then gets to try again.
> 
> And for some time now memcache has supported a compare-and-set operation
> that allows multiple clients to avoid writing over each other by
> comparing time stamps.
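> 
> The general shape of that compare-and-set pattern looks roughly like the
> following; the store object and its gets/cas methods are a hypothetical
> stand-in, not the actual memcache client API:
> 
>     # gets() returns (value, version); cas() writes only if the version is
>     # still the one we read, i.e. nobody else wrote in between.
>     def add_to_counter(store, key, amount, max_retries=10):
>         for _ in range(max_retries):
>             value, version = store.gets(key)
>             if store.cas(key, value + amount, version):
>                 return True                 # our write won
>             # someone else updated the key first; re-read and try again
>         return False                        # give up after repeated conflicts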
> 
> As relaxed as these methods are, they all still require that old enemy:
> synchronization.
> 
> Your Answers are Already Wrong
> 
> The core difficulty with abandoning synchronization is coming to terms
> with the notion that results may not be correct at all times. It’s a
> certainty that we love certainty, so even considering that we could get
> wrong answers for any window of time is heresy.
> 
> David says we need to change our way of thinking. The idea is to get
> results that are “good enough, soon enough.” Get answers quickly that may
> be wrong, but are still right enough to be useful. Perfect is the enemy
> of the good, and perfect answers simply take too long at scale.
> 
> David emphasises repeatedly that there’s a fundamental trade-off between
> correctness and performance. Without locks operations happen out of
> order, which gives the wrong answer. Race conditions happen. To get
> correct answers we effectively add in delays, making one thing wait for
> another, which kills performance, and we don’t want to kill performance,
> we want to use all those cores.
> 
> But consider an aspect of working on distributed systems that people
> don’t like to think about: your answers are already always wrong.
> 
> Unless you are querying a read-only corpus or use a global lock, in
> distributed systems any answer to any query is potentially wrong, always.
> This is the hardest idea to get into your brain when working in
> distributed systems with many cores that experience simultaneous updates.
> The world never stops. It’s always in flux. You can never assume for a
> single moment there’s a stable frame of reference. The answer from a
> query you just made could be wrong in the next instant. A query to see if
> a host is up, for example, can’t ever be assumed right. That host may be
> up or down in the next instant and your code won’t know about it until it
> finds a conflict.
> 
> So are we really that far away from accepting that all answers to queries
> could be wrong?
> 
> Lessons from Nature
> 
> One strategy for dealing with many cores is to move towards biological
> models instead of mathematical models, where complicated behaviours
> emerge without global determinism. Bird flocks, for example, emerge from
> three simple rules: avoid crowding, steer towards the average heading of
> neighbors, and steer towards the average position of neighbors. No
> pi-calculus required; it works without synchronization or coordination.
> Each bird is essentially its own thread; it simply looks around and makes
> local decisions. This is a more cellular automaton view of the world.
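> 
> The three rules are small enough to write down; a toy, single-threaded
> Python step function with made-up weights and a fixed neighborhood radius:
> 
>     # Toy boids step: each bird looks only at nearby birds and nudges its
>     # own velocity; there is no global coordinator and no shared lock.
>     def step(birds, radius=5.0, crowd=1.0):
>         new_states = []
>         for x, y, vx, vy in birds:
>             near = [b for b in birds
>                     if 0 < (b[0] - x) ** 2 + (b[1] - y) ** 2 < radius ** 2]
>             if near:
>                 cx = sum(b[0] for b in near) / len(near)   # average position
>                 cy = sum(b[1] for b in near) / len(near)
>                 avx = sum(b[2] for b in near) / len(near)  # average heading
>                 avy = sum(b[3] for b in near) / len(near)
>                 vx += 0.05 * (cx - x) + 0.1 * (avx - vx)   # cohesion + alignment
>                 vy += 0.05 * (cy - y) + 0.1 * (avy - vy)
>                 for bx, by, _, _ in near:                  # separation
>                     if (bx - x) ** 2 + (by - y) ** 2 < crowd ** 2:
>                         vx -= 0.1 * (bx - x)
>                         vy -= 0.1 * (by - y)
>             new_states.append((x + vx, y + vy, vx, vy))
>         return new_states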
> 
> Race and Repair - Mitigating Wrong Results
> 
> The idea is that errors created by data races won’t be prevented, they
> will be repaired. A calculation made without locks will be wrong under
> concurrent updates. So why not use some of our many cores to calculate
> the right answers in the background, in parallel, and update the values?
> This approach:
> 
>    Uses no synchronization.
> 
>    Tolerates some wrong answers.
> 
>    Probabilistically fixes the answers over time.
> 
> Some obvious questions: how many background threads do you need? In what
> order should values be recalculated? And how wrong will your answers be?
> 
> To figure this out an experiment was run and described in Inconsistency
> Robustness for Scalability in Interactive Concurrent‑Update In-Memory
> MOLAP Cubes, which tested updates on a complicated spreadsheet.
> 
> With locking the results were correct, but scalability was limited.
> Without locking, results were usually wrong. Both results are as might be
> expected. And when they added freshener threads they found:
> 
>    Combining the frequency with the duration data suggests that in a
> 16-core system with a 50/50 split between workers and fresheners, fewer
> than 2% of the queries would return an answer that had been stale for at
> least eight mean query times.
> 
> I found this result quite surprising. I would have expected the answers
> to be wrong more of the time. Could this actually work?
> 
> Breadcrumbs
> 
> The simple idea of Race and Repair is open to a lot of clever
> innovations. Breadcrumbs are one such innovation that attempts to be
> smarter about which values need recalculating. Meta-data is attached to a
> value indicating that an entity is recalculating the value or has changed
> a dependent value such that this value is now out of date. Any entity
> that might want to use this data can wait until a “valid” value is
> calculated and/or not insert a calculated value if it is out of date.
> This narrows the window of time in which errors are introduced. It’s a
> mitigation.
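> 
> A breadcrumb might look something like this in code; the field names and
> the wait-or-skip policy here are invented for illustration:
> 
>     import time
> 
>     class Cell:
>         """A cached value carrying breadcrumb meta-data instead of a lock."""
>         def __init__(self, value):
>             self.value = value
>             self.version = 0          # bumped whenever an input changes
>             self.recomputing = False  # breadcrumb: this value is being repaired
> 
>         def read(self, wait=False):
>             # A reader may choose to wait out values flagged as suspect.
>             while wait and self.recomputing:
>                 time.sleep(0.001)
>             return self.value
> 
>         def repair(self, recompute):
>             # A freshener leaves a breadcrumb, recomputes, and only installs
>             # the result if no newer change arrived while it was working.
>             self.recomputing = True
>             seen = self.version
>             fresh = recompute()
>             if self.version == seen:  # not superseded while we computed
>                 self.value = fresh
>             self.recomputing = False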
> 
> There are endless variations of this. I can imagine remembering
> calculations that used a value and then publishing updated values so
> those calculations can be rerun. The result is a roiling event-driven sea
> that is constantly bubbling with updates that are trying to bring values
> towards correctness, but probably never quite getting there.
> 
> Probabilistic Data Structures
> 
> Another area David has researched is hashtables that can be inserted into
> without synchronization. This would allow entries to be added to a
> hashtable from any number of cores without slowing down the entire system
> with a lock. Naive insertion into a hashtable in parallel will mess up
> pointers, which can either result in the loss of values or the insertion
> of values, but it’s possible to work around these issues and create a
> lock-free hashtable.
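> 
> The failure mode of the naive version is easy to reproduce: two threads
> doing an unsynchronized read-modify-write of the same bucket will silently
> drop entries. A deliberately crude sketch, nothing like David’s actual
> design:
> 
>     import threading, time
> 
>     buckets = [[] for _ in range(8)]  # crude separate-chaining hashtable
> 
>     def insert(key):
>         b = hash(key) % len(buckets)
>         chain = list(buckets[b])      # read the bucket
>         chain.append(key)             # modify a private copy
>         time.sleep(0)                 # widen the race window
>         buckets[b] = chain            # write back, possibly clobbering a racer
> 
>     def insert_many(keys):
>         for k in keys:
>             insert(k)
> 
>     t1 = threading.Thread(target=insert_many, args=(range(0, 1000),))
>     t2 = threading.Thread(target=insert_many, args=(range(1000, 2000),))
>     t1.start(); t2.start(); t1.join(); t2.join()
>     stored = sum(len(b) for b in buckets)
>     print("inserted 2000 keys, table holds", stored)  # usually fewer than 2000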
> 
> His presentation goes into a lot of detail on how this might work. He
> rejects light-weight locking schemes like CAS because these are still a
> choke point and there’s a penalty for atomic instructions under
> contention. They won’t scale.
> 
> He thinks there’s a big research opportunity in probabilistic data
> structures that work without synchronization and that work with
> mitigation.
> 
> Is this the Future?
> 
> This is just a lightweight introduction. For more details please read
> all the papers and watch all the videos. But I think it’s important to
> talk about and think about how we might make use of all these cores for
> traditional programming tasks, though the result may be anything but
> traditional.
> 
> A bad experience on one project makes me somewhat skeptical that human
> nature will ever be comfortable in accepting wrong answers as the norm.
> My experience report was on an event-driven system with a large number of
> nodes that could generate events so fast that events had to be dropped.
> Imagine a city undergoing an earthquake and a huge sensor net spewing out
> change events to tell the world what’s going on in real-time.
> 
> The original requirement was that events could never be dropped, ever,
> which made a certain amount of sense when the number of sensors was
> small. As the number of sensors expands it’s simply not possible. My
> proposal was to drop events and have background processes query the
> actual sensors so that the database would be synced over time. Very much
> like the proposals here.
> 
> It was a no go. A vehement no go. Titanic arguments ensued. Management
> and everyone on down simply could not accept the idea that their models
> would be out of sync with the sensors (which of course they were anyway).
> My eventual solution took a year to implement and radically changed
> everything, but that was simpler than trying to convince people to deal
> with uncertainty.
> 
> So there might be some convincing to do.
> 
> Related Articles
> 
>    Everything You Know (About Parallel Programming) Is Wrong!: A Wild
> Screed About the Future (slides, video)
> 
>    Renaissance Project
> 
>    On Hacker News
> 
>    David Ungar: It is Good to be Wrong
> 
>    We Really Don't Know How To Compute!
> 
>    Harnessing emergence for manycore programming: early experience
> integrating ensembles, adverbs, and object-based inheritance
> 
>    Inconsistency Robustness for Scalability in Interactive
> Concurrent‑Update In-Memory MOLAP Cubes
> 
> 
> _______________________________________________
> fonc mailing list
> [email protected]
> http://vpri.org/mailman/listinfo/fonc
 