Dmitriy Lyubimov | 1 Sep 01:06 2011

Re: Singular vectors of a recommendation Item-Item space

We usually denote the inverse A^{-1} or just A^-1.

An apostrophe, a superscript star, or ^{top} always means transpose. I
have never seen an apostrophe used for an inverse.

Perhaps the confusion stems from the fact that the inverse equals the
transpose when the matrix is orthogonal (i.e. its columns are
orthonormal), so sometimes Q^{-1} \equiv Q'
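
A quick way to see that last point in R (just a toy check, not related
to any code later in this thread):

Q <- qr.Q(qr(matrix(rnorm(9), 3, 3)))  # a random 3x3 orthogonal matrix
max(abs(solve(Q) - t(Q)))              # ~ 0, i.e. Q^{-1} coincides with Q'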

On Wed, Aug 31, 2011 at 3:57 PM, Lance Norskog <goksron <at> gmail.com> wrote:
> In this text-only notation, I thought apostrophe meant inverse. What then is
> matrix inversion?
>
> I see a fair amount of stuff here in what I think is MathML, but it
> displays raw in Gmail.
>
> On Wed, Aug 31, 2011 at 8:04 AM, Ted Dunning <ted.dunning <at> gmail.com> wrote:
>
>> Uhh...
>>
>> A' is the transpose of A.  Not the inverse.
>>
>> A' A *is* the summation version.
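>>
>> A toy example in R of what I mean by that (the numbers are made up):
>>
>> A <- matrix(c(1,0,1, 1,1,0, 0,1,1), nrow = 3, byrow = TRUE)  # users x items
>> C <- t(A) %*% A                      # same as crossprod(A)
>> # C[i,j] = sum_k A[k,i] * A[k,j], i.e. how many users have both item i
>> # and item j -- exactly the summation-based cooccurrence definition.
>> C[1, 2] == sum(A[, 1] * A[, 2])      # TRUE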
>>
>> On Wed, Aug 31, 2011 at 1:24 AM, Lance Norskog <goksron <at> gmail.com> wrote:
>>
>> > "Also, if your original matrix is A, then it is usually a shorter path to
>> > results to analyze the word (item) cooccurrence matrix A'A.  The methods
>> > below work either way."
>> >
>> > The cooccurrence definitions I'm finding only use the summation-based one
>> > in
>> > wikipedia. Are there any involving inverting the matrix instead? Or, as I
>> > suspect, are the two exactly the same but I don't know enough linear
>> > algebra?
>> >
>> > On Mon, Aug 29, 2011 at 3:28 PM, Ted Dunning <ted.dunning <at> gmail.com>
>> > wrote:
>> >
>> > > Jeff,
>> > >
>> > > I think that this is a much simpler exposition:
>> > > http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
>> > >
>> > > It makes the connection with entropy clear and allows a very simple
>> > > implementation for more than 2x2 situations.
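>> > >
>> > > Roughly the entropy formulation from that post, sketched in R from
>> > > memory (so double-check it before relying on it); k is a matrix of
>> > > counts, 2x2 or larger:
>> > >
>> > > H <- function(k) {
>> > >   p <- k / sum(k)
>> > >   -sum(p[p > 0] * log(p[p > 0]))     # Shannon entropy of the counts
>> > > }
>> > > llr_k <- function(k) {
>> > >   # G^2 = 2 * N * (H(row margins) + H(column margins) - H(joint))
>> > >   2 * sum(k) * (H(rowSums(k)) + H(colSums(k)) - H(k))
>> > > }
>> > > llr_k(matrix(c(10, 5, 4, 1000), nrow = 2))   # toy 2x2 counts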
>> > >
>> > > More comments in-line:
>> > >
>> > > On Mon, Aug 29, 2011 at 1:34 PM, Jeff Hansen <dscheffy <at> gmail.com>
>> wrote:
>> > >
>> > > > ...
>> > > > If we have a item/feature matrix (document/words, user/movies) then
>> we
>> > > can
>> > > > consider each row or column to be a sample from larger reference
>> > > population
>> > > > of the full matrix.
>> > >
>> > >
>> > > Well, sort of.  If you have a document (user) and a word (item), then
>> you
>> > > have a joint probability that any given interaction will be between
>> this
>> > > document and word.  We pretend in this case that each interaction is
>> > > independent of every other which is patently not true, but very
>> helpful.
>> > >
>> > > This joint probability can be approximated more or less accurately as
>> the
>> > > product of document (user) and word (item) popularities.  In matrix
>> > terms,
>> > > this is the same as approximating the joint probability as a rank-1
>> > matrix
>> > > that is the outer product of user and item probability distributions,
>> > each
>> > > of which is a vector.  The point of what we are trying to do is to find
>> a
>> > > good and economical approximation of the joint probability that
>> captures
>> > > important deviations from this rank-1 approximation.  There are two
>> > > problems
>> > > that come up in trying to find this approximation.  The first is that
>> we
>> > > see
>> > > counts rather than probabilities and the second is that the matrix is
>> > big.
>> > >  Sometimes, it is very big.
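>> > >
>> > > A rough sketch of that rank-1 picture in R (toy counts, just to make
>> > > the idea concrete):
>> > >
>> > > A <- matrix(rpois(20, 1), nrow = 5)     # 5 users x 4 items of counts
>> > > p_user <- rowSums(A) / sum(A)           # user popularity
>> > > p_item <- colSums(A) / sum(A)           # item popularity
>> > > E <- sum(A) * outer(p_user, p_item)     # rank-1 expected counts
>> > > A - E                                   # the deviations we care about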
>> > >
>> > >
>> > > > In that case the log likelihood significance for any
>> > > > given observation within the sample will be based on the observation
>> > > itself
>> > > > (1), the count of observations for the column (documents with this
>> > > > word/users that watched this movie), the count of observations for
>> the
>> > > row
>> > > > (words in this document, movies this user has watched) and the total
>> > > number
>> > > > of any observation within the reference population.  Given those
>> > numbers,
>> > > > G^2
>> > > > or log likelihood is just a calculation of a, b, c and d (values
>> > > described
>> > > > above respectively)
>> > > >
>> > >
>> > > As the blog I mentioned above makes clearer than the original paper, you
>> > > need
>> > > to reduce the counts before taking them as entries in the contingency
>> > > table.
>> > >  If you are looking at the typical problem of vocabulary in documents,
>> > then
>> > > the discrepancy won't be huge, but if you have any very popular items,
>> > this
>> > > could be a problem.
>> > >
>> > > Given the log likelihood for each individual observation, is the best
>> way
>> > > to
>> > > > apply it simply to remove observations that don't have a high enough
>> > > level
>> > > > of significance?
>> > >
>> > >
>> > > Roughly.  It is a bit misleading to talk about "significance" here
>> since
>> > we
>> > > aren't trying to do a classic frequentist hypothesis test.  Instead,
>> what
>> > > we
>> > > are trying to do is prioritize which joint terms we need to use to get
>> a
>> > > usefully better approximation of the underlying joint distribution.
>> > >
>> > >
>> > > > Or is there a better way to normalize the values rather
>> > > > than removing less significant ones.
>> > >
>> > >
>> > > There are a lot of ways of approaching this.  Most don't do any better
>> > than
>> > > the simple expedient of just keeping the highest scoring k words for
>> each
>> > > document (or items for each user).  Having k be the same for all users
>> is
>> > > not a bad thing at all from a number of angles.  I often use k = 50 to
>> > 300.
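>> > >
>> > > A hedged sketch of the "keep the top k per row" idea (S here stands
>> > > for a users x items matrix of LLR scores -- a placeholder of mine):
>> > >
>> > > keep_topk <- function(scores, k) {
>> > >   # zero out everything except the k largest scores in a row
>> > >   cutoff <- sort(scores, decreasing = TRUE)[min(k, length(scores))]
>> > >   ifelse(scores >= cutoff, scores, 0)
>> > > }
>> > > B <- t(apply(S, 1, keep_topk, k = 50))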
>> > >
>> > > Also, in this case I feel like it makes more sense to treat Users like
>> > > > Documents and movies like observations of words within a document, so
>> > > > compare each User and their observations to
>> > > > the reference population rather than each Movie and the users that
>> > > reviewed
>> > > > it.
>> > >
>> > >
>> > > I think that you are on the right track here, but your second sentence
>> > made
>> > > a distinction I didn't understand.  User is to item as Document is to
>> > Word
>> > > is a fine metaphor for recording your observations. I strongly prefer
>> > > binary
>> > > observations, so I usually either count all ratings as 1 or all ratings
>> > > above a threshold as 1 with everything else being a zero.
>> > >
>> > > Whether you subsequently try to find users similar to each other or
>> items
>> > > similar to each other doesn't much matter.  All are reasonable things
>> to
>> > > try.  Whether they are useful is up to you and your customer to decide.
>> > >
>> > > Also, if your original matrix is A, then it is usually a shorter path
>> to
>> > > results to analyze the word (item) cooccurrence matrix A'A.  The
>> methods
>> > > below work either way.
>> > >
>> > >
>> > > > I realize this is a Mahout user list, so I feel somewhat guilty
>> > > constantly
>> > > > pasting in R code, but here's how I ended up implementing it.
>> > >
>> > >
>> > > I do it.  Use the right tools for the right task.  R is great for
>> > > prototyping.
>> > >
>> > >
>> > > > #rough equation that calculates log likelihood given a contingency
>> > > > #table of counts
>> > > > llr <- function(a,b,c,d){
>> > > >    r=(a+b)/(c+d)
>> > > >    e1=c*r
>> > > >    e2=d*r
>> > > >    g2=2*(a*log(a/e1)+b*log(b/e2))
>> > > >    return(g2)
>> > > > }
>> > > >
>> > >
>> > > So this code implies that you are arranging the elements of the
>> > contingency
>> > > table this way:
>> > >
>> > >  a    b  | a+b
>> > >  c    d  | c+d
>> > > ---------+--------
>> > > a+c  b+d | a+b+c+d
>> > >
>> > > But in your later code you pass in data that uses this convention:
>> > >
>> > >  a   b-a   | b
>> > >  c   d-b-c | d-b
>> > > -----------+-----
>> > > a+c  d-a-c | d
>> > >
>> > > I don't like this form because it makes code more complex overall and
>> > > breaks
>> > > important symmetries.
>> > >
>> > > #get counts to calculate log likelihood for function above
>> > > > a <- 1
>> > > > b <- apply(A,2,sum)
>> > >
>> > > c <- apply(A,1,sum)
>> > > >
>> > >
>> > > you can also use rowSums and colSums
>> > >
>> > > d <- sum(A)
>> > > >
>> > > > #export sparse matrix A to a dataframe for calculating llr
>> > > > triples <- summary(A)
>> > > > #create b and c value vectors that sync up with the dataframe columns
>> > > > bx <- b[triples$j]
>> > > > cx <- c[triples$i]
>> > > > #calculate the log likelihood for each entry in the dataframe
>> > > > ll <- llr(a,bx,cx,d)
>> > > >
>> > >
>> > > This probably needs to be something more like
>> > >
>> > > llr(a, bx-a, d-cx, d-bx-cx+a)
>> > >
>> > > (I think)
>> > >
>> > > Also, summary produces different values for different kinds of
>> matrices.
>> > >  As
>> > > you point out, it is very handy for sparse matrices, but it is nice to
>> > have
>> > > code that works on sparse or dense matrices.
>> > >
>> > >
>> > > > #create a logical vector that indicates whether each entry in the
>> > > > #dataframe is significant
>> > > > significant <- ll>3
>> > > >
>> > >
>> > > I think that a much higher cutoff is better.  Commonly a cutoff of
>> 10-30
>> > is
>> > > useful.  Also, you may be better off if you can take the k-highest.
>> > >
>> > >
>> > > > #build a new sparse matrix that only keeps entries with a
>> > > > #significant enough log likelihood
>> > > > B <- sparseMatrix(i=triples$i[significant], j=triples$j[significant], x=1)
>> > > >
>> > >
>> > >
>> > >
>> > > >
>> > > > By the way, I always used to struggle with matrix multiplication (I
>> > > > could never quite remember which side was rows and which side was
>> > > > columns without little mnemonics, but then again I've always
>> > > > struggled with remembering my left from my right).  I recently
>> > > > realized it makes much more sense if you picture it as a cube in
>> > > > 3-space -- if you match up the lower left hand corner of the right
>> > > > matrix with the upper right hand corner of the left matrix and put
>> > > > the product in between them so you get a backwards capital L shape,
>> > > > then fold back the right and left matrices and you get three faces
>> > > > of a cube (or rectangular box).  Then you just project inwards from
>> > > > the original matrices, multiply wherever the values cross, and sum
>> > > > up those products as you project them toward the face that is the
>> > > > resulting matrix.  Once you visualize it this way, it makes slicing
>> > > > and dicing the operation into blockwise formulas trivial and
>> > > > obvious.  If only somebody had explained it this way I would have
>> > > > found matrices much less intimidating -- has anybody ever seen it
>> > > > explained this way?
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Aug 26, 2011 at 10:21 PM, Lance Norskog <goksron <at> gmail.com>
>> > > wrote:
>> > > >
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html
>> > > > >
>> > > > > Do not fear demographics. Yes, some people rent movies with
>> all-black
>> > > > > casts,
>> > > > > and other people rent movies with all-white casts. And the Walmarts
>> > in
>> > > > the
>> > > > > SF East Bay have palettes full of Tyler Perry videos, while most of
>> > the
>> > > > SF
>> > > > > Peninsula don't know his name.
>> > > > >
>> > > > > And sometimes a bunch of Army Rangers have a great time watching
>> > > > > "Confessions of a Shopaholic" in a tent, 120o farenheit, in Qatar.
>> > > > >
>> > > > > On Fri, Aug 26, 2011 at 8:29 AM, Jeff Hansen <dscheffy <at> gmail.com>
>> > > wrote:
>> > > > >
>> > > > > > Thanks for the math Ted -- that was very helpful.
>> > > > > >
>> > > > > > I've been using sparseMatrix() from library(Matrix) -- largely
>> based
>> > > on
>> > > > > your
>> > > > > > response to somebody else's email.  I've been playing with smaller
>> > > > > matrices
>> > > > > > mainly for my own learning purposes -- it's much easier to read
>> > > through
>> > > > > 200
>> > > > > > movies (most of which I've heard of) and get a gut feel, than
>> > 10,000
>> > > > > > movies.
>> > > > > >  It also means I don't have to shut down my session quite as
>> often
>> > > > (I've
>> > > > > > been using rstudio) when I run something over the wrong
>> > > > dimension(column
>> > > > > vs
>> > > > > > row).  I was running into some limitations as well, but I think
>> > some
>> > > of
>> > > > > > those may have had more to do with my own typos and
>> > misunderstanding
>> > > of
>> > > > > > that
>> > > > > > language.
>> > > > > >
>> > > > > > There's a second reason I've been avoiding the tail -- while
>> > working
>> > > on
>> > > > > > this
>> > > > > > as a possible project to propose, I had a bit of a "duh" moment.
>> > > > >  Sometimes
>> > > > > > it's important to realize the real world constraints.  Picture a
>> > > > company
>> > > > > > with very physical locations and very limited shelf space (say a
>> > > > > > convenience
>> > > > > > store or even a vending machine...)  Netflix and Amazon can make
>> a
>> > > lot
>> > > > of
>> > > > > > money in the long tail because they have centralized and or
>> digital
>> > > > > > inventories that service a large customer base.  Imagine a
>> > situation
>> > > > > where
>> > > > > > you have a small physical distributed inventory with an equally
>> > > > > distributed
>> > > > > > customer base.  In that situation it doesn't pay to chase after
>> the
>> > > > tail
>> > > > > --
>> > > > > > you just need to reduce your costs and target the head where you
>> > get
>> > > > more
>> > > > > > volume with fewer items.  So you really end up with a three-tier
>> > > market
>> > > > > > segmentation -- one strategy works best for the head, another for
>> > the
>> > > > > body,
>> > > > > > and a third for the tail.
>> > > > > >
>> > > > > > As far as clusters go -- I really wasn't finding any clusters at
>> > the
>> > > > > edges
>> > > > > > of the data, but that could have more to do with not including
>> the
>> > > tail
>> > > > > > (and
>> > > > > > not normalizing appropriately for popularity).
>> > > > > >
>> > > > > > Incidentally, there was one amusingly cautionary anecdote I'd
>> share
>> > > --
>> > > > I
>> > > > > > don't remember the exact spread, but at one point I had two
>> > extremes
>> > > > that
>> > > > > > included something like "the johnson family vacation", "my baby's
>> > > > daddy",
>> > > > > > and "barbershop 2" on the one side, and then titles like "mean
>> > > girls",
>> > > > > "50
>> > > > > > first dates" and "along came polly" on the other side.  You may
>> > have
>> > > to
>> > > > > > look
>> > > > > > up those titles to see what I'm talking about, but I'd say the
>> > > > > distinction
>> > > > > > will be pretty clear when you do -- and if you were simply
>> > automating
>> > > > all
>> > > > > > of
>> > > > > > this and not reviewing it with common sense, you could end up
>> > > offending
>> > > > > > some
>> > > > > > of your users...  All I'm saying is there may be trends in the
>> real
>> > > > world
>> > > > > > that some people aren't comfortable having pointed out.
>> > > > > >
>> > > > > > On Fri, Aug 26, 2011 at 4:08 AM, Lance Norskog <
>> goksron <at> gmail.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > > This is a little meditation on user vs. item matrix density. The
>> The
>> > > > > > > heavy users and heavy items can be subsampled, once they are
>> > > > > > > identified. Hadoop's built-in sort does give a very simple
>> > > > > > > "map-increase" way to do this sort.
>> > > > > > >
>> > > > > > >
>> > > >
>> http://ultrawhizbang.blogspot.com/2011/08/sorted-recommender-data.html
>> > > > > > >
>> > > > > > > On Thu, Aug 25, 2011 at 5:57 PM, Ted Dunning <
>> > > ted.dunning <at> gmail.com>
>> > > > > > > wrote:
>> > > > > > > > In matrix terms the binary user x item matrix maps a set of
>> > items
>> > > > to
>> > > > > > > users
>> > > > > > > > (A h = users who interacted with items in h).  Similarly A'
>> > maps
>> > > > > users
>> > > > > > to
>> > > > > > > > items.  Thus A' (A h) is the classic "users who x'ed this
>> also
>> > > x'ed
>> > > > > > that"
>> > > > > > > > sort of operation.  This can be rearranged to be (A'A) h.
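>> > > > > > > >
>> > > > > > > > As a toy sketch in R (h is just a 0/1 vector over items; the
>> > > > > > > > item indices here are made up):
>> > > > > > > >
>> > > > > > > > h <- as.numeric(1:ncol(A) %in% c(3, 7))  # items the user touched
>> > > > > > > > scores <- t(A) %*% (A %*% h)             # same as crossprod(A) %*% h
>> > > > > > > > head(sort(scores[, 1], decreasing = TRUE))  # most co-visited items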
>> > > > > > > >
>> > > > > > > > This is where the singular values come in.  If
>> > > > > > > >
>> > > > > > > >     A \approx U S V'
>> > > > > > > >
>> > > > > > > > and we know that U'U = V'V = I from the definition of the
>> SVD.
>> > > >  This
>> > > > > > > means
>> > > > > > > > that
>> > > > > > > >
>> > > > > > > >    A' A \approx (U S V')' (U S V') = V S U' U S V' = V S^2 V'
>> > > > > > > >
>> > > > > > > > But V' transforms an item vector into the reduced dimensional
>> > > space
>> > > > > so
>> > > > > > we
>> > > > > > > > can view this as transforming the item vector into the
>> reduced
>> > > > > > > dimensional
>> > > > > > > > space, scaling element-wise using S^2 and then transforming
>> > back
>> > > to
>> > > > > > item
>> > > > > > > > space.
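>> > > > > > > >
>> > > > > > > > That identity is easy to check numerically in R on a toy matrix:
>> > > > > > > >
>> > > > > > > > A <- matrix(rbinom(50, 1, 0.3), nrow = 10)  # 10 users x 5 items
>> > > > > > > > s <- svd(A)                                 # A = U diag(d) V'
>> > > > > > > > max(abs(t(A) %*% A - s$v %*% diag(s$d^2) %*% t(s$v)))  # ~ 0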
>> > > > > > > >
>> > > > > > > > As you noted, the first right singular vector corresponds to
>> > > > > > popularity.
>> > > > > > >  It
>> > > > > > > > is often not appropriate to just recommend popular things
>> > (since
>> > > > > users
>> > > > > > > have
>> > > > > > > > other means of discovering them) so you might want to drop
>> that
>> > > > > > dimension
>> > > > > > > > from the analysis.  This can be done in the SVD results or it
>> > can
>> > > > be
>> > > > > > done
>> > > > > > > > using something like a log-likelihood ratio test so that we
>> > only
>> > > > > model
>> > > > > > > > anomalously large values in A'A.
>> > > > > > > >
>> > > > > > > > Btw, if you continue to use R for experimentation, you should
>> > be
>> > > > able
>> > > > > > to
>> > > > > > > > dramatically increase the size of the analysis you do by
>> using
>> > > > sparse
>> > > > > > > > matrices and a random projection algorithm.  I was able to
>> > > compute
>> > > > > > SVD's
>> > > > > > > of
>> > > > > > > > matrices with millions of rows and columns and a few million
>> > > > non-zero
>> > > > > > > > elements in a few seconds.  Holler if you would like details
>> > > about
>> > > > > how
>> > > > > > to
>> > > > > > > do
>> > > > > > > > this.
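>> > > > > > > >
>> > > > > > > > In case it is useful, here is a very rough sketch of the random
>> > > > > > > > projection idea in R (my own code, not anything from Mahout, and
>> > > > > > > > without the power iterations you would want on real data):
>> > > > > > > >
>> > > > > > > > library(Matrix)
>> > > > > > > > rsvd_sketch <- function(A, k, p = 10) {
>> > > > > > > >   Omega <- matrix(rnorm(ncol(A) * (k + p)), ncol = k + p)
>> > > > > > > >   Q <- qr.Q(qr(as.matrix(A %*% Omega)))   # basis for the range of A
>> > > > > > > >   B <- as.matrix(t(Q) %*% A)              # small (k+p) x n matrix
>> > > > > > > >   s <- svd(B, nu = k, nv = k)
>> > > > > > > >   list(u = Q %*% s$u, d = s$d[1:k], v = s$v)
>> > > > > > > > }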
>> > > > > > > >
>> > > > > > > > On Thu, Aug 25, 2011 at 4:07 PM, Jeff Hansen <
>> > dscheffy <at> gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > > >
>> > > > > > > >> One thing I found interesting (but not particularly
>> > surprising)
>> > > is
>> > > > > > that
>> > > > > > > the
>> > > > > > > >> biggest singular value/vector was pretty much tied directly
>> to
>> > > > > volume.
>> > > > > > > >>  That
>> > > > > > > >> makes sense because the best predictor of whether a given
>> > field's
>> > > > > value
>> > > > > > > was
>> > > > > > > >> 1
>> > > > > > > >> was whether it belonged to a row with lots of 1s or a column
>> > > with
>> > > > > lots
>> > > > > > > of
>> > > > > > > >> 1s
>> > > > > > > >> (I haven't quite figured out the best method to normalize
>> the
>> > > > values
>> > > > > > > with
>> > > > > > > >> yet).  When I plot the values of the largest singular
>> vectors
>> > > > > against
>> > > > > > > the
>> > > > > > > >> sum of values in the corresponding row or column, the
>> > > correlation
>> > > > is
>> > > > > > > very
>> > > > > > > >> linear.  That's not the case for any of the other singular
>> > > vectors
>> > > > > > > (which
>> > > > > > > >> actually makes me wonder if just removing that first
>> singular
>> > > > vector
>> > > > > > > from
>> > > > > > > >> the prediction might not be one of the best ways to
>> normalize
>> > > the
>> > > > > > data).
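>> > > > > > > >>
>> > > > > > > >> Something like this is what I have in mind, with s <- svd(A)
>> > > > > > > >> (untested, just to pin the idea down):
>> > > > > > > >>
>> > > > > > > >> A_nopop <- A - s$d[1] * s$u[, 1] %*% t(s$v[, 1])  # drop popularity axis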
>> > > > > > > >>
>> > > > > > > >> I understand what you're saying about each singular vector
>> > > > > > corresponding
>> > > > > > > to
>> > > > > > > >> a feature though.  Each left singular vector represents some
>> > > > > abstract
>> > > > > > > >> aspect
>> > > > > > > >> of a movie and each right singular vector represents users'
>> > > > leanings
>> > > > > or
>> > > > > > > >> inclinations with regards to that aspect of the movie.  The
>> > > > singular
>> > > > > > > value
>> > > > > > > >> itself just seems to indicate how good a predictor the
>> > > combination
>> > > > > of
>> > > > > > > one
>> > > > > > > >> user's inclination toward that aspect of a movie is for
>> coming
>> > up
>> > > > > with
>> > > > > > > the
>> > > > > > > >> actual value.  The issue I mentioned above is that
>> popularity
>> > of
>> > > a
>> > > > > > movie
>> > > > > > > as
>> > > > > > > >> well as how often a user watches movies tend to be the best
>> > > > > predictors
>> > > > > > > of
>> > > > > > > >> whether a user has seen or will see a movie.
>> > > > > > > >>
>> > > > > > > >> I had been picturing this with the idea of one k dimensional
>> > > space
>> > > > > --
>> > > > > > > one
>> > > > > > > >> where a user's location corresponds to their ideal
>> > > > > > > >> prototypical movie's
>> > > > > > > >> location. Not that there would be a movie right there, but
>> > there
>> > > > > would
>> > > > > > > be
>> > > > > > > >> ones nearby, and the nearer they were, the more enjoyable
>> they
>> > > > would
>> > > > > > be.
>> > > > > > > >>  That's a naive model, but that doesn't mean it wouldn't
>> work
>> > > well
>> > > > > > > enough.
>> > > > > > > >>  My problem is I don't quite get how to map the user space
>> > over
>> > > to
>> > > > > the
>> > > > > > > item
>> > > > > > > >> space.
>> > > > > > > >>
>> > > > > > > >> I think that may be what Lance is trying to describe in his
>> > last
>> > > > > > > response,
>> > > > > > > >> but I'm falling short on reconstructing the math from his
>> > > > > description.
>> > > > > > > >>
>> > > > > > > >> I get the following
>> > > > > > > >>
>> > > > > > > >> U S V* = A
>> > > > > > > >> U S = A V
>> > > > > > > >> U = A V S^-1
>> > > > > > > >>
>> > > > > > > >> I think that last line is what Lance was describing.  Of
>> > course
>> > > my
>> > > > > > > problem
>> > > > > > > >> was bootstrapping in a user for whom I don't know A or V.
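>> > > > > > > >>
>> > > > > > > >> For concreteness, the fold-in I mean for that last equation, in R
>> > > > > > > >> (a_new being a new user's item row and k the number of factors
>> > > > > > > >> kept -- both placeholders of mine):
>> > > > > > > >>
>> > > > > > > >> s <- svd(A, nu = k, nv = k)                     # keep k factors
>> > > > > > > >> u_new <- a_new %*% s$v %*% diag(1 / s$d[1:k])   # new row in U-space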
>> > > > > > > >>
>> > > > > > > >> I also think I may have missed a big step of the puzzle.
>>  For
>> > > some
>> > > > > > > reason I
>> > > > > > > >> thought that just by loosening the rank, you could recompose
>> > the
>> > > > > > Matrix
>> > > > > > > A
>> > > > > > > >> from the truncated SVD values/vectors and use the recomposed
>> > > > values
>> > > > > > > >> themselves as the recommendation.  I thought one of the
>> ideas
>> > > was
>> > > > > that
>> > > > > > > the
>> > > > > > > >> recomposed matrix had less "noise" and could be a better
>> > > > > > representation
>> > > > > > > of
>> > > > > > > >> the underlying nature of the matrix than the original matrix
>> > > > itself.
>> > > > > > >  But
>> > > > > > > >> that may have just been wishful thinking...
>> > > > > > > >>
>> > > > > > > >> On Thu, Aug 25, 2011 at 4:50 PM, Sean Owen <
>> srowen <at> gmail.com>
>> > > > > wrote:
>> > > > > > > >>
>> > > > > > > >> > The 200x10 matrix is indeed a matrix of 10 singular
>> vectors,
>> > > > which
>> > > > > > are
>> > > > > > > >> > eigenvectors of AA'. It's the columns, not rows, that are
>> > > > > > > >> > eigenvectors.
>> > > > > > > >> >
>> > > > > > > >> > The rows do mean something. I think it's fair to interpret
>> > the
>> > > > 10
>> > > > > > > >> > singular values / vectors as corresponding to some
>> > underlying
>> > > > > > features
>> > > > > > > >> > of tastes. The rows say how much each user expresses those
>> > 10
>> > > > > > tastes.
>> > > > > > > >> > The matrix of right singular vectors on the other side
>> tells
>> > > you
>> > > > > the
>> > > > > > > >> > same thing about items. The diagonal matrix of singular
>> > values
>> > > > in
>> > > > > > the
>> > > > > > > >> > middle also comes into play -- it's like a set of
>> > multipliers
>> > > > that
>> > > > > > say
>> > > > > > > >> > how important each feature is. (This is why we cut out the
>> > > > > singular
>> > > > > > > >> > vectors / values that have the smallest singular values;
>> > it's
>> > > > like
>> > > > > > > >> > removing the least-important features.) So really you'd
>> have
>> > > to
>> > > > > > stick
>> > > > > > > >> > those values somewhere; Ted says it's conventional to put
>> > > "half"
>> > > > > of
>> > > > > > > >> > each (their square roots) with each side if anything.
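>> > > > > > > >> >
>> > > > > > > >> > In R, that convention would look roughly like this, assuming
>> > > > > > > >> > s <- svd(A):
>> > > > > > > >> >
>> > > > > > > >> > user_factors <- s$u %*% diag(sqrt(s$d))  # "half" of S on the user side
>> > > > > > > >> > item_factors <- s$v %*% diag(sqrt(s$d))  # "half" of S on the item side
>> > > > > > > >> > # user_factors %*% t(item_factors) gives back U S V', i.e. A (approx.)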
>> > > > > > > >> >
>> > > > > > > >> > I don't have as good a grasp on an intuition for the
>> columns
>> > > as
>> > > > > > > >> > eigenvectors. They're also a set of basis vectors, and I
>> had
>> > > > > > > >> > understood them as like the "real" bases of the reduced
>> > > feature
>> > > > > > space
>> > > > > > > >> > expressed in user-item space. But I'd have to go back and
>> > > think
>> > > > > that
>> > > > > > > >> > intuition through again since I can't really justify it
>> from
>> > > > > scratch
>> > > > > > > >> > again in my head just now.
>> > > > > > > >> >
>> > > > > > > >> >
>> > > > > > > >> > On Thu, Aug 25, 2011 at 10:21 PM, Jeff Hansen <
>> > > > dscheffy <at> gmail.com
>> > > > > >
>> > > > > > > >> wrote:
>> > > > > > > >> > > Well, I think my problem may have had more to do with
>> what
>> > I
>> > > > was
>> > > > > > > >> calling
>> > > > > > > >> > the
>> > > > > > > >> > > eigenvector...  I was referring to the rows rather than
>> > the
>> > > > > > columns
>> > > > > > > of
>> > > > > > > >> U
>> > > > > > > >> > and
>> > > > > > > >> > > V.  While the columns may be characteristic of the
>> overall
>> > > > > matrix,
>> > > > > > > the
>> > > > > > > >> > rows
>> > > > > > > >> > > are characteristic of the user or item (in that they are
>> a
>> > > > rank
>> > > > > > > reduced
>> > > > > > > >> > > representation of that person or thing). I guess you
>> could
>> > > say
>> > > > I
>> > > > > > > just
>> > > > > > > >> had
>> > > > > > > >> > to
>> > > > > > > >> > > tilt my head to the side and change my perspective 90
>> > > degrees
>> > > > =)
>> > > > > > > >> > >
>> > > > > > > >> >
>> > > > > > > >>
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Lance Norskog
>> > > > > > > goksron <at> gmail.com
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Lance Norskog
>> > > > > goksron <at> gmail.com
>> > > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Lance Norskog
>> > goksron <at> gmail.com
>> >
>>
>
>
>
> --
> Lance Norskog
> goksron <at> gmail.com
>

