Latent Semantic Analysis – Part 1

*************************************************************************
Dear Reader,

All posts on shakthydoss.wordpress.com have been moved to shakthydoss.com.

shakthydoss.wordpress.com is no longer functioning. To get the latest updates and follow up on your comments, please visit shakthydoss.com and subscribe.

Find Latent Semantic Analysis – Part 1 at http://shakthydoss.com/latent-semantic-analysis-part-1/

Thank you
shakthydoss

**************************************************************************

7 Responses to “Latent Semantic Analysis – Part 1”

  1. ganesh Says:

    Awesome, dude.
    Your work is very helpful for me.

  2. sudheer Says:

    Hi,
    Can you please clarify your row/column convention? Your description and your declaration seem to differ:

    countMatrix[length of ArrayCollection][total number of documents]

    Also, I have sent a question to your email. Did you check it? Please reply.

  3. shakthydoss Says:

    CountMatrix[number of rows][number of columns]

    The number of rows is the number of words across all documents.
    The number of columns is the number of documents in the corpus.
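
A minimal sketch of this layout, assuming each document is already a list of tokens. This is not the post's actual code: the function name `build_count_matrix` and the sample documents are made up for illustration. Rows are terms and columns are documents.

```python
from collections import Counter

def build_count_matrix(documents):
    """Build a term-by-document count matrix from lists of tokens."""
    # One row per unique term; sorting keeps the row order deterministic.
    vocabulary = sorted({term for doc in documents for term in doc})
    row_of = {term: row for row, term in enumerate(vocabulary)}
    # count_matrix[row][col]: row = term, col = document.
    count_matrix = [[0] * len(documents) for _ in vocabulary]
    for col, doc in enumerate(documents):
        for term, count in Counter(doc).items():
            count_matrix[row_of[term]][col] = count
    return vocabulary, count_matrix

# Hypothetical toy corpus, already tokenized.
docs = [["market", "stock", "market"], ["stock", "trade"], ["market"]]
vocabulary, count_matrix = build_count_matrix(docs)
```

With this convention, `count_matrix[i][j]` is the number of times term `vocabulary[i]` occurs in document `j` (both zero-based).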

  4. sudheer Says:

    One more question for you:

    Does the number of words in all documents (in the previous post) refer to:

    1) the terms remaining after stop-word removal and stemming (index words), or the full document tokens?

    2) Are those index words stored with duplicates in your index file, or are repeated words removed?

    Waiting for your reply.
    Regards

    • shakthydoss Says:

      It is the terms remaining after stop-word removal and stemming.

      The index will not contain duplicates.
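
A rough sketch of how such a duplicate-free index could be built. The stop-word list and `naive_stem` are toy stand-ins invented for this example (a real pipeline would use a proper stemmer such as Porter's), and `index_terms` is a hypothetical name, not the post's code.

```python
# Illustrative stop-word list; the post's actual list is not known.
STOP_WORDS = {"the", "a", "is", "in", "of", "and"}

def naive_stem(word):
    """Toy stemmer: strips a few common suffixes, nothing more."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(documents):
    """Return the sorted, duplicate-free index terms of raw text documents."""
    index = set()  # a set drops repeated terms automatically
    for text in documents:
        for token in text.lower().split():
            if token not in STOP_WORDS:
                index.add(naive_stem(token))
    return sorted(index)

terms = index_terms(["The markets is trading", "trading in the market"])
```

Each stemmed term appears once in `terms`, no matter how many times or in how many documents it occurs.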

  5. TV TUAN Says:

    You use "tdidf" (not "tfidf"), so it might work for word similarity, since you can somehow define the tf-idf of the query Q for each document by counting the occurrences of all words in Q in that document (T, F?). However, if you want to solve document similarity, then tf-idf will not handle the case. I think we need to normalize term frequency too, not only document frequency. How do we define the tf-idf for a document query Q?
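
One common convention, sketched here as an assumption and not as the post's method, is to treat the query Q as a pseudo-document: compute a length-normalized term frequency for Q, reuse the idf values from the corpus, and compare Q with each document by cosine similarity. All function names and the toy corpus below are illustrative.

```python
import math
from collections import Counter

def idf_table(documents):
    """idf(t) = log(N / df(t)) over a corpus of token lists."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def tfidf_vector(tokens, idf):
    """Length-normalized tf times corpus idf; unseen terms are dropped."""
    counts = Counter(tokens)
    total = len(tokens)
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy corpus and query, already tokenized.
docs = [["market", "stock", "market"], ["stock", "trade"], ["weather", "rain"]]
idf = idf_table(docs)
query = tfidf_vector(["market", "stock"], idf)
scores = [cosine(query, tfidf_vector(doc, idf)) for doc in docs]
```

Here the first document scores highest for the query, and the third, which shares no terms with it, scores zero.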

  6. huangzy Says:

    I have a question.

    In this article, you wrote the following:
    For example, the word “market” (whose annotated value is 2 in row) appears 4 times in a particular document (whose annotated value is 5) in column.

    Shouldn't the 4 be in the 2nd row, 5th column of the count matrix? (The 4 in blue is in the 5th row, 2nd column in your picture.) Or am I wrong?

