Conventional electrical unit: Difference between revisions

Revision as of 04:39, 3 July 2013

SimRank is a general similarity measure based on a simple and intuitive graph-theoretic model. It is applicable in any domain with object-to-object relationships, and it measures the similarity of the structural context in which objects occur, based on their relationships with other objects. Effectively, SimRank is a measure that says "two objects are similar if they are related to similar objects."

Introduction

Many applications require a measure of "similarity" between objects. One obvious example is the "find-similar-document" query, on traditional text corpora or the World-Wide Web. More generally, a similarity measure can be used to cluster objects, such as for collaborative filtering in a recommender system, in which “similar” users and items are grouped based on the users’ preferences.

Various aspects of objects can be used to determine similarity, usually depending on the domain and the appropriate definition of similarity for that domain. In a document corpus, matching text may be used, and for collaborative filtering, similar users may be identified by common preferences. SimRank is a general approach that exploits the object-to-object relationships found in many domains of interest. On the Web, for example, two pages are related if there are hyperlinks between them. A similar approach can be applied to scientific papers and their citations, or to any other document corpus with cross-reference information. In the case of recommender systems, a user’s preference for an item constitutes a relationship between the user and the item. Such domains are naturally modeled as graphs, with nodes representing objects and edges representing relationships.

The intuition behind the SimRank algorithm is that, in many domains, similar objects are related to similar objects. More precisely, objects $a$ and $b$ are similar if they are related to objects $c$ and $d$, respectively, and $c$ and $d$ are themselves similar. The base case is that objects are similar to themselves.^[1]

It is important to note that SimRank is a general algorithm that determines only the similarity of structural context. SimRank applies to any domain where there are enough relevant relationships between objects to base at least some notion of similarity on relationships. Similarity in other domain-specific aspects is important as well. These similarities can, and should, be combined with relational structural-context similarity to obtain an overall similarity measure. For example, for Web pages SimRank can be combined with traditional textual similarity; the same idea applies to scientific papers or other document corpora. For recommender systems, there may be built-in known similarities between items (e.g., both computers, both clothing) as well as similarities between users (e.g., same gender, same spending level). Again, these similarities can be combined with the similarity scores computed from preference patterns to produce an overall similarity measure.

Basic SimRank equation

For a node $v$ in a graph, we denote by $I(v)$ and $O(v)$ the sets of in-neighbors and out-neighbors of $v$, respectively. Individual in-neighbors are denoted as $I_{i}(v)$, for $1\leq i\leq \left|I(v)\right|$, and individual out-neighbors are denoted as $O_{i}(v)$, for $1\leq i\leq \left|O(v)\right|$.
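The notation maps directly onto adjacency sets. The following Python sketch builds $I(v)$ and $O(v)$ from a list of directed `(source, target)` edges. The function name `neighbor_sets` and the edge-list input format are illustrative assumptions and are not part of the SimRank definition. Every node that appears in an edge gets an entry, so a node with no in-neighbors maps to an empty set. That is exactly the case the SimRank equation has to treat specially.

```python
def neighbor_sets(edges):
    """Return (in_nbrs, out_nbrs): dicts mapping each node v to I(v) and O(v)."""
    in_nbrs, out_nbrs = {}, {}
    for src, dst in edges:
        # Register both endpoints so nodes without in- or out-edges still appear.
        for v in (src, dst):
            in_nbrs.setdefault(v, set())
            out_nbrs.setdefault(v, set())
        out_nbrs[src].add(dst)  # dst is an out-neighbor of src
        in_nbrs[dst].add(src)   # src is an in-neighbor of dst
    return in_nbrs, out_nbrs


# Hypothetical toy graph: u -> a, u -> b, a -> c
I, O = neighbor_sets([("u", "a"), ("u", "b"), ("a", "c")])
print(sorted(I["a"]), sorted(O["u"]), sorted(I["u"]))  # ['u'] ['a', 'b'] []
```

Sets are used because SimRank only needs membership and sizes ($\left|I(v)\right|$), not any ordering of neighbors.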

Let us denote the similarity between objects $a$ and $b$ by $s(a,b)\in [0,1]$. Following the earlier motivation, a recursive equation is written for $s(a,b)$. If $a=b$, then $s(a,b)$ is defined to be $1$. Otherwise,

$$s(a,b)=\frac{C}{\left|I(a)\right|\left|I(b)\right|}\sum_{i=1}^{\left|I(a)\right|}\sum_{j=1}^{\left|I(b)\right|}s(I_{i}(a),I_{j}(b))$$

where $C$ is a constant between $0$ and $1$, called the decay factor. A slight technicality is that either $a$ or $b$ may have no in-neighbors. Since there is no way to infer any similarity between $a$ and $b$ in this case, similarity is set to $s(a,b)=0$. That is, the summation in the above equation is defined to be $0$ when $I(a)=\emptyset$ or $I(b)=\emptyset$.
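As a small worked example, take a hypothetical graph (not one from the original paper) whose only edges are $u \to a$ and $u \to b$. Then $I(a)=I(b)=\{u\}$ and $I(u)=\emptyset$, so the equation gives:

```latex
\begin{aligned}
s(a,b) &= \frac{C}{1 \cdot 1}\, s(u,u) = C \cdot 1 = C, \\
s(a,u) &= 0 \qquad \text{because } I(u) = \emptyset .
\end{aligned}
```

With $C=0.8$, for instance, $s(a,b)=0.8$. The two nodes share their only in-neighbor, and the decay factor discounts that single step of evidence.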

Computing SimRank

A solution to the SimRank equations for a graph $G$ can be reached by iteration to a fixed point. Let $n$ be the number of nodes in $G$. For each iteration $k$, we keep $n^{2}$ entries $R_{k}(*,*)$, where $R_{k}(a,b)$ gives the score between $a$ and $b$ on iteration $k$. We successively compute $R_{k+1}(*,*)$ based on $R_{k}(*,*)$, starting with $R_{0}(*,*)$, where each $R_{0}(a,b)$ is a lower bound on the actual SimRank score $s(a,b)$:

$$R_{0}(a,b)=\begin{cases}1, & \text{if } a=b,\\ 0, & \text{if } a\neq b.\end{cases}$$

To compute $R_{k+1}(a,b)$ from $R_{k}(*,*)$, we use the basic SimRank equation to get:

$$R_{k+1}(a,b)=\frac{C}{\left|I(a)\right|\left|I(b)\right|}\sum_{i=1}^{\left|I(a)\right|}\sum_{j=1}^{\left|I(b)\right|}R_{k}(I_{i}(a),I_{j}(b))$$

for $a\neq b$, and $R_{k+1}(a,b)=1$ for $a=b$. That is, on each iteration $k+1$, we update the similarity of $(a,b)$ using the similarity scores of the in-neighbors of $a$ and $b$ from the previous iteration $k$, according to the basic SimRank equation. The values $R_{k}(*,*)$ are nondecreasing as $k$ increases. It was shown in ^[1] that the values converge to limits satisfying the basic SimRank equation, the SimRank scores $s(*,*)$. That is, for all nodes $a,b$ of $G$, $\lim_{k\to\infty}R_{k}(a,b)=s(a,b)$.
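The iteration can be sketched directly in Python. This is a naive, dictionary-based illustration, and the function name `simrank` and its input format are assumptions: the input is a dict mapping every node to its set of in-neighbors. The sketch stores all $n^2$ scores and recomputes every double sum on each pass, so it is only suitable for small graphs. The defaults follow the $C=0.8$, $K=5$ choice of the original proposal.

```python
def simrank(in_nbrs, C=0.8, K=5):
    """Naive iterative SimRank; returns R_K as a dict keyed by node pairs (a, b).

    in_nbrs must map every node (including pure sources) to its set I(v).
    """
    nodes = list(in_nbrs)
    # R_0: 1 on the diagonal, 0 elsewhere (a lower bound on s).
    R = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(K):
        R_next = {}
        for a in nodes:
            for b in nodes:
                Ia, Ib = in_nbrs[a], in_nbrs[b]
                if a == b:
                    R_next[(a, b)] = 1.0
                elif not Ia or not Ib:
                    # No in-neighbors on one side: no evidence, so the score is 0.
                    R_next[(a, b)] = 0.0
                else:
                    # Average of previous-iteration scores over all in-neighbor pairs.
                    total = sum(R[(i, j)] for i in Ia for j in Ib)
                    R_next[(a, b)] = C * total / (len(Ia) * len(Ib))
        R = R_next
    return R


# Hypothetical toy graph with edges u->a, u->b, a->c, b->d, given as in-neighbor sets.
in_nbrs = {"u": set(), "a": {"u"}, "b": {"u"}, "c": {"a"}, "d": {"b"}}
R = simrank(in_nbrs)
print(round(R[("a", "b")], 4), round(R[("c", "d")], 4))  # 0.8 0.64
```

In this example $R(c,d)$ is still $0$ after the first pass. It reaches $C^2=0.64$ on the second pass, which illustrates the nondecreasing behavior of $R_k$.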

The original SimRank proposal suggested choosing the decay factor $C=0.8$ and a fixed number $K=5$ of iterations to perform. However, later research ^[2] showed that these values of $C$ and $K$ generally yield relatively low accuracy of the iteratively computed SimRank scores. To guarantee more accurate results, that paper suggests either using a smaller decay factor (in particular, $C=0.6$) or performing more iterations.
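To make the accuracy trade-off concrete, the sketch below takes as given the worst-case estimate $s(a,b)-R_K(a,b)\le C^{K+1}$ attributed to Lizorkin et al.^[2]. The bound follows by induction from the update rule, since diagonal entries are exact and each pass multiplies the remaining gap by at most $C$. The sketch uses this bound to pick $K$ for a target error. The function names `error_bound` and `iterations_needed` are hypothetical.

```python
def error_bound(C, K):
    """Worst-case gap s(a, b) - R_K(a, b), assuming the bound C**(K + 1)."""
    return C ** (K + 1)


def iterations_needed(C, eps):
    """Smallest K whose assumed bound C**(K + 1) is at most eps (0 < C < 1, eps > 0)."""
    K = 0
    while error_bound(C, K) > eps:
        K += 1
    return K


for C in (0.8, 0.6):
    print(C, round(error_bound(C, 5), 3), iterations_needed(C, 0.01))
```

Under this bound, the original settings guarantee only an error of at most $0.8^6 \approx 0.262$, whereas $C=0.6$ with the same $K=5$ gives $0.6^6 \approx 0.047$. Reaching an error of at most $0.01$ takes $K=20$ iterations for $C=0.8$ but only $K=9$ for $C=0.6$.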

Further research on SimRank

Fogaras and Racz ^[3] suggested speeding up SimRank computation through probabilistic computation using the Monte Carlo method.
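The Monte Carlo idea rests on the random surfer-pairs interpretation of SimRank given by Jeh and Widom.^[1] In that interpretation, $s(a,b)$ is the expected value of $C^{\tau}$, where $\tau$ is the first step at which two walkers meet. The walkers start at $a$ and $b$ and step backwards along in-links, each choosing uniformly and independently. The sketch below is only that plain estimator, not Fogaras and Racz's fingerprint-based algorithm. The function name, the `max_steps` truncation, the sample count, and the seed are assumptions made for illustration.

```python
import random


def simrank_monte_carlo(in_nbrs, a, b, C=0.8, max_steps=20, samples=1000, seed=0):
    """Estimate s(a, b) as the average of C**tau over pairs of backward random walks.

    tau is the first step at which both walkers stand on the same node. Pairs
    that never meet within max_steps, or where a walker reaches a node with no
    in-neighbors, contribute 0 (matching the s(a, b) = 0 convention).
    """
    if a == b:
        return 1.0
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    total = 0.0
    for _ in range(samples):
        x, y = a, b
        for step in range(1, max_steps + 1):
            if not in_nbrs[x] or not in_nbrs[y]:
                break  # a walker is stuck, so this pair never meets
            # sorted() makes the choice reproducible regardless of set ordering
            x = rng.choice(sorted(in_nbrs[x]))
            y = rng.choice(sorted(in_nbrs[y]))
            if x == y:
                total += C ** step
                break
    return total / samples


# Hypothetical graph: x and y share the in-neighbors p and q.
g = {"p": set(), "q": set(), "x": {"p", "q"}, "y": {"p", "q"}}
print(simrank_monte_carlo(g, "x", "y", samples=20000))
```

For this graph the exact score is $\frac{0.8}{2\cdot 2}(1+0+0+1)=0.4$, and the walkers meet at the first step with probability $1/2$. The printed estimate should therefore land near $0.4$, with sampling error shrinking as `samples` grows.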

Antonellis et al.^[4] extended the SimRank equations to take into account (i) an evidence factor for incident nodes and (ii) link weights.

Lizorkin et al.^[2] proposed several optimization techniques for speeding up SimRank iterative computation.
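One optimization commonly associated with this line of work is memoizing partial sums. For a fixed $a$, the inner sum $\sum_{i} R_k(I_i(a), j)$ is computed once for every node $j$ and then reused for every $b$, instead of being recomputed inside each pair's double sum. The sketch below is a simplified illustration of that idea under the same assumed input format as a plain iterative implementation, not the authors' exact algorithm. The name `simrank_partial_sums` is hypothetical.

```python
def simrank_partial_sums(in_nbrs, C=0.8, K=5):
    """Iterative SimRank that reuses per-node partial sums across all pairs (a, b)."""
    nodes = list(in_nbrs)
    R = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(K):
        R_next = {}
        for a in nodes:
            Ia = in_nbrs[a]
            # Memoized partial sums: partial[j] = sum of R(i, j) over i in I(a).
            partial = {j: sum(R[(i, j)] for i in Ia) for j in nodes}
            for b in nodes:
                Ib = in_nbrs[b]
                if a == b:
                    R_next[(a, b)] = 1.0
                elif not Ia or not Ib:
                    R_next[(a, b)] = 0.0
                else:
                    # Only a single sum over I(b) remains per pair.
                    total = sum(partial[j] for j in Ib)
                    R_next[(a, b)] = C * total / (len(Ia) * len(Ib))
        R = R_next
    return R


g = {"x": {"y", "z"}, "y": {"x"}, "z": {"x", "y"}}
R = simrank_partial_sums(g, C=0.8, K=7)
print(round(R[("x", "z")], 4))
```

With average in-degree $d$, this sketch needs on the order of $n^2 d$ additions per iteration rather than the $n^2 d^2$ of recomputing each double sum. It produces the same scores up to floating-point rounding.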

See also

PageRank

Citations


[1] G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 538-543. ACM Press, 2002. http://www-cs-students.stanford.edu/~glenj/simrank.pdf
[2] D. Lizorkin, P. Velikhov, M. Grinev and D. Turdakov. Accuracy Estimate and Optimization Techniques for SimRank Computation. In VLDB '08: Proceedings of the 34th International Conference on Very Large Data Bases, pages 422-433, 2008. http://modis.ispras.ru/Lizorkin/Publications/simrank_accuracy.pdf
[3] D. Fogaras and B. Racz. Scaling link-based similarity search. In WWW '05: Proceedings of the 14th International Conference on World Wide Web, pages 641-650. ACM, New York, NY, USA, 2005. http://www2005.org/cdrom/docs/p641.pdf
[4] I. Antonellis, H. Garcia-Molina and C.-C. Chang. Simrank++: Query Rewriting through Link Analysis of the Click Graph. In VLDB '08: Proceedings of the 34th International Conference on Very Large Data Bases, pages 408-421, 2008. http://dbpubs.stanford.edu/pub/showDoc.Fulltext?lang=en&doc=2008-17&format=pdf&compression=&name=2008-17.pdf


