Lawvere theory: Difference between revisions

Latest revision as of 20:05, 4 July 2013

In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure.^[1]^[2]^[3] This technique is useful for situations where fast queries, insertions, and deletions must be made on a large set of elements.

Details

In this method, the entries that hash to the same slot of the table are organized as separate second-level hash table. If there are k entries in this set S, the second-level table is allocated with k² slots, and its hash function is selected at random from a universal hash function set so that it is collision-free (i.e. a perfect hash function). Therefore, the look-up cost is guaranteed to be O(1) in the worst-case.^[2]

function Locate(x) is
       j = h(x);
       if (position h_j(x) of subtable T_j contains x (not deleted))
          return (x is in S);
       end if
       else 
          return (x is not in S);
       end else
end

Although each second-level table requires quadratic space, if the keys inserted into the first-level hash table are uniformly distributed, the structure as a whole occupies expected O(n) space, since bucket sizes are small with high probability.^[1]

During the insertion of a new entry x at j, the global operations counter, count, is incremented. If x exists at j but is marked as deleted then the mark is removed. If x exists at j, or at the subtable T_j, but is not marked as deleted then a collision is said to occur and the j^th bucket's second-level table T_j is rebuilt with a different randomly selected hash function h_j. Because the load factor of the second-level table is kept low (1/k), rebuilding is infrequent, and the amortized cost of insertions is O(1).^[2]

function Insert(x) is
       count = count + 1;
       if (count > M) 
          FullRehash(x);
       end if
       else
          j = h(x);
          if (Position h_j(x) of subtable T_j contains x)
             if (x is marked deleted) 
                remove the delete marker;
             end if
          end if
          else
             b_j = b_j + 1;
             if (b_j <= m_j) 
                if position h_j(x) of T_j is empty 
                   store x in position h_j(x) of T_j;
                end if
                else
                   Put all unmarked elements of T_j in list L_j;
                   Append x to list L_j;
                   b_j = length of L_j;
                   repeat 
                      h_j = randomly chosen function in H_sj;
                   until h_j is injective on the elements of L_j;
                   for all y on list L_j
                      store y in position h_j(y) of T_j;
                   end for
                end else
             end if
             else
                m_j = 2 * max{1, m_j};
                s_j = 2 * m_j * (m_j - 1);
                if the sum total of all s_j ≤ 32 * M² / s(M) + 4 * M 
                   Allocate s_j cells for T_j;
                   Put all unmarked elements of T_j in list L_j;
                   Append x to list L_j;
                   b_j = length of L_j;
                   repeat 
                      h_j = randomly chosen function in H_sj;
                   until h_j is injective on the elements of L_j;
                   for all y on list L_j
                      store y in position h_j(y) of T_j;
                   end for
                end if
                else
                   FullRehash(x);
                end else
             end else
          end else
       end else
end

Deletion of x simply flags x as deleted without removal and increments count. In the case of both insertions and deletions, if count reaches a threshold M the entire table is rebuilt, where M is some constant multiple of the size of S at the start of a new phase. Here phase refers to the time between full rebuilds. The amortized cost of delete is O(1).^[2] Note that here the -1 in "Delete(x)" is a representation of an element which is not in the set of all possible elements U.

function Delete(x) is
       count = count + 1;
       j = h(x);
       if position h(x) of subtable Tj contains x
          mark x as deleted;
       end if
       else 
          return (x is not a member of S);
       end else
       if (count >= M)
          FullRehash(-1);
       end if
end

A full rebuild of the table of S first starts by removing all elements marked as deleted and then setting the next threshold value M to some constant multiple of the size of S. A hash function, which partitions S into s(M) subsets, where the size of subset j is s_j, is repeatedly randomly chosen until:

$\sum_{0 \leq j \leq s (M)} s_{j} \leq \frac{32 M^{2}}{s (M)} + 4 M .$

Finally, for each subtable T_j a hash function h_j is repeatedly randomly chosen from H_sj until h_j is injective on the elements of T_j. The expected time for a full rebuild of the table of S with size n is O(n).^[2]

function FullRehash(x) is
       Put all unmarked elements of T in list L;
       if (x is in U) 
          append x to L;
       end if
       count = length of list L;
       M = (1 + c) * max{count, 4};
       repeat 
          h = randomly chosen function in H_s(M);
          for all j < s(M) 
             form a list L_j for h(x) = j;
             b_j = length of L_j; 
             m_j = 2 * b_j; 
             s_j = 2 * m_j * (m_j - 1);
          end for
       until the sum total of all s_j ≤ 32 * M² / s(M) + 4 * M
       for all j < s(M) 
          Allocate space s_j for subtable T_j;
          repeat 
             h_j = randomly chosen function in H_sj;
          until h_j is injective on the elements of list L_j;
       end for
       for all x on list L_j 
          store x in position h_j(x) of T_j;
       end for
end

References

43 year old Petroleum Engineer Harry from Deep River, usually spends time with hobbies and interests like renting movies, property developers in singapore new condominium and vehicle racing. Constantly enjoys going to destinations like Camino Real de Tierra Adentro.

↑ ^1.0 ^1.1 Fredman, M. L., Komlós, J., and Szemerédi, E. 1984. Storing a Sparse Table with 0(1) Worst Case Access Time. J. ACM 31, 3 (Jun. 1984), 538-544 http://portal.acm.org/citation.cfm?id=1884#
↑ ^2.0 ^2.1 ^2.2 ^2.3 ^2.4 Dietzfelbinger, M., Karlin, A., Mehlhorn, K., Meyer auf der Heide, F., Rohnert, H., and Tarjan, R. E. 1994. Dynamic Perfect Hashing: Upper and Lower Bounds. SIAM J. Comput. 23, 4 (Aug. 1994), 738-761. http://portal.acm.org/citation.cfm?id=182370#
↑ Erik Demaine, Jeff Lind. 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Spring 2003. http://courses.csail.mit.edu/6.897/spring03/scribe_notes/L2/lecture2.pdf

[inventor-1] 1.0 ^1.1 Fredman, M. L., Komlós, J., and Szemerédi, E. 1984. Storing a Sparse Table with 0(1) Worst Case Access Time. J. ACM 31, 3 (Jun. 1984), 538-544 http://portal.acm.org/citation.cfm?id=1884#

[dietzfelbinger-2] 2.0 ^2.1 ^2.2 ^2.3 ^2.4 Dietzfelbinger, M., Karlin, A., Mehlhorn, K., Meyer auf der Heide, F., Rohnert, H., and Tarjan, R. E. 1994. Dynamic Perfect Hashing: Upper and Lower Bounds. SIAM J. Comput. 23, 4 (Aug. 1994), 738-761. http://portal.acm.org/citation.cfm?id=182370#

[3] Erik Demaine, Jeff Lind. 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Spring 2003. http://courses.csail.mit.edu/6.897/spring03/scribe_notes/L2/lecture2.pdf

[1]

[2]

[3]

@@ Line 1: / Line 1: @@
-Hi there. I would like to start with [http://www.bing.com/search?q=introducing&form=MSNNWS&mkt=en-us&pq=introducing introducing] mcdougal, her brand is Kittie Vowell.  What he loves doing is currently solving pcs but he can't allow it to be his career  He became a collector after being out of his job for many years, but shortly his girlfriend and him will start their own organization  Oregon is his birth-place however now he's currently [http://browse.deviantart.com/?q=contemplating+options contemplating options] that are other  You'll find my website here http://radiolavoce.it/mercatino/.<br><br>my blog post; [http://radiolavoce.it/mercatino/. iphone game cheats]
+In [[computer science]], '''dynamic perfect hashing''' is a programming technique for resolving [[collision (computer science)|collisions]] in a [[hash table]] [[data structure]].<ref name="inventor">Fredman, M. L., Komlós, J., and Szemerédi, E. 1984. Storing a Sparse Table with 0(1) Worst Case Access Time. J. ACM 31, 3 (Jun. 1984), 538-544 http://portal.acm.org/citation.cfm?id=1884#</ref><ref name="dietzfelbinger">Dietzfelbinger, M., Karlin, A., Mehlhorn, K., Meyer auf der Heide, F., Rohnert, H., and Tarjan, R. E. 1994. Dynamic Perfect Hashing: Upper and Lower Bounds. SIAM J. Comput. 23, 4 (Aug. 1994), 738-761. http://portal.acm.org/citation.cfm?id=182370#</ref><ref>Erik Demaine, Jeff Lind. 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Spring 2003. http://courses.csail.mit.edu/6.897/spring03/scribe_notes/L2/lecture2.pdf</ref> This technique is useful for situations where fast queries, insertions, and deletions must be made on a large set of elements.
+==Details==
+In this method, the entries that hash to the same slot of the table are organized as separate second-level hash table.  If there are ''k'' entries in this set ''S'', the second-level table is allocated with ''k''<sup>2</sup> slots, and its [[hash function]] is selected at random from a [[universal hash function]] set so that it is collision-free (i.e. a [[perfect hash function]]). Therefore, the look-up cost is guaranteed to be [[big O notation|O(1)]] [[worst-case complexity|in the worst-case]].<ref name="dietzfelbinger"/>
+  '''function''' Locate(''x'') '''is'''
+        ''j'' = h('''x''');
+        '''if''' (position h<sub>j</sub>(''x'') of subtable ''T<sub>j</sub>'' contains ''x'' (not deleted))
+           '''return''' (''x'' is in ''S'');
+        '''end if'''
+        '''else'''
+           '''return''' (''x'' is not in ''S'');
+        '''end else'''
+ '''end'''
+Although each second-level table requires quadratic space, if the keys inserted into the first-level hash table are [[Uniform distribution (discrete)|uniformly distributed]], the structure as a whole occupies expected O(''n'') space, since bucket sizes are small with high [[probability]].<ref name="inventor"/>
+During the insertion of a new entry ''x'' at ''j'', the global operations counter, ''count'', is incremented. If ''x'' exists at ''j'' but is marked as deleted then the mark is removed. If ''x'' exists at ''j'', or at the subtable ''T<sub>j</sub>'',  but is not marked as deleted then a collision is said to occur and the ''j''<sup>th</sup> bucket's second-level table ''T<sub>j</sub>'' is rebuilt with a different randomly selected hash function ''h<sub>j</sub>''. Because the [[load factor (hash table)|load factor]] of the second-level table is kept low (1/''k''), rebuilding is infrequent, and the [[amortized analysis|amortized]] cost of insertions is O(1).<ref name="dietzfelbinger"/>
+  '''function''' Insert(''x'') '''is'''
+        ''count'' = ''count'' + 1;
+        '''if''' (''count'' > ''M'')
+           FullRehash(''x'');
+        '''end if'''
+        '''else'''
+           ''j'' = h(''x'');
+           '''if''' (Position h<sub>''j''</sub>(x) of subtable ''T<sub>j</sub>'' contains ''x'')
+              '''if''' (''x'' is marked deleted)
+                 remove the delete marker;
+              '''end if'''
+           '''end if'''
+           '''else'''
+              ''b<sub>j</sub>'' = ''b<sub>j</sub>'' + 1;
+              '''if''' (''b<sub>j</sub>'' <= ''m<sub>j</sub>'')
+                 '''if''' position h<sub>''j''</sub>(''x'') of ''T<sub>j</sub>'' is empty
+                    store ''x'' in position h<sub>''j''</sub>(''x'') of ''T<sub>j</sub>'';
+                 '''end if'''
+                 '''else'''
+                    Put all unmarked elements of ''T<sub>j</sub>'' in list ''L<sub>j</sub>'';
+                    Append ''x'' to list ''L<sub>j</sub>'';
+                    ''b<sub>j</sub>'' = length of ''L<sub>j</sub>'';
+                    '''repeat'''
+                       ''h<sub>j</sub>'' = randomly chosen function in ''H<sub>sj</sub>'';
+                    '''until''' ''h<sub>j</sub>'' is injective on the elements of ''L<sub>j</sub>'';
+                    '''for''' all ''y'' on list ''L<sub>j</sub>''
+                       store ''y'' in position h<sub>''j''</sub>(''y'') of ''T<sub>j</sub>'';
+                    '''end for'''
+                 '''end else'''
+              '''end if'''
+              '''else'''
+                 ''m<sub>j</sub>'' = 2 * max{1, ''m<sub>j</sub>''};
+                 ''s<sub>j</sub>'' = 2 * ''m<sub>j</sub>'' * (''m<sub>j</sub>'' - 1);
+                 '''if''' the sum total of all s<sub>j</sub> ≤ 32 * ''M''<sup>2</sup> / ''s''(''M'') + 4 * ''M''
+                    Allocate ''s<sub>j</sub>'' cells for ''T<sub>j</sub>'';
+                    Put all unmarked elements of ''T<sub>j</sub>'' in list ''L<sub>j</sub>'';
+                    Append ''x'' to list ''L<sub>j</sub>'';
+                    ''b<sub>j</sub>'' = length of ''L<sub>j</sub>'';
+                    '''repeat'''
+                       ''h<sub>j</sub>'' = randomly chosen function in ''H<sub>sj</sub>'';
+                    '''until''' ''h<sub>j</sub>'' is injective on the elements of ''L<sub>j</sub>'';
+                    '''for''' all ''y'' on list ''L<sub>j</sub>''
+                       store ''y'' in position h<sub>''j''</sub>(''y'') of ''T<sub>j</sub>'';
+                    '''end for'''
+                 '''end if'''
+                 '''else'''
+                    FullRehash(''x'');
+                 '''end else'''
+              '''end else'''
+           '''end else'''
+        '''end else'''
+ '''end'''
+Deletion of ''x'' simply flags ''x'' as deleted without removal and increments ''count''. In the case of both insertions and deletions, if ''count'' reaches a threshold ''M'' the entire table is rebuilt, where ''M'' is some constant multiple of the size of S at the start of a new ''phase''. Here ''phase'' refers to the time between full rebuilds. The amortized cost of delete is O(1).<ref name="dietzfelbinger"/> Note that here the -1 in "Delete(''x'')" is a representation of an element which is not in the set of all possible elements ''U''.
+ '''function''' Delete(''x'') '''is'''
+        ''count'' = ''count'' + 1;
+        ''j'' = h(''x'');
+        '''if''' position h(''x'') of subtable ''Tj'' contains ''x''
+           mark ''x'' as deleted;
+        '''end if'''
+        '''else'''
+           '''return''' (x is not a member of S);
+        '''end else'''
+        '''if''' (''count'' >= ''M'')
+           FullRehash(-1);
+        '''end if'''
+  '''end'''
+A full rebuild of the table of ''S'' first starts by removing all elements marked as deleted and then setting the next threshold value ''M'' to some constant multiple of the size of ''S''. A hash function, which partitions ''S'' into ''s''(''M'') subsets, where the size of subset ''j'' is ''s<sub>j</sub>'', is repeatedly randomly chosen until:
+<math>\sum_{0\le j\le s(M)} s_j \le \frac{32M^2}{s(M)} + 4M.</math>
+Finally, for each subtable ''T<sub>j</sub>'' a hash function ''h<sub>j</sub>'' is repeatedly randomly chosen from ''H<sub>sj</sub>'' until ''h<sub>j</sub>'' is injective on the elements of ''T<sub>j</sub>''. The expected time for a full rebuild of the table of ''S'' with size ''n'' is O(''n'').<ref name="dietzfelbinger"/>
+ '''function''' FullRehash(''x'') '''is'''
+        Put all unmarked elements of ''T'' in list ''L'';
+        '''if''' (''x'' is in ''U'')
+           append ''x'' to ''L'';
+        '''end if'''
+        ''count'' = length of list ''L'';
+        ''M'' = (1 + ''c'') * max{''count'', 4};
+        '''repeat'''
+           h = randomly chosen function in ''H<sub>s(M)</sub>'';
+           '''for''' all ''j'' < ''s''(''M'')
+              form a list ''L<sub>j</sub>'' for h(''x'') = ''j'';
+              ''b<sub>j</sub>'' = length of ''L<sub>j</sub>'';
+              ''m<sub>j</sub>'' = 2 * ''b<sub>j</sub>'';
+              ''s<sub>j</sub>'' = 2 * ''m<sub>j</sub>'' * (''m<sub>j</sub>'' - 1);
+           '''end for'''
+        '''until''' the sum total of all s<sub>j</sub> ≤ 32 * ''M''<sup>2</sup> / ''s''(''M'') + 4 * ''M''
+        '''for''' all ''j'' < ''s''(''M'')
+           Allocate space ''s<sub>j</sub>'' for subtable ''T<sub>j</sub>'';
+           '''repeat'''
+              ''h<sub>j</sub>'' = randomly chosen function in ''H<sub>sj</sub>'';
+           '''until''' ''h<sub>j</sub>'' is injective on the elements of list ''L<sub>j</sub>'';
+        '''end for'''
+        '''for''' all ''x'' on list ''L<sub>j</sub>''
+           store ''x'' in position h<sub>''j''</sub>(''x'') of ''T<sub>j</sub>'';
+        '''end for'''
+ '''end'''
+==See also==
+*[[Perfect hashing]]
+==References==
+{{reflist}}
+{{DEFAULTSORT:Dynamic Perfect Hashing}}
+[[Category:Hashing]]
+[[Category:Search algorithms]]

Lawvere theory: Difference between revisions

Latest revision as of 20:05, 4 July 2013

Details

See also

References

Navigation menu

Search