MESSAGE
DATE | 2015-04-14 |
FROM | Ruben Safir
|
SUBJECT | Subject: [LIU Comp Sci] hashing slots sizes
|
From owner-learn-outgoing-at-mrbrklyn.com Tue Apr 14 10:36:11 2015 Return-Path: X-Original-To: archive-at-mrbrklyn.com Delivered-To: archive-at-mrbrklyn.com Received: by mrbrklyn.com (Postfix) id 3EA15161154; Tue, 14 Apr 2015 10:36:11 -0400 (EDT) Delivered-To: learn-outgoing-at-mrbrklyn.com Received: by mrbrklyn.com (Postfix, from userid 28) id 2197016115E; Tue, 14 Apr 2015 10:36:11 -0400 (EDT) Delivered-To: learn-at-nylxs.com Received: from mailbackend.panix.com (mailbackend.panix.com [166.84.1.89]) by mrbrklyn.com (Postfix) with ESMTP id 79D80161154 for ; Tue, 14 Apr 2015 10:35:46 -0400 (EDT) Received: from [10.0.0.19] (www.mrbrklyn.com [96.57.23.82]) by mailbackend.panix.com (Postfix) with ESMTPSA id D9F7913EFD for ; Tue, 14 Apr 2015 10:35:45 -0400 (EDT) Message-ID: <552D25C1.8000708-at-panix.com> Date: Tue, 14 Apr 2015 10:35:45 -0400 From: Ruben Safir User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 Followup-To: Jose,A.,Rodriguez, To: learn-at-nylxs.com Subject: [LIU Comp Sci] hashing slots sizes Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: owner-learn-at-mrbrklyn.com Precedence: bulk Reply-To: learn-at-mrbrklyn.com
we have in the notes that chosing
m=2**P is a poor choice since k % 2**p is the p lowest-order bits of m.
I'm running some numbers to see if I understand the nature of this problem
is m is 2**4 (16) and then k is 31 then the slot from the hash is 15 so that is good. And 30 is 14. It seems like as good as any other distribution which can be constructed.
You then give the example of a 000 keys all ending up on the zero slot and conclude that it is a good idea to have a hash key that depends all all the bits.
First, I'm not certain that is possible regardless how m is represented BUT, the issue here seems to be the number of keys slots (or the size of m). the significance of the bits don't matter, but replication is a big problem. Regardless of the m chosen, the slots will replicate in a cycle ever m times in a cycle. Are we trying to create cryptography or a hash? Also do we know that the data will have lots of 000 numbers? If so then we would want to adjust m
|
|