Discussion:
Comparing two hash tables for equality?
Aleksandar Sandic
2018-08-26 10:01:17 UTC
Permalink
Hello everyone,

I have been looking through the reference manual, but there does not seem to
be a procedure for comparing two hash tables for equality. The procedure
'equal?' only returns true if the two tables are also 'eq?':

scheme@(guile-user)> (define h1 (make-hash-table))
scheme@(guile-user)> (define h2 (make-hash-table))
scheme@(guile-user)> (equal? h1 h2)
$1 = #f
scheme@(guile-user)> (equal? h1 h1)
$2 = #t

Is this the intended behaviour? If so, how can I compare two hash tables for
equality then? I have rolled my own function for the time being:

(define (hash-table-equal? h1 h2)
"- Scheme procedure: hash-table-equal? h1 h2
Compare two hash tables for entry-wise equality.

The arguments 'h1' and 'h2' are not equal if and only if either:
- One of them is not a hash table
- They do not have the same number of entries
- Values for the same key are not 'equal?' or 'hash-equal?'

Two hash tables are always 'hash-equal?' if they are 'eq?'."
;; First take care of the trivial cases. Then compare the length of the
;; tables; if the lengths don't match the tables not equal. Finally,
loop
;; over the keys of one table and compare the values with those in the
other
;; table.
;;
;; This algorithm traverses the first tables thrice and the second table
;; twice in the worst case: Once each to count the keys, once the first
one
;; to collect the keys, and once both to compare all values. The loop
will
;; terminate the instant a mismatch is found.
;;
;; If the first value is a hash table we call this function recursively
;; because hash tables are never 'equal?' (unless they are 'eq?')
(cond
((not (hash-table? h1)) #f)
((not (hash-table? h2)) #f)
((eq? h1 h2) #t)
(else
(let ((n-keys1 (hash-fold (λ (k v keys) (1+ keys)) 0 h1))
(n-keys2 (hash-fold (λ (k v keys) (1+ keys)) 0 h2)))
(if (not (= n-keys1 n-keys2))
#f
(do ((keys (hash-fold (λ (k v keys) (cons k keys)) '() h1) (cdr
keys))
(same? #t))
((or (not same?) (null? keys))
same?)
(let* ((key (car keys))
(v1 (hash-ref h1 key))
(v2 (hash-get-handle h2 key)))
(cond
((not v2) (set! same? #f))
((hash-table? v1) (set! same? (hash-table-equal?
v1 v2)))
((not (equal? v1 (cdr v2))) (set! same? #f))))))))))
Mark H Weaver
2018-08-26 22:49:00 UTC
Permalink
Hi Aleksandar,
Post by Aleksandar Sandic
I have been looking through the reference manual, but there does not seem to
be a procedure for comparing two hash tables for equality. The procedure
$1 = #f
$2 = #t
Is this the intended behaviour?
Yes. That might seem surprising. I have a few things to say on that,
in no particular order:

An equality test on hash tables needs to know how to compare the keys
and how to compare the values. There's no way to pass those additional
arguments to 'equal?', so it can't do that job.

Hash table implementations typically don't offer an equality test on the
hash tables themselves, and I don't recall anyone ever asking for such
an operation before now. I guess that's because in the use case where
hash tables are beneficial -- when the tables are very large --
comparing them for equality is expensive.

While hash tables are indispensible for certain use cases, they also
have very significant downsides, and I tend to avoid them whenever
possible. Their most severe downside, in my opinion, is that they are
fundamentally incompatible with a functional programming style. They
invariably force all code that uses them to be written in an imperative
style. I cannot stress enough how important this is.

It's also not possible to efficiently create a new hash table based on
an existing one, but with one or more additional entries. In Scheme,
you can efficiently 'cons' or 'append' some entries onto the front of an
existing list without modifying the original list. That includes
association lists. To do the same with a hash table, you must make a
copy of the entire hash table and then add the new entries to the copy.

There can be no sharing of storage between multiple hash tables, as can
be done with lists, association lists, or balanced trees. Even if you
don't need sharing, hash tables also use more space than those other
data structures.

So, I would ask yourself whether the benefits of hash tables truly
outweigh the downsides in your particular use case.

If you're comfortable sharing details, I'd be curious to hear what
you're trying to accomplish here, at a higher level. Maybe there's a
better way.

Anyway, if you really need to compare hash tables for equality, here's
some suggested code that's simpler than the code you wrote, and should
be quite a bit more efficient as well. In particular, this code
performs no heap allocation:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 match))

(define (hash-table-subset? val=? h1 h2)
"H1 and H2 must be hash tables which use equal? as the equality test
for keys. Return #t if and only if for every entry (K . V1) in H1,
there exists an entry (K . V2) in H2 such that (VAL=? V1 V2),
otherwise return #f."
(hash-fold (lambda (k v1 equal-so-far)
(and equal-so-far
(match (hash-get-handle h2 k)
((_ . v2) (val=? v1 v2))
(#f #f))))
#t
h1))

(define (hash-table=? val=? h1 h2)
"H1 and H2 must be hash tables which use equal? as the equality test
for keys. Return #t if H1 and H2 have the same keys (in the sense of
equal?) and each key has the same value (in the sense of val=?),
otherwise return #f."
(and (hash-table-subset? val=? h1 h2)
(hash-table-subset? val=? h2 h1)))

(define (equal*? a b)
(or (equal? a b)
(and (hash-table? a)
(hash-table? b)
(hash-table=? equal*? a b))))
--8<---------------cut here---------------end--------------->8---

You might not need 'equal*?' here. I included it only to match more
closely the behavior of your code.

For most purposes, 'hash-table=?' is actually more flexible than
'equal*?', as long as the "type" of values stored in a given hash table
are known. If the values stored in the hash table are numbers, then
'val=?' should be '='. If the values are themselves hash tables, then
'val=?' should be (lambda (a b) (hash-table=? foo=? a b)) for some value
of 'foo=?'. In some other cases, 'val=?' should be 'equal?'.

If you truly need hash tables whose values could contain either hash
tables or some other type, then something like 'equal*?' might be the
right tool. However, keep in mind that 'equal?' is usually not the
desired equality test for numbers, because (equal? 1 1.0) => #false and
(equal? +0.0 -0.0) => #false. 'equal?' tests for _operational_
equivalence, which makes distinctions important for memoization and some
other purposes, whereas '=' tests for mathematical equality and is
usually what you want for numbers.

Regards,
Mark
John Cowan
2018-08-26 23:37:54 UTC
Permalink
Post by Mark H Weaver
An equality test on hash tables needs to know how to compare the keys
and how to compare the values. There's no way to pass those additional
arguments to 'equal?', so it can't do that job.
Correct. However, it's possible given either SRFI 69 based or R6RS based
hash tables to retrieve the key equality predicate. Therefore, a minimal
hash-table equality function requires three arguments, the hash tables
to be compared (which must have the same key equality predicate) and
a value equality predicate.
Post by Mark H Weaver
Hash table implementations typically don't offer an equality test on the
hash tables themselves,
SRFI 125, which is part of R7RS-large, does provide such a procedure
exactly as described above, except that instead of a predicate you pass
a comparator (a record containing a type test, an equality predicate,
an optional ordering predicate, and a hash function), a data structure
pervasive in R7RS-large and described in SRFI 128.
Post by Mark H Weaver
and I don't recall anyone ever asking for such
an operation before now. I guess that's because in the use case where
hash tables are beneficial -- when the tables are very large --
comparing them for equality is expensive.
It's O(n), no worse than comparing lists, vectors, or strings for equality.
Post by Mark H Weaver
While hash tables are indispensible for certain use cases, they also
have very significant downsides, and I tend to avoid them whenever
possible. Their most severe downside, in my opinion, is that they are
fundamentally incompatible with a functional programming style.
That's true for standard hash tables, such as are provided by Perl,
Python, MRI Ruby, Common Lisp, R6RS, etc. But it is not true of
hash array mapped tries (HAMTs), which are functional hash tables
<https://en.wikipedia.org/wiki/Hash_array_mapped_trie>.
They are standard in Haskell, Scala, Erlang, and Rubinius.
Although SRFI 146 does not require the use of HAMTs to implement
its hashmaps, the sample implementation does in fact provide them.
Post by Mark H Weaver
It's also not possible to efficiently create a new hash table based on
an existing one, but with one or more additional entries.
Again, this is not true of HAMTs.
Post by Mark H Weaver
There can be no sharing of storage between multiple hash tables, as can
be done with lists, association lists, or balanced trees.
But not with vectors or strings. (SRFI 135 texts, which are immutable,
can and do share.) In addition, lists can only safely share if you restrict
yourself by never mutating them.
--
John Cowan http://vrici.lojban.org/~cowan ***@ccil.org
It's like if you meet an really old, really rich guy covered in liver
spots and breathing with an oxygen tank, and you say, "I want to be
rich, too, so I'm going to start walking with a cane and I'm going to
act crotchety and I'm going to get liver disease. --Wil Shipley
Aleksandar Sandic
2018-08-27 22:14:25 UTC
Permalink
I did not know about HAMTs, looks interesting. It's not relevant to my use-
case, but it's an interesting topic to look into. Thanks.
...
Mark H Weaver
2018-08-29 04:46:44 UTC
Permalink
Post by Aleksandar Sandic
I did not know about HAMTs, looks interesting. It's not relevant to my use-
case, but it's an interesting topic to look into. Thanks.
There's an implementation of HAMTs that works with Guile in the
following "Purely Functional Data Structures in Scheme" library:

https://github.com/ijp/pfds

I'm not sure off-hand how well optimized they are.

Mark

Arne Babenhauserheide
2018-08-27 20:25:05 UTC
Permalink
Post by Mark H Weaver
Hi Aleksandar,
Hash table implementations typically don't offer an equality test on the
hash tables themselves, and I don't recall anyone ever asking for such
an operation before now.
I already wrote diff-algorithms for Python dictionaries, which is close
to implementing (equal? h H).

The main advantage of hash-tables over other dictionary types is that
they are very, very fast. I tried finding anything faster, but I did not.

Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken
Aleksandar Sandic
2018-08-27 21:53:09 UTC
Permalink
Post by Mark H Weaver
An equality test on hash tables needs to know how to compare the keys
and how to compare the values. There's no way to pass those additional
arguments to 'equal?', so it can't do that job.
It has to compare the values, but not the keys. If you take a look at my code,
what it does is first compare the size of both tables; if it differs then the
tables are not equal. If the number of entries is the same we can loop over
the keys of one table and compare the corresponding values. If a key from
table A is not present in table B, then the tables differ. Finally, if two
values for a key are not 'equal?', then the tables differ as well.
Post by Mark H Weaver
Hash table implementations typically don't offer an equality test on the
hash tables themselves, and I don't recall anyone ever asking for such
an operation before now. I guess that's because in the use case where
hash tables are beneficial -- when the tables are very large --
comparing them for equality is expensive.
I was mostly interested in equality for writing unit tests, so in my case an
equality test that's good enough for the job is adequate. It is not intended
to be used in production code. I was just wondering if there might be a
standard way instead of rolling my own.
Mark H Weaver
2018-08-28 07:37:46 UTC
Permalink
Hi Aleksandar,
Post by Aleksandar Sandic
Post by Mark H Weaver
An equality test on hash tables needs to know how to compare the keys
and how to compare the values. There's no way to pass those additional
arguments to 'equal?', so it can't do that job.
It has to compare the values, but not the keys.
There's an implicit equality test on keys every time you perform a hash
table lookup. By using 'hash-ref' and 'hash-get-handle', you are
implicitly using 'equal?' to compare the keys in the hash table with the
key that you're asking to look up.

Guile's native hash tables are unusual in that they do not internally
keep track of which equality test on keys to use. Instead, it is your
responsibility to consistently use the functions corresponding to same
equality test in all accesses to a given hash table. As the Guile
manual states:

Like the association list functions, the hash table functions come in
several varieties, according to the equality test used for the keys.
Plain ‘hash-’ functions use ‘equal?’, ‘hashq-’ functions use ‘eq?’,
‘hashv-’ functions use ‘eqv?’, and the ‘hashx-’ functions use an
application supplied test.

A single ‘make-hash-table’ creates a hash table suitable for use
with any set of functions, but it’s imperative that just one set is
then used consistently, or results will be unpredictable.

<https://www.gnu.org/software/guile/manual/html_node/Hash-Table-Reference.html>

So, your 'hash-table-equal?' predicate implicitly assumes that the hash
tables passed to it were populated using 'hash-set!' or
'hash-create-handle!'. If it is applied to a hash table that was
populated with the 'hashq-*!', 'hashv-*!', or 'hashx-*!' procedures,
then the results will be unpredictable.

My 'hash-table=?' predicate makes the same implicit assumption.

Mark
Loading...