optimizing lazy sequences: srfi-41 vs delayed continuation with values + promise vs delayed continuation with cons + lambda

Discussion:

Amirouche Boubekki

2018-02-25 14:16:48 UTC

Permalink

Hello all,

I know it's not good to optimize first, but I got into it
and now I need to know.

A few months ago, I created a lazy sequence library that retains
history based on delayed continuation, it can be summarized
as follow:

(define-public (list->traversi lst)
(let loop ((lst lst))
(lambda ()
(if (null? lst)
'()
;; here the car of the pair contains the first value
;; and an empty list as history
(cons (cons (car lst) '()) (loop (cdr lst)))))))

(define-public (traversi->list traversi)
(let loop ((traversi traversi)
(out '()))
(match (traversi)
('() (reverse out))
((item . next) (loop next (cons (car item) out))))))

(define-public (traversi-map proc traversi)
(let loop ((traversi traversi))
(lambda ()
(match (traversi)
('() '())
((item . next) (cons (cons (proc (car item)) item) (loop
next)))))))

They are other procedures, but that is the gist of it.

I tested it on a parser combinator for csv and it seemed much
faster than the srfi-41 based parser combinators [0].

[0] https://git.dthompson.us/guile-parser-combinators.git

Now, I got another idea what if I replace the 'cons' returned
by the traversi procedures by 'values' and replace the
lambda with 'delay' with some use of 'force' at the correct
places. It will result in the following procedures:

(define-public (list->stream lst)
(let loop ((lst lst))
(if (null? lst)
(delay (values #f #f))
(delay (values (car lst) (loop (cdr lst)))))))

(define-public (stream->list stream)
(let loop ((stream stream)
(out '()))
(call-with-values (lambda () (force stream))
(lambda (value next)
(if next
(loop next (cons value out))
(reverse! out))))))

(define-public (stream-map proc stream)
(let loop ((stream stream))
(call-with-values (lambda () (force stream))
(lambda (value next)
(if next
(delay (values (proc value) (loop next)))
(delay (values #f #f)))))))

I tested those procedures in the REPL with ,time and
the results are not conclusive:

scheme@(guile-user)> (define (nop o) #t)

scheme@(guile-user)> ,time (nop (traversi->list (traversi-map (lambda
(x) (1+ x)) (list->traversi (iota (expt 2 10))))))
$1 = #t
;; 0.028090s real time, 0.037698s run time. 0.012421s spent in GC.
scheme@(guile-user)> ,time (nop (traversi->list (traversi-map (lambda
(x) (1+ x)) (list->traversi (iota (expt 2 10))))))
$2 = #t
;; 0.025622s real time, 0.025614s run time. 0.000000s spent in GC.
scheme@(guile-user)> ,time (nop (traversi->list (traversi-map (lambda
(x) (1+ x)) (list->traversi (iota (expt 2 10))))))
$3 = #t
;; 0.034165s real time, 0.034146s run time. 0.000000s spent in GC.

scheme@(guile-user)> ,time (nop (stream->list (stream-map (lambda (x)
(1+ x)) (list->stream (iota (expt 2 10))))))
$4 = #t
;; 0.031616s real time, 0.041882s run time. 0.014514s spent in GC.
scheme@(guile-user)> ,time (nop (stream->list (stream-map (lambda (x)
(1+ x)) (list->stream (iota (expt 2 10))))))
$5 = #t
;; 0.026148s real time, 0.026141s run time. 0.000000s spent in GC.
scheme@(guile-user)> ,time (nop (stream->list (stream-map (lambda (x)
(1+ x)) (list->stream (iota (expt 2 10))))))
$6 = #t
;; 0.028477s real time, 0.037107s run time. 0.011057s spent in GC.

Q: What in theory should be faster?

Q: How can I test / proove that one approach must be faster?

From the usability & readability point of view I find the
values + promise much more readable and easier to code.

TIA!

Amirouche Boubekki

2018-02-25 14:30:16 UTC

Permalink

Post by Amirouche Boubekki
Hello all,
I know it's not good to optimize first, but I got into it
and now I need to know.
A few months ago, I created a lazy sequence library that retains
history based on delayed continuation, it can be summarized
(define-public (list->traversi lst)
(let loop ((lst lst))
(lambda ()
(if (null? lst)
'()
;; here the car of the pair contains the first value
;; and an empty list as history
(cons (cons (car lst) '()) (loop (cdr lst)))))))
(define-public (traversi->list traversi)
(let loop ((traversi traversi)
(out '()))
(match (traversi)
('() (reverse out))
((item . next) (loop next (cons (car item) out))))))
(define-public (traversi-map proc traversi)
(let loop ((traversi traversi))
(lambda ()
(match (traversi)
('() '())
((item . next) (cons (cons (proc (car item)) item) (loop
next)))))))
They are other procedures, but that is the gist of it.
I tested it on a parser combinator for csv and it seemed much
faster than the srfi-41 based parser combinators [0].
[0] https://git.dthompson.us/guile-parser-combinators.git
Now, I got another idea what if I replace the 'cons' returned
by the traversi procedures by 'values' and replace the
lambda with 'delay' with some use of 'force' at the correct
(define-public (list->stream lst)
(let loop ((lst lst))
(if (null? lst)
(delay (values #f #f))
(delay (values (car lst) (loop (cdr lst)))))))
(define-public (stream->list stream)
(let loop ((stream stream)
(out '()))
(call-with-values (lambda () (force stream))
(lambda (value next)
(if next
(loop next (cons value out))
(reverse! out))))))
(define-public (stream-map proc stream)
(let loop ((stream stream))
(call-with-values (lambda () (force stream))
(lambda (value next)
(if next
(delay (values (proc value) (loop next)))
(delay (values #f #f)))))))
I tested those procedures in the REPL with ,time and
(x) (1+ x)) (list->traversi (iota (expt 2 10))))))
$1 = #t
;; 0.028090s real time, 0.037698s run time. 0.012421s spent in GC.
(x) (1+ x)) (list->traversi (iota (expt 2 10))))))
$2 = #t
;; 0.025622s real time, 0.025614s run time. 0.000000s spent in GC.
(x) (1+ x)) (list->traversi (iota (expt 2 10))))))
$3 = #t
;; 0.034165s real time, 0.034146s run time. 0.000000s spent in GC.
(1+ x)) (list->stream (iota (expt 2 10))))))
$4 = #t
;; 0.031616s real time, 0.041882s run time. 0.014514s spent in GC.
(1+ x)) (list->stream (iota (expt 2 10))))))
$5 = #t
;; 0.026148s real time, 0.026141s run time. 0.000000s spent in GC.
(1+ x)) (list->stream (iota (expt 2 10))))))
$6 = #t
;; 0.028477s real time, 0.037107s run time. 0.011057s spent in GC.

I forgot the srfi-41 because they are worse:

scheme@(guile-user)> (use-modules (srfi srfi-41))scheme@(guile-user)>
(define (nop o) #t)
scheme@(guile-user)> ,time (nop (stream->list (stream-map (lambda (x)
(1+ x)) (list->stream (iota (expt 2 10))))))
$1 = #t
;; 0.034429s real time, 0.043788s run time. 0.012689s spent in GC.
scheme@(guile-user)> ,time (nop (stream->list (stream-map (lambda (x)
(1+ x)) (list->stream (iota (expt 2 10))))))
$2 = #t
;; 0.033329s real time, 0.041973s run time. 0.010852s spent in GC.
scheme@(guile-user)> ,time (nop (stream->list (stream-map (lambda (x)
(1+ x)) (list->stream (iota (expt 2 10))))))
$3 = #t
;; 0.056261s real time, 0.064563s run time. 0.010440s spent in GC.

Post by Amirouche Boubekki
Q: What in theory should be faster?
Q: How can I test / proove that one approach must be faster?
From the usability & readability point of view I find the
values + promise much more readable and easier to code.
TIA!

--
Amirouche ~ amz3 ~ http://www.hyperdev.fr

Amirouche Boubekki

2018-02-25 22:39:05 UTC

Permalink

I figured how to benchmark this.

Here are the timings:

promise: 43s
lambda: 7s

And at the current 'max' value the srfi-41 streams can't complete
the benchmark with this error:

Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS

Here is the benchmark program:

(use-modules (srfi srfi-41))

(define max (expt 10 8))

(define (time-it thunk)
(let ((start (current-time)))
(thunk)
(- (current-time) start)))

;; promise based lazy seq

(define (lazyseq-with-promise)
(let loop ((v 1))
(delay
(values v (loop (+ 1 v))))))

(define (consume-promise promise)
(let loop ((v 0)
(promise promise))
(unless (eq? v max)
(call-with-values (lambda () (force promise))
(lambda (v next)
(loop (+ 1 v) next))))))

(define v1 (time-it (lambda () (consume-promise
(lazyseq-with-promise)))))

;; lambda based lazy seq

(define (lazyseq-with-lambda)
(let loop ((v 1))
(lambda ()
(values v (loop (+ 1 v))))))

(define (consume-lambda thunk)
(let loop ((v 0)
(thunk thunk))
(unless (eq? v max)
(call-with-values thunk
(lambda (v n)
(loop (+ 1 v) n))))))

(define v2 (time-it (lambda () (consume-lambda (lazyseq-with-lambda)))))

(display "promise: ")
(display v1)
(newline)
(display "lambda: ")
(display v2)
(newline)

(define (lazyseq-with-stream)
(list->stream (iota max)))

(define (consume-stream stream)
(let loop ((v 0)
(stream stream))
(unless (eq? v max)
(loop (+ 1 v) (stream-cdr stream)))))

(define stream (lazyseq-with-stream))
(define v3 (time-it (lambda () (consume-stream stream))))

(display "stream: ")
(display v3)
(newline)

Happy hacking!

Amirouche Boubekki

2018-02-25 23:31:39 UTC

Permalink

Post by Amirouche Boubekki
(define (lazyseq-with-stream)
(list->stream (iota max)))

This is wrong.

It must be implemented as:

(define-stream (lazyseq-with-stream)
(stream-let loop ((v 1))
(stream-cons v (loop (+ 1 v)))))

I get the same segfault with the following error:

Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS

Mark H Weaver

2018-03-02 23:22:00 UTC

Permalink

Hi,

Amirouche Boubekki <***@hypermove.net> writes:
[...]

Post by Amirouche Boubekki
Now, I got another idea what if I replace the 'cons' returned
by the traversi procedures by 'values' and replace the
lambda with 'delay' with some use of 'force' at the correct
(define-public (list->stream lst)
(let loop ((lst lst))
(if (null? lst)
(delay (values #f #f))
(delay (values (car lst) (loop (cdr lst)))))))
(define-public (stream->list stream)
(let loop ((stream stream)
(out '()))
(call-with-values (lambda () (force stream))
(lambda (value next)
(if next
(loop next (cons value out))
(reverse! out))))))
(define-public (stream-map proc stream)
(let loop ((stream stream))
(call-with-values (lambda () (force stream))
(lambda (value next)
(if next
(delay (values (proc value) (loop next)))
(delay (values #f #f)))))))

This code assumes that promises can store multiple values. Although
Guile's legacy core promises *accidentally* support multiple values
today, there's no guarantee that they will continue to do so in the
future. None of the standards allow this, and Guile's manual states in
the documentation for 'delay' that "The effect of <expression> returning
multiple values is unspecified."

Supporting multiple values in promises makes them more complex, and
inevitably less efficient.

SRFI-45 promises are simpler than Guile's legacy core promises in two
ways: (1) they do not include built-in thread synchronization, and (2)
they do not support multiple values.

I would recommend using SRFI-45 promises. Although they are implemented
in Scheme, last I checked I found that they were about as fast as
Guile's legacy core promises implemented in C, presumably because of the
built-in thread synchronization in our core promises.

In any case, I would strongly recommend against writing code that
assumes that Guile's promises can hold multiple values.

Regards,
Mark