Discussion:
reading from input pipe is slow?
Thomas Danckaert
2018-06-22 16:40:30 UTC
Permalink
Dear guile users,

I've notice that reading a child process' stdout from a port opened
with (open-input-pipe "command") is very slow. In fact, it's faster
to redirect the child process' output to a temporary file, and then
read this temporary file from Guile, than to directly read the child
process' output using a port (I've added some example code below
where you can see this).

Is there an explanation for this? Or even better: a solution?

Thank you for any information!

Thomas

You can execute the following at the REPL. Here, I "cat" a big file
to simulate a process generating a largish amount of text.

(use-modules (ice-9 textual-ports) (ice-9 popen))

(define big-file "/tmp/big-file.txt")

;; write 5MB to big-file
(with-output-to-file big-file
(lambda ()
(let put-line ((n 0))
(simple-format (current-output-port) "~a~%" (make-string 1024
#\a))
(unless (> n (* 1024 5))
(put-line (1+ n))))
(simple-format (current-output-port) "This is the end")))

(define my-command (format #f "cat ~a" big-file))

;; 15 seconds on my machine:
,time (format #t "end of file: '~a'~%"
(string-take-right
(get-string-all (open-input-pipe my-command))
15))

;; 1 second on my machine:
,time (let ((tmpfile (tmpnam)))
;; redirect my-command into a temporary file and read the
file:
(system (format #f "~a > ~a" my-command tmpfile))
(format #t "end of file: '~a'~%"
(string-take-right
(call-with-input-file tmpfile get-string-all)
15))
(delete-file tmpfile))
Mark H Weaver
2018-06-22 21:47:42 UTC
Permalink
Hi Thomas,
Post by Thomas Danckaert
I've notice that reading a child process' stdout from a port opened
with (open-input-pipe "command") is very slow. In fact, it's faster
to redirect the child process' output to a temporary file, and then
read this temporary file from Guile, than to directly read the child
process' output using a port (I've added some example code below where
you can see this).
Is there an explanation for this? Or even better: a solution?
For some reason that I don't know, possibly historical, the pipes
created by procedures in (ice-9 popen) are unbuffered by default. When
reading or writing from an unbuffered port, a system call is made for
every byte.

To enable buffering on pipe ports, call (setvbuf port 'block) after
opening it. Something like this, where (rnrs io ports) is added to the
module imports you already had, for 'call-with-port':

(call-with-port (open-input-pipe my-command)
(lambda (port)
(setvbuf port 'block)
(string-take-right (get-string-all port)
15)))

Hope this helps,

Mark
Thomas Danckaert
2018-06-25 11:32:04 UTC
Permalink
Hi Mark,

indeed, setting buffering with (setvbuf ...) solved the issue.
Thanks a lot!
Post by Mark H Weaver
For some reason that I don't know, possibly historical, the pipes
created by procedures in (ice-9 popen) are unbuffered by default.
I think you're right, but doesn't the manual suggest otherwise, when
it says "Each direction has PIPE_BUF bytes of buffering"
(https://www.gnu.org/software/guile/manual/html_node/Pipes.html)? Or
do I misinterpret that?

Also, let me mention that the statistical profiler from the
(statprof) module has been very useful to isolate the problem.
Initially I assumed the slowness was due to the external process.
Only after running statprof did I realize that the problem was in my
code.

Thomas

Loading...