Discussion:
Is there any security risk related to the use of the reader?
Amirouche Boubekki
2018-02-25 15:35:54 UTC
Permalink
I have procedures like that in my program:

(define-public (scm->string scm)
(call-with-output-string
(lambda (port)
(write scm port))))

(define-public (string->scm string)
(call-with-input-string string read))

Is it safe to pass to this procedures input from third parties?

TIA!
--
Amirouche ~ amz3 ~ http://www.hyperdev.fr
Matt Wette
2018-02-25 17:29:25 UTC
Permalink
Post by Amirouche Boubekki
(define-public (scm->string scm)
  (call-with-output-string
    (lambda (port)
      (write scm port))))
(define-public (string->scm string)
  (call-with-input-string string read))
Is it safe to pass to this procedures input from third parties?
TIA!
maybe check (ice-9 sandbox), explained in section 6.18.12 of the 2.2.3
manual
Amirouche Boubekki
2018-02-25 17:38:29 UTC
Permalink
Post by Matt Wette
Post by Amirouche Boubekki
(define-public (scm->string scm)
  (call-with-output-string
    (lambda (port)
      (write scm port))))
(define-public (string->scm string)
  (call-with-input-string string read))
Is it safe to pass to this procedures input from third parties?
TIA!
maybe check (ice-9 sandbox), explained in section 6.18.12 of the 2.2.3
manual
I don't know what are the performance implication to
seriallize / deserialize in a sandbox. This would slow
down every write / read.

The alternative I am thinking about is to use msgpack
but I will loose direct representation of bignum, maybe
it's a good enough strategy. People that need precision
maybe use hdf5 string representation.
Amirouche Boubekki
2018-02-25 17:35:10 UTC
Permalink
Post by Amirouche Boubekki
(define-public (scm->string scm)
(call-with-output-string
(lambda (port)
(write scm port))))
(define-public (string->scm string)
(call-with-input-string string read))
Is it safe to pass to this procedures input from third parties?
TIA!
Another question: is 'write' deterministic?
Christopher Lemmer Webber
2018-02-25 17:35:31 UTC
Permalink
Luckily I kept notes on this... the answer is not very if you're
snarfing stuff over the wire!

: <wingo> paroneayea: relatedly,
: http://git.savannah.gnu.org/cgit/guile.git/commit/?id=e68dd5c601ef7975507d4118bcc2ad334b0450b2
: <wingo> i suppose that doc update doesn't touch security at all tho... [16:54]
: <paroneayea> wingo: ah nice
: <paroneayea> wingo: yeah... srfi-10 seems to be one way where (read) can get
: scary [16:55]
: <paroneayea> wingo: really makes me wish I could specify a scope where I
: *know* how read behaves
: <wingo> yeah i know what you mean [16:56]
: <paroneayea> maybe (scoped-read) (scoped-write)
: <paroneayea> or even a module that provides (read) and (write), splatting over
: the default ones unless you namespace it :) [16:57]
: <wingo> nah, i think a port option or something is what you want
: <paroneayea> a port option would be fine
: <dsmith-work> There is also read-hash-extend (I think srfi-10 uses that)
: <paroneayea> wingo: I saw that there was some mention of being able to control
: how read works in the port... but it looked like the *data* read
: by read was able to change read behavior [16:58]
: <paroneayea> which is actually even more worrying!
: <wingo> how could the data change the behavior? only via #.
: <wingo> which is off by default
: <paroneayea> ah
: <paroneayea> okay whew
: <wingo> no one in their right mind turns that one on :)
: <wingo> there's also #!r6rs i think... [16:59]
: <paroneayea> wingo: I think this paragraph is not clear to me:
: <wingo> that changes some parameters, but they are static parameters, and
: maybe only in certain situations
: <wingo> anyway not like a danger or anything
: <paroneayea> https://dpaste.de/Pdor
: <paroneayea> wingo: nothing in that section mentions how # gets turned on to
: enable that [17:00]
: <paroneayea> it sounds like *any* port that runs across #!fold-case will start
: doing that to me..
: <paroneayea> but maybe I'm just not understandgin while reading [17:01]
: <wingo> no, that's correct
: <wingo> maybe it's a problem? dunno
: <paroneayea> it doesn't seem like those are dangerous options
: <wingo> when #!fold-case is in a file that's certainly what you want
: <wingo> if an attacker can inject #!fold-case in some thing... [17:02]
: <wingo> it's an interesting idea, i hadn't thought about that
: <wingo> mark_weaver will like that one :)
: <paroneayea> well, the question is whether (read) is "safe" when you get
: arbitrary incoming data [17:03]
: <wingo> we probably need to have a port option to disable these things i guess
: <paroneayea> in the case I'm in, a signature is checked before it ever hits
: (read), but could read/write be as safe as json?
: <wingo> paroneayea: what does "safe" mean in this context? threat model
: missing :)
: <paroneayea> wingo: think "reading data off a REST style API"
: <wingo> and? [17:04]
: <davexunit> arbitrary code execution
: <paroneayea> yes
: <paroneayea> eg, you're probably at no risk parsing that data through json
: <wingo> so
: <davexunit> which I don't think is possible
: <paroneayea> every rest api in the world does that
: <wingo> arbitrary code execution is only possible via read-eval (the #. thing)
: <wingo> that should be off globally.
: <paroneayea> davexunit: what about if srfi-10 is on? :)
: <wingo> you can get some code execution via srfi-10 or read-hash-extend
: <wingo> so you have to know whether the extensions that are active in your
: program are safe against your threat model [17:05]
: <wingo> for example
: <wingo> a syntax extension which turns #'foo into (syntax foo)
: <wingo> there's no danger there.
: <paroneayea> sure
: <paroneayea> though say I decided "to hell with json, my REST API is going to
: use sexps! Wheee!" [17:06]
: <paroneayea> and then I import a module that itself imports srfi-10
: <paroneayea> and I had no idea
: <paroneayea> and bam, code execution seems possible
: <wingo> yeah, not full #. code execution but yes [17:07]
: <wingo> but that same argument applies if a module enables #.
: <paroneayea> wingo: right
: <wingo> so yeah, it's a property of your whole program
: <paroneayea> wingo: basically I'm saying, I don't want either of those to
: happen
: <paroneayea> because I don't know much about my libraries, necessarily
: <wingo> what code can be run by `read' [17:08]
: <wingo> yeah i know what you mean. i solve this by being an asshole and never
: using other people's code :/ not a great solution
: <dsmith-work> paroneayea: You could purge srfi-10 from systems you control?
: <dsmith-work> If some module needs it, too bad. [17:09]
: <wingo> if you want to control the property globally, yeah like dsmith-work
: says we can add global kill switches
: <davexunit> how about a dumb-read procedure?
: <davexunit> no reader macros allowed :)
: <wingo> we could also add per-port kill switches but that's less convenient in
: some ways
: <wingo> heh, at that point just write your own `read' :)
: <wingo> there's also some memory dangers [17:10]
: <wingo> e.g. reading 10e10000
: <wingo> rather #e10e1000
: <wingo> anyway
: <davexunit> I don't know read is implemented, so I don't know how feasible it
: would be to say "don't use any reader macros for this read call"
: <dsmith-work> What about malformed sexpr?
: <dsmith-work> Like reading "(((foo" [17:11]
: <wingo> read is implemented in c right now. for options it's better to attach
: them to the port.
: <paroneayea> wingo: wow #e10e1000 is interesting
: <paroneayea> I would have never thought of that. [17:12]
: <wingo> irritating isn't it :)
: <paroneayea> so, this conversation is making me feel like: don't use read for
: any data that comes over the wire that you don't really trust
: <paroneayea> or use a safer one you write yourself :) [17:13]
: <paroneayea> so, it's still safe with the signed sessions, because an attacker
: would need your key to get you to do something evil with read
: [17:14]
: <paroneayea> but #e10e10000, srfi-10, and #. are all worrying possible attacks
: against using vanilla read for much data heading over the wire
: that you don't control [17:15]
: <wingo> i think there is another attack, which is ((((((((((((((((((((((((((
: [17:16]
: <wingo> keep on going
: <wingo> the c stack keeps recursing, eventually segfault i think
: <paroneayea> wingo: interesting [17:17]
: <paroneayea> wingq: I guess the json parser I'm using is also semi-vulnerable
: to that, but it wouldn't be in C
: <paroneayea> so probably there's more safety
: <paroneayea> wouldn't segfault, anyway
: * wingo currently trying the ((((( attack
: <wingo> do you have a limit on your message size? [17:18]
: <paroneayea> wingo: for this particular case (sessions), the limit is 4kb, but
: I'm not worried about that. Probably for the json-ld stuff
: coming over the wire... I haven't set any limit :) [17:19]
: <wingo> lol
: <wingo> scheme@(guile-user)> (call-with-input-string (make-string 200000 #\()
: read)
: <wingo> Segmentation fault
: <paroneayea> whee :)
: <wingo> erryting turribul [17:20]
: * paroneayea saves this conversation in eir notes.org file :)
: <wingo> this is "just" a bug [17:21]
: <dsmith-work> IS there a limit to cookie size?
: <dsmith-work> paroneayea: or whatever you are planning on read'ing
: <paroneayea> dsmith-work: yes, cookies are limited to 4kb [17:22]
: <paroneayea> according to the HTTP spec
: <paroneayea> well, whether guile does that
: <paroneayea> I haven't checked yet
: <paroneayea> (or rather, headers are)
: <dsmith-work> scheme@(guile-user)> (call-with-input-string (make-string 4096
: #\() read)
: <dsmith-work> ERROR: In procedure read:
: <dsmith-work> ERROR: In procedure scm_i_lreadparen: #<unknown port>:1:4097:
: end of file
: <paroneayea> or if not according to the http spec, according to browser
: conventions :)
: <paroneayea>
: http://stackoverflow.com/questions/640938/what-is-the-maximum-size-of-a-web-browsers-cookies-key
: <dsmith-work> So in your case, at least you won't segfault.
: <paroneayea> not sure if this is from any spec, or just convention
: <paroneayea> dsmith-work: in my particular case, for sessions, it's still not
: risky because again, someone would need to forge the key...
: [17:24]
: <paroneayea> but yeah
: <paroneayea> I'm still interested in understanding in general how "safe" read
: / write are in various scenarios
Post by Amirouche Boubekki
(define-public (scm->string scm)
(call-with-output-string
(lambda (port)
(write scm port))))
(define-public (string->scm string)
(call-with-input-string string read))
Is it safe to pass to this procedures input from third parties?
TIA!
Amirouche Boubekki
2018-02-25 17:46:28 UTC
Permalink
Post by Christopher Lemmer Webber
Luckily I kept notes on this... the answer is not very if you're
snarfing stuff over the wire!
: <wingo> paroneayea: relatedly,
http://git.savannah.gnu.org/cgit/guile.git/commit/?id=e68dd5c601ef7975507d4118bcc2ad334b0450b2
: <wingo> i suppose that doc update doesn't touch security at all tho... [16:54]
: <paroneayea> wingo: ah nice
: <paroneayea> wingo: yeah... srfi-10 seems to be one way where (read) can get
: scary [16:55]
: <paroneayea> wingo: really makes me wish I could specify a scope where I
: *know* how read behaves
: <wingo> yeah i know what you mean [16:56]
: <paroneayea> maybe (scoped-read) (scoped-write)
: <paroneayea> or even a module that provides (read) and (write), splatting over
: the default ones unless you namespace it :) [16:57]
: <wingo> nah, i think a port option or something is what you want
: <paroneayea> a port option would be fine
: <dsmith-work> There is also read-hash-extend (I think srfi-10 uses that)
: <paroneayea> wingo: I saw that there was some mention of being able to control
: how read works in the port... but it looked like the *data* read
: by read was able to change read behavior [16:58]
: <paroneayea> which is actually even more worrying!
: <wingo> how could the data change the behavior? only via #.
: <wingo> which is off by default
: <paroneayea> ah
: <paroneayea> okay whew
: <wingo> no one in their right mind turns that one on :)
: <wingo> there's also #!r6rs i think... [16:59]
: <wingo> that changes some parameters, but they are static parameters, and
: maybe only in certain situations
: <wingo> anyway not like a danger or anything
: <paroneayea> https://dpaste.de/Pdor
: <paroneayea> wingo: nothing in that section mentions how # gets turned on to
: enable that [17:00]
: <paroneayea> it sounds like *any* port that runs across #!fold-case will start
: doing that to me..
: <paroneayea> but maybe I'm just not understandgin while reading
[17:01]
: <wingo> no, that's correct
: <wingo> maybe it's a problem? dunno
: <paroneayea> it doesn't seem like those are dangerous options
: <wingo> when #!fold-case is in a file that's certainly what you want
: <wingo> if an attacker can inject #!fold-case in some thing...
[17:02]
: <wingo> it's an interesting idea, i hadn't thought about that
: <wingo> mark_weaver will like that one :)
: <paroneayea> well, the question is whether (read) is "safe" when you get
: arbitrary incoming data [17:03]
: <wingo> we probably need to have a port option to disable these things i guess
: <paroneayea> in the case I'm in, a signature is checked before it ever hits
: (read), but could read/write be as safe as json?
: <wingo> paroneayea: what does "safe" mean in this context? threat model
: missing :)
: <paroneayea> wingo: think "reading data off a REST style API"
: <wingo> and? [17:04]
: <davexunit> arbitrary code execution
: <paroneayea> yes
: <paroneayea> eg, you're probably at no risk parsing that data through json
: <wingo> so
: <davexunit> which I don't think is possible
: <paroneayea> every rest api in the world does that
: <wingo> arbitrary code execution is only possible via read-eval (the #. thing)
: <wingo> that should be off globally.
: <paroneayea> davexunit: what about if srfi-10 is on? :)
: <wingo> you can get some code execution via srfi-10 or
read-hash-extend
: <wingo> so you have to know whether the extensions that are active in your
: program are safe against your threat model [17:05]
: <wingo> for example
: <wingo> a syntax extension which turns #'foo into (syntax foo)
: <wingo> there's no danger there.
: <paroneayea> sure
: <paroneayea> though say I decided "to hell with json, my REST API is going to
: use sexps! Wheee!" [17:06]
: <paroneayea> and then I import a module that itself imports srfi-10
: <paroneayea> and I had no idea
: <paroneayea> and bam, code execution seems possible
: <wingo> yeah, not full #. code execution but yes [17:07]
: <wingo> but that same argument applies if a module enables #.
: <paroneayea> wingo: right
: <wingo> so yeah, it's a property of your whole program
: <paroneayea> wingo: basically I'm saying, I don't want either of those to
: happen
: <paroneayea> because I don't know much about my libraries,
necessarily
: <wingo> what code can be run by `read' [17:08]
: <wingo> yeah i know what you mean. i solve this by being an asshole and never
: using other people's code :/ not a great solution
: <dsmith-work> paroneayea: You could purge srfi-10 from systems you control?
: <dsmith-work> If some module needs it, too bad. [17:09]
: <wingo> if you want to control the property globally, yeah like dsmith-work
: says we can add global kill switches
: <davexunit> how about a dumb-read procedure?
: <davexunit> no reader macros allowed :)
: <wingo> we could also add per-port kill switches but that's less convenient in
: some ways
: <wingo> heh, at that point just write your own `read' :)
: <wingo> there's also some memory dangers [17:10]
: <wingo> e.g. reading 10e10000
: <wingo> rather #e10e1000
: <wingo> anyway
: <davexunit> I don't know read is implemented, so I don't know how feasible it
: would be to say "don't use any reader macros for this read call"
: <dsmith-work> What about malformed sexpr?
: <dsmith-work> Like reading "(((foo" [17:11]
: <wingo> read is implemented in c right now. for options it's better to attach
: them to the port.
: <paroneayea> wingo: wow #e10e1000 is interesting
: <paroneayea> I would have never thought of that. [17:12]
: <wingo> irritating isn't it :)
: <paroneayea> so, this conversation is making me feel like: don't use read for
: any data that comes over the wire that you don't really trust
: <paroneayea> or use a safer one you write yourself :)
[17:13]
: <paroneayea> so, it's still safe with the signed sessions, because an attacker
: would need your key to get you to do something evil with read
: [17:14]
: <paroneayea> but #e10e10000, srfi-10, and #. are all worrying possible attacks
: against using vanilla read for much data heading over the wire
: that you don't control [17:15]
: <wingo> i think there is another attack, which is
((((((((((((((((((((((((((
: [17:16]
: <wingo> keep on going
: <wingo> the c stack keeps recursing, eventually segfault i think
: <paroneayea> wingo: interesting [17:17]
: <paroneayea> wingq: I guess the json parser I'm using is also semi-vulnerable
: to that, but it wouldn't be in C
: <paroneayea> so probably there's more safety
: <paroneayea> wouldn't segfault, anyway
: * wingo currently trying the ((((( attack
: <wingo> do you have a limit on your message size? [17:18]
: <paroneayea> wingo: for this particular case (sessions), the limit is 4kb, but
: I'm not worried about that. Probably for the json-ld stuff
: coming over the wire... I haven't set any limit :)
[17:19]
: <wingo> lol
: read)
: <wingo> Segmentation fault
: <paroneayea> whee :)
: <wingo> erryting turribul [17:20]
: * paroneayea saves this conversation in eir notes.org file :)
: <wingo> this is "just" a bug [17:21]
: <dsmith-work> IS there a limit to cookie size?
: <dsmith-work> paroneayea: or whatever you are planning on read'ing
: <paroneayea> dsmith-work: yes, cookies are limited to 4kb
[17:22]
: <paroneayea> according to the HTTP spec
: <paroneayea> well, whether guile does that
: <paroneayea> I haven't checked yet
: <paroneayea> (or rather, headers are)
(make-string 4096
: #\() read)
: end of file
: <paroneayea> or if not according to the http spec, according to browser
: conventions :)
: <paroneayea>
http://stackoverflow.com/questions/640938/what-is-the-maximum-size-of-a-web-browsers-cookies-key
: <dsmith-work> So in your case, at least you won't segfault.
: <paroneayea> not sure if this is from any spec, or just convention
: <paroneayea> dsmith-work: in my particular case, for sessions, it's still not
: risky because again, someone would need to forge the key...
: [17:24]
: <paroneayea> but yeah
: <paroneayea> I'm still interested in understanding in general how "safe" read
: / write are in various scenarios
Post by Amirouche Boubekki
(define-public (scm->string scm)
(call-with-output-string
(lambda (port)
(write scm port))))
(define-public (string->scm string)
(call-with-input-string string read))
Is it safe to pass to this procedures input from third parties?
TIA!
Seems like 'read' is unsafe against arbitrary code execution
via srfi-10 and segfault or consume the whole memory on invalid
input.
Christopher Lemmer Webber
2018-02-28 22:35:43 UTC
Permalink
Post by Amirouche Boubekki
Seems like 'read' is unsafe against arbitrary code execution
via srfi-10 and segfault or consume the whole memory on invalid
input.
Yep...

I think the answer is to write a safer read in scheme.
Mark H Weaver
2018-03-01 23:56:11 UTC
Permalink
Post by Amirouche Boubekki
(define-public (scm->string scm)
(call-with-output-string
(lambda (port)
(write scm port))))
(define-public (string->scm string)
(call-with-input-string string read))
Is it safe to pass to this procedures input from third parties?
I would not consider Guile's 'read' to be trustworthy when processing
potentially malicious inputs.

Mark
Amirouche Boubekki
2018-03-02 03:42:38 UTC
Permalink
Post by Mark H Weaver
I would not consider Guile's 'read' to be trustworthy when processing
potentially malicious inputs.
Mark
Thanks for the input.

FWIW, I've written a procedure 'read' that is AFAIK safe but can
_fail_ on malicious input.

It can read:

- strings,
- booleans,
- simple numbers like 123456,
- rationals like 1/4 and
- simple symbols .ie unlike what is generated by 'gensym'

I attached to this mail all the files. It based on the stream
library I've been working on and a simple parser combinator
library based on it.

The main issue I see is that combinatorix doesn't take a port
as input. I could probably turn a port into a functional stream
but right now I don't need it. Patch welcome :)

Also I greatly improved error reporting compared to my last
attempt at building a parser combinator. Now the library
will tell you the failing parser with the argument that were
passed to it and the char line and column that triggered the
error.

For instance, given the following definitions:

(define parse-a (parse-xchar #\a))
(define parse-b (parse-xchar #\b))
(define parse-c (parse-xchar #\c))

(define parse-abc (each parse-a parse-b parse-c))

If you run the parser like that:

(pk (parse parse-abc "ab©"))

You get the following exception:

neon/read.scm:75:4: Throw to key `combinatorix' with args `(#<<error>
value: <xchar #\© [1,3] @ 2> parser: #<procedure parse-xchar (char)>
args: #\c>)'.

That is parse-xchar #\c that is failing. Things could be improved
but its encouraging I think.

For info on parser combinators see https://epsil.github.io/gll/


Happy hacking!

Loading...