Issues with glob_to_regex examples regarding forward slash and dotfiles

Issues with glob_to_regex examples regarding forward slash and dotfiles

Martin Preisler
Hi,

Jan Cerny has been working on a glob_to_regex implementation [4] in OpenSCAP.
While reviewing the changes I discovered that the specification examples
are not consistent with how UNIX glob [1] should behave. There are 2 issues:

1) * and ? must never match forward slash.

For example, the glob "a/*.txt" matches "a/b.txt" but not "a/b/c.txt".
The examples [2] listed in the specification provide wrong test vectors.
In the following section I will ignore the noescape setting because it
doesn't affect these issues and would only cause confusion.
The examples say that the glob "*" should translate to the regex ".*"
and "?" should translate to the regex ".". Unfortunately, this is
inconsistent with the glob man page, which explicitly says that a forward
slash is never matched by * or ? [1].

Examples corrected to address issue 1):
glob "*" should be translated to "^[^/]*$"
glob "?" should be translated to "^[^/]$"
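
The difference can be checked with a minimal sketch in Python, assuming
its re module treats these simple patterns the same way a Perl-compatible
engine would. The names spec_star, fixed_star and matches are illustrative
only; they come from neither the specification nor OpenSCAP.

```python
import re

# Regex given by the current specification example for glob "*".
spec_star = re.compile(r"^.*$")
# Corrected regex proposed above for glob "*".
fixed_star = re.compile(r"^[^/]*$")


def matches(regex, path):
    """Return True if the compiled regex matches the given path."""
    return regex.search(path) is not None


# "*" must stay within one path component, so "b/c.txt" should be rejected.
for candidate in ("c.txt", "b/c.txt"):
    print(candidate, matches(spec_star, candidate), matches(fixed_star, candidate))
```

The specification's regex accepts both candidates, while the corrected one
accepts only the candidate without a forward slash.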

2) the specification says the glob must not match dotfiles if it doesn't
   start with a dot [3]

exact quote:
the glob_to_regex() function will rule out matches starting with
'.' (e.g. dotfiles) unless the provided glob pattern itself starts
with the '.' character

The statement is imprecise. I think the specification should
have said that the glob must not match dotfiles or dotfolders
if the respective path component of the glob doesn't start with a dot.

Examples of this behavior:
glob "/*" matches "/file.txt" but doesn't match "/.file.txt"
glob "/.*" matches "/.file.txt" but doesn't match "/file.txt"
(note that the second glob doesn't start with a dot but the filename
part of it does)

This is quite a complex behavior to replicate in a Perl regex. Jan Cerny,
Tomas Heinrich and I have discussed this and we believe it can be
done.

Examples corrected to address issue 1) and 2):
glob "/*" should be translated to "^/(?=[^\.])[^/]*$"
glob "/?" should be translated to "^/[^\./]$"
glob "/a*" should be translated to "^/a[^/]*$"
glob "/a?" should be translated to "^/a[^/]$"
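
The four regexes above can be exercised against a few paths with a short
Python sketch. It assumes Python's re module handles the lookahead and
character classes like a Perl-compatible engine; PROPOSED and regex_matches
are hypothetical names used only for this illustration.

```python
import re

# (glob -> proposed regex) pairs copied from the corrected examples above.
PROPOSED = {
    "/*": r"^/(?=[^\.])[^/]*$",
    "/?": r"^/[^\./]$",
    "/a*": r"^/a[^/]*$",
    "/a?": r"^/a[^/]$",
}


def regex_matches(glob, path):
    """Check a path against the regex proposed for the given glob."""
    return re.search(PROPOSED[glob], path) is not None


# A leading dot is rejected only when * or ? opens the path component.
for glob, path in [
    ("/*", "/file.txt"),
    ("/*", "/.file.txt"),
    ("/*", "/a/b"),
    ("/?", "/."),
    ("/a*", "/a.txt"),
    ("/a?", "/a."),
]:
    print(glob, path, regex_matches(glob, path))
```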

If you think this is way more contrived than one would expect, you
aren't alone :-) The important takeaway for the implementor of this
function is that * and ? behave in 2 possible modes. The first mode
applies when * or ? comes right after a forward slash or at the beginning
of the glob. The second mode covers all other cases. The first mode must
avoid matching a leading dot; the second mode matches any character
except a forward slash. Specifically:

First mode:
* gets translated to (?=[^\.])[^/]*
? gets translated to [^\./]

Second mode:
* gets translated to [^/]*
? gets translated to [^/]
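
The two modes can be sketched as a small translator in Python. This is
only an illustration under stated assumptions, not the OpenSCAP code:
glob_to_regex_sketch is a hypothetical name, only '*', '?' and literal
characters are handled, and bracket expressions and backslash escapes are
deliberately left out.

```python
import re


def glob_to_regex_sketch(glob):
    """Translate a glob into an anchored regex using the two modes above."""
    parts = ["^"]
    # True at the beginning of the glob and right after a forward slash.
    at_component_start = True
    for ch in glob:
        if ch == "*":
            # First mode refuses a leading dot; second mode only refuses '/'.
            parts.append(r"(?=[^\.])[^/]*" if at_component_start else "[^/]*")
        elif ch == "?":
            parts.append(r"[^\./]" if at_component_start else "[^/]")
        else:
            # Any other character is matched literally.
            parts.append(re.escape(ch))
        at_component_start = ch == "/"
    parts.append("$")
    return "".join(parts)


for example in ("/*", "/?", "/a*", "/a?", "/.*"):
    print(example, glob_to_regex_sketch(example))
```

Tracking a single flag for "start of a path component" is enough to choose
between the two modes, because the mode depends only on the character
immediately before * or ?.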

Jan Cerny is working on an implementation that we believe handles all
of this. Hopefully everything will be much clearer when it is finished
and can be referenced. You can see the work-in-progress version on GitHub:
https://github.com/OpenSCAP/openscap/pull/67


Hope all of this makes sense; feel free to ask if it doesn't.
Can we get this fixed for the OVAL 5.11.1 release or is it too late?


[1] http://unixhelp.ed.ac.uk/CGI/man-cgi?glob+7 section "Pathnames"
[2] https://github.com/OVALProject/Sandbox/blob/b139c75050c5b23ec9d7f1e5da187234c5efafb9/resources/x-glob-to-regex/oval-function-proposal-form-v%233.txt#L54
[3] https://github.com/OVALProject/Sandbox/blob/b139c75050c5b23ec9d7f1e5da187234c5efafb9/resources/x-glob-to-regex/oval-function-proposal-form-v%233.txt#L108
[4] https://github.com/OpenSCAP/openscap/pull/67

--
Martin Preisler
Security Technologies | Red Hat, Inc.

To unsubscribe, send an email message to [hidden email] with
SIGNOFF OVAL-DEVELOPER-LIST
in the BODY of the message.  If you have difficulties, write to [hidden email].

Re: Issues with glob_to_regex examples regarding forward slash and dotfiles

Jan Cerny
Hello,

I have improved my implementation of the glob_to_regex function in OpenSCAP.
I have fixed the expansion of * and ? in globs. It should now be compatible
with UNIX globs as described in the manual page (man 7 glob).
I have also fixed the problem with dotfiles.
You can see the work in progress at:
https://github.com/OpenSCAP/openscap/pull/67
This pull request also contains a small OVAL test file, which tests how
globs are converted to regular expressions.

Jan Černý
Red Hat


----- Original Message -----
From: "Martin Preisler" <[hidden email]>
To: [hidden email]
Sent: Wednesday, April 8, 2015 7:13:21 PM
Subject: [OVAL-DEVELOPER-LIST] Issues with glob_to_regex examples regarding forward slash and dotfiles


Re: Issues with glob_to_regex examples regarding forward slash and dotfiles

David Solin-3
I have one small remark on this discussion. When glob_noescape="false", implementors should check that the * and ? characters are not escaped before replacing them as described.
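
A minimal Python sketch of that check follows. It assumes that
glob_noescape="false" means a backslash escapes the next character, as
with the GLOB_NOESCAPE flag of glob(3). The function name translate and
its noescape parameter are hypothetical, and bracket expressions are not
handled.

```python
import re


def translate(glob, noescape=False):
    """Translate '*' and '?' unless they are escaped with a backslash."""
    out = ["^"]
    at_start = True  # beginning of the glob or right after '/'
    i = 0
    while i < len(glob):
        ch = glob[i]
        if ch == "\\" and not noescape and i + 1 < len(glob):
            # Escaped character: emit the following character literally.
            i += 1
            ch = glob[i]
            out.append(re.escape(ch))
        elif ch == "*":
            out.append(r"(?=[^\.])[^/]*" if at_start else "[^/]*")
        elif ch == "?":
            out.append(r"[^\./]" if at_start else "[^/]")
        else:
            # Literal character, including a backslash when noescape is set.
            out.append(re.escape(ch))
        at_start = ch == "/"
        i += 1
    out.append("$")
    return "".join(out)


# With escaping enabled, "\*" only matches a literal asterisk.
print(translate(r"/\*"))
print(translate(r"/\*", noescape=True))
```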

Regards,
—David A. Solin
Co-Founder, Research & Technology
[hidden email]

> On Apr 9, 2015, at 7:00 AM, Jan Cerny <[hidden email]> wrote: