Skip to content

Object parse fixes#4864

Merged
pks-t merged 5 commits intolibgit2:masterfrom
pks-t:pks/object-parse-fixes
Oct 26, 2018
Merged

Object parse fixes#4864
pks-t merged 5 commits intolibgit2:masterfrom
pks-t:pks/object-parse-fixes

Conversation

@pks-t
Copy link
Copy Markdown
Member

@pks-t pks-t commented Oct 26, 2018

No description provided.

pks-t added 5 commits October 25, 2018 12:52
Unfortunately, neither the `memmem` nor the `strnstr` functions are part
of any C standard but are merely extensions of C that are implemented by
e.g. glibc. Thus, there is no standardized way to search for a string in
a block of memory with a limited size, and using `strstr` is to be
considered unsafe in case where the buffer has not been sanitized. In
fact, there are some uses of `strstr` in exactly that unsafe way in our
codebase.

Provide a new function `git__memmem` that implements the `memmem`
semantics. That is in a given haystack of `n` bytes, search for the
occurrence of a byte sequence of `m` bytes and return a pointer to the
first occurrence. The implementation chosen is the "Not So Naive"
algorithm from [1]. It was chosen as the implementation is comparably
simple while still being reasonably efficient in most cases.
Preprocessing happens in constant time and space, searching has a time
complexity of O(n*m) with a slightly sub-linear average case.

[1]: http://www-igm.univ-mlv.fr/~lecroq/string/
While the tests in object::tag::read exercises reading and parsing valid
tags from the ODB, they barely try to verify that the parser fails in a
sane way when parsing invalid tags. Create a new test suite
object::tag::parse that directly exercise the parser by using
`git_object__from_raw` and add various tests for valid and invalid tags.
When parsing tags, we skip all unknown fields that appear before the tag
message. This skipping is done by using a plain `strstr(buffer, "\n\n")`
to search for the two newlines that separate tag fields from tag
message. As it is not possible to supply a buffer length to `strstr`,
this call may skip over the buffer's end and thus result in an out of
bounds read. As `strstr` may return a pointer that is out of bounds, the
following computation of `buffer_end - buffer` will overflow and result
in an allocation of an invalid length.

Fix the issue by using `git__memmem` instead. Add a test that verifies
parsing the tag fails not due to the allocation failure but due to the
tag having no message.
We currently do not have any test suites dedicated to parsing commits
from their raw representations. Add one based on `git_object__from_raw`
to be able to test special cases more easily.
The commit message encoding is currently being parsed by the
`git__prefixcmp` function. As this function does not accept a buffer
length, it will happily skip over a buffer's end if it is not `NUL`
terminated.

Fix the issue by using `git__prefixncmp` instead. Add a test that
verifies that we are unable to parse the encoding field if it's cut off
by the supplied buffer length.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant