Skip to content

Incremental (streaming) HMAC#1007

Merged
msprotz merged 55 commits intomainfrom
protz_incremental_hmac
Feb 5, 2025
Merged

Incremental (streaming) HMAC#1007
msprotz merged 55 commits intomainfrom
protz_incremental_hmac

Conversation

@msprotz
Copy link
Copy Markdown
Contributor

@msprotz msprotz commented Dec 3, 2024

In support of python/cpython#126359, this is an implementation of streaming (incremental) HMAC for all known fixed-length hash algorithms in HACL*.

This went relatively smoothly (insofar as wrestling with Low* goes), with only a tiny modification to the streaming functor.

This is a work in progress, and I still want to clean up the toplevel API, but this verifies, extracts, and compiles.

@msprotz msprotz requested a review from a team as a code owner December 3, 2024 22:42
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Dec 3, 2024

[CI] Important!
The code in dist/gcc-compatible, dist/msvc-compatible, and dist/wasm
is tested on cryspen/hacl-packages.
dist is not automatically re-generated, be sure to do it manually.
(A fresh snapshot can also be downloaded from CI by going to the "artifacts" section of your run.)
Then check the following tests before merging this PR.
Always check the latest run, it corresponds to the most recent version of this branch.
All jobs are expected to be successful.
In some cases manual intervention is needed. Please ping the hacl-packages maintainers.

  • Build, Test, Benchmark: Build on Linux (x86, x64), Windows (x86, x64), MacOS (x64, arm64), s390x, Android (arm64) and test on Linux (x86, x64), Windows (x86, x64), MacOS (x64).
  • Performance Regression Tests: Navigate to the terminal output in “Run benchmarks”. The comparison with the main branch will be at the bottom. The run fails if the performance regresses too much.
  • OCaml bindings: Build & Tests
  • JS bindings: Build & Tests
  • Rust bindings: Build & Tests

@msprotz
Copy link
Copy Markdown
Contributor Author

msprotz commented Dec 3, 2024

@picnixz can you take a look at the API? This is not the final form, but it should allow you to confirm this is headed in the right direction. What I have on my list is:

  • the key length is repeated between the index and key_and_len types -- the final malloc function will look more "C-native", and will take implementation, key, and key length, and will build the two structs under the hood
  • the malloc function right now assumes that you never screw up and only ever pass, e.g., Blake2b_256 on platforms that do support it -- the final function will be defensive and will return NULL in case you pass something inconsistent
  • the C files depend on two defines that need to be computed at build-configure time -- the Python integration will have to mimic what it does already for blake2 and define those two macros via build-time compile flags
  • the reset function only works if the key length is identical -- just like malloc, this will be wrapped in a defensive function that may return an error code if the reset operation failed

There's also a false_elim function somewhere that will be replaced by a KRML_ERROR once I get to it.

@msprotz
Copy link
Copy Markdown
Contributor Author

msprotz commented Dec 4, 2024

APIs for malloc and reset were fixed.

@msprotz
Copy link
Copy Markdown
Contributor Author

msprotz commented Dec 4, 2024

The digest function was missing! It's now in there, with the caveat that the implementation of digest has a temporary allocation declared on the stack for each possible choice of algorithm. I don't think it's the end of the world, but if it's a dealbreaker, we can have a variant that heap-allocates the temporary state and thus won't generate this kind of code (or we do a peephole optimization that introduces a true C union on the basis that the allocations all originate from disjoint branches of the code).

@msprotz
Copy link
Copy Markdown
Contributor Author

msprotz commented Jan 4, 2025

Ok the key is no longer kept at runtime.

We still have one issue which is that mmalloc_partial is unsound (CC @tahina-pro) so we need to remove that function and expose mmalloc_partial_uninitialized. See https://github.com/hacl-star/hacl-star/pull/1007/files#diff-6575620b2a57a4b1ceed2b0a54e8d51f8c4e29225d5e7ac991568dcf8ad251b3R230-R231 -- I'll say more on Monday

I'll do another round later with the more minor fixes related to enums and excessive inlining.

@msprotz
Copy link
Copy Markdown
Contributor Author

msprotz commented Jan 7, 2025

Ok so the problem is that malloc is modeled in Low* as taking an initial value, to be filled for every index of the array. When the initial value is zero, we use calloc -- this is fine. But if the initial value is something else, we either perform an assignment (array length 1) or a for-loop based initializer (unknown array length), and then the null-check that the user wrote comes after that.

My proposed fix is in FStarLang/karamel#513 and I just pushed a refresh C package that incorporates this fix.

@msprotz
Copy link
Copy Markdown
Contributor Author

msprotz commented Feb 1, 2025

hey @picnixz I think I've fixed all of your comments and this should be in good shape -- when you get a chance, can you tell me whether this looks good for python? if so, I'll work on fixing the last few tidbits to get a green, then merge... thanks!

@picnixz
Copy link
Copy Markdown

picnixz commented Feb 1, 2025

Thank you! I am currently travelling and will be back next week. I have some work for the upcoming CRYPTO deadline but I'll definitely look at it after the 15th, if not before.

@msprotz
Copy link
Copy Markdown
Contributor Author

msprotz commented Feb 3, 2025

Just talked about this with @R1kM and the plan is to merge this so that we have consistent null-checking across all those APIs now, then we can do follow-up touch-ups as you see fit if you have further tweaks requested. Cheers.

@msprotz
Copy link
Copy Markdown
Contributor Author

msprotz commented Feb 4, 2025

The build issue is that there is now an incorrect dependency from SHA1 towards Blake2, most likely owing to type monomorphizations. Looking into it.

@msprotz
Copy link
Copy Markdown
Contributor Author

msprotz commented Feb 4, 2025

@R1kM this turned out to be a lot more complicated than I thought: the fact that the streaming functor now relies on the option type means that various monomorphizations of option were inserted at the first use-site, leading to the tags (None, Some) being inserted in whichever file used it first (in this case, Blake2) and subsequent algorithms (e.g. SHA2) #include-ing the header for Blake2 just to get the tags None and Some in scope.

What I ended up doing is lifting some of the state type definitions (which, upon being visited, would generate monomorphizations of option) into a separate file, and grouping all type definitions into one file.

The net result is that all types now live in Hacl_Streaming_Types.h, including state types that contain pointers to arrays of vec128/vec256. This still compiles, because libintvector.h is always safe to include (vec256 becomes void* on platforms that don't support it, precisely for this use-case), but it's not ideal. Worse, because I can no longer eliminate unused declarations from this module (because their presence now has a side-effect), I get a C file with the discriminators (is_UU...).

But @mtzguido has given me a secret attribute to remove that last bit so hopefully I can fix that.

@msprotz msprotz merged commit 785f536 into main Feb 5, 2025
@msprotz msprotz deleted the protz_incremental_hmac branch February 5, 2025 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants