MultiDistribution by benjamin-lieser · Pull Request #18 · rust-random/rand_distr

benjamin-lieser · 2025-03-02T09:26:26Z

Added a CHANGELOG.md entry

Summary

Some code related to #16

benjamin-lieser · 2025-03-02T11:38:55Z

I put a &self receiver, like in Distribution. @dhardy, was there a specific reason why you proposed &mut self?

I am not sure whether I want a different name for the method: sample clashes with the one in Distribution, and I would like structs to implement both traits where applicable. Making such calls unambiguous often feels too cumbersome in Rust, and the clash can break existing code that does use rand_distr::* or similar.

dhardy · 2025-03-02T12:13:31Z

> I put a &self receiver, like in Distribution. @dhardy, was there a specific reason why you proposed &mut self?

Sorry, it should be &self.

> I am not sure if I want a different name, because it clashed with the one in Distribution and I would like structs to implement both if applicable. Making these things unambiguous feels often too cumbersome in Rust. And it can break existing code when doing use rand_distr::* or similar.

I'm not really sure of the answer to that. Any of these can work (from the POV of a dependency):

use rand_distr::MultiDistribution; — Do we want the multi module to be pub? Likely yes, in which case supporting this path is redundant.
use rand_distr::multi::Distribution; — Usable but potential for confusion and can make usage of both Distribution traits annoying
use rand_distr::multi::MultiDistribution; — Redundant naming, but avoids the above issues so probably the best choice
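A minimal sketch of what the third option looks like from a dependent's side. To stay self-contained, `Distribution` here is a simplified local stand-in for rand's trait (no RNG argument), and `Constant3` is a hypothetical type; neither is the crate's actual code.

```rust
/// Stand-in for rand's one-dimensional `Distribution` trait
/// (simplified for this sketch: the RNG argument is omitted).
pub trait Distribution<T> {
    fn sample(&self) -> T;
}

pub mod multi {
    /// Option 3: `multi::MultiDistribution` (redundant name, but unambiguous).
    pub trait MultiDistribution<T> {
        fn sample_len(&self) -> usize;
    }
}

// With option 2 (`multi::Distribution`), this import would collide with the
// trait above and would need `use multi::Distribution as SomeAlias;` or full paths.
use multi::MultiDistribution;

/// Hypothetical type implementing both traits side by side.
pub struct Constant3;

impl Distribution<f64> for Constant3 {
    fn sample(&self) -> f64 {
        1.0
    }
}

impl MultiDistribution<f64> for Constant3 {
    fn sample_len(&self) -> usize {
        3
    }
}
```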

benjamin-lieser · 2025-03-02T15:33:15Z

I was talking about the naming of sample, because it's the same as in Distribution and makes the method ambiguous when both traits are in scope.

I guess I also prefer the last of the 3 options for the name of the trait.
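To make the ambiguity concrete, here is a sketch with simplified local stand-ins for both traits (RNG arguments omitted; `Both` is a hypothetical type). When one type implements both traits and both are in scope, a plain method call no longer compiles and the caller has to name the trait.

```rust
/// Simplified stand-ins: both traits expose a method called `sample`
/// (RNG arguments omitted so the sketch is self-contained).
pub trait Distribution<T> {
    fn sample(&self) -> T;
}

pub trait MultiDistribution<T> {
    fn sample(&self) -> Vec<T>;
}

/// Hypothetical type implementing both traits.
pub struct Both;

impl Distribution<f64> for Both {
    fn sample(&self) -> f64 {
        0.5
    }
}

impl MultiDistribution<f64> for Both {
    fn sample(&self) -> Vec<f64> {
        vec![0.25, 0.75]
    }
}

pub fn demo() -> (f64, Vec<f64>) {
    let d = Both;
    // `d.sample()` is rejected (error E0034: multiple applicable items in
    // scope), so each call has to name its trait explicitly.
    let scalar = <Both as Distribution<f64>>::sample(&d);
    let vector = <Both as MultiDistribution<f64>>::sample(&d);
    (scalar, vector)
}
```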

benjamin-lieser · 2025-03-03T14:30:28Z

Also, should we do this in a v0.6 branch? I guess there will be breaking changes to Dirichlet in the end, and possibly some v0.5.* releases.

dhardy · 2025-03-03T15:20:00Z

> I was talking about the naming of sample, because it's the same as in Distribution and makes the method ambiguous when both traits are in scope.

This shouldn't be an issue unless a type implements both Distribution and MultiDistribution, but I doubt we'd want that.

Except possibly a symmetric distribution like Dirichlet which could be sampled in one or multiple dimensions? Or if we wanted to transparently support uncorrelated multi-dimensional sampling of 1D distributions like Normal? No, both these are likely bad ideas.

benjamin-lieser · 2025-03-03T18:15:13Z

>> I was talking about the naming of sample, because it's the same as in Distribution and makes the method ambiguous when both traits are in scope.

> This shouldn't be an issue unless a type implements both Distribution and MultiDistribution, but I doubt we'd want that.

> Except possibly a symmetric distribution like Dirichlet which could be sampled in one or multiple dimensions? Or if we wanted to transparently support uncorrelated multi-dimensional sampling of 1D distributions like Normal? No, both these are likely bad ideas.

I was thinking about having both: MultiDistribution to sample into a buffer, and Distribution returning a Vec. If someone allocates for each sample anyway, or does not care about allocations, the latter is more convenient.
But I can understand if this leads to more confusion than benefit.

dhardy · 2025-03-04T07:04:23Z

The point of using const generics was to avoid needing to allocate. If allocation is necessary anyway, having both sample_to_buf and sample(...) -> Vec<_> doesn't add much. Moreover, if we are going to support both styles of method, it should be done under the same trait in my opinion — we can implement sample automatically, provided we know the expected sample length.

Which leads to another point: we may want a sample_len or just len method.

So:

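use rand::Rng;

pub trait MultiDistribution<T> {
    // Dimension of one sample.
    fn sample_len(&self) -> usize;
    // Writes one sample into `buf`; `buf.len()` must equal `sample_len()`.
    fn sample_to_buf<R: Rng + ?Sized>(&self, rng: &mut R, buf: &mut [T]);
    // Convenience method built on the two above; allocates one Vec per call.
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> Vec<T>
    where
        T: Default,
    {
        let mut buf = Vec::new();
        buf.resize_with(self.sample_len(), T::default);
        self.sample_to_buf(rng, &mut buf);
        buf
    }
}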

That requires T: Default to support sample, which I think is reasonable.
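A self-contained sketch of how a trait along these lines could be implemented and used. Assumptions: `Rng` is a local stand-in for `rand::Rng` with a made-up `next_f64` method, the RNG argument is added to both sampling methods, and `IidVector` and `Counter` are hypothetical types for illustration only.

```rust
/// Stand-in for `rand::Rng` (assumption: `next_f64` is not a rand method).
pub trait Rng {
    fn next_f64(&mut self) -> f64;
}

/// The proposed trait, with an explicit RNG argument added.
pub trait MultiDistribution<T> {
    fn sample_len(&self) -> usize;
    fn sample_to_buf<R: Rng + ?Sized>(&self, rng: &mut R, buf: &mut [T]);

    /// Allocating convenience method, written once for all implementors.
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> Vec<T>
    where
        T: Default,
    {
        let mut buf = Vec::new();
        buf.resize_with(self.sample_len(), T::default);
        self.sample_to_buf(rng, &mut buf);
        buf
    }
}

/// Hypothetical distribution: `n` independent draws from the generator.
pub struct IidVector {
    pub n: usize,
}

impl MultiDistribution<f64> for IidVector {
    fn sample_len(&self) -> usize {
        self.n
    }
    fn sample_to_buf<R: Rng + ?Sized>(&self, rng: &mut R, buf: &mut [f64]) {
        assert_eq!(buf.len(), self.n, "buffer length must equal sample_len()");
        for x in buf.iter_mut() {
            *x = rng.next_f64();
        }
    }
}

/// Toy deterministic "generator" for the example: yields 0.1, 0.2, 0.3, ...
pub struct Counter(pub u32);

impl Rng for Counter {
    fn next_f64(&mut self) -> f64 {
        self.0 += 1;
        self.0 as f64 / 10.0
    }
}
```

Reusing one buffer across `sample_to_buf` calls avoids the per-sample allocation; `sample` is only the convenience path and allocates on every call.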

benjamin-lieser · 2025-03-04T09:14:46Z

We should decide whether we want to keep the possibility of multi-dimensional sampling without allocations. If we do not need it, we can ditch the const generics and have less code, and then your proposed trait would make a lot of sense.

I am still leaning toward keeping it, because I had a use case where Multinomial samples were extremely time-critical, and avoiding the allocations helps, especially in multithreaded code where malloc involves synchronization. But this might also be a niche use case. (I would only sample once per Multinomial.)

Edit: Your trait can also be implemented for a const-generic version, which still avoids all allocations. So I would go with this approach.
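A sketch of that point, assuming a stand-in `Rng` (not rand's API) and hypothetical `IidArray`/`Counter` types: a const-generic type implements the runtime-length trait, and the caller supplies a stack array, so no allocation occurs.

```rust
/// Stand-in for `rand::Rng` (assumption: `next_f64` is not a rand method).
pub trait Rng {
    fn next_f64(&mut self) -> f64;
}

/// Runtime-length trait, as discussed above.
pub trait MultiDistribution<T> {
    fn sample_len(&self) -> usize;
    fn sample_to_buf<R: Rng + ?Sized>(&self, rng: &mut R, buf: &mut [T]);
}

/// Hypothetical const-generic distribution: `N` independent draws.
pub struct IidArray<const N: usize>;

impl<const N: usize> MultiDistribution<f64> for IidArray<N> {
    fn sample_len(&self) -> usize {
        N
    }
    fn sample_to_buf<R: Rng + ?Sized>(&self, rng: &mut R, buf: &mut [f64]) {
        assert_eq!(buf.len(), N, "buffer length must equal N");
        for x in buf.iter_mut() {
            *x = rng.next_f64();
        }
    }
}

/// Toy deterministic "generator": yields 0.1, 0.2, 0.3, ...
pub struct Counter(pub u32);

impl Rng for Counter {
    fn next_f64(&mut self) -> f64 {
        self.0 += 1;
        self.0 as f64 / 10.0
    }
}

/// The caller owns a stack array, so no heap allocation happens anywhere.
pub fn demo() -> [f64; 4] {
    let d = IidArray::<4>;
    let mut rng = Counter(0);
    let mut out = [0.0f64; 4];
    d.sample_to_buf(&mut rng, &mut out);
    out
}
```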

dhardy · 2025-03-05T08:07:59Z

I thought we did decide to drop the const-generics approach for rand_distr?

Your use-case sounds fairly specific. Maybe there are further optimisations available when sampling only once.

MortenLohne · 2025-03-05T15:31:26Z

Would it be possible to have a non-const generic Dirichlet that can still be used without allocating? Without discussing Distribution traits for now, imagine the following API:

impl<F: Float> Dirichlet<F> {
    pub fn new(alpha: Vec<F>) -> Result<Dirichlet<F>, Error>;
    pub fn sample<R: Rng>(&self, rng: &mut R, output: &mut Vec<F>); // Users can re-use this vector to avoid allocations
    pub fn into_vec(self) -> Vec<F>; // Recovers the memory passed in with `new()`
}

This still requires the alloc feature, but so does the current (const-generic) implementation, and as far as I can tell no one has presented such a use case yet.

For sample(), we could also generalize the output type to iter::Extend, if we want to allow collecting into other data structures.
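A sketch of the proposed shape with the output generalized to `Extend`. The `Scaled` type, its sampling rule, and the stand-in `Rng` are hypothetical placeholders (real Dirichlet sampling is not attempted); the point is the buffer reuse and handing the parameter vector back via `into_vec`.

```rust
/// Stand-in for `rand::Rng` (assumption: `next_f64` is not a rand method).
pub trait Rng {
    fn next_f64(&mut self) -> f64;
}

/// Hypothetical type following the shape of the proposed API. Each component
/// is `scale[i] * u` for a fresh draw `u`: a placeholder, not Dirichlet sampling.
pub struct Scaled {
    scale: Vec<f64>,
}

impl Scaled {
    /// Takes ownership of the parameter vector (like `new(alpha: Vec<F>)`).
    pub fn new(scale: Vec<f64>) -> Self {
        Scaled { scale }
    }

    /// Appends one sample to any collection implementing `Extend<f64>`.
    pub fn sample_into<R: Rng + ?Sized, E: Extend<f64>>(&self, rng: &mut R, out: &mut E) {
        out.extend(self.scale.iter().map(|s| s * rng.next_f64()));
    }

    /// Hands the parameter vector back (like the proposed `into_vec`).
    pub fn into_vec(self) -> Vec<f64> {
        self.scale
    }
}

/// Toy deterministic "generator": yields 0.1, 0.2, 0.3, ...
pub struct Counter(pub u32);

impl Rng for Counter {
    fn next_f64(&mut self) -> f64 {
        self.0 += 1;
        self.0 as f64 / 10.0
    }
}

pub fn demo() -> (Vec<f64>, Vec<f64>) {
    let d = Scaled::new(vec![1.0, 2.0]);
    let mut rng = Counter(0);
    let mut buf: Vec<f64> = Vec::with_capacity(2);
    d.sample_into(&mut rng, &mut buf);
    // Clearing keeps the allocation, so the next sample reuses the same memory.
    buf.clear();
    d.sample_into(&mut rng, &mut buf);
    (buf, d.into_vec())
}
```

Because `sample_into` only requires `Extend<f64>`, the same method can also fill a `VecDeque` or any other extendable collection.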

benjamin-lieser · 2025-03-05T16:26:40Z

> Would it be possible to have a non-const generic Dirichlet that can still be used without allocating?

In the case of Dirichlet, it needs an array of other distributions (Beta or Gamma), so this would not work. I would also say it's a bit too complex an API.

dhardy

Some comments on comment style below.

More significantly, the sample method now looks identical to Distribution::sample, hence your idea to use that trait likely makes more sense: we can automatically impl Distribution<Vec<T>> where T: Default.

This would also allow an explicit impl of Distribution<[T; N]> where appropriate (e.g. your mentioned const-generic Multinomial).

But, at this point, do we still want the MultiDistribution trait at all?
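A sketch of the "explicit impl" route: one const-generic type implementing both `Distribution<[f64; N]>` and `Distribution<Vec<f64>>`, with the caller choosing via the type annotation. `Distribution` and `Rng` are simplified local stand-ins mirroring the shape of rand's traits; `IidArray` and `Counter` are hypothetical.

```rust
/// Stand-in for `rand::Rng` (assumption: `next_f64` is not a rand method).
pub trait Rng {
    fn next_f64(&mut self) -> f64;
}

/// Stand-in mirroring the shape of rand's `Distribution::sample`.
pub trait Distribution<T> {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> T;
}

/// Hypothetical const-generic distribution: `N` independent draws.
pub struct IidArray<const N: usize>;

impl<const N: usize> Distribution<[f64; N]> for IidArray<N> {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> [f64; N] {
        // Fixed-size output on the stack: no heap allocation.
        let mut out = [0.0f64; N];
        for x in out.iter_mut() {
            *x = rng.next_f64();
        }
        out
    }
}

impl<const N: usize> Distribution<Vec<f64>> for IidArray<N> {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> Vec<f64> {
        // Reuse the array impl, then copy the values into a heap-allocated Vec.
        let arr = <Self as Distribution<[f64; N]>>::sample(self, rng);
        arr.to_vec()
    }
}

/// Toy deterministic "generator": yields 0.1, 0.2, 0.3, ...
pub struct Counter(pub u32);

impl Rng for Counter {
    fn next_f64(&mut self) -> f64 {
        self.0 += 1;
        self.0 as f64 / 10.0
    }
}
```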

dhardy · 2025-03-07T06:53:27Z

+/// This trait allows to sample from a multi-dimensional distribution without extra allocations.
+/// For convenience it also provides a `sample` method which returns the result as a `Vec`.
+pub trait MultiDistribution<T> {


Items have a short one-line description, with additional details in new paragraphs.
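One possible rewording of that doc comment following the convention (a one-line summary, then details in a separate paragraph). The wording is only a suggestion, not the merged text, and the trait is trimmed to a single method.

```rust
/// A distribution whose samples are vectors of `T`.
///
/// Samples can be written into a caller-provided buffer, so sampling needs no
/// extra allocations. A `Vec`-returning convenience method can be layered on top.
pub trait MultiDistribution<T> {
    /// Returns the length of one sample (the dimension of the distribution).
    fn sample_len(&self) -> usize;
}
```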

benjamin-lieser · 2025-08-04T14:16:09Z

My current favorite would be something like this:

pub trait MultiDistribution<T> {
    /// Returns the length of one sample (dimension of the distribution)
    fn sample_len(&self) -> usize;
    /// Samples from the distribution and writes the result to `buf`
    fn sample_to_buf<R: Rng + ?Sized>(&self, rng: &mut R, buf: &mut [T]);
}

impl<T, D> Distribution<Vec<T>> for D
where
    D: MultiDistribution<T>,
    T: Clone + Default,
{
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> Vec<T> {
        let len = self.sample_len();
        let mut buf = vec![Default::default(); len];
        self.sample_to_buf(rng, &mut buf);
        buf
    }
}

The distributions would implement MultiDistribution and we get the Distribution<Vec<T>> implementation for free.

Unfortunately, this does not work because of the orphan rules: Distribution is defined in rand, and the blanket impl is over an uncovered type parameter D. It also does not feel right to define MultiDistribution in rand, so I am not sure this can work.

We could just implement Distribution manually (we could then also distinguish between Vec<T> and [T; 3], for example).
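A sketch of the manual route using a declarative macro, so each distribution gets its own `impl Distribution<Vec<T>>` instead of a blanket impl. The macro name, `IidVector`, `Counter`, and the stand-in `Rng`/`Distribution` traits are assumptions (the commit "Multidistribution rework, Distribution macro" suggests a macro was used, but its definition is not shown here). With local stand-ins the blanket impl would in fact compile; the macro matters when `Distribution` comes from `rand`.

```rust
/// Stand-in for `rand::Rng` (assumption: `next_f64` is not a rand method).
pub trait Rng {
    fn next_f64(&mut self) -> f64;
}

/// Stand-in mirroring rand's `Distribution::sample`. In the real crate this trait
/// lives in `rand`, which is what makes the blanket impl an orphan (E0210).
pub trait Distribution<T> {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> T;
}

pub trait MultiDistribution<T> {
    fn sample_len(&self) -> usize;
    fn sample_to_buf<R: Rng + ?Sized>(&self, rng: &mut R, buf: &mut [T]);
}

/// Hypothetical macro: emits one concrete impl per type.
macro_rules! impl_vec_distribution {
    ($ty:ty, $elem:ty) => {
        impl Distribution<Vec<$elem>> for $ty {
            fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> Vec<$elem> {
                let mut buf = vec![<$elem>::default(); self.sample_len()];
                self.sample_to_buf(rng, &mut buf);
                buf
            }
        }
    };
}

/// Hypothetical distribution: `n` independent draws.
pub struct IidVector {
    pub n: usize,
}

impl MultiDistribution<f64> for IidVector {
    fn sample_len(&self) -> usize {
        self.n
    }
    fn sample_to_buf<R: Rng + ?Sized>(&self, rng: &mut R, buf: &mut [f64]) {
        for x in buf.iter_mut() {
            *x = rng.next_f64();
        }
    }
}

impl_vec_distribution!(IidVector, f64);

/// Toy deterministic "generator": yields 0.1, 0.2, 0.3, ...
pub struct Counter(pub u32);

impl Rng for Counter {
    fn next_f64(&mut self) -> f64 {
        self.0 += 1;
        self.0 as f64 / 10.0
    }
}
```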

dhardy

I think we can go with this trait design.

benjamin-lieser · 2025-09-11T12:07:18Z

I think we can keep the MultiDistribution design. The implementation of Distribution is pretty ugly, but this is not a problem for users and not too bad for us to handle.

Final question: should we keep the const-generic Dirichlet and add a non-const-generic one, or just replace it?

dhardy · 2025-09-11T13:07:11Z

> The implementation of Distribution is pretty ugly

This is fine IMO.

> Should we keep the const-generic Dirichlet and add a non-const-generic one, or just replace it?

That question shouldn't be directed here; can we adjust it in a new PR?

From what I remember, there didn't appear to be much use (or evidence for use) for the const-generic implementation.

benjamin-lieser · 2025-09-11T14:05:57Z

So in this PR, we keep the original const-generic Dirichlet and just implement MultiDistribution for it?
And then merge a modified #15 later?

dhardy

We can also just keep these changes to Dirichlet. The only mandatory change is the licence header.

Perhaps we should add an additional trait, ConstMultiDistribution: MultiDistribution, with an associated const SAMPLE_LEN: usize, and an additional macro implementing Distribution<[T; SAMPLE_LEN]> (array output). This might work better for your time-critical Multinomial distributions.

For another PR however.
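A sketch of the suggested ConstMultiDistribution idea, with everything hypothetical: stand-in `Rng`/`Distribution` traits, the macro name, and the `Iid3` type. Because stable Rust cannot use `Self::SAMPLE_LEN` as an array length in a generic impl, the macro takes the length as a literal and a `const` assertion checks it against `SAMPLE_LEN` at compile time.

```rust
/// Stand-in for `rand::Rng` (assumption: `next_f64` is not a rand method).
pub trait Rng {
    fn next_f64(&mut self) -> f64;
}

/// Stand-in mirroring the shape of rand's `Distribution::sample`.
pub trait Distribution<T> {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> T;
}

pub trait MultiDistribution<T> {
    fn sample_len(&self) -> usize;
    fn sample_to_buf<R: Rng + ?Sized>(&self, rng: &mut R, buf: &mut [T]);
}

/// Sketch of the suggested trait: the sample length is a compile-time constant.
pub trait ConstMultiDistribution<T>: MultiDistribution<T> {
    const SAMPLE_LEN: usize;
}

/// Hypothetical macro: array-output impl, with the literal length checked
/// against `SAMPLE_LEN` during compilation.
macro_rules! impl_array_distribution {
    ($ty:ty, $elem:ty, $n:literal) => {
        const _: () = assert!($n == <$ty as ConstMultiDistribution<$elem>>::SAMPLE_LEN);

        impl Distribution<[$elem; $n]> for $ty {
            fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> [$elem; $n] {
                let mut out: [$elem; $n] = std::array::from_fn(|_| <$elem>::default());
                self.sample_to_buf(rng, &mut out);
                out
            }
        }
    };
}

/// Hypothetical three-dimensional distribution: independent draws.
pub struct Iid3;

impl MultiDistribution<f64> for Iid3 {
    fn sample_len(&self) -> usize {
        <Self as ConstMultiDistribution<f64>>::SAMPLE_LEN
    }
    fn sample_to_buf<R: Rng + ?Sized>(&self, rng: &mut R, buf: &mut [f64]) {
        for x in buf.iter_mut() {
            *x = rng.next_f64();
        }
    }
}

impl ConstMultiDistribution<f64> for Iid3 {
    const SAMPLE_LEN: usize = 3;
}

// Compiles only because the literal 3 matches `SAMPLE_LEN`.
impl_array_distribution!(Iid3, f64, 3);

/// Toy deterministic "generator": yields 0.1, 0.2, 0.3, ...
pub struct Counter(pub u32);

impl Rng for Counter {
    fn next_f64(&mut self) -> f64 {
        self.0 += 1;
        self.0 as f64 / 10.0
    }
}
```

Invoking the macro with a mismatched length, e.g. `impl_array_distribution!(Iid3, f64, 4);`, fails to compile on the assertion.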


dhardy · 2025-09-11T17:19:35Z

Needs cargo fmt.

benjamin-lieser added 4 commits March 2, 2025 10:19

Multidistribution trait

4d7387d

documentation for MultiDistr

0048240

add to lib.rs and more doc

5c96617

better doc

6a5bd80

benjamin-lieser added 3 commits March 2, 2025 16:46

remove pub use of MultiDistribution

c9df79c

remove test impl of MultiDistribution again

e61da69

fmt

4a78dbc

benjamin-lieser added 2 commits March 3, 2025 22:25

move dirichlet

cb4c824

MultiDistribution in Dirichlet, still const generics

7572ce3

benjamin-lieser added 7 commits March 6, 2025 17:18

new MultiDistribution still const gen Dirichlet

b1e663d

fmt

b53aeda

doc

2435d3f

dirichlet usage

c4acb6d

fmt

860897c

doctest

3219cd6

typo

e74b4b4

dhardy reviewed Mar 7, 2025


vks reviewed Mar 12, 2025


Comment thread src/multi/dirichlet.rs

dhardy reviewed Aug 5, 2025


Comment thread src/multi/dirichlet.rs Outdated

Comment thread src/multi/dirichlet.rs Outdated

Comment thread src/multi/mod.rs Outdated

Multidistribution rework, Distribution macro

7d36b0a

dhardy mentioned this pull request Sep 11, 2025

Remove const-generic size parameter from Dirichlet distribution #15

Closed


fmt

5685143

benjamin-lieser marked this pull request as ready for review September 11, 2025 14:06

dhardy requested changes Sep 11, 2025


Comment thread src/multi/mod.rs

Comment thread src/multi/mod.rs Outdated

benjamin-lieser added 4 commits September 11, 2025 16:41

License header, MultiDistribution doc string and unused import in value_stability.rs

76de93d

changelog

13c8984

Merge branch 'master' into multi

34e36ad

Merge remote-tracking branch 'rand/master' into multi

7081044

dhardy approved these changes Sep 11, 2025


Merge branch 'master' into multi

d38020b

fmt

8f9a914

dhardy merged commit 63f0430 into rust-random:master Sep 12, 2025
14 checks passed

4 participants
