test: Accumulate CPU half-precision sums in float32 #3488
sofinvalery wants to merge 1 commit into ml-explore:main
Conversation
Force-pushed from 6aee023 to 10188e2
Made it inline. Feels cleaner.
zcbenz left a comment
Allocating a new array would have a heavy performance penalty; the correct way would be to refactor strided_reduce/contiguous_reduce to accumulate in float32 rather than in the output type.
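For context, here is a minimal NumPy sketch (not MLX code) of the accumulation strategy being suggested: widen each element to float32 for the running sum and cast back to the output dtype only at the end. The function names and the 10,000-element example are illustrative.

```python
import numpy as np

def sum_fp16_naive(x):
    """Accumulate in the output dtype (float16), as before."""
    acc = np.float16(0.0)
    for v in x:
        acc += v  # each partial sum is rounded back to float16
    return acc

def sum_fp16_via_fp32(x):
    """Accumulate in float32 while preserving the float16 output dtype."""
    acc = np.float32(0.0)
    for v in x:
        acc += np.float32(v)  # widen each element before adding
    return np.float16(acc)    # cast back only once, at the end

x = np.full(10000, 0.1, dtype=np.float16)
print(sum_fp16_naive(x))     # stalls once 0.1 falls below half an ulp of the running sum
print(sum_fp16_via_fp32(x))  # close to the true sum of ~999.76
```

The output dtype is unchanged; only the running sum is widened, so no extra array allocation is needed.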
Force-pushed from 10188e2 to f869884
Refactored.
I don't think we should merge this, tbh. It's kind of expected for floats that you get the accumulation precision of the float you are in. I am not against choosing higher-precision temporaries internally where possible, but given the current implementation it is actually faster to cast to float32 and then sum (I haven't tried it, but it almost certainly will be). For the test, by the way, the correct fix is to simply upcast the random numbers before computing their average. We want to test random, after all, not sum.
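A sketch of that test fix, assuming a typical check on the mean of mx.random output (the shape and tolerance here are illustrative, not the actual test):

```python
import mlx.core as mx

# Upcast before reducing so the test measures the RNG's mean,
# not the precision of a half-precision sum.
a = mx.random.uniform(shape=(10000,), dtype=mx.float16)
mean = mx.mean(a.astype(mx.float32))
assert abs(mean.item() - 0.5) < 0.02  # illustrative tolerance
```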
Force-pushed from f869884 to 96fed69
Updated the test only. Thanks for the feedback! @zcbenz @angeloskath
zcbenz left a comment
Thanks for fixing the test! Can you also update .github/actions/test-windows/action.yml to enable the test in CI?
Force-pushed from 96fed69 to 2ff1c70
Proposed changes
Accumulate CPU float16 and bfloat16 sum reductions in float32, while preserving the output dtype. This fixes precision loss in ops that use sum().
Fixes #3326.
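For reference, a minimal repro of the precision loss (values are illustrative; exact output depends on the build):

```python
import mlx.core as mx

x = mx.full((10000,), 0.1, dtype=mx.float16)
print(x.sum())                     # float16 result; drifted badly before this change
print(x.astype(mx.float32).sum())  # reference, ~999.76 (0.1 is inexact in float16)
```

With the change, x.sum() still returns float16; only the internal accumulation happens in float32.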
Checklist
Put an x in the boxes that apply.