Forecast sharpness is massively undervalued here. Insufficiently sharp forecasts are extremely hard to make decisions from. In fact, in many cases, trading some calibration for sharpness might be preferable.
As an extreme example, assume you can choose between two oracles for binary outcomes. One is perfectly calibrated but always gives probabilities between 40% and 60%. The other only ever predicts 1% or 99%, but its calibration is slightly off. Assuming "slightly" isn't too big, the latter oracle is probably more useful for many real decisions.
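To put rough numbers on the two-oracle comparison, here is a minimal sketch. Everything in it is an assumption chosen for illustration: the vague oracle says 40% or 60% and is exactly right, the sharp oracle says 1% or 99% when the true frequencies are 5% and 95%, and the helper names `expected_brier` and `summarize` are made up for this example.

```python
def expected_brier(forecast, true_prob):
    """Expected Brier score (lower is better) of one stated probability."""
    return true_prob * (1 - forecast) ** 2 + (1 - true_prob) * forecast ** 2


def summarize(oracle):
    """Average Brier score and hit rate over equally likely (forecast, true_prob) pairs."""
    brier = sum(expected_brier(f, q) for f, q in oracle) / len(oracle)
    # Hit rate: chance of being right when you bet on the side the forecast favors.
    hit_rate = sum(q if f > 0.5 else 1 - q for f, q in oracle) / len(oracle)
    return brier, hit_rate


# Each pair is (stated probability, actual frequency); all numbers are assumed.
calibrated_vague = [(0.4, 0.4), (0.6, 0.6)]
sharp_miscalibrated = [(0.01, 0.05), (0.99, 0.95)]

for name, oracle in [
    ("calibrated, vague", calibrated_vague),
    ("sharp, slightly off", sharp_miscalibrated),
]:
    brier, hit_rate = summarize(oracle)
    print(f"{name}: Brier={brier:.4f}, hit rate={hit_rate:.2f}")
```

Under these assumed numbers the sharp oracle gets a Brier score of about 0.049 against 0.24, and betting on its favored side is right 95% of the time against 60%. How far "slightly" can stretch before the ranking flips depends on the scoring rule and on the stakes of the decision.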