One of the reasons is that the objective is an expectation: reward (r) times probability (p). Because p is continuous, it smooths out r, even when r itself takes only discrete values.
For example, in the driving scheme below r is only +1 or -1, but p might be 0.899 (if you drive 1000 times, you fall off the road 101 times). The expected reward is then 0.899 × (+1) + 0.101 × (-1) = 0.798, a smoothed value that changes continuously as p changes.
The picture above comes from the Deep RL course taught by Sergey Levine.
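To make the smoothing concrete, here is a minimal Python sketch. It assumes a hypothetical one-parameter policy whose probability of staying on the road is sigmoid(theta). The names `p_on_road`, `expected_reward`, `sampled_reward` and `monte_carlo_reward` are illustrative and do not come from the course. Each individual drive still returns exactly +1 or -1, but the expectation 2p - 1 changes smoothly with theta, so it has a usable derivative.

```python
import math
import random

# Discrete reward from the driving example: +1 on the road, -1 if the car falls off.
R_ON_ROAD = 1
R_OFF_ROAD = -1


def expected_reward_from_p(p: float) -> float:
    """E[r] = p * (+1) + (1 - p) * (-1) = 2p - 1, which is linear (hence smooth) in p."""
    return p * R_ON_ROAD + (1.0 - p) * R_OFF_ROAD


def p_on_road(theta: float) -> float:
    """Hypothetical policy: P(stay on road) as a sigmoid of a single parameter theta."""
    return 1.0 / (1.0 + math.exp(-theta))


def expected_reward(theta: float) -> float:
    """Expected reward as a function of the policy parameter; differentiable in theta."""
    return expected_reward_from_p(p_on_road(theta))


def sampled_reward(p: float, rng: random.Random) -> int:
    """One simulated drive: the reward is always exactly +1 or -1, never in between."""
    return R_ON_ROAD if rng.random() < p else R_OFF_ROAD


def monte_carlo_reward(p: float, n: int = 1000, seed: int = 0) -> float:
    """Average reward over n simulated drives with success probability p."""
    rng = random.Random(seed)
    return sum(sampled_reward(p, rng) for _ in range(n)) / n


if __name__ == "__main__":
    print(expected_reward_from_p(0.899))  # about 0.798
    print(monte_carlo_reward(0.899))      # close to 0.798, but varies with the seed

    # Central finite difference: the expectation has a nonzero slope in theta,
    # even though every single sampled reward is a step value (+1 or -1).
    h = 1e-4
    slope = (expected_reward(h) - expected_reward(-h)) / (2 * h)
    print(slope)  # about 0.5, i.e. 2 * sigmoid'(0)
```

The design choice here is only to separate the discrete per-drive reward from its expectation: the former cannot be differentiated with respect to theta, while the latter can, which is what gradient-based RL methods optimize.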