In robot vision, you can obtain three-dimensional models of the objects in a scene and therefore know which objects you can manipulate. You can then point to them, estimate their properties, and create virtual models of them, which you can use in a virtual environment, and so on.
Those are some of the motivations for gathering information about the scene: you can do all of these things without ending up in a situation where you do not have the data you want.