the estimate of this function is learned from data; it is the solution of the learning problem
in the case of classification the function is called a "decision function"; the main settings are binary classification, multi-class classification, and regression (for real-valued functions)
the solution is chosen from a set of candidate functions (hypotheses) fixed before learning (e.g. decision trees)
"Generalisation" is the ability of a hypothesis to correctly classify data not in the training set
>>generalisation has to be optimised: the hypothesis should not become too complex (should not "overfit" the training data)
an optimal compromise between complexity and accuracy can be chosen by heuristics; we will instead adopt an approach that motivates the trade-off by reference to statistical bounds on the generalisation error, via the margin of the classifier
this avoids the danger of a heuristic: the statistical result provides a well-founded basis for the approach
Bayesian analysis: a prior distribution over the set of hypotheses describes the learner's belief in the likelihood of a particular hypothesis generating the data
all learning systems have to make some prior assumption of this type, often called the "learning bias"
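The margin mentioned above can be made concrete for a linear classifier: the geometric margin of a labelled point (x, y) with respect to (w, b) is y(w·x + b)/||w||, and the margin of the classifier on a dataset is the minimum over all points. A minimal sketch (the example weights and data are illustrative assumptions, not from the notes):

```python
import math

# Geometric margin of a labelled point (x, y) w.r.t. a linear classifier (w, b):
#     gamma = y * (w . x + b) / ||w||
# It is positive exactly when the point is correctly classified.
def geometric_margin(w, b, x, y):
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return y * (dot + b) / math.sqrt(sum(wi * wi for wi in w))

# The margin of the classifier on a dataset: the minimum over all points.
def classifier_margin(w, b, data):
    return min(geometric_margin(w, b, x, y) for x, y in data)

# Illustrative (assumed) toy data and weights:
data = [((2.0, 2.0), +1), ((0.0, 0.0), -1), ((3.0, 1.0), +1)]
w, b = (1.0, 1.0), -2.0
print(classifier_margin(w, b, data))  # positive, so all points are correctly classified
```

A statistical bound on the generalisation error can then be stated in terms of this quantity: the larger the margin, the tighter the bound.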
the choice of the set of hypotheses (the hypothesis space) is a key strategic decision
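The complexity/accuracy trade-off above can be illustrated with a toy sketch (pure Python, all data and hypotheses are assumptions made for illustration): a low-complexity threshold hypothesis versus a maximally complex one that memorises the training set, compared on noisy data.

```python
import random

random.seed(0)

# Toy data: x in [0, 1], true label 1 iff x > 0.5, with 20% label noise.
def make_data(n):
    data = []
    for _ in range(n):
        x = random.random()
        y = 1 if x > 0.5 else 0
        if random.random() < 0.2:  # flip the label with probability 0.2
            y = 1 - y
        data.append((x, y))
    return data

train, test = make_data(100), make_data(1000)

# Low-complexity hypothesis: a single fixed threshold.
def simple(x):
    return 1 if x > 0.5 else 0

# High-complexity hypothesis: memorise every training point;
# for unseen points, copy the label of the nearest memorised point.
memory = dict(train)
def memoriser(x):
    if x in memory:
        return memory[x]
    nearest = min(memory, key=lambda t: abs(t - x))
    return memory[nearest]

def accuracy(h, data):
    return sum(h(x) == y for x, y in data) / len(data)

# The memoriser is perfect on the training set but, having fitted the
# noise, generalises worse than the simple hypothesis on the test set.
print("simple    train/test:", accuracy(simple, train), accuracy(simple, test))
print("memoriser train/test:", accuracy(memoriser, train), accuracy(memoriser, test))
```

The memoriser's perfect training accuracy is exactly the "overfitting" warned about above: training performance alone says nothing about generalisation, which is why the choice of hypothesis space matters.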