https://stats.stackexchange.com/questions/517/unsupervised-supervised-and-semi-supervised-learning
http://www.jmlr.org/papers/volume3/guyon03a/guyon03a.pdf
Wrappers utilize the learning machine of interest as a black box to score subsets of variables according to their predictive power.

Filters select subsets of variables as a pre-processing step, independently of the chosen predictor.

Embedded methods perform variable selection in the process of training and are usually specific to given learning machines.
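As an illustration of the filter approach, here is a minimal sketch in pure Python (toy data, illustrative only) that ranks each variable by the absolute Pearson correlation with the target and keeps the top k, without ever consulting the downstream predictor:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def filter_select(X, y, k):
    """Score each feature by |correlation with y|, keep the top-k indices."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])[:k]

# Toy data: feature 0 tracks y closely, feature 1 is essentially noise.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [1.1, 2.0, 2.9, 4.2]
print(filter_select(X, y, 1))  # [0] -- feature 0 correlates most with y
```

A wrapper would instead train the actual model on each candidate subset and score it by held-out performance, which is more expensive but tailored to that model.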
In supervised learning one is furnished with inputs (x1, x2, ...) and outputs (y1, y2, ...) and is challenged with finding a function that approximates this behavior in a generalizable fashion. The output could be a class label (in classification) or a real number (in regression); these outputs are the "supervision" in supervised learning.
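A minimal sketch of the supervised setting, using a 1-nearest-neighbour classifier on a hand-made toy dataset (everything here is illustrative, not a production implementation):

```python
def nearest_neighbour(train_x, train_y, query):
    """Predict the label of `query` as the label of the closest training point."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: dist2(train_x[i], query))
    return train_y[best]

# Labeled inputs (x) paired with class labels (y) -- the "supervision".
train_x = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
train_y = ["A", "A", "B", "B"]
print(nearest_neighbour(train_x, train_y, (0.3, 0.1)))  # "A"
print(nearest_neighbour(train_x, train_y, (4.8, 5.1)))  # "B"
```

The generalization question is whether this learned mapping stays accurate on queries that were not in the training set.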
In the case of unsupervised learning, in the base case, you receive inputs x1, x2, ..., but neither target outputs nor rewards from the environment are provided. Depending on the problem (classification or prediction) and your background knowledge of the space sampled, you may use various methods: density estimation (estimating an underlying PDF for prediction), k-means clustering (grouping unlabeled real-valued data), k-modes clustering (grouping unlabeled categorical data), etc.
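For example, k-means clustering (one of the unsupervised methods mentioned above) can be sketched in a few lines. This is a bare-bones Lloyd's-algorithm version with fixed initial centroids, purely for illustration; real implementations add smarter initialization and convergence checks:

```python
def kmeans(points, centroids, iters=10):
    """Lloyd's algorithm: alternate point assignment and centroid update."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            clusters[j].append(p)
        # Move each centroid to the mean of its assigned points.
        centroids = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids, clusters

# Two obvious groups; no labels are ever given to the algorithm.
points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (7.8, 8.2)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # roughly [(1.1, 0.9), (7.9, 8.1)]
```

Note that the "structure" found (two clusters) comes entirely from the geometry of the inputs, since no y values exist to supervise the process.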
Semi-supervised learning involves function estimation on both labeled and unlabeled data. This approach is motivated by the fact that labeled data is often costly to generate, whereas unlabeled data generally is not. The challenge is mostly the technical question of how to treat data mixed in this fashion. See the Semi-Supervised Learning Literature Survey for more details on semi-supervised learning methods.
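One common semi-supervised strategy is self-training: fit a model on the labeled data, assign labels to the unlabeled points the model is most confident about, absorb them into the training set, and repeat. A minimal sketch, reusing the nearest-neighbour idea with distance as a stand-in confidence measure (an illustrative choice, not a recommendation):

```python
def self_train(labeled_x, labeled_y, unlabeled, threshold=1.0):
    """Self-training sketch: repeatedly absorb unlabeled points whose nearest
    labeled neighbour lies within `threshold`, copying that neighbour's label."""
    labeled_x, labeled_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled)
    changed = True
    while changed and pool:
        changed = False
        for p in pool[:]:
            # Distance (squared) and index of the nearest labeled point.
            d2, i = min((sum((a - b) ** 2 for a, b in zip(p, x)), i)
                        for i, x in enumerate(labeled_x))
            if d2 <= threshold ** 2:
                labeled_x.append(p)
                labeled_y.append(labeled_y[i])
                pool.remove(p)
                changed = True
    return labeled_x, labeled_y

# Two labeled seeds; three unlabeled points get pulled in one by one.
lx = [(0.0, 0.0), (6.0, 6.0)]
ly = ["A", "B"]
unl = [(0.5, 0.5), (1.0, 1.0), (5.5, 5.6)]
x2, y2 = self_train(lx, ly, unl)
print(y2)  # ['A', 'B', 'A', 'A', 'B']
```

The point (1.0, 1.0) is too far from either original seed, but once (0.5, 0.5) has been labeled "A" it becomes reachable, which is exactly how unlabeled data can propagate scarce label information.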