CodeCollection2018
11/4/2018 - 7:53 AM

sklearn Overview

Main contents of sklearn

The three core sklearn components: Estimator, Transformer, Pipeline
The six main sklearn modules: Classification, Regression, Clustering, Dimensionality reduction, Model selection, Preprocessing
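A minimal sketch of how the three components fit together, using toy data invented for illustration. A Transformer learns something in fit() and applies it in transform(), an Estimator learns a model in fit() and uses it in predict(), and a Pipeline chains them so one fit()/predict() call runs every step in order. The choice of StandardScaler and LogisticRegression is only an example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data invented for illustration: 2 features, binary labels
X = np.array([[0.0, 1.0], [1.0, 1.5], [4.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])

# Transformer: fit() learns per-column mean/std, transform() applies them
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Estimator: fit() learns model parameters, predict() uses them
clf = LogisticRegression().fit(X_scaled, y)

# Pipeline: the same two steps wrapped into a single estimator
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
pipe.fit(X, y)
pred = pipe.predict(X)
print(pred)
```

Because the pipeline refits the scaler inside fit(), it gives the same predictions as the manual two-step version, while keeping preprocessing tied to the model (which matters later for cross-validation).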
Data preprocessing:
  MinMaxScaler
  Normalizer
  StandardScaler
  LabelEncoder
  OneHotEncoder
  Binarizer
  MultiLabelBinarizer
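A short sketch of the preprocessing classes listed above on small invented inputs. Note that the scalers work per column (feature), while Normalizer works per row (sample); the label encoders take labels rather than a numeric feature matrix.

```python
import numpy as np
from sklearn.preprocessing import (Binarizer, LabelEncoder, MinMaxScaler,
                                   MultiLabelBinarizer, Normalizer,
                                   OneHotEncoder, StandardScaler)

X = np.array([[1.0, -1.0], [2.0, 0.0], [3.0, 1.0]])  # toy data

X_minmax = MinMaxScaler().fit_transform(X)          # each column rescaled to [0, 1]
X_std = StandardScaler().fit_transform(X)           # each column: zero mean, unit variance
X_norm = Normalizer().fit_transform(X)              # each ROW rescaled to unit L2 norm
X_bin = Binarizer(threshold=0.0).fit_transform(X)   # value > threshold -> 1, else 0

# Labels are encoded in sorted order: bird=0, cat=1, dog=2
labels = LabelEncoder().fit_transform(["cat", "dog", "cat", "bird"])

# One column per category (sorted: green, red); output is sparse, so densify
onehot = OneHotEncoder().fit_transform([["red"], ["green"], ["red"]]).toarray()

# Each sample may carry several labels; one column per label (a, b, c)
multi = MultiLabelBinarizer().fit_transform([{"a", "b"}, {"b"}, {"c"}])
print(X_minmax, labels, onehot, multi, sep="\n")
```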
Feature extraction:
  DictVectorizer
  FeatureHasher
  CountVectorizer
  TfidfVectorizer
  HashingVectorizer
Feature selection:
  VarianceThreshold
  SelectKBest
  SelectPercentile
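A sketch of the three selectors. VarianceThreshold needs no labels and drops low-variance columns; SelectKBest and SelectPercentile score each feature against the labels. The ANOVA F-test (f_classif) and the bundled iris dataset are choices made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import (SelectKBest, SelectPercentile,
                                       VarianceThreshold, f_classif)

# VarianceThreshold: drops features whose variance is <= threshold (default 0.0)
X_small = np.array([[0, 2, 0], [0, 1, 4], [0, 1, 1]])  # column 0 is constant
X_var = VarianceThreshold().fit_transform(X_small)

# Univariate selection on iris (4 features), scored with the ANOVA F-test
X, y = load_iris(return_X_y=True)
kbest = SelectKBest(score_func=f_classif, k=2).fit(X, y)                   # keep the top 2
X_k = kbest.transform(X)
pct = SelectPercentile(score_func=f_classif, percentile=50).fit(X, y)      # keep the top 50%
X_p = pct.transform(X)
print(kbest.get_support(indices=True))   # indices of the kept columns
```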
Clustering:
  DBSCAN
  KMeans
  MiniBatchKMeans
  SpectralClustering
  MeanShift
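A sketch contrasting two of the clustering algorithms on invented 2-D points: two tight groups plus one isolated point. KMeans needs the number of clusters up front and assigns every point; DBSCAN finds dense regions from eps and min_samples and labels isolated points as noise (-1). The parameter values are chosen for this toy data only.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # group A
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],   # group B
              [4.5, 4.5]])                          # isolated point

# KMeans: k fixed in advance; n_init restarts guard against bad initial centers
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# DBSCAN: a point is "core" if at least min_samples points (itself included)
# lie within eps; points not reachable from any core point get label -1
db = DBSCAN(eps=0.5, min_samples=2).fit(X)
print(km.labels_, db.labels_)
```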

Matrix decomposition:
  FactorAnalysis
  FastICA
  IncrementalPCA
  KernelPCA
  LatentDirichletAllocation
  NMF
  SparsePCA
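A sketch of two of the decompositions on random non-negative data (generated here for illustration). IncrementalPCA can be fed data in batches through partial_fit, which helps when the full matrix does not fit in memory; NMF factorizes a non-negative X into non-negative factors W and H with X ≈ W·H.

```python
import numpy as np
from sklearn.decomposition import NMF, IncrementalPCA

rng = np.random.RandomState(0)
X = rng.rand(100, 5)  # values in [0, 1): non-negative, as NMF requires

# IncrementalPCA: update the model one batch at a time
ipca = IncrementalPCA(n_components=2)
for batch in np.array_split(X, 4):   # 4 batches of 25 rows
    ipca.partial_fit(batch)
X_ipca = ipca.transform(X)

# NMF: X ~ W @ H, with W (samples x components) and H (components x features)
nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(X)
H = nmf.components_
print(ipca.explained_variance_ratio_, nmf.reconstruction_err_)
```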
  
Ensemble learning:
  BaggingClassifier
  BaggingRegressor
  AdaBoostClassifier
  AdaBoostRegressor
  GradientBoostingClassifier
  GradientBoostingRegressor
  ExtraTreesClassifier
  ExtraTreesRegressor
  RandomForestClassifier
  RandomForestRegressor
  VotingClassifier
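A sketch of an ensemble of ensembles: a VotingClassifier combining a random forest (bagging of trees), gradient boosting, and a logistic regression by majority ("hard") vote. The iris dataset, the 70/30 split, and the member models are choices for illustration, not a recommended configuration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

rf = RandomForestClassifier(n_estimators=100, random_state=0)  # averaged random trees
gb = GradientBoostingClassifier(random_state=0)                # trees fitted sequentially on errors
lr = LogisticRegression(max_iter=1000)

# Hard voting: each fitted member predicts a class, the majority wins
vote = VotingClassifier(estimators=[("rf", rf), ("gb", gb), ("lr", lr)], voting="hard")
vote.fit(X_train, y_train)
acc = vote.score(X_test, y_test)   # accuracy on the held-out 30%
print(acc)
```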
GLM family (linear_model):
  BayesianRidge
  Lasso
  LinearRegression
  LogisticRegression
  Perceptron
  Ridge
  SGDClassifier
  SGDRegressor
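A sketch comparing three of the linear regressors on noise-free synthetic data with a known coefficient vector (invented here). Ordinary least squares recovers it exactly, Ridge (L2 penalty) shrinks the coefficients, and Lasso (L1 penalty) tends to push irrelevant coefficients to zero. The alpha values are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
true_coef = np.array([2.0, 0.0, -1.0])   # feature 1 is irrelevant
y = X @ true_coef + 0.5                  # intercept 0.5, no noise

ols = LinearRegression().fit(X, y)   # minimizes squared error only
ridge = Ridge(alpha=1.0).fit(X, y)   # + alpha * ||w||_2^2
lasso = Lasso(alpha=0.1).fit(X, y)   # + alpha * ||w||_1 (objective scaled by 1/(2n))
print(ols.coef_, ridge.coef_, lasso.coef_, sep="\n")
```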
Manifold learning:
  Isomap
  MDS
  LocallyLinearEmbedding
  SpectralEmbedding
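A sketch of two manifold learners unrolling the synthetic 3-D "S-curve" (generated by sklearn's make_s_curve) into 2-D. Both build a nearest-neighbor graph; n_neighbors=10 is an arbitrary choice, and the dense eigensolver is used for LLE only to keep this small example deterministic.

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap, LocallyLinearEmbedding

X, t = make_s_curve(n_samples=500, random_state=0)   # X: (500, 3); t: position along the curve

# Isomap: preserves geodesic (along-the-graph) distances
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# LLE: preserves each point's reconstruction from its neighbors
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, eigen_solver="dense")
X_lle = lle.fit_transform(X)
print(X_iso.shape, X_lle.shape)
```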
Model evaluation (metrics):
  accuracy_score
  auc
  classification_report
  confusion_matrix
  f1_score
  hamming_loss
  log_loss
  precision_score
  recall_score
  roc_auc_score
  roc_curve
  mean_absolute_error
  mean_squared_error
  median_absolute_error
  r2_score
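A sketch of the most common classification and regression metrics on small invented label lists. The classification metrics take hard predictions, while roc_auc_score takes scores (e.g. predicted probabilities) because it measures ranking.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             mean_absolute_error, mean_squared_error,
                             precision_score, r2_score, recall_score,
                             roc_auc_score)

# Binary classification toy labels (invented): 1 = positive class
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]                # one positive missed (index 2)
y_score = [0.1, 0.9, 0.4, 0.2, 0.8, 0.7]   # predicted probability of class 1

acc = accuracy_score(y_true, y_pred)       # fraction of correct predictions
cm = confusion_matrix(y_true, y_pred)      # rows = true class, columns = predicted class
prec = precision_score(y_true, y_pred)     # TP / (TP + FP)
rec = recall_score(y_true, y_pred)         # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)              # harmonic mean of precision and recall
roc_auc = roc_auc_score(y_true, y_score)   # uses scores, not hard labels

# Regression toy values (invented)
r_true = [3.0, -0.5, 2.0, 7.0]
r_pred = [2.5, 0.0, 2.0, 8.0]
mae = mean_absolute_error(r_true, r_pred)  # mean |error|
mse = mean_squared_error(r_true, r_pred)   # mean error^2
r2 = r2_score(r_true, r_pred)              # 1 - SS_res / SS_tot
print(acc, cm, prec, rec, f1, roc_auc, mae, mse, r2, sep="\n")
```

Here TP = 3, FP = 0, FN = 1 and TN = 2, so precision is 1.0 and recall is 0.75; every positive has a higher score than every negative, so the ROC AUC is 1.0 even though one hard prediction is wrong.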
Gaussian mixture models:
  BayesianGaussianMixture
  GaussianMixture
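A sketch of GaussianMixture recovering two 1-D Gaussian components from synthetic data drawn around -5 and +5 (generated here). Unlike KMeans, it gives soft assignments: predict_proba returns each point's probability under each component.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-5.0, 1.0, size=(100, 1)),   # component 1
               rng.normal(5.0, 1.0, size=(100, 1))])   # component 2

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)   # fitted with EM
means = np.sort(gmm.means_.ravel())        # component order is arbitrary, so sort
probs = gmm.predict_proba([[0.0], [5.0]])  # soft membership for two query points
print(means, gmm.weights_, probs, sep="\n")
```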
Model selection (model_selection):
  GridSearchCV
  RandomizedSearchCV
  cross_validate
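A sketch of the three model_selection tools with an SVC on iris; the parameter grids are arbitrary examples. cross_validate scores one fixed configuration, GridSearchCV tries every combination, and RandomizedSearchCV tries only n_iter sampled combinations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# cross_validate: one fixed model, 5 folds; returns a dict of per-fold arrays
cv_res = cross_validate(SVC(C=1.0), X, y, cv=5)
print(cv_res["test_score"].mean())

# GridSearchCV: every combination in param_grid (3 x 2 = 6), each with 5-fold CV
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}
grid = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(grid.best_params_, grid.best_score_)

# RandomizedSearchCV: n_iter sampled combinations (list values are sampled uniformly)
param_dist = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
rand = RandomizedSearchCV(SVC(), param_dist, n_iter=5, cv=5, random_state=0).fit(X, y)
print(rand.best_params_)
```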
Multiclass and multilabel:
  OneVsRestClassifier
  OneVsOneClassifier
  OutputCodeClassifier
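A sketch of the three multiclass strategies wrapping a binary-capable base model (LogisticRegression, chosen here for illustration) on the 3-class iris data. The interesting difference is how many binary models each strategy trains.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import (OneVsOneClassifier, OneVsRestClassifier,
                                OutputCodeClassifier)

X, y = load_iris(return_X_y=True)   # 3 classes
base = LogisticRegression(max_iter=1000)  # cloned internally by each wrapper

ovr = OneVsRestClassifier(base).fit(X, y)   # one "class k vs rest" model per class: 3
ovo = OneVsOneClassifier(base).fit(X, y)    # one model per class pair: 3*(3-1)/2 = 3
# Error-correcting output codes: int(n_classes * code_size) = 6 binary models
occ = OutputCodeClassifier(base, code_size=2, random_state=0).fit(X, y)
print(len(ovr.estimators_), len(ovo.estimators_), len(occ.estimators_))
```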
Data preprocessing (preprocessing):
  
Naive Bayes
Nearest neighbors
Neural networks
Pipeline workflows
SVM
Decision Trees