scikit-learn is an important and effective Python tool for data mining and data analysis. The most important resource for learning it is the official website:
http://scikit-learn.org/stable/index.html
The front page lists its six main areas of functionality:
Classification
Regression
Clustering
Dimensionality reduction
Model selection
Preprocessing
Its API modules include:
sklearn.base: Base classes and utility functions
sklearn.cluster: Clustering
sklearn.cluster.bicluster: Biclustering
sklearn.covariance: Covariance Estimators
sklearn.model_selection: Model Selection
sklearn.datasets: Datasets
sklearn.decomposition: Matrix Decomposition
sklearn.dummy: Dummy estimators
sklearn.ensemble: Ensemble Methods
sklearn.exceptions: Exceptions and warnings
sklearn.feature_extraction: Feature Extraction
sklearn.feature_selection: Feature Selection
sklearn.gaussian_process: Gaussian Processes
sklearn.isotonic: Isotonic regression
sklearn.kernel_approximation: Kernel Approximation
sklearn.kernel_ridge: Kernel Ridge Regression
sklearn.discriminant_analysis: Discriminant Analysis
sklearn.linear_model: Generalized Linear Models
sklearn.manifold: Manifold Learning
sklearn.metrics: Metrics
sklearn.mixture: Gaussian Mixture Models
sklearn.multiclass: Multiclass and multilabel classification
sklearn.multioutput: Multioutput regression and classification
sklearn.naive_bayes: Naive Bayes
sklearn.neighbors: Nearest Neighbors
sklearn.neural_network: Neural network models
sklearn.calibration: Probability Calibration
sklearn.cross_decomposition: Cross decomposition
sklearn.pipeline: Pipeline
sklearn.preprocessing: Preprocessing and Normalization
sklearn.random_projection: Random projection
sklearn.semi_supervised: Semi-Supervised Learning
sklearn.svm: Support Vector Machines
sklearn.tree: Decision Tree
sklearn.utils: Utilities
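Although the estimators live in different modules, they are used in much the same way: create the object, call fit, then score or predict. A minimal sketch of that shared pattern follows; the iris dataset and the three classifiers are picked only for illustration and are not part of the original post.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset, used here only as a stand-in.
X, y = load_iris(return_X_y=True)

# Estimators from three different modules share the same fit/score interface.
models = {
    "linear_model.LogisticRegression": LogisticRegression(max_iter=1000),
    "svm.SVC": SVC(),
    "tree.DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
}

train_accuracy = {}
for name, model in models.items():
    model.fit(X, y)
    # Accuracy on the training data itself, so it is optimistic.
    train_accuracy[name] = model.score(X, y)
    print(name, round(train_accuracy[name], 3))
```

The scores here are measured on the training data, so they only show that the calls work, not how good each model is.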
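Anyone who regularly takes part in data competitions will find that Classification, Regression, Clustering, Dimensionality reduction, Model selection, and Preprocessing all come up often, classification and regression most of all. That said, if you run a classifier or regressor directly on the raw data, without preprocessing or any other preparation, the results are usually not very good.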
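To make the point about preprocessing concrete, here is a rough sketch that compares the same classifier with and without feature scaling. The dataset (load_wine), the classifier (KNeighborsClassifier), and the 5-fold setup are my own choices for illustration; how much preprocessing helps depends heavily on the data and the model.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Features in this dataset are on very different scales.
X, y = load_wine(return_X_y=True)

# Same classifier, once on raw features and once behind a scaler.
raw_model = KNeighborsClassifier(n_neighbors=5)
scaled_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Mean accuracy over 5 cross-validation folds.
raw_score = cross_val_score(raw_model, X, y, cv=5).mean()
scaled_score = cross_val_score(scaled_model, X, y, cv=5).mean()

print(f"without scaling: {raw_score:.3f}")
print(f"with scaling:    {scaled_score:.3f}")
```

Putting the scaler inside a pipeline means it is refit on each training fold, so the held-out fold does not leak into the preprocessing step.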
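There honestly isn't much to say about learning scikit-learn: the documentation is well written and comes with plenty of example code. In most cases you can take an example and tweak it.
For instance, to try Ridge from linear_model, go straight to: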
http://scikit-learn.org/stable/modules/linear_model.html
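Read the description of Ridge. If you want to try it, the example code is already written for you (the exact printed output varies with the scikit-learn version):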
>>> from sklearn import linear_model
>>> reg = linear_model.Ridge(alpha=.5)
>>> reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)
>>> reg.coef_
array([ 0.34545455, 0.34545455])
>>> reg.intercept_
0.13636...
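The usual workflow is roughly:
1. Take the example, run it, and make sure you understand it.
2. Modify it, plug in your own data, run it, and look at the results.
3. Tune the parameters to improve the results.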
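The three steps above could look something like the sketch below. The data is synthetic and only stands in for "your own data"; the alpha grid and the use of GridSearchCV for the tuning step are assumptions of mine, not something the original post prescribes.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Step 2: a synthetic stand-in for your own data (purely illustrative).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Steps 1 and 2: reuse the documented Ridge example on the new data.
reg = Ridge(alpha=0.5)
reg.fit(X_train, y_train)
baseline_r2 = reg.score(X_test, y_test)
print(f"baseline R^2: {baseline_r2:.3f}")

# Step 3: try several alpha values with 5-fold cross-validation.
search = GridSearchCV(
    Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}, cv=5
)
search.fit(X_train, y_train)
tuned_r2 = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"tuned R^2: {tuned_r2:.3f}")
```

Keeping a held-out test split makes it possible to check whether the tuned parameters actually help on data the search never saw.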
If you want to know what a function's parameters mean, just click that function's link in the documentation. For example, RidgeCV:
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html#sklearn.linear_model.RidgeCV
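As a quick illustration of what that page describes, RidgeCV takes a list of candidate alphas and chooses one by cross-validation during fit. This is a minimal sketch; the diabetes dataset and the alpha values are assumptions chosen only to keep the example short.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV

# Built-in regression dataset with 10 features, used as a placeholder.
X, y = load_diabetes(return_X_y=True)

# Candidate regularization strengths; RidgeCV picks one during fit.
alphas = [0.01, 0.1, 1.0, 10.0]
reg = RidgeCV(alphas=alphas)
reg.fit(X, y)

print("chosen alpha:", reg.alpha_)
print("number of coefficients:", reg.coef_.shape[0])
```

After fitting, the selected value is available as the alpha_ attribute, which is one of the things the parameter page documents.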
That's it. You are now a qualified "package-calling hero" (someone who gets things done by calling libraries).
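The code is in Python.
I can't make sense of any of the statements or parameters in it...
Yeah, let's learn it together.
This is what I play with every day now.
Thanks for sharing, very detailed.
Can you do data mining once you know how to use sklearn? Haha.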
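I forgot to mention installation. Python has a lot of packages and installing them one by one is a hassle, so just download and install Anaconda. It already includes common packages such as sklearn.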
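Once the installation is done, a quick way to confirm that sklearn is available is to import it and print its version. This is just a sanity check, and the version you see will depend on what your installation shipped with.

```python
import sklearn

# If this import succeeds, scikit-learn is installed in the active environment.
installed_version = sklearn.__version__
print("scikit-learn version:", installed_version)
```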