Sequential Forward Feature Selection (SFFS)

This example shows how to select the most useful features with a sequential forward feature selection, using a Random Forest classifier on the HistoricalMap dataset.
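
To make the idea concrete, here is a minimal sketch of a greedy forward selection written with scikit-learn only. It is not the museotoolbox implementation: the helper `forward_selection`, the synthetic data and the 3-fold cross-validation are assumptions chosen for illustration. At each step, the feature that gives the highest cross-validated score when added to the already selected ones is kept.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def forward_selection(estimator, X, y, max_features, cv=3):
    """Hypothetical helper: greedily add the feature with the best CV score."""
    selected, scores = [], []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # Score each remaining feature when added to the selected ones
        candidates = {
            f: cross_val_score(estimator, X[:, selected + [f]], y, cv=cv).mean()
            for f in remaining
        }
        best = max(candidates, key=candidates.get)
        selected.append(best)
        scores.append(candidates[best])
        remaining.remove(best)
    return selected, scores


# Synthetic data, used only for illustration
X_demo, y_demo = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=12
)
rf = RandomForestClassifier(n_estimators=10, random_state=12)
selected, scores = forward_selection(rf, X_demo, y_demo, max_features=3)
print("Selected features:", selected)
print("Scores:", scores)
```

The two returned lists play a role similar to the selected features and per-step scores printed later in this example: one feature index and one score per selection step.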

Import libraries

from museotoolbox.ai import SequentialFeatureSelection
from museotoolbox.cross_validation import LeavePSubGroupOut
from museotoolbox import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
import numpy as np

Load HistoricalMap dataset

X,y,g = datasets.load_historical_data(return_X_y_g=True,low_res=True)

Create CV

LSGO = LeavePSubGroupOut(valid_size=0.8,n_repeats=2,
                random_state=12,verbose=False)
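
The point of a group-based cross-validation is that samples sharing a group (presumably the `g` array loaded above, for example one polygon per group) never end up on both sides of a split. The sketch below illustrates that property with scikit-learn's `GroupShuffleSplit` as a stand-in. It is not how `LeavePSubGroupOut` is implemented: `GroupShuffleSplit` does not draw groups per class, and the toy arrays are assumptions.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy data: 12 samples belonging to 4 groups (for example 4 polygons)
X_toy = np.arange(24).reshape(12, 2)
y_toy = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
g_toy = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])

splitter = GroupShuffleSplit(n_splits=2, test_size=0.5, random_state=12)
folds = list(splitter.split(X_toy, y_toy, groups=g_toy))

for train_idx, valid_idx in folds:
    train_groups = set(g_toy[train_idx])
    valid_groups = set(g_toy[valid_idx])
    # A whole group goes either to training or to validation, never both
    print(sorted(train_groups), sorted(valid_groups), train_groups & valid_groups)
```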

Initialize Random-Forest and metrics

classifier = RandomForestClassifier(random_state=12,n_jobs=1)

f1 = metrics.make_scorer(metrics.f1_score)
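
A note on the scorer: scikit-learn's `f1_score` defaults to `average='binary'`, which only applies to two-class problems. If the labels contain more than two classes, an averaging strategy has to be given, and `make_scorer` forwards it as a keyword argument. The sketch below uses made-up labels to show both calls; which averaging suits this dataset is left open.

```python
import numpy as np
from sklearn import metrics

# Made-up labels with three classes, for illustration only
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 0])

# With more than two classes, choose an explicit averaging strategy
macro_f1 = metrics.f1_score(y_true, y_pred, average="macro")
weighted_f1 = metrics.f1_score(y_true, y_pred, average="weighted")
print(macro_f1, weighted_f1)

# make_scorer passes extra keyword arguments on to the metric
f1_macro_scorer = metrics.make_scorer(metrics.f1_score, average="macro")
```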

Set and fit the Sequential Feature Selection

SFFS = SequentialFeatureSelection(classifier=classifier,param_grid=dict(n_estimators=[10,20]),verbose=False)

SFFS.fit(X.astype(np.float64),y,g,cv=LSGO,max_features=3)

Show best features and score

print('Best features are : '+str(SFFS.best_features_))
print('F1 are : '+str(SFFS.best_scores_))

Out:

Best features are : [2, 0, 1]
F1 are : [0.8039189318356357, 0.8279091229700823, 0.8296741254915145]
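
One way to read these two lists, assuming that `best_features_` gives the order in which features were added and that the i-th score was obtained with the first i features, is sketched below with the values copied from the output above.

```python
import numpy as np

# Values copied from the output above
best_features = [2, 0, 1]
best_scores = [0.8039189318356357, 0.8279091229700823, 0.8296741254915145]

best_idx = int(np.argmax(best_scores))   # step with the highest score
chosen = best_features[: best_idx + 1]   # features added up to that step
gains = np.diff(best_scores)             # score gain of each added feature
print(best_idx, chosen, gains)
```

Under that reading, the second feature brings a much larger gain than the third, which can help when deciding how many features are worth keeping.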

Predict the classification of the raster using the best feature combination

SFFS.predict_best_combination(datasets.load_historical_data()[0],'/tmp/SFFS/best_classification.tif')

Out:

Predict with combination 2
Total number of blocks : 15
Detected 1 band for function predict_array.
No data is set to : 0.
Batch processing (15 blocks using 11Mo of ram)

Prediction... [########################################]100%

Plot example

from matplotlib import pyplot as plt
plt.plot(np.arange(1,len(SFFS.best_scores_)+1),SFFS.best_scores_)
plt.xlabel('Number of features')
plt.xticks(np.arange(1,len(SFFS.best_scores_)+1))
plt.ylabel('F1')
plt.show()

Total running time of the script: ( 0 minutes 2.709 seconds)
