Learn algorithm and customize your input raster without writing it on disk¶

This example shows how to customize your raster (ndvi, smooth signal…) in the learning process to avoi generate a new raster.

Import librairies¶

from museotoolbox.ai import SuperLearner
from museotoolbox.processing import extract_ROI
from museotoolbox import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

Load HistoricalMap dataset¶

raster,vector = datasets.load_historical_data(low_res=True)
field = 'Class'

Initialize Random-Forest and metrics¶

classifier = RandomForestClassifier(random_state=12,n_jobs=1)

kappa = metrics.make_scorer(metrics.cohen_kappa_score)
f1_mean = metrics.make_scorer(metrics.f1_score,average='micro')
scoring = dict(kappa=kappa,f1_mean=f1_mean,accuracy='accuracy')

Start learning¶

sklearn will compute different metrics, but will keep best results from kappa (refit=’kappa’)

SL = SuperLearner(classifier=classifier,param_grid=dict(n_estimators=[10]),n_jobs=1,verbose=1)

Create or use custom function

def reduceBands(X,bandToKeep=[0,2]):
    # this function get the first and the last band
    X=X[:,bandToKeep].reshape(-1,len(bandToKeep))
    return X

# add this function to learnAndPredict class
SL.customize_array(reduceBands)

# if you learn from vector, refit according to the f1_mean
X,y = extract_ROI(raster,vector,field)
SL.fit(X,y,cv=2,scoring=scoring,refit='f1_mean')

Out:

Fitting 2 folds for each of 1 candidates, totalling 2 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.1s finished
best score : 0.9187145557655955
best n_estimators : 10

Read the model¶

print(SL.model)
print(SL.model.cv_results_)
print(SL.model.best_score_)

Out:

GridSearchCV(cv=<museotoolbox.cross_validation.RandomStratifiedKFold object at 0x7f7fdf7d3690>,
             estimator=RandomForestClassifier(n_jobs=1, random_state=12),
             n_jobs=1, param_grid={'n_estimators': [10]}, refit='f1_mean',
             scoring={'accuracy': 'accuracy',
                      'f1_mean': make_scorer(f1_score, average=micro),
                      'kappa': make_scorer(cohen_kappa_score)},
             verbose=1)
{'mean_fit_time': array([0.02137256]), 'std_fit_time': array([0.00026202]), 'mean_score_time': array([0.00601017]), 'std_score_time': array([5.94854355e-05]), 'param_n_estimators': masked_array(data=[10],
             mask=[False],
       fill_value='?',
            dtype=object), 'params': [{'n_estimators': 10}], 'split0_test_kappa': array([0.85692241]), 'split1_test_kappa': array([0.85752681]), 'mean_test_kappa': array([0.85722461]), 'std_test_kappa': array([0.0003022]), 'rank_test_kappa': array([1], dtype=int32), 'split0_test_f1_mean': array([0.91871456]), 'split1_test_f1_mean': array([0.91871456]), 'mean_test_f1_mean': array([0.91871456]), 'std_test_f1_mean': array([0.]), 'rank_test_f1_mean': array([1], dtype=int32), 'split0_test_accuracy': array([0.91871456]), 'split1_test_accuracy': array([0.91871456]), 'mean_test_accuracy': array([0.91871456]), 'std_test_accuracy': array([0.]), 'rank_test_accuracy': array([1], dtype=int32)}
0.9187145557655955

Get F1 for every class from best params¶

for stats in SL.get_stats_from_cv(confusion_matrix=False,F1=True):
    print(stats['F1'])

Out:

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s finished
[0.94966269 0.77289377 0.99649123 0.74418605 0.        ]
[0.94994786 0.77697842 0.99647887 0.73282443 0.        ]

Get each confusion matrix from folds¶

for stats in SL.get_stats_from_cv(confusion_matrix=True):
    print(stats['confusion_matrix'])

Out:

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.1s finished
[[915  30   0   1   1]
 [ 64 211   0  14   0]
 [  0   0 284   0   0]
 [  0  16   2  48   0]
 [  1   0   0   0   0]]
[[911  34   0   2   0]
 [ 59 216   0  14   0]
 [  0   0 283   1   0]
 [  0  17   1  48   0]
 [  1   0   0   0   0]]

Save each confusion matrix from folds¶

SL.save_cm_from_cv('/tmp/testMTB/',prefix='RS50_')

Out:

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.1s finished

Predict map¶

SL.predict_image(raster,'/tmp/classification.tif',
                  higher_confidence='/tmp/confidence.tif',
                  confidence_per_class='/tmp/confidencePerClass.tif')

Out:

Total number of blocks : 6
Detected 1 band for function predict_array.
No data is set to : 0.
Detected 5 bands for function predict_confidence_per_class.
No data is set to : -32768.
Detected 1 band for function predict_higher_confidence.
No data is set to : -32768.
Batch processing (6 blocks using 23Mo of ram)

Prediction... [##......................................]5%

Prediction... [####....................................]11%

Prediction... [######..................................]16%

Prediction... [########................................]22%

Prediction... [###########.............................]27%

Prediction... [#############...........................]33%

Prediction... [###############.........................]38%

Prediction... [#################.......................]44%

Prediction... [####################....................]50%

Prediction... [######################..................]55%

Prediction... [########################................]61%

Prediction... [##########################..............]66%

Prediction... [############################............]72%

Prediction... [###############################.........]77%

Prediction... [#################################.......]83%

Prediction... [###################################.....]88%

Prediction... [#####################################...]94%

Prediction... [#######################################.]99%

Prediction... [########################################]100%

Plot example

from matplotlib import pyplot as plt
from osgeo import gdal
src=gdal.Open('/tmp/classification.tif')
plt.imshow(src.GetRasterBand(1).ReadAsArray(),cmap=plt.get_cmap('tab20'))
plt.axis('off')
plt.show()

Total running time of the script: ( 0 minutes 1.092 seconds)

Gallery generated by Sphinx-Gallery