8  Thematic Accuracy Assessment

Module Overview

This module performs a comprehensive thematic accuracy assessment of the land cover map using independent ground reference data. The framework for this module is based on the publication Good Practices for Estimating Area and Assessing Accuracy of Land Change by Olofsson et al. (2014). This framework separates accuracy assessment into three major components: sampling design, response design, and analysis. Luma-GE currently implements only the sampling design and analysis components. Because the response design consists of protocols and guidelines for obtaining the ground reference data, this component is not yet implemented.

Input

Name of input         Input type                 Details
Validation map        User’s input               In shapefile format.
Classified LULC map   Input from Other Modules   Module 6

Output

  • Confusion matrix.
  • Thematic accuracy metrics: Overall Accuracy (OA), Kappa Coefficient, Producer’s Accuracy (PA), User’s Accuracy (UA).
  • Reference data sites (optional): for users who do not have reference data, the system generates reference data sites that they can label outside Luma-GE.
  • Spatial distribution of error.

Process

8.1 Checking Prerequisites from Previous Modules

This is a part of System Response 7.1: Verification data

8.1.1 Checking Prerequisites from Previous Modules

Luma User Journey

  1. The user is shown a verification of the required inputs stored in the system.

    Important: Error Handling Notification

    This module cannot be accessed if any required input from other modules is missing from the system.

Luma Geospatial Engine

  1. The system validates the availability of the required inputs.

8.2 Thematic Accuracy Assessment

8.2.1 Selecting The Workflow

Two workflows are available for Module 7. The first workflow is designed for users who do not have reference data. The second workflow is for users who already have reference data.

Luma User Journey

  1. If the user does not have reference samples, the user selects the “generate reference sample” workflow.

    1. The user specifies the desired standard error (margin of error) of the map. The valid range is 0.1% to 10%; smaller values result in a larger sample requirement.

    2. The user sets the minimum expected accuracy for each class. This setting is optional, with a default value of 85% for all classes.

    3. The user generates the samples and checks their allocation and spatial distribution on the map canvas.

    4. The user downloads the reference data and labels it using visual interpretation of higher-resolution imagery, a field survey, or a combination of both. This process is conducted outside Luma-GE.

  2. If the user already has reference samples, they choose the “compute accuracy” workflow.

    1. The user is prompted to upload their reference samples. Currently, Luma-GE only supports shapefile data.

    2. The user selects the column headers that correspond to the class ID and class name.

    3. The user runs the accuracy assessment and decides whether the classification meets their accuracy needs.

Luma Geospatial Engine

For each workflow, the geospatial engine performs the following operations.

8.2.2 Generate Reference Sample Workflow

This workflow consists of two steps: sample size calculation and sample allocation. Together, they determine the minimum number of samples required for each class and the locations of those samples.

Sample Size Calculation

The sample size for stratified random sampling is calculated using the formula provided by Cochran (1977, Eq. (5.25)). The samples are then allocated to each stratum (class) in proportion to Wi × Si, resulting in a balanced sample allocation. The key steps of the sample size calculation are as follows:

  1. Calculate stratum standard deviation: Si = √(Ui × (1 - Ui))

    Where:

    Si = standard deviation for stratum i

    Ui = expected accuracy for class i

  2. Calculate stratum weight: Wi = Ni / N

    Wi = weight of stratum i

    Ni = Number of pixels in class i

    N = Total number of pixels

  3. Calculate total sample size: n = (Σ(Wi × Si) / SE)²

    n = Total required samples

    SE = desired standard error

  4. Allocate samples per stratum: ni = n × (Wi × Si) / Σ(Wi × Si)

    ni = Number of samples for stratum i
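The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not the code of sample_size_calculator.calculate_strata_sample: the function name, the dictionary-based inputs, and the choice to round every count up are assumptions made for the example.

```python
import math


def stratified_sample_size(pixel_counts, expected_accuracy, target_se):
    """Return (n, allocation) following steps 1-4.

    pixel_counts      -- {class_id: N_i}, pixels mapped to each class
    expected_accuracy -- {class_id: U_i}, as fractions between 0 and 1
    target_se         -- desired standard error SE as a fraction (0.02 = 2%)
    """
    # Valid range stated in the user journey: 0.1% to 10%.
    if not 0.001 <= target_se <= 0.10:
        raise ValueError("target_se must be between 0.1% and 10%")

    total_pixels = sum(pixel_counts.values())  # N

    # Steps 1 and 2: W_i * S_i for every stratum.
    weighted_sd = {}
    for class_id, count in pixel_counts.items():
        u_i = expected_accuracy[class_id]
        s_i = math.sqrt(u_i * (1 - u_i))
        w_i = count / total_pixels
        weighted_sd[class_id] = w_i * s_i
    sum_ws = sum(weighted_sd.values())

    # Step 3: total sample size, rounded up (assumption).
    n = math.ceil((sum_ws / target_se) ** 2)

    # Step 4: per-stratum allocation, rounded up (assumption), so the
    # allocated total can slightly exceed n.
    allocation = {c: math.ceil(n * ws / sum_ws) for c, ws in weighted_sd.items()}
    return n, allocation


counts = {1: 600_000, 2: 300_000, 3: 100_000}
accuracy = {class_id: 0.85 for class_id in counts}  # default of 85% per class
n, allocation = stratified_sample_size(counts, accuracy, target_se=0.02)
print(n, allocation)
```

With these illustrative inputs, the total sample size is 319 and the per-class allocation is 192, 96, and 32 samples. Because each class is rounded up separately, the allocated total (320) is one more than n.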

Tip: Related Function

sample_size_calculator.get_pixel_count_per_class: calculates the pixel count for each class using an Earth Engine reducer

sample_size_calculator.validate_sample_size_inputs: validates the inputs prior to the main sample size calculation

sample_size_calculator.calculate_strata_sample: main function that calculates the sample size for each class

Sample Allocation

The required samples for each class are allocated within the area of interest using ee.Image.stratifiedSample(). The number of samples per class is taken from the sample size calculation result. The feature collection produced by this process contains only the class ID for each land cover class; class names will be added in a future update. The user can download the samples and label them outside Luma-GE. The current export function only supports shapefiles.
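A per-class allocation can be passed to ee.Image.stratifiedSample() through its classValues and classPoints arguments. The sketch below only assembles that call; the helper name, its parameters, and the band name used in the usage note are hypothetical and are not taken from sample_size_calculator.generate_stratified_samples.

```python
def generate_stratified_points(classified_image, allocation, class_band,
                               scale, region=None, seed=0):
    """Draw the allocated number of points per class from an ee.Image.

    classified_image -- an ee.Image holding the classified LULC map
    allocation       -- {class_id: n_i} from the sample size calculation
    class_band       -- name of the band that stores the class ID
    scale            -- pixel size in metres used for sampling
    """
    class_values = sorted(allocation)
    class_points = [allocation[c] for c in class_values]

    kwargs = dict(
        numPoints=0,               # default per-class count, overridden below
        classBand=class_band,
        scale=scale,
        seed=seed,
        classValues=class_values,  # class IDs to sample
        classPoints=class_points,  # number of points for each class ID
        geometries=True,           # keep point geometry for export
    )
    if region is not None:
        kwargs["region"] = region  # restrict sampling to the area of interest
    return classified_image.stratifiedSample(**kwargs)
```

In an initialized Earth Engine session, a call might look like generate_stratified_points(lulc_image, {1: 192, 2: 96, 3: 32}, "classification", scale=30, region=aoi), where lulc_image, aoi, and the band name are placeholders.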

Tip: Related Function

sample_size_calculator.generate_stratified_samples: generates the reference samples based on the sample size calculation

sample_size_calculator.export_samples_to_shp: exports the generated samples

8.2.3 Accuracy Computation Workflow

Luma User Journey

This is a part of User Journey 7.2 Upload data

  1. The user is prompted to upload the validation map in a .zip file. The user is reminded that the validation map should use the same class IDs as those used for the classification.

    Note: Success Notification

    The system shows a confirmation that the validation data has been validated.

    Important: Error Handling Notification

    The system shows an error message if the uploaded validation map fails the validation process.

  2. The system displays the uploaded validation map on a canvas map along with the tabular data.

Luma Geospatial Engine

  1. The system verifies that the .shp file of the validation map is included in the uploaded .zip file.

    Important: Error Handling Notification

    The system provides an error notification if the .shp file is not found inside the .zip file.

  2. The system conducts geometry fixes on the uploaded validation map.

    Tip: Related Function

    input_utils.shapefile_validator.validate_and_fix_geometry(): validates and fixes geometry issues.

    Important: Error Handling Notification

    An error message is displayed if the system fails to run validate_and_fix_geometry().

  3. The system converts the validation map from a GeoDataFrame into an Earth Engine FeatureCollection.

    Tip: Related Function

    input_utils.shapefile_validator.convert_roi_gdf(): converts a GeoDataFrame into an EE FeatureCollection. The supported multi-geometry types for this conversion are MultiPoint and MultiPolygon.

    Important: Error Handling Notification

    An error message is displayed if the system fails to run convert_roi_gdf().
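The first check in the list above, confirming that the uploaded archive contains a .shp file, can be illustrated with the Python standard library. This is a minimal sketch; the function name and the error message are hypothetical and do not come from input_utils.shapefile_validator.

```python
import zipfile


def find_shapefile_members(zip_source):
    """Return the names of all .shp entries inside a .zip archive.

    zip_source -- a path or a binary file-like object
    Raises ValueError when the archive contains no .shp file.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = archive.namelist()

    # Match on the extension only, ignoring case and sub-folders.
    shp_members = [name for name in names if name.lower().endswith(".shp")]
    if not shp_members:
        raise ValueError("No .shp file found inside the uploaded .zip file.")
    return shp_members
```

Raising an exception keeps the failure explicit, so the caller can turn it into the error notification described above.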

8.2.4 Starting the thematic accuracy assessment

Luma User Journey

This is a part of User Journey 7.3: Thematic accuracy assessment

  1. The user is prompted to fill in the parameters for the thematic accuracy assessment, including specifying the column that refers to the LULC ID class and the pixel size of the classified LULC map.

  2. The user can optionally set the confidence level used to compute the confidence interval of the thematic accuracy assessment.

  3. The user is prompted to start the thematic accuracy assessment process.

    Important: Error Handling Notification

    The system shows an error message if it fails to run the thematic accuracy assessment.

Luma Geospatial Engine

This is a part of System Response 7.3: Thematic accuracy assessment

  1. The system performs a thematic accuracy assessment by validating the inputs, sampling the classified LULC map at the reference points, and computing the accuracy metrics.

    Tip: Related Functions

    accuracy.thematic_accuracy.validate_assessment_inputs(): checks whether the classified LULC map, validation map, LULC ID class field, and pixel size parameter meet the required conditions before the assessment runs. This function is used in run_accuracy_assessment().

    accuracy.thematic_accuracy._extract_confusion_matrix_data(): extracts overall accuracy, kappa, per-class accuracies, and the confusion matrix array from an ee.ConfusionMatrix object. This function is used in run_accuracy_assessment().

    accuracy.thematic_accuracy._calculate_confidence_interval(): Computes the confidence interval for overall accuracy using a normal approximation based on the number of correct and total samples. This function is used in run_accuracy_assessment().

    accuracy.thematic_accuracy._calculate_f1_scores(): Calculates the F1 score for each class using the producer’s and user’s accuracy values. This function is used in run_accuracy_assessment().

    accuracy.thematic_accuracy.run_accuracy_assessment(): Executes the full thematic accuracy workflow, including validation, sampling (sampleRegions()), metric extraction (errorMatrix()), confidence interval calculation, and compilation of final results.

  2. Thematic Accuracy Metrics

    Several accuracy metrics are calculated to provide a comprehensive report of the map’s thematic quality.

    • Overall Accuracy

      Overall Accuracy (OA): Sum of the major diagonal (correctly classified pixels) divided by the total pixels in the entire confusion matrix

      OA = Σ(n_ii) / N

    • Kappa Coefficient

      The Kappa coefficient is a statistic derived from the error matrix. It shows how well the classification performed compared with randomly assigning class values. Kappa ranges from -1 to 1: a value of 0 indicates that the classification is no better than a random classification, a negative value indicates that it is worse than random, and a value closer to 1 indicates increasingly strong agreement beyond chance.

      κ = (Po - Pe) / (1 - Pe)

      Po = Σ(n_ii) / N

      Pe = Σ(n_i+ × n_+i) / N²

      Where:

      Po = Observed agreement (OA)

      Pe = Expected agreement by chance

    • Producer’s Accuracy (Recall/Sensitivity)

      A class-level accuracy metric that answers the question, “What fraction of the actual class i was correctly mapped?”

      PA_i = n_ii / n_i+

      Where:

      n_ii = correctly classified samples for class i

      n_i+ = total reference samples for class i (row sum)

    • User’s Accuracy (Precision)

      A class-level accuracy metric that answers the question, “What fraction of predicted class i is actually class i?”

      UA_i = n_ii / n_+i

      Where:

      n_ii = correctly classified samples for class i

      n_+i = total predicted samples for class i (column sum)
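The metrics above can be reproduced from a confusion matrix with a few lines of plain Python. The sketch below is illustrative only and is not the code in accuracy.thematic_accuracy. It assumes rows hold the reference classes and columns hold the mapped classes, uses the harmonic mean for the F1 score, and uses OA ± z × √(OA × (1 - OA) / N) as the normal-approximation confidence interval.

```python
import math


def accuracy_metrics(matrix, z=1.96):
    """Compute accuracy metrics from a square confusion matrix.

    matrix[i][j] is the number of samples whose reference class is i
    (row) and whose mapped class is j (column). z = 1.96 corresponds
    to a 95% confidence level.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)         # N
    correct = sum(matrix[i][i] for i in range(k))   # sum of n_ii
    row_sums = [sum(row) for row in matrix]         # n_i+
    col_sums = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # n_+i

    oa = correct / total                            # Po
    pe = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - pe) / (1 - pe)

    # Assumes every class has at least one reference and one mapped sample.
    pa = [matrix[i][i] / row_sums[i] for i in range(k)]
    ua = [matrix[i][i] / col_sums[i] for i in range(k)]
    f1 = [2 * p * u / (p + u) if (p + u) else 0.0 for p, u in zip(pa, ua)]

    # Normal-approximation interval for overall accuracy (assumed form).
    half_width = z * math.sqrt(oa * (1 - oa) / total)
    return {
        "overall_accuracy": oa,
        "kappa": kappa,
        "producers_accuracy": pa,
        "users_accuracy": ua,
        "f1": f1,
        "confidence_interval": (oa - half_width, oa + half_width),
    }


# Two-class example: 50 reference samples per class.
result = accuracy_metrics([[45, 5], [10, 40]])
print(result)
```

For this two-class example, 85 of 100 samples lie on the diagonal, so OA is 0.85. The expected chance agreement Pe is 0.5, which gives a Kappa of 0.7.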

8.2.5 Spatial Distribution of Error

Luma Geospatial Engine

This is part of System Response 7.3: Thematic accuracy assessment

This function flags the reference data to visualize the spatial distribution of error, improving error analysis by tagging each point as correctly or incorrectly classified. Currently, this function only works if the reference data unit is a point (single pixel). If the reference data consists of polygons, this function will not work.

  1. The system checks whether the reference data GeoDataFrame is available.

  2. The system adds the actual and predicted class to the GeoDataFrame; these are the key attributes for determining the correctness of the map.

  3. The system generates a pop-up HTML with comprehensive information on the status of the corresponding reference site.

Tip: Related Function

accuracy.validation_error_flag.classify_validation_points(): core function that flags each validation point as correctly or incorrectly classified

accuracy.validation_error_flag.generate_popup_html(): creates the pop-up HTML for visualizing the error flag
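The flagging logic amounts to comparing the actual and predicted class of each point and then formatting the outcome for display. The sketch below uses plain dictionaries instead of a GeoDataFrame to stay self-contained; the function names, field names, and pop-up layout are hypothetical and do not reproduce accuracy.validation_error_flag.

```python
from html import escape


def flag_points(points, actual_field="actual", predicted_field="predicted"):
    """Return copies of the points with an added boolean 'is_correct' key."""
    flagged = []
    for point in points:
        row = dict(point)  # copy so the input records are left untouched
        row["is_correct"] = row[actual_field] == row[predicted_field]
        flagged.append(row)
    return flagged


def popup_html(row, actual_field="actual", predicted_field="predicted"):
    """Build a small HTML snippet describing one flagged reference point."""
    status = "Correct" if row["is_correct"] else "Misclassified"
    return (
        f"<b>Status:</b> {status}<br>"
        f"<b>Reference class:</b> {escape(str(row[actual_field]))}<br>"
        f"<b>Mapped class:</b> {escape(str(row[predicted_field]))}"
    )


reference_points = [
    {"id": 1, "actual": 1, "predicted": 1},
    {"id": 2, "actual": 1, "predicted": 2},
]
for row in flag_points(reference_points):
    print(row["id"], popup_html(row))
```

Escaping the class values guards against markup in user-supplied class names breaking the pop-up.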

8.2.6 Reviewing the thematic accuracy result

Luma User Journey

This is a part of User Journey 7.4: Preview thematic accuracy result

  1. The system displays the overall accuracy metrics:

    • Overall Accuracy (OA): Percentage of correctly classified validation samples
    • Kappa Coefficient: Agreement beyond chance, accounting for class distribution
    • Confidence interval at 95%: The lower and upper bounds that describe the uncertainty range of the overall accuracy estimate at the 95% confidence level
    • Reference Data: The total number of validation samples used in the thematic accuracy assessment
  2. The system displays the class-level accuracy metrics:

    • Producer’s Accuracy: Measure of classification completeness (1 - omission error)
    • User’s Accuracy: Measure of classification reliability (1 - commission error)
    • F1 score: Harmonic mean of Producer’s and User’s Accuracy
    • The user can inspect where misclassification occurs using the spatial distribution of error map
  3. The system displays the confusion matrix using a heatmap visualization.

  4. The user is offered the option to download the overall accuracy metrics result summary or the class-level accuracy result in a .csv format.

  5. The user is provided with several options to improve the map quality and repeat the accuracy assessment. They can return to Module 6 to change the classification parameters, return to Module 5 to add predictors, return to Module 3 to add more training data, or return to Module 2 to change their classification scheme. The options are ordered from the easiest modification (Module 6) to the hardest (Module 2).

Luma Geospatial Engine

This sub-step does not involve any operations from Luma Geospatial Engine.