5  Sample Data Quality Analysis

Module Overview

This module assesses the spectral separability of LULC classes using sample data and near-cloud-free satellite imagery. Pairwise class separability is quantified with the Transformed Divergence (TD) method, computed from pixel-level spectral statistics. Only the spectral data from Module 1 is used.

This module is diagnostic only and does not produce inputs for subsequent modules. Its outputs support interpretation of class confusion and provide justification for feature enhancement in later stages of the workflow.

Input

Name of Input | Input Type | Details
Parameters for the separability analysis | User's input | Contains: (1) the spatial resolution for the separability analysis, which can be adjusted from that of the Module 1 imagery; (2) the maximum number of pixels analyzed per class
Sample dataset | Input from Other Modules | Output from Module 3
Near-cloud-free satellite imagery | Input from Other Modules | Output from Module 1

Output

  1. Separability analysis result.
  2. Spectral profile visualization and text description.

Process

Important: Error Handling Notification

This module cannot be accessed if any of the required inputs of type Input from Other Modules is missing from the system.

5.1 Specifying the Separability Analysis Parameter

5.1.1 Specifying the Separability Analysis Parameter

Luma User Journey

This is a part of User Journey 4.1: Determining separability analysis parameter

  1. The user is prompted to fill in the parameters for the separability analysis. The system provides the following default value for each parameter:
    • Spatial resolution used for the separability analysis: 30 meters
    • Maximum number of pixels included in the separability analysis: 5000 per class, to prevent the analysis from crashing.
  2. The system displays a message stating that the Transformed Divergence method will be used for the analysis. In the current development phase, the user cannot change the separability analysis method.
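
The defaults above can be captured in a small parameter object. The following is a minimal sketch only: the class name SeparabilityParams and its field names are hypothetical and are not part of the Luma codebase; only the default values (30 meters, 5000 pixels per class, TD) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeparabilityParams:
    """Hypothetical container for the user-adjustable parameters."""

    scale_m: float = 30.0             # spatial resolution in meters (default from the text)
    max_pixels_per_class: int = 5000  # cap on pixels analyzed per class (default from the text)
    method: str = "TD"                # Transformed Divergence; not user-selectable for now

    def __post_init__(self):
        # Reject values that cannot describe a valid sampling setup.
        if self.scale_m <= 0:
            raise ValueError("scale_m must be positive")
        if self.max_pixels_per_class < 1:
            raise ValueError("max_pixels_per_class must be at least 1")
        if self.method != "TD":
            raise ValueError("only Transformed Divergence ('TD') is supported")


defaults = SeparabilityParams()
custom = SeparabilityParams(scale_m=10, max_pixels_per_class=2000)
```

Validating at construction time means an invalid value is rejected before any imagery is sampled.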

Luma Geospatial Engine

This sub-step does not involve any operations within the Luma Geospatial Engine.

5.2 Conducting the Separability Analysis

5.2.1 Initialize the Separability Analysis

This is a part of System Response 4.1: Separability Analysis Workflow

Luma User Journey

  1. The user is prompted to run the separability analysis.

Luma Geospatial Engine

This sub-step does not involve any operations within the Luma Geospatial Engine.

5.2.2 Validating the sample data’s geometry

Luma User Journey

  1. The system shows a progress update indicating that the separability analysis is currently at Step 1 of 7.

Luma Geospatial Engine

  1. The system validates the geometry of the sample data from the Module 3 output stored in the system.

    Tip: Related Function

    input_utils.shapefile_validator.validate_and_fix_geometry(): validates the sample geometries and fixes geometry issues.

    Important: Error Handling Notification

    An error message is displayed if the system fails to run validate_and_fix_geometry().

5.2.3 Converting sample data into Earth Engine format

Luma User Journey

  1. The system shows a progress update indicating that the separability analysis is currently at Step 2 of 7.

Luma Geospatial Engine

  1. The system converts the sample data from a GeoDataFrame into an Earth Engine FeatureCollection.

    Tip: Related Function

    input_utils.shapefile_validator.convert_roi_gdf(): converts a GeoDataFrame into an EE FeatureCollection. The supported multi-geometry types for this conversion are MultiPoint and MultiPolygon.

    Important: Error Handling Notification

    An error message is displayed if the system fails to run convert_roi_gdf().

5.2.4 Starting the separability analysis computation

Luma User Journey

  1. The system shows a progress update indicating that the separability analysis is currently at Step 3 of 7.

Luma Geospatial Engine

  1. The system instantiates the separability analysis class (see Related Function) with its required input objects.

    Tip: Related Function

    sample_data_quality.sample_quality(): a class that handles the full separability analysis workflow.

5.2.5 Counting sample data statistics

Luma User Journey

  1. The system shows a progress update indicating that the separability analysis is currently at Step 4 of 7.

Luma Geospatial Engine

  1. The system retrieves basic statistics of the sample data.

    Tip: Related Function

    sample_data_quality.sample_quality.sample_stats(): retrieves the total sample count, unique classes, class-wise sample counts, and class balance of the sample data, and includes class names when provided.

    sample_data_quality.sample_quality.get_sample_stats_df(): converts the statistics into a dataframe.
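
The statistics listed above can be illustrated with a short sketch. The function summarize_samples, its arguments, and the example class names are hypothetical stand-ins and do not reproduce the actual sample_stats() implementation; class balance is assumed here to mean each class's share of the total sample count.

```python
from collections import Counter


def summarize_samples(labels, class_names=None):
    """Return total count, unique classes, per-class counts, and class balance."""
    counts = Counter(labels)
    total = sum(counts.values())
    stats = {
        "total_samples": total,
        "unique_classes": sorted(counts),
        "class_counts": dict(counts),
        # Assumption: balance is expressed as a proportion of all samples.
        "class_balance": {cls: n / total for cls, n in counts.items()},
    }
    if class_names:
        # Attach readable names only when the caller supplies them.
        stats["class_names"] = {cls: class_names.get(cls, str(cls)) for cls in counts}
    return stats


labels = [1, 1, 1, 2, 2, 3, 3, 3, 3, 3]
stats = summarize_samples(labels, class_names={1: "Forest", 2: "Water", 3: "Cropland"})
```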

5.2.6 Extracting spectral value

Luma User Journey

  1. The system shows a progress update indicating that the separability analysis is currently at Step 5 of 7. The system also highlights that this step will take some time.

Luma Geospatial Engine

  1. The system extracts pixel-level spectral values for each sample by sampling the composite image at the specified scale. To avoid memory overload, the system also caps the number of pixels extracted per LULC class.

    Tip: Related Function

    sample_data_quality.sample_quality.extract_spectral_values(): extracts and compiles spectral reflectance values for all sample data. The result is a dataframe containing the band values for each sample.

    sample_data_quality.sample_quality.limit_samples_per_class(): randomly down-samples the sample data of any class whose sample count exceeds the maximum pixel threshold.
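
The per-class cap can be sketched as follows. This is an illustration under assumptions, not the actual limit_samples_per_class() code: the function name limit_per_class, the row layout (one dict per pixel), the class_id key, and the fixed seed are all hypothetical.

```python
import random
from collections import defaultdict


def limit_per_class(rows, class_key, max_pixels, seed=0):
    """Randomly down-sample any class that has more than max_pixels rows."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_key]].append(row)

    limited = []
    for cls in sorted(by_class):
        group = by_class[cls]
        if len(group) > max_pixels:
            # Sample without replacement; smaller classes are kept whole.
            group = rng.sample(group, max_pixels)
        limited.extend(group)
    return limited


rows = [{"class_id": 1, "B4": 0.1 * i} for i in range(8)]
rows += [{"class_id": 2, "B4": 0.5} for _ in range(3)]
limited = limit_per_class(rows, "class_id", max_pixels=5)
```

Classes at or below the threshold pass through unchanged, so only over-represented classes lose pixels.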

5.2.7 Calculating the extracted spectral feature statistics

Luma User Journey

  1. The system shows a progress update indicating that the separability analysis is currently at Step 6 of 7.

Luma Geospatial Engine

  1. The system computes detailed pixel-level statistics for each LULC class (mean, standard deviation, minimum, maximum, median, and sample count) based on the extracted spectral values and formats these results into a structured summary table.

    Tip: Related Function

    sample_data_quality.sample_quality.sample_pixel_stats() : calculates per-class spectral statistics (mean, std, min, max, median, count) for all bands.

    sample_data_quality.sample_quality.get_sample_pixel_stats_df() : converts the computed statistics into a dataframe.
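
The per-class, per-band summary can be sketched with the standard library alone. The function pixel_stats and the row layout are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the document does not state which variant the module uses.

```python
import statistics
from collections import defaultdict


def pixel_stats(rows, class_key, bands):
    """Return one summary record per (class, band) pair."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for band in bands:
            grouped[row[class_key]][band].append(row[band])

    table = []
    for cls in sorted(grouped):
        for band in bands:
            values = grouped[cls][band]
            table.append({
                "class": cls,
                "band": band,
                "mean": statistics.fmean(values),
                # Sample standard deviation; undefined for one value, so report 0.0.
                "std": statistics.stdev(values) if len(values) > 1 else 0.0,
                "min": min(values),
                "max": max(values),
                "median": statistics.median(values),
                "count": len(values),
            })
    return table


rows = [
    {"class_id": 1, "B4": 1.0, "B8": 4.0},
    {"class_id": 1, "B4": 2.0, "B8": 5.0},
    {"class_id": 1, "B4": 3.0, "B8": 6.0},
    {"class_id": 2, "B4": 7.0, "B8": 9.0},
]
table = pixel_stats(rows, "class_id", bands=["B4", "B8"])
```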

5.2.8 Calculating separability index

Luma User Journey

  1. The system shows a progress update indicating that the separability analysis is currently at Step 7 of 7.

    Note: Success Notification

    The system shows a confirmation that the separability analysis has been successfully conducted.

Luma Geospatial Engine

  1. The system computes the class-to-class separability using Transformed Divergence (TD).

  2. The system generates a pairwise separability matrix and converts the results into a structured dataframe.

  3. The system counts the class pairs that fall into each separability category (good, weak, or poor) as a summary for the user.

    Tip: Related Function

    sample_data_quality.sample_quality.check_class_separability(): calculates the pairwise separability values between classes using Transformed Divergence. TD is currently hard-coded as the algorithm.

    sample_data_quality.sample_quality.get_separability_df(): converts the separability results into a dataframe with class names and interpretation levels.

    sample_data_quality.sample_quality.lowest_separability(): extracts the class pairs with the lowest separability scores.

    sample_data_quality.sample_quality.sum_separability(): produces a summary report counting how many class pairs fall under the good, weak, or poor separability categories.
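
The three steps above can be sketched with NumPy, using the standard definition of Transformed Divergence: the divergence D between two classes combines a covariance-difference term and a mean-difference term, and TD = 2000 * (1 - exp(-D / 8)). This is a sketch under assumptions, not the module's implementation. The function names, the 0 to 2000 scaling, and the category cut-offs (1900 and 1700, values often quoted in remote sensing texts) are assumptions, because the document does not state the thresholds the module applies. The sample values are synthetic.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def transformed_divergence(x_i, x_j, scale=2000.0):
    """TD between two classes given (n_pixels, n_bands) arrays of band values."""
    mu_i, mu_j = x_i.mean(axis=0), x_j.mean(axis=0)
    # atleast_2d keeps the single-band case a 1x1 matrix.
    cov_i = np.atleast_2d(np.cov(x_i, rowvar=False))
    cov_j = np.atleast_2d(np.cov(x_j, rowvar=False))
    inv_i, inv_j = np.linalg.inv(cov_i), np.linalg.inv(cov_j)
    diff = (mu_i - mu_j).reshape(-1, 1)
    # Divergence: covariance-difference term plus mean-difference term.
    divergence = 0.5 * np.trace((cov_i - cov_j) @ (inv_j - inv_i)) + 0.5 * np.trace(
        (inv_i + inv_j) @ diff @ diff.T
    )
    # Saturating transform maps the divergence into [0, scale].
    return float(scale * (1.0 - np.exp(-divergence / 8.0)))


def categorize(td, good=1900.0, weak=1700.0):
    """Hypothetical cut-offs; the module's real thresholds are not stated here."""
    if td >= good:
        return "good"
    return "weak" if td >= weak else "poor"


def pairwise_separability(samples):
    """samples maps class name -> (n_pixels, n_bands) array; one row per class pair."""
    rows = []
    for a, b in combinations(sorted(samples), 2):
        td = transformed_divergence(samples[a], samples[b])
        rows.append({"class_1": a, "class_2": b, "td": td, "level": categorize(td)})
    return rows


# Synthetic two-band samples: Forest and Shrub overlap, Water is far from both.
rng = np.random.default_rng(0)
samples = {
    "Forest": rng.normal([0.10, 0.30], 0.02, size=(200, 2)),
    "Shrub": rng.normal([0.11, 0.31], 0.02, size=(200, 2)),
    "Water": rng.normal([0.60, 0.80], 0.02, size=(200, 2)),
}
rows = pairwise_separability(samples)
summary = Counter(row["level"] for row in rows)   # pairs per category
lowest = min(rows, key=lambda row: row["td"])     # least separable pair
```

Because the exponential saturates, very distinct classes all approach the upper bound, so the lowest-scoring pairs carry the most diagnostic information.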

5.3 Visualization of Reference Data’s Spectral Profile

5.3.1 Visualization of Reference Data’s Spectral Profile

This is a part of User Journey 4.2: Analyzing separability index result and visualizing sample data

Luma User Journey

  1. The user is shown a summary of the number and proportion of sample data for each LULC class from the output of Section 5.2.5.

    Note: Success Notification

    The system displays a summary of the number and proportion of sample data for each LULC class in a tabular format.

  2. The user is shown a summary of the extracted sample features for each LULC class from the output of Section 5.2.7.

    Note: Success Notification

    The system displays a summary of the extracted sample features for each LULC class in a tabular format.

  3. The user is presented with an overall summary of the separability analysis based on the output of Section 5.2.8.

    Note: Success Notification

    The system highlights the separability category (good, weak, or poor) for each class and indicates which LULC classes require improvement. A dataframe with the pairwise separability details is also displayed.

  4. The user is provided with interactive visualizations to explore spectral distributions and clustering. Available visualization options include histograms, boxplots, 2D scatter plots, and a 3D scatter plot.

    Note: Success Notification

    The system displays the visualization options: histograms, boxplots, 3D scatter plot, and 2D scatter plots.

Luma Geospatial Engine

  1. The system provides four different visualization options of the spectral distributions and clustering.

    Tip: Related Functions

    sample_data_quality.spectral_plotter.plot_histogram(): generates overlaid interactive histograms of selected bands showing class-wise frequency distributions.

    sample_data_quality.spectral_plotter.plot_boxplot(): creates interactive boxplots summarizing per-class band statistics (median, quartiles, outliers) for selected bands.

    sample_data_quality.spectral_plotter.interactive_scatter_plot(): produces an interactive 2D scatter plot of two bands with class coloring and hover details for exploring class clustering.

    sample_data_quality.spectral_plotter.static_scatter_plot(): renders a static 2D scatter plot with optional confidence ellipses.

    sample_data_quality.spectral_plotter.add_elipse(): draws a confidence ellipse on the plot axes for a class's 2D band distribution (used by static_scatter_plot).

    sample_data_quality.spectral_plotter.scatter_plot_3d(): builds an interactive 3D scatter plot to explore three-band feature space and rotate/zoom clusters.
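
A confidence ellipse for a two-band distribution is commonly derived from the eigen-decomposition of the 2x2 covariance matrix. The sketch below shows that general technique only. It is not the add_elipse() implementation, and the function name confidence_ellipse_params and the default of two standard deviations are assumptions.

```python
import math

import numpy as np


def confidence_ellipse_params(x, y, n_std=2.0):
    """Return (center, width, height, angle_deg) of an n_std covariance ellipse."""
    cov = np.cov(x, y)
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    major = eigvecs[:, 1]                  # direction of largest variance
    angle = math.degrees(math.atan2(major[1], major[0]))
    width = 2.0 * n_std * math.sqrt(eigvals[1])   # full length along the major axis
    height = 2.0 * n_std * math.sqrt(eigvals[0])  # full length along the minor axis
    center = (float(np.mean(x)), float(np.mean(y)))
    return center, width, height, angle


# Synthetic band values that vary more along x than along y.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 0.0, 0.0])
y = np.array([0.0, 0.0, 0.0, 0.0, 0.0, -1.0, 1.0])
center, width, height, angle = confidence_ellipse_params(x, y)
```

The returned values match the center, width, height, and angle arguments of matplotlib.patches.Ellipse, which is one way such an ellipse could be drawn on a static scatter plot.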