5 Sample Data Quality Analysis
Module Overview
This module assesses the spectral separability of LULC classes by combining the sample data with the near-cloud-free satellite imagery. Pairwise class separability is quantified with the Transformed Divergence (TD) method, computed from pixel-level spectral statistics. Only the spectral data from Module 1 is used as the feature source.
This module is diagnostic only and does not produce inputs for subsequent modules. Its outputs support interpretation of class confusion and provide justification for feature enhancement in later stages of the workflow.
Input
| Name of Input | Input Type | Details |
|---|---|---|
| Parameters for the separability analysis | User’s input | Contains the spatial resolution and the maximum number of pixels per class (see Section 5.1.1) |
| Sample dataset | Input from Other Modules | Output from Module 3 |
| Near-cloud-free satellite imagery | Input from Other Modules | Output from Module 1 |
Output
- Separability analysis result.
- Spectral profile visualization and text description.
Process
This module cannot be accessed if any required input of type Input from Other Modules is missing from the system.
5.1 Specifying the Separability Analysis Parameter
5.1.1 Specifying the Separability Analysis Parameter
Luma User Journey
This is a part of User Journey 4.1: Determining separability analysis parameter
- The user is prompted to fill in the parameters for the separability analysis. The system provides the following default values:
  - Spatial resolution used for the separability analysis: 30 meters.
  - Maximum number of pixels included in the separability analysis: 5000 per class. This cap prevents the analysis from crashing.
- The system shows a message stating that the Transformed Divergence method will be used for the analysis. In this development phase, the user cannot change the separability analysis method.
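The parameters and defaults above can be sketched as a small container. This is an illustration only: the class name `SeparabilityParams` and its field names are hypothetical and are not taken from the Luma code base. Only the default values (30 m, 5000 pixels per class, TD as the fixed method) come from this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeparabilityParams:
    """Hypothetical container for the user-supplied analysis parameters."""

    scale_m: float = 30.0              # spatial resolution in meters (documented default)
    max_pixels_per_class: int = 5000   # per-class pixel cap (documented default)
    method: str = "TD"                 # Transformed Divergence, fixed in this phase

    def __post_init__(self):
        # Reject values that would make the analysis meaningless.
        if self.scale_m <= 0:
            raise ValueError("scale_m must be positive")
        if self.max_pixels_per_class < 1:
            raise ValueError("max_pixels_per_class must be at least 1")
        if self.method != "TD":
            raise ValueError("only Transformed Divergence ('TD') is supported")


defaults = SeparabilityParams()                                  # system defaults
custom = SeparabilityParams(scale_m=10, max_pixels_per_class=2000)  # user override
```

Freezing the dataclass keeps the parameters from changing between the seven analysis steps, which is one simple way to make a run reproducible.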
Luma Geospatial Engine
This sub-step does not involve any operations within the Luma Geospatial Engine.
5.2 Conducting the Separability Analysis
5.2.1 Initialize the Separability Analysis
This is a part of System Response 4.1: Separability Analysis Workflow
Luma User Journey
- The user is prompted to run the separability analysis.
Luma Geospatial Engine
This sub-step does not involve any operations within the Luma Geospatial Engine.
5.2.2 Validating the sample data’s geometry
Luma User Journey
- The system shows a progress update indicating that the separability analysis is currently at Step 1 of 7.
Luma Geospatial Engine
The system validates the geometry of the sample data produced by Module 3 and stored in the system.
Tip: Related Function
- input_utils.shapefile_validator.validate_and_fix_geometry(): validates and fixes geometry issues.
Important: Error Handling Notification
An error message is displayed if the system fails to run validate_and_fix_geometry().
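As a rough illustration of what validating and fixing a geometry can involve, the sketch below repairs a single polygon ring without any geometry library. It is not the implementation of validate_and_fix_geometry(), whose internals are not described in this document. The function name `fix_ring` is hypothetical, and the check is deliberately limited: it does not detect self-intersections.

```python
def fix_ring(ring):
    """Return a closed ring as a list of (x, y) tuples, or None if degenerate.

    Hypothetical helper: illustrates typical repairs only.
    """
    # Drop consecutive duplicate vertices.
    cleaned = [pt for i, pt in enumerate(ring) if i == 0 or pt != ring[i - 1]]
    if not cleaned:
        return None
    # Close the ring if the last vertex does not repeat the first.
    if cleaned[0] != cleaned[-1]:
        cleaned.append(cleaned[0])
    # A usable ring needs at least 4 positions (3 distinct + closing vertex).
    if len(cleaned) < 4:
        return None
    # Twice the signed area (shoelace formula); zero means a collapsed ring.
    area2 = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(cleaned, cleaned[1:]))
    return cleaned if area2 != 0 else None


repaired = fix_ring([(0, 0), (1, 0), (1, 1)])      # unclosed triangle gets closed
collapsed = fix_ring([(0, 0), (1, 1), (2, 2)])     # collinear points are rejected
```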
5.2.3 Converting sample data into Earth Engine format
Luma User Journey
- The system shows a progress update indicating that the separability analysis is currently at Step 2 of 7.
Luma Geospatial Engine
The system converts the sample data from gdf (GeoDataFrame) data into Earth Engine FeatureCollection format.
Tip: Related Function
- input_utils.shapefile_validator.convert_roi_gdf(): converts a GeoDataFrame into an EE FeatureCollection. The supported multi-geometry types for this conversion are MultiPoint and MultiPolygon.
Important: Error Handling Notification
An error message is displayed if the system fails to run convert_roi_gdf().
5.2.4 Starting the separability analysis computation
Luma User Journey
- The system shows a progress update indicating that the separability analysis is currently at Step 3 of 7.
Luma Geospatial Engine
The system initializes the class listed under Related Function with the input objects it requires for the separability analysis.
Tip: Related Function
- sample_data_quality.sample_quality(): a class that handles all the workflows for separability analysis.
5.2.5 Counting sample data statistics
Luma User Journey
- The system shows a progress update indicating that the separability analysis is currently at Step 4 of 7.
Luma Geospatial Engine
The system retrieves the basic statistics of the sample data.
Tip: Related Function
- sample_data_quality.sample_quality.sample_stats(): retrieves the sample data’s total sample count, unique classes, class-wise sample counts, and class balance, and includes class names when provided.
- sample_data_quality.sample_quality.get_sample_stats_df(): turns the statistics into a dataframe.
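The statistics listed above can be sketched in a few lines of standard-library Python. The function name `summarize_samples` and the dictionary keys are hypothetical, chosen for illustration; the actual return structure of sample_stats() is not specified in this document.

```python
from collections import Counter


def summarize_samples(labels, class_names=None):
    """Hypothetical sketch: basic statistics for a list of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    stats = {
        "total_samples": total,
        "classes": sorted(counts),
        "class_counts": dict(counts),
        # Class balance expressed as the share of all samples per class.
        "class_balance": {c: n / total for c, n in counts.items()},
    }
    if class_names:
        # Class names are attached only when the caller provides them.
        stats["class_names"] = {c: class_names.get(c, str(c)) for c in counts}
    return stats


labels = [1, 1, 1, 2, 2, 3]
stats = summarize_samples(labels, {1: "Forest", 2: "Cropland", 3: "Water"})
```

A strongly skewed `class_balance` is worth checking before reading the separability results, because classes with very few samples yield unstable spectral statistics.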
5.2.6 Extracting spectral value
Luma User Journey
- The system shows a progress update indicating that the separability analysis is currently at Step 5 of 7. The system also highlights that this step will take some time.
Luma Geospatial Engine
The system extracts the pixel-level spectral values for each sample by sampling the composite image at the specified scale. The system also caps the number of pixels extracted per LULC class to avoid memory overload.
Tip: Related Function
- sample_data_quality.sample_quality.extract_spectral_values(): extracts and compiles spectral reflectance values for all sample data. The result is a dataframe containing band values for each sample.
- sample_data_quality.sample_quality.limit_samples_per_class(): down-samples the sample data of a class by random sampling whenever the number of available samples exceeds the maximum pixel threshold.
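The per-class cap can be illustrated as follows. This is a minimal sketch, assuming each extracted pixel is a dictionary with a `class_id` key; the function name `limit_per_class`, the key name, and the fixed seed are assumptions made for the example and do not describe limit_samples_per_class() itself.

```python
import random
from collections import defaultdict


def limit_per_class(rows, max_per_class=5000, seed=42):
    """Hypothetical sketch: randomly down-sample classes that exceed the cap."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row["class_id"]].append(row)

    rng = random.Random(seed)  # fixed seed so repeated runs select the same pixels
    limited = []
    for class_id in sorted(by_class):
        group = by_class[class_id]
        if len(group) > max_per_class:
            group = rng.sample(group, max_per_class)  # sample without replacement
        limited.extend(group)
    return limited


rows = ([{"class_id": 1, "B4": i} for i in range(10)]
        + [{"class_id": 2, "B4": i} for i in range(3)])
limited = limit_per_class(rows, max_per_class=5)
```

Classes below the cap pass through unchanged, so the cap only affects memory use for heavily sampled classes.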
5.2.7 Calculating the extracted spectral feature statistics
Luma User Journey
- The system shows a progress update indicating that the separability analysis is currently at Step 6 of 7.
Luma Geospatial Engine
The system computes detailed pixel-level statistics for each LULC class (mean, standard deviation, minimum, maximum, median, and sample count) based on the extracted spectral values and formats these results into a structured summary table.
Tip: Related Function
- sample_data_quality.sample_quality.sample_pixel_stats(): calculates per-class spectral statistics (mean, std, min, max, median, count) for all bands.
- sample_data_quality.sample_quality.get_sample_pixel_stats_df(): converts the computed statistics into a dataframe.
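A minimal sketch of the per-class, per-band summary is shown below. The function name `pixel_stats`, the row layout, and the band names are hypothetical. The sketch uses the sample standard deviation (n - 1 denominator), which is an assumption; this document does not state which variant the system uses.

```python
import statistics


def pixel_stats(rows, bands, class_key="class_id"):
    """Hypothetical sketch: one summary record per (class, band) pair."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[class_key], []).append(row)

    table = []
    for class_id in sorted(by_class):
        for band in bands:
            values = [r[band] for r in by_class[class_id]]
            table.append({
                "class_id": class_id,
                "band": band,
                "mean": statistics.fmean(values),
                # stdev needs at least two values; report 0.0 for a single pixel.
                "std": statistics.stdev(values) if len(values) > 1 else 0.0,
                "min": min(values),
                "max": max(values),
                "median": statistics.median(values),
                "count": len(values),
            })
    return table


rows = [
    {"class_id": 1, "B4": 0.10, "B8": 0.40},
    {"class_id": 1, "B4": 0.20, "B8": 0.50},
    {"class_id": 1, "B4": 0.30, "B8": 0.60},
    {"class_id": 2, "B4": 0.05, "B8": 0.02},
]
table = pixel_stats(rows, ["B4", "B8"])
```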
5.2.8 Calculating separability index
Luma User Journey
- The system shows a progress update indicating that the separability analysis is currently at Step 7 of 7.
Note: Success Notification
The system shows a confirmation that the separability analysis has been conducted successfully.
Luma Geospatial Engine
The system computes the class-to-class separability using Transformed Divergence (TD).
The system generates a pairwise separability matrix and converts the results into a structured dataframe.
The system counts the class pairs that fall into each separability category (good, weak, and poor) as a summary for the user.
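For reference, Transformed Divergence is commonly defined as shown below. This is the standard textbook form, given here as background; the scaling constant and the category thresholds used by the system are not stated in this document. For two classes $i$ and $j$ with mean vectors $\mu_i, \mu_j$ and covariance matrices $C_i, C_j$ estimated from the extracted pixels, the divergence is

$$
D_{ij} = \tfrac{1}{2}\,\operatorname{tr}\!\left[(C_i - C_j)\left(C_j^{-1} - C_i^{-1}\right)\right]
       + \tfrac{1}{2}\,\operatorname{tr}\!\left[\left(C_i^{-1} + C_j^{-1}\right)(\mu_i - \mu_j)(\mu_i - \mu_j)^{\mathsf{T}}\right]
$$

and the transformed value is

$$
TD_{ij} = c\left(1 - e^{-D_{ij}/8}\right)
$$

where $c$ is a scaling constant, commonly 2000 (or 2 in some software). The exponential makes $TD_{ij}$ saturate at $c$, so very distant class pairs do not dominate the summary, while $TD_{ij} = 0$ indicates identical class statistics.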
Tip: Related Function
- sample_data_quality.sample_quality.check_class_separability(): calculates the pairwise separability values between classes using Transformed Divergence. The choice of TD as the algorithm is still hard-coded.
- sample_data_quality.sample_quality.get_separability_df(): converts the separability results into a dataframe with class names and interpretation levels.
- sample_data_quality.sample_quality.lowest_separability(): extracts the class pairs with the lowest separability scores.
- sample_data_quality.sample_quality.sum_separability(): produces a summary report counting how many class pairs fall under the good, weak, or poor separability categories.
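The computation and summary described in this step can be sketched without external dependencies. Everything below is illustrative: the function names are hypothetical, the 0 to 2000 scale is one common convention, and the 1900 and 1700 cut-offs are rule-of-thumb values often quoted for that scale, not thresholds confirmed by this document. A production implementation would more likely use a numerical library for the matrix algebra.

```python
import math
from itertools import combinations


def mean_vector(rows):
    """Column-wise mean of equal-length numeric rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (n - 1 denominator)."""
    mu, n = mean_vector(rows), len(rows)
    k = len(mu)
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting; raises on singular input."""
    k = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(k)]
           for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def transformed_divergence(rows_i, rows_j, scale=2000.0):
    """TD between two classes, each given as a list of per-pixel band vectors."""
    mu_i, mu_j = mean_vector(rows_i), mean_vector(rows_j)
    c_i, c_j = covariance(rows_i), covariance(rows_j)
    inv_i, inv_j = inverse(c_i), inverse(c_j)
    k = len(mu_i)
    d = [a - b for a, b in zip(mu_i, mu_j)]
    # tr[(Ci - Cj)(Cj^-1 - Ci^-1)], using tr(AB) = sum over a, b of A[a][b] * B[b][a]
    cov_term = sum((c_i[a][b] - c_j[a][b]) * (inv_j[b][a] - inv_i[b][a])
                   for a in range(k) for b in range(k))
    # tr[(Ci^-1 + Cj^-1) d d^T] equals the quadratic form d^T (Ci^-1 + Cj^-1) d
    mean_term = sum(d[a] * (inv_i[a][b] + inv_j[a][b]) * d[b]
                    for a in range(k) for b in range(k))
    divergence = 0.5 * cov_term + 0.5 * mean_term
    return scale * (1.0 - math.exp(-divergence / 8.0))


def categorize(td, good=1900.0, weak=1700.0):
    """Assumed thresholds on the 0-2000 scale; not taken from this document."""
    return "good" if td >= good else "weak" if td >= weak else "poor"


def summarize(samples):
    """Pairwise TD values, category counts, and the least separable pair."""
    pairs = {(a, b): transformed_divergence(samples[a], samples[b])
             for a, b in combinations(sorted(samples), 2)}
    counts = {"good": 0, "weak": 0, "poor": 0}
    for td in pairs.values():
        counts[categorize(td)] += 1
    return pairs, counts, min(pairs, key=pairs.get)


# Toy two-band reflectance values, invented for the example.
samples = {
    "Forest": [[0.03, 0.40], [0.05, 0.45], [0.04, 0.38], [0.06, 0.42]],
    "Shrub": [[0.04, 0.39], [0.06, 0.44], [0.05, 0.36], [0.07, 0.43]],
    "Water": [[0.02, 0.01], [0.03, 0.02], [0.025, 0.03], [0.035, 0.015]],
}
pairs, counts, lowest_pair = summarize(samples)
```

Each class needs more pixels than bands, and the pixels must not be perfectly correlated across bands, otherwise the covariance matrix cannot be inverted and the sketch raises an error.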
5.3 Visualization of Reference Data’s Spectral Profile
5.3.1 Visualization of Reference Data’s Spectral Profile
This is a part of User Journey 4.2: Analyzing separability index result and visualizing sample data
Luma User Journey
- The user is shown a summary of the number and proportion of sample data for each LULC class, taken from the output of Section 5.2.5.
Note: Success Notification
The system displays a summary of the number and proportion of sample data for each LULC class in a tabular format.
- The user is shown a summary of the extracted sample features for each LULC class, taken from the output of Section 5.2.7.
Note: Success Notification
The system displays a summary of the extracted sample features for each LULC class in a tabular format.
- The user is presented with an overall summary of the separability analysis, based on the output of Section 5.2.8.
Note: Success Notification
The system highlights the separability category (good, weak, or poor) for each class and indicates which LULC classes require improvement. A dataframe containing the details of the pairwise separability is also displayed.
- The user is provided with interactive visualizations to explore spectral distributions and clustering. Available visualization options include histograms, boxplots, and scatter plots.
Note: Success Notification
The system displays the visualization options: histograms, boxplots, 3D scatter plot, and 2D scatter plots.
Luma Geospatial Engine
The system provides four different visualization options of the spectral distributions and clustering.
Tip: Related Functions
- sample_data_quality.spectral_plotter.plot_histogram(): generates overlaid interactive histograms of selected bands showing class-wise frequency distributions.
- sample_data_quality.spectral_plotter.plot_boxplot(): creates interactive boxplots summarizing per-class band statistics (median, quartiles, outliers) for selected bands.
- sample_data_quality.spectral_plotter.interactive_scatter_plot(): produces an interactive 2D scatter plot of two bands with class coloring and hover details for exploring class clustering.
- sample_data_quality.spectral_plotter.static_scatter_plot(): renders a static 2D scatter plot with optional confidence ellipses.
- sample_data_quality.spectral_plotter.add_elipse(): draws a confidence ellipse for a class’ 2D band distribution (used by static_scatter_plot).
- sample_data_quality.spectral_plotter.scatter_plot_3d(): builds an interactive 3D scatter plot to explore a three-band feature space and rotate or zoom clusters.
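To make the confidence ellipse in the static scatter plot more concrete, the sketch below derives the ellipse parameters from a 2x2 covariance matrix of two bands. It assumes the class distribution is approximately bivariate normal. The function name `confidence_ellipse` is hypothetical, and whether add_elipse() follows this approach is not stated in this document.

```python
import math


def confidence_ellipse(cov, confidence=0.95):
    """Return (semi_major, semi_minor, angle_deg) for a symmetric 2x2 covariance.

    Hypothetical helper: the angle is measured counter-clockwise from the
    x-axis (first band) to the major axis.
    """
    (a, b), (_, c) = cov
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]].
    half_trace = (a + c) / 2.0
    root = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    lam_major, lam_minor = half_trace + root, half_trace - root
    # For a bivariate normal, the squared Mahalanobis radius is chi-square
    # distributed with 2 degrees of freedom, whose quantile is -2 ln(1 - p).
    scale = math.sqrt(-2.0 * math.log(1.0 - confidence))
    angle = 0.5 * math.atan2(2.0 * b, a - c)
    return (scale * math.sqrt(lam_major),
            scale * math.sqrt(max(lam_minor, 0.0)),  # guard against round-off
            math.degrees(angle))


# Variance 4 along the first band, 1 along the second, no correlation.
major, minor, angle = confidence_ellipse([[4.0, 0.0], [0.0, 1.0]])
```

Ellipses of two classes that overlap heavily in every band combination are a visual counterpart to a low pairwise separability score.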