4  Sample Data Generation

Module Overview

To be developed.

Input

Name of input | Input Type | Details
User’s sample data | User’s Input | -
Default sample data | System’s Input | RESTORE+ sample data
LULC classification class table | Input from Other Modules | Output from Module 2
AOI | Input from Other Modules | Output from Module 1
Near-cloud-free satellite imagery | Input from Other Modules | Output from Module 1

Output

  1. Statistical analyses of the sample data.

Process

4.1 Checking Prerequisites from Previous Modules

4.1.1 Checking Prerequisites from Previous Modules

Luma User Journey

This is a part of System Response 3.1: Checking prerequisites from previous modules

  1. If the user has not completed Module 1 and Module 2, the system displays a warning message prompting them to finish those modules first.

    Important: Error Handling Notification

    This module cannot be accessed while any required input listed under Input from Other Modules is missing.

  2. The user is prompted to choose how to add sample data: by uploading their own dataset, by using on-screen sampling, or by using the default sample data.
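The prerequisite check in step 1 can be sketched as follows. This is only an illustration: it assumes the outputs of Module 1 and Module 2 are held in a dictionary-like session store, and the key names (`aoi`, `imagery`, `class_table`) and the function `missing_prerequisites` are hypothetical rather than taken from the Luma code base.

```python
# Hypothetical mapping of each required input to the module that produces it.
REQUIRED_INPUTS = {
    "aoi": "Module 1",
    "imagery": "Module 1",
    "class_table": "Module 2",
}


def missing_prerequisites(session: dict) -> list[str]:
    """Return one warning message for every required input that is absent."""
    return [
        f"'{key}' is missing: complete {module} first."
        for key, module in REQUIRED_INPUTS.items()
        if session.get(key) is None
    ]


# Example: Module 1 is complete, Module 2 is not.
warnings = missing_prerequisites({"aoi": "polygon", "imagery": "composite"})
print(warnings)
```

An empty list means the module can be opened; any other result would be shown as the warning message.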

Luma Geospatial Engine

This sub-step does not involve any operations within the Luma Geospatial Engine.

4.2 Sample Data Generation

4.2.1 User Uploads Their Own Sample Data

Luma User Journey

This is a part of User Journey 3.1: Identifying sample data availability

  1. The user is prompted to upload a .zip file containing their sample data.

    Important: Error Handling Notification

    The system provides an error notification if the .shp file is not found inside the .zip file.

  2. The user is prompted to select the column header that specifies the LULC class. This is a part of User Journey 3.2a: Uploading sample data
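The check behind the error notification in step 1 could look like the sketch below, which uses only the Python standard library. The function names `find_shapefile_members` and `check_upload` are hypothetical, and the error messages are illustrative wording, not the system's actual text.

```python
import io
import zipfile


def find_shapefile_members(zip_bytes: bytes) -> list[str]:
    """Return the names of all .shp members inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [n for n in archive.namelist() if n.lower().endswith(".shp")]


def check_upload(zip_bytes: bytes) -> str:
    """Return the first .shp member name, or raise ValueError if none exists."""
    if not zipfile.is_zipfile(io.BytesIO(zip_bytes)):
        raise ValueError("The uploaded file is not a valid .zip archive.")
    members = find_shapefile_members(zip_bytes)
    if not members:
        raise ValueError("No .shp file was found inside the .zip file.")
    return members[0]
```

A shapefile normally also relies on its .shx and .dbf sidecar files; checking for those is left out of this sketch.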

Luma Geospatial Engine

  1. The system verifies the format of the .shp file inside the .zip file. This is a part of System Response 3.2.a: Verifying the uploaded sample data format

  2. Once the input format is validated, the system runs the verification described in Section 4.2.2.

  3. The system loads the sample data after verifying the upload:

    • Checks the initial sample count and filters the samples by geometry so that every sample lies inside the area of interest
    • Handles large datasets with stratified sampling so that the sample does not exceed 5000 features while the class distribution is preserved
    • Converts the shapefile into a GeoDataFrame
    • Updates the class field identifier in the training data dictionary
    Tip: Related Function

    sample_data.SyncTrainData.LoadTrainData(): loads the sample data into the system after conducting the checks above

Caution: Feature note

The original stratified sampling algorithm made a separate .getInfo() call for each class label when calculating class sizes and extracting features. On large datasets (5000+ features) this could crash the browser or exceed Earth Engine quotas. The improved implementation consolidates the server-side computations into a single aggregated request and keeps all operations on the server until the final .getInfo() call, which avoids excessive client-server communication and quota limits.
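The idea of capping a dataset at 5000 features while preserving class shares can be illustrated locally in plain Python. This sketch is not the Earth Engine implementation described above. The function `stratified_cap`, its parameters, and the rule of reserving one sample per class are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_cap(samples, class_key, max_features=5000, seed=0):
    """Down-sample `samples` to at most `max_features`, keeping class shares.

    One slot is reserved for every class so that rare classes survive; the
    remaining slots are shared in proportion to class size.
    """
    if len(samples) <= max_features:
        return list(samples)

    groups = defaultdict(list)
    for sample in samples:
        groups[sample[class_key]].append(sample)
    if len(groups) > max_features:
        raise ValueError("More classes than the feature cap allows.")

    rng = random.Random(seed)
    spare = max_features - len(groups)        # slots left after reserving one per class
    denominator = len(samples) - len(groups)  # samples left after reserving one per class
    selected = []
    for members in groups.values():
        # Integer arithmetic keeps the total at or below max_features.
        quota = 1 + (len(members) - 1) * spare // denominator
        selected.extend(rng.sample(members, quota))
    return selected
```

Because each quota is rounded down, the result may contain slightly fewer than `max_features` samples, but never more.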

4.2.2 Verifying Sample Data from User’s Input

Luma User Journey

  1. The user receives continuous feedback on the status of their sample data verification. Five steps are displayed in the status panel:

    1. Checking the field containing class names
    2. Validating the class entries
    3. Checking whether class IDs are sufficient
    4. Verifying that sample points fall within the AOI
    5. Generating a summary of the sample data status
  2. The user is shown a preview of the uploaded sample data on a map canvas and a preview table of the uploaded sample data.

  3. The user is then prompted to process the sample data.

    Note: Success Notification

    The system shows a confirmation if the sample data is successfully verified.

    Important: Error Handling Notification

    If the sample data verification fails, the system provides an error notification that states the step at which the validation failed.

  4. The user is provided with the summary display of the sample data.

    Note: Success Notification

    A notification highlights which classes have insufficient sample data, the number of samples that fall outside the AOI, and the total number of classes that have samples outside the AOI.

  5. The system provides an option to move to on-screen sampling to add more sample data. This is a part of User Journey 3.2.b: Choosing to add new sample data for insufficient class

  6. If the user is satisfied with the sample data, the user is prompted to finalize the sample data by clicking a button.

    Note: Success Notification

    The system shows a confirmation if the sample data is successfully saved into the system.
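The status panel behaviour, five steps run in order with an error message naming the step that failed, can be sketched as below. The step labels come from the list above. The function `run_verification`, the `OK`/`FAILED` prefixes, and the convention that a step signals failure by raising an exception are assumptions for illustration.

```python
from typing import Callable

# Step labels as listed in the status panel.
STEP_LABELS = [
    "Checking the field containing class names",
    "Validating the class entries",
    "Checking whether class IDs are sufficient",
    "Verifying that sample points fall within the AOI",
    "Generating a summary of the sample data status",
]


def run_verification(steps: list[Callable[[], None]]) -> list[str]:
    """Run each step in order and return one status line per step attempted.

    A step signals failure by raising an exception; later steps are skipped.
    """
    status = []
    for label, step in zip(STEP_LABELS, steps):
        try:
            step()
        except Exception as error:
            status.append(f"FAILED: {label} ({error})")
            break
        status.append(f"OK: {label}")
    return status


# Example with placeholder steps: the second step fails.
def ok():
    pass


def bad():
    raise ValueError("unknown class value")


result = run_verification([ok, bad, ok, ok, ok])
print("\n".join(result))
```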

Luma Geospatial Engine

This is a part of System Response 3.2.b: Verifying the uploaded sample data’s content and adding new sample data

  1. The system checks which column in the sample dataset should be used as the class label.

    Tip: Related Functions

    sample_data.SyncTrainData.SetClassField(): Set the class name field for sample data.

  2. The system validates class labels in the sample dataset by filtering out class values that do not exist in the classification scheme from Module 2, and updates the sample data and validation report accordingly.

    Tip: Related Functions

    sample_data.SyncTrainData.ValidClass(): Validate classes in sample data.

  3. The system checks the number of sample data for each class. The minimum required sample size per class is currently hard-coded at 20.

    Tip: Related Functions

    sample_data.SyncTrainData.CheckSufficiency(): Check if there are sufficient samples per class.

  4. The system filters the training samples so only those that spatially overlap the AOI remain, updates validation counts, and records warnings if AOI filtering fails.

    Tip: Related Functions

    sample_data.SyncTrainData.FilterTrainAoi(): Filter the sample data so that only samples overlapping the AOI remain.

  5. The system generates a summary table of the user’s uploaded sample data. The summary includes:

    • LULC class ID
    • LULC class name
    • Number of sample data of the LULC class
    • Percentage of the sample data count
    • Sufficiency category of the LULC class’ sample data
    Tip: Related Functions

    sample_data.SyncTrainData.TrainDataRaw(): Generate the summary table of the sample data per class.
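Steps 2, 3 and 5 above (class validation, the sufficiency check, and the summary table) can be sketched together in plain Python. This is an illustration under stated assumptions and not the `SyncTrainData` implementation: the function `summarize_samples`, the column names, and the `Sufficient`/`Insufficient` labels are hypothetical. Only the minimum of 20 samples per class comes from the text.

```python
from collections import Counter

MIN_SAMPLES_PER_CLASS = 20  # minimum stated in step 3


def summarize_samples(labels, scheme, min_samples=MIN_SAMPLES_PER_CLASS):
    """Drop labels not in `scheme` and build one summary row per class.

    `labels` is a list of class IDs, one per sample.
    `scheme` maps class ID -> class name (the Module 2 class table).
    Returns (rows, n_dropped).
    """
    valid = [label for label in labels if label in scheme]
    counts = Counter(valid)
    total = len(valid)
    rows = []
    for class_id, name in scheme.items():
        count = counts.get(class_id, 0)
        rows.append({
            "class_id": class_id,
            "class_name": name,
            "count": count,
            "percentage": round(100 * count / total, 1) if total else 0.0,
            "status": "Sufficient" if count >= min_samples else "Insufficient",
        })
    return rows, len(labels) - total


# Example: class 9 is not part of the scheme and is filtered out.
scheme = {1: "Forest", 2: "Cropland"}
labels = [1] * 30 + [2] * 10 + [9] * 5
rows, dropped = summarize_samples(labels, scheme)
```

Percentages are computed over the valid samples only, so they sum to 100 after invalid labels are removed.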

4.2.3 On-Screen Sampling

Luma User Journey

This is a part of System Response 3.3.a: On screen sampling facility

  1. The system shows an interactive map where the user can add sample data. If the user has already uploaded their own sample data, the verified sample data is shown on the interactive map.

    Note: Success Notification

    The system shows a confirmation that the uploaded sample data has been loaded onto the on-screen canvas, along with the number of samples uploaded.

  2. The system lists all LULC classes defined in Module 2. The user assigns each new sample point to the correct class.

  3. The system offers an option to save the result of the on-screen sampling process to the user’s local device.

  4. The user is prompted to finalize the on-screen sampling process by clicking a button.

    Note: Success Notification

    Once finalized, the system shows a summary of the number of sample data modified during the on-screen sampling process.

  5. The system then moves to Section 4.3.

Luma Geospatial Engine

This sub-step does not involve any operations within the Luma Geospatial Engine. The on-screen sampling feature was developed using folium for the Streamlit version of the application.
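Independently of the map library, the bookkeeping behind on-screen sampling (assigning a class to each new point and saving the result locally) can be sketched with a GeoJSON structure. The choice of GeoJSON as the saved format, the property names, and the functions `add_sample_point` and `export_geojson` are assumptions for this example; the source does not state the export format.

```python
import json


def add_sample_point(collection, lon, lat, class_id, class_name):
    """Append a labelled point to a GeoJSON FeatureCollection dictionary."""
    collection["features"].append({
        "type": "Feature",
        # GeoJSON stores coordinates in [longitude, latitude] order.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"class_id": class_id, "class_name": class_name},
    })
    return collection


def export_geojson(collection) -> str:
    """Serialize the collection so it can be offered as a download."""
    return json.dumps(collection, indent=2)


# Example: two points assigned to hypothetical classes.
samples = {"type": "FeatureCollection", "features": []}
add_sample_point(samples, 106.8, -6.2, 1, "Forest")
add_sample_point(samples, 106.9, -6.3, 2, "Cropland")
geojson_text = export_geojson(samples)
```

The resulting text could be handed to a download control in the user interface.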

4.2.4 Using Default Sample Data

Luma User Journey

  1. The user is automatically directed to use the default sample data if they chose an existing LULC scheme in Module 2.

  2. The system displays the area of the AOI and the number of sample data loaded on a map canvas.

    Important: Error Handling Notification

    The system shows an error notification if the RESTORE+ data fails to load. This can happen when the AOI lies outside the coverage of the RESTORE+ dataset or when the AOI is too large.

    Note: Success Notification

    The system displays a summary of the number of sample data loaded from the RESTORE+ data that match the AOI saved in the system, and the area of the AOI in km².

Luma Geospatial Engine

  1. The system retrieves existing sample data according to the chosen LULC scheme. In this development phase, the RESTORE+ sample data is retrieved from a Google Earth Engine asset.

4.3 Sample Data Preview

4.3.1 Sample Data Preview

Luma User Journey

This is a part of System Response 3.4: Verified sample data preview

  1. The user is provided with the summary display of the updated sample data.

    Note: Success Notification

    A notification highlights which classes have insufficient sample data, the number of samples that fall outside the AOI, and the total number of classes that have samples outside the AOI.

  2. The user is prompted to confirm that they will use the sample dataset.

    Note: Success Notification

    The system shows a confirmation that the sample data has been saved into the system.

  3. The user is given the option to proceed to Chapter 5.

    Important: Error Handling Notification

    The system disables the option to continue to the next module if it does not detect any stored sample data.

Luma Geospatial Engine

  1. The system runs Section 4.2.2 for the updated sample data.

  2. The system saves the sample data as an object to be loaded in the next modules.