4 Sample Data Generation
Module Overview
To be developed.
Input
| Name of input | Input Type | Details |
|---|---|---|
| User’s sample data | User’s Input | |
| Default sample data | System’s Input | RESTORE+ sample data |
| LULC classification class table | Input from Other Modules | Output from Module 2 |
| AOI | Input from Other Modules | Output from Module 1 |
| Near-cloud free satellite imagery | Input from Other Modules | Output from Module 1 |
Output
- Statistical analyses of the sample data.
Process
4.1 Checking Prerequisites from Previous Modules
4.1.1 Checking Prerequisites from Previous Modules
Luma User Journey
This is a part of System Response 3.1: Checking prerequisites from previous modules
If the user has not completed Module 1 and Module 2, the system displays a warning message prompting them to finish those modules first.
Important (Error Handling Notification): This module cannot be accessed if the system is missing any of the required inputs listed under Input from Other Modules.
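A minimal sketch of this prerequisite check, assuming the outputs of the earlier modules are held in a dictionary-like session state. The key names (`aoi`, `imagery`, `lulc_class_table`) and the function `missing_prerequisites` are hypothetical and are not taken from the Luma codebase.

```python
# Hypothetical mapping of each required input to the module that produces it.
REQUIRED_INPUTS = {
    "aoi": "Module 1",
    "imagery": "Module 1",
    "lulc_class_table": "Module 2",
}


def missing_prerequisites(state):
    """Return {input_name: source_module} for each required input that is absent."""
    return {
        name: module
        for name, module in REQUIRED_INPUTS.items()
        if state.get(name) is None
    }


# Example: only the AOI has been saved so far.
missing = missing_prerequisites({"aoi": {"type": "Polygon"}})
if missing:
    # The user interface would show a warning and block access to this module.
    print("Complete these first:", sorted(set(missing.values())))
```

Returning the missing inputs together with their source module lets the warning message name the module the user still has to finish.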
The user is prompted to choose how to add sample data: by uploading their own dataset, by using on-screen sampling, or by using the default sample data.
Luma Geospatial Engine
This sub-step does not involve any operations within the Luma Geospatial Engine.
4.2 Sample Data Generation
4.2.1 User Uploads Their Own Sample Data
Luma User Journey
This is a part of User Journey 3.1: Identifying sample data availability
The user is prompted to upload a .zip file containing their sample data.
Important (Error Handling Notification): The system provides an error notification if the .shp file is not found inside the .zip file.
The user is prompted to specify the column header that specifies the LULC class. This is a part of User Journey 3.2a: Uploading sample data
Luma Geospatial Engine
The system verifies the data format of the .shp file inside the .zip file. This is a part of System Response 3.2.a: Verifying the uploaded sample data format
Once the input data format is validated, the system runs Section 4.2.2.
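The archive check, in which an error is raised when no .shp file is present in the uploaded .zip file, can be sketched with the Python standard library. The function names `find_shapefiles` and `check_upload` and the message texts are hypothetical; the actual Luma verification may perform additional format checks.

```python
import io
import zipfile


def find_shapefiles(zip_bytes):
    """Return the names of all .shp members in a zip archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [name for name in archive.namelist() if name.lower().endswith(".shp")]


def check_upload(zip_bytes):
    """Return (ok, detail); ok is False when the archive has no .shp file."""
    try:
        shapefiles = find_shapefiles(zip_bytes)
    except zipfile.BadZipFile:
        return False, "The uploaded file is not a valid .zip archive."
    if not shapefiles:
        return False, "No .shp file was found inside the .zip file."
    # On success, detail is the path of the first shapefile in the archive.
    return True, shapefiles[0]
```

A complete shapefile also needs its .shx and .dbf companion files, so a fuller check could look for those members in the same way.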
The system loads the sample data after verifying the upload:
- Check the initial sample count and filter the samples by geometry so that every sample lies inside the area of interest (AOI)
- Handle large datasets with stratified sampling so that the sample set does not exceed 5000 features while preserving the class distribution
- Convert the shapefile into a GeoDataFrame
- Update the class field identifier in the training data dictionary
Tip (Related Function): sample_data.SyncTrainData.LoadTrainData(): Loads the sample data into the system after conducting several checks.
The original stratified sampling algorithm made sequential .getInfo() calls for each class label when calculating class sizes and extracting features. On large datasets (5000+ features), this could crash the browser or exceed Earth Engine quotas. The improved implementation consolidates the server-side computations into a single aggregated request and keeps all operations on the server until the final .getInfo() call, which avoids excessive client-server communication and quota limits.
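The proportional allocation behind the 5000-feature cap can be illustrated with a small client-side sketch. This is not the Earth Engine implementation described above: it works on plain Python lists, and the function name `stratified_cap` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_cap(samples, class_key, max_features=5000, seed=0):
    """Downsample to at most max_features while roughly keeping class proportions."""
    if len(samples) <= max_features:
        return list(samples)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[class_key]].append(sample)

    rng = random.Random(seed)
    fraction = max_features / len(samples)
    selected = []
    for members in by_class.values():
        # Rounding down keeps the total under the cap; rare classes keep one sample.
        quota = max(1, int(len(members) * fraction))
        selected.extend(rng.sample(members, quota))

    # Guard for the edge case where the one-sample minimum pushes past the cap.
    rng.shuffle(selected)
    return selected[:max_features]
```

With 8000 samples of one class and 2000 of another, each class is reduced by the same fraction (0.5), so the 80/20 split is preserved in the 5000 retained samples.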
4.2.2 Verifying Sample Data from User’s Input
Luma User Journey
The user receives continuous feedback on the status of their sample data verification. Five steps are displayed in the status panel:
- Checking the field containing class names
- Validating the class entries
- Checking whether class IDs are sufficient
- Verifying that sample points fall within the AOI
- Generating a summary of the sample data status
The user is shown a preview of the uploaded sample data on a map canvas and a preview table of the uploaded sample data.
The user is then prompted to process the sample data.
Note (Success Notification): The system shows a confirmation when the sample data is successfully verified.
Important (Error Handling Notification): The system provides an error notification if the sample data verification fails, along with a message indicating the step at which the validation failed.
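One way to report the step at which verification failed is to run the checks in order and stop at the first error. The sketch below assumes each check is a function that raises `ValueError` on failure; `run_validation`, `require_class_field`, `require_non_empty`, and the field name `lulc` are hypothetical and do not come from the Luma codebase.

```python
def run_validation(steps, data):
    """Run (name, check) pairs in order and stop at the first failing step."""
    completed = []
    for name, check in steps:
        try:
            data = check(data)
        except ValueError as exc:
            return {"ok": False, "failed_step": name,
                    "message": str(exc), "completed": completed}
        completed.append(name)
    return {"ok": True, "failed_step": None, "message": "", "completed": completed}


def require_class_field(rows):
    # Hypothetical check: every sample must carry the class field "lulc".
    if not all("lulc" in row for row in rows):
        raise ValueError("Class field 'lulc' is missing from some samples.")
    return rows


def require_non_empty(rows):
    # Hypothetical check: at least one sample must remain.
    if not rows:
        raise ValueError("No samples remain after filtering.")
    return rows


steps = [
    ("Checking the field containing class names", require_class_field),
    ("Verifying that sample points fall within the AOI", require_non_empty),
]
result = run_validation(steps, [{"id": 1}])
```

The returned dictionary carries both the failing step name and the list of completed steps, which is enough to drive a status panel and an error notification.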
The user is provided with the summary display of the sample data.
Note (Success Notification): A notification highlights which classes have insufficient sample data, the number of samples that fall outside the AOI, and the total number of classes that have samples outside the AOI.
The system provides an option to move to on-screen sampling to add more sample data. This is a part of User Journey 3.2.b: Choosing to add new sample data for insufficient class
If the user is satisfied with the sample data, the user is prompted to finalize the sample data by clicking a button.
Note (Success Notification): The system shows a confirmation when the sample data is successfully saved into the system.
Luma Geospatial Engine
This is a part of System Response 3.2.b: Verifying the uploaded sample data’s content and adding new sample data
The system checks which column in the sample dataset should be used as the class label.
Tip (Related Functions): sample_data.SyncTrainData.SetClassField(): Set the class name field for sample data.
The system validates class labels in the sample dataset by filtering out class values that do not exist in the classification scheme from Module 2, and updates the sample data and validation report accordingly.
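The class validation described above can be sketched as a filter against the set of class IDs in the scheme. The function name `validate_classes` and the report keys are hypothetical; the samples are represented as plain dictionaries rather than the GeoDataFrame the system uses.

```python
def validate_classes(samples, class_field, valid_class_ids):
    """Keep samples whose class exists in the scheme and report what was dropped."""
    valid = set(valid_class_ids)
    kept = [s for s in samples if s.get(class_field) in valid]
    # Class values found in the data that are not part of the scheme.
    invalid_values = sorted({s.get(class_field) for s in samples} - valid, key=str)
    report = {
        "total": len(samples),
        "kept": len(kept),
        "invalid_values": invalid_values,
    }
    return kept, report
```

For example, with a scheme containing classes 1 and 2, a sample labelled 9 is removed and 9 is listed under `invalid_values` in the report.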
Tip (Related Functions): sample_data.SyncTrainData.ValidClass(): Validate classes in sample data.
The system checks the number of samples for each class. The minimum required sample size per class is currently hard-coded at 20.
Tip (Related Functions): sample_data.SyncTrainData.CheckSufficiency(): Check if there are sufficient samples per class.
The system filters the training samples so that only those that spatially overlap the AOI remain, updates the validation counts, and records warnings if AOI filtering fails.
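The AOI filter can be illustrated with a pure-Python point-in-polygon test (ray casting). This is only a sketch of the idea: it assumes point samples and an AOI given as a single ring without holes, the function names are hypothetical, and the actual system presumably relies on a geospatial library for this step.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, closed implicitly."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_to_aoi(points, ring):
    """Return (points inside the AOI, number of points outside the AOI)."""
    inside = [p for p in points if point_in_polygon(p["x"], p["y"], ring)]
    return inside, len(points) - len(inside)
```

The second return value is the count of samples outside the AOI, which is the figure reported to the user in the summary notification.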
Tip (Related Functions): sample_data.SyncTrainData.FilterTrainAoi(): Filter the sample data to those located within the AOI.
The system generates a summary table of the user’s uploaded sample data. The summary includes:
- LULC class ID
- LULC class name
- Number of sample data of the LULC class
- Percentage of the sample data count
- Sufficiency category of the LULC class’ sample data
Tip (Related Functions): sample_data.SyncTrainData.TrainDataRaw(): Generate the summary table of the sample data.
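The summary fields listed above, together with the minimum of 20 samples per class, can be sketched as follows. The function name `summarize`, the row keys, and the labels "Sufficient" and "Insufficient" are hypothetical; only the threshold of 20 comes from this section.

```python
from collections import Counter

MIN_SAMPLES_PER_CLASS = 20  # threshold stated in this section


def summarize(samples, class_field, class_names):
    """Build one summary row per class in the scheme (class_names: id -> name)."""
    counts = Counter(s[class_field] for s in samples)
    total = sum(counts.values())
    rows = []
    for class_id, name in class_names.items():
        n = counts.get(class_id, 0)
        rows.append({
            "class_id": class_id,
            "class_name": name,
            "count": n,
            "percentage": round(100 * n / total, 2) if total else 0.0,
            "status": "Sufficient" if n >= MIN_SAMPLES_PER_CLASS else "Insufficient",
        })
    return rows
```

Iterating over the scheme rather than over the data means that a class with no samples at all still appears in the table with a count of 0.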
4.2.3 On-Screen Sampling
Luma User Journey
This is a part of System Response 3.3.a: On screen sampling facility
The system shows an interactive map where the user can add sample data. If the user has already uploaded their own sample data, the verified sample data is shown on the interactive map.
Note (Success Notification): The system shows a confirmation that the uploaded sample data has been loaded onto the on-screen canvas, along with the number of samples uploaded.
The system lists all LULC classes defined in Module 2. The user assigns each new sample point to the correct class.
The system offers the user an option to save the result of the on-screen sampling process to their local device.
The user is prompted to finalize the on-screen sampling process by clicking a button.
Note (Success Notification): Once the process is finalized, the system shows a summary of the number of samples modified during the on-screen sampling process.
The system then moves to Section 4.3.
Luma Geospatial Engine
This sub-step does not involve any operations within the Luma Geospatial Engine. The on-screen sampling feature was developed using folium within the Streamlit application.
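The option to save the on-screen sampling result needs some serialized form of the collected points. The source does not state which file format Luma uses, so the sketch below assumes GeoJSON purely for illustration; the function name `points_to_geojson` and the input keys `lon`, `lat`, and `class_id` are hypothetical.

```python
import json


def points_to_geojson(points):
    """Build a GeoJSON FeatureCollection string from dicts with lon, lat, class_id."""
    features = [
        {
            "type": "Feature",
            # GeoJSON positions are ordered longitude, latitude.
            "geometry": {"type": "Point", "coordinates": [p["lon"], p["lat"]]},
            "properties": {"class_id": p["class_id"]},
        }
        for p in points
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})
```

The resulting string could then be offered to the user as a file download.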
4.2.4 Using Default Sample Data
Luma User Journey
The user is automatically directed to use the default sample data if they chose an existing LULC scheme in Module 2.
The system displays the area of the AOI and the number of sample data loaded on a map canvas.
Important (Error Handling Notification): The system shows an error notification if the RESTORE+ data fails to load. This could be because the AOI lies outside the coverage of the RESTORE+ dataset or because the AOI is too large.
Note (Success Notification): The system displays a summary of the number of samples loaded from the RESTORE+ data that fall within the AOI saved in the system, and the area of the AOI in km².
Luma Geospatial Engine
- The system retrieves existing sample data according to the chosen LULC scheme. In this development phase, the RESTORE+ sample data is retrieved from a Google Earth Engine asset.
4.3 Sample Data Preview
4.3.1 Sample Data Preview
Luma User Journey
This is a part of System Response 3.4: Verified sample data preview
The user is provided with the summary display of the updated sample data.
Note (Success Notification): A notification highlights which classes have insufficient sample data, the number of samples that fall outside the AOI, and the total number of classes that have samples outside the AOI.
The user is prompted to confirm that they will use the sample dataset.
Note (Success Notification): The system shows a confirmation that the sample data has been saved into the system.
The user is given the option to proceed to Chapter 5.
Important (Error Handling Notification): The system disables the option to continue to the next module if it does not detect any sample data stored in the system.
Luma Geospatial Engine
The system runs Section 4.2.2 for the updated sample data.
The system saves the sample data as an object to be loaded in the next modules.