2 Acquisition of Near-Cloud-Free Satellite Imagery
Module Overview
The output of this module is near-cloud-free satellite imagery covering the user-defined area of interest (AOI), which is essential for generating accurate and reliable land use and land cover (LULC) datasets. This module focuses on producing pre-processed and corrected satellite imagery through procedures such as cloud masking and shadow removal.
Input
| Name of Input | Input Type | Details |
|---|---|---|
| Area of Interest | User’s input | Shapefile in .zip format, .kml, or .kmz |
| Target map date | User’s input | YYYY or YYYY-MM-DD |
| Desired spatial resolution | User’s input | In meters |
| Maximum allowable cloud cover | User’s input | In percent |
| Raw satellite imagery | System’s input | |
| Area of Interest from administrative boundaries | System’s input | |
Output
- Selected AOI.
- Near-cloud-free satellite imagery.
Process
2.1 Selection of AOI
2.1.1 AOI from Shapefile
Luma User Journey
This is a part of User Journey 1.1: Prerequisite Check
The user can upload their own AOI file in .zip format that includes all required shapefile components (.shp, .shx, .dbf, .prj). The system highlights every shapefile component that must be included in the .zip file. The system also supports uploads of .kml or .kmz files.
The system validates whether the .zip file includes all the required shapefile components.
The system provides an error notification if the .shp file is not found inside the .zip file.
- The AOI file is saved in a temporary directory, so users can move back and forth between modules without re-uploading the AOI file.
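The component check described above can be sketched with the standard-library `zipfile` module. This is a minimal sketch under stated assumptions, not the Luma implementation. It assumes that only file extensions need to be inspected, and it does not verify file contents or that the components share a base name. The helper name `missing_shapefile_components` is hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

# Components listed in the text as required inside the uploaded .zip file.
REQUIRED_SHAPEFILE_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj"}


def missing_shapefile_components(zip_path):
    """Return the required shapefile extensions that are absent from a .zip archive.

    Hypothetical helper: only file extensions are checked (case-insensitively),
    not the content or the matching base names of the components.
    """
    with zipfile.ZipFile(zip_path) as archive:
        found = {
            PurePosixPath(name).suffix.lower()
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        }
    return REQUIRED_SHAPEFILE_EXTENSIONS - found
```

An empty result means the archive passes this check. A result that contains `.shp` corresponds to the error notification raised when no .shp file is found.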
Luma Geospatial Engine
This sub-step does not involve any operations within the Luma Geospatial Engine.
2.1.2 AOI from Indonesia Administrative Boundaries
In addition to user-defined uploads, the system supports the selection of an AOI from Indonesian administrative boundaries at the regency level.
Luma User Journey
The user selects an AOI from a drop-down list of Indonesian regency administrative boundaries.
The system provides a preview of the selected boundary.
Luma Geospatial Engine
- Load an ee.FeatureCollection containing Indonesian administrative boundaries from Badan Informasi Geospasial. The data should be uploaded as a GEE asset, and the asset link must be provided and shared publicly.
- The system extracts the regency names and provides the user with a complete list of the available regencies.
- The system converts the ee.FeatureCollection to a GeoDataFrame for visualization, while using the unconverted ee.FeatureCollection as the AOI for the imagery search.
input_utils.load_regency_asset: Fetches the GEE asset containing Indonesian administrative boundaries
input_utils.get_regency_name: Creates a list of regency names for the user
input_utils.get_regency_geometry: Gets the geometry of the selected regency
input_utils.convert_gee_features_to_gdf: Converts the ee.FeatureCollection to a GeoDataFrame for visualization in the system
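The name listing and geometry lookup can be illustrated client-side on a GeoJSON-style dictionary, such as the one returned by `ee.FeatureCollection.getInfo()`. This is a sketch under assumptions. The attribute key `REGENCY_NAME` is a hypothetical placeholder, because the actual key depends on the Badan Informasi Geospasial asset. The function names are illustrative and are not the `input_utils` API.

```python
def regency_names(feature_collection, name_property="REGENCY_NAME"):
    """Return a sorted, de-duplicated list of regency names.

    `feature_collection` is a GeoJSON-style dict with a "features" list.
    `name_property` is a hypothetical attribute key.
    """
    names = {
        (feature.get("properties") or {}).get(name_property)
        for feature in feature_collection["features"]
    }
    names.discard(None)  # ignore features without a name
    return sorted(names)


def regency_geometry(feature_collection, regency_name, name_property="REGENCY_NAME"):
    """Return the geometry of the first feature with a matching name, or None."""
    for feature in feature_collection["features"]:
        if (feature.get("properties") or {}).get(name_property) == regency_name:
            return feature["geometry"]
    return None
```

Against Earth Engine directly, the same steps can stay server-side with `aggregate_array(name_property).distinct()` and `filter(ee.Filter.eq(name_property, regency_name))`. This avoids transferring every boundary geometry to the client.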
2.2 Correction of Shapefile Geometries
2.2.1 Correction of Shapefile Geometries
Luma User Journey
This is a part of User Journey 1.1: Prerequisite Check
- After the user uploads the AOI, they receive continuous feedback on each step of the validation process.
The system shows a confirmation when the AOI is loaded successfully.
The system provides an error notification if shapefile validation fails.
- Once the system successfully validates the shapefile input, it displays the AOI on a map canvas.
Luma Geospatial Engine
This is a part of System Response 1.1: Area of Interest Definition
The system performs a multi-step geometry sanitization process before converting any shapefile or GeoDataFrame into Earth Engine objects. These input validation processes for shapefiles are designed mainly for Streamlit compatibility. Each correction is handled by exactly one dedicated function:
- The Coordinate Reference System (CRS) is automatically corrected and reprojected to WGS 1984 (EPSG:4326).
Tip (Related Functions): input_utils.shapefile_validator.fix_crs() resolves coordinate reference system issues.
- Invalid, empty, or topologically broken geometries are detected and repaired.
Tip (Related Functions): input_utils.shapefile_validator.clean_geometries() repairs invalid geometries.
- An initial structural validation and basic geometry fixing are performed.
Tip (Related Functions): input_utils.shapefile_validator.validate_and_fix_geometry() validates geometries and fixes basic geometry issues.
- Polygon-specific validation is applied, including removal of invalid coordinates and simplification of overly complex shapes.
Tip (Related Functions): input_utils.shapefile_validator.validate_polygons() checks point geometries, removes invalid coordinates, and simplifies polygon geometries.
- Every individual coordinate is checked to ensure it falls within valid global latitude/longitude ranges.
Tip (Related Functions): input_utils.shapefile_validator.is_valid_coordinate() checks whether a coordinate lies within valid latitude/longitude ranges.
- The number of vertices in each geometry is counted and logged (for diagnostic and simplification decisions).
Tip (Related Functions): input_utils.shapefile_validator.count_vertices() counts the vertices of each geometry.
- Finally, a comprehensive pre-upload validation is executed to guarantee that every geometry is valid, non-empty, correctly typed, and fully compatible with Earth Engine.
Tip (Related Functions): input_utils.shapefile_validator.final_validation() runs the final pre-upload validation before the conversion to Earth Engine.

After the system successfully validates the shapefile, it converts the GeoDataFrame into Earth Engine geometry format.
input_utils.EE_converter.convert_aoi_gdf(): converts the GeoDataFrame input into an ee.Geometry. Three conversion approaches are defined in this function:
- Primary method: geemap.gdf_to_ee().
- Secondary method (if the primary method fails): manual GeoJSON conversion. Multiple geometries are first unioned (gpd.GeoDataFrame([{'geometry': union_geom}], crs=gdf.crs)) and then converted to GeoJSON (gdf.to_json()). Only Polygon and MultiPolygon geometry types are supported by this method.
- Last option (if both the primary and secondary methods fail): bounding box conversion, which approximates the AOI as a rectangle (ee.Geometry.Rectangle([bounds[0], bounds[1], bounds[2], bounds[3]])).
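Three pieces of the sanitization and conversion steps above can be sketched in plain Python over GeoJSON-style Polygon and MultiPolygon coordinates:

- the coordinate-range check;
- the vertex count;
- the bounds consumed by the bounding-box fallback.

This is an illustrative sketch only. The real `shapefile_validator` functions operate on GeoDataFrames, and the function names below are hypothetical. The bounds are returned in the `[xMin, yMin, xMax, yMax]` order that `ee.Geometry.Rectangle` accepts as a four-number list.

```python
def coordinate_in_range(lon, lat):
    """True if (lon, lat) lies inside the global WGS 1984 (EPSG:4326) ranges."""
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0


def iter_rings(geometry):
    """Yield each linear ring of a GeoJSON-style Polygon or MultiPolygon."""
    if geometry["type"] == "Polygon":
        yield from geometry["coordinates"]
    elif geometry["type"] == "MultiPolygon":
        for polygon in geometry["coordinates"]:
            yield from polygon
    else:
        raise ValueError(f"Unsupported geometry type: {geometry['type']}")


def vertex_count(geometry):
    """Count vertices over all rings; each ring's closing vertex is included."""
    return sum(len(ring) for ring in iter_rings(geometry))


def geometry_bounds(geometry):
    """Return [xMin, yMin, xMax, yMax] over every vertex of the geometry."""
    points = [pt for ring in iter_rings(geometry) for pt in ring]
    xs = [pt[0] for pt in points]
    ys = [pt[1] for pt in points]
    return [min(xs), min(ys), max(xs), max(ys)]
```

The bounding box is deliberately the last resort. It always succeeds, but it can include a large area outside the real AOI boundary.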
2.3 Selection and Preparation of Satellite Imagery
2.3.1 Satellite Data Acquisition and Filtering
Luma User Journey
This is a part of User Journey 1.2: Determining Satellite Imageries Parameters
The user is prompted to fill in these parameters for the satellite imagery:
Target map date: Choose the period of satellite imagery either as a single year (YYYY) or as full dates (YYYY-MM-DD). If the YYYY format is used, the available years range from 1972 to the present year. The year 2020 is shown as the default option.
Spatial resolution: Specify the desired spatial resolution. During Phase 1, only Landsat data is used; therefore, the spatial resolution is set to 30 meters by default.
Sensor type: a list of the sensors available for the selected target map date. Landsat 8 OLI is the default option. A warning message about the limited availability of Landsat 1-3 data is shown if the user selects a temporal scope between 1972 and 1984.
Important (Error Handling Notification): Since only Landsat data is used, the system provides an error notification if no Landsat sensor is available in the selected time range.
Maximum cloud cover: Set a preferred cloud cover threshold, or use the default value of 30%.
The system provides information on satellite image availability based on the user's input parameters. The information includes the total number of images found, the date range of the images, WRS path/row tiles, scene IDs, image acquisition dates, average scene cloud cover, and cloud cover range.
Note (Success Notification): The system shows the total number of satellite images found.
Important (Error Handling Notification): The system provides an error notification if no satellite images are found for the user's parameters. The user is prompted to change the cloud cover threshold or the target map date.
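The availability summary and the "no images found" error can be sketched as a filter-then-summarize step over scene metadata. This is a sketch under assumptions. It assumes each scene is a plain dict of image properties, for example taken from `getInfo()` on the filtered collection. The keys `CLOUD_COVER`, `DATE_ACQUIRED`, `WRS_PATH`, `WRS_ROW`, and `LANDSAT_SCENE_ID` follow Landsat Collection 2 metadata. The function `summarize_scenes` is hypothetical and is not `get_collection_statistics()`.

```python
from statistics import mean


def summarize_scenes(scenes, max_cloud_cover=30.0):
    """Keep scenes at or below the cloud cover threshold and summarize them.

    Raises ValueError when nothing remains, mirroring the error notification
    that asks the user to relax the threshold or change the target map date.
    """
    kept = [s for s in scenes if s["CLOUD_COVER"] <= max_cloud_cover]
    if not kept:
        raise ValueError(
            "No satellite images found; raise the cloud cover threshold "
            "or change the target map date."
        )
    dates = sorted(s["DATE_ACQUIRED"] for s in kept)  # ISO dates sort correctly as text
    clouds = [s["CLOUD_COVER"] for s in kept]
    return {
        "total_images": len(kept),
        "date_range": (dates[0], dates[-1]),
        "path_row_tiles": sorted({(s["WRS_PATH"], s["WRS_ROW"]) for s in kept}),
        "scene_ids": [s["LANDSAT_SCENE_ID"] for s in kept],
        "acquisition_dates": dates,
        "average_cloud_cover": mean(clouds),
        "cloud_cover_range": (min(clouds), max(clouds)),
    }
```

In Earth Engine, the threshold itself would normally be applied server-side with `collection.filter(ee.Filter.lte("CLOUD_COVER", 30))` before any metadata is fetched.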
Luma Geospatial Engine
This is a part of System Response 1.2: Search and Filter Imagery
The system specifies Landsat data collections for optical processing.
Tip (Related Functions):
data_acquisition.Reflectance_Data.OPTICAL_DATASETS: defines the properties of the Landsat Surface Reflectance (SR) collections (Collection 2, Level-2) fetched for optical data processing. The metadata used for reporting the data retrieval is also defined in this list. Additional satellite sensors can be defined here.
data_acquisition.Reflectance_Data.THERMAL_DATASETS: defines the properties of the Landsat Top-of-Atmosphere (TOA) thermal bands. Because Landsat missions use different band name designations, the thermal band of each mission is specified in this list.
The system currently supports the Landsat 1-3 raw collection and the Landsat 4-9 surface reflectance (SR) collection. Additionally, Landsat 4-9 Top-of-Atmosphere (TOA) data is used to retrieve the thermal band as an additional input band for classification. This choice stems from the quality of the thermal band in the Landsat SR collection, which often contains no-data pixels and is prone to over-correction over high-contrast land cover features. Landsat mission availability is as follows:
- Landsat 1 Multispectral Scanner/MSS (1972 - 1978)
- Landsat 2 Multispectral Scanner/MSS (1978 - 1982)
- Landsat 3 Multispectral Scanner/MSS (1978 - 1983)
- Landsat 4 Thematic Mapper/TM (1982 - 1993)
- Landsat 5 Thematic Mapper/TM (1984 - 2012)
- Landsat 7 Enhanced Thematic Mapper Plus/ETM+ (1999 - 2021)
- Landsat 8 Operational Land Imager/OLI (2013 - present)
- Landsat 9 Operational Land Imager-2/OLI-2 (2021 - present)
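The sensor list offered for a target map date, the "no Landsat sensor" error, and the Landsat 1-3 warning can all be derived from the operational years above by an overlap test. This is a minimal sketch that uses the years exactly as listed, with `None` marking a mission still in operation. The function names and the display labels are hypothetical.

```python
import datetime

# Operational years as listed in the documentation; None = still operating.
LANDSAT_MISSIONS = {
    "Landsat 1 MSS": (1972, 1978),
    "Landsat 2 MSS": (1978, 1982),
    "Landsat 3 MSS": (1978, 1983),
    "Landsat 4 TM": (1982, 1993),
    "Landsat 5 TM": (1984, 2012),
    "Landsat 7 ETM+": (1999, 2021),
    "Landsat 8 OLI": (2013, None),
    "Landsat 9 OLI-2": (2021, None),
}


def sensors_for_years(start_year, end_year=None, current_year=None):
    """Return missions whose operational years overlap [start_year, end_year]."""
    end_year = start_year if end_year is None else end_year
    current_year = current_year or datetime.date.today().year
    sensors = []
    for name, (first, last) in LANDSAT_MISSIONS.items():
        last = current_year if last is None else last
        if first <= end_year and start_year <= last:  # interval overlap test
            sensors.append(name)
    if not sensors:
        raise ValueError("No Landsat sensor is available in the selected time range.")
    return sensors


def needs_mss_warning(start_year, end_year=None):
    """True when the requested period touches 1972-1984 (limited Landsat 1-3 data)."""
    end_year = start_year if end_year is None else end_year
    return start_year <= 1984 and end_year >= 1972
```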
The system retrieves images that match the user’s selected criteria (spatial resolution, year, and cloud cover), based on the following input parameters: AOI, target map date, and cloud cover threshold.
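The date normalization applied before the search can be sketched as follows. A year (int or 4-digit string) expands to January 1 for a start date or December 31 for an end date, and a full date is validated and passed through in YYYY-MM-DD form. This is a sketch of the described behavior only. The name `normalize_date_input` and its `is_start` flag are hypothetical and do not reproduce the signature of `parse_date_input()`.

```python
import datetime


def normalize_date_input(value, is_start=True):
    """Normalize a year or a full date into a 'YYYY-MM-DD' string.

    Year inputs become Jan 1 (start) or Dec 31 (end). Other inputs must
    parse as an ISO date, otherwise ValueError is raised.
    """
    text = str(value).strip()
    if text.isdigit() and len(text) == 4:
        return f"{text}-01-01" if is_start else f"{text}-12-31"
    return datetime.date.fromisoformat(text).isoformat()
```

When these strings feed `ee.ImageCollection.filterDate(start, end)`, note that Earth Engine treats the end date as exclusive. An end date of December 31 therefore omits scenes acquired on that day unless one day is added.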
Tip (Related Functions):
data_acquisition.parse_date_input(): parses the date input into YYYY-MM-DD format. It accepts either a year (int or 4-digit string) or a full date string. For year inputs, it returns January 1 for start dates or December 31 for end dates. This function is used by get_optical_data and get_thermal_bands.
data_acquisition.Reflactance_Data.get_optical_data(): retrieves the Landsat image collection based on the input parameters. This function uses mask_landsat_sr, apply_scale_factors, and rename_landsat_bands to return a masked, scaled, and renamed image collection.
The system performs cloud masking using the quality assessment band (QA_PIXEL), which encodes pixel conditions as bitwise flags. Deterministic bit tests are applied to identify and exclude pixels affected by cirrus, clouds, and cloud shadows. The bit values are adjusted based on the Landsat sensor selected by the user. The Landsat 8-9 QA_PIXEL band also stores per-pixel confidence levels for clouds, cirrus, and cloud shadows. These confidence values (ranging from 0 to 3) are extracted and filtered using hard-coded thresholds, so that pixels with medium to high confidence are masked out. For the other Landsat sensors, only the deterministic bits are used for cloud masking.
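The bit tests can be illustrated on a single QA_PIXEL integer. The positions used are those of the Landsat 8-9 Collection 2 QA_PIXEL layout: cirrus bit 2, cloud bit 3, cloud shadow bit 4, and 2-bit confidences at bits 8-9 (cloud), 10-11 (shadow), and 14-15 (cirrus). "Medium or high" is taken as a confidence of 2 or more. This is a sketch, not `mask_landsat_sr()`. The deterministic-only variant assumes the cloud and shadow bits sit at the same positions for the other sensors, and that assumption should be checked per collection.

```python
# Landsat 8-9 Collection 2 QA_PIXEL bit positions.
CIRRUS_BIT, CLOUD_BIT, CLOUD_SHADOW_BIT = 2, 3, 4
CLOUD_CONF_SHIFT, SHADOW_CONF_SHIFT, CIRRUS_CONF_SHIFT = 8, 10, 14
MEDIUM_CONFIDENCE = 2  # 0 none, 1 low, 2 medium, 3 high


def bit_set(qa, bit):
    """True if the given single bit is set."""
    return bool((qa >> bit) & 1)


def confidence(qa, shift):
    """Extract a 2-bit confidence value (0-3) starting at `shift`."""
    return (qa >> shift) & 0b11


def is_clear_oli(qa):
    """Landsat 8-9: deterministic flags plus medium/high confidence masking."""
    if any(bit_set(qa, b) for b in (CIRRUS_BIT, CLOUD_BIT, CLOUD_SHADOW_BIT)):
        return False
    return all(
        confidence(qa, shift) < MEDIUM_CONFIDENCE
        for shift in (CLOUD_CONF_SHIFT, SHADOW_CONF_SHIFT, CIRRUS_CONF_SHIFT)
    )


def is_clear_deterministic(qa):
    """Other sensors: cloud and cloud shadow flags only (assumed bit positions)."""
    return not (bit_set(qa, CLOUD_BIT) or bit_set(qa, CLOUD_SHADOW_BIT))
```

For example, 21824 is a typical clear Landsat 8 value (all confidences low). A value such as 22080 has medium cloud confidence but no cloud flag, so only the confidence-based test rejects it. In Earth Engine, the same tests map onto `qa.bitwiseAnd(1 << 3).eq(0)` and `qa.rightShift(8).bitwiseAnd(3).lt(2)`, combined and passed to `image.updateMask(...)`.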
Tip (Related Functions): data_acquisition.Reflactance_Data.mask_landsat_sr() masks clouds, cloud shadows, and cirrus in Landsat Collection 2 SR data using the QA_PIXEL band with confidence thresholds.
The system applies scaling to the optical reflectance bands to convert the raw digital numbers (DNs) into physically meaningful surface reflectance values. A scale factor of 0.0000275 and an offset of -0.2 are applied, as defined in the Landsat Collection 2 Surface Reflectance metadata. This conversion standardizes the reflectance values to a range of approximately 0 to 1, enabling accurate comparison and analysis across different scenes and sensors.
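The scaling is a linear transform, reflectance = DN * 0.0000275 + (-0.2), applied to optical bands only. The sketch below shows it on a per-pixel dict of band values and leaves non-SR bands (such as QA_PIXEL) untouched. The helper names are hypothetical. The check values 7273 and 43636 are the valid SR DN range given in USGS Collection 2 documentation, and they map to roughly 0 and 1.

```python
SR_SCALE = 0.0000275
SR_OFFSET = -0.2


def scale_sr(dn):
    """Convert a Landsat Collection 2 SR digital number to surface reflectance."""
    return dn * SR_SCALE + SR_OFFSET


def scale_optical_bands(pixel):
    """Scale SR_B* bands of a {band: value} dict; other bands pass through."""
    return {
        band: scale_sr(value) if band.startswith("SR_B") else value
        for band, value in pixel.items()
    }
```

In Earth Engine, the equivalent is `image.select("SR_B.").multiply(0.0000275).add(-0.2)`, added back with `image.addBands(scaled, None, True)` to overwrite the original bands.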
Tip (Related Functions): data_acquisition.Reflactance_Data.apply_scale_factors() applies the Landsat Collection 2 scaling factors using the formula DN * scale_factor + offset. The scale factor is applied to the optical bands only.
The system renames the bands according to the sensor type input.
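Band renaming is a per-sensor lookup, because the same `SR_B*` number refers to different wavelengths on OLI (Landsat 8-9) and TM/ETM+ (Landsat 4-7). This sketch covers the SR collections only. The Landsat 1-3 MSS `B*` names are omitted, and the sensor keys ("L4" to "L9") and the `COASTAL` label are hypothetical. NIR, RED, GREEN, BLUE, SWIR1, and SWIR2 are the names used in this document.

```python
# Landsat 8-9 (OLI/OLI-2) Collection 2 SR band layout.
OLI_BANDS = {
    "SR_B1": "COASTAL", "SR_B2": "BLUE", "SR_B3": "GREEN", "SR_B4": "RED",
    "SR_B5": "NIR", "SR_B6": "SWIR1", "SR_B7": "SWIR2",
}
# Landsat 4-7 (TM/ETM+) Collection 2 SR layout; band 6 is thermal, not SR.
TM_ETM_BANDS = {
    "SR_B1": "BLUE", "SR_B2": "GREEN", "SR_B3": "RED",
    "SR_B4": "NIR", "SR_B5": "SWIR1", "SR_B7": "SWIR2",
}
BAND_MAPS = {"L4": TM_ETM_BANDS, "L5": TM_ETM_BANDS, "L7": TM_ETM_BANDS,
             "L8": OLI_BANDS, "L9": OLI_BANDS}


def rename_bands(band_names, sensor):
    """Map sensor-specific band names to common names; unknown bands pass through."""
    mapping = BAND_MAPS[sensor]
    return [mapping.get(name, name) for name in band_names]
```

In Earth Engine, the same mapping can be applied in one call with `image.select(list(mapping), list(mapping.values()))`.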
Tip (Related Functions): data_acquisition.Reflactance_Data.rename_landsat_bands() standardizes Landsat Surface Reflectance (SR) band names based on the sensor type, for example from ‘SR_B’ or ‘B’ names to ‘NIR’, ‘GREEN’, etc.
The system then uses the same parameters as for the multispectral data to search the Collection 2 TOA data, except for Landsat 1-3 MSS. The thermal band from the TOA data (band 10 on Landsat 8-9; the designation for other missions is set in THERMAL_DATASETS) is then stacked with the multispectral bands from the SR data. Because band 11 has a larger calibration uncertainty, only band 10 is used. No additional pre-processing is applied to the thermal band, since the TOA thermal values alone are considered satisfactory for discriminating land cover features.
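Choosing the thermal band per mission can be sketched as a lookup table. The band names follow the Earth Engine Landsat Collection 2 TOA products: `B6` for TM, `B6_VCID_1`/`B6_VCID_2` for ETM+, and `B10`/`B11` for TIRS. Picking the low-gain `B6_VCID_1` for Landsat 7 is an assumption of this sketch. Landsat 1-3 are simply absent, matching the text. The sensor keys and function names are hypothetical.

```python
# Thermal band per mission in Earth Engine Landsat Collection 2 TOA data.
THERMAL_BANDS = {
    "L4": "B6",         # TM thermal band
    "L5": "B6",         # TM thermal band
    "L7": "B6_VCID_1",  # ETM+ low-gain thermal (gain choice is an assumption)
    "L8": "B10",        # TIRS band 11 skipped: larger calibration uncertainty
    "L9": "B10",
}


def has_thermal(sensor):
    """Landsat 1-3 (MSS) are not in the table, so they report no thermal band."""
    return sensor in THERMAL_BANDS


def thermal_band_for(sensor):
    """Return the TOA thermal band name for a sensor key, or None."""
    return THERMAL_BANDS.get(sensor)
```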
Tip (Related Functions):
data_acquisition.Reflectance_Data.has_thermal_capability(): checks whether the selected Landsat mission contains thermal bands.
data_acquisition.Reflactance_Data.get_thermal_bands(): retrieves the thermal band from Landsat Collection 2 TOA data.
The system evaluates whether the retrieved imagery meets the required spatial resolution, input year, and cloud cover thresholds. If suitable images are found, the system generates a composite by calculating the median value across the selected imagery, using the Earth Engine median() function. The system then clips and mosaics the satellite imagery.
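The per-pixel median composite can be sketched on small in-memory stacks, with `None` standing in for a masked (cloud or shadow) pixel. Masked values are ignored, a pixel masked in every image stays no-data, and a single bright outlier (such as a residual cloud) does not pull the median up. This only illustrates the technique. In Earth Engine, `collection.median()` performs the same reduction and likewise skips masked pixels.

```python
from statistics import median


def median_composite(stack):
    """Per-pixel median over a time stack of equally sized flat images.

    `stack` is a list of images (lists of pixel values); None marks a
    masked pixel. Pixels masked in every image remain None (no data).
    """
    composite = []
    for values in zip(*stack):
        valid = [v for v in values if v is not None]
        composite.append(median(valid) if valid else None)
    return composite
```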
2.4 Mosaic and Composite for Satellite Imagery
Luma Geospatial Engine
This is a part of System Response 1.2: Search and Filter Imagery
The final image (a single layer) presented to the user is created using the final_image methods, resulting in either mosaic imagery or a temporal aggregation composite.
Mosaicking
The mosaicking approach stacks all images in the collection and, for each pixel, selects the value from the topmost image according to the collection’s ordering. The system implements a quality mosaic, which uses a quality band to determine the order of the mosaic.
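A quality mosaic takes every band of a pixel from whichever image has the highest quality-band value at that pixel (for example, the greenest NDVI), rather than from the topmost image. The sketch below shows that selection rule on small in-memory images, where `None` marks a masked pixel. It illustrates the idea behind Earth Engine's `ImageCollection.qualityMosaic(band)` and is not `get_quality_mosaic`.

```python
def quality_mosaic(images, quality_band):
    """Per pixel, copy all bands from the image with the highest quality value.

    `images` is a list of dicts mapping band name -> flat list of pixel
    values (all images share the same bands). None marks a masked pixel.
    """
    bands = list(images[0])
    n_pixels = len(images[0][quality_band])
    mosaic = {band: [None] * n_pixels for band in bands}
    for i in range(n_pixels):
        candidates = [img for img in images if img[quality_band][i] is not None]
        if not candidates:
            continue  # masked in every image: leave as no data
        best = max(candidates, key=lambda img: img[quality_band][i])
        for band in bands:
            mosaic[band][i] = best[band][i]
    return mosaic
```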
Tip (Related Functions):
data_acquisition.final_Image.get_quality_mosaic: creates a quality mosaic image based on a quality band (source band).
add_quality_band: computes or selects the band used to create the mosaic. The supported quality bands are NDVI and NIR. The mosaic is created from the highest pixel value of the corresponding band.
Temporal Aggregation
Temporal aggregation uses statistical reducers (such as the median) to combine pixel values from an image collection over time. A median composite often produces a noticeably cleaner image because clouds and other high-reflectance features tend to occupy the upper tail of the distribution. This approach naturally excludes such features, resulting in fewer artifacts and more consistent surface reflectance. However, if a location has no valid pixel across the image collection, that location results in no data (a blank region). The temporal aggregation function is paired with calculate_coverage to report the valid pixels within the AOI.
Tip (Related Functions): data_acquisition.final_Image.get_temporal_composite creates a temporal aggregation composite using Earth Engine reducers. The supported reducers are median, mean, min, max, and percentile.
Valid Pixel Reporting
To provide a comprehensive report on the final image creation, the system reports the date range over which the composite is created. Additionally, the shares of valid pixels and no-data pixels are estimated and reported to the user.
Tip (Related Functions): calculate_coverage estimates the valid pixels (pixels not removed by cloud masking) within the AOI. This quantifies the ‘no data’ or blank blocks in the final image. This function is used within get_temporal_composite, and its result is reported during final image creation.
calculate_coverage therefore serves as an alternative way of reporting cloud cover within the AOI: in theory, clouds within the AOI have already been masked out during cloud masking and removed during temporal aggregation, so what remains unfilled is the no-data share.
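The coverage figure reduces to counting valid pixels inside the AOI. This plain-Python sketch assumes a flat boolean validity mask that already covers only AOI pixels, and returns valid and no-data percentages. The name `coverage_percentages` is hypothetical and does not reproduce the signature of `calculate_coverage`. Server-side, a comparable number can be obtained with `reduceRegion` and `ee.Reducer.mean()` over a 0/1 validity band at the working scale.

```python
def coverage_percentages(valid_mask):
    """Return valid and no-data percentages for the pixels inside an AOI.

    `valid_mask` is a flat list of booleans (True = valid, unmasked pixel)
    restricted to AOI pixels.
    """
    total = len(valid_mask)
    if total == 0:
        raise ValueError("The AOI contains no pixels.")
    valid = sum(1 for is_valid in valid_mask if is_valid)
    valid_percent = 100.0 * valid / total
    return {"valid_percent": valid_percent, "no_data_percent": 100.0 - valid_percent}
```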
2.5 Visualization and Saving of Processed Satellite Imagery
2.5.1 Visualization and Saving Processed Satellite Imagery
Luma User Journey
This is a part of User Journey 1.3: Download Satellite Imagery
The system visualizes the satellite imagery on a map canvas using a true color composite (RED, GREEN, BLUE bands) as the default visualization. The map is centered on the AOI.
The user can change the imagery composite by choosing one of several commonly used composites or by manually defining their own band composite.
The user can also change visualization parameters to adjust the imagery brightness and contrast
The user can download the satellite imagery through two options:
- Direct download: downloads the image locally to the user’s device. This option is only available if the satellite image is smaller than 32 MB.
Note (Success Notification): The system shows a confirmation if the URL for the direct download of the satellite imagery is successfully created.
Important (Error Handling Notification): The system provides an error notification if it fails to create the URL for direct download, and suggests that the user export to Google Cloud Storage instead or use a smaller AOI.
- Export to Google Cloud Storage: exports the image to Google Cloud Storage.
Note (Success Notification): The system shows a confirmation if the satellite image is successfully saved to Google Cloud Storage.
Important (Error Handling Notification): The system provides an error notification if it fails to export the image to Google Cloud Storage.
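Deciding between the two download options can be sketched as a size pre-check before requesting a direct-download URL. Several assumptions are labeled here:

- the image size is estimated as uncompressed width × height × bands × bytes per value;
- values are float32 (4 bytes);
- the pixel dimensions come from the AOI extent divided by the resolution;
- "32 MB" means 32 × 1024 × 1024 bytes.

The helper names are hypothetical. Earth Engine enforces its own request limit on `getDownloadURL`, so this estimate is only a pre-check. Larger images would go through `ee.batch.Export.image.toCloudStorage`.

```python
import math

DIRECT_DOWNLOAD_LIMIT_BYTES = 32 * 1024 * 1024  # "32 MB", binary megabytes assumed


def pixels_for_extent(extent_m, resolution_m=30):
    """Number of pixels needed to span an extent at the given resolution."""
    return math.ceil(extent_m / resolution_m)


def estimate_image_bytes(width_px, height_px, n_bands, bytes_per_value=4):
    """Uncompressed size estimate; float32 (4 bytes) per value is assumed."""
    return width_px * height_px * n_bands * bytes_per_value


def choose_download_route(width_px, height_px, n_bands, bytes_per_value=4):
    """Return "direct" below the size limit, otherwise "gcs" (Cloud Storage export)."""
    size = estimate_image_bytes(width_px, height_px, n_bands, bytes_per_value)
    return "direct" if size < DIRECT_DOWNLOAD_LIMIT_BYTES else "gcs"
```

For instance, a 30 km × 30 km AOI at 30 m is 1000 × 1000 pixels. With 7 float32 bands that is about 28 million bytes, which stays under the limit.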
The user is given the option to continue to Module 2
Important (Error Handling Notification): The system disables the option to continue to the next module if it does not detect satellite imagery saved inside the system.
Luma Geospatial Engine
This is a part of System Response 1.3: Download Satellite Imagery and Continue to Next Module
The system collects the statistics of the satellite images found.
Tip (Related Functions): data_acquisition.Reflactance_Stats.get_collection_statistics() gathers comprehensive statistics about the retrieved image collection.
The system provides a preset of commonly used band combinations, together with an optimal value range for the imagery. The commonly used composites defined in Luma-GE are:
- True Color Composite (RED, GREEN, BLUE)
- False Color Infrared Composite (NIR, RED, GREEN)
- Short-wave Infrared Composite (SWIR2, NIR, RED)
- Land/Water (NIR, SWIR1, RED)
Tip (Related Functions):
data_acquisition.Vis_Params.BAND_COMBINATIONS: the list of preset band combinations and their default values.
data_acquisition.Vis_Params.get_combinations_names(): fetches the band combination names from the preset.
data_acquisition.Vis_Params.get_vis_params(): retrieves band combinations and visualization parameters from the defined list.
The system provides tools for the user to build their own band combinations and to modify the imagery minimum/maximum values and the gamma value, improving the visualization of the data.
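The preset composites and the custom builder can be sketched as small dictionaries in Earth Engine's visualization-parameter format, which uses the keys `bands`, `min`, `max`, and `gamma`. The band triplets come from the list above. The default range (0.0 to 0.3, gamma 1.0) is a placeholder assumption, not Luma-GE's "optimal value range". The names `PRESET_COMBINATIONS`, `preset_vis_params`, and `custom_vis_params` are hypothetical.

```python
# Band triplets from the documented presets; the default range is a placeholder.
PRESET_COMBINATIONS = {
    "True Color": ["RED", "GREEN", "BLUE"],
    "False Color Infrared": ["NIR", "RED", "GREEN"],
    "Short-wave Infrared": ["SWIR2", "NIR", "RED"],
    "Land/Water": ["NIR", "SWIR1", "RED"],
}
DEFAULT_RANGE = {"min": 0.0, "max": 0.3, "gamma": 1.0}  # assumed, for scaled SR


def preset_vis_params(name):
    """Return Earth Engine-style visualization parameters for a preset."""
    return {"bands": list(PRESET_COMBINATIONS[name]), **DEFAULT_RANGE}


def custom_vis_params(bands, vmin, vmax, gamma=1.0, available=None):
    """Validate and build user-defined visualization parameters."""
    if len(bands) not in (1, 3):
        raise ValueError("Earth Engine visualization expects 1 or 3 bands.")
    if available is not None:
        missing = [b for b in bands if b not in available]
        if missing:
            raise ValueError(f"Bands not in image: {missing}")
    if vmin >= vmax:
        raise ValueError("min must be smaller than max.")
    if gamma <= 0:
        raise ValueError("gamma must be positive.")
    return {"bands": list(bands), "min": vmin, "max": vmax, "gamma": gamma}
```

The resulting dict can be passed unchanged to geemap's `Map.addLayer(image, vis_params, name)`.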
Tip (Related Functions): data_acquisition.build_custom_vis_param() allows the user to build their own band combinations.
For details on exporting to Google Cloud Storage, refer to the Earth Engine Initialization and Authentication page.
The system saves the satellite imagery as an object to be loaded in the next modules.