2 Acquisition of Near-Cloud-Free Satellite Imagery

Module Overview

The output of this module is near-cloud-free satellite imagery corresponding to the user-defined area of interest (AOI), which is essential for generating accurate and reliable land use and land cover (LULC) datasets. This module emphasizes the generation of pre-processed and corrected satellite imagery, incorporating procedures such as cloud masking and shadow removal.

Input

Name of Input | Input Type | Details
Area of Interest | User’s input | -
Target map date | User’s input | YYYY or YYYY-MM-DD
Desired spatial resolution | User’s input | In meters
Maximum allowable cloud cover | User’s input | In percent
Raw satellite imagery | System’s input | -
Area of Interest from administrative boundaries | System’s input | -

Output

  1. Selected AOI.
  2. Near-cloud-free satellite imagery.

Process

2.1 Selection of AOI

2.1.1 AOI from Shapefile

Luma User Journey

This is a part of User Journey 1.1: Prerequisite Check

  1. The user can upload their own AOI file in .zip format containing all required shapefile components (.shp, .shx, .dbf, .prj). The system highlights all the shapefile components that must be included in the .zip file. The system also supports uploads of .kml or .kmz files.

  2. The system validates whether the .zip file includes all the required shapefile components.

ImportantError Handling Notification

The system provides an error notification if the .shp file is not found inside the .zip file.

  3. The AOI file is saved in a temporary directory, allowing users to move back and forth between modules without re-uploading the AOI file.
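The component check in steps 1 and 2 can be sketched with Python's standard zipfile module. This is a minimal illustration, not the Luma implementation: the helpers missing_shapefile_components and build_test_zip are hypothetical, extension matching is case-insensitive by assumption, and .kml/.kmz uploads are not covered.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Shapefile components this module lists as required
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj"}


def missing_shapefile_components(zip_source):
    """Return the sorted required extensions that are absent from the archive.

    zip_source may be a file path or a file-like object. Folders inside the
    archive are ignored and extensions are compared in lower case.
    """
    with zipfile.ZipFile(zip_source) as archive:
        found = {
            PurePosixPath(name).suffix.lower()
            for name in archive.namelist()
            if not name.endswith("/")
        }
    return sorted(REQUIRED_EXTENSIONS - found)


def build_test_zip(filenames):
    """Build an in-memory .zip with empty members, for trying the check."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in filenames:
            archive.writestr(name, b"")
    buffer.seek(0)
    return buffer
```

For example, an archive holding only aoi/aoi.shp, aoi/aoi.shx, and aoi/aoi.dbf yields [".prj"]. An empty result means the archive passed; a result containing ".shp" corresponds to the error notification above.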

Luma Geospatial Engine

This sub-step does not involve any operations within the Luma Geospatial Engine.

2.1.2 AOI from Indonesia Administrative Boundaries

In addition to user-defined uploads, the system supports selecting the AOI from Indonesian administrative boundaries at the regency level.

Luma User Journey

  1. The user selects an AOI from a drop-down list of Indonesian regency administrative boundaries.

  2. The system provides a preview of the selected boundary.

Luma Geospatial Engine

  1. The system loads an ee.FeatureCollection containing Indonesian administrative boundaries from Badan Informasi Geospasial. The data must be uploaded as a GEE asset, and the asset must be shared publicly with its link provided.
  2. The system extracts the regency names and provides the complete list of available regencies to the user.
  3. The system converts the ee.FeatureCollection to a GeoDataFrame for visualization, while the unconverted ee.FeatureCollection is used as the AOI for the imagery search.
TipRelated Function

input_utils.load_regency_asset: fetches the GEE asset containing Indonesian administrative boundaries

input_utils.get_regency_name: creates the list of regency names for the user

input_utils.get_regency_geometry: gets the geometry of the selected regency

input_utils.convert_gee_features_to_gdf: converts the ee.FeatureCollection to a GeoDataFrame for visualization in the system
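The regency drop-down and lookup can be illustrated on GeoJSON-like features, the structure an ee.FeatureCollection takes once fetched to the client. This is a hypothetical sketch: the property name REGENCY_NAME and both helpers are assumptions, not the fields of the BIG asset or the input_utils functions.

```python
# Hypothetical attribute holding the regency name; the real asset's field may differ
NAME_FIELD = "REGENCY_NAME"


def get_regency_names(features, name_field=NAME_FIELD):
    """Sorted, de-duplicated regency names for the drop-down list."""
    names = {feature["properties"].get(name_field) for feature in features}
    # Drop missing or empty names before sorting
    return sorted(name for name in names if name)


def get_regency_feature(features, name, name_field=NAME_FIELD):
    """Return the first feature whose name matches the user's selection."""
    for feature in features:
        if feature["properties"].get(name_field) == name:
            return feature
    raise ValueError(f"Regency not found: {name}")


# Illustrative features only (no real geometries); one regency appears twice,
# as a multi-part regency might in a boundary dataset
SAMPLE_FEATURES = [
    {"type": "Feature", "properties": {NAME_FIELD: "Bogor"}, "geometry": None},
    {"type": "Feature", "properties": {NAME_FIELD: "Bandung"}, "geometry": None},
    {"type": "Feature", "properties": {NAME_FIELD: "Bogor"}, "geometry": None},
]
```

With the sample data, get_regency_names returns ["Bandung", "Bogor"]. In Earth Engine itself, the same steps would typically run server-side, for example fc.aggregate_array(field).distinct().sort() for the name list and fc.filter(ee.Filter.eq(field, name)) for the selection.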

2.2 Correction of Shapefile Geometries

2.2.1 Correction of Shapefile Geometries

Luma User Journey

This is a part of User Journey 1.1: Prerequisite Check

  1. After the user uploads the AOI, they receive continuous feedback on each step of the validation process.
NoteSuccess Notification

The system shows a confirmation when the AOI is loaded successfully.

ImportantError Handling Notification

The system provides an error notification if shapefile validation fails.

  2. Once the shapefile input is validated successfully, the system displays the AOI on a map canvas.

Luma Geospatial Engine

This is a part of System Response 1.1: Area of Interest Definition

  1. The system performs a multi-step geometry sanitization process before converting any shapefile or GeoDataFrame into Earth Engine objects. These input validation processes for shapefiles are designed mainly for Streamlit compatibility. Each correction is handled by exactly one dedicated function:

    • The Coordinate Reference System (CRS) is automatically corrected and reprojected to WGS 1984 (EPSG:4326).
    TipRelated Functions

    input_utils.shapefile_validator.fix_crs(): to resolve coordinate reference system issues.

    • Invalid, empty, or topologically broken geometries are detected and repaired.
    TipRelated Functions

    input_utils.shapefile_validator.clean_geometries(): to repair invalid geometries.

    • An initial structural validation is performed and basic geometry issues are fixed.
    TipRelated Functions

    input_utils.shapefile_validator.validate_and_fix_geometry(): to validate and fix geometry issues.

    • Polygon-specific validation, including removal of invalid coordinates and simplification of overly complex shapes.
    TipRelated Functions

    input_utils.shapefile_validator.validate_polygons(): to check point geometries, remove invalid coordinates, and simplify polygon geometries.

    • Every individual coordinate is checked to ensure it falls within valid global latitude/longitude ranges.
    TipRelated Functions

    input_utils.shapefile_validator.is_valid_coordinate(): to check that each coordinate falls within valid global latitude/longitude ranges.

    • The number of vertices in each geometry is counted and logged (for diagnostic and simplification decisions).
    TipRelated Functions

    input_utils.shapefile_validator.count_vertices(): to count and log the number of vertices in each geometry.

    • Finally, a comprehensive pre-upload validation is executed to guarantee that every geometry is valid, non-empty, correctly typed, and fully compatible with Earth Engine.
    TipRelated Functions

    input_utils.shapefile_validator.final_validation(): to run the comprehensive pre-upload validation before conversion to Earth Engine.

  2. After the shapefile is validated, the system converts the GeoDataFrame into an Earth Engine geometry.

TipRelated Functions

input_utils.EE_converter.convert_aoi_gdf(): to convert the input GeoDataFrame into an ee.Geometry. The function defines three conversion approaches, tried in order:

  • Primary method: using geemap.gdf_to_ee().

  • Secondary method (if the primary method fails): manual GeoJSON conversion. Multiple geometries are first merged by union (gpd.GeoDataFrame([{'geometry': union_geom}], crs=gdf.crs)), and the result is converted to GeoJSON (gdf.to_json()). Only Polygon and MultiPolygon geometry types are supported by this method.

  • Fallback method (if both the primary and secondary methods fail): bounding-box conversion, which approximates the AOI with a rectangle (ee.Geometry.Rectangle([bounds[0], bounds[1], bounds[2], bounds[3]])).
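The coordinate check, vertex count, and bounding-box fallback described above reduce to simple operations on coordinate rings. The sketch below works on plain (lon, lat) tuples rather than Shapely or GeoPandas objects, and every function name is illustrative, not the shapefile_validator or EE_converter API.

```python
def is_valid_coordinate(lon, lat):
    """True if lon/lat fall inside WGS 84 ranges (NaN fails every comparison)."""
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0


def drop_invalid_coordinates(ring):
    """Keep only the valid (lon, lat) pairs of a ring.

    Re-closing and re-validating the ring afterwards is left out of this sketch.
    """
    return [(lon, lat) for lon, lat in ring if is_valid_coordinate(lon, lat)]


def count_vertices(rings):
    """Total vertices across a polygon's rings (closing vertex included)."""
    return sum(len(ring) for ring in rings)


def bounding_box(rings):
    """Return [xmin, ymin, xmax, ymax], the order ee.Geometry.Rectangle accepts."""
    lons = [lon for ring in rings for lon, _ in ring]
    lats = [lat for ring in rings for _, lat in ring]
    return [min(lons), min(lats), max(lons), max(lats)]


# Illustrative square AOI: one closed exterior ring
SAMPLE_RINGS = [
    [(106.7, -6.6), (106.9, -6.6), (106.9, -6.4), (106.7, -6.4), (106.7, -6.6)]
]
```

For the sample ring, count_vertices gives 5 and bounding_box gives [106.7, -6.6, 106.9, -6.4]. This is the same order as GeoDataFrame.total_bounds, so the list could be passed straight to ee.Geometry.Rectangle in the fallback step.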

2.3 Selection and Preparation of Satellite Imagery

2.3.1 Satellite Data Acquisition and Filtering

Luma User Journey

This is a part of User Journey 1.2: Determining Satellite Imageries Parameters

  1. The user is prompted to fill in these parameters for the satellite imagery:

    • Target map date: Choose the period of the satellite imagery as a single year (YYYY) or a full date (YYYY-MM-DD). For the YYYY format, the available years range from 1972 to the present year, with 2020 as the default.

    • Spatial resolution: Specify the desired spatial resolution. During Phase 1, only Landsat data are used; therefore, the spatial resolution is set to 30 meters by default.

    • Sensor type: a list of the sensors available for the selected target map date. Landsat 8 OLI is the default option. If the selected period falls between 1972 and 1984, a warning message notes the limited availability of Landsat 1-3 data.

      ImportantError Handling Notification

      Since only Landsat data is utilized, the system provides an error notification if no Landsat sensor is available in the selected time range.

    • Maximum cloud cover: Set a preferred cloud cover threshold, or use the default value of 30%.

  2. The system reports satellite image availability based on the user’s parameter input. The information includes the total number of images found, WRS path/row tiles, scene IDs, image acquisition dates, average scene cloud cover, date range, and cloud cover range.

    NoteSuccess Notification

    The system shows the total number of satellite images found.

    ImportantError Handling Notification

    The system provides an error notification if no satellite images are found for the user’s parameters. The user is prompted to change the cloud cover threshold or the target map date.
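The sensor list, the 1972-1984 warning, and the no-sensor error can be sketched as a lookup over mission years. This is an illustration under stated assumptions: available_sensors is a hypothetical helper, not a Luma function, and the dictionary follows the Landsat mission list in this module, with Landsat 2 starting in its 1975 launch year.

```python
from datetime import date

# Mission years (start, end); None means "to the present year"
LANDSAT_MISSIONS = {
    "Landsat 1 MSS": (1972, 1978),
    "Landsat 2 MSS": (1975, 1982),
    "Landsat 3 MSS": (1978, 1983),
    "Landsat 4 TM": (1982, 1993),
    "Landsat 5 TM": (1984, 2012),
    "Landsat 7 ETM+": (1999, 2021),
    "Landsat 8 OLI": (2013, None),
    "Landsat 9 OLI-2": (2021, None),
}


def available_sensors(year, present_year=None):
    """Return (sensors, warning) for a target year; raise if none are available."""
    present_year = present_year or date.today().year
    if not 1972 <= year <= present_year:
        raise ValueError(f"Year must be between 1972 and {present_year}.")
    sensors = [
        name
        for name, (start, end) in LANDSAT_MISSIONS.items()
        if start <= year <= (present_year if end is None else end)
    ]
    if not sensors:
        # Mirrors the "no Landsat sensor available" error notification
        raise ValueError("No Landsat sensor is available for the selected period.")
    warning = None
    if year <= 1984:
        warning = "Landsat 1-3 data availability is limited between 1972 and 1984."
    return sensors, warning
```

For 2020 the sketch returns Landsat 7 ETM+ and Landsat 8 OLI with no warning; for 1980 it returns Landsat 2 and 3 MSS with the warning. With these mission years every year from 1972 onward has at least one sensor, so the no-sensor branch only guards against later edits to the table.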

Luma Geospatial Engine

This is a part of System Response 1.2: Search and Filter Imagery

  1. The system specifies Landsat data collections for optical processing.

    TipRelated Functions

    data_acquisition.Reflectance_Data.OPTICAL_DATASETS defines the properties of the Landsat Surface Reflectance (SR) collections (Collection 2, Level-2) fetched for optical data processing. The list also defines the metadata used to report data retrieval, and additional satellite sensors can be defined here.

    data_acquisition.Reflectance_Data.THERMAL_DATASETS defines the properties of the Landsat Top-of-Atmosphere (TOA) thermal bands. Because Landsat missions use different band name designations, the thermal band of each mission is specified in this list.

    The system currently supports the Landsat 1-3 RAW collections and the Landsat 4-9 surface reflectance (SR) collections. Additionally, Landsat 4-9 Top-of-Atmosphere (TOA) data are used to retrieve the thermal band as an additional input band for the classification. This choice stems from the quality of the thermal band in the Landsat SR collections, which often contains no-data pixels and is prone to over-correction over high-contrast land cover features. Landsat mission availability is as follows:

    • Landsat 1 Multispectral Scanner/MSS (1972 - 1978)
    • Landsat 2 Multispectral Scanner/MSS (1975 - 1982)
    • Landsat 3 Multispectral Scanner/MSS (1978 - 1983)
    • Landsat 4 Thematic Mapper/TM (1982 - 1993)
    • Landsat 5 Thematic Mapper/TM (1984 - 2012)
    • Landsat 7 Enhanced Thematic Mapper Plus/ETM+ (1999 - 2021)
    • Landsat 8 Operational Land Imager/OLI (2013 - present)
    • Landsat 9 Operational Land Imager-2/OLI-2 (2021 - present)
  2. The system retrieves images that match the user’s selected criteria (spatial resolution, year, and cloud cover) using the following input parameters: AOI, target map date, and cloud cover threshold.

    TipRelated Functions

    data_acquisition.parse_date_input(): parses the date input into YYYY-MM-DD format. It accepts either a year (int or 4-digit string) or a full date string. For year inputs, it returns January 1 for start dates or December 31 for end dates. This function is used by get_optical_data and get_thermal_bands.

    data_acquisition.Reflectance_Data.get_optical_data(): to retrieve the Landsat image collection based on the input parameters. This function uses mask_landsat_sr, apply_scale_factors, and rename_landsat_bands to return a masked, scaled, and renamed image collection.

    • The system performs cloud masking using the pixel quality assessment band (QA_PIXEL), which encodes pixel conditions as bitwise flags. Deterministic bit tests are applied to identify and exclude pixels affected by cirrus, clouds, and cloud shadows, with the bit positions adjusted to the Landsat sensor selected by the user. For Landsat 8-9, QA_PIXEL also stores per-pixel confidence levels for cloud, cirrus, and cloud shadow. These confidence values (ranging from 0 to 3) are extracted and filtered with hard-coded thresholds so that pixels with medium or high confidence are masked out. The other Landsat sensors lack these confidence bits, so only the deterministic bits are used for their cloud masking.

      TipRelated Functions

      data_acquisition.Reflectance_Data.mask_landsat_sr(): to mask clouds, cloud shadows, and cirrus in Landsat Collection 2 SR data using the QA_PIXEL band and confidence thresholds.

    • The system applies scaling to the optical reflectance bands to convert the raw digital numbers (DNs) into physically meaningful surface reflectance values. A scale factor of 0.0000275 and an offset of -0.2 are applied, as defined in the Landsat Collection 2 Surface Reflectance metadata. This conversion standardizes the reflectance values to a range of approximately 0 to 1, enabling accurate comparison and analysis across scenes and sensors.

      TipRelated Functions

      data_acquisition.Reflectance_Data.apply_scale_factors(): to apply the Landsat Collection 2 scaling factors using the formula Digital Number (DN) * scale_factor + offset. The scaling is applied only to the optical bands.

    • The system renames the bands according to the sensor type.

      TipRelated Functions

      data_acquisition.Reflectance_Data.rename_landsat_bands(): to standardize Landsat Surface Reflectance (SR) band names by sensor type, e.g. from 'SR_B*' or 'B*' names to 'NIR', 'GREEN', etc.

  3. The system then uses the same parameters as for the multispectral data to search the Collection 2 TOA data, except for Landsat 1-3 MSS. The TOA thermal band (band 10 for Landsat 8-9) is then stacked with the multispectral bands from the SR data. Since band 11 has a larger calibration uncertainty, only band 10 is used. No additional pre-processing is applied to the thermal band, since its band values alone are considered sufficient for discriminating land cover features.

    TipRelated Functions

    data_acquisition.Reflectance_Data.has_thermal_capability(): checks whether the selected Landsat mission has thermal bands.

    data_acquisition.Reflectance_Data.get_thermal_bands(): to retrieve the thermal band from Landsat Collection 2 TOA data.

  4. The system evaluates whether the retrieved imagery meets the required spatial resolution, input year, and cloud cover thresholds. If suitable images are found, the system generates a composite by calculating the median value across the selected imagery, using Earth Engine's median() function. The system then clips and mosaics the satellite imagery.
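The date parsing, QA_PIXEL bit tests, and scaling steps above can be illustrated on plain Python integers. This is a minimal sketch, not the data_acquisition code: parse_date_input is re-implemented from its description with a hypothetical is_start flag, the bit positions follow the USGS Collection 2 QA_PIXEL layout, and the threshold of 2 encodes the "medium to high confidence" rule.

```python
from datetime import date


def parse_date_input(value, is_start=True):
    """Return YYYY-MM-DD from a year (int or 4-digit string) or a full date."""
    text = str(value).strip()
    if len(text) == 4 and text.isdigit():
        # Year input: Jan 1 for a start date, Dec 31 for an end date
        return f"{text}-01-01" if is_start else f"{text}-12-31"
    # Full date input: validate and normalise (raises ValueError if malformed)
    return date.fromisoformat(text).isoformat()


# Landsat Collection 2 QA_PIXEL bit positions
CIRRUS_BIT = 2        # cirrus flag (Landsat 8-9)
CLOUD_BIT = 3
SHADOW_BIT = 4
CLOUD_CONF_BIT = 8    # bits 8-9
SHADOW_CONF_BIT = 10  # bits 10-11
CIRRUS_CONF_BIT = 14  # bits 14-15 (Landsat 8-9)
FLAG_MASK = (1 << CIRRUS_BIT) | (1 << CLOUD_BIT) | (1 << SHADOW_BIT)


def confidence(qa, start_bit):
    """Extract a 2-bit confidence value (0 none, 1 low, 2 medium, 3 high)."""
    return (qa >> start_bit) & 0b11


def is_clear(qa, use_confidence=True, threshold=2):
    """True if the pixel survives the deterministic and confidence-based tests."""
    if qa & FLAG_MASK:
        return False
    if use_confidence:
        for start_bit in (CLOUD_CONF_BIT, SHADOW_CONF_BIT, CIRRUS_CONF_BIT):
            if confidence(qa, start_bit) >= threshold:
                return False
    return True


# Collection 2 Level-2 surface reflectance scaling
SR_SCALE = 0.0000275
SR_OFFSET = -0.2


def scale_reflectance(dn):
    """Convert a surface reflectance DN into reflectance: DN * scale + offset."""
    return dn * SR_SCALE + SR_OFFSET
```

For instance, QA value 21824 (low confidence in every category) passes, 22280 (cloud bit set) is masked, and DNs 7273 and 43636 scale to roughly 0 and 1. In Earth Engine the same tests are applied per pixel with image methods such as qa.bitwiseAnd(1 << 3).eq(0) and qa.rightShift(8).bitwiseAnd(3).lt(2).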

2.4 Mosaic and Composite for Satellite Imagery

Luma Geospatial Engine

This is a part of System Response 1.2: Search and Filter Imagery

The final single-layer image presented to the user is created with the final_Image methods, which produce either mosaic imagery or temporal aggregation composite imagery.

  1. Mosaicking

    The mosaicking approach stacks all images in the collection and, for each pixel, selects the value from the topmost image according to the collection’s ordering. The system implements a quality mosaic, in which a quality band sets the per-pixel ordering of the images.

    TipRelated Functions

    data_acquisition.final_Image.get_quality_mosaic creates a quality mosaic image based on a quality band (source band). add_quality_band computes or selects the band used to create the mosaic. Three quality bands are supported, including NDVI and NIR. For each pixel, the mosaic takes the value from the image with the highest value in the quality band.

  2. Temporal Aggregation

    Temporal aggregation uses statistical reducers (such as the median) to combine pixel values from an image collection over time. A median composite often produces a noticeably cleaner image because clouds and other high-reflectance features tend to occupy the upper tail of the distribution. The median therefore naturally excludes these features, resulting in fewer artifacts and more consistent surface reflectance. However, if a location has no valid pixel in any image of the collection, it appears as no data (a blank region) in the composite. The temporal aggregation function is paired with calculate_coverage to report the valid pixels within the AOI.

    TipRelated Functions

    data_acquisition.final_Image.get_temporal_composite: creates a temporal aggregation composite using Earth Engine reducers. Supported reducers are median, mean, min, max, and percentile.

  3. Valid Pixel Reporting

    To provide a comprehensive report on the final image, the system reports the date range over which the composite is created. Additionally, the valid-pixel and no-data proportions are estimated and reported to the user.

    TipRelated Functions

    calculate_coverage estimates the valid pixels (pixels not removed by cloud masking) within the AOI, which quantifies the 'no data' or blank areas in the final image. Because clouds within the AOI should already be removed by cloud masking and temporal aggregation, this valid-pixel report serves as an alternative way of reporting cloud cover within the AOI. The function is used within get_temporal_composite, and its result is reported during final image creation.
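The difference between a quality mosaic and a median composite, and the valid-pixel percentage reported with it, can be shown on a tiny collection of one-band images stored as Python lists, with None marking masked pixels. This is an illustration only: the helper names and sample values are hypothetical, and in Earth Engine the corresponding operations are ee.ImageCollection.qualityMosaic() and median().

```python
from statistics import median


def median_composite(images):
    """Per-pixel median over valid values; None where every image is masked."""
    composite = []
    for values in zip(*images):
        valid = [v for v in values if v is not None]
        composite.append(median(valid) if valid else None)
    return composite


def quality_mosaic(images, quality):
    """Per pixel, keep the value from the valid image with the highest score."""
    mosaic = []
    for values, scores in zip(zip(*images), zip(*quality)):
        candidates = [(s, v) for v, s in zip(values, scores)
                      if v is not None and s is not None]
        # max() with a key returns the first candidate on ties
        mosaic.append(max(candidates, key=lambda c: c[0])[1] if candidates else None)
    return mosaic


def coverage_percent(image):
    """Share of pixels holding a valid (unmasked) value, in percent."""
    return 100.0 * sum(v is not None for v in image) / len(image)


# Three images of four pixels each; None = removed by cloud masking
IMAGES = [
    [10, None, 30, None],
    [20, 50, None, None],
    [90, 40, 35, None],
]
# Matching quality scores (e.g. NDVI-like values scaled to integers)
QUALITY = [
    [1, 5, 2, 0],
    [3, 1, 7, 0],
    [2, 9, 1, 0],
]
```

At the second pixel the median averages 50 and 40 into 45.0, while the quality mosaic keeps 40 from the image with the highest score. The last pixel is masked everywhere, so both results contain a no-data gap and coverage_percent reports 75.0.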

2.5 Visualization and Saving of Processed Satellite Imagery

2.5.1 Visualization and Saving Processed Satellite Imagery

Luma User Journey

This is a part of User Journey 1.3: Download Satellite Imagery

  1. The system visualizes the satellite imagery on a map canvas using a true color composite (RED, GREEN, BLUE bands) as the default visualization. The map is centered on the AOI.

  2. The user can change the imagery composite to one of several commonly used composites or manually define their own band composite.

  3. The user can also change visualization parameters to adjust the imagery brightness and contrast.

  4. The user can download the satellite imagery through two options:

    1. Direct download: Downloads the image locally to the user’s device. This option is only available if the satellite image is smaller than 32 MB.
    NoteSuccess Notification

    The system shows a confirmation if the URL for direct download of satellite imagery is successfully made.

    ImportantError Handling Notification

    The system provides an error notification if it fails to create the direct download URL and suggests exporting to Google Cloud Storage instead or using a smaller AOI.

    2. Export to Google Cloud Storage: Exports the image to Google Cloud Storage.
    NoteSuccess Notification

    The system shows a confirmation if the satellite image is successfully saved to Google Cloud Storage.

    ImportantError Handling Notification

    The system provides an error notification if the system fails to export the image to Google Cloud Storage.

  5. The user is given the option to continue to Module 2.

    ImportantError Handling Notification

    The system disables the option to continue to the next module if the system does not detect a satellite imagery saved inside the system.
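Whether an image fits under the 32 MB direct-download limit can be estimated before requesting a URL. The sketch assumes uncompressed 4-byte float values, a hypothetical default of seven bands, an AOI area in square kilometers, and 32 MB read as 32 × 1024 × 1024 bytes. Actual file sizes also depend on the bounding region, data type, and compression, so the estimate is only a rough guide.

```python
import math

# 32 MB direct-download limit, interpreted as 32 * 1024 * 1024 bytes
MAX_DIRECT_DOWNLOAD_BYTES = 32 * 1024 * 1024


def estimate_download_bytes(area_km2, scale_m=30, n_bands=7, bytes_per_value=4):
    """Rough uncompressed size: pixel count x bands x bytes per value."""
    pixels = math.ceil(area_km2 * 1_000_000 / scale_m ** 2)
    return pixels * n_bands * bytes_per_value


def can_direct_download(area_km2, **kwargs):
    """True if the estimate is below the limit; otherwise suggest GCS export."""
    return estimate_download_bytes(area_km2, **kwargs) < MAX_DIRECT_DOWNLOAD_BYTES
```

At 30 m resolution, a 1,000 km² AOI with seven float bands comes to 31,111,136 bytes, just under the limit, while 2,000 km² would exceed it unless fewer bands are exported.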

Luma Geospatial Engine

This is a part of System Response 1.3: Download Satellite Imagery and Continue to Next Module

  1. The system collects the statistics of the satellite images found.

    TipRelated Functions

    data_acquisition.Reflectance_Stats.get_collection_statistics(): to get comprehensive statistics about the gathered image collection.

  2. The system provides presets of commonly used band combinations together with optimal value ranges for the imagery. The preset composites defined in Luma-GE are:

    1. True Color Composites (RED, GREEN, BLUE)

    2. False Color Infrared Composites (NIR, RED, GREEN)

    3. Short-wave Infrared Composites (SWIR2, NIR, RED)

    4. Land/Water (NIR, SWIR1, RED)

    TipRelated Function

    data_acquisition.Vis_Params.BAND_COMBINATIONS: the list of preset band combinations and their default values

    data_acquisition.Vis_Params.get_combinations_names(): function to fetch band combinations names from the preset

    data_acquisition.Vis_Params.get_vis_params(): function to retrieve band combinations and visualization parameters from the defined list

  3. The system provides tools for the user to build their own band combinations and modify the imagery minimum/maximum values and gamma value, improving the visualization of the data.

    TipRelated Function

    data_acquisition.build_custom_vis_param(): function to allow the user to build their own band combinations

  4. For details on exporting to Google Cloud Storage, refer to Earth Engine Initialization and Authentication page.

  5. The system saves the satellite imagery as an object to be loaded in the next modules.
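A custom band combination ultimately becomes an Earth Engine visualization dictionary with bands, min, max, and gamma keys. The validator below is a hypothetical sketch of the checks build_custom_vis_param might perform, not its actual code; the preset names mirror the list in this module, and the 0 to 0.3 default stretch is an assumption suited to scaled reflectance.

```python
# Preset band combinations listed in this module
PRESETS = {
    "True Color": ["RED", "GREEN", "BLUE"],
    "False Color Infrared": ["NIR", "RED", "GREEN"],
    "Short-wave Infrared": ["SWIR2", "NIR", "RED"],
    "Land/Water": ["NIR", "SWIR1", "RED"],
}


def build_vis_params(bands, vmin=0.0, vmax=0.3, gamma=1.0, available=None):
    """Return an Earth Engine-style visualization dict after basic checks."""
    bands = list(bands)
    # Earth Engine renders one band as grayscale or three bands as RGB
    if len(bands) not in (1, 3):
        raise ValueError("Use one band (grayscale) or three bands (RGB).")
    if available is not None:
        missing = [b for b in bands if b not in available]
        if missing:
            raise ValueError(f"Bands not in image: {missing}")
    if vmin >= vmax:
        raise ValueError("min must be smaller than max.")
    if gamma <= 0:
        raise ValueError("gamma must be positive.")
    return {"bands": bands, "min": vmin, "max": vmax, "gamma": gamma}
```

The returned dictionary can be passed as the vis_params argument of geemap's Map.addLayer(image, vis_params, name).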