Scope
80,000 images and 500,000+ objects (polygons) annotated on Sentinel-2 satellite imagery.
Challenge
Clouds come in a great variety of shapes, and capturing these shapes under different weather and light conditions is extremely challenging. So is identifying thin clouds and distinguishing bright clouds from other bright objects.
Project
Improving methods of identifying clouds to unlock the potential of an unlimited range of satellite imagery use cases, enabling faster, more efficient, and more accurate image-based research.
Solution
Two quality check levels were applied to ensure consistency across the large dataset processed by multiple annotators. Using the Taqadam mobile app on a tablet with a pen simplified the drawing process and allowed faster annotation.
The labeled dataset was used in a competition run by Microsoft AI for Earth and Radiant Earth Foundation, with an award for the best use case.
Problem Overview

Satellite imagery is critical for a wide variety of applications, from disaster management and recovery to agriculture to military intelligence. A major obstacle for all of these use cases is the presence of clouds, which cover over 66% of the Earth’s surface (Xie et al., 2020). Clouds introduce noise and inaccuracy in image-based models and usually have to be identified and removed. Improving methods of identifying clouds can unlock the potential of an unlimited range of satellite imagery use cases, enabling faster, more efficient, and more accurate image-based research.

The labeling project used data from the Sentinel-2 mission, which captures wide-swath, high-resolution, multi-spectral imagery used to monitor land surface conditions and the way they change. For each tile, data is separated into different bands of light across the full visible spectrum, near-infrared, and infrared light. Sentinel-2 imagery has recently been used for critical applications like:
• Tracking an erupting volcano on the Spanish island of La Palma. Satellite images showed the path of lava flowing across the land and helped evacuate towns in danger
• Mapping deforestation in the Amazon rainforest and identifying effective interventions
• Monitoring wildfires in California to identify their sources and track air pollutants

The biggest challenges in cloud detection are identifying thin clouds and distinguishing between bright clouds and other bright objects (Kristollari & Karathanassi, 2020). The three most common approaches are threshold methods, handcrafted models, and deep learning.
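To make the simplest of these approaches concrete, here is a minimal sketch of a threshold method. It is an illustration only, not the method used in this project: the function name `threshold_cloud_mask`, the 0.3 cutoff, and the toy reflectance values are all assumptions chosen for the example.

```python
import numpy as np

def threshold_cloud_mask(bands, threshold=0.3):
    """Flag a pixel as cloud when its mean reflectance exceeds a cutoff.

    bands: array of shape (n_bands, height, width), reflectance in [0, 1].
    threshold: hypothetical cutoff; real values depend on sensor and scene.
    """
    brightness = bands.mean(axis=0)  # average over the band axis
    return brightness > threshold

# Toy 2x2 tile with three visible bands (made-up values).
tile = np.array([
    [[0.05, 0.80], [0.10, 0.45]],
    [[0.04, 0.85], [0.12, 0.50]],
    [[0.06, 0.90], [0.08, 0.40]],
])

mask = threshold_cloud_mask(tile)
cloud_fraction = float(mask.mean())  # share of pixels flagged as cloud
```

A rule like this flags any bright surface, such as snow or sand, as cloud and tends to miss thin clouds, which is exactly the difficulty described above.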

The Project

The availability of labeled data has been a major obstacle to cloud detection efforts. Existing models have often been used as a proxy for ground truth, significantly limiting performance (Zupanc, 2017).

The labels for this dataset were generated using human annotation of the optical bands of Sentinel-2 imagery. As a first step, in 2021, Radiant Earth Foundation ran a contest to crowdsource data labels identifying clouds in satellite imagery, sponsored by Planet, Microsoft AI for Earth, and Azavea. The result is a diverse set of Sentinel-2 scenes labeled for cloudy pixels. To simplify the crowdsourcing task, a generic “cloud” / “no cloud” classification was implemented rather than categorizing clouds by type.

The resulting crowdsourced dataset, while extensive, had varying degrees of label quality. As a second step, with support from Microsoft AI for Earth, Radiant Earth worked with expert annotators at HAIVO by B.O.T to validate and, as needed, revise these labels on the Taqadam mobile app, which is designed for geospatial annotation use cases.
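One common way to quantify how far a crowdsourced label differs from its revised version is intersection over union (IoU) between the two binary cloud masks. The sketch below is an assumption for illustration; the source does not say which agreement measure, if any, was used during validation, and the function name `mask_iou` and the toy masks are hypothetical.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union of two binary cloud masks of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks are empty, so they agree completely
    return float(np.logical_and(a, b).sum() / union)

# Made-up 3x3 masks: 1 = cloud, 0 = no cloud.
crowd_mask = [[1, 1, 0],
              [0, 1, 0],
              [0, 0, 0]]
expert_mask = [[1, 1, 0],
               [0, 1, 1],
               [0, 0, 0]]

agreement = mask_iou(crowd_mask, expert_mask)
```

A score near 1 means the two labels nearly coincide; lower scores point to tiles where the revision changed the label substantially.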

Outcome

The final dataset is a high-quality, human-verified set of cloud labels that spans imagery and cloud conditions across three continents (Africa, South America, and Australia). The dataset has an open license (CC BY 4.0) and will be made publicly available after the competition ends. The labeled dataset was used in a competition run by Microsoft AI for Earth and Radiant Earth Foundation, with an award for the best use case.

https://www.drivendata.org/competitions/83/cloud-cover/page/398/