Metadata-Version: 2.4
Name: satellitetools
Version: 2.1.6
Summary: Retrieve Sentinel-2 data and calculate biophysical parameters
License: MIT
License-File: LICENSE
Author: Olli Nevalainen
Author-email: olli.nevalainen.fmi@gmail.com
Requires-Python: >=3.10,<4
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: earthengine-api (>=0.1.334)
Requires-Dist: gcloud (>=0.18.3)
Requires-Dist: numpy (>=1.23.5)
Requires-Dist: pandas (>=2.1.1)
Requires-Dist: pyproj (>=3.7.0,<4.0.0)
Requires-Dist: pystac-client (>=0.7.5)
Requires-Dist: rasterio (>=1.3.4)
Requires-Dist: scipy (>=1.9.3)
Requires-Dist: setuptools (>=70.0.0)
Requires-Dist: shapely (>=2.0.6,<3.0.0)
Requires-Dist: xarray (>=2022.12.0)
Requires-Dist: xmltodict (>=0.13.0)
Description-Content-Type: text/markdown

[![DOI](https://zenodo.org/badge/270676132.svg)](https://zenodo.org/badge/latestdoi/270676132)

# satellitetools 🛰️ 
This package provides methods to get Sentinel-2 L2A data from Google Earth Engine 
(<https://developers.google.com/earth-engine/>) or from cloud-optimized GeoTIFFs in the AWS Open Data registry 
maintained by Element84 (<https://registry.opendata.aws/sentinel-2-l2a-cogs/>, <https://github.com/Element84/earth-search>).

This package has mainly been used in agricultural projects at the Climate System Research unit of 
the Finnish Meteorological Institute. The package is under constant development. 
Apologies for the current lack of documentation. Please feel 
free to contact me if you want more information about this package.

The package also includes a Python implementation of ESA's SNAP Biophysical processor v1, 
which can be used to compute biophysical parameters such as leaf area index (LAI). The original Java code in SNAP is [here](https://github.com/senbox-org/s2tbx/tree/master/s2tbx-biophysical/src/main/java/org/esa/s2tbx/biophysical). Algorithm documentation: <http://step.esa.int/docs/extra/ATBD_S2ToolBox_L2B_V1.1.pdf>.


Documentation: <https://satellitetools.readthedocs.io/>

Source code at: <https://github.com/ollinevalainen/satellitetools>

Citing: If you use this package in your research, I would appreciate it if you cited the relevant version from Zenodo: <https://doi.org/10.5281/zenodo.5993291>

Author: Olli Nevalainen, Finnish Meteorological Institute.



## Installation ##

**From PyPI:**
```console
pip install satellitetools
```

**From GitHub:**
```console
pip install git+https://github.com/ollinevalainen/satellitetools.git
```

**Development branch:**
```console
pip install git+https://github.com/ollinevalainen/satellitetools.git@develop
```

## About Google Earth Engine ##
Earth Engine must be authenticated and initialized by the user.
Guides to authentication: <https://developers.google.com/earth-engine/guides/auth>

Remember GEE's licence terms.

## About AWS cloud-optimized geotiffs ##
As of 2024-10-23, data for 2022 is still missing from the sentinel-2-c1-l2a collection
because ESA has not yet processed it. Data for 2022 is therefore fetched from the sentinel-2-l2a
collection instead. Follow the discussion and related issues at: <https://github.com/Element84/earth-search/issues>.
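
The fallback described above amounts to a year-based rule. The following is a minimal sketch of that rule only, not the package's actual code; `collection_for_year` is a hypothetical helper name, and only the two collection names come from the text above.

```python
def collection_for_year(year: int) -> str:
    """Return the Earth Search collection to query for a given year.

    Hypothetical helper illustrating the rule described in this README:
    2022 falls back to the older collection, other years use the newer one.
    """
    if year == 2022:
        return "sentinel-2-l2a"
    return "sentinel-2-c1-l2a"


for year in (2021, 2022, 2023):
    print(year, collection_for_year(year))
```

A date range that spans a year boundary would need one lookup per year; that case is left out of the sketch.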

## Warning ##
The biophysical processor implementation in this package does not currently use the 
convex hull check (see the Java code and ATBD) and does not have as extensive 
input/output validity flagging as the original version in SNAP.


## Usage examples ##


**Define area of interest and request parameters:**
```python
from shapely.geometry import Polygon
import satellitetools as sattools
from satellitetools.common.sentinel2 import S2_BANDS_10_20_GEE, S2_FILTER1


qvidja_ec_field_geom = [
    [22.3913931, 60.295311],
    [22.3917056, 60.2951721],
    [22.3922131, 60.2949717],
    [22.3927016, 60.2948124],
    [22.3932251, 60.2946874],
    [22.3931117, 60.2946416],
    [22.3926039, 60.2944037],
    [22.3920127, 60.2941585],
    [22.3918447, 60.2940601],
    [22.391413, 60.2937852],
    [22.3908102, 60.2935286],
    [22.390173, 60.2933897],
    [22.389483, 60.2933106],
    [22.3890777, 60.293541],
    [22.3891442, 60.2936358],
    [22.3889863, 60.2940313],
    [22.3892131, 60.2941537],
    [22.3895462, 60.2942468],
    [22.3899066, 60.2944289],
    [22.3903881, 60.2946329],
    [22.3904738, 60.2948121],
    [22.3913931, 60.295311],
]

qvidja_polygon = Polygon(qvidja_ec_field_geom)
aoi = sattools.AOI("qvidja_ec", qvidja_polygon, "EPSG:4326")
datestart = "2023-06-01"
dateend = "2023-06-30"
bands = sattools.S2Band.get_10m_to_20m_bands()

```

**GEE:**

You need to initialize (and possibly authenticate) Google Earth Engine and define GEE 
as the data source.
```python
import ee
ee.Initialize(project="your-ee-project-name")
datasource = sattools.DataSource.GEE
```


**AWS:**

Define AWS as the data source.
```python
datasource = sattools.DataSource.AWS
```

**Evaluate quality and get the data using a wrapper function:**
```python
req_params = sattools.Sentinel2RequestParams(
    datestart, dateend, datasource, bands
)

df_quality_information, ds_s2_data = sattools.wrappers.get_s2_qi_and_data(
    aoi=aoi,
    req_params=req_params,
    qi_threshold=0.02,
    qi_filter=S2_FILTER1,
)
# df_quality_information will include quality information for all available Sentinel-2 images.

# ds_s2_data will include Sentinel-2 data for the requested bands and metadata for dates passing quality filter (see below more on the quality evaluation).
```
**Alternatively without the wrapper:**

This way you can access the data collection class, which carries some extra metadata and, 
with AWS, the source data items. With the AWS data source you can speed up the process by 
fetching multiple images concurrently using the multiprocessing parameter.
```python
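import os

# Assumption: the collection classes are importable from the gee and aws
# submodules; adjust these imports if your installed version differs.
from satellitetools.aws import AWSSentinel2DataCollection
from satellitetools.gee import GEESentinel2DataCollection

if req_params.datasource == sattools.DataSource.GEE:
    s2_data_collection = GEESentinel2DataCollection(aoi, req_params)
elif req_params.datasource == sattools.DataSource.AWS:
    # Leave one core free, but always use at least one process (or pick something else).
    number_of_processes = max(1, (os.cpu_count() or 2) - 1)
    s2_data_collection = AWSSentinel2DataCollection(aoi, req_params, multiprocessing=number_of_processes)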
    s2_data_collection.search_s2_items()

s2_data_collection.get_quality_info()
qi_threshold = 0.02  # if not specified, the default of 0 is used
qi_filter = S2_FILTER1
s2_data_collection.filter_s2_items(qi_threshold, qi_filter)
s2_data_collection.get_s2_data()
s2_data_collection.data_to_xarray()

df_quality_information = s2_data_collection.quality_information
ds_s2_data = s2_data_collection.xr_dataset
```

**Calculate vegetation variables:**
```python
# Example to calculate LAI and NDVI.
variable = sattools.biophys.BiophysVariable.LAI
ds_s2_data = sattools.biophys.run_snap_biophys(ds_s2_data, variable)
vi = sattools.biophys.VegetationIndex.NDVI
ds_s2_data = sattools.biophys.compute_vegetation_index(ds_s2_data, vi)
# Both LAI and NDVI are now included in the dataset under the variable names lai and ndvi.

# Other biophysical variables and vegetation indices are also available
# (the enum definitions below are shown for reference):
class VegetationIndex(str, Enum):
    NDVI = "ndvi"
    CI_RED_EDGE = "ci_red_edge"
    GCC = "gcc"


class BiophysVariable(str, Enum):
    FAPAR = "fapar"
    FCOVER = "fcover"
    LAI = "lai"
    LAI_Cab = "lai_cab"
    LAI_Cw = "lai_cw"
```
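
For orientation, NDVI is the normalized difference of near-infrared and red reflectance, `(NIR - red) / (NIR + red)`. The sketch below only illustrates that formula on plain numbers; it is not the package's implementation, and the choice of bands (commonly B8 for NIR and B4 for red on Sentinel-2) is an assumption here.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from two reflectance values."""
    denominator = nir + red
    if denominator == 0:
        # Undefined when both reflectances are zero (e.g. no-data pixels).
        return float("nan")
    return (nir - red) / denominator


# Hypothetical reflectance values: dense vegetation reflects strongly in NIR
# and weakly in red, so its NDVI is high; bare soil gives a low value.
print(ndvi(nir=0.45, red=0.05))
print(ndvi(nir=0.25, red=0.20))
```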

## Testing ##
Tests have been written for pytest.

If you need to specify a Google Earth Engine project for ee.Initialize() when running the tests, 
set the "EE_PROJECT_PYTEST" environment variable. For example:
```console
export EE_PROJECT_PYTEST=your-ee-project
```
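
How the test suite reads this variable is not shown in this README. One plausible pattern, given purely as an assumption, is to turn the variable into keyword arguments for `ee.Initialize()`; `ee_initialize_kwargs` is a hypothetical helper name.

```python
import os


def ee_initialize_kwargs(environ=None) -> dict:
    """Build keyword arguments for ee.Initialize() from the environment.

    Hypothetical helper: returns {"project": ...} when EE_PROJECT_PYTEST is
    set to a non-empty value, otherwise an empty dict.
    """
    environ = os.environ if environ is None else environ
    project = environ.get("EE_PROJECT_PYTEST")
    return {"project": project} if project else {}


# In a test setup this could be used as:
#     import ee
#     ee.Initialize(**ee_initialize_kwargs())
print(ee_initialize_kwargs({"EE_PROJECT_PYTEST": "your-ee-project"}))
```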

## About the satellite data filtering process ##

The satellite data is filtered based on the information available in the scene 
classification band (SCL) of the Sentinel-2 Level 2A products. 
In the SCL band every pixel is classified into one of the following classes: 

```python
class SCLClass(Enum):
    """Sentinel-2 Scene Classification Layer (SCL) classes."""
    NODATA = 0
    SATURATED_DEFECTIVE = 1
    DARK_FEATURE_SHADOW = 2
    CLOUD_SHADOW = 3
    VEGETATION = 4
    NOT_VEGETATED = 5
    WATER = 6
    UNCLASSIFIED = 7
    CLOUD_MEDIUM_PROBA = 8
    CLOUD_HIGH_PROBA = 9
    THIN_CIRRUS = 10
    SNOW_ICE = 11
```

The percentage of each of those classes within the polygon/area of interest (AOI) is 
calculated from the SCL data. This information can be used to 
filter the dates for which the actual data is requested from GEE or AWS. 
Two parameters, qi_threshold and qi_filter, control the filtering. 
The qi_filter is a list of the (unwanted) classes whose percentages within the AOI are 
summed. Currently, qi_filter is by default this:

```python
S2_FILTER1 = [
    "NODATA",
    "SATURATED_DEFECTIVE",
    "CLOUD_SHADOW",
    "UNCLASSIFIED",
    "CLOUD_MEDIUM_PROBA",
    "CLOUD_HIGH_PROBA",
    "THIN_CIRRUS",
    "SNOW_ICE",
]
```
These classes are considered unwanted, and the user can change the classes included in that
list with the qi_filter parameter. If the sum of the percentages of the classes defined 
in the qi_filter is less than the qi_threshold, the data is considered good/acceptable. 
I have been using a threshold of 0.02, which I have tested empirically and found to be suitable for our 
purposes.
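
The rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the package's code: the helper names and the example values are hypothetical, the class shares are assumed to be fractions between 0 and 1 (consistent with a threshold of 0.02), and the strict "less than" comparison is taken from the wording above.

```python
from typing import Mapping, Sequence

# Same class names as in the default filter listed above.
UNWANTED_CLASSES = [
    "NODATA",
    "SATURATED_DEFECTIVE",
    "CLOUD_SHADOW",
    "UNCLASSIFIED",
    "CLOUD_MEDIUM_PROBA",
    "CLOUD_HIGH_PROBA",
    "THIN_CIRRUS",
    "SNOW_ICE",
]


def unwanted_fraction(
    class_fractions: Mapping[str, float],
    qi_filter: Sequence[str] = UNWANTED_CLASSES,
) -> float:
    """Sum the AOI fractions of the classes listed in qi_filter."""
    return sum(class_fractions.get(name, 0.0) for name in qi_filter)


def passes_quality_filter(
    class_fractions: Mapping[str, float],
    qi_threshold: float = 0.02,
    qi_filter: Sequence[str] = UNWANTED_CLASSES,
) -> bool:
    """Return True when the summed unwanted fraction is below the threshold."""
    return unwanted_fraction(class_fractions, qi_filter) < qi_threshold


# Hypothetical SCL class fractions within the AOI for two acquisition dates.
clear_day = {"VEGETATION": 0.97, "NOT_VEGETATED": 0.02, "CLOUD_SHADOW": 0.01}
cloudy_day = {"VEGETATION": 0.60, "CLOUD_HIGH_PROBA": 0.35, "CLOUD_SHADOW": 0.05}

print(passes_quality_filter(clear_day))   # True: 0.01 is below 0.02
print(passes_quality_filter(cloudy_day))  # False: the unwanted share is about 0.40
```

Passing a shorter `qi_filter`, for example without `"SNOW_ICE"`, would let snow-covered dates through while still rejecting clouds.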

The SCL band provided in the S2 Level-2A products is automatically generated and contains 
errors/misclassifications. This also affects the accuracy of the filtering 
process, and some bad data acquisition dates might pass the filter.


## TODO ##
**Realistic:**

- [ ] Include example notebook in the documentation
- [ ] Examples to docstrings
- [ ] Improve tests
- [ ] Add coverage and docs building to nox sessions
- [ ] Document branching strategy and add contributing guidelines
- [ ] Refactor ee_* functions in gee.py. They have a lot of repetition.
- [ ] Calculate vegetation indices directly in GEE cloud and return only timeseries

**Wishful thinking:**
- [ ] Point-based AOI (currently AOI must be a polygon)
- [ ] Add Landsat data support
- [ ] Add Sentinel-1 data support
- [ ] Large spatial scale data retrieval from AWS: yield daily xr_datasets
    - currently all data is fetched first to memory

## History of major versions ##

### v2.0.0 ###

The package ended up being quite useful in multiple research projects, so its development has been continuous. This release is a milestone after years of small changes and a recent major refactoring of the code during the past two months. The develop branch has now finally been merged to master.

In the future, the plan is to publish releases more frequently, whenever developments in the develop branch get merged to the master branch.

Too much has changed compared to the first release to list everything, but here is a summary of changes with respect to the develop branch as it was before the major refactoring done during Sep-Dec 2024:

**Breaking changes:** :warning:

* Renamed xrtools.py to timeseries.py
* RequestParams class is now Sentinel2RequestParams in sentinel2 submodule
* Restructured files and removed nesting. For example, gee is now imported as satellitetools.gee instead of satellitetools.gee.gee
* The quality information dataframe no longer has a `"Date"` column; instead, it has an index named `"acquisition_time"` holding UTC-aware timestamps (from `pd.to_datetime`).
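
Code that relied on the old `"Date"` column can derive it from the new index. The snippet below is a minimal sketch on a stand-in dataframe; the timestamps and the `CLOUD_SHADOW` column are made-up example data, and only the index name `"acquisition_time"` comes from the change described above.

```python
import pandas as pd

# Stand-in for the quality information dataframe (hypothetical values).
index = pd.to_datetime(["2023-06-03T09:50:31Z", "2023-06-08T09:50:29Z"], utc=True)
df_quality_information = pd.DataFrame({"CLOUD_SHADOW": [0.0, 0.1]}, index=index)
df_quality_information.index.name = "acquisition_time"

# Recreate a plain calendar-date column from the UTC-aware index.
df_with_date = df_quality_information.assign(Date=df_quality_information.index.date)
print(df_with_date)
```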

**New features:** 🔧 

* Refactored the code to be more object-oriented and modular.
* There are now parent classes `Sentinel2DataCollection`, `Sentinel2Item` and `Sentinel2Metadata`, which have data-source-specific child classes:
    - `GEESentinel2DataCollection`
    - `AWSSentinel2DataCollection`, `AWSSentinel2Item`, `AWSSentinel2Metadata`
* The parent classes have methods that are common for both data sources and the child classes have methods that are specific to the data source.
* Improved handling of Sentinel-2 bands and scene classification classes with `S2Band` and `SCLClass` classes
* Biophysical processor is now a class `SNAPBiophysProcessor`, also the biophysical variables and vegetation indices are now Enum classes.
* Enabled easier importing and access of classes and submodules. For example, you can define the data source with `satellitetools.DataSource.GEE` and the bands with `satellitetools.S2Band.B4`.
* Added tests for pytest.
* Improved docstrings, added examples and updated README.
* Documentation using sphinx
* Testing for python versions 3.10, 3.11 and 3.12 using nox
* Changed from print statements to logging.

### v1.0.0 ###
This release is currently (7 February 2022) in use in Field Observatory (https://www.fieldobservatory.org/) and cited in a scientific article "Towards agricultural soil carbon monitoring, reporting and verification through Field Observatory Network (FiON)" that describes Field Observatory and has been accepted for publishing in EGU's Geoscientific Instrumentation, Methods and Data Systems journal (https://doi.org/10.5194/gi-2021-21).
