Metadata-Version: 2.4
Name: mapping-suite-sdk
Version: 1.0.0
Summary: Software development kit for XML-RML mapping rules packaging
License-File: LICENSE
Author: meaningfy.ws
Author-email: hi@meaningfy.ws
Requires-Python: >=3.10,<3.13
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: click (>=8.0,<8.2)
Requires-Dist: gitpython (>=3.1.0,<4.0.0)
Requires-Dist: lxml (>=5.3.0,<6.0.0)
Requires-Dist: openpyxl (>=3.1.0,<4.0.0)
Requires-Dist: opentelemetry-api (==1.29.0)
Requires-Dist: opentelemetry-instrumentation (==0.50b0)
Requires-Dist: opentelemetry-sdk (==1.29.0)
Requires-Dist: pandas (==2.1.4)
Requires-Dist: pydantic (>=2.10.0,<3.0.0)
Requires-Dist: pymongo (>=4.11.0,<5.0.0)
Requires-Dist: typer (>=0.15.0,<0.16.0)
Description-Content-Type: text/markdown

# mapping-suite-sdk

![pylint](https://img.shields.io/badge/PyLint-8.99-yellow?logo=python&logoColor=white)
> **Note:** The Pylint badge is a static indicator. For actual Pylint scores, see the automated Pylint reports in PR comments generated by our CI checks.

[![PyPI version](https://img.shields.io/pypi/v/mapping-suite-sdk.svg)](https://pypi.org/project/mapping-suite-sdk/)
[![PyPI Downloads](https://static.pepy.tech/badge/mapping-suite-sdk)](https://pepy.tech/projects/mapping-suite-sdk)

[![Stack Overflow](https://img.shields.io/badge/stackoverflow-get%20help-black.svg)](https://stackoverflow.com/questions/tagged/mapping-suite-sdk)

[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=meaningfy-ws_mapping-suite-sdk&metric=alert_status)](https://sonarcloud.io/summary/new_code?id=meaningfy-ws_mapping-suite-sdk)
[![Bugs](https://sonarcloud.io/api/project_badges/measure?project=meaningfy-ws_mapping-suite-sdk&metric=bugs)](https://sonarcloud.io/summary/new_code?id=meaningfy-ws_mapping-suite-sdk)
[![Code Smells](https://sonarcloud.io/api/project_badges/measure?project=meaningfy-ws_mapping-suite-sdk&metric=code_smells)](https://sonarcloud.io/summary/new_code?id=meaningfy-ws_mapping-suite-sdk)
[![Coverage](https://sonarcloud.io/api/project_badges/measure?project=meaningfy-ws_mapping-suite-sdk&metric=coverage)](https://sonarcloud.io/summary/new_code?id=meaningfy-ws_mapping-suite-sdk)
[![Duplicated Lines (%)](https://sonarcloud.io/api/project_badges/measure?project=meaningfy-ws_mapping-suite-sdk&metric=duplicated_lines_density)](https://sonarcloud.io/summary/new_code?id=meaningfy-ws_mapping-suite-sdk)
[![Lines of Code](https://sonarcloud.io/api/project_badges/measure?project=meaningfy-ws_mapping-suite-sdk&metric=ncloc)](https://sonarcloud.io/summary/new_code?id=meaningfy-ws_mapping-suite-sdk)
[![Reliability Rating](https://sonarcloud.io/api/project_badges/measure?project=meaningfy-ws_mapping-suite-sdk&metric=reliability_rating)](https://sonarcloud.io/summary/new_code?id=meaningfy-ws_mapping-suite-sdk)
[![Security Rating](https://sonarcloud.io/api/project_badges/measure?project=meaningfy-ws_mapping-suite-sdk&metric=security_rating)](https://sonarcloud.io/summary/new_code?id=meaningfy-ws_mapping-suite-sdk)
[![Technical Debt](https://sonarcloud.io/api/project_badges/measure?project=meaningfy-ws_mapping-suite-sdk&metric=sqale_index)](https://sonarcloud.io/summary/new_code?id=meaningfy-ws_mapping-suite-sdk)
[![Maintainability Rating](https://sonarcloud.io/api/project_badges/measure?project=meaningfy-ws_mapping-suite-sdk&metric=sqale_rating)](https://sonarcloud.io/summary/new_code?id=meaningfy-ws_mapping-suite-sdk)
[![Vulnerabilities](https://sonarcloud.io/api/project_badges/measure?project=meaningfy-ws_mapping-suite-sdk&metric=vulnerabilities)](https://sonarcloud.io/summary/new_code?id=meaningfy-ws_mapping-suite-sdk)

The Mapping Suite SDK (MSSDK) is a software development kit designed to standardize
and simplify the handling of packages that contain transformation rules and related artefacts
for mapping data from XML to RDF using the RDF Mapping Language (RML).

## Mapping package anatomy

A mapping package is a standardized collection of files and directories that contains everything needed to transform data from XML to RDF using the RDF Mapping Language (RML).

### Structure Overview

A mapping package consists of the following core components:

1. **Metadata** - Essential identifying information about the package, including:
   - Identifier
   - Title
   - Issue date
   - Description
   - Mapping version
   - Ontology version
   - Type
   - Eligibility constraints
   - Signature (hash digest for integrity verification)

2. **Conceptual Mapping Asset** - Excel spreadsheets that define high-level mapping concepts and relationships between source data and target ontologies.

3. **Technical Mapping Suite** - A collection of implementation-specific mapping files:
   - RML Mapping files - Define transformations from heterogeneous data structures to RDF

4. **Vocabulary Mapping Suite** - Files that define specific value transformations and mappings between source and target data values (JSON, CSV, XML).

5. **Test Data Suites** - Collections of test data files used for validation and verification of mapping processes.

6. **SPARQL Test Suites** - Collections of SPARQL query files used for testing and validation of the transformed data.

7. **SHACL Test Suites** - Collections of SHACL (Shapes Constraint Language) files used for RDF data validation.
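The signature in item 1 is described only as a hash digest for integrity verification. To illustrate the general idea, here is a stdlib-only Python sketch that hashes a package folder deterministically. The helper names, the use of SHA-256, the sorted file order, and the exclusion of `metadata.json` (which would hold the digest) are all assumptions; this is not necessarily how MSSDK computes or checks its signature.

```python
import hashlib
from pathlib import Path


def compute_package_digest(package_dir: Path, exclude: tuple[str, ...] = ("metadata.json",)) -> str:
    """Hypothetical helper: SHA-256 over the relative path and bytes of every file.

    Assumptions: files are visited in sorted relative-path order so the result does
    not depend on filesystem iteration order, and metadata.json is skipped because
    it would store the digest itself.
    """
    hasher = hashlib.sha256()
    files = sorted(
        (p for p in package_dir.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(package_dir).as_posix(),
    )
    for path in files:
        relative = path.relative_to(package_dir).as_posix()
        if relative in exclude:
            continue
        hasher.update(relative.encode("utf-8"))  # hash the path so renames change the digest
        hasher.update(b"\0")                     # separator between path and content
        hasher.update(path.read_bytes())
        hasher.update(b"\0")
    return hasher.hexdigest()


def verify_package_digest(package_dir: Path, expected: str) -> bool:
    """Hypothetical helper: True if the recomputed digest equals the stored one."""
    return compute_package_digest(package_dir) == expected
```

Hashing each relative path alongside the file bytes means that renaming or moving a file changes the digest even when its content does not.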

### Package Structure Diagram

```
mapping-package/
├── metadata.json                  # Package metadata
├── transformation/                # Transformation assets
│   ├── conceptual_mappings.xlsx   # Excel file with conceptual mappings
│   ├── mappings/                  # Technical mapping suite
│   │   ├── mapping1.rml.ttl       # RML mapping files
│   │   ├── mapping2.rml.ttl
│   │   └── mapping3.rml.ttl
│   └── resources/                 # Vocabulary mapping suite
│       ├── codelist1.json         # Value mapping files in various formats
│       └── codelist2.csv
├── validation/                    # Validation assets
│   ├── shacl/                     # SHACL test suites
│   │   └── shacl_suite1/          # Domain-specific SHACL shapes
│   │       └── shape1.ttl         # SHACL shape files
│   └── sparql/                    # SPARQL test suites
│       └── sparql_suite1/         # Category-specific SPARQL queries
│           ├── query1.rq          # SPARQL query files
│           └── query2.rq
└── test_data/                     # Test data suites
    ├── test_data_suite1/          # Test case directory
    │   └── input.xml              # Input test data
    └── test_data_suite2/          # Another test case directory
        └── input.xml              # Input test data
```

The diagram above shows the **full** layout. Not every package includes every part; which parts are present depends on the package version.

**Version variations:**  
- **v3L (lightweight)** - Only what’s needed for transformation: metadata, technical mappings (RML), and vocabulary resources. It has **no** conceptual mapping, test_data, validation (SHACL/SPARQL), or output.  
- **v1** - `metadata.json` with XSD version constraints (`min_xsd_version`); no `@context`.  
- **v2** - `metadata.json` with eForms SDK constraints (`eforms_sdk_versions`); no `@context`.  
- **v3 (full)** - Metadata is JSON-LD (`@context`, `project_identifier`); the package must include `transformation/conceptual_mappings.xlsx` plus the full transformation/validation/test_data layout.

This structure supports consistent loading, validation, and conversion across versions.
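As an illustration of how these differences could be told apart, the following Python sketch guesses a package's version from `metadata.json` and the presence of `transformation/conceptual_mappings.xlsx`. It is a hypothetical heuristic (`guess_package_version` is not an SDK function), and it assumes that v3L metadata, like v3, carries `@context`; use `load_mapping_package` for real auto-detection.

```python
import json
from pathlib import Path


def guess_package_version(package_dir: Path) -> str:
    """Hypothetical heuristic derived from the version notes; not MSSDK's own detector."""
    metadata = json.loads((package_dir / "metadata.json").read_text(encoding="utf-8"))
    if "@context" in metadata:
        # Assumption: v3 and v3L both use JSON-LD metadata; only full v3 ships conceptual mappings.
        conceptual = package_dir / "transformation" / "conceptual_mappings.xlsx"
        return "v3" if conceptual.is_file() else "v3L"
    if "eforms_sdk_versions" in metadata:
        return "v2"  # eForms SDK constraints, no @context
    if "min_xsd_version" in metadata:
        return "v1"  # XSD version constraints, no @context
    raise ValueError(f"Cannot determine package version for {package_dir}")
```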

## Quick Start

**Requires Python 3.10–3.12.** Python 3.13+ is not yet supported because the SDK pins pandas 2.1.4, which some of our users currently depend on and which is not compatible with Python 3.13.

Install the SDK using pip:
```bash
pip install mapping-suite-sdk
```

or using poetry:
```bash
poetry add mapping-suite-sdk
```

Supported package versions: **v1**, **v2**, **v3**, and **v3L** (v3 lightweight). In addition to the CLI, you can **load, convert, and validate** packages programmatically with **`load_mapping_package`**, which auto-detects the package version, optionally converts it to v3 or v3L, and can validate it and/or persist it to MongoDB. Version-specific loaders are also available.

### Loading a Mapping Package

**Version-agnostic (recommended):** use `load_mapping_package` to load a package from a folder with automatic version detection, optionally converting it to v3 or v3L and validating it:

```python
from pathlib import Path
from mapping_suite_sdk.tools.services.load_mapping_package import load_mapping_package

# Auto-detect version, convert to v3 (full) or v3L (lightweight), optionally validate
package = load_mapping_package(
    Path("/path/to/mapping/package"),
    include_test_data=True,   # True → v3, False → v3L
    validate_package=False,   # set True to validate before/after conversion
)
```

**Version-specific:** load from a folder, an archive, or GitHub (example for v2):

```python
from pathlib import Path
from mapping_suite_sdk.mapping_package_v2.services.load_mapping_package_v2 import (
    load_mapping_package_v2_from_folder,
    load_mapping_package_v2_from_archive,
    load_mapping_packages_v2_from_github,
)

package = load_mapping_package_v2_from_folder(mapping_package_folder_path=Path("/path/to/package"))
# Or: load_mapping_package_v2_from_archive(...), load_mapping_packages_v2_from_github(...)
```

The same pattern applies to v1, v3, and v3L, under `mapping_suite_sdk.mapping_package_v1`, `mapping_suite_sdk.mapping_package_v3`, and so on.

### Serializing a Mapping Package

Version-specific serialisers write a package to a folder (e.g. `serialise_mapping_package_v2_to_folder`). Conversion (see below) also writes the converted package back in place.

### CLI: Update (convert) and validate packages

**Conversion matrix:** The SDK only supports converting **to** v3 or v3L. You can convert from v1, v2, or v3 into v3 or v3L as below. There is no conversion to v1 or v2 (e.g. v1→v2 is not implemented), and no conversion from v3 or v3L back to v1 or v2.

| From → To | v1 | v2 | v3 | v3L |
|-----------|---:|---:|---:|---:|
| v1        | — | ✗ | ✓ | ✓ |
| v2        | ✗ | — | ✓ | ✓ |
| v3        | ✗ | ✗ | — | ✓ |
| v3L       | ✗ | ✗ | ✗ | — |
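The matrix can be encoded as a simple lookup. In this Python sketch, `SUPPORTED_CONVERSIONS` and `conversion_action` are hypothetical names, not SDK APIs; the `"skip"` result mirrors the CLI behaviour of skipping packages that are already in the target version.

```python
# Allowed (from, to) pairs, transcribed from the conversion matrix (hypothetical constant).
SUPPORTED_CONVERSIONS: frozenset[tuple[str, str]] = frozenset({
    ("v1", "v3"), ("v1", "v3L"),
    ("v2", "v3"), ("v2", "v3L"),
    ("v3", "v3L"),
})


def conversion_action(from_version: str, to_version: str) -> str:
    """Hypothetical helper: return 'skip', 'convert', or 'unsupported'."""
    if from_version == to_version:
        return "skip"  # package is already in the target version
    if (from_version, to_version) in SUPPORTED_CONVERSIONS:
        return "convert"
    return "unsupported"  # e.g. v1 -> v2, or any conversion back from v3/v3L
```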

**Update (convert) a single package** (in-place):

```bash
# v1 or v2 → v3 (full package)
mssdk convert --to-version v3 --from-version v2 from-package /path/to/package
mssdk convert --to-version v3 --from-version v1 from-package /path/to/package

# v1, v2, or v3 → v3L (lightweight)
mssdk convert --to-version v3L --from-version v3 from-package /path/to/package
mssdk convert --to-version v3L --from-version v2 from-package /path/to/package
```

**Update all packages in a folder** (in-place):

```bash
mssdk convert --to-version v3 --from-version v2 from-folder /path/to/mappings/folder
mssdk convert --to-version v3L --from-version v3 from-folder /path/to/mappings/folder
```

Packages already in the target version are skipped. Use `--verbose` for detailed logs.

**Validate packages** (structure, hash, etc.) from a folder, an archive, or GitHub:

```bash
# Validate all packages under a folder (optionally fix invalid hashes)
mssdk validate from-folder /path/to/mappings/folder
mssdk validate from-folder --update-hash /path/to/mappings/folder

# Validate a single package from a ZIP
mssdk validate from-archive path/to/package.zip
mssdk validate from-archive --include-test-data path/to/package.zip

# Validate packages from a GitHub repo (quote the pattern so the local shell does not expand it)
mssdk validate from-github https://github.com/org/repo 'mappings/*'
mssdk validate from-github --branch main https://github.com/org/repo 'mappings/*'
```

Validate options: `--include-test-data`, `--include-output`, `--update-hash` (from-folder), `--branch` (from-github), `--verbose`.

## Extractors

Extract packages from ZIP archives or GitHub using `ArchiveExtractor` and `GitHubExtractor`:

```python
from pathlib import Path
from mapping_suite_sdk.core.adapters.extractor import ArchiveExtractor, GitHubExtractor

# Archive: extract to path or temporary directory
extractor = ArchiveExtractor()
output_path = extractor.extract(Path("package.zip"), Path("output_directory"))
with extractor.extract_temporary(Path("package.zip")) as temp_path:
    print(f"Extracted to: {temp_path}")  # the temporary directory is removed when the block exits

# GitHub: extract packages matching a pattern
extractor = GitHubExtractor()
with extractor.extract_temporary(
    repository_url="https://github.com/org/repo",
    packages_path_pattern="mappings/package*",
    branch_or_tag_name="v1.0.0"
) as package_paths:
    for path in package_paths:
        print(f"Found package at: {path}")
```
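`extract_temporary` is used as a context manager that cleans up after itself. The stdlib sketch below shows that general pattern for ZIP archives with `zipfile` and `tempfile`; `extract_zip_temporarily` is a hypothetical helper, not the implementation of `ArchiveExtractor`.

```python
import tempfile
import zipfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def extract_zip_temporarily(archive_path: Path) -> Iterator[Path]:
    """Hypothetical helper: unpack a ZIP into a temporary directory deleted on exit."""
    with tempfile.TemporaryDirectory() as temp_dir:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(temp_dir)
        # Control returns to the caller's `with` block while the files still exist.
        yield Path(temp_dir)
    # Leaving the TemporaryDirectory block removes the directory and everything in it.
```

Yielding inside `TemporaryDirectory` ties the directory's lifetime to the caller's `with` block, so the extracted files are deleted even if the caller raises an exception.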

## MongoDB Support

Use `MongoDBRepository` with the model class for the version you use (e.g. `MappingPackageV2`, `MappingPackageV3`):

```python
from pathlib import Path
from pymongo import MongoClient
from mapping_suite_sdk.core.adapters.repository import MongoDBRepository
from mapping_suite_sdk.mapping_package_v2.models.mapping_package_v2 import MappingPackageV2
from mapping_suite_sdk.mapping_package_v2.services.load_mapping_package_v2 import (
    load_mapping_package_v2_from_folder,
    load_mapping_package_v2_from_mongo_db,
)

mongo_client = MongoClient("mongodb://localhost:27017/")
repository = MongoDBRepository(
    model_class=MappingPackageV2,
    mongo_client=mongo_client,
    database_name="mapping_suites",
    collection_name="packages",
)

package = load_mapping_package_v2_from_folder(mapping_package_folder_path=Path("/path/to/package"))
repository.create(package)

retrieved_package = load_mapping_package_v2_from_mongo_db(
    mapping_package_id=package.id,
    mapping_package_repository=repository,
)
packages = repository.read_many({"metadata.version": "1.0.0"})
```
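The `read_many` call above appears to take a MongoDB query filter, and `"metadata.version"` uses MongoDB dot notation: it addresses the `version` field inside each document's embedded `metadata` document. The stdlib sketch below mimics only that equality-match behaviour on plain dicts to show which documents such a filter selects. It is not pymongo, it ignores operators, arrays, and the rest of the query language, and the field name is copied from the example, so it may differ from how packages are actually stored.

```python
from typing import Any


def matches_dotted_filter(document: dict[str, Any], query: dict[str, Any]) -> bool:
    """Simplified sketch: equality-only matching of dotted keys like "metadata.version"."""
    for dotted_key, expected in query.items():
        value: Any = document
        for part in dotted_key.split("."):
            if not isinstance(value, dict) or part not in value:
                return False  # in this sketch a missing path never matches
            value = value[part]
        if value != expected:
            return False
    return True  # every key matched (an empty query matches everything)
```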

## OpenTelemetry Tracing

Tracing is supported through the SDK configuration and tracer helpers. Import them from the modules that define them (for example, `mssdk_config` from `mapping_suite_sdk.config`); see the documentation for `set_mssdk_tracing`, `get_mssdk_tracing`, and `add_span_processor_to_mssdk_tracer_provider`. Enable tracing and add span processors (for example, console or OTLP) as needed; the full documentation includes examples.

## Contributing

Contributions to the Mapping Suite SDK are welcome! Please use the fork-and-pull-request workflow.

### Development Setup

```bash
git clone https://github.com/meaningfy-ws/mapping-suite-sdk.git
cd mapping-suite-sdk
make install
make test-unit   # runs generate-models then unit tests (LinkML → Python)
```

Python models are generated from LinkML schemas in `resources/schema/`. After schema changes, run `make generate-models` before tests.

### Dependency Restrictions

- LinkML 1.9.5 and later introduce breaking changes for our data models (which are generated from LinkML schemas)
- Click 8.2 and later introduce breaking changes for our CLI
- pandas 2.1.4 and OpenTelemetry 1.29.0 are pinned because a downstream consumer relies on Airflow 2.10.x

## Get in Touch

- **Issues**: Report bugs and feature requests on our [GitHub Issues](https://github.com/meaningfy-ws/mapping-suite-sdk/issues)
- **Email**: Contact the team at [hi@meaningfy.ws](mailto:hi@meaningfy.ws)
- **Website**: Visit our website at [meaningfy.ws](https://meaningfy.ws)

