Metadata-Version: 2.4
Name: zimscraperlib
Version: 5.4.0
Summary: Collection of python tools to re-use common code across scrapers
Project-URL: Donate, https://www.kiwix.org/en/support-us/
Project-URL: Homepage, https://github.com/openzim/python-scraperlib
Author-email: openZIM <dev@openzim.org>
License: GPL-3.0-or-later
License-File: LICENSE
Keywords: offline,openzim,zim
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.14
Requires-Python: <3.15,>=3.14
Requires-Dist: babel<3.0,>=2.9
Requires-Dist: beartype<0.23,>=0.19
Requires-Dist: beautifulsoup4<5.0,>=4.9.3
Requires-Dist: cairosvg<3.0,>=2.2.0
Requires-Dist: colorthief==0.2.1
Requires-Dist: idna<4.0,>=2.5
Requires-Dist: iso639-lang<3.0,>=2.4.0
Requires-Dist: libzim<4.0,>=3.8.0
Requires-Dist: lxml<7.0,>=4.6.3
Requires-Dist: optimize-images<2.0,>=1.3.6
Requires-Dist: piexif==1.1.3
Requires-Dist: pillow<13.0,>=7.0.0
Requires-Dist: pymupdf<2.0,>=1.24.0
Requires-Dist: python-magic<0.5,>=0.4.3
Requires-Dist: python-resize-image<1.2,>=1.1.19
Requires-Dist: regex>=2020.7.14
Requires-Dist: requests<3.0,>=2.25.1
Requires-Dist: types-xxhash<4.0,>=2.0
Requires-Dist: urllib3<2.8.0,>=1.26.5
Requires-Dist: xxhash<4.0,>=2.0
Requires-Dist: yt-dlp
Provides-Extra: dev
Requires-Dist: coverage==7.14.1; extra == 'dev'
Requires-Dist: invoke==3.0.3; extra == 'dev'
Requires-Dist: ipython==9.13.0; extra == 'dev'
Requires-Dist: jinja2==3.1.6; extra == 'dev'
Requires-Dist: mkdocs-gen-files==0.6.1; extra == 'dev'
Requires-Dist: mkdocs-include-markdown-plugin==7.3.0; extra == 'dev'
Requires-Dist: mkdocs-literate-nav==0.6.3; extra == 'dev'
Requires-Dist: mkdocs-material==9.7.6; extra == 'dev'
Requires-Dist: mkdocs==1.6.1; extra == 'dev'
Requires-Dist: mkdocstrings[python]==1.0.3; extra == 'dev'
Requires-Dist: pre-commit==4.6.0; extra == 'dev'
Requires-Dist: pygments==2.19.2; extra == 'dev'
Requires-Dist: pymdown-extensions==10.21.3; extra == 'dev'
Requires-Dist: pyright==1.1.409; extra == 'dev'
Requires-Dist: pytest-mock==3.15.1; extra == 'dev'
Requires-Dist: pytest==9.0.3; extra == 'dev'
Requires-Dist: pyyaml==6.0.3; extra == 'dev'
Requires-Dist: ruff==0.15.14; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-gen-files==0.6.1; extra == 'docs'
Requires-Dist: mkdocs-include-markdown-plugin==7.3.0; extra == 'docs'
Requires-Dist: mkdocs-literate-nav==0.6.3; extra == 'docs'
Requires-Dist: mkdocs-material==9.7.6; extra == 'docs'
Requires-Dist: mkdocs==1.6.1; extra == 'docs'
Requires-Dist: mkdocstrings[python]==1.0.3; extra == 'docs'
Requires-Dist: pygments==2.19.2; extra == 'docs'
Requires-Dist: pymdown-extensions==10.21.3; extra == 'docs'
Requires-Dist: ruff==0.15.14; extra == 'docs'
Provides-Extra: qa
Requires-Dist: pyright==1.1.409; extra == 'qa'
Requires-Dist: pytest==9.0.3; extra == 'qa'
Requires-Dist: ruff==0.15.14; extra == 'qa'
Provides-Extra: scripts
Requires-Dist: invoke==3.0.3; extra == 'scripts'
Requires-Dist: jinja2==3.1.6; extra == 'scripts'
Requires-Dist: pyyaml==6.0.3; extra == 'scripts'
Provides-Extra: test
Requires-Dist: coverage==7.14.1; extra == 'test'
Requires-Dist: pytest-mock==3.15.1; extra == 'test'
Requires-Dist: pytest==9.0.3; extra == 'test'
Description-Content-Type: text/markdown

# zimscraperlib

[![QA Status](https://github.com/openzim/python-scraperlib/actions/workflows/QA.yaml/badge.svg?branch=main)](https://github.com/openzim/python-scraperlib/actions/workflows/QA.yaml)
[![Tests Status](https://github.com/openzim/python-scraperlib/actions/workflows/Tests.yaml/badge.svg?branch=main)](https://github.com/openzim/python-scraperlib/actions/workflows/Tests.yaml)
[![CodeFactor](https://www.codefactor.io/repository/github/openzim/python-scraperlib/badge)](https://www.codefactor.io/repository/github/openzim/python-scraperlib)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![PyPI version shields.io](https://img.shields.io/pypi/v/zimscraperlib.svg)](https://pypi.org/project/zimscraperlib/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/zimscraperlib.svg)](https://pypi.org/project/zimscraperlib)
[![codecov](https://codecov.io/gh/openzim/python-scraperlib/branch/master/graph/badge.svg)](https://codecov.io/gh/openzim/python-scraperlib)
[![Read the Docs](https://img.shields.io/readthedocs/python-scraperlib)](https://python-scraperlib.readthedocs.io/)

Collection of Python code to reuse across Python-based scrapers.

# Usage

- This library is meant to be installed via PyPI ([`zimscraperlib`](https://pypi.org/project/zimscraperlib/)).
- Make sure to pin it to a version range in your requirements, as the API is subject to frequent changes.
- The API should remain the same only within the same _minor_ version.

Example usage:

```pip
zimscraperlib>=1.1,<1.2
```

See documentation at [Read the Docs](https://python-scraperlib.readthedocs.io/) for details.
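
The minor-version rule above can also be checked at runtime. The following is a minimal sketch, not part of the library: `minor_series` and `is_same_minor` are hypothetical helper names, and the `"5.4.0"` value is only an illustrative target. It relies solely on the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def minor_series(version_string: str) -> tuple[int, int]:
    """Return (major, minor) from a dotted version string such as '5.4.0'.

    Hypothetical helper: expects at least 'major.minor' with numeric parts.
    """
    major, minor, *_ = version_string.split(".")
    return int(major), int(minor)


def is_same_minor(installed: str, expected: str) -> bool:
    """Tell whether two versions belong to the same minor series."""
    return minor_series(installed) == minor_series(expected)


if __name__ == "__main__":
    try:
        installed = version("zimscraperlib")
    except PackageNotFoundError:
        print("zimscraperlib is not installed")
    else:
        # "5.4.0" is an assumed target; use the version your scraper was written for.
        print(installed, "compatible:", is_same_minor(installed, "5.4.0"))
```

A version range in your requirements remains the primary safeguard; a check like this only helps surface a mismatched environment early.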

> [!WARNING]
> While this library supports downloading videos with yt-dlp, recent changes at YouTube have forced the yt-dlp team
> to require new dependencies for YouTube videos (see https://github.com/yt-dlp/yt-dlp/issues/15012). These dependencies
> are large and are not needed by any other backend supported by yt-dlp (only YouTube needs them). They are therefore
> not yet included in this library's dependencies (see https://github.com/openzim/python-scraperlib/issues/268).
> If you intend to download videos from YouTube, you have to install them yourself.

# Dependencies

Most dependencies are installed automatically by pip (from PyPI by default). The following system packages may be required depending on which features you use:

- **libmagic** — required for file type detection (used in most scrapers)
- **wget** — required only for `zimscraperlib.download` functions
- **FFmpeg** — required only for video processing functions
- **gifsicle** (>=1.92) — required only for GIF optimization
- **libcairo** — required only for SVG-to-PNG conversion
- **libzim** — auto-installed via PyPI, not available on Windows
- **Pillow** — auto-installed via PyPI; pre-built wheels are used by default and no system image libraries are needed. Only if you need to build Pillow from source should you install additional system libraries — see [Pillow's build documentation](https://pillow.readthedocs.io/en/latest/installation/building-from-source.html) for details.

> **Note:** To run the full test suite, all system dependencies listed above must be installed.
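
To see which of the system packages listed above are present on a machine, a quick check along these lines may help. This is a minimal sketch using only the standard library, not something the library ships: `missing_system_dependencies` is a hypothetical helper, and the binary and library names are assumptions taken from the list above. `find_library` is a heuristic and may return `None` on some platforms even when the library is installed.

```python
import shutil
from ctypes.util import find_library

# Executables and shared libraries assumed from the dependency list.
REQUIRED_BINARIES = ("wget", "ffmpeg", "gifsicle")
REQUIRED_LIBRARIES = ("magic", "cairo")


def missing_system_dependencies(
    binaries=REQUIRED_BINARIES, libraries=REQUIRED_LIBRARIES
) -> list[str]:
    """Return the names of system dependencies that cannot be located."""
    # Binaries are looked up on PATH.
    missing = [name for name in binaries if shutil.which(name) is None]
    # Shared libraries are looked up with the platform's library search.
    missing += [f"lib{name}" for name in libraries if find_library(name) is None]
    return missing


if __name__ == "__main__":
    missing = missing_system_dependencies()
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only covers presence; it does not verify versions such as the gifsicle >=1.92 requirement.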

## macOS

```sh
brew install libmagic wget ffmpeg gifsicle cairo
```

## Linux (Debian/Ubuntu)

```sh
sudo apt install libmagic1 wget ffmpeg gifsicle libcairo2
```

## Alpine

```sh
apk add ffmpeg gifsicle libmagic wget cairo
```

# Contribution

This project adheres to openZIM's [Contribution Guidelines](https://github.com/openzim/overview/wiki/Contributing).

This project has implemented openZIM's [Python bootstrap, conventions and policies](https://github.com/openzim/_python-bootstrap/docs/Policy.md) **v2.0.0**.

All instructions below must be run from the root of your local clone of this repository.

If you do not already have it on your system, install [hatch](https://hatch.pypa.io/latest/install/):

```shell
pip install hatch
```

Start a hatch shell — this will install all dependencies including dev in an isolated virtual environment:

```shell
hatch shell
```

Set up the pre-commit Git hook (runs linters automatically before each commit):

```shell
pre-commit install
```

Run tests with coverage:

```shell
invoke coverage
```

# Users

Non-exhaustive list of scrapers using it (check status when updating API):

- [openzim/freecodecamp](https://github.com/openzim/freecodecamp)
- [openzim/gutenberg](https://github.com/openzim/gutenberg)
- [openzim/ifixit](https://github.com/openzim/ifixit)
- [openzim/kolibri](https://github.com/openzim/kolibri)
- [openzim/nautilus](https://github.com/openzim/nautilus)
- [openzim/openedx](https://github.com/openzim/openedx)
- [openzim/sotoki](https://github.com/openzim/sotoki)
- [openzim/ted](https://github.com/openzim/ted)
- [openzim/warc2zim](https://github.com/openzim/warc2zim)
- [openzim/wikihow](https://github.com/openzim/wikihow)
- [openzim/youtube](https://github.com/openzim/youtube)
