Metadata-Version: 2.4
Name: small-text
Version: 2.0.0.dev4
Summary: Active Learning for Text Classification in Python.
Author-email: Christopher Schröder <small-text@protonmail.com>
Maintainer-email: Christopher Schröder <small-text@protonmail.com>
License: MIT License
        
        Copyright (c) 2019-2025 Christopher Schröder and Contributors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
        EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
        MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
        IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
        DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
        OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
        OR OTHER DEALINGS IN THE SOFTWARE.
Project-URL: Documentation, https://small-text.readthedocs.io
Project-URL: Repository, https://github.com/webis-de/small-text
Project-URL: Issues, https://github.com/webis-de/small-text/issues
Project-URL: Changelog, https://github.com/webis-de/small-text/blob/main/CHANGELOG.md
Keywords: active learning,text classification
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: dill>=0.3.7
Requires-Dist: scipy
Requires-Dist: numpy>=1.21.0
Requires-Dist: scikit-learn>=0.24.1
Requires-Dist: tqdm
Requires-Dist: packaging
Requires-Dist: tokenizers>=0.22.0
Requires-Dist: tomli
Provides-Extra: pytorch
Requires-Dist: torch>=2.0.0; extra == "pytorch"
Provides-Extra: transformers
Requires-Dist: small-text[pytorch]; extra == "transformers"
Requires-Dist: transformers>=4.57.6; extra == "transformers"
Dynamic: license-file

[![PyPI](https://img.shields.io/pypi/v/small-text/v2.0.0.dev4)](https://pypi.org/project/small-text/)
[![Conda Forge](https://img.shields.io/conda/v/conda-forge/small-text?label=conda-forge)](https://anaconda.org/conda-forge/small-text)
[![codecov](https://codecov.io/gh/webis-de/small-text/branch/master/graph/badge.svg?token=P86CPABQOL)](https://codecov.io/gh/webis-de/small-text)
[![Documentation Status](https://readthedocs.org/projects/small-text/badge/?version=v2.0.0.dev4)](https://small-text.readthedocs.io/en/v2.0.0.dev4/) 
![Maintained Yes](https://img.shields.io/badge/maintained-yes-green)
[![Contributions Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen)](CONTRIBUTING.md)
[![MIT License](https://img.shields.io/github/license/webis-de/small-text)](LICENSE)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20358671.svg)](https://zenodo.org/records/20358671)

<p align="center">
<img width="450" src="https://github.com/webis-de/small-text/blob/dev/docs/_static/small-text-logo.png?raw=true" alt="small-text logo" />
</p>

> Active Learning for Text Classification in Python.
<hr>

[Installation](#installation) | [Quick Start](#quick-start) | [Contribution](CONTRIBUTING.md) | [Changelog][changelog] | [**Docs**][documentation_main]

Small-Text provides state-of-the-art **Active Learning** for Text Classification. 
Several pre-implemented query strategies, initialization strategies, and stopping criteria are provided, 
which can be easily mixed and matched to build active learning experiments or applications.

## What is Active Learning?
[Active learning](https://small-text.readthedocs.io/en/latest/active_learning.html) allows you to efficiently label training data for supervised learning when you have little to no labeled data.

<p align="center">

<img src="https://raw.githubusercontent.com/webis-de/small-text/dev/docs/_static/learning-curve-example.gif?raw=true" alt="Learning curve example for the TREC-6 dataset." width="60%">

</p>
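
To make the idea concrete, here is a minimal, library-independent sketch of a pool-based active learning loop. It does not use small-text's API. The toy 1-D "classifier", the `oracle` function standing in for a human annotator, and the true decision boundary of 0.6 are all assumptions made for illustration.

```python
import random


def train_threshold(labeled):
    """Fit a toy 1-D classifier: the midpoint between the two class means."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def query_most_uncertain(pool, threshold):
    """Uncertainty sampling: pick the pool item closest to the decision boundary."""
    return min(pool, key=lambda x: abs(x - threshold))


def oracle(x):
    """Stand-in for the human annotator (assumed true boundary: 0.6)."""
    return int(x >= 0.6)


def active_learning(pool, labeled, num_queries):
    """Repeat: train, query one unlabeled item, obtain its label, add it to the training data."""
    pool = list(pool)
    labeled = list(labeled)
    for _ in range(num_queries):
        threshold = train_threshold(labeled)
        x = query_most_uncertain(pool, threshold)
        pool.remove(x)
        labeled.append((x, oracle(x)))
    return train_threshold(labeled), labeled


rng = random.Random(42)
pool = [rng.random() for _ in range(200)]
initial = [(0.0, 0), (1.0, 1)]  # one labeled example per class to start with
threshold, labeled = active_learning(pool, initial, num_queries=10)
print(f"labeled examples: {len(labeled)}, learned threshold: {threshold:.2f}")
```

Only ten of the 200 pool items are labeled here; the query step decides which ones, which is where the labeling effort is saved.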

## Active Learning in Practice

Active Learning for Text Classification has been applied across diverse fields, including biomedical research, social science, information science, computer science, and political communication:

- [Bootstrapping a biomedical corpus of digenic variant combinations](https://pubmed.ncbi.nlm.nih.gov/38805753/)
- [Detecting disclosures of individuals' employment status on social media](https://aclanthology.org/2022.acl-long.453/)
- [Accelerating systematic literature reviews](https://www.nature.com/articles/s42256-020-00287-7)
- [Topic categorization of citizen contributions](https://link.springer.com/chapter/10.1007/978-3-031-15086-9_24)
- [Classifying Speech Acts in Political Communication](http://doi.org/10.15439/2023F3485)

See the [showcase section][documentation_showcase] for applications in which small-text itself was used.

## Features

- Provides unified interfaces for Active Learning, allowing you to 
  easily mix and match query strategies with classifiers provided by [sklearn](https://scikit-learn.org/), [PyTorch](https://pytorch.org/), or [transformers](https://github.com/huggingface/transformers).
- Supports GPU-based [PyTorch](https://pytorch.org/) models and integrates [transformers](https://github.com/huggingface/transformers) 
  so that you can use state-of-the-art Text Classification models for Active Learning.
- GPU is supported but not required. CPU-only use cases require only 
  a lightweight installation with minimal dependencies.
- Multiple scientifically evaluated components are pre-implemented and ready to use (Query Strategies, Initialization Strategies, and Stopping Criteria).
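
The "mix and match" idea can be sketched without any dependencies. The snippet below is not small-text's interface: the `QueryStrategy` signature and the functions `random_query`, `least_confidence_query`, and `select_batch` are hypothetical names chosen for illustration. It only shows how strategies that share one signature become interchangeable.

```python
import random
from typing import Callable, Sequence

# Hypothetical interface: a query strategy receives one confidence score per
# unlabeled item and returns the indices of the n items to label next.
QueryStrategy = Callable[[Sequence[float], int], list[int]]


def random_query(confidences, n, seed=0):
    """Baseline: pick n items uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(confidences)), n))


def least_confidence_query(confidences, n):
    """Pick the n items whose predicted class has the lowest confidence."""
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return sorted(ranked[:n])


def select_batch(strategy, confidences, n):
    """The caller depends only on the shared signature, not on a concrete strategy."""
    return strategy(confidences, n)


confidences = [0.99, 0.51, 0.80, 0.55, 0.95]
for strategy in (random_query, least_confidence_query):
    print(strategy.__name__, select_batch(strategy, confidences, n=2))
```

Because the calling code never names a concrete strategy, swapping one for another is a one-line change.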

---

## News

**Version 2.0.0 dev4** ([v2.0.0.dev4][changelog_2.0.0dev4]) - May 23rd, 2026
  - This is the development release with the most changes so far. Consider it an alpha release: interfaces are not yet guaranteed to be stable, 
    but it is otherwise ready to use.
  - Version 2.0.0 offers refined interfaces, new query strategies, improved classifiers, and new functionality such as vector indices. See the [changelog][changelog_2.0.0dev4] for a full list of changes.

**Community Survey** - March 8th, 2026
  - How is active learning used in NLP today? Our EACL 2026 paper, "[Reassessing Active Learning Adoption in Contemporary NLP: A Community Survey](https://arxiv.org/abs/2503.09701v4)",
    investigates this question and presents new insights into its contemporary use.

**Version 1.4.1** ([v1.4.1][changelog_1.4.1]) - August 18th, 2024
  - Bugfix release.

**Paper published at EACL 2023 🎉**
  - The [paper][paper_published] introducing small-text has been accepted at [EACL 2023](https://2023.eacl.org/). Meet us at the conference in May!
  - Update: the paper was awarded [EACL Best System Demonstration](https://aclanthology.org/2023.eacl-demo.11/). Thank you for your support!

[For a complete list of changes, see the change log.][changelog]

---

## Installation

Small-Text can be easily installed via pip:

```bash
pip install small-text
```

The command results in a [slim installation][documentation_install] with only the necessary dependencies. 
For a full installation via pip, include the `transformers` extra:

```bash
pip install "small-text[transformers]"
```

The library requires Python 3.10 or newer. To use the GPU, CUDA 10.1 or newer is required. 
More information regarding the installation can be found in the 
[documentation][documentation_install].
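
To see which of these requirements your environment meets, a small check along the following lines may help. This is a sketch and not part of small-text: `check_environment` is a hypothetical helper, and it only tests whether the optional `torch` and `transformers` packages are importable, not whether their versions satisfy the package's requirements.

```python
import importlib.util
import sys


def check_environment():
    """Report the Python version check and the presence of the optional extras."""
    report = {
        "python_ok": sys.version_info >= (3, 10),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "transformers_installed": importlib.util.find_spec("transformers") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch  # imported only when the package is present

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken torch installation is reported as "no CUDA" in this sketch.
            pass
    return report


print(check_environment())
```

If `torch_installed` is `False`, the slim installation is in use and only the CPU-based components are available.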


## Quick Start

For a quick start, see the provided examples for [binary classification](examples/examplecode/binary_classification.py),
[pytorch multi-class classification](examples/examplecode/pytorch_multiclass_classification.py), and 
[transformer-based multi-class classification](examples/examplecode/transformers_multiclass_classification.py),
or check out the notebooks.

### Notebooks

<div align="center">

| # | Notebook                                                                                                                                                                                                       |                                                                                                                                                                                                                                                  |
|---|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| 
| 1 | [Intro: Active Learning for Text Classification with Small-Text](https://github.com/webis-de/small-text/blob/v2.0.0.dev4/examples/notebooks/01-active-learning-for-text-classification-with-small-text-intro.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/webis-de/small-text/blob/v2.0.0.dev4/examples/notebooks/01-active-learning-for-text-classification-with-small-text-intro.ipynb) |
| 2 | [Using Stopping Criteria for Active Learning](https://github.com/webis-de/small-text/blob/v2.0.0.dev4/examples/notebooks/02-active-learning-with-stopping-criteria.ipynb)                                        | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/webis-de/small-text/blob/v2.0.0.dev4/examples/notebooks/02-active-learning-with-stopping-criteria.ipynb)                        |
| 3 | [Active Learning using SetFit](https://github.com/webis-de/small-text/blob/v2.0.0.dev4/examples/notebooks/03-active-learning-with-setfit.ipynb)                                                                  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/webis-de/small-text/blob/v2.0.0.dev4/examples/notebooks/03-active-learning-with-setfit.ipynb)                                   |
| 4 | [Using SetFit's Zero Shot Capabilities for Cold Start Initialization](https://github.com/webis-de/small-text/blob/v2.0.0.dev4/examples/notebooks/04-zero-shot-cold-start.ipynb)                                  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/webis-de/small-text/blob/v2.0.0.dev4/examples/notebooks/04-zero-shot-cold-start.ipynb)                                          |
| 5 | [Active Learning via the Argilla User Interface](https://github.com/webis-de/small-text/blob/dev/examples/notebooks/05-active-learning-with-argilla-interface.ipynb)                                             | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/webis-de/small-text/blob/dev/examples/notebooks/05-active-learning-with-argilla-interface.ipynb)                                |

</div>

### Showcase

- [Need a user interface? Use small-text 2.0.0 in combination with Argilla](examples/notebooks/05-active-learning-with-argilla-interface.ipynb).

A full list of showcases can be found [in the docs][documentation_showcase].

**Would you like to share your use case?** Whether it is a paper, an experiment, a practical application, a thesis, a dataset, or something else, let us know and we will add it to the [showcase section][documentation_showcase] or even here.

## Documentation

Read the latest documentation [here][documentation_main]. Noteworthy pages include:

- [Overview of Query Strategies][documentation_query_strategies]
- [Reproducibility Notes][documentation_reproducibility_notes]

---

## Scope of Features

<table align="center">
  <caption>Extension of Table 1 in the <a href="https://aclanthology.org/2023.eacl-demo.11v2.pdf" target="_blank">EACL 2023 paper</a>.</caption>
  <thead>
    <tr>
      <th>Name</th>
      <th colspan="2">Active Learning</th>
    </tr>
    <tr>
      <th></th>
      <th>Query Strategies</th>
      <th>Stopping Criteria</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>small-text v1.3.0</td>
      <td>14</td>
      <td>5</td>
    </tr>
    <tr>
      <td>small-text v2.0.0</td>
      <td>19</td>
      <td>5</td>
    </tr>
  </tbody>
</table>

We use the numbers only to show the tremendous progress that small-text has made over time. 
There are many features and improvements that are not reflected in these numbers.

## Alternatives

[modAL](https://github.com/modAL-python/modAL), [ALiPy](https://github.com/NUAA-AL/ALiPy), [libact](https://github.com/ntucllab/libact), [ALToolbox](https://github.com/AIRI-Institute/al_toolbox)

---

## Contribution

Contributions are welcome. Details can be found in [CONTRIBUTING.md](CONTRIBUTING.md).

## Acknowledgments

This software was created by Christopher Schröder ([@chschroeder](https://github.com/chschroeder)) at Leipzig University's [NLP group](http://asv.informatik.uni-leipzig.de/), 
which is a part of the [Webis](https://webis.de/) research network. 
The encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.

## Citation

Small-Text was introduced in detail in the EACL 2023 system demonstration paper ["Small-Text: Active Learning for Text Classification in Python"](https://aclanthology.org/2023.eacl-demo.11/), which can be cited as follows:
```
@inproceedings{schroeder2023small-text,
    title = "Small-Text: Active Learning for Text Classification in Python",
    author = {Schr{\"o}der, Christopher  and  M{\"u}ller, Lydia  and  Niekler, Andreas  and  Potthast, Martin},
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.eacl-demo.11",
    pages = "84--95"
}
```

## License

[MIT License](LICENSE)


[documentation_main]: https://small-text.readthedocs.io/en/v2.0.0.dev4/
[documentation_install]: https://small-text.readthedocs.io/en/v2.0.0.dev4/install.html
[documentation_query_strategies]: https://small-text.readthedocs.io/en/v2.0.0.dev4/components/query_strategies.html
[documentation_showcase]: https://small-text.readthedocs.io/en/v2.0.0.dev4/showcase.html
[documentation_reproducibility_notes]: https://small-text.readthedocs.io/en/v2.0.0.dev4/reproducibility_notes.html
[changelog]: https://small-text.readthedocs.io/en/latest/changelog.html
[changelog_1.4.0]: https://small-text.readthedocs.io/en/latest/changelog.html#version-1-4-0-2024-06-09
[changelog_1.4.1]: https://small-text.readthedocs.io/en/latest/changelog.html#version-1-4-1-2024-08-18
[changelog_2.0.0dev4]: https://small-text.readthedocs.io/en/latest/changelog.html#version-2-0-0-dev4-2026-05-23
[argilla]: https://github.com/argilla-io/argilla
[argilla_al_tutorial]: https://docs.argilla.io/en/latest/tutorials/notebooks/training-textclassification-smalltext-activelearning.html
[paper_published]: https://aclanthology.org/2023.eacl-demo.11v2.pdf
