Metadata-Version: 2.2
Name: distributed-ml
Version: 0.1.3
Summary: A package for distributed scikit-learn tasks.
Home-page: https://github.com/sumitraut7/CS230-distributed-machine-learning
Author: Sumit Raut
Author-email: Sumit Raut <sumitcraut@gmail.com>
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests
Requires-Dist: setuptools
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: tqdm
Requires-Dist: scikit-learn
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# distributed-ml

distributed-ml is a Python-based machine learning package that facilitates model training, hyperparameter tuning, and data handling through a structured API. It supports various scikit-learn estimators, including classifiers, regressors, and hyperparameter search classes like GridSearchCV and RandomizedSearchCV.

This package allows users to:
    - Create and manage API sessions
    - Check and download datasets
    - Train models with various configurations
    - Monitor job status with progress bars
    - Handle hyperparameter tuning efficiently

## API Reference

### Class

MLTaskManager()
    - Instantiates class.
    - Returns session id.

### Methods

check_data(data_name)
    - Checks if a dataset is available.
    - Arguments:
        - data_name (str): Name of the dataset.
    - Returns path where data was downloaded, otherwise 404 Error.

download_data(data_link, data_name, data_type)
    - Downloads data from a specified source.
    - Arguments:
        - data_link (str): URL or dataset identifier.
        - data_name (str): Name to save dataset as.
        - data_type (str): Source type (e.g., “kaggle”).
    - Returns path where data was downloaded.

train(estimator, dataset_name, train_params=None, wait_for_completion=False)
    - Submits a training job to the API.
    - Arguments:
    - estimator: A scikit-learn model.
    - dataset_name (str): Name of dataset.
    - train_params (dict, optional): Training configurations.
    - wait_for_completion (bool, optional): Whether to wait for the job to complete.
    - Returns training progress and job results (i.e. best results, best parameters)

check_job_status(job_id)
    - Retrieves the status of a training job.
    - Arguments:
        - job_id (str): Unique job identifier.
    - Returns training progress and job results (i.e. best results, best parameters)
