Metadata-Version: 2.4
Name: ghazi-data-lib
Version: 0.1.0
Summary: datalib for machine learning
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: matplotlib>=3.10.0
Requires-Dist: numpy>=2.2.0
Requires-Dist: pandas>=2.2.3
Requires-Dist: pytest>=8.3.4
Requires-Dist: scikit-learn>=1.6.0
Requires-Dist: seaborn>=0.13.2
Description-Content-Type: text/markdown

Here is a structured and comprehensive README file for your project:

---

# **GHAZI_DATA_LIB**

**GHAZI_DATA_LIB** is a Python library for data analysis, machine learning, and statistical operations. It provides various modules for tasks such as data preprocessing, regression, supervised learning, unsupervised learning, statistical analysis, and visualization. The library is designed to be modular, clean, and easy to test.

---

## **Project Structure**

The project is organized into the following folders:

```plaintext
GHAZI_DATA_LIB/
│
├── datalib/                    # Core library modules
│   ├── __init__.py             # Package initializer
│   ├── data_processing.py      # Module for data loading and preprocessing
│   ├── regression.py           # Module for linear and polynomial regression
│   ├── statistics.py           # Module for statistical analysis
│   ├── supervised.py           # Module for supervised learning (classification and regression)
│   ├── unsupervised.py         # Module for clustering and dimensionality reduction
│   └── visualization.py        # Module for data visualization (to be implemented)
│
├── example/                    # Example scripts
│   ├── __init__.py
│   └── iris_example.py         # Example usage of the library with the Iris dataset
│
├── tests/                      # Unit tests for the library modules
│   ├── __init__.py
│   ├── test_data_processing.py
│   ├── test_regression.py
│   ├── test_statistics.py
│   ├── test_supervised.py
│   ├── test_unsupervised.py
│   └── test_visualization.py   # Visualization tests (to be implemented)
│
├── .gitignore                  # Git ignore file
├── LICENSE                     # License for the project
├── pyproject.toml              # Project configuration for uv
├── README.md                   # Project documentation
└── uv.lock                     # uv lockfile for package management
```

---

## **Features**

- **Data Processing**  
  Load, clean, preprocess, and split datasets for machine learning tasks.

- **Regression Models**  
  Perform linear and polynomial regression using scikit-learn.

- **Supervised Learning**  
  Train classification and regression models with Random Forest.

- **Unsupervised Learning**  
  Perform K-Means clustering and PCA for dimensionality reduction.

- **Statistical Analysis**  
  Calculate measures such as mean, median, standard deviation, correlation, and run statistical tests (t-test and chi-square).

- **Unit Testing**  
  Each module has corresponding unit tests under the `tests/` directory using `pytest`.

---

## **Requirements**

This project uses `uv`, a modern Python package manager. The dependencies are managed in the `pyproject.toml` file.

To install `uv`:

```bash
pip install uv
```

To install project dependencies:

```bash
uv pip install -r pyproject.toml
```

---

## **Installation**

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/GHAZI_DATA_LIB.git
   cd GHAZI_DATA_LIB
   ```

2. Install dependencies:

   ```bash
   uv pip install -r pyproject.toml
   ```

3. Run tests to verify installation:
   ```bash
   pytest tests/
   ```

---

## **Usage**

### 1. **Data Preprocessing**

Load and preprocess a dataset:

```python
from datalib.data_processing import load_dataset, preprocess_dataset

# Load data
df = load_dataset("data.csv")

# Preprocess data
X, y = preprocess_dataset(df, target_column="target", drop_columns=["id"], fill_missing="mean")
```

### 2. **Regression Models**

Train a linear regression model:

```python
from datalib.regression import linear_regression

# Prepare data
X = [[1], [2], [3], [4]]
y = [2, 4, 6, 8]

# Train model
model = linear_regression(X, y)
print("Coefficients:", model.coef_)
```

### 3. **Supervised Learning**

Train a supervised model:

```python
from datalib.supervised import train_supervised_model

# Train classification model
model = train_supervised_model(X, y, model_type="classification")
```

### 4. **Unsupervised Learning**

Perform clustering:

```python
from datalib.unsupervised import perform_clustering

# Perform K-Means clustering
clusters = perform_clustering(X, n_clusters=3)
print(clusters)
```

---

## **Testing**

The project includes unit tests for all modules. To run the tests, execute the following command:

```bash
pytest tests/
```

---

## **License**

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

---

## **Contributing**

Contributions are welcome! If you would like to contribute:

1. Fork the repository.
2. Create a new branch:
   ```bash
   git checkout -b feature/your-feature-name
   ```
3. Commit your changes:
   ```bash
   git commit -m "Add your feature"
   ```
4. Push the branch:
   ```bash
   git push origin feature/your-feature-name
   ```
5. Create a Pull Request.

---

## **Acknowledgments**

- **Scikit-learn**: For machine learning algorithms.
- **Pandas and NumPy**: For data manipulation and mathematical operations.
- **Pytest**: For unit testing.

---

## **Contact**

For questions or collaboration requests, contact **Ghazi Chaftar** at [ghazichaftar@gmail.com](mailto:ghazichaftar@gmail.com).

---

This README file provides detailed information about your project structure, installation, features, and usage.
