Metadata-Version: 2.4
Name: chuck-data
Version: 0.2.1
Summary: Command line AI for customer data
Project-URL: Homepage, https://github.com/amperity/chuck-data
Project-URL: Repository, https://github.com/amperity/chuck-data
Author-email: John Rush <john.rush@amperity.com>, Caleb Benningfield <noodles@amperity.com>
License: Apache-2.0
License-File: LICENSE
Keywords: agents,ai,customer data,databricks,llm
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Build Tools
Requires-Python: >=3.10
Requires-Dist: boto3>=1.28.0
Requires-Dist: databricks-sdk>=0.50.0
Requires-Dist: jsonschema>=4.23.0
Requires-Dist: openai>=1.76.0
Requires-Dist: prettytable>=3.12.0
Requires-Dist: prompt-toolkit>=3.0.50
Requires-Dist: pydantic>=2.11.3
Requires-Dist: readchar>=4.2.1
Requires-Dist: requests>=2.32.3
Requires-Dist: textual>=0.40.0
Provides-Extra: dev
Requires-Dist: black; extra == 'dev'
Requires-Dist: pexpect; extra == 'dev'
Requires-Dist: pyright; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

[![chuck-banner](https://github.com/user-attachments/assets/abcd9545-e0aa-47a9-bf7f-041fe0c0bc0e)](https://chuckdata.ai)

# Chuck Data

Chuck is a text-based user interface (TUI) for managing Databricks resources including Unity Catalog, SQL warehouses, models, and volumes. Chuck Data provides an interactive shell environment for customer data engineering tasks with AI-powered assistance.

Check us out at [chuckdata.ai](https://chuckdata.ai).

Join our community on [Discord](https://discord.gg/f3UZwyuQqe).

## Features

- Interactive TUI for managing Databricks resources
- AI-powered "agentic" data engineering assistant
- Identity resolution powered by [Amperity's Stitch](https://docs.amperity.com/reference/stitch.html)
- Use LLMs from your Databricks account via Databricks Model Serving
- Browse Unity Catalog resources (catalogs, schemas, tables)
- Profile database tables with automated PII detection (via LLMs)
- Tag tables in Unity Catalog with semantic tags for PII to power compliance and data governance use cases
- Command-based interface with both natural language commands and slash commands

## Authentication
- Authenticates with Databricks using personal access tokens
- Authenticates with Amperity using API keys (/login and /logout commands)

## LLM Provider Support

Chuck supports multiple LLM providers, allowing you to choose the best option for your use case:

### Supported Providers

- **Databricks** (default) - Use LLMs from your Databricks account via Model Serving
- **AWS Bedrock** - Use AWS Bedrock foundation models (Claude, Llama, Nova, etc.)
- **OpenAI** - Direct OpenAI API integration (coming soon)
- **Anthropic** - Direct Anthropic API integration (coming soon)

### AWS Bedrock Setup

To use AWS Bedrock as your LLM provider:

1. **Install AWS dependencies:**
   ```bash
   pip install chuck-data[aws]
   ```

2. **Configure AWS credentials:**

   **Option 1: AWS SSO (Recommended for enterprise)**
   ```bash
   # Login via SSO
   aws sso login --profile your-profile

   # Set profile for session
   export AWS_PROFILE=your-profile
   export AWS_REGION=us-east-1
   ```

   **Option 2: Environment variables**
   ```bash
   export AWS_REGION=us-east-1
   export AWS_ACCESS_KEY_ID=your-access-key
   export AWS_SECRET_ACCESS_KEY=your-secret-key
   ```

   **Option 3: AWS CLI configuration** (`~/.aws/credentials`)
   ```ini
   [default]
   aws_access_key_id = your-access-key
   aws_secret_access_key = your-secret-key
   region = us-east-1
   ```

   **Option 4: IAM role** (for EC2/ECS/Lambda deployments)

3. **Set LLM provider:**

   Via environment variable:
   ```bash
   export CHUCK_LLM_PROVIDER=aws_bedrock
   chuck
   ```

   Or via config file (`~/.chuck_config.json`):
   ```json
   {
     "llm_provider": "aws_bedrock",
     "active_model": "anthropic.claude-3-5-sonnet-20240620-v1:0",
     "llm_provider_config": {
       "aws_bedrock": {
         "region": "us-east-1"
       }
     }
   }
   ```

4. **Request model access in AWS Bedrock console:**

   Some models require explicit approval before use. Visit the [AWS Bedrock console](https://console.aws.amazon.com/bedrock/) and request access to your desired models.

Use `/list-models` within Chuck to see all available models in your AWS account.

### Provider Selection Priority

Chuck resolves the LLM provider in this order:
1. `CHUCK_LLM_PROVIDER` environment variable (highest priority)
2. `llm_provider` in config file
3. Default: `databricks`

## Installation

### Homebrew (Recommended)

```bash
brew tap amperity/chuck-data
brew install chuck-data
```

### pip

```bash
pip install chuck-data
```

## Usage

Chuck Data provides an interactive text-based user interface. Run the application using:

```bash
chuck 
```

Or run directly with Python:

```bash
python -m chuck_data 
```

## Available Commands

Chuck Data supports a command-based interface with slash commands that can be used within the interactive TUI. Type `/help` within the application to see all available commands.

### Some general commands to be aware of are:
- `/status` - Show current connection status and application context
- `/login`, `/logout` - Log in/out of Amperity, this is how Chuck interacts with Amperity to run Stitch
- `/list-models`, `/select-model <model_name>`  - Configure which LLM Chuck should use (Pick one designed for tools, we recommend databricks-claude-3-7-sonnet)
- `/list-warehouses`, `/select-warehouse <warehouse_name>` - Many Chuck tools run SQL so make sure to select a warehouse

Many of Chuck's tools will use your selected Catalog and Schema so that you don't have to constantly specify them. Use these commands to manage your application context.

### Catalog & Schema Management
- `/catalogs`, `/select-catalog <catalog_name>` - Manage Catalog context
- `/schemas`, `/select-schema <schema_name>` - Manage Schema context

## Known Limitations & Best Practices

### Known Limitations
- Unstructured data - Stitch will ignore fields in formats that are not supported
- GCP Support - Currently only AWS and Azure are formally supported, GCP will be added very soon
- Stitching across Catalogs - Technically if you manually create Stitch manifests it can work but Chuck doesn't automatically handle this well

### Best Practices
- Use models designed for tools, we recommend databricks-claude-3-7-sonnet but have also tested extensively with databricks-llama-3.2-7b-instruct
- Denormalized data models will work best with Stitch
- Sample data to try out Stitch is [available on the Databricks marketplace](https://marketplace.databricks.com/details/6bc4843f-3809-4995-8461-9756f6164ddf/Amperity_Amperitys-Identity-Resolution-Agent-30-Day-Trial). (Use the bronze schema PII datasets)

## Amperity Stitch

A key tool Chuck can use is Amperity's Stitch algorithm. This is a ML based identity resolution algorithm that has been refined with the world's biggest companies over the last decade.
- Stitch outputs two tables in a schema called `stitch_outputs`. `unified_coalesced` is a table of standardized PII with Amperity IDs. `unified_scores` are the "edges" of the graph that have links and confidence scores for each match.
- Stitch will create a new notebook in your workspace each time it runs that you can use to understand the results, be sure to check it out!
- For a detailed breakdown of how Stitch works, [see this great article breaking it down step by step](https://docs.amperity.com/reference/stitch.html)

## Support

Chuck is a research preview application that is actively being improved based on your usage and feedback. Always be sure to update to the latest version of Chuck to get the best experience!

### Support Options

1. **GitHub Issues**  
   Report bugs or request features on our GitHub repository:  
   https://github.com/amperity/chuck-data/issues

2. **Discord Community**  
   Join our community to chat with other users and developers:  
   https://discord.gg/f3UZwyuQqe  
   Or run `/discord` in the application

3. **Email Support**  
   Contact our dedicated support team:  
   chuck-support@amperity.com

4. **In-app Bug Reports**  
   Let Chuck submit a bug report automatically with the `/bug` command

## Development

### Requirements

- Python 3.10 or higher
- [uv](https://github.com/astral-sh/uv) - Python package installer and resolver (technically this is not required but it sure makes life easier)

### Project Structure

```
chuck_data/             # Main package
├── __init__.py
├── __main__.py         # CLI entry point
├── commands/           # Command implementations
├── ui/                 # User interface components
├── agent/              # AI agent functionality
├── clients/            # External service clients
├── databricks/         # Databricks utilities
└── ...                 # Other modules
```

### Installation

Install the project with development dependencies:

```bash
uv pip install -e .[dev]
```

### Testing

Run the test suite:

```bash
uv run -m pytest
```

Run linters and static analysis:

```bash
uv run ruff .
uv run black --check --diff chuck_data tests
uv run ruff check
uv run pyright
```

For test coverage:

```bash
uv run -m pytest --cov=chuck_data
```

### CI/CD

This project uses GitHub Actions for continuous integration:

- Automated testing on Python 3.10
- Code linting with flake8
- Format checking with Black

The CI workflow runs on every push to `main` and on pull requests. You can also trigger it manually from the Actions tab in GitHub.
