Metadata-Version: 2.4
Name: NullGazeX
Version: 2.2.0
Summary: Research tool for studying DPI, TLS fingerprinting, and content-delivery protections via ClientHello splitting and parallel-strategy bypass. For educational and research purposes only.
Author-email: Batal <batal@proton.me>
License-Expression: MIT
Project-URL: Homepage, https://github.com/batal/NullGazeX
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Multimedia :: Graphics
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.25.0
Requires-Dist: curl_cffi>=0.5.0
Provides-Extra: browser
Requires-Dist: DrissionPage>=4.0.0; extra == "browser"
Provides-Extra: scrape
Requires-Dist: beautifulsoup4>=4.9.0; extra == "scrape"
Provides-Extra: full
Requires-Dist: curl_cffi>=0.5.0; extra == "full"
Requires-Dist: DrissionPage>=4.0.0; extra == "full"
Requires-Dist: beautifulsoup4>=4.9.0; extra == "full"
Dynamic: license-file

# NullGazeX — Research Tool for Network Filtering Analysis

> **Important**: NullGazeX is a **research and educational tool** designed
> to study network filtering, Deep Packet Inspection (DPI), TLS
> fingerprinting, and content-delivery protection mechanisms.  It is
> **not** intended for unauthorized access to protected content.  The
> author assumes **no liability** for any misuse or for any consequences
> arising from its use.  By using this software you agree that you are
> solely responsible for complying with all applicable laws and the
> terms of service of any website you interact with.

NullGazeX implements a multi-tier pipeline of bypass strategies, raced
in parallel so that the first one to succeed returns the result, to
retrieve content through ISP SNI filters, Cloudflare challenges, TLS
fingerprint checks, and hotlinking protections.

---

## What NullGazeX Does

| Capability | Description |
| :--- | :--- |
| **Image Downloading** | Download images from servers behind SNI filtering, TLS fingerprint checks, or hotlinking protection. |
| **Page Scraping** | Retrieve full page HTML, text, titles, or JSON from a URL. |
| **DPI Bypass** | ClientHello packet splitting can evade ISP SNI filters that do not reassemble TCP segments, without external proxies or VPNs. |
| **TLS Impersonation** | Rotates curl\_cffi browser fingerprints (Chrome, Safari, Firefox, Edge) so the TLS handshake resembles a real browser's. |
| **Parallel Strategy Racing** | All bypass strategies launch concurrently; the first to succeed wins. |
| **Adaptive Anti-Blocking** | Per-domain cooldown, jittered delays, session cycling, and fingerprint rotation reduce the chance of server-side rate-limiting. |
| **Stealth Browser Fallback** | Headless Chromium (via DrissionPage) with anti-detection flags for extreme JS challenges. |

---

## How It Works

NullGazeX uses a **local loopback proxy** that intercepts the TLS
ClientHello handshake packet and splits it into two TCP segments with
an 8 ms gap between them.  The SNI (Server Name Indication) field is
thus broken across two packets.  DPI systems that inspect packets
individually, without reassembling the TCP stream, never see the
complete SNI, so the check does not fire.  Filters that do reassemble
the stream are not affected by this technique.

Combined with **rotating TLS fingerprints** (the `curl_cffi` library
impersonates Chrome 110–124, Safari 15–17, Firefox 116–117, and Edge
99–110), the TLS handshake closely resembles that of a real browser.

For more details see **[DPI_BYPASS_DOC.md](nullgaze/DPI_BYPASS_DOC.md)**.

---

## Installation

### Standard (image downloading + scraping)
```bash
pip install NullGazeX
```

### With browser support (for extreme JS challenges)
```bash
pip install "NullGazeX[browser]"  # quotes keep shells such as zsh from expanding the brackets
```

### Full installation (everything)
```bash
pip install "NullGazeX[full]"  # quotes keep shells such as zsh from expanding the brackets
```

Dependencies are resolved automatically:

| Extra | Includes |
| :--- | :--- |
| *(none)* | `requests`, `curl_cffi` |
| `browser` | `DrissionPage` (headless Chromium) |
| `scrape` | `beautifulsoup4` (HTML parsing) |
| `full` | everything above |
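
To check which extras are usable in the current environment, a small helper along these lines may be handy. It is only a sketch: `OPTIONAL_MODULES` and `available_extras` are hypothetical names, not part of NullGazeX, and the mapping assumes the usual import names (`DrissionPage` for DrissionPage, `bs4` for beautifulsoup4).

```python
from importlib.util import find_spec

# Assumed mapping from extra name to the import names it provides.
OPTIONAL_MODULES = {
    "browser": ["DrissionPage"],
    "scrape": ["bs4"],
}


def available_extras(modules=OPTIONAL_MODULES):
    """Return {extra: True/False} depending on whether every module is importable."""
    return {
        extra: all(find_spec(name) is not None for name in names)
        for extra, names in modules.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check stays cheap even for heavy packages.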

---

## Quick Start — Image Downloading

### Single image
```python
from nullgaze import download_image

download_image(
    url="https://example.com/images/photo.jpg",
    output_path="outputs/photo.jpg",
    verbose=True
)
```

### Batch download (concurrent)
```python
from nullgaze import download_images

targets = [
    ("https://example.com/img1.jpg", "outputs/img1.jpg"),
    ("https://example.com/img2.jpg", "outputs/img2.jpg"),
    ("https://example.com/img3.jpg", "outputs/img3.jpg"),
]

results = download_images(targets, max_workers=10, verbose=True)

for i, res in enumerate(results):
    if isinstance(res, Exception):
        print(f"Failed {i+1}: {res}")
    else:
        print(f"Saved to: {res}")
```

---

## Quick Start — Page Scraping

### Scrape a page (raw HTML)
```python
from nullgaze import scrape_page

html = scrape_page("https://example.com/article", verbose=True)
print(html[:500])
```

### Scrape clean text
```python
from nullgaze import scrape_text

text = scrape_text("https://example.com/article")
print(text)
```

### Get page title
```python
from nullgaze import scrape_title

title = scrape_title("https://example.com/article")
print(f"Title: {title}")
```

### Scrape a JSON API
```python
from nullgaze import scrape_json

data = scrape_json("https://api.example.com/data")
print(data)
```

### Scrape multiple URLs in parallel
```python
from nullgaze import scrape_bulk

urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3",
]

results = scrape_bulk(urls, max_workers=10, verbose=True)
for i, res in enumerate(results):
    if isinstance(res, Exception):
        print(f"Failed {urls[i]}: {res}")
    else:
        print(f"Success: {len(res)} chars")
```
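
The bulk helpers return a list in which failed items are `Exception` instances. Assuming the results keep the same order as the inputs (as the `urls[i]` lookup above suggests), a small helper can split them into successes and failures. `partition_results` is a hypothetical name used for illustration, not a NullGazeX function.

```python
def partition_results(keys, results):
    """Split bulk results into ({key: value}, {key: exception}) dictionaries."""
    succeeded, failed = {}, {}
    for key, res in zip(keys, results):
        if isinstance(res, Exception):
            failed[key] = res
        else:
            succeeded[key] = res
    return succeeded, failed


if __name__ == "__main__":
    # Stand-in data shaped like the output of a bulk call.
    urls = ["https://example.com/page1", "https://example.com/page2"]
    results = ["<html>...</html>", TimeoutError("timed out")]
    ok, bad = partition_results(urls, results)
    print(f"{len(ok)} succeeded, {len(bad)} failed")
```

Keeping the failures keyed by URL makes it straightforward to retry only those items later.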

---

## Object-Oriented API

### Image Downloader
```python
from nullgaze import ImageDownloader

downloader = ImageDownloader(verbose=True)

custom_headers = {
    "Referer": "https://custom-referer.com/",
    "Cookie": "session=abc123",
}

downloader.download(
    url="https://example.com/protected/image.png",
    output_path="outputs/image.png",
    headers=custom_headers,
    race_timeout=3.0,  # give up after 3s
)
```

### Page Scraper
```python
from nullgaze import PageScraper

scraper = PageScraper(verbose=True)

# All strategies race in parallel
html = scraper.scrape("https://example.com", race_timeout=5.0)

# Get just the title
title = scraper.scrape_title("https://example.com")

# Get clean text
text = scraper.scrape_text("https://example.com")
```

---

## Strategy Pipeline

NullGazeX races these strategies **in parallel** — the first to succeed
wins, the rest are cancelled:

| Tier | Strategy | Speed | Best For |
| :---: | :--- | :---: | :--- |
| 1 | **Direct HTTP** | ⚡⚡⚡  ~0.3s | Unrestricted servers |
| 2 | **DPI-Bypass + TLS Impersonation** | ⚡⚡  ~0.5s | ISP SNI blocks, fingerprint checks |
| 3 | **Stealth Browser (Chromium)** | 🐢  ~3s | Cloudflare Turnstile, JS challenges |
| 4 | **Weserv Proxy** *(images only)* | ⚡⚡  ~1s | General fallback |

The library caches the successful strategy per domain so subsequent
requests to the same domain skip the race and go straight to the
winning strategy.

---

## Anti-Blocking Features

| Mechanism | How It Works |
| :--- | :--- |
| **Fingerprint rotation** | Each request picks a random TLS fingerprint (Chrome 110–124, Safari 15–17, Firefox 116–117, Edge 99–110). |
| **Header rotation** | User-Agent, Accept, Accept-Language, and Sec-Fetch-* headers are randomized per request. |
| **Jittered delays** | 50–200 ms random delay between bulk dispatches to avoid burst detection. |
| **Adaptive cooldown** | When a server returns 429/503, the domain is cooled down with exponential backoff (2s → 4s → 8s → capped at 30s). |
| **Session cycling** | curl\_cffi sessions are cycled every 25 requests to rotate TLS session state. |
| **Domain strategy cache** | Caches which strategy works per domain — avoids re-discovering on every request. |
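
The timing rules in the table can be expressed in a few lines. The sketch below is not taken from the NullGazeX source: the function and class names are hypothetical, and it assumes the backoff keeps doubling after 8s (so 16s, then the 30s cap).

```python
import random


def cooldown_seconds(consecutive_blocks, base=2.0, cap=30.0):
    """Cooldown after the n-th consecutive 429/503: 2s, 4s, 8s, ... capped at 30s."""
    if consecutive_blocks < 1:
        return 0.0
    return min(cap, base * 2 ** (consecutive_blocks - 1))


def jitter_seconds(low_ms=50, high_ms=200):
    """Random pause between bulk dispatches, returned in seconds."""
    return random.uniform(low_ms, high_ms) / 1000.0


class SessionCycler:
    """Hand out a session object and replace it after `limit` uses."""

    def __init__(self, factory, limit=25):
        self._factory = factory  # any zero-argument callable that builds a session
        self._limit = limit
        self._uses = 0
        self._session = factory()

    def get(self):
        if self._uses >= self._limit:
            self._session = self._factory()
            self._uses = 0
        self._uses += 1
        return self._session


if __name__ == "__main__":
    print([cooldown_seconds(n) for n in range(1, 7)])
    print(f"next dispatch in {jitter_seconds():.3f}s")
```

Passing the session factory in as a callable keeps the cycling logic independent of any particular HTTP client.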

---

## Error Handling

```python
from nullgaze import (
    download_image,
    scrape_page,
    DownloadFailedError,
    InvalidURLError,
    ScrapeError,
    BlockedError,
)

# Image errors
try:
    download_image("https://example.com/img.jpg", "output.jpg")
except InvalidURLError:
    print("Bad URL")
except DownloadFailedError as e:
    print(f"All strategies failed: {e}")

# Scraping errors
try:
    html = scrape_page("https://example.com")
except BlockedError:
    print("Server explicitly blocked us (403/429/503)")
except ScrapeError as e:
    print(f"Scraping failed: {e}")
```

---

## Pre-Warming for Maximum Speed

Call `engine_prewarm()` at startup to launch the DPI-bypass proxy and
open a trial connection ahead of time.  The first real request then
skips that setup cost; the project quotes under 100 ms for this warm
path:

```python
from nullgaze import engine_prewarm, download_image

engine_prewarm()  # proxy ready, connections hot

# Now every request starts at full speed
download_image("https://example.com/img.jpg", "output.jpg")
```

---

## License

MIT License.  See [LICENSE](LICENSE).

---

## Disclaimer

This library is provided as a **research and educational tool** for
studying network filtering, DPI, TLS fingerprinting, and web scraping
techniques.  The author is **not responsible** for how you use it.
You are solely responsible for ensuring your use complies with all
applicable laws, regulations, and the terms of service of any website
you interact with.  By using this software you acknowledge these terms.
