Astroturf

Analysis request

Waters of the United States

req_p9qco3n8k / created 6/1/2026, 6:46:58 PM

Databricks Jobs mode / failed

Rulemaking Metadata

Docket ID: EPA-HQ-OW-2021-0602
Agency ID: EPA
Topic: environment
Data Source: regulations_gov
Expected Scale: ~5000 comments
Date Window: Full Historical Ingestion
Notes / Reviewer Context

"Smart batch ingestion. Environment/water rule to diversify beyond air/climate."

Status & Control Plane

Execution Failure:

Task main_analysis_job failed with message: Workload failed, see run output for details.

Databricks Workspace Integration
Active Databricks Run ID: 227024140339541

Command-Generation Mode

To run this pipeline locally instead of on hosted Databricks, run the following commands in order in your terminal:

.uv-test-venv\Scripts\python.exe scripts\run_ingestion.py --docket-id EPA-HQ-OW-2021-0602
.uv-test-venv\Scripts\python.exe scripts\run_embedding.py --docket-id EPA-HQ-OW-2021-0602 --backend databricks
.uv-test-venv\Scripts\python.exe scripts\run_clustering.py --docket-id EPA-HQ-OW-2021-0602 --clustering-mode vector_search

Command-generation mode runs comment ingestion and clustering locally via Python scripts and writes directly to your local Delta lakehouse.
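
The three commands above are meant to run in sequence, so it may help to stop as soon as one step fails rather than continue to embedding or clustering on incomplete data. The following is a minimal sketch of that idea, not part of the tool itself: the helper names build_commands and run_pipeline are hypothetical, and the interpreter path, script paths, and flags are copied verbatim from the commands listed above.

```python
import subprocess
from typing import Callable, List, Sequence

# Interpreter path copied from the commands above (Windows virtualenv layout).
PYTHON = r".uv-test-venv\Scripts\python.exe"


def build_commands(docket_id: str) -> List[List[str]]:
    """Return the three pipeline steps as argument lists, in run order.

    Hypothetical helper: it only reassembles the commands shown on this page.
    """
    return [
        [PYTHON, r"scripts\run_ingestion.py", "--docket-id", docket_id],
        [PYTHON, r"scripts\run_embedding.py", "--docket-id", docket_id,
         "--backend", "databricks"],
        [PYTHON, r"scripts\run_clustering.py", "--docket-id", docket_id,
         "--clustering-mode", "vector_search"],
    ]


def run_pipeline(
    commands: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> int:
    """Run each step in order and stop at the first non-zero exit code.

    Returns 0 if every step succeeded, otherwise the failing step's exit code.
    The `run` parameter exists so the sequencing can be exercised without
    launching real processes.
    """
    for cmd in commands:
        code = run(cmd)
        if code != 0:
            return code
    return 0
```

Calling run_pipeline(build_commands("EPA-HQ-OW-2021-0602")) from the repository root would launch the three scripts one after another. Passing a different callable as run allows a dry run that only records the commands.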