Analysis request
Waters of the United States
req_p9qco3n8k / created 6/1/2026, 6:46:58 PM
Databricks Jobs mode / failed
Rulemaking Metadata
Docket ID: EPA-HQ-OW-2021-0602
Agency ID: EPA
Topic: environment
Data Source: regulations_gov
Expected Scale: ~5000 comments
Date Window: Full Historical Ingestion
Notes / Reviewer Context
"Smart batch ingestion. Environment/water rule to diversify beyond air/climate."
Status & Control Plane
Execution Failure:
Task main_analysis_job failed with message: Workload failed, see run output for details.
Databricks Workspace Integration
Active Databricks Run ID: 227024140339541
Command-Generation Mode
To run this pipeline locally instead of on hosted Databricks, run the following commands in order in your terminal:
.uv-test-venv\Scripts\python.exe scripts\run_ingestion.py --docket-id EPA-HQ-OW-2021-0602
.uv-test-venv\Scripts\python.exe scripts\run_embedding.py --docket-id EPA-HQ-OW-2021-0602 --backend databricks
.uv-test-venv\Scripts\python.exe scripts\run_clustering.py --docket-id EPA-HQ-OW-2021-0602 --clustering-mode vector_search
Command-generation mode lets you run comment ingestion and clustering locally via Python scripts that write directly to your local Delta lakehouse.
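
The commands above depend on each other's output, so it may help to chain them and stop at the first failure. The following is a minimal sketch of one way to do that, not part of the pipeline itself: the wrapper and its names (`build_commands`, `run_pipeline`) are hypothetical, while the script paths and flags are copied from the commands above. It assumes you run it from the repository root with the same Windows virtual environment.

```python
import subprocess

# Values copied from the generated commands above.
DOCKET_ID = "EPA-HQ-OW-2021-0602"
PYTHON_EXE = r".uv-test-venv\Scripts\python.exe"


def build_commands(docket_id, python_exe=PYTHON_EXE):
    """Return the three pipeline commands as argument lists, in run order."""
    return [
        [python_exe, r"scripts\run_ingestion.py", "--docket-id", docket_id],
        [python_exe, r"scripts\run_embedding.py", "--docket-id", docket_id,
         "--backend", "databricks"],
        [python_exe, r"scripts\run_clustering.py", "--docket-id", docket_id,
         "--clustering-mode", "vector_search"],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order and stop at the first non-zero exit code.

    Returns the number of steps that completed successfully.
    `runner` is injectable (hypothetical convenience) so the loop can be
    exercised without launching the real scripts.
    """
    completed = 0
    for cmd in commands:
        result = runner(cmd)
        if result.returncode != 0:
            print(f"Step failed (exit {result.returncode}): {' '.join(cmd)}")
            break
        completed += 1
    return completed
```

Calling `run_pipeline(build_commands(DOCKET_ID))` would run ingestion, then embedding, then clustering, and skip the later steps if an earlier one exits with an error.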