Astroturf

Analysis request

EPA — Waters of the United States (WOTUS) Definition

req_0g2ca0v4f / created 5/31/2026, 8:52:55 PM

Databricks Jobs mode: failed

Rulemaking Metadata

Docket ID: EPA-HQ-OW-2021-0602
Agency ID: EPA
Topic: environment_water
Data Source: regulations_gov
Expected Scale: ~2302 comments
Date Window: Full Historical Ingestion

Notes / Reviewer Context

"Requested via discovered rulemakings panel. (estimated runtime at submission: ~2.6 hr; bottleneck: parsing)"

Status & Control Plane

Execution Failure:

Task main_analysis_job failed with message: Workload failed, see run output for details.

Databricks Workspace Integration
Active Databricks Run ID: 509518781365215
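The failure message only points at the run output. Below is a minimal sketch for pulling each task's error out of that run. It assumes the Databricks SDK for Python (`databricks-sdk`) and its `jobs.get_run` / `jobs.get_run_output` calls. The helper name `collect_task_errors` is hypothetical. The client is passed in as a parameter, so the logic can be exercised without a workspace.

```python
from typing import Any, Dict


def collect_task_errors(client: Any, run_id: int) -> Dict[str, str]:
    """Return {task_key: error text} for every task in a job run that reported an error.

    `client` is assumed to behave like databricks.sdk.WorkspaceClient, exposing
    client.jobs.get_run(run_id=...) and client.jobs.get_run_output(run_id=...).
    """
    run = client.jobs.get_run(run_id=run_id)
    errors: Dict[str, str] = {}
    # In a multi-task job, output is fetched per task run (task.run_id),
    # not for the parent job run.
    for task in run.tasks or []:
        output = client.jobs.get_run_output(run_id=task.run_id)
        if output.error:
            # Prefer the full stack trace when the workload provided one.
            errors[task.task_key] = output.error_trace or output.error
    return errors
```

Against a real workspace, you would call it after `from databricks.sdk import WorkspaceClient` as `collect_task_errors(WorkspaceClient(), 509518781365215)`. The entry for `main_analysis_job` should then hold the underlying error behind "Workload failed".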

Command-Generation Mode

To run this pipeline on your own machine instead of on hosted Databricks, run the following commands in order from your terminal:

.uv-test-venv\Scripts\python.exe scripts\run_ingestion.py --docket-id EPA-HQ-OW-2021-0602
.uv-test-venv\Scripts\python.exe scripts\run_embedding.py --docket-id EPA-HQ-OW-2021-0602 --backend databricks
.uv-test-venv\Scripts\python.exe scripts\run_clustering.py --docket-id EPA-HQ-OW-2021-0602 --clustering-mode vector_search

Command-generation mode runs comment ingestion, embedding, and clustering as Python scripts on your machine. These scripts write directly to your local Delta lakehouse.
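The commands are listed as a sequence, and each stage presumably consumes the previous stage's output (ingested comments, then embeddings). Below is a minimal sketch of a wrapper that runs the three stages in that order and stops at the first failure. It assumes the scripts exit with a non-zero status when they fail, and that the wrapper is launched from the repository root. `build_pipeline_commands` and `run_pipeline` are hypothetical helper names, not part of the project.

```python
import subprocess
from typing import Callable, List, Sequence

# Interpreter path copied from the commands above (a venv on Windows).
DEFAULT_PYTHON = r".uv-test-venv\Scripts\python.exe"


def build_pipeline_commands(docket_id: str, python: str = DEFAULT_PYTHON) -> List[List[str]]:
    """Return the three pipeline stages, in order, as argv lists."""
    return [
        [python, r"scripts\run_ingestion.py", "--docket-id", docket_id],
        [python, r"scripts\run_embedding.py", "--docket-id", docket_id,
         "--backend", "databricks"],
        [python, r"scripts\run_clustering.py", "--docket-id", docket_id,
         "--clustering-mode", "vector_search"],
    ]


def run_pipeline(commands: Sequence[Sequence[str]],
                 runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> None:
    """Run each stage in order and stop at the first stage that exits non-zero."""
    for argv in commands:
        # check=True raises CalledProcessError on a non-zero exit code, so a
        # failed stage stops the run before the next stage starts.
        runner(list(argv), check=True)
```

For this docket, the call would be `run_pipeline(build_pipeline_commands("EPA-HQ-OW-2021-0602"))`. Passing the argv lists directly, rather than a shell string, avoids quoting issues with the backslash paths.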