Astroturf

Analysis request

Protecting and Promoting the Open Internet

req_vhw3zf3zi / created 6/1/2026, 6:47:00 PM

Databricks Jobs mode: submitted

Rulemaking Metadata

Docket ID: 14-28
Agency ID: FCC
Topic: tech-regulation
Data Source: ecfs
Expected Scale: ~5000 comments
Date Window: Full Historical Ingestion
Notes / Reviewer Context

"Smart batch ingestion. One older FCC control case, capped so net neutrality does not dominate."

Status & Control Plane

Live pipeline progress
Auto-refreshing every 10 s; no manual sync needed.
Elapsed / Projected total: under 1 min / ~38 min
1. Ingest (ACTIVE): 0 / 5,000
2. Parse: 0 / 5,000
3. Embed: 0 / 5,000
4. Cluster: 0
5. Export: 0
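The stage counters above can be modeled as a small progress structure. The sketch below is hypothetical: the `Stage` class, the equal weighting of the row-counted stages, and the linear extrapolation in `projected_total_minutes` are assumptions for illustration, not necessarily how this dashboard derives its "~38 min" figure (which it already reports while every counter is still 0, so it must come from another estimate).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    name: str
    done: int
    total: Optional[int]  # None for stages without a fixed row target (e.g. Cluster, Export)


def overall_fraction(stages):
    """Average completion over stages that report a row target (hypothetical equal weighting)."""
    counted = [s for s in stages if s.total]
    if not counted:
        return 0.0
    return sum(min(s.done, s.total) / s.total for s in counted) / len(counted)


def projected_total_minutes(elapsed_min, fraction):
    """Naive linear extrapolation; undefined (None) until some rows have been processed."""
    if fraction <= 0:
        return None
    return elapsed_min / fraction
```

With Ingest at 2,500 / 5,000 and the other counted stages at 0, the overall fraction is 1/6, so 5 elapsed minutes would extrapolate to a 30-minute total under this assumed model.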
The pipeline is running normally. Databricks compute is warming up (a serverless cold start typically takes 2 to 4 min), so the first row counts will appear shortly. You can leave this page open or come back later; progress is saved automatically.
Databricks Workspace Integration
Active Databricks Run ID: 19403666231444
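The 10 s auto-refresh in the status panel can be approximated outside the UI by polling the Databricks Jobs API for this run ID. This is a minimal sketch, assuming the Jobs API 2.1 `runs/get` endpoint and a personal access token; the `DATABRICKS_HOST` / `DATABRICKS_TOKEN` environment variable names in the usage comment and the helper names `fetch_run` / `wait_for_run` are hypothetical choices, not part of this tool.

```python
import json
import time
import urllib.request

# Life-cycle states after which a Databricks job run will not change again.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def fetch_run(host, token, run_id):
    """GET /api/2.1/jobs/runs/get for one run (host like https://<workspace>.cloud.databricks.com)."""
    req = urllib.request.Request(
        f"{host}/api/2.1/jobs/runs/get?run_id={run_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_run(fetch, interval_s=10.0, sleep=time.sleep):
    """Poll until the run reaches a terminal life-cycle state; return its state dict."""
    while True:
        state = fetch().get("state", {})
        if state.get("life_cycle_state") in TERMINAL_STATES:
            return state  # includes result_state, e.g. SUCCESS or FAILED
        sleep(interval_s)


# Hypothetical usage (assumed env vars):
#   host, token = os.environ["DATABRICKS_HOST"], os.environ["DATABRICKS_TOKEN"]
#   final = wait_for_run(lambda: fetch_run(host, token, 19403666231444))
```

Passing `fetch` and `sleep` in as callables keeps the polling loop testable without network access.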

Command-Generation Mode

To run this pipeline locally on your system instead of on hosted Databricks, run the following commands in order in your terminal:

.uv-test-venv\Scripts\python.exe scripts\run_ingestion.py --docket-id 14-28
.uv-test-venv\Scripts\python.exe scripts\run_embedding.py --docket-id 14-28 --backend databricks
.uv-test-venv\Scripts\python.exe scripts\run_clustering.py --docket-id 14-28 --clustering-mode vector_search

Command-generation mode runs comment ingestion and clustering locally via Python scripts, writing directly to your local Delta lakehouse.
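The three commands can also be chained from a small Python driver that stops at the first failing step. This is a sketch only: the script paths and flags are copied verbatim from the command list, while `build_commands` and `run_pipeline` are hypothetical helpers. The interpreter path mirrors the Windows virtualenv layout shown above; on macOS or Linux a venv places it under `bin/python` instead.

```python
import subprocess
from pathlib import Path

# Interpreter path as in the command list (Windows venv layout).
PYTHON = str(Path(".uv-test-venv") / "Scripts" / "python.exe")


def build_commands(docket_id, python=PYTHON):
    """Return the ingestion, embedding and clustering commands as argv lists."""
    scripts = Path("scripts")
    return [
        [python, str(scripts / "run_ingestion.py"), "--docket-id", docket_id],
        [python, str(scripts / "run_embedding.py"), "--docket-id", docket_id,
         "--backend", "databricks"],
        [python, str(scripts / "run_clustering.py"), "--docket-id", docket_id,
         "--clustering-mode", "vector_search"],
    ]


def run_pipeline(commands, run=subprocess.run):
    """Run each step in order; check=True raises CalledProcessError on a non-zero exit."""
    for argv in commands:
        run(argv, check=True)


# Hypothetical usage: run_pipeline(build_commands("14-28"))
```

Stopping on the first non-zero exit avoids, for example, clustering a docket whose embedding step did not finish.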