Astroturf

Analysis request

Overdraft Programs and Fees

req_ekjei4q66 / created 6/1/2026, 6:46:59 PM

Databricks Jobs mode / submitted

Rulemaking Metadata

Docket ID: CFPB-2018-0035
Agency ID: CFPB
Topic: banking-and-lending
Data Source: regulations_gov
Expected Scale: ~5000 comments
Date Window: Full Historical Ingestion
Notes / Reviewer Context

"Smart batch ingestion. Banking/lending topic distinct from payday lending."

Status & Control Plane

Live pipeline progress
Auto-refreshing every 10s - no manual sync needed.
Elapsed / Projected total: under 1 min / ~5.5 hr
1. Ingest (ACTIVE): 0 / 5,000
2. Parse: 0 / 5,000
3. Embed: 0 / 5,000
4. Cluster: 0
5. Export: 0
Pipeline is running normally. Databricks compute is warming up (Serverless cold start typically takes 2-4 min). The first row counts will appear shortly. You can leave this page open or come back later - progress is auto-saved.
Databricks Workspace Integration
Active Databricks Run ID: 508450502549458
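
For anyone who would rather watch this run from a script than from the auto-refreshing page, the following is a minimal sketch and not how the dashboard itself works. It polls the Databricks Jobs API (`GET /api/2.1/jobs/runs/get`) at the same 10-second cadence. It assumes the workspace URL and a personal access token are in the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables; the function names `fetch_run_state` and `wait_for_run` are hypothetical.

```python
import json
import os
import time
import urllib.parse
import urllib.request

# Life-cycle states after which a Databricks run will not change again.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def fetch_run_state(run_id, host=None, token=None):
    """Return the `state` object of a run from the Jobs API.

    Assumption: DATABRICKS_HOST looks like https://<workspace-url> and
    DATABRICKS_TOKEN holds a valid personal access token.
    """
    host = host or os.environ["DATABRICKS_HOST"]
    token = token or os.environ["DATABRICKS_TOKEN"]
    query = urllib.parse.urlencode({"run_id": run_id})
    url = f"{host.rstrip('/')}/api/2.1/jobs/runs/get?{query}"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["state"]


def wait_for_run(run_id, fetch=fetch_run_state, interval_s=10,
                 sleep=time.sleep, max_polls=None):
    """Poll until the run reaches a terminal state, then return that state.

    `fetch` and `sleep` are injectable so the loop can be exercised
    without a network connection or real waiting.
    """
    polls = 0
    while True:
        state = fetch(run_id)
        polls += 1
        if state.get("life_cycle_state") in TERMINAL_STATES:
            return state
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"run {run_id} still active after {polls} polls")
        sleep(interval_s)
```

Calling `wait_for_run(508450502549458)` would block until the run finishes. With a projected total of about 5.5 hours, passing `max_polls` is a sensible safeguard against waiting indefinitely.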

Command-Generation Mode

To run this pipeline locally instead of on hosted Databricks, run the following commands in your terminal, in order:

.uv-test-venv\Scripts\python.exe scripts\run_ingestion.py --docket-id CFPB-2018-0035
.uv-test-venv\Scripts\python.exe scripts\run_embedding.py --docket-id CFPB-2018-0035 --backend databricks
.uv-test-venv\Scripts\python.exe scripts\run_clustering.py --docket-id CFPB-2018-0035 --clustering-mode vector_search

Command-generation mode lets you run comment ingestion and clustering locally via Python scripts that write directly to your local Delta lakehouse.
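
The three local commands above can also be chained from one small driver so that a failed stage stops the sequence before later stages run. This is a sketch under assumptions: it takes for granted that each script exits with a non-zero status on failure, and `build_commands` and `run_pipeline` are hypothetical helper names, not part of the tool. The script paths and flags are copied from the commands shown on this page.

```python
import subprocess
from pathlib import PureWindowsPath

# Paths exactly as shown in the generated commands (Windows layout).
PYTHON = PureWindowsPath(".uv-test-venv/Scripts/python.exe")
SCRIPTS = PureWindowsPath("scripts")


def build_commands(docket_id):
    """Return the ingestion, embedding and clustering commands in order."""
    base = ["--docket-id", docket_id]
    return [
        [str(PYTHON), str(SCRIPTS / "run_ingestion.py"), *base],
        [str(PYTHON), str(SCRIPTS / "run_embedding.py"), *base,
         "--backend", "databricks"],
        [str(PYTHON), str(SCRIPTS / "run_clustering.py"), *base,
         "--clustering-mode", "vector_search"],
    ]


def run_pipeline(docket_id, runner=subprocess.run):
    """Run each stage in sequence; check=True raises on the first failure."""
    for command in build_commands(docket_id):
        runner(command, check=True)
```

Running `run_pipeline("CFPB-2018-0035")` from the repository root on Windows would execute the same sequence as the three commands. The `runner` parameter exists only so the ordering can be checked without launching the scripts.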