Astroturf
ADVANCED CONFIGURATION

Analyze a docket

Configure a rulemaking manually to set its date window and custom ingestion parameters and to generate its Unity Catalog Delta tables. For broad topic monitoring, use the Watchlist or the Discovered Rulemakings panel instead.

System: Active | Execution tier: Databricks Jobs mode

Docket registration

Quick Autofill from Known Dockets
Estimated runtime: ~16 hr

Stage 2 sequentially fetches one detail page per comment from regulations.gov, capped at ~1000 req/hour by api.data.gov. Bottleneck stage: parsing.

setup: ~4 min
ingestion: ~1 hr
parsing: ~15 hr
embedding: ~13 min
clustering: ~4 min
export: ~2 min
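
The ~16 hr figure can be reproduced with simple arithmetic. The following is a minimal sketch, assuming one detail request per comment at the 1000 req/hr cap and treating the other stage durations as fixed at the values listed above (in practice they probably scale with docket size). The function names and the expanded stage names are illustrative assumptions, not part of the Astroturf codebase.

```python
import math

RATE_LIMIT_PER_HOUR = 1000  # api.data.gov cap quoted above

# Non-parsing stage durations in minutes, copied from the estimate above.
# Assumption: the stage names are expansions of the truncated UI labels.
OTHER_STAGES_MIN = {
    "setup": 4,
    "ingestion": 60,
    "embedding": 13,
    "clustering": 4,
    "export": 2,
}


def parsing_hours(comment_count: int, rate_limit: int = RATE_LIMIT_PER_HOUR) -> float:
    """Lower bound on stage 2 time: one detail request per comment."""
    return comment_count / rate_limit


def total_hours(comment_count: int) -> float:
    """Parsing lower bound plus the fixed durations of the other stages."""
    return parsing_hours(comment_count) + sum(OTHER_STAGES_MIN.values()) / 60


def runs_needed(comment_count: int, max_parse_hours: float,
                rate_limit: int = RATE_LIMIT_PER_HOUR) -> int:
    """How many separate runs keep each run's parsing under max_parse_hours."""
    per_run = int(max_parse_hours * rate_limit)
    if per_run <= 0:
        raise ValueError("max_parse_hours is too small for any request")
    return math.ceil(comment_count / per_run)


if __name__ == "__main__":
    print(f"parsing: {parsing_hours(15000):.1f} hr")
    print(f"total:   {total_hours(15000):.1f} hr")
    print(f"runs to stay under 2 hr of parsing each: {runs_needed(15000, 2)}")
```

Splitting by a parsing budget is only one way to size the batches; splitting by date window would be another.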
  • Parsing from regulations.gov is rate-limited by api.data.gov (1000 req/hr), so 15,000 comments need at least 15 hr for the stage 2 detail fetches alone. Consider starting smaller or splitting the run.
  • Runs over 2 hours are higher risk: a cluster restart or transient failure can lose in-flight parser work because the current ParserAgent doesn't checkpoint mid-loop.
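
One way to reduce that risk is to persist progress periodically so that a restarted run skips comments it has already fetched. The sketch below is not the current ParserAgent. `parse_with_checkpoint`, `fetch_detail`, and the JSON checkpoint layout are hypothetical names chosen for illustration, and comment IDs are assumed to be strings.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Iterable


def load_checkpoint(path: str) -> Dict[str, dict]:
    """Return previously saved results, or an empty dict on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def save_checkpoint(path: str, results: Dict[str, dict]) -> None:
    """Write atomically: dump to a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(results, fh)
    os.replace(tmp_path, path)


def parse_with_checkpoint(
    comment_ids: Iterable[str],
    fetch_detail: Callable[[str], dict],  # hypothetical per-comment fetcher
    path: str,
    every: int = 50,
) -> Dict[str, dict]:
    """Fetch each comment once, saving progress every `every` new results."""
    results = load_checkpoint(path)
    pending = 0
    try:
        for comment_id in comment_ids:
            if comment_id in results:
                continue  # already fetched in an earlier run
            results[comment_id] = fetch_detail(comment_id)
            pending += 1
            if pending >= every:
                save_checkpoint(path, results)
                pending = 0
    finally:
        # Flush unsaved work on normal exit or on a Python-level exception.
        if pending:
            save_checkpoint(path, results)
    return results
```

A hard kill of the cluster skips the `finally` block, so up to `every` results can still be lost. A smaller `every` trades extra writes for less rework.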

Orchestration Settings (Databricks Cloud)

Databricks Jobs Mode: Active. Production-safe. Submits docket analysis pipelines directly to your serverless hosted Databricks instance.

Tip: "Submit Analysis Job" sends a pipeline trigger command directly to your hosted Databricks instance, then opens the tracking dashboard so you can monitor progress.

configs/dockets.yaml snippet

- docket_id: "SEC-2023-0001"
  source: "regulations_gov"
  topic_id: "finance"
  agency_id: "SEC"
  title: "SEC Digital Asset Custody Requirements for Registered Investment Advisers"
  date_window:
    start_date: null
    end_date: null
  ingestion_mode: "full"
  expected_scale: 15000
  processing_status: "configured_awaiting_run"
  notes: "Registered ingestion template only; no processed campaign results."
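
Before registering an entry like the one above, it can help to sanity-check it. The following is a minimal sketch that works on an already-parsed entry (a plain dict). The list of required keys and the checks themselves are assumptions inferred from this snippet, not a schema published by the project.

```python
from datetime import date

# Assumption: every key shown in the snippet except the status fields is required.
REQUIRED_KEYS = {
    "docket_id", "source", "topic_id", "agency_id",
    "title", "date_window", "ingestion_mode", "expected_scale",
}


def validate_docket(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry looks well-formed."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - entry.keys())]

    window = entry.get("date_window") or {}
    start, end = window.get("start_date"), window.get("end_date")
    # Null bounds are allowed, as in the snippet above; compare only when both are set.
    if start is not None and end is not None:
        try:
            if date.fromisoformat(str(start)) > date.fromisoformat(str(end)):
                problems.append("date_window: start_date is after end_date")
        except ValueError:
            problems.append("date_window: dates must be ISO formatted (YYYY-MM-DD)")

    scale = entry.get("expected_scale")
    if "expected_scale" in entry and (not isinstance(scale, int) or scale <= 0):
        problems.append("expected_scale: must be a positive integer")
    return problems


EXAMPLE = {
    "docket_id": "SEC-2023-0001",
    "source": "regulations_gov",
    "topic_id": "finance",
    "agency_id": "SEC",
    "title": "SEC Digital Asset Custody Requirements for Registered Investment Advisers",
    "date_window": {"start_date": None, "end_date": None},
    "ingestion_mode": "full",
    "expected_scale": 15000,
}

if __name__ == "__main__":
    print(validate_docket(EXAMPLE) or "entry looks well-formed")
```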

Pipeline commands

.uv-test-venv\Scripts\python.exe scripts\run_ingestion.py --docket-id SEC-2023-0001
.uv-test-venv\Scripts\python.exe scripts\run_embedding.py --docket-id SEC-2023-0001 --backend databricks
.uv-test-venv\Scripts\python.exe scripts\run_clustering.py --docket-id SEC-2023-0001 --clustering-mode vector_search
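
For local runs, the three commands above can be chained so that a failing stage stops the sequence. The sketch below uses only the script names and flags shown above. The wrapper functions are hypothetical, and it assumes the wrapper itself is started with the virtual environment's interpreter (hence `sys.executable`) from the repository root.

```python
import subprocess
import sys
from typing import Callable, List


def build_stage_commands(docket_id: str, python: str = sys.executable) -> List[List[str]]:
    """Return the ingestion, embedding, and clustering commands in run order."""
    return [
        [python, "scripts/run_ingestion.py", "--docket-id", docket_id],
        [python, "scripts/run_embedding.py", "--docket-id", docket_id,
         "--backend", "databricks"],
        [python, "scripts/run_clustering.py", "--docket-id", docket_id,
         "--clustering-mode", "vector_search"],
    ]


def run_pipeline(docket_id: str, runner: Callable = subprocess.run) -> None:
    """Run each stage in order; check=True raises CalledProcessError on failure."""
    for command in build_stage_commands(docket_id):
        runner(command, check=True)


if __name__ == "__main__":
    run_pipeline("SEC-2023-0001")
```

Passing the arguments as a list avoids shell quoting issues, and the `runner` parameter makes it easy to substitute a dry-run function.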

For production-scale runs, use the Databricks workflow task order from the end-to-end runbook: load sample tables, embed, cluster, export dashboard data.

Coverage policy

Analyzed

Appears in primary browsing with semantic clusters and validation receipts.

Baseline only

Appears with exact-hash metrics and an explicit semantic clustering next step.

Ingestion ready

Appears as a workflow or template, never as a zero-result dashboard.

Template topics

  • Telecom & Net Neutrality
  • Climate / Oil & Gas / Methane
  • Finance & Consumer Protection
  • AI & Technology Regulation
  • Privacy & Consumer Protection

Supported agencies

  • FCC
  • EPA
  • CFPB
  • FTC
  • SEC

Tip: Clicking a chip pre-fills the form by finding a known docket for that agency or topic (e.g., /legacy/analyze?agency=SEC autofills the SEC digital-asset-custody docket).
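
The autofill link in the tip can also be built programmatically. This is a small sketch that covers only the `agency` query parameter shown in the example; `autofill_path` is a hypothetical helper, and any parameter name for topics would be a guess, so it is left out.

```python
from urllib.parse import urlencode

# Agency codes taken from the "Supported agencies" list above.
SUPPORTED_AGENCIES = ("FCC", "EPA", "CFPB", "FTC", "SEC")


def autofill_path(agency: str) -> str:
    """Build the relative autofill URL for a supported agency code."""
    code = agency.strip().upper()
    if code not in SUPPORTED_AGENCIES:
        raise ValueError(f"unsupported agency: {agency!r}")
    return "/legacy/analyze?" + urlencode({"agency": code})


if __name__ == "__main__":
    print(autofill_path("sec"))
```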