
The Universal Discovery Engine

ARDA

Your agents feed data — time-series, spatial fields, relational graphs, multi-modal observations. ARDA discovers governing equations, causal graphs, and conservation laws. Any domain. One engine. Agent-first.

4 discovery modes. 19 typed claim types. 7 negative controls. Built from first principles — not a wrapper around foundation models. Designed for your agents.

Universal Discovery

One engine. Every domain.
Agent-first.

Your agents send data — time-series, spatial fields, geometric structures, relational graphs, hierarchical observations, tabular experiments, multi-modal measurements. ARDA profiles the data, selects the right discovery mode, runs the pipeline, validates with negative controls, and returns typed scientific claims. Your agents get governed output. You own everything.

Every surface — REST API, Python SDK, MCP, CLI — is agent-first. Your agents call one API for discovery across physics, biology, chemistry, finance, energy, manufacturing, and every scientific domain. Human accessibility built in.

Profiles

Automatically identifies equation classes, temporal structure, spatial topology, variable types, noise characteristics, and interaction patterns in your data.

Routes

Selects the right discovery mode based on data characteristics and your configuration: symbolic, neural, neuro-symbolic, or causal (powered by CDE).

Discovers

Runs the computational pipeline and produces typed scientific claims: governing equations, causal graphs, conservation laws, symmetries, regime transitions.

Validates

Applies negative controls such as time shuffle, phase randomization, bootstrap stability, and out-of-distribution testing, and promotes claims only when they pass.

Records

Writes a hashed evidence ledger entry for every run. Full data provenance, config snapshots, hardware fingerprints, and replay recipes.
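As a rough illustration of what the Profiles step might compute, the sketch below summarizes sampling cadence and missingness for a single time-series with NumPy. The function name `profile_series`, its tolerance parameter, and the returned keys are hypothetical and are not part of ARDA's API.

```python
import numpy as np

def profile_series(t, y, rel_tol=1e-3):
    """Summarize cadence and missingness (hypothetical helper, not ARDA's API)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    dt = np.diff(t)
    median_dt = float(np.median(dt))
    # Sampling counts as regular when every step is within rel_tol of the median step.
    regular = bool(np.all(np.abs(dt - median_dt) <= rel_tol * median_dt))
    return {
        "n_samples": int(y.size),
        "median_dt": median_dt,
        "regular_sampling": regular,
        "missing_fraction": float(np.mean(np.isnan(y))),
    }

t = np.array([0.0, 0.1, 0.2, 0.3, 0.5])     # the last step is twice as long
y = np.array([1.0, np.nan, 0.8, 0.7, 0.5])  # one missing observation
profile = profile_series(t, y)
```

A profile like this is the kind of metadata a router could use to rule out methods that assume uniform sampling or complete data.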

Input Data

Any data with underlying dynamics

ARDA is not limited to time-series. Bring any structured observation where governing relationships exist to be discovered.

Time-series

Sensor readings, experimental traces, financial ticks, and any temporally ordered observations with regular or irregular sampling.

Spatial fields

2D/3D scalar and vector fields from simulations, imaging, or environmental monitoring on grids or unstructured meshes.

Geometric

Point clouds, meshes, manifolds, and shape data where the geometry itself encodes physical or biological structure.

Hierarchical

Multi-scale and nested data: molecular–cellular–tissue, component–subsystem–system, or any level-separated observation structure.

Relational

Graphs, networks, and interaction matrices: protein interactions, supply chains, circuit topologies, social dynamics, or causal diagrams.

Tabular

Feature-observation matrices from experiments, surveys, or databases. ARDA discovers governing relationships across columns.

Multi-modal

Combined modalities: time-series with images, spectra with metadata, text annotations with measurements. Fused through explicit interfaces.

The pipeline

From ingestion to ledger

Discovery runs follow a fixed sequence of stages so provenance stays intact. Skipping a stage is an explicit configuration choice, not a hidden shortcut.

01

Data ingestion

Observational streams, experiments, and simulation exports enter ARDA with stable fingerprints so downstream stages reference the same inputs. Schemas are normalized where needed, and lineage records sources, time ranges, and preprocessing assumptions before any discovery run begins.

02

Profiling

The engine summarizes sampling cadence, missingness, noise structure, dimensionality, and signs of multiple regimes or non-stationarity. That profile constrains which discovery modes are appropriate and supplies metadata that validation stages reuse later.

03

Mode selection

Given the profile and your configuration, ARDA selects symbolic, neural, neuro-symbolic, or causal-dynamics paths, or a staged combination. The choice is recorded in run metadata so reviewers can see why a strategy was used and revisit it when data or policies change.

04

Discovery

The active mode searches for structure: equations, learned dynamics, hybrid representations, or causal mechanisms, within the limits you set. Intermediate artifacts stay linked to configuration snapshots so the same recipe can be replayed or compared across environments.

05

Validation

Results are checked against held-out data, negative controls, and domain-specific sanity tests before they become candidates for promotion. Failures are stored with context—fit, identifiability, stability, or policy—so a run is explainable, not only marked unsuccessful.

06

Claims

Structure that passes validation is emitted as typed scientific claims: scoped statements with fields for assumptions, evidence links, and governance state. Claims are the interchange format between ARDA, people, and your own agents; they are simpler to diff, audit, and compose than unstructured prose.

07

Evidence ledger

Each run appends a versioned ledger entry: input hashes, configuration, outputs, and claim lineage. The ledger joins data, compute, and scientific statements, so you can trace forward from raw inputs or backward from any promoted claim.
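The fixed stage order, with skips recorded rather than hidden, can be pictured with a small sketch. The stage identifiers and the `plan_run` function below are hypothetical names chosen for illustration. They mirror the seven stages listed above but are not ARDA's configuration schema.

```python
# Stage names follow the seven pipeline steps described above (identifiers are assumed).
STAGES = [
    "ingestion", "profiling", "mode_selection", "discovery",
    "validation", "claims", "ledger",
]

def plan_run(skip=()):
    """Return the ordered stage plan plus an explicit record of skipped stages."""
    unknown = set(skip) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    executed = [s for s in STAGES if s not in skip]
    skipped = [s for s in STAGES if s in skip]
    return {"executed": executed, "skipped": skipped}

plan = plan_run(skip=["validation"])
```

Because the skipped stages are part of the returned plan, a reviewer can see from the run record alone that validation was bypassed on purpose.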

Discovery Modes

Four ways to discover governing laws

Each mode solves a different class of discovery problem. ARDA selects the right one based on your data profile, or you choose explicitly.

Mode 1

Symbolic discovery

Symbolic discovery searches for compact mathematical forms that govern your data, subject to constraints you define. It covers ordinary differential equations (ODEs), partial differential equations (PDEs), stochastic differential equations (SDEs), and graphical relational (GR) structure, where variables interact through an explicit dependency pattern.

Outputs are closed-form equations: relationships a reviewer can read, differentiate, and test on new data without treating the model as an opaque function approximator.
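To make "closed-form equation from data" concrete, here is a minimal sketch using a generic sparse-regression technique (sequentially thresholded least squares over a small library of candidate terms). It is not ARDA's algorithm, which this page does not describe. The library, the threshold, and the function name `stlsq` are assumptions for illustration.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares over a candidate-term library."""
    coef = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0            # drop terms with negligible weight
        keep = ~small
        if not keep.any():
            break
        # Refit using only the surviving terms.
        coef[keep] = np.linalg.lstsq(theta[:, keep], dxdt, rcond=None)[0]
    return coef

t = np.linspace(0.0, 2.0, 2001)
x = np.exp(-2.0 * t)                                   # exact solution of dx/dt = -2x
dxdt = np.gradient(x, t, edge_order=2)                 # numerical derivative
theta = np.column_stack([np.ones_like(x), x, x**2])    # library: 1, x, x^2
coef = stlsq(theta, dxdt)                              # weights for [1, x, x^2]
```

On this noise-free toy problem the only surviving term is the one on `x`, with a weight close to -2, which a reviewer can read directly as dx/dt = -2x.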

Mode 2

Neural discovery

Neural discovery finds governing patterns in high-dimensional, noisy data where compact closed-form laws are unlikely. Discovered representations remain physically consistent, so results are scientifically meaningful rather than merely a good statistical fit.

This mode fits when state is only partly observed, when coupling spans many channels, or when the system is too complex for a single equation. Uncertainty is quantified before results are summarized into claims.

Mode 3

Neuro-Symbolic discovery

Neuro-Symbolic discovery combines learning from complex, noisy, or heterogeneous data with extraction of interpretable equations. It handles sensor fusion, missing data, and nonlinear relationships, then distills the results into governing laws you can read and verify.

Teams can compare residuals to discovered equations, require agreement before promotion, or iterate — tightening the interpretable laws and letting the engine capture what remains unexplained.

Mode 4

Causal discovery (CDE)†

The causal mode targets systems whose behavior is organized by causal mechanisms and interventions. Powered by ARDA's Causal Dynamics Engine (CDE), it learns how entities influence one another along trajectories and focuses on what would change if the generative mechanism were perturbed.

CDE actively proposes targeted experiments designed to resolve ambiguous causal edges, so the measurement budget is spent where it most reduces structural uncertainty. Outputs include directed causal graphs with edge probabilities and an identifiability analysis that records what the current experimental design can and cannot distinguish.
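One simple way to picture "targeting the most ambiguous edge" is to rank candidate edges by the binary entropy of their current probability. This is an illustrative stand-in only. The page does not state which criterion CDE uses, and the edge names and probabilities below are made up for the example.

```python
import math

def edge_entropy(p):
    """Binary entropy (in bits) of the belief that an edge exists."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def most_ambiguous_edge(edge_probs):
    """Pick the edge whose existence is least certain under current beliefs."""
    return max(edge_probs, key=lambda e: edge_entropy(edge_probs[e]))

# Hypothetical beliefs about three directed edges.
edge_probs = {("A", "B"): 0.95, ("B", "C"): 0.52, ("A", "C"): 0.10}
target = most_ambiguous_edge(edge_probs)
```

An experiment that intervenes on the source node of the selected edge would be the natural next probe under this simplified criterion.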

Architecture

Composable and domain-agnostic

ARDA's architecture is composable: functional roles in the pipeline can be swapped, extended, or combined to match your domain and data type. Each role has versioned implementations and ledger references so runs stay reproducible.

This design means ARDA adapts to new domains without rewriting the pipeline. Whether you bring temporal data, spatial fields, relational graphs, or multi-modal observations, the engine assembles the right configuration automatically.

Simulation universes

Built-in worlds for validation

ARDA ships with built-in simulation universes for validating discovery modes and benchmarking configurations. Each universe has known governing equations or dynamics, so you can check whether symbolic, neural, and causal paths recover structure within tolerance before relying on proprietary data.

Spring

Pendulum

Lorenz

Lotka-Volterra

Van der Pol

Duffing

Brusselator

Glycolysis

FitzHugh-Nagumo

Kuramoto

Hodgkin-Huxley

CSTR

Wave

Heat

Burgers

Navier-Stokes

Tokamak Plasma

Battery Cell

Ground truth in these universes supports regression testing, mode comparison, and operator training on failure modes without touching real systems until pipeline behavior is understood.
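A ground-truth universe can be as small as an integrator plus a known right-hand side. The sketch below generates a Spring trajectory with a classic fixed-step RK4 scheme and compares it to the analytic solution. The function names and parameter values are assumptions and say nothing about how ARDA's built-in universes are implemented.

```python
import numpy as np

def rk4_simulate(rhs, state0, dt, n_steps):
    """Integrate d(state)/dt = rhs(state) with classic fixed-step RK4."""
    traj = np.empty((n_steps + 1, len(state0)))
    traj[0] = state0
    for i in range(n_steps):
        s = traj[i]
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * dt * k1)
        k3 = rhs(s + 0.5 * dt * k2)
        k4 = rhs(s + dt * k3)
        traj[i + 1] = s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj

def spring_rhs(state):
    """Unit-mass, unit-stiffness spring: x' = v, v' = -x."""
    x, v = state
    return np.array([v, -x])

# Start at x = 1, v = 0 and integrate to t = 10; the exact position is cos(t).
traj = rk4_simulate(spring_rhs, np.array([1.0, 0.0]), dt=0.01, n_steps=1000)
```

Swapping `spring_rhs` for another system's right-hand side yields a new benchmark whose governing equations are known in advance.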

Scientific Output

Typed scientific claims, not free text

Every ARDA discovery produces typed, machine-readable scientific claims. Each claim carries metadata, confidence scoring, provenance, and governance status. Not paragraphs. Not unstructured output. Typed knowledge that can be audited, compared, and reproduced.

  • LawClaim
  • CausalClaim
  • ConservationClaim
  • StructureClaim
  • RegimeClaim
  • DecompositionClaim
  • TheoryFamilyClaim
  • SymmetryClaim
  • OperatorClaim
  • FieldClaim
  • ScopeClaim
  • UncertaintyClaim
  • InvariantSetClaim
  • IndeterminacyClaim
  • TheoryRevisionClaim
  • ExperimentRecommendation
  • CDEIdentifiabilityClaim
  • CDEPathLawClaim
  • CDEOODResponseClaim
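To show why a typed claim is easier to audit than prose, here is a minimal sketch of what one claim record could look like. Only the class name `LawClaim` comes from the list above. Every field name and the governance state values are hypothetical and do not reflect ARDA's actual schema.

```python
from dataclasses import dataclass, field, asdict
from enum import Enum
import json

class GovernanceState(str, Enum):
    # Illustrative states; the real set of states is not specified on this page.
    HYPOTHESIS = "hypothesis"
    PROVISIONAL = "provisional"
    PROMOTED = "promoted"

@dataclass
class LawClaim:
    expression: str                                       # closed-form law as text
    assumptions: list = field(default_factory=list)
    evidence_links: list = field(default_factory=list)    # e.g. ledger entry hashes
    confidence: float = 0.0
    governance_state: GovernanceState = GovernanceState.HYPOTHESIS

    def to_json(self):
        record = asdict(self)
        record["governance_state"] = self.governance_state.value
        return json.dumps(record, sort_keys=True)

claim = LawClaim(
    expression="dx/dt = -2.0*x",
    assumptions=["noise-free samples"],
    confidence=0.9,
)
payload = json.loads(claim.to_json())
```

Because the record serializes with sorted keys, two claims can be compared field by field or diffed as text.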

What ARDA discovers

  • Governing equations — closed-form symbolic expressions with fit quality metrics and complexity scores
  • Causal graphs — directed edges with probabilities, uncertainty estimates, and falsification tests
  • Conservation laws — conserved quantities with drift analysis over time
  • Symmetries and invariants — preserved transformations and invariant sets in the dynamics
  • Regime transitions — change points, regime properties, and state classification
  • Theory families — competing model family scores with rationale for each
  • Experiment recommendations — probes designed to maximize information gain about uncertain edges
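The "drift analysis" attached to a conservation law can be pictured as a trend test on the candidate conserved quantity. The sketch below fits a straight line to the energy of a unit spring and to an artificially decaying copy. The function name `drift_rate` and the decay factor are illustrative assumptions, not ARDA's method.

```python
import numpy as np

def drift_rate(t, quantity):
    """Slope of a straight-line fit to a candidate conserved quantity over time."""
    slope, _intercept = np.polyfit(t, quantity, 1)
    return float(slope)

t = np.linspace(0.0, 20.0, 2001)
x, v = np.cos(t), -np.sin(t)                  # exact unit-spring trajectory
energy = 0.5 * v**2 + 0.5 * x**2              # conserved: equals 0.5 at every time
damped_energy = energy * np.exp(-0.1 * t)     # illustrative loss, not conserved

conserved_drift = drift_rate(t, energy)
leaking_drift = drift_rate(t, damped_energy)
```

A slope indistinguishable from zero supports the conservation claim, while a clearly negative slope signals that the quantity is leaking.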

Evidence Ledger

Every run writes a hashed, versioned record of everything that happened. Not a log file — a structured evidence entry that supports audit, reproduction, and peer review.

Data Provenance

  • Dataset hash
  • Config hash
  • Split ratios
  • YAML snapshot

Run Metadata

  • Git commit
  • Hardware fingerprint
  • Library versions
  • Timestamps

Results

  • Primary metrics
  • Per-regime metrics
  • Claims list
  • Causal beliefs

Governance

  • Controls results
  • Determinism tier
  • Promotion status
  • Replay recipe
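A hashed entry of this kind can be sketched with the standard library alone. The example below hashes the dataset bytes and a canonical JSON form of the config, then hashes the entry body itself. The field names and the choice of SHA-256 are assumptions for illustration, since the page does not specify the ledger format.

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def canonical(obj) -> bytes:
    """Serialize with sorted keys so equal content always hashes the same."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")

def ledger_entry(dataset_bytes, config, claims):
    """Build a hypothetical ledger entry; field names are illustrative."""
    body = {
        "dataset_hash": sha256_hex(dataset_bytes),
        "config_hash": sha256_hex(canonical(config)),
        "claims": claims,
    }
    return {**body, "entry_hash": sha256_hex(canonical(body))}

data = b"t,x\n0,1\n"
entry_a = ledger_entry(data, {"mode": "symbolic", "seed": 7}, ["LawClaim"])
entry_b = ledger_entry(data, {"seed": 7, "mode": "symbolic"}, ["LawClaim"])
entry_c = ledger_entry(data + b"1,2\n", {"mode": "symbolic", "seed": 7}, ["LawClaim"])
```

Key order in the config does not affect the hash, but any change to the dataset bytes produces a different entry.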

Governance

If a discovery can't be reproduced, it isn't a discovery

Governance in ARDA is structural, not optional. Every claim is typed. Every run produces a hashed evidence ledger entry. Every discovery can be reproduced with a single Truth Dial setting. The governance stack enforces reproducibility from the first run.

The Truth Dial is a single control that governs the rigor-speed tradeoff across the entire pipeline. Set it based on where you are in the research process.

Negative controls are not an afterthought. ARDA applies time-shuffled baselines, phase-randomized controls, label-permutation tests, noise robustness checks, bootstrap stability analysis, feature-shuffle tests, and out-of-distribution evaluations. Claims that survive all applicable controls get promoted. Claims that fail are flagged and recorded in the evidence ledger with the specific control that caused the failure.
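As an illustration of the time-shuffle idea, the sketch below compares a statistic that depends on temporal order (lag-1 autocorrelation) against the same statistic on shuffled copies of the series. The statistic, the number of shuffles, and the function names are assumptions chosen for the example and are not ARDA's implementation.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation, a simple statistic that depends on time order."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def time_shuffle_pvalue(x, statistic, n_shuffles=200, seed=0):
    """Fraction of time-shuffled copies whose statistic matches or beats the original."""
    rng = np.random.default_rng(seed)
    observed = statistic(x)
    null = np.array([statistic(rng.permutation(x)) for _ in range(n_shuffles)])
    # The add-one correction keeps the estimate away from an exact zero.
    return float((1 + np.sum(null >= observed)) / (1 + n_shuffles))

t = np.linspace(0.0, 10.0, 500)
signal = np.sin(t)                       # strongly time-ordered structure
p_value = time_shuffle_pvalue(signal, lag1_autocorr)
```

A small value indicates that the structure disappears when time order is destroyed, which is what a claim about dynamics should show. A claim that survives shuffling is probably not about dynamics at all.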

Explore

Fast iteration. No negative controls enforced. Claims are tagged as hypotheses. Use this for initial data exploration and rapid ideation.

Validate

Negative controls are applied: time shuffle, phase randomization, label permutation, noise robustness. Determinism tier 1+. Claims that pass are promoted to provisional status.

Publish

Full control suite including bootstrap stability, feature shuffle, and out-of-distribution testing. Determinism tier 3 with seeded randomness. Generates a complete replay recipe with frozen config and pinned library versions.
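The three tiers can be read as a mapping from dial setting to required controls. The sketch below encodes that mapping as described above. The identifiers, the `promotion_status` function, and the "promoted" label for the Publish tier are hypothetical, since the page names only the hypothesis and provisional states.

```python
from enum import Enum

class TruthDial(Enum):
    EXPLORE = "explore"
    VALIDATE = "validate"
    PUBLISH = "publish"

VALIDATE_CONTROLS = [
    "time_shuffle", "phase_randomization", "label_permutation", "noise_robustness",
]
PUBLISH_EXTRA = ["bootstrap_stability", "feature_shuffle", "out_of_distribution"]

CONTROLS = {
    TruthDial.EXPLORE: [],
    TruthDial.VALIDATE: VALIDATE_CONTROLS,
    TruthDial.PUBLISH: VALIDATE_CONTROLS + PUBLISH_EXTRA,
}

def promotion_status(dial, results):
    """results maps control name -> True if the claim passed that control."""
    if dial is TruthDial.EXPLORE:
        return "hypothesis"
    failed = [c for c in CONTROLS[dial] if not results.get(c, False)]
    if failed:
        # Record which controls caused the failure.
        return "flagged:" + ",".join(failed)
    return "provisional" if dial is TruthDial.VALIDATE else "promoted"

all_pass = {name: True for name in CONTROLS[TruthDial.PUBLISH]}
```

A missing result counts as a failure here, so a claim cannot be promoted by omitting a control.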

Why ARDA

Why your agents need a discovery engine.

Literature agents read papers. Writing systems generate manuscripts. Prediction pipelines fit curves. ARDA discovers governing laws.

Literature-reading platforms search existing papers and summarize what is already known.

ARDA discovers new science. It does not read papers. It takes raw data and finds the governing laws that have never been written down.

Paper-writing systems generate research manuscripts in LaTeX with automated peer review.

ARDA produces typed scientific claims — structured, machine-readable, governed. Not documents. Knowledge objects that can be audited, compared, and built upon.

Prediction pipelines fit black-box models that tell you what might happen next.

ARDA discovers governing equations — the actual mathematical laws. Closed-form expressions a physicist can read. Not a neural network output. Interpretable science.

Domain-specific tools serve one field: drug discovery, materials, or molecular design.

ARDA works wherever there is data with underlying dynamics. Physics, biology, chemistry, finance, manufacturing, climate, energy. The engine is domain-agnostic.

Industries

One engine. Every domain.

Wherever there is observational data with underlying physical, biological, chemical, economic, or engineered dynamics, ARDA can discover the laws that govern it.

Life Sciences & Healthcare

Energy & Resources

Integration

Your agents. Our engine. One API.

ARDA exposes the full discovery engine through multiple access surfaces. Your agents, scripts, and workflows connect through whichever interface fits your stack.

REST API

Full OpenAPI spec. Every resource — runs, campaigns, claims, datasets — is a first-class endpoint.

Python SDK

Synchronous and async clients with typed models. Import, configure, discover in a few lines.

MCP

Standard Model Context Protocol server. AI agents discover and connect to ARDA automatically.

CLI

Full API surface from the command line. Scriptable, pipeable, suitable for CI/CD workflows.

Auto-Discovery

Agents find ARDA via standard /.well-known/ endpoints. Zero configuration. The engine publishes typed manifests that describe every available capability.

Persistent Sessions

Stateful sessions that survive restarts and reconnections. Lifecycle management, task queues with heartbeats, lineage tracking, and state persistence.

Autonomy Policies

Set boundaries for what automated workflows can do. Experiment approval gates, budget ceilings, and safety constraints enforced at the engine level.

Multi-Run Campaigns

Orchestrate multi-phase research campaigns that plan sequences of discovery runs, transfer knowledge across runs, adapt based on prior findings, and build toward complex research objectives that no single run can address.

Session Lifecycle

Every agent session moves through defined lifecycle stages: planning, ready, running, completed. Task queues with heartbeat monitoring ensure no work is lost. Full lineage tracking connects every action to its session context.
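The lifecycle stages named above form a simple linear state machine. The sketch below models the four stages and a heartbeat staleness check. The `Session` class, its method names, and the 30-second timeout are hypothetical and are not taken from ARDA's SDK.

```python
import time

# Allowed transitions between the four lifecycle stages named above.
TRANSITIONS = {"planning": "ready", "ready": "running", "running": "completed"}

class Session:
    """Hypothetical session model; not ARDA's SDK class."""

    def __init__(self, heartbeat_timeout=30.0):
        self.state = "planning"
        self.history = ["planning"]          # lineage of visited stages
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = time.monotonic()

    def advance(self):
        if self.state not in TRANSITIONS:
            raise RuntimeError(f"session already {self.state}")
        self.state = TRANSITIONS[self.state]
        self.history.append(self.state)
        return self.state

    def heartbeat(self):
        self.last_heartbeat = time.monotonic()

    def is_stale(self, now=None):
        """True when no heartbeat arrived within the timeout window."""
        now = time.monotonic() if now is None else now
        return (now - self.last_heartbeat) > self.heartbeat_timeout

session = Session()
for _ in range(3):
    session.advance()
```

Keeping the visited stages in `history` gives each action a place in the session's lineage, and a stale check like `is_stale` is one way a queue could decide to reassign work.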

Custom Contracts

License the engine. Own the discoveries.

You license the engine — the pipeline, the algorithms, the governance stack. Your agents use it. Models built on your data are yours. Discoveries are yours. Cloud-hosted, self-hosted, or air-gapped.

How We Differ

Three paradigms. Only one discovers.

ARDA is built from first principles. Not a prompt layer on GPT. Not a wrapper around an open-source model. A purpose-built engine that discovers governing laws from raw data.

Foundation Model Labs

  • Black-box predictions
  • Predict but don't explain
  • No governance or provenance
  • Benchmark-driven research
  • Scale-first approach

AI Copilots & Assistants

  • Accelerate existing tasks
  • Human-in-the-loop required
  • No autonomous discovery
  • Summarize, don't create knowledge
  • Workflow tool, not an engine

ARDA

  • Governing equations, not predictions
  • Interpretable, typed scientific output
  • Full governance and evidence ledger
  • Autonomous discovery from raw data
  • First-principles, domain-agnostic engine

Engine, not model

A model predicts. An engine discovers.

Symbolic regression tools find equations but have no pipeline. ML models predict but don't explain. Causal discovery packages find correlations, not mechanisms. ARDA is a complete discovery engine: it ingests your data, selects the right discovery mode, builds models internally, validates with falsification testing, and produces governed scientific claims. You bring data. The engine does the rest.

| | Symbolic Regression & ML Tools | ARDA |
|---|---|---|
| Approach | Single method: sparse regression, GP, or neural fitting | 4 modes: symbolic, neural, neuro-symbolic, causal (CDE) |
| Output | Candidate equations or predictions | 19 typed scientific claims, including equations, causal graphs, and conservation laws |
| Causality | Correlations only | Directed causal graphs via interventional reasoning (CDE) |
| Validation | Manual benchmarking | 7 negative controls, including time shuffle, phase randomization, bootstrap, surrogates, and OOD |
| Pipeline | Notebooks and scripts | Automated: ingest → profile → route → discover → validate → govern |
| Model construction | You choose and configure the model | Engine builds models from your data automatically |
| Reproducibility | Seed-dependent, manual tracking | Deterministic replay with hashed evidence ledger |
| Governance | None | Truth Dial tiers, autonomy policies, provenance tracking |

Why ARDA

What open-source cannot do.

Four Discovery Modes

Symbolic, Neural, Neuro-Symbolic, and Causal Discovery (CDE). The engine selects the right mode for your data. Open-source models have one architecture for everything.

Automated Pipeline

Data ingestion → profiling → mode routing → discovery → validation → governance. Not a notebook. Not a prompt. A governed pipeline.

Causal Dynamics Engine (CDE)

Directed causal graphs, not correlations. Active experiment design. Interventional reasoning. No open-source model does this.

Negative Controls

Time shuffle, phase randomization, bootstrap stability, surrogate analysis, and out-of-distribution testing. Built into every run.

Evidence Ledger

Every claim is hashed, versioned, and linked to its data, config, and replay recipe. Deterministic reproducibility.

Truth Dial Governance

Three tiers: Explore (fast, no controls), Validate (4 controls), Publish (full 7-control suite). You choose the rigor-speed tradeoff.

Platform

Your agents. Your data. Your discoveries.

Your data drives discovery

Bring time-series, spatial fields, relational graphs, or any structured observation. The engine profiles it, routes to the right discovery mode, and builds models on it. Small datasets welcome. Your data never leaves your infrastructure.

The engine builds the models

ARDA trains neural architectures, symbolic searchers, neuro-symbolic encoders, and causal mechanisms on your data. You don’t bring a model. The engine builds one from your data, validates it, and produces governed scientific claims.

Deploy on your terms

Cloud-hosted, self-hosted, or air-gapped. Bring your own AI agent and LLM provider. Your data, your compute, your discoveries.

† Causal Dynamics Engine (CDE) is patent pending in the United States and other countries. Vareon, Inc.

Your agents. Our engine.
Universal discovery.

One discovery engine for every scientific and industrial domain. Agent-first. Built from first principles. Governed, reproducible, and ready for production.