Comparing models and thinking effort - Without modeling

Each model was given 11 insurance-domain questions (covering claims, policies, and premiums) and asked to answer them with two methods: Semantic Layer (MetricFlow queries) and SQL (direct SQL generation). Each model/effort combination was run 5 times to account for run-to-run variance. Three of the 11 questions are "too many hops": they require joins that the Semantic Layer cannot express, so they test whether a model correctly refuses or fails gracefully.
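
Taken together, the setup implies 11 × 2 × 5 = 110 answer attempts per model/effort combination, assuming each of the 5 runs covers every question under both methods. The following Python sketch makes that grid explicit and shows one possible way to score the "too many hops" questions. The scoring rule (on the Semantic Layer, such a question counts as a success only if the model refuses) and the `Answer` fields are hypothetical illustrations, since this page does not describe how answers were graded.

```python
from dataclasses import dataclass
from statistics import mean

# Figures taken from the setup description.
NUM_QUESTIONS = 11
NUM_TOO_MANY_HOPS = 3
METHODS = ("semantic_layer", "sql")
RUNS_PER_COMBINATION = 5


@dataclass
class Answer:
    question_id: int
    too_many_hops: bool  # True if the Semantic Layer cannot express the required joins
    refused: bool        # hypothetical flag: the model declined or failed gracefully
    correct: bool        # hypothetical flag: the returned result matched the expected answer


def is_success(answer: Answer, method: str) -> bool:
    # Hypothetical scoring rule: on the Semantic Layer, a too-many-hops question
    # is a success only when the model refuses; otherwise correctness decides.
    if method == "semantic_layer" and answer.too_many_hops:
        return answer.refused
    return answer.correct


def accuracy(runs: list[list[Answer]], method: str) -> float:
    # Mean per-run accuracy across the repeated runs of one model/effort/method.
    return mean(sum(is_success(a, method) for a in run) / len(run) for run in runs)


# Answer attempts per model/effort combination (assumes every run covers both methods).
attempts = NUM_QUESTIONS * len(METHODS) * RUNS_PER_COMBINATION
print(attempts)  # 110

# Toy data, not real results: questions 8-10 stand in for the too-many-hops set;
# the model answers the first eight correctly and refuses the last three.
run = [
    Answer(q, too_many_hops=q >= 8, refused=q >= 8, correct=q < 8)
    for q in range(NUM_QUESTIONS)
]
print(accuracy([run] * RUNS_PER_COMBINATION, "semantic_layer"))  # 1.0
```

Under this rule, the same toy run scores 8/11 on the SQL method, because the refusals are not counted as correct answers there. That asymmetry is a design assumption of the sketch, not a documented property of the benchmark.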

Note: SQL runs on this page use the schema without modeling (no additional dbt models, raw DDL only).

Summary

Accuracy

Latency

Cost

Tradeoffs