Dataform Hypermarche

Same as dbt_hypermarche but with Dataform made for BigQuery

Dataform in GCP

No local install required 😊

Set-up (cf. GCP quickstart)

Dataform IAM roles
- Dataform Admin (roles/dataform.admin) - for repository setup (git ...)
- Dataform Editor (roles/dataform.editor) - workspaces and workflow invocations
Create a Dataform repository in Dataform in GCP
Create and initialize a Dataform development workspace (~ git branch)
Create some table & views (you can execute them as if you where executing sql queries in BigQuery IDE)
Grant BigQuery access to dataform service account
1st dev deployment/materialisation
Have buisness/partner to review the data
Connect to distant git repo
- Note: if using ssh ... in ssh params > public host key ...
  it's the one from Github, Gitlab ... note yours ...
Push branch (or to master if lazy)
Configure prod version (daily git clone) & workflow (daily dataform run)

Dataform's CLI

Install & set-up

cf. GCP doc use-dataform-cli

install node js & npm
install dataform npm i -g @dataform/cli
dataform init-creds to your BigQuery
- choose your region
- choose credential type : ADC (Application Default Credentials) or json key
if new project, in a new directory : dataform init . GCP_PROJECT DEFAULT_LOCATION
- will init bare dataform files/folders

Useful cli commands

dataform --help to check available options
dataform compile to check repo validity
dataform run --dry-run to check sql validity with BigQuery without materialisation
dataform run --schema-suffix cli_demo to suffix materialized schema (ex: cicd tests)
dataform run --actions stg_commande --actions stg_retour_commande materialize those 2 models
dataform run --actions stg_commande --include-deps equivalent to dbt run +stg_commande
dataform run --actions stg_commande --include-dependents equivalent to dbt run stg_commande+
dataform compile --json > dataform_compiled.json to export project lineage, queries ...

Ideas of CICD deployment

dbt VS Dataform

Same concepts

data analyst takes care of select statement
dbt & dataform materialises it (table, view...) in the Data Warehouse (DDL/DML)
1 select = 1 node / model / definition
lineage is infered automatically with ref('ancestor_table') function
documentation is to be done in the tool
data tests can be added to help assert modelisation hypothesis
- generic tests : unique, not_null, ...
- custom tests : select statement filtering rows in error
you can do macro/functions to help generate sql
& ease Data Analyst life #devExperience #selfServedAnalyst

Some high level differences

	dbt	dataform
Birth	2016	2018
Warehouse	Many DWH supported	BigQuery only
Community	massive	niche
Extensions	many	few
Github	dataform-co/dataform	dbt-labs/dbt-core

Some technical differences

	dbt	dataform
files	1 .sql 1 .yml (tests & doc)	1 .sqlx
hook	`{{config( post_hook=["sql1", …] )}}`	`post_operations{ sql1; sql2; }`
macro	jinja (python) `{{my_function()}}`	javascript `${my_function()}`
macro run_query	yes	no -mitigate with sql scripting
custom materialisation	yes	no - mitigate with sql scripting & operation node `config{hasOutput:true}`
cli	`dbt build -s my_table+ my_second_table+`	`dataform run --actions my_table --actions my_second_table --include-dependents`
defer	`dbt build -s my_table+ --defer --state path/to/artifacts`	no defer to production mitigate with schema clone as view ?
run modified	`dbt ls --select "state:modified" --state path/to/artifacts`	no option to run only modified files
fine selection	`dbt build -s my_table+1 my_second_table`	no fine DAG selection yet or filter only on compile graph
assertion	errors by default	warns by default mitigate with `post_operations{ assert( 0=(select count(1)) ) as "error xxx" }`
control assertion severity	fine control on assert severity (warn or error)	assert severity as error from children nodes one by one with config `dependOnDependencyAssertions:true`
python	yes	not quite yet

Syntax differences

How to choose & alternatives

Either ways, DO monitor your BigQuery spending (queries) Masthead can help you

SQL Mesh - Based on SQL Glot 😎
SDF - Semantic Data Framework

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
definitions		definitions
includes		includes
.gitignore		.gitignore
.sqlfluff		.sqlfluff
README.md		README.md
choosing_dbt_dataform_other.png		choosing_dbt_dataform_other.png
dbt_vs_dataform.png		dbt_vs_dataform.png
sql_framework_dbt.png		sql_framework_dbt.png
sql_framework_history.png		sql_framework_history.png
workflow.png		workflow.png
workflow_settings.yaml		workflow_settings.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Dataform Hypermarche

Dataform in GCP

Set-up (cf. GCP quickstart)

Dataform's CLI

Install & set-up

Useful cli commands

Ideas of CICD deployment

dbt VS Dataform

Same concepts

Some high level differences

Some technical differences

Syntax differences

How to choose & alternatives

About

Uh oh!

Languages

AntoineGiraud/dataform_hypermarche

Folders and files

Latest commit

History

Repository files navigation

Dataform Hypermarche

Dataform in GCP

Set-up (cf. GCP quickstart)

Dataform's CLI

Install & set-up

Useful cli commands

Ideas of CICD deployment

dbt VS Dataform

Same concepts

Some high level differences

Some technical differences

Syntax differences

How to choose & alternatives

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Languages