Same as dbt_hypermarche, but with Dataform, made for BigQuery
No local install required
Set-up (cf. GCP quickstart)
- Dataform IAM roles
  - Dataform Admin (`roles/dataform.admin`) - for repository setup (git ...)
  - Dataform Editor (`roles/dataform.editor`) - for workspaces and workflow invocations
- Create a Dataform repository in GCP
- Create and initialize a Dataform development workspace (~ git branch)
- Create some tables & views (you can execute them as if you were executing SQL queries in the BigQuery IDE; see the SQLX sketch after this list)
- Grant BigQuery access to dataform service account
- 1st dev deployment/materialisation
- Have business/partner review the data
- Connect to distant git repo
- Note: if using SSH ... in SSH params > public host key ... it's the one from GitHub, GitLab ... not yours ...
- Push branch (or to master if lazy)
- Configure prod version (daily git clone) & workflow (daily dataform run)
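For the "create some tables & views" step, a Dataform model is one `.sqlx` file containing a single `select`. A minimal sketch, saved e.g. as `definitions/stg_commande.sqlx` (the source table and column names are assumptions for illustration, not from this repo):

```sqlx
config {
  type: "view",        // or "table" to materialise it physically
  schema: "staging",   // assumed target dataset
  description: "Orders cleaned from the raw source (illustrative)"
}

select
  commande_id,
  date_commande,
  montant
from `my-gcp-project.raw.commande`  -- assumed raw source table
```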
Dataform CLI (cf. GCP doc use-dataform-cli)
- install Node.js & npm
- install dataform: `npm i -g @dataform/cli`
- `dataform init-creds` to connect to your BigQuery
  - choose your region
  - choose credential type: ADC (Application Default Credentials) or JSON key
- if new project, in a new directory: `dataform init . GCP_PROJECT DEFAULT_LOCATION` will init bare dataform files/folders
- `dataform --help` to check available options
- `dataform compile` to check repo validity
- `dataform run --dry-run` to check SQL validity with BigQuery without materialisation
- `dataform run --schema-suffix cli_demo` to suffix the materialised schema (ex: CI/CD tests)
- `dataform run --actions stg_commande --actions stg_retour_commande` to materialise those 2 models
- `dataform run --actions stg_commande --include-deps` equivalent to `dbt run +stg_commande`
- `dataform run --actions stg_commande --include-dependents` equivalent to `dbt run stg_commande+`
- `dataform compile --json > dataform_compiled.json` to export project lineage, queries ...
- the data analyst takes care of the `select` statement - dbt & dataform materialise it (table, view...) in the Data Warehouse (DDL/DML)
- 1 select = 1 node / model / definition
- lineage is inferred automatically with the `ref('ancestor_table')` function - documentation is to be done in the tool
- data tests can be added to help assert modelling hypotheses
  - generic tests: `unique`, `not_null`, ...
  - custom tests: a `select` statement filtering rows in error
- you can write macros/functions to help generate SQL & ease the Data Analyst's life #devExperience #selfServedAnalyst (see the SQLX sketches after this list)
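A minimal SQLX sketch of these principles - one `select` per file, lineage through `ref()`, generic tests declared as assertions in `config`, and a small JS helper playing the macro role. Saved e.g. as `definitions/orders_eur.sqlx`; the column names and the helper are illustrative, not from this repo:

```sqlx
config {
  type: "table",
  description: "Orders with an amount in euros (illustrative)",
  assertions: {
    uniqueKey: ["commande_id"],          // generic test: unique
    nonNull: ["commande_id", "montant"]  // generic test: not_null
  }
}

js {
  // hypothetical macro: wraps a column in a rounding expression
  const roundEuros = (col) => `round(${col}, 2)`;
}

select
  commande_id,
  date_commande,
  ${roundEuros("montant")} as montant_eur
from ${ref("stg_commande")}  -- lineage to the ancestor model is inferred from ref()
```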
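A custom test is a separate file with `type: "assertion"` whose `select` returns the rows in error, e.g. `definitions/assert_positive_amounts.sqlx` (names assumed):

```sqlx
config {
  type: "assertion",
  description: "Custom test: negative amounts are errors (illustrative)"
}

-- any row returned here makes the assertion fail
select commande_id, montant_eur
from ${ref("orders_eur")}
where montant_eur < 0
```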
| | dbt | dataform |
|---|---|---|
| Birth | 2016 | 2018 |
| Warehouse | many DWHs supported | BigQuery only |
| Community | massive | niche |
| Extensions | many | few |
| GitHub | dbt-labs/dbt-core | dataform-co/dataform |
| | dbt | dataform |
|---|---|---|
| files | 1 `.sql` + 1 `.yml` (tests & doc) | 1 `.sqlx` |
| hook | `{{ config(post_hook=["sql1", …]) }}` | `post_operations { sql1; sql2; }` |
| macro | jinja (python): `{{ my_function() }}` | javascript: `${my_function()}` |
| macro run_query | yes | no - mitigate with SQL scripting |
| custom materialisation | yes | no - mitigate with SQL scripting & an operation node with `config { hasOutput: true }` |
| cli | `dbt build -s my_table+ my_second_table+` | `dataform run --actions my_table --actions my_second_table --include-dependents` |
| defer | `dbt build -s my_table+ --defer --state path/to/artifacts` | no defer to production - mitigate with a schema clone as views? |
| run modified | `dbt ls --select "state:modified" --state path/to/artifacts` | no option to run only modified files |
| fine selection | `dbt build -s my_table+1 my_second_table` | no fine DAG selection yet, or filter only on the compiled graph |
| assertion | errors by default | warns by default - mitigate with `post_operations { assert(0 = (select count(1))) as "error xxx" }` |
| control assertion severity | fine control on assert severity (warn or error) | assert severity as error, from children nodes one by one, with config `dependOnDependencyAssertions: true` |
| python | yes | not quite yet |
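The assertion mitigation from the table above, sketched as a `post_operations` block relying on BigQuery's `ASSERT` scripting statement; the model and column names are assumptions for illustration:

```sqlx
config { type: "table" }

select commande_id, montant
from ${ref("stg_commande")}

post_operations {
  -- make the run error (not just warn) if the freshly built table breaks the rule
  assert (
    (select count(1) from ${self()} where commande_id is null) = 0
  ) as "error: null commande_id in the materialised table"
}
```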
Either way, DO monitor your BigQuery spending (queries)
Masthead can help you




