Conversation
Updated the dimensional model proposal with new staging, intermediate, and marts layers. Added details on model rules, metrics, and manual effort estimates for implementation.
| ) | ||
|
|
||
| select | ||
| city_id::text, |
There was a problem hiding this comment.
why text fields are here in intermediate? staging keeps them as integers. This creates fragile joins downstream -- look at dim_crm_partner.sql. Keep IDs as their natural type (integer) throughout, or at least be consistent.
| ) | ||
|
|
||
| select | ||
| meeting_id::text, |
There was a problem hiding this comment.
same thing here fix this. should be in staging as integer or numeric
| ) | ||
|
|
||
| select | ||
| mou_id::text, |
There was a problem hiding this comment.
Same — mou_id::text, partner_id::text.
There was a problem hiding this comment.
why in staging name is different from here?
stg_bubble__children but here its int_bubble__child.sql? let's keep it consistent?
There was a problem hiding this comment.
Naming convention changed
There was a problem hiding this comment.
Missing surrogate key. Add {{ dbt_utils.generate_surrogate_key(['child_id']) }} as child_sk as the first column. Surrogate keys are best practice for dimensional modeling — they decouple your warehouse from source system IDs. (Applies to ALL 10 dimension files.)
There was a problem hiding this comment.
Added surrogate keys in int layer
There was a problem hiding this comment.
If a POC always belongs to exactly one partner (1:1 relationship, never changes) — then keeping partner_id in dim_poc is actually fine. Just drop bridge_poc_partner since it's redundant.
If a POC can belong to multiple partners (or the assignment changes over time) — then remove partner_id from dim_poc and rely on bridge_poc_partner for the relationship. The dimension becomes a pure description of the person
There was a problem hiding this comment.
this is a 1:1 copy of int_bubble__school_volunteer with no added value. A fact table should: (1) have explicit FK references to dimension surrogate keys, (2) include measures (e.g. days_active, assignment_count), (3) filter out soft-deleted rows if appropriate.
what if we do something like this?
-- fct_school_volunteer: One row per volunteer-school assignment
-- Grain: one record per volunteer assigned to one school in one academic year
select
school_volunteer_id,
-- dimension FKs
school_id,
volunteer_id,
academic_year,
-- assignment dates
created_date as assigned_date,
modified_date,
-- measures
case
when is_removed = false
then current_date - created_date::date
else modified_date::date - created_date::date
end as days_assigned,
not is_removed as is_active
from {{ ref('int_bubble__school_volunteer') }}
where is_removed = false
example
days_assigned gives you something to actually aggregate on. Right now there's nothing to SUM() or AVG() in this table. You can answer "what's the average volunteer tenure per school?" directly from the fact
There was a problem hiding this comment.
Same issue — 1:1 copy of int_crm__meetings. Consider adding: meeting duration, days until follow-up, a has_follow_up boolean derived metric, and FK references to dim_crm_partner, dim_poc
There was a problem hiding this comment.
Same — direct pass-through. Add: days_to_conversion, is_converted boolean, FK to dim_crm_partner.
There was a problem hiding this comment.
Missing a volunteer_id FK reference to a dim_volunteer dimension (which doesn't exist yet). The fact references volunteer_id but there's no dimension to join it to.
… for bubble entities wherever required with proper naming , added facts and clean up redundant bridge models.
… and introduce new fact tables for volunteer assignments and child class sections.
Added new models in models_v2 folder