-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Open
Description
Approx distinct uses hll and in some tool generated codes with 100s of approx_distinct, it requires a lot of memory. So optionally when the agg is on a low-cardinality field - like dates or other strings - we combine them into a single set_agg.
So:
select approx_distinct(if(y > 10, x)) a1, approx_distinct(if(y > 20, z)) a2 from (values (1,20,4), (2,30,4)) as t(x,y,z);
Can be combined into a single expression:
select cardinality(array_distinct(remove_nulls(a[1]))) a1, cardinality(array_distinct(remove_nulls(a[2]))) a2 from (select array_transpose(a) a from (select set_agg(array[if(y > 10, x), if(y > 20, z)]) a from (values (1,20,4), (2,30,4)) as t(x,y,z)));
Metadata
Metadata
Assignees
Type
Projects
Status
🆕 Unprioritized
Status
🆕 Unprioritized