I am dealing with a dataset of strings in a column, and I need to count the number of changes in a DataFrame column. If the DataFrame is grouped by the column 'id', one group instance is the example below:
id     vehicle
'abc'  'bmw'
'abc'  'bmw'
'abc'  'yamaha'
'abc'  'suzuki'
'abc'  'suzuki'
'abc'  'kawasaki'
So in this case, we can see that id 'abc' changed vehicle brand 3 times. Is there an efficient way to do this over multiple groups in the column 'id'?
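For reference, a minimal sketch that rebuilds the example frame (assuming the quote characters shown in the table are literal parts of the strings, since they also appear in the outputs below):

import pandas as pd

# Rebuild the example group; quotes are kept verbatim in the values.
df = pd.DataFrame({
    'id': ["'abc'"] * 6,
    'vehicle': ["'bmw'", "'bmw'", "'yamaha'",
                "'suzuki'", "'suzuki'", "'kawasaki'"],
})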
I can think of 2 ways:

1) groupby on 'id' and call nunique on the 'vehicle' column, then subtract 1, as we are looking for changes rather than the overall unique count:
In [292]: df.groupby('id')['vehicle'].nunique() - 1
Out[292]:
id
'abc'    3
Name: vehicle, dtype: int64
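Note that nunique counts distinct values rather than transitions, so this approach undercounts whenever a brand reappears after a different one. A quick sketch illustrating the difference:

s = pd.Series(["'bmw'", "'yamaha'", "'bmw'"])
print(s.nunique() - 1)             # 1, but there are actually 2 changes
print((s != s.shift()).sum() - 1)  # 2, the shift-based count is correct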
2) apply a lambda that tests whether the current vehicle is not equal to the previous vehicle using shift. This is more semantically correct, as it detects changes rather than the overall unique count. Calling sum on the booleans converts True and False to 1 and 0 respectively:
In [293]: df.groupby('id')['vehicle'].apply(lambda x: x != x.shift()).sum() - 1
Out[293]: 3
The - 1 is required because the first row is compared against a row that doesn't exist, and a comparison with NaN yields True here, which doesn't make sense as a change; see below:
In [301]: df.groupby('id')['vehicle'].apply(lambda x: x != x.shift())
Out[301]:
0     True
1    False
2     True
3     True
4    False
5     True
Name: 'abc', dtype: bool
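To cover the multiple-groups part of the question, a minimal sketch of the same idea: perform the comparison and the subtraction per group inside the apply, so the spurious first-row True is removed once per id rather than once overall:

# Count value changes per id; each group subtracts its own first-row True.
changes = df.groupby('id')['vehicle'].apply(lambda x: (x != x.shift()).sum() - 1)

For the example data this returns a Series indexed by id, with 'abc' mapped to 3, and it scales to any number of ids.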