I'm dealing with a dataset of strings in a column, and I need to count the number of changes in a DataFrame column. The DataFrame is grouped by the column 'id'; one group instance is the example below:

    id     vehicle
    'abc'  'bmw'
    'abc'  'bmw'
    'abc'  'yamaha'
    'abc'  'suzuki'
    'abc'  'suzuki'
    'abc'  'kawasaki'

So in this case, I'm able to say that id 'abc' changed vehicle brand 3 times. Is there an efficient way to do this over multiple groups of the column 'id'?
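The example data above can be reconstructed as a minimal DataFrame (a sketch for reproducing the question; the variable name `df` matches the snippets below):

```python
import pandas as pd

# Reconstruction of the example group: one id with 6 vehicle entries
df = pd.DataFrame({
    'id':      ['abc'] * 6,
    'vehicle': ['bmw', 'bmw', 'yamaha', 'suzuki', 'suzuki', 'kawasaki'],
})

print(df)
```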
I can think of 2 ways:
1) Group by 'id' and call nunique on the 'vehicle' column; we have to subtract 1 because we're looking for changes rather than the overall unique count:

In [292]: df.groupby('id')['vehicle'].nunique() - 1
Out[292]:
id
'abc'    3
Name: vehicle, dtype: int64

2) Apply a lambda that tests whether the current vehicle is not equal to the previous vehicle using shift. This is more semantically correct as it detects changes rather than the overall unique count; calling sum on the booleans converts True and False to 1 and 0 respectively:
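Note that nunique only counts distinct brands, so the first approach undercounts whenever a group switches back to a brand it already used. A small sketch with a hypothetical sequence (not from the original question) shows the difference:

```python
import pandas as pd

# Hypothetical sequence: bmw -> yamaha -> bmw is 2 changes,
# but only 2 distinct brands, so nunique - 1 reports just 1.
s = pd.Series(['bmw', 'yamaha', 'bmw'])

unique_based = s.nunique() - 1            # counts distinct values minus 1
shift_based = (s != s.shift()).sum() - 1  # counts actual transitions

print(unique_based)  # 1 (undercounts)
print(shift_based)   # 2 (actual changes)
```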
In [293]: df.groupby('id')['vehicle'].apply(lambda x: x != x.shift()).sum() - 1
Out[293]: 3

The -1 is required because the first row has no previous row to compare against, so its comparison against NaN comes out as True and must be subtracted; see below:

In [301]: df.groupby('id')['vehicle'].apply(lambda x: x != x.shift())
Out[301]:
0     True
1    False
2     True
3     True
4    False
5     True
Name: 'abc', dtype: bool
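To address the multiple-groups part of the question: with more than one id, the -1 has to be applied per group rather than once over the whole result, which can be done by moving the entire comparison inside the apply. A sketch, with a second hypothetical id 'def' added for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'id':      ['abc'] * 6 + ['def'] * 3,
    'vehicle': ['bmw', 'bmw', 'yamaha', 'suzuki', 'suzuki', 'kawasaki',
                'honda', 'honda', 'ducati'],
})

# Per group: compare each vehicle to the previous one within the group,
# sum the True values, and subtract 1 for the first-row NaN comparison.
changes = df.groupby('id')['vehicle'].apply(lambda x: (x != x.shift()).sum() - 1)

print(changes)
# id
# abc    3
# def    1
```

Because shift is applied within each group, a group's first row never compares against the previous group's last row, so the count stays correct across group boundaries.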