python - Counting changes in row elements


I'm dealing with a dataset of strings in a column, and I need to count the number of changes in a data frame column. If the data frame is grouped by column 'id', one group instance would look like the example below:

    id     vehicle
    'abc'  'bmw'
    'abc'  'bmw'
    'abc'  'yamaha'
    'abc'  'suzuki'
    'abc'  'suzuki'
    'abc'  'kawasaki'

So in this case, I'd be able to tell that id 'abc' changed vehicle brand 3 times. Is there an efficient way of doing this across multiple groups in column 'id'?
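For reference, a minimal sketch of the data frame above (assuming pandas and the column names 'id' and 'vehicle' from the example):

    import pandas as pd

    # One group's worth of the example data: id 'abc' with six vehicle entries
    df = pd.DataFrame({
        'id': ['abc'] * 6,
        'vehicle': ['bmw', 'bmw', 'yamaha', 'suzuki', 'suzuki', 'kawasaki'],
    })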

I can think of 2 ways:

1) groupby on 'id' and call nunique on the 'vehicle' column; I have to subtract 1 as I'm looking for changes rather than the overall unique count:

    In [292]: df.groupby('id')['vehicle'].nunique() - 1
    Out[292]:
    id
    'abc'    3
    Name: vehicle, dtype: int64
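Worth noting: nunique only matches the number of changes when no brand reappears after a change. A small sketch of the difference, using a made-up sequence where 'bmw' recurs:

    import pandas as pd

    # Hypothetical sequence, not from the example above
    s = pd.Series(['bmw', 'yamaha', 'bmw'])

    # Only 2 distinct values, so nunique() - 1 reports 1 ...
    print(s.nunique() - 1)             # 1

    # ... but the brand actually changes twice (bmw -> yamaha -> bmw)
    print((s != s.shift()).sum() - 1)  # 2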

2) apply a lambda that tests whether the current vehicle is not equal to the previous vehicle using shift; this is more semantically correct as it detects changes rather than the overall unique count. Calling sum on the booleans converts True and False to 1 and 0 respectively:

    In [293]: df.groupby('id')['vehicle'].apply(lambda x: x != x.shift()).sum() - 1
    Out[293]: 3
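As written, the sum and the -1 are applied once over everything, which is fine for a single group; a possible per-group sketch for multiple ids (the second id 'xyz' and its vehicles are made up for illustration):

    import pandas as pd

    # 'xyz' is a hypothetical second group added to show per-group results
    df = pd.DataFrame({
        'id':      ['abc'] * 6 + ['xyz'] * 3,
        'vehicle': ['bmw', 'bmw', 'yamaha', 'suzuki', 'suzuki', 'kawasaki',
                    'honda', 'honda', 'ducati'],
    })

    # Sum the change flags inside each group, then drop the spurious first-row hit
    changes = df.groupby('id')['vehicle'].apply(lambda x: (x != x.shift()).sum() - 1)
    print(changes)
    # id
    # abc    3
    # xyz    1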

The -1 is required because the first row is compared against a row that doesn't exist (the shift yields NaN), and that comparison comes out True even though it isn't a real change; see below:

    In [301]: df.groupby('id')['vehicle'].apply(lambda x: x != x.shift())
    Out[301]:
    0     True
    1    False
    2     True
    3     True
    4    False
    5     True
    Name: 'abc', dtype: bool
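A way to sidestep the -1 entirely, sketched on the same example data, is to shift within each group and only count rows that differ from a real (non-NaN) predecessor:

    import pandas as pd

    df = pd.DataFrame({
        'id': ['abc'] * 6,
        'vehicle': ['bmw', 'bmw', 'yamaha', 'suzuki', 'suzuki', 'kawasaki'],
    })

    # Previous vehicle within the same id; the first row of each group becomes NaN
    prev = df.groupby('id')['vehicle'].shift()

    # Count only rows that differ from an actual predecessor
    changes = (df['vehicle'] != prev) & prev.notna()
    print(changes.groupby(df['id']).sum())
    # id
    # abc    3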
