Import CSV into R: numeric values still read as factors


I'm trying to get a column that is read as a factor to be read as, or transformed into, a number. The usual advice, "stringsAsFactors=FALSE" or "as.numeric", does not work as expected (see the code below).

The data.csv is a simple example that shows no strange special characters in Notepad++ or EmEditor. It contains one deliberate error in "temp" (row 5) and one in "rh" (row 4).

    > ftimeseries <- read.csv2('data.csv', header = TRUE, sep=";", dec=",", stringsAsFactors=FALSE)
    > head(ftimeseries)
      station        datumzeit temp   rh  tp    ld
    1     526 02.11.2010 08:36 15,9 58.4 7.7 991.1
    2     526 02.11.2010 08:38 15,6 58.8 7.6 991.3
    3     526 02.11.2010 08:40 14,9 60.8 7.4 991.1
    4     526 02.11.2010 08:42 14,3   NA 7.4 991.4
    5     526 02.11.2010 08:44  aaa 64.2 7.5 991.3
    6     526 02.11.2010 08:46 14,2 64.9 7.7 991.2
    > ftimeseries[,3]
     [1] "15,9" "15,6" "14,9" "14,3" "aaa"  "14,2" "14,2" "13,9" "13,9" "13,6" "13,6" "13,6" "13,4" "13,4" "13,7" "13,8" "13,9" "14,1" "14,3" "14,4" "14,5" "14,2" "14,2" "14,1" "14,1" "14,2"
    [27] "14,1" "14,1" "14"   "14"   "14,1" "14"   "13,9" "13,9" "14"   "14"   "13,9" "14"   "14,1" "14,2" "14,2" "14,2" "14,2" "14,2" "14,2" "14,2" "14,2"

In column 3, "temp", I expect numbers, and in row 5 an NA instead of "aaa".

So I tried to convert it:

    ftimeseries[,3] <- as.numeric(ftimeseries[,3], dec=',')
    Warning message:
    NAs introduced by coercion
     [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 14 14 NA 14 NA NA 14 14 NA 14 NA NA NA NA NA NA NA NA NA

But without success. as.numeric seems to convert only the numbers that have no decimal separator, even if I tell it what the decimal separator is. (I also tried it without the option dec=",", again without success.)

Finally, I tried levels(), as suggested in other answers here:

    > levels(ftimeseries$temp)
    NULL
    > levels(ftimeseries[,3])
    NULL
    > levels(ftimeseries)
    NULL
    > levels(ftimeseries$rh)
    NULL
    > head(ftimeseries)
      station        datumzeit temp   rh  tp    ld
    1     526 02.11.2010 08:36   NA 58.4 7.7 991.1
    2     526 02.11.2010 08:38   NA 58.8 7.6 991.3
    3     526 02.11.2010 08:40   NA 60.8 7.4 991.1
    4     526 02.11.2010 08:42   NA   NA 7.4 991.4
    5     526 02.11.2010 08:44   NA 64.2 7.5 991.3
    6     526 02.11.2010 08:46   NA 64.9 7.7 991.2

I'm using R in a Windows 7 64-bit environment.

This one works on simulated data, where df$x is a factor:

    df <- data.frame(x=c("12,1","aa","15,6",61))
    as.numeric(gsub(",", ".", as.character(df$x)))
    # [1] 12.1   NA 15.6 61.0
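The reason the gsub step matters: as.numeric has no dec argument (extra arguments are simply ignored), and it only understands "." as the decimal mark, so "15,9" becomes NA while "14" converts fine. A minimal sketch of the same idea wrapped in a helper follows; the name `to_numeric_comma` is made up for illustration and is not part of base R.

```r
# Hypothetical helper (not part of base R): convert comma-decimal text to numeric.
to_numeric_comma <- function(x) {
  # as.character() makes this work for factor and character input alike;
  # fixed = TRUE treats "," as a literal character rather than a regex.
  as.numeric(gsub(",", ".", as.character(x), fixed = TRUE))
}

temp <- c("15,9", "aaa", "14")

# Direct conversion fails for the comma-decimal value.
direct <- suppressWarnings(as.numeric(temp))

# After swapping "," for ".", only the real text entry stays NA.
converted <- suppressWarnings(to_numeric_comma(temp))

print(direct)
print(converted)
```

suppressWarnings is used here only to keep the output quiet; in real use the coercion warning is a useful hint that some entries were not numbers.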

Update, using your example:

Your data:

    > ftimeseries <- read.csv2('data.csv', header = TRUE, sep=";", dec=",", stringsAsFactors=FALSE)
    > head(ftimeseries)
      station        datumzeit temp   rh  tp    ld
    1     526 02.11.2010 08:36 15,9 58.4 7.7 991.1
    2     526 02.11.2010 08:38 15,6 58.8 7.6 991.3
    3     526 02.11.2010 08:40 14,9 60.8 7.4 991.1
    4     526 02.11.2010 08:42 14,3 <NA> 7.4 991.4
    5     526 02.11.2010 08:44  aaa 64.2 7.5 991.3
    6     526 02.11.2010 08:46 14,2 64.9 7.7 991.2

The column is not recognized as numeric:

    > class(ftimeseries$temp)
    [1] "character"
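A single non-numeric entry such as "aaa" is enough for the whole column to be read as character. If it is not obvious which entries are responsible, something like the following sketch can list them. The vector `temp` is made-up sample data standing in for ftimeseries$temp.

```r
# Made-up sample standing in for ftimeseries$temp.
temp <- c("15,9", "15,6", "aaa", NA, "14")

# Try the conversion; entries that are not numbers become NA.
parsed <- suppressWarnings(as.numeric(gsub(",", ".", temp, fixed = TRUE)))

# Entries that turned into NA but were not missing to begin with.
bad_rows <- which(is.na(parsed) & !is.na(temp))

print(bad_rows)        # positions of the offending entries
print(temp[bad_rows])  # the offending values themselves
```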

Apply the solution:

    > ftimeseries$temp <- as.numeric(gsub(",", ".", as.character(ftimeseries$temp)))
    > class(ftimeseries$temp)
    [1] "numeric"

And the data.frame becomes:

    > ftimeseries
      station        datumzeit temp   rh  tp    ld
    1     526 02.11.2010 08:36 15.9 58.4 7.7 991.1
    2     526 02.11.2010 08:38 15.6 58.8 7.6 991.3
    3     526 02.11.2010 08:40 14.9 60.8 7.4 991.1
    4     526 02.11.2010 08:42 14.3 <NA> 7.4 991.4
    5     526 02.11.2010 08:44   NA 64.2 7.5 991.3
    6     526 02.11.2010 08:46 14.2 64.9 7.7 991.2
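As a possible alternative, if the bad entries are known in advance, they can be declared as missing at import time through na.strings, so that read.csv2 parses the comma-decimal column as numeric directly. This is only a sketch: the inline `csv_text` is made-up data that mimics three columns of data.csv, and treating "aaa" as a missing-value marker is an assumption about your file.

```r
# Made-up inline data mimicking part of data.csv (semicolon-separated, comma decimals).
csv_text <- paste(
  "station;datumzeit;temp",
  "526;02.11.2010 08:42;14,3",
  "526;02.11.2010 08:44;aaa",
  "526;02.11.2010 08:46;14,2",
  sep = "\n"
)

# read.csv2 defaults to sep = ";" and dec = ",".
# Listing "aaa" in na.strings turns it into NA before the column type is chosen.
d <- read.csv2(text = csv_text,
               na.strings = c("NA", "aaa"),
               stringsAsFactors = FALSE)

print(class(d$temp))
print(d$temp)
```

This only helps when the set of bad strings is known; for arbitrary text in a numeric column, converting after the import with gsub and as.numeric is the more general route.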
