I'm trying to get a column that is read as a factor (or as character) read or transformed into numbers. The usual advice, "stringsAsFactors=FALSE" or "as.numeric", does not work as expected (see the code below).
The data.csv is a simple example and shows no strange special characters in Notepad++ or EmEditor. There is one systematic error in "temp" (row 5) and one in "rh" (row 4).
> ftimeseries <- read.csv2('data.csv', header = TRUE, sep=";", dec=",", stringsAsFactors=FALSE)
> head(ftimeseries)
  station        datumzeit temp   rh  tp    ld
1     526 02.11.2010 08:36 15,9 58.4 7.7 991.1
2     526 02.11.2010 08:38 15,6 58.8 7.6 991.3
3     526 02.11.2010 08:40 14,9 60.8 7.4 991.1
4     526 02.11.2010 08:42 14,3   NA 7.4 991.4
5     526 02.11.2010 08:44  aaa 64.2 7.5 991.3
6     526 02.11.2010 08:46 14,2 64.9 7.7 991.2
> ftimeseries[,3]
 [1] "15,9" "15,6" "14,9" "14,3" "aaa" "14,2" "14,2" "13,9" "13,9" "13,6" "13,6" "13,6" "13,4" "13,4" "13,7" "13,8" "13,9" "14,1" "14,3" "14,4" "14,5" "14,2" "14,2" "14,1" "14,1" "14,2"
[27] "14,1" "14,1" "14" "14" "14,1" "14" "13,9" "13,9" "14" "14" "13,9" "14" "14,1" "14,2" "14,2" "14,2" "14,2" "14,2" "14,2" "14,2" "14,2"
In column 3 ("temp") I expect numbers, and in row 5 an NA instead of "aaa".
So I tried to convert it:
> ftimeseries[,3] <- as.numeric(ftimeseries[,3], dec=',')
Warning message:
NAs introduced by coercion
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 14 14 NA 14 NA NA 14 14 NA 14 NA NA NA NA NA NA NA NA NA
But without success. as.numeric seems to convert only the numbers that have no decimal separator, even if I tell it what the decimal separator is. (I also tried without the option dec=",", again without success.)
Finally I tried levels, as suggested in other answers here:
> levels(ftimeseries$temp)
NULL
> levels(ftimeseries[,3])
NULL
> levels(ftimeseries)
NULL
> levels(ftimeseries$rh)
NULL
> head(ftimeseries)
  station        datumzeit temp   rh  tp    ld
1     526 02.11.2010 08:36   NA 58.4 7.7 991.1
2     526 02.11.2010 08:38   NA 58.8 7.6 991.3
3     526 02.11.2010 08:40   NA 60.8 7.4 991.1
4     526 02.11.2010 08:42   NA   NA 7.4 991.4
5     526 02.11.2010 08:44   NA 64.2 7.5 991.3
6     526 02.11.2010 08:46   NA 64.9 7.7 991.2
I'm using R in a Windows 7 64-bit environment.
This one works with simulated data, where df$x is a factor (the default for strings in data.frame() before R 4.0.0):
df <- data.frame(x=c("12,1","aa","15,6",61))
as.numeric(gsub(",", ".", as.character(df$x)))
# [1] 12.1   NA 15.6 61.0
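The as.character() step is what makes this safe for factors. Calling as.numeric() directly on a factor returns the internal level codes rather than the printed values. A minimal sketch, using a made-up factor `f` that is not part of the question's data:

```r
# Hypothetical factor with comma decimals and one non-numeric entry
f <- factor(c("12,1", "aa", "15,6", "61"))

# Pitfall: this yields the integer level codes, not the values
codes <- as.numeric(f)

# Go through character first, swap the decimal comma for a dot, then convert.
# suppressWarnings() only hides the expected "NAs introduced by coercion".
values <- suppressWarnings(
  as.numeric(gsub(",", ".", as.character(f), fixed = TRUE))
)

print(codes)
print(values)
```

With stringsAsFactors=FALSE the column is already character, so as.character() does nothing there, but leaving it in keeps the same line working for both cases.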
Update with your example:
Your data:
> ftimeseries <- read.csv2('data.csv', header = TRUE, sep=";", dec=",", stringsAsFactors=FALSE)
> head(ftimeseries)
  station        datumzeit temp   rh  tp    ld
1     526 02.11.2010 08:36 15,9 58.4 7.7 991.1
2     526 02.11.2010 08:38 15,6 58.8 7.6 991.3
3     526 02.11.2010 08:40 14,9 60.8 7.4 991.1
4     526 02.11.2010 08:42 14,3 <NA> 7.4 991.4
5     526 02.11.2010 08:44  aaa 64.2 7.5 991.3
6     526 02.11.2010 08:46 14,2 64.9 7.7 991.2
The column is not recognized as numeric:
> class(ftimeseries$temp)
[1] "character"
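This also explains the failed attempt in the question: as.numeric() has no dec argument, so dec=',' is silently ignored and every string containing a comma turns into NA, while plain "14" converts fine. A small sketch with a made-up vector `x`; type.convert() is shown only as one base R function that does accept dec, not as the method used in this answer:

```r
# Hypothetical sample mirroring the kinds of values in the "temp" column
x <- c("15,9", "14", "aaa")

# 'dec' is not an argument of as.numeric(); it is ignored,
# so "15,9" cannot be parsed and becomes NA
direct <- suppressWarnings(as.numeric(x, dec = ","))

# type.convert() understands a decimal comma and a custom NA marker
converted <- type.convert(x, na.strings = c("NA", "aaa"),
                          dec = ",", as.is = TRUE)

print(direct)
print(converted)
```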
Apply the solution:
> ftimeseries$temp <- as.numeric(gsub(",", ".", as.character(ftimeseries$temp)))
> class(ftimeseries$temp)
[1] "numeric"
And the data.frame becomes:
> ftimeseries
  station        datumzeit temp   rh  tp    ld
1     526 02.11.2010 08:36 15.9 58.4 7.7 991.1
2     526 02.11.2010 08:38 15.6 58.8 7.6 991.3
3     526 02.11.2010 08:40 14.9 60.8 7.4 991.1
4     526 02.11.2010 08:42 14.3 <NA> 7.4 991.4
5     526 02.11.2010 08:44   NA 64.2 7.5 991.3
6     526 02.11.2010 08:46 14.2 64.9 7.7 991.2
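If the bad entries are known in advance, another option may be to declare them as missing at read time, so that read.csv2 parses the column as numeric straight away. The sketch below is an assumption-heavy illustration: `csv_text` is a made-up excerpt standing in for data.csv (the raw file is not shown in the question), and it assumes the file uses decimal commas throughout.

```r
# Hypothetical excerpt standing in for data.csv (semicolon-separated,
# decimal commas); the real file contents are an assumption
csv_text <- "station;datumzeit;temp;rh
526;02.11.2010 08:36;15,9;58,4
526;02.11.2010 08:42;14,3;NA
526;02.11.2010 08:44;aaa;64,2"

# read.csv2 already defaults to sep = ";" and dec = ",".
# Listing "aaa" in na.strings lets the whole column be parsed as numeric.
ftimeseries <- read.csv2(text = csv_text,
                         na.strings = c("NA", "aaa"),
                         stringsAsFactors = FALSE)

print(class(ftimeseries$temp))
print(ftimeseries$temp)
```

This only helps when the offending strings can be listed; for arbitrary junk values the gsub/as.numeric conversion above is the more general route.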