I have 100 space-delimited text files in a folder. Each text file has a paragraph of text in it. I wish to extract the data into a data frame, with column 1 being the file ID and column 2 the corresponding text paragraph.
This is what I have tried so far, but it failed to extract the text paragraph in the desired format:
lf <- list.files(path = "", pattern = "'*.txt", full.names = TRUE, recursive = TRUE, include.dirs = TRUE)
data <- lapply(lf, read.table, sep = "", header = FALSE)
A sample text file looks like this:
"yeah, , and repeated phone calls call in on continuously ask if there's promotional deal going on dvr's because i've had problems hopper , delays , today. bill or exchanging hopper enjoys better dvr's."
The output I'm getting is a list:
[[1]]
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
1 yeah, , and repeated phone calls call in on continuously ask if there's
  V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33
1 promotional deal going on dvr's because i've had problems hopper ,
  V34 V35 V36 V37 V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48 V49
1 delays , today. bill or exchanging hopper enjoys better dvr's.
I want it in a data frame format, such as:
file id     text
file1.txt   yeah, , and repeated phone calls...
Any pointers on what I'm missing?
Thanks in advance.
Try this (you do not want to have spaces as delimiters, since there are many of them in the paragraphs):
dat <- setNames(lapply(lf, read.table, sep = "|", header = FALSE), lf)
Choose a separator that you suspect is not in the text. I'm afraid sep="" is a bad choice, because read.table interprets it by default as "whitespace", so every word ends up in its own column. The "title" of each entry in the resulting list should be the file name.
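To get from there to the two-column data frame asked for in the question, one possible sketch is shown below. It is an alternative to the read.table call above, not the same method: it swaps in readLines so that no separator has to be chosen at all. The folder demo_dir, the two sample files, the helper read_paragraph, and the column names file_id and text are all made up for illustration; in practice lf would come from your own folder.

```r
# Hypothetical demo folder with two small text files so the sketch runs on its own.
demo_dir <- file.path(tempdir(), "txt_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("yeah, and repeated phone calls to ask about a deal.",
           file.path(demo_dir, "file1.txt"))
writeLines(c("first line of the paragraph", "second line"),
           file.path(demo_dir, "file2.txt"))

# List the .txt files; the regex matches names ending in ".txt".
lf <- list.files(path = demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file as raw lines and collapse them into one string,
# so nothing inside the paragraph can split it into columns.
read_paragraph <- function(f) {
  paste(readLines(f, warn = FALSE), collapse = " ")
}

# One row per file: the file name and its full paragraph.
result <- data.frame(
  file_id = basename(lf),
  text = vapply(lf, read_paragraph, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

print(result)
```

If a paragraph spans several lines in a file, the lines are joined with a single space; change the collapse argument if you would rather keep the line breaks.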