I am writing an R function that reads a directory full of files and reports the number of completely observed cases in each data file. The function returns a data frame where the first column is the name of the file and the second column is the number of complete cases.
For example:
id nobs
1 108
2 345
...
Here is the function I wrote:
complete <- function(directory, id = 1:332) {
  for(i in 1:332) {
    path<-paste(directory,"/",id,".csv",sep="")
    mydata<-read.csv(path)
    #nobs<-nrow(na.omit(mydata))
    nobs<-sum(complete.cases(mydata))
    i<-i+1
  }
  completedata<-c(id,nobs)
}
I execute the function:
complete("specdata",id=1:332)
but I’m getting this error:
Error in file(file, "rt") : invalid 'description' argument
I also tried the traceback() function to debug my code, and it gives this output:
traceback()
# 4: file(file, "rt") at #6
# 3: read.table(file = file, header = header, sep = sep, quote = quote,
# dec = dec, fill = fill, comment.char = comment.char, ...) at #6
# 2: read.csv(path) at #6
# 1: complete("specdata", id = 1:332)
7 Answers
It’s hard to tell without a completely reproducible example, but I suspect your problem is this line:
path<-paste(directory,"/",id,".csv",sep="")
Here id is a vector, so path becomes a vector of character strings, and when you call read.csv you’re passing it all the paths at once instead of just one. Try changing the above line to
path<-paste(directory,"/",id[i],".csv",sep="")
and see if that works.
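Putting that suggestion together, a minimal sketch of the whole function might look like the following. It assumes the files are named 1.csv, 2.csv, and so on (no zero padding). Looping over seq_along(id), using file.path, and preallocating nobs are illustrative choices, not part of the original code.

```r
complete <- function(directory, id = 1:332) {
  # One slot per requested id, filled in as each file is read.
  nobs <- integer(length(id))
  for (i in seq_along(id)) {
    # id[i] is a single value, so path is a single string.
    # Assumption: files are named "<id>.csv" with no zero padding.
    path <- file.path(directory, paste0(id[i], ".csv"))
    mydata <- read.csv(path)
    nobs[i] <- sum(complete.cases(mydata))
  }
  # First column is the file id, second the number of complete cases.
  data.frame(id = id, nobs = nobs)
}
```

Because the loop runs over the positions of id, a call such as complete("specdata", id = c(2, 4, 8)) reads only those three files.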
It seems you have a problem with your file path.
You are passing the full vector id = 1:332 to paste, so it builds 332 file paths at once.
If your files are named 1.csv, 2.csv, 3.csv, etc.,
you can change this line:
path<-paste(directory,"/",id,".csv",sep="")
to
path<-paste(directory,"/",i,".csv",sep="")
and leave out or rework the id input of your function.
Instead of using a for loop to read the data in, you can try sapply. For example:
mydata <- sapply(path, read.csv)
Since path is a vector, sapply will iterate over it and apply read.csv to each element. There is then no need for the for loop, and your code will be much cleaner.
From there you will have a matrix holding each of your files and their contents, from which you can extract the observations. To get at them, you can do mydata[2,1][[1]]. Remember that, assuming every file has the same columns, the rows of the matrix are the variables (the columns of each CSV) and the columns of the matrix are your files.
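A sketch of the same idea that goes straight to the counts: rather than keeping every data frame, each call reads one file and returns its number of complete cases. The names count_complete and complete_counts are hypothetical, and the sketch assumes paths is a character vector of existing CSV files.

```r
# Hypothetical helper: number of complete rows in one CSV file.
count_complete <- function(p) sum(complete.cases(read.csv(p)))

# Assumption: `paths` holds one file path per element.
complete_counts <- function(paths) {
  # vapply works like sapply but checks that each result is one integer.
  vapply(paths, count_complete, integer(1), USE.NAMES = FALSE)
}
```

The result is a plain integer vector, one count per file, which can be paired with the ids in a data frame.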
I am working on the exact same problem. The files in the directory “specdata” are named 001.csv, 002.csv, ..., 099.csv, all the way up to 332.csv. However, when you pass id=1, the file name becomes 1.csv, which does not exist in the directory. Try using this function to get the path of each id file:
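filepaths <- function(id) {
  # Assumes the working directory is the data directory and that the
  # sorted listing lines up with the ids (001.csv first, 332.csv last).
  allfiles <- list.files(getwd())
  file.path(getwd(), allfiles[id])
}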
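If the names really are zero-padded to three digits, as described above, an alternative that does not depend on the working directory or on the order returned by list.files is to build each name with sprintf. This is only a sketch under that naming assumption, and padded_paths is a hypothetical name.

```r
padded_paths <- function(directory, id) {
  # "%03d" pads with leading zeros to width 3: 1 -> "001", 99 -> "099".
  file.path(directory, sprintf("%03d.csv", as.integer(id)))
}

padded_paths("specdata", c(1, 99, 332))
# "specdata/001.csv" "specdata/099.csv" "specdata/332.csv"
```

Since sprintf is vectorised, the function returns one path per id, so inside a loop you would still index a single element before calling read.csv.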
I had this problem because I was trying to run a for loop against the data frame and not a vector:
ids <- th[th$nobs > threshold,]
for(i in ids) {
This is what the variable “ids” looks like:
id nobs
2 2 1041
154 154 1095
248 248 1005
It should have been:
ids <- th[th$nobs > threshold,]
for(i in ids$id) {
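The reason is that a for loop over a data frame visits its columns, not its rows, so i ends up holding a whole column and any path built from it is a vector. A small sketch using the three rows shown above (the counter names are made up for illustration):

```r
ids <- data.frame(id = c(2, 154, 248), nobs = c(1041, 1095, 1005))

# Looping over the data frame visits each column, so this runs twice.
n_iter_df <- 0
for (i in ids) n_iter_df <- n_iter_df + 1

# Looping over the id column visits each id, so this runs three times.
n_iter_col <- 0
for (i in ids$id) n_iter_col <- n_iter_col + 1
```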
I ran into the same error with this statement:
Browse[2]> read.csv(list.files(".", "XCMS-annotated-diffreport--.*csv$"), row.names = 1)
Error in file(file, "rt") : invalid 'description' argument
Then I found that two different CSV files in the same directory matched the pattern, so list.files returned two paths:
Browse[2]> list.files(".", "XCMS-annotated-diffreport--.*csv$")
[1] "XCMS-annotated-diffreport--1-vs-2-Y.csv" "XCMS-annotated-diffreport--1-vs-2.csv"
After I deleted one of the files, it worked again.
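Rather than deleting files, one option is to check how many files the pattern matched before calling read.csv. The helper below is a hypothetical sketch, not something from the XCMS workflow itself; it simply stops with a clearer message when the match is not unique.

```r
# Hypothetical helper: read the one CSV in `dir` that matches `pattern`.
read_single_csv <- function(dir, pattern, ...) {
  matches <- list.files(dir, pattern = pattern, full.names = TRUE)
  # read.csv needs exactly one path; fail early with a clearer message.
  if (length(matches) != 1) {
    stop("expected exactly 1 matching file, found ", length(matches))
  }
  read.csv(matches, ...)
}
```

Tightening the regular expression so it matches only the intended file would also avoid the problem.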
Change id to i, because inside the for loop the iteration variable is i. That is, change path<-paste(directory,"/",id,".csv",sep="") to path<-paste(directory,"/",i,".csv",sep="")