Hi.

I have some data frames I created previously that seem to not be working correctly anymore. I *think* the problem is that some of the variables in the data frame are of a type called labelled. There are other attributes in the data frame as well. I thought that the easiest way to fix this was to convert to, say a csv and re-load.

I tried something like read.csv(write.csv(df,row.names=FALSE)) but got the error

  Error in read.table(file = file, header = header, sep = sep, quote = quote, :
    'file' must be a character string or connection

I guess there must be a way to send the output of write.csv to a connection that read.csv can use but I was mystified by the help page on connections, at least I could not determine how to achieve my desired result.

I realize I could write to a file and read it back in, but that feels klunky somehow. Maybe my approach to convert my data to strip the "weird" stuff is wrong-headed and I would accept alternative strategies.

I would like a more general solution to fix this because I expect to encounter it some more. For those wondering how I found myself in such a mess, the data frames were initially imported from SAS data sets through the haven package. I then did some standard manipulation and added some additional labels with the upData() function from Hmisc (both packages have been updated since initial creation of the data frames).

Thanks,

Kevin

--
Kevin E. Thorpe
Head of Biostatistics, Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
William Poling, Ph.D., MPH
2018-Jul-09 14:50 UTC
[R] Using write.csv as a connection for read.csv
Hi Kevin. Maybe?

  library(readr)  # write_csv() comes from the readr package
  setwd("C:/RPractice")
  write_csv(yourfile, path = "yourfile.csv")  # newer readr releases call this argument `file`
  yourfile <- read.csv("yourfile.csv")

HTH
WHP

On Monday, July 9, 2018, 10:42:24 AM EDT, Kevin Thorpe <kevin.thorpe at utoronto.ca> wrote:
> I realize I could write to a file and read it back in, but that feels klunky somehow.
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]]
Hi Kevin,

It's good that you provided the background to the problem. Rather than asking this list to "debug" your proposed solution, I think you would be better off showing some of the "corrupted" data frame and asking for suggestions on how to deal with it. (Suggestions may or may not match your initial attempt.)

Can you output a piece of your suspect data frame via the dput() function and post to the list?

Best,
Eric
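When the values themselves cannot be posted, one option in the spirit of the dput() request is to post only the column classes and attributes, which is where labels and formats live. The following is a minimal sketch in base R; the data frame, its column names and the label text are hypothetical stand-ins and are not taken from the thread.

```r
# Hypothetical stand-in for a problem data frame; the real data are not shown.
df <- data.frame(id = 1:3, sex = c(1L, 2L, 1L))
attr(df$sex, "label") <- "Sex of participant"   # a variable label
class(df$sex) <- c("labelled", "integer")       # mimic a "labelled" column

# Collect only the metadata: one entry per column, no data values.
col_classes <- lapply(df, class)
col_attrs   <- lapply(df, attributes)

# dput() gives a copy-and-paste representation suitable for a list posting.
dput(col_classes)
dput(col_attrs)
```

Note that attributes can still include factor levels or value labels, so the output is worth a quick look before it is posted.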
Although your suggestion to provide the data is excellent and one I typically agree with, the data are currently unpublished and so should not be publicly available. I have tried to make a reproducible example in the past (when similar-looking things happened), but was unable to. Maybe I'll try a small subset and see if that works.

Kevin

On Monday, July 9, 2018 10:51:38 AM, Eric Berger <ericjberger at gmail.com> wrote:
> Can you output a piece of your suspect data frame via the dput() function and post to the list?
> I tried something like read.csv(write.csv(df,row.names=FALSE)) but got the error
>
> Error in read.table(file = file, header = header, sep = sep, quote = quote, :
>   'file' must be a character string or connection

To diagnose this without reading the help(write.csv), look at the return value of write.csv:

  > df <- data.frame(Col1=1:3, Col2=LETTERS[24:26])
  > tmp <- write.csv(df, row.names=FALSE)
  "Col1","Col2"
  1,"X"
  2,"Y"
  3,"Z"
  > tmp
  NULL

read.csv complains about reading from NULL:

  > read.csv(NULL)
  Error in read.table(file = file, header = header, sep = sep, quote = quote, :
    'file' must be a character string or connection

Bill Dunlap
TIBCO Software
wdunlap tibco.com
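Since write.csv() prints to the console and returns NULL, one way to get the nested call to work without a file is to capture the printed lines and hand them to read.csv() through its `text` argument. This is only a sketch of that idea, reusing the small example data frame from the message above; the names csv_lines and df2 are made up for illustration.

```r
df <- data.frame(Col1 = 1:3, Col2 = LETTERS[24:26])

# capture.output() collects what write.csv() would print, one string per line.
csv_lines <- capture.output(write.csv(df, row.names = FALSE))

# read.csv() reads from a character vector when it is passed as `text`.
df2 <- read.csv(text = csv_lines)
str(df2)
```

Everything still passes through a text representation in memory, so attributes such as labels are dropped just as they would be with a file on disk.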
TL;DR: If you want to do this, go ahead and use a temporary file or text connection.

Others have pointed out that write.csv returns NULL rather than a file connection, but I haven't seen comments on your impulse to avoid the use of files.

*nix operating systems are admirably efficient with multitasking... such that shells can efficiently run multiple programs connected by pipes, pausing the producers if they get ahead of the consumers and resuming them if the consumers run out of data, thus minimizing the amount of temporary disk space usage. R does not presume this to be among the fundamental capabilities of the operating system, rather assuming single-tasking capability by default. This means that even if you do connect write.csv to a pipe, it will run to completion before read.csv gets a chance to process any of the data.

MSDOS used to simulate command line program chaining by writing all the data to a temporary file before running the consumer program. R is similar... and like MSDOS there is little reason to avoid temporary files in R.

  set.seed( 42 )
  DF <- data.frame( X=1:100, Y=rnorm( 100 ) )
  fname <- tempfile()  # path for a temporary file; write.csv creates the file
  write.csv( DF, file=fname, row.names=FALSE )
  DF2 <- read.csv( file=fname )
  all.equal( DF$X, DF2$X ) && all.equal( DF$Y, DF2$Y )
  unlink( fname )  # delete the temporary file

--
Sent from my phone. Please excuse my brevity.
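The text connection mentioned in the TL;DR can be sketched as follows. This is an illustrative variant of the temporary-file example, not the poster's own code: it writes the CSV into an in-memory text connection and reads it back from another one. The variable names and the small data frame are assumptions chosen for the example.

```r
DF <- data.frame(X = 1:5, Y = c(0.5, 1.5, 2.5, 3.5, 4.5))

# Anonymous in-memory output connection; its contents are fetched with
# textConnectionValue() before the connection is closed.
out <- textConnection(NULL, open = "w")
write.csv(DF, file = out, row.names = FALSE)
csv_lines <- textConnectionValue(out)
close(out)

# In-memory input connection over the captured lines.
inp <- textConnection(csv_lines)
DF2 <- read.csv(inp)
close(inp)

all.equal(DF, DF2)
```

As with the temporary file, the write finishes completely before the read starts; the only difference is that the intermediate text lives in memory rather than on disk.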
Thanks Jeff and all others.

I will need to use the tempfile route I guess (I'm running in a Linux OS) for the time being. After I re-loaded the data frames that were broken before, they seemed fine, but after using them for a while they broke again. I am trying to build my analysis with rmarkdown and tools. I have not been able to determine (yet) exactly what set of interactions is "breaking" things.

I certainly don't expect the list to debug everything I'm doing. The only thing I can say is that there appears to be some weird interaction between SAS data sets imported by haven and other packages. Note that I encountered (I think) related issues with an imported data set when I tried working with it in the tidyverse. Maybe I'm getting too old to learn new stuff. :-)

Sorry I am not being much help with my own problem. I just have not been able to determine where things break. If I can come up with a reproducible example that reliably breaks, I'll post it.

Kevin

On Monday, July 9, 2018 1:01 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> TL;DR: If you want to do this, go ahead and use a temporary file or text connection.
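The original question also asked for alternatives to the CSV round trip. One possibility, offered only as a sketch, is to remove the label-related attributes and classes from each column directly. The helper name strip_labels() is hypothetical, and the attribute and class names it drops ("label", "labels", "units", "format.sas", "labelled", "haven_labelled", "vctrs_vctr") are assumptions about what haven and Hmisc may attach; it is worth checking them against lapply(df, attributes) on the real data first.

```r
# Hypothetical helper: drop label-style attributes and classes, keep the values.
strip_labels <- function(df,
                         drop_attrs = c("label", "labels", "units", "format.sas"),
                         drop_classes = c("labelled", "haven_labelled", "vctrs_vctr")) {
  # Implicit type names that sometimes appear in an explicit class attribute.
  basic <- c("integer", "numeric", "double", "character", "logical")
  df[] <- lapply(df, function(col) {
    for (a in drop_attrs) attr(col, a) <- NULL
    # Keep genuine classes such as "factor" or "Date"; drop the rest.
    keep <- setdiff(oldClass(col), c(drop_classes, basic))
    oldClass(col) <- if (length(keep)) keep else NULL
    col
  })
  df
}

# Small made-up example with one "labelled" column and one factor.
df <- data.frame(id = 1:3, grp = factor(c("a", "b", "a")))
attr(df$id, "label") <- "Subject ID"
class(df$id) <- c("labelled", "integer")

clean <- strip_labels(df)
str(clean)
```

Unlike the CSV route, this keeps factors and dates as they are, but value labels are simply discarded, so coded columns are left with their underlying codes. The haven package also provides zap_labels(), zap_label() and zap_formats() for a similar purpose.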