Philip Monk
2021-Nov-20 17:08 UTC
[R] Date read correctly from CSV, then reformatted incorrectly by R
Hello, Simple but infuriating problem. Reading in CSV of data using : ``` # CSV file has column headers with date of scene capture in format dd/mm/yyyy # check.names = FALSE averts R incorrectly processing dates due to '/' data <- read.csv("C:/R_data/Bungala (b2000) julian.csv", check.names FALSE) # Converts data table from wide (many columns) to long (many rows) and creates the new object 'data_long' # Column 1 is the 'Buffer' number (100-2000), Columns 2-25 contain monthly data covering 2 years (the header row being the date, and rows 2-21 being a value for each buffer). # Column headers for columns 2:25 are mutated into a column called 'Date', values for each buffer and each date into the column 'LST' data_long <- data %>% pivot_longer(cols = 2:25, names_to = "Date", values_to = "LST") # Instructs R to treat the 'Date' column data as a date data_long$Date <- as.Date(data_long$Date) ``` Using str(data), I can see that R has correctly read the dates in the format %d/%m/%y (e.g. 15/12/2015) though has the data type as chr. Once changing the type to 'Date', however, the date is reconfigured. For instance, 15/01/2010 (15 January 2010), becomes 0015-01-20. I've tried ```data_long$Date <- as.Date(data_long$Date, format "%d/%m.%y")```, and also ```tryformat c("%d/%m%y")```, but either the error persists or I get ```NA```. How do I make R change Date from 'chr' to 'date' without it going wrong? Suggestions/hints/solutions would be most welcome. :) Thanks for your time, Philip Part-time PhD Student (Environmental Science) Lancaster University, UK. ~~~~~ I asked a question a few weeks ago and put together the answer I needed from the responses but didn't know how to say thanks on this list. So, thanks Andrew Simmons, Bert Gunter, Jeff Newmiller and Daniel Nordlund! [[alternative HTML version deleted]]
Eric Berger
2021-Nov-20 17:34 UTC
[R] Date read correctly from CSV, then reformatted incorrectly by R
Hi Philip, This is a recurring question and there are many ways to do this. My preference is to use the lubridate package. library(lubridate) a <- "15/01/2010" b <- dmy(a) b # "2010-01-15" class(b) # [1] "Date" HTH, Eric On Sat, Nov 20, 2021 at 7:09 PM Philip Monk <prmonk at gmail.com> wrote:> Hello, > > Simple but infuriating problem. > > Reading in CSV of data using : > > ``` > # CSV file has column headers with date of scene capture in format > dd/mm/yyyy > # check.names = FALSE averts R incorrectly processing dates due to '/' > data <- read.csv("C:/R_data/Bungala (b2000) julian.csv", check.names > FALSE) > > # Converts data table from wide (many columns) to long (many rows) and > creates the new object 'data_long' > # Column 1 is the 'Buffer' number (100-2000), Columns 2-25 contain monthly > data covering 2 years (the header row being the date, and rows 2-21 being a > value for each buffer). > # Column headers for columns 2:25 are mutated into a column called 'Date', > values for each buffer and each date into the column 'LST' > data_long <- data %>% pivot_longer(cols = 2:25, names_to = "Date", > values_to = "LST") > > # Instructs R to treat the 'Date' column data as a date > data_long$Date <- as.Date(data_long$Date) > ``` > > Using str(data), I can see that R has correctly read the dates in the > format %d/%m/%y (e.g. 15/12/2015) though has the data type as chr. > > Once changing the type to 'Date', however, the date is reconfigured. For > instance, 15/01/2010 (15 January 2010), becomes 0015-01-20. > > I've tried ```data_long$Date <- as.Date(data_long$Date, format > "%d/%m.%y")```, and also ```tryformat c("%d/%m%y")```, but either the error > persists or I get ```NA```. > > How do I make R change Date from 'chr' to 'date' without it going wrong? > > Suggestions/hints/solutions would be most welcome. :) > > Thanks for your time, > > Philip > > Part-time PhD Student (Environmental Science) > Lancaster University, UK. > > ~~~~~ > > I asked a question a few weeks ago and put together the answer I needed > from the responses but didn't know how to say thanks on this list. So, > thanks Andrew Simmons, Bert Gunter, Jeff Newmiller and Daniel Nordlund! > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Jeff Newmiller
2021-Nov-20 17:41 UTC
[R] Date read correctly from CSV, then reformatted incorrectly by R
a) R data frames are column oriented. Do not fight this. b) Data frame header names are character type. Period. Do not fight this. It sounds like you need to reshape your data after you read it in. Provide the first five lines of your CSV file (or a reasonable facsimile if your data are confidential) and someone (me if I get to it first) can help make it more adapted to R. Also, make sure you configure your email software to send plain text... formatted email gets damaged to varying degrees by the automatic stripping of formatting performed by the mailing list. And periodically read the Posting Guide for other advice. On November 20, 2021 9:08:39 AM PST, Philip Monk <prmonk at gmail.com> wrote:>Hello, > >Simple but infuriating problem. > >Reading in CSV of data using : > >``` ># CSV file has column headers with date of scene capture in format >dd/mm/yyyy ># check.names = FALSE averts R incorrectly processing dates due to '/' >data <- read.csv("C:/R_data/Bungala (b2000) julian.csv", check.names >FALSE) > ># Converts data table from wide (many columns) to long (many rows) and >creates the new object 'data_long' ># Column 1 is the 'Buffer' number (100-2000), Columns 2-25 contain monthly >data covering 2 years (the header row being the date, and rows 2-21 being a >value for each buffer). ># Column headers for columns 2:25 are mutated into a column called 'Date', >values for each buffer and each date into the column 'LST' >data_long <- data %>% pivot_longer(cols = 2:25, names_to = "Date", >values_to = "LST") > ># Instructs R to treat the 'Date' column data as a date >data_long$Date <- as.Date(data_long$Date) >``` > >Using str(data), I can see that R has correctly read the dates in the >format %d/%m/%y (e.g. 15/12/2015) though has the data type as chr. > >Once changing the type to 'Date', however, the date is reconfigured. For >instance, 15/01/2010 (15 January 2010), becomes 0015-01-20. > >I've tried ```data_long$Date <- as.Date(data_long$Date, format >"%d/%m.%y")```, and also ```tryformat c("%d/%m%y")```, but either the error >persists or I get ```NA```. > >How do I make R change Date from 'chr' to 'date' without it going wrong? > >Suggestions/hints/solutions would be most welcome. :) > >Thanks for your time, > >Philip > >Part-time PhD Student (Environmental Science) >Lancaster University, UK. > >~~~~~ > >I asked a question a few weeks ago and put together the answer I needed >from the responses but didn't know how to say thanks on this list. So, >thanks Andrew Simmons, Bert Gunter, Jeff Newmiller and Daniel Nordlund! > > [[alternative HTML version deleted]] > >______________________________________________ >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code.-- Sent from my phone. Please excuse my brevity.
Philip Monk
2021-Nov-20 17:48 UTC
[R] Date read correctly from CSV, then reformatted incorrectly by R
Thanks Eric & Jeff. I'll certainly read up on lubridate, and the posting guide (again) (this should be in plain text). CSV extract below... Philip Buffer 28/10/2016 19/11/2016 31/12/2016 16/01/2017 05/03/2017 100 2.437110889 -8.69674895 3.239299816 2.443183304 2.346743827 200 2.524329899 -7.688862068 3.386811734 2.680347706 2.253885237 300 2.100784256 -8.059855835 3.143786507 2.615152896 2.015645973 400 1.985608385 -10.6707206 2.894572791 2.591925038 2.057913137 500 1.824982163 -9.122519736 2.560350727 2.372226799 1.995863839 On Sat, 20 Nov 2021 at 17:08, Philip Monk <prmonk at gmail.com> wrote:> > Hello, > > Simple but infuriating problem. > > Reading in CSV of data using : > > ``` > # CSV file has column headers with date of scene capture in format dd/mm/yyyy > # check.names = FALSE averts R incorrectly processing dates due to '/' > data <- read.csv("C:/R_data/Bungala (b2000) julian.csv", check.names = FALSE) > > # Converts data table from wide (many columns) to long (many rows) and creates the new object 'data_long' > # Column 1 is the 'Buffer' number (100-2000), Columns 2-25 contain monthly data covering 2 years (the header row being the date, and rows 2-21 being a value for each buffer). > # Column headers for columns 2:25 are mutated into a column called 'Date', values for each buffer and each date into the column 'LST' > data_long <- data %>% pivot_longer(cols = 2:25, names_to = "Date", values_to = "LST") > > # Instructs R to treat the 'Date' column data as a date > data_long$Date <- as.Date(data_long$Date) > ``` > > Using str(data), I can see that R has correctly read the dates in the format %d/%m/%y (e.g. 15/12/2015) though has the data type as chr. > > Once changing the type to 'Date', however, the date is reconfigured. For instance, 15/01/2010 (15 January 2010), becomes 0015-01-20. > > I've tried ```data_long$Date <- as.Date(data_long$Date, format = "%d/%m.%y")```, and also ```tryformat c("%d/%m%y")```, but either the error persists or I get ```NA```. > > How do I make R change Date from 'chr' to 'date' without it going wrong? > > Suggestions/hints/solutions would be most welcome. :) > > Thanks for your time, > > Philip > > Part-time PhD Student (Environmental Science) > Lancaster University, UK. > > ~~~~~ > > I asked a question a few weeks ago and put together the answer I needed from the responses but didn't know how to say thanks on this list. So, thanks Andrew Simmons, Bert Gunter, Jeff Newmiller and Daniel Nordlund!
Richard O'Keefe
2021-Nov-22 02:54 UTC
[R] Date read correctly from CSV, then reformatted incorrectly by R
CSV data is very often strangely laid out. For analysis, Buffer Date Reading 100 ... ... 100 ... ... and so on is more like what a data frame should be. I get quite annoyed when I finally manage to extract data from a government agency only to find that my tax money has been spent on making it harder to access than it needed to be. (1) You do NOT need any additional library to convert dates. ?strptime is quite capable. (2) Just because reshaping CAN be done in R doesn't mean it SHOULD be. Instead of reading data in as the wrong format and then hacking it into shape every time, it makes sense to convert the data once and only once, then load the converted data. It took just a couple of minutes to write (CSVDecoder read: 'transpose-in.csv') bindOwn: [:source | (CSVEncoder write: 'transpose-out.csv') bindOwn: [:target | source next bind: [:header | "Label date-1 ... date-n" target nextPut: {header first. 'Date'. 'Reading'}. [source atEnd] whileFalse: [ source next bind: [:group | group with: header keysAndValuesDo: [:index :reading :date | 1 < index ifTrue: [ (date subStrings: '/') bind: [:dmy | (dmy third,'-',dmy second,'-',dmy first) bind: [:iso | target nextPut: {group first. iso. reading}]]]]]]]]]]]. in another programming language, run it, and turn your example into Buffer,Date,Reading 100,2016-10-28,2.437110889 100,2016-11-19,-8.69674895 100,2016-12-31,3.239299816 100,2017-01-16,2.443183304 100,2017-03-05,2.346743827 200,2016-10-28,2.524329899 200,2016-11-19,-7.688862068 ... You could do the same kind of thing easily in Perl, Python, F#, ... Then just read the table in using read.csv("transpose-out.csv", colClasses = c("integer","Date","numeric")) and you're away laughing. (3) Of course you can do the whole thing in base R. h <- read.csv("transpose-in.csv", header=FALSE, nrows=1, stringsAsFactors=FALSE) d <- strptime(h[1,-1], format="%d/%m/%Y") b <- read.csv("transpose-in.csv", header=FALSE, skip=1) r <- expand.grid(Date=d, Buffer=b[,1]) r$Result <- as.vector(t(as.matrix(b[,-1]))) Lessons: (A) You don't have to read a CSV file (or any other) all in one piece. This pays off when the structure is irregular. (B) You don't HAVE to accept or convert column names. (C) strptime is your friend. (D) expand.grid is particularly handy for "matrix form" CSV data. (E) Someone who suggests doing something in another language because it is easier can end up with egg on his face when doing the whole thing in R turns out to be easier, simpler, and far more obvious. (A) really is an important lesson. (F) It's *amazing* what you can do in base R. It is useful to familiarise yourself with its capabilities before considering other packages. Compositional data? Not in base R. Correspondence analysis? Not in base R. Data reshaping? Very much there. On Sun, 21 Nov 2021 at 06:09, Philip Monk <prmonk at gmail.com> wrote:> Hello, > > Simple but infuriating problem. > > Reading in CSV of data using : > > ``` > # CSV file has column headers with date of scene capture in format > dd/mm/yyyy > # check.names = FALSE averts R incorrectly processing dates due to '/' > data <- read.csv("C:/R_data/Bungala (b2000) julian.csv", check.names > FALSE) > > # Converts data table from wide (many columns) to long (many rows) and > creates the new object 'data_long' > # Column 1 is the 'Buffer' number (100-2000), Columns 2-25 contain monthly > data covering 2 years (the header row being the date, and rows 2-21 being a > value for each buffer). > # Column headers for columns 2:25 are mutated into a column called 'Date', > values for each buffer and each date into the column 'LST' > data_long <- data %>% pivot_longer(cols = 2:25, names_to = "Date", > values_to = "LST") > > # Instructs R to treat the 'Date' column data as a date > data_long$Date <- as.Date(data_long$Date) > ``` > > Using str(data), I can see that R has correctly read the dates in the > format %d/%m/%y (e.g. 15/12/2015) though has the data type as chr. > > Once changing the type to 'Date', however, the date is reconfigured. For > instance, 15/01/2010 (15 January 2010), becomes 0015-01-20. > > I've tried ```data_long$Date <- as.Date(data_long$Date, format > "%d/%m.%y")```, and also ```tryformat c("%d/%m%y")```, but either the error > persists or I get ```NA```. > > How do I make R change Date from 'chr' to 'date' without it going wrong? > > Suggestions/hints/solutions would be most welcome. :) > > Thanks for your time, > > Philip > > Part-time PhD Student (Environmental Science) > Lancaster University, UK. > > ~~~~~ > > I asked a question a few weeks ago and put together the answer I needed > from the responses but didn't know how to say thanks on this list. So, > thanks Andrew Simmons, Bert Gunter, Jeff Newmiller and Daniel Nordlund! > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]