I have a number of space-separated files of weather data, with some equivalent column names and a differing number of fields in each file. Some of the files have 40 or more variables, but I only want a subset of the fields. I can use colClasses with read.table to drop some of the fields, but only if I know where those columns are in the first place, and they're not always in the same place. So I would like to be able to drop all unwanted columns on import, by name.

In addition, most fields have a "Q" (quality) field next to them, and I need to read those as well, each "Q" next to its relevant field, such as "Temp", and rename it to e.g. "Temp.Q".

Some example data:

    Date HrMn I Type Dir Q I Spd Q Visby Q I Q Temp Q Dewpt Q Slp Q Pr Amt I Q
    19450101 0900 4 SAO 315 1 N 1.0 1 024000 1 N 1 -37.0 1 -45.9 1 1031.8 1 99 999.9 9 9
    19450101 1000 4 SAO 315 1 N 1.0 1 024000 1 N 1 -35.9 1 -43.1 1 1032.2 1 99 999.9 9 9
    19450101 1100 4 SAO 360 1 N 1.0 1 024000 1 N 1 -35.9 1 -43.1 1 1032.5 1 99 999.9 9 9
    19450101 1200 4 SAO 315 1 N 1.0 1 024000 1 N 1 -36.4 1 -50.9 1 1032.9 1 99 999.9 9 9
    19450101 1300 4 SAO 360 1 N 1.0 1 024000 1 N 1 -36.4 1 -43.1 1 1032.9 1 99 999.9 9 9
    19450101 1400 4 SAO 315 1 N 1.0 1 016000 1 N 1 -36.4 1 -42.0 1 1032.5 1 99 999.9 9 9
    19450101 1500 4 SAO 180 1 N 1.0 1 016000 1 N 1 -36.4 1 -45.3 1 1032.5 1 99 999.9 9 9
    19450101 1600 4 SAO 360 1 N 1.0 1 024000 1 N 1 -37.5 1 -45.9 1 1032.9 1 99 999.9 9 9

So if I want to extract Date, HrMn, Temp, and the Q following Temp:

    tmp1 <- read.table("ex.dat", sep = " ", strip.white = TRUE,
                       colClasses = c("character", "character", rep("NULL", 11),
                                      "numeric", "factor", rep("NULL", 8)),
                       na.strings = "999.9", header = TRUE)

But having to alter colClasses for every file, whose fields may change when next year's data is retrieved, is no fun. And is there a way to specify na.strings per column?
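A minimal sketch of dropping columns by name rather than by position, assuming the header is the first line of each file and the fields are whitespace-separated as in the example: read only the header with readLines(), build the colClasses vector by matching names, and keep the "Q" column immediately to the right of each selected field under the name "<field>.Q". The helper read_by_name() and its keep/classes/with_q arguments are hypothetical, not part of read.table.

```r
# Hypothetical helper (not part of base R): read only the named columns, plus
# the "Q" flag that immediately follows selected fields, from a
# whitespace-separated file whose first line is the header.
read_by_name <- function(file, keep, classes, with_q = character(0),
                         na.strings = "NA") {
  stopifnot(length(classes) == length(keep), all(with_q %in% keep))

  # Read only the header line and split it into the raw field names.
  hdr <- strsplit(trimws(readLines(file, n = 1)), "[[:space:]]+")[[1]]

  cc <- rep("NULL", length(hdr))          # "NULL" = skip this column
  idx <- match(keep, hdr)                 # first column carrying each name
  if (anyNA(idx)) {
    stop("columns not found: ", paste(keep[is.na(idx)], collapse = ", "))
  }
  cc[idx] <- classes

  # Keep the quality flag just to the right of each with_q field, if present.
  q_idx <- integer(0)
  for (nm in with_q) {
    i <- match(nm, hdr) + 1L
    if (i <= length(hdr) && hdr[i] == "Q") {
      cc[i] <- "factor"
      q_idx <- c(q_idx, i)
    }
  }

  # Skip the header and assign names ourselves, so the duplicated "Q" and "I"
  # names are never rewritten by check.names.
  out <- read.table(file, header = FALSE, skip = 1, colClasses = cc,
                    na.strings = na.strings)
  nms <- hdr
  nms[q_idx] <- paste0(hdr[q_idx - 1L], ".Q")  # e.g. "Temp.Q"
  names(out) <- nms[cc != "NULL"]            # result columns keep file order
  out
}

# Demo on the first two rows of the example data.
f <- tempfile(fileext = ".dat")
writeLines(c(
  "Date HrMn I Type Dir Q I Spd Q Visby Q I Q Temp Q Dewpt Q Slp Q Pr Amt I Q",
  "19450101 0900 4 SAO 315 1 N 1.0 1 024000 1 N 1 -37.0 1 -45.9 1 1031.8 1 99 999.9 9 9",
  "19450101 1000 4 SAO 315 1 N 1.0 1 024000 1 N 1 -35.9 1 -43.1 1 1032.2 1 99 999.9 9 9"
), f)

tmp1 <- read_by_name(f, keep = c("Date", "HrMn", "Temp"),
                     classes = c("character", "character", "numeric"),
                     with_q = "Temp", na.strings = "999.9")
str(tmp1)
```

Relying on the default whitespace separator (sep = "") instead of sep = " " also tolerates runs of spaces if a file is column-aligned. Because match() returns the first column with a given name, fields whose names repeat (such as "I") would still need positional handling.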
--
View this message in context: http://n4.nabble.com/how-to-drop-fields-by-name-when-reading-in-data-tp1601166p1601166.html
Sent from the R help mailing list archive at Nabble.com.
David Winsemius
2010-Mar-19 20:34 UTC
[R] how to drop fields by name when reading in data?
On Mar 19, 2010, at 3:03 PM, Peter Keller wrote:

> In addition, most fields have a "Q" (quality) field next to them, and I need
> to read those as well, each "Q" next to its relevant field, such as "Temp",
> and rename it to e.g. "Temp.Q".

Those will probably get changed to Q.1, Q.2, etc. by the default check.names = TRUE, which passes the header through make.names(unique = TRUE) (the first one stays "Q").

> But having to alter colClasses for every file, whose fields may change when
> next year's data is retrieved, is no fun. And is there a way to specify
> na.strings per column?

There might be, if you wanted to write an as() method (via setAs) for a new class and then name that class in colClasses. There was a recent answer to an r-help currency conversion question that illustrated this approach.

--
David Winsemius, MD
West Hartford, CT
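A minimal sketch of the as-method approach, assuming the methods package: read.table() documents that colClasses may name a class for which an as() method exists, so a class whose coercion maps 999.9 to NA applies that missing-value code only to the columns where it is named. The class name "num999" is hypothetical and the 999.9 sentinel is hard-coded for illustration; the column chosen (Amt) is simply the one holding 999.9 in the example rows.

```r
library(methods)

# Hypothetical class name: a numeric column in which 999.9 means "missing".
setClass("num999")

# Columns whose colClasses entry is "num999" are read as character and then
# converted by read.table() with as(x, "num999"), which calls this function.
setAs("character", "num999", function(from) {
  x <- as.numeric(from)
  x[which(x == 999.9)] <- NA   # map the sentinel to NA in this column only
  x
})

ex <- c(
  "Date HrMn I Type Dir Q I Spd Q Visby Q I Q Temp Q Dewpt Q Slp Q Pr Amt I Q",
  "19450101 0900 4 SAO 315 1 N 1.0 1 024000 1 N 1 -37.0 1 -45.9 1 1031.8 1 99 999.9 9 9",
  "19450101 1000 4 SAO 315 1 N 1.0 1 024000 1 N 1 -35.9 1 -43.1 1 1032.2 1 99 999.9 9 9"
)

# Build colClasses by name: NA (automatic type detection) everywhere except
# Amt, which gets the sentinel-aware class.
hdr <- strsplit(ex[1], " ")[[1]]
cc <- rep(NA_character_, length(hdr))
cc[hdr == "Amt"] <- "num999"

wx <- read.table(text = ex, header = TRUE, colClasses = cc)
str(wx)
```

Here na.strings is left at its default, so 999.9 becomes NA only in Amt; each further missing-value code would need its own class, or a coercion function that checks several codes. The str() output also shows the duplicated header names after check.names, e.g. the flag following Temp arrives as "Q.4".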