thr3ads.net - R help - [R] how to drop fields by name when reading in data? [Mar 2010]

Peter Keller

2010-Mar-19 19:03 UTC

[R] how to drop fields by name when reading in data?

I have a number of space-separated files of weather data, with some
equivalent column names and a differing number of fields in each file. Some
of the files have 40 or more variables, but I only want a subset of the fields.
I can use colClasses with read.table to drop some of the fields, but only if
I know where those columns are in the first place, and they're not always in
the same place. So I would like to be able to drop all unwanted columns on
import, by name.

In addition, most fields have a "Q" (quality) field next to them, and I need
to read those as well, each "Q" next to its relevant field, such as
"Temp", and rename them to, e.g., "Temp.Q".

Some example data: 
Date HrMn I Type Dir Q I Spd Q Visby Q I Q Temp Q Dewpt Q Slp Q Pr Amt I Q
19450101 0900 4 SAO 315 1 N 1.0 1 024000 1 N 1 -37.0 1 -45.9 1 1031.8 1 99 999.9 9 9
19450101 1000 4 SAO 315 1 N 1.0 1 024000 1 N 1 -35.9 1 -43.1 1 1032.2 1 99 999.9 9 9
19450101 1100 4 SAO 360 1 N 1.0 1 024000 1 N 1 -35.9 1 -43.1 1 1032.5 1 99 999.9 9 9
19450101 1200 4 SAO 315 1 N 1.0 1 024000 1 N 1 -36.4 1 -50.9 1 1032.9 1 99 999.9 9 9
19450101 1300 4 SAO 360 1 N 1.0 1 024000 1 N 1 -36.4 1 -43.1 1 1032.9 1 99 999.9 9 9
19450101 1400 4 SAO 315 1 N 1.0 1 016000 1 N 1 -36.4 1 -42.0 1 1032.5 1 99 999.9 9 9
19450101 1500 4 SAO 180 1 N 1.0 1 016000 1 N 1 -36.4 1 -45.3 1 1032.5 1 99 999.9 9 9
19450101 1600 4 SAO 360 1 N 1.0 1 024000 1 N 1 -37.5 1 -45.9 1 1032.9 1 99 999.9 9 9

So if I want to extract Date, HrMn, Temp, and the Q following Temp: 
tmp1 <- read.table("ex.dat", sep = " ", strip.white = TRUE, header = TRUE,
                   # Date and HrMn as text, skip 11, Temp and its Q, skip the last 8
                   colClasses = c("character", "character", rep("NULL", 11),
                                  "numeric", "factor", rep("NULL", 8)),
                   na.strings = "999.9")

But having to alter colClasses for every file, the fields of which may
change when next year's data is retrieved, is no fun.  And is there a way to
specify na.strings per column?
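
One way to avoid counting columns by hand, sketched here under some assumptions: read only the header line first, rename each "Q" after the field to its left, and build colClasses from the names. The helper read_by_name(), the vector wanted, and the temporary file are hypothetical names made up for this sketch; only scan() and read.table() are base R. It also assumes a "Q" never appears as the first field.

```r
# Two sample rows from the post, written to a temporary file so the sketch runs as is.
lines <- c(
  "Date HrMn I Type Dir Q I Spd Q Visby Q I Q Temp Q Dewpt Q Slp Q Pr Amt I Q",
  "19450101 0900 4 SAO 315 1 N 1.0 1 024000 1 N 1 -37.0 1 -45.9 1 1031.8 1 99 999.9 9 9",
  "19450101 1000 4 SAO 315 1 N 1.0 1 024000 1 N 1 -35.9 1 -43.1 1 1032.2 1 99 999.9 9 9"
)
f <- tempfile(fileext = ".dat")
writeLines(lines, f)

# Hypothetical helper (not part of base R): keep only the named columns.
# `classes` is a named character vector: names are fields, values are classes.
read_by_name <- function(file, classes, ...) {
  # Read just the header line and split it on whitespace.
  hdr <- scan(file, what = "", nlines = 1, quiet = TRUE)
  # Rename each "Q" after the field to its left, e.g. "Temp.Q".
  q <- which(hdr == "Q")
  q <- q[q > 1]
  hdr[q] <- paste0(hdr[q - 1], ".Q")
  absent <- setdiff(names(classes), hdr)
  if (length(absent) > 0) {
    stop("columns not found: ", paste(absent, collapse = ", "))
  }
  # "NULL" drops a column; wanted names get their class (first match wins).
  cc <- rep("NULL", length(hdr))
  cc[match(names(classes), hdr)] <- classes
  dat <- read.table(file, header = FALSE, skip = 1, colClasses = cc, ...)
  names(dat) <- hdr[cc != "NULL"]
  dat
}

wanted <- c(Date = "character", HrMn = "character",
            Temp = "numeric", Temp.Q = "factor")
tmp1 <- read_by_name(f, wanted, na.strings = "999.9")
str(tmp1)
```

Because the positions are looked up per file, the same wanted vector should keep working when the fields move around, as long as the names stay the same.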

-- 
View this message in context:
http://n4.nabble.com/how-to-drop-fields-by-name-when-reading-in-data-tp1601166p1601166.html
Sent from the R help mailing list archive at Nabble.com.

David Winsemius

2010-Mar-19 20:34 UTC

On Mar 19, 2010, at 3:03 PM, Peter Keller wrote:
>
> I have a number of space-separated files of weather data, with some
> equivalent column names and a differing number of fields in each file. Some
> of the files have 40 or more variables, but I only want a subset of the fields.
> I can use colClasses with read.table to drop some of the fields, but only if
> I know where those columns are in the first place, and they're not always in
> the same place. So I would like to be able to drop all unwanted columns on
> import, by name.
>
> In addition, most fields have a "Q" (quality) field next to them, and I need
> to read those as well, each "Q" next to its relevant field, such as
> "Temp", and rename them to, e.g., "Temp.Q".
Those duplicated names will probably get changed to Q.1, Q.2, etc. by
read.table's default check.names = TRUE, which runs the header through
make.names().
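
To see what that automatic renaming does to this particular header, and one possible way to get names like Temp.Q instead, here is a small sketch. It assumes each Q belongs to the field immediately to its left; the variable names (hdr, nms, auto, renamed) are made up for illustration.

```r
hdr <- "Date HrMn I Type Dir Q I Spd Q Visby Q I Q Temp Q Dewpt Q Slp Q Pr Amt I Q"
nms <- scan(text = hdr, what = "", quiet = TRUE)

# Names as check.names = TRUE would leave them: the first Q stays "Q",
# later ones get numeric suffixes, so the Q after Temp becomes "Q.4".
auto <- make.names(nms, unique = TRUE)
auto[nms == "Q"]

# Alternative: name each Q after the field on its left.
q <- which(nms == "Q")
renamed <- nms
renamed[q] <- paste0(nms[q - 1], ".Q")
# "I" and "I.Q" still occur more than once, so make the result unique.
renamed <- make.unique(renamed)
renamed
```

The renamed vector could then be passed as col.names to read.table(), together with header = FALSE and skip = 1.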
>
> So if I want to extract Date, HrMn, Temp, and the Q following Temp:
> tmp1<-read.table("ex.dat", sep=" ", strip.white=TRUE,
> colClasses=c("character","character",
> rep("NULL",11),"numeric","factor",rep("NULL",8)),na.strings="999.9",
> header=T)
>
> But having to alter colClasses for every file, the fields of which may
> change when next year's data is retrieved, is no fun.  And is there a way to
> specify na.strings per column?
There might be, if you wanted to write an as() method (registered with
setAs()) for a new data type and then name that type in colClasses. There
was a recent answer to an r-help currency conversion question that
illustrated this approach.
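
A minimal sketch of that idea, not taken from the thread referred to: define a class whose coercion from character treats 999.9 as missing, and use it in colClasses for just the columns that need it. The class name "amtNumeric" is hypothetical, and the two data rows are made up for illustration so that both columns contain 999.9.

```r
library(methods)

# Hypothetical class name; any unused name would do.
setClass("amtNumeric")
setAs("character", "amtNumeric", function(from) {
  from[from %in% "999.9"] <- NA   # 999.9 means missing in this column only
  as.numeric(from)
})

# Made-up rows: 999.9 appears in both columns.
txt <- "Slp Amt
999.9 999.9
1031.8 0.5"

d <- read.table(text = txt, header = TRUE,
                colClasses = c("numeric", "amtNumeric"))
d
```

Here Slp keeps its 999.9 as an ordinary number while Amt reads it as NA, because read.table() passes columns of a non-standard class through methods::as().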
David Winsemius, MD
West Hartford, CT
