thr3ads.net - R help - [R] splitting a data.frame [Jan 2002]


Gary Collins

2002-Jan-22 13:29 UTC

[R] splitting a data.frame

I have the following (simple!?) problem which I am unable to find a
 relatively trivial solution to.
 If I have a dataframe,

 A    1 
 A    7
 B    4
 B    5
 C    3
 D    3
 D    2
 E    5
 F    5
 F    6

 I would like to create a new data.frame in the form

 ID    pt1    pt2
 A    1    7
 B    4    5
 C    3    NA
 D    3    2
 E    5    NA
 F    5    6

 so that for each identifier, in this example, A...F I have a column for 
 each observation for each identifier... (with a maximum of 2 obs per 
 identifier, if only 1 obs exist then the second obs pt2 is set to NA)
 This is so I can find the absolute differences between the obs for each 
 identifier, that is abs(pt1-pt2)

 ID  Diff
 A    6
 B    1
 C    NA
 D    1
 E    NA
 F    1
 for which there may be another approach so as not to mess about creating 
 a new dataframe
 Any ideas?
 Gary


__________________________________________________
Gary S. Collins, PhD,
Statistics Research Fellow,
Quality of Life Unit, 
European Organisation for Research and Treatment of Cancer, 
EORTC Data Center, 
Avenue E. Mounier 83, bte. 11,
B-1200 Brussels, Belgium.

Tel: +32 2 774 1 606
Fax: +32 2 779 4 568
http://www.eortc.be/home/qol/
__________________________________________________



-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

David Meyer

2002-Jan-22 14:04 UTC

[R] splitting a data.frame

Gary Collins wrote:
>
>  I have the following (simple!?) problem which I am unable to find a
>  relatively trivial solution to.
>  If I have a dataframe,
> 
>  A    1
>  A    7
>  B    4
>  B    5
>  C    3
>  D    3
>  D    2
>  E    5
>  F    5
>  F    6
> 
>  I would like to create a new data.frame in the form
> 
>  ID    pt1    pt2
>  A    1    7
>  B    4    5
>  C    3    NA
>  D    3    2
>  E    5    NA
>  F    5    6
> 
>  so that for each identifier, in this example, A...F I have a column for
>  each observation for each identifier... (with a maximum of 2 obs per
>  identifier, if only 1 obs exist then the second obs pt2 is set to NA)
>  This is so I can find the absolute differences between the obs for each
>  identifier, that is abs(pt1-pt2)
> 
>  ID  Diff
>  A    6
>  B    1
>  C    NA
>  D    1
>  E    NA
>  F    1
>  for which there may be another approach so as not to mess about creating
>  a new dataframe
>  Any ideas?
>  Gary
What about

by(y,x,function(x) x[2]-x[1])

if x is your factor and y are your values?

-d
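A minimal sketch of this idea on the data from the original post, assuming the identifiers and values are held in two vectors named x and y as in the reply above. It uses tapply() in place of by(), and wraps the difference in abs() because the question asks for absolute differences. Both choices are assumptions of this sketch, not part of the reply. Indexing v[2] on a group with a single observation yields NA, which produces the NA entries for C and E.

```r
# Example data from the original post; the names x and y follow the reply.
x <- factor(c("A", "A", "B", "B", "C", "D", "D", "E", "F", "F"))
y <- c(1, 7, 4, 5, 3, 3, 2, 5, 5, 6)

# For each identifier, compute |second - first|.
# v[2] is NA when a group has only one observation, so the result is NA there.
abs_diff <- tapply(y, x, function(v) abs(v[2] - v[1]))

# Collect the result into a data frame shaped like the requested ID/Diff table.
result <- data.frame(ID = names(abs_diff), Diff = as.vector(abs_diff))
print(result)
```

This avoids building the intermediate pt1/pt2 data frame, but it silently ignores any third or later observation for an identifier.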
-- 
	Mag. David Meyer		Wiedner Hauptstrasse 8-10
Vienna University of Technology		A-1040 Vienna/AUSTRIA
       Department for			Tel.: (+431) 58801/10772
Statistics and Probability Theory	mail: david.meyer at ci.tuwien.ac.at

Torsten Hothorn

2002-Jan-22 14:18 UTC

[R] splitting a data.frame

>  I have the following (simple!?) problem which I am unable to find a
>  relatively trivial solution to.
>  If I have a dataframe,
> 
>  A    1 
>  A    7
>  B    4
>  B    5
>  C    3
>  D    3
>  D    2
>  E    5
>  F    5
>  F    6
> 
>  I would like to create a new data.frame in the form
> 
>  ID    pt1    pt2
>  A    1    7
>  B    4    5
>  C    3    NA
>  D    3    2
>  E    5    NA
>  F    5    6
> 
t(as.data.frame(split(1:4, factor(c("A", "A", "B", "B")))))

(for complete data)

Torsten
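For incomplete data, one possible extension of the split() idea is sketched below. The vector names id and val and the column names pt1 and pt2 are assumptions chosen to match the table in the question. The sketch relies on the fact that indexing a vector past its end returns NA, so v[1:2] pads groups that have only one observation.

```r
# Example data from the original post (hypothetical vector names).
id  <- c("A", "A", "B", "B", "C", "D", "D", "E", "F", "F")
val <- c(1, 7, 4, 5, 3, 3, 2, 5, 5, 6)

# One vector of observations per identifier.
groups <- split(val, factor(id))

# v[1:2] always has length 2; the second element is NA for short groups.
# sapply() returns a 2 x 6 matrix, so transpose to get one row per identifier.
wide <- t(sapply(groups, function(v) v[1:2]))
colnames(wide) <- c("pt1", "pt2")

# Turn the matrix into the requested ID / pt1 / pt2 data frame.
wide_df <- data.frame(ID = rownames(wide), wide, row.names = NULL)
print(wide_df)
```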

Giovanni Petris

2002-Jan-22 15:38 UTC

[R] splitting a data.frame

This gives you what you want, I think (maybe up to the sign). 
However, you must be sure that each identifier occurs at most twice. 

Giovanni

> a
   V1 V2
1   A  1
2   A  7
3   B  4
4   B  5
5   C  3
6   D  3
7   D  2
8   E  5
9   F  5
10  F  6
> tapply(a$V2, a$V1, diff)
$A
[1] 6

$B
[1] 1

$C
numeric(0)

$D
[1] -1

$E
numeric(0)

$F
[1] 1
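
The list returned above can be flattened into the ID/Diff table from the question. The following is a sketch only: it rebuilds the data frame a with the column names V1 and V2 shown in the printout, adds an explicit check for the "at most twice" condition, and maps the empty numeric(0) results to NA. The names d, diffs, and out are hypothetical.

```r
# Data frame as printed above.
a <- data.frame(V1 = c("A", "A", "B", "B", "C", "D", "D", "E", "F", "F"),
                V2 = c(1, 7, 4, 5, 3, 3, 2, 5, 5, 6))

# Guard for the stated precondition: no identifier occurs more than twice.
stopifnot(all(table(a$V1) <= 2))

# diff() gives one value for pairs and numeric(0) for single observations.
d <- tapply(a$V2, a$V1, diff)

# Take absolute values and replace the empty results by NA.
diffs <- sapply(d, function(z) if (length(z) == 1) abs(z) else NA_real_)

out <- data.frame(ID = names(d), Diff = as.vector(diffs))
print(out)
```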
-- 

 __________________________________________________
[                                                  ]
[ Giovanni Petris                 GPetris at uark.edu ]
[ Department of Mathematical Sciences              ]
[ University of Arkansas - Fayetteville, AR 72701  ]
[ Ph: (501) 575-6324, 575-8630 (fax)               ]
[ http://definetti.uark.edu/~gpetris/              ]
[__________________________________________________]


Thomas Lumley

2002-Jan-22 16:58 UTC

[R] splitting a data.frame

On Tue, 22 Jan 2002, Gary Collins wrote:
>  I have the following (simple!?) problem which I am unable to find a
>  relatively trivial solution to.
>  If I have a dataframe,
>
>  A    1
>  A    7
>  B    4
>  B    5
>  C    3
>  D    3
>  D    2
>  E    5
>  F    5
>  F    6
>
>  I would like to create a new data.frame in the form
>
>  ID    pt1    pt2
>  A    1    7
>  B    4    5
>  C    3    NA
>  D    3    2
>  E    5    NA
>  F    5    6
>
In addition to the specific suggestions already given there is a general
solution to this sort of problem with reshape()

You would need to create a time variable to indicate which is the first
or second observation, which in your case (rows sorted by ID, at most two
observations each) could be
  df$time <- c(0, df$ID[-1] == df$ID[-nrow(df)])
then
  newdf <- reshape(df, timevar = "time", idvar = "ID", direction = "wide")


In your case this isn't a big saving over the other approaches. It becomes
really useful when you have many variables, especially if some are
constant over time and others aren't.


	-thomas
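
An end-to-end sketch of the reshape() approach on the data from the original post follows. The column name pt and the use of ave() to number the observations within each ID are assumptions of this sketch, not part of the message above. ave() is used so that the counter does not depend on the rows being sorted by ID.

```r
# Example data from the original post; the column names ID and pt are assumed.
df <- data.frame(ID = c("A", "A", "B", "B", "C", "D", "D", "E", "F", "F"),
                 pt = c(1, 7, 4, 5, 3, 3, 2, 5, 5, 6),
                 stringsAsFactors = FALSE)

# Number the observations within each ID: 1 for the first, 2 for the second.
df$time <- ave(seq_along(df$ID), df$ID, FUN = seq_along)

# Long to wide: one row per ID, one column per time point (pt.1, pt.2).
# IDs without a second observation get NA in pt.2.
newdf <- reshape(df, timevar = "time", idvar = "ID", direction = "wide")

# Absolute difference between the two observations.
newdf$Diff <- abs(newdf$pt.1 - newdf$pt.2)
print(newdf)
```

With more than one measured variable per row, the same call would widen all of them at once, which is where reshape() pays off over the split() and tapply() approaches.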
