thr3ads.net - R help - [R] reshaping some data [Sep 2004]


Sundar Dorai-Raj

2004-Sep-14 16:15 UTC

[R] reshaping some data

Hi all,
   I have a data.frame with the following colnames pattern:

x1 y11 x2 y21 y22 y23 x3 y31 y32 ...

I.e. I have an x followed by a few y's. What I would like to do is turn 
this wide format into a tall format with two columns: "x" and "y". The 
structure is that xi needs to be associated with yij (e.g. x1 should be 
next to y11, x2 should be next to y21, y22, and y23, etc.).

  x   y
x1 y11
x2 y21
x2 y22
x2 y23
x3 y31
x3 y32
...

I have looked at ?reshape but I didn't see how it could work with this 
structure. I have a solution using nested for loops (see below), but 
it's slow and not very efficient. I would like to find a vectorised 
solution that would achieve the same thing.

Now, for an example:

x <- data.frame(x1 =  1: 5, y11 =  1: 5,
                 x2 =  6:10, y21 =  6:10, y22 = 11:15,
                 x3 = 11:15, y31 = 16:20,
                 x4 = 16:20, y41 = 21:25, y42 = 26:30, y43 = 31:35)
# which are the x columns
nmx <- grep("^x", names(x))
# which are the y columns
nmy <- grep("^y", names(x))
# grab y values
y <- unlist(x[nmy])
# reserve some space for the x's
z <- vector("numeric", length(y))
# a loop counter
k <- 0
n <- nrow(x)
seq.n <- seq(n)
# determine how many times to repeat the x's
repy <- diff(c(nmx, length(names(x)) + 1)) - 1
for(i in seq_along(nmx)) {
   for(j in seq_len(repy[i])) {
     # store the x values in the appropriate z indices
     z[seq.n + k * n] <- x[, nmx[i]]
     # move to next block in z
     k <- k + 1
   }
}
data.frame(x = z, y = y, row.names = NULL)

Berton Gunter

2004-Sep-14 16:45 UTC


[R] reshaping some data

Sundar:

As I understand it, you can easily create an index variable (a pointer,
actually) that will pick out the y columns in order:

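z <- yourdataframe
indexvar <- grep("^y", names(z))               # e.g. positions of the y columns
y <- unlist(z[, indexvar], use.names = FALSE)  # as.vector() would not flatten a data frame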

So if you could cbind() the x's, you'd be all set.

Again, assuming I understand correctly, the x column you want is:

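x <- z[, -indexvar]  ## still a data frame
nvec <- seq(length = ncol(x))
## number of y columns following each x (times = nvec only fits 1, 2, 3, ... y's)
nrep <- diff(c(setdiff(seq_along(z), indexvar), ncol(z) + 1)) - 1
x <- unlist(x[, rep(nvec, times = nrep)], use.names = FALSE)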

HTH -- and even if I got it wrong, it was fun, so thanks.

-- Bert

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

Peter Wolf

2004-Sep-14 17:07 UTC


[R] reshaping some data

Try:

x <- data.frame(x1 =  1: 5, y11 =  1: 5,
                x2 =  6:10, y21 =  6:10, y22 = 11:15,
                x3 = 11:15, y31 = 16:20,
                x4 = 16:20, y41 = 21:25, y42 = 26:30, y43 = 31:35)

df.names<-names(x)
ynames<-df.names[grep("y",df.names)]
xnames<-substring(sub("y","x",ynames),1,2)
cbind(unlist(x[,xnames]),unlist(x[,ynames]))

Peter
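Note that cbind() above returns a matrix without column names, and that substring(..., 1, 2) assumes each x index is a single digit (so "y21" maps to "x2"). A minimal sketch of the same idea that returns a data.frame with columns "x" and "y" might look like this; the name `tall` is only illustrative and the single-digit assumption is kept as is:

```r
# Example data from the original post
x <- data.frame(x1 =  1: 5, y11 =  1: 5,
                x2 =  6:10, y21 =  6:10, y22 = 11:15,
                x3 = 11:15, y31 = 16:20,
                x4 = 16:20, y41 = 21:25, y42 = 26:30, y43 = 31:35)

# names of the y columns, in their original order
ynames <- grep("^y", names(x), value = TRUE)

# Assumption: the x index is one digit, so "y21" -> "x21" -> "x2"
xnames <- substring(sub("^y", "x", ynames), 1, 2)

# Selecting by xnames repeats each x column once per matching y column;
# unlist() then stacks the columns into one long vector each.
tall <- data.frame(x = unlist(x[xnames], use.names = FALSE),
                   y = unlist(x[ynames], use.names = FALSE))
```

Each x column appears once per y column that belongs to it, so `tall` has nrow(x) * length(ynames) rows.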


Gabor Grothendieck

2004-Sep-14 17:56 UTC


[R] reshaping some data

Try this:

is.x <- substr(colnames(x),1,1) == "x"   # TRUE if col name starts with x
x. <- unlist(rep(x[,is.x], diff(which(c(is.x,TRUE)))-1))   # repeat x cols
names(x.) <- NULL
y. <- unlist(x[,!is.x])
DF <- data.frame(x = x., y = y., row.names = NULL)
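The repeat counts come from diff(which(c(is.x, TRUE))) - 1. As a rough illustration of that step alone, here it is applied to the column layout of the example data; `is.x` is typed out by hand and the names `x.pos` and `reps` are only illustrative:

```r
# Column layout of the example: x1 y11 x2 y21 y22 x3 y31 x4 y41 y42 y43
is.x <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE,
          TRUE, FALSE, FALSE, FALSE)

# Positions of the x columns, plus a sentinel one past the last column
x.pos <- which(c(is.x, TRUE))   # 1 3 6 8 12

# The gap between consecutive x positions, minus the x column itself,
# is the number of y columns that follow each x
reps <- diff(x.pos) - 1         # 1 2 1 3
```

The appended TRUE acts as a sentinel so that the last x column also gets a count.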



