Hi all,

I have a data.frame with the following colnames pattern:

x1 y11 x2 y21 y22 y23 x3 y31 y32 ...

I.e. I have an x followed by a few y's. What I would like to do is turn this wide format into a tall format with two columns: "x", "y". The structure is that xi needs to be associated with yij (e.g. x1 should be next to y11, x2 should be next to y21, y22, and y23, etc.).

x  y
x1 y11
x2 y21
x2 y22
x2 y23
x3 y31
x3 y32
...

I have looked at ?reshape but I didn't see how it could work with this structure. I have a solution using nested for loops (see below), but it's slow and not very efficient. I would like to find a vectorised solution that would achieve the same thing.

Now, for an example:

x <- data.frame(x1 = 1:5, y11 = 1:5,
                x2 = 6:10, y21 = 6:10, y22 = 11:15,
                x3 = 11:15, y31 = 16:20,
                x4 = 16:20, y41 = 21:25, y42 = 26:30, y43 = 31:35)
# which are the x columns
nmx <- grep("^x", names(x))
# which are the y columns
nmy <- grep("^y", names(x))
# grab y values
y <- unlist(x[nmy])
# reserve some space for the x's
z <- vector("numeric", length(y))
# a loop counter
k <- 0
n <- nrow(x)
seq.n <- seq(n)
# determine how many times to repeat the x's
repy <- diff(c(nmx, length(names(x)) + 1)) - 1
for(i in seq(along = nmx)) {
  for(j in seq(repy[i])) {
    # store the x values in the appropriate z indices
    z[seq.n + k * n] <- x[, nmx[i]]
    # move to next block in z
    k <- k + 1
  }
}
data.frame(x = z, y = y, row.names = NULL)
Sundar:

As I understand it, you can easily create an index variable (a pointer, actually) that will pick out the y columns in order:

z<-yourdataframe
y<-as.vector(z[,indexvar])

So if you could cbind() the x's, you'd be all set. Again, assuming I understand correctly, the x column you want is:

x<-z[,-indexvar] ## still a frame/matrix
nvec<-seq(length=ncol(x))
x<-as.vector(x[,rep(nvec,times=nvec)])

HTH -- and even if I got it wrong, it was fun, so thanks.

-- Bert

--
Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Sundar Dorai-Raj
> Sent: Tuesday, September 14, 2004 9:16 AM
> To: R-help
> Subject: [R] reshaping some data
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
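For anyone trying Bert's idea on the example data, here is one way it might be made to run. This is only a sketch, and two things in it are assumptions rather than part of the reply: unlist() is used to flatten the selected columns (as.vector() leaves a data frame as a list), and the repeat counts are taken from the gaps between x columns (the repy calculation from the original post) instead of 1, 2, 3, ..., because x2 has two y's here and x3 only one. The names xpos, reps, xs, xx and tall are made up for illustration.

```r
# Example data from the original post
x <- data.frame(x1 = 1:5, y11 = 1:5,
                x2 = 6:10, y21 = 6:10, y22 = 11:15,
                x3 = 11:15, y31 = 16:20,
                x4 = 16:20, y41 = 21:25, y42 = 26:30, y43 = 31:35)

indexvar <- grep("^y", names(x))   # positions of the y columns
xpos     <- grep("^x", names(x))   # positions of the x columns

# Number of y columns following each x column. Assumption: every x is
# immediately followed by its own y's, as in the example.
reps <- diff(c(xpos, ncol(x) + 1)) - 1

# Stack the y columns into one long vector
y <- unlist(x[indexvar], use.names = FALSE)

# Drop the y columns, then repeat each x column once per y it owns
xs   <- x[-indexvar]               # still a data frame
nvec <- seq_len(ncol(xs))
xx   <- unlist(xs[rep(nvec, times = reps)], use.names = FALSE)

tall <- data.frame(x = xx, y = y)
```

Repeating column indices rather than column values keeps the whole thing free of explicit loops, which is what the original question asked for.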
Try:

x <- data.frame(x1 = 1:5, y11 = 1:5,
                x2 = 6:10, y21 = 6:10, y22 = 11:15,
                x3 = 11:15, y31 = 16:20,
                x4 = 16:20, y41 = 21:25, y42 = 26:30, y43 = 31:35)
df.names<-names(x)
ynames<-df.names[grep("y",df.names)]
xnames<-substring(sub("y","x",ynames),1,2)
cbind(unlist(x[,xnames]),unlist(x[,ynames]))

Peter
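Peter's cbind() call returns an unnamed two-column matrix with row names carried over from unlist(). If the two-column data frame from the original post is wanted, a small variation might look like the sketch below. It keeps the same name-based lookup, so it shares the same assumption: substring(..., 1, 2) only works while the x index is a single digit (x1 to x9). Anchoring the patterns with "^" and the name tall are additions for illustration only.

```r
# Example data from the original post
x <- data.frame(x1 = 1:5, y11 = 1:5,
                x2 = 6:10, y21 = 6:10, y22 = 11:15,
                x3 = 11:15, y31 = 16:20,
                x4 = 16:20, y41 = 21:25, y42 = 26:30, y43 = 31:35)

# Names of the y columns (anchored so only names starting with "y" match)
ynames <- grep("^y", names(x), value = TRUE)

# Map each y name to its x name: "y21" -> "x21" -> "x2".
# Assumption: the x index is a single digit.
xnames <- substring(sub("^y", "x", ynames), 1, 2)

# Select by name (repeated names are allowed) and stack into long vectors
tall <- data.frame(x = unlist(x[xnames], use.names = FALSE),
                   y = unlist(x[ynames], use.names = FALSE))
```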
Try this:

is.x <- substr(colnames(x),1,1) == "x" # TRUE if col name starts with x
x. <- unlist(rep(x[,is.x], diff(which(c(is.x,TRUE)))-1)) # repeat x cols
names(x.) <- NULL
y. <- unlist(x[,!is.x])
DF <- data.frame(x = x., y = y., row.names = NULL)
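The diff(which(c(is.x,TRUE)))-1 expression is compact, so it may help to see the intermediate values on the example data. The sketch below breaks it into steps. The names starts and reps are made up for illustration, and as.list() is added only to make explicit that rep() is repeating whole columns.

```r
# Example data from the original post
x <- data.frame(x1 = 1:5, y11 = 1:5,
                x2 = 6:10, y21 = 6:10, y22 = 11:15,
                x3 = 11:15, y31 = 16:20,
                x4 = 16:20, y41 = 21:25, y42 = 26:30, y43 = 31:35)

is.x <- substr(colnames(x), 1, 1) == "x"  # TRUE where the name starts with x

# Positions of the x columns, plus one sentinel position past the last column
starts <- which(c(is.x, TRUE))            # 1 3 6 8 12

# Gap between consecutive x positions, minus the x column itself,
# gives the number of y columns belonging to each x
reps <- diff(starts) - 1                  # 1 2 1 3

# Repeat each x column that many times, then stack everything
x. <- unlist(rep(as.list(x[is.x]), times = reps), use.names = FALSE)
y. <- unlist(x[!is.x], use.names = FALSE)

DF <- data.frame(x = x., y = y.)
```

The sentinel TRUE appended to is.x is what lets diff() count the y's after the last x column as well.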