Hi,

I want to use reshape to convert from a skinny to a wide data format. My data doesn't have a time variable attached - I have a series of ordered observations for each subject, and it is this ordering that I am interested in (my objective is to model the most recent observation based on the preceding observations). From my understanding, prior to using reshape I have to attach a time variable, which I have done using the code below. The problem is that it is extremely slow - it took about 2 hours on a dataset of about 800,000 lines. So my questions are:

1) Is there a (quick) way to use reshape without adding a time variable?

2) If the time variable is necessary, is there a quicker way to generate it? (I know there is, because I did it in Excel...)

Thanks in advance for any advice.

Andre

> foo
   Label      Value
1  Alpha 0.57911322
2  Alpha 0.02270605
3  Alpha 0.58487636
4  Alpha 0.33741690
5  Alpha 0.38313390
6  Alpha 0.17298453
7   Beta 0.72645922
8   Beta 0.69010992
9   Beta 0.34449334
10 Gamma 0.13298949
11 Gamma 0.51267369
12 Gamma 0.03582759
13 Gamma 0.50352449
14 Delta 0.07146389
15 Delta 0.96315046
> foo[1,3] <- 1
> for (i in 2:length(foo[,1])) {
+   if (foo[i,1] == foo[(i-1),1]) foo[i,3] <- foo[i-1,3] + 1
+   else foo[i,3] <- 1
+ }
> foo
   Label      Value V3
1  Alpha 0.57911322  1
2  Alpha 0.02270605  2
3  Alpha 0.58487636  3
4  Alpha 0.33741690  4
5  Alpha 0.38313390  5
6  Alpha 0.17298453  6
7   Beta 0.72645922  1
8   Beta 0.69010992  2
9   Beta 0.34449334  3
10 Gamma 0.13298949  1
11 Gamma 0.51267369  2
12 Gamma 0.03582759  3
13 Gamma 0.50352449  4
14 Delta 0.07146389  1
15 Delta 0.96315046  2

Henrique Dallazuanna
2010-Oct-06 14:11 UTC
[R] Adding a time variable prior to using reshape
Try this:

with(foo, ave(Value, Label, FUN = seq))

will generate the time variable

xtabs(Value ~ ave(Value, Label, FUN = seq) + Label, foo)

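For readers who want the whole path from the skinny data to the wide layout, the sketch below may help. It rebuilds the example data from the original post, adds the within-group index with ave(), and passes it to reshape(). The column name Time and the object name wide are chosen here for illustration and do not appear in the thread; seq_along is used in place of seq because it gives the same index for these groups and is also safe when a group has a single row.

```r
# Example data copied from the original post.
foo <- data.frame(
  Label = rep(c("Alpha", "Beta", "Gamma", "Delta"), times = c(6, 3, 4, 2)),
  Value = c(0.57911322, 0.02270605, 0.58487636, 0.33741690, 0.38313390,
            0.17298453, 0.72645922, 0.69010992, 0.34449334, 0.13298949,
            0.51267369, 0.03582759, 0.50352449, 0.07146389, 0.96315046),
  stringsAsFactors = FALSE
)

# Within-group running index (1, 2, 3, ... restarting for each Label),
# computed in one vectorised call instead of a row-by-row loop.
# "Time" is an illustrative column name, not one from the thread.
foo$Time <- with(foo, ave(Value, Label, FUN = seq_along))

# Long to wide: one row per Label, one Value.<k> column per position.
# Labels with fewer observations are padded with NA.
wide <- reshape(foo, idvar = "Label", timevar = "Time", direction = "wide")
print(wide)
```

This should give one row per Label with columns Value.1 to Value.6. Whether it is fast enough on 800,000 rows has not been measured here, but it avoids growing the data frame element by element.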
On Oct 6, 2010, at 16:11, Henrique Dallazuanna wrote:

> Try this:
>
> with(foo, ave(Value, Label, FUN = seq)) will generate the time variable

Make that FUN=seq_along to avoid trouble with groups of size 1.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
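A small illustration of the size-1 problem, as a sketch only. The data frame bar and its values are made up for this example. The point is that seq() applied to a single number n counts from 1 to n, whereas seq_along() always returns one index per element.

```r
# Hypothetical data: group "B" has a single observation whose value is 3.
bar <- data.frame(Label = c("A", "A", "B"), Value = c(0.4, 0.7, 3))

# For a length-one numeric argument, seq() counts up to that value,
# while seq_along() indexes the elements of its argument.
n_seq       <- length(seq(3))        # three values: 1, 2, 3
n_seq_along <- length(seq_along(3))  # one value: 1

# seq_along returns exactly one index per row, whatever the group size.
good <- with(bar, ave(Value, Label, FUN = seq_along))
print(good)
```

With FUN = seq, the single-row group hands back a vector whose length depends on the data value rather than on the group size, which is the kind of trouble the correction above guards against.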