I was wondering if there is a way of editing strings in R. I have a set of strings, and each string is a row of numbers and parentheses. For example, the first row is:

(0 2)(3 4)(7 9)(5 9)(1 5)

and I have a thousand or so such rows. I was wondering how I could get the corresponding string obtained by adding 1 to all the numbers in the string above.

Dursun
On Thu, 2004-07-29 at 15:56, Bulutoglu Dursun A Civ AFIT/ENC wrote:
> I was wondering if there is a way of editing strings in R. I
> have a set of strings, and each string is a row of numbers and parentheses.
> For example, the first row is:
> (0 2)(3 4)(7 9)(5 9)(1 5)
> and I have a thousand or so such rows. I was wondering how I
> could get the corresponding string obtained by adding 1 to all the
> numbers in the string above.
> Dursun

I don't know if this is the most efficient approach, but working on a few hours of sleep, here goes:

NewRow <- function(x)
{
  TempRow <- as.numeric(unlist(strsplit(x, "([\\(\\) ])"))) + 1
  TempMat <- matrix(TempRow[!is.na(TempRow)], ncol = 2, byrow = TRUE)
  paste("(", TempMat[, 1], " ", TempMat[, 2], ")", sep = "", collapse = "")
}

Basically, the first line splits the character vector into its components, using "(", ")" and " " as regex-based delimiters. It coerces the result to a numeric vector and adds 1; the empty strings that the split leaves in front of each "(" become NA.

The second line takes the adjusted non-NA values and converts them into a two-column matrix, to make it easier to do the paste in line 3.

Line 3 returns the reconstructed, adjusted character vector.

So:

MyRow <- "(0 2)(3 4)(7 9)(5 9)(1 5)"

> NewRow(MyRow)
[1] "(1 3)(4 5)(8 10)(6 10)(2 6)"

So, if you have a bunch of these rows, you could use this function with apply:

MyData <- matrix(c("(0 2)(3 4)(7 9)(5 9)(1 5)",
                   "(1 6)(4 5)(3 7)(4 8)(9 0)",
                   "(3 5)(8 1)(4 7)(2 7)(6 1)"))

> MyData
     [,1]
[1,] "(0 2)(3 4)(7 9)(5 9)(1 5)"
[2,] "(1 6)(4 5)(3 7)(4 8)(9 0)"
[3,] "(3 5)(8 1)(4 7)(2 7)(6 1)"

> matrix(apply(MyData, 1, NewRow))
     [,1]
[1,] "(1 3)(4 5)(8 10)(6 10)(2 6)"
[2,] "(2 7)(5 6)(4 8)(5 9)(10 1)"
[3,] "(4 6)(9 2)(5 8)(3 8)(7 2)"

I suspect somebody may come up with a more efficient approach. For 1,200 rows:

> system.time(apply((matrix(rep(MyData, 400))), 1, NewRow))
[1] 0.29 0.00 0.33 0.00 0.00

(Gabor? ;-)

HTH,

Marc Schwartz
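To make the three steps of NewRow easier to follow, here is a minimal sketch that runs them one at a time on a shortened, two-pair string and keeps the intermediate values. The variable names (parts, nums, pairs, result) are only illustrative and are not part of the original function.

```r
# Shortened example string (two pairs instead of five).
x <- "(0 2)(3 4)"

# Step 1: split on "(", ")" and " ". Each "(" leaves an empty string in
# front of the pair, which as.numeric() turns into NA.
parts <- unlist(strsplit(x, "([\\(\\) ])"))
nums  <- as.numeric(parts) + 1

# Step 2: drop the NAs and arrange the remaining values one pair per row.
pairs <- matrix(nums[!is.na(nums)], ncol = 2, byrow = TRUE)

# Step 3: paste each row back together as "(a b)" and collapse to one string.
result <- paste("(", pairs[, 1], " ", pairs[, 2], ")", sep = "", collapse = "")

print(parts)
print(result)
```

Printing parts shows where the NA values come from, which is why the second step filters on !is.na().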
Bulutoglu Dursun A Civ AFIT/ENC <Dursun.Bulutoglu <at> afit.edu> writes:
>
> I was wondering if there is a way of editing strings in R. I
> have a set of strings, and each string is a row of numbers and parentheses.
> For example, the first row is:
> (0 2)(3 4)(7 9)(5 9)(1 5)
> and I have a thousand or so such rows. I was wondering how I
> could get the corresponding string obtained by adding 1 to all the
> numbers in the string above.

First do the one-character translations simultaneously using chartr, and then use gsub for the remaining one-character to two-character translation:

gsub("0","10",chartr("0123456789","1234567890","(0 2)(3 4)(7 9)(5 9)(1 5)"))
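The one-liner above can be unpacked into its two stages. This is only a sketch, and it assumes every number in the string is a single digit, as in the example row; the names shifted and result are illustrative.

```r
x <- "(0 2)(3 4)(7 9)(5 9)(1 5)"

# Stage 1: chartr() maps all digits in one pass:
# 0->1, 1->2, ..., 8->9, and 9 wraps around to 0.
shifted <- chartr("0123456789", "1234567890", x)

# Stage 2: the only zeros left after stage 1 are former nines,
# so each "0" is rewritten as "10".
result <- gsub("0", "10", shifted)

print(shifted)
print(result)
```

Because chartr() does all ten substitutions simultaneously, a digit is never translated twice, which is what a chain of ten separate gsub() calls would risk.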
On Thu, 2004-07-29 at 21:08, Gabor Grothendieck wrote:
> Bulutoglu Dursun A Civ AFIT/ENC <Dursun.Bulutoglu <at> afit.edu> writes:
>
> > I was wondering if there is a way of editing strings in R. I
> > have a set of strings, and each string is a row of numbers and parentheses.
> > For example, the first row is:
> > (0 2)(3 4)(7 9)(5 9)(1 5)
> > and I have a thousand or so such rows. I was wondering how I
> > could get the corresponding string obtained by adding 1 to all the
> > numbers in the string above.
>
> First do the one-character translations simultaneously using chartr, and
> then use gsub for the remaining one-character to two-character translation:
>
> gsub("0","10",chartr("0123456789","1234567890","(0 2)(3 4)(7 9)(5 9)(1 5)"))

Gabor,

One problem: multi-digit numbers in the source string.

> gsub("0","10",chartr("0123456789","1234567890","(10 99)(3 4)(7 9)(5 9)(1 5)"))
[1] "(21 1010)(4 5)(8 10)(6 10)(2 6)"

Note that the first number, "10", gets transformed to "21", and the "99" goes to "1010", because each digit is translated independently.

I made a quick update to NewRow, which is not faster, but gets it down to two lines instead of three and is a bit cleaner:

NewRow <- function(x)
{
  TempMat <- matrix(as.numeric(unlist(strsplit(x, "([\\(\\) ])"))), ncol = 3, byrow = TRUE) + 1
  paste("(", TempMat[, 2], " ", TempMat[, 3], ")", sep = "", collapse = "")
}

Here each pair contributes three elements after the split (the empty string before "(", then the two numbers), so the NA lands in column 1 and only columns 2 and 3 are pasted.

Note that with multi-digit numbers, it gives a correct result:

> NewRow("(10 99)(101 4)(7 9)(5 9)(1 5)")
[1] "(11 100)(102 5)(8 10)(6 10)(2 6)"

HTH,

Marc Schwartz
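For comparison, here is a sketch of a different technique that was not proposed in the thread: match every run of digits with gregexpr() and replace the matches in place with regmatches<-(). It assumes a version of base R that provides regmatches(), and the helper name IncrementNumbers is hypothetical. It handles multi-digit numbers and does not depend on the numbers coming in pairs.

```r
# Hypothetical helper (not from the thread): add 1 to every run of digits
# in each element of a character vector, leaving everything else untouched.
IncrementNumbers <- function(x) {
  m <- gregexpr("[0-9]+", x)               # positions of all digit runs
  regmatches(x, m) <- lapply(
    regmatches(x, m),                      # list: one character vector per string
    function(v) as.character(as.integer(v) + 1L)
  )
  x
}

rows <- c("(10 99)(101 4)(7 9)(5 9)(1 5)",
          "(0 2)(3 4)(7 9)(5 9)(1 5)")
print(IncrementNumbers(rows))
```

Since the function is vectorized over its input, a whole character vector of rows can be passed directly, without apply().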