Hello!

I have a CSV file and when I try to import it into R with read.csv2 I
get the following error:

R> read.csv2(file="pasme2.csv", na.strings="")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0)) :
        invalid multibyte string

There are some "strange" characters in it, but I never experienced such
behaviour. Actually, this file was produced with R (2.4.0) on Windows!

Utility file under Linux says:

$ file pasme2.csv
pasme.csv: Non-ISO extended-ASCII text

I am attaching few lines of a file for example. And mandatory info:

R version 2.4.0 (2006-10-03)
i486-pc-linux-gnu

locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

--
Lep pozdrav / With regards,
    Gregor Gorjanc

----------------------------------------------------------------------
University of Ljubljana     PhD student
Biotechnical Faculty
Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3                   mail: gregor.gorjanc <at> bfro.uni-lj.si
SI-1230 Domzale             tel: +386 (0)1 72 17 861
Slovenia, Europe            fax: +386 (0)1 72 17 888
----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.
----------------------------------------------------------------------

On Tue, 24 Oct 2006, Gregor Gorjanc wrote:

> Hello!
>
> I have a CSV file and when I try to import it into R with read.csv2 I
> get the following error:
>
> R> read.csv2(file="pasme2.csv", na.strings="")
> Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings
> = character(0)) :
>         invalid multibyte string
>
> There are some "strange" characters in it, but I never experienced such
> behaviour. Actually, this file was produced with R (2.4.0) on Windows!

Well, then it cannot be a UTF-8 file (there are no UTF-8 locales on
Windows, nor any means to write a UTF-8 file here), and you have told R
to read it in your UTF-8 locale.

You need to specify the correct encoding: see ?file. But then you would
have to do that in any application with such a text file, as an
application could at best guess the encoding.

> Utility file under Linux says:
>
> $ file pasme2.csv
> pasme.csv: Non-ISO extended-ASCII text
>
> I am attaching few lines of a file for example. And mandatory info:

No attached file appeared.

> R version 2.4.0 (2006-10-03)
> i486-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
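
The mismatch described in the reply above can be reproduced on a single string. This is only a minimal sketch: the sample bytes are made up, validUTF8() assumes a much newer R than the 2.4.0 used in this thread (it appeared in R 3.3.0), and it assumes the platform's iconv knows the name "WINDOWS-1252".

```r
# The byte 0xE9 is "é" in Windows-1252, but a lone 0xE9 is not valid UTF-8.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xE9)))  # the bytes of "caf\xe9"

# A UTF-8 locale cannot interpret this byte sequence,
# which is what "invalid multibyte string" complains about.
bad <- validUTF8(x)    # FALSE

# Declaring the real source encoding lets iconv() re-encode the text.
y <- iconv(x, from = "WINDOWS-1252", to = "UTF-8")
good <- validUTF8(y)   # TRUE
```

The point of the sketch is that the bytes themselves carry no encoding label, so the reader has to supply one; nothing here detects the encoding automatically.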

Prof Brian Ripley wrote:
...
>> There are some "strange" characters in it, but I never experienced such
>> behaviour. Actually, this file was produced with R (2.4.0) on Windows!
>
> Well, then it cannot be a UTF-8 file (there are no UTF-8 locales on
> Windows, not any means to write a UTF-8 file here), and you have told R
> to read it in your UTF-8 locale.
>
> You need to specify the correct encoding: see ?file. But then you would
> have to do that in any application with such a text file, as an
> application could at best guess the encoding.

Thank you very much. Your suggestion solved my problem i.e. I have used:

read.csv2(file=file("pasme2.csv", encoding="WINDOWS-1252"))

>> Utility file under Linux says:
>>
>> $ file pasme2.csv
>> pasme.csv: Non-ISO extended-ASCII text
>>
>> I am attaching few lines of a file for example. And mandatory info:
>
> No attached file appeared.

That is weird as my sent folder shows mail with an attachment. Anyway,
problem is solved now - with Your help!

--
Lep pozdrav / With regards,
    Gregor Gorjanc
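
For readers who want to try the fix without the original pasme2.csv, here is a self-contained sketch. The temporary file, its column names and its contents are invented for illustration. The fileEncoding argument shown as a second option is available in current versions of R and is not mentioned in the thread.

```r
# Build a small semicolon-separated file that contains one Windows-1252 byte
# (0x9A is "š" in Windows-1252). File name and contents are hypothetical.
tmp <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("breed;n\nKra"), as.raw(0x9A), charToRaw("ka;3\n")), tmp)

# Option 1 (as used in the thread): pass a connection that declares
# the encoding the file was written in.
d1 <- read.csv2(file = file(tmp, encoding = "WINDOWS-1252"), na.strings = "")

# Option 2: let read.csv2() open the file with the declared encoding itself.
d2 <- read.csv2(tmp, fileEncoding = "WINDOWS-1252", na.strings = "")

unlink(tmp)
```

Both calls re-encode the input to the session's native encoding while reading, so they are expected to behave as intended only in a locale that can represent the characters, such as the UTF-8 locale shown in the thread.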