Hi,

I need to migrate 40 TB of data and 180 million files from one storage device to another. Both source and destination will be NFS shares mounted on a local SUSE Linux box.

The first question: is there any risk with such a big number of files? Should I divide them into groups and rsync them in parallel, or in serial? If in parallel, how many groups would be best?

The second question is about memory. How much memory should I install in the Linux box? The rsync FAQ (http://rsync.samba.org/FAQ.html#4) says each file uses about 100 bytes to store relevant information, so 180M files would use about 18 GB of memory. How much memory should be installed in total?

And is there anything else I could do to reduce the risk? Thanks in advance.

Gao, Ming
Hi.

Tue, 11 Aug 2009 16:14:33 +0800, gaomingcn wrote:

> The second question is about memory. How much memory should I install
> in the linux box? The rsync FAQ (http://rsync.samba.org/FAQ.html#4)
> says one file will use 100 bytes to store relevant information, so
> 180M files will use about 18G memory.

That is true of rsync 2. rsync 3 uses an incremental mode that allows it to transfer millions of files with 1 GB of RAM or less.

Sven
On Tue, 2009-08-11 16:14:33 +0800, Ming Gao <gaomingcn at gmail.com> wrote:

> I need to migrate 40T data and 180M files from one storage device to
> another one, both source and destination will be NFS and mounted to a
> local suse linux box.
>
> The first question is that if there is any risk for such a big number
> of files? should I divide them into groups and rsync them in parallel
> or in serial? If yes, how many groups is better?
>
> The second question is about memory. How much memory should I install
> to the linux box? The rsync FAQ (http://rsync.samba.org/FAQ.html#4)
> says one file will use 100 bytes to store relevant information, so
> 180M files will use about 18G memory. How much memory should be
> installed totally?
>
> And any other thing I could do to reduce the risk?

There are no specific risks these days, I think. But since this sounds like a plain "copy everything over" task, you could simply mount both filesystems and use two `tar' instances with a pipe in between.

MfG, JBG

-- 
Jan-Benedict Glaw      jbglaw at lug-owl.de              +49-172-7608481
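[Editor's note: the tar pipe Jan-Benedict describes can be sketched as below. SRC and DST here are temp directories standing in for the two NFS mount points; the real migration would point them at the actual mounts.]

```shell
# Stand-ins for the two NFS mounts (e.g. SRC=/mnt/old-nas DST=/mnt/new-nas).
SRC=$(mktemp -d); DST=$(mktemp -d)
echo "hello" > "$SRC/file.txt"
mkdir -p "$SRC/sub"; echo "world" > "$SRC/sub/nested.txt"

# The actual copy: the first tar streams the tree out of SRC, the second
# unpacks it into DST, preserving permissions (-p) without rsync's
# per-file delta and metadata bookkeeping.
(cd "$SRC" && tar -cf - .) | (cd "$DST" && tar -xpf -)
```

If extended attributes or ACLs matter, GNU tar also has flags for those, though that is beyond what the thread covers.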
Ming Gao wrote:

> I need to migrate 40T data and 180M files from one storage device to
> another one, both source and destination will be NFS and mounted to a
> local suse linux box.

Is there any way you could get local access to the write end of the transfer, so that you don't have to do this all via NFS? The NFS write performance might cause issues.

Personally, I'd probably not use rsync for this transfer: the startup time is going to kill you. If the directory structure is agreeable, you could split the problem into subtrees.

Let me guess: you're going from one NetApp box to another NetApp box? Or some other NAS to NAS, so you will be unable to get local access to either end? Have you done any testing of the creates/sec and large-file bandwidth you'll be able to get?

Any way you look at it, this is going to take a long time. If you could sustain 80 MB/s across a gig-e link, which is pretty high for NFS writes, then 40 TB works out to about 6 days minimum. If the directories have a high file count, the open time for each file could blow that 6 days out considerably.

> The first question is that if there is any risk for such a big number
> of files? should I divide them into groups and rsync them in parallel
> or in serial? If yes, how many groups is better?

If you only have one path between the source and target, I'd try about 3 copy threads. This number can really only be determined through trial and error.

> And any other thing I could do to reduce the risk?

Once you get all the files copied over by whatever means, a final rsync pass would be good to get all the metadata lined up. Based on your file count, I'd strongly recommend breaking the filesystem up into smaller problems.

Brett
Ming Gao wrote:

> The first question is that if there is any risk for such a big number
> of files? should I divide them into groups and rsync them in parallel
> or in serial? If yes, how many groups is better?

For that amount of data, you ought to use something simple and recursive, like cp -rp. A tar pipe will typically break after a couple of terabytes; at least, that's what happens in my experience. After the initial cp, follow up with an rsync.

How long the rsync takes will depend immensely on how good your NFS servers are at caching file metadata. If your testing demonstrates the time-to-rsync is not acceptable, and you are not otherwise disk-bound, you may want to investigate breaking the task up into multiple simultaneous rsync processes. It will be easier to manage if you don't have to do that, though.

--Kyle
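[Editor's note: a minimal sketch of the cp-then-rsync sequence Kyle describes, using temp directories in place of the NFS mounts. The --delete flag is an addition, not from the thread; it makes the follow-up pass also remove files that disappeared from the source after the bulk copy.]

```shell
# Stand-ins for the two NFS mounts.
SRC=$(mktemp -d); DST=$(mktemp -d)
echo "payload" > "$SRC/file.txt"
mkdir "$SRC/dir"; echo "nested" > "$SRC/dir/inner.txt"

# Bulk copy first: -r recurses, -p preserves mode, ownership, and timestamps.
cp -rp "$SRC/." "$DST/"

# Follow-up rsync trues up metadata and any files changed since the cp.
rsync -a --delete "$SRC/" "$DST/"
```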