Sunday, 7 June 2009

Linux Tips: SSH

It's been a while since I posted, having gotten most of my rants off my chest in the early days of the blog. Rather than just spout my opinions again, I thought I'd do some posts about Unix/Linux tricks that I find myself using time and again.

First up, uses of secure shell 'SSH'. These tricks should work with any unix that has an ssh command, so they should also work with MacOSX, I think. Obviously, you need two machines for them to communicate between, or it's all pointless.

Ssh is a rather wonderful tool: it's like telnet, but encrypted. (If you don't know what telnet is, then it's probably not for you.)

SSH isn't just able to work as a remote terminal: you can pipe the output of a command into it, and send that to a remote machine. Or you can run the command on the remote machine, and have its output come spewing out of SSH.

Run a command on a remote machine (in this case the date command), and see its output:

ssh user@somewhere "date"

Run a command on a remote machine and send its output to a local file:

ssh user@somewhere "cat /var/log/messages" > messages.remote_machine

Run a command on a remote machine and pipe its output to a local command (this should get the recent diagnostic messages from the remote machine, and pipe them into a mail command on the local machine):

ssh user@somewhere "dmesg" | mail colum

Run a command on the local machine, and send its output to the remote (sends the diagnostic messages from the local machine, and counts the lines in them on the remote. I know it's not very useful, but it's just an example, okay?)

dmesg | ssh user@somewhere "wc -l"

One thing I consistently use this for is shifting large numbers of files from one machine to another using the 'tar' command, like this:

tar -zcO file1 file2 *.dat | ssh user@somewhere "mkdir Backups; cd Backups; tar -zxf -"

If you don't know tar, it's an old-school 'zip' utility for unix. The '-zxf' is a condensed form of the command line switches '-z -x -f', as tar lets you run them together. The tar command

tar -z -c -f tarfile.tar.gz file1 file2

makes a 'tarfile' called tarfile.tar.gz, compressed using gzip compression:
-z = use gzip compression
-c = create tarfile
-f <filename> = write the tarfile to <filename>

You can unpack the tarfile with 'tar -zxf tarfile.tar.gz', where 'x' means 'extract files'.

To do our 'send to remote' trick, though, we need to not make a tarfile, but to pump the tar output to standard out, which we can then pipe into our ssh command. The command line switch I use for this is '-O', which, if you don't pipe it into anything, will cause tar output to spew all over your screen. (Strictly, the GNU tar manual lists '-O' as an option for extracting; if your tar objects to it alongside '-c', then '-f -' on the sending side also writes the archive to standard output.) Logically, if 'send to standard output' is '-O', then 'read from standard input' should be '-I'. But it's not: to do this you type '-f -', and this gives us our full command:

tar -z -c -O files | ssh user@somewhere "tar -z -x -f -"

You can add extra commands to the command line at either end, provided they don't print out any output, which will confuse tar. Usually these are commands to make or change directories:

cd /home/colum/mywork ; tar -zcO files | ssh user@somewhere "cd /home/colum/mywork.backup; tar -zxf -"
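If you want to watch the pipe work before involving a second machine, here is a minimal sketch that does the same job on one box. A subshell stands in for the ssh command, the directory and file names under the scratch directory are made up for the demo, and I've written the archive to standard output with '-f -' on the sending side as well as reading it with '-f -' on the receiving side.

```shell
# Scratch area; every name in here is invented for the demo.
WORK=$(mktemp -d)
mkdir -p "$WORK/src" "$WORK/dest"
echo "hello" > "$WORK/src/file1"
echo "world" > "$WORK/src/file2"

# Sending end: create a gzipped archive and write it to standard output (-f -).
# Receiving end: read the archive from standard input (-f -) and unpack it.
# The second subshell plays the part of: ssh user@somewhere "cd dest; tar -zxf -"
(cd "$WORK/src" && tar -zcf - file1 file2) | (cd "$WORK/dest" && tar -zxf -)

ls "$WORK/dest"
```

Swap the second subshell for a real ssh command and the files land on the other machine instead.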

If you want better compression during the send, and have the bzip2 command on your system, try replacing '-z' in the command line switches with '-j'. Obviously we have to do this at both ends.

If you want to 'Pull' files from the remote machine, rather than 'Push' to it, do this:

ssh user@somewhere "tar -zcO *" | tar -zxf -

Obviously '*' will back up all files in the login directory on the remote machine. You knew that, right?

Another thing I use this for is synchronizing clocks between machines. The 'date' command will take the command-line switch '-s <date>', which lets you set the date/time on a machine. So a short script like:

ssh root@remote_machine "date -s \"$DATE\""

Will set the date, provided that the user has access rights to do such things (which is why the login is 'root' here). $DATE is a shell variable that the script has already set to the date string it wants, for example from the output of the local 'date' command. Notice the horrible quoting of the $DATE argument on the remote machine. This is because we have to stop spaces in the date being interpreted as breaks between arguments, because the date is all one argument. So we need \" in order that the local machine doesn't try to interpret quote marks intended for the remote machine.
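To see what the escaped quotes buy you without touching a remote clock, here is a small sketch that uses 'sh -c' as a stand-in for the shell that ssh starts on the far side. The date value is just an example string, and the QUOTED and UNQUOTED variable names are mine. It counts how many arguments the far shell ends up with in each case.

```shell
# Example value only; in a real script this would be something like DATE=`date`.
DATE="Sun Jun 7 12:00:00 2009"

# With \" the quote marks survive the local shell, so the far shell sees
# "Sun Jun 7 12:00:00 2009" and keeps it as a single argument.
QUOTED=$(sh -c "set -- \"$DATE\"; echo \$#")

# Without them the far shell splits the date at every space.
UNQUOTED=$(sh -c "set -- $DATE; echo \$#")

echo "quoted: $QUOTED argument(s), unquoted: $UNQUOTED argument(s)"
# prints: quoted: 1 argument(s), unquoted: 5 argument(s)
```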

Another use is watching for changes in files that shouldn't change on a remote machine. You can use the 'md5sum' command to get an md5 checksum for each file. An 'md5 checksum', in case you don't know, is a string of gobbledegook that is supposed to be unique to a file, i.e., no two files should have the same md5. Now, it's actually true that, as there are an infinite number of possible files and an md5 string is only so many bytes long, there *must* be an infinite number of files that have the same checksum. However, the likelihood of your ever encountering two such files in normal use is vanishingly small. It's the same reasoning as the fact that there is a real possibility of all, or most, of the air molecules in your room choosing to move up one end of it, leaving a suffocating vacuum at the other end. However, this never happens because it's tremendously unlikely, as air molecules famously can't get their shit together to be that organized. If you waited forever, you would see it happen; in fact, if you waited for an infinite amount of time, you'd see it happen infinite times. But it's utterly unlikely that you'll see it in your flickeringly brief lifetime, human.

Anyways, I routinely check the /bin /sbin /usr/bin /usr/sbin directories on linux servers, as these directories contain standard programs that should never change. If they do change, then something or someone has changed them, and this may indicate foul play.

ssh user@remote_machine "md5sum /bin/* /sbin/* /usr/bin/* /usr/sbin/* | sort" > remote-files.md5

This gives me a sorted file with md5 sums of the programs in it. Once I have this, I can get a fresh version each day, and write that to a different file, say 'remote-files.latest.md5' and then use a diff command to do a compare between the two. Something like this:

ssh user@remote_machine "md5sum /bin/* /sbin/* /usr/bin/* /usr/sbin/* | sort" > remote-files.latest.md5

VAL=`diff remote-files.md5 remote-files.latest.md5 | grep "^<" | wc -l`

if [ "$VAL" -gt 0 ]
then
    echo "EEEK!! What's going on!?" | mail -s "Files changed on remote machine!" colum
fi


Note that the 'diff' command produces complex output, containing, amongst other things, the lines from each file that differ. The lines from 'file 1' are prefixed with a '<' and the lines from file 2 with a '>'. This means you get two lines for each difference (at least). Grepping for lines starting with '<' means you get an accurate count of the number of differences.
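Here is a minimal sketch of that counting trick on two tiny, made-up checksum lists, so you can see the numbers come out right without a remote machine. The checksums and the TMP directory are invented for the demo, and I've used 'grep -c' to count the matching lines directly, which is a small swap for the 'grep | wc -l' pair.

```shell
TMP=$(mktemp -d)

# Two made-up checksum lists: 'ls' has changed between them, 'cat' has not.
printf '%s\n' "1111  /bin/cat" "2222  /bin/ls" > "$TMP/old.md5"
printf '%s\n' "1111  /bin/cat" "9999  /bin/ls" > "$TMP/new.md5"

# diff prints one '<' line and one '>' line for the single change;
# counting only the '<' lines gives one per changed file.
VAL=$(diff "$TMP/old.md5" "$TMP/new.md5" | grep -c "^<")

echo "$VAL file(s) changed"
# prints: 1 file(s) changed
```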

Okay, getting more advanced, an easy way to do incremental backups is to use 'md5sum' to checksum files on both machines, and then use 'diff' to find differences between these. Here's the script:

cd mywork

find . -exec md5sum {} \; | sort > /tmp/mywork-local.md5sum
ssh colum@remote_machine "cd mywork ; find . -exec md5sum {} \; | sort" > /tmp/mywork-remote.md5sum

diff /tmp/mywork-local.md5sum /tmp/mywork-remote.md5sum | grep '^<' | cut -d ' ' -f 4- > /tmp/mywork.diff

tar -T /tmp/mywork.diff -zcO | ssh colum@remote_machine "cd mywork; tar -zxf -"

This will build md5 checksums of the files in the directory 'mywork' on both machines, compare them for differences, and then back up those differences from one machine to the other.

We have to use 'find' to run md5sum against all the files, because md5sum doesn't come with a useful 'recursive' option to say 'If there's a directory, go into that, and md5sum the files in that too'.

Then we do a diff between the two lists. Diff produces output that only shows lines that differ between the two files, and prefixes them with '<' or '>' to indicate which file the line is taken from. So, if we do:

diff /tmp/mywork-local.md5sum /tmp/mywork-remote.md5sum | grep '^<'

Then we only get the lines that say 'this is different on the local machine'. The following 'cut' command then clips out the filename from the md5sum output line, so finally we have a list of filenames. We use the '-T' option to tell tar to take the list of files to archive from a file (our diff output file) and thus we are sending only those files that are different between the two machines, to the remote machine.
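The 'cut' step is the fiddly bit, so here is a short sketch of it on a single made-up line of diff output. The checksum and filename are invented; the point is to show why the field number is 4 and why the trailing '-' matters when a filename has spaces in it.

```shell
# A made-up line as diff prints it: '<', a space, the checksum, two spaces, the name.
LINE='< d41d8cd98f00b204e9800998ecf8427e  ./notes/my file.txt'

# Splitting on single spaces: field 1 is '<', field 2 is the checksum,
# field 3 is the empty gap between md5sum's two spaces, and field 4 onwards
# is the filename. '4-' keeps everything to the end of the line, spaces included.
NAME=$(printf '%s\n' "$LINE" | cut -d ' ' -f 4-)

echo "$NAME"
# prints: ./notes/my file.txt
```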

Okay, the final thing. All the way through, if you've been trying any of these commands, you'll have found that SSH prompts you for a password. Obviously, if you are wanting to make scripts like these automated to do backups or what-have-you, then this is a problem. The solution is to use 'public key authentication', though this can be a bit of a flog to set up.

SSH accepts a command line switch '-i <filename>', which names a file containing the private key (the 'identity') to use to authenticate. So far, so easy, but setting a machine up to accept this always seems a little awkward. Basically:

1) On the remote machine, as the user that is going to log into the system, run the command 'ssh-keygen'. It will ask you a bunch of stuff, just hit 'enter' to accept defaults every time.

2) You should now have, in the directory '.ssh' in the user's home directory, two files: id_rsa and id_rsa.pub. Rename id_rsa.pub to be 'authorized_keys' and set it to have read-write permissions only for the file owner (so 'ls -l' shows: '-rw-------').

3) You will have to make sure that the user's home directory and the .ssh directory only have read, write and execute permissions for that user (this can be a problem with 'actual' user directories; best to create a user especially for automated processes to log in).

4) Take the other file that ssh-keygen produced ('id_rsa') and put it on the machine that you want to log in from. You can then log in with:

ssh user@remote_machine -i id_rsa

This won't ask for a password (unless something went wrong with the setup) so it's usable for automated processes running on a 'cron job' overnight.
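The permissions part of that setup can be sketched as a small shell function. 'install_key' is a hypothetical helper of my own, not something that ships with ssh, and the key in the demo is a placeholder string rather than a real one. It also appends to 'authorized_keys' instead of renaming the file, so that any keys already in there are kept.

```shell
# install_key PUBKEY HOME_DIR  (hypothetical helper, not part of OpenSSH)
# Adds a public key to HOME_DIR/.ssh/authorized_keys and tightens the
# permissions on the .ssh directory and on the file itself.
install_key() {
    pubkey=$1
    home=$2
    mkdir -p "$home/.ssh"
    chmod 700 "$home/.ssh"                      # ls -ld shows drwx------
    cat "$pubkey" >> "$home/.ssh/authorized_keys"
    chmod 600 "$home/.ssh/authorized_keys"      # ls -l shows -rw-------
}

# Try it out in a scratch directory, with a placeholder instead of a real key.
DEMO_HOME=$(mktemp -d)
echo "ssh-rsa AAAA...placeholder user@somewhere" > "$DEMO_HOME/id_rsa.pub"
install_key "$DEMO_HOME/id_rsa.pub" "$DEMO_HOME"

ls -l "$DEMO_HOME/.ssh/authorized_keys"
```

On a real machine you would pass the user's actual home directory and the id_rsa.pub that ssh-keygen produced.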

Personally, I tend to rename the 'id_rsa' file to be 'id_rsa.name_of_remote_machine', because that way I can store a bunch of them in the same directory together, and I can also write scripts that work through machine after machine, simply by doing something like this:

for MACHINE in machine1 machine2 machine3
do
    ssh user@$MACHINE -i id_rsa.$MACHINE "md5sum /bin/*" > $MACHINE.md5
done

Obviously, for this to work you have to rename the id_rsa files to be id_rsa.machine1 and id_rsa.machine2 etc, and you have to have something that maps 'machine1' to an ip address. I normally just put entries for them in the hosts file of the machine I'm logging in from.

Okay, I think that's enough of that. If you understood it, then I hope some of it is some use to you!


  1. Lovely post! I still had a tough time getting to trap the output of a mysql query on a remote machine to a file on the local machine. Guess what - the mysql password had a special character in it.

    bash$-> ssh guest@remote "mysql -u mysql_user_on_remote --password='remote$password' -e 'select User from user' mysql" > local_file

    was very puzzling. Funny that the resolution was an escape character

    bash$-> ssh guest@remote "mysql -u mysql_user_on_remote --password='remote\$password' -e 'select User from user' mysql" > local_file.

    Your post helped a lot. Thanks

  2. Hurrah!

    I'm delighted to have been some use to you. Maybe I should do more of these linux posts? I'm worried about scaring away any non-geek readers though (although, maybe I don't have any anyways)?