Wednesday, October 11, 2006

Parallel xargs and faster ssh connection initiation

You learn something new every day, even with old friends such as ssh and xargs.

SSH: It turns out that it's possible to create a live "Master" connection to a specific remote host which then can be used by other ssh command executions to quickly open sub-channels to that host without going through the authentication process every time and without compromising on security.

Example:

$ ssh -fMN -o ControlPath=~/.ssh-control-sock remote-host
$ ssh -o ControlPath=~/.ssh-control-sock remote-host date

More about this in ssh(1) and ssh_config(5).
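The master connection can also be inspected and shut down through the same socket. Here is a minimal sketch, assuming the socket path and host name from the example above. The function names are made up for illustration; "-O check" and "-O exit" are the control commands described in ssh(1).

```shell
# Socket path taken from the example above
CONTROL_PATH="$HOME/.ssh-control-sock"

# Ask the master process whether it is still running (hypothetical helper name)
master_check() {
    ssh -O check -o ControlPath="$CONTROL_PATH" "$1"
}

# Ask the master process to exit (hypothetical helper name)
master_exit() {
    ssh -O exit -o ControlPath="$CONTROL_PATH" "$1"
}

# Usage:
#   master_check remote-host
#   master_exit remote-host
```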

XARGS: In response to some Digg pointer to xjobs, which seems to be similar to xargs except that it can run multiple jobs in parallel, someone pointed out xargs' own -P (--max-procs) option, which does exactly the same. You just have to remember to limit the number of file names passed to each job using -n (--max-args), otherwise all the file names will be passed to a single job.
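A small sketch of the -P/-n combination. The function name and the file names are made up, and "echo processing" stands in for whatever command would really be run:

```shell
# Pass at most 2 names to each job and run at most 3 jobs at a time.
# "echo processing" is a placeholder for the real command.
run_batches() {
    printf '%s\n' "$@" | xargs -n 2 -P 3 echo processing
}

run_batches a.log b.log c.log d.log e.log
```

With five names and -n 2 this starts three jobs. Since they run in parallel, the order of their output lines may vary between runs.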

Friday, September 22, 2006

The final IPTables setup and some initial results

Hi and welcome back. I've just realized that I haven't published the final solution for data gathering. So in this post I'll:
  1. Describe the IPTables setup
  2. Show the initial script I wrote to read the results
  3. Show some preliminary raw data
IPTables Setup
I've changed the setup described in the first post on the subject to also add the "foreign" side of each identified Skype connection to the "recent" list (using the "recent" module). That way, when an ICMP packet arrives from a host in that list, I assume that it's related to Skype (I guess the chances of Skype exchanging traffic with a server such as a web host are very slim).

Here is the updated setup:

# match all outgoing packets from gid skype, mark their connection
# and add their destination to the "recent list" so we can count ICMP packets to/from them
iptables -A OUTPUT -m owner --gid-owner skype --out-interface eth0 --protocol tcp -m recent --rdest --set --name Skype -j CONNMARK --set-mark 1
iptables -A OUTPUT -m owner --gid-owner skype --out-interface eth0 --protocol udp -m recent --rdest --set --name Skype -j CONNMARK --set-mark 2

# count ICMP packets going to hosts which appear in our "recent" list
iptables -A OUTPUT --out-interface eth0 --protocol icmp -m recent --rdest --name Skype --update -j ACCEPT -m comment --comment skype-out-icmp

# all packets which match the connection should go through the skype rule
iptables -A OUTPUT -m connmark --mark 1 -m comment --comment skype-out-tcp
iptables -A OUTPUT -m connmark --mark 2 -m comment --comment skype-out-udp

# match all packets on Skype's public TCP port and mark their connection
iptables -A INPUT -p tcp -m tcp --dport 21212 --in-interface eth0 -j CONNMARK --set-mark 1
iptables -A INPUT -p udp -m udp --dport 21212 --in-interface eth0 -j CONNMARK --set-mark 2
# count ICMP packets coming from hosts which appear in our "recent" list
iptables -A INPUT -p icmp --in-interface eth0 -m recent --name Skype --update -j ACCEPT -m comment --comment skype-in-icmp

# all packets which match the connection
iptables -A INPUT -m connmark --mark 1 -m comment --comment skype-in-tcp
iptables -A INPUT -m connmark --mark 2 -m comment --comment skype-in-udp

The counter-reading script was modified to add "icmp" to the list of protocols it looks for and to total bytes and packets along two dimensions: direction (total in vs. total out) and protocol (tcp vs. udp vs. icmp).

Here is what the counters look like after about 25 days of data gathering:

$ sudo ./getcounts.pl
tcp_out_bytes 35838386
icmp_out_bytes 155016
tcp_in_pkts 441671
icmp_out_pkts 1023
icmp_in_pkts 4526
udp_out_bytes 242046393
tcp_out_pkts 540629
icmp_in_bytes 505522
udp_out_pkts 1799468
udp_in_pkts 1607313
udp_in_bytes 204286584
tcp_in_bytes 42500198
====== totals =====
tcp_pkts 982300
tcp_bytes 78338584
icmp_pkts 5549
out_pkts 2341120
icmp_bytes 660538
udp_bytes 446332977
in_bytes 247292304
udp_pkts 3406781
out_bytes 278039795
in_pkts 2053510
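The totals in the second half of that listing are plain sums of the per-rule counters. The real script is Perl; purely as an illustration, the same totalling can be sketched with awk (the function name is made up, and the input is assumed to be "name value" lines like the ones above):

```shell
# Sum "proto_dir_unit value" lines by protocol and by direction.
# Reads per-rule counters on stdin, prints the totals sorted by name.
total_counters() {
    awk '
        {
            split($1, part, "_")    # part[1]=protocol, part[2]=direction, part[3]=unit
            total[part[1] "_" part[3]] += $2
            total[part[2] "_" part[3]] += $2
        }
        END {
            for (name in total) printf "%s %d\n", name, total[name]
        }
    ' | sort
}

# Example: the TCP and ICMP packet counters from the listing above
printf '%s\n' 'tcp_in_pkts 441671' 'tcp_out_pkts 540629' \
    'icmp_in_pkts 4526' 'icmp_out_pkts 1023' | total_counters
```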

I'm still looking for time to add writing of these numbers into an RRD file so it'll be possible to graph them across different periods, but for now my simple conclusion is about the "in_bytes" number (and to a lesser degree, the "out_bytes" one): they come to 235.8 incoming megabytes and 265.2 outgoing megabytes (counting a megabyte as 2^20 bytes).

Over a period of 25 days this works out to around 9.4 incoming megabytes per day and 10.6 outgoing megabytes per day. Over a month (let's say 30 days) that's 283.0 megabytes of incoming traffic and 318.2 megabytes of outgoing traffic. This includes my own Skype conversations (admittedly, not much this month).

Does this prove Cringely's point? I'm not sure. I didn't believe that there was that much traffic involved until I started this experiment, but I'm still not completely convinced it's "too much to handle".

In my personal context, the incoming traffic is still less than 2% (1.38%, to be precise) of my download quota of 20 GB per month.
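The arithmetic behind these figures can be re-checked with a few lines of awk. This is only a sketch: the function name is made up, a megabyte is taken as 2^20 bytes, the quota as 20 * 1024 megabytes, and the month as 30 days. Values are rounded to one decimal place.

```shell
# Arguments: incoming bytes, outgoing bytes, number of days measured
traffic_summary() {
    awk -v in_bytes="$1" -v out_bytes="$2" -v days="$3" 'BEGIN {
        mb = 1024 * 1024
        in_mb = in_bytes / mb
        out_mb = out_bytes / mb
        printf "total MB:     in %.1f  out %.1f\n", in_mb, out_mb
        printf "MB per day:   in %.1f  out %.1f\n", in_mb / days, out_mb / days
        printf "MB per month: in %.1f  out %.1f\n", in_mb / days * 30, out_mb / days * 30
        printf "incoming share of quota: %.2f%%\n", (in_mb / days * 30) / (20 * 1024) * 100
    }'
}

traffic_summary 247292304 278039795 25
```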

So for now I'm not going to give up on the advantages of a smoother connection (which is the reason I configured my desktop as a "Super-node" in the first place).

I'd be glad to learn from you what you think about the experiment (have I missed some packets?) and the result - do you agree with my conclusion so far or not?

Thursday, September 07, 2006

Work journal - u32 isn't going to make it(?)

I was planning to use the IPTables u32 module to look at the "original packet" which is included in the ICMP packets sent in response to Skype-related connection attempts, but this might not cut it because there are other ICMP messages involved, not just the "port unreachable" I saw at first.

Instead, now I plan to use the "recent" module.

(Note: try running "iptables -m recent -h" to list options not mentioned in the manual. Apparently it's a good thing to do with any IPTables module.)

The "recent" module is designed with the "keep the bad guys out" situation in mind (by adding an attacker's source address to a temporary list). But with the "secret" --rdest option, which allows me to add the destination addresses of outgoing packets, it might be possible to add the IP of any host with which Skype has just attempted to converse to a temporary cache. That cache will allow this host to send back errors and will automatically expire after a preset time.

Counting Skype traffic - Part 1 - Gathering the data

In his weekly column "I, Cringely", Robert X. Cringely made the statement that Skype's Supernodes (nodes which offer to mediate traffic for other nodes which can't talk directly with each other because both of them are behind NAT) suffer from a very high load of traffic which isn't actually used for the benefit of the Supernode's owner but for other users.

In a later entry in his column, in response to comments he received from readers, he goes on to insist that his statement is true and cites Stanford University's ban on Skype for that reason as proof.

This is an interesting topic for me since I set up my home box as a Supernode because this dramatically cuts down the number of hops Skype uses to connect me with people abroad (from 4 hops to 0). Since I buy quota for my ADSL line from my ISP, I was concerned about how much of it Skype uses. So far, over a year since I started doing this, I haven't noticed that it uses any significant part of my quota, but I couldn't tell exactly which part of my traffic is Skype-related.

That is, until Cringely's column made the itch too strong to ignore and I got off my butt to find out.

After a quick check around with colleagues and a quick question on Linux-Il I learned that Linux's IPTables has an "owner" module which does basically just that: it filters packets based on the attributes of the process which generates them, be it command name, uid, gid or similar.

There are some warnings in the IPTables documents that uid and command-name checks work only on non-SMP kernels. The warnings don't mention problems with GID checks. That shouldn't be a big problem in my particular case since I have an old 32-bit AMD Athlon CPU, but for the sake of completeness I created a group "skype", made the Skype binary belong to that group and turned on its set-group-id bit, so any process executing this binary actually has a GID of "skype".
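The group setup described above amounts to a chgrp followed by a chmod. A minimal sketch, in which the function name and the binary's path are placeholders rather than anything taken from my system:

```shell
# Put a file in the given group and turn on its set-group-id bit.
# Arguments: group name (or numeric GID), path to the binary.
make_setgid() {
    chgrp "$1" "$2" && chmod g+s "$2"
}

# Usage (as root; /path/to/skype is a placeholder):
#   groupadd skype
#   make_setgid skype /path/to/skype
```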

It also turns out that Skype apparently sends lots of packets to itself over the loopback interface, so I had to make sure I don't count these since they shouldn't affect my Internet traffic.

But that would work only for outgoing traffic - what about incoming traffic on these connections?

Simple! When Skype sends an outgoing packet the entire connection can be marked as "belonging to Skype" so even incoming packets on the same connection will be counted. So I got this part covered.

For the sake of curiosity, I mark TCP and UDP connections with different marks so I can distinguish them in the statistics.

Here are the iptables rules related to this:

# iptables -A OUTPUT -m owner --gid-owner skype --out-interface eth0 --protocol tcp -j CONNMARK --set-mark 1
# iptables -A OUTPUT -m owner --gid-owner skype --out-interface eth0 --protocol udp -j CONNMARK --set-mark 2

The first line means: "Append a rule to the OUTPUT chain which will mark TCP connections containing packets from GID 'skype' with connection mark '1'". The second line does the same for UDP only it marks the connections with connection mark '2'.

This should solve the problem for connections initiated by my Skype client.

Now there is another kind of connection: those initiated by other clients.

Now what does it actually mean that my Skype client is configured as a Supernode? It means that it listens on certain UDP and TCP ports for incoming connections (something that non-Supernodes don't have to do since all their traffic is done over connections which they initiate). Any host on the Internet can access these ports through the firewall directly to my Skype client. (In my case I actually had to also configure my ADSL modem/router/NAT to allow incoming connections to this port but that's a separate issue which shouldn't affect the subject of this post).

The practical meaning of this is that incoming packets which initiate a new connection to Skype don't get counted as belonging to it because the IPTables "owner" module only recognizes outgoing packets. The connections will still eventually get counted because Skype will (hopefully) reply to them, but the first incoming packet of each connection won't be counted because it will be gone by the time IPTables realizes that this is Skype traffic. The way to identify incoming new connections is simply to mark all new connections to the published TCP and UDP ports as belonging to Skype too:

# iptables -A INPUT -p tcp -m tcp --dport 21212 --in-interface eth0 -j CONNMARK --set-mark 1
# iptables -A INPUT -p udp -m udp --dport 21212 --in-interface eth0 -j CONNMARK --set-mark 2

The first line says "Append a rule to the INPUT chain which marks all incoming TCP connections to Skype's designated TCP port (21212) which come from the Ethernet card with connection mark '1'". The second line does the same for UDP packets, using connection mark '2'.

All these rules do is attach connection marks to connections. In order to actually count the packets I set up four separate rules with comments, and later grab the data off these rules:

# iptables -A OUTPUT -m connmark --mark 1 -m comment --comment skype-out-tcp
# iptables -A OUTPUT -m connmark --mark 2 -m comment --comment skype-out-udp
# iptables -A INPUT -m connmark --mark 1 -m comment --comment skype-in-tcp
# iptables -A INPUT -m connmark --mark 2 -m comment --comment skype-in-udp

All these rules do is match the relevant packets. They don't have to do anything with the packet: IPTables already keeps counts of all matching packets and bytes for each rule, and the attached comments make it easy to identify the relevant rules:

#!/usr/bin/perl
use strict;
use warnings;
use IPTables::IPv4;

# Collect byte and packet counters from the rules tagged "skype-*"
sub get_counts {
    my %counts = ();
    my $table = IPTables::IPv4::init('filter');
    unless ($table) {
        warn "failed to initialize: $!\n";
        return;    # empty list, so the caller ends up with an empty hash
    }

    my @rules = ($table->list_rules("INPUT"), $table->list_rules("OUTPUT"));
    foreach my $rule (@rules) {
        # skip rules with no comment or with an unrelated comment
        exists $rule->{'comment-match-raw'} or next;
        $rule->{'comment-match-raw'} =~ /^skype-(in|out)-(tcp|udp)\0+$/ or next;
        $counts{"$2_$1_bytes"} = $rule->{bcnt};
        $counts{"$2_$1_pkts"} = $rule->{pcnt};
    }
    return %counts;
}

my %counts = get_counts();
while (my ($key, $value) = each %counts) {
    print "$key => $value\n";
}

The "$2_$1_bytes" and "$2_$1_pkts" strings are preparations for DataSource (DS) names to be used in RRD files.

Still to come:
  1. Counting ICMP ECHO requests ("ping") coming from other Skype users (and letting them through but still keeping requests from non-Skype users out) using the IPTables u32 matching module and possibly the "recent" module.

  2. Saving the results in RRD files.

  3. Graphing the results from the RRD files.

  4. Drawing conclusions?

Tuesday, August 29, 2006

Listing the body of a Bash function

Everybody who knows something knows that typing "set" in bash(1) will list the values of all shell variables and the definitions of all functions.

But I wanted to find the definition of a function without having to run "set | less" and manually look for it in the output.

Turns out that typing "type NAME" (with the function's name in place of NAME) will give me what I wanted. The only "drawback" is that it also prints a line at the top saying "NAME is a function", which is sort of good because it gives a script an automatic way to know what to expect next.
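A quick sketch of the idea for bash. Both function names are made up for the example, and the second one simply drops the header line that "type" prints:

```shell
# An example function to look up (hypothetical)
greet() {
    echo "hello, $1"
}

# Print only the definition, skipping the "NAME is a function" line
function_body() {
    type "$1" | tail -n +2
}

function_body greet
```

In bash, "declare -f greet" prints the definition without the header line in the first place.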

Wednesday, July 05, 2006

Accessing private web servers through SSH

I've always known that it's possible to channel any TCP traffic through ssh but never got around to actually using it (beyond running the SSH client with "-X" to forward X11 traffic). Today I finally got around to testing it.

The problem: access devices like my Dlink DSL504G ADSL modem web interface or Sipura SPA-3000 ATA's web interface from my desktop at work.

The solution: actually there are a few of them, I'll list them by the order I tried them:

1. "ssh -L 30000:192.168.1.3:80 my-home-machine" - This tells my SSH client at work that if I connect to port 30000 on my desktop at work, it should connect to host "192.168.1.3", port "80", from my home machine. This is the private-network address of my ADSL modem. Now I just typed "localhost:30000" in Firefox on my work desktop and got the web interface of my ADSL modem at home. I could add another port (let's say port 30001) to forward connections to my ATA device.

2. Just "ssh my-home-machine", then type "~C". This brings up a command line interface which allows me to then type "-L 30000:192.168.1.3:80". The effect is just the same as specifying this argument on the ssh command line, but the advantage is that I don't have to open a new session if I already have one.

3. Last but not least (but I ended up not using it): add a line to the configuration in ~/.ssh/config saying "LocalForward 30000 10.1.1.5:80".

Right now I plan to use option 2 - that way my private home devices are not open to anyone on my workplace network whenever I ssh home but on the other hand I don't have to open a new session whenever I want to access my home network devices.
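For completeness, here is a sketch of option 1 extended with the second forward mentioned there. The function name is made up, and the ATA's address is passed in as an argument because it isn't listed above; port 30000 and 192.168.1.3 are the values from option 1.

```shell
# Open forwards to both devices in one session.
# Argument: private-network address of the ATA (supplied by the caller).
forward_home_devices() {
    ata_addr=$1
    ssh -L 30000:192.168.1.3:80 -L "30001:${ata_addr}:80" my-home-machine
}

# Usage (192.168.1.4 is a made-up address for the ATA):
#   forward_home_devices 192.168.1.4
```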