Thursday, September 07, 2006

Counting Skype traffic - Part 1 - Gathering the data

In his weekly column "I, Cringely", Robert X. Cringely claimed that Skype's Supernodes (nodes which relay traffic for other nodes that can't talk to each other directly because both are behind NAT) suffer from a very high traffic load, most of which benefits other users rather than the Supernode's owner.

In a later entry, responding to comments from readers, he insists that his statement is true and cites Stanford University's ban on Skype, imposed for that very reason, as proof.

This topic interests me because I set up my home box as a Supernode, which dramatically cuts down the number of hops Skype uses to connect me with people abroad (from 4 hops to 0). Since I buy a traffic quota for my ADSL line from my ISP, I was concerned about how much of it Skype uses. So far, over a year after I started doing this, I haven't noticed Skype eating any significant part of my quota, but I couldn't tell exactly which part of my traffic was Skype-related.

That is, until Cringely's column made the itch too strong not to scratch and I got off my butt to find out.

After a quick check with colleagues and a question on the Linux-IL mailing list, I learned that Linux's IPTables has an "owner" module which does basically just that: it filters packets based on the attributes of the process that generated them, such as command name, UID or GID.

The IPTables documentation warns that UID and command-name checks work only on non-SMP kernels. The warnings don't mention any problem with GID checks. That shouldn't be a big problem in my particular case, since I have an old 32-bit AMD Athlon CPU, but for the sake of completeness I created a group "skype", made the Skype binary belong to that group and turned on its set-group-id bit, so any process executing this binary actually runs with the effective GID "skype".
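The group setup just described can be scripted roughly as below. This is only a sketch: the function name `make_skype_setgid`, the binary path `/usr/bin/skype` in the usage line, and the assumption that it runs as root are illustrative, not details of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: give a binary a dedicated group and turn on its
# set-group-id bit, so every process started from it runs with that
# effective GID (which "-m owner --gid-owner" can then match).
# Usage: make_skype_setgid BINARY GROUP   (normally run as root)
make_skype_setgid() {
    bin=$1
    grp=$2
    # Create the group only if it isn't already listed (needs root).
    if ! grep -q "^${grp}:" /etc/group; then
        groupadd "$grp" || return 1
    fi
    # Change the group first: on Linux, changing a file's group
    # normally clears its set-group-id bit.
    chgrp "$grp" "$bin" || return 1
    # g+s turns on the set-group-id bit.
    chmod g+s "$bin" || return 1
    ls -l "$bin"
}
```

Usage (assumed path): `make_skype_setgid /usr/bin/skype skype`. The `ls -l` output should then show an `s` in the group-execute position, e.g. `-rwxr-sr-x ... skype`.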

It also turns out that Skype apparently sends lots of packets to itself over the loopback interface, so I had to make sure not to count them, since they don't affect my Internet traffic. This is why the rules below only match traffic on the external interface, eth0.

But matching on the sending process works only for outgoing traffic. What about incoming traffic on these connections?

Simple! When Skype sends an outgoing packet, the entire connection can be marked as "belonging to Skype" (this is what the CONNMARK target does), so incoming packets on the same connection are counted too. That covers this part.

Out of curiosity, I mark TCP and UDP connections with different marks so I can tell them apart in the statistics.

Here are the relevant iptables rules:

# iptables -A OUTPUT -m owner --gid-owner skype --out-interface eth0 --protocol tcp -j CONNMARK --set-mark 1
# iptables -A OUTPUT -m owner --gid-owner skype --out-interface eth0 --protocol udp -j CONNMARK --set-mark 2

The first line means: "Append a rule to the OUTPUT chain which marks TCP connections carrying packets sent through eth0 by a process with GID 'skype' with connection mark '1'". The second line does the same for UDP, except that it marks the connections with connection mark '2'.
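If you want to reload these rules from a script, the two commands can be wrapped in a function. This is a minimal sketch: the variables `IPT`, `WAN_IF` and `SKYPE_GROUP` and the function name are hypothetical knobs added for illustration. Setting `IPT=echo` prints the commands instead of running them, which is handy for checking a rule set before applying it as root.

```shell
#!/bin/sh
# Sketch: the two outgoing CONNMARK rules above, parameterized.
# IPT, WAN_IF and SKYPE_GROUP are hypothetical; IPT=echo gives a dry run.
IPT=${IPT:-iptables}
WAN_IF=${WAN_IF:-eth0}
SKYPE_GROUP=${SKYPE_GROUP:-skype}

mark_skype_outgoing() {
    # TCP connections get connection mark 1 ...
    $IPT -A OUTPUT -m owner --gid-owner "$SKYPE_GROUP" --out-interface "$WAN_IF" \
        --protocol tcp -j CONNMARK --set-mark 1
    # ... and UDP connections get connection mark 2.
    $IPT -A OUTPUT -m owner --gid-owner "$SKYPE_GROUP" --out-interface "$WAN_IF" \
        --protocol udp -j CONNMARK --set-mark 2
}
```

`$IPT` is deliberately left unquoted so that a value such as `sudo iptables` also works.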

This should solve the problem for connections initiated by my Skype client.

But there is another kind of connection: those initiated by other clients.

What does it actually mean that my Skype client is configured as a Supernode? It means that it listens on certain UDP and TCP ports for incoming connections (something non-Supernodes don't have to do, since all their traffic goes over connections they initiate). Any host on the Internet can reach my Skype client directly on these ports, through the firewall. (In my case I also had to configure my ADSL modem/router/NAT to allow incoming connections to these ports, but that's a separate issue which doesn't affect the subject of this post.)

In practice, this means that incoming packets which open a new connection to Skype aren't counted as belonging to it, because the IPTables "owner" module only recognizes locally generated (outgoing) packets. Such connections will still be counted eventually, since Skype will (hopefully) reply to them, but the first incoming packet of each connection won't be, because it will already be gone by the time IPTables realizes that this is Skype traffic. The way to catch new incoming connections is simply to mark all connections to the published TCP and UDP ports as belonging to Skype too:

# iptables -A INPUT -p tcp -m tcp --dport 21212 --in-interface eth0 -j CONNMARK --set-mark 1
# iptables -A INPUT -p udp -m udp --dport 21212 --in-interface eth0 -j CONNMARK --set-mark 2

The first line says: "Append a rule to the INPUT chain which marks all TCP connections arriving on the Ethernet interface (eth0) at Skype's designated TCP port (21212) with connection mark '1'". The second line does the same for UDP, using connection mark '2'.
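The incoming rules can be parameterized the same way. In this sketch the port comes from a hypothetical `SKYPE_PORT` variable (21212 in my setup; use whatever port your Skype client is configured to listen on), and `IPT=echo` again gives a dry run. One ordering detail matters: because iptables evaluates a chain top to bottom, these marking rules have to be appended to INPUT before the counting rules, so that the first packet of a new inbound connection already carries the mark when the counting rule sees it.

```shell
#!/bin/sh
# Sketch: mark inbound connections to the published Skype port.
# IPT, WAN_IF and SKYPE_PORT are hypothetical; IPT=echo gives a dry run.
IPT=${IPT:-iptables}
WAN_IF=${WAN_IF:-eth0}
SKYPE_PORT=${SKYPE_PORT:-21212}

mark_skype_incoming() {
    # Inbound TCP to the Skype port -> connection mark 1.
    $IPT -A INPUT -p tcp -m tcp --dport "$SKYPE_PORT" --in-interface "$WAN_IF" \
        -j CONNMARK --set-mark 1
    # Inbound UDP to the Skype port -> connection mark 2.
    $IPT -A INPUT -p udp -m udp --dport "$SKYPE_PORT" --in-interface "$WAN_IF" \
        -j CONNMARK --set-mark 2
}
```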

All these rules do is attach connection marks. To actually count the packets, I set up four separate rules with comments and later read the counters off these rules:

# iptables -A OUTPUT -m connmark --mark 1 -m comment --comment skype-out-tcp
# iptables -A OUTPUT -m connmark --mark 2 -m comment --comment skype-out-udp
# iptables -A INPUT -m connmark --mark 1 -m comment --comment skype-in-tcp
# iptables -A INPUT -m connmark --mark 2 -m comment --comment skype-in-udp

These rules only match the relevant packets; they have no target, so they don't do anything to the packets themselves. IPTables already keeps packet and byte counts for every rule, and the attached comments make it easy to identify the relevant rules. This Perl script, using the IPTables::IPv4 module, reads the counters:

#!/usr/bin/perl
use strict;
use warnings;
use IPTables::IPv4;

# Collect the packet and byte counters of the "skype-*" accounting rules
# in the INPUT and OUTPUT chains of the filter table.
sub get_counts {
    my %counts = ();
    my $table = IPTables::IPv4::init('filter');
    unless ($table) {
        warn "failed to initialize: $!\n";
        return;    # empty list, so the caller gets an empty hash
    }

    my @rules = ($table->list_rules("INPUT"), $table->list_rules("OUTPUT"));
    foreach my $rule (@rules) {
        exists $rule->{'comment-match-raw'} or next;
        # The raw comment is NUL-padded; $1 = in|out, $2 = tcp|udp.
        $rule->{'comment-match-raw'} =~ /^skype-(in|out)-(tcp|udp)\0*$/ or next;
        $counts{"$2_$1_bytes"} = $rule->{bcnt};    # byte counter
        $counts{"$2_$1_pkts"}  = $rule->{pcnt};    # packet counter
    }
    return %counts;
}

my %counts = get_counts();
foreach my $key (sort keys %counts) {
    print "$key => $counts{$key}\n";
}
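If IPTables::IPv4 isn't available, a rough alternative is to parse the text output of `iptables -L -v -x -n`, where `-v` adds the packet and byte columns and `-x` prints exact numbers instead of rounded "K"/"M" values. This sketch assumes the listing shows comments as `/* skype-in-tcp */` and that the counters are the first two columns; the function name `skype_counts` is hypothetical. It prints the same keys as the Perl script.

```shell
#!/bin/sh
# Sketch: read the skype-* accounting counters from "iptables -L -v -x -n"
# output on stdin and print lines such as "tcp_in_bytes => 12345".
skype_counts() {
    awk '
        # Comments appear in the listing as "/* skype-in-tcp */".
        match($0, /\/\* skype-(in|out)-(tcp|udp) \*\//) {
            # Drop the 9-char "/* skype-" prefix and the 3-char " */" suffix.
            tag = substr($0, RSTART + 9, RLENGTH - 12)   # e.g. "in-tcp"
            split(tag, part, "-")                        # part[1]=in, part[2]=tcp
            # Column 1 is the packet count, column 2 the byte count.
            printf "%s_%s_bytes => %s\n", part[2], part[1], $2
            printf "%s_%s_pkts => %s\n", part[2], part[1], $1
        }'
}
```

Usage (as root): `iptables -L -v -x -n | skype_counts`. Parsing human-oriented output is more fragile than the library call, since the listing format may differ between iptables versions.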

The "$2_$1_bytes" and "$2_$1_pkts" key formats (yielding names such as "tcp_in_bytes" or "udp_out_pkts") are chosen so that the keys can later serve as Data Source (DS) names in RRD files.
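To illustrate how those key names map onto RRD data sources, here is a sketch that emits one `DS:` definition per counter in rrdtool's `DS:name:type:heartbeat:min:max` form. The function name, the 600-second heartbeat (which assumes polling every 300 seconds) and the choice of DERIVE are all assumptions. DERIVE with a minimum of 0 is picked instead of COUNTER because iptables counters start again from zero whenever the rules are reloaded: with a minimum of 0 the resulting negative rate is stored as unknown, while COUNTER would treat the reset as a counter wrap and record a huge spike.

```shell
#!/bin/sh
# Sketch: build rrdtool DS definitions from the eight counter names.
# Optional argument: heartbeat in seconds (default 600, assumed).
skype_ds_specs() {
    heartbeat=${1:-600}
    for proto in tcp udp; do
        for dir in in out; do
            for unit in bytes pkts; do
                # DERIVE with min 0: a counter reset becomes "unknown".
                echo "DS:${proto}_${dir}_${unit}:DERIVE:${heartbeat}:0:U"
            done
        done
    done
}
```

A possible use, with a hypothetical file name and a one-week archive of 5-minute averages: `rrdtool create skype.rrd --step 300 $(skype_ds_specs) RRA:AVERAGE:0.5:1:2016`.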

Still to come:
  1. Counting ICMP ECHO requests ("ping") coming from other Skype users (letting them through while still keeping out requests from non-Skype users) using the IPTables u32 match module and possibly the "recent" module.

  2. Saving the results in RRD files.

  3. Graphing the results from the RRD files.

  4. Drawing conclusions?
