What are your thoughts on metrics gathering?

There has been some discussion on the Advisory Board mailing list recently about improving the metrics gathering that the Fedora Project uses to find out approximately how many users we have. I thought it would be a good idea to collect some thoughts on the gathering as an idea.

What are your thoughts on metrics gathering? Please respond in this post’s comments, on identi.ca or Twitter [please @threethirty on them so I see them :)], or email me at threethirty@fedoraproject.org. I will make a follow-up post quoting some of the more interesting replies.


There have been questions about what I really mean when I say “gathering metrics”, so I guess I’ll explain more fully:

Currently (IIRC), every time you check for updates on an official yum server, it collects your IP address. So what Fedora has been doing is counting every unique IP address, multiplying by 2 or 4, and then claiming that is the number of Fedora users.
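To make the arithmetic concrete, here is a minimal sketch of that kind of estimate. The function name, the sample addresses, and the idea that the input is a plain list of IPs are all my own assumptions for illustration; this is not the actual Fedora infrastructure code.

```python
def estimate_users(request_ips, multiplier):
    """Estimate users as (number of unique IPs) * multiplier.

    Hypothetical helper: the multiplier (2 or 4 in the post) is a guess
    at how many users sit behind each address.
    """
    unique_ips = set(request_ips)  # repeat check-ins from one IP count once
    return len(unique_ips) * multiplier


# Made-up sample: four update checks coming from three distinct addresses.
requests = ["192.0.2.10", "192.0.2.10", "198.51.100.7", "203.0.113.42"]

low_estimate = estimate_users(requests, 2)
high_estimate = estimate_users(requests, 4)
print(low_estimate, high_estimate)  # 6 12
```

Notice that the whole result hinges on the multiplier, which is exactly the part nobody can measure.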

The problem with that is that it is wildly inaccurate, and we think the number is low. There is some good debate on the mailing list; what has been offered is using UUIDs to count more accurately, but privacy has been an ongoing concern.
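As a rough sketch of how UUID-based counting could work (nothing here comes from the mailing list proposal itself; the function names and the idea that each install sends its ID with every update check are assumptions on my part):

```python
import uuid


def new_client_id():
    """Generate a random ID once per install (hypothetical client side)."""
    return str(uuid.uuid4())


def count_clients(checkin_ids):
    """Count distinct installs, no matter how many times each checks in."""
    return len(set(checkin_ids))


# Two made-up machines; the laptop checks for updates three times.
laptop = new_client_id()
desktop = new_client_id()
checkins = [laptop, desktop, laptop, laptop]

print(count_clients(checkins))  # 2
```

This counts machines rather than addresses, so several boxes behind one IP are no longer collapsed into one. It also shows where the privacy worry comes from: the same ID follows a machine from network to network.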

I don’t think that any new ideas are anywhere close to implementation; I just wanted to know if people were worried about this or had any ideas for the board.


4 thoughts on “What are your thoughts on metrics gathering?”

  1. Hmm. Well, I don’t see that my opinion on it changes greatly. Optimally, there would still be an opt-out on first run – I’m assuming anyone “savvy” enough to use Yum would also know where they stand on being counted in this way – which could also be counted toward the total. Treat the opt-outs the way they treat generic IP addresses now and use UUIDs for people who don’t care.

    Otherwise, I don’t think the way it’s handled now is too bad. You’re severely underestimating when it comes to enterprise or computer labs, etc… but you’re overestimating for houses with only one or two Fedora boxes, so it probably evens out alright.

    I’m curious to know why they think the numbers are “very low,” though.

  2. It needs to be opt-in. Anything else will be a privacy concern, no matter how widely publicised. If you are counting IPs, that’s fine provided you are not saving them — hashing is usually the answer here.

    I think the best idea is to make it all opt-in but publicise it much more, telling people exactly why the data is useful for the project. It’s not perfect, but it’s the right thing to do from a privacy perspective.
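The hashing idea from the second comment could look something like the sketch below. It is only an illustration under my own assumptions: the function name, the sample addresses, and the choice of a keyed hash (HMAC-SHA256) are not from the comment. I used a keyed hash because the IPv4 address space is small enough that a plain, unkeyed hash can be reversed by trying every address.

```python
import hashlib
import hmac
import secrets


def hash_ip(ip, key):
    """Return a keyed SHA-256 digest of the address instead of the address."""
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret key for one counting period; throwing it away
# afterwards means the stored digests cannot be matched to addresses later.
key = secrets.token_bytes(32)

seen = set()
for ip in ["192.0.2.10", "198.51.100.7", "192.0.2.10"]:
    seen.add(hash_ip(ip, key))  # only the digest is kept, never the raw IP

print(len(seen))  # 2
```

The count of unique addresses comes out the same as before, but the server never has to store the addresses themselves.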
