
Data collection and privacy

Started by Starfox, September 27, 2009 08:38 PM
18 comments, last by Yann L 15 years, 1 month ago
I'm working on a soon-to-be-released app that has quite a bit of data collection and crash reporting strapped on. The data collection is primarily app-specific usage information: not things like "files played" or "websites visited", more like "times File/Open clicked" and "times Ctrl+O pressed", to help with feature development prioritization and toolbar layout in future versions. For crash information, a minidump is uploaded along with some non-identifying machine data like amount of memory, OS, CPU speed, etc. MAC addresses and the like are obviously not included.

However, I've been thinking about adding a "machine ID" field to the collected data. It would help calculate metrics such as average crashes per machine, or isolate crashes that are reported at high frequency on a few machines (hypothetically; I know the math here is sloppy) as special cases of really unlucky users (or users with lots of IE toolbars on their systems). This machine ID isn't used for DRM, pseudo-DRM, or anything like that. My idea for generating it is a big-enough random number, like a random UUID sans time field: just a uniformly random, say, 1024-bit number. And we're not storing IPs.

I'd like to know what people think about that. Any privacy worries or invasions? Would you NOT want to have this enabled in software you install? Gimme your $.02s.
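For illustration, here is a minimal Python sketch of the kind of ID described above: a uniformly random number generated once and stored locally, derived from nothing on the machine. The function name, file location, and hex encoding are assumptions made for the example, not details from the app in question.

```python
import secrets
from pathlib import Path

ID_BITS = 1024  # size floated in the post; 128 random bits would already make collisions negligible


def load_or_create_install_id(path: Path, bits: int = ID_BITS) -> str:
    """Return the stored random ID, generating and saving it on first use.

    The ID comes from the OS random source only; no MAC address,
    hostname, or timestamp goes into it.
    """
    if path.exists():
        return path.read_text(encoding="ascii").strip()
    install_id = secrets.token_hex(bits // 8)  # bits/8 random bytes as hex text
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(install_id, encoding="ascii")
    return install_id
```

Deleting the file resets the ID, which gives users a simple way to break the link between past and future reports.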
Holy crap I started a blog - http://unobvious.typepad.com/
Make it opt-in. Do not enable it by default, but ask the user if he wants to participate. Make it so that users can switch it on or off at any time after installation. That should address all privacy concerns.
Quote: Original post by Yann L
Make it opt-in. Do not enable it by default, but ask the user if he wants to participate. Make it so that users can switch it on or off at any time after installation. That should address all privacy concerns.


That's the easy-way-out for programmers: Make the user choose :)

But we have a rule that apps must be zero-conf: you can't ask questions after installation, and even during installation the number of questions you're allowed to ask is low, and there's a quick "Install with default options" button.

With this set of restrictions, making it opt-in means the only participants will be the users who bother to (hypothetical design follows) go to the Tools / Options menu and select that option. That number would be too low, I presume. Let's assume, for the sake of argument, that it's opt-out or possibly fully automated.
Quote: Original post by Starfox
That's the easy-way-out for programmers: Make the user choose :)

If it concerns the user's privacy (and that includes all phoning-home scenarios), then you have to make the user choose.

I don't know if you intend to distribute your application outside of the US, but if so, keep in mind that certain jurisdictions actually require this by law. And even if this doesn't apply to you, not letting the user choose whether or not to participate in such a data-gathering operation would be a serious privacy violation, as far as I see it. Even if no personal data is transferred.
I think the concept is reasonable, but for goodness' sake don't call it a UUID - too many programmers have avoided UUIDs since the v1 standard, which could be reverse engineered to provide both the MAC address of the generating computer and the time at which it was generated [wink]
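To make the v1 concern concrete, here is a small Python sketch using the standard `uuid` module. The helper `decode_v1` and the made-up node value `EXAMPLE_NODE` are assumptions for illustration; the fixed node stands in for a real MAC address so the output is predictable.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 timestamps count 100 ns ticks since the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def decode_v1(u: uuid.UUID) -> tuple[str, datetime]:
    """Recover the node (normally the MAC address) and creation time from a v1 UUID."""
    if u.version != 1:
        raise ValueError("not a version-1 UUID")
    mac = ":".join(f"{(u.node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    return mac, created


EXAMPLE_NODE = 0x001122334455  # made-up value standing in for a network card's MAC
v1 = uuid.uuid1(node=EXAMPLE_NODE)
mac, created = decode_v1(v1)
print("node read back from the v1 UUID:", mac)
print("creation time read back:", created.isoformat())

# A version-4 UUID is built from random bits, so there is nothing comparable to decode.
print("version of uuid4():", uuid.uuid4().version)
```

A purely random ID, like the one proposed in the original post, behaves like the version-4 case.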

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Quote: Original post by Yann L
Quote: Original post by Starfox
That's the easy-way-out for programmers: Make the user choose :)

If it concerns the user's privacy (and that includes all phoning-home scenarios), then you have to make the user choose.


The question is, does it raise a privacy concern? There's no way the user can be identified.

Look at it like this: Does sending an anonymous metric, like number of files opened, without a user ID (whether reversible or not) violate the user's privacy? Does sending the previous info with a completely random ID violate it? I'd like to know what you think about those two questions. Do anonymous metrics matter? Without an ID, all you know is "The 'Open' button has been clicked X times". Adding a user ID allows you to add "N times by user A, M times by user B, etc". Again, how does that violate a user's privacy? Are there any concerns I'm not paying attention to?
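As a rough illustration of what the random ID adds, the Python sketch below aggregates a handful of made-up crash reports both ways. The report tuples, ID strings, and crash signatures are hypothetical sample data, not anything collected by a real application.

```python
from collections import Counter

# Hypothetical reports: (random install ID, crash signature)
reports = [
    ("a1", "null_deref"), ("a1", "null_deref"), ("a1", "null_deref"),
    ("a1", "null_deref"), ("b2", "null_deref"), ("c3", "oom"),
]

# Without an ID, only totals are available.
total_by_signature = Counter(sig for _, sig in reports)

# With an ID, the same reports can be grouped per installation.
crashes_per_install = Counter(iid for iid, _ in reports)
installs_by_signature = {
    sig: len({iid for iid, s in reports if s == sig})
    for sig in total_by_signature
}
average_per_install = len(reports) / len(crashes_per_install)

print(total_by_signature)     # null_deref: 5, oom: 1
print(installs_by_signature)  # null_deref seen on 2 installs, oom on 1
print(crashes_per_install)    # a1: 4, b2: 1, c3: 1
print(average_per_install)    # 2.0
```

The totals alone suggest `null_deref` is five times as common as `oom`; the per-install view shows that four of those five reports come from a single installation.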

THEN there are the minidumps, and how they might expose user data. Anyone got solid info on that?
Quote: Original post by swiftcoder
I think the concept is reasonable, but for goodness' sake don't call it a UUID - too many programmers have avoided UUIDs since the v1 standard, which could be reverse engineered to provide both the MAC address of the generating computer and the time at which it was generated [wink]


Ahahaha, yeah, I hear you. It's called a Machine ID so far, but I'm looking for a better name as Machine ID screams "Your computer's serial number". Suggestions welcome.
Quote: Original post by Starfox
The question is, does it raise a privacy concern? There's no way the user can be identified.

The IP is always transferred. Even if you don't keep it explicitly, traces of the connection will always be left in various log files, many of them outside of your direct control.

It's not so much a question of whether the transferred data allows a user to be identified or not. The mere concept of an unsolicited connection to a foreign server and the transfer of statistical data without letting the user know/choose is a major privacy concern. Your application will raise a firewall alert the first time it is run anyway. If the user wasn't informed of that activity beforehand, then he will see this behaviour as very suspicious (and rightfully so).

I really don't see a problem with asking the user to opt in during installation. Unless it's an unsupervised installation, in which case the default should be opted out. That's the way most (all?) current Microsoft products do it, for example.
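A minimal Python sketch of that policy, under the assumption that the installer knows whether it is running unattended and whether the user ticked a consent box. All names here (`TelemetrySettings`, `settings_after_install`, `maybe_send`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TelemetrySettings:
    enabled: bool = False  # off until the user explicitly says yes


def settings_after_install(unattended: bool, user_ticked_box: bool = False) -> TelemetrySettings:
    """Unattended installs never enable reporting; interactive installs follow the checkbox."""
    if unattended:
        return TelemetrySettings(enabled=False)
    return TelemetrySettings(enabled=user_ticked_box)


def maybe_send(settings: TelemetrySettings, payload: dict, send: Callable[[dict], None]) -> bool:
    """Call send(payload) only if the user opted in; return whether anything was sent."""
    if not settings.enabled:
        return False
    send(payload)
    return True
```

Because the setting is an ordinary field, the same flag can be flipped later from an options dialog, which covers the "switch it on or off at any time" suggestion made earlier in the thread.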
You can put the opt-in option on the very last screen of the installation (where you click "Close"). Normally those screens just have a "Your installation is complete" and sometimes they have a "Start XYZ now", so you can just add an "Opt-in to provide anonymous usage statistics" checkbox as well, for example.
Quote: Original post by Codeka
You can put the opt-in option on the very last screen of the installation (where you click "Close"). Normally those screens just have a "Your installation is complete" and sometimes they have a "Start XYZ now", so you can just add an "Opt-in to provide anonymous usage statistics" checkbox as well, for example.


The one-click-install option involves literally one click..

This topic is closed to new replies.
