Excerpt: MIT’s Data Retention Policies
This column marks the return of Ask SIPB, last published in 2011. In this issue, we cover parts of MIT’s policies on data retention. (This republication in The Tech is heavily excerpted and excludes some important caveats, as well as a section on privacy when using MITnet. The full column is available online: http://www.mit.edu/~asksipb/2016columns/2016-03-01-data-retention/.)
What does MIT know about you?
In particular, when MIT learns something about you, what does it remember, and for how long? In short, what are MIT’s data-retention policies, and how do they affect you?
Data-retention policies matter because privacy is a basic human right.
The data-retention policies in this article are mostly (but not entirely) about data that is transactional in nature: data MIT gathers in order to do some other job, but (usually) not data you explicitly provided to MIT. (Sometimes, certain types of this transactional data are called metadata, but data is data, and so-called metadata is often the most dangerous kind.) So, for example, we are not talking here about your educational records (covered under FERPA and other laws), your medical records (covered under HIPAA and other laws), or your email (covered under ECPA and other laws).
Instead, we’re looking at issues such as use of card keys, surveillance cameras (public and in-dorm), backups, clusters, and dialups. (This reprint omits a discussion of network traffic in general.)
To begin with, let’s assume that you’re not doing anything illegal, that your data isn’t leaving campus, and that it is covered by MIT’s general policies rather than by more-specific policies of individual labs or departments. What is the Institute collecting about you, and how long is it keeping it?
When it comes to access to physical spaces and video surveillance, turn to the Security and Emergency Management Office (SEMO). An email asking SEMO to confirm that the policies posted on its website are current (and not stale or abandoned) was promptly answered by its Manager of Facilities Operations, Thomas W. Komola. In addition, a message to Housing was answered by (since departed) Dean Henry Humphreys of DSL, confirming that all card key and dorm-visitor data is kept by SEMO, not DSL, and that DSL adheres to MIT’s general privacy policies. (Note that this page says nothing specifically about what information Housing/DSL may collect or retain; it’s generic to the whole Institute.)
Card keys. SEMO’s posted policies clearly state that card key data is kept for 14 days and then erased, and can be used only for debugging system problems or as part of a criminal investigation by the MIT Campus Police. (Left unstated, as in all privacy policies, is that any outside party with a warrant or subpoena might also be legally authorized to get this data.) SEMO states categorically on its page that card key tracking data will not be used for active tracking of individuals or groups.
Surveillance cameras. SEMO’s policy page again states 14-day retention, with no audio. This includes cameras in the dorms, as well as cameras installed elsewhere, such as outside or at ATMs.
Dorm visitors. When visitors arrive at a dorm, they are required to check in at a desk staffed by AlliedBarton employees. Their MIT IDs are scanned, or, if the visitors don’t have MIT IDs, other IDs (such as driver’s licenses) are recorded instead. This information also goes to SEMO, not DSL, and is likewise deleted after 14 days.
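To make the idea of a fixed retention window concrete, here is a minimal sketch of how a 14-day purge could work. It is not SEMO’s actual system: the record layout, the function name `purge_expired`, and the sample entries are all hypothetical, and only the 14-day figure comes from the posted policy.

```python
from datetime import datetime, timedelta

# The 14-day figure is from SEMO's posted policy; everything else here
# (names, record layout, sample data) is a hypothetical illustration.
RETENTION = timedelta(days=14)


def purge_expired(records, now, retention=RETENTION):
    """Return only the records that are still inside the retention window.

    `records` is a list of (timestamp, description) tuples.
    """
    cutoff = now - retention
    return [(ts, desc) for ts, desc in records if ts >= cutoff]


now = datetime(2016, 3, 1, 12, 0)
records = [
    (datetime(2016, 2, 10, 9, 0), "door A"),  # about 20 days old: purged
    (datetime(2016, 2, 20, 9, 0), "door B"),  # about 10 days old: kept
]
kept = purge_expired(records, now)
```

In this sketch, anything recorded before the cutoff simply drops out of the list. Whether a real system deletes records on a schedule, and whether copies survive elsewhere, is exactly the kind of detail a posted policy needs to spell out.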
IS&T handles most of the networking on campus, with the exception of large labs (like CSAIL and the Media Lab) which often have their own internal infrastructure. Some of their policies are posted online, but there are also large gaps, and trying to confirm validity or fill in gaps was much less successful with IS&T than with other MIT departments.
Backups. IS&T maintains a service called CrashPlan, which allows everyone on campus to keep their files backed up. The CrashPlan service (and its parent company, Code42) sees only encrypted data and does not itself hold the keys to decrypt it. Instead, MIT’s management server holds an individual key for each user, and the backups are encrypted before the data is handed to CrashPlan’s servers. Were MIT to receive a subpoena for your backed-up data, it would be possible for MIT to comply and hand over everything you’ve backed up, which might also include credentials to non-MIT services stored in your backed-up files. If you want to keep your data safe from such scenarios, you’ll need to encrypt it before CrashPlan is asked to back it up. In other words, keep it encrypted on disk, or decrypt it to a location that you haven’t asked CrashPlan to back up.
Clusters and dialups. IS&T dialups run tcpspy. This program logs all TCP connections on the machine, ten times per second, to log files on the local filesystem. These logs are kept for seven days. (The dialups have been targets of attacks in the past, and compromising one can allow attacking hundreds of users simultaneously; forensics after an attack may be one reason that connection information is logged.) It is unclear whether these log files are themselves copied elsewhere or backed up. They are also vulnerable to manipulation if root is compromised on the dialup, though a root compromise there could much more severely impact users directly. In addition, cluster machines log which binaries are being run, though IS&T explains that such logging is intended not to identify individual users. (Whether such identification might be made when fused with other sources, such as netflow, isn’t answered and may not have been considered.)
When it comes to MIT’s data-retention policies, those affecting access to physical spaces and surveillance of those physical spaces are quite restrictive and well-documented. SEMO’s web pages are clear and its employees are quick to answer questions about them.
On the other hand, when it comes to MIT’s computational infrastructure, the picture is much more fragmented. In those areas where policies have been posted, information is retained much longer (IS&T retains various logs from twice as long to six times as long as SEMO does). More concerningly, many important aspects aren’t documented at all, and official channels appear effectively useless either at verifying what’s already posted or at answering questions about what’s not.
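Taking SEMO’s 14-day figure as the baseline, “twice as long to six times as long” works out to roughly four to twelve weeks. A quick check of that arithmetic (the variable names are illustrative only, and the multipliers are simply the ones quoted above):

```python
# Baseline from SEMO's posted policy; multipliers as quoted in the column.
SEMO_RETENTION_DAYS = 14
multipliers = (2, 6)

# Lower and upper bounds, in days, implied for the IS&T logs.
ist_range_days = tuple(SEMO_RETENTION_DAYS * m for m in multipliers)

# The same bounds expressed in whole weeks.
ist_range_weeks = tuple(days // 7 for days in ist_range_days)
```

That gives a range of 28 to 84 days, or 4 to 12 weeks.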
In both cases, it would also be helpful for policy pages to be dated and for links to older versions to be posted, making it possible to see what changed and when. It would likewise help for those pages to be reviewed every so often (perhaps annually) and for that review date to be posted on the relevant pages, so it’s obvious at a glance that those responsible for those systems have ensured that their posted policies match reality.
So, where does this leave you? Barring unusual circumstances—such as an ongoing investigation—you can be reasonably assured that SEMO’s details of your physical movements are likely gone after two weeks. Some details of your on-campus electronic activities, when using IS&T’s infrastructure, are likely gone after three months—but there are too many undocumented places where data may accumulate, without published policies about how long it may persist, to have much assurance that this is always the case. And, of course, the majority of the traces you leave online are in networks and servers that aren’t managed by IS&T at all, each with their own policies. Be careful out there.
Want to learn about computing at MIT? Turn to “Ask SIPB,” a series of columns published by the Student Information Processing Board. You can find our complete archive at http://www.mit.edu/~asksipb. Send questions to email@example.com and we’ll try to help.