News

Behind the data: how we analyzed the alcohol and drug survey

In 2014, The Tech sent undergraduates a survey about their use of alcohol and restricted substances. The story behind this survey unfolds in three acts: administering the survey, analyzing the data, and publishing the results. This article sheds light on the second act, the analysis, to make our work transparent and reproducible. We also want to assure all survey participants that their anonymity has been respected.

Over the summer, we began the long process of converting the data we had gathered into a machine-readable format. This might sound trivial, but the survey's 258 questions were cryptically labeled, with names such as “p3q27b.”

Once we had a single dataset containing every respondent’s answers to every question, we were faced with the questions central to any survey analysis: What are we looking for? What is interesting? What do we not know? How do we guarantee anonymity?

Our first concern was anonymity: we had promised not to release any information that would identify “individuals or small groups of individuals.” We therefore set a threshold: we share data about a group only if it contains at least 10 respondents.
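As an illustration only, the minimal Python sketch below applies a minimum-group-size rule like the one described above. It assumes each response is stored as a dictionary and that groups are defined by a single demographic field; the function name, the field name "dorm", and the toy data are hypothetical, and this is not the code used in the actual analysis.

```python
from collections import Counter

MIN_GROUP_SIZE = 10  # the threshold stated in the article


def publishable_counts(responses, group_key, min_size=MIN_GROUP_SIZE):
    """Count respondents per group, keeping only groups of at least min_size.

    `responses` is a list of dicts (one per respondent) and `group_key`
    names the demographic field to segment on; both are assumptions.
    """
    counts = Counter(response[group_key] for response in responses)
    return {group: n for group, n in counts.items() if n >= min_size}


# Made-up toy data: 12 respondents in hypothetical dorm "A", 4 in "B".
toy = [{"dorm": "A"}] * 12 + [{"dorm": "B"}] * 4
print(publishable_counts(toy, "dorm"))  # {'A': 12}
```

One caveat with this kind of rule: hiding small groups on their own may not be enough if totals are also published, because a suppressed count can sometimes be recovered by subtracting the visible groups from the total.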

Our next objective was to make our analysis as transparent as possible. To that end, we created a website that lets anyone segment the data by various demographics. To simplify the user interface, we removed demographic variables that were highly correlated with one another, such as age and semester number.

The sheer number of possible answers to certain survey questions makes interpretation difficult. In those cases, we combined the answers into broader, more readable categories. For example, the questions about how often respondents used different substances offered eight answer categories, which we simplified to “daily,” “weekly,” “monthly,” and “in past year.”
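The collapsing step amounts to a lookup table from original answers to broad categories. The article does not list the eight original answer choices, so the labels in this Python sketch are hypothetical placeholders; only the four broad categories come from the text. Unmapped labels raise an error rather than being silently dropped.

```python
from collections import Counter

# Hypothetical labels for the eight original answer choices (not given in
# the article); the four broad categories are the ones it names.
BROAD_CATEGORY = {
    "Every day": "daily",
    "5-6 days a week": "weekly",
    "3-4 days a week": "weekly",
    "1-2 days a week": "weekly",
    "2-3 times a month": "monthly",
    "Once a month": "monthly",
    "Less than once a month": "in past year",
    "Once in the past year": "in past year",
}


def collapse_frequencies(answers):
    """Tally raw frequency answers under their broad categories.

    Raises ValueError for any label missing from BROAD_CATEGORY, so no
    answer is silently dropped.
    """
    tally = Counter()
    for answer in answers:
        if answer not in BROAD_CATEGORY:
            raise ValueError(f"unmapped answer label: {answer!r}")
        tally[BROAD_CATEGORY[answer]] += 1
    return tally


# Made-up example answers, not survey responses.
raw = ["Every day", "3-4 days a week", "1-2 days a week", "Once a month"]
print(dict(collapse_frequencies(raw)))  # {'daily': 1, 'weekly': 2, 'monthly': 1}
```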

Our final decision was to include every question on the website so that readers can visually explore the richness of the dataset. Readers will also find a summarized version of the full dataset on the website, and we encourage them to conduct their own analysis. Contact us at news@tech.mit.edu to report errors, ask questions, or suggest new perspectives.

Finally, all of the analysis was independently verified by members of The Tech who were not involved in it but were given access to the original survey responses.