by Dylan Sam '21
Every Google search you perform, each Amazon purchase you make, and each Spotify playlist you create holds information. Nowadays, every online action yields useful information to companies that amass large amounts of data. Certain companies exist solely to collect data and sell it to interested customers, who use this data to improve their products and marketing. The field of data science focuses on finding information in large data sets to improve services, technologies, and much more. While data science has many benefits, it also has social consequences, the most important of which is the loss of digital privacy. Data science is progressing and gaining popularity in the commercial sector, but its growth should be regulated to maintain people's privacy.
Data science has appeal in its universal application; it is currently used to produce new, innovative, and useful technology, as well as to make headway in research to benefit the population. Micah Altman, the director at the Program of Information Science at MIT, states, “Vast quantities of data from new sources and novel methods for large-scale data analysis promise to yield deeper understanding of human characteristics, behavior, and relationships and advance the state of science, public policy, and innovation.”1
Although many people associate data with technological developments or commercial solutions, data science benefits other disciplines as well. Trends in data can be “mined” to discover problems in public policy or diseases within certain populations. For instance, Dr. Altman describes the Framingham Heart Study, which “precipitated the discovery of risk factors for heart disease and many other groundbreaking advances in cardiovascular research.” The study illustrates how data science applied to health care can advance research that benefits the population. Many other areas could use these benefits. Energy research, for example, lacks open-source data, which hinders many advances: “While top journals in many monodisciplinary fields … [require] release of data, software, and other information required to replicate published results, Energy Economics is the only major energy journal to have put such policies in place.” Unlike the heart study, energy research lacks the data and published information necessary for much progress, so data science cannot yet be applied fully in the energy sector. While data science may have applications and benefits in many different disciplines, it has strongly negative consequences regarding privacy.
Data science requires the collection of many different types of data from many different sources. The benefits of this analysis come from collecting information about people and online users, but that collection carries costs. As Dr. Altman claims, “Collection of such detailed information exposes individuals to potential harm to their reputations and personal relationships, risk of future loss of employability and insurability, risks of financial loss and identity theft, and potential civil or criminal liability, among others.” Companies that collect online information often access customers’ private information. By collecting such data, these companies gain leverage over customers; they can license this information to advertisers or other companies for a profit. This is a major threat: many customers unknowingly give away personal and private information, which can lead to identity theft or financial loss. Gregor Michener states, “Similar to personal health information, some types of education data can be intensely personal, critically important for policy evaluations, and at once vulnerable to breaches and inappropriate uses. The US Software & Information Industry Association estimates the current value of the kindergarten to grade 12 market at a staggering 8 billion dollars.” There must be regulations in place to prevent companies from licensing valuable and sensitive information and from using it as leverage over customers. Although data science augments innovation and progress in research, such data collection poses a risk to many customers and must be regulated.
This conflict involves everyone, especially those with a large internet presence. The general population must be aware of both the pros and cons of data science; it is an incredibly powerful and popular field that has the potential to advance many disciplines. However, data collection carries the risk of exposing important personal information. You need to be engaged in this conflict; your information helps make up these data sets, so you are intertwined with data science, even if you do not feel or think so.