Data Science Institute
Center for Technological Responsibility, Reimagination and Redesign

Engage With Us

We engage researchers at all stages of education and in a range of disciplines.

Please contact us at cntr-info@brown.edu if you’d like to be involved with the CNTR. Join a reading group, or let us know if you are interested in a research project. Undergraduate research assistant positions are occasionally available.

The CNTR is forming a student-led undergraduate group dedicated to exploring issues of technological responsibility through a reading group, research projects, or any other activity the group would like to pursue, with CNTR faculty support.

Get involved with the CNTR; see the flowchart below:

CNTR Mailing List

Sign up for the CNTR mailing list to receive announcements about CNTR news, events, and opportunities.

CNTR Reading Group

We read and discuss sociotechnical papers that address problems at the collision between technology and society, with topics including privacy, surveillance, algorithmic fairness, and AI governance and policy.

Held biweekly on Thursdays, 12-1 pm, at the Data Science Institute (164 Angell St, Floor 3), Room 320.

Spring 2025 Schedule

This spring’s seminar series is being organized by Karina LaRubbio (karina_larubbio@brown.edu) and Yujia Gao (yujia_gao@brown.edu).

Date | Presenter | Title | Abstract
2/6

Shayne Longpre (MIT)

Shayne is a PhD Candidate at MIT. His research focuses on the intersection of AI and policy: responsibly training, evaluating, and governing general-purpose AI systems. He leads the Data Provenance Initiative and has contributed to training models like Bloom, Aya, and Flan-T5/PaLM. His research has received Outstanding Paper Awards at ACL 2024 and NAACL 2024, as well as coverage by the NYT, Washington Post, Atlantic, 404 Media, Vox, and MIT Tech Review.

Consent in Crisis: The Rapid Decline of the AI Data Commons

General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. Our audit of 14,000 web domains provides an expansive view of crawlable web data and how codified data use preferences are changing over time. We observe a proliferation of AI-specific clauses to limit use, acute differences in restrictions on AI developers, as well as general inconsistencies between websites’ expressed intentions in their Terms of Service and their robots.txt. We diagnose these as symptoms of ineffective web protocols, not designed to cope with the widespread re-purposing of the internet for AI. Our longitudinal analyses show that in a single year (2023-2024) there has been a rapid crescendo of data restrictions from web sources, rendering ~5%+ of all tokens in C4, or 28%+ of the most actively maintained, critical sources in C4, fully restricted from use. For Terms of Service crawling restrictions, a full 45% of C4 is now restricted. If respected or enforced, these restrictions are rapidly biasing the diversity, freshness, and scaling laws for general-purpose AI systems. We hope to illustrate the emerging crises in data consent, for both developers and creators. The foreclosure of much of the open web will impact not only commercial AI, but also non-commercial AI and academic research.
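For context on what a robots.txt consent check involves, here is a minimal, hypothetical sketch (not code from the paper) that asks whether a single domain's robots.txt permits a few well-known AI crawlers; the crawler list and example domain are illustrative assumptions.

```python
# Minimal sketch of a robots.txt consent check, assuming a small list of
# well-known AI crawler user agents; not the paper's audit pipeline.
import urllib.robotparser

AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended", "anthropic-ai"]

def audit_domain(domain: str) -> dict:
    """Return {crawler_name: may_fetch_root} based on the domain's robots.txt."""
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(f"https://{domain}/robots.txt")
    parser.read()  # download and parse the robots.txt file
    return {bot: parser.can_fetch(bot, f"https://{domain}/") for bot in AI_CRAWLERS}

if __name__ == "__main__":
    # example.com is a placeholder; a real audit would loop over many domains
    print(audit_domain("example.com"))
```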
2/20

Robin Lange (Northeastern)

Robin is a Ph.D. Student in the Network Science Institute at Northeastern University. They study the hidden people and contributions in open source software, such as invisible labor.

Invisible Labor: The Backbone of Open Source Software

Invisible labor is an intrinsic part of the modern workplace, and includes labor that is undervalued or unrecognized, such as creating collaborative atmospheres. Open source software (OSS) is software that is viewable, editable, and shareable by anyone with internet access. Contributors are mostly volunteers, who participate for personal edification and because they believe in the spirit of OSS rather than for employment. Volunteerism often leads to high personnel turnover, poor maintenance, and inconsistent project management. This, in turn, makes long-term sustainability difficult. We believe that the key to sustainable management is the invisible labor that occurs behind the scenes. It is unclear how OSS contributors think about the invisible labor they perform or how that affects OSS sustainability. We interviewed OSS contributors and asked them about their invisible labor contributions, leadership departure, membership turnover, and sustainability. We found that invisible labor is responsible for good leadership, reducing contributor turnover, and creating legitimacy for the project as an organization.
3/6

Shira Michel (Northeastern)

Shira Michel is a CS PhD Student at Northeastern University advised by Mahsan Nourani and Christo Wilson. Her research interests are in responsible and human-centered AI. She holds a B.S. in data science from Arcadia University.

“It’s not a representation of me”: Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services

Recent artificial intelligence (AI) advancements in speech generation and voice cloning technologies have produced naturalistic speech and accurate voice replication, yet their influence on sociotechnical systems across diverse accents and linguistic traits is not fully understood. This study evaluates two synthetic AI voice services (Speechify and ElevenLabs) through a mixed methods approach using surveys and interviews to assess technical performance and uncover how users’ lived experiences influence their perceptions of accent variations in these speech technologies. Our findings reveal technical performance disparities across five regional, English-language accents and demonstrate how current speech generation technologies may inadvertently reinforce linguistic privilege and accent-based discrimination, potentially creating new forms of digital exclusion. Overall, our study highlights the need for inclusive design and regulation by providing actionable insights for developers, policymakers, and organizations to ensure equitable and socially responsible AI speech technologies.
3/20

Ira Globus-Harris (UPenn)

Ira Globus-Harris is a PhD student at the University of Pennsylvania, advised by Michael Kearns and Aaron Roth. They work on the algorithmic foundations of responsible computing, looking at mechanisms to mitigate the harms incurred by AI-driven decision-making.

Preventing D-Hacking in Fairness Audits

Ongoing regulatory efforts for machine learning, such as the Blueprint for an AI Bill of Rights and Executive Order 14110, outline the need to measure bias and audit models for potentially discriminatory behavior. In Black et al. (2024), the authors survey ongoing AI regulation attempts and observe that their wording admits (potentially unintentional) adversarial behavior by data analysts, which they term "D-hacking": analysts may overfit their fairness metrics to their dataset while in general performing poorly on their target metrics. In order to mitigate this problem, the authors suggest a variety of tools such as cross-validation, pre-registration, and use of a secret holdout held by a third party. In this work, I demonstrate that without care, such approaches can still admit D-hacking, even when a separate holdout secret to the auditors is used. Using tools from the adaptive data analysis literature, I propose alternative mechanisms that prevent overfitting and provide concrete recommendations for instantiating such an auditing framework.
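As background, one standard tool from the adaptive data analysis literature the abstract alludes to is a noisy holdout in the style of Thresholdout (Dwork et al., 2015); the sketch below is a simplified illustration of that idea, not the mechanism proposed in the talk, and its parameters are illustrative.

```python
# Simplified Thresholdout-style holdout: answer a bounded statistical query
# from the training split unless it visibly disagrees with the holdout split,
# in which case return a noised holdout answer. Parameters are illustrative.
import numpy as np

def noisy_holdout_answer(train_vals, holdout_vals, threshold=0.04, sigma=0.01, rng=None):
    """train_vals / holdout_vals: per-record values of the query (e.g., an
    indicator of a fairness violation) on each split, assumed bounded in [0, 1]."""
    rng = rng or np.random.default_rng()
    train_est = float(np.mean(train_vals))
    holdout_est = float(np.mean(holdout_vals))
    if abs(train_est - holdout_est) > threshold + rng.normal(0, sigma):
        # Likely overfitting to the training split: answer from the holdout, with noise.
        return holdout_est + rng.normal(0, sigma)
    # Splits agree: release the training estimate without consuming the holdout.
    return train_est
```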
4/10

Jeffrey Gleason (Northeastern)

Jeffrey Gleason is a Computer Science PhD Candidate at Northeastern advised by Christo Wilson. His research interests include computational social science, algorithm auditing, and causal inference. This past summer, he worked as a data science intern on the Trust and Safety team at Roblox.

In Suspense About Suspensions? The Relative Effectiveness of Suspension Durations on a Popular Social Platform

It is common for digital platforms to issue consequences for behaviors that violate Community Standards policies. However, there is limited evidence about the relative effectiveness of consequences, particularly lengths of temporary suspensions. This paper analyzes two massive field experiments (N1 = 511,304; N2 = 262,745) on Roblox (a social gaming platform) that measure the impact of suspension duration on safety- and engagement-related outcomes. The experiments show that longer suspensions are more effective than shorter ones at reducing reoffense rate, the number of consequences, and the number of user reports. Further, they suggest that the effect of longer suspensions on reoffense rate wanes over time, but persists for at least 3 weeks. Finally, they demonstrate that longer suspensions are more effective for first-time violating users. These results have significant implications for theory around digitally-enforced punishments, understanding recidivism online, and the practical implementation of product changes and policy development around consequences.
4/24

Sagar Kumar (Northeastern)

Sagar Kumar is a PhD student at the Network Science Institute. Advised by Dr. Brooke Foucault Welles, their work leverages mathematical and computational techniques to interrogate the complex interdependencies between language, culture, and information. Sagar holds a BS in Physics and Philosophy from Northeastern University.

A Unifying Model of Information Loss in Communication Across Populations

Almost all of today's most pressing issues require a more robust understanding of how information spreads in populations. Current models of information spread can be thought of as falling into one of two camps: epidemiologically-inspired rumor spreading models, which do not account for the noisy nature of communication, or information theory-inspired communication models, which do not account for spreading dynamics in populations. The viral proliferation of misinformation and harmful messages seen both online and offline, however, suggests the need for a model which accounts for both noise in the communication process, as well as viral spreading dynamics.

In this work, we leverage communication theory to meaningfully synthesize models of rumor spreading with models of noisy communication to develop a model for the noisy spread of structurally encoded messages. Furthermore, we use this model to develop a framework which allows us to not only consider the dynamics of messages, but the amount of information in the average message received by members of the population at any point in time. We find that spreading models and noisy communication models constitute the upper and lower bounds for the amount of information received by the population, while our model fills the space in between. We conclude by considering our model and findings with respect to both modern communication theory and the current information landscape to glean important insights for the development of communication-based interventions to fight rising threats to democracy, public health, and justice.
5/8

Alyssa Hasegawa Smith (Northeastern)

Alyssa Hasegawa Smith is a PhD candidate at the Northeastern University Network Science Institute; she is advised by Brooke Foucault Welles and David Lazer. Her research interests revolve around making sense of the effects that structure and agency have on information ecosystems. She received a B.S. in Humanities and Engineering from MIT in 2017 and worked in industry for four years before beginning her PhD.

Emergent structures of attention on social media are driven by amplification and triad transitivity

As they evolve, social networks tend to form transitive triads more often than random chance and structural constraints would suggest. However, the mechanisms by which triads in these networks become transitive are largely unexplored. We leverage a unique combination of data and methods to demonstrate a causal link between amplification and triad transitivity in a directed social network. Additionally, we develop the concept of the "attention broker," an extension of the previously theorized tertius iungens (or "third who joins"). We use a novel technique to identify time-bounded Twitter/X following events, and then use difference-in-differences to show that attention brokers cause triad transitivity by amplifying content. Attention brokers intervene in the evolution of any sociotechnical system where individuals can amplify content while referencing its originator.
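For readers unfamiliar with the estimator named in the abstract, a difference-in-differences comparison of treated and control groups before and after an event looks roughly like the hypothetical sketch below (illustrative only, not the paper's analysis).

```python
# Difference-in-differences on group means: the change in the treated group
# minus the change in the control group. Inputs are illustrative averages of
# the outcome (e.g., new transitive triads) before/after an amplification event.
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    return (treated_post - treated_pre) - (control_post - control_pre)

# Example with made-up numbers:
# did_estimate(treated_pre=0.10, treated_post=0.18, control_pre=0.11, control_post=0.12)
# -> 0.07, the estimated effect under the parallel-trends assumption
```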

 

Fall 2024 Schedule

The Fall 2024 seminar series was organized by Rui-Jie Yew (rui-jie_yew@brown.edu).

Date | Presenter | Title | Abstract
9/26

Shomik Jain (MIT IDSS)

As an AI Language Model, “Yes I Would Recommend Calling the Police”: Norm Inconsistency in LLM Decision-Making

Aurora Zhang (MIT IDSS)

Structural Interventions and the Dynamics of Inequality

10/3

Dr. Nasim Sonboli (Brown CS/DSI/CNTR)

The trade-off between data minimization and fairness in collaborative filtering
10/10

Stephen Casper (MIT EECS)

Stephen Casper is a PhD student at MIT advised by Dylan Hadfield-Menell. His research focuses on red-teaming and robustness in AI systems. He holds a BA in statistics from Harvard College.

Technical and sociotechnical evaluations of LLMs

Evaluations of large language model (LLM) capabilities are increasingly incorporated into AI safety and governance frameworks. However, there is a laundry list of technical, sociotechnical, and political challenges toward ensuring meaningful oversight from evaluations. This talk will focus on current challenges with access, tooling, and politics for evals.
10/17

Rui-Jie Yew (Brown CS/DSI/CNTR)

Rui-Jie Yew is a PhD student at Brown advised by Suresh Venkatasubramanian. Her research lies at the intersection of computer science and law. She holds an SM from MIT and a joint BA in computer science and mathematics from Scripps College.

You Still See Me: How Data Protection Supports the Architecture of AI Surveillance

Data forms the backbone of artificial intelligence (AI). Privacy and data protection laws thus have strong bearing on AI systems. Shielded by the rhetoric of compliance with data protection and privacy regulations, privacy-preserving techniques have enabled the extraction of more and new forms of data. We illustrate how the application of privacy-preserving techniques in the development of AI systems--from private set intersection as part of dataset curation to homomorphic encryption and federated learning as part of model computation--can further support surveillance infrastructure under the guise of regulatory permissibility. Finally, we propose technology and policy strategies to evaluate privacy-preserving techniques in light of the protections they actually confer. We conclude by highlighting the role that technologists could play in devising policies that combat surveillance AI technologies.
10/24

Dora Zhao (Stanford CS)

Dora Zhao is a PhD student at Stanford co-advised by Michael Bernstein and Diyi Yang. Her research lies at the intersection of human-computer interaction and machine learning fairness. She holds an AB and MSE in computer science from Princeton University.

Encoding Human Values in Social Media Feed Ranking Algorithms

While social media feed rankings are primarily driven by engagement signals rather than any explicit value system, the resulting algorithmic feeds are not value-neutral: engagement may prioritize specific individualistic values. This paper presents an approach aiming to be more intentional about the values that feeds encode. We adopt Schwartz’s theory of Basic Human Values---a complete set of human values that articulates complementary and opposing values that form the basic building blocks of many cultures---and we implement an algorithmic approach that models and then ranks feeds by expressions of Schwartz’s values in social media posts. Our ranking approach enables controls where end users can express weights on their desired values, then combines these weights and post value expressions together into a ranking that respects the users’ articulated trade-offs. Through controlled experiments (N=209 and N=352), we demonstrate that users can use these controls to architect feeds that reflect their desired values.
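To make the weighting idea concrete, here is a minimal, hypothetical sketch of ranking posts by a user-weighted sum of per-post value expression scores; the value names, scoring, and data shapes are assumptions, not the paper's implementation.

```python
# Rank a feed by a weighted sum of value-expression scores, assuming each post
# already has a score per Schwartz value (e.g., from a classifier).
from typing import Dict, List, Tuple

Post = Tuple[str, Dict[str, float]]  # (post_id, {value_name: expression_score})

def rank_feed(posts: List[Post], user_weights: Dict[str, float]) -> List[Post]:
    """Sort posts so that those expressing the user's up-weighted values come first."""
    def score(post: Post) -> float:
        _, value_scores = post
        return sum(user_weights.get(name, 0.0) * s for name, s in value_scores.items())
    return sorted(posts, key=score, reverse=True)

# Example: a user who up-weights benevolence and down-weights power
# rank_feed(posts, {"benevolence": 1.0, "power": -0.5})
```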
10/31

Palak Jain (BU CS)

Palak Jain is a PhD student at Boston University advised by Adam Smith. Their research uses the lenses of cryptography and differential privacy to design privacy-respecting systems and understand the downstream effects of those technologies on the individuals they intend to protect.

Enforcing Demographic Coherence: A Harms Aware Framework for Reasoning about Private Data Release

In our work, we introduce a new framework for reasoning about the privacy of large data releases; our framework is designed intentionally with socio-technical usability in mind. This talk will present our approach, which characterises the adversary as a predictive model and introduces the notion of "incoherent predictions" to capture potentially harmful inferences. Finally, I’ll explain what it means for a data curation algorithm to be "coherence enforcing" and briefly touch upon how some existing privacy tools can be used to achieve this notion.


Based on joint work with Mark Bun, Marco Carmosino, Gabe Kaptchuk, and Satchit Sivakumar.

11/7

David Liu (Northeastern CS)

David Liu is a CS PhD student at Northeastern advised by Tina Eliassi-Rad. His research interests lie at the intersection of graph machine learning, algorithmic fairness, and the societal impact of AI. He received a B.S.E. in computer science from Princeton University.

When Collaborative Filtering is not Collaborative: Unfairness of PCA for Recommendations
Collaborative-filtering recommender systems leverage low-rank approximations of high-dimensional user data to recommend relevant items to users. The low-rank approximation encodes latent group structure among the items, such as genres in the case of music and cuisines in the case of restaurants. Given that items often vary widely in their popularity, I will present work that addresses the question: do collaborative-filtering recommender systems disproportionately discard revealed preferences for low-popularity items, and if so, to what extent are recommendations for low-popularity items harmed? I will show that in the case of PCA, on common benchmark datasets, two unfairness mechanisms arise. First, the trailing, discarded PCA components characterize interest in less popular items. Second, the leading, preserved components specialize in individual popular items instead of capturing latent groupings. To address these limitations, I will then introduce Item-Weighted PCA, an algorithm for identifying principal components that re-weights less-popular items while preserving convexity. On benchmark datasets, Item-Weighted PCA improves the characterization of both popular and unpopular items. I will conclude the talk with a discussion about future work on fairness notions that focus less on equalizing performance among groups and more on inducing models that capture what makes groups different from each other.
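As a rough illustration of the reweighting idea (not the paper's actual convex formulation), the sketch below scales item columns by weights, e.g. inverse popularity, before running standard PCA; the function name and weighting scheme are assumptions.

```python
# Crude item-reweighted PCA sketch: up-weight less-popular item columns before
# computing principal components. Item-Weighted PCA as described in the talk
# solves a convex optimization; this only conveys the intuition.
import numpy as np

def reweighted_pca(X: np.ndarray, item_weights: np.ndarray, k: int) -> np.ndarray:
    """X: (n_users, n_items) interaction matrix; item_weights: (n_items,);
    returns the top-k principal directions as an (n_items, k) matrix."""
    Xw = X * item_weights          # scale each item column by its weight
    Xc = Xw - Xw.mean(axis=0)      # center columns
    cov = Xc.T @ Xc / max(Xc.shape[0] - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, top]

# One possible weighting: inverse square-root popularity
# item_weights = 1.0 / np.sqrt((X > 0).sum(axis=0) + 1)
```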
11/14

Edgar Ramirez Sanchez (MIT Civil and Environmental Engineering)

Edgar Ramirez Sanchez is a PhD student in Civil and Environmental Engineering at MIT advised by Cathy Wu. He holds an S.M. in Technology and Policy from MIT and a B.S. in Engineering Physics from the Tecnológico de Monterrey in Mexico.

A data-driven traffic reconstruction framework for identifying stop-and-go congestion on highways

Identifying stop-and-go events (SAGs) in traffic flow presents an important avenue for advancing data-driven research for climate change mitigation and sustainability, owing to their substantial impact on carbon emissions, travel time, fuel consumption, and roadway safety. In fact, SAGs are estimated to account for 33-50% of highway driving externalities. However, insufficient attention has been paid to precisely quantifying where, when, and how much these SAGs take place, which is necessary for downstream decision making, such as intervention design and policy analysis. A key challenge is that the data available to researchers and governments are typically sparse and aggregated to a granularity that obscures SAGs. To overcome such data limitations, this study thus explores the use of traffic reconstruction techniques for SAG identification. In particular, we introduce a kernel-based method for identifying spatio-temporal features in traffic and leverage bootstrapping to quantify the uncertainty of the reconstruction process. Experimental results on California highway data demonstrate the promise of the method for capturing SAGs. This work contributes to a foundation for data-driven decision making to advance the sustainability of traffic systems.
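For intuition about what a kernel-based reconstruction with bootstrapped uncertainty might look like, here is a simplified, hypothetical sketch using an isotropic Gaussian kernel over a space-time grid; the method in the talk may use a different kernel and resampling scheme, and the bandwidths below are assumptions.

```python
# Gaussian-kernel reconstruction of a speed field from sparse observations,
# plus a bootstrap over observations to quantify uncertainty per grid cell.
import numpy as np

def kernel_reconstruct(x, t, v, grid_x, grid_t, sx=200.0, st=10.0):
    """x (m), t (s), v (speed) are 1-D observation arrays; returns a
    (len(grid_x), len(grid_t)) weighted-average speed field."""
    Xg, Tg = np.meshgrid(grid_x, grid_t, indexing="ij")
    num = np.zeros_like(Xg, dtype=float)
    den = np.zeros_like(Xg, dtype=float)
    for xi, ti, vi in zip(x, t, v):
        w = np.exp(-(((Xg - xi) / sx) ** 2 + ((Tg - ti) / st) ** 2))
        num += w * vi
        den += w
    return num / np.maximum(den, 1e-12)

def bootstrap_reconstruct(x, t, v, grid_x, grid_t, n_boot=100, seed=0):
    """Resample observations with replacement; return mean and std of the field."""
    rng = np.random.default_rng(seed)
    fields = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(v), size=len(v))
        fields.append(kernel_reconstruct(x[idx], t[idx], v[idx], grid_x, grid_t))
    return np.mean(fields, axis=0), np.std(fields, axis=0)
```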
11/21

Minseok Jung (MIT TPP)

Minseok “Mason” Jung is a graduate student at MIT. He was selected as a Social and Ethical Responsibility of Computing (SERC) Scholar at the Schwarzman College of Computing and has conducted research on AI fairness, policy, and HCI. Mason is developing research with Dr. Paul Pu Liang and Dr. Lalana Kagal.

Quantitative Insights into Language Model Usage and Trust in Academia: An Empirical Study

Language models (LMs) are revolutionizing knowledge retrieval and processing in academia. However, concerns regarding their misuse and erroneous outputs, such as hallucinations and fabrications, are reasons for distrust in LMs within academic communities. Consequently, there is a pressing need to deepen the understanding of how actual practitioners use and trust these models. There is a notable gap in quantitative evidence regarding the extent of LM usage, user trust in their outputs, and issues to prioritize for real-world development. This study addresses these gaps by providing data and analysis of LM usage and trust. Specifically, our study surveyed 125 individuals at a private school and secured 88 data points after pre-processing. Through both quantitative analysis and qualitative evidence, we found a significant variation in trust levels, which are strongly related to usage time and frequency. Additionally, we discover through a polling process that fact-checking is the most critical issue limiting usage. These findings inform several actionable insights: distrust can be overcome by providing exposure to the models, policies should be developed that prioritize fact-checking, and user trust can be enhanced by increasing engagement. By addressing these critical gaps, this research not only adds to the understanding of user experiences and trust in LMs but also informs the development of more effective LMs.
11/28 Thanksgiving Recess
12/5

Princess Sampson (UPenn)

Princess Sampson is a PhD student at Penn advised by Danaë Metaxa. In their research, they explore the intersection of human-AI interaction, algorithmic justice, and platform governance. Princess' work is supported by the NSF Graduate Research Fellowship and they hold a B.S. in Computer Science from Spelman College.

Representation, Self-Determination, and Refusal: Queer People’s Experiences with Targeted Advertising

Targeted online advertising systems increasingly draw scrutiny for the surveillance underpinning their collection of people’s private data, and subsequent automated categorization and inference. The experiences of LGBTQ+ people, whose identities call into question dominant assumptions about who is seen as “normal,” and deserving of privacy, autonomy, and the right to self-determination, are a fruitful site for exploring the impacts of ad targeting. We conducted semi-structured interviews with LGBTQ+ individuals (N=18) to understand their experiences with online advertising, their perceptions of ad targeting, and the interplay of these systems with their queerness and other identities. Our results reflect participants’ overall negative experiences with online ad content—they described it as stereotypical and tokenizing in its lack of diversity and nuance. But their desires for better ad content also clashed with their more fundamental distrust and rejection of the non-consensual and extractive nature of ad targeting. They voiced privacy concerns about continuous data aggregation and behavior tracking, a desire for greater control over their data and attention, and even the right to opt-out entirely. Drawing on scholarship from queer and feminist theory, we explore targeted ads’ homonormativity in their failure to represent multiply-marginalized queer people, the harms of automated inference and categorization to identity formation and self-determination, and the theory of refusal underlying participants’ queer visions for a better online experience.
12/12

Zoë Bell (UC Berkeley)

Zoë Bell is a PhD student at UC Berkeley studying Theoretical Computer Science (TCS); she also plans to complete the PhD Designated Emphasis in Science, Technology, and Society studies (STS). Her current research investigates how to support accountability in data-sharing systems, such as statistical data analysis and machine learning. She believes that tools from cryptography, and more specifically interactive proof systems, are a promising avenue, given their ability to reason about parties with disparate resources, power, and goals and to expand the solution space in surprising ways. In this work, she also draws on the intellectual toolkit of feminist and anti-colonial STS to think rigorously about the societal work done by mathematical abstractions.

The Purification of Cryptography: Reactionary Mathematics & Random Oracles

In this talk, I will provide a preliminary micro-history of the cryptographic controversy over the Random Oracle Model, or the ROM. Harsh words were exchanged over this “sour state of affairs” with the two sides arguing to either “abolish” or maintain “unshaken” confidence in the ROM [Gol06,KM07]. What’s going on here? And, why should anyone outside of the subfield care about this academic tiff in the first place? I will use tools from Science, Technology, and Society studies (STS) to draw out the political work done by invoking mathematical purity in this setting and open up discussion questions such as: What makes a mathematical abstraction “good,” and what are useful methods for designing good abstractions?