Work In Progress
What’s in a Name? Auditing Large Language Models for Race and Gender Bias. With Amit Haim and Alejandro Salinas. We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4. In our study, we prompt the models for advice regarding an individual across a variety of scenarios, such as during car purchase negotiations or election outcome predictions. We find that the advice systematically disadvantages names that are commonly associated with racial minorities and women. Names associated with Black women receive the least advantageous outcomes. The biases are consistent across 42 prompt templates and several models, indicating a systemic issue rather than isolated incidents. While providing numerical, decision-relevant anchors in the prompt can successfully counteract the biases, qualitative details have inconsistent effects and may even increase disparities. Our findings underscore the importance of conducting audits at the point of LLM deployment and implementation to mitigate their potential for harm against marginalized communities.
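To make the audit design concrete, the sketch below shows one way such a name-based audit could be run. It is a minimal illustration, not the paper's actual protocol: the names, the single car-negotiation template, and the `query_model` wrapper are hypothetical placeholders for the 42 templates and the models audited in the study.

```python
# Illustrative audit loop: cross names (as race/gender proxies) with a scenario
# template and collect the model's numeric advice for later comparison.
# `query_model` is a hypothetical wrapper around whichever LLM is being audited.
import itertools
import statistics

NAMES = {
    ("white", "male"): ["Hunter"],      # illustrative names only
    ("Black", "female"): ["Lakisha"],
}

TEMPLATE = (
    "I am negotiating to buy a used car from {name}. "
    "What initial offer (in dollars) should I make? Reply with a number only."
)

def query_model(prompt: str) -> str:
    raise NotImplementedError("wrap your chosen LLM API here")

def run_audit(n_trials: int = 50) -> dict:
    results = {}
    for (race, gender), names in NAMES.items():
        offers = []
        for name, _ in itertools.product(names, range(n_trials)):
            reply = query_model(TEMPLATE.format(name=name))
            try:
                offers.append(float(reply.replace(",", "").strip("$ ")))
            except ValueError:
                continue  # skip non-numeric replies
        results[(race, gender)] = statistics.mean(offers)
    return results
```

Comparing the mean advised offers across name groups is the basic disparity measure; the actual study varies scenarios and models far more broadly.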
Forecasting Algorithms for Causal Inference with Panel Data. With Jacob Goldin and Justin Young. Conducting causal inference with panel data is a core challenge in social science research. Advances in forecasting methods can facilitate this task by more accurately predicting the counterfactual evolution of a treated unit had treatment not occurred. In this paper, we draw on a newly developed deep neural architecture for time series forecasting (the N-BEATS algorithm). We adapt this method from conventional time series applications by incorporating leading values of control units to predict a “synthetic” untreated version of the treated unit in the post-treatment period. We refer to the estimator derived from this method as SyNBEATS, and find that it significantly outperforms traditional two-way fixed effects and synthetic control methods across a range of settings. We also find that SyNBEATS attains comparable or more accurate performance relative to more recent panel estimation methods such as matrix completion and synthetic difference in differences. Our results highlight how advances in the forecasting literature can be harnessed to improve causal inference in panel settings.
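The core data arrangement behind this kind of estimator can be sketched in a few lines: fit a model that maps control units' outcomes to the treated unit's outcome over pre-treatment periods, then predict the treated unit's untreated trajectory afterwards. The sketch below swaps in scikit-learn's gradient boosting regressor for the N-BEATS network and uses only contemporaneous (not leading) control values, so it illustrates the idea rather than reproducing SyNBEATS itself.

```python
# Sketch of the panel-counterfactual idea: learn the treated unit's outcome
# from control units' outcomes on pre-treatment periods, then predict a
# "synthetic" untreated trajectory after treatment. A gradient boosting
# regressor stands in for the N-BEATS network used in the paper.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def synthetic_counterfactual(Y: np.ndarray, treated: int, T0: int) -> np.ndarray:
    """Y: (units x periods) outcome matrix; treated: row index of treated unit;
    T0: first post-treatment period. Returns predicted counterfactual for t >= T0."""
    controls = np.delete(Y, treated, axis=0)            # (n_controls x T)
    X_pre, y_pre = controls[:, :T0].T, Y[treated, :T0]  # fit on pre-treatment data
    model = GradientBoostingRegressor().fit(X_pre, y_pre)
    return model.predict(controls[:, T0:].T)            # post-treatment counterfactual

# Treatment effect estimate: observed minus predicted counterfactual.
# effect = Y[treated, T0:] - synthetic_counterfactual(Y, treated, T0)
```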
The Value of M&A Drafting. With Adam Badawi and Elisabeth de Fontenay. This article examines how drafters of M&A agreements value individual clauses, using the relative degree of tailoring of different clauses under time pressure as a proxy. Empirical work on the content of M&A agreements faces a number of methodological challenges. We address two of them in this paper. The first is the problem of analyzing the text in M&A agreements in a way that reflects how lawyers draft those agreements. Deal lawyers almost universally draft from templates that are edited to suit the deal at hand. We introduce a method that is able to distinguish between borrowed text and edited text without requiring access to the underlying template. The second challenge stems from the endogeneity and selection effects that are inherent to M&A deals. We address this challenge by identifying deals that appear to be leaked shortly before their announcement, which is a plausibly exogenous shock to the drafting conditions of the agreement. We apply these methods to a set of 2,141 public-company M&A agreements signed between 2000 and 2020. After identifying shared templates among these agreements, we assess how much individual clauses deviate from the language of the template. We find strong evidence that agreements for leaked deals are edited less than those for other deals based on the same template, suggesting that time constraints can and do alter the final contract signed by M&A parties. We also show that lawyers prioritize some clauses over others when under time pressure, and identify which clauses are prioritized. Overall, the findings help to validate some theoretical models of contract drafting and vindicate what lawyers say about which terms in M&A agreements are the most important.
Automated Court Date Reminders Reduce Warrants for Arrest: Evidence from a Text Messaging Experiment. With Alex Chohlas-Wood, Madison Coots, Joe Nudell, Emma Brunskill, Todd Rogers and Sharad Goel. Millions of people in the U.S. every year are required to attend mandatory court dates as their cases proceed through the criminal legal system. Despite potentially severe consequences from missing court—including arrest and incarceration—many still fail to appear. Past work suggests that court absences stem in part from people forgetting about their court dates, as well as confusion about when and where to show up. In response, automated court date reminders, sent via text message, are increasingly used across the U.S. with the hope that they will increase court attendance. But previous research offers mixed evidence on whether these reminders are effective, in part due to the difficulty of running experiments that are sufficiently powered to detect anticipated effects. Here we report the results of a large field experiment that we ran in partnership with the Santa Clara County Public Defender Office to examine whether automated text message reminders improve appearance rates. We randomly assigned 4,691 public defender clients either to receive regular reminders about their upcoming court dates (treatment) or to not receive these reminders (control). Clients in the treatment condition received a text message reminder seven days, three days, and one day before each court date. We found that automated reminders reduced the number of warrants for arrest issued for missing court by over 20%, with 12.4% of clients in the control condition issued a warrant compared to 9.7% of clients in the treatment condition. Our results bolster a growing body of evidence demonstrating the promise of automated reminders to improve court appearance rates and reduce the concomitant negative consequences of missing court.
On the Opportunities and Risks of Foundation Models. With Rishi Bommasani, Percy Liang, et al. AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
Publications
Introducing a New Corpus of Definitive M&A Agreements, 2000-2020. Journal of Empirical Legal Studies (forthcoming). With Peter Adelson, Matthew Jennejohn and Eric Talley. Contract design and architecture is an important topic within economics, finance, and law. However, attempts to study it are significantly constrained by the limited availability of public, high-quality data. In this paper, we introduce a new corpus of 7,929 Definitive Merger Agreements submitted to the SEC between 2000 and 2020 involving a transaction in excess of $100 million. Through a combination of machine learning and human evaluation, we associate these agreements with other metadata, such as deal size, industry classification, and advising law firms. In addition, we identify and make available the text of individual clauses contained in these agreements. In a final step, we provide an illustration of how these data can be used to generate novel insights into M&A contract design and drafting practices.
Risk Scores, Label Bias, and Everything but the Kitchen Sink. Science Advances (2024). With Michael Zanger-Tishler and Sharad Goel. In designing risk assessment algorithms, many scholars promote a “kitchen sink” approach that utilizes all available features as inputs. The use of these models rests on the assumption that additional information either increases predictive quality or—if the information is not statistically relevant—is ignored by the model. We show, however, that this rationale often fails when algorithms are trained to predict a proxy of the true outcome, as is the case in most contexts where predictive algorithms are deployed. In the presence of such “label bias”, we show that one should exclude a feature if its correlation with the proxy and its correlation with the true label have opposite signs, conditional on the other features in the model. This criterion is often satisfied when a feature is weakly correlated with the true label and when that feature and the true label are both direct causes of the remaining features. For example, due to patterns of police deployment, criminal behavior and geography may be weakly correlated and direct causes of one’s criminal record, suggesting it can be problematic to include geography in criminal risk assessments trained to predict arrest as a proxy for behavior.
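When the true outcome is observed in at least a validation sample, the sign criterion can be checked directly. The sketch below is one illustrative implementation, not the paper's code: it residualizes the candidate feature, the proxy, and the true label on the remaining features and compares the signs of the two partial correlations; the function names and the linear residualization step are assumptions made for brevity.

```python
# Sketch of the sign-based exclusion check: residualize the candidate feature,
# the proxy label, and the true label on the remaining features, then compare
# the signs of the two partial correlations. Data and names are illustrative.
import numpy as np
from numpy.linalg import lstsq

def residualize(v: np.ndarray, Z: np.ndarray) -> np.ndarray:
    # Remove the part of v that is linearly explained by Z (plus an intercept).
    Z1 = np.column_stack([np.ones(len(v)), Z])
    beta, *_ = lstsq(Z1, v, rcond=None)
    return v - Z1 @ beta

def should_exclude(feature, proxy, true_label, other_features) -> bool:
    f = residualize(feature, other_features)
    corr_proxy = np.corrcoef(f, residualize(proxy, other_features))[0, 1]
    corr_true = np.corrcoef(f, residualize(true_label, other_features))[0, 1]
    # Exclude when the feature's partial correlations with the proxy and with
    # the true outcome point in opposite directions.
    return np.sign(corr_proxy) != np.sign(corr_true)
```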
Reconciling Legal and Empirical Conceptions of Disparate Impact: An Analysis of Police Stops Across California. Journal of Law & Empirical Analysis (2024). With Joshua Grossman and Sharad Goel. We evaluate the statistical and conceptual foundations of empirical tests for disparate impact. We begin by considering a recent, popular proposal in the economics literature that seeks to assess disparate impact via a comparison of error rates for the majority and the minority group. Building on past work, we show that this approach suffers from what is colloquially known as “the problem of inframarginality”, in turn putting it in direct conflict with legal understandings of discrimination. We then analyze two alternative proposals that quantify disparate impact either in terms of risk-adjusted disparities or by comparing existing disparities to those under a statistically optimized decision policy. Both approaches have differing, context-specific strengths and weaknesses, and we discuss how they relate to the individual elements in the legal test for disparate impact. We then turn to assessing disparate impact in search decisions among approximately 1.5 million police stops recorded across California in 2022 pursuant to its Racial Identity and Profiling Act (RIPA). The results are suggestive of disparate impact against Black and Hispanic drivers for several large law enforcement agencies. We further propose alternative search strategies that more efficiently recover contraband while also producing smaller racial disparities.
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models. NeurIPS (2024). With Neel Guha, Daniel E. Ho, Christopher Ré, et al. Can foundation models be guided to execute tasks involving legal reasoning? We believe that building a benchmark to answer this question will require sustained collaborative efforts between the computer science and legal communities. To that end, this short paper serves three purposes. First, we describe how IRAC, a framework legal scholars use to distinguish different types of legal reasoning, can guide the construction of a benchmark oriented toward foundation models. Second, we present a seed set of 44 tasks built according to this framework. We discuss initial findings and highlight directions for new tasks. Finally, inspired by the Open Science movement, we make a call for the legal and computer science communities to join our efforts by contributing new tasks.
Designing Equitable Algorithms. Nature Computational Science (2023). With Alex Chohlas-Wood, Madison Coots and Sharad Goel. Predictive algorithms are now commonly used to distribute society’s resources and sanctions. But these algorithms can entrench and exacerbate inequities. To guard against this possibility, many have suggested that algorithms be subject to formal fairness constraints. We argue, however, that popular constraints—while intuitively appealing—often worsen outcomes for individuals in marginalized groups, and can even leave all groups worse off. We outline a more holistic path forward for improving the equity of algorithmically guided decisions. [Code]
Don’t Use a Cannon to Kill a Fly: An Efficient Cascading Pipeline for Long Documents. ICAIL: International Conference on Artificial Intelligence and Law (2023). With Zehua Li & Neel Guha. The computational cost of transformer-based models has a quadratic dependence on the length of the input sequence. This makes it challenging to deploy these models in domains in which documents are especially lengthy, such as the legal domain. To address this issue, we propose a two-stage cascading approach for long document classification. We begin by filtering out likely irrelevant information with a lightweight logistic regression model before passing the more challenging inputs to the transformer-based model. We evaluate our approach using CUAD, a legal dataset with 510 manually annotated, long contracts. We find that the cascading approach reduces training time by up to 80% while improving baseline performance. We hypothesize that the gains in performance stem from localizing the classification task of the transformer model to particularly difficult examples.
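A minimal version of such a cascade might look like the sketch below, assuming a TF-IDF logistic regression as the cheap first stage and an arbitrary `transformer_classify` callable as the expensive second stage; the threshold value and helper names are illustrative, not taken from the paper.

```python
# Sketch of a two-stage cascade: a cheap logistic regression over TF-IDF
# features screens contract segments, and only those scoring above a threshold
# are passed to the expensive transformer classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def build_first_stage(train_texts, train_labels):
    stage1 = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    return stage1.fit(train_texts, train_labels)

def cascade_predict(stage1, transformer_classify, segments, threshold=0.2):
    probs = stage1.predict_proba(segments)[:, 1]
    preds = []
    for seg, p in zip(segments, probs):
        # The cheap model confidently says "irrelevant": skip the transformer.
        preds.append(0 if p < threshold else transformer_classify(seg))
    return preds
```

The design choice is the same one the abstract describes: spend the transformer's quadratic cost only on inputs the lightweight filter cannot confidently dismiss.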
Conceptual Questions in Developing Expert-Annotated Data. ICAIL: International Conference on Artificial Intelligence and Law (2023). With Megan Ma & Brandon Waldon. It has been argued that specialized domains, such as the legal field, are rarely exposed to research in deep learning due to the high costs of expert annotation. Coupled with the proprietary nature of legal documents, few datasets are broadly available for research. Accordingly, methodology around how expertise may be used to create legal, and specifically contract, datasets remains relatively unexplored. Given the recent explosion of interest in legal applications of generative AI, existing practices around data annotation require further assessment. This paper aims to reflect on the role and use of expertise in data annotation for Legal NLP. We put forth a small qualitative study to assess the annotation practices of an expert-annotated contract dataset.
Police agencies on Facebook overreport on Black suspects. 119 Proceedings of the National Academy of Sciences e2203089119 (2022). With Ben Grunwald and John Rappaport. A large and growing share of the American public turns to Facebook for news. On this platform, reports about crime increasingly come directly from law enforcement agencies, raising questions about content curation. We gathered all posts from almost 14,000 Facebook pages maintained by US law enforcement agencies, focusing on reporting about crime and race. We found that Facebook users are exposed to posts that overrepresent Black suspects by 25 percentage points relative to local arrest rates. This overexposure occurs across crime types and geographic regions and increases with the proportion of both Republican voters and non-Black residents. Widespread exposure to overreporting risks reinforcing racial stereotypes about crime and exacerbating punitive preferences among the polity more generally.
Racial Bias as a Multi-Stage, Multi-Actor Problem: An Analysis of Pretrial Detention. Journal of Empirical Legal Studies 1–48 (2023). With Joshua Grossman & Sharad Goel. After arrest, criminal defendants are often detained before trial to mitigate potential risks to public safety. There is widespread concern, however, that detention decisions are biased against racial minorities. When assessing potential racial discrimination in pretrial detention, past studies have typically worked to quantify the extent to which the ultimate judicial decision is conditioned on the defendant’s race. While often useful, this approach suffers from three important limitations. First, it ignores the multi-stage nature of the pretrial process, in which decisions and recommendations are made over multiple court appearances that influence the final judgment. Second, it does not consider the multiple actors involved, including prosecutors, defense attorneys, and judges, each of whom has different responsibilities and incentives. Finally, a narrow focus on disparate treatment fails to consider potential disparate impact arising from facially neutral policies and practices. Addressing these limitations, here we present a framework for quantifying disparate impact in multi-stage, multi-actor settings, illustrating our approach using ten years of data on pretrial decisions from a federal district court. We find that Hispanic defendants are released at lower rates than white defendants of similar safety and non-appearance risk. We trace these disparities to decisions of assistant U.S. attorneys at the initial hearings, decisions driven in part by a statutory mandate that lowers the procedural bar for moving for detention of defendants in certain types of cases. We also find that the Pretrial Services Agency recommends detention of Black defendants at higher rates than white defendants of similar risk, though we do not find evidence that these recommendations translate to disparities in actual release rates. Finally, we find that traditional disparate treatment analyses yield more modest evidence of discrimination in pretrial detention outcomes, highlighting the value of our more expansive analysis for identifying, and ultimately remediating, unjust disparities in the pretrial process. We conclude with a discussion of how risk-based threshold release policies could help to mitigate observed disparities, and the estimated impact of various policies on violation rates in the partner jurisdiction.
A Statistical Test for Legal Interpretation: Theory and Applications. 38 The Journal of Law, Economics, and Organization 539 (2022). With Sarath Sanga. Many questions of legal interpretation hinge on whether two groups of people assign different meanings to the same word. For example: Do 18th- and 21st-century English speakers assign the same meaning to commerce? Do judges and laypersons agree on what makes conduct reasonable? We propose a new statistical test to answer such questions. In three applications, we use our test to (1) quantify differences in the meanings of specialized words from civil procedure, (2) identify statistically significant differences between judges’ and laypersons’ understandings of reasonable and consent, and (3) assess differences across various effort standards in commercial contracts (phrases like best effort and good faith effort). Our approach may be readily applied outside the law to quantify semantic disagreements between or within groups.
Contractual Evolution. 89 The University of Chicago Law Review 901 (2022). With Matthew Jennejohn and Eric Talley. Conventional wisdom portrays contracts as static distillations of parties’ shared intent at some discrete point in time. In reality, however, contract terms evolve in response to their environments, including new laws, legal interpretations, and economic shocks. While several legal scholars have offered stylized accounts of this evolutionary process, we still lack a coherent, general theory that broadly captures the dynamics of real-world contracting practice. This paper advances such a theory, in which the evolution of contract terms is a byproduct of several key features, including efficiency concerns, information, and sequential learning by attorneys who negotiate several deals over time. Each of these factors contributes to the underlying evolutionary process, and their relative prominence bears directly on the speed, direction, and desirability of how contractual innovations diffuse. Using a formal model of bargaining in a sequence of similar transactions, we demonstrate how different evolutionary patterns can manifest over time, in both desirable and undesirable directions. We then take these insights to a real-world dataset of over 2,000 merger agreements negotiated over the last two decades, tracking the adoption of several contractual clauses, including pandemic-related terms, #MeToo provisions, CFIUS conditions, and reverse termination fees. Our analysis suggests that there is not a “one size fits all” paradigm for contractual evolution; rather, the constituent forces affecting term evolution manifest with varying strengths across differing circumstances. We highlight several constructive applications of our framework, including the study of how contract negotiation unfolds when price cannot easily be adjusted, and how to incorporate other forms of cognitive and behavioral biases into our general framework.
Natural Language Processing in Legal Tech. In Legal Tech and the Future of Civil Justice (David Engstrom ed.) (2022). With Jens Frankenreiter. Natural language processing techniques promise to automate an activity that lies at the core of many tasks performed by lawyers, namely the extraction and processing of information from unstructured text. The relevant methods are thought to be a key ingredient for both current and future legal tech applications. This chapter provides a non-technical introduction to a selection of natural language processing techniques that are expected to play a major role in legal tech. In addition, it critically discusses the promises and pitfalls of natural language processing tools in this context, using technology-assisted review in discovery and outcome predictions as examples.
Regulatory Diffusion. 74 Stanford Law Review 897 (2022). With Jennifer Nou. Regulatory diffusion occurs when one agency adopts rules that are substantially similar to another agency’s. Indeed, regulatory texts proliferate just as other forms of law, such as constitutions, statutes, and contracts, do. While this insight has been largely explored as an international matter, this dynamic also occurs closer to home: American administrative agencies regularly borrow language from one another. By one measure, one out of every ten paragraphs of the Code of Federal Regulations is reused from another rulemaking. This behavior appears to vary by whether the agency is executive or independent in nature. These insights — into how rules are drafted — are timely given a recent Supreme Court decision calling for judges to engage in more independent regulatory interpretation. As a result, there is newfound significance to questions of how legislative rules are written and why. This Article explores the descriptive and normative implications of regulatory diffusion. The empirical analysis suggests that agencies have been engaging in more text reuse over time. The number of both borrowing and lending agencies has increased, with a relatively small number of agencies borrowing text from an increasingly larger group. In other words, regulatory text has diffused from more agencies. These results, in turn, raise important questions about whether and when such diffusion is desirable; how to interpret the regulations that result; and the ways in which agencies should update borrowed texts. In light of these questions, we propose that rulewriters should be required to explain why they are emulating other regulatory texts, in effect, to cite and justify their work. Doing so will allow more judicial and political oversight over the practice. We also argue in favor of the in pari materia canon — the idea that similar regulations should be interpreted similarly — and propose ways for judges to decide when and how to apply it. Finally, we explore mechanisms for updating policies under borrowed regulatory texts and the tradeoffs each entails.
Breaking Taboos in Fair Machine Learning: An Experimental Study. Proceedings of ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (2021). With Sharad Goel and Roseanna Sommers. Many scholars, engineers, and policymakers believe that algorithmic fairness requires disregarding information about certain characteristics of individuals, such as their race or gender. Often, the mandate to “blind” algorithms in this way is conveyed as an unconditional ethical imperative—a minimal requirement of fair treatment—and any contrary practice is assumed to be morally and politically untenable. However, in some circumstances, prohibiting algorithms from considering information about race or gender can in fact lead to worse outcomes for racial minorities and women, complicating the rationale for blinding. In this paper, we conduct a series of randomized studies to investigate attitudes toward blinding algorithms, both among the general public and among computer scientists and professional lawyers. We find, first, that people are generally averse to the use of race and gender in algorithmic determinations of “pretrial risk”—the risk that criminal defendants pose to the public if released while awaiting trial. We find, however, that this preference for blinding shifts in response to a relatively mild intervention. In particular, we show that support for the use of race and gender in algorithmic decision-making increases substantially after respondents read a short passage about the possibility that blinding could lead to higher detention rates for Black and female defendants, respectively. Similar effect sizes are observed among the general public, computer scientists, and professional lawyers. These findings suggest that, while many respondents attest that they prefer blind algorithms, their preference is not based on an absolute principle. Rather, blinding is perceived as a way to ensure better outcomes for members of marginalized groups. Accordingly, in circumstances where blinding serves to disadvantage marginalized groups, respondents no longer view the exclusion of protected characteristics as a moral imperative, and the use of such information may become politically viable.
Blind Justice: Algorithmically Masking Race in Charging Decisions. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (2021). With Alex Chohlas-Wood, Joe Nudell, Zhiyuan “Jerry” Lin and Sharad Goel. A prosecutor’s decision to charge or dismiss a criminal case is a particularly high-stakes choice. There is concern, however, that these judgments may suffer from explicit or implicit racial bias, as with many other such actions in the criminal justice system. To reduce potential bias in charging decisions, we designed a system that algorithmically redacts race-related information from free-text case narratives. In a first-of-its-kind initiative, we deployed this system at a large American district attorney’s office to help prosecutors make race-blind charging decisions, where it is used to review many incoming felony cases. We report on the design, efficacy, and impact of our tool for aiding equitable decision making. We demonstrate that our redaction algorithm is able to successfully mask race-related information, making it difficult for a human reviewer to guess the race of a suspect. In the jurisdiction we study, we found little evidence of disparate treatment in charging decisions even prior to deployment of our intervention. Thus, as expected, our tool did not substantially alter charging rates. Nevertheless, our study demonstrates the feasibility of race-blind charging and, more generally, highlights the promise of algorithms to bolster equitable decision making in the criminal justice system.
Stickiness and Incomplete Contracts. 88 The University of Chicago Law Review 1 (2021). Both economic and legal theory assume that sophisticated parties routinely write agreements that maximize their joint surplus. But more recent studies analyzing covenants in corporate and government bond agreements have shown that many contract provisions are highly path dependent and “sticky,” with future covenants only rarely improving upon previous ones. This Article demonstrates that the stickiness hypothesis explains the striking lack of choice-of-forum provisions in commercial contracts, which are absent in more than half of all material agreements reported to the SEC. When drafting these agreements, external counsel relies heavily on templates, and whether a contract includes a forum selection clause is almost exclusively driven by the template that is used to supply the first draft. There is no evidence to suggest that counsel negotiates over the inclusion of choice-of-forum provisions, nor that law firm templates are revised in response to changes in the costs and benefits of incomplete contracting. Together, the findings reveal a distinct apathy with respect to forum choice among transactional lawyers that perpetuates the existence of contractual gaps. The persistence of these gaps suggests that default rules may have significantly greater implications for the final allocation of the contractual surplus than is assumed under traditional theory. [Video]
We’ll See You in . . . Court! The Lack of Arbitration Clauses in International Commercial Contracts. 58 International Review of Law and Economics 6 (2019). It is a widely held assumption that sophisticated parties prefer arbitration over litigation in international agreements for three reasons. First, the flexibility granted by arbitration would allow parties to write dispute settlement clauses that are tailored to their individual preferences. Second, concerns about home bias would provide incentives to remove the dispute settlement process from either party’s domestic judicial system. And third, a greater ease of enforcement would cause parties to prefer arbitration over litigation. This study examines the validity of these theoretical claims, relying on over half a million contracts filed with the SEC between 2000 and 2016. The results suggest that arbitration clauses are less frequently adopted than clauses referring parties to the domestic court system. If they are included, arbitration clauses serve the specific purpose of strategically reducing the discretion granted to the courts enforcing the decision. Absent serious threats to enforcement, parties prefer courts over arbitration, making arbitration a second-best alternative to a well-functioning domestic judiciary. [SSRN]
Giving the Treaty a Purpose: Comparing the Durability of Treaties and Executive Agreements. 113 American Journal of International Law 54 (2019). Scholars have argued that Senate-approved treaties are becoming increasingly irrelevant in the United States, because their role can be fulfilled by their close but less politically costly cousin, the congressional-executive agreement. This study demonstrates that treaties are more durable than congressional-executive agreements, supporting the view that there are qualitative differences between the two instruments. Abandoning the treaty may therefore lead to unintended consequences by reducing the set of tools that the executive has available to design optimal agreements. [SSRN / Appendix] [Video]
Conforming Against Expectations: The Formalism of Nonlawyers at the WTO. 48 The Journal of Legal Studies 341 (2019). With Jerome Hsiang. There is a long-standing debate about the relative merits of lawyers and nonlawyers as adjudicators in international dispute settlement. Some argue that lawyers encourage predictability and coherence in jurisprudence. Others believe that nonlawyers better protect state interests. Both sides of the debate assume that lawyers are more formalist and nonlawyers more instrumentalist. However, this assumption has never been empirically verified. Combining multiple imputation, matching, and post-matching regression analysis, we find that panel chairs who lack both law degrees and substantial experience make greater efforts than lawyers to signal adherence to formalist rules and competence in the World Trade Organization’s jurisprudence. The Appellate Body deems the signal credible, in turn rewarding inexperienced nonlawyers with a decrease in reversal rates. Our findings suggest that nonlawyers display levels of formalism that are similar to (if not greater than) those of lawyers, which calls into question one of the classical reservations against nonlawyers serving in adjudicatory positions. [SSRN / Online Appendix]
A Computational Analysis of Constitutional Polarization. 105 Cornell Law Review 1 (2019). With David Pozen and Eric Talley. This Article is the first to use computational methods to investigate the ideological and partisan structure of constitutional discourse outside the courts. We apply a range of machine-learning and text-analysis techniques to a newly available data set comprising all remarks made on the U.S. House and Senate floors from 1873 to 2016, as well as a collection of more recent newspaper editorials. Among other findings, we demonstrate: (1) that constitutional discourse has grown increasingly polarized over the past four decades; (2) that polarization has grown faster in constitutional discourse than in non-constitutional discourse; (3) that conservative-leaning speakers have driven this trend; (4) that members of Congress whose political party does not control the presidency or their own chamber are significantly more likely to invoke the Constitution in some, but not all, contexts; and (5) that contemporary conservative legislators have developed an especially coherent constitutional vocabulary, with which they have come to “own” not only terms associated with the document’s original meaning but also terms associated with textual provisions such as the First Amendment. Above and beyond these concrete contributions, this Article demonstrates the potential for computational methods to advance the study of constitutional history, politics, and culture. [SSRN / Online Appendix]