Authors: Jennifer Nou, Julian Nyarko

This codebook describes the data set "regulation_section.csv" used in Jennifer Nou, Julian Nyarko, "Regulatory Diffusion", 74 Stanford Law Review 897 (2022).

Description: The data set contains all paragraphs scraped from the .xml files of the Federal Register between January 1, 2000 and December 31, 2020.

The variables are defined as follows:

text: The text of the paragraph
nw_text: The number of words in the paragraph
nchar_text: The number of characters in the paragraph
heading: The heading
section_subject: The subject of the section in which the paragraph appears. See FR documentation for the tag
section_number: The section number in which the paragraph appears. See FR documentation for the tag
agency: The agencies listed in the FR, with minor spelling errors etc. corrected by us. See FR documentation for the tag
sub-agency: The sub-agencies listed in the FR, with minor spelling errors etc. corrected by us. See FR documentation for the tag
authority: The cited authority. See FR documentation for the tag
amdpar: The amendatory instructions. See FR documentation for the tag
tech_sum: Ignore
nonsub_sum: Ignore
republish_sum: Does the summary text identify this regulatory text as a republication?
dates: The "effective on" date in the FR. See FR documentation for the tag
act: The act identified in the FR. See FR documentation for the tag
part: The part under which this paragraph appears. See FR documentation for the tag
title: The title under which this paragraph appears. See FR documentation for the tag
section_counter: A counter that identifies individual sections. Resets for every URL.
url: The URL of the .xml file that was scraped to get the paragraph
year: The year in which the paragraph appeared in the FR, as pulled from the URL
agency_bureau: A combination of the <agency> and <bureau> fields, joined by (+++). This should probably be deleted to decrease the filesize.
id: A unique identifier for each section. It is extremely long and not data-science-y, as it is a combination of agency_bureau, url and section_counter. Should probably be deleted to decrease filesize.
interesting: Is this paragraph substantive?
procedural: Is this paragraph procedural?
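
Because the codebook marks tech_sum and nonsub_sum as "Ignore" and describes agency_bureau and id as redundant, a reader may want to drop those columns when loading the file. The following is a minimal sketch of one way to do that with Python's standard csv module, not the authors' own code. It assumes the separator inside agency_bureau is the literal string "+++"; the helper names slim_rows and split_agency_bureau are hypothetical, and the sample row is invented for illustration.

```python
import csv
import io

# Assumption: agency_bureau joins its two parts with the literal string "+++".
SEPARATOR = "+++"
# Columns the codebook marks as "Ignore" or as candidates for deletion.
DROP_COLUMNS = ("agency_bureau", "id", "tech_sum", "nonsub_sum")


def slim_rows(handle, drop=DROP_COLUMNS):
    """Yield each CSV row as a dict without the columns named in `drop`."""
    for row in csv.DictReader(handle):
        yield {key: value for key, value in row.items() if key not in drop}


def split_agency_bureau(value, sep=SEPARATOR):
    """Split an agency_bureau value into (agency, bureau) at the first separator."""
    agency, _, bureau = value.partition(sep)
    return agency, bureau


# Tiny in-memory stand-in for regulation_section.csv (all values are invented).
sample = io.StringIO(
    "text,agency,agency_bureau,section_counter,url,id\n"
    "Example paragraph.,AGENCY A,AGENCY A+++BUREAU B,1,"
    "http://example.invalid/a.xml,long-id\n"
)
rows = list(slim_rows(sample))
```

To run this on the real file, replace the in-memory sample with a handle such as open("regulation_section.csv", newline=""), adding an encoding argument if the file requires one. Since id is described as a combination of agency_bureau, url and section_counter, a section can still be identified by url and section_counter after id is dropped, because the counter resets for every URL.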