The National Conference of State Legislatures (NCSL) publishes information on its website about the gender and party composition of state legislatures from 2009 to 2016.
The source data was available in PDF and HTML formats, so it required additional processing before it was usable.
Seeking machine-readable versions of the NCSL data, I found an open source GitHub repository, but it hadn’t been updated in a few years. Happy to contribute to an open source project, I sought to update the repository with the most recent years’ data.
Through a Twitter conversation with the repository owner, I learned the existing conversion process involved software called Tabula. When I used Tabula to convert the most recent two years of party composition data, I ran into issues.
@derekwillis @NCSLorg not very well, at least with tabula - they end up looking like this: https://t.co/Epc9uFKfv4 pic.twitter.com/DraBGzcT8v
— MJ Rossetti (@s2t2) August 15, 2016
In response to these PDF-conversion issues, the owner suggested the pdftotext command-line utility, which ultimately produced adequate TXT files.
@s2t2 @NCSLorg looks much better using pdftotext -layout pic.twitter.com/Lvl98WUZ8c
— Derek Willis (@derekwillis) August 15, 2016
After writing a Ruby script leveraging the pdftotext library to convert the PDF files to TXT format, I wrote scripts to convert the TXT files to CSV and JSON formats. I then wrote scripts to convert the gender composition HTML tables to CSV and JSON formats.
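The PDF-to-TXT-to-CSV/JSON steps can be sketched roughly as follows. This is a minimal illustration rather than the actual scripts from the repository: the method names, the header list passed in by the caller, and the rule of splitting cells on runs of two or more spaces are all assumptions. The only external piece is the pdftotext command with its -layout option, as suggested in the tweet above.

```ruby
require "csv"
require "json"

# Shell out to the pdftotext utility; -layout keeps table columns aligned
# in the text output. Assumes pdftotext is installed and on the PATH.
def pdf_to_txt(pdf_path, txt_path)
  ok = system("pdftotext", "-layout", pdf_path, txt_path)
  raise "pdftotext failed for #{pdf_path}" unless ok
end

# Split one layout-preserved line into cells wherever two or more
# whitespace characters occur, so "New Hampshire" stays one cell.
def split_columns(line)
  line.strip.split(/\s{2,}/)
end

# Hypothetical parser: keep only lines with the expected number of cells,
# drop the header line itself, and pair each cell with its header.
def parse_rows(text, headers)
  text.each_line
      .map { |line| split_columns(line) }
      .select { |cells| cells.size == headers.size }
      .reject { |cells| cells == headers }
      .map { |cells| headers.zip(cells).to_h }
end

# Serialize the parsed rows as CSV text with a header row.
def rows_to_csv(rows, headers)
  CSV.generate do |csv|
    csv << headers
    rows.each { |row| csv << headers.map { |header| row[header] } }
  end
end

# Serialize the parsed rows as a JSON array of objects.
def rows_to_json(rows)
  JSON.pretty_generate(rows)
end
```

Splitting on multiple spaces is a simplification. Real layout output may contain footnotes, wrapped cells, or page headers that need extra handling, which is one reason validation checks are worthwhile.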
After initial satisfaction with the conversion results, I introduced validation checks into the process. These validations uncovered a few errors in the source data, which I communicated to NCSL via Twitter and remediated by updating the conversion scripts.
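As an illustration of the kind of check that can catch such errors, here is a minimal sketch. It assumes each converted row carries a total seat count plus per-party counts, and that the party counts should sum to the total. The field names ("state", "total", "dem", "rep", "other") and the rule itself are hypothetical, not necessarily the validations the conversion scripts actually perform.

```ruby
# Hypothetical row shape:
#   { "state" => "...", "total" => "100", "dem" => "60", "rep" => "40", "other" => "0" }
# "other" is assumed to cover independents and vacancies; missing
# party fields are treated as zero.
def validation_errors(rows)
  rows.each_with_object([]) do |row, errors|
    counts = %w[dem rep other].map { |key| Integer(row.fetch(key, 0)) }
    total = Integer(row.fetch("total"))

    # A negative count can only come from a conversion or source error.
    errors << "#{row["state"]}: negative seat count" if counts.any?(&:negative?)

    # The party breakdown should account for every seat.
    unless counts.sum == total
      errors << "#{row["state"]}: parties sum to #{counts.sum}, expected #{total}"
    end
  end
end
```

An empty result means every row passed. Any message points at a row worth comparing against the original PDF or HTML table.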
When satisfied with the validation effort, I submitted a pull request to add the most recent data and the automated conversion scripts to the original repository.
After producing a full complement of machine-readable data, I created an interactive dashboard to consume the data and aid in exploration.
The sections below contain findings from my analysis of 2016 NCSL Legislature Composition data.
Despite the state’s small size, New Hampshire’s legislature is the largest (424 seats), while the 13-seat DC City Council is the smallest.
The Colorado and Vermont legislatures have the highest concentration of women (each over 40%).
The legislatures of Wyoming, Oklahoma, South Carolina, West Virginia, Alabama, and Mississippi have the highest concentration of men (each over 85%).
Nebraska’s Legislature is nonpartisan.
The state legislatures with the greatest concentration of Democratic Party members are Hawaii, the District of Columbia, and Rhode Island.
The state legislatures with the greatest concentration of Republican Party members are Wyoming, Utah, and South Dakota.