Bringing bill text into the 21st Century

Working with bill text has always been a complicated task. It took upwards of 200 years for Congress to standardize the language and terminology it uses in the text of bills and resolutions today, and it took until just the last few years for Congress to make the text of its bills available in a format embedded with machine-readable semantics.

Sometime during the 108th Congress, and in earnest beginning with the 109th, the House and Senate began making bill text available in an XML format, known colloquially as HouseXML. This XML format marks up the various portions of a bill, resolution, or amendment (there are three separate schemata) in a way that makes it easy to identify and separate each portion of a legislative document, often down to minute detail. As a bonus, the HouseXML format is extensively, though not completely, documented on the Web.
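To give a sense of what that markup makes possible, here is a minimal Python sketch that pulls the number, heading, and text out of each section of a bill. The sample document is a hand-written, heavily simplified fragment, not a real bill; the element names (section, enum, header, text, quote) follow my understanding of the House bill schema, and the list_sections helper is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the style of HouseXML (an assumption,
# not an excerpt from a real bill).
SAMPLE_BILL = """\
<bill>
  <legis-body>
    <section id="S1">
      <enum>1.</enum>
      <header>Short title</header>
      <text>This Act may be cited as the <quote>Example Act</quote>.</text>
    </section>
    <section id="S2">
      <enum>2.</enum>
      <header>Findings</header>
      <text>Congress finds the following:</text>
    </section>
  </legis-body>
</bill>
"""


def list_sections(xml_text):
    """Return (enum, header, text) tuples for every section element."""
    root = ET.fromstring(xml_text)
    sections = []
    for section in root.iter("section"):
        enum = section.findtext("enum", default="").strip()
        header = section.findtext("header", default="").strip()
        text_el = section.find("text")
        # itertext() flattens inline markup such as <quote> into plain text.
        body = "".join(text_el.itertext()).strip() if text_el is not None else ""
        sections.append((enum, header, body))
    return sections


for enum, header, body in list_sections(SAMPLE_BILL):
    print(enum, header, "|", body)
```

Real bills nest subsections, paragraphs, and clauses inside each section, so a production version would need to walk the tree recursively rather than read only the top-level text.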

Included in the markup are some cross-reference citations, generally limited to broad portions of the U.S. Code, the Statutes at Large, and Public Laws. This is useful for connecting one legislative document to another (most commonly, proposed law to existing law), but it does not provide fine-grained linking, such as to a particular section, paragraph, or clause of a law. To rectify this situation, the Cato Institute has created the DeepBills project. Beginning with the 113th Congress, the DeepBills project extends HouseXML with CatoXML and provides, among other things, direct links to the exact segment of a law that is being cited by a particular legislative document, as well as cross-references for each mention of a governmental body or office within the proposed law. This allows for extensive cross-referencing between documents and gives end users a better experience in understanding what a legislative proposal intends to accomplish.
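The difference between the two levels of citation is easier to see side by side. The sketch below collects both kinds of reference from a small hand-written fragment. It assumes that HouseXML citations appear as external-xref elements with legal-doc and parsable-cite attributes, and that CatoXML adds entity-ref elements in the namespace http://namespaces.cato.org/catoxml with entity-type plus either value or entity-id. Those names reflect my reading of the two formats and should be checked against their documentation; the entity-id shown is a placeholder, and collect_references is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed CatoXML namespace URI; verify against the DeepBills documentation.
CATO_NS = "http://namespaces.cato.org/catoxml"

# Hand-written sample; the entity-id value is a placeholder, not a real ID.
SAMPLE = """\
<section xmlns:cato="http://namespaces.cato.org/catoxml">
  <text>The <cato:entity-ref entity-type="federal-body" entity-id="0000">Department
  of Justice</cato:entity-ref> shall report on
  <cato:entity-ref entity-type="uscode" value="usc/26/501/c/3">section
  501(c)(3)</cato:entity-ref> of the
  <external-xref legal-doc="usc" parsable-cite="usc/26/501">Internal Revenue
  Code of 1986</external-xref>.</text>
</section>
"""


def collect_references(xml_text):
    """Return (source, kind, target, label) tuples in document order."""
    root = ET.fromstring(xml_text)
    refs = []
    for el in root.iter():
        # Collapse line breaks and runs of spaces inside the link text.
        label = " ".join("".join(el.itertext()).split())
        if el.tag == "external-xref":
            # Broad citation supplied by HouseXML itself.
            refs.append(("house", el.get("legal-doc"), el.get("parsable-cite"), label))
        elif el.tag == f"{{{CATO_NS}}}entity-ref":
            # Fine-grained citation or entity mention added by CatoXML.
            target = el.get("value") or el.get("entity-id")
            refs.append(("cato", el.get("entity-type"), target, label))
    return refs


for ref in collect_references(SAMPLE):
    print(ref)
```

In this sample the HouseXML citation stops at the section level (usc/26/501), while the CatoXML reference reaches down to the paragraph (usc/26/501/c/3), which is the kind of precision a deep link needs.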

Here at GovTrack, we had not, until recently, been taking advantage of the availability of this additional information.

Instead, we were simply displaying the plaintext contents of the bill, with a limited amount of styling. Now, thanks to the CongressXML utility that I’ve created, we are beginning to realize the full potential of the data available to us. Not only can we use this data to replicate the look of the paper copy of a bill, but we can also leverage the technologies of the Web to extend and enhance that look, making it easier for a reader to discern precisely how a bill is structured, how its parts relate to one another, and how it fits within the broader body of law.

In addition to linking to numerous cross-references within the bill text, we are now able to use special styling to distinguish a table of contents (useful as a navigation aid for a large bill) and large quoted sections (generally used to represent modifications to existing law) from the rest of the text of the bill. We intend to improve this styling as time goes on and as we become more familiar with the full range of how the markup is used within legislative documents.
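As a rough illustration of the idea, and not a description of how the CongressXML utility actually works, the following sketch converts a tiny bill fragment into HTML and attaches a CSS class to each structural element so that a stylesheet can treat the table of contents and quoted blocks differently. The element names (toc, toc-entry, quoted-block) follow my understanding of HouseXML; the class names and the render function are invented for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from bill elements to CSS classes.
BLOCK_CLASSES = {
    "section": "bill-section",
    "toc": "bill-toc",
    "toc-entry": "bill-toc-entry",
    "quoted-block": "bill-quoted",
    "text": "bill-text",
}

# Hand-written, simplified sample fragment.
SAMPLE = (
    "<section>"
    "<toc><toc-entry>Sec. 1. Short title.</toc-entry></toc>"
    "<text>Section 2 is amended by inserting:</text>"
    "<quoted-block><text>New language.</text></quoted-block>"
    "</section>"
)


def render(el):
    """Recursively convert an element to HTML, wrapping known tags in a div."""
    parts = [escape(el.text or "")]
    for child in el:
        parts.append(render(child))
        # Text that follows a child element belongs to the parent.
        parts.append(escape(child.tail or ""))
    inner = "".join(parts)
    css_class = BLOCK_CLASSES.get(el.tag)
    if css_class is None:
        return inner  # unknown elements contribute only their text
    return f'<div class="{css_class}">{inner}</div>'


html_out = render(ET.fromstring(SAMPLE))
print(html_out)
```

With the classes in place, a stylesheet could, for instance, indent everything in bill-quoted and give it a left border, so that proposed changes to existing law stand apart from the bill's own text.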

This post was written by GovTrack staffer Gordon Hemsley.

Thanks to the Cato Institute for their work on CatoXML. We also want to mention some related work: Eric Mill at the Sunlight Foundation has developed two projects, unitedstates/documents and unitedstates/citation, that convert bill XML into HTML and extract deep citations. — Josh


  1. Yet another positive step in the direction of transparency and informed citizenry. Thank you for keeping us informed.


