Numerical methods for determining leadership and ideology in Congress

Today I am publishing two new types of statistics for understanding the behavioral relationships between Members of Congress. The first is a new approach to the leader-follower scores, based on the same algorithm Google uses to rank pages on the web. The second statistic is an update to my political spectrum graph. New charts are presented at the end.

UPDATE 1/22/2011: These images are now posted in zoomable form here.


Bulk access to legislative information makes large-scale statistical analyses possible. GovTrack has shown over the last six years that many millions of Americans are interested in getting a deeper understanding of what laws are coming down the pipe and what their elected representatives are doing. Though statistical analyses are normally the domain of political science and economics research, when presented in a form useful to the public they become a valuable resource, among many, for citizens to be engaged with what is happening here in Washington, DC.


The first large-scale statistical analysis I did on legislative data, my 2004 political spectrum, was, in the language of statistics, a principal components analysis (PCA) of something like a term-document matrix. The idea is that Members of Congress (“terms”) who cosponsor similar sets of bills (“documents”) should be grouped together, while Members of Congress who don’t cosponsor any of the same bills should be grouped far apart. I got the idea after my undergraduate advisor suggested I write a paper on latent semantic indexing, which is based on the same idea. A similar analysis by Professor Keith Poole using voting records rather than cosponsorship produces similar results; as far as I know, I was the first to apply PCA to congressional (UPDATE:) cosponsorship behavior.

The process doesn’t look at the content of the bills or the party affiliation or anything else about the Members of Congress, but it is able to infer underlying behavioral patterns, some of which correspond to real-world concepts like left-right ideology. If you follow the link above, you’ll see that the political spectrum analysis does a good job of separating the Dems from the GOP, and within each party the moderates from the extremes. If you want to know where your representatives stand in relation to their peers ideologically, the political spectrum is a good place to start.

The second novel analysis I published was a leader-follower score. This came directly out of a suggestion from Joseph Barillari (whom I knew in college). The idea behind a leader-follower score is that if I cosponsor your bills but you do not cosponsor my bills, then I am a follower relative to you being a leader. (I formalized this as follows: To compute a leader-follower score for representative X, make a table that lists all other representatives. On each row put the following: the number of bills sponsored by X and cosponsored by the representative in that row, divided by the number of bills sponsored by the representative in that row and cosponsored by X. The higher the number, the more times others are cosponsoring X’s bills without X returning the favor. Then take the logarithm of each number, and then the mean.)
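
The formula in the parenthetical can be sketched in Python. This is only an illustration under stated assumptions: the function name, the input format (a list of sponsor/cosponsor pairs), and the toy data are made up here, and the pseudocount added to both counts is an assumption for avoiding division by zero, since the description above does not say how empty cells were handled.

```python
import math
from collections import Counter

def leader_follower_scores(bills, pseudocount=1.0):
    """bills: list of (sponsor, [cosponsors]) pairs.

    Returns {member: mean of log ratios}. The pseudocount is an
    assumption added for this sketch so that zero counts do not
    cause a division by zero or log(0).
    """
    # support[(s, c)] = number of bills sponsored by s and cosponsored by c
    support = Counter()
    members = set()
    for sponsor, cosponsors in bills:
        members.add(sponsor)
        for c in set(cosponsors):
            if c != sponsor:
                members.add(c)
                support[(sponsor, c)] += 1

    scores = {}
    for x in members:
        logs = []
        for r in members:
            if r == x:
                continue
            received = support[(x, r)] + pseudocount  # r backed x's bills
            given = support[(r, x)] + pseudocount     # x backed r's bills
            logs.append(math.log(received / given))
        scores[x] = sum(logs) / len(logs) if logs else 0.0
    return scores

# Hypothetical toy data: B and C cosponsor A's bills; A returns the favor once.
toy_bills = [
    ("A", ["B", "C"]),
    ("A", ["B"]),
    ("B", ["A"]),
    ("C", []),
]
scores = leader_follower_scores(toy_bills)
for member, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{member}: {score:+.3f}")
```

In the toy data, A collects more cosponsorships than A gives, so A ends up with the highest (most leader-like) score. A different smoothing choice, such as skipping pairs with a zero count, would change the numbers.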

New Leadership Scores

The first new statistic I am publishing today involves a completely new type of analysis of congressional behavior. The inspiration for this analysis comes from Google’s PageRank algorithm, which governs how Google ranks the order of pages in its search results. Google’s method is widely known: the more links you get, the higher your page ranks, but links from highly ranked pages count for even more. Determining a site’s ranking isn’t trivial because you need to know the ranking of all of the sites linking in, and to get their ranking you need the ranking of the sites linking to them, and on and on. Fortunately there is an elegant mathematical solution that now makes the Web go round.

Google’s PageRank works because it learns which pages are, let’s say, useful by the implicit votes of usefulness found on the web in the form of links. A link is a vote of confidence that the target website is probably useful. This idea can be adapted to any domain that we can view as a network (or “graph”).

In Congress, we can look at the network of who is cosponsoring whose bills. When a representative cosponsors a bill, it is a vote of confidence not only in that bill but also in the bill’s sponsor, or a show of loyalty to that sponsor. If we imagine each Member of Congress as a “web page,” and each cosponsorship of another Member’s bill as a link from one “web page” to the other, then the PageRank algorithm reveals a ranking of these implicit loyalties directly from the public, official behavior of the Members of Congress.
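
The network idea above can be sketched in a few lines of Python. This is an illustration, not the code behind the published scores. The function name and toy data are hypothetical, and several choices are assumptions: the 0.85 damping factor (the value commonly quoted for PageRank), weighting each link by the number of cosponsored bills, and spreading the rank of Members who cosponsor nothing evenly over everyone.

```python
def cosponsorship_pagerank(bills, damping=0.85, iterations=100):
    """bills: list of (sponsor, [cosponsors]) pairs.

    Each cosponsorship is treated as a weighted "link" from the
    cosponsor to the sponsor. damping=0.85 is the commonly quoted
    PageRank value, not a number taken from the post.
    """
    members = set()
    out_links = {}  # out_links[cosponsor][sponsor] = number of cosponsorships
    for sponsor, cosponsors in bills:
        members.add(sponsor)
        for c in set(cosponsors):
            if c == sponsor:
                continue
            members.add(c)
            targets = out_links.setdefault(c, {})
            targets[sponsor] = targets.get(sponsor, 0) + 1

    members = sorted(members)
    n = len(members)
    rank = {m: 1.0 / n for m in members}

    for _ in range(iterations):
        # Members who cosponsor nothing spread their rank evenly (assumption).
        dangling = sum(rank[m] for m in members if m not in out_links)
        new_rank = {m: (1.0 - damping) / n + damping * dangling / n
                    for m in members}
        for c, targets in out_links.items():
            total = sum(targets.values())
            for sponsor, weight in targets.items():
                new_rank[sponsor] += damping * rank[c] * weight / total
        rank = new_rank
    return rank

# Hypothetical toy data: everyone cosponsors A's bill; C also backs B's bill.
pr_bills = [
    ("A", ["B", "C", "D"]),
    ("B", ["C"]),
    ("D", []),
]
pr = cosponsorship_pagerank(pr_bills)
for member in sorted(pr, key=pr.get, reverse=True):
    print(f"{member}: {pr[member]:.3f}")
```

Repeating the update many times is the simple way around the circularity described earlier: each Member's value depends on the values of the Members who cosponsor their bills, and the loop keeps passing values along the links until they settle.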

The results of this Congressional PageRank-style Leadership Analysis run over the last two years of sponsorship data look roughly right. In the Senate, the highest value is given to Harry Reid, the Majority Leader. The Minority Leader, Mitch McConnell, has nearly the highest value among the Republicans. In the House, the leadership values are overall relatively low for the Speaker, party leaders, and party whips. I can only guess why the Senate and House differ in this way. One of the lowest values in the House was given to little-known Rep. Chaka Fattah (PA2), my former congressman, though famous recently for his unique idea of replacing the income tax with a transaction tax.

The results are similar to the old leader-follower scores.

New Political Spectrum

I am also presenting an update to the political spectrum using the same PCA method but based on a different underlying term-document matrix. In the original version, the terms were Members of Congress and the documents were bills. Basically, you form a matrix (a grid of numbers) with columns representing the representatives and rows representing the bills, and put a 1 in each cell where the representative (co)sponsored the bill (and zeros everywhere else). Then you do the PCA magic (UPDATE: singular value decomposition). In the new version, the documents are also Members of Congress. Here the matrix’s rows are also members, and I put a 1 in each cell where the representative for the column cosponsored any bill of the representative for the row (and zeros everywhere else).
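
As a rough sketch of the new member-by-member matrix, the Python below builds the grid of 1s and 0s and extracts a single left-right axis. It makes several assumptions that are not from the post: each row is centered before the decomposition, plain power iteration stands in for a full singular value decomposition (so only the first axis is recovered), and the function name, the `left_anchor` parameter, and the toy Members are hypothetical.

```python
import math

def spectrum_coordinates(bills, left_anchor=None, iterations=200):
    """bills: list of (sponsor, [cosponsors]) pairs.

    Returns {member: coordinate on the first principal axis}.
    Power iteration replaces a full SVD in this sketch.
    """
    members = sorted({s for s, _ in bills} | {c for _, cs in bills for c in cs})
    index = {m: i for i, m in enumerate(members)}
    n = len(members)

    # matrix[row][col] = 1 if the column member cosponsored any bill
    # sponsored by the row member, else 0.
    matrix = [[0.0] * n for _ in range(n)]
    for sponsor, cosponsors in bills:
        for c in cosponsors:
            if c != sponsor:
                matrix[index[sponsor]][index[c]] = 1.0

    # Center each row (assumption) so the leading axis reflects differences
    # between members rather than overall activity.
    centered = []
    for row in matrix:
        mean = sum(row) / n
        centered.append([x - mean for x in row])

    # Power iteration on M^T M finds the leading right singular vector,
    # which holds one coordinate per column member.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in centered]          # M v
        v = [sum(centered[r][j] * w[r] for r in range(n)) for j in range(n)]  # M^T w
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        v = [x / norm for x in v]

    # The sign of the axis is arbitrary; optionally put one member on the left.
    if left_anchor is not None and v[index[left_anchor]] > 0:
        v = [-x for x in v]
    return dict(zip(members, v))

# Hypothetical toy data: two blocs, with R1 cosponsoring one bill across the aisle.
spectrum_bills = [
    ("D1", ["D2", "D3"]),
    ("D2", ["D1", "D3"]),
    ("D3", ["D1", "D2", "R1"]),
    ("R1", ["R2", "R3"]),
    ("R2", ["R1", "R3"]),
    ("R3", ["R1", "R2"]),
]
coords = spectrum_coordinates(spectrum_bills, left_anchor="D1")
for member, x in sorted(coords.items(), key=lambda kv: kv[1]):
    print(f"{member}: {x:+.3f}")
```

In the toy data the two blocs land on opposite sides of zero, and R1, the only Member with a cross-bloc cosponsorship, sits closer to the middle than R2 and R3. Note that nothing in the input identifies party; the separation comes only from who cosponsors whom.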

The results are similar to the old political spectrum. I don’t believe there are any particular benefits of this new method, except that its formulation is more parallel to the new Leadership scores than the old political spectrum formulation.

New Charts

Well, finally, here are some graphics. Each chart below is a scatterplot of Members of Congress. The x-axis is the political spectrum value from the new method (oriented with Democrats on the left; color indicates party for reference). The y-axis is the new Leadership score. In other words, we’d expect Democratic leaders to be in the top left; GOP leaders in the top right; GOP followers in the bottom right; and so on. The first chart is for the Senate, the second for the House.

I’ve additionally labeled in green the leadership positions in the Senate and House so you can easily locate those folks. Again, it seems to work well in the Senate, not so much in the House.



  1. It’ll be interesting to compare these charts to those for a year or two from now. If you created weekly snapshots you might be able to create an animation showing change over time or in relation to particular legislative debates and votes.


  2. Great job.

    I agree with the first comment — a moving bubble chart, with each rep/sen being his/her own bubble, would be extremely interesting — but as Paul mentioned, you need more data.


  3. Awesome research. Have you looked at combining the two methods?

    Since there are a lot of dead-beat members in both the chambers, start by tracking the bills and amendments offered by each member (leadership), then track follower loyalty by how often colleagues voted with/against.


  4. I am a registered Independent. I am a personal friend to Brian Baird. We will miss his representation dearly. Are you a Representative, or do you lean in the Leadership direction? I lean in the Representative direction. I need no leader. There is far far too much leadership going on in D.C. Hopefully you will learn to be a great listener like Brian.

    In closing, I would ask that you not join the obstructionists in congress. It will be a quick way to lose my support. Please remember you are acting in my name, so I’ll be watching your voting record closely.


  5. I’m reading the comments for this post and laughing out loud. Mr. Tauberer must be reading these with his jaw hanging open. After creating a stunning presentation, the first thing he hears is, “Yes, but it could be awesome if you only…” We’re NEVER satisfied!
    Great work.


  6. I think that this helped a lot and we should wait and compare this chart with future charts and see the difference. I now understand the bubbles and it also needs more data.


  7. The ‘Political Spectrum’ is very informative, but the one for the House is very difficult to read. Is it possible to provide a high resolution version of the image? And is there a way to keep track of when the Political spectrum is updated? Also, it would be good to have a date shown within the image. One more thing, could the image for the House have a different file name than the one for the Senate? Many thanks for your innovative work, Ricky


  8. Josh, in your assignment of ideological position, did you take into account more than one continuum, or set of polarities with a sliding scale between? As a Neo-Freudian, I see (at least) two continua, 1) one to chart the degree to which any political unit opts EITHER a) to share OR b) to exclusively claim/possess “Mother,” (by extension) “Mother Earth”/matter/resources/the material means to life; and 2) a second continuum/criteria, a range of preferences/expectations regarding sovereignty, which obviously are in keeping with the (either/or)/dichotomous outcomes/resolutions of the “infant vs. father aspect” of anyone’s relationship with “Father,” a.k.a. (again, by extension) authority. Typically, infantile expression of will creates chaos, whereas the father -or mother as father- strives to restore (“top-down”) order. This struggle lays the groundwork for political disagreement regarding a citizen’s alleged sovereignty. The LEFT champions and trusts the individual’s freely expressed will, whereas the RIGHT strives to force patriotic submission to the “Fatherland,” (to “one nation [firmly] UNDER God,” meaning, if you ask the Religious Right, not the progressive God of brotherly love, but the warring, dictatorial God of Olden Times!) all of which is -for the Left- far too reminiscent of every crying infant’s terrifying threat of castration to be tolerable.) Both of these continua (I made them the x and y axes of a grid) have one pole (left or lower) predisposing LIBERAL action (more inclined than the other pole toward anarchy) and one pole predisposing CONSERVATIVE (right or higher) on the grid. Make a chart like mine, and you’ll find the anti-egalitarian, anti-democracy push for authoritarianism, (“might/money makes right” and the endless war necessary to maintain that belief) which is scaring and motivating peace-seeking “rebels” all over the world right now, in the upper right corner.


  9. Josh, can you tell me where I can learn more about the “several thousand implicit continuums” upon which you base your analysis/analyses? I am very interested in how you determined how far to the right or left each person on your chart actually is. Re-reading my previous post, I must acknowledge I did “run off on a tangent” there! -But I’m not quite sorry; I have thought that too little gets said about what distinguishes “the LEFT” from “the RIGHT,” and what inclines individuals to ally themselves with one or the other, ever since my college political theory instructor (who could be exasperating)set his jaw and refused to define “left” and “right!” -But I am sincerely, keenly interested in others’ analyses and graphic representation of those concepts, so please do tell me -email me privately, if you wish!- what identifying criteria and labeling/sorting technique(s) did you use?


    1. If you follow the link to the political spectrum above, then look below the charts, there is a longer explanation. For more, I recommend looking up some of the terms on Wikipedia. It involves advanced statistical techniques that are hard to explain in a few words, except that, basically, Members of Congress with similar patterns of cosponsorship are put close together, and Members of Congress with more different patterns of cosponsorship are put far apart on the chart. Importantly, there is no scoring technique, which has certain benefits like not relying on any person’s particular definition of left/right. (It also has some drawbacks, especially that it is harder to interpret what the chart means when you don’t start with a definition.)


  10. I believe leadership can only be scored on bills enacted, not just sponsored. Sponsorship shows an ability to think, enactment shows an ability to deliver. Presumably, bills receiving bi-partisan sponsorship are more likely to be enacted. The effectiveness of leadership (bills enacted) probably represents the will of the people instead of special interests. I’d support a constitutional amendment requiring future bills have at least one co-sponsor from the opposing party, but wonder how such legislation would impact the trend.


  11. Go Josh Go!!!
    Watching the Occupy Wall Street (OWS) situation, I can see that paying attention to congress (via your analysis) would be far more beneficial than going camping. Voting for leaders in our government will solve a lot more problems. It is not as dramatic, but it would be so much more effective.
    Your thoughts.


  12. Stan Dombrowski,

    A constitutional amendment requiring future bills have at least one co-sponsor from the opposing party assumes that more (in number) bipartisan laws would be uniformly positive for supporting the tenets of the Constitution and maintaining the rights of the citizenry. While we currently have a de facto two party system, which some could argue is in itself detrimental to the health of our democracy, that does not preclude the rise of alternative parties. In addition to adding yet another hurdle to enacting legislation, would the amendment also force us to remain two party ad infinitum or would it have a provision for three or more parties?


  13. “I’d support a constitutional amendment requiring future bills have at least one co-sponsor from the opposing party”

    That’s an unworkable idea, the Republicans would refuse to co-sign any legislation.


  14. I’m just a kid, seriously a 14 year old boy, from rich Scarsdale NY with dreams of running for congress. And this is the greatest thing ever. It’s helped open my eyes to the admirable politicians, the ideological and leadership qualities of not only certain politicians but the parties themselves. Thanks to this I can now study up on things like higher leadership and more centrist politicians to see what smart political moves can be made for good outcomes. Just don’t tell congress, they might have you take it down over the course of a decade!


  15. This is super-cool stuff! Did you do your analyses in R? The plots are making me think so…. You addressed the pros and cons of your purely empirical approach to ideological classification well; I just have to chime in that the benefit of not imposing a theory and letting the data drive the results can’t be overstated. To my mind, though it’s not quite carving nature at the joints, PCA is ideal for this sort of application, and it gives results as good as the data that are fed into it. I wonder what other kinds of data one could use for similar analyses….


  16. Thanks Josh, Great Job!!
    This helps to see who are the representatives in the center to contact and support to help pull the rest of the extremes together so we can try to get something done.
    The idea of changing over time would be great as well but it only comes about from looking at your original chart. Very helpful. The Founding Fathers would approve.


  17. Josh, I am a journalist interested in the Berman Sherman election and find your analysis most helpful. I have read your explanation but am unclear on the factors that made Sherman “rank and file” and Berman a “leader.” Was it because Berman had more and influential cosigners on his legislation?


  18. Bill- That’s a good way to explain what “leader” means here, yes. Just keep in mind that “influential” in this case just means other people with high leadership scores.

    As for the labels, “far left/right”, “rank-and-file”, and “centrist” are applied based on each person’s percentile compared to the other Members of Congress in their party and in their chamber on the *ideology* dimension. The extreme 20% is “far left/right”, the middle 60% is labeled “rank-and-file”, and the moderate 20% are “centrist.” Independents are labeled “left-leaning”/“moderate”/“right-leaning” according to the same 20/60/20 split, but when compared across both parties.

    Likewise, “leader” is applied to the top 20% on the *leadership* dimension. The terms “lonely” and “follower” are applied to the bottom 20%. “Lonely” is used for the “far left/right” people and “follower” for the others.

