An Exercise in Elementary Statistics: Application to Italian Electoral Polls

As political elections in Italy are getting close, several blogs and sites are publishing predictions, especially about the outcome in the Senate where, thanks to the peculiar electoral law, it is quite possible that no clear majority will emerge.

So I could not resist making my own prediction. I’ll start by giving you the results (Senate seats):

 PdL   PD    SA    UDC   Others
 162   131   17    4     1

This is indeed a very close call, possibly not enough for PdL to govern the country: Berlusconi would have only four seats more than Prodi obtained in the last elections. Prodi, moreover, initially had on his side all seven senators nominated for life by the President, who are not counted above. In fact, he got 165 votes in favor when he formed his government in 2006.

I should warn you, however, that I would be very surprised if this prediction turned out to be exactly right, and I’m going to tell you why. But before that, I should explain how I obtained the prediction. The method is the one developed by Sandro Brusco of noiseFromAmeriKa. Essentially, one takes the 2006 Senate results, region by region, and renormalizes them using the ratio between whichever pre-electoral poll one trusts and the 2006 national results (for the lower chamber). Technically, a so-called uniform national swing is assumed: simple, though by no means guaranteed to hold. The main complication, however, is how to take into account the different political alliances of 2008. For that I used a slightly different logic than Brusco: for instance, unlike him, I allocated UDEUR votes half to UDC and half to PD, while I counted for PD only half of the votes of the former alliance between the Radicals (now with PD) and the socialists (now independent). There’s more to it, but in the end it does not amount to a large effect. I checked my results against Brusco’s system, using data from the last 7 polls. In 5 cases I obtained the same number of seats for PdL as he did, while in the other 2 cases I had 2 and 4 seats fewer, respectively. To my mind, this reflects well the quasi-chaotic behavior of the system.
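To make the renormalization step concrete, here is a minimal Python sketch of the rescaling as described above. It assumes a multiplicative ratio per party, plus a final renormalization within each region so that shares add up to 100 again. That last step is my assumption, not necessarily what Brusco does. The `MAPPING` table encodes the two reallocations mentioned above. For brevity, the 2006 shares are already grouped by 2008 alliance except for UDEUR and the Radicals plus socialists, labelled "RnP" here purely as a hypothetical name. All vote shares are made-up placeholders.

```python
# Minimal sketch of the uniform-swing step. All vote shares are hypothetical
# placeholders; the per-region renormalization is an assumption of this sketch.

def regroup(shares, mapping):
    """Map 2006 lists onto 2008 lists; mapping[old] = {new: fraction}."""
    out = {}
    for old, votes in shares.items():
        # Lists without an entry in `mapping` pass through unchanged
        for new, frac in mapping.get(old, {old: 1.0}).items():
            out[new] = out.get(new, 0.0) + votes * frac
    return out


def uniform_swing(regional_2006, national_2006, poll_2008):
    """Scale each regional share by poll_2008[p] / national_2006[p], then renormalize."""
    ratio = {p: poll_2008[p] / national_2006[p] for p in poll_2008}
    result = {}
    for region, shares in regional_2006.items():
        scaled = {p: shares[p] * ratio[p] for p in ratio}
        total = sum(scaled.values())
        result[region] = {p: 100.0 * v / total for p, v in scaled.items()}
    return result


# UDEUR: half to UDC, half to PD. "RnP" (hypothetical label for the former
# Radicals + socialists alliance): half to PD, half to Others.
MAPPING = {"UDEUR": {"UDC": 0.5, "PD": 0.5}, "RnP": {"PD": 0.5, "Others": 0.5}}

regional_2006 = {
    "RegionA": regroup({"PdL": 48.0, "PD": 38.0, "SA": 6.0, "UDC": 5.0,
                        "UDEUR": 1.0, "RnP": 2.0}, MAPPING),
}
national_2006 = regroup({"PdL": 46.0, "PD": 40.0, "SA": 6.5, "UDC": 4.5,
                         "UDEUR": 1.0, "RnP": 2.0}, MAPPING)
poll_2008 = {"PdL": 44.0, "PD": 38.0, "SA": 7.0, "UDC": 6.0, "Others": 5.0}

for region, shares in uniform_swing(regional_2006, national_2006, poll_2008).items():
    print(region, {p: round(v, 1) for p, v in shares.items()})
```

The swung regional shares would then be fed to the Senate seat-allocation rules, region by region.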

In any case, my prediction above is obtained using an average of all polls published from the beginning of March to the 26th (from Toqueville), a grand total of 41! Now, having taken care of the first-order estimate, and being a physicist, I could not refrain from having a look at the whole distribution. Here it is:

In the histogram, the bars show how many times (out of the 41 samples) PdL obtained a given number of seats in the simulation, while the line shows the cumulative percentage of samples at or below a certain number of seats. For instance, in about 50% of the cases PdL obtained 162 seats or fewer. The standard deviation is almost exactly 4 seats. Now, I believe this information adds a lot to the bare prediction. If the polls are done correctly, if the hypothesis of a uniform national swing is correct, and if I have made no additional mistakes, the conclusions are:

1) It will indeed be a close call.

2) There is no point in arguing about a difference of plus or minus 4 seats in the predictions for PdL: one cannot possibly be more precise than that.

3) It’s quite likely that the situation in the Senate will not be better for Berlusconi than it was for Prodi in 2006.
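The histogram, the cumulative line and the standard deviation can all be obtained from the list of simulated seat counts with a few lines of Python. This is a sketch: the seat counts below are made up and stand in for the 41 real ones. I also use the sample standard deviation (`statistics.stdev`), which is an assumption about how the figure of 4 above was computed.

```python
from collections import Counter
from statistics import mean, stdev


def seat_distribution(seats):
    """Histogram and cumulative % (share of samples with <= s seats)."""
    hist = Counter(seats)
    cumulative, running = {}, 0
    for s in range(min(seats), max(seats) + 1):
        running += hist.get(s, 0)
        cumulative[s] = 100.0 * running / len(seats)
    return hist, cumulative


# Hypothetical PdL seat counts, one per poll (not the real 41 values)
seats = [158, 160, 161, 162, 162, 163, 165, 167]
hist, cum = seat_distribution(seats)
for s in sorted(cum):
    print(f"{s:4d} {'#' * hist.get(s, 0):<3} {cum[s]:6.1f}%")
print(f"mean {mean(seats):.1f} seats, standard deviation {stdev(seats):.1f} seats")
```

Reading the median off the cumulative column (the first value reaching 50%) is the same operation as "about 50% of the times PdL obtained 162 seats or fewer".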

noiseFromAmeriKa, like other sites, bases its prediction on an average of the last “few” polls, in order to better reflect the evolution of public opinion. However, the spread between different polling institutes and individual polls is larger (at least over the last month) than the variation of the average, as one can see in the following picture, which shows the simulated number of seats for PdL obtained from the different polls:
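The comparison between the spread of individual polls and the variation of a running average can be sketched as follows. The window of 5 polls and the seat counts are hypothetical. I do not know exactly how many polls noiseFromAmeriKa averages.

```python
from statistics import mean, stdev


def rolling_mean(values, k):
    """Average of the last k values, for each position from the k-th onwards."""
    return [mean(values[i - k + 1:i + 1]) for i in range(k - 1, len(values))]


# Hypothetical simulated PdL seats, one per poll, sorted by poll date
seats = [165, 158, 163, 160, 166, 159, 162, 164, 157, 163]
avg = rolling_mean(seats, k=5)
print("standard deviation of individual polls:", round(stdev(seats), 1))
print("range covered by the 5-poll average:", round(max(avg) - min(avg), 1))
```

When the first number is clearly larger than the second, the poll-to-poll noise dominates over any trend the running average could pick up.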

The line is a second-order fit. The evolution of the average is negligible. However, if anything, the standard deviation seems to be decreasing (especially if one discounts the 14 March MAKNO poll, which really sticks out from the rest and has a low statistical coverage: fewer than 400 interviews). I have no idea what this means. It could be a stabilization of the vote towards the smaller parties (no evolution is visible in the bare percentage for PdL, as shown below), or, one could mischievously suspect, a tendency of the polling institutes to align their results with each other.
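A second-order fit is an ordinary least-squares fit of a parabola. The sketch below solves the 3x3 normal equations in plain Python; with NumPy, `numpy.polyfit(x, y, 2)` does the same job (it returns the coefficients highest power first). The poll dates and seat counts are hypothetical. Comparing the size of the residuals early and late in the month is one simple way to check whether the scatter is shrinking.

```python
def quadratic_fit(x, y):
    """Least-squares fit y ~ a + b*x + c*x**2; returns (a, b, c)."""
    s = [sum(xi ** k for xi in x) for k in range(5)]                   # sums of x^k
    t = [sum(xi ** k * yi for xi, yi in zip(x, y)) for k in range(3)]  # sums of x^k * y
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]      # augmented 3x4
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        coef[i] = (m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))) / m[i][i]
    return tuple(coef)


# Hypothetical data: day of March of each poll, simulated PdL seats
days = [1, 4, 7, 10, 13, 16, 19, 22, 25]
seats = [160, 166, 158, 164, 161, 163, 161, 162, 162]
a, b, c = quadratic_fit(days, seats)
residuals = [y - (a + b * x + c * x * x) for x, y in zip(days, seats)]
print(f"fit: {a:.1f} + {b:.3f}*day + {c:.4f}*day^2")
print("residuals:", [round(r, 1) for r in residuals])
```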

The last plot shows the correlation between the simulated number of seats and the percentage of votes in each poll, again for PdL. As Brusco already pointed out, the correlation is weak compared to the spread, which shows the impact of how the remaining votes are distributed among the other parties. It also shows that even with more than 45% of the votes, PdL could not be sure of obtaining a clear majority.
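The strength of such a correlation can be quantified with the Pearson coefficient and the least-squares slope, that is, the seats gained per extra percentage point of PdL vote. Here is a sketch in plain Python with hypothetical (vote %, seats) pairs. A coefficient well below 1, together with a seat spread comparable to what the slope predicts over the whole range of votes, is what a weak correlation looks like.

```python
from math import sqrt
from statistics import stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def slope(x, y):
    """Least-squares slope of y against x (seats per percentage point)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


# Hypothetical (PdL vote %, simulated PdL seats) pairs, one per poll
pct = [43.0, 43.5, 44.0, 44.5, 45.0, 45.5]
seats = [161, 158, 164, 160, 166, 161]
print(f"r = {pearson(pct, seats):.2f}")
print(f"slope = {slope(pct, seats):.1f} seats per point")
print(f"spread of seats = {stdev(seats):.1f}")
```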

I conclude with a warning: the hypothesis of a uniform national swing is far from certain. There are reasons to believe that local factors could result in an uneven redistribution at the regional level, and the electoral rules make the system very sensitive to this. On top of that, in 2006 the pre-electoral polls turned out to be not exactly reliable. Therefore, anything can happen. As is well known, it’s hard to make predictions, especially about the future.

Hat tip to Dorigo, from whom I’ve stolen half of the post title.

Update 30 March: many pre-electoral polls were published just before the blackout (no polls can be published during the last 15 days before the elections). Using the average of all polls (now 50 of them), the prediction does not change. I also updated the distribution, which I show below. Note that the lowest (156) and two of the highest (172, 173) predictions come from month-old polls (1, 2 and 3 March). The other high one (174) is the Makno poll I talked about above. I would tend to exclude them, and I report them here only for completeness.
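Excluding the stale and the small-sample polls amounts to a simple filter before averaging. The sketch below uses hypothetical poll records, with placeholder institute names, dates, sizes and values. It also computes the one-sigma sampling error under the textbook simple-random-sampling assumption, which real quota-based polls do not strictly satisfy. Under that assumption, a 50% share measured with 400 interviews carries an error of ±2.5 points.

```python
from datetime import date
from math import sqrt
from statistics import mean

# Hypothetical poll records: (institute, date, interviews, PdL %, simulated PdL seats)
POLLS = [
    ("A", date(2008, 3, 2), 1000, 44.0, 158),
    ("B", date(2008, 3, 14), 380, 46.5, 170),
    ("C", date(2008, 3, 20), 1200, 44.5, 162),
    ("D", date(2008, 3, 28), 1000, 45.0, 163),
]


def sampling_error(pct, n):
    """One-sigma error (points) on a share pct (%) from n interviews, simple random sampling."""
    p = pct / 100.0
    return 100.0 * sqrt(p * (1.0 - p) / n)


def keep(poll, since=date(2008, 3, 4), min_interviews=400):
    """Drop polls older than `since` or with fewer than `min_interviews` interviews."""
    _, day, n, _, _ = poll
    return day >= since and n >= min_interviews


selected = [p for p in POLLS if keep(p)]
print("kept:", [p[0] for p in selected], "mean seats:", mean(p[4] for p in selected))
for name, _, n, pdl, _ in POLLS:
    print(f"{name}: PdL {pdl}% +/- {sampling_error(pdl, n):.1f} points ({n} interviews)")
```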

7 Responses to “An Exercise in Elementary Statistics: Application to Italian Electoral Polls”

1. Honored. And thank you for the very nice analysis. Do you know where I can find some discussion of the weird Senate method of assigning seats, and an analysis of what someone wishing to favor PD or PdL should really vote for (such as voting for a third party if one of the two is surely in the lead in the region)?

Cheers,
T.

Thanks for appreciating it. The analysis by Brusco on noiseFromAmeriKa is very informative on the “porcellum” method for the Senate. A discussion of the options for Italians abroad (in Europe) appeared recently in Scandinaria; I commented on it in a previous post. Third-party voting is a clear option in Emilia and Toscana (SA, to subtract seats from PdL).
In other regions I think the situation is less clear.
Cheers,
Roberto

2. Ciao,
actually I vote in Veneto 🙂
I understand little of this porcellum thing though. Will have to study a bit.
Cheers,
T.

3. On a related note, it’s quite unbelievable that pollsters aren’t doing more regional polls. Small voting differences in Lazio or Piemonte can change the balance much more than big differences in the aggregate data…

Some regional polls were indeed published in the last few days, limited to the most influential regions. One must understand that each regional poll has roughly the same number of interviews as a national one, so it costs the same…
I did an exercise over lunch, including some of them. I still need to add a few more (Piemonte and Campania, for instance) and do a final check. Will post an update here. But it’s true: small changes can make a big difference, and not even be visible locally. See below for an example.

Dorigo:
You have indeed chosen to live in one of the craziest examples of the “third party porcellum strategy”.
As I mentioned, I just included regional polls for Veneto in the model. No changes from before (PdL 15, PD 9). But UDC is estimated locally at 7.9% (it was 7.2% with simple scaling from 2006). If it reached 8%, the results would be PdL 14, PD 8, UDC 2. So, if one wants to vote AGAINST PdL, the best bet would be to vote UDC, irrespective of one’s political position. Note that the number of votes needed for the swing is less than three thousand, which corresponds to ONE SINGLE PERSON in the regional poll. I think porcellum is not an undeserved label after all.
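The Veneto effect can be illustrated with a deliberately simplified model of the regional Senate allocation. In this model, lists below an 8% threshold get nothing, and seats are apportioned by largest remainders (Hare quota). If the leading list ends up with fewer than 55% of the region’s seats, it receives that many (rounded up, as assumed here) and the others share the rest. This is my simplified reading, not the full law. It treats each coalition as a single list and ignores both the different thresholds that apply inside coalitions and the special rules of a few small regions. The PdL and PD shares below are hypothetical, chosen to mimic the numbers quoted above. Only the 7.9% and 8% UDC figures and the 24 seats (15 + 9) come from the text.

```python
def hare(votes, seats):
    """Largest-remainder (Hare quota) apportionment of `seats` among `votes`."""
    total = sum(votes.values())
    quotas = {p: seats * v / total for p, v in votes.items()}
    alloc = {p: int(q) for p, q in quotas.items()}
    left = seats - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda p: quotas[p] - alloc[p], reverse=True)
    for p in by_remainder[:left]:
        alloc[p] += 1
    return alloc


def regional_seats(shares, seats, threshold=8.0, bonus_percent=55):
    """Simplified regional Senate allocation: threshold, Hare, then majority bonus."""
    eligible = {p: v for p, v in shares.items() if v >= threshold}
    alloc = hare(eligible, seats)
    winner = max(eligible, key=eligible.get)
    min_seats = (bonus_percent * seats + 99) // 100  # 55% of seats, rounded up
    if alloc[winner] < min_seats:
        # Bonus: winner gets min_seats, the other lists share the remainder
        others = {p: v for p, v in eligible.items() if p != winner}
        alloc = {winner: min_seats, **hare(others, seats - min_seats)}
    return alloc


# Hypothetical Veneto-like shares (%): only UDC's 7.9 / 8.0 come from the text
veneto_79 = {"PdL": 54.5, "PD": 33.5, "UDC": 7.9, "Others": 4.1}
veneto_80 = {"PdL": 54.5, "PD": 33.5, "UDC": 8.0, "Others": 4.0}
print(regional_seats(veneto_79, 24))  # -> {'PdL': 15, 'PD': 9}
print(regional_seats(veneto_80, 24))  # -> {'PdL': 14, 'PD': 8, 'UDC': 2}
```

With these inputs, a 0.1-point change in UDC’s share moves one seat from PdL and one from PD to UDC, which is the threshold effect described above.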

5. I would like you to look at http://www.politiche08.org, which predicts 156 for PdL and 159 for all the others in the Senate. New regional polls will be added tomorrow. I’m sorry, but it’s in Italian!

I have seen it: good job! As I commented above, I had started introducing the regional polls myself. At present I have 158 for PdL, but I am still missing some polls, and a global check. No time right now; will update later.

And, of course, I’m Italian myself. It’s a bit awkward to use English here, since all comments on this post are indeed by Italians for the moment, but I would like to keep it readable by foreigners. I reply in Italian to comments in Italian, though.
Cheers,
Roberto

6. Roberto, with tomorrow’s update I’ll add a link to this post: shall we swap links? Ciao, Paolo

Sure, see you later.
Roberto

7. […] Gravitas Free Zone Weblog floating in the web « An Exercise in Elementary Statistics: Application to Italian Electoral Polls […]