Tuesday, February 5, 2013

How econophysics describes the income distribution

It has been a while since I last discussed a paper from econophysics, a field with what appears to be a substantial literature trying to describe the income distribution. This turns out to be quite difficult, because the goal is to do it with a single equation. What one would want to do with that equation is not clear to me, but anyway.

Maciej Jagielski and Ryszard Kutner claim success with this endeavor by essentially dividing the distribution into three parts, fitting each to a different distribution function, and then rejoining them into a single equation. But what income are they talking about, you may ask? They look at European income in 2006 and 2008, taking the data from the EU-SILC project. That still does not determine which income they are considering, as the dataset offers several different ways to define income. It is not even clear whether this is income before or after taxes, or whether it includes capital gains.

One problem the authors realized is that they need oversampling of top incomes. To take care of this, they look at the European billionaires on the Forbes list of the richest people over several years, conclude that changes in wealth must be "income," and use that, dropping all negative incomes along the way. Then they notice a large discontinuity from merging the two datasets and decide to divide the top incomes by 100 to make the joint distribution continuous. Oh boy. And this is the dataset they used for their study, believe it or not.
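To see what this rescaling does and does not change, here is a minimal Python sketch. It is not the authors' code. It assumes, hypothetically, that top incomes follow an exact Pareto tail, simulates such a sample, and estimates the Pareto exponent with the Hill estimator (my choice of estimator, not necessarily theirs). Because a Pareto tail is scale-invariant, dividing every observation and the threshold by 100 leaves the estimated exponent unchanged. What the division does change is where the whole tail sits relative to the survey data, and that is set by the arbitrary factor of 100 rather than by anything in the data.

```python
import math
import random


def hill_alpha(sample, x_min):
    """Hill (maximum-likelihood) estimate of the Pareto tail exponent alpha,
    using only the observations at or above the threshold x_min."""
    tail = [x for x in sample if x >= x_min]
    return len(tail) / sum(math.log(x / x_min) for x in tail)


random.seed(0)

# Hypothetical stand-in for the top-income sample: 5000 Pareto draws with
# alpha = 1.5, scaled so the smallest possible value is 1e6.
# random.paretovariate(alpha) returns values >= 1.
top = [1e6 * random.paretovariate(1.5) for _ in range(5000)]
alpha_raw = hill_alpha(top, x_min=1e6)

# The splicing step described above: divide every top income by 100,
# and move the tail threshold down by the same factor.
rescaled = [x / 100 for x in top]
alpha_rescaled = hill_alpha(rescaled, x_min=1e4)

# Both estimates agree (up to floating-point rounding) and sit near 1.5.
print(round(alpha_raw, 3), round(alpha_rescaled, 3))
```

So the exponent survives the rescaling, but the level of the spliced tail is simply whatever the chosen divisor makes it.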


Anonymous said...

The paper is labeled "final draft submitted to Physica A". Does this mean that it is actually getting published? Really?

ivansml said...

Yep, looks like it's been accepted: http://www.sciencedirect.com/science/article/pii/S0378437113000629

Even overlooking data problems, the whole exercise of pooling data across all 27 EU countries, with different absolute income levels and distribution patterns, is just so pointless.

Vilfredo said...

Why again are my very careful data analyses being rejected by journals when this crap gets accepted? I should try Physics journals.

PLW said...

It's easy to fit it with a single CDF. Just use enough indicator functions. The whole exercise is inane.

Anonymous said...

This paper opens so many questions. What is the refereeing process in Physics? Do physicists usually explain and document the data they use? Do researchers in Physics sometimes ask why they pursue particular research agendas? And why do they try to make simple and clear things complicated and obscure?

Kevin said...

From my experience, physicists are familiar with data coming from a lab where the interpretation is much cleaner than the typical case in economics. Explaining and documenting data is not a big focus. You explain what your apparatus/experiment was and basically don't mention how you prettied up the data.

As a physics graduate student, I'm embarrassed by papers like this.

MJandRK said...

We are strongly disappointed by your superficial reading of our publication. To make things clear:

- we analysed Total household gross income. As an economist (as we suppose you are), you should know that as far as the EU-SILC database is concerned, the definition of this variable is the most uniform and standardised among the EU countries. Definitions of other income variables in the EU-SILC database vary from country to country, and therefore these variables should not be used in our analysis;
- if you had closely examined our research, you would know that in the publication we do not assume that approximation, because “Although the Forbes empirical data only roughly estimate the wealth of billionaires, they quite well establish the billionaires’ rank, thus sufficiently justifying our approach. This is because our purpose is to classify billionaires to concrete universality class rather than finding their total incomes. Our procedure of linking data from two different bases does not violate this universality class." Once again, we only analyse the universality class, which is characterised by the Pareto exponent;
- you said "What one would want to do with that equation is not clear to me". Yes, we know that. Hence, we strongly encourage you to refer to statistical physics. There you will find how much information about a system (in our paper, household incomes) you can obtain from "that equation", which is in fact the dynamic equation for the probability distribution;
- even if we could use only the EU-SILC database (without using the Forbes database), our results would still be correct. The only change would be a different value of the Pareto exponent for the high-income class. But this change is obvious, because the EU-SILC database has only a few observations of high-income households, which is really an insufficient amount;
- it is a pity that instead of taking the time to understand our article, you criticise something you do not understand. With this in mind, we can say that tagging our research as “bad research” is a compliment for us.
To finish this para-discussion, we should add that we are not used to arguing with anonymous people. Why are you afraid to stand with an open face?

MJ and RK

Kansan said...

That response is just hilarious.

So you think every economist is well versed in that database and thus you do not need to define the variables you used?

Physical systems are not like economic ones, as people are sentient and react in different ways to their environment, and you do not control that environment. For example, policy may change. Your equation indeed has no useful application. I guess this lack of understanding by physicists is the core reason why their studies are so wrong.

And your splicing of datasets is a perfect example of how not to handle data.

Vilfredo said...

MJandRK: your paper is awful. Do not make it worse by trying to defend it.

Anonymous said...

My experience is that physicists do not take it lightly when told their work is crap. Their techniques may be fine for their field, but they do not want to understand that they do not apply to other fields. They see physics as central to all sciences, including social sciences (I do not know about humanities), and what is good for physics is good for anything. This is likely why they cannot publish outside of physics.

Economic Logician said...

MJandRK, thanks for replying to my post. I stand by my comments.

I thought physicists would be very careful about the replicability of research, and thus I was surprised to see that you never define your variables, even if they seem obvious to you. Different variables apply to different questions and policy issues. And as your paper did not seem to have a broader question than just finding an equation, let alone address a policy issue, the reader is left empty-handed.

As for splicing very different datasets without describing the data in detail, that left me speechless, especially after seeing how you did it.

SamW said...

Well, I was a humanities major and math is hard; but I am having difficulty with the prose of their defense (posted above) and the prose of the paper. It would appear that there is yet another author to the paper. Even granted that English is a second language for the claimed authors (hats off for that accomplishment) I am puzzled by the stylistic origins of the English.
I fully expect the mathematics to be beyond me, but I look forward to reading through the paper, especially as I suspect an element of satire to be buried somewhere.
The idea of economists (math-heavy sets) and physicists (math-heavy sets) sniping at each other is a Monty Python sketch in the making.