I’m using Pandas to compare the outputs of two files loaded into two data frames (uat, prod): …

```
uat = uat[['Customer Number', 'Product']]
prod = prod[['Customer Number', 'Product']]
print(uat['Customer Number'] == prod['Customer Number'])
print(uat['Product'] == prod['Product'])
print(uat == prod)
```

The first two match exactly:

```
74357    True
74356    True
Name: Customer Number, dtype: bool
74357    True
74356    True
Name: Product, dtype: bool
```

For the third print, I get an error: `Can only compare identically-labeled DataFrame objects`. If the first two compared fine, what's wrong with the third?

Thanks

### 4 Answers

Here’s a small example to demonstrate this (which only applied to DataFrames, not Series, until Pandas 0.19 where it applies to both):

```
In [1]: df1 = pd.DataFrame([[1, 2], [3, 4]])
In [2]: df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])
In [3]: df1 == df2
Exception: Can only compare identically-labeled DataFrame objects
```

One solution is to sort the index first (Note: some functions require sorted indexes):

```
In [4]: df2.sort_index(inplace=True)
In [5]: df1 == df2
Out[5]:
      0     1
0  True  True
1  True  True
```

Note: `==` is also sensitive to the order of columns, so you may have to use `sort_index(axis=1)`:

```
In [11]: df1.sort_index().sort_index(axis=1) == df2.sort_index().sort_index(axis=1)
Out[11]:
      0     1
0  True  True
1  True  True
```

Note: This can still raise (if the index/columns aren’t identically labelled after sorting).
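A minimal sketch of that failure mode and one way to guard against it. The helper name `safe_eq` is hypothetical (it is not part of pandas); it sorts both axes and only compares when the two frames end up with the same labels.

```python
import pandas as pd

def safe_eq(left, right):
    """Hypothetical helper: sort both axes, then compare only if the labels match."""
    left = left.sort_index().sort_index(axis=1)
    right = right.sort_index().sort_index(axis=1)
    same_labels = left.index.equals(right.index) and left.columns.equals(right.columns)
    if not same_labels:
        return None  # labels differ, so == would raise
    return left == right

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])  # same labels, different order
df3 = pd.DataFrame([[1, 2], [3, 4]], index=[0, 2])  # label 2 instead of 1

same_labels_result = safe_eq(df1, df2)       # element-wise boolean frame
different_labels_result = safe_eq(df1, df3)  # None: sorting cannot fix this
```

Returning `None` is just one choice for the sketch; raising a clearer error or reporting which labels differ would work as well.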

You can also try dropping the index column if it is not needed to compare:

```
print(df1.reset_index(drop=True) == df2.reset_index(drop=True))
```

I have used this same technique in a unit test like so:

```
from pandas.testing import assert_frame_equal  # pandas.util.testing is deprecated in newer pandas
assert_frame_equal(actual.reset_index(drop=True), expected.reset_index(drop=True))
```
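If the labels matter but their order does not, `assert_frame_equal` also accepts `check_like=True`, which ignores the order of the index and columns. A small sketch, with made-up frames standing in for `actual` and `expected`:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Illustrative data: same labelled values, rows and columns in a different order
expected = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
actual = pd.DataFrame({"b": [4, 3], "a": [2, 1]}, index=[1, 0])

# Passes: check_like=True ignores the order of index and columns
assert_frame_equal(actual, expected, check_like=True)
```

Unlike `reset_index(drop=True)`, this still compares rows by label rather than by position.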

At the time this question was asked there wasn't another function in Pandas to test equality, but one has since been added: `DataFrame.equals`

You use it like this:

```
df1.equals(df2)
```

Some differences to `==` are:

- You don't get the error described in the question
- It returns a simple boolean
- NaN values in the same location are considered equal
- The 2 DataFrames need to have the same `dtype` to be considered equal, see this stackoverflow question
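The differences listed above can be seen in a short sketch. The frames here are made up for illustration:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan]})
b = pd.DataFrame({"x": [1.0, np.nan]})

elementwise = a == b       # NaN != NaN, so the second row compares as False
whole_frame = a.equals(b)  # NaN in the same location counts as equal

ints = pd.DataFrame({"x": [1, 2]})
floats = pd.DataFrame({"x": [1.0, 2.0]})
values_match = (ints == floats).all().all()  # the values compare equal element-wise
dtype_sensitive = ints.equals(floats)        # but equals() also compares dtypes

# Differently ordered index: no exception, equals() simply reports a mismatch
reordered = pd.DataFrame({"x": [2, 1]}, index=[1, 0])
no_error = ints.equals(reordered)

print(whole_frame, dtype_sensitive, no_error)
```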

When you compare two DataFrames, you must ensure that the number of records in the first DataFrame matches the number of records in the second DataFrame. In the example below, each of the two DataFrames has 4 records, with 4 products and 4 prices.

If, for example, one of the DataFrames had 5 products, while the other DataFrame had 4 products, and you tried to run the comparison, you would get the following error:

**ValueError: Can only compare identically-labeled Series objects**
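A small sketch that reproduces this with two Series of different lengths (the values are illustrative), plus one possible workaround that compares only the labels both sides share:

```python
import pandas as pd

four = pd.Series([1200, 800, 200, 350])
five = pd.Series([1200, 800, 200, 350, 100])

try:
    four == five
    message = None
except ValueError as exc:
    message = str(exc)  # the "identically-labeled" error from above

# Restrict both Series to their common labels before comparing.
# Whether silently ignoring the extra row is acceptable depends on the data.
common = four.index.intersection(five.index)
matches = four.loc[common] == five.loc[common]
print(message)
print(matches)
```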

This should work:

```
import pandas as pd
import numpy as np

firstProductSet = {
    'Product1': ['Computer', 'Phone', 'Printer', 'Desk'],
    'Price1': [1200, 800, 200, 350],
}
df1 = pd.DataFrame(firstProductSet, columns=['Product1', 'Price1'])

secondProductSet = {
    'Product2': ['Computer', 'Phone', 'Printer', 'Desk'],
    'Price2': [900, 800, 300, 350],
}
df2 = pd.DataFrame(secondProductSet, columns=['Product2', 'Price2'])

# Add the Price2 column from df2 to df1
df1['Price2'] = df2['Price2']
# New column in df1: do the prices match?
df1['pricesMatch?'] = np.where(df1['Price1'] == df2['Price2'], 'True', 'False')
# New column in df1: price difference (0 where the prices match)
df1['priceDiff?'] = np.where(df1['Price1'] == df2['Price2'], 0, df1['Price1'] - df2['Price2'])
print(df1)
```

Example from https://datatofish.com/compare-values-dataframes/