# Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I'm having an issue filtering my result dataframe with an `or` condition. I want my result `df` to extract all column `var` values that are above 0.25 or below -0.25.

The logic below gives me an ambiguous truth value error, but it works when I split the filtering into two separate operations. What is happening here? I'm not sure where to use the suggested `a.empty`, `a.bool()`, `a.item()`, `a.any()` or `a.all()`.

``````result = result[(result['var']>0.25) or (result['var']<-0.25)]
``````

### 9 Answers

The `or` and `and` Python statements require truth values. For `pandas` objects these are considered ambiguous, so you should use the “bitwise” `|` (or) or `&` (and) operations:

``````result = result[(result['var']>0.25) | (result['var']<-0.25)]
``````

These operators are overloaded for this kind of data structure to yield the element-wise `or` (or `and`).
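As a minimal sketch of what the element-wise `|` does, the snippet below builds a small frame with made-up values (the numbers are assumptions for illustration, not the asker's data) and combines the two comparisons into one boolean mask.

```python
import pandas as pd

# Hypothetical data standing in for the asker's `result` frame.
result = pd.DataFrame({"var": [0.5, 0.1, -0.3, -0.1, 0.25]})

above = result["var"] > 0.25    # boolean Series, one entry per row
below = result["var"] < -0.25   # boolean Series, one entry per row
mask = above | below            # element-wise "or" of the two Series

filtered = result[mask]         # keeps only the rows where the mask is True
print(filtered)
```

Naming the intermediate Series is optional, but it makes it easy to inspect each condition before combining them.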

Just to add some more explanation to this statement:

The exception is thrown when you want to get the `bool` of a `pandas.Series`:

``````>>> import pandas as pd
>>> x = pd.Series()
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
``````

What you hit was a place where the operator implicitly converted the operands to `bool` (you used `or` but it also happens for `and`, `if` and `while`):

``````>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
``````

Besides these four statements, there are several Python functions that hide some `bool` calls (like `any`, `all`, `filter`, …). These are normally not problematic with `pandas.Series`, but I wanted to mention them for completeness.

In your case the exception isn’t really helpful, because it doesn’t mention the right alternatives. For `and` and `or` you can use (if you want element-wise comparisons):

• ``````>>> import numpy as np
>>> np.logical_or(x, y)
``````

or simply the `|` operator:

``````>>> x | y
``````
• ``````>>> np.logical_and(x, y)
``````

or simply the `&` operator:

``````>>> x & y
``````

If you’re using the operators, make sure you set your parentheses correctly because of operator precedence.

There are several logical numpy functions which should work on `pandas.Series`.
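A small sketch of a few of these functions applied to two boolean Series; the Series `x` and `y` here are made-up inputs, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical boolean inputs covering all four combinations.
x = pd.Series([True, True, False, False])
y = pd.Series([True, False, True, False])

either = np.logical_or(x, y)        # element-wise "or"
both = np.logical_and(x, y)         # element-wise "and"
exactly_one = np.logical_xor(x, y)  # element-wise "exclusive or"
not_x = np.logical_not(x)           # element-wise negation

# The NumPy function and the `|` operator should agree.
same_as_operator = either.equals(x | y)
print(either.tolist(), both.tolist(), exactly_one.tolist(), not_x.tolist())
```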

The alternatives mentioned in the exception are more suited if you encountered it when doing `if` or `while`. I’ll briefly explain each of these:

• If you want to check if your Series is empty:

``````>>> x = pd.Series([])
>>> x.empty
True
>>> x = pd.Series([1])
>>> x.empty
False
``````

Python normally interprets the `len`gth of containers (like `list`, `tuple`, …) as truth-value if it has no explicit boolean interpretation. So if you want the python-like check, you could do: `if x.size` or `if not x.empty` instead of `if x`.

• If your `Series` contains one and only one boolean value:

``````>>> x = pd.Series([100])
>>> (x > 50).bool()
True
>>> (x < 50).bool()
False
``````
• If you want to check the first and only item of your Series (like `.bool()`, but it works even for non-boolean contents):

``````>>> x = pd.Series([100])
>>> x.item()
100
``````
• If you want to check if all or any item is not-zero, not-empty or not-False:

``````>>> x = pd.Series([0, 1, 2])
>>> x.all()   # because one element is zero
False
>>> x.any()   # because one (or more) elements are non-zero
True
``````
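To tie the alternatives above back to `if` statements, here is a minimal sketch; the Series values and the `messages` list are assumptions made up for illustration.

```python
import pandas as pd

s = pd.Series([0.3, -0.1, 0.7])  # hypothetical values

messages = []
if not s.empty:                  # use this instead of `if s:`
    messages.append("series has data")
if (s > 0.25).any():             # at least one element satisfies the condition
    messages.append("at least one value is above 0.25")
if not (s > 0.25).all():         # some element fails the condition
    messages.append("not every value is above 0.25")

single = pd.Series([100])        # exactly one element
if single.item() > 50:           # .item() extracts that single element
    messages.append("the only value is above 50")

print(messages)
```

Each `if` receives a single boolean, so no ambiguity error is raised.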

For boolean logic, use `&` and `|`.

``````import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))

>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

>>> df.loc[(df.C > 0.25) | (df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
``````

To see what is happening, you get a column of booleans for each comparison, e.g.

``````df.C > 0.25
0     True
1    False
2    False
3     True
4     True
Name: C, dtype: bool
``````

When you have multiple criteria, you get multiple boolean columns back. This is why the join logic is ambiguous: `and` and `or` need a single truth value for each operand, so you first need to reduce each column to a single boolean value, for example by checking whether any value or all values in each of the columns are True.

``````# Any value in either column is True?
(df.C > 0.25).any() or (df.C < -0.25).any()
True

# All values in either column are True?
(df.C > 0.25).all() or (df.C < -0.25).all()
False
``````

One convoluted way to achieve the same thing is to zip all of these columns together, and perform the appropriate logic.

``````>>> df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
``````

For more details, refer to Boolean Indexing in the docs.

pandas uses the bitwise `&` and `|` operators, and each condition should be wrapped in parentheses `()`.

For example, the following works:

``````data_query = data[(data['year'] >= 2005) & (data['year'] <= 2010)]
``````

But the same query without proper brackets does not:

``````data_query = data[(data['year'] >= 2005 & data['year'] <= 2010)]
``````
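The reason is operator precedence: `&` binds more tightly than `>=` and `<=`, so the unparenthesized form is read as the chained comparison `data['year'] >= (2005 & data['year']) <= 2010`, and the chain implicitly uses `and`. A minimal sketch with made-up years (the data is an assumption for illustration):

```python
import pandas as pd

data = pd.DataFrame({"year": [2000, 2005, 2008, 2012]})  # hypothetical data

# Each comparison is parenthesized, so `&` combines two boolean Series.
good = data[(data["year"] >= 2005) & (data["year"] <= 2010)]

# Without parentheses, `2005 & data["year"]` is evaluated first and the
# remaining chained comparison needs the truth value of a Series.
try:
    data[data["year"] >= 2005 & data["year"] <= 2010]
    raised = False
except ValueError:
    raised = True

print(good["year"].tolist(), raised)
```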

Alternatively, you could use the `operator` module. More detailed information is available in the Python docs.

``````import operator
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]

          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
``````

This excellent answer explains very well what is happening and provides a solution. I would like to add another solution that might be suitable in similar cases: using the `query` method:

``````result = result.query("(var > 0.25) or (var < -0.25)")
``````

(Some tests with a dataframe I’m currently working with suggest that this method is a bit slower than using the bitwise operators on series of booleans: 2 ms vs. 870 µs)
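As a quick sanity check that the two approaches select the same rows, here is a minimal sketch on a made-up frame (the values are assumptions, not the data timed above):

```python
import pandas as pd

result = pd.DataFrame({"var": [0.5, 0.1, -0.3, -0.1]})  # hypothetical data

# Inside the query string, `or` is applied element-wise by pandas.
by_query = result.query("(var > 0.25) or (var < -0.25)")

# The equivalent boolean-mask version with the bitwise operator.
by_mask = result[(result["var"] > 0.25) | (result["var"] < -0.25)]

print(by_query.equals(by_mask))
```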

A word of warning: at least one situation where this is not straightforward is when column names happen to be Python expressions. I had columns named `WT_38hph_IP_2`, `WT_38hph_input_2` and `log2(WT_38hph_IP_2/WT_38hph_input_2)` and wanted to perform the following query: `"(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"`

I obtained the following exception cascade:

• `KeyError: 'log2'`
• `UndefinedVariableError: name 'log2' is not defined`
• `ValueError: "log2" is not a supported function`

I guess this happened because the query parser was trying to make something from the first two columns instead of identifying the expression with the name of the third column.

A possible workaround is proposed here.

I encountered the same error and was stalled with a PySpark dataframe for a few days. I was able to resolve it by filling NA values with 0, since I was comparing integer values from two fields.

You need to use the bitwise operators `|` instead of `or` and `&` instead of `and` in pandas; you can’t simply use the boolean statements from Python.

For more complex filtering, create a `mask` and apply the mask to the dataframe.
Put all your query conditions in the mask and apply it.
Suppose:

``````mask = (df["col1"]>=df["col2"]) & (df["col1"]<=df["col2"])
df_new = df[mask]
``````
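A slightly fuller sketch of the mask approach, building the mask from named pieces; the column values, the range bounds, and the variable names `in_range` and `below_col2` are all made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 5, 3, 8], "col2": [2, 5, 1, 9]})  # hypothetical data

# Build each condition separately so it can be inspected on its own.
in_range = (df["col1"] >= 2) & (df["col1"] <= 8)
below_col2 = df["col1"] <= df["col2"]

# Combine the pieces into one mask and apply it once.
mask = in_range & below_col2
df_new = df[mask]
print(df_new)
```

Because each piece is already a boolean Series, the final combination needs no extra parentheses.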

One minor thing, which wasted my time.

Put the conditions (if comparing using `==` or `!=`) in parentheses; failing to do so also raises this exception. This will work:

``````df[(some condition) conditional operator (some conditions)]
``````

This will not

``````df[some condition conditional-operator some condition]
``````

I’ll try to give a benchmark of the three most common ways (also mentioned above):

``````from timeit import repeat

setup = """
import numpy as np;
import random;
x = np.linspace(0,100);
lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
"""
stmts = 'x[(x > lb) * (x <= ub)]', 'x[(x > lb) & (x <= ub)]', 'x[np.logical_and(x > lb, x <= ub)]'

for _ in range(3):
    for stmt in stmts:
        t = min(repeat(stmt, setup, number=100_000))
        print('%.4f' % t, stmt)
    print()
``````

result:

``````0.4808 x[(x > lb) * (x <= ub)]
0.4726 x[(x > lb) & (x <= ub)]
0.4904 x[np.logical_and(x > lb, x <= ub)]

0.4725 x[(x > lb) * (x <= ub)]
0.4806 x[(x > lb) & (x <= ub)]
0.5002 x[np.logical_and(x > lb, x <= ub)]

0.4781 x[(x > lb) * (x <= ub)]
0.4336 x[(x > lb) & (x <= ub)]
0.4974 x[np.logical_and(x > lb, x <= ub)]
``````
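The timings only compare speed; as a minimal sketch, the three forms can also be checked to select the same elements of a NumPy array. The fixed bounds below are assumptions chosen for reproducibility, in place of the random ones used in the benchmark.

```python
import numpy as np

x = np.linspace(0, 100)   # 50 evenly spaced values, as in the benchmark
lb, ub = 20.0, 60.0       # hypothetical fixed bounds

by_star = x[(x > lb) * (x <= ub)]             # product of boolean arrays
by_amp = x[(x > lb) & (x <= ub)]              # bitwise "and"
by_func = x[np.logical_and(x > lb, x <= ub)]  # NumPy logical function

print(np.array_equal(by_star, by_amp), np.array_equal(by_amp, by_func))
```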

But `*` is not supported on a pandas Series, and a NumPy array is much faster than a pandas DataFrame (the DataFrame version is roughly 400 times slower per call in these runs; note the smaller `number`):

``````from timeit import repeat

setup = """
import numpy as np;
import random;
import pandas as pd;
x = pd.DataFrame(np.linspace(0,100));
lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
"""
stmts = 'x[(x > lb) & (x <= ub)]', 'x[np.logical_and(x > lb, x <= ub)]'

for _ in range(3):
    for stmt in stmts:
        t = min(repeat(stmt, setup, number=100))
        print('%.4f' % t, stmt)
    print()
``````

result:

``````0.1964 x[(x > lb) & (x <= ub)]
0.1992 x[np.logical_and(x > lb, x <= ub)]

0.2018 x[(x > lb) & (x <= ub)]
0.1838 x[np.logical_and(x > lb, x <= ub)]

0.1871 x[(x > lb) & (x <= ub)]
0.1883 x[np.logical_and(x > lb, x <= ub)]
``````

Note: adding one line of code `x = x.to_numpy()` will need about 20 µs.

For those who prefer `%timeit`:

``````import numpy as np
import pandas as pd
import random

lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
lb, ub
x = pd.DataFrame(np.linspace(0,100))

def asterisk(x):
    x = x.to_numpy()
    return x[(x > lb) * (x <= ub)]

def and_symbol(x):
    x = x.to_numpy()
    return x[(x > lb) & (x <= ub)]

def numpy_logical(x):
    x = x.to_numpy()
    return x[np.logical_and(x > lb, x <= ub)]

# %timeit is an IPython magic, so run this in IPython or Jupyter.
for i in range(3):
    %timeit asterisk(x)
    %timeit and_symbol(x)
    %timeit numpy_logical(x)
    print('\n')
``````

result:

``````23 µs ± 3.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
35.6 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
31.3 µs ± 8.9 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

21.4 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
21.9 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
21.7 µs ± 500 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

25.1 µs ± 3.71 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
36.8 µs ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
28.2 µs ± 5.97 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
``````