
DataFrame methods

_obj = pandas_obj instance-attribute

__init__(pandas_obj)

assert_all_nulls(fail_message=' ㄨ Assert all nulls failed ', pass_message=' ✔️ Assert all nulls passed ', subset=None, raise_exception=True, exception_to_raise=DataError, verbose=False)

Tests whether a DataFrame or a subset of its columns contains only nulls. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    (
        iris
        .check.assert_all_nulls(subset=["sepal_length"])
    )

    # Will raise an exception "ㄨ Assert all nulls failed"

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
fail_message str

Message to display if the condition fails.

' ㄨ Assert all nulls failed '
pass_message str

Message to display if the condition passes.

' ✔️ Assert all nulls passed '
subset Union[str, List, None]

Optional, which column or columns to check the condition against.

None
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

DataError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.
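As a rough illustration of the condition being tested, here is a plain-pandas sketch that does not use Pandas Checks itself. The helper name `all_nulls` and the toy DataFrame are hypothetical, and the library's internal implementation may differ.

```python
import pandas as pd

def all_nulls(df, subset=None):
    """Hypothetical helper: True if every value in df (or the chosen columns) is null."""
    data = df if subset is None else df[subset]
    # isna() flags each null cell; to_numpy().all() works for Series and DataFrames alike
    return bool(data.isna().to_numpy().all())

toy = pd.DataFrame({"a": [None, None], "b": [1.0, None]})

result_a = all_nulls(toy, subset="a")          # column "a" holds only nulls
result_ab = all_nulls(toy, subset=["a", "b"])  # column "b" has one non-null value
```

With this toy data, the single-column check is satisfied while the two-column check is not, which is the situation in which `assert_all_nulls` would report a failure.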

assert_data(condition, fail_message=' ㄨ Assertion failed ', pass_message=' ✔️ Assertion passed ', subset=None, raise_exception=True, exception_to_raise=DataError, message_shows_condition=True, verbose=False)

Tests whether a DataFrame meets a condition. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    # Validate that the Dataframe has at least 2 rows

    (
        iris
        .check.assert_data(lambda df: df.shape[0]>1)

        # Or customize the message displayed when the assertion fails
        .check.assert_data(lambda df: df.shape[0]>1, "Assertion failed, DataFrame has fewer than 2 rows!")

        # Or show a warning instead of raising an exception
        .check.assert_data(lambda df: df.shape[0]>1, "FYI DataFrame has fewer than 2 rows", raise_exception=False)

        # Or show a message if it passes, and raise a specific exception (ValueError) if it fails.
        .check.assert_data(
            lambda df: df.shape[0]>1,
            fail_message="FYI DataFrame has fewer than 2 rows",
            pass_message="DataFrame has at least 2 rows!",
            exception_to_raise=ValueError,
            verbose=True # To show pass_message when assertion passes
            )
    )

Parameters:

Name Type Description Default
condition Callable

Assertion criteria in the form of a lambda function, such as lambda df: df.shape[0]>10.

required
fail_message str

Message to display if the condition fails.

' ㄨ Assertion failed '
pass_message str

Message to display if the condition passes.

' ✔️ Assertion passed '
subset Union[str, List, None]

Optional, which column or columns to check the condition against. Applied after fn. Subsetting can also be done within the condition, such as lambda df: df['column_name'].sum()>10

None
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

DataError
message_shows_condition bool

Whether the fail/pass message should also print the assertion criteria.

True
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.
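The pass-through behavior described above (test a condition, optionally raise, return the data untouched) can be sketched in plain pandas with `DataFrame.pipe`. This is only an illustration under stated assumptions: `check_condition` is a hypothetical helper, not part of Pandas Checks, and it uses `ValueError` rather than the library's default `DataError`.

```python
import pandas as pd

def check_condition(df, condition, fail_message="Assertion failed",
                    raise_exception=True, exception_to_raise=ValueError):
    """Hypothetical helper: test a condition, then pass the DataFrame through."""
    if not condition(df):
        if raise_exception:
            raise exception_to_raise(fail_message)
        print(fail_message)  # warn only, do not interrupt the chain
    return df  # returned unchanged so the method chain can continue

toy = pd.DataFrame({"x": [1, 2, 3]})

# Passing check: the chain continues with the same data
passed = toy.pipe(check_condition, lambda df: df.shape[0] > 1)

# Failing check with raise_exception=False: prints a warning, still returns the data
warned = toy.pipe(check_condition, lambda df: df.shape[0] > 10,
                  "FYI fewer than 11 rows", raise_exception=False)

# Failing check with the default settings raises the chosen exception
try:
    toy.pipe(check_condition, lambda df: df["x"].sum() > 100)
    raised = False
except ValueError:
    raised = True
```

Returning the input object is what makes this style of check safe to drop into, or remove from, a method chain without changing the result.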

assert_datetime(fail_message=None, pass_message=' ✔️ Assert datetime passed ', subset=None, raise_exception=True, exception_to_raise=TypeError, verbose=False)

Tests whether a DataFrame or a subset of its columns is of a datetime or timestamp type. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    (
        df
        .check.assert_datetime(subset="datetime_col")
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
subset Union[str, List, None]

Optional, which column or columns to check the condition against.

None
fail_message Union[str, None]

Message to display if the condition fails. If None, will report expected vs observed type.

None
pass_message str

Message to display if the condition passes.

' ✔️ Assert datetime passed '
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

TypeError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

assert_float(fail_message=None, pass_message=' ✔️ Assert float passed ', subset=None, raise_exception=True, exception_to_raise=TypeError, verbose=False)

Tests whether a DataFrame or a subset of its columns contains floats. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    (
        df
        .check.assert_float(subset="float_col")
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
fail_message Union[str, None]

Message to display if the condition fails.

None
pass_message str

Message to display if the condition passes.

' ✔️ Assert float passed '
subset Union[str, List, None]

Optional, which column or columns to check the condition against.

None
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

TypeError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

assert_greater_than(min, fail_message=' ㄨ Assert minimum failed ', pass_message=' ✔️ Assert minimum passed ', or_equal_to=False, subset=None, raise_exception=True, exception_to_raise=DataError, verbose=False)

Tests whether all values in a DataFrame or a subset of its columns are > or >= a minimum threshold. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    (
        iris
        # Validate that sepal_length is always greater than 0.1
        .check.assert_greater_than(0.1, subset="sepal_length")

        # Validate that two columns are each always greater than or equal to 0.1
        .check.assert_greater_than(0.1, subset=["sepal_length", "petal_length"], or_equal_to=True)
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
min Any

The minimum value to compare the DataFrame to. Accepts any type that can be used in >, such as int, float, str, datetime.

required
fail_message str

Message to display if the condition fails.

' ㄨ Assert minimum failed '
pass_message str

Message to display if the condition passes.

' ✔️ Assert minimum passed '
or_equal_to bool

Whether to test for >= min (True) or > min (False).

False
subset Union[str, List, None]

Optional, which column or columns to check the condition against.

None
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

DataError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.
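To make the difference between the strict and inclusive comparisons concrete, here is a plain-pandas sketch of the two conditions. It does not call Pandas Checks, the toy values are invented for illustration, and the library's own implementation may differ.

```python
import pandas as pd

toy = pd.DataFrame({
    "sepal_length": [0.1, 4.9, 5.1],
    "petal_length": [1.4, 1.3, 1.5],
})
subset = ["sepal_length", "petal_length"]

# Strict comparison (like or_equal_to=False): 0.1 > 0.1 is False, so one value fails
strictly_greater = bool((toy[subset] > 0.1).all().all())

# Inclusive comparison (like or_equal_to=True): 0.1 >= 0.1 is True, so every value passes
greater_or_equal = bool((toy[subset] >= 0.1).all().all())
```

A value sitting exactly on the threshold is the only case where the two settings disagree.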

assert_int(fail_message=None, pass_message=' ✔️ Assert integeer passed ', subset=None, raise_exception=True, exception_to_raise=TypeError, verbose=False)

Tests whether a DataFrame or a subset of its columns contains integers. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    (
        df
        .check.assert_int(subset="int_col")
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
fail_message Union[str, None]

Message to display if the condition fails.

None
pass_message str

Message to display if the condition passes.

' ✔️ Assert integeer passed '
subset Union[str, List, None]

Optional, which column or columns to check the condition against.

None
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

TypeError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

assert_less_than(max, fail_message=' ㄨ Assert maximum failed ', pass_message=' ✔️ Assert maximum passed ', or_equal_to=False, subset=None, raise_exception=True, exception_to_raise=DataError, verbose=False)

Tests whether all values in a DataFrame or a subset of its columns are < or <= a maximum threshold. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    (
        iris

        # Validate that sepal_length is always < 1000
        .check.assert_less_than(1000, subset="sepal_length")

        # Validate that two columns are each always less than or equal to 1000
        .check.assert_less_than(1000, subset=["sepal_length", "petal_length"], or_equal_to=True)
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
max Any

The maximum value to compare the DataFrame to. Accepts any type that can be used in <, such as int, float, str, datetime.

required
or_equal_to bool

Whether to test for <= max (True) or < max (False).

False
fail_message str

Message to display if the condition fails.

' ㄨ Assert maximum failed '
pass_message str

Message to display if the condition passes.

' ✔️ Assert maximum passed '
subset Union[str, List, None]

Optional, which column or columns to check the condition against.

None
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

DataError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

assert_negative(fail_message=' ㄨ Assert negative failed ', pass_message=' ✔️ Assert negative passed ', subset=None, assert_no_nulls=True, raise_exception=True, exception_to_raise=DataError, verbose=False)

Tests whether a DataFrame or a subset of its columns has all negative values. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    (
        df
        .check.assert_negative(subset="column_name")
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
fail_message str

Message to display if the condition fails.

' ㄨ Assert negative failed '
pass_message str

Message to display if the condition passes.

' ✔️ Assert negative passed '
subset Union[str, List, None]

Optional, which column or columns to check the condition against.

None
assert_no_nulls bool

Whether to also enforce that data has no nulls.

True
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

DataError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

assert_no_nulls(fail_message=' ㄨ Assert no nulls failed ', pass_message=' ✔️ Assert no nulls passed ', subset=None, raise_exception=True, exception_to_raise=DataError, verbose=False)

Tests whether a DataFrame or a subset of its columns has no nulls. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    (
        iris
        .check.assert_no_nulls(subset=["sepal_length"])
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
fail_message str

Message to display if the condition fails.

' ㄨ Assert no nulls failed '
pass_message str

Message to display if the condition passes.

' ✔️ Assert no nulls passed '
subset Union[str, List, None]

Optional, which column or columns to check the condition against.

None
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

DataError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

assert_nrows(nrows, fail_message=' ㄨ Assert nrows failed ', pass_message=' ✔️ Assert nrows passed ', raise_exception=True, exception_to_raise=DataError, verbose=False)

Tests whether a DataFrame has a given number of rows. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    (
        iris
        .check.assert_nrows(20)
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
nrows int

The expected number of rows

required
fail_message str

Message to display if the condition fails.

' ㄨ Assert nrows failed '
pass_message str

Message to display if the condition passes.

' ✔️ Assert nrows passed '
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

DataError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

assert_positive(fail_message=' ㄨ Assert positive failed ', pass_message=' ✔️ Assert positive passed ', subset=None, assert_no_nulls=True, raise_exception=True, exception_to_raise=DataError, verbose=False)

Tests whether a DataFrame or a subset of its columns has all positive values. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    (
        iris
        .check.assert_positive(subset=["sepal_length"])
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
fail_message str

Message to display if the condition fails.

' ㄨ Assert positive failed '
pass_message str

Message to display if the condition passes.

' ✔️ Assert positive passed '
subset Union[str, List, None]

Optional, which column or columns to check the condition against.

None
assert_no_nulls bool

Whether to also enforce that data has no nulls.

True
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

DataError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

assert_same_nrows(other, fail_message=' ㄨ Assert same_nrows failed ', pass_message=' ✔️ Assert same_nrows passed ', raise_exception=True, exception_to_raise=DataError, verbose=False)

Tests whether a DataFrame has the same number of rows as another DataFrame or Series.

Optionally raises an exception. Does not modify the DataFrame itself.

Example
    # Validate that an expected one-to-one join didn't add rows due to duplicate keys in the right table.
    (
        transactions_df
        .merge(how="left", right=products_df, on="product_id")
        .check.assert_same_nrows(transactions_df, "Left join changed row count! Check for duplicate `product_id` keys in products_df.")
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
other Union[DataFrame, Series]

The DataFrame or Series that this DataFrame is expected to match in number of rows.

required
fail_message str

Message to display if the condition fails.

' ㄨ Assert same_nrows failed '
pass_message str

Message to display if the condition passes.

' ✔️ Assert same_nrows passed '
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

DataError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.
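The join scenario in the example above can be reproduced with plain pandas to see why the row count changes. The table contents below are invented for illustration, and the sketch only computes the row counts that `assert_same_nrows` would compare; it does not call Pandas Checks.

```python
import pandas as pd

transactions_df = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "product_id": [10, 20, 10],
})

# product_id 10 appears twice in the right table, so it is not a unique key
products_df = pd.DataFrame({
    "product_id": [10, 10, 20],
    "name": ["pen", "pen (duplicate)", "ink"],
})

merged = transactions_df.merge(how="left", right=products_df, on="product_id")

rows_before = transactions_df.shape[0]
rows_after = merged.shape[0]
same_nrows = rows_before == rows_after
```

Each transaction for product 10 matches two product rows, so the left join returns five rows instead of three. That mismatch is what the assertion is meant to catch.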

assert_str(fail_message=None, pass_message=' ✔️ Assert string passed ', subset=None, raise_exception=True, exception_to_raise=TypeError, verbose=False)

Tests whether a DataFrame or a subset of its columns contains strings. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    (
        iris
        .check.assert_str(subset=["species", "another_string_column"])
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
fail_message Union[str, None]

Message to display if the condition fails. If None, will report expected vs observed type.

None
pass_message str

Message to display if the condition passes.

' ✔️ Assert string passed '
subset Union[str, List, None]

Optional, which column or columns to check the condition against.

None
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

TypeError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

assert_timedelta(fail_message=None, pass_message=' ✔️ Assert timedelta passed ', subset=None, raise_exception=True, exception_to_raise=TypeError, verbose=False)

Tests whether a DataFrame or a subset of its columns is of type timedelta. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    (
        df
        .check.assert_timedelta(subset=["timedelta_col"])
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
fail_message Union[str, None]

Message to display if the condition fails. If None, will report expected vs observed type.

None
pass_message str

Message to display if the condition passes.

' ✔️ Assert timedelta passed '
subset Union[str, List, None]

Optional, which column or columns to check the condition against.

None
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

TypeError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

assert_type(dtype, fail_message=None, pass_message=' ✔️ Assert type passed ', subset=None, raise_exception=True, exception_to_raise=TypeError, verbose=False)

Tests whether a DataFrame or a subset of its columns meets a type assumption. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    # Validate that a column of mixed types has overall type `object`
    (
        iris
        .check.assert_type(object, subset="column_with_mixed_types")
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
dtype Type[Any]

The required variable type

required
fail_message Union[str, None]

Message to display if the condition fails. If None, will report expected vs observed type.

None
pass_message str

Message to display if the condition passes.

' ✔️ Assert type passed '
subset Union[str, List, None]

Optional, which column or columns to check the condition against.

None
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

TypeError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.
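For context on the mixed-types example, the following plain-pandas sketch shows the dtype that pandas assigns to such a column. It only inspects dtypes directly and makes no claim about how `assert_type` performs its comparison internally; the column names and values are invented.

```python
import pandas as pd

toy = pd.DataFrame({
    "column_with_mixed_types": [1, "two", 3.0],
    "int_col": [1, 2, 3],
})

# A column mixing ints, strings and floats falls back to the generic object dtype
mixed_dtype = toy["column_with_mixed_types"].dtype
is_object = bool(mixed_dtype == object)

# A homogeneous integer column keeps an integer dtype
is_int = bool(pd.api.types.is_integer_dtype(toy["int_col"]))
```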

assert_unique(fail_message=' ㄨ Assert unique failed ', pass_message=' ✔️ Assert unique passed ', subset=None, raise_exception=True, exception_to_raise=DataError, verbose=False)

Validates that a subset of columns have no duplicate values, or validates that a DataFrame has no duplicate rows. Optionally raises an exception. Does not modify the DataFrame itself.

Example
    (
        df
        # Validate that a column has no duplicate values
        .check.assert_unique(subset="id_column")

        # Validate that a DataFrame has no duplicate rows
        .check.assert_unique()
    )

See docs for .check.assert_data() for examples of how to customize assertions.

Parameters:

Name Type Description Default
fail_message str

Message to display if the condition fails.

' ㄨ Assert unique failed '
pass_message str

Message to display if the condition passes.

' ✔️ Assert unique passed '
subset Union[str, List, None]

Optional, which column or columns to check the condition against.

None
raise_exception bool

Whether to raise an exception if the condition fails.

True
exception_to_raise Type[BaseException]

The exception to raise if the condition fails and raise_exception is True.

DataError
verbose bool

Whether to display the pass message if the condition passes.

False

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.
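The two uniqueness conditions described above (no duplicate values in a subset, no duplicate rows overall) can be expressed in plain pandas with `DataFrame.duplicated`. This is a sketch of the conditions on invented data, not the library's implementation.

```python
import pandas as pd

toy = pd.DataFrame({"id_column": [1, 2, 2], "value": ["a", "b", "c"]})

# Column-level uniqueness: id 2 appears twice, so the column is not unique
column_is_unique = not toy.duplicated(subset="id_column").any()

# Row-level uniqueness: no two rows match across every column
rows_are_unique = not toy.duplicated().any()
```

A DataFrame can have fully distinct rows while a key column still contains repeats, which is why passing `subset` changes the outcome here.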

columns(fn=lambda df: df, subset=None, check_name='🏛️ Columns')

Prints the column names of a DataFrame, without modifying the DataFrame itself.

Example
    (
        df
        .check.columns()
    )

Parameters:

Name Type Description Default
fn Callable

An optional lambda function to apply to the DataFrame before printing columns. Example: lambda df: df.shape[0]>10. Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string to select a subset of columns before printing their names. Applied after fn.

None
check_name Union[str, None]

An optional name for the check to preface the result with.

'🏛️ Columns'

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

describe(fn=lambda df: df, subset=None, check_name='📏 Distributions', **kwargs)

Displays descriptive statistics about a DataFrame without modifying the DataFrame itself.

See Pandas docs for describe() for additional usage information, including more configuration options you can pass to this Pandas Checks method.

Example
    (
        df
        .check.describe()
    )

Parameters:

Name Type Description Default
fn Callable

An optional lambda function to apply to the DataFrame before running Pandas describe(). Example: lambda df: df.shape[0]>10. Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string to select a subset of columns before running Pandas describe(). Applied after fn.

None
check_name Union[str, None]

An optional name for the check to preface the result with.

'📏 Distributions'
**kwargs Any

Optional, additional arguments that are accepted by Pandas describe() method.

{}

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

disable_checks(enable_asserts=True)

Turns off Pandas Checks globally, such as in production mode. Calls to .check functions will not be run. Does not modify the DataFrame itself.

Example
    (
        iris
        .check.disable_checks()
        .check.assert_data(lambda df: df.shape[0]>10) #  This check will NOT be run
        .check.enable_checks() # Subsequent calls to .check will be run
    )

Parameters:

Name Type Description Default
enable_asserts bool

Optionally, whether to also enable or disable assert statements.

True

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

dtypes(fn=lambda df: df, subset=None, check_name='🗂️ Data types')

Displays the data types of a DataFrame's columns without modifying the DataFrame itself.

Example
    (
        iris
        .check.dtypes()
    )

Parameters:

Name Type Description Default
fn Callable

An optional lambda function to apply to the DataFrame before running Pandas dtypes. Example: lambda df: df.shape[0]>10. Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string to select a subset of columns before running Pandas .dtypes. Applied after fn.

None
check_name Union[str, None]

An optional name for the check to preface the result with.

'🗂️ Data types'

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

enable_checks(enable_asserts=True)

Globally enables Pandas Checks. Subsequent calls to .check methods will be run. Does not modify the DataFrame itself.

Example
    (
        iris
        ["sepal_length"]
        .check.disable_checks()
        .check.assert_data(lambda s: s.shape[0]>10) #  This check will NOT be run
        .check.enable_checks() # Subsequent calls to .check will be run
    )

Parameters:

Name Type Description Default
enable_asserts bool

Optionally, whether to globally enable or disable calls to .check.assert_data().

True

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

function(fn=lambda df: df, subset=None, check_name=None)

Applies an arbitrary function on a DataFrame and shows the result, without modifying the DataFrame itself.

Example
    (
        iris
        .check.function(fn=lambda df: df.shape[0]>10, check_name='Has at least 10 rows?')
    )
    # Will return either 'True' or 'False'

Parameters:

Name Type Description Default
fn Callable

A lambda function to apply to the DataFrame. Example: lambda df: df.shape[0]>10. Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string to select a subset of columns. Applied after fn.

None
check_name Union[str, None]

An optional name for the check to preface the result with.

None

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

get_mode(check_name='🐼🩺 Pandas Checks mode')

Displays the current values of Pandas Checks global options enable_checks and enable_asserts. Does not modify the DataFrame itself.

Example
    (
        iris
        .check.get_mode()
    )

    # The check will print:
    # "🐼🩺 Pandas Checks mode: {'enable_checks': True, 'enable_asserts': True}"

Parameters:

Name Type Description Default
check_name Union[str, None]

An optional name for the check. Will be used as a preface to the printed result.

'🐼🩺 Pandas Checks mode'

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

head(n=5, fn=lambda df: df, subset=None, check_name=None)

Displays the first n rows of a DataFrame, without modifying the DataFrame itself.

See Pandas docs for head() for additional usage information.

Example
    (
        iris
        .check.head(10)
    )

Parameters:

Name Type Description Default
n int

The number of rows to display.

5
fn Callable

An optional lambda function to apply to the DataFrame before running Pandas head(). Example: lambda df: df.shape[0]>10. Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string to select a subset of columns before running Pandas head(). Applied after fn.

None
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

None

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

hist(fn=lambda df: df, subset=[], check_name=None, **kwargs)

Displays a histogram for the DataFrame, without modifying the DataFrame itself.

See Pandas docs for hist() for additional usage information, including more configuration options you can pass to this Pandas Checks method.

Example
    (
        iris
        .check.hist(subset=["sepal_length", "sepal_width"])
    )

Parameters:

Name Type Description Default
fn Callable

An optional lambda function to apply to the DataFrame before running Pandas hist(). Example: lambda df: df.shape[0]>10. Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string to select a subset of columns before running Pandas hist(). Applied after fn.

[]
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

None
**kwargs Any

Optional, additional arguments that are accepted by Pandas hist() method.

{}

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

Note

If more than one column is passed, displays a grid of histograms.

Only renders in interactive mode (IPython/Jupyter), not in terminal.

info(fn=lambda df: df, subset=None, check_name='ℹ️ Info', **kwargs)

Displays summary information about a DataFrame, without modifying the DataFrame itself.

See Pandas docs for info() for additional usage information, including more configuration options you can pass to this Pandas Checks method.

Example
    (
        iris
        .check.info()
    )

Parameters:

Name Type Description Default
fn Callable

An optional lambda function to apply to the DataFrame before running Pandas info(). Example: lambda df: df.shape[0]>10. Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string to select a subset of columns before running Pandas info(). Applied after fn.

None
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

'ℹ️ Info'
**kwargs Any

Optional, additional arguments that are accepted by Pandas info() method.

{}

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

memory_usage(fn=lambda df: df, subset=None, check_name='💾 Memory usage', **kwargs)

Displays the memory footprint of a DataFrame, without modifying the DataFrame itself.

See Pandas docs for memory_usage() for additional usage information, including more configuration options you can pass to this Pandas Checks method.

Example
    (
        iris
        .check.memory_usage()
    )

Parameters:

Name Type Description Default
fn Callable

An optional lambda function to apply to the DataFrame before running Pandas memory_usage(). Example: lambda df: df.shape[0]>10. Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string to select a subset of columns before running Pandas memory_usage(). Applied after fn.

None
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

'💾 Memory usage'
**kwargs Any

Optional, additional arguments that are accepted by Pandas memory_usage() method.

{}

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

Note

Include argument deep=True to get further memory usage of object dtypes in the DataFrame. See Pandas docs for memory_usage() for more info.
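The effect of `deep=True` can be seen by calling Pandas `memory_usage()` directly. The sketch below uses invented data and forces the object dtype explicitly so that the result does not depend on the string storage used by a given pandas version.

```python
import pandas as pd

# Force object dtype so each cell holds a reference to a separate Python string
species = pd.Series(["setosa", "versicolor", "virginica"] * 100, dtype=object)
toy = pd.DataFrame({"species": species})

shallow = toy.memory_usage()        # counts only the references for object columns
deep = toy.memory_usage(deep=True)  # also counts the Python string objects themselves

shallow_total = int(shallow.sum())
deep_total = int(deep.sum())
```

For object columns the deep total is larger than the shallow one, because the shallow figure leaves out the memory held by the strings.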

ncols(fn=lambda df: df, subset=None, check_name='🏛️ Columns')

Displays the number of columns in a DataFrame, without modifying the DataFrame itself.

Example
    (
        iris
        .check.ncols()
    )

Parameters:

Name Type Description Default
fn Callable

An optional lambda function to apply to the DataFrame before counting the number of columns. Example: lambda df: df.shape[0]>10. Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string to select a subset of columns before counting the number of columns. Applied after fn.

None
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

'🏛️ Columns'

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

ndups(fn=lambda df: df, subset=None, check_name=None, **kwargs)

Displays the number of duplicated rows in a DataFrame, without modifying the DataFrame itself.

See Pandas docs for duplicated() for additional usage information, including more configuration options you can pass to this Pandas Checks method.

Example
    # Count the number of rows with duplicate pairs of values across two columns:
    (
        iris
        .check.ndups(subset=["sepal_length", "sepal_width"])
    )

Parameters:

Name Type Description Default
fn Callable

An optional lambda function to apply to the DataFrame before counting the number of duplicates. Example: lambda df: df.query("sepal_length<5"). Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string to select a subset of columns before counting duplicate rows. Applied after fn.

None
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

None
**kwargs Any

Optional, additional arguments that are accepted by Pandas duplicated() method.

{}

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.
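To make the effect of subset and of extra duplicated() arguments concrete, here is a minimal sketch in plain Pandas. It assumes the check counts the rows that duplicated() flags as True, which this page does not spell out, and the DataFrame is invented for the example.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical across both columns
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 5.1, 4.7, 5.1],
        "sepal_width": [3.5, 3.5, 3.2, 3.0],
    }
)

# duplicated() flags each row that repeats an earlier row
n_dups_all = int(df.duplicated().sum())

# With subset, only the named columns are compared
n_dups_length = int(df.duplicated(subset=["sepal_length"]).sum())

# keep=False flags every member of a duplicate group, not just the repeats
n_dups_keep_false = int(df.duplicated(keep=False).sum())

print(n_dups_all, n_dups_length, n_dups_keep_false)
```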

nnulls(fn=lambda df: df, subset=None, by_column=True, check_name='👻 Rows with NaNs')

Displays the number of rows with null values in a DataFrame, without modifying the DataFrame itself.

See Pandas docs for isna() for additional usage information.

Example
    # Count the number of rows that have any nulls, one count per column
    (
        iris
        .check.nnulls()
    )

    # Count the number of rows in the DataFrame that have a null in any column
    (
        iris
        .check.nnulls(by_column=False)
    )

Parameters:

Name Type Description Default
fn Callable

An optional lambda function to apply to the DataFrame before counting the number of rows with a null. Example: lambda df: df.query("sepal_length<5"). Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string to select a subset of columns before counting nulls. Applied after fn.

None
by_column bool

If True, count null values within each column separately. If False, count rows with a null value in any column.

True
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

'👻 Rows with NaNs'

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.
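The two by_column modes correspond to two different Pandas reductions over isna(). The sketch below shows both in plain Pandas on a made-up DataFrame. It illustrates the distinction and is not the library's own implementation.

```python
import pandas as pd

# Hypothetical data with nulls spread across two columns
df = pd.DataFrame(
    {
        "sepal_length": [5.1, None, 4.7, None],
        "sepal_width": [3.5, 3.0, None, None],
    }
)

# Comparable to by_column=True: one null count per column
nulls_by_column = df.isna().sum()

# Comparable to by_column=False: rows with a null in any column
rows_with_any_null = int(df.isna().any(axis=1).sum())

print(nulls_by_column)
print(rows_with_any_null)
```

The per-column counts can add up to more than the row count in the second mode, because a row with nulls in several columns is counted once per column in the first.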

nrows(fn=lambda df: df, subset=None, check_name='☰ Rows')

Displays the number of rows in a DataFrame, without modifying the DataFrame itself.

Example
    (
        iris
        .check.nrows()
    )

Parameters:

Name Type Description Default
fn Callable

An optional lambda function to apply to the DataFrame before counting the number of rows. Example: lambda df: df.query("sepal_length<5"). Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string name of one column to limit which columns are considered when counting rows. Applied after fn.

None
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

'☰ Rows'

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.
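Many methods on this page share the same fn and subset parameters, applied in that order. A minimal sketch of that ordering, using a hypothetical helper named apply_fn_then_subset (not part of Pandas Checks) and invented data:

```python
import pandas as pd

# Hypothetical stand-in for iris
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 6.3, 5.8],
        "sepal_width": [3.5, 3.0, 3.3, 2.7],
        "species": ["setosa", "setosa", "virginica", "virginica"],
    }
)


def apply_fn_then_subset(data, fn=lambda d: d, subset=None):
    """Hypothetical helper mirroring the documented order: fn first, then subset."""
    result = fn(data)
    if subset is None:
        return result
    return result[subset]


# Filter rows with fn, then keep one column with subset
filtered = apply_fn_then_subset(
    df,
    fn=lambda d: d.query("sepal_length < 5.5"),
    subset=["sepal_width"],
)

print(filtered.shape[0])  # number of rows left after fn
```

In this sketch, fn has to return something that can still be indexed by column name, otherwise the subset step fails.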

nunique(column, fn=lambda df: df, check_name=None, **kwargs)

Displays the number of unique values in a single column, without modifying the DataFrame itself.

See Pandas docs for nunique() for additional usage information, including more configuration options you can pass to this Pandas Checks method.

Example
    (
        iris
        .check.nunique(column="sepal_width")
    )

Parameters:

Name Type Description Default
column str

The name of a column to count uniques in. Applied after fn.

required
fn Callable

An optional lambda function to apply to the DataFrame before running Pandas nunique(). Example: lambda df: df.query("sepal_length<5"). Applied before column is selected.

lambda df: df
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

None
**kwargs Any

Optional, additional arguments that are accepted by Pandas nunique() method.

{}

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

plot(fn=lambda df: df, subset=None, check_name='', **kwargs)

Displays a plot of the DataFrame, without modifying the DataFrame itself.

See Pandas docs for plot() for additional usage information, including more configuration options you can pass to this Pandas Checks method.

Example
    (
        iris
        .check.plot(kind="scatter", x="sepal_width", y="sepal_length", title="Sepal width vs sepal length")
    )

Parameters:

Name Type Description Default
fn Callable

An optional lambda function to apply to the DataFrame before running Pandas plot(). Example: lambda df: df.query("sepal_length<5"). Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string name of one column to limit which columns are plotted. Applied after fn.

None
check_name Union[str, None]

An optional title for the plot.

''
**kwargs Any

Optional, additional arguments that are accepted by Pandas plot() method.

{}

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

Note

Plots are only displayed when code is run in IPython/Jupyter, not in a terminal.

If you pass a 'title' kwarg, it becomes the plot title, overriding check_name.

print(object=None, fn=lambda df: df, subset=None, check_name=None, max_rows=10)

Displays text, another object, or (by default) the current DataFrame's head. Does not modify the DataFrame itself.

Example
    # Print messages and milestones
    (
        iris
        .check.print("Starting data cleaning...")
        ...
    )

    # Inspect a DataFrame, such as the interim result of data processing
    (
        iris
        ...
        .check.print(fn=lambda df: df.query("sepal_width<0"), check_name="Rows with negative sepal_width")
    )

Parameters:

Name Type Description Default
object Any

Object to print. Can be anything printable: str, int, list, another DataFrame, etc. If None, print the DataFrame's head (with max_rows rows).

None
fn Callable

An optional lambda function to apply to the DataFrame before printing object. Example: lambda df: df.query("sepal_length<5"). Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string name of one column to limit which columns are printed. Applied after fn.

None
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

None
max_rows int

Maximum number of rows to print if object=None.

10

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

print_time_elapsed(start_time, lead_in='Time elapsed', units='auto')

Displays the time elapsed since start_time.

Example

    import pandas_checks as pdc

    start_time = pdc.start_timer()

    (
        iris
        ... # Do some data processing
        .check.print_time_elapsed(start_time, "Cleaning took")

        ... # Do more
        .check.print_time_elapsed(start_time, "Processing total time", units="seconds") # Force units to stay in seconds

    )

    # Result: "Cleaning took: 17.298324584960938 seconds"
    #         "Processing total time: 71.0400543212890625 seconds"

Parameters:

Name Type Description Default
start_time float

The time when the stopwatch started, as returned by the Pandas Checks start_timer() function.

required
lead_in Union[str, None]

Optional text to print before the elapsed time.

'Time elapsed'
units str

The units in which to display the elapsed time. Allowed values: "auto", "milliseconds", "seconds", "minutes", "hours" or shorthands "ms", "s", "m", "h".

'auto'

Raises:

Type Description
ValueError

If units is not one of allowed values.

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.
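The units parameter can be pictured as a lookup from unit name to a conversion factor. The sketch below uses only the standard library. The function format_elapsed, the thresholds used for "auto", and the assumption that start_time is a timestamp in seconds like the one time.time() returns are all illustrative guesses, not the library's actual behavior.

```python
import time

# Seconds per unit; shorthand names map to the same factors as the long names
_SECONDS_PER_UNIT = {
    "milliseconds": 0.001,
    "ms": 0.001,
    "seconds": 1.0,
    "s": 1.0,
    "minutes": 60.0,
    "m": 60.0,
    "hours": 3600.0,
    "h": 3600.0,
}


def format_elapsed(elapsed_seconds, lead_in="Time elapsed", units="auto"):
    """Hypothetical formatter; the "auto" thresholds below are an assumption."""
    if units == "auto":
        if elapsed_seconds < 1:
            units = "milliseconds"
        elif elapsed_seconds < 60:
            units = "seconds"
        elif elapsed_seconds < 3600:
            units = "minutes"
        else:
            units = "hours"
    if units not in _SECONDS_PER_UNIT:
        raise ValueError(f"Unsupported units: {units!r}")
    value = elapsed_seconds / _SECONDS_PER_UNIT[units]
    prefix = f"{lead_in}: " if lead_in else ""
    return f"{prefix}{value} {units}"


start_time = time.time()
# ... data processing would happen here ...
print(format_elapsed(time.time() - start_time, "Cleaning took", units="seconds"))
```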

reset_format()

Globally restores all Pandas Checks formatting options to their default "factory" settings. Does not modify the DataFrame itself.

Example
    (
        iris
        .check.set_format(precision=9, use_emojis=False)

        # Print DF summary stats with precision 9 digits and no Pandas Checks emojis
        .check.describe()

        .check.reset_format() # Go back to default precision and emojis 🥳
    )

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

set_format(**kwargs)

Configures selected formatting options for Pandas Checks. Does not modify the DataFrame itself.

Run pandas_checks.describe_options() to see a list of available options.

Example
    (
        iris
        .check.set_format(precision=9, use_emojis=False)

        # Print DF summary stats with precision 9 digits and no Pandas Checks emojis
        .check.describe()

        .check.reset_format() # Go back to default precision and emojis 🥳
    )

Parameters:

Name Type Description Default
**kwargs Any

Pairs of setting name and its new value.

{}

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

set_mode(enable_checks, enable_asserts)

Configures the operation mode for Pandas Checks globally. Does not modify the DataFrame itself.

Example

    # Disable checks but keep running assertions. Same as using `.check.disable_checks()`:
    (
        iris
        .check.set_mode(enable_checks=False, enable_asserts=True)
        .check.describe() # This check will not be run
        .check.assert_data(lambda df: df.shape[0]>10) # This assertion will still be run
    )

    # Disable checks _and_ assertions
    (
        iris
        .check.set_mode(enable_checks=False, enable_asserts=False)
    )

Parameters:

Name Type Description Default
enable_checks bool

Whether to run any Pandas Checks methods globally. Does not affect .check.assert_*().

required
enable_asserts bool

Whether to run calls to Pandas Checks .check.assert_*() statements globally.

required

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.
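One way to picture the two independent switches is a pair of global flags that each gate a different kind of call. The sketch below is a toy model in plain Python. The names _mode, run_check, and run_assert are hypothetical and do not describe the library's internals.

```python
# Hypothetical module-level switches, not the library's actual internals
_mode = {"enable_checks": True, "enable_asserts": True}


def set_mode(enable_checks, enable_asserts):
    """Set both switches at once, mirroring the two required arguments."""
    _mode["enable_checks"] = enable_checks
    _mode["enable_asserts"] = enable_asserts


def run_check(data, check):
    """Run a display-only check if checks are enabled; always return data."""
    if _mode["enable_checks"]:
        check(data)
    return data


def run_assert(data, condition):
    """Evaluate an assertion if asserts are enabled; always return data."""
    if _mode["enable_asserts"] and not condition(data):
        raise ValueError("Assertion failed")
    return data


rows = [1, 2, 3]
seen = []

set_mode(enable_checks=False, enable_asserts=True)
run_check(rows, seen.append)  # skipped, because checks are disabled
run_assert(rows, lambda d: len(d) > 1)  # still evaluated, and passes
```

Both functions return the data unchanged in every mode, which is what lets disabled calls stay in a method chain without breaking it.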

shape(fn=lambda df: df, subset=None, check_name='📐 Shape')

Displays the DataFrame's dimensions, without modifying the DataFrame itself.

Example
    (
        iris
        .check.shape()
        .check.shape(fn=lambda df: df.query("sepal_length<5"), check_name="Shape of DataFrame subgroup with sepal_length<5")
    )

Parameters:

Name Type Description Default
fn Callable

An optional lambda function to apply to the DataFrame before running Pandas shape. Example: lambda df: df.query("sepal_length<5"). Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string name of one column to limit which columns are considered when printing the shape. Applied after fn.

None
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

'📐 Shape'

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

Note

See also .check.nrows() and .check.ncols()

tail(n=5, fn=lambda df: df, subset=None, check_name=None)

Displays the last n rows of the DataFrame, without modifying the DataFrame itself.

See Pandas docs for tail() for additional usage information.

Example
    (
        iris
        .check.tail(10)
    )

Parameters:

Name Type Description Default
n int

Number of rows to show.

5
fn Callable

An optional lambda function to apply to the DataFrame before running Pandas tail(). Example: lambda df: df.query("sepal_length<5"). Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string name of one column to limit which columns are displayed. Applied after fn.

None
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

None

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

unique(column, fn=lambda df: df, check_name=None)

Displays the unique values in a column, without modifying the DataFrame itself.

See Pandas docs for unique() for additional usage information.

Example
    (
        iris
        .check.unique("species")
    )
    # The check will print: "🌟 Unique values of species: ['setosa', 'versicolor', 'virginica']"

Parameters:

Name Type Description Default
column str

Column to check for unique values.

required
fn Callable

An optional lambda function to apply to the DataFrame before calling Pandas unique(). Example: lambda df: df.query("sepal_length<5"). Applied before column is selected.

lambda df: df
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

None

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

Note

fn is applied to the dataframe before selecting column. If you want to select the column before modifying it, set column=None and start fn with a column selection, i.e. fn=lambda df: df["my_column"].stuff()
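The order described in this note (fn on the whole DataFrame first, column selection second) can be sketched in plain Pandas as follows. The data is invented, and the code only illustrates the ordering, not the library's implementation.

```python
import pandas as pd

# Hypothetical stand-in for iris
df = pd.DataFrame(
    {
        "species": ["setosa", "versicolor", "virginica", "setosa"],
        "sepal_length": [5.1, 6.4, 6.3, 4.9],
    }
)


def fn(data):
    # fn sees the full DataFrame, so it can filter on a different column
    return data.query("sepal_length > 6")


# Column selection happens only after fn has run
unique_after_fn = list(fn(df)["species"].unique())
n_unique_after_fn = fn(df)["species"].nunique()

print(unique_after_fn, n_unique_after_fn)
```

Because fn runs before the column is picked, it can filter rows using columns other than the one being inspected.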

value_counts(column, fn=lambda df: df, max_rows=10, check_name=None, **kwargs)

Displays the value counts for a column, without modifying the DataFrame itself.

See Pandas docs for value_counts() for additional usage information, including more configuration options you can pass to this Pandas Checks method.

Example
    (
        iris
        .check.value_counts("sepal_length")
    )

Parameters:

Name Type Description Default
column str

Column to check for value counts.

required
max_rows int

Maximum number of rows to show in the value counts.

10
fn Callable

An optional lambda function to apply to the DataFrame before running Pandas value_counts(). Example: lambda df: df.query("sepal_length<5"). Applied before column is selected.

lambda df: df
check_name Union[str, None]

An optional name for the check, to be printed as preface to the result.

None
**kwargs Any

Optional, additional arguments that are accepted by Pandas value_counts() method.

{}

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

Note

fn is applied to the dataframe before selecting column. If you want to select the column before modifying it, set column=None and start fn with a column selection, i.e. fn=lambda df: df["my_column"].stuff()
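For reference, this is what the underlying Pandas value_counts() returns on a single column, including one of the extra arguments it accepts (normalize). The head(2) call is only a guess at how a row limit such as max_rows might be applied, and the data is made up.

```python
import pandas as pd

# Hypothetical stand-in for the species column of iris
df = pd.DataFrame(
    {
        "species": [
            "setosa",
            "setosa",
            "virginica",
            "versicolor",
            "setosa",
            "virginica",
        ]
    }
)

# Counts per distinct value, sorted from most to least frequent
counts = df["species"].value_counts()

# normalize=True returns shares instead of raw counts
shares = df["species"].value_counts(normalize=True)

# Keep only the most frequent values (assumed analogue of max_rows=2)
top_two = counts.head(2)

print(top_two)
print(shares)
```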

write(path, format=None, fn=lambda df: df, subset=None, verbose=False, **kwargs)

Exports DataFrame to file, without modifying the DataFrame itself.

The file format is inferred from the extension. Supports:
- .csv
- .feather
- .parquet
- .pkl (Pickle)
- .tsv (tab-separated data file)
- .xlsx

This function uses the corresponding Pandas export function, such as to_csv() and to_feather(). See Pandas docs for those export functions (https://pandas.pydata.org/docs/reference/io.html) for additional usage information, including more configuration options you can pass to this Pandas Checks method.

Example
    (
        iris

        # Process data
        ...

        # Export the interim data for inspection
        .check.write("iris_interim.xlsx")

        # Continue processing
        ...
    )

Parameters:

Name Type Description Default
path str

Path to write the file to.

required
format Union[str, None]

Optional file format to force for the export. If None, format is inferred from the file's extension in path.

None
fn Callable

An optional lambda function to apply to the DataFrame before exporting. Example: lambda df: df.query("sepal_length<5"). Applied before subset.

lambda df: df
subset Union[str, List, None]

An optional list of column names or a string name of one column to limit which columns are exported. Applied after fn.

None
verbose bool

Whether to print a message when the file is written.

False
**kwargs Any

Optional, additional keyword arguments to pass to the Pandas export function (e.g. .to_csv()).

{}

Returns:

Type Description
DataFrame

The original DataFrame, unchanged.

Note

Exporting to some formats such as Excel, Feather, and Parquet may require you to install additional packages.
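Inferring the format from the extension amounts to choosing a Pandas writer based on the file suffix. The sketch below shows that idea for the formats that need no extra packages. The helper write_by_extension is hypothetical, covers only part of the list above, and is not the library's implementation.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_by_extension(df, path, **kwargs):
    """Hypothetical helper: pick a Pandas writer from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        df.to_csv(path, **kwargs)
    elif suffix == ".tsv":
        # TSV is CSV with a tab separator
        df.to_csv(path, sep="\t", **kwargs)
    elif suffix == ".pkl":
        df.to_pickle(path, **kwargs)
    else:
        raise ValueError(f"Unsupported extension in this sketch: {suffix!r}")
    return df  # hand the DataFrame back so a method chain could continue


# Hypothetical stand-in for iris
df = pd.DataFrame({"sepal_length": [5.1, 4.9], "species": ["setosa", "setosa"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "iris_interim.tsv"
    returned = write_by_extension(df, out_path, index=False)
    round_trip = pd.read_csv(out_path, sep="\t")

print(round_trip)
```

Extra keyword arguments such as index=False are forwarded to the chosen writer, which is the role **kwargs plays in the signature above.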