To find the total NaN count in Databricks:
Library required for this operation:
- pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool. Here it is used to create a pandas dataframe.
import pandas as pd
Create the dataframe:
pathname = 'file path'
DimEmployee = pd.read_csv(pathname)
DimEmployee.head()
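To see how missing values end up in the dataframe, here is a minimal sketch that reads CSV text from memory instead of a file. The CSV content, the column names (EmployeeKey, MiddleName, Phone) and the variable names are made up for illustration; the point is that pandas read_csv parses empty fields as NaN by default, and isna().sum() then gives the NaN count per column.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
# Empty fields are parsed as NaN by read_csv.
csv_text = "EmployeeKey,MiddleName,Phone\n1,A,555-0100\n2,,555-0101\n3,,\n"

sample_df = pd.read_csv(io.StringIO(csv_text))

# isna() marks each missing cell as True; sum() counts them per column.
missing_per_column = sample_df.isna().sum()
print(missing_per_column)
```

Summing missing_per_column once more with .sum() would give the NaN count across the whole dataframe.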
Function to find the total NaN count in a particular column of the dataframe:
The function nanCount prints the NaN rate of the given column (the fraction of its rows that are NaN) together with the total count of NaN values in that column.
The function takes dataframeName and columnName as its parameters. It uses the pandas isnull() method to do the calculation.
def nanCount(dataframeName, columnName):
    # Work on a copy of the single column so the original dataframe is untouched.
    df1 = dataframeName[[columnName]].copy()
    # shape is a (rows, columns) tuple, so shape[0] is the row count.
    Nan_Rate = df1[df1[columnName].isnull()].shape[0] * 1.0 / df1.shape[0]  # nan rate
    Nan_Count = df1[df1[columnName].isnull()].shape[0]  # nan count
    print("NAN rate in column " + str(columnName) + " is : " + str(Nan_Rate) + "\n"
          + "NAN Count in column " + str(columnName) + " is : " + str(Nan_Count))
Calling the function:
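A minimal sketch of a call, assuming a small made-up dataframe in place of the CSV file (the column names EmployeeKey and MiddleName and their values are hypothetical). The nanCount shown here is a compact variant that also returns the two values so they can be reused; it is one possible way to write it, not necessarily the original.

```python
import pandas as pd


def nanCount(dataframeName, columnName):
    # Boolean mask: True where the value in the column is missing.
    mask = dataframeName[columnName].isnull()
    Nan_Count = int(mask.sum())  # number of NaN values
    # Fraction of rows that are NaN; guard against an empty dataframe.
    Nan_Rate = Nan_Count / len(mask) if len(mask) else 0.0
    print("NAN rate in column " + str(columnName) + " is : " + str(Nan_Rate) + "\n"
          + "NAN Count in column " + str(columnName) + " is : " + str(Nan_Count))
    return Nan_Rate, Nan_Count


# Hypothetical data: two of the four MiddleName values are missing.
DimEmployee = pd.DataFrame({
    "EmployeeKey": [1, 2, 3, 4],
    "MiddleName": ["A", None, None, "D"],
})

rate, count = nanCount(DimEmployee, "MiddleName")
```

Passing a different column name, such as "EmployeeKey", reports the rate and count for that column instead.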