Feb-10-2022, 10:41 PM
I'm trying to write a function that takes a different "oldTable", string, and column name on each call. The "withColumn" calculation below works fine, but the "withColumnRenamed" and "where" lines do not.
What I want, for example with newTable1, is "oldVar2" renamed to "string1_newVar2" and any rows with null values in the "oldVar_dropNull" variable dropped.
import pyspark.sql.functions as F
def functionName(x,y,z):
    return x.withColumn("newVar1", F.when(F.col("oldVar1") > 0, x.oldVar1*100/x.oldVar1)\
                        .otherwise(0)) \
        .withColumnRenamed("oldVar2", (y,"_newVar2")) \
        .where(F.col(z).isNotNull())
newTable1 = functionName(oldTable1,"string1","oldVar_dropNull")
newTable2 = functionName(oldTable2,"string2","oldVar_dropNull")
Some sample data:
import pandas as pd
df = {'oldVar1':['18.50', '649.27', '523.52'],
      'oldVar2':['24.56', '4564.56', '34.45'],
      'oldVar_dropNull':['12.54', '656.89', '0']
      }
oldTable1 = pd.DataFrame(df)
print(oldTable1)
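A possible cause of the "withColumnRenamed" failure, offered as a guess rather than a confirmed diagnosis: the second argument there is written as (y,"_newVar2"), which Python reads as a two-element tuple, while "withColumnRenamed" expects the new column name as a single string. Because the calls are chained, the "where" line may simply never be reached once the rename raises. The plain-Python sketch below shows only the difference between the tuple and the concatenated string; build_new_name is a hypothetical helper that is not part of the original code, and no Spark session is needed to run it.

```python
# Hypothetical helper (not in the original post): joins the per-call
# prefix with the fixed suffix to form one column-name string.
def build_new_name(prefix: str, suffix: str = "_newVar2") -> str:
    return prefix + suffix

y = "string1"

# What the original rename line passes as the new name: a tuple.
as_tuple = (y, "_newVar2")

# What a single column name looks like: one concatenated string.
as_string = build_new_name(y)

print(type(as_tuple).__name__, as_tuple)    # tuple ('string1', '_newVar2')
print(type(as_string).__name__, as_string)  # str string1_newVar2
```

Under that assumption, the rename line inside functionName would become .withColumnRenamed("oldVar2", y + "_newVar2"), with the "where" line left unchanged.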
