Description
I have identified a false negative in Python DataFlow analysis where taint tracking is lost when a class is defined inside a function.
If a tainted variable is passed as an argument to a function, and that argument is subsequently used inside a class defined within that function (a function-local class), CodeQL fails to track the data flow to the class instance's attributes.
However, if similar logic is applied using a top-level (module-level) class, the data flow is detected correctly. This suggests an issue with how data flow is handled across the scope boundary of locally defined classes.
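To make the scope difference concrete, here is a minimal Python sketch (not part of the original reproduction; `make_local_class` and `B` are hypothetical names used only for illustration). It inspects how Python itself resolves `taint_src` in each case: as a closure-captured free variable for the function-local class, and as a global name lookup for the module-level class. Whether this distinction is what CodeQL's analysis trips over is an assumption, not something confirmed here.

```python
def make_local_class(taint_src):
    # Function-local class: __init__ captures taint_src from the
    # enclosing function scope through a closure cell.
    class A:
        def __init__(self):
            self.data = taint_src
    return A


taint_src = "module_level_value"


class B:
    def __init__(self):
        # Module-level class: taint_src is looked up as a global
        # when __init__ runs.
        self.data = taint_src


LocalA = make_local_class("captured_value")

# Free variables captured by the local class's __init__.
local_freevars = LocalA.__init__.__code__.co_freevars
# The module-level __init__ captures nothing; it uses a global name.
global_freevars = B.__init__.__code__.co_freevars
global_names = B.__init__.__code__.co_names

print(local_freevars, global_freevars, global_names)
```

If this is indeed the relevant difference, the missing flow would be specific to captured variables that pass through a class body into a method, which is consistent with the explicit-argument variant being detected.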
Reproduction Case (False Negative)
In this example, taint_src is passed to constructor_field_001_T. The class A is defined inside the function and captures taint_src. The flow to os.system is NOT detected.
import os

def constructor_field_001_T(taint_src):
    # Class defined inside the function scope
    class A:
        def __init__(self):
            # ISSUE: The analyzer fails to track 'taint_src' from the
            # outer function argument into this local class scope.
            self.data = taint_src
            self.sani = '_'

    obj = A()
    taint_sink(obj.data)

def taint_sink(o):
    os.system(o)

if __name__ == "__main__":
    taint_src = "taint_src_value"
    constructor_field_001_T(taint_src)

Control Case (Working)
In this example, the class A is defined at the module level. The flow to os.system IS detected correctly.
import os

# Class defined at module level
class A:
    def __init__(self):
        # Here taint_src resolves to the module-level global assigned
        # under __main__, not to the function parameter; this works fine.
        self.data = taint_src
        self.sani = '_'

def constructor_field_001_T(taint_src):
    obj = A()
    taint_sink(obj.data)

def taint_sink(o):
    os.system(o)

if __name__ == "__main__":
    taint_src = "taint_src_value"
    constructor_field_001_T(taint_src)

Additional Control Case (Working: Explicit Argument)
Significantly, if I keep the class inside the function but pass taint_src as an explicit argument to __init__, the flow IS detected.
import os

def constructor_field_explicit_arg(taint_src):
    # Class defined inside function
    class A:
        # Explicit argument instead of capture
        def __init__(self, val):
            self.data = val

    # Passing taint explicitly
    obj = A(taint_src)
    taint_sink(obj.data)

def taint_sink(o):
    os.system(o)

if __name__ == "__main__":
    taint_src = "taint_src_value"
    constructor_field_explicit_arg(taint_src)

CodeQL Query Used
I am using a standard TaintTracking::Global configuration that looks for the specific string literal "taint_src_value" flowing to os.system.
/**
 * @name Python Taint Reproduction
 * @kind path-problem
 * @problem.severity error
 * @id py/taint-reproduction
 */

import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking

class TaintSource extends DataFlow::Node {
  TaintSource() {
    exists(StrConst str |
      str.getText() = "taint_src_value" and
      this.asExpr() = str
    )
  }
}

class DangerousSink extends DataFlow::Node {
  DangerousSink() {
    exists(Call call |
      (
        call.getFunc().(Attribute).getName() = "system" and
        call.getFunc().(Attribute).getObject().(Name).getId() = "os"
      ) and
      this.asExpr() = call.getAnArg()
    )
  }
}

module TaintConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source instanceof TaintSource
  }

  predicate isSink(DataFlow::Node sink) {
    sink instanceof DangerousSink
  }
}

module TaintFlow = TaintTracking::Global<TaintConfig>;

import TaintFlow::PathGraph

from TaintFlow::PathNode source, TaintFlow::PathNode sink
where TaintFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Taint flow detected"

Expected Behavior
CodeQL should be able to track the taint_src argument into the __init__ method of the locally defined class A, eventually leading to the os.system sink, just as it does for top-level classes.
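As a sanity check that the missing result is a genuine false negative, the following sketch runs the function-local variants with a recording sink in place of os.system. The `recorded` list is a hypothetical stand-in introduced only for this illustration, so that nothing is actually executed in a shell. It shows that, at runtime, the captured value reaches the sink exactly as the explicitly passed one does.

```python
recorded = []  # hypothetical stand-in for os.system, for illustration only


def taint_sink(o):
    # Record the value instead of executing it.
    recorded.append(o)


def constructor_field_001_T(taint_src):
    # Function-local class capturing taint_src (the undetected case).
    class A:
        def __init__(self):
            self.data = taint_src
            self.sani = '_'

    obj = A()
    taint_sink(obj.data)


def constructor_field_explicit_arg(taint_src):
    # Function-local class receiving the value explicitly (the detected case).
    class A:
        def __init__(self, val):
            self.data = val

    obj = A(taint_src)
    taint_sink(obj.data)


constructor_field_001_T("taint_src_value")
constructor_field_explicit_arg("taint_src_value")
print(recorded)
```

Both calls deliver the same string to the sink, so the two variants are equivalent at runtime and a static analysis would ideally report both.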