Skip to content

[Python] False Negative: Taint flow fails to penetrate function-local class definitions #20947

@android1257

Description

@android1257

Description

I have identified a false negative in Python DataFlow analysis where taint tracking is lost when a class is defined inside a function.

If a tainted variable is passed as an argument to a function, and that argument is subsequently used inside a class defined within that function (a function-local class), CodeQL fails to track the data flow to the class instance's attributes.

However, if a similar logic is applied using a top-level (module-level) class, the data flow is detected correctly. This suggests an issue with how data flow is handled across the scope boundary of locally defined classes.

Reproduction Case (False Negative)

In this example, taint_src is passed to constructor_field_001_T. The class A is defined inside the function and captures taint_src. The flow to os.system is NOT detected.

import os

def constructor_field_001_T(taint_src):  
    # Class defined inside the function scope
    class A:
        def __init__(self):
            # ISSUE: The analyzer fails to track 'taint_src' from the 
            # outer function argument into this local class scope.
            self.data = taint_src       
            self.sani = '_'

    obj = A()
    taint_sink(obj.data)

def taint_sink(o):
    os.system(o)

if __name__ == "__main__":
    taint_src = "taint_src_value"
    constructor_field_001_T(taint_src)

Control Case (Working)

In this example, the class A is defined at the module level. The flow to os.system IS detected correctly.

import os

# Class defined at module level
class A:
    def __init__(self):
        # Accessing taint_src (as a global/captured in this context) works fine
        self.data = taint_src       
        self.sani = '_'

def constructor_field_001_T(taint_src):  
    obj = A()
    taint_sink(obj.data)

def taint_sink(o):
    os.system(o)

if __name__ == "__main__":
    taint_src = "taint_src_value"
    constructor_field_001_T(taint_src)

Additional Control Case (Working: Explicit Argument)


Significantly, if I keep the class inside the function but pass taint_src as an explicit argument to __init__, the flow IS detected.

import os

def constructor_field_explicit_arg(taint_src):
    # Class defined inside function
    class A:
        # Explicit argument instead of capture
        def __init__(self, val):
            self.data = val

    # Passing taint explicitly
    obj = A(taint_src)
    taint_sink(obj.data)

def taint_sink(o):
    os.system(o)

if __name__ == "__main__":
    taint_src = "taint_src_value"
    constructor_field_explicit_arg(taint_src)

CodeQL Query Used

I am using a standard DataFlow::Global configuration looking for the specific string literal flowing to os.system.

Click to view query
/**
 * @name Python Taint Reproduction
 * @kind path-problem
 * @problem.severity error
 * @id py/taint-reproduction
 */

import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking

class TaintSource extends DataFlow::Node {
  TaintSource() {
    exists(StrConst str |
      str.getText() = "taint_src_value" and
      this.asExpr() = str
    )
  }
}

class DangerousSink extends DataFlow::Node {
  DangerousSink() {
    exists(Call call |
      (
        call.getFunc().(Attribute).getName() = "system" and
        call.getFunc().(Attribute).getObject().(Name).getId() = "os"
      ) and
      this.asExpr() = call.getAnArg()
    )
  }
}

module TaintConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source instanceof TaintSource
  }

  predicate isSink(DataFlow::Node sink) {
    sink instanceof DangerousSink
  }
}

module TaintFlow = TaintTracking::Global<TaintConfig>;
import TaintFlow::PathGraph

from TaintFlow::PathNode source, TaintFlow::PathNode sink
where TaintFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Taint flow detected"

Expected Behavior

CodeQL should be able to track the taint_src argument into the __init__ method of the locally defined class A, eventually leading to the os.system sink, just as it does for top-level classes.

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionFurther information is requested

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions