Code Security

Mar 30, 2026

What “AI Reads Your Code” Actually Means (And Why SAST Can’t)

Amartya | CodeAnt AI Code Review Platform

Sonali Sood

Founding GTM, CodeAnt AI

The Question Nobody Answers: What Does "AI Reads Your Code" Actually Mean?

Every AI penetration testing vendor says some version of the same thing:

Our AI analyzes your source code
Our engine understands your codebase
We read your code the way an attacker would

None of them explain what that actually means at a technical level. What is the AI doing when it "reads" code? How is it different from a SAST scanner? What does it find that traditional static analysis tools miss? What does "dataflow tracing" actually look like in practice?

These aren't rhetorical questions. The difference between genuine AI code reasoning and pattern-matching SAST with an AI badge determines whether you find authentication bypasses that are invisible from the outside, or whether you get a list of known vulnerability signatures and call it a code review.

This guide answers those questions precisely. Not at a marketing level, but at the level of what the analysis actually does: what data structures it operates on, what algorithms it applies, what it produces, and, crucially, what it finds that no other approach can.

If you're a security engineer trying to understand what you're evaluating when you compare AI pentesting providers, a CTO trying to understand why code review matters in a security engagement, or a developer trying to understand what "dataflow analysis" means for your specific codebase, this is the technical explanation that doesn't exist anywhere else.

The Fundamental Gap: Why SAST Isn't What We're Talking About

Before explaining what AI code analysis does, it's necessary to be precise about what it isn't, because the market uses the terms interchangeably and they're not the same thing.

What SAST Does

Static Application Security Testing tools analyze source code by pattern matching. They maintain a database of vulnerability signatures, patterns that correspond to known vulnerability classes, and check whether the code contains those patterns.

# SAST detection model for SQL injection:

# SAST looks for patterns like these:
patterns = [
    r'execute\s*\(\s*[f"\'].*%s',              # String formatting in execute()
    r'cursor\.execute\s*\(\s*f["\']',           # f-string in execute()
    r'\.raw\s*\(\s*[f"\']',                     # f-string in ORM raw()
    r'session\.execute\s*\(\s*text\s*\(',       # SQLAlchemy text() with string
]

# If any pattern matches → flag as potential SQL injection
# The SAST doesn't know:
#   - Whether the flagged code is actually reachable
#   - Whether the input that reaches the sink is actually user-controlled
#   - Whether there's sanitization between the source and sink
#   - Whether the pattern is a false positive in this specific context

SAST is fast, broad, and operates without execution. It's valuable as a first pass. But its fundamental limitation is pattern matching: it finds code that looks like a vulnerability, not code that is a vulnerability in the specific context of how your application actually runs.
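
To make that limitation concrete, here is a minimal sketch that runs one signature of the kind listed above against two short snippets. Both snippets and the helper name `sast_flags` are hypothetical, made up for illustration; real SAST products use far larger rule sets, so treat this as a demonstration of the failure mode rather than a model of any particular tool.

```python
import re

# One signature in the spirit of the list above: an f-string passed directly to execute().
FSTRING_IN_EXECUTE = re.compile(r'cursor\.execute\s*\(\s*f["\']')

# Hypothetical snippet 1: matches the signature, but the interpolated value
# is a hard-coded constant, so nothing user-controlled reaches the query.
constant_table = '''
TABLE = "audit_log"
cursor.execute(f"SELECT COUNT(*) FROM {TABLE}")
'''

# Hypothetical snippet 2: does not match, although user input reaches the query,
# because the string is built on a different line from the execute() call.
indirect_injection = '''
name = request.GET.get("name")
query = "SELECT * FROM users WHERE name = " + name
cursor.execute(query)
'''

def sast_flags(snippet: str) -> bool:
    """Return True if the signature matches anywhere in the snippet."""
    return FSTRING_IN_EXECUTE.search(snippet) is not None

false_positive = sast_flags(constant_table)          # looks dangerous, is not
false_negative = not sast_flags(indirect_injection)  # is dangerous, looks fine
print(false_positive, false_negative)  # True True
```

The signature fires on the harmless query and stays silent on the injectable one. Both errors come from the same cause: the match is purely textual and says nothing about where the value came from.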

What AI Code Reasoning Does

AI code reasoning builds a semantic model of the application and reasons about what an attacker can do given the specific code, configuration, and data flows of your system. It doesn't look for patterns; it understands meaning.

The difference in what this produces:

Dimension	SAST Pattern Matching	AI Code Reasoning
Analysis unit	Individual code pattern	Complete application model
User input tracking	Not tracked, pattern must contain input reference	Traced from HTTP entry through all transformations
Context awareness	None, each pattern evaluated independently	Full, knows what's reachable, what's sanitized, what's exploitable
Framework understanding	Pattern library per framework	Semantic understanding of auth chain execution
False positive rate	40–70%	Near zero (findings are confirmed exploitable)
Novel vulnerability discovery	Cannot, only finds known patterns	Yes, finds vulnerabilities no pattern exists for
Authentication bypass detection	Only if bypass matches a known pattern	Yes, reads the auth chain and finds where it breaks
Business logic analysis	No, no concept of application intent	Yes, understands what the application should do

Step 1: Abstract Syntax Tree Construction: Building the Code's Skeleton

The first thing the AI does with your source code is parse it into an Abstract Syntax Tree (AST). This is the foundation everything else is built on.

An AST is a tree representation of the program's structure. Every function definition, variable assignment, conditional, and function call becomes a node in the tree with a defined type and defined relationships to its children.

# Source code:
def get_user_document(request, doc_id):
    user_id = request.user.id
    doc = Document.objects.get(id=doc_id)
    return doc.content

# Corresponding AST (simplified):
FunctionDef(
    name='get_user_document',
    args=['request', 'doc_id'],
    body=[
        Assign(
            target=Name('user_id'),
            value=Attribute(
                value=Attribute(
                    value=Name('request'),
                    attr='user'
                ),
                attr='id'
            )
        ),
        Assign(
            target=Name('doc'),
            value=Call(
                func=Attribute(
                    value=Attribute(
                        value=Name('Document'),
                        attr='objects'
                    ),
                    attr='get'
                ),
                args=[],
                kwargs={'id': Name('doc_id')}  # ← doc_id used here
            )
        ),
        Return(
            value=Attribute(
                value=Name('doc'),
                attr='content'
            )
        )
    ]
)
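
The simplified tree above can be compared with what Python's standard `ast` module actually produces. The following is a small sketch that parses the same function and pulls out the pieces the simplified tree highlights. The variable names are illustrative, and the real node layout is more verbose than the simplified rendering (for example, every `Name` also carries a load/store context).

```python
import ast

source = '''
def get_user_document(request, doc_id):
    user_id = request.user.id
    doc = Document.objects.get(id=doc_id)
    return doc.content
'''

tree = ast.parse(source)
func = tree.body[0]  # the FunctionDef node

# Parameter names and the statement types that make up the body
param_names = [a.arg for a in func.args.args]
node_types = [type(stmt).__name__ for stmt in func.body]

# The second statement assigns the result of Document.objects.get(id=doc_id);
# its keyword arguments correspond to kwargs={'id': Name('doc_id')} above.
orm_call = func.body[1].value
keywords = {kw.arg: kw.value.id for kw in orm_call.keywords}

print(param_names)  # ['request', 'doc_id']
print(node_types)   # ['Assign', 'Assign', 'Return']
print(keywords)     # {'id': 'doc_id'}

# Full dump of the return statement; the exact text varies by Python version.
print(ast.dump(func.body[2]))
```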

With the AST, the AI can answer structural questions about the code:

Which functions are called from which other functions? (Call graph)
Which variables are assigned from which sources? (Data dependency)
Which code paths are conditionally executed? (Control flow)
Which parameters flow into which function calls? (Parameter tracing)
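
As an illustration of the last question, parameter tracing, here is a minimal sketch built on the standard `ast` module. The function name `trace_parameters_into_calls` is hypothetical, and the sketch only sees parameters that appear directly in a call's arguments. It does not follow values through intermediate variables. It uses `ast.unparse`, which assumes Python 3.9 or later.

```python
import ast

def trace_parameters_into_calls(source: str, function_name: str):
    """List (parameter, call target, line) for every call inside
    `function_name` whose arguments mention a parameter directly."""
    tree = ast.parse(source)
    flows = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            params = {a.arg for a in node.args.args}
            for call in ast.walk(node):
                if not isinstance(call, ast.Call):
                    continue
                # Positional and keyword arguments are both inspected
                arg_nodes = list(call.args) + [kw.value for kw in call.keywords]
                for arg in arg_nodes:
                    for name in ast.walk(arg):
                        if isinstance(name, ast.Name) and name.id in params:
                            flows.append((name.id, ast.unparse(call.func), call.lineno))
    return flows

example = (
    "def get_user_document(request, doc_id):\n"
    "    user_id = request.user.id\n"
    "    doc = Document.objects.get(id=doc_id)\n"
    "    return doc.content\n"
)

flows = trace_parameters_into_calls(example, "get_user_document")
print(flows)  # [('doc_id', 'Document.objects.get', 3)]
```

The result shows `doc_id` flowing into the ORM lookup, while `request` never reaches any call. That absence is the structural hint that the requesting user is not consulted when the document is fetched.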

The AST is not just a representation; it's a queryable data structure. The AI can ask: "Show me every call to Document.objects.get(): what parameters are passed, where do those parameters come from, and do any of them originate from user input?"

import ast

def find_orm_calls_without_user_filter(source_code, orm_model_name):
    """
    Find all Model.objects calls (.get(), .filter(), etc.) that don't include a user ownership filter.
    This is the programmatic detection of missing authorization.
    """
    tree = ast.parse(source_code)

    vulnerable_calls = []

    for node in ast.walk(tree):
        # Find all attribute access chains (e.g., Document.objects.get())
        if isinstance(node, ast.Call):
            func = node.func

            # Check if this is a Model.objects.get() or Model.objects.filter() call
            if (isinstance(func, ast.Attribute) and
                func.attr in ('get', 'filter', 'exclude', 'first', 'last') and
                isinstance(func.value, ast.Attribute) and
                func.value.attr == 'objects'):

                model_name = func.value.value.id if isinstance(func.value.value, ast.Name) else None

                if model_name == orm_model_name:
                    # Check keywords for user/owner filter
                    has_user_filter = any(
                        kw.arg in ('user', 'owner', 'user_id', 'created_by', 'tenant')
                        for kw in node.keywords
                    )

                    if not has_user_filter:
                        vulnerable_calls.append({
                            'line': node.lineno,
                            'col': node.col_offset,
                            'model': model_name,
                            'method': func.attr,
                            'has_user_filter': False,
                            'finding': f'Missing user ownership filter in {model_name}.objects.{func.attr}()',
                            # Plain string on purpose: `request` is not defined inside the analyzer
                            'recommendation': 'Add user=request.user or owner=request.user to filter'
                        })

    return vulnerable_calls

# Example usage:
with open('views/documents.py') as f:
    source = f.read()
findings = find_orm_calls_without_user_filter(source, 'Document')

for finding in findings:
    print(f"Line {finding['line']}: {finding['finding']}")
# Output:
# Line 47: Missing user ownership filter in Document.objects.get()
# Line 93: Missing user ownership filter in Document.objects.filter()
# Line 156: Missing user ownership filter in Document.objects.first()

This is not pattern matching. The AST-based analysis understands the code's structure: which calls belong to which models, what keyword arguments are present, and what they mean. It can reason about the absence of required arguments, not just the presence of suspicious patterns.

Step 2: Call Graph Construction: Understanding How Functions Connect

With the AST built, the AI constructs a call graph, a directed graph where each node is a function or method, and each edge represents "this function calls that function."

The call graph answers the question: if an HTTP request arrives at endpoint X, which functions execute, in what order, with what data?

import ast
from collections import defaultdict

class CallGraphBuilder(ast.NodeVisitor):
    """Build a call graph from Python source code"""

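    def __init__(self, current_file='<unknown>'):
        self.call_graph = defaultdict(set)  # function → set of functions it calls
        self.current_function = None
        self.current_file = current_file  # file being visited; read by visit_FunctionDef
        self.function_locations = {}  # function name → (file, line)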

    def visit_FunctionDef(self, node):
        """Record function definition"""
        previous_function = self.current_function
        self.current_function = node.name
        self.function_locations[node.name] = (self.current_file, node.lineno)

        # Visit the function body
        self.generic_visit(node)

        self.current_function = previous_function

    def visit_Call(self, node):
        """Record function calls"""
        if self.current_function:
            # Resolve the called function's name
            called_function = self.resolve_call_target(node)

            if called_function:
                self.call_graph[self.current_function].add(called_function)

        self.generic_visit(node)

    def resolve_call_target(self, call_node):
        """Extract the name of the function being called"""
        func = call_node.func

        if isinstance(func, ast.Name):
            return func.id
        elif isinstance(func, ast.Attribute):
            # method.call() → "method.call"
            if isinstance(func.value, ast.Name):
                return f"{func.value.id}.{func.attr}"
            elif isinstance(func.value, ast.Attribute):
                # chain.method.call()
                return f"{func.value.attr}.{func.attr}"

        return None

    def find_all_callers(self, target_function):
        """Find all functions that eventually call target_function"""
        callers = set()

        def dfs(func):
            for caller, callees in self.call_graph.items():
                if func in callees and caller not in callers:
                    callers.add(caller)
                    dfs(caller)  # Recursive — find callers of callers

        dfs(target_function)
        return callers

    def find_all_callees(self, source_function, depth=10):
        """Find all functions called by source_function (transitive)"""
        visited = set()

        def dfs(func, current_depth):
            if current_depth == 0 or func in visited:
                return
            visited.add(func)
            for callee in self.call_graph.get(func, set()):
                dfs(callee, current_depth - 1)

        dfs(source_function, depth)
        return visited - {source_function}

The call graph enables critical security reasoning:

# Using the call graph for security analysis:

builder = CallGraphBuilder()
# ... (build graph from all source files)

# Question 1: Which HTTP endpoints eventually call cursor.execute()?
# (identifying all SQL execution paths from HTTP entry points)
endpoints = find_http_endpoints()  # returns view functions
sql_execution_paths = []

for endpoint in endpoints:
    callees = builder.find_all_callees(endpoint)
    if 'cursor.execute' in callees or 'cursor.executemany' in callees:
        sql_execution_paths.append({
            'endpoint': endpoint,
            'sql_functions': callees & {'cursor.execute', 'cursor.executemany'}
        })

# Question 2: Which functions call the dangerous subprocess.run?
# (finding command injection attack surface)
command_execution_callers = builder.find_all_callers('subprocess.run')
# Now check: do any of these callers receive user input?

# Question 3: Is the authentication check always executed before this admin function?
auth_callers = builder.find_all_callers('require_admin_auth')
admin_function_callers = builder.find_all_callers('admin_user_export')

# If admin_user_export has callers that don't call require_admin_auth → auth bypass
unprotected_paths = admin_function_callers - auth_callers
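
The third question can also be asked from the entry-point side: which endpoints can reach the sensitive function without the auth check appearing anywhere among their callees? Below is a self-contained sketch of that check on a small hand-written graph. The graph contents and the helper names `reachable_from` and `unguarded_entry_points` are hypothetical. Note the built-in simplification: finding the guard among the callees does not prove it runs before the sensitive call, so a hit here is a lead to investigate, not a confirmed bypass.

```python
from collections import deque

# Hypothetical call graph: caller -> set of callees (all names are illustrative).
CALL_GRAPH = {
    "admin_export_view": {"require_admin_auth", "admin_user_export"},
    "legacy_export_view": {"build_report"},
    "build_report": {"admin_user_export"},
    "health_check": {"ping_database"},
}

def reachable_from(graph, start):
    """Breadth-first search: every function transitively called by `start`."""
    seen, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for callee in graph.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

def unguarded_entry_points(graph, entry_points, sensitive, guard):
    """Entry points that reach `sensitive` with `guard` absent from their callees."""
    flagged = []
    for entry in entry_points:
        callees = reachable_from(graph, entry)
        if sensitive in callees and guard not in callees:
            flagged.append(entry)
    return flagged

entry_points = ["admin_export_view", "legacy_export_view", "health_check"]
result = unguarded_entry_points(
    CALL_GRAPH, entry_points, "admin_user_export", "require_admin_auth"
)
print(result)  # ['legacy_export_view']
```

The legacy view reaches `admin_user_export` through `build_report` and never touches `require_admin_auth`, which is the kind of indirect path that is easy to miss when reading one file at a time.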

Step 3: Taint Analysis: Following User Input Through the Entire Application

Taint analysis is the core of finding injection vulnerabilities. It tracks "tainted" data (data that originates from untrusted sources such as user input) as it flows through the application to "sinks": places where that data is used in a security-sensitive context.

Defining Sources and Sinks

# Sources: Where user-controlled data enters the application
TAINT_SOURCES = {
    # Django
    'request.GET.get': 'user_input',
    'request.POST.get': 'user_input',
    'request.data.get': 'user_input',      # DRF
    'request.query_params.get': 'user_input',
    'request.json.get': 'user_input',      # Flask
    'request.form.get': 'user_input',      # Flask
    'request.args.get': 'user_input',      # Flask

    # Generic HTTP frameworks
    'request.body': 'user_input',
    'request.path': 'user_input',
    'request.headers.get': 'user_input',
    'request.cookies.get': 'user_input',

    # File uploads
    'request.FILES.get': 'user_input',
    'request.files.get': 'user_input',     # Flask
}

# Sinks: Where tainted data causes vulnerabilities when it arrives unsanitized
TAINT_SINKS = {
    # SQL injection sinks
    'cursor.execute': {'vuln_type': 'SQL Injection', 'severity': 'HIGH'},
    'cursor.executemany': {'vuln_type': 'SQL Injection', 'severity': 'HIGH'},
    'connection.execute': {'vuln_type': 'SQL Injection', 'severity': 'HIGH'},
    'Model.objects.raw': {'vuln_type': 'SQL Injection', 'severity': 'HIGH'},
    'session.execute': {'vuln_type': 'SQL Injection', 'severity': 'HIGH'},

    # Command injection sinks
    'os.system': {'vuln_type': 'Command Injection', 'severity': 'CRITICAL'},
    'subprocess.run': {'vuln_type': 'Command Injection', 'severity': 'CRITICAL'},
    'subprocess.Popen': {'vuln_type': 'Command Injection', 'severity': 'CRITICAL'},
    'subprocess.call': {'vuln_type': 'Command Injection', 'severity': 'CRITICAL'},
    'eval': {'vuln_type': 'Code Injection', 'severity': 'CRITICAL'},
    'exec': {'vuln_type': 'Code Injection', 'severity': 'CRITICAL'},

    # Path traversal sinks
    'open': {'vuln_type': 'Path Traversal', 'severity': 'HIGH'},
    'os.path.join': {'vuln_type': 'Path Traversal', 'severity': 'MEDIUM'},
    'os.listdir': {'vuln_type': 'Path Traversal', 'severity': 'MEDIUM'},
    'os.remove': {'vuln_type': 'Arbitrary File Deletion', 'severity': 'HIGH'},

    # SSRF sinks
    'requests.get': {'vuln_type': 'SSRF', 'severity': 'HIGH'},
    'requests.post': {'vuln_type': 'SSRF', 'severity': 'HIGH'},
    'urllib.request.urlopen': {'vuln_type': 'SSRF', 'severity': 'HIGH'},
    'httpx.get': {'vuln_type': 'SSRF', 'severity': 'HIGH'},

    # Template injection sinks
    'render_template_string': {'vuln_type': 'SSTI', 'severity': 'CRITICAL'},
    'Template': {'vuln_type': 'SSTI', 'severity': 'CRITICAL'},
    'jinja2.Template': {'vuln_type': 'SSTI', 'severity': 'CRITICAL'},

    # Deserialization sinks
    'pickle.loads': {'vuln_type': 'Unsafe Deserialization', 'severity': 'CRITICAL'},
    'pickle.load': {'vuln_type': 'Unsafe Deserialization', 'severity': 'CRITICAL'},
    'yaml.load': {'vuln_type': 'Unsafe Deserialization', 'severity': 'HIGH'},
    'marshal.loads': {'vuln_type': 'Unsafe Deserialization', 'severity': 'CRITICAL'},

    # XSS sinks (server-side rendering)
    'mark_safe': {'vuln_type': 'XSS', 'severity': 'HIGH'},
    'Markup': {'vuln_type': 'XSS', 'severity': 'HIGH'},
}

The Taint Propagation Engine

import ast


class TaintPropagationEngine:
    """
    Track tainted data through the application.
    Follows user input from source to sink through all transformations.
    """

    def __init__(self, ast_trees, call_graph):
        self.ast_trees = ast_trees
        self.call_graph = call_graph
        self.taint_facts = {}  # variable → taint_label
        self.findings = []

    def analyze_function(self, function_node, initial_taint=None):
        """
        Analyze a function for taint flow.
        initial_taint: dict of parameter_name → taint_label
        """
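        # Copy so the caller's mapping is never mutated
        taint_state = dict(initial_taint or {})

        # ast.walk yields nodes in no guaranteed order, so this single pass
        # only approximates execution order
        for stmt in ast.walk(function_node):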

            # Assignment: propagate taint from right to left
            if isinstance(stmt, ast.Assign):
                for target in stmt.targets:
                    if self.is_tainted(stmt.value, taint_state):
                        taint_label = self.get_taint_label(stmt.value, taint_state)

                        if isinstance(target, ast.Name):
                            taint_state[target.id] = taint_label

            # Call: check if tainted data reaches a sink
            elif isinstance(stmt, ast.Call):
                sink_name = self.resolve_call_name(stmt)

                if sink_name in TAINT_SINKS:
                    # Check if any argument is tainted
                    for i, arg in enumerate(stmt.args):
                        if self.is_tainted(arg, taint_state):
                            self.report_finding(
                                sink=sink_name,
                                taint_label=self.get_taint_label(arg, taint_state),
                                sink_info=TAINT_SINKS[sink_name],
                                argument_index=i,
                                location=stmt
                            )

                    for kw in stmt.keywords:
                        if self.is_tainted(kw.value, taint_state):
                            self.report_finding(
                                sink=sink_name,
                                taint_label=self.get_taint_label(kw.value, taint_state),
                                sink_info=TAINT_SINKS[sink_name],
                                keyword=kw.arg,
                                location=stmt
                            )

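                # Inter-procedural: propagate taint into called functions
                # (recursive call chains would also need a visited set to terminate)
                elif sink_name in self.call_graph:
                    called_func = self.find_function(sink_name)
                    if called_func:
                        params = [p.arg for p in called_func.args.args]
                        param_taint = {}

                        # Positional arguments map to parameters by index
                        for i, arg in enumerate(stmt.args):
                            if i < len(params) and self.is_tainted(arg, taint_state):
                                param_taint[params[i]] = self.get_taint_label(arg, taint_state)

                        # Keyword arguments map to parameters by name
                        for kw in stmt.keywords:
                            if kw.arg in params and self.is_tainted(kw.value, taint_state):
                                param_taint[kw.arg] = self.get_taint_label(kw.value, taint_state)

                        if param_taint:
                            # Recursively analyze the called function with tainted params
                            self.analyze_function(called_func, param_taint)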

        return taint_state

    def is_tainted(self, node, taint_state):
        """Check if an AST node represents tainted data"""
        if isinstance(node, ast.Name):
            return node.id in taint_state

        elif isinstance(node, ast.Call):
            func_name = self.resolve_call_name(node)
            return func_name in TAINT_SOURCES

        elif isinstance(node, ast.JoinedStr):  # f-string
            # f-string is tainted if any of its values are tainted
            return any(
                self.is_tainted(value, taint_state)
                for value in node.values
                if isinstance(value, ast.FormattedValue)
            )

        elif isinstance(node, ast.BinOp):  # String concatenation
            return (self.is_tainted(node.left, taint_state) or
                    self.is_tainted(node.right, taint_state))

        elif isinstance(node, ast.Subscript):  # dict['key'] or list[i]
            return self.is_tainted(node.value, taint_state)

        elif isinstance(node, ast.Attribute):  # obj.attr
            return self.is_tainted(node.value, taint_state)

        return False
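
The engine above calls resolve_call_name and get_taint_label without showing them. Below is a minimal sketch of what such helpers might look like, written as standalone functions so it runs on its own. The function bodies are assumptions for illustration, not the engine's actual implementation, and the one-entry tables stand in for the full TAINT_SOURCES and TAINT_SINKS dictionaries.

```python
import ast

# Tiny stand-ins for the full tables shown earlier (illustrative only)
TAINT_SOURCES = {'request.GET.get': 'user_input'}
TAINT_SINKS = {'cursor.execute': {'vuln_type': 'SQL Injection', 'severity': 'HIGH'}}


def resolve_call_name(call):
    """Turn the callee of an ast.Call into a dotted name such as 'request.GET.get'.

    Returns None when the callee is not a plain chain of names and
    attributes (for example, the result of another call or a subscript).
    """
    parts = []
    node = call.func
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if not isinstance(node, ast.Name):
        return None
    parts.append(node.id)
    return '.'.join(reversed(parts))


def get_taint_label(node, taint_state):
    """Return the taint label carried by an expression, or None if untainted."""
    if isinstance(node, ast.Name):
        return taint_state.get(node.id)
    if isinstance(node, ast.Call):
        return TAINT_SOURCES.get(resolve_call_name(node))
    # Compound expressions: return the first label found in any child node
    for child in ast.iter_child_nodes(node):
        label = get_taint_label(child, taint_state)
        if label is not None:
            return label
    return None


# Two statements: a source assignment, then a sink call using the variable
source = "term = request.GET.get('q')\ncursor.execute('SELECT ' + term)\n"
assign, expr = ast.parse(source).body

state = {}
label = get_taint_label(assign.value, state)
if label:
    state[assign.targets[0].id] = label

sink_call = expr.value
sink_name = resolve_call_name(sink_call)
arg_label = get_taint_label(sink_call.args[0], state)
print(sink_name, arg_label)
```

Resolving names purely from syntax keeps the sketch short, but it means an aliased import or a differently named cursor variable would not match the tables; a real engine would need type or import resolution on top of this.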

A Real Taint Trace: Finding a Non-Obvious SQL Injection

This example shows taint analysis finding a SQL injection that SAST pattern matching cannot, because the injection point is separated from the source by multiple function calls:

# views.py — HTTP entry point
class ProductSearchView(APIView):
    def get(self, request):
        search_term = request.GET.get('q', '')          # TAINT SOURCE
        category = request.GET.get('category', '')      # TAINT SOURCE
        sort_field = request.GET.get('sort_field', 'name')  # TAINT SOURCE

        results = ProductSearchService.search(
            query=search_term,
            category=category,
            sort=sort_field       # ← taint propagates into search()
        )
        return Response(results)

# services/product_search.py — taint travels here
class ProductSearchService:
    @staticmethod
    def search(query, category, sort='name'):
        # Applies basic text filtering (safe — uses ORM)
        queryset = Product.objects.filter(
            name__icontains=query,      # Safe — ORM parameterizes
            category=category           # Safe — ORM parameterizes
        )

        # Passes sort to the ordering method — taint travels further
        return ProductSearchService._apply_ordering(queryset, sort)

    @staticmethod
    def _apply_ordering(queryset, sort_field):
        # TAINT ARRIVES HERE
        # sort_field originated from request.GET — user controlled

        # SINK: raw SQL with f-string — SQL injection
        return queryset.extra(
            order_by=[f"products_product.{sort_field}"]  # ← VULNERABLE
        )

The taint trace:

TAINT PATH:
  Source:  request.GET.get('sort_field')          [views.py:7]
           Label: user_input

  Step 1:  sort_field passed to ProductSearchService.search()
           [views.py:11, services/product_search.py:17]
           Label propagates: sort → user_input

  Step 2:  sort passed to _apply_ordering() as sort_field
           [services/product_search.py:26]
           Label propagates: sort_field → user_input

  Sink:    queryset.extra(order_by=[f"...{sort_field}"])
           [services/product_search.py:33]
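
One common way to break a taint path like this is to validate the sort parameter against an allowlist before it reaches any query-building code. The sketch below is illustrative only: ALLOWED_SORT_FIELDS, safe_sort_field, and the column names are hypothetical and do not come from the application above.

```python
# Hypothetical allowlist: the only columns a client may sort by
ALLOWED_SORT_FIELDS = frozenset({'name', 'price', 'created_at'})


def safe_sort_field(requested, default='name'):
    """Return `requested` only if it names a known column, else the default.

    A leading '-' (Django's descending-order prefix) is preserved.
    """
    descending = requested.startswith('-')
    column = requested[1:] if descending else requested
    if column not in ALLOWED_SORT_FIELDS:
        return default
    return f'-{column}' if descending else column


print(safe_sort_field('price'))                              # price
print(safe_sort_field('-created_at'))                        # -created_at
print(safe_sort_field('name; DROP TABLE products_product'))  # name
```

Because the returned value is always one of a fixed set of strings, it can be passed to queryset.order_by() and the user-controlled text never reaches extra() at all.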

Step 4: Authentication Configuration Analysis: Reading the Security Chain

This analysis category is the one most distinctive of genuine AI code reasoning, and the one that finds the vulnerabilities least visible to external testing.

Frameworks such as Spring Security, Django, and Express.js implement authentication as a chain of filters or middleware that runs before the request reaches the controller. The security guarantee is only as strong as that chain is complete: a path that skips the chain skips authentication entirely.

Reading Spring Security Filter Chains

// The AI reads every SecurityConfig class in the repository

@Configuration
@EnableWebSecurity
public class ApplicationSecurityConfig {

    // Configuration 1: The main filter chain
    @Bean
    @Order(1)
    public SecurityFilterChain apiFilterChain(HttpSecurity http) throws Exception {
        http
            .securityMatcher("/api/**")
            .authorizeHttpRequests(auth -> auth
                .requestMatchers("/api/public/**").permitAll()
                .requestMatchers("/api/v1/admin/**").hasRole("ADMIN")
                .requestMatchers("/api/v1/users/**").hasRole("USER")
                .anyRequest().authenticated()
            )
            .oauth2ResourceServer(OAuth2ResourceServerConfigurer::jwt);
        return http.build();
    }

    // Configuration 2: The web security customizer
    // THIS IS THE VULNERABILITY — excludes entire namespace from ALL security
    @Bean
    public WebSecurityCustomizer webSecurityCustomizer() {
        return (web) -> web.ignoring()
            .requestMatchers(
                new AntPathRequestMatcher("/api/v2/**"),      // ← BYPASS
                new AntPathRequestMatcher("/actuator/health"),
                new AntPathRequestMatcher("/swagger-ui/**")
            );
    }

    // Configuration 3: Method security
    @Bean
    public MethodSecurityExpressionHandler methodSecurityExpressionHandler() {
        DefaultMethodSecurityExpressionHandler handler =
            new DefaultMethodSecurityExpressionHandler();
        return handler;
    }
}

What the AI extracts from this configuration:

# Auth config analysis results:

auth_config = {
    'protected_paths': [
        {
            'pattern': '/api/public/**',
            'requirement': 'NONE (permitAll)',
            'is_public': True
        },
        {
            'pattern': '/api/v1/admin/**',
            'requirement': 'ROLE_ADMIN',
            'is_public': False
        },
        {
            'pattern': '/api/v1/users/**',
            'requirement': 'ROLE_USER',
            'is_public': False
        },
        {
            'pattern': '/api/**',  # anyRequest
            'requirement': 'authenticated',
            'is_public': False
        }
    ],

    'security_exclusions': [
        {
            'pattern': '/api/v2/**',
            'excluded_from': 'ALL security filters',
            'impact': 'CRITICAL — completely bypasses authentication',
            'finding': 'Any endpoint under /api/v2/ is accessible without credentials'
        },
        {
            'pattern': '/actuator/health',
            'excluded_from': 'ALL security filters',
            'impact': 'LOW — health endpoint, typically low sensitivity'
        }
    ]
}

# Cross-reference with discovered routes:
# From controller scan:
#   /api/v2/admin/users      → GET, POST — returns all users, creates users
#   /api/v2/data/export      → GET — exports complete dataset
#   /api/v2/internal/config  → GET — returns application configuration

# Result: Security exclusion of /api/v2/** makes ALL of these public
# Finding: Authentication bypass via webSecurityCustomizer exclusion
# CVSS: 9.8 (Critical) — unauthenticated access to admin, export, config endpoints
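
The cross-reference step above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: `ant_to_regex` and `find_exposed_routes` are hypothetical names, and the pattern translation handles only `*` and `**`, which is a simplification of Spring's real Ant-style matcher (for example, it does not treat `/api/v2` without a trailing slash as a match).

```python
import re


def ant_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified Ant-style path pattern into a regex.

    Only '**' (any characters, including '/') and '*' (any characters
    except '/') are supported. This is an assumption-laden simplification.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(parts) + "$")


def find_exposed_routes(exclusions, routes):
    """Return (route, pattern) pairs for routes matching a security exclusion."""
    compiled = [(p, ant_to_regex(p)) for p in exclusions]
    exposed = []
    for route in routes:
        for pattern, regex in compiled:
            if regex.match(route):
                exposed.append((route, pattern))
                break
    return exposed


# Exclusion patterns from the WebSecurityCustomizer and routes from the controller scan
exclusions = ["/api/v2/**", "/actuator/health", "/swagger-ui/**"]
routes = [
    "/api/v1/admin/users",
    "/api/v2/admin/users",
    "/api/v2/data/export",
    "/api/v2/internal/config",
]
exposed = find_exposed_routes(exclusions, routes)
for route, pattern in exposed:
    print(f"{route} is excluded from security by {pattern}")
```

The point of the sketch is that the finding only emerges when two independent facts are joined: the exclusion list from the security configuration and the route list from the controllers.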

Reading Django Middleware Chains

# Django settings.py — middleware configuration

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    # AuthenticationMiddleware (above) only populates request.user; it does not require login
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
    # Missing: any middleware that enforces login or permissions
    # → every view must enforce authentication and authorization itself
]

# The AI reads every view in the application and checks for missing decorators

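# views/admin.py: what the analysis finds

from django.contrib.auth.decorators import login_required
from django.contrib.auth.models import User
from django.forms.models import model_to_dict
from django.http import JsonResponse
from django.utils.decorators import method_decorator
from django.views import View
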
class AdminUserExportView(View):
    # MISSING: @method_decorator(login_required, name='dispatch')
    # MISSING: @method_decorator(permission_required('admin.view_user'), name='dispatch')

    def get(self, request):
        # This view is completely public despite handling admin data
        users = User.objects.all().values(
            'id', 'email', 'first_name', 'last_name',
            'phone', 'address', 'date_joined'
        )
        return JsonResponse({'users': list(users)})

class UserProfileView(View):
    @method_decorator(login_required)  # ✓ Protected
    def dispatch(self, *args, **kwargs):
        return super().dispatch(*args, **kwargs)

    def get(self, request, user_id):
        # Protected by login_required but missing ownership check
        user = User.objects.get(id=user_id)  # ← IDOR
        return JsonResponse(model_to_dict(user))

The analysis produces:

Authentication Analysis Results:

File: views/admin.py

Finding 1: Missing authentication on AdminUserExportView
  Class: AdminUserExportView
  Decorator: NONE (no @login_required or @permission_required)
  URL pattern: /api/admin/users/export/ (from urls.py analysis)
  Accessible: Unauthenticated
  Data exposed: All user PII (email, name, phone, address)
  CVSS: 9.1 (Critical)

Finding 2: Missing authorization on UserProfileView
  Class: UserProfileView
  Decorator: @login_required (authentication enforced)
  Authorization: MISSING (no owner check)
  URL pattern: /api/users/<id>
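
A check like Finding 1 can be approximated with Python's standard `ast` module. The sketch below is an assumption about how such a check might look, not the actual analysis: `find_unprotected_views` and `AUTH_DECORATORS` are hypothetical names, and it only looks at decorators on the class and on `dispatch()`. It does not account for mixins such as `LoginRequiredMixin`, URL-level wrappers, or custom middleware, all of which a real analysis would need to resolve.

```python
import ast

# Decorator names treated as authentication/authorization checks (assumption)
AUTH_DECORATORS = {"login_required", "permission_required"}


def _mentions_auth(node: ast.AST) -> bool:
    """True if any name inside the decorator expression is an auth decorator."""
    for sub in ast.walk(node):
        if isinstance(sub, ast.Name) and sub.id in AUTH_DECORATORS:
            return True
        if isinstance(sub, ast.Attribute) and sub.attr in AUTH_DECORATORS:
            return True
    return False


def find_unprotected_views(source: str) -> list:
    """Return names of View subclasses with no auth decorator on the
    class itself or on its dispatch() method."""
    unprotected = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.ClassDef):
            continue
        base_names = {ast.unparse(base).split(".")[-1] for base in node.bases}
        if "View" not in base_names:
            continue
        decorators = list(node.decorator_list)
        for item in node.body:
            if isinstance(item, ast.FunctionDef) and item.name == "dispatch":
                decorators.extend(item.decorator_list)
        if not any(_mentions_auth(d) for d in decorators):
            unprotected.append(node.name)
    return unprotected


SAMPLE = '''
class AdminUserExportView(View):
    def get(self, request):
        return JsonResponse({})

class UserProfileView(View):
    @method_decorator(login_required)
    def dispatch(self, *args, **kwargs):
        return super().dispatch(*args, **kwargs)
'''

unprotected_views = find_unprotected_views(SAMPLE)
print(unprotected_views)
```

Note that this catches only the missing-authentication case. The IDOR in Finding 2 passes this check, because the view is decorated; detecting it requires reasoning about whether `user_id` is compared against `request.user`.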

Reading Express.js Middleware Order

// app.js — Express application setup

const express = require('express');
const app = express();

// Route 1: Admin routes — registered BEFORE auth middleware
// This is the vulnerability — middleware applies to routes registered after it
app.get('/api/admin/users', (req, res) => {
    // This handler executes WITHOUT any authentication check
    db.query('SELECT * FROM users').then(users => {
        res.json(users);
    });
});

app.get('/api/admin/export', (req, res) => {
    // Also unprotected — same reason
    res.download('exports/users.csv');
});

// Auth middleware registered here — too late for the admin routes above
const authenticate = require('./middleware/auth');
app.use(authenticate);  // ← Applies only to routes registered AFTER this line

// Route 2: Protected routes — registered AFTER auth middleware
app.get('/api/users/profile', (req, res) => {
    // This IS protected — registered after authenticate middleware
    res.json(req.user);
});

The AI analysis reconstructs the middleware execution order:

# Middleware ordering analysis for Express.js apps:
# Sketch only: ast_tree is assumed to be the JavaScript AST exposed through a
# Python-ast-like interface. is_express_route_registration, is_middleware_registration,
# is_auth_middleware, extract_path, and extract_handler_name are helpers not shown here.

import ast

def analyze_express_middleware_order(ast_tree):
    """
    Reconstruct the order in which middleware and routes are registered.
    Routes registered before a middleware don't go through it.
    """
    registration_order = []

    # Walk the AST looking for app.use() and app.get/post/etc() calls
    for node in ast.walk(ast_tree):
        if isinstance(node, ast.Call):
            if is_express_route_registration(node) or is_middleware_registration(node):
                registration_order.append({
                    'type': 'middleware' if is_middleware_registration(node) else 'route',
                    'path': extract_path(node),
                    'handler': extract_handler_name(node),
                    'line': node.lineno
                })

    # ast.walk() yields nodes in no specified order, so restore source order
    registration_order.sort(key=lambda item: item['line'])

    # Find the first authentication middleware registration
    auth_middleware_line = None
    for item in registration_order:
        if item['type'] == 'middleware' and is_auth_middleware(item['handler']):
            auth_middleware_line = item['line']
            break

    # Routes registered before that line never pass through the auth middleware
    unprotected_routes = []
    if auth_middleware_line is not None:
        for item in registration_order:
            if (item['type'] == 'route' and
                item['line'] < auth_middleware_line):
                unprotected_routes.append({
                    'path': item['path'],
                    'line': item['line'],
                    'finding': 'Route registered before authentication middleware'
                })

    return unprotected_routes

# Result:
# Unprotected route: GET /api/admin/users (line 8, before auth at line 22)
# Unprotected route: GET /api/admin/export (line 15, before auth at line 22)
# Finding: Authentication bypass via middleware ordering
# Impact: Two admin endpoints fully accessible without credentials
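
For readers who want something they can run, here is a deliberately simplified, text-based version of the same ordering check. It is a sketch under stated assumptions: `find_routes_before_auth` is a hypothetical name, the app object is assumed to be called `app`, the auth middleware is assumed to be passed by identifier (here `authenticate`), and each registration is assumed to start on its own line. A real analysis would parse the JavaScript AST and follow routers, mounted sub-apps, and per-route middleware.

```python
import re

# Matches app.use(...) / app.get(...) etc. and captures either a quoted
# path (group 3) or a bare identifier such as a middleware name (group 4).
REGISTRATION = re.compile(
    r"\bapp\.(use|get|post|put|patch|delete)\(\s*"
    r"(?:(['\"])(.*?)\2|([A-Za-z_$][\w$]*))"
)


def find_routes_before_auth(js_source, auth_names=("authenticate",)):
    """Return (METHOD, path, line) for routes registered before auth middleware."""
    registrations = []
    for lineno, line in enumerate(js_source.splitlines(), start=1):
        match = REGISTRATION.search(line)
        if match:
            registrations.append(
                (lineno, match.group(1), match.group(3), match.group(4))
            )

    # Line of the first app.use(<auth middleware>) call, if any
    auth_line = next(
        (ln for ln, method, _path, ident in registrations
         if method == "use" and ident in auth_names),
        None,
    )

    # With no auth middleware at all, every route is reported
    return [
        (method.upper(), path, ln)
        for ln, method, path, _ident in registrations
        if method != "use" and (auth_line is None or ln < auth_line)
    ]


SAMPLE = """\
const app = express();
app.get('/api/admin/users', (req, res) => { res.json([]); });
app.get('/api/admin/export', (req, res) => { res.download('x.csv'); });
app.use(authenticate);
app.get('/api/users/profile', (req, res) => { res.json(req.user); });
"""

unprotected = find_routes_before_auth(SAMPLE)
for method, path, line in unprotected:
    print(f"Unprotected route: {method} {path} (line {line})")
```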

Step 5: Control Flow Analysis: Understanding What's Actually Reachable

SAST tools flag every instance of a dangerous pattern regardless of whether it's reachable by user input. Control flow analysis determines which code paths are actually reachable from an HTTP entry point, with what inputs, under what conditions.

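import ast


class ControlFlowAnalyzer:
    """
    Build a Control Flow Graph (CFG) and determine reachability.
    """

    def __init__(self):
        self.cfg = {}  # block_id → {successors, predecessors, statements}
        self.block_counter = 0

    def new_block(self):
        """Allocate an empty basic block and return its id"""
        block_id = self.block_counter
        self.block_counter += 1
        self.cfg[block_id] = {'successors': [], 'predecessors': [], 'statements': []}
        return block_id

    def add_edge(self, source, target, condition=None):
        """Connect two blocks; condition labels the branch that is taken"""
        self.cfg[source]['successors'].append((target, condition))
        self.cfg[target]['predecessors'].append(source)

    def build_cfg(self, function_node):
        """Build CFG for a function (top-level statements only, for brevity)"""
        entry_block = self.new_block()
        current = entry_block  # block that receives the next statement

        for stmt in function_node.body:
            if isinstance(stmt, ast.If):
                # Branch: create two paths, merge at convergence
                then_block = self.new_block()
                else_block = self.new_block()
                merge_block = self.new_block()
                test = ast.unparse(stmt.test)

                self.add_edge(current, then_block, condition=test)
                self.add_edge(current, else_block, condition=f"NOT ({test})")
                self.add_edge(then_block, merge_block)
                self.add_edge(else_block, merge_block)
                self.cfg[then_block]['statements'].extend(stmt.body)
                self.cfg[else_block]['statements'].extend(stmt.orelse)
                current = merge_block

            elif isinstance(stmt, ast.Try):
                # Exception handling: normal path and exception path
                try_block = self.new_block()
                except_block = self.new_block()
                merge_block = self.new_block()

                self.add_edge(current, try_block)
                self.add_edge(try_block, except_block, condition='exception')
                self.add_edge(try_block, merge_block)
                self.add_edge(except_block, merge_block)
                self.cfg[try_block]['statements'].extend(stmt.body)
                current = merge_block

            else:
                self.cfg[current]['statements'].append(stmt)

        return entry_block

    def is_reachable_from_http(self, function_name, target_line):
        """
        Determine if target_line (inside function_name) is reachable from an
        HTTP endpoint without an intervening authentication check.
        Reachability is resolved at function granularity in this sketch.
        find_http_endpoints, find_path, and find_auth_checks_in_path are
        framework-specific helpers not shown here.
        """
        # Find all HTTP endpoints (Django views, Flask routes, Express handlers)
        http_endpoints = self.find_http_endpoints()
        protected_result = None

        for endpoint in http_endpoints:
            # BFS through call graph from endpoint to target function
            path = self.find_path(
                source=endpoint,
                target=function_name
            )
            if not path:
                continue

            # Check if any step in the path includes an auth check
            auth_checks = self.find_auth_checks_in_path(path)

            if not auth_checks:
                # A single unauthenticated path is enough to report a finding
                return {
                    'reachable': True,
                    'path': path,
                    'auth_protected': False,
                    'finding': f'{function_name} reachable from {endpoint} without auth'
                }

            # Remember the first protected path, but keep checking the other
            # endpoints: one of them may reach the target without auth
            if protected_result is None:
                protected_result = {
                    'reachable': True,
                    'path': path,
                    'auth_protected': True,
                    'auth_check_location': auth_checks
                }

        return protected_result or {'reachable': False}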
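
The call-graph search at the heart of the reachability check can be shown as a self-contained sketch. The names here (`find_path`, `unauthenticated_paths`, the toy `CALL_GRAPH`) are illustrative assumptions, and auth enforcement is modeled crudely as a set of functions known to check credentials before calling anything else. The sketch also follows only the shortest path per endpoint; a thorough analysis would enumerate every path.

```python
from collections import deque


def find_path(call_graph, source, target):
    """Breadth-first search; returns the shortest call chain or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for callee in call_graph.get(node, ()):
            if callee not in visited:
                visited.add(callee)
                queue.append(path + [callee])
    return None


def unauthenticated_paths(call_graph, endpoints, target, auth_enforcing):
    """Return each endpoint-to-target path that includes no auth-enforcing function."""
    findings = []
    for endpoint in endpoints:
        path = find_path(call_graph, endpoint, target)
        if path and not any(step in auth_enforcing for step in path):
            findings.append(path)
    return findings


# Toy call graph: caller -> list of callees
CALL_GRAPH = {
    "admin_export_view": ["export_users"],
    "profile_view": ["load_profile"],
    "load_profile": ["run_query"],
    "export_users": ["run_query"],
}

findings = unauthenticated_paths(
    CALL_GRAPH,
    endpoints=["admin_export_view", "profile_view"],
    target="run_query",
    auth_enforcing={"profile_view"},  # e.g. decorated with login_required
)
for path in findings:
    print(" -> ".join(path))
```

Both endpoints reach `run_query`, but only the path through `admin_export_view` is reported, because nothing on it enforces authentication.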

Sanitization Detection: Eliminating False Positives

A critical function of control flow analysis is detecting sanitization: cases where user input reaches a sink but first passes through sanitization adequate to prevent exploitation:

import ast

SANITIZERS = {
    # SQL injection sanitizers
    # NOTE: bare names such as 'escape' and 'quote' are ambiguous; resolve them
    # to the database driver's own functions before trusting them
    'parameterize': 'sql_safe',
    'escape': 'sql_safe',
    'quote': 'sql_safe',

    # HTML/XSS sanitizers
    'escape_html': 'xss_safe',
    'bleach.clean': 'xss_safe',
    'html.escape': 'xss_safe',
    'markupsafe.escape': 'xss_safe',

    # Path traversal sanitizers
    'os.path.basename': 'path_safe',
    'werkzeug.utils.secure_filename': 'path_safe',

    # Command injection sanitizers
    'shlex.quote': 'cmd_safe',
    'pipes.quote': 'cmd_safe',  # legacy alias; the pipes module was removed in Python 3.13
}

def check_sanitization_on_path(taint_path, sink_vulnerability_type, sanitizers=SANITIZERS):
    """
    Check if user input is sanitized before reaching the sink.
    Returns None if unsafe, or a dict describing the sanitizer if safe.
    resolve_call_name and is_adequate_sanitizer are helpers not shown here.
    """
    for step in taint_path:
        if isinstance(step, ast.Call):
            func_name = resolve_call_name(step)

            if func_name in sanitizers:
                sanitizer_label = sanitizers[func_name]

                # Check if the sanitizer covers the sink's vulnerability type
                if is_adequate_sanitizer(sanitizer_label, sink_vulnerability_type):
                    return {
                        'sanitized': True,
                        'sanitizer': func_name,
                        'label': sanitizer_label
                    }

    return None  # No adequate sanitization found

# This is what keeps false positives low in AI taint analysis:
# A flagged SQL injection only appears in the report if:
# 1. User input reaches the SQL sink (confirmed by taint propagation)
# 2. No adequate SQL sanitizer appears on the path (confirmed by sanitization check)
# 3. The code path is reachable from an HTTP endpoint (confirmed by CFG)
# All three conditions must be true simultaneously
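
The sanitizer lookup depends on turning a call node into a dotted name such as `html.escape`. One plausible way to do that with the standard `ast` module is sketched below. The body of `resolve_call_name` is an assumption about what that helper does, and `sanitizers_applied` is a hypothetical convenience function. The sketch matches names purely syntactically, so import aliases (`from html import escape as e`) would need extra resolution in a real analysis.

```python
import ast
from typing import Optional


def resolve_call_name(call: ast.Call) -> Optional[str]:
    """Return the dotted name of a call target, e.g. 'html.escape'."""
    parts = []
    node = call.func
    # Unwind attribute access from right to left: a.b.c -> ['c', 'b']
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. a call on a subscript or on another call's result


def sanitizers_applied(expression: str, known: dict) -> list:
    """List (name, label) for known sanitizers appearing in the expression."""
    tree = ast.parse(expression, mode="eval")
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = resolve_call_name(node)
            if name in known:
                found.append((name, known[name]))
    return found


KNOWN = {"html.escape": "xss_safe", "shlex.quote": "cmd_safe"}
result = sanitizers_applied("render(html.escape(request.args['q']))", KNOWN)
print(result)
```

Finding `html.escape` on the path is only half the decision: the label `xss_safe` still has to match the sink's vulnerability type, which is why an HTML escaper does not clear a SQL injection finding.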

Step 6: Dependency Reachability: Not All CVEs Are Equal

When a dependency has a known CVE, the standard scanner response is to flag it. The AI code analysis response is to ask: is the vulnerable function in this dependency actually called by this application?

class DependencyReachabilityAnalyzer:
    """
    Determine which vulnerable dependency functions are actually reachable
    in the specific application context.
    """

    def analyze_dependency_reachability(self, cve_findings, call_graph):
        """
        For each CVE finding, determine if the vulnerable code path
        is actually reachable in this application.
        """
        reachability_results = []

        for cve in cve_findings:
            package = cve['package']
            vulnerable_function = cve['vulnerable_function']  # e.g., "imagemagick.convert"

            # Find all application code that calls this vulnerable function
            direct_callers = call_graph.find_callers(vulnerable_function)

            if not direct_callers:
                # Application doesn't call this function at all
                reachability_results.append({
                    'cve': cve['id'],
                    'package': package,
                    'reachable': False,
                    'reason': 'Vulnerable function not called by application code',
                    'risk_adjustment': 'Downgrade to INFORMATIONAL',
                    'original_cvss': cve['cvss'],
                    'adjusted_cvss': 0.0
                })
                continue

            # Function is called: check each caller for HTTP reachability
            # and for user-controlled input reaching the vulnerable parameter
            for caller in direct_callers:
                caller_path = call_graph.find_path_from_http(caller)

                if not caller_path:
                    reachability_results.append({
                        'cve': cve['id'],
                        'package': package,
                        'reachable': True,
                        'called_from': caller,
                        'http_reachable': False,
                        'reason': 'Called by application but not from HTTP endpoint',
                        'risk_adjustment': 'Downgrade to LOW',
                        'original_cvss': cve['cvss'],
                    })
                else:
                    # Check if user input reaches the vulnerable parameter.
                    # check_taint_to_vulnerable_param is a taint-engine helper
                    # assumed to be defined elsewhere.
                    user_input_reaches_sink = check_taint_to_vulnerable_param(
                        path=caller_path,
                        sink=vulnerable_function,
                        vulnerable_param=cve['vulnerable_parameter']
                    )

                    reachability_results.append({
                        'cve': cve['id'],
                        'package': package,
                        'reachable': True,
                        'called_from': caller,
                        'http_reachable': True,
                        'user_input_reaches_vuln': user_input_reaches_sink,
                        'call_path': caller_path,
                        'risk_adjustment': 'MAINTAIN original CVSS' if user_input_reaches_sink else 'Downgrade',
                        'original_cvss': cve['cvss'],
                        # The 0.3 multiplier is a triage heuristic; CVSS itself defines no such scaling
                        'adjusted_cvss': cve['cvss'] if user_input_reaches_sink else cve['cvss'] * 0.3
                    })

        return reachability_results

# Real-world impact of reachability analysis:
# A typical SCA scan of a Node.js application flags 47 CVEs
# Reachability analysis:
#   - 19 CVEs: vulnerable function not imported by application
#   - 11 CVEs: imported but never called
#   - 8 CVEs: called but not from HTTP-reachable code paths
#   - 6 CVEs: called from HTTP but user input doesn't reach vulnerable parameter
#   - 3 CVEs: genuinely exploitable in this application context

# Result: 47 scanner findings → 3 actual security findings
# The 3 real findings get proper attention and remediation
# The 44 false positives don't consume engineering time
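
The analyzer above relies on two call-graph operations, `find_callers` and `find_path_from_http`, without showing them. A minimal sketch of what such a structure could look like, assuming a hypothetical `CallGraph` class with string function names and a known set of HTTP entry points (a real call graph would be built from the AST and would need to handle dynamic dispatch):

```python
from collections import defaultdict, deque


class CallGraph:
    """Toy call graph (hypothetical): edges point from caller to callee."""

    def __init__(self, http_entry_points):
        self.callees = defaultdict(set)   # caller -> functions it calls
        self.callers = defaultdict(set)   # callee -> functions that call it
        self.http_entry_points = set(http_entry_points)

    def add_call(self, caller, callee):
        self.callees[caller].add(callee)
        self.callers[callee].add(caller)

    def find_callers(self, function):
        """Return the functions that call `function` directly."""
        return sorted(self.callers.get(function, ()))

    def find_path_from_http(self, target):
        """Breadth-first search from the HTTP entry points.

        Returns a shortest call path ending at `target`, or None if no
        entry point can reach it.
        """
        queue = deque([entry] for entry in sorted(self.http_entry_points))
        visited = set(self.http_entry_points)
        while queue:
            path = queue.popleft()
            if path[-1] == target:
                return path
            for callee in sorted(self.callees.get(path[-1], ())):
                if callee not in visited:
                    visited.add(callee)
                    queue.append(path + [callee])
        return None


graph = CallGraph(http_entry_points=["upload_view"])
graph.add_call("upload_view", "make_thumbnail")
graph.add_call("make_thumbnail", "imagemagick.convert")
graph.add_call("nightly_cleanup", "imagemagick.convert")

callers = graph.find_callers("imagemagick.convert")
http_path = graph.find_path_from_http("make_thumbnail")
offline_path = graph.find_path_from_http("nightly_cleanup")
```

Here `make_thumbnail` is reachable from an HTTP endpoint, while `nightly_cleanup` calls the same vulnerable function but has no path from any entry point, which corresponds to the "Downgrade to LOW" branch.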

Step 7: Putting It All Together: What the Analysis Produces

All six analysis steps produce a unified findings model. Here's what the output looks like for a real finding, the kind that appears in a CodeAnt AI report:

FINDING: SQL Injection via Taint Path Through Service Layer

ID: FIND-2026-047
Severity: HIGH (CVSS 4.0: 8.3)
Type: SQL Injection (CWE-89)
Confidence: CONFIRMED (taint path verified, sink confirmed exploitable)

TAINT PATH:
  Source:   views/products.py:7
            sort_field = request.GET.get('sort_field', 'name')
            Label: user_input [untrusted]

  Step 1:   views/products.py:14
            ProductSearchService.search(sort=sort_field)
            Label: user_input propagates through parameter 'sort'

  Step 2:   services/product_search.py:26
            ProductSearchService._apply_ordering(queryset, sort)
            Label: user_input propagates through parameter 'sort_field'

  Sink:     services/product_search.py:33
            queryset.extra(order_by=[f"products_product.{sort_field}"])
            Type: ORM extra() with f-string interpolation
            Sanitization on path: NONE

HTTP REACHABILITY:
  Endpoint: GET /api/v1/products/search
  Requires authentication: Yes (LOGIN_REQUIRED confirmed by middleware analysis)
  Parameter: sort_field (query string)

PROOF OF CONCEPT:
  GET /api/v1/products/search?q=laptop&sort_field=name,(SELECT+SLEEP(5))--
  → Time-based blind SQL injection confirmed
  → Data extraction possible via blind (time-based or boolean-based) techniques

ROOT CAUSE:
  File:    services/product_search.py
  Class:   ProductSearchService
  Method:  _apply_ordering
  Line:    33
  Code:    queryset.extra(order_by=[f"products_product.{sort_field}"])

REMEDIATION:
  Before:
    queryset.extra(order_by=[f"products_product.{sort_field}"])
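
The report excerpt stops at the "Before" line. One common way to close this class of finding is to map the user-supplied value onto an allowlist of known column names before it reaches the ordering call. A minimal sketch of that idea, assuming hypothetical column names and a hypothetical helper `resolve_sort_field`; it is not the remediation from the report itself:

```python
# Hypothetical allowlist: these column names are assumptions for illustration.
ALLOWED_SORT_FIELDS = {"name", "price", "created_at"}
DEFAULT_SORT_FIELD = "name"


def resolve_sort_field(user_value: str) -> str:
    """Return an ordering key only if it matches the allowlist exactly.

    A leading '-' (descending order, as in Django's order_by) is preserved.
    Anything else falls back to the default, so attacker-controlled text
    never reaches the query.
    """
    descending = user_value.startswith("-")
    column = user_value[1:] if descending else user_value
    if column not in ALLOWED_SORT_FIELDS:
        return DEFAULT_SORT_FIELD
    return f"-{column}" if descending else column


safe = resolve_sort_field("price")
desc = resolve_sort_field("-created_at")
blocked = resolve_sort_field("name,(SELECT SLEEP(5))--")
```

The resolved value could then be passed to something like `queryset.order_by(resolve_sort_field(sort_field))` in place of the `extra()` call, so the proof-of-concept payload falls back to the default ordering.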

This is the output that no SAST scanner produces. The taint path is traced across function boundaries. The HTTP reachability is confirmed. The sanitization gap is identified. The root cause is pinpointed to the specific line. The remediation is a specific code change. And the chain analysis identifies the finding's potential in combination with other findings.
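
To make the "unified findings model" concrete, here is a minimal sketch of how the fields in the report above might be held in memory. The `TaintStep` and `Finding` classes and their field names are assumptions for illustration, not CodeAnt AI's actual schema; the values are copied from the sample finding.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaintStep:
    location: str   # "file:line"
    code: str
    role: str       # "source", "step", or "sink"


@dataclass
class Finding:
    """Hypothetical in-memory shape of one report entry."""
    finding_id: str
    title: str
    severity: str
    cwe: str
    taint_path: List[TaintStep] = field(default_factory=list)
    endpoint: Optional[str] = None

    def root_cause(self) -> Optional[TaintStep]:
        """Treat the last sink on the path as the root-cause location."""
        sinks = [s for s in self.taint_path if s.role == "sink"]
        return sinks[-1] if sinks else None

    def crosses_files(self) -> bool:
        """True when the taint path spans more than one source file."""
        files = {s.location.split(":")[0] for s in self.taint_path}
        return len(files) > 1


finding = Finding(
    finding_id="FIND-2026-047",
    title="SQL Injection via Taint Path Through Service Layer",
    severity="HIGH",
    cwe="CWE-89",
    endpoint="GET /api/v1/products/search",
    taint_path=[
        TaintStep("views/products.py:7", "request.GET.get('sort_field', 'name')", "source"),
        TaintStep("views/products.py:14", "ProductSearchService.search(sort=sort_field)", "step"),
        TaintStep("services/product_search.py:26", "ProductSearchService._apply_ordering(queryset, sort)", "step"),
        TaintStep("services/product_search.py:33", 'queryset.extra(order_by=[f"products_product.{sort_field}"])', "sink"),
    ],
)
```

Keeping the path as an ordered list of steps is what lets the report answer both "where does the input enter" and "which line needs to change" from the same record.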

Why This Is Different From SAST With Better Patterns

The question always comes up: isn't this just SAST with a more sophisticated rule set?

No. The distinction is fundamental:

SAST operates on individual patterns independently. It checks each pattern against each file or function. It doesn't model relationships between functions, track data across function boundaries, or understand the application's execution context.
AI code reasoning builds a complete application model. The AST, call graph, CFG, and taint analysis are all representations of the same thing: what the application actually does, from the perspective of an attacker who can control user input. The findings emerge from the model, not from pattern matching.

The practical consequence: SAST has a 40–70% false positive rate because patterns fire regardless of context. AI code reasoning has a near-zero false positive rate because findings only emerge when all conditions are simultaneously true: user input is tainted, the taint reaches a sink, no sanitization appears on the path, the path is HTTP-reachable, and the sink is exploitable in this specific context.

And SAST cannot find the finding classes that require cross-function reasoning:

authentication bypasses via middleware ordering
taint flows through three layers of service abstraction
business logic violations that require understanding what the application is supposed to do

Those findings are structurally invisible to pattern matching: they exist only in the relationships between code elements, not in the individual elements themselves.
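
A toy illustration of that point, following the article's framing of per-function pattern matching. It assumes a hand-built, hypothetical summary of three functions (`FUNCTIONS`) and treats every call as passing taint along, which is a deliberate over-approximation rather than a real taint engine:

```python
# Hypothetical per-function summaries; in practice these would come from the AST.
FUNCTIONS = {
    "view":       {"reads_user_input": True,  "calls": ["service"],    "has_sql_sink": False},
    "service":    {"reads_user_input": False, "calls": ["repository"], "has_sql_sink": False},
    "repository": {"reads_user_input": False, "calls": [],             "has_sql_sink": True},
}


def per_function_pattern_match(functions):
    """Flag a function only if the source and the sink sit in the same body."""
    return [
        name for name, info in functions.items()
        if info["reads_user_input"] and info["has_sql_sink"]
    ]


def cross_function_paths(functions):
    """Follow calls from every source and record each call path that ends at a sink."""
    findings = []
    for name, info in functions.items():
        if not info["reads_user_input"]:
            continue
        stack = [[name]]
        seen = set()
        while stack:
            path = stack.pop()
            current = path[-1]
            if current in seen:
                continue
            seen.add(current)
            if functions[current]["has_sql_sink"]:
                findings.append(path)
            for callee in functions[current]["calls"]:
                stack.append(path + [callee])
    return findings


local_hits = per_function_pattern_match(FUNCTIONS)
cross_hits = cross_function_paths(FUNCTIONS)
```

No single function contains both the source and the sink, so the per-function check returns nothing, while following the call relationships surfaces the `view` → `service` → `repository` path.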

The Difference Between Seeing Code and Understanding It

Reading source code for security vulnerabilities is not the same as searching for patterns in source code. A security researcher reads code with a mental model of how an attacker approaches it:

which parameters are attacker-controlled
which execution paths lead to dangerous operations
where the security boundary is supposed to be
whether the code actually enforces that boundary consistently

That mental model, applied systematically, across an entire codebase, with no triage fatigue, no missed edge cases, no "I'll come back to that," is what AI code reasoning provides. The AST gives it the code's structure. The call graph gives it the execution relationships. The taint engine gives it the data flows. The auth chain analysis gives it the security boundaries. And the control flow analysis tells it what's actually reachable.

The output is not a list of patterns that look like vulnerabilities. It's a set of confirmed exploitable paths, each one traceable from a specific HTTP entry point, through specific function calls, to a specific line of code, with the CVSS context, the remediation diff, and the chain analysis that reveals whether this finding contributes to something larger.

That's the difference between a scanner that reads your code and an AI that understands it.

→ Book a 30-minute scoping call and bring your questions. We'll answer them on the free demo call.
