Creating path queries

You can create path queries to visualize the flow of information through a codebase.

Overview

Security researchers are particularly interested in the way that information flows in a program. Many vulnerabilities are caused by seemingly benign data flowing to unexpected locations, and being used in a malicious way. Path queries written with CodeQL are particularly useful for analyzing data flow as they can be used to track the path taken by a variable from its possible starting points (source) to its possible end points (sink). To model paths, your query must provide information about the source and the sink, as well as the data flow steps that link them.

This topic provides information on how to structure a path query file so you can explore the paths associated with the results of data flow analysis.

Note

The alerts generated by path queries are displayed by default in LGTM and included in the results generated using the CodeQL CLI. You can also view the path explanations generated by your path query directly in LGTM or in the CodeQL extension for VS Code.

To learn more about modeling data flow with CodeQL, see “About data flow analysis.” For more language-specific information on analyzing data flow, see:

Path query examples

The easiest way to get started writing your own path query is to modify one of the existing queries. For more information, see the CodeQL query help.

The Security Lab researchers have used path queries to find security vulnerabilities in various open source projects. To see articles describing how these queries were written, as well as other posts describing other aspects of security research such as exploiting vulnerabilities, see the GitHub Security Lab website.

Constructing a path query

Path queries require certain metadata, query predicates, and select statement structures. Many of the built-in path queries included in CodeQL follow a simple structure, which depends on how the language you are analyzing is modeled with CodeQL.

You should use the following template:

/**
 * ...
 * @kind path-problem
 * ...
 */

import <language>
// For some languages (Java/C++/Python) you need to explicitly import the data flow library, such as
// import semmle.code.java.dataflow.DataFlow
import DataFlow::PathGraph
...

from MyConfiguration config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "<message>"

Where:

  • DataFlow::PathGraph is the path graph module you need to import from the standard CodeQL libraries.

  • source and sink are nodes on the path graph, and DataFlow::PathNode is their type.

  • MyConfiguration is a class containing the predicates which define how data may flow between the source and the sink.

The following sections describe the main requirements for a valid path query.

Path query metadata

Path query metadata must contain the property @kind path-problem. This ensures that query results are interpreted and displayed correctly. The other metadata requirements depend on how you intend to run the query. For more information, see “Metadata for CodeQL queries.”
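
As a rough illustration, a metadata header for a path query might look like the sketch below. Only @kind path-problem is required by the text above; the name, description, id, and the values of the other properties are hypothetical placeholders, and which of them you need depends on how you run the query.

```ql
/**
 * @name Environment variable flows to command execution
 * @description Illustrative placeholder text: describe what the query finds
 *              and why the flow matters.
 * @kind path-problem
 * @problem.severity warning
 * @precision medium
 * @id java/example/env-to-exec
 * @tags security
 */
```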

Generating path explanations

In order to generate path explanations, your query needs to compute a path graph. To do this you need to define a query predicate called edges in your query. This predicate defines the edge relations of the graph you are computing, and it is used to compute the paths related to each result that your query generates. You can import a predefined edges predicate from a path graph module in one of the standard data flow libraries. In addition to the path graph module, the data flow libraries contain the other classes, predicates, and modules that are commonly used in data flow analysis.

import DataFlow::PathGraph

This statement imports the PathGraph module from the data flow library (DataFlow.qll), in which edges is defined.

You can also import libraries specifically designed to implement data flow analysis in various common frameworks and environments, and many additional libraries are included with CodeQL. To see examples of the different libraries used in data flow analysis, see the links to the built-in queries above or browse the standard libraries.

For all languages, you can also optionally define a nodes query predicate, which specifies the nodes of the path graph that you are interested in. If nodes is defined, only edges with endpoints defined by these nodes are selected. If nodes is not defined, you select all possible endpoints of edges.

Defining your own edges predicate

You can also define your own edges predicate in the body of your query. It should take the following form:

query predicate edges(PathNode a, PathNode b) {
/** Logical conditions which hold if `(a,b)` is an edge in the data flow graph */
}

For more examples of how to define an edges predicate, visit the standard CodeQL libraries and search for edges.
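
For a concrete starting point, the fragment below sketches hand-written edges and nodes predicates for Java. It assumes the class-based data flow library this page describes, in which DataFlow::PathNode offers getASuccessor(). The three-column shape of nodes (node, key, value) and the "semmle.label" key follow the form the standard PathGraph module has used; treat both as assumptions to check against your library version. The fragment does not import DataFlow::PathGraph, because it supplies its own predicates, and it still needs a configuration and a select clause to form a complete query.

```ql
import java
import semmle.code.java.dataflow.DataFlow

// (a, b) is an edge wherever the library records `b` as a direct successor of `a`.
query predicate edges(DataFlow::PathNode a, DataFlow::PathNode b) { a.getASuccessor() = b }

// Optional: list the nodes of the path graph and give each one a display label.
// Add further conditions on `n` here to narrow the set of nodes.
query predicate nodes(DataFlow::PathNode n, string key, string val) {
  key = "semmle.label" and val = n.toString()
}
```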

Declaring sources and sinks

You must provide information about the source and sink in your path query. These are objects that correspond to the nodes of the paths that you are exploring. The name and the type of the source and the sink must be declared in the from statement of the query, and the types must be compatible with the nodes of the graph computed by the edges predicate.

If you are querying C/C++, C#, Java, or JavaScript code (and you have used import DataFlow::PathGraph in your query), the definitions of the source and sink are accessed via the Configuration class in the data flow library. You should declare all three of these objects in the from statement. For example:

from Configuration config, DataFlow::PathNode source, DataFlow::PathNode sink

The configuration class is accessed by importing the data flow library. This class contains the predicates which define how data flow is treated in the query:

  • isSource() defines where data may flow from.

  • isSink() defines where data may flow to.

For more information on using the configuration class in your analysis, see the sections on global data flow in “Analyzing data flow in C/C++,” “Analyzing data flow in C#,” and “Analyzing data flow in Python.”

You can also create a configuration for different frameworks and environments by extending the Configuration class. For more information, see “Types” in the QL language reference.
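
The sketch below shows what such an extension could look like for Java, assuming a CodeQL library version that still ships the class-based Configuration API described on this page (newer releases may deprecate or rename parts of it, such as MethodAccess). The class name MyConfiguration and the choice of source and sink are hypothetical: remote user input is treated as the source, and the first argument of a call to java.lang.Runtime.exec as the sink.

```ql
import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.FlowSources
import semmle.code.java.dataflow.TaintTracking

/** Hypothetical configuration: remote user input reaching `Runtime.exec`. */
class MyConfiguration extends TaintTracking::Configuration {
  // The characteristic predicate gives the configuration a unique name.
  MyConfiguration() { this = "MyConfiguration" }

  // Where data may flow from: any node the library models as remote input.
  override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Where data may flow to: the first argument of a call to `Runtime.exec`.
  override predicate isSink(DataFlow::Node sink) {
    exists(MethodAccess call |
      call.getMethod().hasName("exec") and
      call.getMethod().getDeclaringType().hasQualifiedName("java.lang", "Runtime") and
      sink.asExpr() = call.getArgument(0)
    )
  }
}
```

Extending TaintTracking::Configuration rather than DataFlow::Configuration is a design choice made for this sketch: it also follows steps that derive new values from tainted ones, such as string concatenation, which is often what a security query wants.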

Defining flow conditions

The where clause defines the logical conditions to apply to the variables declared in the from clause to generate your results. This clause can use aggregations, predicates, and logical formulas to limit the variables of interest to a smaller set which meet the defined conditions.

When writing a path query, you would typically include a predicate that holds only if data flows from the source to the sink.

You can use the hasFlowPath predicate to specify flow from the source to the sink for a given Configuration:

where config.hasFlowPath(source, sink)

Select clause

Select clauses for path queries consist of four ‘columns’, with the following structure:

select element, source, sink, string

The element and string columns represent the location of the alert and the alert message respectively, as explained in “About CodeQL queries.” The second and third columns, source and sink, are nodes on the path graph selected by the query. Each result generated by your query is displayed at a single location in the same way as an alert query. Additionally, each result also has an associated path, which can be viewed in LGTM or in the CodeQL extension for VS Code.

The element that you select in the first column depends on the purpose of the query and the type of issue that it is designed to find. This is particularly important for security issues. For example, if you believe the source value to be globally invalid or malicious it may be best to display the alert at the source. In contrast, you should consider displaying the alert at the sink if you believe it is the element that requires sanitization.

The alert message defined in the final column in the select statement can be developed to give more detail about the alert or path found by the query using links and placeholders. For more information, see “Defining the results of a query.”
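
To see the four columns and a placeholder together, consider the following complete query for Java. It is a minimal sketch under stated assumptions: it relies on the class-based Configuration API described on this page, and the configuration name EnvToExecConfig, the query id, and the chosen source (a call to System.getenv) and sink (the first argument of Runtime.exec) are all hypothetical. The $@ placeholder in the message is filled by the two extra columns that follow it: the element to link to and the link text.

```ql
/**
 * @name Environment variable flows to command execution
 * @description Illustrative placeholder text.
 * @kind path-problem
 * @problem.severity warning
 * @id java/example/env-to-exec
 */

import java
import semmle.code.java.dataflow.DataFlow
import DataFlow::PathGraph

/** Hypothetical configuration: values read by `System.getenv` reaching `Runtime.exec`. */
class EnvToExecConfig extends DataFlow::Configuration {
  EnvToExecConfig() { this = "EnvToExecConfig" }

  // Source: the result of a call to `System.getenv`.
  override predicate isSource(DataFlow::Node source) {
    exists(MethodAccess call |
      call.getMethod().hasName("getenv") and
      call.getMethod().getDeclaringType().hasQualifiedName("java.lang", "System") and
      source.asExpr() = call
    )
  }

  // Sink: the first argument of a call to `Runtime.exec`.
  override predicate isSink(DataFlow::Node sink) {
    exists(MethodAccess call |
      call.getMethod().hasName("exec") and
      call.getMethod().getDeclaringType().hasQualifiedName("java.lang", "Runtime") and
      sink.asExpr() = call.getArgument(0)
    )
  }
}

from EnvToExecConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
// Columns: alert location, path start, path end, message, then placeholder target and text.
select sink.getNode(), source, sink, "This command depends on $@.", source.getNode(),
  "an environment variable"
```

Here the alert is reported at the sink, on the view that the command call is the element needing attention; selecting source.getNode() in the first column instead would report it at the source.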

Further reading