【译】oracle join 方式(Oracle? Database Performance Tuning Guide) | idouba

【译】oracle join 方式(Oracle? Database Performance Tuning Guide)

英文原文地址:Overview of Joins

@todo 翻译完毕,暂时保留原英文。对照校验完后删除应用,保持完整中文版本。

Joins?are statements that retrieve data from multiple tables. A join is characterized by multiple tables in the?FROM?clause. The existence of a join condition in the?WHERE?clause defines the relationship between the tables. In a join, one row set is called inner, and the other is called outer.?Join 是从多个表中获取数据的语句。join的特点是在From子句中有多个表。where子句中定义了表的关系,有内连接、外连接。 This section discusses: 在本节中将讨论

1 查询优化器如何执行join语句

为了为join语句选择一个查询计划,优化器需要做如下判定:

  • 访问路径

对于简单语句,优化器需要选择一个路径类从各个表中获取数据。

  • 连接方式

为了连接行源(row sources)的每一对列,数据库需要执行一个join操作。包括:嵌套循环、排序合并、笛卡尔积、hash连接等方式。

  • Join Order

执行一个join语句连接了两个以上的表,Oracle数据库先连将其中两个做join操作然后用其结果和另外一个做join操作。直到所有的表都Join完成。

可以参照:“Overview of Optimizer Access Paths”

2 查询优化器如何为join语句选择查询计划

The query optimizer considers the following when choosing an execution plan:选择查询计划时,优化器会考虑如下因素:

  • The optimizer first determines whether joining two or more tables definitely results in a row source containing at most one row. The optimizer recognizes such situations based on UNIQUE and PRIMARY KEY constraints on the tables. If such a situation exists, then the optimizer places these tables first in the join order. The optimizer then optimizes the join of the remaining set of tables.优化器首先判定,优化器通过表上的UNIQUE and PRIMARY KEY 约束类识别是否。如果这种情况存在,则优化器把这些表放在join操作的前面,然后优化剩下的表。
  • 对于涉及outer join条件的,有outer join操作的表应该在其他表的join之后。优化器不会考虑违反这一原则的其他join顺序。

查询优化器查根据可能的join顺序、join方法以及可行的访问路径,生成收一组执行计划。然后苹果每个查询计划的开销cost,然后选择一个cos他最低的。查询优化器通过如下方式来估计cost:

  • 一个嵌套循环操作的开销基于读取outer table每一行和与inner table匹配的行到内存的开销。优化器使用数据字典中的统计数据来做估计。
  • 排序归并的连接的cost主要是把所有的数据集读到内存然后排序的开销。
  • Hash 连接的开销主要是为一个输入创建hash table,并使用另外一表的行去探测的开销。

The optimizer also considers other factors when determining the cost of each operation. For example:在判定消耗的时候优化器同时会考虑其他因素:

  • A smaller sort area size is likely to increase the cost for a sort merge join because sorting takes more CPU time and I/O in a smaller sort area. See“PGA Memory Management” to learn how to size SQL work areas.对于排序归并Join 小的排序区域(sort area)会增大开销,因为排序会划分更多的CPU时间和IO。参照“PGA Memory Management” 了解如何设定SQL工作区域大小。
  • A larger multiblock read count is likely to decrease the cost for a sort merge join in relation to a nested loop join. If the database can read a large number of sequential blocks from disk in a single I/O, then an index on the inner table for the nested loop join is less likely to improve performance over a full table scan. The multiblock read count is specified by the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT.较之循环嵌套join,对于排序归并join,较大的多块读(multiblock read)数量会减少开销。如果数据库能够在一次IO中从磁盘读取大量顺序的数据块,在循环嵌套join方式下,内部表上的索引并不能比全表扫描(full table scan)更能提升性能。多块读(multiblock read)数量可以他通过DB_FILE_MULTIBLOCK_READ_COUNT参数来设定。

写SQL的时候可以使用ORDERED 提示(ORDERED hint)来覆盖优化器选择的join顺序,如果ORDERED 提示(ORDERED hint)设置了一个违反outer join顺序的join顺序,优化器会忽略提示。同样,也可以通过提示来覆盖优化器选择的join方式。

参照:Chapter 19, “Using Optimizer Hints”来了解优化器的提示。

3 循环嵌套Join(Nested Loop Joins)

Nested loop joins are useful when the following conditions are true:嵌套循环join适用于如下条件:

  • 小数据集的join操作。
  • 通过Join条件可以方便的访问第二张表。

对于嵌套循环join方式有一点很重要,那就是保证内部表示外部表驱动的。如果内部表的访问路径是独立于外部表的,则对于每次的外部循环,同样的内部表的数据都会被获取到,极大的影响性能。在这种情况下,hash join可以更好的连接两个独立的行源(row resource)。

嵌套循环Join涉及如下步骤:

  1. 优化器确定驱动表并把其分配为外部表。
  2. 其他表被分配为内部表。
  3. 对外部表的每一行,访问内部表的所有行。外部循环是对外部表的每一行,内部循环是对内部表的每一行。外部循环在查询计划中出现在内部循环前,如下:

参照:“Cartesian Joins”

3.1 s Nested Loop Joins的新旧实现

Oracle Database 11g引入了嵌套循环Join的一种新的实现,因此包含嵌套循环join的查询计划会和之前有所不同。新旧实现在Oracle Database 11g的执行计划中都可能出现。嵌套循环join的行源数可能会不同。

3.1.1 嵌套循环join的老实现方式

考察如下query:

在11g之前,执行计划如下:

在这个例子中,外部的join包含对hr.departments的一次扫描得到满足 department_name IN(‘Marketing’, ‘Sales’)条件的行。nebulizer循环获取 hr.employees表中那些和departments关联的行。

3.1.2 New Implementation for Nested Loop Joins 嵌套循环join的新实现方式

11g中引入了嵌套循环join的一种新的实现以减少IO延迟。当有一个不在缓存中的索引块或者表块需要join操作,需要一个物理IO。11g可以使用一个IO向量计划批量批量的多IO请求的处理,而不是一次只处理一个。

作为新的嵌套循环连接实现的一部分,两个连接的行源会在执行计划中出现,而在之前的实现中,只有一个行源会出现。Oracle分配一个行源来join内部表上的索引来和外部表的值,分配第二个行源来join第一个连接的结果和内部表,其中第一个join结果中包括索引中存储的rowId。

参考嵌套循环join的老实现方式中的查询,在新的neted loop join的11g中,执行计划会如下。

在这个例子中,表 hr.departments 的行组成了第一个join的外部表,第一个循环的内部表是索引emp_department_ix。第一个join的结果集形成了第二个join的外部表,hr.employees 作为内部表。

在下列情况下不会分配第二个连接的行源,执行计划会和老版本的类似:

  • 所有查询需要的内部表的列都在索引上,就不需要访问表。在这种时候,Oracle就只是分配一个join的行源。
  • The order of the rows returned might be different than it was in previous releases. Hence, when Oracle Database tries to preserve a specific ordering of the rows, for example to eliminate the need for an ORDER BY sort, Oracle Database might use the original implementation for nested loop joins. 返回的行的顺序和之前版本的不同。当oralce为了保持特定的排序顺序,如消除ORdER BY的排序需求,Oracle会使用嵌套循环的老的实现机制。
  • 使用OPTIMIZER_FEATURES_ENABLE初始化参数来要求使用嵌套循环的老的实现机制。

3.2 查询优化器何时使用嵌套循环连接

The optimizer uses nested loop joins when joining small number of rows, with a good driving condition between the two tables. You drive from the outer loop to the inner loop, so the order of tables in the execution plan is important.嵌套循环连接用于连接数量比较少的行,并且两个表之间有好的驱动条件。驱动从外循环到内循环,所以在执行计划中的表顺序是非常重要的。

The outer loop is the driving row source. It produces a set of rows for driving the join condition. The row source can be a table accessed using an index scan or a full table scan. Also, the rows can be produced from any other operation. For example, the output from a nested loop join can serve as a row source for another nested loop join.外部循环式驱动行源。产生一组行数据来驱动连接条件。行源可以是一个全表扫描或者索引扫描的表,也可以是通过其他操作产生的结果,如一个嵌套循环的输出可以作为另外一个嵌套循环的输入行源。

内循环是对外循环每一行的迭代,最理想的是索引扫描。如果内循环的访问路径和外循环式独立的,则进行一个笛卡尔积;对于外循环的每次迭代,内存和产生同样的行集。因此,你应该选择其他连接方式来两个相互独立的行源。

3.3 Nested Loop Join Hints嵌套循环连接提示

如果优化器选择其他连接方式,你可以使用?USE_NL(table1 table2) 提示。其中table1 和 table2是连接的行源的别名。

对于有些SQL例子来说,数据量比较小,优化器更趋向于使用全表扫描并使用Hash连接。这个在Example 11-8, “Hash Joins”可以看到。但是你可以通过?USE_NL指令来要求优化器使用嵌套循环连接。关于嵌套循环连接提示可参照“Hints for Join Operations”.

3.4 Nesting Nested Loops 嵌套的嵌套循环

The outer loop of a nested loop can be a nested loop itself. You can nest two or more outer loops to join as many tables as needed. Each loop is a data access method, as follows:嵌套循环连接的外部循环本身可以是个嵌套循环。你可以根据使用要求嵌套两个甚至更多的循环连接,每次循环式是一个数据访问方法。如下:

4 Hash Joins Hash 连接

.数据库使用Hash连接来连接大的结果集。优化器从两个待连接的结果集中选择一个比较小的来在内存中对于连接的键构建hash表。然后扫描较大的表,探测hash表来查找连接的行。

最好是小表大小适合内存大小。这时候开销就只是对两个表的扫描读取。

4.1 When the Optimizer Uses Hash Joins何时使用hash连接

The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions are true:当使用等值连接并且符合下列条件时,优化器使用hash连接:

  • A large amount of data must be joined.大数据量的连接
  • A large fraction of a small table must be joined.小表的大部分数据连接

In Example 11-8, the database uses the table orders to build the hash table. The database scans the larger order_items later.在?Example 11-8中,数据库使用orders表来构建hash表,扫描大表order_items

Example 11-8 Hash Joins

4.2 Hash连接提示

使用?USE_HASH来提示优化器使用hash连接。参照?“PGA Memory Management”来了解怎样设置SQL工作区的大小。参照?“Hints for Join Operations”了解Hash提示。

5 Sort Merge Joins排序合并连接

排序合并连接可以连接两个独立的行源。Hash连接一般性能比排序连接好,但是在满足如下情况下合并排序性能比Hash连接性能要好。

  • The row sources are sorted already.行源已经有序。
  • A sort operation does not have to be done.不用进行排序操作。

However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.如果排序合并连接涉及选择了一个慢的访问方法(全表扫描之于索引扫描),排序合并的优势就会丧失。

Sort merge joins are useful when the join condition between two tables is an inequality condition such as <, <=, >, or >=. Sort merge joins perform better than nested loop joins for large data sets. You cannot use hash joins unless there is an equality condition.当两个表之间的连接条件是?<, <=, >, or >=这样的非等值操作排序合并连接会比较有效。对于大的数据集合并连接比合并嵌套性能要好。只有等值的连接操作时才能使用hash连接。

In a merge join, there is no concept of a driving table. The join consists of two steps:在合并连接中,没有去到那个表的概。连接包含两个步骤:

  1. Sort join operation: Both the inputs are sorted on the join key.排序连接操作:两个输入集都在连接键上排序。
  2. Merge join operation: The sorted lists are merged together.合并连接操作:排序后的结果集合并在一起。

If the input is sorted by the join column, then a sort join operation is not performed for that row source. However, a sort merge join always creates a positionable sort buffer for the right side of the join so that it can seek back to the last match in the case where duplicate join key values come out of the left side of the join.如果合并列上已经有序,则不要排序操作。排序合并连接经常会为右侧的数据集创建一个可以定位的排序缓存,这样当连接左侧有重复的连接键值时,可以从缓存中定位到上一个匹配的位置。

5.1 When the Optimizer Uses Sort Merge Joins何时使用排序合并连接

The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:当连接大的结果集的时候如果以下条件满足,优化器会选择排序合并连接而不是hash连接。

  • The join condition between two tables is not an equijoin.连接条件不是等值连接。
  • Because of sorts required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.因为其他操作也需要排序,优化器发现使用排序合并连接比hash连接开销小。

5.2 Sort Merge Join Hints排序归并连接提示

To instruct the optimizer to use a sort merge join, apply the USE_MERGE hint. You might also need to give hints to force an access path.使用USE_MERGE指令来提醒优化器使用排序合并连接。

There are situations where it makes sense to override the optimizer with the USE_MERGE hint. For example, the optimizer can choose a full scan on a table and avoid a sort operation in a query. However, there is an increased cost because a large table is accessed through an index and single block reads, as opposed to faster access through a full table scan.在有些情况下,使用USE_MERGE来覆盖优化器的选择是明智的。如优化器选择一个全表扫来避免查询中的排序操作。但是,较之快速的全表扫描,通过索引和单个数据块读操作对大表的访问会增加开销。

For more information on the USE_MERGE hint, see “Hints for Join Operations”.参照?“Hints for Join Operations”了解更多的USE_MERGE 提示。

6 Cartesian Joins笛卡尔积连接

The database uses a Cartesian join when one or more of the tables does not have any join conditions to any other tables in the statement. The optimizer joins every row from one data source with every row from the other data source, creating the Cartesian product of the two sets.当一个或多个表没有连接条件,优化器连接一个数据源的每一行和另一个数据源的每一行即创建了两个数据源的笛卡尔积。

6.1 When the Optimizer Uses Cartesian Joins何时优化器选择用笛卡尔积

The optimizer uses Cartesian joins when it is asked to join two tables with no join conditions. In some cases, a common filter condition between the two tables could be picked up by the optimizer as a possible join condition. In other cases, the optimizer may decide to generate a Cartesian product of two very small tables that are both joined to the same large table.当两个表连接没有连接条件时,优化器会选择笛卡尔积。在某些时候,两个表间的公共过滤条件不能被优化器选为连接条件。有些时候,当两个小表都和同一个大表连接时,优化器会为这两个小表创建笛卡尔积。

6.2 Cartesian Join Hints笛卡尔积提示

Applying the ORDERED hint, instructs the optimizer to use a Cartesian join. By specifying a table before its join table is specified, the optimizer does a Cartesian join.使用ORDERED 来指示优化器使用笛卡尔积。通过指定一个表在连接中在另一个表的前面,优化器会使用笛卡尔积。

7 Outer Joins 外连接?

An outer join extends the result of a simple join. An outer join returns all rows that satisfy the join condition and also returns some or all of those rows from one table for which no rows from the other satisfy the join condition.一个外连接扩展了一个简单连接的结果。一个外连接返回所有满足连接条件的行,同时返回一个表的部分或者全表数据尽管不满足连接条件。

7.1 Nested Loop Outer Joins 嵌套循环外连接

The database uses this operation to loop through an outer join between two tables. The outer join returns the outer (preserved) table rows, even when no corresponding rows are in the inner (optional) table.数据库嵌套循环外连接两个表时,外连接返回(保留)外部表行,即使内部表没有相应的行。

In a regular outer join, the optimizer chooses the order of tables (driving and driven) based on the cost. However, in a nested loop outer join, the join condition determines the order of tables. The database uses the outer table, with rows that are being preserved, to drive to the inner table.

在标准化的外连接中,优化器基于开销选择表的顺序(驱动和被驱动)。但是在嵌套循环外连接中,连接条件决定了表的顺序。数据库使用要保留行的外部表来驱动内部表。

The optimizer uses nested loop joins to process an outer join in the following circumstances:在下面情况下优化器使用循环嵌套连接来处理外连接:

  • It is possible to drive from the outer table to inner table.可以通过外部表驱动内部表。
  • Data volume is low enough to make the nested loop method efficient.数据量足够低是的循环嵌套有效。

For an example of a nested loop outer join, you can add the USE_NL hint to Example 11-9 to instruct the optimizer to use a nested loop. For example:对于一个嵌套循环外连接的例子,你可以添加一个USE_NL 提示到Example 11-9?来指导优化器使用嵌套循环连接。如:

7.2 Hash连接外连接

The optimizer uses hash joins for processing an outer join in the following cases:下列情况下优化器使用hash连接来处理外连接

  • The data volume is large enough to make the hash join method efficient.数据量足够大hash连接方法有效。
  • It is not possible to drive from the outer table to the inner table.不能够通过外部表驱动内部表

The order of tables is determined by cost. The outer table, including preserved rows, may be used to build the hash table, or it may be used to probe one.表顺序由开销决定,外部表包括保留的行可以用来构造hash表,或者用来做探测。

Example 11-9 shows a typical query that uses a hash join outer join. This example queries all customers with credit limits greater than 1000. An outer join is needed to avoid missing customers who have no orders. ?例子Example 11-9显示了一个典型的使用hash连接外连接的查询。在这个例子中查询所有信用额度大于1000的消费者。使用外连接来避免那些没有交易的客户。

Example 11-9 Hash Join Outer Joins

The query looks for customers which satisfy various conditions. An outer join returns NULL for the inner table columns along with the outer (preserved) table rows when it does not find any corresponding rows in the inner table. This operation finds all the customers rows that do not have any orders rows. ?查找满足各种条件的customers,当在内部表中没有找到关联的行,外连接返回NULL。这个操作找到所有customers中没有任何orders的行。

In this case, the outer join condition is the following: 在这个例子中,外连接的条件如下:

The components of this condition represent the following: 表示:

  • The outer table is customers. 外部表是customers表。
  • The inner table is orders.内部表是orders表。
  • The join preserves the customers rows, including those rows without a corresponding row in orders.连接保留customers表的行,包括和内连接的orders表没有关联的行。

You could use a NOT EXISTS subquery to return the rows. However, because you are querying all the rows in the table, the hash join performs better (unless the NOT EXISTS subquery is not nested). 你可以使用使用NOT EXISTS 子查询来返回行。因为你在查询表的所有行,hash连接性能更好(除非NOT EXISTS 子查询不是嵌套的)。

In Example 11-10, the outer join is to a multitable view. The optimizer cannot drive into the view like in a normal join or push the predicates, so it builds the entire row set of the view.在Example 11-10中,外连接是一个多表视图。优化器不能像普通连接一样驱动视图或者使用谓词,所以构造了视图的整个行集合。

Example 11-10 Outer Join to a Multitable View

The view definition is as follows:视图定义如下:

7.3 Sort Merge Outer Joins 排序归并外连接

When an outer join cannot drive from the outer (preserved) table to the inner (optional) table, it cannot use a hash join or nested loop joins. Then it uses the sort merge outer join for performing the join operation. 当外连接不能从外部表(保留)驱动内部表(可选)就不能使用hash连接或者嵌套循环连接,可以使用排序归并外连接。

The optimizer uses sort merge for an outer join:优化器使用排序循环外连接:

  • If a nested loop join is inefficient. A nested loop join can be inefficient because of data volumes.如果嵌套循环连接是低效的。因为数据量的原因嵌套循环连接可能会效率比较低。
  • The optimizer finds it is cheaper to use a sort merge over a hash join because of sorts required by other operations. 因为其他操作本来就需要排序,优化器发现使用排序归并连接比hash连接开销小。

7.4 Full Outer Joins 完全外连接

A full outer join acts like a combination of the left and right outer joins. In addition to the inner join, rows from both tables that have not been returned in the result of the inner join are preserved and extended with nulls. In other words, full outer joins let you join tables together, yet still show rows that do not have corresponding rows in the joined tables. 完全外连接操作就像把左边和邮编的结果集合并在一起。在内连接的基础上,两个表上在内连接上未被返回的其他行都被保留标记为null。也就是说,完全外连接连接表,但是仍然显示其他不满足连接条件的行。

The query in Example 11-11 retrieves all departments and all employees in each department, but also includes: 在Example 11-11例子中返回所有departments和每个departments的所有employees,同时还包括:

  • Any employees without departments 没有departments的employees
  • Any departments without employees 没有employees的departments

Example 11-11 Full Outer Join

The statement produces the following output: 语句输出:

Starting with Oracle Database 11g, Oracle Database automatically uses a native execution method based on a hash join for executing full outer joins whenever possible. When the database uses the new method to execute a full outer join, the execution plan for the query contains HASH JOIN FULL OUTER. Example 11-12shows the execution plan for the query in Example 11-11. 从11g开始,oracle自动使用一个基于hash连接的原生方法来执行完全外连接。当使用新方法执行时,执行计划中会包含HASH JOIN FULL OUTER.

Example 11-12 Execution Plan for a Full Outer Join

Notice that HASH JOIN FULL OUTER is included in the plan. Therefore, the query uses the hash full outer join execution method. Typically, when the full outer join condition between two tables is an equi-join, the hash full outer join execution method is possible, and Oracle Database uses it automatically.需要指出的是HASH JOIN FULL OUTER 包含在查询计划中,即使用了hash完全连接。比较典型的,当两个表上是等值链接的完全连接条件,可以使用hash完全连接执行方法。

To instruct the optimizer to consider using the hash full outer join execution method, apply the NATIVE_FULL_OUTER_JOIN hint. To instruct the optimizer not to consider using the hash full outer join execution method, apply the NO_NATIVE_FULL_OUTER_JOIN hint. The NO_NATIVE_FULL_OUTER_JOIN hint instructs the optimizer to exclude the native execution method when joining each specified table. Instead, the full outer join is executed as a union of left outer join and an anti-join. 可以通过NATIVE_FULL_OUTER_JOIN提示指导数据库使用hash完全连接执行方法。通过NO_NATIVE_FULL_OUTER_JOIN 提示其不要使用hash完全连接执行方法

原创文章。为了维护文章的版本一致、最新、可追溯,转载请注明: 转载自idouba

本文链接地址: 【译】oracle join 方式(Oracle? Database Performance Tuning Guide)


, ,

No comments yet.

发表评论