MySQL Execution Plan--COUNT相關(guān)測試

頭號碼甲 2022-05-05 發(fā)布于北京

展開全文

COUNT全表記錄

在MySQL中，相同的SQL不同的存儲引擎執(zhí)行計劃不同：

對于MyISAM引擎，由于使用表鎖進行并發(fā)控制，同一時間點多個并發(fā)線程執(zhí)行相同查詢獲得的結(jié)果相同，且MyISAM存儲引擎專門存儲表總記錄數(shù)，因此使用COUNT(*)查詢?nèi)碛涗洉r能直接返回。

而對于InnoDB存儲引擎，由于使用MVCC和行鎖進行并發(fā)控制，同一時間點多個并發(fā)線程執(zhí)行相同查詢獲得的結(jié)果存在差異(每個回話的READVIEW不同)，且沒有專門存儲表總記錄數(shù)，因此每次查詢都需要掃描全表或掃描某個索引的全部記錄。

For transactional storage engines such as InnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.

InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.

Prior to MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index. As of MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, the clustered index is scanned.

Processing SELECT COUNT(*) statements takes some time if index records are not entirely in the buffer pool. For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, use SHOW TABLE STATUS.

InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.

For MyISAM tables, COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause.

This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly. COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL。

現(xiàn)有測試表TB101：

CREATE TABLE `tb101` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `C1` int(11) NOT NULL,
  `C2` int(11) DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=MyISAM AUTO_INCREMENT=140001 DEFAULT CHARSET=utf8

對于沒有WHERE條件的COUNT(*)/COUNT(1)/COUNT(ID)/COUNT(C1)的執(zhí)行計劃為：

mysql> EXPLAIN SELECT COUNT(*) FROM TB101 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Select tables optimized away
1 row in set, 1 warning (0.00 sec)

對于沒有WHERE條件的COUNT(C2)的執(zhí)行計劃為：

mysql> EXPLAIN SELECT COUNT(C2) FROM TB101 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TB101
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 140000
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

可以發(fā)現(xiàn)，對于MyISAM存儲引擎，在沒有WHERE條件下，如果C1列為NOT NULL，則可以將COUNT(C1)與COUNT(*)和COUNT(1)做相同的處理。

針對上面的測試，對于InnoDB存儲引擎，在沒有WHERE條件下：

1、ID列為NOT NULL主鍵，COUNT(ID)和COUNT(1)或COUNT(*)的執(zhí)行計劃相同，返回結(jié)果相同。

2、C1列為NOT NULL，COUNT(C1)和COUNT(1)或COUNT(*)的執(zhí)行結(jié)果相同，但執(zhí)行計劃不同。

COUNT(expr)異同

1、COUNT(1)和COUNT(*)等價，兩者在執(zhí)行計劃和執(zhí)行效率上完全相同。

個人推薦使用COUNT(1)替換COUNT(*)，原因是簡單直觀，
他人推薦使用COUNT(1),原因是符合SQL92標(biāo)準(zhǔn)，阿里巴巴Java開發(fā)手冊推薦。

2、COUNT(*)和COUNT(C1)不一定等價，兩者執(zhí)行計劃和執(zhí)行結(jié)果會存在差異。

COUNT(*)：執(zhí)行返回滿足WHERE條件的行數(shù)，不考慮NULL值問題
COUNT(C1): 執(zhí)行返回滿足WHERE條件且C1不等于NULL的行數(shù)，不統(tǒng)計C1等于NULL的行。

換種理解思路：
對于MyISAM引擎表和InnoDB引擎表，無論是顯式主鍵還是因此ROWID，都要求非空唯一，每行記錄都肯定存在一個不為NULL的列(列組)，因此計算COUNT(*)時不需要考慮NULL值問題。

一個有趣的擴展，如果C1為NOT NULL，那么COUNT(C1)與COUNT(1)的返回結(jié)果相同，那么MySQL會對此進行優(yōu)化么？

現(xiàn)有測試表結(jié)果如下：

CREATE TABLE `tb01` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `C1` int(11) NOT NULL,
  `C2` int(11) NOT NULL,
  PRIMARY KEY (`ID`),
  KEY `IDX_C2` (`C2`)
) ENGINE=InnoDB AUTO_INCREMENT=16384 DEFAULT CHARSET=utf8

查看COUNT(*)和COUNT(C1)的執(zhí)行計劃：

mysql> EXPLAIN SELECT COUNT(*) FROM TB01 WHERE C2<100 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TB01
   partitions: NULL
         type: range
possible_keys: IDX_C2
          key: IDX_C2
      key_len: 4
          ref: NULL
         rows: 99
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT COUNT(C1) FROM TB01 WHERE C2<100 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TB01
   partitions: NULL
         type: range
possible_keys: IDX_C2
          key: IDX_C2
      key_len: 4
          ref: NULL
         rows: 99
     filtered: 100.00
        Extra: Using index condition
1 row in set, 1 warning (0.00 sec)

從上面執(zhí)行計劃可以發(fā)現(xiàn)，在處理COUNT(*)時，僅需要使用IDX_C2即可完成查詢，因此Extra為Using index，而在處理COUNT(C1)時，需要使用IDX_C2進行過濾后再執(zhí)行回表查詢，因此Extra為Using index condition。

針對上面的測試，MyISAM存儲引擎和InnoDB存儲引擎的測試結(jié)果相同。

COUNT(DISTINC ...)操作

MySQL官網(wǎng)解釋為：

COUNT(DISTINCT expr,[expr...])

Returns a count of the number of rows with different non-NULL expr values.

In MySQL, you can obtain the number of distinct expression combinations that do not contain NULL by giving a list of expressions. In standard SQL, you would have to do a concatenation of all expressions inside COUNT(DISTINCT ...).

在MySQL中允許執(zhí)行：

SELECT COUNT(DISTINCT ID,C1) FROM TB02;

但不允許執(zhí)行：

SELECT COUNT(ID,C1) FROM TB02;

總結(jié)

1、對于InnoDB和MyISAM存儲引擎，COUNT(1)和COUNT(*)在任何場景下都等價，執(zhí)行性能和執(zhí)行計劃相同。

2、在查詢?nèi)碛涗?沒有WHERE條件)時，對于MyISAM存儲引擎，存儲引擎存儲表總記錄數(shù)，無需掃描數(shù)據(jù)因此查詢可以很快返回，對于InnoDB存儲引擎，需要掃描全表或某個索引的全部記錄因此查詢可能比較耗時。

3、對于MyISAM存儲引擎，在沒有WHERE條件情況下，如果列C1為NOT NULL，那么COUNT(C1)和COUNT(*)執(zhí)行操作相同。

4、對于InnoDB存儲引擎，如果列C1為主鍵，那么COUNT(C1)和COUNT(*)執(zhí)行計劃和執(zhí)行效率相同，如果C1為NOT NULL，那么COUNT(C1)和COUNT(*)執(zhí)行計劃和執(zhí)行效率不一定相同，只有在查詢使用C1列上索引時才可能相同。

參考鏈接：

https://dev./doc/refman/8.0/en/group-by-functions.html#function_count

https://dev./doc/refman/5.7/en/create-index.html

https://mp.weixin.qq.com/s/IOHvtel2KLNi-Ol4UBivbQ