Greetings

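Nice to meet you. I'm Shio (しお) from Style Edge LABO (スタイル・エッジLABO). I joined the company mid-career last October.
At my previous job I wrote SQL for data investigations, got lost inside an enormous system, and wrote Java (or didn't).
I currently do maintenance and development for an internal system used at Style Edge.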

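In my day-to-day work I use MySQL to put together queries for data investigations and to write batch jobs. One day I found myself wondering, "How is MySQL actually implemented? How does this thing even work?", so today I'm going to read the MySQL source code for real.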

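MySQL is OSS (open source software), and its source code is published to the whole world, so anyone is free to read it.
Personally I'm also dying to know how airplanes fly, but airplanes aren't OSS, so I'll give up on that for now and read the MySQL source instead.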

Motivation

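I want to try reading the code of all sorts of OSS.
I want to become able to read code I've never seen before.
I want to be confident about how MySQL is specified to behave.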

Target

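I'll be reading the source of mysql-server, which I use in my everyday work. Reading aimlessly would just get me lost, though, so I'll set a goal first.
This time I'll unravel the mystery of a cool-sounding operator I learned about recently: the spaceship operator (宇宙船演算子) *1.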

What is the spaceship operator?

Put simply, the spaceship operator (<=>) is an equality comparison operator that can handle null values.

For example, numbers can be compared with each other without any trouble, as shown below.

mysql> set @hoge = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> select @hoge = 1 as result;
+--------+
| result |
+--------+
|      1 |
+--------+
1 row in set (0.00 sec)

The same goes for comparing strings.

mysql> set @hoge = 'aaa';
Query OK, 0 rows affected (0.00 sec)

mysql> select @hoge = 'aaa' as result;
+--------+
| result |
+--------+
|      1 |
+--------+
1 row in set (0.00 sec)

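Comparing values of different types also works, although MySQL reports a warning here, presumably because 'aaa' has to be converted to a number for the comparison.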

mysql> set @hoge = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> select @hoge = 'aaa' as result;
+--------+
| result |
+--------+
|      0 |
+--------+
1 row in set, 1 warning (0.00 sec)

So what happens when one side of the comparison is a null value?

mysql> set @hoge = null;
Query OK, 0 rows affected (0.00 sec)

mysql> select @hoge = 'aaa' as result;
+--------+
| result |
+--------+
|   NULL |
+--------+
1 row in set (0.00 sec)

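As you can see, the comparison itself evaluates to null, so we get neither 1 nor 0. For that reason, when a value might be null, I usually fudge it by supplying a default with coalesce (how is that even pronounced?) *2, or by adding is not null and ... in front of the comparison.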

mysql> set @hoge = null;
Query OK, 0 rows affected (0.00 sec)

mysql> select coalesce(@hoge, '') = 'aaa' as result;
+--------+
| result |
+--------+
|      0 |
+--------+
1 row in set (0.00 sec)

mysql> select (@hoge is not null) and (@hoge = 'aaa') as result;
+--------+
| result |
+--------+
|      0 |
+--------+
1 row in set (0.00 sec)

However, adding coalesce or is not null and ... just to account for null values makes the query longer and harder to read, and it also blurs what the query was originally meant to express.
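To make the three-valued behavior above concrete, here is a minimal C++ sketch. It is not MySQL code: std::optional stands in for NULL, and sql_eq, sql_coalesce, and not_null_and_eq are hypothetical helpers made up purely for illustration.

```cpp
#include <optional>
#include <string>

// A nullable SQL string value: std::nullopt stands in for NULL.
using SqlString = std::optional<std::string>;
// A nullable SQL boolean: true (1), false (0), or NULL (std::nullopt).
using SqlBool = std::optional<bool>;

// Models `a = b`: the result is NULL as soon as either operand is NULL.
SqlBool sql_eq(const SqlString &a, const SqlString &b) {
  if (!a.has_value() || !b.has_value()) return std::nullopt;
  return *a == *b;
}

// Models `coalesce(a, fallback)` for a single nullable argument.
SqlString sql_coalesce(const SqlString &a, const std::string &fallback) {
  return a.has_value() ? *a : fallback;
}

// Models `(a is not null) and (a = b)` for a non-NULL b.
SqlBool not_null_and_eq(const SqlString &a, const std::string &b) {
  if (!a.has_value()) return false;  // FALSE AND <anything> is FALSE
  return *a == b;
}
```

With a NULL left-hand side, sql_eq returns an empty optional (the NULL row in the output above), while both workarounds return false, matching the 0 rows shown for coalesce and is not null and.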

The spaceship operator solves this problem: it makes comparisons with null values possible.

mysql> set @hoge = null;
Query OK, 0 rows affected (0.00 sec)

mysql> select @hoge <=> 'aaa' as result;
+--------+
| result |
+--------+
|      0 |
+--------+
1 row in set, 1 warning (0.00 sec)

Incidentally, if both sides of the comparison are null, they are considered equal.

mysql> set @hoge = null;
Query OK, 0 rows affected (0.00 sec)

mysql> set @fuga = null;
Query OK, 0 rows affected (0.00 sec)

mysql> select @hoge <=> @fuga as result;
+--------+
| result |
+--------+
|      1 |
+--------+
1 row in set (0.01 sec)

Actually reading the code

Getting the source

That was a very long preamble, but it covers the overview of the spaceship operator. From here I'll read the MySQL source itself to pin down the operator's exact specification.

I set up the source with the following steps.

# Move to any directory you like
cd ~/projects
# Download the source
git clone https://github.com/mysql/mysql-server.git
# Move into the cloned directory
cd ./mysql-server

When in doubt, grep!

First things first: to expose the true identity of this so-called spaceship operator, I'll grep for the keyword <=>.

$ grep -rn "<=>" ./
./storage/innobase/include/log0types.h:222:       @name Users <=> writer
./storage/innobase/include/log0types.h:276:       @name Users <=> flusher
./storage/innobase/include/log0types.h:453:       @name Log flusher <=> flush_notifier
./storage/innobase/include/log0types.h:478:       @name Log writer <=> write_notifier
./storage/ndb/src/mgmclient/CommandInterpreter.cpp:351:"                      #&()*+-./:;<=>?@[]_{|}~.\n"
./storage/ndb/src/kernel/blocks/dbtup/DbtupExecQuery.cpp:3754:   * len=4 <=> 1 word
./storage/ndb/src/kernel/blocks/dbdih/DbdihMain.cpp:21970:       * Nothing queued or started <=> Complete on that node
./storage/ndb/nodejs/jones-ndb/impl/src/ndb/QueryOperation.cpp:128://  DEBUG_PRINT_DETAIL("compareTwoResults for level %d: %d <=> %d", level, r2, r1);
./storage/myisam/mi_key.cc:277:    unpack_blobs        true  <=> Unpack blob columns
./storage/myisam/mi_key.cc:278:                        false <=> Skip them. This is used by index condition
./include/base64.h:77:    60,        61, -1, -1, -1, -1, -1, -1, /* 0123456789:;<=>? */
./include/mysql/service_command.h:184:  @param is_unsigned   TRUE <=> value is unsigned
./unittest/gunit/decimal-t.cc:409:  sprintf(s, "'%s' <=> '%s'", s1, s2);
./unittest/gunit/libmysqlgcs/xcom/gcs_xcom_xcom_base-t.cc:641:     gt_ballot(m->proposal, p->proposer.msg->proposal) <=>
./unittest/gunit/libmysqlgcs/xcom/gcs_xcom_xcom_base-t.cc:642:     gt_ballot((0,0), (0,1)) <=>
./unittest/gunit/opt_trace-t.cc:828:      "0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\\\]^_`"
./unittest/gunit/xplugin/xpl/mysql_function_names_t.cc.in:238:    {"<=>", OPERATOR},
./plugin/group_replication/include/sql_service/sql_service_context.h:136:    @param is_unsigned   TRUE <=> value is unsigned
./plugin/group_replication/include/sql_service/sql_service_context_base.h:142:    @param is_unsigned   TRUE <=> value is unsigned
./plugin/innodb_memcached/daemon_memcached/scripts/damemtop:369:                @newrows = sort { $a->[$colnum] <=> $b->[$colnum] } @rows;
./plugin/innodb_memcached/daemon_memcached/scripts/damemtop:375:                @newrows = sort { $b->[$colnum] <=> $a->[$colnum] } @rows;
./plugin/x/src/ngs/command_delegate.h:273:    @param unsigned_flag true <=> value is unsigned
./scripts/fill_help_tables.sql:157:INSERT INTO help_topic (help_topic_id,help_category_id,name,description,example,url) VALUES (55,10,'<=>','Syntax:\n<=>\n\nNULL-safe equal. This operator performs an equality comparison like the\n= operator, but returns 1 rather than NULL if both operands are NULL,\nand 0 rather than NULL if one operand is NULL.\n\nThe <=> operator is equivalent to the standard SQL IS NOT DISTINCT FROM\noperator.\n\nURL: https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html\n\n','mysql> SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;\n        -> 1, 1, 0\nmysql> SELECT 1 = 1, NULL = NULL, 1 = NULL;\n        -> 1, NULL, NULL\n','https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html');
./scripts/fill_help_tables.sql:860:INSERT INTO help_keyword (help_keyword_id,name) VALUES (56,'<=>');
./scripts/mysqldumpslow.pl.in:157:my @sorted = sort { $stmt{$b}->{$opt{s}} <=> $stmt{$a}->{$opt{s}} } keys %stmt;
./sql/field.h:4590:    true <=> source item is an Item_field. Needed to workaround lack of
./sql/json_dom.h:1598:    @param[out] err    true <=> error occur during coercion
./sql/json_dom.h:1618:    @param[out] err    true <=> error occur during coercion
./sql/json_dom.h:1636:    @param[out] err    true <=> error occur during coercion
...
(the rest is far too much to show here)

Right...

Calm down and run tree

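I was about to be overwhelmed by the amount of information, so I took a moment to catch my breath and used the tree command to get a rough picture of the directory layout.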

$ tree ./ -L 1
./
├── CMakeLists.txt
├── Docs
├── Doxyfile-ignored
├── Doxyfile.in
├── INSTALL
├── LICENSE
├── MYSQL_VERSION
├── README
├── client
├── cmake
├── components
├── config.h.cmake
├── configure.cmake
├── doxygen_resources
├── extra
├── include
├── libbinlogevents
├── libbinlogstandalone
├── libchangestreams
├── libmysql
├── libservices
├── man
├── mysql-test
├── mysys
├── packaging
├── plugin
├── router
├── run_doxygen.cmake
├── scripts
├── share
├── sql
├── sql-common
├── storage
├── strings
├── support-files
├── testclients
├── unittest
├── utilities
└── vio

29 directories, 10 files

Something smells promising under sql... or so I feel, so I'll grep again, this time under ./sql.

$ grep -rn "<=>" ./sql
./sql/field.h:4590:    true <=> source item is an Item_field. Needed to workaround lack of
./sql/json_dom.h:1598:    @param[out] err    true <=> error occur during coercion
./sql/json_dom.h:1618:    @param[out] err    true <=> error occur during coercion
./sql/json_dom.h:1636:    @param[out] err    true <=> error occur during coercion
./sql/temp_table_param.h:174:    true <=> don't actually create table handler when creating the result
./sql/sql_class.cc:2518:  @param  all   true <=> rollback main transaction.
./sql/sql_const_folding.h:36:  Fold boolean condition {=, <>, >, >=, <, <=, <=>} involving constants and
./sql/range_optimizer/index_range_scan.h:79:  bool free_file; /* TRUE <=> this->file is "owned" by this quick select */
./sql/range_optimizer/index_range_scan_plan.h:55:      index_read_must_be_used  true <=> assume 'index only' option will be set
./sql/range_optimizer/index_range_scan_plan.h:57:      update_tbl_stats         true <=> update table->quick_* with information
./sql/range_optimizer/index_range_scan_plan.h:100:      update_tbl_stats  true <=> update table->quick_* with information
./sql/range_optimizer/partition_pruning.cc:911:          Ok, we've got "fieldN<=>constN"-type SEL_ARGs for all partitioning
./sql/range_optimizer/partition_pruning.cc:939:          Ok, we've got "fieldN<=>constN"-type SEL_ARGs for all subpartitioning
./sql/range_optimizer/range_analysis.cc:1134:  @param comp_op                    Comparison operator: >, >=, <=> etc.
./sql/range_optimizer/range_analysis.cc:1246:          Independent of data type, "out_of_range_value =/<=> field" is
./sql/range_optimizer/range_analysis.cc:1544:    Any sargable predicate except "<=>" involving NULL as a constant is always
./sql/range_optimizer/tree.cc:1072:  // (cond) OR (IMPOSSIBLE) <=> (cond).
./sql/rpl_rli_pdb.cc:1710:        lwm_estimate < last_committed  <=>  last_committed  \not <= lwm_estimate
./sql/sql_optimizer_internal.h:69:  Currently 'op' is one of {'=', '<=>', 'IS [NOT] NULL', 'arg1 IN arg2'},
./sql/lex.h:74:    {SYM("<=>", EQUAL_SYM)},
./sql/handler.h:3579:  Flag set <=> default MRR implementation is used
./sql/handler.h:3588:  Flag set <=> the caller guarantees that the bounds of the scanned ranges
./sql/handler.h:4165:  /* true <=> source MRR ranges and the output are ordered */
./sql/handler.h:4168:  /* true <=> we're currently traversing a range in mrr_cur_range. */
./sql/handler.h:4200:    true <=> the engine guarantees that returned records are within the range
./sql/handler.h:6777:  bool dsmrr_eof; /* true <=> We have reached EOF when reading index tuples */
./sql/handler.h:6779:  /* true <=> need range association, buffer holds {rowid, range_id} pairs */
./sql/handler.h:6782:  bool use_default_impl; /* true <=> shortcut all calls to default MRR impl */
./sql/sql_executor.h:392:  /** true <=> remove duplicates on this table. */
./sql/opt_explain_json.cc:903:        (x <=> (SELECT FROM DUAL) AND x = (SELECT FROM DUAL)),
./sql/table_function.h:214:    true <=> NESTED PATH associated with this element is producing records.
./sql/table_function.h:304:    @param[out]  skip  true <=> it's a NESTED PATH node and its path
./sql/item_cmpfunc.cc:1071:  @param [out] is_null        true <=> the item_arg is null
./sql/item_cmpfunc.cc:1490:  @param[out] is_null  true <=> the item_arg is null
./sql/item_cmpfunc.cc:1553:    is_null    [out]    true <=> the item_arg is null
./sql/item_cmpfunc.cc:4949:  /* true <=> arguments values will be compared as DATETIMEs. */
./sql/sql_resolver.cc:1867:  @param top         true <=> cond is the where condition
./sql/sql_resolver.cc:1868:  @param in_sj       true <=> processing semi-join nest's children
./sql/sql_resolver.cc:5479:      // If antijoin, we can decorrelate '<>', '>=', etc, too (but not '<=>'):
./sql/join_optimizer/make_join_hypergraph.cc:2235:      //   (t1 <opA> t2) <opB> t3 <=> t1 <opA> (t2 <opB> t3)
./sql/join_optimizer/interesting_orders.cc:1414:      // TODO(sgunders): When we get C++20, use operator<=> so that we
./sql/join_optimizer/join_optimizer.cc:3831:// TODO(sgunders): Include x=y OR NULL predicates, <=> and IS NULL predicates,
./sql/sql_opt_exec_shared.h:109:    true <=> disable the "cache" as doing lookup with the same key value may
./sql/sql_partition.cc:3112:      include_endpoint  true <=> the endpoint itself is included in the
./sql/sql_optimizer.h:589:  /** Exec time only: true <=> current group has been sent */
./sql/handler.cc:7151:  @param      interrupted  true <=> Assume that the disk sweep will be
./sql/sql_select.cc:2561:  index are available other_tbls_ok  true <=> Fields of other non-const tables
./sql/sql_select.cc:5036:          <=> N > refkey_rows_estimate.
./sql/opt_trace.cc:219:    0 <=> this trace should be in information_schema.
./sql/opt_explain.h:97:  bool zero_result;           ///< true <=> plan will not be executed
./sql/key_spec.h:182:  /// true <=> ascending, false <=> descending.
./sql/key_spec.h:185:  /// true <=> ASC/DESC is explicitly specified, false <=> implicit ASC
./sql/opt_sum.cc:759:  bool eq_type = false;          // =, <=> or IS NULL
./sql/opt_sum.cc:760:  bool is_null_safe_eq = false;  // The operator is NULL safe, e.g. <=>
./sql/opt_trace_context.h:370:    <>0 <=> any to-be-created statement's trace should not be in
./sql/sql_const_folding.cc:1053:  [*] for the "<=>" operator, we fold to FALSE (0) in this case.
./sql/partitioning/partition_handler.h:1022:    @param[in]  have_start_key  true <=> the left endpoint is available, i.e.
./sql/partitioning/partition_handler.h:1025:                                false <=> there is no left endpoint (we're in
./sql/partitioning/partition_handler.cc:2208:  @param idx_read_flag  true <=> m_start_key has range start endpoint which
./sql/partitioning/partition_handler.cc:2211:                        false <=> there is no start endpoint.
./sql/item.h:1607:        left_endp  false  <=> The interval is "x < const" or "x <= const"
./sql/item.h:1608:                   true   <=> The interval is "x > const" or "x >= const"
./sql/item.h:1610:        incl_endp  IN   false <=> the comparison is '<' or '>'
./sql/item.h:1611:                        true  <=> the comparison is '<=' or '>='
./sql/item.h:5944:    true <=> that the outer_ref is already present in the select list
./sql/item.h:6504:    true <=> cache holds value of the last stored item (i.e actual value).
./sql/sql_tmp_table.cc:1962:  @param force_disk_table true <=> Use InnoDB
./sql/sql_tmp_table.cc:2013:  @param force_disk_table true <=> Use InnoDB
./sql/item.cc:2041:  @param skip_registered <=> function be must skipped for registered SUM items
./sql/item.cc:2131:  /* An item of type Item_sum  is registered <=> referenced_by[0] != 0 */
./sql/item.cc:7310:  @param [out] arg If != NULL <=> Cache this item.
./sql/item.cc:9150:        We can't ignore NULL values here as this item may be used with <=>, in
./sql/sql_select.h:736:  /** true <=> AM will scan backward */
./sql/sql_select.h:811:  bool null_key{false}; /* true <=> the value of the key has a null part */
./sql/sql_select.h:992:  @param func   comparison operator (= or <=>)
./sql/sql_parse.cc:6551:  if ((cmp == &comp_eq_creator) && !all)  //  = ANY <=> IN
./sql/sql_parse.cc:6553:  if ((cmp == &comp_ne_creator) && all)  // <> ALL <=> NOT IN
./sql/table.cc:4170:  @param is_virtual true <=> it's a virtual tmp table
./sql/sql_planner.cc:965:  @param disable_jbuf      true<=> Don't use join buffering
./sql/sql_optimizer.cc:6151:      AND t11.b <=> t10.b AND (t11.a = (SELECT MAX(a) FROM t12
./sql/sql_optimizer.cc:6281:  @param  other_tbls_ok  true <=> Fields of other non-const tables are allowed
./sql/sql_optimizer.cc:6661:  -   (t2.key = t1.field OR t2.key <=> t1.field) -> null_rejecting=false
./sql/sql_optimizer.cc:6846:  @param eq_func            True if we used =, <=> or IS NULL
./sql/sql_optimizer.cc:7006:    Only the <=> operator and the IS NULL and IS NOT NULL clauses may return
./sql/sql_optimizer.cc:7062:    @param  eq_func        True if we used =, <=> or IS NULL
./sql/sql_optimizer.cc:8830:        Note that ref access implements "table1.field1 <=>
./sql/sql_optimizer.cc:10247:       3) If the <=> operator is used, result is always true because
./sql/table.h:1725:    For tmp tables. true <=> tmp table has been instantiated.
./sql/table.h:1742:      true <=> range optimizer found that there is no rows satisfying
./sql/table.h:3354:       <=>
./sql/table.h:3358:       <=>
./sql/table.h:3362:       <=>
./sql/table.h:3686:  /// true <=> VIEW CHECK OPTION condition is processed (also for prep. stmts)
./sql/table.h:3688:  /// true <=> Filter condition is processed
./sql/table.h:3826:  /// true <=> this table is a const one and was optimized away.
./sql/table.h:3830:    true <=> all possible keys for a derived table were collected and
./sql/item_cmpfunc.h:143:  bool set_null{true};  // true <=> set owner->null_value
./sql/item_cmpfunc.h:336:    True <=> this item was added by IN->EXISTS subquery transformation, and
./sql/item_cmpfunc.h:531:/// Abstract base class for the comparison operators =, <> and <=>.
./sql/item_cmpfunc.h:564:    return "<=>";
./sql/item_cmpfunc.h:681:  >, >=) as well as the special <=> equality operator.
./sql/item_cmpfunc.h:1067:  The <=> operator evaluates the same as
./sql/item_cmpfunc.h:1071:  a <=> b is equivalent to the standard operation a IS NOT DISTINCT FROM b.
./sql/item_cmpfunc.h:1089:  const char *func_name() const override { return "<=>"; }
./sql/item_cmpfunc.h:1230:  bool negated;    /* <=> the item represents NOT <func> */
./sql/item_cmpfunc.h:1231:  bool pred_level; /* <=> [NOT] <func> is used on a predicate level */
./sql/item_cmpfunc.h:1261:  /* true <=> arguments will be compared as dates. */

Overwhelmed by the volume once again, but one line stands out as clearly suspicious...

./sql/lex.h:74:    {SYM("<=>", EQUAL_SYM)},

So let's take a closer look around ./sql/lex.h:74.

Reading on while grepping in small steps

static const SYMBOL symbols[] = {
    /*
     Insert new SQL keywords after that commentary (by alphabetical order):
    */
    {SYM("&&", AND_AND_SYM)},
    {SYM("<", LT)},
    {SYM("<=", LE)},
    {SYM("<>", NE)},
    {SYM("!=", NE)},
    {SYM("=", EQ)},
    {SYM(">", GT_SYM)},
    {SYM(">=", GE)},
    {SYM("<<", SHIFT_LEFT)},
    {SYM(">>", SHIFT_RIGHT)},
    {SYM("<=>", EQUAL_SYM)},
    {SYM("ACCESSIBLE", ACCESSIBLE_SYM)},
    {SYM("ACCOUNT", ACCOUNT_SYM)},
    {SYM("ACTION", ACTION)},
    {SYM("ACTIVE", ACTIVE_SYM)},
    {SYM("ADD", ADD)},
    {SYM("ADMIN", ADMIN_SYM)},
    {SYM("AFTER", AFTER_SYM)},
    {SYM("AGAINST", AGAINST)},
    {SYM("AGGREGATE", AGGREGATE_SYM)},
    {SYM("ALL", ALL)},
    {SYM("ALGORITHM", ALGORITHM_SYM)},
    {SYM("ALTER", ALTER)},
    // (omitted)

This is how SQL's keywords and operator symbols are mapped to tokens. Off topic, but I didn't know MySQL supports both <> and != (both map to NE, so they should behave the same).
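As a rough picture of what such a table gives the lexer, here is a small C++ sketch. The Token enum, the Symbol struct, kSymbols, and lookup are all hypothetical stand-ins; the real SYMBOL type, the SYM macro, and the way the server searches the table are not shown in this article.

```cpp
#include <cstring>

// Hypothetical token ids; in the server the real values come from sql_yacc.yy.
enum Token { TOK_UNKNOWN, TOK_LT, TOK_LE, TOK_NE, TOK_EQ, TOK_EQUAL_SYM };

struct Symbol {
  const char *name;  // spelling as it appears in SQL text
  Token tok;         // token handed to the parser
};

// A tiny subset of the table, using the same spellings as lex.h.
static const Symbol kSymbols[] = {
    {"<", TOK_LT},  {"<=", TOK_LE}, {"<>", TOK_NE},
    {"!=", TOK_NE}, {"=", TOK_EQ},  {"<=>", TOK_EQUAL_SYM},
};

// Linear search, for illustration only.
Token lookup(const char *text) {
  for (const Symbol &s : kSymbols) {
    if (std::strcmp(s.name, text) == 0) return s.tok;
  }
  return TOK_UNKNOWN;
}
```

In this sketch lookup("<>") and lookup("!=") return the same token, which is why the two spellings can be expected to behave identically, while "<=>" gets a token of its own.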

Now, the comparison operator we care about, <=> (the spaceship operator), is apparently defined here as EQUAL_SYM, so I'll grep under ./sql again, this time for EQUAL_SYM.

$ grep -rn "EQUAL_SYM" ./sql
./sql/lex.h:74:    {SYM("<=>", EQUAL_SYM)},
./sql/sql_yacc.yy:716:%token  EQUAL_SYM 416                     /* OPERATOR */
./sql/sql_yacc.yy:1405:%left   EQ EQUAL_SYM GE GT_SYM LE LT NE IS LIKE REGEXP IN_SYM
./sql/sql_yacc.yy:10462:        | EQUAL_SYM { $$ = &comp_equal_creator; }

Going through them from the top:

./sql/lex.h:74:    {SYM("<=>", EQUAL_SYM)},

This is the definition file we just looked at.

./sql/sql_yacc.yy:716:%token  EQUAL_SYM 416                     /* OPERATOR */
./sql/sql_yacc.yy:1405:%left   EQ EQUAL_SYM GE GT_SYM LE LT NE IS LIKE REGEXP IN_SYM

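I couldn't tell what the first line was for at first glance. In a Bison grammar file, %token declares a token (here with the number 416), and %left declares the listed operators as left-associative at the same precedence level. Neither line tells us how <=> behaves, so I'll move on.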

./sql/sql_yacc.yy:10462:        | EQUAL_SYM { $$ = &comp_equal_creator; }

comp_equal_creator looks like the next thing to check, so I'll grep for it.

$ grep -rn "comp_equal_creator" ./
./sql/sql_yacc.yy:10267:            if ($2 == &comp_equal_creator)
./sql/sql_yacc.yy:10462:        | EQUAL_SYM { $$ = &comp_equal_creator; }
./sql/sql_parse.h:76:Comp_creator *comp_equal_creator(bool invert);
./sql/sql_parse.cc:6512:Comp_creator *comp_equal_creator(bool invert [[maybe_unused]]) {

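The top two are places where the grammar refers to comp_equal_creator (it takes the function's address with & rather than calling it). Of the bottom two, sql_parse.h holds the declaration and sql_parse.cc the definition.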

In sql_parse.cc, comp_equal_creator is defined as follows.

Comp_creator *comp_equal_creator(bool invert [[maybe_unused]]) {
  assert(!invert);  // Function never called with true.
  return &equal_creator;
}

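The assert at the top puzzled me at first, but going by the comment, the argument invert is never true here, and the parameter carries a [[maybe_unused]] annotation (a C++ attribute), so I'll skip over it.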
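The shape of this code can be sketched in a few lines of C++. This is a simplified assumption of mine, not the server's code: the Comp_creator here has a single symbol() member with no arguments, and chooser_func and symbol_for_equal_sym are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical, stripped-down stand-in for the real Comp_creator hierarchy.
struct Comp_creator {
  virtual ~Comp_creator() = default;
  virtual const char *symbol() const = 0;
};
struct Equal_creator : Comp_creator {
  const char *symbol() const override { return "<=>"; }
};

// One shared creator object, like the global the grep finds in mysqld.cc.
Equal_creator equal_creator;

// [[maybe_unused]] keeps builds where assert() compiles to nothing
// from warning that `invert` is never read.
Comp_creator *comp_equal_creator(bool invert [[maybe_unused]]) {
  assert(!invert);
  return &equal_creator;
}

// Hypothetical alias for "pointer to a function that picks a creator".
using chooser_func = Comp_creator *(*)(bool);

std::string symbol_for_equal_sym() {
  chooser_func chooser = &comp_equal_creator;  // address taken, not called yet
  Comp_creator *creator = chooser(false);      // called later, with invert = false
  return creator->symbol();
}
```

The point of the indirection, as far as I can tell, is that the grammar only has to remember which function to call for each operator token, and the actual creator object is fetched later.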

The next word to focus on is equal_creator. A plain grep for equal_creator would also hit the comp_equal_creator I just searched for, so I'll add the -w option to match whole words only.

$ grep -rnw "equal_creator" ./
./sql/item_cmpfunc.cc:287:Item_bool_func *Equal_creator::create_scalar_predicate(Item *a, Item *b) const {
./sql/item_cmpfunc.cc:292:Item_bool_func *Equal_creator::combine(List<Item> list) const {
./sql/sql_parse.cc:6514:  return &equal_creator;
./sql/mysqld.cc:1500:Equal_creator equal_creator;
./sql/item_cmpfunc.h:559:class Equal_creator : public Linear_comp_creator {
./sql/item_cmpfunc.h:2707:extern Equal_creator equal_creator;
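For anyone wondering what "whole word" means here, the idea can be approximated in C++ with a regular expression that uses word boundaries. This is only a sketch of the concept, not how grep is implemented, and contains_whole_word is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Returns true if `word` occurs in `line` with non-word characters (or the
// line edges) on both sides, which approximates what `grep -w` checks.
// Assumption: `word` contains only letters, digits, and underscores, so it
// needs no regex escaping.
bool contains_whole_word(const std::string &line, const std::string &word) {
  const std::regex pattern("\\b" + word + "\\b");
  return std::regex_search(line, pattern);
}
```

Because an underscore counts as a word character, comp_equal_creator contains equal_creator as a substring but not as a whole word, so the declaration in sql_parse.h drops out of the results while return &equal_creator; still matches.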

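The results include the definition of a class called Equal_creator, so let's look at ./sql/item_cmpfunc.h:559. (The capitalized Equal_creator would not match a case-sensitive search for equal_creator, so this output was presumably produced with a case-insensitive search, for example with -i added.)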

class Equal_creator : public Linear_comp_creator {
 public:
  const char *symbol(bool invert [[maybe_unused]]) const override {
    // This will never be called with true.
    assert(!invert);
    return "<=>";
  }

 protected:
  Item_bool_func *create_scalar_predicate(Item *a, Item *b) const override;
  Item_bool_func *combine(List<Item> list) const override;
};

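At a glance, the first member, symbol, is a member function that returns the operator as a string, and the two below it are member function declarations. create_scalar_predicate(Item *a, Item *b) takes two arguments and looks like the right one, so I'll grep for it.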

$ grep -rn "create_scalar_predicate" ./
./sql/item_cmpfunc.cc:257:  create_scalar_predicate().
./sql/item_cmpfunc.cc:275:  return create_scalar_predicate(a, b);
./sql/item_cmpfunc.cc:278:Item_bool_func *Eq_creator::create_scalar_predicate(Item *a, Item *b) const {
./sql/item_cmpfunc.cc:287:Item_bool_func *Equal_creator::create_scalar_predicate(Item *a, Item *b) const {
./sql/item_cmpfunc.cc:296:Item_bool_func *Ne_creator::create_scalar_predicate(Item *a, Item *b) const {
./sql/item_cmpfunc.h:544:  virtual Item_bool_func *create_scalar_predicate(Item *a, Item *b) const = 0;
./sql/item_cmpfunc.h:555:  Item_bool_func *create_scalar_predicate(Item *a, Item *b) const override;
./sql/item_cmpfunc.h:568:  Item_bool_func *create_scalar_predicate(Item *a, Item *b) const override;
./sql/item_cmpfunc.h:577:  Item_bool_func *create_scalar_predicate(Item *a, Item *b) const override;

The implementations are in ./sql/item_cmpfunc.cc: line 278 is Eq_creator's version, and the one we want, Equal_creator::create_scalar_predicate, is at line 287. Let's look at it.

Item_bool_func *Equal_creator::create_scalar_predicate(Item *a, Item *b) const {
  assert(a->type() != Item::ROW_ITEM || b->type() != Item::ROW_ITEM);
  return new Item_func_equal(a, b);
}

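Another mysterious assert comes first (it checks that a and b are not both ROW_ITEM). I'll skip it for now and search for Item_func_equal, which is what gets created by return new.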
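What we have traced so far looks like a factory-method arrangement: each creator class knows which kind of comparison node to build. The following C++ sketch shows only that pattern. It is heavily simplified and partly hypothetical: the operands are dropped, std::unique_ptr replaces the raw new used in the server, and build_and_name is a helper invented for this example.

```cpp
#include <memory>
#include <string>

// Hypothetical miniature of the Item hierarchy: operands are omitted and each
// node only reports its operator name.
struct Item_bool_func {
  virtual ~Item_bool_func() = default;
  virtual std::string func_name() const = 0;
};
struct Item_func_eq : Item_bool_func {
  std::string func_name() const override { return "="; }
};
struct Item_func_equal : Item_bool_func {
  std::string func_name() const override { return "<=>"; }
};

// Each creator knows which node type to build for its operator.
struct Linear_comp_creator {
  virtual ~Linear_comp_creator() = default;
  virtual std::unique_ptr<Item_bool_func> create_scalar_predicate() const = 0;
};
struct Eq_creator : Linear_comp_creator {
  std::unique_ptr<Item_bool_func> create_scalar_predicate() const override {
    return std::make_unique<Item_func_eq>();
  }
};
struct Equal_creator : Linear_comp_creator {
  std::unique_ptr<Item_bool_func> create_scalar_predicate() const override {
    return std::make_unique<Item_func_equal>();
  }
};

// Callers only see the base class, so they need not know which operator it is.
std::string build_and_name(const Linear_comp_creator &creator) {
  return creator.create_scalar_predicate()->func_name();
}
```

Passing an Equal_creator to build_and_name yields "<=>" and passing an Eq_creator yields "=", which mirrors how the same call site can produce different comparison items depending on the creator it was given.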

$ grep -rn "Item_func_equal" ./sql
./sql/sql_help.cc:687:      Item *cond_topic_by_cat = new Item_func_equal(
./sql/sql_help.cc:689:      Item *cond_cat_by_cat = new Item_func_equal(
./sql/item_cmpfunc.cc:289:  return new Item_func_equal(a, b);
./sql/item_cmpfunc.cc:2493:bool Item_func_equal::resolve_type(THD *thd) {
./sql/item_cmpfunc.cc:2500:longlong Item_func_equal::val_int() {
./sql/item_cmpfunc.cc:2535:float Item_func_equal::get_filtering_effect(THD *, table_map filter_for_table,
./sql/item_cmpfunc.h:1075:class Item_func_equal final : public Item_func_comparison {
./sql/item_cmpfunc.h:1077:  Item_func_equal(Item *a, Item *b) : Item_func_comparison(a, b) {
./sql/item_cmpfunc.h:1080:  Item_func_equal(const POS &pos, Item *a, Item *b)

The Item_func_equal class is defined at ./sql/item_cmpfunc.h:1075, so let's look at it right away.

/**
  The <=> operator evaluates the same as

    a IS NULL || b IS NULL ? a IS NULL == b IS NULL : a = b

  a <=> b is equivalent to the standard operation a IS NOT DISTINCT FROM b.

  Notice that the result is TRUE or FALSE, and never UNKNOWN.
*/
class Item_func_equal final : public Item_func_comparison {
 public:
  Item_func_equal(Item *a, Item *b) : Item_func_comparison(a, b) {
    null_on_null = false;
  }
  Item_func_equal(const POS &pos, Item *a, Item *b)
      : Item_func_comparison(pos, a, b) {
    null_on_null = false;
  }
  longlong val_int() override;
  bool resolve_type(THD *thd) override;
  enum Functype functype() const override { return EQUAL_FUNC; }
  enum Functype rev_functype() const override { return EQUAL_FUNC; }
  cond_result eq_cmp_result() const override { return COND_TRUE; }
  const char *func_name() const override { return "<=>"; }
  Item *truth_transformer(THD *, Bool_test) override { return nullptr; }

  float get_filtering_effect(THD *thd, table_map filter_for_table,
                             table_map read_tables,
                             const MY_BITMAP *fields_to_ignore,
                             double rows_in_table) override;
};

It feels like we're getting very close to the heart of the matter.

There's a Javadoc-style comment above the class, so reading it roughly for the gist:

The <=> operator behaves the same as

a IS NULL || b IS NULL ? a IS NULL == b IS NULL : a = b

a <=> b works the same as the standard syntax a IS NOT DISTINCT FROM b.

Note that the value returned is always either TRUE or FALSE, and never UNKNOWN.

That's what it says. (I didn't know about the IS NOT DISTINCT FROM syntax... *3)
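The formula in that comment translates almost word for word into C++. The sketch below is my own model of the documented behavior for nullable integers, with std::optional standing in for NULL; it is not the server's val_int() implementation, which this article does not reach, and null_safe_equal is a hypothetical name.

```cpp
#include <optional>

// Models `a <=> b` for nullable integers, following the comment above
// Item_func_equal:
//   a IS NULL || b IS NULL ? a IS NULL == b IS NULL : a = b
bool null_safe_equal(const std::optional<long long> &a,
                     const std::optional<long long> &b) {
  const bool a_is_null = !a.has_value();
  const bool b_is_null = !b.has_value();
  if (a_is_null || b_is_null) return a_is_null == b_is_null;
  return *a == *b;
}
```

The return type is a plain bool rather than a nullable one, which reflects the comment's point that the result is never UNKNOWN. The three cases from the help text found by the first grep (1 <=> 1, NULL <=> NULL, 1 <=> NULL) come out as true, true, and false.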

...

Out of time.

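Sorry: finding the Item_func_equal class was all I could manage, and in the end I didn't make it to the actual implementation of the spaceship operator. I'd put the defeat down to my lack of code-reading experience, and to knowing so little C++ syntax that I pushed ahead largely on guesswork.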

Next time I'll study a bit more and work toward writing an article that digs deeper.

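Understanding MySQL Internals, published by O'Reilly, seems to be exactly what I need right now: apparently it deepens your understanding of MySQL while reading through the source code. I'm going to have it bought through the company's book purchase support program and read it!!!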

Summary

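Unfortunately I couldn't reach a conclusion, but this code-reading session made me realize that even with a hugely popular OSS project used all over the world, you can get reasonably close to the information you want just by downloading the source and grepping for keywords. I'd call that a big win.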

Even with tools you use every day without a second thought, looking at them from the angle of "wait, how is this built?" may deepen your understanding, or even turn up a new bug.

See you next time in "Reading the MySQL source: I completely understand the spaceship operator".

Style Edge LABO is looking for people to work with us.
If you're interested, please take a look at our recruiting site below!

recruit.styleedge-labo.co.jp

*1: Formally, the "NULL-safe equal" operator (NULL安全等価演算子).
dev.mysql.com

*2: A senior colleague told me. It's apparently read as 「コ(ウ)アレス」 (ko(u)aresu).

*3: Apparently IS DISTINCT FROM was specified in SQL:1999 and IS NOT DISTINCT FROM in SQL:2003.
modern-sql.com

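Style Edge Tech Blog (スタイル・エッジ技術ブログ)

A blog by the engineers of Style Edge, which provides client-acquisition support and consulting for licensed professionals.

Reading the MySQL source: Nice to meet you, code-reading edition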
