前端與編譯原理：用 JS 寫一個 JS 解釋器

Earlycl6i7o9ek 2019-02-24

展開全文

（給前端大全加星標(biāo)，提升前端技能）

作者：jrainlau
https://segmentfault.com/a/1190000017241258

說起編譯原理，印象往往只停留在本科時那些枯燥的課程和晦澀的概念。作為前端開發(fā)者，編譯原理似乎離我們很遠(yuǎn)，對它的理解很可能僅僅局限于“抽象語法樹（AST）”。但這僅僅是個開頭而已。編譯原理的使用，甚至能讓我們利用JS直接寫一個能運行JS代碼的解釋器。

項目地址：https://github.com/jrainlau/canjs
在線體驗：https:///jrainlau/pen/YRgQXo

一、為什么要用JS寫JS的解釋器

接觸過小程序開發(fā)的同學(xué)應(yīng)該知道，小程序運行的環(huán)境禁止 newFunction， eval等方法的使用，導(dǎo)致我們無法直接執(zhí)行字符串形式的動態(tài)代碼。此外，許多平臺也對這些JS自帶的可執(zhí)行動態(tài)代碼的方法進(jìn)行了限制，那么我們是沒有任何辦法了嗎？既然如此，我們便可以用JS寫一個解析器，讓JS自己去運行自己。

在開始之前，我們先簡單回顧一下編譯原理的一些概念。

二、什么是編譯器

說到編譯原理，肯定離不開編譯器。簡單來說，當(dāng)一段代碼經(jīng)過編譯器的詞法分析、語法分析等階段之后，會生成一個樹狀結(jié)構(gòu)的“抽象語法樹（AST）”，該語法樹的每一個節(jié)點都對應(yīng)著代碼當(dāng)中不同含義的片段。

比如有這么一段代碼：

const a = 1
console.log(a)

經(jīng)過編譯器處理后，它的AST長這樣：

{
  'type': 'Program',
  'start': 0,
  'end': 26,
  'body': [
    {
      'type': 'VariableDeclaration',
      'start': 0,
      'end': 11,
      'declarations': [
        {
          'type': 'VariableDeclarator',
          'start': 6,
          'end': 11,
          'id': {
            'type': 'Identifier',
            'start': 6,
            'end': 7,
            'name': 'a'
          },
          'init': {
            'type': 'Literal',
            'start': 10,
            'end': 11,
            'value': 1,
            'raw': '1'
          }
        }
      ],
      'kind': 'const'
    },
    {
      'type': 'ExpressionStatement',
      'start': 12,
      'end': 26,
      'expression': {
        'type': 'CallExpression',
        'start': 12,
        'end': 26,
        'callee': {
          'type': 'MemberExpression',
          'start': 12,
          'end': 23,
          'object': {
            'type': 'Identifier',
            'start': 12,
            'end': 19,
            'name': 'console'
          },
          'property': {
            'type': 'Identifier',
            'start': 20,
            'end': 23,
            'name': 'log'
          },
          'computed': false
        },
        'arguments': [
          {
            'type': 'Identifier',
            'start': 24,
            'end': 25,
            'name': 'a'
          }
        ]
      }
    }
  ],
  'sourceType': 'module'
}

常見的JS編譯器有 babylon， acorn等等，感興趣的同學(xué)可以在 AST explorer 這個網(wǎng)站自行體驗。

可以看到，編譯出來的AST詳細(xì)記錄了代碼中所有語義代碼的類型、起始位置等信息。這段代碼除了根節(jié)點 Program外，主體包含了兩個節(jié)點 VariableDeclaration和 ExpressionStatement，而這些節(jié)點里面又包含了不同的子節(jié)點。

正是由于AST詳細(xì)記錄了代碼的語義化信息，所以Babel，Webpack，Sass，Less等工具可以針對代碼進(jìn)行非常智能的處理。

三、什么是解釋器

如同翻譯人員不僅能看懂一門外語，也能對其藝術(shù)加工后把它翻譯成母語一樣，人們把能夠?qū)⒋a轉(zhuǎn)化成AST的工具叫做“編譯器”，而把能夠?qū)ST翻譯成目標(biāo)語言并運行的工具叫做“解釋器”。

在編譯原理的課程中，我們思考過這么一個問題：如何讓計算機(jī)運行算數(shù)表達(dá)式 1 2 3：

1 2 3

當(dāng)機(jī)器執(zhí)行的時候，它可能會是這樣的機(jī)器碼：

1 PUSH 1
2 PUSH 2
3 ADD
4 PUSH 3
5 ADD

而運行這段機(jī)器碼的程序，就是解釋器。

在這篇文章中，我們不會搞出機(jī)器碼這樣復(fù)雜的東西，僅僅是使用JS在其runtime環(huán)境下去解釋JS代碼的AST。由于解釋器使用JS編寫，所以我們可以大膽使用JS自身的語言特性，比如this綁定、new關(guān)鍵字等等，完全不需要對它們進(jìn)行額外處理，也因此讓JS解釋器的實現(xiàn)變得非常簡單。

在回顧了編譯原理的基本概念之后，我們就可以著手進(jìn)行開發(fā)了。

四、節(jié)點遍歷器

通過分析上文的AST，可以看到每一個節(jié)點都會有一個類型屬性 type，不同類型的節(jié)點需要不同的處理方式，處理這些節(jié)點的程序，就是“節(jié)點處理器 nodeHandler”。

定義一個節(jié)點處理器：

const nodeHandler = {
Program () {},
VariableDeclaration () {},
ExpressionStatement () {},
MemberExpression () {},
CallExpression () {},
Identifier () {}
}

關(guān)于節(jié)點處理器的具體實現(xiàn)，會在后文進(jìn)行詳細(xì)探討，這里暫時不作展開。

有了節(jié)點處理器，我們便需要去遍歷AST當(dāng)中的每一個節(jié)點，遞歸地調(diào)用節(jié)點處理器，直到完成對整棵語法書的處理。

定義一個節(jié)點遍歷器 NodeIterator：

class NodeIterator {
  constructor (node) {
    this.node = node
    this.nodeHandler = nodeHandler
  }

  traverse (node) {
    // 根據(jù)節(jié)點類型找到節(jié)點處理器當(dāng)中對應(yīng)的函數(shù)
    const _eval = this.nodeHandler[node.type]
    // 若找不到則報錯
    if (!_eval) {
      throw new Error(`canjs: Unknown node type '${node.type}'.`)
    }
    // 運行處理函數(shù)
    return _eval(node)
  }

}

理論上，節(jié)點遍歷器這樣設(shè)計就可以了，但仔細(xì)推敲，發(fā)現(xiàn)漏了一個很重要的東西——作用域處理。

回到節(jié)點處理器的 VariableDeclaration()方法，它用來處理諸如 consta=1這樣的變量聲明節(jié)點。假設(shè)它的代碼如下：

VariableDeclaration (node) {
for (const declaration of node.declarations) {
const { name } = declaration.id
const value = declaration.init ? traverse(declaration.init) : undefined
// 問題來了，拿到了變量的名稱和值，然后把它保存到哪里去呢？
// ...
}
},

問題在于，處理完變量聲明節(jié)點以后，理應(yīng)把這個變量保存起來。按照J(rèn)S語言特性，這個變量應(yīng)該存放在一個作用域當(dāng)中。在JS解析器的實現(xiàn)過程中，這個作用域可以被定義為一個 scope對象。

改寫節(jié)點遍歷器，為其新增一個 scope對象：

class NodeIterator {
  constructor (node, scope = {}) {
    this.node = node
    this.scope = scope
    this.nodeHandler = nodeHandler
  }

  traverse (node, options = {}) {
    const scope = options.scope || this.scope
    const nodeIterator = new NodeIterator(node, scope)
    const _eval = this.nodeHandler[node.type]
    if (!_eval) {
      throw new Error(`canjs: Unknown node type '${node.type}'.`)
    }
    return _eval(nodeIterator)
  }

  createScope (blockType = 'block') {
    return new Scope(blockType, this.scope)
  }
}

然后節(jié)點處理函數(shù) VariableDeclaration()就可以通過 scope保存變量了：

VariableDeclaration (nodeIterator) {
const kind = nodeIterator.node.kind
for (const declaration of nodeIterator.node.declarations) {
const { name } = declaration.id
const value = declaration.init ? nodeIterator.traverse(declaration.init) : undefined
// 在作用域當(dāng)中定義變量
// 如果當(dāng)前是塊級作用域且變量用var定義，則定義到父級作用域
if (nodeIterator.scope.type === 'block' && kind === 'var') {
nodeIterator.scope.parentScope.declare(name, value, kind)
} else {
nodeIterator.scope.declare(name, value, kind)
}
}
},

關(guān)于作用域的處理，可以說是整個JS解釋器最難的部分。接下來我們將對作用域處理進(jìn)行深入的剖析。

五、作用域處理

考慮到這樣一種情況：

const a = 1
{
  const b = 2
  console.log(a)
}
console.log(b)

運行結(jié)果必然是能夠打印出 a的值，然后報錯： UncaughtReferenceError:bisnotdefined。

這段代碼就是涉及到了作用域的問題。塊級作用域或者函數(shù)作用域可以讀取其父級作用域當(dāng)中的變量，反之則不行，所以對于作用域我們不能簡單地定義一個空對象，而是要專門進(jìn)行處理。

定義一個作用域基類 Scope：

class Scope {
constructor (type, parentScope) {
// 作用域類型，區(qū)分函數(shù)作用域function和塊級作用域block
this.type = type
// 父級作用域
this.parentScope = parentScope
// 全局作用域
this.globalDeclaration = standardMap
// 當(dāng)前作用域的變量空間
this.declaration = Object.create(null)
}
/*
* get/set方法用于獲取/設(shè)置當(dāng)前作用域中對應(yīng)name的變量值
符合JS語法規(guī)則，優(yōu)先從當(dāng)前作用域去找，若找不到則到父級作用域去找，然后到全局作用域找。
如果都沒有，就報錯
*/
get (name) {
if (this.declaration[name]) {
return this.declaration[name]
} else if (this.parentScope) {
return this.parentScope.get(name)
} else if (this.globalDeclaration[name]) {
return this.globalDeclaration[name]
}
throw new ReferenceError(`${name} is not defined`)
}
set (name, value) {
if (this.declaration[name]) {
this.declaration[name] = value
} else if (this.parentScope[name]) {
this.parentScope.set(name, value)
} else {
throw new ReferenceError(`${name} is not defined`)
}
}
/**
* 根據(jù)變量的kind調(diào)用不同的變量定義方法
*/
declare (name, value, kind = 'var') {
if (kind === 'var') {
return this.varDeclare(name, value)
} else if (kind === 'let') {
return this.letDeclare(name, value)
} else if (kind === 'const') {
return this.constDeclare(name, value)
} else {
throw new Error(`canjs: Invalid Variable Declaration Kind of '${kind}'`)
}
}
varDeclare (name, value) {
let scope = this
// 若當(dāng)前作用域存在非函數(shù)類型的父級作用域時，就把變量定義到父級作用域
while (scope.parentScope && scope.type !== 'function') {
scope = scope.parentScope
}
this.declaration[name] = new SimpleValue(value, 'var')
return this.declaration[name]
}
letDeclare (name, value) {
// 不允許重復(fù)定義
if (this.declaration[name]) {
throw new SyntaxError(`Identifier ${name} has already been declared`)
}
this.declaration[name] = new SimpleValue(value, 'let')
return this.declaration[name]
}
constDeclare (name, value) {
// 不允許重復(fù)定義
if (this.declaration[name]) {
throw new SyntaxError(`Identifier ${name} has already been declared`)
}
this.declaration[name] = new SimpleValue(value, 'const')
return this.declaration[name]
}
}

這里使用了一個叫做 simpleValue()的函數(shù)來定義變量值，主要用于處理常量：

class SimpleValue {
  constructor (value, kind = '') {
    this.value = value
    this.kind = kind
  }

  set (value) {
    // 禁止重新對const類型變量賦值
    if (this.kind === 'const') {
      throw new TypeError('Assignment to constant variable')
    } else {
      this.value = value
    }
  }

  get () {
    return this.value
  }
}

處理作用域問題思路，關(guān)鍵的地方就是在于JS語言本身尋找變量的特性——優(yōu)先當(dāng)前作用域，父作用域次之，全局作用域最后。反過來，在節(jié)點處理函數(shù) VariableDeclaration()里，如果遇到塊級作用域且關(guān)鍵字為 var，則需要把這個變量也定義到父級作用域當(dāng)中，這也就是我們常說的“全局變量污染”。

JS標(biāo)準(zhǔn)庫注入

細(xì)心的讀者會發(fā)現(xiàn)，在定義 Scope基類的時候，其全局作用域 globalScope被賦值了一個 standardMap對象，這個對象就是JS標(biāo)準(zhǔn)庫。

簡單來說，JS標(biāo)準(zhǔn)庫就是JS這門語言本身所帶有的一系列方法和屬性，如常用的 setTimeout， console.log等等。為了讓解析器也能夠執(zhí)行這些方法，所以我們需要為其注入標(biāo)準(zhǔn)庫：

const standardMap = {
console: new SimpleValue(console)
}

這樣就相當(dāng)于往解析器的全局作用域當(dāng)中注入了 console這個對象，也就可以直接被使用了。

六、節(jié)點處理器

在處理完節(jié)點遍歷器、作用域處理的工作之后，便可以來編寫節(jié)點處理器了。顧名思義，節(jié)點處理器是專門用來處理AST節(jié)點的，上文反復(fù)提及的 VariableDeclaration()方法便是其中一個。下面將對部分關(guān)鍵的節(jié)點處理器進(jìn)行講解。

在開發(fā)節(jié)點處理器之前，需要用到一個工具，用于判斷JS語句當(dāng)中的 return， break， continue關(guān)鍵字。

關(guān)鍵字判斷工具 `Signal`

定義一個 Signal基類：

class Signal {
  constructor (type, value) {
    this.type = type
    this.value = value
  }

  static Return (value) {
    return new Signal('return', value)
  }

  static Break (label = null) {
    return new Signal('break', label)
  }

  static Continue (label) {
    return new Signal('continue', label)
  }

  static isReturn(signal) {
    return signal instanceof Signal && signal.type === 'return'
  }

  static isContinue(signal) {
    return signal instanceof Signal && signal.type === 'continue'
  }

  static isBreak(signal) {
    return signal instanceof Signal && signal.type === 'break'
  }

  static isSignal (signal) {
    return signal instanceof Signal
  }
}

有了它，就可以對語句當(dāng)中的關(guān)鍵字進(jìn)行判斷處理，接下來會有大用處。

1、變量定義節(jié)點處理器—— `VariableDeclaration()`

最常用的節(jié)點處理器之一，負(fù)責(zé)把變量注冊到正確的作用域。

VariableDeclaration (nodeIterator) {
const kind = nodeIterator.node.kind
for (const declaration of nodeIterator.node.declarations) {
const { name } = declaration.id
const value = declaration.init ? nodeIterator.traverse(declaration.init) : undefined
// 在作用域當(dāng)中定義變量
// 若為塊級作用域且關(guān)鍵字為var，則需要做全局污染
if (nodeIterator.scope.type === 'block' && kind === 'var') {
nodeIterator.scope.parentScope.declare(name, value, kind)
} else {
nodeIterator.scope.declare(name, value, kind)
}
}
},

2、標(biāo)識符節(jié)點處理器—— `Identifier()`

專門用于從作用域中獲取標(biāo)識符的值。

  Identifier (nodeIterator) {
    if (nodeIterator.node.name === 'undefined') {
      return undefined
    }
    return nodeIterator.scope.get(nodeIterator.node.name).value
  },

3、字符節(jié)點處理器—— `Literal()`

返回字符節(jié)點的值。

Literal (nodeIterator) {
return nodeIterator.node.value
}

4、表達(dá)式調(diào)用節(jié)點處理器—— `CallExpression()`

用于處理表達(dá)式調(diào)用節(jié)點的處理器，如處理 func()， console.log()等。

  CallExpression (nodeIterator) {
    // 遍歷callee獲取函數(shù)體
    const func = nodeIterator.traverse(nodeIterator.node.callee)
    // 獲取參數(shù)
    const args = nodeIterator.node.arguments.map(arg => nodeIterator.traverse(arg))

    let value
    if (nodeIterator.node.callee.type === 'MemberExpression') {
      value = nodeIterator.traverse(nodeIterator.node.callee.object)
    }
    // 返回函數(shù)運行結(jié)果
    return func.apply(value, args)
  },

5、表達(dá)式節(jié)點處理器—— `MemberExpression()`

區(qū)分于上面的“表達(dá)式調(diào)用節(jié)點處理器”，表達(dá)式節(jié)點指的是 person.say， console.log這種函數(shù)表達(dá)式。

MemberExpression (nodeIterator) {
// 獲取對象，如console
const obj = nodeIterator.traverse(nodeIterator.node.object)
// 獲取對象的方法，如log
const name = nodeIterator.node.property.name
// 返回表達(dá)式，如console.log
return obj[name]
}

6、塊級聲明節(jié)點處理器—— `BlockStatement()`

非常常用的處理器，專門用于處理塊級聲明節(jié)點，如函數(shù)、循環(huán)、 try...catch...當(dāng)中的情景。

  BlockStatement (nodeIterator) {
    // 先定義一個塊級作用域
    let scope = nodeIterator.createScope('block')

    // 處理塊級節(jié)點內(nèi)的每一個節(jié)點
    for (const node of nodeIterator.node.body) {
      if (node.type === 'VariableDeclaration' && node.kind === 'var') {
        for (const declaration of node.declarations) {
          scope.declare(declaration.id.name, declaration.init.value, node.kind)
        }
      } else if (node.type === 'FunctionDeclaration') {
        nodeIterator.traverse(node, { scope })
      }
    }

    // 提取關(guān)鍵字（return, break, continue）
    for (const node of nodeIterator.node.body) {
      if (node.type === 'FunctionDeclaration') {
        continue
      }
      const signal = nodeIterator.traverse(node, { scope })
      if (Signal.isSignal(signal)) {
        return signal
      }
    }
  }

可以看到這個處理器里面有兩個 for...of循環(huán)。第一個用于處理塊級內(nèi)語句，第二個專門用于識別關(guān)鍵字，如循環(huán)體內(nèi)部的 break， continue或者函數(shù)體內(nèi)部的 return。

7、函數(shù)定義節(jié)點處理器—— `FunctionDeclaration()`

往作用當(dāng)中聲明一個和函數(shù)名相同的變量，值為所定義的函數(shù)：

FunctionDeclaration (nodeIterator) {
const fn = NodeHandler.FunctionExpression(nodeIterator)
nodeIterator.scope.varDeclare(nodeIterator.node.id.name, fn)
return fn
}

8、函數(shù)表達(dá)式節(jié)點處理器—— `FunctionExpression()`

用于定義一個函數(shù)：

  FunctionExpression (nodeIterator) {
    const node = nodeIterator.node
    /**
     * 1、定義函數(shù)需要先為其定義一個函數(shù)作用域，且允許繼承父級作用域
     * 2、注冊`this`, `arguments`和形參到作用域的變量空間
     * 3、檢查return關(guān)鍵字
     * 4、定義函數(shù)名和長度
     */
    const fn = function () {
      const scope = nodeIterator.createScope('function')
      scope.constDeclare('this', this)
      scope.constDeclare('arguments', arguments)

      node.params.forEach((param, index) => {
        const name = param.name
        scope.varDeclare(name, arguments[index])
      })

      const signal = nodeIterator.traverse(node.body, { scope })
      if (Signal.isReturn(signal)) {
        return signal.value
      }
    }

    Object.defineProperties(fn, {
      name: { value: node.id ? node.id.name : '' },
      length: { value: node.params.length }
    })

    return fn
  }

9、this表達(dá)式處理器—— `ThisExpression()`

該處理器直接使用JS語言自身的特性，把 this關(guān)鍵字從作用域中取出即可。

ThisExpression (nodeIterator) {
const value = nodeIterator.scope.get('this')
return value ? value.value : null
}

10、new表達(dá)式處理器—— `NewExpression()`

和 this表達(dá)式類似，也是直接沿用JS的語言特性，獲取函數(shù)和參數(shù)之后，通過 bind關(guān)鍵字生成一個構(gòu)造函數(shù)，并返回。

  NewExpression (nodeIterator) {
    const func = nodeIterator.traverse(nodeIterator.node.callee)
    const args = nodeIterator.node.arguments.map(arg => nodeIterator.traverse(arg))
    return new (func.bind(null, ...args))
  }

11、For循環(huán)節(jié)點處理器—— `ForStatement()`

For循環(huán)的三個參數(shù)對應(yīng)著節(jié)點的 init， test， update屬性，對著三個屬性分別調(diào)用節(jié)點處理器處理，并放回JS原生的for循環(huán)當(dāng)中即可。

ForStatement (nodeIterator) {
const node = nodeIterator.node
let scope = nodeIterator.scope
if (node.init && node.init.type === 'VariableDeclaration' && node.init.kind !== 'var') {
scope = nodeIterator.createScope('block')
}
for (
node.init && nodeIterator.traverse(node.init, { scope });
node.test ? nodeIterator.traverse(node.test, { scope }) : true;
node.update && nodeIterator.traverse(node.update, { scope })
) {
const signal = nodeIterator.traverse(node.body, { scope })
if (Signal.isBreak(signal)) {
break
} else if (Signal.isContinue(signal)) {
continue
} else if (Signal.isReturn(signal)) {
return signal
}
}
}

同理， for...in， while和 do...while循環(huán)也是類似的處理方式，這里不再贅述。

12、If聲明節(jié)點處理器—— `IfStatemtnt()`

處理If語句，包括 if， if...else， if...elseif...else。

  IfStatement (nodeIterator) {
    if (nodeIterator.traverse(nodeIterator.node.test)) {
      return nodeIterator.traverse(nodeIterator.node.consequent)
    } else if (nodeIterator.node.alternate) {
      return nodeIterator.traverse(nodeIterator.node.alternate)
    }
  }

同理， switch語句、三目表達(dá)式也是類似的處理方式。

上面列出了幾個比較重要的節(jié)點處理器，在es5當(dāng)中還有很多節(jié)點需要處理，詳細(xì)內(nèi)容可以訪 https://github.com/jrainlau/canjs/blob/master/src/es_versions/es5.js 一探究竟。

七、定義調(diào)用方式

經(jīng)過了上面的所有步驟，解析器已經(jīng)具備處理es5代碼的能力，接下來就是對這些散裝的內(nèi)容進(jìn)行組裝，最終定義一個方便用戶調(diào)用的辦法。

const { Parser } = require('acorn')
const NodeIterator = require('./iterator')
const Scope = require('./scope')
class Canjs {
constructor (code = '', extraDeclaration = {}) {
this.code = code
this.extraDeclaration = extraDeclaration
this.ast = Parser.parse(code)
this.nodeIterator = null
this.init()
}
init () {
// 定義全局作用域，該作用域類型為函數(shù)作用域
const globalScope = new Scope('function')
// 根據(jù)入?yún)⒍x標(biāo)準(zhǔn)庫之外的全局變量
Object.keys(this.extraDeclaration).forEach((key) => {
globalScope.addDeclaration(key, this.extraDeclaration[key])
})
this.nodeIterator = new NodeIterator(null, globalScope)
}
run () {
return this.nodeIterator.traverse(this.ast)
}
}

這里我們定義了一個名為 Canjs的基類，接受字符串形式的JS代碼，同時可定義標(biāo)準(zhǔn)庫之外的變量。當(dāng)運行 run()方法的時候就可以得到運行結(jié)果。

八、后續(xù)

至此，整個JS解析器已經(jīng)完成，可以很好地運行ES5的代碼（可能還有bug沒有發(fā)現(xiàn)）。但是在當(dāng)前的實現(xiàn)中，所有的運行結(jié)果都是放在一個類似沙盒的地方，無法對外界產(chǎn)生影響。如果要把運行結(jié)果取出來，可能的辦法有兩種。第一種是傳入一個全局的變量，把影響作用在這個全局變量當(dāng)中，借助它把結(jié)果帶出來；另外一種則是讓解析器支持 export語法，能夠把 export語句聲明的結(jié)果返回，感興趣的讀者可以自行研究。

最后，這個JS解析器已經(jīng)在我的Github上開源，歡迎前來交流：https://github.com/jrainlau/canjs