02. Analysis of the Go wasm parser


[TOC]

Parsing

The first step is to open the file passed on the command line:

f, err := os.Open(flag.Arg(0))

Then the following function is called to parse it:

m, err := wasm.ReadModule(f, importer)

The importer argument resolves the functions that the given wasm file imports. ReadModule is responsible for parsing the file:

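func ReadModule(r io.Reader, resolvePath ResolveFunc) (*Module, error) {
    // Excerpt: the setup of reader (which wraps r) and of m (the *Module
    // being built) is omitted here.
    magic, err := readU32(reader)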
    if err != nil {
        return nil, err
    }
    if magic != Magic {
        return nil, ErrInvalidMagic
    }
    if m.Version, err = readU32(reader); err != nil {
        return nil, err
    }
    for {
        done, err := m.readSection(reader)
        if err != nil {
            return nil, err
        } else if done {
            break
        }
    }
    m.LinearMemoryIndexSpace = make([][]byte, 1)
    if m.Table != nil {
        m.TableIndexSpace = make([][]uint32, int(len(m.Table.Entries)))
    }
    if m.Import != nil && resolvePath != nil {
        err := m.resolveImports(resolvePath)
        if err != nil {
            return nil, err
        }
    }
    // ... the rest of the function is omitted in this excerpt
}

It first checks the magic number and the version number, both of which are fixed:

const (
    Magic uint32 = 0x6d736100
    Version uint32 = 0x1
)

It then parses each section:

Type* — Function signature declarations
Import — Import declarations
Function* — Function declarations
Table — Indirect function table and other tables
Memory — Memory attributes
Global — Global declarations
Export — Exports
Start — Start function declaration
Element — Elements section
Code* — Function bodies
Data — Data segments

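For the section layout of the file format, see https://rsms.me/wasm-intro

After parsing, two things need extra handling: tables and imports (see https://developer.mozilla.org/zh-CN/docs/WebAssembly/Understanding_the_text_format). A table stores function references and is what makes dynamic calls possible. An import refers to something exported by another wasm module, or to something implemented inside the virtual machine itself.

m.resolveImports calls the importer function defined in main, which opens the corresponding wasm file in the current directory and uses it. The import test later in this article exercises this. It also means that when we detect the env module we can resolve the import ourselves, which is how the blockchain API can be implemented.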
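
One way to resolve env by hand is to wrap the file-based resolver. The sketch below is an assumption about how that could look, not the parser's own code: Module is a bare stand-in type, withHostModules is a hypothetical helper, and ResolveFunc only mirrors the shape of the resolvePath parameter (a module name in, a module out).

```go
package main

import (
	"errors"
	"fmt"
)

// Module is a hypothetical stand-in for a parsed module. It only records
// where the module came from.
type Module struct {
	Name   string
	Origin string // "host" or "file"
}

// ResolveFunc maps an imported module name to a module.
type ResolveFunc func(name string) (*Module, error)

// withHostModules wraps a resolver so that names present in host (for
// example "env") are answered from memory and never reach the fallback.
func withHostModules(host map[string]*Module, fallback ResolveFunc) ResolveFunc {
	return func(name string) (*Module, error) {
		if m, ok := host[name]; ok {
			return m, nil
		}
		if fallback == nil {
			return nil, errors.New("cannot resolve module " + name)
		}
		return fallback(name)
	}
}

func main() {
	host := map[string]*Module{"env": {Name: "env", Origin: "host"}}
	fromFile := func(name string) (*Module, error) {
		// A real resolver would open name+".wasm" and parse it.
		return &Module{Name: name, Origin: "file"}, nil
	}
	resolve := withHostModules(host, fromFile)
	for _, name := range []string{"env", "math"} {
		m, err := resolve(name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s resolved from %s\n", m.Name, m.Origin)
	}
}
```

With this shape the host API lives entirely in Go, and other module names still fall through to the file lookup.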

Execution

Execution runs each entry of the export section in turn:

    for name, e := range m.Export.Entries {
        i := int64(e.Index)
        fidx := m.Function.Types[int(i)]
        ftype := m.Types.Entries[int(fidx)]
        switch len(ftype.ReturnTypes) {
        case 1:
            fmt.Printf("%s() %s => ", name, ftype.ReturnTypes[0])
        case 0:
            fmt.Printf("%s() => ", name)
        default:
            log.Printf("running exported functions with more than one return value is not supported")
            continue
        }
        if len(ftype.ParamTypes) > 0 {
            log.Printf("running exported functions with input parameters is not supported")
            continue
        }
        o, err := vm.ExecCode(i)
        if err != nil {
            fmt.Printf("\n")
            log.Printf("err=%v", err)
        }
        if len(ftype.ReturnTypes) == 0 {
            fmt.Printf("\n")
            continue
        }
        fmt.Printf("%[1]v (%[1]T)\n", o)
    }
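
Two details of this loop are worth a note. If Entries is a map, as the name/value range suggests, Go does not guarantee the iteration order. The run at the end of this article also prints a memory() line, which suggests that exports are not filtered by kind before being executed. The sketch below shows one possible way to get a stable list of function exports only. All types and names in it (exportKind, exportEntry, functionExports) are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// exportKind is a hypothetical enumeration of what an export can refer to.
type exportKind int

const (
	kindFunc exportKind = iota
	kindTable
	kindMemory
	kindGlobal
)

// exportEntry is a hypothetical, reduced export record.
type exportEntry struct {
	Kind  exportKind
	Index uint32
}

// functionExports returns the names of function exports in sorted order,
// so that callers iterate deterministically and skip memories and tables.
func functionExports(entries map[string]exportEntry) []string {
	names := make([]string, 0, len(entries))
	for name, e := range entries {
		if e.Kind == kindFunc {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	entries := map[string]exportEntry{
		"memory": {Kind: kindMemory, Index: 0},
		"main":   {Kind: kindFunc, Index: 1},
	}
	fmt.Println(functionExports(entries)) // prints "[main]"
}
```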

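Execution itself comes down to pushing values onto and popping them off a linear stack. What we care about is how to implement our own API. Ontology (本体) does this by implementing its own memory package to handle the stack.

So how does a function end up being called, step by step? It starts with the parsing stage above, which yields the module's Type, Code, and Data sections, among others. populateFunctions is then called to add each function to the FunctionIndexSpace list: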
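
To make the push and pop behaviour concrete, here is a toy operand stack that executes two real opcodes, i32.const (0x41) and i32.add (0x6a). It is a sketch under simplifying assumptions: the immediate of i32.const is taken as a single raw byte instead of a signed LEB128 value, and the stack and run names are invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	opI32Const byte = 0x41 // i32.const
	opI32Add   byte = 0x6a // i32.add
)

// stack is a minimal operand stack.
type stack []uint64

func (s *stack) push(v uint64) { *s = append(*s, v) }

func (s *stack) pop() (uint64, error) {
	if len(*s) == 0 {
		return 0, errors.New("stack underflow")
	}
	v := (*s)[len(*s)-1]
	*s = (*s)[:len(*s)-1]
	return v, nil
}

// run executes code and returns the value left on top of the stack.
// Simplification: each i32.const immediate is one raw byte.
func run(code []byte) (uint32, error) {
	var s stack
	for pc := 0; pc < len(code); pc++ {
		switch code[pc] {
		case opI32Const:
			pc++
			if pc >= len(code) {
				return 0, errors.New("missing immediate")
			}
			s.push(uint64(code[pc]))
		case opI32Add:
			b, err := s.pop()
			if err != nil {
				return 0, err
			}
			a, err := s.pop()
			if err != nil {
				return 0, err
			}
			s.push(uint64(uint32(a) + uint32(b))) // wraps like 32-bit addition
		default:
			return 0, fmt.Errorf("unsupported opcode 0x%02x", code[pc])
		}
	}
	v, err := s.pop()
	return uint32(v), err
}

func main() {
	result, err := run([]byte{opI32Const, 1, opI32Const, 2, opI32Add})
	fmt.Println(result, err) // prints "3 <nil>"
}
```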

    for codeIndex, typeIndex := range m.Function.Types {
        if int(typeIndex) >= len(m.Types.Entries) {
            return InvalidFunctionIndexError(typeIndex)
        }
        fn := Function{
            Sig: &m.Types.Entries[typeIndex],
            Body: &m.Code.Bodies[codeIndex],
        }
        m.FunctionIndexSpace = append(m.FunctionIndexSpace, fn)
    }
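
In WebAssembly, imported functions occupy the lowest function indices and the functions defined in the module follow them. That is why $main carries the index 1 in the wat listing of the import test, and why the log reports two entries in the function index space for main.wasm. A minimal sketch of that ordering, with a hypothetical Function type and helper name:

```go
package main

import "fmt"

// Function is a hypothetical, reduced function record.
type Function struct {
	Name string
}

// buildFunctionIndexSpace places imported functions first, then the
// functions defined in the module itself.
func buildFunctionIndexSpace(imported, local []Function) []Function {
	space := make([]Function, 0, len(imported)+len(local))
	space = append(space, imported...)
	space = append(space, local...)
	return space
}

func main() {
	space := buildFunctionIndexSpace(
		[]Function{{Name: "env.add"}},
		[]Function{{Name: "main"}},
	)
	for i, fn := range space {
		fmt.Printf("%d: %s\n", i, fn.Name)
	}
}
```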

NewVM is then called to create the virtual machine object and build its function table:

vm.newFuncTable()
    for i, fn := range module.FunctionIndexSpace {
        if fn.IsHost() {
            vm.funcs[i] = goFunction{
                typ: fn.Host.Type(),
                val: fn.Host,
            }
            nNatives++
            continue
        }
        totalLocalVars := 0
        totalLocalVars += len(fn.Sig.ParamTypes)
        for _, entry := range fn.Body.Locals {
            totalLocalVars += int(entry.Count)
        }
        // disassembly comes from disassembling fn; that step is omitted in this excerpt.
        code, table := compile.Compile(disassembly.Code)
        vm.funcs[i] = compiledFunction{
            code: code,
            branchTables: table,
            maxDepth: disassembly.MaxDepth,
            totalLocalVars: totalLocalVars,
            args: len(fn.Sig.ParamTypes),
            returns: len(fn.Sig.ReturnTypes) != 0,
        }
    }

goFunction has a call method. Executing a function means invoking this method, which does some handling of the virtual machine stack.
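
The idea of a common call method can be sketched as an interface with one implementation per kind of function. The code below is a simplified illustration and not the VM's actual types: hostFunction stands in for goFunction, constFunction stands in for a compiled function, and the host callback takes its arguments as a plain slice.

```go
package main

import "fmt"

// VM is a hypothetical virtual machine reduced to its operand stack.
type VM struct {
	stack []uint64
}

func (vm *VM) push(v uint64) { vm.stack = append(vm.stack, v) }

func (vm *VM) pop() uint64 {
	v := vm.stack[len(vm.stack)-1] // panics on an empty stack; acceptable in a sketch
	vm.stack = vm.stack[:len(vm.stack)-1]
	return v
}

// function is what the function table stores: anything that can be called.
type function interface {
	call(vm *VM)
}

// hostFunction is a simplified stand-in for goFunction.
type hostFunction struct {
	args int
	fn   func(args []uint64) uint64
}

// call pops the arguments (the last argument is on top of the stack),
// runs the Go callback, and pushes its result.
func (h hostFunction) call(vm *VM) {
	args := make([]uint64, h.args)
	for i := h.args - 1; i >= 0; i-- {
		args[i] = vm.pop()
	}
	vm.push(h.fn(args))
}

// constFunction stands in for a compiled function that returns a constant.
type constFunction struct {
	value uint64
}

func (c constFunction) call(vm *VM) { vm.push(c.value) }

func main() {
	vm := &VM{}
	funcs := []function{
		hostFunction{args: 2, fn: func(a []uint64) uint64 { return a[0] + a[1] }},
		constFunction{value: 1},
	}
	vm.push(2)
	vm.push(3)
	funcs[0].call(vm) // host function: pops 2 and 3, pushes 5
	funcs[1].call(vm) // compiled stand-in: pushes 1
	fmt.Println(vm.stack) // prints "[5 1]"
}
```

Because the table holds the interface, the caller does not need to know whether index i refers to a host function or to compiled wasm code.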

Import test

First write the module that will be imported. It contains a single add function:

int add()
{
  return 1;
}

Its wat file is:

(module
 (table 0 anyfunc)
 (memory $0 1)
 (export "memory" (memory $0))
 (export "add" (func $add))
 (func $add (; 0 ;) (result i32)
  (i32.const 1)
 )
)

This add function is compiled to env.wasm. Next, write main, which calls add, and compile it to main.wasm:

int add(); /* declared here, defined in env.wasm */

int main()
{
  return add();
}

Its wat file is:

(module
 (type $FUNCSIG$i (func (result i32)))
 (import "env" "add" (func $add (result i32)))
 (table 0 anyfunc)
 (memory $0 1)
 (export "memory" (memory $0))
 (export "main" (func $main))
 (func $main (; 1 ;) (result i32)
  (call $add)
 )
)

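As you can see, main imports the add method from env. wasm uses a two-level namespace, so this import means "take add from the module named env", and env.wasm therefore has to implement add. Now run it: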
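
The two-level lookup can be illustrated as a module name followed by a field name. This is a sketch with invented types (Module, importDesc) and an invented helper (resolveImport). Exports are reduced to Go closures so the example stays self-contained.

```go
package main

import "fmt"

// Module is a hypothetical module reduced to its exported functions.
type Module struct {
	Exports map[string]func() int32
}

// importDesc names an import by module and field, like (import "env" "add").
type importDesc struct {
	Module string
	Field  string
}

// resolveImport looks up the module first, then the field inside it.
func resolveImport(modules map[string]*Module, imp importDesc) (func() int32, error) {
	m, ok := modules[imp.Module]
	if !ok {
		return nil, fmt.Errorf("module %q not found", imp.Module)
	}
	fn, ok := m.Exports[imp.Field]
	if !ok {
		return nil, fmt.Errorf("module %q does not export %q", imp.Module, imp.Field)
	}
	return fn, nil
}

func main() {
	modules := map[string]*Module{
		"env": {Exports: map[string]func() int32{
			"add": func() int32 { return 1 },
		}},
	}
	add, err := resolveImport(modules, importDesc{Module: "env", Field: "add"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(add()) // prints "1"
}
```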

pct@Chandler:~/Downloads$ wasm-run -v main.wasm 
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 5
section.go:142: section type
section.go:222: <nil>
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 11
section.go:149: section import
section.go:310: importing function
section.go:222: <nil>
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 2
section.go:156: section function
section.go:222: <nil>
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 4
section.go:163: section table
section.go:222: <nil>
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 3
section.go:170: section memory
section.go:222: <nil>
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 1
section.go:177: section global
section.go:441: 0 global entries
section.go:222: <nil>
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 17
section.go:184: section export
section.go:222: <nil>
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 10
section.go:205: section code
section.go:627: 1 function bodies
section.go:630: Reading function 0
section.go:688: bodySize: 4, localCount: 0
section.go:691: Read 3 bytes for function body
section.go:222: <nil>
section.go:96: Reading section ID
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 5
section.go:142: section type
section.go:222: <nil>
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 2
section.go:156: section function
section.go:222: <nil>
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 4
section.go:163: section table
section.go:222: <nil>
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 3
section.go:170: section memory
section.go:222: <nil>
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 1
section.go:177: section global
section.go:441: 0 global entries
section.go:222: <nil>
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 16
section.go:184: section export
section.go:222: <nil>
section.go:96: Reading section ID
section.go:105: Reading payload length
section.go:125: Section payload length: 10
section.go:205: section code
section.go:627: 1 function bodies
section.go:630: Reading function 0
section.go:688: bodySize: 4, localCount: 0
section.go:691: Read 3 bytes for function body
section.go:222: <nil>
section.go:96: Reading section ID
index.go:77: There are 0 entries in the global index spaces.
module.go:142: There are 1 entries in the function index space.
index.go:77: There are 0 entries in the global index spaces.
module.go:142: There are 2 entries in the function index space.
memory() i32 => 1 (uint32)
main() i32 => 1 (uint32)
