From 14acb80adcad2500d2acd4465c6531358b171ec2 Mon Sep 17 00:00:00 2001 From: "Steven S. Lyubomirsky" <slyubomirsky@gmail.com> Date: Mon, 24 Dec 2018 14:15:32 -0500 Subject: [PATCH] [Relay][docs] Details on comp. graphs in Relay dev intro (#2324) --- docs/dev/relay_intro.rst | 378 ++++++++++++++++++++------------------- 1 file changed, 190 insertions(+), 188 deletions(-) diff --git a/docs/dev/relay_intro.rst b/docs/dev/relay_intro.rst index dde900a50..2462d0d3e 100644 --- a/docs/dev/relay_intro.rst +++ b/docs/dev/relay_intro.rst @@ -1,188 +1,190 @@ -Introduction to Relay IR -======================== -This article introduces Relay IR -- the second generation of NNVM. -We expect readers from two kinds of background -- those who have a programming language background and deep learning -framework developers who are familiar with the computational graph representation. - -We briefly summarize the design goal here, and will touch upon these points in the later part of the article. - -- Support traditional data flow style programming and transformations. -- Support functional style scoping, let-binding and making it fully featured differentiable language. -- Being able to allow the user to mix the two programming styles. - -Build Computational Graph with Relay ------------------------------------- -Traditional deep learning frameworks use computational graphs as their intermediate representation. -A computational graph (or data-flow graph), is a directed acyclic graph (DAG) that represents the computation. - -.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow.png - :align: center - :scale: 70% - - -You can use Relay to build a computational(dataflow) graph. Specifically, the above code shows how to -construct a simple two-node graph. 
You can find that the syntax of the example is not that different from existing -computational graph IR like NNVMv1, with the only difference in terms of terminology: - -- Existing frameworks usually use graph and subgraph -- Relay uses function e.g. -- ``fn (%x)``, to indicate the graph - -Each data-flow node is a CallNode in Relay. The relay python DSL allows you to construct a data-flow quickly. -One thing we want to highlight in the above code -- is that we explicitly constructed an Add node with -both input point to ``%1``. When a deep learning framework evaluates the above program, it will compute -the nodes in topological order, and ``%1`` will only be computed once. -While this fact is very natural to deep learning framework builders, it is something that might -surprise a PL folk in the first place. If we implement a simple visitor to print out the result and -treat the result as nested Call expression, it becomes ``log(%x) + log(%x)``. - -Such ambiguity is caused by different interpretation of program semantics when there is a shared node in the DAG. -In a normal functional programming IR, nested expressions are treated as expression trees, without considering the -fact that the ``%1`` is actually reused twice in ``%2``. - -Relay IR choose to be mindful of this difference. Usually, deep learning framework users build the computational -graph in this fashion, where a DAG node reuse often occur. As a result, when we print out the Relay program in -the text format, we print one CallNode per line and assign a temporary id ``(%1, %2)`` to each CallNode so each common -node can be referenced in later parts of the program. - -Module: Support Multiple Functions(Graphs) ------------------------------------------- -So far we have introduced how can we build a data flow graph as a function. One might naturally ask -- can we support multiple -functions and enable them to call each other. 
Relay allows grouping multiple functions together in a module, the code below -shows an example of a function calling another function. - -.. code:: - - def @muladd(%x, %y, %z) { - %1 = mul(%x, %y) - %2 = add(%1, %z) - %2 - } - def @myfunc(%x) { - %1 = @muladd(%x, 1, 2) - %2 = @muladd(%1, 2, 3) - %2 - } - -The Module can be viewed as a ``Map<GlobalVar, Function>``. Here GlobalVar is just an id that is used to represent the functions -in the module. ``@muladd`` and ``@myfunc`` are GlobalVars in the above example. When a CallNode is used to call another function, -the corresponding GlobalVar is stored in the op field of the CallNode. It contains a level of indirection -- we need to look up -body of the called function from the module using the corresponding GlobalVar. In this particular case, we could also directly -store the reference to the Function as op in the CallNode. So, why do we need to introduce GlobalVar? The main reason is that -GlobalVar decouples the definition/declaration and enables recursion and delayed declaration of the function. - -.. code :: - - @def myfunc(%x) { - %1 = equal(%x, 1) - if (%1) { - %x - } else { - %2 = sub(%x, 1) - %3 = @myfunc(%2) - %4 = add(%3, %3) - %4 - } - } - -In the above example, ``@myfunc`` recursively calls itself. Using GlobalVar ``@myfunc`` to represent the function avoids -the cyclic dependency in the data structure. -At this point, we have introduced the basic concepts in Relay. Notably, Relay has the following improvements over NNVMv1: - -- Succinct text format that eases debugging of writing passes. -- First-class support for subgraphs-functions, in a joint module, this enables further chance of joint optimizations such as inlining and calling convention specification. -- Naive front-end language interop, for example, all the data structure can be visited in python, which allows quick prototyping of optimizations in python and mixing them with c++ code. 
- - -Let Binding and Scopes ----------------------- - -So far, we have introduced how to build a computational graph in the good old way used in deep learning frameworks. -This section will talk about a new important construct introduced by Relay -- let bindings. - -Let binding is used in every high-level programming languages. In Relay, it is a data structure with three -fields ``Let(var, value, body)``. When we evaluate a let expression, we first evaluate the value part, assign -it to the var, then return the evaluated result in the body expression. - -You can use a sequence of let bindings to construct a logically equivalent program to a data-flow program. -The code example below shows one program with two forms side by side. - -.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow_vs_func.png - :align: center - :scale: 70% - - -The nested let-binding is called A-normal form, and it is commonly used as IRs in functional programming languages. -Now, please take a close look at the AST structure. While the two programs are semantically identical -(so are their textual representations, except that A-normal form has let prefix), their AST structures are different from each other. - -Since program optimizations take these AST data structures and transform them, the two different structure will -affect the compiler code we are going to write. For example, if we want to detect a pattern ``add(log(x), y)``: - -- In the data-flow form, we can first access the add node, then directly look at its first argument to see if it is a log -- In the A-normal form, we cannot directly do the check anymore, because the first input to add is ``%v1`` -- we will need to keep a map from variable to its bound values and lookup that map, in order to know that ``%v1`` is a log. - -Different data structures will impact how you might write transformations, and we need to keep that in mind. 
-So now, as a deep learning framework developer, you might ask, why do we need let-binding. -Your PL friends will always tell you that let is important -- as PL is a quite established field, -there must be some wisdom behind that. - - -Why We Might Need Let Binding ------------------------------ -One key usage of let binding is that it specifies the scope of computation. Let us take look at the following example, -which does not use let binding. - -.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/let_scope.png - :align: center - :scale: 70% - -The problem comes when we try to decide where we should evaluate node ``%1``. In particular, while the text format seems -to suggest that we should evaluate node ``%1`` outside the if scope, the AST(as shown in the picture) does not suggest so. -Actually, a dataflow graph never defines its scope of the evaluation. This introduces some ambiguity in the semantics. - -This ambiguity becomes more interesting when we have closures. Consider the following program, which returns a closure. -We don’t know where should we compute ``%1``. It can either be outside the closure, or inside the closure. - -.. code:: - - fn (%x) { - %1 = log(%x) - %2 = fn(%y) { - add(%y, %1) - } - %2 - } - -Let binding solves this problem, as the computation of the value happens at the let node. In both programs, -if we change ``%1 = log(%x)`` to ``let %v1 = log(%x)``, we clearly specify the computation location to -be outside of the if scope and closure. As you can see let-binding gives a more precise specification of the computation site -and could be useful when we generate backend code(as such specification is in the IR). - -On the other hand, the data-flow form, which does not specify the scope of computation, does have its own advantages --- we don’t need to worry about where to put the let when we generate the code. 
The dataflow form also gives more freedom -to the later passes to decide where to put the evaluation point. As a result, it might not be a bad idea to use data flow -form of the program in the initial phases of optimizations when you find it is convenient. -Many optimizations in Relay today are written to optimize dataflow programs. - -However, when we lower the IR to actual runtime program, we need to be precise about the scope of computation. -In particular, we want to explicitly specify where the scope of computation should happen when we are using -sub-functions and closures. Let-binding can be used to solve this problem in later stage execution specific optimizations. - - -Implication on IR Transformations ---------------------------------- - -Hopefully, by now you are familiar with the two kinds of representations. -Most functional programming languages do their analysis in A-normal form, -where the analyzer does not need to be mindful that the expressions are DAGs. - -Relay choose to support both the data-flow form and let binding. We believe that it is important to let the -framework developer choose the representation they are familiar with. -This does, however, have some implications on how we write passes: - -- If you come from a data-flow background and want to handle let, keep a map of var to the expressions so you can perform lookup when encountering a var. This likely means a minimum change as we already need a map from expr -> transformed expression anyway. Note that this will effectively remove all the let in the program. -- If you come from a PL background and like A-normal form, we will provide a dataflow -> A-normal form pass. -- For PL folks, when you are implementing something (like dataflow->ANF transformation), be mindful that the expression can be DAG, and this usually means that we should visit expressions with a ``Map<Expr, Result>`` and only compute the transformed result once, so the result expression keeps the common structure. 
- 
-There are additional advanced concepts such as symbolic shape inference, polymorphic functions
-that are not covered by this material, you are more than welcomed to look at other materials.
+Introduction to Relay IR
+========================
+This article introduces Relay IR -- the second generation of NNVM.
+We expect readers from two kinds of backgrounds -- those with a programming language background, and deep learning
+framework developers who are familiar with the computational graph representation.
+
+We briefly summarize the design goals here, and will touch upon these points in the later part of the article.
+
+- Support traditional dataflow-style programming and transformations.
+- Support functional-style scoping and let-binding, making it a fully featured differentiable language.
+- Allow the user to mix the two programming styles.
+
+Build a Computational Graph with Relay
+--------------------------------------
+Traditional deep learning frameworks use computational graphs as their intermediate representation.
+A computational graph (or dataflow graph) is a directed acyclic graph (DAG) that represents the computation.
+Though dataflow graphs are limited in terms of the computations they are capable of expressing due to
+lacking control flow, their simplicity makes it easier to implement automatic differentiation and
+compile for heterogeneous execution environments (e.g., executing parts of the graph on specialized hardware).
+
+.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow.png
+   :align: center
+   :scale: 70%
+
+
+You can use Relay to build a computational (dataflow) graph. Specifically, the above code shows how to
+construct a simple two-node graph. 
You can find that the syntax of the example is not that different from existing
+computational graph IRs like NNVMv1, with the only difference in terms of terminology:
+
+- Existing frameworks usually use graph and subgraph
+- Relay uses a function, e.g. ``fn (%x)``, to indicate the graph
+
+Each dataflow node is a CallNode in Relay. The Relay Python DSL allows you to construct a dataflow graph quickly.
+One thing we want to highlight in the above code is that we explicitly constructed an Add node with
+both inputs pointing to ``%1``. When a deep learning framework evaluates the above program, it will compute
+the nodes in topological order, and ``%1`` will only be computed once.
+While this fact is very natural to deep learning framework builders, it is something that might
+surprise a PL researcher at first. If we implement a simple visitor to print out the result and
+treat the result as a nested Call expression, it becomes ``log(%x) + log(%x)``.
+
+Such ambiguity is caused by different interpretations of program semantics when there is a shared node in the DAG.
+In a normal functional programming IR, nested expressions are treated as expression trees, without considering the
+fact that ``%1`` is actually reused twice in ``%2``.
+
+The Relay IR is mindful of this difference. Usually, deep learning framework users build the computational
+graph in this fashion, where DAG node reuse often occurs. As a result, when we print out the Relay program in
+the text format, we print one CallNode per line and assign a temporary id ``(%1, %2)`` to each CallNode so each common
+node can be referenced in later parts of the program.
+
+Module: Support Multiple Functions (Graphs)
+-------------------------------------------
+So far we have introduced how we can build a dataflow graph as a function. One might naturally ask: Can we support multiple
+functions and enable them to call each other? 
Relay allows grouping multiple functions together in a module; the code below
+shows an example of a function calling another function.
+
+.. code::
+
+  def @muladd(%x, %y, %z) {
+    %1 = mul(%x, %y)
+    %2 = add(%1, %z)
+    %2
+  }
+  def @myfunc(%x) {
+    %1 = @muladd(%x, 1, 2)
+    %2 = @muladd(%1, 2, 3)
+    %2
+  }
+
+The Module can be viewed as a ``Map<GlobalVar, Function>``. Here GlobalVar is just an id that is used to represent the functions
+in the module. ``@muladd`` and ``@myfunc`` are GlobalVars in the above example. When a CallNode is used to call another function,
+the corresponding GlobalVar is stored in the op field of the CallNode. This introduces a level of indirection -- we need to look up the
+body of the called function from the module using the corresponding GlobalVar. In this particular case, we could also directly
+store the reference to the Function as the op in the CallNode. So, why do we need to introduce GlobalVar? The main reason is that
+GlobalVar decouples the definition from the declaration and enables recursion and delayed declaration of the function.
+
+.. code::
+
+  def @myfunc(%x) {
+    %1 = equal(%x, 1)
+    if (%1) {
+      %x
+    } else {
+      %2 = sub(%x, 1)
+      %3 = @myfunc(%2)
+      %4 = add(%3, %3)
+      %4
+    }
+  }
+
+In the above example, ``@myfunc`` recursively calls itself. Using the GlobalVar ``@myfunc`` to represent the function avoids
+a cyclic dependency in the data structure.
+At this point, we have introduced the basic concepts in Relay. Notably, Relay has the following improvements over NNVMv1:
+
+- A succinct text format that eases debugging and the writing of passes.
+- First-class support for subgraph functions in a joint module; this enables further chances for joint optimizations such as inlining and calling-convention specification.
+- Naive front-end language interop; for example, all the data structures can be visited in Python, which allows quick prototyping of optimizations in Python and mixing them with C++ code. 
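The module-level indirection described above can be sketched in plain Python. This is a toy model, not the real TVM API: functions are ordinary Python callables and GlobalVars are strings, but it shows why resolving the recursive call of ``@myfunc`` through the module by name avoids a cyclic reference in the data structure.

```python
# Toy model of a Module as a Map<GlobalVar, Function> (illustrative only,
# NOT the actual TVM/Relay classes).

def make_module():
    module = {}  # the "Module": maps a GlobalVar-like name to a function

    def myfunc(x):
        # Mirrors the recursive @myfunc example above: the recursive call
        # goes through the module by name rather than holding a direct
        # reference, so the function can refer to itself.
        if x == 1:                        # %1 = equal(%x, 1)
            return x
        y = module["@myfunc"](x - 1)      # %3 = @myfunc(%2)
        return y + y                      # %4 = add(%3, %3)

    module["@myfunc"] = myfunc
    return module

module = make_module()
print(module["@myfunc"](4))  # prints 8 (1 -> 2 -> 4 -> 8)
```

Because the call site stores only the name, ``@myfunc`` can even be called before its definition is complete, which is the "delayed declaration" benefit mentioned above.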
+
+
+Let Binding and Scopes
+----------------------
+
+So far, we have introduced how to build a computational graph in the good old way used in deep learning frameworks.
+This section will talk about an important new construct introduced by Relay -- let bindings.
+
+Let binding is used in every high-level programming language. In Relay, it is a data structure with three
+fields ``Let(var, value, body)``. When we evaluate a let expression, we first evaluate the value part, bind
+it to the var, and then evaluate the body expression and return its result.
+
+You can use a sequence of let bindings to construct a program logically equivalent to a dataflow program.
+The code example below shows one program with two forms side by side.
+
+.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow_vs_func.png
+   :align: center
+   :scale: 70%
+
+
+The nested let binding is called A-normal form, and it is commonly used as an IR in functional programming languages.
+Now, please take a close look at the AST structure. While the two programs are semantically identical
+(so are their textual representations, except that A-normal form has the let prefix), their AST structures are different.
+
+Since program optimizations take these AST data structures and transform them, the two different structures will
+affect the compiler code we are going to write. For example, if we want to detect a pattern ``add(log(x), y)``:
+
+- In the dataflow form, we can first access the add node, then directly look at its first argument to see if it is a log.
+- In the A-normal form, we cannot directly do the check anymore, because the first input to add is ``%v1`` -- we will need to keep a map from variables to their bound values and look up that map, in order to know that ``%v1`` is a log.
+
+Different data structures will impact how you might write transformations, and we need to keep that in mind. 
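The two bullet points above can be made concrete with a small Python sketch. The tuple-based ASTs and matcher functions here are purely hypothetical stand-ins, not the real Relay node classes: in dataflow form the ``log`` call is nested directly under ``add``, while in A-normal form the matcher must first consult a map from variables to their bound values.

```python
# Toy expression nodes (illustrative, NOT the real Relay AST):
# ("call", op_name, arg1, arg2, ...) and ("var", name)

def is_add_of_log_dataflow(expr):
    """Dataflow form: arguments are nested expressions, so check directly."""
    if expr[0] == "call" and expr[1] == "add":
        first = expr[2]
        return first[0] == "call" and first[1] == "log"
    return False

def is_add_of_log_anf(expr, env):
    """A-normal form: arguments are variables, so resolve them through the
    variable-to-bound-value map before checking."""
    if expr[0] == "call" and expr[1] == "add":
        first = expr[2]
        if first[0] == "var":
            first = env.get(first[1], first)
        return first[0] == "call" and first[1] == "log"
    return False

# add(log(x), y) written directly (dataflow view):
dataflow = ("call", "add", ("call", "log", ("var", "x")), ("var", "y"))

# The same program after let binding: let %v1 = log(%x); add(%v1, %y)
env = {"v1": ("call", "log", ("var", "x"))}
anf = ("call", "add", ("var", "v1"), ("var", "y"))

print(is_add_of_log_dataflow(dataflow))  # True
print(is_add_of_log_anf(anf, env))       # True
print(is_add_of_log_anf(anf, {}))        # False: without the map, %v1 is opaque
```

The extra ``env`` parameter is exactly the bookkeeping cost the second bullet describes.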
+
+So now, as a deep learning framework developer, you might ask: why do we need let bindings?
+Your PL friends will always tell you that let is important -- as PL is a well-established field,
+there must be some wisdom behind that.
+
+Why We Might Need Let Binding
+-----------------------------
+One key usage of let binding is that it specifies the scope of computation. Let us take a look at the following example,
+which does not use let bindings.
+
+.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/let_scope.png
+   :align: center
+   :scale: 70%
+
+The problem comes when we try to decide where we should evaluate node ``%1``. In particular, while the text format seems
+to suggest that we should evaluate node ``%1`` outside the if scope, the AST (as shown in the picture) does not suggest so.
+Actually, a dataflow graph never defines its scope of evaluation. This introduces some ambiguity in the semantics.
+
+This ambiguity becomes more interesting when we have closures. Consider the following program, which returns a closure.
+We don’t know where we should compute ``%1``; it can be either inside or outside the closure.
+
+.. code::
+
+  fn (%x) {
+    %1 = log(%x)
+    %2 = fn(%y) {
+      add(%y, %1)
+    }
+    %2
+  }
+
+A let binding solves this problem, as the computation of the value happens at the let node. In both programs,
+if we change ``%1 = log(%x)`` to ``let %v1 = log(%x)``, we clearly specify the computation location to
+be outside of the if scope and closure. As you can see, let binding gives a more precise specification of the computation site
+and could be useful when we generate backend code (as such a specification is in the IR).
+
+On the other hand, the dataflow form, which does not specify the scope of computation, does have its own advantages
+-- namely, we don’t need to worry about where to put the let when we generate the code. 
The dataflow form also gives more freedom
+to the later passes to decide where to put the evaluation point. As a result, it might not be a bad idea to use the dataflow
+form of the program in the initial phases of optimizations when you find it is convenient.
+Many optimizations in Relay today are written to optimize dataflow programs.
+
+However, when we lower the IR to an actual runtime program, we need to be precise about the scope of computation.
+In particular, we want to explicitly specify where the scope of computation should happen when we are using
+sub-functions and closures. Let binding can be used to solve this problem in later-stage, execution-specific optimizations.
+
+
+Implications for IR Transformations
+-----------------------------------
+
+Hopefully, by now you are familiar with the two kinds of representations.
+Most functional programming languages do their analysis in A-normal form,
+where the analyzer does not need to be mindful that the expressions are DAGs.
+
+Relay chooses to support both the dataflow form and let bindings. We believe that it is important to let the
+framework developer choose the representation they are familiar with.
+This does, however, have some implications for how we write passes:
+
+- If you come from a dataflow background and want to handle lets, keep a map from vars to expressions so you can perform a lookup when encountering a var. This likely means a minimal change as we already need a map from expressions to transformed expressions anyway. Note that this will effectively remove all the lets in the program.
+- If you come from a PL background and like A-normal form, we will provide a dataflow to A-normal form pass. 
+- For PL folks, when you are implementing something (like a dataflow-to-ANF transformation), be mindful that expressions can be DAGs, and this usually means that we should visit expressions with a ``Map<Expr, Result>`` and only compute the transformed result once, so the resulting expression keeps the common structure. + +There are additional advanced concepts such as symbolic shape inference, polymorphic functions +that are not covered by this material; you are more than welcome to look at other materials. -- GitLab