Tutorial II: Handling the Type System (part 1)
==============================================

Parsing files is useful, but we quickly need to type-check our input in order to build more advanced DSLs.
To handle this problem, the package ``pyrser.type_system`` provides what we need.

1- Type semantics
-----------------

We provide some classes to describe types.

:class:`pyrser.type_system.Symbol`: A Symbol represents a named entity of our
language.

:class:`pyrser.type_system.Signature`: A Signature is an abstract type common
to ``Val``, ``Var`` and ``Fun``. It is the common denominator of the typing
system and provides the capability to get a string representation of a symbol.

:class:`pyrser.type_system.Val`: A Val represents a literal value in our
language.

:class:`pyrser.type_system.Var`: A Var represents a named variable in our
language.

:class:`pyrser.type_system.Fun`: A Fun represents a named function in our
language.

:class:`pyrser.type_system.Scope`: A Scope represents a scope or a type (ADT
or Abstract Data Type).

    Note that a Scope can be one of three kinds:

    FREE: This is a standalone scope.

    LINKED: This scope is connected to a parent scope. Type resolution is forwarded to the parent when it fails in this scope.

    EMBEDDED: This scope is a subscope of a parent scope. An embedded scope is seen as an extension of the parent scope,
    so iterating through the symbols of this scope also reaches the symbols present in the parent scope.
    This is useful in some cases but problematic in others (typically :py:func:`pyrser.type_system.Scope.get_by_params`).
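
The difference between the three kinds can be sketched in plain Python. This
is only a toy model written for illustration: ``Kind``, ``ToyScope`` and
``lookup`` are hypothetical names, not pyrser's API, and the assumption that an
embedded scope also forwards failed lookups to its parent is ours.

.. code:: python

    from enum import Enum


    class Kind(Enum):
        FREE = 1
        LINKED = 2
        EMBEDDED = 3


    class ToyScope:
        """Hypothetical stand-in for a scope; not pyrser's implementation."""

        def __init__(self, symbols, kind=Kind.FREE, parent=None):
            self.symbols = set(symbols)
            self.kind = kind
            self.parent = parent

        def lookup(self, name):
            # Local symbols are always checked first.
            if name in self.symbols:
                return True
            # Assumption: LINKED and EMBEDDED scopes forward a failed
            # lookup to their parent; a FREE scope never does.
            if self.kind is not Kind.FREE and self.parent is not None:
                return self.parent.lookup(name)
            return False

        def __iter__(self):
            yield from sorted(self.symbols)
            # Only an EMBEDDED scope exposes the parent's symbols when
            # iterating.
            if self.kind is Kind.EMBEDDED and self.parent is not None:
                yield from self.parent


    root = ToyScope({"int", "f"})
    free = ToyScope({"x"})
    linked = ToyScope({"x"}, Kind.LINKED, root)
    embedded = ToyScope({"x"}, Kind.EMBEDDED, root)

    print(free.lookup("f"))    # False
    print(linked.lookup("f"))  # True
    print(list(linked))        # ['x']
    print(list(embedded))      # ['x', 'f', 'int']

Iterating over the embedded scope yields the parent's symbols as well, which is
the behaviour that can get in the way of operations working on "all symbols of
this scope".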

Basically, we can use the package like this:

.. include:: tutorial2_scripts/minimal_scope.py
    :code: python
    :end-line: 10

This produces the following output:

.. program-output:: python3 splice.py 'python3 tutorial2_scripts/minimal_scope.py' 0,7

We are actually generating the signatures of one variable and three functions
and adding them to an unnamed :py:class:`pyrser.type_system.Scope`, thus
creating an anonymous scope that could be our language's global scope. This is
why we instantiate the :class:`pyrser.type_system.Scope` object using the
keyword ``sig`` (also the second positional argument): by omitting the first
parameter, which is an identifier naming the scope, we make it anonymous. If we
wanted to name it, we could have created it as follows:

.. include:: tutorial2_scripts/minimal_scope.py
    :code: python
    :start-line: 10
    :end-line: 12

Or, after creating the object, we can assign it a proper name:

.. include:: tutorial2_scripts/minimal_scope.py
    :code: python
    :start-line: 12
    :end-line: 15

that would produce the output:

.. program-output:: python3 splice.py 'python3 tutorial2_scripts/minimal_scope.py' 8,15

Now our functions and variables are automatically decorated to be part of the
namespace. We can inspect the internal names used by our symbols:

.. include:: tutorial2_scripts/minimal_scope.py
    :code: python
    :start-line: 16
    :end-line: 17

We get all internal names of our signatures:

.. program-output:: python3 splice.py 'python3 tutorial2_scripts/minimal_scope.py' 16,17

2- Type operations
------------------

With the previous classes, we have the basic abstractions needed to implement a name-based type system with function and variable overloads.

In fact, the class :class:`pyrser.type_system.Scope` provides what we need for basic type operations.

Let's take a classical scope with a few overloads of a function ``f``:

.. include:: tutorial2_scripts/type_operations.py
    :code: python
    :start-line: 2
    :end-line: 4

Then add some local variables (with possible overloads):

.. include:: tutorial2_scripts/type_operations.py
    :code: python
    :start-line: 5
    :end-line: 17

We get this setting:

.. program-output:: python3 splice.py 'python3 tutorial2_scripts/type_operations.py' 0,16

We can now infer the types of ``f``, ``a``, ``b`` and ``c`` in the expression
``f(a, b, c)``. In order to do this, we must first retrieve all the possible
signatures for each parameter. Then, we need to retrieve all possible
signatures for the given function and filter them with the set of signatures
for each parameter, leaving only the plausible overloads for us to check.
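
The filtering step described above can be sketched without pyrser. The
overloads and candidate types below are made up for illustration (they are not
the ones declared in the tutorial script), and ``plausible_overloads`` is a
hypothetical helper: it keeps an overload only if its parameter tuple is one of
the combinations allowed by the candidate types of each argument.

.. code:: python

    from itertools import product

    # Hypothetical overloads of ``f``: parameter types -> return type.
    overloads = {
        ("int", "double", "char"): "void",
        ("int", "int", "int"): "int",
        ("double", "double"): "double",
    }

    # Hypothetical candidate types for each argument of ``f(a, b, c)``.
    candidates = [
        {"int", "double"},  # a
        {"double"},         # b
        {"char", "int"},    # c
    ]


    def plausible_overloads(overloads, candidates):
        """Keep the overloads matching one combination of candidate types."""
        combos = set(product(*candidates))
        return {params: ret for params, ret in overloads.items()
                if params in combos}


    result = plausible_overloads(overloads, candidates)
    print(result)  # {('int', 'double', 'char'): 'void'}

Enumerating every combination is fine for a sketch but grows quickly with the
number of arguments; a real implementation would rather check each parameter
position against its candidate set.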

Since ``a`` is already at hand (a literal value should always be represented
by a scope containing all the possible type overloads), we first need to get
all possible signatures for ``b``:

.. include:: tutorial2_scripts/type_operations.py
    :code: python
    :start-line: 18
    :end-line: 20

We get:

.. program-output:: python3 splice.py 'python3 tutorial2_scripts/type_operations.py' 17,27

As you may have understood,
:py:meth:`pyrser.type_system.Scope.get_by_symbol_name` returns a sub-set of
the Scope instance itself. Thus, we get another Scope, on which we can operate
further.

We do the same for ``c``. After that, we keep only the functions called ``f``
that accept these sets of parameters:

.. include:: tutorial2_scripts/type_operations.py
    :code: python
    :start-line: 22
    :end-line: 24

And we only get:

.. program-output:: python3 splice.py 'python3 tutorial2_scripts/type_operations.py' 29,36

As we can see, some types (``int`` and ``double``) are resolved to a
:py:class:`pyrser.type_system.Type`, while ``char`` is left unresolved. This
is because no declaration exists for the type ``char`` within our scope.
Indeed, the type system tried to retrieve the types associated to the different
parameters of a resolved function.

On another note, :py:meth:`pyrser.type_system.Scope.get_by_params` also
returns, alongside the resulting Scope, the different sets of parameters that
must be used for each overload:

.. program-output:: python3 splice.py 'python3 tutorial2_scripts/type_operations.py' 38,43

Here, we get a single overload, so type checking resolves the call to the
proper function.

3- Type mangling
----------------

Now that we know how to look for a signature within a scope, we may want to
have a bit more control over how the unique identifiers are generated for the
signatures. Indeed, the whole typing system is based on a few classes which
provide the unique identifiers. Modifying how those identifiers are generated
can allow us to enable or disable function overload for a toy language, for
instance.

Remember, in the first section of this tutorial, we had the following code:

.. include:: tutorial2_scripts/minimal_scope.py
    :code: python
    :end-line: 10

which displayed the signatures as a list:

.. program-output:: python3 splice.py 'python3 tutorial2_scripts/minimal_scope.py' 16,17

It is actually the Symbol class that controls how those unique signature
identifiers are generated. The relevant methods of the
:py:class:`pyrser.type_system.Symbol` class look like this:

.. literalinclude:: ../../pyrser/type_system/symbol.py
    :pyobject: Symbol.show_name

.. literalinclude:: ../../pyrser/type_system/symbol.py
    :pyobject: Symbol.internal_name

And the :py:class:`pyrser.type_system.Fun` class overrides ``internal_name``
as follows:

.. literalinclude:: ../../pyrser/type_system/fun.py
    :pyobject: Fun.internal_name

If we follow how the ``internal_name`` method of the
:class:`pyrser.type_system.Fun` class works, we can see that the higher level
class (:class:`pyrser.type_system.Fun` in our case) can internally use its
parent class's ``internal_name`` method. That part is actually up to the
implementor, as it could also define a wholly different mangling method.
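
The pattern of a subclass building on its parent's mangling can be sketched as
follows. ``ToySymbol`` and ``ToyFun`` are hypothetical classes and the
underscore-joined format is invented for this example; it is not the format
pyrser generates.

.. code:: python

    class ToySymbol:
        """Hypothetical base class; the real one is in pyrser.type_system."""

        def __init__(self, name, namespaces=()):
            self.name = name
            self.namespaces = list(namespaces)

        def internal_name(self):
            # Join the enclosing namespaces and the name into one identifier.
            return "_".join(self.namespaces + [self.name])


    class ToyFun(ToySymbol):
        def __init__(self, name, ret, params=(), namespaces=()):
            super().__init__(name, namespaces)
            self.ret = ret
            self.params = list(params)

        def internal_name(self):
            # Reuse the parent's mangling, then append the parameter and
            # return types so that each overload gets its own identifier.
            base = super().internal_name()
            return "_".join([base] + self.params + [self.ret])


    print(ToySymbol("x", ["ns"]).internal_name())                 # ns_x
    print(ToyFun("f", "void", ["int"], ["ns"]).internal_name())   # ns_f_int_void
    print(ToyFun("f", "void", ["char"], ["ns"]).internal_name())  # ns_f_char_void

Because the parameter types are part of the identifier, the two overloads of
``f`` do not collide.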

In reality, three classes express the different typing concepts that come into
play when generating unique signature identifiers. Those are the concepts of
Value, Variable and Function, implemented respectively by the classes Val, Var
and Fun. So in order to redefine the mangling for your own language, you may
need to redefine up to four classes:
:class:`pyrser.type_system.Symbol`, :class:`pyrser.type_system.Val`,
:class:`pyrser.type_system.Var` and :class:`pyrser.type_system.Fun`.

Now, let us try to define a mangling fit for a language that would not
support any overloading for a given symbol, meaning that a variable could not
have the same name as a function:

.. include:: tutorial2_scripts/type_mangling.py
    :code: python
    :end-line: 37

Note that ``MyVar`` only re-uses ``MySymbol``'s ``show_name`` and
``internal_name`` methods. Thus, we can see that using the ``show_name`` (used
mostly when printing out an object for display purposes), we can differentiate
``MyFun`` from ``MyVar``, even though the ``internal_name`` is the same for
both classes. So now, the unique identifier being the same, the typing system
won't allow having more than one unique name registered, and thus prevents us
from registering both a variable and a function having the same namespaces and
name.
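
The effect of a name-only identifier can be illustrated with a toy symbol
table. ``NoOverloadRegistry`` is a hypothetical class, and raising ``KeyError``
on a duplicate is an assumption made for this sketch, not a description of how
pyrser reports the conflict.

.. code:: python

    class NoOverloadRegistry:
        """Toy symbol table keyed by the mangled name (hypothetical)."""

        def __init__(self):
            self._symbols = {}

        def add(self, kind, name):
            # Name-only mangling: the kind of symbol is not part of the key,
            # so a variable and a function with the same name collide.
            key = name
            if key in self._symbols:
                raise KeyError(
                    f"{name!r} is already declared as a {self._symbols[key]}")
            self._symbols[key] = kind


    registry = NoOverloadRegistry()
    registry.add("var", "counter")
    try:
        registry.add("fun", "counter")
        message = "accepted"
    except KeyError as err:
        message = err.args[0]
    print(message)  # 'counter' is already declared as a var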

We can try out the following piece of code:

.. include:: tutorial2_scripts/type_mangling.py
    :code: python
    :start-line: 38

Which yields the following output, where we can see that the mangling was
handled by our code:

.. program-output:: python3 tutorial2_scripts/type_mangling.py


4- Type resolution and disambiguation
--------------------------------------

In most languages, the typing system can encounter situations where the type is
not as obvious as a one-to-one match. Indeed, a lot of languages have to
resolve (either following a standard resolution model or yielding an error)
situations where multiple signatures match the one we are looking for. As we
just saw, since we can redefine the unique internal identifier generation for
the typing system's classes, depending on the method used, we could fall more
or less easily into one of those situations.

For instance, let's assume that our mangling supports function overloads, like
the C++ language does. Then, let's assume that the following symbols have been
declared in a fictional language that we're trying to type-check:

.. include:: tutorial2_scripts/type_disambiguation.py
    :code: python
    :end-line: 12

As a prerequisite, let us assume that a number literal in our language can be
typed in multiple ways. For instance, a number literal can be typed as a
character, an integer, or as a big number. Then, when some parsed code
contains a literal, the following set of
:class:`pyrser.type_system.Val` will be built:

.. include:: tutorial2_scripts/type_disambiguation.py
    :code: python
    :start-line: 12
    :end-line: 17

Now, we have a user input, where the written code is a function call to
``fun``, with a number literal as a parameter, that we could translate to the
following typing code:

.. include:: tutorial2_scripts/type_disambiguation.py
    :code: python
    :start-line: 17
    :end-line: 20

Now, we display the following scope:

.. program-output:: python3 splice.py 'python3 tutorial2_scripts/type_disambiguation.py' 0,24

Here, since the overloads list contains more than one item, we can try to
resolve it by calling :py:meth:`pyrser.type_system.Scope.get_by_params` (which
returns a tuple of a scope and a list of scopes) on the overloads
:class:`pyrser.type_system.Scope`:

.. include:: tutorial2_scripts/type_disambiguation.py
    :code: python
    :start-line: 20
    :end-line: 23

Indeed, :py:meth:`pyrser.type_system.Scope.get_by_params` takes care of
matching the available signatures with the multiple sets for each parameter.
Now, if the ``fun`` scope contains more than one signature, it means that we
have an unresolved type. That could mean a lot of different things, but for
now, let's try to reduce this choice.

If the resulting ``fun`` scope contained a single signature, the typing
system would have validated the types of the input, and we could go on and fiddle
with the generation. Let us see what the unresolved function and parameter look like:

.. program-output:: python3 splice.py 'python3 tutorial2_scripts/type_disambiguation.py' 27,41

In this case, we can see that using the literal as parameter was not enough to
resolve the type of the function we want to use, but we can see a difference
between the two: the return type. So we can filter once again over the return
type:

.. include:: tutorial2_scripts/type_disambiguation.py
    :code: python
    :start-line: 24

and we then get the following output:

.. program-output:: python3 splice.py 'python3 tutorial2_scripts/type_disambiguation.py' 42,47

As we can see, by using this last filter, we could identify a unique function
signature matching our user input. Alas, in some cases, it's not as easy.
Indeed, in some languages you might have polymorphic types that the Scope
class cannot resolve by itself. It requires the help of another typing module:
the :class:`pyrser.type_system.inference.Inference`. Sometimes, even the
inference module cannot resolve something, and then we end up with an error,
which is up to us to report to the user.
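
The two-step disambiguation used in this section (filter by parameters, then by
return type) can be summarised with a toy sketch. The overloads, the literal's
candidate types and the helpers ``by_params`` and ``by_return`` are all
hypothetical; they only mimic the idea and are not pyrser calls.

.. code:: python

    # Hypothetical overloads of ``fun``: (parameter types, return type).
    overloads = [
        (("int",), "int"),
        (("char",), "void"),
        (("string",), "int"),
    ]

    # A number literal may be typed in several ways.
    literal_types = {"char", "int", "bignum"}


    def by_params(overloads, arg_candidates):
        """Keep overloads whose parameters fit the candidate types."""
        return [(params, ret) for params, ret in overloads
                if len(params) == len(arg_candidates)
                and all(p in c for p, c in zip(params, arg_candidates))]


    def by_return(overloads, expected):
        """Keep overloads returning the expected type."""
        return [(params, ret) for params, ret in overloads
                if ret == expected]


    matches = by_params(overloads, [literal_types])
    print(len(matches))  # 2: still ambiguous

    resolved = by_return(matches, "int")
    print(resolved)      # [(('int',), 'int')]

If ``resolved`` still held several entries, or none at all, the call would
remain unresolved and would have to be handled by inference or reported as an
error.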

See ``Tuto III`` to dive deeper into the usage of pyrser, and see how to give
the typing system a more powerful resolver by adding type coercion and using
the Inference module.