Emo.NET

In short, we've written a .NET compiler for emo in C#. We haven't tested it thoroughly, but it seems to work with the Hello World program. Also, the code is much easier to read if you copy it (the green text) into an editor with syntax highlighting.

The code for this consists of several pieces:
  1. Grammar
  2. Base/common classes
  3. The Scanner
  4. The Parser
  5. The Emitter
  6. Putting it all together

First and foremost, the software on this page is free.

EMO language .NET compiler
Copyright (C) 2010 Sean Fife

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see http://www.gnu.org/licenses/.

1. Grammar

The grammar for this language is fairly simple and is spelled out below:

<stmt>   :=   <eyes><nose><mouth>
                  | <eyes><nose><mouth><bottom>
                  | <top><eyes><nose><mouth>
                  | <stmt><whitespace><stmt>
                  | <stmt><stmt>

<eyes>   := : | ; | X | =

<nose>   := ^ | - | o | c | <nose><nose>

<mouth>  := ( | ) | { | } | <pipecharacter> | @ | P

<top>    := <

<bottom> := >

<whitespace> := space | tab | newline | linefeed
                    | <whitespace><whitespace>

<pipecharacter> := |

From the grammar, we can start to build up classes that will represent the language inside the compiler.
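
To make the first production concrete, here is a minimal sketch that checks whether a string matches <eyes><nose><mouth>, with the nose allowed to repeat as the <nose><nose> rule permits. The class and method names (GrammarSketch, IsPlainEmoticon) are made up for this illustration and are not part of the compiler below.

```csharp
using System;

static class GrammarSketch
{
    // Character sets copied from the grammar above.
    const string Eyes = ":;X=";
    const string Noses = "^-oc";
    const string Mouths = "(){}|@P";

    // Returns true when s matches <eyes><nose><mouth>, where <nose> may repeat.
    public static bool IsPlainEmoticon(string s)
    {
        if (s.Length < 3) return false;
        if (Eyes.IndexOf(s[0]) < 0) return false;
        if (Mouths.IndexOf(s[s.Length - 1]) < 0) return false;
        for (int i = 1; i < s.Length - 1; i++)
        {
            if (Noses.IndexOf(s[i]) < 0) return false;
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(IsPlainEmoticon(":-)"));   // True
        Console.WriteLine(IsPlainEmoticon(";^^@"));  // True
        Console.WriteLine(IsPlainEmoticon(":)"));    // False: no nose
        Console.WriteLine(IsPlainEmoticon("-:)"));   // False: starts with a nose
    }
}
```

Note that this follows the grammar strictly and demands at least one nose, which may be stricter than what the real parser accepts.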

2. Base/common classes

These base classes, together with a few derived classes and enumerations, make up the definition of the language inside the compiler. The scanner turns your source file into a list of tokens. The parser then checks that list for validity, builds a tree of these objects from it, and passes the tree on to the code generator.

It's important to note that the language grammar is defined recursively. This is true of most languages.

All of these classes make use of certain using statements, so don't get confused when you see these out and about in the code:
using System;
using Reflect = System.Reflection;
using Emit = System.Reflection.Emit;

using Collections = System.Collections.Generic;
using IO = System.IO;
using Text = System.Text;

So, here are the base/common classes.

public class BaseEmoticon : Stmt
{
    // Each test checks whether ch appears in the matching character table on Stmt.
    public bool IsEye(char ch)
    {
        return System.Array.IndexOf(Eyes, ch) >= 0;
    }
    public bool IsNose(char ch)
    {
        return System.Array.IndexOf(Noses, ch) >= 0;
    }
    public bool IsMouth(char ch)
    {
        return System.Array.IndexOf(Mouths, ch) >= 0;
    }
    public bool IsHat(char ch)
    {
        return System.Array.IndexOf(Hats, ch) >= 0;
    }
    public bool IsBeard(char ch)
    {
        return System.Array.IndexOf(Beards, ch) >= 0;
    }
}



/*
<stmt>   :=   <eyes><nose><mouth>
			| <eyes><nose><mouth><bottom>
			| <top><eyes><nose><mouth>
			| <stmt><stmt>
*/
public class Stmt
{
    public static char[] Eyes = { ':', ';', 'X', '=' };
    public static char[] Noses = { '^', '-', 'o', 'c' };
    public static char[] Mouths = { ')', '(', '|', '{', '}', 'P', '@' };
    public static char[] Hats = { '<' };
    public static char[] Beards = { '>' };
}

public class Emoticon : Stmt
{
    public Stmt Eye;
    public Stmt Nose;
    public Stmt Mouth;

    public override string ToString()
    {
        string ret = "";
        if (Eye != null) ret += Eye.ToString();
        if (Nose != null) ret += Nose.ToString();
        if (Mouth != null) ret += Mouth.ToString();

        return ret;
    }
}

public class Sequence : Stmt
{
    public Stmt First;
    public Stmt Second;
}

//<eyes>   := : | ; | X | =
public class Eye : Stmt
{
    public EyeOps EyeOp;

    public override string ToString()
    {
        return Eyes[(int)EyeOp].ToString();
    }
}

//<nose>   := ^ | - | o | c | <nose><nose>
public class Nose : Stmt
{
    public NoseOps NoseOp;
    public override string ToString()
    {
        return Noses[(int)NoseOp].ToString();
    }
}

public class NoseSequence : Stmt
{
    public Stmt First;
    public Stmt Second;
}

public class Mouth : Stmt
{
    public MouthOps MouthOp;
    public override string ToString()
    {
        return Mouths[(int)MouthOp].ToString();
    }
}

public class Loop : Stmt
{
    public Stmt Body;
}

public enum EyeOps
{
    Colon,
    SemiColon,
    X,
    Equals
}

public enum NoseOps
{
    Caret,
    Dash,
    LowerCaseO,
    LowerCaseC,
}

public enum MouthOps
{
    RightParen,
    LeftParen,
    Pipe,
    LeftCurlyBrace,
    RightCurlyBrace,
    CapitalP,
    At
}

public enum Tops
{
    LeftAngleBracket
}

public enum Bottoms
{
    RightAngleBracket
}

The base of them all is the Stmt class. Deriving everything from it lets us use polymorphism: a field or list typed as Stmt can hold any of these objects.
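
As a rough illustration of why the common base class matters, the sketch below builds a small tree the way Sequence nests statements and then walks it recursively. Node, Leaf and Pair are simplified stand-ins invented for this example; they only mimic the shape of Stmt, Emoticon and Sequence.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-ins for the Stmt, Emoticon and Sequence classes (hypothetical names).
class Node { }

class Leaf : Node
{
    public string Text;
    public Leaf(string text) { Text = text; }
}

class Pair : Node
{
    public Node First;
    public Node Second;
    public Pair(Node first, Node second) { First = first; Second = second; }
}

static class TreeSketch
{
    // Walks the tree depth-first and collects the leaves in source order.
    public static void Flatten(Node node, List<string> output)
    {
        Pair pair = node as Pair;
        if (pair != null)
        {
            Flatten(pair.First, output);
            Flatten(pair.Second, output);
            return;
        }
        Leaf leaf = node as Leaf;
        if (leaf != null) output.Add(leaf.Text);
    }

    static void Main()
    {
        // Three statements nested as Pair(first, Pair(second, third)).
        Node tree = new Pair(new Leaf(":^)"), new Pair(new Leaf(";-("), new Leaf("=-@")));
        List<string> leaves = new List<string>();
        Flatten(tree, leaves);
        Console.WriteLine(string.Join(" ", leaves)); // :^) ;-( =-@
    }
}
```

Because both fields of Pair are typed as the base class, a pair can hold a leaf or another pair, which is exactly what the recursive <stmt><stmt> rule needs.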

3. The Scanner

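The scanner is probably the simplest looking of the classes. It's important to note that this is where comments get removed from the code: a comment starts with ~ and runs to the end of the line. Every other character that isn't part of the language is simply skipped.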

public class Scanner : BaseEmoticon
{

    private readonly Collections.IList<char> result;

    public Scanner(IO.TextReader input)
    {
        this.result = new Collections.List<char>();
        this.Scan(input);
    }

    public Collections.IList<char> Tokens
    {
        get { return this.result; }
    }

    private void Scan(IO.TextReader input)
    {
        while (input.Peek() != -1)
        {
            char token = (char)input.Peek();

            if (IsHat(token) ||
                IsEye(token) ||
                IsNose(token) ||
                IsMouth(token) ||
                IsBeard(token))
            {
                this.result.Add(token);
            }
            else if (token == '~')
            {
                //ignore comments
                token = (char)input.Peek();
                while (token != '\n' && token != '\r' && input.Peek() != -1)
                {
                    token = (char)input.Read();
                    token = (char)input.Peek();
                }
                if ((char)input.Peek() == '\r')
                {
                    input.Read();
                }
            }
            input.Read();
        }
    }

}
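
To see the idea in isolation, here is a cut-down, self-contained sketch of the same scanning approach. It is not the Scanner class above: ScanSketch and its Language constant are names invented for this example, and the comment handling is simplified to "skip up to the next newline".

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ScanSketch
{
    // Every character the grammar assigns a meaning to.
    const string Language = ":;X=^-oc(){}|@P<>";

    public static List<char> Scan(TextReader input)
    {
        List<char> tokens = new List<char>();
        int next;
        while ((next = input.Read()) != -1)
        {
            char ch = (char)next;
            if (ch == '~')
            {
                // Comment: discard everything up to (not including) the newline.
                while (input.Peek() != -1 && input.Peek() != '\n') input.Read();
            }
            else if (Language.IndexOf(ch) >= 0)
            {
                tokens.Add(ch);
            }
            // Anything else (whitespace, stray characters) is dropped.
        }
        return tokens;
    }

    static void Main()
    {
        string source = ":^) ~ bump the register\n;-@";
        List<char> tokens = Scan(new StringReader(source));
        Console.WriteLine(new string(tokens.ToArray())); // :^);-@
    }
}
```

Comments have to be removed at this stage because ordinary words contain letters such as o, c, X and P, which would otherwise be picked up as tokens.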

4. The Parser

This is pretty much the syntax checker of the compilation. If you read through the code, you'll see that every object is created as its specific type (Emoticon, Loop, Nose and so on) but is stored in fields typed as Stmt.

public class Parser : BaseEmoticon
{
    private int index;
    private Collections.IList<char> tokens;
    private readonly Stmt result;

    public Parser(Collections.IList<char> tokens)
    {
        this.tokens = tokens;
        this.index = 0;
        this.result = this.ParseStmt();

        if (this.index != this.tokens.Count)
            throw new System.Exception("expected EOF");
    }

    public Stmt Result
    {
        get { return result; }
    }

    private Stmt ParseStmt()
    {
        Stmt result;

        if (this.index == this.tokens.Count)
        {
            throw new System.Exception("expected statement, got EOF");
        }

        if (IsEye(this.tokens[this.index]))
        {
            //Start an emoticon command
            Emoticon emo = new Emoticon();
            emo.Eye = GetEyeFromChar(this.tokens[this.index]);
            this.index++;
            emo.Nose = this.ParseNoses();
            if (emo.Nose == null)
            {
                this.index--;
            }
            if (this.index < this.tokens.Count && IsMouth(this.tokens[this.index]))
            {
                emo.Mouth = GetMouthFromChar(this.tokens[this.index]);
            }
            else
            {
                throw new Exception("Expected mouth not found");
            }
            this.index++;
            result = emo;
        }
        else if (IsHat(this.tokens[this.index]))
        {
            Loop loop = new Loop();
            this.index++;
            loop.Body = this.ParseStmt();
            result = loop;
            if (this.index == this.tokens.Count ||
                !IsBeard(this.tokens[this.index]))
            {
                throw new System.Exception("unterminated loop body");
            }

            this.index++;
        }
        else
        {
            throw new Exception("Unexpected input: " + this.tokens[this.index]);
        }

        if (this.index < this.tokens.Count && (IsEye(this.tokens[this.index]) || IsHat(this.tokens[this.index]) || IsBeard(this.tokens[this.index])))
        {
            if (this.index < this.tokens.Count && !IsBeard(this.tokens[this.index]))
            {
                Sequence sequence = new Sequence();
                sequence.First = result;
                sequence.Second = this.ParseStmt();
                result = sequence;
            }
        }
        return result;
    }

    private Stmt ParseNoses()
    {
        Stmt result = null;
        if (this.index < this.tokens.Count && IsNose(this.tokens[this.index]))
        {
            Nose n = GetNoseFromChar(this.tokens[this.index]);
            result = n;

            this.index++;

            if (this.index < this.tokens.Count && IsNose(this.tokens[this.index]))
            {
                NoseSequence sequence = new NoseSequence();
                sequence.First = result;
                sequence.Second = this.ParseNoses();
                result = sequence;
            }
            else
            {
                //this.index--;
            }
        }
        else
        {
            this.index++;
        }
        return result;
    }

    private Eye GetEyeFromChar(char eye)
    {
        Eye e = new Eye();
        switch (eye)
        {
            case ':': e.EyeOp = EyeOps.Colon; break;
            case ';': e.EyeOp = EyeOps.SemiColon; break;
            case '=': e.EyeOp = EyeOps.Equals; break;
            case 'X': e.EyeOp = EyeOps.X; break;
        }
        return e;
    }

    private Mouth GetMouthFromChar(char mouth)
    {
        Mouth m = new Mouth();
        switch (mouth)
        {
            case '(': m.MouthOp = MouthOps.LeftParen; break;
            case ')': m.MouthOp = MouthOps.RightParen; break;
            case '{': m.MouthOp = MouthOps.LeftCurlyBrace; break;
            case '}': m.MouthOp = MouthOps.RightCurlyBrace; break;
            case '|': m.MouthOp = MouthOps.Pipe; break;
            case '@': m.MouthOp = MouthOps.At; break;
            case 'P': m.MouthOp = MouthOps.CapitalP; break;
        }
        return m;
    }

    private Nose GetNoseFromChar(char nose)
    {
        Nose n = new Nose();
        switch (nose)
        {
            case '^': n.NoseOp = NoseOps.Caret; break;
            case '-': n.NoseOp = NoseOps.Dash; break;
            case 'o': n.NoseOp = NoseOps.LowerCaseO; break;
            case 'c': n.NoseOp = NoseOps.LowerCaseC; break;
        }
        return n;
    }
}
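
The index-and-recurse pattern used by ParseStmt can be hard to see among the details, so here is a stripped-down sketch of it. MiniParser and CountEmoticons are hypothetical names; instead of building a tree, this version only counts the emoticons it recognises, and it allows zero or more noses.

```csharp
using System;

class MiniParser
{
    const string Eyes = ":;X=";
    const string Noses = "^-oc";
    const string Mouths = "(){}|@P";

    readonly string tokens;
    int index;

    MiniParser(string tokens) { this.tokens = tokens; }

    // Counts the emoticons in a token string, throwing on malformed input.
    public static int CountEmoticons(string tokens)
    {
        MiniParser parser = new MiniParser(tokens);
        int count = parser.ParseStmt();
        if (parser.index != tokens.Length) throw new Exception("expected EOF");
        return count;
    }

    int ParseStmt()
    {
        if (index == tokens.Length) throw new Exception("expected statement, got EOF");

        int count;
        if (tokens[index] == '<')
        {
            index++;                       // consume '<'
            count = ParseStmt();           // loop body
            if (index == tokens.Length || tokens[index] != '>')
                throw new Exception("unterminated loop body");
            index++;                       // consume '>'
        }
        else
        {
            Expect(Eyes, "eye");
            while (index < tokens.Length && Noses.IndexOf(tokens[index]) >= 0) index++;
            Expect(Mouths, "mouth");
            count = 1;
        }

        // Anything other than a closing '>' starts another statement.
        if (index < tokens.Length && tokens[index] != '>') count += ParseStmt();
        return count;
    }

    void Expect(string allowed, string what)
    {
        if (index == tokens.Length || allowed.IndexOf(tokens[index]) < 0)
            throw new Exception("expected " + what + " at position " + index);
        index++;
    }

    static void Main()
    {
        Console.WriteLine(CountEmoticons(":^)<;-(:-)>=-@")); // 4
        try { CountEmoticons("<:^)"); }
        catch (Exception e) { Console.WriteLine(e.Message); } // unterminated loop body
    }
}
```

The closing > is never consumed by the statement that meets it. It is left for the enclosing loop branch, which is how the recursion knows where a loop body ends.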

5. The Emitter

This is where the compilation magic happens. Here we actually output the IL (CLR op codes) that will be JIT compiled at runtime by the .NET CLR. The CodeGen constructor creates a program with a Main function that then executes whatever code is in your Emo file. It creates a real .NET executable.
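
If you have never emitted IL before, a tiny standalone sketch may help before reading the full class. It uses DynamicMethod, which runs the generated code in memory rather than saving an executable the way CodeGen does, so treat it as an illustration of ILGenerator only. The names EmitSketch, Build and Render are invented for this example.

```csharp
using System;
using System.Reflection.Emit;

static class EmitSketch
{
    // Builds a method equivalent to: string Render(int value) { return char.ConvertFromUtf32(value + 1); }
    public static Func<int, string> Build()
    {
        DynamicMethod method = new DynamicMethod("Render", typeof(string), new Type[] { typeof(int) });
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0);   // push the int argument
        il.Emit(OpCodes.Ldc_I4_1);  // push the constant 1
        il.Emit(OpCodes.Add);       // add them, the same two op codes a ^ nose emits for the register
        il.Emit(OpCodes.Call, typeof(char).GetMethod("ConvertFromUtf32", new Type[] { typeof(int) }));
        il.Emit(OpCodes.Ret);       // return the string left on the stack

        return (Func<int, string>)method.CreateDelegate(typeof(Func<int, string>));
    }

    static void Main()
    {
        Func<int, string> render = Build();
        Console.WriteLine(render(71)); // H
    }
}
```

Everything works through the evaluation stack: each op code pops its inputs and pushes its result, which is why the eye, nose and mouth operations in CodeGen can simply be emitted one after another.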

public class CodeGen : BaseEmoticon
{
    Emit.ILGenerator il = null;
    Collections.Dictionary<string, Emit.LocalBuilder> symbolTable;
    bool isLoop = false;

    public CodeGen(Stmt stmt, string moduleName)
    {
        if (IO.Path.GetFileName(moduleName) != moduleName)
        {
            throw new System.Exception("can only output into current directory!");
        }

        Reflect.AssemblyName name = new Reflect.AssemblyName(IO.Path.GetFileNameWithoutExtension(moduleName));
        Emit.AssemblyBuilder asmb = System.AppDomain.CurrentDomain.DefineDynamicAssembly(name, Emit.AssemblyBuilderAccess.Save);
        Emit.ModuleBuilder modb = asmb.DefineDynamicModule(moduleName);
        Console.WriteLine(string.Format("Full Executable Path: {0}", modb.FullyQualifiedName));
        Emit.TypeBuilder typeBuilder = modb.DefineType("Emo");

        Emit.MethodBuilder methb = typeBuilder.DefineMethod("Main", Reflect.MethodAttributes.Static, typeof(void), System.Type.EmptyTypes);

        // CodeGenerator
        this.il = methb.GetILGenerator();
        this.symbolTable = new Collections.Dictionary<string, Emit.LocalBuilder>();

        // Go Compile!
 
        //INITIALIZE MEMORY
        ////push 256 * 4 onto the stack, for an array of 256 ints
        int x = 256 * 4;
        this.il.Emit(Emit.OpCodes.Ldc_I4, x);
        //allocate the memory
        this.il.Emit(Emit.OpCodes.Localloc);

        //duplicate the memory address twice so we have 3 copies of it on the stack
        this.il.Emit(Emit.OpCodes.Dup);
        this.il.Emit(Emit.OpCodes.Dup);

        //pop the memory location to local variable 1, this is the pointer to the memory block
        //it never changes through the program
        this.symbolTable["WorkingRegister"] = this.il.DeclareLocal(typeof(int));
        this.symbolTable["MemoryStart"] = this.il.DeclareLocal(typeof(int));
        this.symbolTable["MemoryAddress"] = this.il.DeclareLocal(typeof(int));
        this.symbolTable["Register"] = this.il.DeclareLocal(typeof(int));
        this.symbolTable["TopOfStack"] = this.il.DeclareLocal(typeof(int));

        this.il.Emit(Emit.OpCodes.Stloc, symbolTable["MemoryStart"]);
        //pop the memory location to local variable 2, this is the current memory pointer
        this.il.Emit(Emit.OpCodes.Stloc, symbolTable["MemoryAddress"]);

        //Now we have one copy of the memory address left, and we initialize the memory to zero
        //Push zero, this is the value that the memory will be initialized to
        this.il.Emit(Emit.OpCodes.Ldc_I4, 0);
        //push 256 * 4, this is the size of the memory block
        this.il.Emit(Emit.OpCodes.Ldc_I4, x);
        ////Do it!
        this.il.Emit(Emit.OpCodes.Initblk);

        //Now we build the program

        this.GenStmt(stmt);

        this.il.Emit(Emit.OpCodes.Call, typeof(System.Console).GetMethod("ReadKey", new System.Type[] { }));
        this.il.Emit(Emit.OpCodes.Pop);
        //free the memory, this happens automatically upon returning from the function
        il.Emit(Emit.OpCodes.Ret);

        typeBuilder.CreateType();
        modb.CreateGlobalFunctions();
        asmb.SetEntryPoint(methb);
        asmb.Save(moduleName);
        this.symbolTable = null;
        this.il = null;
    }

    private void GenStmt(Stmt stmt)
    {
        /*
         * Working Register: (push) ldloc.0, (pop) stloc.0
         * Register 1: (push) ldloc.3, (pop) stloc.3
         * Stack: Stack used by CLR
         * Initial Memory Address: (push) ldloc.1, (pop) stloc.1 
         * Current Memory Address: (push) ldloc.2, (pop) stloc.2
         */
        if (stmt is Sequence)
        {
            Sequence seq = (Sequence)stmt;
            this.GenStmt(seq.First);
            this.GenStmt(seq.Second);
        }

        if (stmt is Loop)
        {
            Emit.Label test = this.il.DefineLabel();
            this.il.Emit(Emit.OpCodes.Br, test);

            Emit.Label body = this.il.DefineLabel();
            this.il.MarkLabel(body);
            isLoop = true;
            Loop l = (Loop)stmt;
            this.GenStmt(l.Body);

            this.il.MarkLabel(test);
            //Code for the loop condition
            this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
            this.il.Emit(Emit.OpCodes.Ldind_I4);
            this.il.Emit(Emit.OpCodes.Ldc_I4_0);
            this.il.Emit(Emit.OpCodes.Bgt, body);
        }

        if (stmt is Emoticon)
        {
            Emoticon e = (Emoticon)stmt;
            Eye eye = (Eye)e.Eye;
            Mouth m = (Mouth)e.Mouth;

            #region EyeOperations
            switch (eye.EyeOp)
            {
                case EyeOps.Colon:
                    //read from the register
                    //and push onto the stack
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["Register"]);
                    break;
                case EyeOps.SemiColon:
                    //read from the current memory location
                    //and push onto the stack
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
                    this.il.Emit(Emit.OpCodes.Ldind_I4);
                    break;
                case EyeOps.Equals:
                    //read from the keyboard
                    //and push onto the stack
                    this.il.Emit(Emit.OpCodes.Call, typeof(System.Console).GetMethod("Read"));
                    break;
                case EyeOps.X:
                    //pop from the stack
                    //and store in the working register
 
                    //May have to change operation to have a language specific stack
                    //  that is separate from the operation stack
                    break;
                default:
                    break;
            }
            #endregion

            #region NoseOperations
            if (e.Nose != null)
            {
                Nose n = (Nose)e.Nose;
                switch (n.NoseOp)
                {
                    case NoseOps.Caret:
                        //increment working area by one
                        if (eye.EyeOp == EyeOps.Colon)
                        {
                            this.il.Emit(Emit.OpCodes.Ldc_I4_1);
                            this.il.Emit(Emit.OpCodes.Add);
                        }
                        else if (eye.EyeOp == EyeOps.SemiColon)
                        {
                            this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
                            this.il.Emit(Emit.OpCodes.Ldc_I4_4);
                            this.il.Emit(Emit.OpCodes.Add);
                            this.il.Emit(Emit.OpCodes.Stloc, symbolTable["MemoryAddress"]);
                        }
                        else
                        {
                            throw new Exception("Invalid eye-nose pair: " + e.ToString());
                        }
                        break;
                    case NoseOps.Dash:
                        //decrement working area by one
                        if (eye.EyeOp == EyeOps.Colon)
                        {
                            this.il.Emit(Emit.OpCodes.Ldc_I4_1);
                            this.il.Emit(Emit.OpCodes.Sub);
                        }
                        else if (eye.EyeOp == EyeOps.SemiColon)
                        {
                            this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
                            this.il.Emit(Emit.OpCodes.Ldc_I4_4);
                            this.il.Emit(Emit.OpCodes.Sub);
                            this.il.Emit(Emit.OpCodes.Stloc, symbolTable["MemoryAddress"]);
                        }
                        else
                        {
                            throw new Exception("Invalid eye-nose pair: " + e.ToString());
                        }
                        break;
                    case NoseOps.LowerCaseC:
                        //shift right
                        this.il.Emit(Emit.OpCodes.Ldc_I4_1);
                        this.il.Emit(Emit.OpCodes.Shr);
                        break;
                    case NoseOps.LowerCaseO:
                        //shift left
                        this.il.Emit(Emit.OpCodes.Ldc_I4_1);
                        this.il.Emit(Emit.OpCodes.Shl);
                        break;
                    default:
                        break;
                }
            }
            #endregion

            #region MouthOperations
            switch (m.MouthOp)
            {
                case MouthOps.LeftParen://(
                    //Write to the current memory location:
                    //park the value in the working register, push the address,
                    //push the value back, then store it through the address
                    //this.il.Emit(Emit.OpCodes.Pop);
                    this.il.Emit(Emit.OpCodes.Stloc, symbolTable["WorkingRegister"]);
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["WorkingRegister"]);
                    this.il.Emit(Emit.OpCodes.Stind_I4);
                    break;
                case MouthOps.RightParen://)
                    //Write to the register
                    this.il.Emit(Emit.OpCodes.Stloc, symbolTable["Register"]);
                    break;
                case MouthOps.CapitalP:
                    //push onto the stack
                    //BUT the value is already on the stack, so do nothing.

                    //May have to change operation to have a language specific stack
                    //  that is separate from the operation stack
                    break;
                case MouthOps.Pipe://|
                    //no op, do nothing, or rather pop the top value off the stack
                    // to get rid of it because it's no longer needed
                    this.il.Emit(Emit.OpCodes.Pop);
                    break;
                case MouthOps.LeftCurlyBrace://{
                    //Copy register value to the current memory location
                    this.il.Emit(Emit.OpCodes.Pop);//remove the working location because it's no longer needed
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["Register"]);
                    this.il.Emit(Emit.OpCodes.Stind_I4);
                    break;
                case MouthOps.RightCurlyBrace://}
                    //Copy the value at the current memory location to the register
                    this.il.Emit(Emit.OpCodes.Pop);//remove the working location because it's no longer needed
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
                    this.il.Emit(Emit.OpCodes.Ldind_I4);
                    this.il.Emit(Emit.OpCodes.Stloc, symbolTable["Register"]);
                    break;
                case MouthOps.At://@
                    //print to screen
                    this.il.Emit(Emit.OpCodes.Call, typeof(char).GetMethod("ConvertFromUtf32", new System.Type[] { typeof(int) }));
                    this.il.Emit(Emit.OpCodes.Call, typeof(System.Console).GetMethod("Write", new System.Type[] { typeof(string) }));
                    break;
                default:
                    break;
            }
            #endregion

        }
    }
}
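
The loop code in GenStmt jumps to the test first, marks the body, and then branches back to the body while the tested value is greater than zero. Here is a hedged, self-contained sketch of that label pattern using DynamicMethod. It tests a plain local counter instead of the current memory cell, and LoopSketch, Build and CountDown are names invented for this example.

```csharp
using System;
using System.Reflection.Emit;

static class LoopSketch
{
    // Builds: int CountDown(int start), which returns how many times the body ran.
    public static Func<int, int> Build()
    {
        DynamicMethod method = new DynamicMethod("CountDown", typeof(int), new Type[] { typeof(int) });
        ILGenerator il = method.GetILGenerator();
        LocalBuilder counter = il.DeclareLocal(typeof(int));
        LocalBuilder iterations = il.DeclareLocal(typeof(int));

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Stloc, counter);     // counter = start
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Stloc, iterations);  // iterations = 0

        Label test = il.DefineLabel();
        Label body = il.DefineLabel();
        il.Emit(OpCodes.Br, test);           // check the condition before the first pass

        il.MarkLabel(body);
        il.Emit(OpCodes.Ldloc, counter);     // counter = counter - 1
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Sub);
        il.Emit(OpCodes.Stloc, counter);
        il.Emit(OpCodes.Ldloc, iterations);  // iterations = iterations + 1
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, iterations);

        il.MarkLabel(test);
        il.Emit(OpCodes.Ldloc, counter);
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Bgt, body);          // loop again while counter > 0

        il.Emit(OpCodes.Ldloc, iterations);
        il.Emit(OpCodes.Ret);

        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }

    static void Main()
    {
        Func<int, int> countDown = Build();
        Console.WriteLine(countDown(3)); // 3
        Console.WriteLine(countDown(0)); // 0
    }
}
```

Placing the test after the body but jumping to it first gives while-loop behaviour with a single conditional branch: a loop whose condition is already false never runs its body.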

6. Putting it all together

So, now that you have all of these classes, how are they supposed to be used to create our programs? Fair question. Take the function below and add it to the Emo class on the Emulator page, then just call emo.Compile(); and you'll be good to go. Alternatively, you could just replace SourceFile with the path to your source file.

public void Compile()
{
    Scanner scanner = null;
    using (TextReader input = File.OpenText(SourceFile))
    {
        scanner = new Scanner(input);
    }
    Parser parser = new Parser(scanner.Tokens);
    string file = Path.GetFileNameWithoutExtension(SourceFile) + ".exe";
    CodeGen codeGen = new CodeGen(parser.Result, file);
}

I would like to say thanks to Joel Pobar, who wrote http://msdn.microsoft.com/en-us/magazine/cc136756.aspx and so made this compiler possible. I didn't know where to start writing a .NET compiler before I read that page and saw his example.