Wednesday, September 16, 2009

XML Comparison

 

In the previous post, I talked about serializing objects. Well… there are times when two serialized objects of the same class need to be compared for differences, for example:

Before:

<?xml version="1.0" encoding="utf-8"?>
<Object>
<PrimaryKey>
<![CDATA[123]]>
</PrimaryKey>
<Data1>
<![CDATA[sOme thing]]>
</Data1>
<Data2>
<![CDATA[sAme thing]]>
</Data2>
<SavedInformation>
<![CDATA[Do not compare]]>
</SavedInformation>
</Object>

After:

<?xml version="1.0" encoding="utf-8"?>
<Object>
<PrimaryKey>
<![CDATA[123]]>
</PrimaryKey>
<Data1>
<![CDATA[NO thing]]>
</Data1>
<Data2>
<![CDATA[sAme thing]]>
</Data2>
<SavedInformation>
<![CDATA[Do not compare]]>
</SavedInformation>
</Object>

Result:

<?xml version="1.0" encoding="utf-8"?>
<Object>
<PrimaryKey>
<![CDATA[123]]>
</PrimaryKey>
<Data1>
<before>
<![CDATA[sOme thing]]>
</before>
<after>
<![CDATA[NO thing]]>
</after>
</Data1>
<SavedInformation>
<![CDATA[Do not compare]]>
</SavedInformation>
</Object>



Well, I have just the class that’ll do this.



The rules are simple:



  1. Recursive: The class recursively searches every node for differences.
  2. PrimaryKey (changeable through JOIN_KEY): If a primary key is available, it is used to decide whether two nodes are comparable.
    1. If PrimaryKey is NOT available, unidentified nodes are matched by element name and then compared based on their position among siblings of the same name.

  3. PrimaryKey is always kept.
  4. SavedInformation (changeable through REQUIRED_NODE_NAME) is always kept, whether or not it changed.
  5. Leaf nodes that changed are always kept and get two additional sub-nodes: the “before” and “after” nodes.
  6. Parents of leaf nodes that changed are always kept.
  7. If the before state is a leaf node but the after state is a container node, the before and after values are attached along the shortest path, in this case at the leaf node.
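Rule 2 is the heart of the matching. As a rough, self-contained sketch (not the class itself), this is one way to build a pairing key for each child element: the element name plus its PrimaryKey when there is one, otherwise the element name plus its occurrence index among same-named siblings. The key format mirrors the one PairUpMatchingContainers uses in the class below; `PairingKeySketch` and `BuildKeys` are hypothetical names, and `SelectSingleNode` stands in for the `GetNode` helper, which is not listed in this post.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical sketch of rule 2: one pairing key per child element of "parent".
public static class PairingKeySketch
{
    public static List<string> BuildKeys(XmlNode parent, string joinKey)
    {
        var keys = new List<string>();
        // How many times each element name has been seen so far (position-based pairing).
        var seen = new Dictionary<string, int>();

        foreach (XmlNode child in parent.ChildNodes)
        {
            if (child.NodeType != XmlNodeType.Element)
                continue; // only elements are paired; text, CDATA and comments are skipped

            int index;
            seen.TryGetValue(child.Name, out index); // 0 on first occurrence
            seen[child.Name] = index + 1;

            // SelectSingleNode(joinKey) finds a direct child element named joinKey, if any.
            XmlNode key = child.SelectSingleNode(joinKey);
            keys.Add(key != null
                ? child.Name + ":" + key.InnerText.Trim()   // identified by primary key
                : child.Name + ":Count" + index);           // identified by name and position
        }
        return keys;
    }
}
```

Two children from the before and after documents are then compared only when their keys are equal; a key that appears on one side only shows up as an insert or a delete.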


So here’s how you can do that:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;

namespace XMLSerializerHelperClasses
{
    /// <summary>
    /// Pair up XMLNodes, one with another if they're comparable.
    /// </summary>
    internal class XmlNodePair
    {
        internal XmlNode Node1;
        internal XmlNode Node2;
    }

    /// <summary>
    /// Finds the difference between 2 XML-serialized objects.
    /// Fields/nodes are comparable only when the JOIN_KEY of the node is identical (just like in a database, where the same identity key with 2 different values indicates an update).
    /// When a JOIN_KEY is new or gone, this indicates an insert or a delete.
    /// NOTE: GetNode() and PutPrimaryKeyFirst() are XmlNode extension methods that are not included in this post.
    /// </summary>
    public class Difference
    {
        //The key to indicate that two objects are comparable. If the key differs, the object is assumed to be recently created/deleted.
        public static string JOIN_KEY = "PrimaryKey";
        //Indicates that this information needs to be stored whether or not it changed. Ideally this is information that eases human readability.
        public static string REQUIRED_NODE_NAME = "_UserIdentifier";


        /// <summary>
        /// Finds the difference between two xml documents in string format.
        /// The result returned is an xml document with the old and new values wrapped inside <before></before> and <after></after> nodes.
        /// All parent nodes that contain the difference are preserved, along with the difference.
        /// Any other node (that did not change) is removed to simplify the resulting XML.
        /// </summary>
        /// <param name="Doc1String">The "before" document.</param>
        /// <param name="Doc2String">The "after" document.</param>
        /// <returns>The difference document, or null if there is no difference.</returns>
        public static XmlDocument FindDocumentDifference(string Doc1String, string Doc2String)
        {
            XmlDocument doc1 = new XmlDocument(), doc2 = new XmlDocument();

            //Sometimes the string starts with a character (typically a byte order mark, U+FEFF) that can't be loaded into an xml document, so we get rid of it here.
            //The character is needed for deserialization, but since the xml difference will never be deserialized, this is the best place to eliminate it.
            try
            {
                doc1.LoadXml(Doc1String);
            }
            catch
            {
                doc1.LoadXml(Doc1String.Substring(1));
            }
            try
            {
                doc2.LoadXml(Doc2String);
            }
            catch
            {
                doc2.LoadXml(Doc2String.Substring(1));
            }

            return FindDocumentDifference(doc1, doc2);
        }

        /// <summary>
        /// Finds the difference between 2 xml documents.
        /// </summary>
        /// <param name="Doc1"></param>
        /// <param name="Doc2"></param>
        /// <returns>The difference document, or null if the two documents are the same.</returns>
        public static XmlDocument FindDocumentDifference(XmlDocument Doc1, XmlDocument Doc2)
        {
            XmlDocument doc = new XmlDocument();
            doc.AppendChild(doc.CreateXmlDeclaration("1.0", "utf-16", null));
            XmlNode Differences = FindNodeDifference(Doc1.SelectSingleNode("/*").PutPrimaryKeyFirst(), Doc2.SelectSingleNode("/*").PutPrimaryKeyFirst(), doc);
            if (Differences != null)
            {
                doc.AppendChild(Differences);
                return doc;
            }

            //We want to return null to indicate that there's no difference, such that it's easier to check on the database side, rather than having to inspect the first child.
            return null;
        }

        /// <summary>
        /// Creates an xml node containing all the differences between Node1 and Node2.
        /// When assembled, this creates another xml document containing only the differences.
        /// NOTE: Node1 and Node2 must be of the SAME TYPE!
        /// NOTE: Node1 and Node2 must have the SAME PRIMARY KEY (IF IT EXISTS)!
        /// </summary>
        /// <param name="Node1">The node in its "before" state; null is treated as a node containing "NULL".</param>
        /// <param name="Node2">The node in its "after" state; null is treated as a node containing "NULL".</param>
        /// <param name="doc">The result document that owns every node created here.</param>
        /// <returns>The difference node, or null if there is no difference.</returns>
        public static XmlNode FindNodeDifference(XmlNode Node1, XmlNode Node2, XmlDocument doc)
        {
            XmlNode Before = null, After = null, Difference;
            string NodeName = Node1 != null ? Node1.Name : (Node2 != null ? Node2.Name : "UNKNOWN");
            Difference = doc.CreateNode(XmlNodeType.Element, NodeName, null);

            //There must be a text in between, so a missing node is replaced by a placeholder whose inner text is "NULL".
            if (Node1 == null)
            {
                Node1 = doc.CreateNode(XmlNodeType.Element, NodeName, null);
                Node1.InnerText = "NULL";
            }
            if (Node2 == null)
            {
                Node2 = doc.CreateNode(XmlNodeType.Element, NodeName, null);
                Node2.InnerText = "NULL";
            }

            //The most basic case: at least one of the nodes holds only text (or CDATA), i.e. it is a leaf.
            if (!IsContainer(Node1) || !IsContainer(Node2))
            {
                //No difference.
                if (Node1.FirstChild != null && Node2.FirstChild != null && Node1.FirstChild.Value == Node2.FirstChild.Value)
                {
                    //Both name and content of the node are the same; otherwise it would show up as before and after.
                    if (Difference.Name != REQUIRED_NODE_NAME)
                        return null;
                    //There's no difference here, but we want to save it anyway, because this is identifying information that's friendlier to the user.
                    Difference.InnerText = Node1.FirstChild.Value;
                }
                //Both nodes are of type <XXX/>, so the difference is nothing.
                else if (Node1.FirstChild == null && Node2.FirstChild == null)
                {
                    return null;
                }
                //Found a difference.
                else
                {
                    Before = doc.CreateNode(XmlNodeType.Element, "before", null);
                    After = doc.CreateNode(XmlNodeType.Element, "after", null);

                    Before.InnerXml = Node1.InnerXml;
                    After.InnerXml = Node2.InnerXml;
                    Difference.AppendChild(Before);
                    Difference.AppendChild(After);
                }
            }
            //If both nodes are containers.
            else
            {
                Queue<XmlNodePair> ComparisonQueue = PairUpMatchingContainers(Node1, Node2, doc);
                XmlNode PrimaryKeyNode = doc.CreateNode(XmlNodeType.Element, JOIN_KEY, null);

                //Node1 and Node2 should have the same primary key, so it really doesn't matter which one is used.
                XmlNode SourceKey = Node1.GetNode(JOIN_KEY) ?? Node2.GetNode(JOIN_KEY);
                PrimaryKeyNode.InnerText = SourceKey != null ? SourceKey.InnerText : "ANON";

                while (ComparisonQueue.Count != 0)
                {
                    XmlNodePair DifferencePair = ComparisonQueue.Dequeue();
                    XmlNode ChildDifference = FindNodeDifference(DifferencePair.Node1, DifferencePair.Node2, doc);
                    //Add the difference if it's not null; if null, add nothing.
                    if (ChildDifference != null)
                        Difference.AppendChild(ChildDifference);
                }
                //If all of the children of this node are the same, and hence there is no difference, return null.
                if (!IsContainer(Difference))
                    return null;

                //Preserve the primary key by placing it at the beginning of the list.
                Difference.InsertBefore(PrimaryKeyNode, Difference.FirstChild);
            }
            //If there's a difference, return it.
            return Difference;
        }

        /// <summary>
        /// Pairs up the child elements of the two nodes that have the same ID.
        /// If no ID is found, pairs them up based on their name and their position among same-named siblings.
        /// The ID is the child element named JOIN_KEY of each child (i.e. the grandchild relative to Node1).
        /// This pairing works for one level ONLY! It does not explore sub-nodes.
        /// </summary>
        /// <param name="Node1">The "before" container.</param>
        /// <param name="Node2">The "after" container.</param>
        /// <param name="doc">The result document, used to create placeholder nodes.</param>
        /// <returns>A queue of node pairs to be compared.</returns>
        private static Queue<XmlNodePair> PairUpMatchingContainers(XmlNode Node1, XmlNode Node2, XmlDocument doc)
        {
            Dictionary<string, XmlNodePair> Buckets = new Dictionary<string, XmlNodePair>();

            //Counts how many times each element name has been seen. It must be created once per container, not once per child, otherwise every count would be 0.
            Dictionary<string, int> BucketedCounter = new Dictionary<string, int>();

            //Go through the children of the first node.
            foreach (XmlNode Child in Node1.ChildNodes)
            {
                //Only elements can be paired; skip text, CDATA, comments, etc.
                if (Child.NodeType != XmlNodeType.Element)
                    continue;

                string NodeKey = GetNodeKey(Child, BucketedCounter);
                XmlNodePair temp = new XmlNodePair();
                //The first node is the child.
                temp.Node1 = Child;
                //Until a match is found, assume the node exists in Node1 but not in Node2: use a placeholder whose inner text is "NULL".
                temp.Node2 = doc.CreateNode(XmlNodeType.Element, Child.Name, Node1.NamespaceURI);
                temp.Node2.InnerText = "NULL";
                //Adds to the dictionary.
                Buckets[NodeKey] = temp;
            }

            //Reset the counter, then go through the children of the second node.
            BucketedCounter = new Dictionary<string, int>();
            foreach (XmlNode Child in Node2.ChildNodes)
            {
                if (Child.NodeType != XmlNodeType.Element)
                    continue;

                string NodeKey = GetNodeKey(Child, BucketedCounter);
                XmlNodePair temp = new XmlNodePair();
                //The second node is the child.
                temp.Node2 = Child;
                //The FIRST node should be the one from the dictionary, if it exists.
                XmlNodePair Match;
                if (Buckets.TryGetValue(NodeKey, out Match))
                {
                    temp.Node1 = Match.Node1;
                }
                //If not, the node exists in Node2 but not in Node1: use a placeholder whose inner text is "NULL".
                else
                {
                    temp.Node1 = doc.CreateNode(XmlNodeType.Element, Child.Name, Node1.NamespaceURI);
                    temp.Node1.InnerText = "NULL";
                }
                //Adds to the dictionary.
                Buckets[NodeKey] = temp;
            }

            //Puts the dictionary into the queue.
            Queue<XmlNodePair> ProcessingOrder = new Queue<XmlNodePair>();
            foreach (XmlNodePair Pair in Buckets.Values)
            {
                ProcessingOrder.Enqueue(Pair);
            }

            return ProcessingOrder;
        }

        /// <summary>
        /// Builds the dictionary key used to pair up a child node, and updates the per-name occurrence counter.
        /// </summary>
        /// <param name="Child">The child element.</param>
        /// <param name="BucketedCounter">How many times each element name has been seen so far.</param>
        /// <returns>"Name:PrimaryKey" if the child has a primary key, "Name:CountN" otherwise.</returns>
        private static string GetNodeKey(XmlNode Child, Dictionary<string, int> BucketedCounter)
        {
            //Figure out how many times we have seen this same node name. We need this in order to do comparison where no primary key is given.
            int Counter;
            BucketedCounter.TryGetValue(Child.Name, out Counter);
            BucketedCounter[Child.Name] = Counter + 1;

            //If the node has a primary key, use this key to generate the dictionary key.
            XmlNode PrimaryKey = Child.GetNode(JOIN_KEY);
            if (PrimaryKey != null)
                return Child.Name + ":" + PrimaryKey.InnerText;

            //Otherwise the key is based on the position and the name of the node.
            //Notice the ":Count": this keeps counters separate from real primary keys, in case a primary key is the same as a counter.
            return Child.Name + ":Count" + Counter;
        }

        /// <summary>
        /// Checks if an xml node is a container.
        /// A container is a node that contains any other node types besides text (CDATA and whitespace count as text).
        /// </summary>
        /// <param name="Node">The node to be checked.</param>
        /// <returns>True if it's a container, false otherwise.</returns>
        public static bool IsContainer(XmlNode Node)
        {
            foreach (XmlNode TestNode in Node.ChildNodes)
            {
                switch (TestNode.NodeType)
                {
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        continue;
                    default:
                        return true;
                }
            }
            return false;
        }



    }
}
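To see the leaf rules (3 to 5) in isolation, here is a deliberately flattened sketch: it compares only the direct children of the root, matches them by element name, and ignores elements that exist only in the after document. It is not a replacement for the class above, which recurses and pairs by primary key; `FlatDifferenceSketch`, `Diff` and the `alwaysKept` parameter are hypothetical names, and it writes plain text rather than CDATA.

```csharp
using System.Xml;

// Hypothetical one-level sketch: no recursion and no primary-key pairing.
public static class FlatDifferenceSketch
{
    public static XmlDocument Diff(XmlDocument before, XmlDocument after,
                                   string joinKey, string alwaysKept)
    {
        var result = new XmlDocument();
        XmlElement root = result.CreateElement(before.DocumentElement.Name);
        bool changed = false;

        foreach (XmlNode oldChild in before.DocumentElement.ChildNodes)
        {
            if (oldChild.NodeType != XmlNodeType.Element)
                continue;

            // Match by element name only (first element with that name in the after document).
            XmlNode newChild = after.DocumentElement.SelectSingleNode(oldChild.Name);
            string oldText = oldChild.InnerText.Trim();
            string newText = newChild == null ? "NULL" : newChild.InnerText.Trim();

            if (oldChild.Name == joinKey || oldChild.Name == alwaysKept)
            {
                // Rules 3 and 4: kept whether or not it changed.
                XmlElement kept = result.CreateElement(oldChild.Name);
                kept.InnerText = oldText;
                root.AppendChild(kept);
            }
            else if (oldText != newText)
            {
                // Rule 5: a changed leaf gets <before> and <after> sub-nodes.
                XmlElement diff = result.CreateElement(oldChild.Name);
                XmlElement b = result.CreateElement("before");
                XmlElement a = result.CreateElement("after");
                b.InnerText = oldText;
                a.InnerText = newText;
                diff.AppendChild(b);
                diff.AppendChild(a);
                root.AppendChild(diff);
                changed = true;
            }
        }

        if (!changed)
            return null; // like FindDocumentDifference, null means "no difference"
        result.AppendChild(root);
        return result;
    }
}
```

Called with the Before and After documents from the top of this post, "PrimaryKey" as the join key and "SavedInformation" as the always-kept node, it returns an Object element holding PrimaryKey, the Data1 before/after pair and SavedInformation, and it returns null when both documents are identical.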

Friday, August 7, 2009

XML Serialization and Deserialization made easy.

In my previous post I described the problem with passing a local object as a web method argument. I briefly mentioned that there are two ways to resolve the problem. Well, here's the other one.

Using the following extension methods, you can get an object's XML representation (as a string) simply by calling ".Serialize()".

Getting the same object back is extremely easy as well: simply call ".DeSerialize(OriginalSample)", where OriginalSample is an instance created with the target type's default constructor (which you must have for .NET XML serialization anyway).

This also helps with the web method call problem (converting a local object to its proxy class): serialize the object to XML and then deserialize it into the desired proxy type. For example: "WebService.WebMethod(ObjectA.Serialize().DeSerialize(new Proxy.ObjectA()));". If the proxy type uses a different XML namespace, use the overloads that take a Namespace argument.

This is the code you need:



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Serialization;
using System.IO;
using System.Xml;

namespace XMLSerializerHelperClasses
{
    /// <summary>
    /// This extension helps to easily and intuitively serialize objects to xml and deserialize them back to objects.
    /// </summary>
    public static class Extension
    {
        /// <summary>
        /// XML serializes the given object.
        /// </summary>
        /// <param name="O">The object to be serialized.</param>
        /// <returns>An xml representation of the serialized object.</returns>
        public static string Serialize(this object O)
        {
            //Initial preparation for the object serialization. We instantiate the serializer, the memory stream and the text (xml) writer.
            XmlSerializer Serializer = GetSerializer(O);

            using (MemoryStream MemoryStream = new MemoryStream())
            {
                //Creates a unicode textwriter to the memory stream.
                XmlTextWriter TextWriter = new XmlTextWriter(MemoryStream, Encoding.Unicode);

                //Serialize the object into the memory stream using the textwriter.
                Serializer.Serialize(TextWriter, O);
                TextWriter.Flush();

                //Returns the string representation of the object.
                //NOTE: The string starts with the byte order mark (U+FEFF) that Encoding.Unicode writes to the stream, and it must be trimmed for proper functioning.
                return Encoding.Unicode.GetString(MemoryStream.ToArray(), 0, (int)MemoryStream.Length).Substring(1);
            }
        }

        /// <summary>
        /// XML serializes the given object into the given XML namespace.
        /// </summary>
        /// <param name="O">The object to be serialized.</param>
        /// <param name="Namespace">The namespace of the object, for SOA.</param>
        /// <returns>An xml representation of the serialized object.</returns>
        public static string Serialize(this object O, string Namespace)
        {
            //Initial preparation for the object serialization. We instantiate the serializer, the memory stream and the text (xml) writer.
            XmlSerializer Serializer = new XmlSerializer(O.GetType(), Namespace);

            using (MemoryStream MemoryStream = new MemoryStream())
            {
                //Creates a unicode textwriter to the memory stream.
                XmlTextWriter TextWriter = new XmlTextWriter(MemoryStream, Encoding.Unicode);

                //Serialize the object into the memory stream using the textwriter.
                Serializer.Serialize(TextWriter, O);
                TextWriter.Flush();

                //Returns the string representation of the object.
                //NOTE: The string starts with the byte order mark (U+FEFF) that Encoding.Unicode writes to the stream, and it must be trimmed for proper functioning.
                return Encoding.Unicode.GetString(MemoryStream.ToArray(), 0, (int)MemoryStream.Length).Substring(1);
            }
        }

        /// <summary>
        /// XML deserializes the given string.
        /// </summary>
        /// <param name="O">The xml string to be deserialized.</param>
        /// <param name="TargetObject">An instance of the target type; only its type is used.</param>
        /// <returns>An object representation of the xml serialized object.</returns>
        public static T DeSerialize<T>(this string O, T TargetObject)
        {
            //Initial preparation for the object deserialization: we instantiate the serializer for the target type.
            XmlSerializer Serializer = GetSerializer(TargetObject);

            using (MemoryStream MemoryStream = new MemoryStream())
            {
                //Converts the string to a unicode stream.
                XmlTextWriter TextWriter = new XmlTextWriter(MemoryStream, Encoding.Unicode);
                TextWriter.WriteRaw(O);
                TextWriter.Flush();
                MemoryStream.Seek(0, SeekOrigin.Begin);
                StreamReader Reader = new StreamReader(MemoryStream, Encoding.Unicode);
                //DeSerialize the unicode reader.
                return (T)Serializer.Deserialize(Reader);
            }
        }

        /// <summary>
        /// XML deserializes the given string, using the given XML namespace.
        /// </summary>
        /// <param name="O">The xml string to be deserialized.</param>
        /// <param name="TargetObject">An instance of the target type; only its type is used.</param>
        /// <param name="Namespace">The target object's namespace, usually "http://tempuri.org/"</param>
        /// <returns>An object representation of the xml serialized object.</returns>
        public static T DeSerialize<T>(this string O, T TargetObject, string Namespace)
        {
            //Initial preparation for the object deserialization: we instantiate the serializer for the target type and namespace.
            XmlSerializer Serializer = new XmlSerializer(TargetObject.GetType(), Namespace);

            using (MemoryStream MemoryStream = new MemoryStream())
            {
                //Converts the string to a unicode stream.
                XmlTextWriter TextWriter = new XmlTextWriter(MemoryStream, Encoding.Unicode);
                TextWriter.WriteRaw(O);
                TextWriter.Flush();
                MemoryStream.Seek(0, SeekOrigin.Begin);
                StreamReader Reader = new StreamReader(MemoryStream, Encoding.Unicode);
                //DeSerialize the unicode reader.
                return (T)Serializer.Deserialize(Reader);
            }
        }

        /// <summary>
        /// Creates an XmlSerializer for the runtime type of the given object.
        /// </summary>
        /// <param name="O">The object whose type is to be (de)serialized.</param>
        /// <returns>The serializer (never null).</returns>
        private static XmlSerializer GetSerializer(Object O)
        {
            return new XmlSerializer(O.GetType());
        }

        /// <summary>
        /// Clones an object using the serialization/deserialization technique.
        /// NOTE: All limitations pertaining to xml serialization/deserialization apply here: objects must have a default constructor and public members.
        /// </summary>
        /// <typeparam name="T">The type of the object.</typeparam>
        /// <param name="Object">The object to be cloned.</param>
        /// <returns>A deep copy of the object.</returns>
        public static T Clone<T>(this T Object)
        {
            return Object.Serialize().DeSerialize(Object);
        }
    }
}
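If you don't need the byte-level control of the MemoryStream version, a common alternative is to serialize into a StringWriter and read back with a StringReader. Because no byte stream is involved, there is no byte order mark to trim. This is a minimal sketch with a hypothetical Person type, not a drop-in replacement for the extension methods above: it is generic over the declared type instead of taking a sample object.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical sample type: XmlSerializer needs a public type with a public parameterless constructor.
public class Person
{
    public string Name;
    public int Age;
}

public static class StringSerializationSketch
{
    // Serializes into a StringWriter, so there is no stream, no encoding preamble and no Substring(1).
    public static string Serialize<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    // Reads the xml string back into a new instance of T.
    public static T Deserialize<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

Deserializing into a different (proxy) type should work the same way, as long as the root element name and XML namespace match what the proxy type expects, which is what the Namespace overloads above are for.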

ASPX Web Method: Same Class, Different Domains

Say you try to pass an object as a parameter to a web service. For simplicity's sake, call this object, object A. At the same time, you also have object A referenced from a local binary.

When you try the following: WebServiceX.RunWebMethod(A); it bombs. As it turns out, A and A are different, since they come from different domains. The first A is a proxy class, created when you reference the web service in Visual Studio; the other A is the actual class whose binary exists locally. C# is strongly typed, so it rejects one in place of the other.

There are two ways around this restriction: make a deep copy of object A via reflection, or via XML serialization and deserialization. In this post I'll cover the first method, via reflection.

So after adding the following code snippet, you can call the web service this way: WebServiceX.RunWebMethod(A.CopyTo(new AnotherDomain.A()));


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Reflection;

namespace ReflectionHelpers
{
    public static class DeepCopyUtils
    {


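        /// <summary>
        /// CopyTo_Base can't be generic, but this generic wrapper around it allows the result to be type cast to the target type.
        /// </summary>
        /// <typeparam name="TSource">The type of the source object.</typeparam>
        /// <typeparam name="TTarget">The type of the target object.</typeparam>
        /// <param name="Source">The object to copy from.</param>
        /// <param name="Target">An instance of the target type, e.g. a freshly constructed proxy object.</param>
        /// <returns>A deep copy of Source, of type TTarget.</returns>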
        public static TTarget CopyTo<TSource, TTarget>(this TSource Source, TTarget Target)
        {
            return (TTarget) Source.CopyTo_Base(Target);
        }

        /// <summary>
        /// Make a deep copy of source to target type of object.
        /// NOTE: Unfortunately the 2nd argument cannot be generic as that would cause problems when creating linq anonymous type.
        /// </summary>
        /// <param name="Source">The source</param>
        /// <param name="Target">The target object.</param>
        /// <returns>An object of type Target.</returns>
        private static Object CopyTo_Base(this Object Source, Object Target)
        {
            //The requisite null case scenario.
            if (Source == null)
                return null; //If the source is null, then the target will have a null value also, simple.
            else if (Target == null && Source != null)
                Target = Source; //If the target is null but the source is not, use the source as the template, so this function falls through and returns a copy of the source.


            Type BType = Target.GetType();
            Type AType = Source.GetType();
            //Members or properties.
            MemberInfo[] BProperties = ((MemberInfo[])BType.GetProperties()).Union((MemberInfo[])BType.GetFields()).ToArray();
            MemberInfo[] AProperties = ((MemberInfo[])AType.GetProperties()).Union((MemberInfo[])AType.GetFields()).ToArray();
            
            //Creates a new target object.
            Object Result;
            ConstructorInfo DefaultConstructor = Target.GetType().GetConstructor(Type.EmptyTypes);
            if (DefaultConstructor != null)
                Result = DefaultConstructor.Invoke(new Object[0]);
            else //No default constructor: it may be a semi-primitive, e.g. a string or a value type, so we can do nothing more than return the source as-is.
                return Source;
            

            //Find all similar property names between Source and Target, put them to a list, along with the content of the Source.
            var x = from b in BProperties
                    join a in AProperties 
                    on new{
                        Name = b.Name,
                        TypeName = b.GetMemberType().Name
                    }
                    equals
                    new {
                        Name = a.Name,
                        TypeName = a.GetMemberType().Name
                    }
                    select new
                    {                        
                        //The name of the property.
                        ID = a.Name,
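                        //3 cases for content: 1) Array, 2) Reference type (deep copied recursively) and 3) Value type (copied as-is).
                        //NOTE: GetMemberType() is needed here; a.GetType() would be the type of the MemberInfo itself.
                        Content = a.GetMemberType().IsArray ? ((System.Array)a.GetValue(Source)).CopyTo((System.Array)b.GetValue(Target)) :
                                    !a.GetMemberType().IsValueType ? a.GetValue(Source).CopyTo(b.GetValue(Target)) :
                                    a.GetValue(Source)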
                                  
                    };


            Dictionary<string, object> MergeResult = new Dictionary<string, object>();

            foreach (var y in x)
            {
                MergeResult[y.ID] = y.Content;
            }

            //Copy if the same name exists in the dictionary as in the result.
            foreach (var z in ((MemberInfo[]) Result.GetType().GetProperties()).Union((MemberInfo[]) Result.GetType().GetFields()))
            {
                if (MergeResult.Keys.Contains(z.Name))
                {
                    //Read-only properties can't be set, so they are skipped.
                    if (z is PropertyInfo && ((PropertyInfo)z).CanWrite)
                        ((PropertyInfo)z).SetValue(Result, MergeResult[z.Name], null);
                    else if (z is FieldInfo)
                        ((FieldInfo)z).SetValue(Result, MergeResult[z.Name]);
                }
            }

            return Result;
        }
        /// <summary>
        /// Retrieves the value of a member, if the member is a property or a field.
        /// NOTE: Only works for fields and properties; anything else returns null.
        /// </summary>
        /// <param name="Member">The member to read.</param>
        /// <param name="ObjectInstance">The object to read the member from.</param>
        /// <returns>The value of the member, or null.</returns>
        public static object GetValue(this MemberInfo Member, object ObjectInstance)
        {
            if (Member.MemberType == MemberTypes.Property)
                return ((PropertyInfo)Member).GetValue(ObjectInstance, new object[] { });
            else if (Member.MemberType == MemberTypes.Field)
                return ((FieldInfo)Member).GetValue(ObjectInstance);
            return null;
        }

        /// <summary>
        /// Returns the type of a field or property, or the return type of a method.
        /// e.g. for int a = 0; it returns typeof(int).
        /// </summary>
        /// <param name="Member">The member to inspect.</param>
        /// <returns>The type of the field/property, or the return type of the method.</returns>
        public static Type GetMemberType(this MemberInfo Member)
        {
            if (Member.MemberType == MemberTypes.Property)
                return ((PropertyInfo)Member).PropertyType;
            else if (Member.MemberType == MemberTypes.Field)
                return ((FieldInfo)Member).FieldType;
            else if (Member.MemberType == MemberTypes.Method)
                return ((MethodInfo)Member).ReturnType;
            return null;
        }

        /// <summary>
        /// Copies one array into a new array, element by element (each element is copied with CopyTo).
        /// </summary>
        /// <param name="Source">The array to copy from.</param>
        /// <param name="Target">A sample target array; only as many elements as both arrays hold are copied.</param>
        /// <returns>A new array holding the copied elements.</returns>
        public static System.Array CopyTo(this System.Array Source, System.Array Target)
        {
            //The requisite null case scenario.
            if (Source == null)
                return null; //If the source is null, then the target will have a null value also, simple.
            else if (Target == null && Source != null)
                Target = Source; //If the target is null but the source is not, use the source as the template and fall through.

            LinkedList<object> CopyResult = new LinkedList<object>();
            for (int i = 0; i < Min(Source.GetLength(0), Target.GetLength(0)); i++)
            {
                CopyResult.AddLast(Source.GetValue(i).CopyTo(Target.GetValue(i)));
            }
            return (System.Array)CopyResult.ToArray();
        }

        public static int Min(int a, int b)
        {
            if (a <= b)
                return a;
            else
                return b;
        }
    }
}
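Stripped to its core, the reflection approach is: create an instance of the target type with its default constructor, then copy every public property whose name and type match. The sketch below does only that, and only shallowly (nested objects are shared rather than deep copied, and fields are skipped), with hypothetical LocalA and ProxyA classes standing in for the local type and its web service proxy.

```csharp
using System.Reflection;

// Hypothetical stand-ins for a local class and its web service proxy.
public class LocalA { public string Name { get; set; } public int Count { get; set; } }
public class ProxyA { public string Name { get; set; } public int Count { get; set; } }

public static class PropertyCopySketch
{
    // Creates a TTarget with its default constructor and copies every readable, non-indexed
    // public property of source into the writable public property with the same name and type.
    // Shallow: reference-typed values are shared, and fields are not copied.
    public static TTarget CopyByName<TTarget>(object source) where TTarget : new()
    {
        object target = new TTarget(); // boxed once, so value-type targets are updated too
        foreach (PropertyInfo from in source.GetType().GetProperties())
        {
            if (!from.CanRead || from.GetIndexParameters().Length != 0)
                continue; // skip write-only properties and indexers
            PropertyInfo to = typeof(TTarget).GetProperty(from.Name);
            if (to != null && to.CanWrite && to.PropertyType == from.PropertyType)
                to.SetValue(target, from.GetValue(source, null), null);
        }
        return (TTarget)target;
    }
}
```

This sketch requires the exact same property type on both sides. The code above matches members by type *name* instead, which is what lets it recurse into nested proxy types whose local and proxy versions are different types with the same name.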

Sunday, March 29, 2009

Spelled or Spoken Number to Decimal Conversion

Have you ever wanted to parse text that reads “one hundred thousand and twenty five point seven” into “100025.7”? Then the following code may be for you. I looked around the internet for some ready-made code that would do this, to no avail, so this might help others who are in the same situation I was in.

To use the class, put it in your namespace, instantiate it and call TranslateSentenceToString(“String with text to be parsed”). A comprehensive explanation of the code logic is given in the inline documentation.

This code also requires a string Replace method with an ignore-case option, which is included in the section after the parser class.
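Before the full parser, here is the basic idea in miniature. This is a hypothetical sketch that is much simpler than TextToNumericParser, not a piece of it: number words add to a running group, "hundred" multiplies the group, "thousand", "million" and "billion" close the group into the total, "and" is skipped, and every digit word after "point" becomes one decimal digit. `SpokenNumberSketch` and `Parse` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical minimal sketch of spoken-number parsing (English cardinals up to the billions).
public static class SpokenNumberSketch
{
    static readonly Dictionary<string, long> Small = new Dictionary<string, long>
    {
        {"zero", 0}, {"one", 1}, {"two", 2}, {"three", 3}, {"four", 4}, {"five", 5},
        {"six", 6}, {"seven", 7}, {"eight", 8}, {"nine", 9}, {"ten", 10}, {"eleven", 11},
        {"twelve", 12}, {"thirteen", 13}, {"fourteen", 14}, {"fifteen", 15}, {"sixteen", 16},
        {"seventeen", 17}, {"eighteen", 18}, {"nineteen", 19}, {"twenty", 20}, {"thirty", 30},
        {"forty", 40}, {"fifty", 50}, {"sixty", 60}, {"seventy", 70}, {"eighty", 80}, {"ninety", 90}
    };

    static readonly Dictionary<string, long> Scales = new Dictionary<string, long>
    {
        {"thousand", 1000}, {"million", 1000000}, {"billion", 1000000000}
    };

    public static decimal Parse(string text)
    {
        long total = 0, current = 0;
        string fraction = null; // digits after "point", collected as text
        foreach (string word in text.ToLowerInvariant().Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries))
        {
            long value;
            if (word == "and") continue;
            if (word == "point") { fraction = ""; continue; }

            if (fraction != null)
            {
                // After "point", each word must be a single digit.
                if (!Small.TryGetValue(word, out value) || value > 9)
                    throw new FormatException("Expected a digit word after 'point': " + word);
                fraction += value.ToString(CultureInfo.InvariantCulture);
            }
            else if (Small.TryGetValue(word, out value)) current += value;       // add to the group
            else if (word == "hundred") current *= 100;                        // scale the group
            else if (Scales.TryGetValue(word, out value)) { total += current * value; current = 0; } // close the group
            else throw new FormatException("Unknown number word: " + word);
        }

        string digits = (total + current).ToString(CultureInfo.InvariantCulture);
        if (!string.IsNullOrEmpty(fraction))
            digits += "." + fraction;
        return decimal.Parse(digits, CultureInfo.InvariantCulture);
    }
}
```

For example, SpokenNumberSketch.Parse("one hundred thousand and twenty five point seven") returns 100025.7m, and hyphenated words such as "twenty-one" are split before lookup.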






using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Utils;

namespace YourNameSpace
{
public class TextToNumericParser
{
public static readonly int TOKEN_INDEX_DEFAULT = -1;
public static readonly int MAX_SEARCH_COUNTER = 1000; //If the parsing is falling into a cycle, this is an insurance, to force the class to quit.

private int _SearchCounter = 0;

public bool SearchMixture = false;

Dictionary<string, long> Specials = new Dictionary<string,long>();
Dictionary<string, long> Elements = new Dictionary<string, long>();
Dictionary<string, long> Exponents = new Dictionary<string, long>();
List<string> WhiteSpaces = new List<string>();

//We keep the LeadingZeroCount as a global variable. This works because this recursion is a straight line. It doesn't branch, but a more suitable solution may be needed.
int LeadingZeroCount = 0;

//List<string> Punctuations = new List<string>(); Not needed, we won't ignore punctuation. Eg. one hundred. two = 100 . 2

private static readonly char Seperator = '\a';

private int CurrentTokenIndex = -1;
private string[] TokensToBeParsed;

private void TickSearchCounter()
{
_SearchCounter++;
if(_SearchCounter>=MAX_SEARCH_COUNTER)
throw new Exception("Search taking too long, aborting");
}

/// <summary>
/// Retrieves the next token to be parsed, or null if it reaches the last token.
/// </summary>
public string NextToken
{
get
{
CurrentTokenIndex++;
//Increment the search counter, and make sure it hasn't hit the ceiling yet.
TickSearchCounter();
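//Return null once we run past either end of the tokens (or if no text has been set).
if (TokensToBeParsed == null || CurrentTokenIndex < 0 || CurrentTokenIndex >= TokensToBeParsed.Length)
return null;
return TokensToBeParsed[CurrentTokenIndex];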
}
}

/// <summary>
/// Split the text to be parsed into tokens - to make it easier.
/// </summary>
public string TextToBeParsed
{
//get
//{
// return _TextToBeParsed;
//// On second thought we don't want the user to see the text to be parsed, because it'll be marked with the separator.
//}
set
{
string _TextToBeParsed = "";
_TextToBeParsed = value;
foreach(string special in Specials.Keys)
{
_TextToBeParsed = _TextToBeParsed.Replace(special + " ", Seperator + special + Seperator + " ", true);
}
foreach(string element in Elements.Keys)
{
_TextToBeParsed = _TextToBeParsed.Replace(element + " ", Seperator + element + Seperator + " ", true);
}
foreach (string exponent in Exponents.Keys)
{
_TextToBeParsed = _TextToBeParsed.Replace(exponent + " ", Seperator + exponent + Seperator + " ", true);
}

TokensToBeParsed = _TextToBeParsed.Split(new char[] { Seperator });
}
}

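/// <summary>
/// Adds all the discrete tokens that may be parsed, with SearchMixture turned off.
/// </summary>
public TextToNumericParser() : this(false)
{
}

/// <summary>
/// Adds all the discrete tokens that may be parsed.
/// NOTE: SearchMixture must be decided here, because the digit tokens are only registered at construction time.
/// </summary>
/// <param name="SearchMixture">If true, the digits 0-9 are also treated as numeric tokens.</param>
public TextToNumericParser(bool SearchMixture)
{
this.SearchMixture = SearchMixture;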
//Special tokens: e.g. "a" in "a thousand dollars" translates to 1,000, but it is special because it is only numeric when followed by an exponent; e.g. "a plane" is not numeric.
Specials.Add("a",1);
//Specials.Add("and", 0); //this is not a numeric but may be used as a conjunction **between two numbers** to imply an addition.
Specials.Add("point", 0); //this is not a numeric but may be used to denote a floating point.

//Elements are the most basic numeric.
Elements.Add("zero", 0);
Elements.Add("one", 1);
Elements.Add("two", 2);
Elements.Add("three",3);
Elements.Add("four",4);
Elements.Add("five",5);
Elements.Add("six",6);
Elements.Add("seven",7);
Elements.Add("eight",8);
Elements.Add("nine",9);
Elements.Add("ten",10);

//If SearchMixture is enabled, the parser will also accept numbers already written as digits, so that for example "two hundred thousand . 5" can be parsed to 200,000.5
if (SearchMixture)
{
Elements.Add("0", 0);
Elements.Add("1", 1);
Elements.Add("2", 2);
Elements.Add("3", 3);
Elements.Add("4", 4);
Elements.Add("5", 5);
Elements.Add("6", 6);
Elements.Add("7", 7);
Elements.Add("8", 8);
Elements.Add("9", 9);
}
Elements.Add("eleven",11);
Elements.Add("twelve",12);
Elements.Add("thirteen",13);
Elements.Add("fourteen", 14);
Elements.Add("fifteen", 15);
Elements.Add("sixteen", 16);
Elements.Add("seventeen", 17);
Elements.Add("eighteen", 18);
Elements.Add("nineteen", 19);
Elements.Add("twenty", 20);
Elements.Add("thirty", 30);
Elements.Add("forty", 40);
Elements.Add("fifty", 50);
Elements.Add("sixty", 60);
Elements.Add("seventy", 70);
Elements.Add("eighty", 80);
Elements.Add("ninety", 90);

//These are the exponents (orders of magnitude).
Exponents.Add("hundred", 100);
Exponents.Add("thousand", 1000);
Exponents.Add("million", (long) Math.Pow(10,6));
Exponents.Add("billion", (long)Math.Pow(10, 9));
Exponents.Add("trillion", (long) Math.Pow(10, 12));

//White spaces. The existence of white spaces should be ignored when parsing a number.
WhiteSpaces.Add("");
WhiteSpaces.Add("\n");
WhiteSpaces.Add("\t");
WhiteSpaces.Add("\r");
WhiteSpaces.Add("\v");

}

/// <summary>
/// Parse all numbers and concat all the tokens result into a string.
/// </summary>
/// <param name="TextToParse">The text that needs parsing.</param>
/// <returns>A string translation with all the spelled numeric turned into numbers.</returns>
public string TranslateToString(string TextToParse)
{
List<string> ParsedText = TranslateSentence(TextToParse);
string TranslatedText = "";
foreach (string Text in ParsedText)
{
TranslatedText += Text;
}
return TranslatedText;
}

/// <summary>
/// Parse spelled out numbers in the sentence.
/// </summary>
/// <param name="TextToParse">The text that needs parsing.</param>
/// <returns>All the tokens that resulted from the parsing.</returns>
public List<string> TranslateSentence(string TextToParse)
{
TextToBeParsed = TextToParse;
return TranslateSentence();
}

/// <summary>
/// You must set the text to be parsed first.
/// NOTE: TODO. this will not detect one hundred - thousand ...
/// </summary>
/// <returns>The tokenized translation</returns>
private List<string> TranslateSentence()
{
List<string> Translation = new List<string>();
//Reset the current index.
CurrentTokenIndex = -1;

for(string Token = NextToken; CurrentTokenIndex < TokensToBeParsed.Length; Token = NextToken)
//while (CurrentTokenIndex < TokensToBeParsed.Length)
{
if (MayBeNumeric(Token))
Translation.Add(TranslateMayBeDecimal(Token));
//Definitely could not be numeric.
else
Translation.Add(Token);
}
//Reset the current index.
CurrentTokenIndex = -1;

return Translation;
}

/// <summary>
/// Translate a decimal (made of 2 integers, seperated by a "point")
/// </summary>
/// <param name="Token"></param>
/// <returns></returns>
private string TranslateMayBeDecimal(string Token)
{
//return null;
string ReturnToken = Token;

long? NumericString = GetInteger(Token);

double? FractionString = GetFraction(NextToken);

if (NumericString != null)
{
if (FractionString != null)
ReturnToken = (NumericString + FractionString).ToString();
else
ReturnToken = NumericString.ToString();
}
//No integer part, but a fraction such as "point five".
else if (FractionString != null)
ReturnToken = FractionString.ToString();

//When neither part could be translated, GetInteger and GetFraction have both backtracked to before this token,
//so we step over it here. This way the method ALWAYS consumes the token, returning it unchanged if it is not a decimal.
if (NumericString == null && FractionString == null)
CurrentTokenIndex++;
return ReturnToken;
}

/// <summary>
/// Gets the integer portion of a decimal number.
/// </summary>
/// <param name="Token">The token that needs to be translated into an integer.</param>
/// <returns>The integer translation of the so said number.</returns>
private long? GetInteger(string Token)
{

long? exp, Temp, nextNumber, thisNumber;

if (Token != null) //If this token is null, ie, if the previous call to "NextToken" went over the number of tokens, then we do nothing.
{

//Continue traversing while it's a white space.
if (Token.Trim().Length == 0)
{
long? Number = GetInteger(NextToken);
//If at the end of the whitespace, we reached a number, then return it.
if (Number != null)
return Number;
}
else if (Token.Trim().ToLower() == "a")
{
exp = GetExponent(NextToken);
//If the "a" is followed by an exponent, then this "a" is equivalent to the number "1" and we treat it as such.
if (exp != 1)
{
nextNumber = GetInteger(NextToken);

//For example a hundred thousand and twelve, we want to add the "100,000" with the "12"
if (nextNumber != null)
return exp + nextNumber;
else
return exp;
}
}
else if (Token.ToLower().Trim() == "and")
{
Temp = GetInteger(NextToken);
//If the and is followed by a number, return that number, otherwise we want to spit the and back out, because we can't process this.
if (Temp != null)
return Temp;
}
else if (Elements.Keys.Contains(Token.ToLower().Trim()))
{
//Get thisNumber whether it's spelled out, or in digit. - See constructor.
thisNumber = Elements[Token.ToLower().Trim()];
//Try to see if the number is followed by an exponent.
exp = GetExponent(NextToken);
//Get the number that follows the exponent (if there's one)
nextNumber = GetInteger(NextToken);

if (exp.Value != 1 || Elements[Token.ToLower().Trim()] >= 10)//If the number has an exponent component, or is 10 or more (so it cannot be one of a run of spelled-out digits), combine it with the exponent and whatever number follows.
{
if (nextNumber != null)
return thisNumber * exp + nextNumber;
else
return thisNumber * exp;
}
else //Otherwise we want to interpret it as a number whose digits are spelled out one by one, e.g. "one zero five".
{
long? ReturnTemp = 0;
//Multiply this digit by 10 to the power of LeadingZeroCount, to restore the zeros lost when the following digits were parsed as a long, then append those digits.
if (nextNumber != null)
ReturnTemp = long.Parse((thisNumber * Math.Pow(10, LeadingZeroCount)).ToString() + nextNumber.ToString());
else
ReturnTemp = thisNumber;
//We need to count any leading zero to "borrow" it; otherwise zeros would disappear on the way back from the recursion, when the string is cast to long.
if (thisNumber == 0)
LeadingZeroCount++;
//If this number is not zero, then it should be accounted for above.
else
LeadingZeroCount = 0;
//Return the number to the caller.
return ReturnTemp;
}
}

}
//Cannot consume the token, so we need to decrement the index back to the previous state.
CurrentTokenIndex--;

return null;
}

/// <summary>
/// This is used to get the fraction part ie. the part that follows after a "." (dot) or "point"
/// </summary>
/// <param name="Token"></param>
/// <returns></returns>
private double? GetFraction(string Token)
{
if (Token != null)
{
//Continue traversing while it's a white space.
if (Token.Trim().Length == 0)
{
double? Fraction = GetFraction(NextToken);
//If at the end of the whitespace, we reached number;
if (Fraction != null)
return Fraction;
}
else if (Token.Trim().ToLower() == "." || Token.Trim().ToLower() == "point")
{
long? NumberPart = GetInteger(NextToken);
//If the point was followed by a number, then return the fraction, otherwise we need to backtrack;
if (NumberPart != null)
{
return double.Parse("." + NumberPart.ToString(), System.Globalization.CultureInfo.InvariantCulture) * Math.Pow(10, -1 * LeadingZeroCount);
}
}
}
CurrentTokenIndex--;
return null;
}

/// <summary>
/// Gets the exponent part. Eg it'll return "100" from "a hundred"
/// </summary>
/// <param name="Token">The exponent part</param>
/// <returns>The exponent value.</returns>
private long? GetExponent(string Token)
{
if(Token!=null)
Token = Token.Trim().ToLower(); //Get rid of white spaces and caps.

//Continue traversing while it's a white space.
if (Token != null && Token.Trim().Length == 0)
{

long? exp = GetExponent(NextToken);
//If at the end of the whitespace, we reached a null or a non-exponent
if (exp.Value == 1)
CurrentTokenIndex--; //We cannot consume.
else
return exp; //Otherwise, we found an exponent and return it.
}
//If an exponent is found.
else if (Token != null && Exponents.Keys.Contains(Token))
return Exponents[Token] * GetExponent(NextToken);
//Cannot consume, neither white space nor an exponent.
else
CurrentTokenIndex--;
return 1;
}

/// <summary>
/// Checks if the given token is possibly a number - a necessary, insufficient check.
/// </summary>
/// <param name="Token">The token to be checked.</param>
/// <returns></returns>
private bool MayBeNumeric(string Token)
{
return Specials.Keys.Contains(Token.Trim().ToLower()) || Elements.Keys.Contains(Token.Trim().ToLower());
}

}
}


You also need to create the following helper class:



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Utils
{
public static class StringExtension
{
/// <summary>
/// String replace function that supports replacing a specific instance, with optional case insensitivity.
/// </summary>
/// <param name="OrigString">Original input string</param>
/// <param name="FindString">The string that is to be replaced</param>
/// <param name="ReplaceWith">The replacement string</param>
/// <param name="Instance">Instance of the FindString that is to be found. if Instance = -1 all are replaced</param>
/// <param name="CaseInsensitive">Case insensitivity flag</param>
/// <returns>updated string or original string if no matches</returns>
public static string Replace(this string OrigString, string FindString,
string ReplaceWith, int Instance,
bool CaseInsensitive)
{
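if (Instance == -1)
return OrigString.Replace(FindString, ReplaceWith, CaseInsensitive);

//Instances are counted from 1; an empty search string never matches.
if (Instance < 1 || string.IsNullOrEmpty(FindString))
return OrigString;

int at1 = 0;
for (int x = 0; x < Instance; x++)
{

if (CaseInsensitive)
at1 = OrigString.IndexOf(FindString, at1, OrigString.Length - at1, StringComparison.OrdinalIgnoreCase);
else
at1 = OrigString.IndexOf(FindString, at1, StringComparison.Ordinal);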

if (at1 == -1)
return OrigString;

if (x < Instance - 1)
at1 += FindString.Length;
}

return OrigString.Substring(0, at1) + ReplaceWith + OrigString.Substring(at1 + FindString.Length);
}

/// <summary>
/// Replaces a substring within a string with another substring with optional case sensitivity turned off.
/// </summary>
/// <param name="OrigString">String to do replacements on</param>
/// <param name="FindString">The string to find</param>
/// <param name="ReplaceString">The string to replace the found string with</param>
/// <param name="CaseInsensitive">If true case insensitive search is performed</param>
/// <returns>updated string or original string if no matches</returns>
public static string Replace(this string OrigString, string FindString,
string ReplaceString, bool CaseInsensitive)
{
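//An empty search string would match everywhere and never terminate, so leave the string untouched.
if (string.IsNullOrEmpty(FindString))
return OrigString;

int at1 = 0;
while (true)
{
if (CaseInsensitive)
at1 = OrigString.IndexOf(FindString, at1, OrigString.Length - at1, StringComparison.OrdinalIgnoreCase);
else
at1 = OrigString.IndexOf(FindString, at1, StringComparison.Ordinal);

//No more matches: everything has been replaced.
if (at1 == -1)
return OrigString;

OrigString = OrigString.Substring(0, at1) + ReplaceString + OrigString.Substring(at1 + FindString.Length);

//Skip past the replacement so it is never matched again.
at1 += ReplaceString.Length;
}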
}
}
}
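As a point of comparison with the recursive parser above, here is a minimal, non-recursive sketch of the same idea. It is not the parser from this post: it assumes the input is a single well-formed number phrase (no surrounding sentence), handles integer scales up to trillion plus digits read one by one after "point", and uses hypothetical names (NumberWordsSketch, Parse). The trick is to keep two running totals: group builds up the part below one thousand, and each scale word (thousand, million and so on) moves that group into total.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified alternative to TextToNumericParser: it parses ONE
// well-formed number phrase, e.g. "one hundred thousand and twenty five point seven".
public static class NumberWordsSketch
{
    static readonly Dictionary<string, long> Words = new Dictionary<string, long>
    {
        { "zero", 0 }, { "one", 1 }, { "two", 2 }, { "three", 3 }, { "four", 4 },
        { "five", 5 }, { "six", 6 }, { "seven", 7 }, { "eight", 8 }, { "nine", 9 },
        { "ten", 10 }, { "eleven", 11 }, { "twelve", 12 }, { "thirteen", 13 },
        { "fourteen", 14 }, { "fifteen", 15 }, { "sixteen", 16 }, { "seventeen", 17 },
        { "eighteen", 18 }, { "nineteen", 19 }, { "twenty", 20 }, { "thirty", 30 },
        { "forty", 40 }, { "fifty", 50 }, { "sixty", 60 }, { "seventy", 70 },
        { "eighty", 80 }, { "ninety", 90 }
    };

    static readonly Dictionary<string, long> Scales = new Dictionary<string, long>
    {
        { "thousand", 1000L }, { "million", 1000000L },
        { "billion", 1000000000L }, { "trillion", 1000000000000L }
    };

    public static decimal Parse(string text)
    {
        long total = 0;        // finished groups, e.g. 100000 from "one hundred thousand"
        long group = 0;        // the group currently being built (below one thousand)
        decimal fraction = 0m; // value of the digits read after "point"
        decimal place = 0m;    // 0 before "point"; then 0.1, 0.01 and so on

        string[] words = text.ToLowerInvariant().Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            long value;
            if (word == "and") continue;                // connector only: "a hundred and twelve"
            if (word == "point") { place = 0.1m; continue; }

            if (place > 0m)
            {
                // After "point" every word must be a single digit, read one by one.
                if (!Words.TryGetValue(word, out value) || value > 9)
                    throw new FormatException("Expected a digit after 'point': " + word);
                fraction += value * place;
                place /= 10m;
            }
            else if (word == "a") group += 1;           // "a hundred" means "one hundred"
            else if (Words.TryGetValue(word, out value)) group += value;
            else if (word == "hundred") group = Math.Max(group, 1) * 100;
            else if (Scales.TryGetValue(word, out value))
            {
                total += Math.Max(group, 1) * value;    // close the group: "two thousand"
                group = 0;
            }
            else throw new FormatException("Not a number word: " + word);
        }
        return total + group + fraction;
    }
}
```

For example, NumberWordsSketch.Parse("one hundred thousand and twenty five point seven") returns 100025.7. Because every word is looked up directly, there is no backtracking and no token index to keep in sync; the price is that it cannot pick numbers out of free text the way TranslateToString does.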

Thursday, January 29, 2009

C# .NET 3.5 Credit card validation using Luhn formula

The following function in C# .NET 3.5 checks whether a given credit card number is valid using the Mod 10, or Luhn, formula.

For a more complete explanation of how the formula works, please check out this website. I based this function largely on the information I found there.

In summary, the formula is as follows:
1) Double every second digit, starting from the second digit from the RIGHT.
2) If any doubled digit is 10 or greater, add up its digits, e.g. "10" becomes 1 + 0 = 1.
3) Sum up all the digits.
4) If the result is divisible by 10, then we probably have a valid card number; otherwise it is definitely invalid.

This is a necessary but insufficient check for credit card numbers issued by most financial institutions. For a more complete check, you'll also need to check the first digits to make sure they match the card issuer; e.g. MasterCard numbers may start with 51. However, this is an easy check and is outside the scope of this post.

Function to check whether a credit card number is valid using the Mod 10, or Luhn, formula:




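//Requires: using System.Linq; (for Reverse, ToArray and Sum)
public bool Mod10Check(string CreditCardNumber)
{
    //An empty number would sum to 0 and wrongly pass the check.
    if (string.IsNullOrEmpty(CreditCardNumber))
        return false;

    char[] CreditCardNumberArray = CreditCardNumber.ToCharArray();

    var CreditCardDigits = new short[CreditCardNumberArray.Length];

    for (int i = 0; i < CreditCardNumberArray.Length; i++)
    {
        //Anything other than 0-9 (spaces, dashes) makes the number invalid, instead of making short.Parse throw.
        if (CreditCardNumberArray[i] < '0' || CreditCardNumberArray[i] > '9')
            return false;
        CreditCardDigits[i] = short.Parse(CreditCardNumberArray[i].ToString());
    }

    //Reverse the digits, so that indexes 1, 3, 5... are every second digit counted from the RIGHT.
    CreditCardDigits = CreditCardDigits.Reverse().ToArray();

    for (int i = 0; i < CreditCardDigits.Length; i++)
    {
        if (i % 2 == 1)
        {
            //Double every second digit.
            CreditCardDigits[i] = (short)(CreditCardDigits[i] * 2);

            //If the doubled digit is 10 or more, add its two digits together, e.g. 16 becomes 1 + 6 = 7.
            if (CreditCardDigits[i] >= 10)
            {
                char[] BigDigit = CreditCardDigits[i].ToString().ToCharArray();

                CreditCardDigits[i] = (short)(short.Parse(BigDigit[0].ToString())
                    + short.Parse(BigDigit[1].ToString()));
            }
        }
    }

    //Sum up all the digits; a valid number gives a multiple of 10.
    int SumOfDigits = CreditCardDigits.Sum(o => (int)o);

    return SumOfDigits % 10 == 0;
}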
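A more compact variant of the same check can walk the string from the right instead of building and reversing an array. This is a sketch with hypothetical names (LuhnSketch, IsValid), not a drop-in replacement for Mod10Check: it treats any non-digit character as invalid, and it relies on the fact that for a doubled digit between 10 and 18, adding its two digits gives the same result as subtracting 9.

```csharp
using System;

// Hypothetical compact Luhn (Mod 10) check: same rules as Mod10Check,
// but it walks the string from right to left without an intermediate array.
public static class LuhnSketch
{
    public static bool IsValid(string number)
    {
        // An empty number would sum to 0 and wrongly pass.
        if (string.IsNullOrEmpty(number))
            return false;

        int sum = 0;
        bool doubleIt = false; // the right-most digit is never doubled
        for (int i = number.Length - 1; i >= 0; i--)
        {
            char c = number[i];
            if (c < '0' || c > '9')
                return false;  // reject spaces, dashes and any other non-digit

            int digit = c - '0';
            if (doubleIt)
            {
                digit *= 2;
                if (digit > 9)
                    digit -= 9; // same as adding the two digits of 10..18
            }
            sum += digit;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }
}
```

For example, 79927398713 passes: from the right, the digits contribute 3, 2, 7, 7, 9, 6, 7, 4, 9, 9, 7, which add up to 70.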

Wednesday, January 14, 2009

DirectoryInfo Move (UNC Compatible)

Ever wanted to move a whole directory, but .NET would not allow it? Directory.Move and DirectoryInfo.MoveTo only work within a single volume, so the only way was to create the directory at the target, copy each file into the new folder, and then delete the old one.

To make matters worse, you can't copy or move a directory to a shared location given by a UNC path.

The following code adds extension methods that allow the built-in .NET DirectoryInfo class to move an entire folder, even across servers.





///////////////////////////Directory Info Extension///////////////////////////
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace YourNameSpace
{
public static class DirectoryInfoExtension
{
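/// <summary>
/// Copy directory from one location to another, recursively.
/// </summary>
/// <param name="Source">The directory to copy.</param>
/// <param name="Target">The directory to copy to.</param>
/// <param name="Overwrite">If true, an existing target is deleted first; otherwise it is renamed out of the way.</param>
/// <returns>The newly created target directory.</returns>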
public static DirectoryInfo CopyTo(this DirectoryInfo Source, DirectoryInfo Target, bool Overwrite)
{
string TargetFullPath = Target.FullName;

if (Overwrite && Target.Exists)
{
Target.Delete(true);

}
else if (!Overwrite && Target.Exists)
{
Target.MoveTo(Target.Parent.FullName + "\\" + Target.Name + "." + Guid.NewGuid().ToString());
}

//Restores target back, such that it's not pointing to the renamed, obsolete directory.
Target = new DirectoryInfo(TargetFullPath);

Target.Create();

CopyRecurse(Source, Target);

return Target;
}
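/// <summary>
/// Copy source recursively to target.
/// NOTE: This will create target subdirectories, but NOT target itself.
/// </summary>
/// <param name="Source">The directory to copy from.</param>
/// <param name="Target">The existing directory to copy into.</param>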
private static void CopyRecurse(DirectoryInfo Source, DirectoryInfo Target)
{
foreach (DirectoryInfo ChildSource in Source.GetDirectories())
{
DirectoryInfo ChildTarget = Target.CreateSubdirectory(ChildSource.Name);
CopyRecurse(ChildSource, ChildTarget);
}
foreach (FileInfo File in Source.GetFiles())
{
File.CopyTo(Path.Combine(Target.FullName, File.Name));
}
}

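/// <summary>
/// This extension allows a directory to be moved across servers, by copying it to the target and then deleting the source.
/// </summary>
/// <param name="Source">The directory to move.</param>
/// <param name="Target">The destination directory (may be a UNC path).</param>
/// <param name="Overwrite">If true, an existing target is deleted first; otherwise it is renamed out of the way.</param>
/// <returns>The moved directory at its new location.</returns>
public static DirectoryInfo MoveTo(this DirectoryInfo Source, DirectoryInfo Target, bool Overwrite)
{
DirectoryInfo Result = Source.CopyTo(Target, Overwrite);
Source.Delete(true);
return Result;
}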
}
}
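On the same volume, the built-in Directory.Move is just a rename, so copying every file is only really needed when the target is on another drive or server. The sketch below is a hypothetical alternative (DirectoryMoveSketch, Move and CopyRecursive are illustrative names), not the extension above: it tries Directory.Move first and falls back to copy-then-delete when that throws an IOException. Unlike the Overwrite flag above, it simply refuses to move onto an existing target. Note that catching IOException is fairly coarse; it also covers problems such as a file being in use, in which case the fallback will most likely fail too.

```csharp
using System;
using System.IO;

// Hypothetical sketch: fast same-volume move, with a copy-then-delete
// fallback for other volumes or UNC shares.
public static class DirectoryMoveSketch
{
    public static void Move(string sourcePath, string targetPath)
    {
        // Refuse to merge into, or overwrite, something that already exists.
        if (Directory.Exists(targetPath) || File.Exists(targetPath))
            throw new IOException("Target already exists: " + targetPath);

        try
        {
            // Same volume: a cheap rename.
            Directory.Move(sourcePath, targetPath);
        }
        catch (IOException)
        {
            // Different volume or server: copy everything, then delete the source.
            CopyRecursive(sourcePath, targetPath);
            Directory.Delete(sourcePath, true);
        }
    }

    static void CopyRecursive(string sourcePath, string targetPath)
    {
        Directory.CreateDirectory(targetPath);
        foreach (string file in Directory.GetFiles(sourcePath))
            File.Copy(file, Path.Combine(targetPath, Path.GetFileName(file)));
        foreach (string dir in Directory.GetDirectories(sourcePath))
            CopyRecursive(dir, Path.Combine(targetPath, Path.GetFileName(dir)));
    }
}
```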

Case Insensitive String Replace

This post is a modification of a post on Rick Strahl's Web Log about replacing strings while ignoring case. The code has been modified to suit the C# .NET 3.5 coding style, with extension methods.

Recently, in one of the projects I was working on, I came across a problem where I needed to replace parts of a string while ignoring case, but otherwise preserve the case of the string. That last part is the problem. While it's possible to simply call string.ToLower() first, the whole string would then be converted to lowercase, regardless of whether a given part matched the replacement criteria.

The code below is designed to address the issue. It's almost copy and paste; you only need to rename the namespace to that of your solution. Reference the namespace, and string.Replace() will gain two extra overloads.



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace YourNameSpace.SubNameSpace
{
public static class StringExtension
{
/// <summary>
/// String replace function that supports replacing a specific instance, with optional case insensitivity.
/// </summary>
/// <param name="OrigString">Original input string</param>
/// <param name="FindString">The string that is to be replaced</param>
/// <param name="ReplaceWith">The replacement string</param>
/// <param name="Instance">Instance of the FindString that is to be found. if Instance = -1 all are replaced</param>
/// <param name="CaseInsensitive">Case insensitivity flag</param>
/// <returns>updated string or original string if no matches</returns>
public static string Replace(this string OrigString, string FindString,
string ReplaceWith, int Instance,
bool CaseInsensitive)
{
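if (Instance == -1)
return OrigString.Replace(FindString, ReplaceWith, CaseInsensitive);

//Instances are counted from 1; an empty search string never matches.
if (Instance < 1 || string.IsNullOrEmpty(FindString))
return OrigString;

int at1 = 0;
for (int x = 0; x < Instance; x++)
{

if (CaseInsensitive)
at1 = OrigString.IndexOf(FindString, at1, OrigString.Length - at1, StringComparison.OrdinalIgnoreCase);
else
at1 = OrigString.IndexOf(FindString, at1, StringComparison.Ordinal);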

if (at1 == -1)
return OrigString;

if (x < Instance - 1)
at1 += FindString.Length;
}

return OrigString.Substring(0, at1) + ReplaceWith + OrigString.Substring(at1 + FindString.Length);
}


/// <summary>
/// Replaces a substring within a string with another substring with optional case sensitivity turned off.
/// </summary>
/// <param name="OrigString">String to do replacements on</param>
/// <param name="FindString">The string to find</param>
/// <param name="ReplaceString">The string to replace the found string with</param>
/// <param name="CaseInsensitive">If true case insensitive search is performed</param>
/// <returns>updated string or original string if no matches</returns>
public static string Replace(this string OrigString, string FindString,
string ReplaceString, bool CaseInsensitive)
{
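//An empty search string would match everywhere and never terminate, so leave the string untouched.
if (string.IsNullOrEmpty(FindString))
return OrigString;

int at1 = 0;
while (true)
{
if (CaseInsensitive)
at1 = OrigString.IndexOf(FindString, at1, OrigString.Length - at1, StringComparison.OrdinalIgnoreCase);
else
at1 = OrigString.IndexOf(FindString, at1, StringComparison.Ordinal);

//No more matches: everything has been replaced.
if (at1 == -1)
return OrigString;

OrigString = OrigString.Substring(0, at1) + ReplaceString + OrigString.Substring(at1 + FindString.Length);

//Skip past the replacement so it is never matched again.
at1 += ReplaceString.Length;
}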
}
}
}
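On .NET 3.5 the same effect can also be had with a regular expression. The sketch below is an alternative, not Rick Strahl's code or the overloads above, and RegexReplaceSketch and ReplaceIgnoreCase are hypothetical names. Regex.Escape makes the search text literal, a MatchEvaluator inserts the replacement verbatim (so a "$" in it is not treated as a substitution), and RegexOptions.IgnoreCase combined with RegexOptions.CultureInvariant gives a case-insensitive match that does not depend on the current culture.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical regex-based case-insensitive replace.
public static class RegexReplaceSketch
{
    public static string ReplaceIgnoreCase(string input, string find, string replacement)
    {
        // Nothing to search in, or nothing to search for.
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(find))
            return input;

        string safeReplacement = replacement ?? string.Empty;

        // Regex.Escape turns the search text into a literal pattern, and the
        // evaluator returns the replacement as-is, so '.', '(' or '$' in either
        // string are never interpreted as regex syntax.
        return Regex.Replace(
            input,
            Regex.Escape(find),
            m => safeReplacement,
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

It always replaces every match, so it has no direct equivalent of the Instance overload above.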