Wednesday, September 29, 2010

LINQ: Bulk Insert/Delete/Update

 

One of the major stumbling blocks that keeps LINQ 2 SQL / LINQ 2 Entity from being considered seriously in commercial development is performance. But not just any performance: as you should already know, running DB queries with LINQ is relatively fast (sometimes even faster than a hand-coded SQL script). SQL Server 2005 and up (if not earlier) also caches the execution plans of parameterized queries, which means the performance difference between a LINQ-generated query and a stored procedure should be negligible.

Of course, it is easier to do something stupid with LINQ 2 SQL, such as touching a lazily loaded association property that ends up returning thousands of rows. But even then, careful coding and some help from the DataLoadOptions.AssociateWith method (assigned through DataContext.LoadOptions) should limit the fallout somewhat.
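As a rough sketch of how AssociateWith limits a lazily loaded association, assuming hand-mapped Customer and Order entities and a connection string that are made up purely for illustration (none of them come from a real schema):

```csharp
using System;
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq;

// Hypothetical entities, mapped by hand for illustration only.
[Table(Name = "Customers")]
public class Customer
{
    [Column(IsPrimaryKey = true)]
    public int CustomerID;

    private EntitySet<Order> _orders = new EntitySet<Order>();

    // Lazily loaded association: the rows are fetched on first access.
    [Association(Storage = "_orders", OtherKey = "CustomerID")]
    public EntitySet<Order> Orders
    {
        get { return _orders; }
        set { _orders.Assign(value); }
    }
}

[Table(Name = "Orders")]
public class Order
{
    [Column(IsPrimaryKey = true)]
    public int OrderID;

    [Column]
    public int CustomerID;

    [Column]
    public DateTime OrderDate;
}

public static class Program
{
    public static DataLoadOptions RecentOrdersOnly(DateTime cutoff)
    {
        var options = new DataLoadOptions();
        // Whenever Customer.Orders is loaded, only fetch orders on or after the cutoff.
        options.AssociateWith<Customer>(c => c.Orders.Where(o => o.OrderDate >= cutoff));
        return options;
    }

    public static void Main()
    {
        // Assumption: replace with a real connection string.
        using (var db = new DataContext("Data Source=.;Initial Catalog=Shop;Integrated Security=True"))
        {
            // LoadOptions has to be assigned before the first query runs.
            db.LoadOptions = RecentOrdersOnly(DateTime.Today.AddDays(-30));

            foreach (Customer c in db.GetTable<Customer>())
            {
                Console.WriteLine("{0}: {1} recent orders", c.CustomerID, c.Orders.Count);
            }
        }
    }
}
```

Each access to Orders still costs one query per customer; the filter only caps how many rows each of those queries can return.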

Additionally, you should check out LINQPad, a tool that shows the SQL a LINQ query generates, which you can then hand to SQL Server to inspect the query execution plan.

With that out of the way, the one thing LINQ 2 SQL / Entity Framework cannot do out of the box is run a batched insert, delete or update. Of course you can always write a SQL stored procedure, call ExecuteCommand or what have you, but then you lose all the benefits that come with LINQ: type checking, ease of maintenance, refactorable code and so on. What I mean is that when you have 10,000 rows to delete from table A, LINQ will first query for the 10,000 rows, then delete them one by one based on primary key. (See: http://www.aneyfamily.com/terryandann/post/2008/04/Batch-Updates-and-Deletes-with-LINQ-to-SQL.aspx) Terry Aney describes the problem far better than I ever could. The post also provides part of the solution for batched update and delete.
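To make the difference concrete, here is a minimal sketch of the two approaches. The LogEntries table, its columns and the connection string are assumptions made up for this example. Setting DataContext.Log lets you watch the SQL that each approach sends.

```csharp
using System;
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq;

// Hypothetical table, mapped by hand for illustration only.
[Table(Name = "LogEntries")]
public class LogEntry
{
    [Column(IsPrimaryKey = true)]
    public int Id;

    [Column]
    public DateTime CreatedOn;
}

public static class Program
{
    // Out-of-the-box LINQ to SQL: one SELECT, then one DELETE per row.
    public static void DeleteRowByRow(DataContext db, DateTime cutoff)
    {
        Table<LogEntry> log = db.GetTable<LogEntry>();
        IQueryable<LogEntry> stale = log.Where(e => e.CreatedOn < cutoff);
        log.DeleteAllOnSubmit(stale);
        db.SubmitChanges();
    }

    // Set-based alternative: a single DELETE statement, but the SQL is an
    // untyped string, so the compiler and refactoring tools cannot check it.
    public static int DeleteInOneStatement(DataContext db, DateTime cutoff)
    {
        return db.ExecuteCommand("DELETE FROM LogEntries WHERE CreatedOn < {0}", cutoff);
    }

    public static void Main()
    {
        // Assumption: replace with a real connection string.
        using (var db = new DataContext("Data Source=.;Initial Catalog=Shop;Integrated Security=True"))
        {
            db.Log = Console.Out; // Echo the generated SQL to the console.
            DateTime cutoff = DateTime.Today.AddYears(-1);

            DeleteRowByRow(db, cutoff);
            Console.WriteLine("Single statement removed {0} more rows.", DeleteInOneStatement(db, cutoff));
        }
    }
}
```

The {0} placeholder in ExecuteCommand is turned into a SQL parameter rather than concatenated into the string, so the second approach stays safe from injection even though it gives up type checking.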

However, those alone are not enough. There is one case left: what if you need to do a batch insert from one table into another, after a little manipulation? Such a scenario isn’t uncommon. For that, I really have to recommend Magiq at: http://magiq.codeplex.com/. These fine folks are doing what Microsoft should have done from the start (before shoehorning the craptastic LINQ to Entity down our throats).

One final thing: once you have started playing with LINQ 2 SQL, you have probably noticed that it is fairly difficult to manage the scope and lifetime of the DataContext object. The problem is compounded in a distributed application. For that, you should check out Rick Strahl's blog over at: http://www.west-wind.com/weblog/posts/246222.aspx. He did a great job of describing the problem and showing a few possible solutions for it.

Saturday, September 11, 2010

Recursive Text File Replacement in Powershell

Text manipulation is something that comes very naturally to Powershell.

For example, the script below finds every line that matches the given regex and replaces it as necessary, in every matching file in the current folder and its subfolders. I can think of a great many uses for this, especially when changing localization texts (among other things), where the Visual Studio built-in string replace is simply too inclusive and underpowered.

--------------------------------------------Powershell Script-------------------------------------------------------------

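# Settings to edit. The values below are placeholders, not part of the technique.
$filePatterns = @("*.aspx","*.cs","*.resx")    # The files to search for.
$regex        = 'PUT YOUR REGEX HERE'          # The pattern to look for in each line.
$replacement  = 'PUT YOUR REPLACEMENT HERE'    # What a matching piece of text becomes.

gci -r -i $filePatterns | %{
        $file = $_
        [bool] $SomethingChanged = $false
        $newContent =
            (
                gc $file.FullName | %{
                            if($_ -match $regex)
                            {
                                # Put your string replace logic in this block and
                                # set $SomethingChanged to $true if successful.
                                $_ -replace $regex, $replacement
                                $SomethingChanged = $true
                            }
                            else  # No match found, output the original line.
                            {
                                $_
                            }
                        }
            )
        if($SomethingChanged){
            $newContent    # Echoes the updated content to the console.
            # Saves the updated content.
            sc $file.FullName $newContent -Encoding UTF8
        }
    }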

---------------------------------------------Powershell Script-------------------------------------------------------------

The following table should help in figuring out how the script above works.

Syntax    Definition
gci       Get-ChildItem. It works like “dir” in the old DOS command prompt.
-r        Short for -Recurse. Tells the “dir”/”gci” command to search recursively, à la the old “/s” in the normal cmd prompt.
-i        Short for -Include. This is the include filter flag. It should be followed by the filter string, e.g. “*.aspx”.
|         Piping. It allows you to use the result of the previous computation in the next one. It’s like the memory recall button on your calculator. (Note: it can be nested using parentheses!)
$_        This variable stores the current item piped from the previous statement, i.e. if the last computational result is “1”, then $_ -eq 1 is $true.
%         ForEach-Object. It runs once for each element piped from the previous statement.
gc        Get-Content. It’s like “type” in old DOS (which still works in Powershell, btw). It reads the content of a text file line by line.
sc        Set-Content. Saves the content. It takes an array of strings, where each element in the array represents a line in the text file.


Sunday, March 28, 2010

.NET Serialization Techniques

Serialization modes

·         Binary Serialization

o   Pros:

§  Handles circular references by mapping objects prior to serialization.

o   Cons:

§  Class has to be tagged with the [Serializable] attribute.

§  Base classes and the types of all serialized members have to be tagged [Serializable] as well.

·         May be impossible for certain built-in classes.

o   Note:

§  Classes are opt-in while members are opt-out.

·         Meaning each class has to be tagged [Serializable], but you don’t need to tag each and every member to make it serializable (unlike DataContract). A field is excluded by tagging it [NonSerialized].
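A minimal sketch of the points above, assuming the .NET Framework of this post's era (BinaryFormatter has since been marked obsolete in newer .NET releases). The Node class is made up for the example: its Next/Previous fields form a cycle, and Scratch shows a member opting out.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]                    // The class opts in.
public class Node
{
    public string Name;           // Fields are included automatically.
    public Node Next;
    public Node Previous;         // Next/Previous form a circular reference.

    [NonSerialized]
    public string Scratch;        // A member opts out explicitly.
}

public static class Program
{
    // Serializes the object graph to memory and reads it back as a copy.
    public static Node RoundTrip(Node root)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, root);
            stream.Position = 0;
            return (Node)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        var first = new Node { Name = "first", Scratch = "temp" };
        var second = new Node { Name = "second" };
        first.Next = second;
        second.Previous = first;

        Node copy = RoundTrip(first);
        Console.WriteLine(copy.Next.Name);                             // second
        Console.WriteLine(ReferenceEquals(copy, copy.Next.Previous));  // True: the cycle survives
        Console.WriteLine(copy.Scratch == null);                       // True: the field was skipped
    }
}
```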

 

·         XML Serialization

o   Pros:

§  Easy to use and intuitive.

§  Serializes public fields and public read/write properties automatically.

§  Builds upon existing infrastructure and requires little code change or extra attributes. For example, binary serialization requires the class to be tagged with the [Serializable] attribute, but XmlSerializer does not need it to serialize the same object.

§  Serialized output is human-readable.

o   Cons:

§  Cannot be used in classes with circular references.

·         A quick example is a doubly linked list.

§  Will not serialize arrays of generic List<T> (see: http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlserializer.aspx)

o   Note:

§  Opt-out: any public member that should not be serialized has to be tagged [XmlIgnore].

§  Class must have a default constructor that takes no arguments.

§  A collection must have a default accessor (an indexer). E.g. Stack<T> is not serializable this way, but List<T> is.

§  Will not serialize an enum whose:

·         Underlying type is ulong, and

·         Any member value is greater than 9,223,372,036,854,775,807
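The XML serialization rules above can be sketched as follows. The Invoice class and its members are invented for the example; note that it carries no [Serializable] attribute, has a parameterless constructor, and opts one property out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Invoice                  // No [Serializable] attribute needed.
{
    public Invoice()                  // Parameterless constructor is required.
    {
        Lines = new List<string>();
    }

    public int Number { get; set; }
    public List<string> Lines { get; set; }

    [XmlIgnore]
    public string CachedSummary { get; set; }   // Opted out of serialization.
}

public static class Program
{
    public static string ToXml(Invoice invoice)
    {
        var serializer = new XmlSerializer(typeof(Invoice));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, invoice);
            return writer.ToString();
        }
    }

    public static Invoice FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(Invoice));
        using (var reader = new StringReader(xml))
        {
            return (Invoice)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        var invoice = new Invoice { Number = 42, CachedSummary = "not persisted" };
        invoice.Lines.Add("widget");

        string xml = ToXml(invoice);
        Console.WriteLine(xml);                          // Human-readable XML.

        Invoice copy = FromXml(xml);
        Console.WriteLine(copy.Number);                  // 42
        Console.WriteLine(copy.CachedSummary == null);   // True: the property was ignored
    }
}
```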

 

·         ISerializable

o   Pros:

§  Very flexible, it can be made to do anything.

o   Cons:

§  Requires a lot of rewriting and extra code.

§  Too specific and not generic enough.

·         If you write the serialize method implementation for one type, it “usually” will not work with any other type.

§  It basically demands that you custom-write the serialization and deserialization methods for each type.

o   Note:

§  May be simplified with multiple inheritance using AOP
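As a small illustration of how type-specific ISerializable code is, here is a sketch with a made-up Money class that stores its amount as whole cents. The "cents" key and the class itself are assumptions for the example. Both the GetObjectData method and the special deserialization constructor have to be written by hand, and neither would work for any other type.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Money : ISerializable
{
    public decimal Amount;

    public Money(decimal amount)
    {
        Amount = amount;
    }

    // Deserialization constructor: called by the formatter, never by your own code.
    protected Money(SerializationInfo info, StreamingContext context)
    {
        Amount = info.GetInt64("cents") / 100m;
    }

    // Custom serialization: decide exactly what is written and under which name.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("cents", (long)(Amount * 100m));
    }
}

public static class Program
{
    public static Money RoundTrip(Money money)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, money);
            stream.Position = 0;
            return (Money)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Money copy = RoundTrip(new Money(12.34m));
        Console.WriteLine(copy.Amount == 12.34m);   // True
    }
}
```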

 

·         Marshal By Value

o   Pros:

§  Scalable.

o   Cons:

§  Cannot store state information on the server

o   Notes:

§  Behaves a lot like the binary serialization with identical pros and cons.

§  Object is transmitted across the network, to the client. Any subsequent calls will be made on the client object itself.

 

·         Marshal By Ref

o   Pros:

§  Easy and intuitive to use.

§  Object state is stored on the server side.

§  Better abstraction as the object behaves like any other local object.

o   Cons:

§  Repeated use is expensive. A method call made inside a loop travels across the network on every iteration!

§  Hides the cross-network round trip a little too well: because the remote object behaves exactly like a local object, an unwary programmer can easily write slow and bloated code.

§  Must inherit from MarshalByRefObject, directly or indirectly (e.g. the object's great-great-grandparent could be MarshalByRefObject instead of the parent itself).

§  Any reference-type member handed to the client must also inherit from MarshalByRefObject (or be serializable, in which case it is copied by value).

o   Note:

§  Very cool concept that abstracts away the network layer; unfortunately, its drawbacks are equally terrible.
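The difference between the two marshaling modes can be seen without a network by using a second AppDomain as a stand-in for the remote server. That substitution, and the two worker classes, are assumptions made for this sketch, which targets the .NET Framework. The by-value object is copied into the caller's domain, while the by-reference object stays put and is reached through a proxy.

```csharp
using System;

[Serializable]                                  // Marshal by value: the caller gets a copy.
public class ByValueWorker
{
    public string WhereAmI()
    {
        return AppDomain.CurrentDomain.FriendlyName;
    }
}

public class ByRefWorker : MarshalByRefObject   // Marshal by reference: the caller gets a proxy.
{
    public string WhereAmI()
    {
        return AppDomain.CurrentDomain.FriendlyName;
    }
}

public static class Program
{
    // Creates an instance of T inside the given domain and hands it back to the caller.
    public static T CreateIn<T>(AppDomain domain)
    {
        return (T)domain.CreateInstanceAndUnwrap(typeof(T).Assembly.FullName, typeof(T).FullName);
    }

    public static void Main()
    {
        AppDomain remote = AppDomain.CreateDomain("Remote");
        try
        {
            var byValue = CreateIn<ByValueWorker>(remote);
            var byRef = CreateIn<ByRefWorker>(remote);

            // The copy runs locally, so it reports the caller's own domain.
            Console.WriteLine(byValue.WhereAmI() == AppDomain.CurrentDomain.FriendlyName);  // True

            // Every call on the proxy crosses the domain boundary.
            Console.WriteLine(byRef.WhereAmI());  // Remote
        }
        finally
        {
            AppDomain.Unload(remote);
        }
    }
}
```

Calling byRef.WhereAmI() in a tight loop would cross the boundary on every iteration, which is exactly the hidden cost listed in the cons above.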

 

·         DataContract

o   Pros:

§  The latest, currently supported serialization mode for WCF services

§  Easily wired using LINQ to SQL wizard.

o   Cons:

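§  Untagged (plain) classes must have a public default constructor (i.e. a constructor with no parameters). Classes tagged [DataContract] are deserialized without any constructor being called, so their field initializers do not run.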

§  So new that it may not be supported by services or older 3rd party tools that rely on binary serialization.

·         For example:

o   Sql Server Caching.

o   Shared Session in web farms.

o   Older versions of NCache.

§  Cannot automatically resolve circular reference.

§  Every member to be serialized must be tagged [DataMember] – tedious.

§  Each class has to be tagged [DataContract] – tedious.

§  When an enumeration is tagged [DataContract], each of its members must be tagged with [EnumMember] (see: http://blog.waynehartman.com/articles/84.aspx)

§  Does not serialize .NET specific classes.

o   Note:

§  This is an opt-in serialization technique.

§  WCF does not automatically recognize types derived from a data contract type. A derived type must be declared on the base type with a [KnownType(typeof(DerivedType))] tag to be recognized.

§  In the LINQ to SQL designer, circular references can be avoided by setting the Serialization Mode property to Unidirectional, which tags only the members that cannot cause a circular reference.

§  For detail see: http://msdn.microsoft.com/en-us/library/ms731923.aspx
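A short sketch of the opt-in rules above, using made-up Animal, Dog and DogSize types (requires a reference to System.Runtime.Serialization.dll). It shows [DataMember] opt-in, [EnumMember] on a tagged enum, and [KnownType] letting a derived type travel where its base type is declared.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

[DataContract]
public enum DogSize
{
    [EnumMember] Small,
    [EnumMember] Large
}

[DataContract]
[KnownType(typeof(Dog))]              // Without this, serializing a Dog as an Animal throws.
public class Animal
{
    [DataMember]
    public string Name { get; set; }

    public string Notes { get; set; } // No [DataMember]: not serialized.
}

[DataContract]
public class Dog : Animal
{
    [DataMember]
    public DogSize Size { get; set; }
}

public static class Program
{
    public static Animal RoundTrip(Animal animal)
    {
        var serializer = new DataContractSerializer(typeof(Animal));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, animal);
            Console.WriteLine(Encoding.UTF8.GetString(stream.ToArray()));  // Show the XML.
            stream.Position = 0;
            return (Animal)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        Animal copy = RoundTrip(new Dog { Name = "Rex", Size = DogSize.Large, Notes = "dropped" });
        Console.WriteLine(copy is Dog);          // True
        Console.WriteLine(copy.Notes == null);   // True: the untagged member was skipped
    }
}
```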

 

·         NetDataContract

o   Pros:

§  Works like the binary formatter without requiring the class to be marked [Serializable]. Beware that for an untagged class it behaves like the old XmlSerializer, except that it accepts circular references.

·         Useful for inheriting from objects that are not tagged [Serializable].

§  Type checked: whereas a WCF service using the default DataContractSerializer gets confused and throws an exception when a child type is passed as an argument to a parameter declared as its parent type, NetDataContractSerializer will accept it.

o   Cons:

§  The client has to have the same types that the WCF service uses, i.e. you have to share your class libraries.

§  Class has to be tagged with either [DataContract] + [DataMember] attributes or the [Serializable] attribute, or it needs to follow the XmlSerializer rules (except that circular references are allowed in this case). See: http://www.pluralsight-training.net/community/blogs/aaron/archive/2008/05/13/50934.aspx

§  Since client has to have your classes, client must also have a .NET CLR, which means client must typically run in Windows.

§  WCF does not support this method out of the box, so a custom attribute implementation is required.

o   Note:

§  Most online examples show WCF + NetDataContract serialization wired up through a custom operation attribute.

·         This means every service operation has to be tagged with that NetDataContract attribute, which may be tedious.

§  Alternatively, a few actually show how to declare a WCF endpoint that uses the NetDataContractSerializer.
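Outside of WCF, the serializer itself can be tried directly. The sketch below uses made-up Shape and Circle types and deliberately leaves out the custom WCF attribute or endpoint wiring mentioned above. Because the CLR type name is written into the payload, a derived type passes through without any [KnownType] declaration, at the price that the reader must have the same assembly available.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Shape
{
    [DataMember]
    public string Name { get; set; }
}

[DataContract]
public class Circle : Shape           // Note: no [KnownType] on Shape.
{
    [DataMember]
    public int Radius { get; set; }
}

public static class Program
{
    public static Shape RoundTrip(Shape shape)
    {
        // The serializer records the CLR type and assembly of each object it writes.
        var serializer = new NetDataContractSerializer();
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, shape);
            stream.Position = 0;
            return (Shape)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        Shape copy = RoundTrip(new Circle { Name = "unit", Radius = 1 });
        Console.WriteLine(copy is Circle);            // True
        Console.WriteLine(((Circle)copy).Radius);     // 1
    }
}
```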