Basic compression in .NET

Following on from my previous post on encryption, a similar technique often used in conjunction with encryption is compression.

As an aside, if you are going to use encryption and compression together it’s important that you compress and then encrypt, rather than the other way round. This is due to the way compression engines work. They reduce data size by looking for repeating patterns in the data and then storing each pattern once, along with the locations in the data at which it appears. Unencrypted data is usually far more repetitive than the seemingly random output you get from encryption. This is especially true for human readable text. Consider how many times words are repeated in a piece of prose versus how many repeating patterns there are likely to be in an encrypted version of the same prose.
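To see the effect for yourself, here is a minimal sketch that measures how well GZip copes with repetitive text versus random bytes. The class and method names are made up for illustration, and the random bytes are only a stand-in for encrypted output rather than the result of a real cipher:

```csharp
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

namespace Compression
{
    // Hypothetical helper for illustration; not part of the Compressor class in this post.
    public static class CompressionOrderDemo
    {
        // Returns the number of bytes GZip produces for the given input.
        public static int GetCompressedSize(byte[] data)
        {
            using (var outputStream = new MemoryStream())
            {
                using (var compressionStream = new GZipStream(outputStream, CompressionMode.Compress))
                {
                    compressionStream.Write(data, 0, data.Length);
                }

                // The GZipStream has been closed by now, so the output is complete.
                return outputStream.ToArray().Length;
            }
        }

        // Repetitive, human readable input.
        public static byte[] GetRepetitiveData()
        {
            var builder = new StringBuilder();
            for (var i = 0; i < 100; i++)
            {
                builder.Append("the quick brown fox jumps over the lazy dog ");
            }

            return Encoding.UTF8.GetBytes(builder.ToString());
        }

        // Random bytes, used here as an assumed stand-in for encrypted output.
        public static byte[] GetRandomData(int length)
        {
            var data = new byte[length];
            using (var generator = RandomNumberGenerator.Create())
            {
                generator.GetBytes(data);
            }

            return data;
        }
    }
}
```

Passing the result of GetRepetitiveData to GetCompressedSize should give a small fraction of the input length, while random data of the same length should come out at roughly its original size, or slightly larger once the GZip header is included.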

As with encryption, we’ll start with a basic interface so that compression can be injected in other objects using dependency injection:

namespace Compression
{
    public interface ICompressor
    {
        string Compress(string text);
        string Decompress(string compressedText);
    }
}

The implementation for this is as follows:

using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

namespace Compression
{
    public class Compressor : ICompressor
    {
        public string Compress(string value)
        {
            if (string.IsNullOrEmpty(value))
            {
                return value;
            }

            var inputArray = StringToByteArray(value);

            using (var outputStream = new MemoryStream())
            {
                using (var compressionStream = new GZipStream(outputStream, CompressionMode.Compress))
                {
                    // Compress:
                    compressionStream.Write(inputArray, 0, inputArray.Length);
                    // Close the compression stream before reading the output. Flushing alone
                    // is not enough, as the final GZip block is only written on close:
                    compressionStream.Close();

                    // Get a byte array from the output stream:
                    var outputArray = outputStream.ToArray();
                    outputStream.Close();

                    return ByteArrayToString(outputArray);
                }
            }
        }

        public string Decompress(string value)
        {
            if (string.IsNullOrEmpty(value))
            {
                return value;
            }

            var inputArray = StringToByteArray(value);

            using (var inputStream = new MemoryStream(inputArray))
            {
                using (var compressionStream = new GZipStream(inputStream, CompressionMode.Decompress))
                {
                    var outputList = new List<byte>();
                    int nextByte;
                    while ((nextByte = compressionStream.ReadByte()) != -1)
                    {
                        outputList.Add((byte)nextByte);
                    }

                    inputStream.Close();
                    compressionStream.Close();

                    return ByteArrayToString(outputList.ToArray());
                }
            }
        }

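        // Note: each char is cast to a single byte, so this only round-trips
        // characters in the range 0-255.
        private static byte[] StringToByteArray(string value)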
        {
            var array = new byte[value.Length];
            for (var i = 0; i < array.Length; i++)
            {
                array[i] = (byte)value[i];
            }

            return array;
        }

        private static string ByteArrayToString(byte[] array)
        {
            var stringBuilder = new StringBuilder(array.Length);
            foreach (var b in array)
            {
                stringBuilder.Append((char)b);
            }

            return stringBuilder.ToString();
        }
    }
}

This can be demonstrated via the following simple test harness:

using System;
using Compression;

namespace TestHarness
{
    public static class Program
    {
        public static void Main()
        {
            const string text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
            var compressor = new Compressor();
            var compressedText = compressor.Compress(text);
            var decompressedText = compressor.Decompress(compressedText);

            ShowText("Text", text);
            ShowText("CompressedText", compressedText);
            ShowText("DecompressedText", decompressedText);
            
            Console.ReadLine();
        }

        private static void ShowText(string label, string text)
        {
            Console.WriteLine(label + ":");
            Console.WriteLine("Length: " + text.Length);
            Console.WriteLine(text);
            Console.WriteLine();
        }
    }
}

I’m not feeling particularly creative this morning so I opted for “Lorem ipsum” rather than anything witty as my input text.

The output of the program is as follows:

compression

Note that the compressed text is 282 characters in length while the original text is 445 characters.

Extending the Compressor class to be able to compress arrays and other data would be pretty simple too, given that the input strings are converted to arrays before the compression is performed.
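As a rough sketch of what that extension might look like, the following class works on byte arrays directly. The class name is hypothetical, and I have used Stream.CopyTo for decompression rather than the byte-by-byte loop above, which assumes .NET 4 or later:

```csharp
using System.IO;
using System.IO.Compression;

namespace Compression
{
    // Hypothetical companion to Compressor; the name is illustrative only.
    public static class ByteArrayCompressor
    {
        public static byte[] Compress(byte[] data)
        {
            if (data == null || data.Length == 0)
            {
                return data;
            }

            using (var outputStream = new MemoryStream())
            {
                using (var compressionStream = new GZipStream(outputStream, CompressionMode.Compress))
                {
                    compressionStream.Write(data, 0, data.Length);
                }

                // Read the output only after the GZipStream has been closed.
                return outputStream.ToArray();
            }
        }

        public static byte[] Decompress(byte[] compressedData)
        {
            if (compressedData == null || compressedData.Length == 0)
            {
                return compressedData;
            }

            using (var inputStream = new MemoryStream(compressedData))
            using (var compressionStream = new GZipStream(inputStream, CompressionMode.Decompress))
            using (var outputStream = new MemoryStream())
            {
                // Copy the decompressed bytes into the output stream in one go.
                compressionStream.CopyTo(outputStream);
                return outputStream.ToArray();
            }
        }
    }
}
```

Working with byte arrays also sidesteps the char-to-byte conversion, so any text encoding (UTF-8, for example) can be applied by the caller before compressing.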

Basic two-way encryption in .NET

There are lots of cases when we need to encrypt and then decrypt data using .NET. A great example is if we want to store something secure in a cookie. We write the encrypted data to the web-response and decrypt it again later from the next web-request.

I’ve been using the following algorithm for two-way encryption for a number of years now. I found it on the web a long time ago, so unfortunately don’t have a link to the original source any more. If anyone can help me here that would be appreciated.

First, let’s start with an interface as the chances are we’re going to need to inject encryption capabilities into other objects using dependency injection:

namespace Encryption
{
    public interface IEncryptor
    {
        string Encrypt(string text);
        string Decrypt(string encryptedText);
    }
}

The implementation for this is as follows:

using System;
using System.Security;
using System.Security.Cryptography;
using System.Text;
using System.IO;
using System.Globalization;
using System.Runtime.InteropServices;

namespace Encryption
{
    public class Encryptor : IEncryptor
    {
        private readonly SecureString _password;

        public Encryptor(SecureString password)
        {
            _password = password;
        }

        public string Encrypt(string text)
        {
            return Encrypt(text, GetDefaultSalt());
        }

        public string Encrypt(string text, string salt)
        {
            if (text == null)
            {
                return null;
            }

            RijndaelManaged rijndaelCipher;
            byte[] textData;
            ICryptoTransform encryptor;

            using (rijndaelCipher = new RijndaelManaged())
            {
                var secretKey = GetSecretKey(salt);

                // First we need to turn the input strings into a byte array.
                textData = Encoding.Unicode.GetBytes(text);

                // Create an encryptor from the existing secretKey bytes.
                // We use 32 bytes for the secret key. The default Rijndael 
                // key length is 256 bit (32 bytes) and then 16 bytes for the 
                // Initialization Vector (IV). The default Rijndael IV length is 
                // 128 bit (16 bytes).
                encryptor = rijndaelCipher.CreateEncryptor(secretKey.GetBytes(32), secretKey.GetBytes(16));
            }

            MemoryStream memoryStream;
            byte[] encryptedData;

            // Create a MemoryStream that is going to hold the encrypted bytes:
            using (memoryStream = new MemoryStream())
            {
                // Create a CryptoStream through which we are going to be processing 
                // our data. CryptoStreamMode.Write means that we are going to be 
                // writing data to the stream and the output will be written in the 
                // MemoryStream we have provided.
                using (var cryptoStream = new CryptoStream(memoryStream, encryptor, CryptoStreamMode.Write))
                {
                    // Start the encryption process.
                    cryptoStream.Write(textData, 0, textData.Length);

                    // Finish encrypting.
                    cryptoStream.FlushFinalBlock();

                    // Convert our encrypted data from a memoryStream into a byte array.
                    encryptedData = memoryStream.ToArray();

                    // Close both streams.
                    memoryStream.Close();
                    cryptoStream.Close();
                }
            }

            // Convert encrypted data into a base64-encoded string.
            // A common mistake would be to use an Encoding class for that.
            // It does not work, because not all byte values can be
            // represented by characters. We are going to be using Base64 encoding.
            // That is designed exactly for what we are trying to do.
            var encryptedText = Convert.ToBase64String(encryptedData);

            // Return encrypted string.
            return encryptedText;
        }

        public string Decrypt(string encryptedText)
        {
            return Decrypt(encryptedText, GetDefaultSalt());
        }

        public string Decrypt(string encryptedText, string salt)
        {
            if (encryptedText == null)
            {
                return null;
            }

            RijndaelManaged rijndaelCipher;
            byte[] encryptedData;
            ICryptoTransform decryptor;

            using (rijndaelCipher = new RijndaelManaged())
            {
                var secretKey = GetSecretKey(salt);

                // First we need to turn the input strings into a byte array.
                encryptedData = Convert.FromBase64String(encryptedText);

                // Create a decryptor from the existing SecretKey bytes.
                decryptor = rijndaelCipher.CreateDecryptor(secretKey.GetBytes(32), secretKey.GetBytes(16));
            }

            MemoryStream memoryStream;
            byte[] unencryptedData;
            int decryptedDataLength;

            using (memoryStream = new MemoryStream(encryptedData))
            {
                // Create a CryptoStream. Always use Read mode for decryption.
                using (var cryptoStream = new CryptoStream(memoryStream, decryptor, CryptoStreamMode.Read))
                {
                    // Since at this point we don't know what the size of decrypted data
                    // will be, allocate the buffer long enough to hold EncryptedData;
                    // DecryptedData is never longer than EncryptedData.
                    unencryptedData = new byte[encryptedData.Length];

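                    // Start decrypting. A single Read call may return fewer bytes
                    // than requested, so keep reading until the stream is exhausted.
                    try
                    {
                        decryptedDataLength = 0;
                        int bytesRead;
                        while ((bytesRead = cryptoStream.Read(unencryptedData, decryptedDataLength, unencryptedData.Length - decryptedDataLength)) > 0)
                        {
                            decryptedDataLength += bytesRead;
                        }
                    }
                    catch
                    {
                        throw new CryptographicException("Unable to decrypt string");
                    }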

                    cryptoStream.Close();
                    memoryStream.Close();
                }
            }

            // Convert decrypted data into a string.
            var decryptedText = Encoding.Unicode.GetString(unencryptedData, 0, decryptedDataLength);

            // Return decrypted string.  
            return decryptedText;
        }

        private PasswordDeriveBytes GetSecretKey(string salt)
        {
            // We are using salt to make it harder to guess our key
            // using a dictionary attack.
            var encodedSalt = Encoding.ASCII.GetBytes(salt);

            var valuePointer = IntPtr.Zero;
            try
            {
                // The Secret Key will be generated from the specified
                // password and salt.
                valuePointer = Marshal.SecureStringToGlobalAllocUnicode(_password);
                return new PasswordDeriveBytes(Marshal.PtrToStringUni(valuePointer), encodedSalt);
            }
            finally
            {
                Marshal.ZeroFreeGlobalAllocUnicode(valuePointer);
            }            
        }

        private string GetDefaultSalt()
        {
            return _password.Length.ToString(CultureInfo.InvariantCulture);
        }
    }
}

This can then be used as follows:

using System;
using System.Security;

namespace Encryption
{
    public static class Program
    {
        public static void Main()
        {
            using (var password = GetPassword())
            {
                var encryptor = new Encryptor(password);

                const string text = "Hello World!";

                var encryptedText = encryptor.Encrypt(text);
                var decryptedText = encryptor.Decrypt(encryptedText);

                Console.WriteLine(encryptedText);
                Console.WriteLine(decryptedText);

                Console.ReadLine();
            }
        }

        private static SecureString GetPassword()
        {
            var password = new SecureString();

            password.AppendChar((char)80); // P
            password.AppendChar((char)97); // a
            password.AppendChar((char)115); // s
            password.AppendChar((char)115); // s
            password.AppendChar((char)119); // w
            password.AppendChar((char)111); // o
            password.AppendChar((char)114); // r
            password.AppendChar((char)100); // d
            password.AppendChar((char)49); // 1

            return password;
        }
    }
}

Note that we store the password using the SecureString class. This makes it harder to read the string from memory while the program is running in the event that the host machine is hacked. Also, as I’m super-paranoid, I write to the SecureString using ASCII values converted to chars to prevent any readable values from appearing in string tables in the compiled code.

When using the Encryptor class you also have the option to add “salt”. If no salt is supplied, a default based on the length of the password is used. Salt is an additional piece of information that is combined with the password when the key is derived, so you must supply the same salt in order to decrypt the data. It is recommended that you use salt wherever possible, as it makes it much harder for a hacker to attack sets of data using precomputed dictionaries of common passwords.
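To illustrate how salt changes the derived key, here is a small sketch. Note that it swaps in Rfc2898DeriveBytes (PBKDF2) rather than the PasswordDeriveBytes class used by the Encryptor above, and that the class name, iteration count and salt values are all assumptions made for the example:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

namespace Encryption
{
    // Hypothetical helper for illustration. It uses Rfc2898DeriveBytes (PBKDF2),
    // not the PasswordDeriveBytes class used by the Encryptor in this post.
    public static class SaltDemo
    {
        // Derives a 32 byte key from the password and salt, returned as base64.
        public static string DeriveKey(string password, string salt)
        {
            // Rfc2898DeriveBytes requires a salt of at least 8 bytes.
            var encodedSalt = Encoding.UTF8.GetBytes(salt);

            // The iteration count is an arbitrary value chosen for this sketch.
            using (var deriveBytes = new Rfc2898DeriveBytes(password, encodedSalt, 10000, HashAlgorithmName.SHA256))
            {
                return Convert.ToBase64String(deriveBytes.GetBytes(32));
            }
        }
    }
}
```

Calling DeriveKey twice with the same password but different salts should give two unrelated keys, which is why data encrypted with one salt cannot be decrypted with another.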

Happy encrypting!

Trigger TeamCity to build and test your project on commits to multiple branches

A while ago I learnt a great trick to get TeamCity to perform the same action for commits to multiple branches, rather than just a specific branch. To do this, simply enter +:* into the Branch specification setting, as follows:

TeamCity Multiple Branches

This option can be found in the Edit VCS Root screen.

Once this is set, you’ll see the branch name next to each build run in the results screen:

TeamCity Multiple Branches Results

Simples.

Help! My .NET application refuses to open more than 2 concurrent internet connections!

A while ago some colleagues and I were writing a .NET application that fired off web-requests to a third party web-API. The details of the application are unimportant, other than that it was a migration tool which sent data from one third party store to another. There was a lot of data to shift (~9TB) and it needed to be done very quickly. So, we ended up writing a multi-threaded application which could read in many pieces of the stored data concurrently, and fire them off to the third party web-API concurrently.

We wrote a beautiful, elegant and totally thread-safe application and all watched in anticipation as we fired it off for the first time, expecting to see data throughputs the likes of which had never been seen before.

Instead the performance was only slightly better than the first version of our application, which just sent data through one piece at a time in a single thread. After a lot of digging, it turned out that .NET applications have a restriction on the number of concurrent connections that can be opened to any one host. By default this number is two. Fortunately, this number can be changed using the configuration file.

Simply add the following to override this value:

<system.net>
    <connectionManagement>
        <add address = "*" maxconnection = "100" />
    </connectionManagement>
</system.net>
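If you would rather not touch the configuration file, the limit can also be raised in code through the ServicePointManager class. The sketch below assumes the .NET Framework web-request stack, and the wrapper class and method names are hypothetical:

```csharp
using System.Net;

namespace ConnectionTestApplication
{
    // Hypothetical wrapper; only ServicePointManager.DefaultConnectionLimit
    // is part of the framework.
    public static class ConnectionLimitConfigurator
    {
        public static void AllowConcurrentConnections(int maximumConnections)
        {
            // Applies to ServicePoint objects created after this call, so it
            // should run before the first web-request is made.
            ServicePointManager.DefaultConnectionLimit = maximumConnections;
        }
    }
}
```

Calling AllowConcurrentConnections(100) at the top of Main should have much the same effect as the configuration entry above.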

To demonstrate this in action, try running the following .NET console application without overriding the configuration file:

using System;
using System.Net;
using System.Threading;

namespace ConnectionTestApplication
{
    public static class Program
    {
        private const int ThreadCount = 100;
        private static volatile int _timesPageGot;
        
        public static void Main()
        {
            var timerThread = new Thread(TimerThreadMethod);

            var getPageThreads = new Thread[ThreadCount];
            for (var i = 0; i < ThreadCount; i++)
            {
                getPageThreads[i] = new Thread(GetPageMethod);
            }

            timerThread.Start();
            for (var i = 0; i < ThreadCount; i++)
            {
                getPageThreads[i].Start();
            }

            Console.ReadLine();
        }

        private static void TimerThreadMethod()
        {
            var seconds = 0;

            while (true)
            {
                Thread.Sleep(1000);
                seconds++;
                Console.WriteLine("Seconds: {0}", seconds);
            }
        }

        private static void GetPageMethod()
        {
            var request = WebRequest.Create("https://en.wikipedia.org/wiki/List_of_law_clerks_of_the_Supreme_Court_of_the_United_States");
            using (var response = request.GetResponse())
            {
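                // Increment atomically, as ++ on a shared field is not thread-safe:
                var timesPageGot = Interlocked.Increment(ref _timesPageGot);
                Console.WriteLine("Page got: {0}", timesPageGot);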
            }
        }
    }
}

Note that this will download what is supposed to be the longest page on Wikipedia (at the time of writing) a whole bunch of times concurrently using threads:

List of law clerks of the Supreme Court of the United States

You’ll get something like this on the screen depending on the speed of your internet connection:

Concurrent Internet Connections Before

Note that it took under 12 seconds to download the page 100 times on my internet connection.

However, with the configuration file override set to allow 100 concurrent connections I get:

Concurrent Internet Connections After

This time it took under 3 seconds.

Result!

Preventing embedded YouTube videos from showing related videos after playing

I was recently made aware of a cool feature in YouTube that prevents the embedded player from automatically playing/showing related videos once it has finished playing. With this piece of advice you need never give competitors the chance to get their videos shown on your page again, and can prevent those inappropriate thumbnails of scantily clad girls from appearing on your oh-so-serious business website.

By default, YouTube will give you a piece of HTML like the following to embed in your site:

<iframe width="420" height="315" src="https://www.youtube.com/embed/Msp0ezv764g" frameborder="0" allowfullscreen></iframe>

Simply add “rel=0” to the query string part of the src attribute and banish those nasty related videos forever. The altered version should look like this:

<iframe width="420" height="315" src="https://www.youtube.com/embed/Msp0ezv764g?rel=0" frameborder="0" allowfullscreen></iframe>

YouTube supports a whole host of other query string parameters that you might find useful. A full list is available on the YouTube Embedded Players and Player Parameters site.

Configure the .NET SmtpClient object using the configuration file

If you’re using the SmtpClient class in the System.Net.Mail namespace of the .NET Framework, there’s an easy way to set the properties of the client without having to wire them up in the code manually.

Simply add the following section to your application configuration file and the class will auto-magically wire up the properties at run-time:

<system.net>
  <mailSettings>
    <smtp deliveryMethod="Network">
      <network enableSsl="true" port="123" host="myhost" defaultCredentials="false" userName="myuser" password="mypassword"/>
    </smtp>
  </mailSettings>
</system.net>

There are a whole bunch more settings you can edit. See Microsoft’s <mailSettings> Element documentation page for more details.
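With that section in place, the calling code can be as small as the following sketch. It assumes a .NET Framework application whose configuration file contains the section above; the namespace, class and method names are made up for illustration:

```csharp
using System.Net.Mail;

namespace MailExample
{
    // Hypothetical helper; only MailMessage and SmtpClient are framework types.
    public static class Mailer
    {
        public static MailMessage BuildMessage(string from, string to, string subject, string body)
        {
            return new MailMessage(from, to, subject, body);
        }

        public static void Send(MailMessage message)
        {
            // The parameterless constructor picks up the host, port and
            // credentials from the <mailSettings> configuration section.
            using (var client = new SmtpClient())
            {
                client.Send(message);
            }
        }
    }
}
```

Note that nothing in the code mentions the host, port or credentials, so they can be changed per environment without recompiling.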

Testing your responsive mobile web-sites

Although there is NO substitute for testing your responsive mobile designs on physical devices, performing this kind of testing is often time consuming as it requires us to push code out to web-servers so we can get to it from our devices. Then of course we spot a bunch of issues, fix them, re-release and test again. This process often repeats a few times before we get it right.

It is possible to reduce the number of iterations by doing more “device like testing” in the browser. There are two aspects to control in order to get your browser to simulate a mobile browser more closely. The first is the user agent, and the second is the screen size.

There are many tools out there to help us alter the user agent. As I mentioned in my blog post entitled Essential FireFox add-ons for web-developers I tend to use the FireFox User Agent Switcher.

That just leaves the screen size. FireFox has a great tool called the Responsive Design View which allows you to set a custom screen size after pressing CTRL+SHIFT+M.

Once you’ve faked the user agent and reduced your screen size, you should be able to find more issues with your responsive designs in your own browser without the hassle of continually pushing code out to your web-server.

Here is a screen shot of FireFox pretending to be an iPhone 3:

Mobile Testing

Text truncation at a word boundary using C#

Following on from my last post on stripping HTML from text using C#, once I had removed all signs of HTML from the incoming text, I was also required to show a short preview of the text. I originally went with a truncation method, as follows:

namespace ExtensionMethods
{
    public static class StringExtensionMethods
    {
        public static string Truncate(this string text, int maximumLength)
        {
            if (string.IsNullOrEmpty(text))
            {
                return text;
            }

            return text.Length <= maximumLength ? text : text.Substring(0, maximumLength);
        }
    }
}

This works, but the results look a little odd if the truncation happens half-way through a word.

Instead, I came up with this method to truncate at the last word break within the allowed number of characters:

using System.Linq;

namespace ExtensionMethods
{
    public static class StringExtensionMethods
    {
        private static readonly char[] Punctuation = {'.', ',', ';', ':'};

        public static string TruncateAtWordBoundary(this string text, int maximumLength)
        {
            if (string.IsNullOrEmpty(text))
            {
                return text;
            }

            if (text.Length <= maximumLength)
            {
                return text;
            }

            // If the character after the cut off is white space or punctuation 
            // then return what we've got using substring:
            var isCutOffWhiteSpaceOrPunctuation = char.IsWhiteSpace(text[maximumLength]) || Punctuation.Contains(text[maximumLength]);
            text = text.Substring(0, maximumLength);

            if (isCutOffWhiteSpaceOrPunctuation)
            {
                return text;
            }

            // Find the last white-space or punctuation and chop off there.
            // If there is none (one long word), keep the hard truncation:
            var lastWhiteSpaceOrPunctuationPosition = -1;
            for (var i = text.Length - 1; i >= 0; i--)
            {
                if (char.IsWhiteSpace(text[i]) || Punctuation.Contains(text[i]))
                {
                    lastWhiteSpaceOrPunctuationPosition = i;
                    break;
                }
            }

            if (lastWhiteSpaceOrPunctuationPosition < 0)
            {
                return text;
            }

            text = text.Substring(0, lastWhiteSpaceOrPunctuationPosition).Trim();

            return text;
        }
    }
}

While not perfect, this approach works a lot better. Please feel free to suggest improvements.
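One possible variation, offered as a sketch rather than a replacement, is to let string.LastIndexOfAny do the backwards search. The class name, method name and the exact set of word-break characters below are my own assumptions, and the behaviour differs slightly from the method above (trailing white-space is always trimmed):

```csharp
namespace ExtensionMethods
{
    // Hypothetical alternative to TruncateAtWordBoundary, for illustration only.
    public static class StringTruncationSketch
    {
        // Assumed set of word breaks: common white-space plus the punctuation used above.
        private static readonly char[] WordBreaks = { ' ', '\t', '\r', '\n', '.', ',', ';', ':' };

        public static string TruncateAtLastWordBreak(this string text, int maximumLength)
        {
            if (string.IsNullOrEmpty(text) || text.Length <= maximumLength)
            {
                return text;
            }

            // Search backwards from the character just after the cut off, so a cut
            // that already falls on a word break keeps every allowed character.
            var breakPosition = text.LastIndexOfAny(WordBreaks, maximumLength);

            // With no word break at all, fall back to a hard cut.
            return breakPosition < 0
                ? text.Substring(0, maximumLength)
                : text.Substring(0, breakPosition).TrimEnd();
        }
    }
}
```

For example, "The quick brown fox".TruncateAtLastWordBreak(12) returns "The quick", while a single long word with no breaks is simply cut at the maximum length.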

Stripping HTML from text using C#

I recently had a situation where I needed to show some text received in HTML format as plain text. This is the method I now use for this purpose, implemented as an extension method:

using System.Linq;
using System.Text.RegularExpressions;

namespace ExtensionMethods
{
    public static class StringExtensionMethods
    {
        public static string StripHtml(this string text)
        {
            if (string.IsNullOrEmpty(text))
            {
                return text;
            }

            var tagRegex = new Regex(@"(?></?\w+)(?>(?:[^>'""]+|'[^']*'|""[^""]*"")*)>");
            var tagMatches = tagRegex.Matches(text);

            var commentRegex = new Regex(@"\<![ \r\n\t]*(--([^\-]|[\r\n]|-[^\-])*--[ \r\n\t]*)\>");
            var commentMatches = commentRegex.Matches(text);

            // Replace each tag match with an empty space:
            text = tagMatches.Cast<object>().Aggregate(text, (current, match) => current.Replace(match.ToString(), " "));

            // Replace each comment with an empty string:
            text = commentMatches.Cast<object>()
                .Aggregate(text, (current, match) => current.Replace(match.ToString(), string.Empty));

            // We also need to replace &nbsp; as this can mess up the system:
            text = text.Replace("&nbsp;", " ");

            // Trim and remove all double spaces:
            text = text.Trim().RemoveDoubleSpaces();

            return text;
        }

        public static string RemoveDoubleSpaces(this string text)
        {
            if (string.IsNullOrEmpty(text))
            {
                return text;
            }

            // Condense all double spaces to a single space:
            while (text.Contains("  "))
            {
                text = text.Replace("  ", " ");
            }

            return text;
        }
    }
}

The method RemoveDoubleSpaces was also needed, since after replacing HTML elements with empty space it is possible to end up with multiple empty spaces where a single space would do. This is quite a useful method in its own right, hence separating it out.
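If the loop-and-replace approach ever becomes a bottleneck on long inputs, a single regular expression pass should give the same result for runs of ordinary spaces. This is only a sketch, and the class and method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

namespace ExtensionMethods
{
    // Hypothetical alternative to RemoveDoubleSpaces, for illustration only.
    public static class WhitespaceSketch
    {
        // Matches any run of two or more space characters.
        private static readonly Regex RepeatedSpaces = new Regex(" {2,}");

        public static string CollapseSpaces(this string text)
        {
            if (string.IsNullOrEmpty(text))
            {
                return text;
            }

            // Replace every run of spaces with a single space in one pass.
            return RepeatedSpaces.Replace(text, " ");
        }
    }
}
```

As with RemoveDoubleSpaces, only the space character is condensed; tabs and line breaks are left untouched.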

If you find any inputs which trip this method up, please let me know.

Using Visual Studio to run web-sites locally using real domains

Visual Studio is great for developing and testing web-sites, but by default it runs them under a localhost address using a randomly generated port number, as given in the property page for the web-site:

Visual Studio Real Domains 01

This is fine, but sometimes it’s useful to have sites run at a location that looks more like a real internet URL, especially if you plan on including functionality that manipulates the URL for redirects etc… I’ve also encountered issues creating cookies using the default http://localhost:xxxx address.

Fortunately, there’s a hosts file trick you can use to make Visual Studio run sites under more realistic URLs. First, map the host name you want to use to 127.0.0.1 in your hosts file, for example:

127.0.0.1 www.testing.com

Note though that if the host name you are using points to a real web-site on the internet, you won’t be able to access that site again until you remove the entry from your hosts file.

Then edit the property pages so that you’re using the Visual Studio Development Server, a specific port (rather than an auto-assigned port) and a custom start URL containing your chosen port number and test URL, for example:

http://www.testing.com:1455

Your property pages should look like this:

Visual Studio Real Domains 02

Then, when you run your web-site using Visual Studio, it will use http://www.testing.com:1455 as the base URL. All paths relative to that should work as expected:

Visual Studio Real Domains 03