[C#] Modify strings in intern pool

Today, I’m going to use three advanced techniques to create an awesome illusion. I’ll create a console application that writes Hello World! to the console, but H3110 W0r1d! will actually be printed.

The source code says "Hello World!" but "H3110 W0r1d!" is printed

Neat, huh? OK, I admit it’s useless; this is just a proof of concept.

String interning

String interning is an optimization done by the C# compiler and the CLR.

The idea is simple: when several literal strings are equal, the different handles will point to only one instance. This saves memory and makes string comparison more efficient.

For instance, if you write:

string a = "Test";
...
string b = "Test";

even in a different class or assembly, then

ReferenceEquals(a, b)

will be true.
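To see this for yourself, here is a minimal sketch as a complete program. The class and method names are mine, picked for illustration; the third string is built at run time to show that only literals are pooled automatically.

```csharp
using System;

static class InterningDemo
{
    // Illustrative helper: true when both references point to the same instance.
    public static bool SameInstance(string x, string y) => object.ReferenceEquals(x, y);

    static void Main()
    {
        string a = "Test";
        string b = "Test";

        // Built at run time, so it is a separate instance with equal content.
        string c = new string(new[] { 'T', 'e', 's', 't' });

        Console.WriteLine(SameInstance(a, b)); // True: both literals share one instance
        Console.WriteLine(SameInstance(a, c)); // False: c is a different instance
        Console.WriteLine(a == c);             // True: the content is still equal
    }
}
```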

You can force the CLR to put a string in the string intern pool with:

str = string.Intern(str);

And you can check whether a string is in the intern pool with:

string.IsInterned(str) != null

String interning is done at the domain level, which means that every assembly loaded in the domain will share the same string intern pool.
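The two calls above can be tried together in a small program. This is only a sketch: IsPooled is a helper name I made up, and the GUID is just a convenient way to get a string that has never been interned.

```csharp
using System;
using System.Text;

static class InternPoolDemo
{
    // Illustrative helper: true when an equal string is already in the pool.
    public static bool IsPooled(string s) => string.IsInterned(s) != null;

    static void Main()
    {
        string literal = "Test";

        // Built at run time: same content as the literal, but a separate instance.
        string built = new StringBuilder().Append("Te").Append("st").ToString();
        Console.WriteLine(object.ReferenceEquals(built, literal));  // False

        // Intern returns the instance already in the pool, here the literal.
        string pooled = string.Intern(built);
        Console.WriteLine(object.ReferenceEquals(pooled, literal)); // True

        // IsInterned looks up by content: an equal string is pooled,
        // whereas a fresh GUID string has never been interned.
        Console.WriteLine(IsPooled(built));
        Console.WriteLine(IsPooled(Guid.NewGuid().ToString()));
    }
}
```

Note that string.IsInterned returns the pooled instance rather than a bool, which is why the result is compared with null.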

Modify strings with unsafe code

Strings are supposed to be immutable in C#. But with unsafe code you can get a pointer to the characters and modify them.

string a = "Test";

fixed (char* p = a)
{
    p[1] = '3';
}

Console.WriteLine(a);

This will print:

T3st

Now if we combine that with what we learned on string interning:

string a = "Test";
string b = "Test";

fixed (char* p = a)
{
    p[1] = '3';
}

Console.WriteLine(b);

still prints

T3st

even though the handle b wasn’t used to modify the string.

Since the string intern pool is shared between all assemblies in a domain, we could modify a string that is used by another assembly :-)
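The same pointer trick can rewrite a whole string. Here is a minimal sketch, assuming the character mapping e→3, l→1, o→0 (which is what the output shown at the top of this article suggests). LeetifyInPlace is a hypothetical helper, not necessarily what the demo project does, and the code must be compiled with unsafe blocks enabled.

```csharp
using System;

static class LeetSketch
{
    // Hypothetical helper: overwrites the characters of 'text' in place.
    // The mapping below is an assumption made for illustration.
    public static unsafe void LeetifyInPlace(string text)
    {
        fixed (char* p = text)
        {
            for (int i = 0; i < text.Length; i++)
            {
                switch (p[i])
                {
                    case 'e': p[i] = '3'; break;
                    case 'l': p[i] = '1'; break;
                    case 'o': p[i] = '0'; break;
                }
            }
        }
    }

    static void Main()
    {
        // Work on a copy built at run time, so that no literal is modified here.
        string text = new string("Hello World!".ToCharArray());

        LeetifyInPlace(text);
        Console.WriteLine(text); // H3110 W0r1d!
    }
}
```

Passing a literal instead of the copy would change every use of that literal in the domain, which is exactly the effect this article is after.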

Enumerate string literals with meta-data

.NET uses meta-data to store the information about .NET assemblies. This meta-data can be read through the COM interface IMetaDataImport. Unfortunately, this feature is not wrapped in .NET classes, so we need to do the COM interop ourselves.

Here is the COM interface that we can use to enumerate the literal strings of an assembly, called “user strings” in this context:

[ComImport, Guid("7DAC8207-D3AE-4C75-9B67-92801A497D44")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IMetaDataImport
{
    void CloseEnum(IntPtr hEnum);

    // SizeParamIndex is the zero-based index of the parameter holding the array size (cchString)
    uint GetUserString(uint stk, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2)] char[] szString, uint cchString, out uint pchString);

    // Same here: cmax is the parameter at index 2
    uint EnumUserStrings(ref IntPtr phEnum, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2)] uint[] rStrings, uint cmax, out uint pcStrings);

    // The interface also contains 62 other methods, omitted here; a working
    // declaration must list all of them in vtable order
}

Get an instance of IMetaDataImport

To get an instance of this interface, we go through another COM interface, IMetaDataDispenser:

[ComImport, Guid("809C652E-7396-11D2-9771-00A0C9B4D50C")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
[CoClass(typeof(CorMetaDataDispenser))]
interface IMetaDataDispenser
{
    uint OpenScope([MarshalAs(UnmanagedType.LPWStr)]string szScope, uint dwOpenFlags, ref Guid riid, [MarshalAs(UnmanagedType.Interface)] out object ppIUnk);

    // interface also contains 2 irrelevant methods
}

[ComImport, Guid("E5CB7A31-7512-11D2-89CE-0080C792E5D8")]
class CorMetaDataDispenser
{
}

Now, to get an IMetaDataImport, we just need:

var dispenser = new IMetaDataDispenser();
var metaDataImportGuid = new Guid("7DAC8207-D3AE-4C75-9B67-92801A497D44");

object scope;
var hr = dispenser.OpenScope(location, 0, ref metaDataImportGuid, out scope);

metaDataImport = (IMetaDataImport)scope;    

Here, location is the path to the assembly file.

Enumerate user strings

Once you have an instance of IMetaDataImport, enumerating user strings is easy and takes two steps:

  1. Enumerate the user string ids
  2. Get the string from the id

Here is the first step:

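IEnumerable<uint> EnumerateUserStringIds()
{
    var hEnum = IntPtr.Zero;
    var buffer = new uint[100];
    uint count;

    try
    {
        while (true)
        {
            // Each call returns the next batch of ids (at most buffer.Length)
            metaDataImport.EnumUserStrings(ref hEnum, buffer, (uint)buffer.Length, out count);

            // An empty batch means the enumeration is complete
            if (count == 0)
                yield break;

            for (var i = 0; i < count; i++)
                yield return buffer[i];
        }
    }
    finally
    {
        // Release the enumerator, even if the caller stops iterating early
        metaDataImport.CloseEnum(hEnum);
    }
}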

And step two:

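string GetUserString(uint id)
{
    // First call: only ask for the length of the string, in characters
    uint length;
    metaDataImport.GetUserString(id, null, 0, out length);

    // Second call: fill a buffer of exactly that size
    var buffer = new char[length];
    metaDataImport.GetUserString(id, buffer, length, out length);

    // Build the string from the characters actually written
    return new string(buffer, 0, (int)length);
}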
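If you don’t want to declare the COM interfaces yourself, the same heap can be read with the managed System.Reflection.Metadata library. To be clear, this is a swapped-in technique and not what the demo project does. The sketch below assumes that the library is available (it ships with recent .NET versions and exists as a NuGet package for the .NET Framework); UserStringDump and ReadUserStrings are names I made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;
using System.Reflection.PortableExecutable;

static class UserStringDump
{
    // Illustrative helper: returns the non-empty user strings of the assembly at 'path'.
    public static List<string> ReadUserStrings(string path)
    {
        var result = new List<string>();

        using (var stream = File.OpenRead(path))
        using (var peReader = new PEReader(stream))
        {
            MetadataReader reader = peReader.GetMetadataReader();

            // Start at offset 0 of the user string heap and walk it handle by handle.
            UserStringHandle handle = MetadataTokens.UserStringHandle(0);
            do
            {
                string value = reader.GetUserString(handle);
                if (value.Length > 0)
                    result.Add(value);

                handle = reader.GetNextHandle(handle);
            }
            while (!handle.IsNil);
        }

        return result;
    }

    static void Main()
    {
        // Assumption: the program runs from a regular assembly file on disk
        // (Location is empty for single-file deployments).
        string path = typeof(UserStringDump).Assembly.Location;

        Console.WriteLine("User strings found:");
        foreach (string s in ReadUserStrings(path))
            Console.WriteLine(s);
    }
}
```

This only lists the strings stored in the file; to modify them at run time you still need the interned instances, as described earlier.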

Putting it all together

Please have a look at the demo project on GitHub: https://github.com/bblanchon/CsharpLeetSpeak

The process described in this article is implemented in the LeetSpeak project.

The console application is as simple as:

static void Main()
{
    LeetSpeakTranslator.LeetifyAssembly(Assembly.GetExecutingAssembly());
    Console.WriteLine("Hello World!");
}

And the output is:

H3110 W0r1d!

Conclusion

I’m not sure it’s going to be useful for anyone, but it was a great opportunity to talk about advanced topics in .NET.

There may be a potential security issue with the fact that one assembly can modify a string used in another assembly, but I don’t think it’s more harmful than what you can achieve through reflection.

Also, I tried to “leetify” a WPF application and a WinForms application, but it didn’t work.
