Main Points

I've been delving much more seriously into programming lately. Now that I have a design for Earth Chronicle put together, I'm pushing a lot of content into webpages to get everything launched. However, a number of files I created, like the Site Index, Image Index, and Table of Contents, were coded by hand for the relatively small number of pages that I had. As we expand, that is not even remotely an option, so I've been hiding away at the public library and building a programming library of my own from Amazon to learn more about best practices.

My favorite book so far would have to be Steve McConnell's Code Complete 2. This is everything I wanted and needed in terms of how to program. I love that he explains how to program correctly and why. Moreover, McConnell does something rare in the average programming text: he cites numerous books and scientific studies which support his qualitative assertions. This is not your typical book put together by a great code jockey; this is the masterpiece of how programming is done. The fact that it's beautifully written and clearly understandable is almost a side note.

A fantastic surprise I found on Amazon was Christian Darie and Jaimie Sirovich's Search Engine Optimization with ASP.NET. I purchased it as a book of tangential interest for optimizing the website more professionally, and it certainly has all the information necessary to do that. However, I've also been starving for C# programming samples like the ones it contains. I've slogged through the MSDN when only vampires are out, reading until my eyes bleed. I've done all the walkthroughs, which provide so much information, though I'm beginning to see how almost none of them were done remotely professionally. I've completed my education in what .NET can do and I'm ready to see how to really make it work. I want to see n-tier development examples, I want to see applications built from class files inside the App_Code folder, and I want to build a Data Access Layer (DAL) to handle database work separately from the application logic. But those techniques require multiple files, extended descriptions, and a lot of patience to write up. Microsoft has very little information like that, and I've found almost nothing on the internet. Well, Search Engine Optimization with ASP.NET tackles each of these three best practices in its examples and provides more besides. While I don't know what Steve McConnell would say about Christian building his application entirely out of public static functions, Christian is light years ahead of any other demonstration I've seen in writing working C# code you can compile and incorporate into your own website. I owe Christian and Steve immeasurably for showing me how to put this test together.

Compiling a Working Class

Our requirement for the Link Factory is to create absolute links from any page in the website to any other page in the website. Ultimately, we'll need to continue this test by building a data access layer (DAL), which has its own page. However, in planning this project, it became clear this would be too much to test intelligently at one time. That's an excellent cue that this project is too big to implement as one class, so we're implementing the Link Factory in two phases. This page will create and test the business logic of the Link Factory itself. Then we'll go after the data access layer to support it. Why this order? If we create the DAL first, how do we check its output? We'd have to create an output mechanism just to test it. Conveniently, the business object is the best possible real world test of whether the DAL output is correct. By contrast, the output of the business object can be checked by doing a view source on the webpage, and we can create some input easily by hard coding some data as if it were coming from a DAL. Then we do testing, testing, testing, until that output meets the proper W3C specifications of a well formed absolute link. A simple and very real world test.

The goal of every programmer is to construct elegant, efficient code that does what it's supposed to. The secret goal of every programmer trying to get there is to first get the damn thing to compile. That'd be nice. Here's what we got up and running, then we'll break it down.

using System;
using System.Configuration;
using System.Text.RegularExpressions;
using System.Web;

/// <summary>
/// Summary description for BetaTesting2LinkFactory
///
/// BetaTesting2LinkFactory is a testing class for creating absolute links with golden keywords embedded in the URLs for SEO optimization. It can be called throughout the Earth Chronicle application.
///
/// </summary>
public static class BetaTesting2LinkFactory
{
.
.
.

}

At the top of the file are the necessary using statements and the summary. Properly commented code is the only way anyone will ever know what you're doing, including you when you revisit a file a couple years after you wrote it. This starts with the summary of why you're writing the class and what it does. Next we define the entire class.

public static class BetaTesting2LinkFactory
{

// Remove all punctuation from URLs to protect against possible errors
private static string stripPunctuation(string possiblyInvalidUrl)
{
.
.
.
}


// Build SEO optimized absolute links from anywhere in the application
public static string insertLink(string pageName)
{
.
.
.
}

}

Note that everything is tagged with the keyword static. Once the class is defined as static, every member must be declared static as well. When I forgot to include the static keyword on one variable declaration, the whole application refused to compile. A static member belongs to the type itself rather than to any one object created from it; by contrast, non-static members belong to individual instances of the type. A static class goes one step further: it cannot be instantiated at all and can contain only static members. Exactly why I'm using it here, I'm not entirely sure. Steve McConnell doesn't address technical details at that low a level, Jesse Liberty says it's "magic" and he'll explain later (but then doesn't), while Christian Darie says to do this in one of the few moments where he forgets to explain why. My rationale here is basically that Christian did it, so I'm building this as a static class too. I believe it's probably for performance reasons that I'll come to appreciate later.

I've also defined the class as public so that I can access it throughout the application, and the insertLink() method I want to use is defined as public too. However, the helper method that removes any punctuation accidentally included in the URL is defined as private, so it's only accessible to methods within the class. Finally, the class keyword declares that, yes, BetaTesting2LinkFactory is a class.

Scratch that. It may not be listed in the index, but Jesse Liberty's Programming C# does sneak in some discussion of static classes. C# doesn't allow global methods. However, because global methods are so useful, C# uses static methods of static classes to provide the same functionality in a safer, more object oriented way. Therefore, Christian Darie chose to make this a static class so he doesn't have to (in fact, can't) create an instance of the link factory; he can just use it.

[chroniclemaster1, 2009/10/10]
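To make the static versus instance distinction concrete, here is a minimal sketch. Both classes and all of their members are hypothetical names I made up for illustration; neither is part of the Link Factory.

```csharp
using System;

// Hypothetical instance class: every object created with "new" carries its own state.
public class PageCounter
{
    private int hits; // one copy of this field per PageCounter object

    public int RecordHit()
    {
        hits++;
        return hits;
    }
}

// Hypothetical static class: it cannot be instantiated, and every member
// must be static, so its methods are called on the type name itself.
public static class UrlHelper
{
    public static string AppendSlash(string path)
    {
        return path.EndsWith("/", StringComparison.Ordinal) ? path : path + "/";
    }
}
```

With these definitions, new PageCounter().RecordHit() works on one particular object, while UrlHelper.AppendSlash("Testing") is called with no object at all. Writing new UrlHelper() would be rejected by the compiler.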

Of course, my methods can't take the class keyword; in that position they declare the type of their output instead. If they provided no output but simply ran code, they'd take the void keyword. However, both of my methods return strings, so they take the string keyword. Then I define the names of the methods, insertLink() and stripPunctuation(). Inside the parentheses, I'll be passing parameters, so I need to declare the variables for them. For insertLink() my variable is pageName, which is a string, so it's declared insertLink(string pageName). For stripPunctuation() my variable is possiblyInvalidUrl, which is a string, so it's declared stripPunctuation(string possiblyInvalidUrl). I've also included comments to explain what each method does. Now that my structure is defined, I can get busy coding.

// Remove all punctuation from URLs to protect against possible errors
private static Regex invalidUrlCharacters = new Regex("[^/a-zA-Z0-9]", RegexOptions.Compiled);
private static string validUrl;

private static string stripPunctuation(string possiblyInvalidUrl)
{
.
.
.
}

First, let's look at my private helper method, stripPunctuation(). Christian Darie includes a very complicated link building routine that includes lots of problematic content, and so he needs to build a robust regex-based method to clean every link. My application is going to use links much more extensively in a portal-like application; there's no way it can survive the kind of complicated multi-parameter link building that he does. Earth Chronicle requires a much simpler method and that's why I have to build a database to effect the full functionality that I want. However, that also means the regex requirements are merely a safety issue for situations where someone accidentally includes punctuation in the page name. Therefore, this implementation is much lighter. You will note that contrary to most of my examples, I use extra variables for better semantic clarity; instead of working with a variable named url, I operate on a variable named possiblyInvalidUrl which is "fixed" and passed into the variable validUrl.

I begin by declaring two variables. Since they are members of a static class, both must be declared static, and I've made them private because only this class needs them. My first variable is the set of invalid characters that I won't accept in my page names. It is easiest to declare this as a Regex, which I've named invalidUrlCharacters. I then specify the exact set of characters I don't want. Note that this is a problem down the line, because I want the website to be truly multilingual. At present, I don't have the capacity to do that, so I'm going to move ahead until I can research the set of UTF-8 characters which are invalid for use in URLs. Then this statement can be updated appropriately. For now, I'm using the caret, ^, to specify "anything except" the slash and English alphanumeric characters, giving the character class [^/a-zA-Z0-9]. Since I've isolated this issue to this one location, I should be safe, and this will be a simple change when I have the revised set. I also declare a second variable, a string named validUrl. This is the variable that receives the output after the variable possiblyInvalidUrl is cleaned.
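If you want to see what the negated character class actually matches, a small check like the following may help. It reuses the pattern from the article; the class name CharacterClassDemo and the method ContainsInvalidCharacters() are hypothetical helpers added only for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // Same pattern as the article: match any single character that is
    // NOT a slash, an ASCII letter, or a digit.
    private static readonly Regex InvalidUrlCharacters =
        new Regex("[^/a-zA-Z0-9]", RegexOptions.Compiled);

    // Hypothetical helper: true when the candidate holds at least one
    // character the pattern treats as invalid.
    public static bool ContainsInvalidCharacters(string candidate)
    {
        return InvalidUrlCharacters.IsMatch(candidate);
    }
}
```

ContainsInvalidCharacters("Testing/AspNet") is false, while "C# Dev!" comes back true because of the hash, the space, and the exclamation mark. A name such as "Überblick" also comes back true, since a-z and A-Z only cover unaccented English letters, which is exactly the multilingual limitation mentioned above.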

// Remove all punctuation from URLs to protect against possible errors
private static string stripPunctuation(string possiblyInvalidUrl)
{
validUrl = invalidUrlCharacters.Replace(possiblyInvalidUrl, "");
return validUrl;
}

The method stripPunctuation() declares the final variable, which is the method's single input parameter, a string named possiblyInvalidUrl. Now we're ready to get some work done. Because of the well-defined OOP principles we've used, this is a classic short piece of code. We define validUrl as the result of running possiblyInvalidUrl through the regex. Note that we're calling the Replace() method of the regex we created and passing it two parameters: possiblyInvalidUrl to process, and an empty string. This tells the regex to replace each invalid character it finds in possiblyInvalidUrl with an empty string, i.e. it removes them. Finally, we return validUrl as the output of the method. This is how we protect against any unsafe punctuation accidentally making it through to the link.
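One thing worth considering: a static field is shared by every request the web application serves, so two requests running at the same moment could in principle overwrite each other's validUrl between the assignment and the return. A possible variant, sketched here under the assumption that the intermediate result is not needed anywhere else, keeps it in a local variable instead. The class name LocalVariableLinkCleaner is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class LocalVariableLinkCleaner
{
    // The Regex can stay a shared static field: it is never modified after
    // construction, and its matching methods are safe to call concurrently.
    private static readonly Regex InvalidUrlCharacters =
        new Regex("[^/a-zA-Z0-9]", RegexOptions.Compiled);

    public static string StripPunctuation(string possiblyInvalidUrl)
    {
        // Local variable: each call gets its own copy, so simultaneous
        // requests cannot see or overwrite one another's result.
        string validUrl = InvalidUrlCharacters.Replace(possiblyInvalidUrl, "");
        return validUrl;
    }
}
```

The behavior for a single caller is the same as the version above; only the lifetime of validUrl changes.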

// Build SEO optimized absolute links from anywhere in the application
public static string insertLink(string pageName)
{
// Grab the elements needed to build the link
string goldenKeywords = "";
string website = "";
string folder = "";

// Purify the linkWithAbsoluteUrl components
website = stripPunctuation(website);
folder = stripPunctuation(folder);
pageName = stripPunctuation(pageName);

// Create and insert the link tag
string linkWithAbsoluteUrl = String.Format("<a title='{0}' href='{1}{2}{3}'>", goldenKeywords, website, folder, pageName);
return HttpUtility.UrlPathEncode(linkWithAbsoluteUrl);
}

First comes the section that hints at the future. In the live version, the first thing I'll need to do is hit the database with the pageName variable and retrieve all the information I need to build the link. For now, without the DB, I'm serving the same purpose by hard coding the results here, to test that the BetaTesting2LinkFactory works. We want to test at every step along the way, and this lets us defer all the DB stuff until later.
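One way to keep the hard-coded data out of the method body while the database doesn't exist yet is a small in-memory lookup that stands in for the DAL. This is only a sketch of that idea, not how the real data access layer works: PageLinkInfo, StubLinkData, and Find() are all hypothetical names, and the sample values are the test values used later on this page.

```csharp
using System.Collections.Generic;

// Hypothetical container for what one database row might supply for a page.
public class PageLinkInfo
{
    public string GoldenKeywords;
    public string Website;
    public string Folder;
}

// Hypothetical stand-in for the DAL: a dictionary keyed by page name.
public static class StubLinkData
{
    private static readonly Dictionary<string, PageLinkInfo> Pages =
        new Dictionary<string, PageLinkInfo>
        {
            {
                "CSharpDev.aspx",
                new PageLinkInfo
                {
                    GoldenKeywords = "lots of golden keywords",
                    Website = "http://beta.earthchronicle.com/ECBeta/",
                    Folder = "Testing/AspNet/CSharpProgramming/"
                }
            }
        };

    // Returns the stored link data, or null when the page name is unknown.
    public static PageLinkInfo Find(string pageName)
    {
        PageLinkInfo info;
        return Pages.TryGetValue(pageName, out info) ? info : null;
    }
}
```

Swapping the stub for real database code later would then only touch the body of Find(), assuming the calling code depends on nothing else.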

Next we remove the punctuation from all the variables that will be incorporated into the URL. This is where our helper method stripPunctuation() comes in. Finally, we construct the output for the link. I create a new string variable, linkWithAbsoluteUrl, using the Format() method of the String object. Then I've built the literal text to display to the page, specifying where each variable will go in the final output. Last but not least, I return the value so it can be output to the page.

Shockingly, this compiles. ;) So let's look at phase 2, making the code produce the output we want.

So let's get a test going: does this link to the C# Programming Page? It didn't work right off. First, my Regex removed the dot, ., in "CSharpDev.aspx", oops. That was easy to fix, though it threw a couple of application errors when I made the mistake of treating it like a normal Regex and escaped the dot, ".", as "\.". Inside a bracketed character class, [], the dot loses its special function and is treated as a literal, so the backslash isn't needed. (The errors most likely came from C# rather than from the regex: "\." is not a valid escape sequence in an ordinary C# string literal.) Once I removed the backslash, my regex worked fine. The pattern now reads [^/.a-zA-Z0-9], which allows both the dot and the slash.
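Here is a small sketch of the two spellings that do work, for comparison. The class DotEscapeDemo and its method names are hypothetical; the first pattern is the one from this page, and the second shows the verbatim string syntax C# offers if you ever do need a backslash in a pattern.

```csharp
using System.Text.RegularExpressions;

public static class DotEscapeDemo
{
    // Inside [...] an unescaped dot is just a literal dot.
    private static readonly Regex PlainDot = new Regex("[^/.a-zA-Z0-9]");

    // A verbatim string (@"...") passes the backslash through to the regex
    // engine untouched. Writing "\." in an ordinary string literal instead
    // is rejected by the compiler as an unrecognized escape sequence.
    private static readonly Regex EscapedDot = new Regex(@"[^/\.a-zA-Z0-9]");

    public static string CleanWithPlainDot(string value)
    {
        return PlainDot.Replace(value, "");
    }

    public static string CleanWithEscapedDot(string value)
    {
        return EscapedDot.Replace(value, "");
    }
}
```

Both methods treat the dot as an allowed character, so either one leaves "CSharpDev.aspx" intact while still removing spaces and other punctuation.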

Next, the call to HttpUtility.UrlPathEncode is adding "%20" at each space. Christian Darie used this in his Link Factory, but he was only creating the path itself. I also want to construct the title attribute for optimal SEO, so I'm generating the entire opening <a> tag, spaces and all. I figured that this would probably blow up, but I was interested to see how it happened before I changed anything. Now that I know what UrlPathEncode does, I've removed it. The method now ends with return linkWithAbsoluteUrl; and the spaces are no longer replaced with "%20".
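If you still want encoding, one option is to encode each piece separately instead of the whole tag. The sketch below is an assumption about how that could look, not what the Link Factory on this page does: EncodedLinkSketch and BuildOpeningTag() are hypothetical names, it uses double-quoted attributes, and it needs a reference to System.Web.

```csharp
using System;
using System.Web;

public static class EncodedLinkSketch
{
    // Hypothetical helper: encodes the title and the path on their own,
    // then assembles the opening tag around them.
    public static string BuildOpeningTag(string title, string website, string path)
    {
        // Escapes characters such as double quotes and ampersands so the
        // title cannot break out of its attribute.
        string safeTitle = HttpUtility.HtmlAttributeEncode(title);

        // Applied to the path only, so spaces inside the path become %20
        // while the spaces in the rest of the tag are left alone.
        string safePath = HttpUtility.UrlPathEncode(path);

        return String.Format("<a title=\"{0}\" href=\"{1}{2}\">", safeTitle, website, safePath);
    }
}
```

Called with a title of "golden keywords" and a path of "Testing/My Page.aspx", the title keeps its space and only the path picks up the %20.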

Also, the path was not coming out correctly. It looked like href="http://localhost:1204/full path/http//localhost1204/fullpath/CSharpDev.aspx". The colons, :, in the webhost name were being removed, so it didn't recognize this as an absolute path; it was merely appending it to the current path like a relative URL. Once I deleted the line of code which runs the website variable through the stripPunctuation() method, my Link Factory worked fine. Note that in final testing, I've chosen to replace the website with the version which will run on the live website, rather than what will run on my dev box. [chroniclemaster1, 2009/11/18] My regex only changed in a couple characters that we've already discussed, so my finished insertLink() method looks like this.
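The effect of stripping the website value can be checked outside the browser as well. This is a rough sketch using System.Uri with the original pattern from earlier on this page; AbsoluteUrlCheck, Strip(), and IsAbsoluteHttpUrl() are hypothetical names used only for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AbsoluteUrlCheck
{
    // The original pattern, before the dot was added to the allowed set.
    private static readonly Regex InvalidUrlCharacters =
        new Regex("[^/a-zA-Z0-9]", RegexOptions.Compiled);

    public static string Strip(string value)
    {
        return InvalidUrlCharacters.Replace(value, "");
    }

    // True only when the text parses as an absolute http or https URL.
    public static bool IsAbsoluteHttpUrl(string candidate)
    {
        Uri parsed;
        return Uri.TryCreate(candidate, UriKind.Absolute, out parsed)
            && (parsed.Scheme == Uri.UriSchemeHttp || parsed.Scheme == Uri.UriSchemeHttps);
    }
}
```

Strip("http://localhost:1204/fullpath/") gives "http//localhost1204/fullpath/". Without the colon after "http" there is no scheme left, so IsAbsoluteHttpUrl() returns false for the stripped value and true for the original, which matches the browser treating the stripped value as a relative path.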

// Build SEO optimized absolute links from anywhere in the application
public static string insertLink(string pageName)
{
// Grab the elements needed to build the link
string goldenKeywords = "lots of golden keywords";
string website = "http://beta.earthchronicle.com/ECBeta/";
string folder = "Testing/AspNet/CSharpProgramming/";

// Purify the absoluteUrl components
folder = stripPunctuation(folder);
pageName = stripPunctuation(pageName);

// Create and insert the link tag
string linkWithAbsoluteUrl = String.Format("<a title='{0}' href='{1}{2}{3}'>", goldenKeywords, website, folder, pageName);
return linkWithAbsoluteUrl;
}

Link Factory Implementation

Since my Link Factory is working, here's the code I'm using to take advantage of it in my webpages.

<%= BetaTesting2LinkFactory.insertLink("CSharpDev.aspx") %>C# Programming Page</a>

Now that I've got my factory built, this is what I'm using in place of a standard link. The link text and the closing link element are coded in the page as normal. I'm still considering whether to build them as part of the link output in production. At present, this approach gives me greater control over the output with minimal interference, and it leaves the link text readable in the page. In place of the opening link tag, I use an inline expression <%= ...blahblahblah... %> to process the link server side. Inside it I call the insertLink() method of my BetaTesting2LinkFactory class and tell it what page to link to. For the moment, I've hard coded the golden keywords and all of the path information into the variables where the database code will go. This allows me to test as closely to real world implementation as I can; that's a critical best practice in coding and testing. Here's the XHTML output. Note that there are no additional spaces or anything else between the opening link tag and the link text: "CSharpDev.aspx'>C# Programming". Very nice, very clean.

<a title='lots of golden keywords' href='http://beta.earthchronicle.com/ECBeta/Testing/AspNet/CSharpProgramming/CSharpDev.aspx'>C# Programming Page</a>

To clean things up a little, I also went back to my App_Code folder. Classes in App_Code don't have to be located anyplace in particular; as long as a class file is in App_Code itself or in a subfolder of App_Code, everything works fine. Still, organization is key for all programming. Therefore I've modeled my App_Code folder structure on the beta website to mimic the file locations of the webpage I'm testing it against. Now I just need to build my database code, and you're welcome to come along.

This concludes the current test. The Link Factory works brilliantly, and I'm really proud of it. The next phase is to expand it into a truly working class. In order to do that, we need to hit the database so we can populate the variables based on the input parameter of insertLink(), so feel free to join us in the database testing section and check out our data access layer. I've also had the chance to start up the next test of my LinkFactory class, and the new version naturally conflicts with this one. I realized this when I copied the file, but let it run just to see the fireworks; the app throws an error on compilation, specifically complaining about the two identically named classes. So I've renamed each Link Factory class to describe the testing we're doing, building the name of the test into the class name, which is where BetaTesting2LinkFactory comes from. [chroniclemaster1, 2009/10/12]
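Renaming the class is what I did, but C# also offers namespaces as a way to let two classes share a name. The following is only a sketch of that alternative; the namespaces EarthChronicle.BetaTesting1 and EarthChronicle.BetaTesting2, and the Describe() method, are hypothetical and do not exist in my application.

```csharp
using System;

// Hypothetical namespace for the first round of testing.
namespace EarthChronicle.BetaTesting1
{
    public static class LinkFactory
    {
        public static string Describe()
        {
            return "first test version";
        }
    }
}

// Hypothetical namespace for the second round. The class name is identical,
// but the full names differ, so the compiler sees two distinct types.
namespace EarthChronicle.BetaTesting2
{
    public static class LinkFactory
    {
        public static string Describe()
        {
            return "second test version";
        }
    }
}
```

Callers then pick a version by its full name, for example EarthChronicle.BetaTesting2.LinkFactory.Describe(). In a webpage, that means either writing the full name inside the inline expression or adding an Import directive for the namespace at the top of the page.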