Promotion

Having a web site is rather pointless unless people can find it. To do that, they use search engines or indexes. Before you submit your site to the engines and indexes, you need to take care of a few other things. You need to:
  • Rate your site for appropriateness to various age levels and sensibilities to ensure that access to it will not be blocked in error.
Only then should you submit your site to the various search engines and indexes.

Preparation

First you need to figure out which words or phrases people will use when searching for your site. Don't assume that everyone thinks the way you do. Ask some friends what they would search for if they were trying to find your site. These are your "keywords".

Armed with a list of keywords, you next want to optimize your ranking for such searches. Although the various search engines differ in their approach, the following rules will give you good results. They are listed in order of importance, with the most critical first:
  • Use the main keyword in the page name.
  • Use the keywords in your TITLE tag.
  • Use the keywords in your KEYWORDS meta tag.
  • Use the keywords in your DESCRIPTION meta tag.
  • Keep the content of each meta tag under 255 characters, or it may be truncated or ignored. That limit is a convention followed by some search engines, not a requirement of the HTML standard.
  • Use the keywords in H tags (H1, H2, etc.).
  • Use the keywords in the body of your pages, especially near the top of each page.
  • Use the keywords in the ALT attribute of your IMG tags.
  • Either avoid using frames, or make extensive use of the NOFRAMES tag.
  • Get other sites to link to yours.
  • Try to optimize the ratio of keywords in your body text to the rest of the words.
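
Several of the rules above can be checked mechanically. The following is a minimal sketch, not a definitive tool: it uses Python's standard html.parser module, and the names PageAudit, audit_page and SAMPLE_PAGE are made up for this illustration. It assumes a single-word keyword and treats the 255-character figure as a configurable convention rather than a fixed rule.

```python
import re
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageAudit(HTMLParser):
    """Collects the title, named meta tags, heading text and body words."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}            # meta name (lower case) -> content
        self.heading_text = ""
        self.body_words = []      # lower-cased words found inside BODY
        self._in_title = False
        self._in_body = False
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in HEADING_TAGS:
            self._in_heading = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._in_heading:
            self.heading_text += " " + data
        if self._in_body:
            self.body_words += re.findall(r"[a-z0-9']+", data.lower())


def audit_page(html_text, keyword, meta_limit=255):
    """Report where a single-word keyword appears and how dense it is."""
    page = PageAudit()
    page.feed(html_text)
    page.close()
    kw = keyword.lower()
    hits = sum(1 for word in page.body_words if word == kw)
    return {
        "in_title": kw in page.title.lower(),
        "in_keywords_meta": kw in page.meta.get("keywords", "").lower(),
        "in_description_meta": kw in page.meta.get("description", "").lower(),
        "in_heading": kw in page.heading_text.lower(),
        # Meta tags whose content reaches the assumed length limit.
        "long_meta_tags": sorted(
            name for name, content in page.meta.items()
            if len(content) >= meta_limit
        ),
        # Share of body words that are exactly the keyword.
        "density": hits / len(page.body_words) if page.body_words else 0.0,
    }


# Hypothetical page used only to exercise the sketch.
SAMPLE_PAGE = """<html><head>
<title>Gardening tips for beginners</title>
<meta name="keywords" content="gardening, plants, soil">
<meta name="description" content="Simple gardening advice.">
</head><body>
<h1>Gardening basics</h1>
<p>Good gardening starts with good soil.</p>
</body></html>"""

report = audit_page(SAMPLE_PAGE, "gardening")
for check, result in report.items():
    print(check, result)
```

The density figure counts exact word matches only, so it understates phrases and plurals. Treat it as a rough indicator, since the search engines do not publish the ratio they prefer.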

sitemap.xml

The sitemap.xml file is a way for webmasters to inform search engines about the pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists all of the URLs for a site that should be crawled. Optionally, it can include additional information called metadata about each URL (when it was last updated, how often it usually changes, and how important it is relative to other URLs in the site). This provides the search engine with the information needed to more intelligently crawl the site.

Web crawlers usually discover pages from links within the site and from other sites. However, you may not have links between every page in your site in a form that the crawlers can find. This is especially likely if you use Java, JavaScript, or server-side navigation controls or menus. Having a sitemap.xml file does not guarantee that your pages will be crawled or included in search engines, but it gives the crawlers hints that help them find the listed pages. It also enables you to submit the sitemap.xml file to the major search engines instead of submitting each page of your site individually.

The file can be created in any plain text editor (like Notepad). It needs to have these lines at the top:

<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">


If you want to be able to use a validator (like the one below) to ensure that your sitemap.xml file is correct, use these lines at the top instead:

<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">


To simply list the URL of each page, put each one on a separate line using the following format:
<url><loc>http://webauthoring.tercenim.com/index.htm</loc></url>

The file needs to end with the line:
</urlset>
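
If your site has more than a handful of pages, you may prefer to generate the file rather than type it. The following is a minimal sketch in Python that writes the simple form described above, with one url entry per line. The function names build_sitemap and list_locations are made up for this illustration, and the second address in the example is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return the text of a sitemap.xml file listing each URL on its own line."""
    lines = [
        "<?xml version='1.0' encoding='UTF-8'?>",
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for address in urls:
        # escape() turns characters such as & into entities, as XML requires.
        lines.append(f"<url><loc>{escape(address)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def list_locations(sitemap_text):
    """Parse sitemap text and return the URLs found in its <loc> elements."""
    root = ET.fromstring(sitemap_text.encode("utf-8"))
    return [loc.text for loc in root.iter(f"{{{SITEMAP_NS}}}loc")]


pages = [
    "http://webauthoring.tercenim.com/index.htm",
    "http://webauthoring.tercenim.com/search.htm?topic=html&page=2",  # hypothetical
]
sitemap_text = build_sitemap(pages)
print(sitemap_text)

# Reading the text back confirms that it is well-formed XML.
print(list_locations(sitemap_text))
```

To publish the result, save sitemap_text as sitemap.xml, encoded as UTF-8, in the top-level directory of your site.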

If you want to include the optional metadata, please refer to the official protocol site for the details.
Sitemap Validator

robots.txt

Robots.txt is a plain text file that is placed in the root directory of your site. It is read by the software agents, called User Agents, that the search engines and others use to gather information about your site. (Most people refer to the User Agents as "spiders", probably because they crawl around web sites.) Legitimate search spiders will honor the robots.txt file, but others will ignore it, such as spiders that hunt for email addresses on behalf of spammers. There is no reason not to create one, and several good reasons to do so:

  • It tells the spiders which, if any, files and directories you do not want indexed. For example, you probably don't want your script and CSS files indexed. Crawling them puts an unnecessary load on your server and has no benefit. You may not want your images indexed if they are copyrighted and you don't want them turning up in image search results.
  • It tells the search engines where to find your XML sitemap file.
  • It prevents your server from logging a 404 (file not found) error every time a spider requests the robots.txt file.

The first line of the file should point to your XML sitemap file. You must use the full URL of the file. For example:

Sitemap: http://webauthoring.tercenim.com/sitemap.xml

The next line specifies a particular spider by name. If, as is most often the case, you want to provide the same instructions to all spiders, you use an asterisk (*) as a wildcard. For example:

User-agent: *

If you want a list of the User Agent names so you can set up special exclusions for one or more of them, the best one I've found is at John Fotheringham's search engine robots page.

After the User Agent line, add one line for each file or directory that you want the spiders to ignore (they look at everything by default). If you exclude a directory, you are excluding every file and directory within it. For example:

Disallow: /scripts/

Note that you must use a trailing slash on the directory name. Some spiders recognize wildcards on the Disallow lines, but many do not. Therefore I recommend that you list by name every individual file that you want to exclude, rather than doing something like this:

Disallow: *.css
Disallow: *.js


You can have as many Disallow lines as you like for a spider, and as many separate spider sections as you like. You can only list one file or directory on each Disallow line. Just leave a blank line between a Disallow line and the User Agent line that starts the next section. Complete information about the robots.txt file can be found at The Web Robots Pages.

If you want all your files and directories to be accessed by spiders, use a single Disallow line with nothing listed, like this:

User-agent: *
Disallow:
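
Before uploading the file, you can check how a well-behaved spider would read it. The following is a minimal sketch using the urllib.robotparser module from Python's standard library. The helper name load_rules, the spider name "ExampleBot" and the file name menu.js are made up for this illustration, and site_maps() assumes Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The example file assembled from the lines discussed above.
ROBOTS_TXT = """\
Sitemap: http://webauthoring.tercenim.com/sitemap.xml

User-agent: *
Disallow: /scripts/
"""

SITE = "http://webauthoring.tercenim.com"


def load_rules(robots_text):
    """Parse robots.txt text without fetching anything from the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# A page outside the excluded directory may be fetched.
print(rules.can_fetch("ExampleBot", SITE + "/index.htm"))

# Anything inside /scripts/ is off limits (menu.js is a hypothetical file).
print(rules.can_fetch("ExampleBot", SITE + "/scripts/menu.js"))

# The Sitemap line is reported separately from the access rules.
print(rules.site_maps())

# An empty Disallow line leaves the whole site open to spiders.
open_rules = load_rules("User-agent: *\nDisallow:\n")
print(open_rules.can_fetch("ExampleBot", SITE + "/scripts/menu.js"))
```

This only shows how Python's parser interprets the rules. Individual spiders may differ, particularly in how they treat wildcards.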

robots.txt Validator


META Tags

META tags go in the "HEAD" section of your page, and describe your document.

These control the description of your page that will be used by many of the search engines, provide information about who wrote the page, the language of the text, and many other attributes.
Creating these well will have a major impact on how easy your page will be to find using the search engines, and how your listing in those engines will look.

They also provide some instructions to the browser as to how to best interpret your page.
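
As an illustration of what these tags look like, the following minimal Python sketch builds a small set of them for the HEAD section. The function name build_meta_tags and the sample values are made up for this example, and the four tags shown are only a common subset, not a complete or required list.

```python
from html import escape


def build_meta_tags(description, keywords, author, charset="utf-8"):
    """Return META tag lines for the HEAD section of a page."""
    # This tag is an instruction to the browser about the character encoding.
    lines = [
        '<meta http-equiv="Content-Type" '
        f'content="text/html; charset={escape(charset)}">'
    ]
    # These tags describe the document to search engines and other readers.
    named = {
        "description": description,
        "keywords": ", ".join(keywords),
        "author": author,
    }
    for name, content in named.items():
        # escape() protects quotes and ampersands inside attribute values.
        lines.append(f'<meta name="{name}" content="{escape(content)}">')
    return "\n".join(lines)


# Hypothetical values used only to show the output format.
head_tags = build_meta_tags(
    description='Tips & "tricks" for building web pages',
    keywords=["web", "authoring", "html"],
    author="A. Writer",
)
print(head_tags)
```

Paste the resulting lines between the opening and closing HEAD tags of the page.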

Domain Name

Before you promote your site, you should consider whether you should get your own domain name. You don't want to put a lot of time and effort into promoting your URL and then end up changing it. It can take the search engines and indexes months to catch up.

Many search engines give preferential ranking to URLs that have a keyword-relevant domain name like "www.FreeOrCheap.com" or "www.WebAuthoring.com".
Some engines will only list a finite number of pages per domain, so your page can get dropped when someone else on your domain submits their page. Some engines consider any page on certain free hosting domains to be unworthy of listing.