We saw an article from USA Today that caught our eye. It was an interview with Google's Matt Cutts about the basics of optimizing one's site. He didn't say anything we didn't already know, nor did he go deep into the hows and whys, but it is a good read for those just starting to build their own websites.
Checked your website lately? How easy is it to navigate? Are your readers often confused about what your site is all about? Are you marketing something? What? Does it use too much Flash or too many colors? Are you using too many fonts on a page? Too many unnecessary links? Perhaps it is time for a cleanup!
Yahoo, Inc. has filed a patent application suggesting that search engines may soon take web page design into account in search rankings. The patent describes a method, with guidelines, for determining the usability of a web page.
“It can be important to make web pages easy and pleasing to use, which can be particularly important for web pages it is desired to monetize. This may include, for example, advertisement-containing web pages (of a so-called “web portal,” for example), for which an advertiser pays money when a user views the web page and activates a link of the advertisement. If such web pages are not easy and pleasing to use, the money-making potential of those web pages can be jeopardized. One conventional indication of whether a web page is easy and pleasing to use is called “clutter,” Yahoo further explains in the patent application.
Structural Characteristics of a web page
Here are the 51 factors detailed in the patent that search engines may look at when judging the usability of a web page (a short code sketch after the list shows how a few of them could be measured):
* Total number of links
* Total number of words
* Total number of images (non-ad images)
* Image area above the fold (non-ad images)
* Dimensions of page
* Page area (total)
* Page length
* Total number of tables
* Maximum table columns (per table)
* Maximum table rows (per table)
* Total rows
* Total columns
* Total cells
* Average cell padding (per table)
* Average cell spacing (per table)
* Dimensions of fold
* Fold area
* Location of center of fold relative to center of page
* Total number of font sizes used for links
* Total number of font sizes used for headings
* Total number of font sizes used for body text
* Total number of font sizes
* Presence of “tiny” text
* Total number of colors (excluding ads)
* Alignment of page elements
* Average page luminosity
* Fixed vs. relative page width
* Page weight (proxy for load time)
* Total number of ads
* Total ad area
* Area of individual ads
* Area of largest ad above the fold
* Largest ad area
* Total area of ads above the fold
* Page space allocated to ads
* Total number of external ads above the fold
* Total number of external ads below the fold
* Total number of external ads
* Total number of internal ads above the fold
* Total number of internal ads below the fold
* Total number of internal ads
* Number of sponsored link ads above the fold
* Number of sponsored link ads below the fold
* Total number of sponsored link ads
* Number of image ads above the fold
* Number of image ads below the fold
* Total number of image ads
* Number of text ads above the fold
* Number of text ads below the fold
* Total number of text ads
* Position of ads on page
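To make the list concrete, here is a minimal sketch of how a crawler might compute a handful of these factors in Python. The URL is hypothetical, the five factors are our own pick, and the requests and beautifulsoup4 packages are assumed to be installed; the patent itself publishes no code.

```python
# A rough sketch of measuring a few of the structural factors above.
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.myownsite.com/").text  # hypothetical site
soup = BeautifulSoup(html, "html.parser")

total_links  = len(soup.find_all("a"))       # total number of links
total_words  = len(soup.get_text().split())  # total number of words
total_images = len(soup.find_all("img"))     # total number of images
total_tables = len(soup.find_all("table"))   # total number of tables
page_weight  = len(html.encode("utf-8"))     # bytes, a proxy for load time

print(total_links, total_words, total_images, total_tables, page_weight)
```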
The value of this Wikipedia page-view checker is that it gives us an idea of how popular a page has to be to rank number one on the Google search page. From that, we can also estimate roughly how many visitors our own sites need (depending on our niche or popularity) to be number 1 in Google search.
For example, the Wikipedia page for “gravity” appears number one in Google search. According to the statistics checker, the “gravity” article was viewed 115,447 times in February 2008, with daily views throughout the month ranging between 2,300 and 5,000 (roughly 4,000 a day on average).
The checker is still in beta and, of course, carries a disclaimer that it is prone to manipulation and/or attack, so we shouldn't “base any important decisions on these stats,” according to the independent site.
Never mind that it's inaccurate or inconclusive (for now?); use the checker for fun and general information. It doesn't hurt to know how a few stats on important keywords in Wikipedia translate into rankings in Google search.
There are three types of links in a blog or website, and it is often debated which one helps rank our site in search engines. Which yields a higher search ranking, and do any of them really help our site appear in search at all?
Three Types of Links
One-way link
This is a basic link wherein you cite a website in your post and that site doesn't link back to you. For example, we might link to Wikipedia's article on hyperlinks. Wikipedia doesn't link back to us, so we have created a one-way link.
Two-way or reciprocal link
This refers to mutual linking between sites. These links help each site out in terms of traffic. Most often, bloggers in the same niche or category ask each other to exchange links, and both gain more exposure. For example, the blogrolls found on personal blogs are almost always built from exchanged links with the other bloggers featured on the roll.
Three-way link
This is believed to be the most effective link of all, if webmasters are to be believed. Many webmasters think a one-way link isn't much good, and neither is a reciprocal link, so they formulated the three-way link concept.
This means that website 1 links to website 2, website 2 links to website 3, and website 3 links back to website 1, completing the three-way loop (sketched below).
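The pattern is easy to picture as a tiny directed cycle. A minimal sketch, with made-up domain names:

```python
# Three-way linking as a directed cycle: each site links to the next,
# and the last links back to the first. Domain names are hypothetical.
three_way_links = {
    "website1.com": "website2.com",
    "website2.com": "website3.com",
    "website3.com": "website1.com",
}

for source, target in three_way_links.items():
    print(f"{source} links to {target}")
```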
What do we think?
We don't believe that three-way linking is the best strategy for yielding higher search engine rankings. Sure, it helps, but so do the other types of link. We think they're all roughly equal, with reciprocal linking perhaps slightly better.
The most important thing to consider is that good inbound links are the ones that will definitely help you get higher search engine results. The bottom line is to exercise care when linking with other sites. Let's not get into the habit of exchanging links with anybody who asks, or with any website we come across, just to get a link. Links to our website should be related to, or within, the same topic, niche or category to get a better ranking.
There are constant goings-on and improvements at Google, Inc., particularly in the area of search. In its quest to give relevant, balanced, and timely information to people using Google search, it brews up regular updates and changes to the way it handles search results. The changes were on full display on New Year's Day!
On January 1, 2008, the internet celebrated the 25th anniversary of TCP/IP (Transmission Control Protocol/Internet Protocol). TCP/IP is the basis for all the communication that happens the moment you type in a URL: messages are sent to that URL's web server(s), which send back the information you need, such as the site's login page.
In celebration of this momentous event for the internet, Google opened 2008 with happy-new-year and TCP/IP 25th-anniversary greetings on its logo. Clicking Google's logo on January 1st led users to the search results for “January 1 TCP/IP,” with the most up-to-date information on the internet at that time.
The result? It wasn't Wikipedia that topped the Google search, but rather the most recent content Google had found on the internet.
Normally, we would wait days, weeks or months to see our content indexed by Google, so a keyword search would only surface pages with huge numbers of backlinks, which are not necessarily recent. As January 1 showed, with Google experimenting on the keyword “January 1 TCP/IP,” that was no longer the case.
Google now boasts that it can return search results with almost real-time freshness, meaning a query will likely surface the freshest information, published minutes ago, not some old news from weeks or months back!
This is great news!
Or is it? The approach is somewhat flawed: it rewards publishers who can write fastest on a given keyword and post daily, with no guarantee of quality content or authority on the subject. So when a query suddenly becomes a popular search, Google's new system lets the most recent articles (regardless of quality) onto the front page of Google search results. That leaves plenty of room for bloggers playing the SEO game to pounce on hot keywords just to land on Google's front-page results.
Have you recently searched for a keyword and been frustrated with the results because they showed no relevance whatsoever to your query, just spam articles stuffed with the keywords you searched, sitting on the first page only because they were “new”?
Every day, hundreds of web robots and search engine crawlers set out to accomplish a huge task: visiting billions of pages on the internet, whether it's Google's bot indexing all our pages and the rest of the web, or the bad robots called spam bots hunting down every email address they can find to steal it.
For the most part, we love it when Google pays us a visit to index our content! Knowing what Google and the others are fetching, however, lets us take an extra step and direct them only to the content we want indexed. Sometimes there are areas of our directory we don't want others to see, such as a temp folder. To save bandwidth, we may also want to keep images, stylesheets or other files from being indexed. As for truly confidential files, like a database of contacts' names and addresses, it is of course best to take them offline or move them to another machine rather than risk spreading them across the net.
Enter the Robots Exclusion Protocol (REP). Think of it as a “Restricted Area” sign on our office door: it means employee access only and is meant to turn away unwanted visitors. /robots.txt works just like that. There is another form of REP that goes in a META tag and works the same way; we will discuss it in our next post. We'll talk about the former first.
What is /robots.txt?
/robots.txt is a simple text file. It's not HTML, just a basic text file that can do wonders! It tells robots which pages we do NOT want them to visit. They aren't required to obey, but good robots and crawlers are generally courteous enough to comply with what is asked of them. It is important to note, nonetheless, that as with the restricted-area comparison above, it's just a sign on an unlocked door. It doesn't mean an unwanted visitor can't get in if he wants to! Bad robots like spam bots and malware bots may still walk through the door to hunt for security loopholes and email addresses, but the good bots will abide by the sign and will not barge in uninvited.
As mentioned earlier, it is risky to place sensitive files in your directory and hope that robots.txt will keep them from being indexed and appearing in search results. /robots.txt is itself public and may be read by anyone, who then sees exactly which sections you don't want robots to see. So don't put a filename like /mybankaccounts in /robots.txt: all it says is “you shouldn't view the mybankaccounts folder, but if you know a way to get into it, you can!”
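To see why, imagine a robots.txt containing an entry like this (the folder name is made up):

User-agent: *
Disallow: /mybankaccounts/

Anyone who fetches /robots.txt now knows that folder exists, and that it's interesting enough to hide.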
What does robots.txt look like and how does it work?
The concept of robots.txt is this: a robot wants to visit the page http://www.myownsite.com/welcome.html. Before it does anything else, it first fetches http://www.myownsite.com/robots.txt to find out which pages it may and may not index. If it can't find the file, it will go ahead and index everything on the site.
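Python's standard urllib.robotparser module follows exactly this sequence, which makes the handshake easy to demonstrate. A minimal sketch, reusing the hypothetical site above:

```python
# A well-behaved crawler, sketched with Python's standard library:
# fetch /robots.txt first, then ask whether a page may be visited.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://www.myownsite.com/robots.txt")  # hypothetical site
rp.read()  # downloads and parses the file

# May "any robot" (*) fetch the welcome page?
print(rp.can_fetch("*", "http://www.myownsite.com/welcome.html"))
```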
Below is the basic structure of a robots.txt file, where * (asterisk) means ALL robots and / (slash) means all pages. As a file it means: NO robot is allowed to index ANY of the pages. We usually don't want that, but it shows the two basic components: a User-agent: line naming the robot, and a Disallow: line followed by a path or filename.
User-agent: *
Disallow: /
The version below, on the other hand, tells all robots they may index all pages. This is effectively the default for any website unless we create a robots.txt listing files we don't want indexed.
User-agent: *
Disallow:
If you don't have much on your site yet, it is best to do the above, create a robots.txt file and leave it empty, or simply not bother. /robots.txt earns its keep when there are files in your directories that you don't want indexed; files you don't want appearing in searches.
To save bandwidth, and since there's really no point in having folders like our images or cgi-bin indexed, we can allow all robots to index our pages except the folders listed under Disallow.
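A file matching that description might look like this (the folder names are just examples):

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/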
A robots.txt can also single out a crawler by name. A file like the one below (again, the folder names are just examples) allows Google to index your pages except the cgi-bin and privatedir folders, while the catch-all record at the end shuts every other robot out. Note, however, that this means you're not allowing MSN, Yahoo or Alexa to index your site, so you might want to reconsider doing so.
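User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/

User-agent: *
Disallow: /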
The samples above should be used as they are. Be careful with spelling, missing colons and placement: for example, writing Disalow instead of Disallow, or User Agent instead of User-agent. Also, the filename is robots.txt, not Robots.Txt.
Where to place the robots.txt file?
Placement of the robots.txt file is very important: if it's in the wrong place, robots and search crawlers won't find it and will most likely index ALL your pages. They don't spend all day hunting through our files for it. The only place to put it is the root directory of the site, not in folders or sub-directories. To check your robots.txt file, just type this into the browser's address bar: http://myownsite.com/robots.txt