Choosing Your SEO Testing Grounds

March 24, 2008 – 1:19 am

OK, so after a long time and a lot of testing and data crunching, it was about time I sat and wrote about my latest SEO adventures.

It seems like SEO testing is all over the place lately. First there is the excellent post by our old favorite XMCP about all the things to keep in mind when setting up an SEO test (and my follow-up post). Then there is an interesting test about Google’s preference for different TLDs at the GoogleCache blog. Top it off with the “Google indexes only the first link” test from SEOmoz (or the same test done earlier by Michael VanDeMar) and you have a craze. That is without mentioning Michael Martinez’s SEO Theory, which consistently provides thought-provoking material. Hell, there is so much writing about SEO testing that maybe there should be a session on the topic at the next search conference? SMX Advanced maybe? (hint, hint)

So, you read all the excellent articles that describe different SEO tests and you want to test an idea that has been running circles in your head for months now. The question is how do you start ?

Image courtesy of cyranthu

You want your test to have clear-cut, reliable results that can be translated into actions applicable to your money/client websites. So obviously you will want to put a website of your own out there, tweak a thing or two and see how that affects your rankings. Now this is where the problems start. What keyphrase to optimize for? How much to optimize? The main question here is: what am I looking for in a testing ground? Well, there are two things that come to mind:

  1. Low noise level - we want to perform our experiments in surroundings that will not drown out our signal. This means that if I see my site drop 5 positions in the SERPs, this is due to the action I performed and not due to the fact that the 5 sites below me have increased their rankings.
  2. Low level of competition - this is needed for two reasons:
    1. Competitive phrases have a high level of noise due to the constant promotion work being done on competing websites. Links added, text changed, metatags improved, etc.
    2. It will be harder for you to actually bring your site to a position where changes affecting it can be analyzed.

Looking at the above mentioned articles, SEO testing ground choices that people have made can be summarized into three models:

  1. A virgin testing ground - this is a nonsense keyword that has no results in Google SERPs prior to the test. This is basically what was done in SEOCache Google TLD preference test. They created new websites and promoted them as much as they wanted for the non-existing keyword, thus controlling all the 10 results. This approach provides a great level of control over all the parameters of the competing sites, thus enabling the tester to accurately attach every change in rankings to the action done on the websites. That said, this is as far from a real life situation as you can get. Forget about semantic relevance, forget about the rate of incoming links, forget about the search history. The problems with this model are further stressed by the actual SEOCache experiment - they show that Google prefers .org TLDs since they ranked above all the other TLDs consistently. However, when I checked the results from Israel, .net TLD’s came up above the .org’s.
  2. Semi-virgin testing ground - optimizing for nonsense keyphrases that have websites in the SERPs but are not used. Good examples I can think of are old SEO contests like [nigritude ultramarine] or [seraphim proudleduck]. These are actually quite good testing grounds, but again, no semantic relevance and other algo parameters that characterize a real SERP.
  3. Semi-promiscuous testing ground - these are SERPs for keyphrases that are made up of words that actually mean something separately but together are not in common use, like [small red cantaloupe chair] or [arrogant tennis epiphany]. This will provide you with a number of real websites to compete against which will provide the search history and link addition rate parameters. The real keywords used in the query will cover the semantic relevancy issue. There are, of course, problems with this ground, which I will elaborate on in a moment.

So, my somewhat biased description of the different testing grounds should tell you which one I personally prefer. Yep, the majority of my tests are done on the Semi-Promiscuous Testing Grounds (SPTG), due to the closest possible resemblance to real-life SERPs. There are some problems with SPTGs as well, as the actual examples below show.

Throwing theories up in the air is all well and good, however some examples need to be shown. So yours truly took 10 queries that define SPTGs and monitored the rankings for 20 days. Well, actually I did not monitor the rankings, SERP Archive did and I just gathered and analyzed the results.

So, I will provide just the chosen few phrases that, IMHO, represent the typical SPTGs and discuss the potential problems with these niches. I already apologize for not giving the complete details about the queries, I am running additional tests on some of them and do not want them spoiled just yet :).

So let’s look at the graph over time for the phrase #1:

As can be seen, the top 10 for this phrase is pretty stable. Notice how things go a bit haywire between the 7th and the 11th of February? Keep that in mind. Also see how Site 8 dropped out of the top 20 for a few days and then returned? Remember this too, it will come in handy when we get to the numerical analysis. Let’s take a look at phrase #5:

This one looks even more stable. The turmoil begins only at the position 7 and lower and even then it is not significant for the sites #7, #8 and #9. Let’s take a shot at another niche defined by Phrase 3:

See the upheaval between the 7th and the 11th of February ? Just like in Phrase 1. Since these are two completely unrelated phrases, it makes sense that this is some kind of Google link recalculation/PR update/algo change. This is further emphasized by a similar pattern seen with other phrases not shown here.

Now, as a comparison, let’s take a look at a competitive phrase [personal loans]:

All hell breaks loose. No one is safe here, since there are constantly links being added so the position is never constant.

Looking at colorful charts is nice and important, however it is not enough. If you want to automate the process of choosing the testing grounds, you need to have some numbers for your scripts to crunch, so some calculations need to be made. Here is the point where I am warning any casual reader that what I did when doing the calculations is based only on my common sense. I am aware of the fact that much more robust and logical statistical tests exist that should be applied to the data, but I am just not swimming well in that field. I am actually trying to set a meeting with a statistics whiz that will guide me in these kinds of analysis, but that has not happened yet and I did not want to delay this post any longer. So take anything from this point onwards with more grains of salt than you are usually recommended when reading this blog (which is a lot).

So what I did is calculate an absolute value of change between locations on every two adjacent days and then averaged these changes over the testing time period. This gave me an average change of locations for each site. Then I averaged these values in order to get a value I called a Niche Stability Value (NSV). I put those in a bar chart and here is what I got:

So even though I am not sure (to say the least) about the reliability of my calculations, the above chart matches what I saw in the phrases charts. Phrases 6-10 were considered as non-competitive, however they did include competitive words like “investment” or “outfit” albeit in non-conventional context. Since the queries were not done in quotes, it makes sense that some of the sites in the top 10 were being promoted which would add to the level of noise.

One of the weaknesses of my calculations is that they do not minimize the effects of temporary, reversible drops in the SERPs (like we saw with site 8 for phrase #1), as they should. These drops do not represent a real devaluation in the site’s score. Actually, if I took those few days out of the calculations, the NSV for phrase #1 would drop to 0.2, which would make more sense. So any statisticians out there, I would love to get some input and further improve my calculations.
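For anyone who wants to script this, here is a minimal sketch of the NSV calculation as described above, including the option of leaving out days with temporary drops. The function names, the data layout (one list of daily positions per site) and the choice to record a site that fell out of the top 20 as position 21 are all my assumptions for illustration, not a fixed recipe.

```python
from statistics import mean


def average_daily_change(positions):
    """Mean absolute change in position between adjacent days for one site."""
    changes = [abs(today - yesterday)
               for yesterday, today in zip(positions, positions[1:])]
    return mean(changes) if changes else 0.0


def niche_stability_value(rankings, skip_days=()):
    """Average the per-site daily change over all monitored sites.

    rankings  -- dict mapping a site name to its list of daily positions
    skip_days -- day indexes to leave out (e.g. a known algo update or a
                 temporary, reversible drop)
    """
    per_site = []
    for positions in rankings.values():
        kept = [p for day, p in enumerate(positions) if day not in skip_days]
        per_site.append(average_daily_change(kept))
    return mean(per_site)


# Hypothetical data: site_c drops out of the top 20 (recorded as 21)
# on days 1 and 2 and then returns to its old position.
sample = {
    "site_a": [1, 1, 2, 1],
    "site_b": [2, 2, 1, 2],
    "site_c": [3, 21, 21, 3],
}

print(niche_stability_value(sample))                    # all days included
print(niche_stability_value(sample, skip_days={1, 2}))  # drop days excluded
```

A lower value means a more stable niche. Comparing the value with and without the suspicious days is a quick way to see how much a single reversible drop skews the result.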

So, what is the take home message ? How do we choose a niche to perform experiments in ?

  • From what I saw in the results of the experiment, the phrases that defined the most stable niches were scientific phrases. Any query that brings up a lot of PDF files from scientific magazines should fit this category. Furthermore, the non-exact sciences are better suited as niches for SEO testing than the exact sciences. So go for comparative religion, literature, sociology etc.
  • It is important to do both a visual inspection of the location charts and the statistical analysis. The advantage of visual check is that you can spot the algo changes/PR updates/reversible drops that should be taken out of equation. The advantage of the statistical analysis is that it can provide you with the quick estimate of low-competitive niches. It may have some false negatives, however the chances of a false positive are rather small. If the statistical analysis singles out a niche as a non-competitive one, the chances are that it really is a good testing ground, while the niches marked as competitive could still be good testing grounds with non-significant quirks that skewed the calculations.
  • Do not rely on your estimation of what is a non-competitive phrase without doing the above analysis. As you can see, I thought that phrase #8 was non-competitive and it came out to be a terrible possible testing ground. As they say, assumption is the mother of all f*ckups. Yes, yes I know there is a nicer way of saying that. It sounds dorky though.

To summarize this g-i-g-a-n-t-i-c post: choosing pristine niches for your testing is a good tactic, but it takes a lot of real-life parameters out of the equation. On the other hand, performing tests in real competitive SERPs will probably tell you nothing and will waste a lot of your time. Therefore, I do my tests in SERPs made out of illogical phrases consisting of real words. This, however, demands location monitoring for all the sites around my testing pages, which gives it an additional level of reliability by ruling out temporary hiccups in Google’s algo and other unrelated changes.

I don’t have a disclaimer on this blog (although I should put one up sometime), but this post maybe brings it out most significantly: the setup of the tests, the results and the interpretations (mostly the interpretations) are all a product of my experience and current knowledge. They can be spot on and they can be complete crap. For me, the most important thing is to put the material and the ideas out there for the public to judge, add to and shoot down. Only time and additional tests will tell whether there is some value to my ramblings here. So, if you have a different idea, find a significant logical failure or just strongly disagree with everything written here, please leave a comment. I don’t consider this a popularity contest.

Positive responses are also welcome. :))

PS. After re-reading the post and before publishing it, I noticed a possible mix-up that can happen: the stability of a niche should not be confused with its competitiveness, i.e. the difficulty of promoting a page to the top 10 for that phrase. It only shows the levels of change in the locations of the top 10 websites.


Natural Link Building

February 24, 2008 – 7:09 pm

It’s been a while since the last post, mainly due to the echoes of the conference: we are getting involved in several exciting SEO projects and there is less and less time to do experiments and build the pretty graphs everyone liked so much. :) I am hoping that in the future I will be able to put out good posts a bit more frequently. I actually have a few interesting experiments cooking in the pot, one evaluating ways to identify reliable SEO testing niches and the other one testing different models of link juice sculpting by nofollow.

Image courtesy of gilliman

OK, let’s get back on track.

I don’t need to tell you that link acquirement is the bread and butter of today’s SEOs. Basically, it boils down to this: on-page optimization, being controlled by the SEOs (directly or indirectly), leaves little space for your talent to shine in competitive niches. It has been tested dry and there is a more or less set list of actions one must do to get the maximal optimization score in that area. If you do too little, you continue optimizing. If you do too much, you dial it back a bit and you find your golden middle (which is niche specific, by the way).

Link acquirement, on the other hand, is where the men are separated from the boys. It is the hidden nature of the SE algos that makes link building such a polarizing technique: basically you have no idea whether the moment a search engine finds a link to your site is the moment when that link is being calculated towards (or against) your ranking score, you don’t know when you are acquiring links too fast, you have no idea what is the threshold of toleration of uniform anchor texts for your incoming links, etc. Because of this and many other reasons, link building has become the limiting factor in every competitive SEO campaign. No wonder that it is one of the topics most frequently written about in the SEOsphere. In spite of that, I wanted to share some of the tips/techniques/tools that I know to be helpful in improving one’s linking technique, hoping that I would bring some new information/experience to the table:

Your own backyard

The value of internal linking is often underestimated by an eager SEO who jumps into the deep end of the pool first and starts looking for PR 7 websites in his client’s field prepared to give out links. A lot of link love can be gained/properly utilized by optimizing the internal linking structure:

  1. How are your links to homepage doing ? Are you using your targeted anchor text when linking back Home ? Are you diversifying your anchor text in Home links ?
  2. Are you sprinkling targeted anchor text links through out the body copy of your pages ?
  3. Are you using nofollow to channel your link juice to optimized pages? More importantly, how are you using nofollow? There are several different models of link sculpting for SEO, but you will have to be a bit more patient, as that will be discussed in one of the future posts.
  4. Are you using your site navigation to make your site architecture more shallow? Spiders are shallow creatures, and the closer your page is to Home, the easier it will be for spiders to find it.

Additionally, Wiep has some great ideas on how to gather up all those unused link resources you already have lying around and leverage them towards better rankings.

Some guidelines before you start the hunt

The title of the post says “Natural Link Building”. While to the untrained eye (or ear) this may sound like an oxymoron (natural links are gained by willing webmasters linking to quality information, not by zealous SEOs that create/ask for/buy links), the word “Natural” is actually intended for Google’s algo: it is of utmost importance to convince the algo that your linking efforts are in fact a part of natural link acquirement. It must walk like natural and it must quack like natural. So it is important to put our inner SEO beast back in the cage and consider the following:

  1. Do all the links deemed natural have the same “spare car parts” anchor text ? Or do some of them have anchor texts like “Check it out”, “Buy here”, http://www.sparecarparts.com, etc. ? It is very important to diversify your anchor text, but that does not mean having 50% of anchor texts “spare car parts” and the other 50% “spare truck parts”. It means truly mimicking the way people linked before SEO. I know of cautious SEOs that in some cases make 50% of their incoming link anchor text totally irrelevant to search phrases they are targeting. On the other hand, they make sure that the link is placed on a page that is very relevant to their phrases and we’ll expand on what is considered relevant a bit later.
  2. Do all the links come from pages that have a PR higher than yours ? While the argument about the importance of the little green bar still rages, the undeniable fact is that Google has that information and that it can use it for its purposes. And what purpose is more noble than discovering earnest SEOs artificially inflating their link scores ? If I was at Google, trying to look at suspicious link acquirement patterns, constantly getting links solely from pages with higher PR value would be a red flag with bold white printing on it saying “unnatural”. And if I can think of it, no reason that some of their PhDs can’t.
  3. Are all your incoming links placed sitewide on link contributing sites? That is another telltale sign. Instantly getting 300 links, even coming from a highly relevant site, is a temptation that a cautious link builder will resist.
  4. Are all the incoming links pointing to your homepage? Admit it, it looks very unnatural if your site is getting links from 50 other websites and all of them are pointing to the same page. This is especially true with large sites that have a lot of product/category pages. Acquiring deep links, together with the above-mentioned link juice flow sculpting techniques, will both promote your targeted pages and do it in a seemingly natural way that will present your site as an authority in its field, securing top locations for almost all of the phrases that the site targets presently or will target in the future. Additionally, it will help with the spidering frequency of your site.
  5. Are you being careful not to add too many links at once ? Link velocity is something often overlooked, even by the most experienced SEOs. If I had a penny for each site that got burnt by instantly getting 200K incoming links, today I would have two pennies :) Seriously, unless you are reporting a war breakout in your neighborhood or are a distant relative to Britney and have just received custody over her children, keep a leash on your massive link campaigns and reduce the flow to a trickle.
    It can actually be a good idea to monitor the link addition rate of your competitors so you can set a maximal rate for yourself. How to do this? There is a nifty tool called SERP Archive. It lets you set up a query that it monitors on a daily basis and builds an accessible database of SERPs. [link:] is a legitimate query, so make a list of all your competitors, feed the backlink query for each one of them into SERP Archive and start following.
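Once you have daily backlink counts for each competitor, turning them into a maximal rate for yourself is simple arithmetic. The sketch below assumes you have already collected the counts into plain lists, one number per day; the function names and the sample numbers are made up for illustration, and the way you get the counts out of your monitoring tool is up to you.

```python
def daily_link_gains(backlink_counts):
    """New links seen per day; drops in the count are treated as zero gain."""
    return [max(today - yesterday, 0)
            for yesterday, today in zip(backlink_counts, backlink_counts[1:])]


def average_link_rate(backlink_counts):
    """Average number of links gained per day over the monitored period."""
    gains = daily_link_gains(backlink_counts)
    return sum(gains) / len(gains) if gains else 0.0


def max_competitor_rate(competitors):
    """Fastest average rate among competitors, usable as your own ceiling."""
    return max(average_link_rate(counts) for counts in competitors.values())


# Hypothetical daily backlink counts for two competitors.
competitors = {
    "competitor_a": [100, 102, 104, 110],
    "competitor_b": [50, 50, 51, 53],
}

print(max_competitor_rate(competitors))
```

Treating a falling count as zero gain is a judgment call: reported backlink counts fluctuate, and a dip should not cancel out links that were really added on other days.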

Be vewy, vewy quiet; I’m hunting winks

So after we made sure that our internal linking structure is all dandy and optimized and that we know how to make our campaign appear natural enough, the big question is how to find places to get links from ? What is more important, relevancy or PR ?

Whether you are manually approaching webmasters or purchasing links through a text link vendor, the bottleneck of the process is finding relevant sites to get a link from. While I am consciously trying to steer away from the argument on whether the PR is important or not, there are two principles I try to live by: 1) a link from a higher PR page contributes more than a link from a lower PR page; 2) relevancy and authority beat PR almost every time. It is the fine balance between these two principles that defines my link hunting grounds.

So the next question is how to find relevant sites to get a link from ? How do I define relevant topics that will have enough sites to give me links that will help me with my targeted phrases ? While it may seem that the research process here is similar to keyword research, there is one substantial difference: when looking for keywords for your site, you are looking for phrases that your targeted audience uses. Therefore, you have some kind of litmus test which will tell you whether the keyword you chose is good or bad - you check the conversion rates, compare it with other keywords and you can get a pretty decent picture for each of the keywords (after taking prominence, level of optimization, location etc. into the equation). When looking for topics that are deemed relevant to your targeted phrases, you have to start thinking like a search engine and take into consideration all kinds of semantic algorithms that stretch the niche definition to a lot of neighboring phrases related to your targeted keywords.

In order to diversify your anchor texts and find additional linking resources, you have to expand the list of your niches (defined by keywords). Here are some of the tools that will help you achieve that:

  1. Google Sets - get it straight from the horse’s mouth. Google’s Labs project may tell you what are the keyphrases related to your niche.
  2. Google tilde (~) search operator - this one is great for single word queries. It performs the search on all the terms related to your query (and marks them bold in your SERPs). For example, performing a [~SAP] query will give the following list of related keywords: CRM, ERP, ABAP, supply chain, mysap, supply chain management, CIO, peoplesoft, enterprise resource planning, business application programming, etc. Then you can perform a tilde search for each of the related phrases you got and filter out with the (-) operator all the phrases that you already have. You can end up with quite a large list of phrases which may not all sound relevant to your niche, but hey, Google deems them semantically related, so who are you to argue?
  3. Yahoo suggestion tool - just below the Yahoo search box, there is an expandable section called Search Assist. It has two areas: Suggestions, which will give you all the phrases that include the phrase that you searched and Explore Concepts which is the list you are looking for and will give you related phrases to your search query. It can go really wide so apply some common sense. On the other hand, going really wide may be just what you need.
  4. Ask suggestion tool - perform a query on Ask and on the left hand side you will see again two main areas: “Narrow Your Search” which will give you all the queries that include your keyword and “Expand Your Search” that will give you all the queries that do not include your keyword but are related to your keyword.
  5. Google Adwords Keyword Tool - you can use the experience that the hordes of PPC marketers have accumulated to your linking benefits. While not being so great at predicting the amount of traffic/impressions/clicks/search volume your ad will get, it can tell you what are the keywords that the publishers in your niche are using, which should be what they think their customers are using. While they may be wrong on guessing their potential customers’ intentions, they are usually good with defining the semantic field of every niche.

Just remember that this is a recursive procedure, meaning that each of the new keywords you find can be used as a query for each of the above tools, so the list keeps expanding. In the end, you will have an extensive list of keywords. What to do with it? There are 4 major ways I can think of that you can use such a list:

  1. Searching for links - sometimes it is quite hard to find relevant sites that are willing to link to you, so having an extensive list of queries can inflate a potential link source list significantly
  2. Writing inspiration - there is nothing that will attract links like a good topic, and if you are a professional in the field of SAP, I am sure you will be able to muster some writing magic and expand a bit on CRM or ERP. Then you can promote the articles through social media or article submission services, and that can produce a bunch of links that will come from semantically relevant articles.
  3. Your PPC campaigns - why not plug those new keywords into a separate ad group (so you don’t ruin your existing groups’ performance with some potentially poorly converting keywords) and see how they perform? You may be surprised.
  4. SEO - since you are already doing the work, sprinkle some of the phrases over your website, see what kind of long-tail they bring and if they start pulling in some good traffic, maybe you will need a separate optimized page on that topic.
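The recursive expansion described before this list can be sketched as a simple breadth-first walk. In the sketch below, `related` is a stand-in for whatever tool you query by hand (it is not a real API of any of the tools mentioned), and the small lookup table only reuses the [~SAP] example terms from above as toy data. The depth limit is my own addition, to keep the list from wandering too far from the original niche.

```python
from collections import deque


def expand_keywords(seeds, related, max_depth=2):
    """Collect related phrases by feeding each new phrase back into `related`.

    seeds     -- starting phrases
    related   -- callable returning an iterable of phrases for a phrase
                 (a placeholder for a manual lookup in a suggestion tool)
    max_depth -- how many rounds of expansion to allow
    """
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        phrase, depth = queue.popleft()
        if depth >= max_depth:
            continue  # far enough from the seeds, do not expand further
        for candidate in related(phrase):
            if candidate not in seen:  # skip phrases we already have
                seen.add(candidate)
                queue.append((candidate, depth + 1))
    return seen


# Toy lookup table standing in for real tool output.
toy_results = {
    "sap": ["crm", "erp"],
    "crm": ["supply chain"],
    "erp": ["crm", "enterprise resource planning"],
    "supply chain": ["supply chain management"],
}


def toy_related(phrase):
    return toy_results.get(phrase, [])


print(sorted(expand_keywords(["sap"], toy_related)))
```

The `seen` set does the same job as filtering with the (-) operator: a phrase that has already been collected is never queued again, so the procedure ends even when the tools keep pointing back at each other.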

Wow. This post got longer than I thought it would be. I hope this is new for at least some of you and would love to hear any additional ideas on how to make your link building look natural.


SphinnCon Israel

February 7, 2008 – 9:00 pm

Wow. It has been a really wild ride. When we first contacted Barry offering help in organizing the first Israel SEM/SMO event, little did I know…

I will not add another report from the conference. You can find the majority of them on Barry’s post. Just a few quick remarks:

1. Putting up a conference is a major task. I’m not sure I will ever take upon myself such a thing again. That said, I got a chance to work with and to get to know some pretty amazing people and will definitely be in touch with them in the future.

2. It is amazing that the conference went down so smoothly. We did not have any event organizers or sound professionals: our company organized the food, the graphic artist and the venue; the RankAbove guys were mainly responsible for coordinating the speakers, press releases and doing the coordination between us; and Gilad from Nekuda was responsible for the sponsors. Point being, we did it all by ourselves, no professional help… Even so, everything went down amazingly well, and most importantly, people had fun.

3. This was my first conference. Ever. I was a bit nervous due to the fact that my presentation was opening both the show and the SEO panel. However, I don’t remember the last time I’ve had so much work-related fun. The presentation was well received. Even the Google representative, after suffering through quite a bit of Google abuse in the presentation, came to say that he enjoyed it. The panel was funny and informative and I tried to give clear-cut, precise answers to questions. Judging by the overall response, it was the best panel of the conference.

All in all, it was a great event. The Israeli SEM community has really stepped up, showing that there is a lively, knowledgeable and competitive SEM Industry here and that an SMX Israel conference is long overdue.

Finally, here is my (somewhat altered) presentation from the SEO panel. I am sorry for a bit of the text and font confusion, it seems like PowerPoint 2007 and SlideShare are not too friendly with each other. I’ve decided to use Google Docs presentation. Much better. A cleaner PDF version can be found on the SphinnCon Israel official page.


Tapping into Unconventional Link Attributes

February 2, 2008 – 4:31 pm

When we analyze incoming links, we tend to focus on more or less the same set of link parameters: PageRank, anchor text, relative position in the linking document, surrounding text, etc. However, sometimes looking beyond the usual can provide the opportunity to not only succeed in link building, but even dominate the niche you are competing in. One of these uncommon parameters is the link freshness factor.

The whole issue of the temporal aspect of links is a few years old. It first appeared in official Google writing in this patent, signed, among others, by Matt Cutts, which makes it even more interesting to the SEO community. The whole patent itself is an interesting read and revisiting it could produce a worthy blog post, but I would like to focus on a very specific aspect of it that I have recently noticed with several of our websites.

Some of our sites were enjoying top locations for their main targeted phrases in the past few months. While that is primarily good for the company’s bank account, it also gives us some freedom as to how we spend our time and how we divide the work priorities with that specific site. So, in the framework of secondary-phrase optimization stage, I have decided to drastically slow down the link acquirement process so I can try and gauge the relative value of each of the linking resources I was using at the time. Basically what I’ve done is instead of just throwing all the weight on several link acquirement techniques at once, I decided to use one, wait for the increase in rankings and then use the next one and compare. It is hardly a sterile experimenting environment but I thought it is a decent start…

While the comparison of the impact of different link sources produced some interesting data by itself, plotting the change in locations over time and marking the addition of links to different sources on the graph provided me with somewhat more interesting information:

As can be seen on the above graph, every addition of a link (or a number of links) resulted in a location increase, followed by a gradual slippage to a lower position (albeit higher than the starting one). This phenomenon took place after several link additions from different sources and on different sites in different niches, so I believe it is not an isolated occurrence.

So what do we have here ? From the above graph, it can be theorized that there are at least two different scores that a link can pass to a page it is pointing to:

  1. A “fresh link” score. Since this is a new link, Google does not yet know the amount of link-juice this link should pass on. Even if this was the only link added to a linking page at the time of the observation, the number of outgoing links has changed and the proportion of PR this link (and other links on that page) should send on, must change. Since even Google cannot calculate all that on the fly, an “artificial” value is added to the link. From the above graph it can be concluded that this “artificial” value can be higher than…
  2. … a “real” link score. This score kicks in after Google reiterates all the PR calculations and assigns an objective value to that link.

In the above graph, I have marked the “fresh link” score with an A and the “real” score with a B. It is obvious that in the above case A > B, which means that when the real value comes into account, the links are worth a bit less, the webpage’s ranking score is adjusted accordingly and the site slips in the SERPs.

Based on this analysis, we can have three possible relations between A and B:

  1. A > B - as in the example above, when the “real” link score is lower than the “fresh” link score - usually happens in case of crappy comment / forum signature / reciprocal / unrelated links.
  2. A ≈ B - when the two link values are approximately equal. We will usually not see a significant change in locations due to the switch between these two values.
  3. A < B - when the “real” link value is actually higher than the “fresh” link score. This usually happens with the high-quality links from on-topic / authoritative website. The result of this would be for a site to get an initial boost in rankings, stagnate for a while and then further improve.
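If you track locations around each link addition, the three cases can be told apart mechanically. The sketch below is only a rough illustration: it uses the change in location as a stand-in for the change in score (which we cannot see), and the function name, the three inputs and the tolerance of one position are all my own assumptions.

```python
def classify_link_effect(before, fresh, settled, tolerance=1):
    """Guess the relation between the "fresh" (A) and "real" (B) link scores.

    before    -- location before the link was added
    fresh     -- location shortly after the link was found
    settled   -- location once the rankings have settled
    tolerance -- difference in positions still treated as "about equal"

    Lower location numbers are better, so a gain is `before - after`.
    """
    gain_a = before - fresh    # improvement while the "fresh" score applies
    gain_b = before - settled  # improvement once the "real" score kicks in
    if abs(gain_a - gain_b) <= tolerance:
        return "A ≈ B"
    return "A > B" if gain_a > gain_b else "A < B"


# Hypothetical locations for three different link additions.
print(classify_link_effect(before=12, fresh=5, settled=8))  # boost, then slippage
print(classify_link_effect(before=12, fresh=8, settled=8))  # boost that holds
print(classify_link_effect(before=12, fresh=9, settled=4))  # boost, then more
```

This only makes sense when nothing else changed on the site or in the niche during the observation window, which is exactly why the slow, one-source-at-a-time approach matters.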

So, what can we do with this information? Well, if you have a large pool of authoritative websites that can give you on-topic incoming links, then you should remember that (more often than not) your initial improvement in locations due to that link addition is only temporary and the locations are bound to improve even more. When the link addition is considered through this scenario, it is easy to see how a phenomenon dubbed the “inbound link sandbox” came into existence. The situation where it takes quality links a while to affect the rankings can be explained by the fact that the A value of those links is not high enough to overcome the ranking score of competing websites, so there is no improvement in locations. When the (higher) B value kicks in, the score gap between the two sites is overcome and the locations improve.

However if you belong to the majority of people that have only less sophisticated link pool to dip into, you may want to add links at such rate that the “unknown” link score just keeps adding to the previous link’s “unknown” score and thus continuously improve the locations.

Obviously that rate will change from niche to niche and from link to link. Furthermore, you should be careful not to overdo it and raise some red flags due to extensive link addition rate, but some trial and error in each niche should outline the playfield rules for that particular niche.

As for the methods of reproducing the “unknown link” score over and over again, well, that is a completely different hat color… ;)


Designing SEO Experiments - a different angle

January 26, 2008 – 11:06 pm

 

I have thoroughly enjoyed SlightlyShadySEO’s post about designing SEO experiments. If you haven’t read it yet, go do that now, it gives quite a comprehensive list of all the variables that need to be controlled when creating a sterile environment necessary for isolating and testing the individual ranking algorithm parameters. I felt that there are a few points worth expanding a bit:

Creating sterile environments is really important in science, since one always wants to look at the influence of changing a single parameter, while keeping all the others constant. The same premise is valid for constructing experiments in search engines. However, there are at least two drawbacks in this translation of scientific principles into the SEO world:

 

  • It is very hard to create a sterile environment. Even if you create a number of websites, all ranking for the same keyphrase and then start changing one of them, the complexity and the hidden nature of SE ranking algorithm always presents us with dilemmas regarding the cause and effect of the change we observe. Let’s say we have increased the keyphrase density of one of the websites and it drops in rankings. Is it because we crossed some threshold value after which increase in density starts invoking penalty ? Or is it because someone scraped the site, thus inducing duplicate content issues ? Or maybe there was a change in the ranking algorithm which made keyphrase density less important ? There are innumerable possibilities for things that could have affected our SERPs and it is very hard to point at one of those as the primary contributor to the change.
  • Conclusions reached in such sterile environments are not always valid in competitive niches. For example, you want to test the effect of adding irrelevant incoming links. You create a number of links to your testing site from unrelated pages and with varying, irrelevant anchor text. After two days, your site experiences a huge surge in locations in your sterile environment. Those locations stick for a few months and you conclude that getting links improves your locations, regardless of their relevance to your topic or keyphrase you are testing for. You then do the same to a site in a competitive niche. The locations increase after a few weeks, but then start to drop, until your site lands only 5 positions above its starting point. You are baffled, since it is not what has happened in your sterile environment.

The main reason for these differences is that you are seeing only a very crude presentation of possibly subtle differences between sites - the SERPs. In other words, there is this big, complex machine - the ranking algo - which adds and subtracts points from a website’s score, according to different parameters that the algo takes into consideration.

Imagine the final ranking score being a bucket of water to which liquid (the points) is constantly being added and taken out of by different glasses, cups, spoons and straws. You do not see how much each of those vessels has added or taken out of any particular bucket. Hell, you don’t even see how much liquid there is in each bucket at any given time. You only see that final order of buckets, and you know that they are ordered by decreasing amount of liquid in them.

If we assume that the final ranking of sites represents a decreasing order of scores assigned to those sites, another mystery is the actual gap between the scores. Top 5 locations on Google could represent 5 point differences between each of those sites:

  1. Site A - 500 points
  2. Site B - 495 points
  3. Site C - 490 points
  4. etc.

If, on the other hand, there is a 100 point gap between each of the sites, we would get a different scoring scale:

  1. Site A - 500 points
  2. Site B - 400 points
  3. Site C - 300 points
  4. etc.

Obviously, an increase of 20 points in one of the sites’ ranking score would have a dramatic effect in the first SERP and absolutely no effect in the second. It also makes sense that “sterile” environments, where all the websites are under the tester’s control, look more like the 5-point-gap SERPs than the 100-point-gap model, thus creating a difficulty in reaching conclusions applicable on competitive SERPs by experimenting on the artificially created ones.
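The two scoring scales above can be played with in a few lines of Python. This is only a sketch using the made-up point values from the lists; the helper names `serp_order` and `boost` are mine and do not refer to anything real.

```python
def serp_order(scores):
    """Return site names ordered by decreasing score, like a SERP."""
    return [name for name, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)]

def boost(scores, site, points):
    """Return a copy of the scores with `points` added to one site."""
    updated = dict(scores)
    updated[site] += points
    return updated

# Hypothetical scores taken from the two lists above.
tight = {"Site A": 500, "Site B": 495, "Site C": 490}   # 5-point gaps
wide = {"Site A": 500, "Site B": 400, "Site C": 300}    # 100-point gaps

tight_after = serp_order(boost(tight, "Site C", 20))    # Site C now has 510
wide_after = serp_order(boost(wide, "Site C", 20))      # Site C now has 320

print("5-point gaps:  ", tight_after)
print("100-point gaps:", wide_after)
```

The same 20 points take Site C from third to first place in the tight SERP and change nothing at all in the wide one, which is exactly why a result from a sterile environment may not carry over.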

So, in spite of these issues, how do we design experiments and still reach valid conclusions relevant for competitive niches ?

In SEO, like in real world science, the key to this problem is repetition. One can conduct experiments on competitive niches, but single experiments will not teach us anything about the effects our actions have on SERPs. It could be that a certain link filter has just kicked in or that a particularly valuable link was taken off. However, if a phenomenon repeats itself over a number of niches and/or in the same niches several times, the chances of your action being the main reason for the observed change get just a little bit closer to certainty.
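One rough way to put a number on “repetition” is a simple sign test. The Python sketch below rests on two big assumptions of mine: that the experiments are independent of each other, and that without any real effect a site is as likely to move up as down (p = 0.5). Neither is guaranteed in real SERPs, so read the output as an intuition pump and not as proof of causation.

```python
from math import comb

def chance_of_at_least(successes, trials, p=0.5):
    """Probability of seeing `successes` or more upward moves in `trials` by luck alone."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

single = chance_of_at_least(1, 1)          # one experiment, one improvement
eight_of_ten = chance_of_at_least(8, 10)   # ten experiments, eight improvements

print(f"1 of 1 by luck:  {single:.3f}")
print(f"8 of 10 by luck: {eight_of_ten:.3f}")
```

A single improvement is a coin toss, while eight improvements out of ten would happen by luck only about one time in twenty under these assumptions.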


Google vs. Paid Links - how will it end ? Lessons from Mother Nature

December 29, 2007 – 10:44 pm

So I hear there is this thing going with Google not wanting people to buy links from other people and lowering their toolbar PR, taking away their ability to pass link juice through links. Terrible stuff! There are even frightening reports of adult SEOs getting lumps in their throats at conferences while Google reps are crushing candies with their bare hands in the back row.

So where is it all going to end ? Who is going to win ? Is the paid links model going to survive ? Is Google going to prevail and eradicate paid links as a method of promoting sites in their precious SERPs ? No one knows for sure. So being both an SEO and an (aspiring) scientist, I turn to science to try and predict how this struggle will end.

First, let’s try and see who the main players are: there are the SEOs - they are trying to get their sites to the top locations. Let’s be simplistic and for the purpose of this comparison assume that the only way to achieve top locations is by purchasing links. Google on the other hand is trying to prevent SEOs from artificially influencing the SERPs and, again taking a simplistic approach, they will fight back by PR reduction and by abolishing link-juice-transfer-powers. So we have two forces, trying to prevail and have it their way. If Google prevails, there will be no more paid links in SERPS (which will open the doors to the next technique). If the SEOs prevail, Google’s SERPS may become less authoritative and relevant.

So it happens that this kind of situation is happening in nature all the time. The predators are trying to outsmart the prey and the prey is trying to hide/outrun/scare away the predator with all their might. The smarter the prey gets, more pressure is on predators which are thus forced to evolve into more efficient hunting machines. Two powers, competing against each other, where the prevailing of one signals the demise of the other in what is known as evolutionary arms race.

Now we turn to science. In Population Ecology, there is a theory called ESS - Evolutionarily Stable Strategy. It states that when such a struggle (as described above) occurs in nature, the winning strategy on both sides will be the one that will preclude a new strategy from replacing the existing one. Since the last sentence does not mean much to nonscientists, I will illustrate by example:

chicks in the nest

Imagine a bird’s nest with several chicks. They have just hatched and are hungry. They all chirp for food which their mother promptly brings to the nest. However, when she arrives with a worm in her beak, she must decide which one of the chicks gets fed first. She cannot remember who got fed the last time. So she gives it to the one that seems the hungriest - the one that is screaming the loudest. For chicks, it turns out that it is worth their while to scream as loud as possible, since that will increase their chance of getting fed and subsequently surviving to become mature birds. Conversely, a nest that produces so much noise will certainly attract the tree-top skimming hawk or a wandering snake. So the louder they scream, the more they increase their chances of having a rather short life span. This is called “the begging conflict” and is seen throughout nature as a problem in communication between parents and their offspring. The Nature paper on this topic, with mathematical models and game theory application to the solution of the problem can be found here (subscription needed, if you want to get a copy of this PDF, leave a comment and I will get it to you)

Let’s analyze how this situation is copied into the paid links vs. Google situation. SEOs are obviously the chicks. They will try to get as many relevant links as possible in order to promote their clients. The more relevant links they offer their customers, the better. On the other side is Google - the snake or the hawk, whichever suits your current feeling about them. They will be attracted to the sudden surge in links, changes in rankings, unprofessional websites ranking for brain tumor-related searches. So whichever SEO sticks out / whichever link/post selling service advertises itself the most - gets eaten alive / its PR gets taken away.

How was this problem solved in nature ? Chicks need to eat. Mother birds need to decide who to feed. Hawks and snakes are looking for prey. One possible solution would be for all the chicks to be silent. That way they would reduce to a minimum the chance of being eaten by the local predator and at the same time make the chance of being fed, equal among all of them. However, this strategy would be very short termed. When the mother lands on the nest with that delicious worm and all the chicks are silent, one smart-ass chick will undoubtedly chirp and with one clever move, swipe the worm away from his otherwise-cooperative siblings. Soon enough, others would understand that silence is self-defeating and the screaming will begin again. That is why this solution is not considered to be an Evolutionarily Stable Strategy (or Solution) - it will quickly be replaced with another strategy - one where there is a loudly chirping chick. So the ESS solution would be for all the chicks to chirp moderately. Not too loud, so as not be discovered by the predators and not too quietly so they would not encourage their clever sibling to start screaming. Nests that adopt this kind of strategy are the ones that have a better chance of survival than the ones that don’t and this strategy gets passed down the generations in higher percentages than other strategies, due to higher rate of survival of the ESS adopting nests.
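For the curious, the chirping argument can be reproduced with a toy model in Python. To be clear, this is my own simplification and not the model from the Nature paper: I assume a brood of four, that a chick’s share of food is proportional to its loudness, and that the nest’s chance of surviving drops linearly with the average loudness. The check is a plain symmetric best-response test, which is cruder than the formal ESS condition.

```python
N_CHICKS = 4                            # assumed brood size
RISK = 1.0                              # assumed risk when every chick screams at full volume
GRID = [i / 100 for i in range(101)]    # loudness levels from 0.00 (silent) to 1.00

def payoff(own, others):
    """Food share times nest survival for a chick chirping at `own` among siblings at `others`."""
    total = own + (N_CHICKS - 1) * others
    # In a completely silent nest the mother feeds at random.
    share = own / total if total > 0 else 1 / N_CHICKS
    # The louder the nest as a whole, the likelier a predator finds it.
    survival = max(0.0, 1 - RISK * total / N_CHICKS)
    return share * survival

def best_response(others):
    """Loudness on the grid that pays best when all siblings chirp at `others`."""
    return max(GRID, key=lambda own: payoff(own, others))

# A level is stable if no single chick gains by deviating from it.
stable_levels = [level for level in GRID if best_response(level) == level]

print("Silent nest, one chick chirps softly:", payoff(0.1, 0.0), "vs", payoff(0.0, 0.0))
print("Screaming nest, one chick quiets down:", payoff(0.5, 1.0), "vs", payoff(1.0, 1.0))
print("Stable chirping levels:", stable_levels)
```

With these particular assumptions, silence is beaten by a single soft chirp, full-volume screaming is beaten by a quieter chick, and the only level nobody wants to deviate from sits somewhere in between. Change `N_CHICKS` or `RISK` and the stable level moves, but the moderate-chirping pattern is the point.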

Back to paid links. One possible ESS solution would be to abolish paid links concept altogether. Websites would stop selling links (or prevent paid links from passing link love by nofollowing them), link sale mediating agencies would stop functioning due to the massive abandonment of potential clients. Everybody turns to organic linking and Matt’s department gets back to trying to discover invisible text. However, and this one is even more obvious than in the chicks example, in no time there will be the smart-ass chick that starts selling links and another smart-ass SEO that starts buying them. Since all of his competitors are sitting quietly in their nests, waiting for their organic content to roll in by way of natural links, he has the immediate advantage over them. Other SEOs observing the sudden rise in locations of link purchasing websites quickly understand that in order to compete with them they have to purchase links themselves and, wham, in no time, the link purchasing frenzy is back.

The ESS solution would be not to stop buying links/reviews. It would be to do it in moderation. This way, everyone would benefit from the advantages of the paid links moderately and the relative success would be distinguished by the ability of the SEO to correctly identify the relevant website from which a link should be purchased.

Now the question is, how to enforce the “moderation in links purchasing strategy”?

There could be many ways that this can be achieved - one of them is for the link selling agencies, such as TLA or PPP, to limit the number of links/posts a single advertiser can purchase. Over time, an equilibrium would be reached between the ability of SEOs to buy the links in great numbers and the wish/ability of Google to actively seek and destroy such marketing efforts.

There could be other possibilities of reaching this state of equilibrium, although none come to mind at present. However, I do think that it is inevitable that some kind of equilibrium must be reached. Any other solution would perpetuate the evolutionary pendulum between the search engines and the SEOs and reinstate the whirlwind of link purchasing-website punishing we are seeing right now.

Hattip to my friend Tzvika whose paper on SEO-Search Engines relationship as an example of evolutionary arms race, serves as a continuous inspiration for understanding both search engines and evolutionary mechanisms of population dynamics.


Sphinn gets your site indexed in Google in just a few hours

December 19, 2007 – 2:20 am

How long does it take for a page on a brand new domain to get indexed and start showing in Google’s SERPs ? Weeks, if not months. Well, not if you Sphinn it. Here is a case study of my new blog and the super-fast indexing it experienced in the last few hours.

So the subject is the blog you are reading. Domain purchased last Saturday, Wordpress installed about the same time and first two posts published on Sunday. One is Hello World kind of post. The other is about Google Spiders, something that was hanging in my mind for a while now. Being SEO related, I sphinn it. Nothing spectacular, 7 sphinns and a nice short discussion with g1smd, who kindly sinks my theory and sends me to do some reading.

I am already contemplating my next post and I turn to Analytics to see whether the data has started going in already and lo and behold, there is a visit from Google. Keyword: [seo scientist]. Turns out that Google has indexed my post and is showing it in SERP. Now, other sites I have and cherish are working very hard to achieve just that and this greenhorn does it without breaking a sweat. So I decide to pay closer attention to the indexing of my next post.

So this morning I write a post about using Google AdWords to get linking prospects. I sphinn it and wham - 4 hours later the post is showing in site:www.seo-scientist.com search on Google.

There are the ten gazillion pings my blog is performing whenever I make a post, but that is also true for my other posts that were not sphinned and they are nowhere to be found in Google SERPs.

Google’s faster rate of indexing is something that Sphinners have noticed and Googlers have bragged about, but I haven’t seen the connection between the actual sphinning and faster indexing being made yet, so here it goes.

Another interesting observation is that the new posts, although they appear in SERPs, are not being cached yet (the posts from December 14th are in cache) which nicely shows the separation between two levels of Google presence - being in SERPs and being in cache.

I am inserting a link to an otherwise isolated page on this site to see whether the quick indexing will cause quick link skipping by Googlebot. I will post the updates if they are of any interest.

———————————————————————————————————————-

Update: OK, so apparently the title of the post should be “Sphinn gets your site indexed in Google IN A MINUTE”. Check this out:

Update 2: Now my non-sphinned post from this morning is also appearing in SERPs, about 12 hours after publication. I can’t keep up.


WordPress and Windows Live Writer XmlRpc server problem solutions

December 18, 2007 – 7:40 am

So after installing my snazzy new blog, changing the theme and playing with the plugins, I decided to try the Windows Live Writer. A friend of mine has warmly recommended it and since he is a sucker for WP, I reckoned it is a good bet.

So I download it and install it and when I try to connect my blog to it, I get the following message:

“Invalid Server Response - The response to the blogger.getUsersBlogs method received from the weblog server was invalid: Invalid response document returned from XmlRpc server.”

OK. I couldn’t expect to get it from the first try. So I uninstall the WLW. Nothing. I reinstall the WordPress. Nothing. I change the hosting. Nothing.

So I turn to Google in despair and find four different solutions:

  1. There is a problem with the PHP version. Solution: the following code should be added to the top of the xmlrpc.php file:

    $HTTP_RAW_POST_DATA = file_get_contents("php://input");

    Source

  2. There is a problem with .htaccess. The following code should be added to the .htaccess file:

    <Files xmlrpc.php>
    SecFilterInheritance Off
    </Files>

    Source

  3. There is a clash between WLW and some of the installed plugins. Disable the clashing plugins.

    Source

  4. There are some extra lines in all kinds of files that xmlrpc.php is referring to, so the PHP functions calling those files are not able to execute. Solution: Use Fiddler to monitor the HTTP traffic between the WLW and your hosting and find the calls to files that are giving the error.

    Source

Well, needless to say, none of those worked for me. However in the comments of the solution #4, I find the following gem:

Check the WLW log as well. Go to Help, About and click on the Show Log File link.

Well, I try and find the following lines in the WLW log:

<b>Warning</b>: include_once(public_html/www.seo-scientist.com/wp-includes/class-IXR.php) [<a href=’function.include-once’>function.include-once</a>]: failed to open stream: No such file or directory in <b>/public_html/www.seo-scientist.com/xmlrpc.php</b> on line <b>43</b><br />
<b>Warning</b>: include_once() [<a href=’function.include’>function.include</a>]: Failed opening ‘/public_html/www.seo-scientist.com/wp-includes/class-IXR.php’ for inclusion in <b>/public_html/www.seo-scientist.com/xmlrpc.php</b> on line <b>43</b><br />
<b>Fatal error</b>: Class ‘IXR_Server’ not found in <b>/public_html/www.seo-scientist.com/xmlrpc.php</b> on line <b>73</b><br />

Just to translate: xmlrpc.php was trying to include the file class-IXR.php on line 43 (and to use the IXR_Server class it defines on line 73) and not finding it in the /wp-includes/ directory. So I access the directory and I see that the stupid FTP client has converted all my filenames to lowercase. One change of one filename and everything works perfectly.
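If you suspect the same lowercase trick on your own install, a small script can do the looking for you. The Python sketch below is a hypothetical helper of mine (`find_case_mismatches` is not part of WordPress or WLW): it takes a directory and a list of filenames you expect to find there and reports the ones that exist only under a differently-cased name.

```python
import os

def find_case_mismatches(directory, expected_names):
    """Map each expected filename missing from `directory` to a file that differs only by case."""
    present = os.listdir(directory)
    by_lower = {name.lower(): name for name in present}
    mismatches = {}
    for expected in expected_names:
        if expected in present:
            continue  # exact match, nothing to report
        actual = by_lower.get(expected.lower())
        if actual is not None:
            mismatches[expected] = actual
    return mismatches
```

Calling `find_case_mismatches("wp-includes", ["class-IXR.php"])` against a local copy of the site would have returned `{"class-IXR.php": "class-ixr.php"}` in my case. The list of expected names is something you have to supply yourself, for example from a fresh WordPress download.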

You can’t beat the feeling…

How to find linking resources through Google AdWords

December 18, 2007 – 6:22 am

Any webmaster/SEO has reached the same inspirational rock bottom when it comes to getting links to a website close to his heart - link source exhaustion. With all the directories submitted to, hundreds of emails sent to webmasters, thousands of comment/signature spam links left, you have the will, you have the resources but there is absolutely nowhere left to look. So here is a thought. Google indexes all the sites on the Internet anyway. So why not ask them to give you a list of relevant sites you can contact for links ?

Here is a little step-by-step tutorial that has helped me do this more than once:

  1. Create a substantial relevant keyword list. Don’t be afraid to dig sideways. Even a fleeting association with your main theme is good enough.
  2. Create AdWords campaign for your site, bidding for the keywords from the above list. Limit the campaign to Content Network.
  3. Plan the bidding daily budget in a way so that you don’t spend too much money on it. The purpose of the exercise is not necessarily to get traffic to your site (although it is a nice bonus. Usually badly converting one, but still a bonus), but to develop a list of sites that can serve as potential link sources.
  4. Let the campaign run for a week.
  5. After a week (or couple of thousands of impressions, whichever comes first), run a Placement Report for the period throughout which the campaign is running (click on thumbnail on the right to see the snapshot of the type-of-report-choosing step in Google AdWords)
  6. Harvest the list of websites that showed your ad through Google AdSense program.
  7. Start contacting webmasters and offering them, ahem, your eternal friendship and appreciation in exchange for a link with the preferable anchor text. For better results, preferably contact those websites that have actually brought you conversions. There is going to be a lot of pruning of useless MySpace accounts (although there may be a few useful ones there) but some gems will emerge.
  8. Lather, rinse, repeat.

The idea is that Google is placing ads in the content network on sites they consider relevant to your niche (their effectiveness at doing so is debatable, but that is a different issue altogether). So take the list of URLs that have shown your ad and voilà.
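The harvesting and pruning steps are easy to script once the Placement Report is exported as CSV. Here is a minimal Python sketch; the column names (`Domain`, `Impressions`, `Clicks`, `Conversions`) and the sample rows are my assumptions and will almost certainly differ from a real AdWords export, so adjust them to whatever your report contains.

```python
import csv
import io

# Made-up sample data standing in for an exported Placement Report.
SAMPLE_REPORT = """Domain,Impressions,Clicks,Conversions
widgets-forum.example,1200,14,2
myspace.example,5400,3,0
blue-widget-blog.example,800,9,1
tiny-site.example,40,1,0
"""

def link_prospects(report_file, min_impressions=100):
    """Return domains that showed the ad, best converting first."""
    rows = csv.DictReader(report_file)
    prospects = [
        (row["Domain"], int(row["Conversions"]), int(row["Impressions"]))
        for row in rows
        if int(row["Impressions"]) >= min_impressions  # drop placements with too little data
    ]
    # Sites that brought conversions come first, then the ones with more impressions.
    prospects.sort(key=lambda p: (p[1], p[2]), reverse=True)
    return [domain for domain, _, _ in prospects]

prospects = link_prospects(io.StringIO(SAMPLE_REPORT))
for domain in prospects:
    print(domain)
```

For a real report, open the exported file with `open(path, newline="")` and pass it in instead of the `io.StringIO` sample. Sorting by conversions first simply follows the advice in step 7 to contact the converting sites first.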

So you may have spent a few bucks on the AdWords, however, instead of purchasing relevant links for big bucks, you have invested some of that money in potentially relevant traffic AND gained dozens of URLs with linking potential. Naturally, this process has a learning curve which will get you to the point where you are bidding only for those keywords that trigger your ad on very relevant sites.

Good luck.


My Request from Google

December 14, 2007 – 9:24 am

In the last few years, we can see a lot of effort coming out of Mountain View* to reach out to webmasters and assist them with managing their rankings and the ways their web property is perceived by Google spiders. The toolbar PR, the ever expanding Webmaster Tools, Matt Cutts blogging about what to do and what not to do when optimizing your site (do I even need to link to his blog?), regular updates of the Webmaster Guidelines section, participating in various conferences, demonstrate the many ways Google helps people maximize the utility they derive from their sites and assures them a continuous flow of targeted traffic from The Search Engine (brown-nosing, I know).

Being a webmaster and an SEO, I thought what the heck - it’s my turn to ask. Let’s see if this idea finds an attentive ear among the people at Google who will, in a best case scenario, start a discussion, decide that this is a great idea, offer me a job (which I will promptly decline) and improve the lives of webmasters around the globe. Or, in a worst case scenario, my idea may not penetrate Google’s crap filter but it might at least raise some dust around the blogosphere and maybe stimulate other, more interesting questions in the same vein.

So what is my request ?

Allow spiders to pass referral information.

For those of you to whom the above sentence did not make sense, here is a little explanation: every visit to your site elicits a response from the server. Every time someone requests a page, image or a Flash file (or whatever) from your website, the server logs that request in a file called, surprisingly enough, “log file”. There are several pieces of information written in every line of the log file. The more interesting ones are the response code of the server, the time and date of the visit, the kind of request that was made, the user agent (which usually reveals the browser and the OS), the IP of the initiator of the request and in some cases the referring URL from which the visitor performing the request came to your site. The referrer information is recorded only if the visitor passes it along. For the majority of the human visitors this information is passed to the server by default. However, in some cases, such as search engine spiders, the referral information is not passed and therefore not recorded in the log files.
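To show what such a log line looks like in practice, here is a short Python sketch that pulls the fields out of it. I am assuming the common Apache “combined” log format, and both sample lines are invented for illustration (the IPs, URLs and timestamps are not from my real logs).

```python
import re

# Assumed layout: Apache combined log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# Invented examples: a human visitor arriving via a link, and a spider visit.
human = ('203.0.113.7 - - [14/Dec/2007:09:24:00 +0000] "GET /blue-widget.html HTTP/1.1" '
         '200 5120 "http://widgets.example/links.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1)"')
spider = ('192.0.2.10 - - [14/Dec/2007:09:25:00 +0000] "GET / HTTP/1.1" '
          '200 7340 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

for line in (human, spider):
    entry = parse_line(line)
    visitor = "spider" if "Googlebot" in entry["user_agent"] else "human"
    print(visitor, "requested", entry["request"], "referrer:", entry["referrer"])
```

The human visit carries the URL of the linking page, while the spider line shows only a “-” in the referrer field. That empty field is the whole subject of my request.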

I don’t know if my request is even technically possible, although I don’t see why it wouldn’t be - they do follow links, they do read the content of the site, (or most of it), and they do receive and pass all the other information (time of visit, requested page, user agent). Even if they don’t visit every existing link, every time (as suggested by jdMorgan on WMW), over a long period of time, Googlebot will visit most of my links. Combination of this information from all the other SE bots will compose a comprehensive picture of my links.

What are the benefits of this?

More information - more power to the webmasters :

1. By analyzing my log files and filtering them for spider(s) visits, I can get the full, comprehensive, not-to-be-found-anywhere-else information about the details of all incoming links to my site. If you are a webmaster/SEO, there is no need to explain why this is useful. I don’t think there should be any privacy problems with this - after all, those are my log files and by being able to analyze them I am in a sense confirming my ownership of the site.

2. With this information, I would be better equipped to control my incoming links. I could see who is linking to me and how and try to change that by contacting the webmasters of the sites that are linking to me. I could quickly discover potential attempts of linking sabotage jobs done by competitors (linking from bad neighborhoods, using deceptive and derogative anchor text when linking to my site, etc.).

3. I would also be able to control the flow of link juice on my site. For example: I have a page about blue widgets on my site (damn, I promised myself not to use the blue widgets example). Some other blue widgets site links to my homepage, not because they are mean or evil, just because that’s what a majority of people prefer to do - link to a homepage rather than to an inner page. I would prefer the link to contribute to my blue-widget.html page and not to the homepage. If I had this information about the existence of the link, I could contact the webmaster of the other Blue Widget site and ask him to link to the page of my preference.

What are the drawbacks ?

As with any tool or useful method, this could be abused by spammers/cloakers to redirect spider visits coming from certain links to certain websites, but I think that this is something they can already do. Maybe not at the resolution that this would allow them (links coming from certain websites are redirected to site X and others are redirected to site Y) but it would not add significantly to their ability to spam compared to what they already have.

I think that this would be a very positive step in the direction of putting more power in the hands of the webmasters and would, in the long run, contribute towards better indexing and classification on the web.

What do you think?

*I wonder whether the citizens of Mountain View (all 70,708 of them) mind the fact that there are thousands of people (if not more) around the world who have completely erased their identity as an actual town in California and have equated their existence with the location of a certain search engine HQ?
