Truevent is a search engine built on top of the Yahoo! BOSS API.

I was talking to a search researcher about their offering.

researcher: i think the underlying theory is a bit overstretched...

researcher: well, i think there's nothing new about it

Paul Birnie: you mean the linking of words associated to other words

Paul Birnie: well at least boss is achieving something - making lots of people try different things

researcher: they fed the system with documents about eco-stuff

researcher: modeled, somehow, the word patterns

Paul Birnie: like simple signature detection

researcher: and now do some kind of re-ranking of boss-results based on the similarity to that model

5 easy steps to making a fortune by creating a cool vertical search engine startup based on the BOSS API are:

  • sign up for the BOSS appid
  • use the BOSS mashup framework to retrieve search results from the BOSS API
  • pull data from another webservice at the same time
  • mix and rerank the results based on some "patent pending" technology
  • get bought by Yahoo!, Google or Microsoft - either because it's good PR or because your product actually works.
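The "mix and rerank" step the researcher describes (re-ranking results by their similarity to a model of word patterns) could be sketched roughly as follows. This is only an illustration under stated assumptions: the "model" is a plain bag-of-words term count, similarity is cosine similarity, and the class and method names are made up for this example. It does not call the BOSS API and is not Truevent's actual method.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopicReranker {

    // Build a simple term-frequency vector from free text (lowercased, split on non-letters).
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine similarity between two term-frequency vectors; 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = 0.0;
        double normB = 0.0;
        for (int v : a.values()) normA += v * v;
        for (int v : b.values()) normB += v * v;
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the result snippets ordered by descending similarity to the topic model.
    static List<String> rerank(List<String> snippets, Map<String, Integer> topicModel) {
        List<String> sorted = new ArrayList<>(snippets);
        sorted.sort(Comparator.comparingDouble(
                (String s) -> cosine(termCounts(s), topicModel)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical "eco" model built from a tiny training text.
        Map<String, Integer> ecoModel = termCounts("solar energy recycling green energy compost");
        List<String> results = List.of(
                "cheap flights and car hire",
                "green energy from solar panels",
                "recycling tips");
        System.out.println(rerank(results, ecoModel));
    }
}
```

In a real mashup the snippets would come from the search API response and the model from a much larger document set; the ranking logic itself stays this small.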

The problem with search is that it is very hard to get users to switch from one search engine to another. All of the major search engines use machine-learned ranking (MLR) to rank results, and finding a new page feature that can provide a >1% improvement in ranking for the average user is very hard (all the easy and even the hard stuff has been done).

Custom vertical search engines:

Creating custom vertical search engines is not new: Google and Yahoo have both supported "site restrict" to create a custom vertical search engine for years

Google provides Google Co-op Custom Search Engine

Yahoo has shut down sitebuilder which also used to support site restrict to a custom list of sites.

As far as I know, the BOSS API is currently limited to around 22 site restricts before you run out of space in the GET webservice call. I wonder if POSTs are supported - POSTs are not YOS compliant.

The problem with the search market is that it's really hard to get users to switch to another supplier.

Boss mashup framework:

I really like the mashup framework that comes with BOSS - it allows you to simply instantiate an object and pass a webservice url in the constructor. The mashup framework will automatically find a collection in the response and create a dictionary from the webservice response. You can then write SQL-like syntax to mash up (mix) the data returned from each of the services.

About a year ago I met Douglas Mc Ilwraith, who is working on research into body sensor networks at Imperial College.

The device I like is the accelerometer that fits behind your ear and uses Bluetooth to transmit information about your head movement.

here are some links

http://www.hotelmarketing.com/index.php/content/article/hotelscom_introduces_visual_hotel_search/
I think the thing about it is that it sells you on an experience before you buy.
The emotive pictures sell you on the idea of the holiday - I found myself imagining my ideal holiday and was really keen to book the trip.

Another example of a tool that allows users to have a discussion around a visualization - this time focused on stock prices. Why has Google not already created this feature and a community around it? Because they are crap at creating user communities.

Checklists are:

  • A practical means of capturing processes and procedures
  • A way to ensure that issues are not missed
  • Quick to create
  • Quick to update
  • Easy to follow
  • Easy to adapt to any situation

A quote from Steve McConnell, author of Code Complete and Rapid Development, editor of IEEE Software's "Best Practices" column.

"Create and use checklists. Checklists are an often-overlooked, low-tech development tool, but they are useful in many areas. They are created from experience, so they're inherently practical. Use them during requirements time to avoid missing key requirements. Use them at architecture and design time to be sure your design accounts for all relevant considerations. Use them during design and code reviews to help reviewers catch the most common problems. Use them at software release to assure that, in the last-minute rush to release the software, you don't make careless mistakes. "

Principles related to checklists:

When using a checklist, if an item doesn't apply, then don't use it. If your aeroplane doesn't have retractable undercarriage then you don't have to check that your undercarriage is down.

"Everyone knows the story of the engineer who in the 1970s cut a hole in the roof of a 3 Series car in his garage and cobbled together BMW's first convertible to wow reluctant board members. Production engineers in the paint shop recently racked up an impressive first with a new powder-based technology to apply the final clear coat on a car, providing a more perfect finish and better scratch resistance, and completely eliminating toxic waste. "All the Japanese and American auto makers have come to view it," says Walter Wimmer, head of the paint shop at the Dingolfing plant."

It was a Ford engineer, Harold K. Sperlich, who in the 1970's came up with the idea of a van big enough to haul a family but small enough to fit in a standard garage. (Granted, vehicles like the VW Microbus and Corvair Greenbrier had been there before.) Mr. Sperlich's boss, Lee A. Iacocca, took the proposal to top management, but they dismissed it as too risky, given the cost it would entail. When the two men wound up at Chrysler in the early 1980's, they resurrected the idea and the minivan was born. A concept deemed too risky for Ford ended up saving Chrysler from extinction.

People mention the story of the engineer at BMW - but what happened to him - Did he get a bonus, a promotion, a $1million?

pie chart

axiom: When brainstorming - Engineers should be given the "problem use case" first, then brainstorm solutions

Engineers often come up with an idea - then find a problem and apply the idea - the risk is that what they thought was a problem isn't really a serious one.

It may be better to split brainstorming into three steps:

  1. "problem use case" extraction from problem space
  2. "problem use case" prioritisation
  3. brainstorm possible solutions for a chosen use case

For example:

I could think of 100 ideas on how to improve the search quality in answers.yahoo.com - but when real life tests are run I could easily find that the feature I have defined only applies to 0.001% of situations in reality.

Step 1: "problem use case" extraction

It is better to start by looking at what the user was trying to do, e.g. Gemma was trying to find an idea for a good date, but when using answers.yahoo.com she got the following answers

Problem: user is looking for advice that is local. User is subjected to some random unhelpful comments.

Percentage occurrence of this use case: 65%

Step 2: "problem use case" prioritisation

Sort list of problem use cases and pick high priority cases

Step 3: Brainstorm possible solutions for important use cases
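As a rough illustration of step 2, the prioritisation can be as simple as dropping use cases that almost never occur and sorting the rest by how often they do. The sketch below assumes each use case carries a measured "percentage occurrence"; the class names, the 1% cut-off and the third use case are hypothetical, while the 65% and 0.001% figures are the ones mentioned above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class UseCasePrioritiser {

    // A problem use case plus how often it was observed, as a percentage.
    static final class UseCase {
        final String problem;
        final double occurrencePercent;

        UseCase(String problem, double occurrencePercent) {
            this.problem = problem;
            this.occurrencePercent = occurrencePercent;
        }
    }

    // Keep use cases at or above the cut-off, most frequent first.
    static List<UseCase> prioritise(List<UseCase> cases, double minPercent) {
        List<UseCase> kept = new ArrayList<>();
        for (UseCase c : cases) {
            if (c.occurrencePercent >= minPercent) {
                kept.add(c);
            }
        }
        kept.sort(Comparator.comparingDouble((UseCase c) -> c.occurrencePercent).reversed());
        return kept;
    }

    public static void main(String[] args) {
        List<UseCase> cases = List.of(
                new UseCase("feature idea that rarely applies", 0.001),
                new UseCase("hypothetical: duplicate questions", 20.0),
                new UseCase("user wants local advice, gets random unhelpful comments", 65.0));
        for (UseCase c : prioritise(cases, 1.0)) {
            System.out.println(c.occurrencePercent + "% " + c.problem);
        }
    }
}
```

Only the use cases that survive this step go forward to the brainstorm in step 3.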

Predictify.com specializes in prediction market questions - a special category of question: something that we don't know the answer to now but will be able to definitively know the answer to in the future.

This can be used for:

Supports reporting the current prediction by user demographic or by area of expertise of the user

Other random features:

  • You are able to start asking a question straight away - T&C + reg are deferred to end
  • Popup tips while you are asking the question
  • Question ask progress meter
  • Outcome is either multiple choice or numeric
  • User chooses "end date" and "verification date"
  • Option to upload an image associated with the question
  • They mention getting a group of people together who are specialized in a particular area - for example: creating a Britney Spears page
  • No community moderation features - questions are sent to the "predictify team for review"
  • Because of the time-limited nature of the questions and the fact that they are built around answer options, the questions are poor for SEO

more info in this video - The rise of crowdsourcing

(Warning: please check the date of this posting - the information provided here is not kept up to date)

1) you can phone 0845 010 5200. Opening hours are 0900 - 2100 Monday to Friday. The line is very good for any questions.

2) to write the "Life in the UK test" -
a search engine for test centres and details of how to book the test are on this site.

THERE ARE 2 TYPES OF BOOKS FOR THE TEST. Get the small study guide; the large book is a waste of time.

You need to study for the test - it's multiple choice, based on the simple facts in the book.

3) fill in form AN - see http://www.bia.homeoffice.gov.uk/britishcitizenship/applying/applicationtypes/naturalisation/

You don't have to get the entry and exit dates 100% correct - I did it to the best of my ability. I had 70+ entries in the exit and entry dates, and some of the dates must have been incorrect. I used an Excel spreadsheet to capture the dates from my passport and from old emails from BA confirming my flight details. Some of the dates are complex to reverse engineer - e.g. when flying from the US, the date stamped in the US is not the same as the UK entry date. The UK tends to only stamp on the way into the country - so you end up with 3 stamps per trip (3 columns in my Excel spreadsheet, plus another for the "days out of country" calculation).
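The "days out of country" column is easy to get wrong by hand, so here is a minimal sketch of the same calculation in code. It assumes one exit date and one entry date per trip and counts only the whole days in between (neither travel day); that counting rule is an assumption for illustration, so check the current guidance for form AN before relying on it. The class and method names are made up.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.List;

public class AbsenceCalculator {

    // One trip abroad: the date of leaving the UK and the date of re-entering it.
    static final class Trip {
        final LocalDate exit;
        final LocalDate entry;

        Trip(LocalDate exit, LocalDate entry) {
            this.exit = exit;
            this.entry = entry;
        }
    }

    // Whole days outside the UK for one trip, counting neither travel day (assumed rule).
    static long daysOut(Trip trip) {
        long between = ChronoUnit.DAYS.between(trip.exit, trip.entry);
        return Math.max(0, between - 1);
    }

    // Sum of the per-trip absences, like the extra spreadsheet column.
    static long totalDaysOut(List<Trip> trips) {
        long total = 0;
        for (Trip trip : trips) {
            total += daysOut(trip);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical trips; real data would come from passport stamps and flight emails.
        List<Trip> trips = List.of(
                new Trip(LocalDate.of(2007, 3, 1), LocalDate.of(2007, 3, 10)),
                new Trip(LocalDate.of(2007, 12, 20), LocalDate.of(2008, 1, 3)));
        System.out.println("Total days out of the country: " + totalDaysOut(trips));
    }
}
```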

4) application - it cost me £655 - I just gave my debit card details when I went to the Nationality Checking Service (see below)

5) I went through the Nationality Checking Service - I just phoned around till I found a place that had an appointment that I could book soon. The list of numbers was on the site (http://www.bia.homeoffice.gov.uk/britishcitizenship/applying/checkingservice/). They checked my application form, photocopied my passport and dispatched the application off to the correct address - all worth £30. I suspect that my application was processed faster because I used this service.

6) ceremony - you have to do this within 3 months of your confirmation letter - I have this tomorrow.

7) apply for a British passport - another £80

From http://www.lovemytool.com/blog/2008/06/bootstrapping.html

"My own experience with starting and running startups is that time has indeed changed. I am not saying that time has changed so completely that the old model (using VC money to start companies) does not exist, I am just saying (humbly) that a new model has emerged that allows entrepreneurs to bootstrap meaningful companies achieving sustainable revenues, starting with their own money and their sweat equity (and more precisely, avoid taking VC money).

Entrepreneurship is all about wealth creation and it should be a conscious choice. As entrepreneurs, our goal is to maximize our returns and minimize our risks, and my experience is that ultimate success has a lot to do with impedance matching. Should we match the impedance for the VC's, which means big exit, big risk, big team and big funding? Or should we try to match the impedance for the entrepreneurs, which calls for modest exit, modest risk, modest team and modest funding?"

BBC visualization tool for weather rocks

This tool allows you to see the animated weather in the UK - clouds, wind and temperature - brilliant. Too bad the May bank holiday weekend had such poor weather.

Adam Jacobs is a founder of HJK Solutions, which helps startups build stable, scalable, and repeatable infrastructures utilizing open-source tools.

One of the tools they use is Puppet.

http://reductivelabs.com/trac/puppet/wiki/AboutPuppet

"Rather than approaching server management by automating current techniques, Puppet reframes the problem by providing a language to express the relationships between servers, the services they provide, and the primitive objects that compose those services. Rather than handling the detail of how to achieve a certain configuration or provide a given service, Puppet users can simply express their desired configuration using the abstractions they're used to handling, like service and node, and Puppet is responsible for either achieving the configuration or providing the user enough information to fix any encountered problems."

Building a plugin for this that generates a dot file which can then be pumped through Graphviz, to produce a diagram of what the servers are and how they are connected, could be good fun.

Some quotes from The Long Tail on Wikipedia, Google and the wisdom of crowds
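To give an idea of how little is needed for the dot-file half of that plugin, here is a minimal sketch. It assumes the server relationships are already available as a plain map of "server -> servers it connects to"; how to extract that from Puppet is not shown, and the class name and sample hosts are hypothetical. Node names are not escaped, so names containing double quotes would need extra handling.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ServerGraphDot {

    // Render a map of "server -> servers it connects to" as a Graphviz DOT digraph.
    static String toDot(Map<String, List<String>> connections) {
        StringBuilder dot = new StringBuilder("digraph servers {\n");
        for (Map.Entry<String, List<String>> entry : connections.entrySet()) {
            // Declare the node so that servers without connections still appear.
            dot.append("  \"").append(entry.getKey()).append("\";\n");
            for (String target : entry.getValue()) {
                dot.append("  \"").append(entry.getKey())
                   .append("\" -> \"").append(target).append("\";\n");
            }
        }
        return dot.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical inventory; a real plugin would read this from Puppet's data.
        Map<String, List<String>> connections = new LinkedHashMap<>();
        connections.put("web01", List.of("app01"));
        connections.put("app01", List.of("db01"));
        connections.put("db01", List.of());
        System.out.print(toDot(connections));
    }
}
```

Saving the output as servers.dot and running `dot -Tpng servers.dot -o servers.png` should produce the diagram.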

Wikipedia-

Like a biological system, it evolves, selecting for traits that help it stay one step ahead of predators and pathogens in its ecosystem.
 

Paul Graham about google -

“The web naturally has a certain grain, and Google is aligned with it. That’s why their success seems effortless. They’re sailing with the wind instead of sitting becalmed praying for a business model, like print media, or trying to tack upwind by suing their customers, like Microsoft and the record labels.”

Don’t try to force things to happen your way; figure out the trend and be there when it happens.

“The answer is not a simple yes or no, because it is the nature of user generated content to be messy and uncertain at the microscale, which is the level at which we usually experience it, as it is amazingly successful at the big picture macroscale.

“Wikipedia, like Google and the collective wisdom of millions of blogs” [wisdom of crowds], operates on an alien logic of probabilistic systems – a matter of likelihood rather than certainty. Our brains aren’t wired to think in terms of statistics and probability. We want to know whether an encyclopaedia entry is right or wrong.

Market economics and evolution are counterintuitive to our mammalian brains

i.e. when designing your roadmap:

  • understand that fundamental parts of the web are counterintuitive to human thinking
  • design your product so it can sail with the wind

When I mention "The Long Tail" to some people, they roll their eyes and go "oh yeah - that book". When you ask further you realize that they liked the book but think that it was all about Amazon. This is understandable, since a lot of the book focuses on Amazon - but the reader has missed the genius of "The Long Tail" and will miss the genius of Chris Anderson's next book, "Free".

Firstly:

An open source book - Whilst Chris is the author of the book, it is essentially an open source development: the book is the result of his blog, to which thousands of people contributed ideas and criticisms. It is similar to a project like Linux - lots of people working together to create a book, overseen by an architect. His second book will also use a blog - although I suspect he will have SEO problems with the word "free" and needs to invent a new term - possibly something like freeconomy or fre or fr2e

A free podcast about the FR2E book

The principles are axioms: The principles discussed in this book apply to many web apps.

The long tail explains the hit rates I see in a search frontend cache. With a cache of:

  • 8 million queries - I get a cache hit rate of 31%
  • 64 million entries - I get a cache hit rate of 35%.

Lots of users are searching for previously unseen queries. [note: "previously unseen" means unseen in the last 8 hours or so, since I don't want the cache to get stale]

It explains the problem of "should I cache this last query" - I simply don't know if someone will search for it again in the future.
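The effect is easy to reproduce on a toy scale. The sketch below replays a small query log through an LRU cache and reports the hit rate; the LRU policy, the class names and the sample log are assumptions for illustration and say nothing about how the real frontend cache works. Because most of the log is one-off "tail" queries, making the cache four times bigger does not improve the hit rate at all.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryCacheHitRate {

    // Minimal LRU cache of query strings built on LinkedHashMap's access order.
    static class LruCache extends LinkedHashMap<String, Boolean> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = order entries by most recent access
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
            return size() > capacity;
        }
    }

    // Replay a query log through the cache and return the fraction of hits.
    static double hitRate(String[] queries, int capacity) {
        LruCache cache = new LruCache(capacity);
        int hits = 0;
        for (String q : queries) {
            if (cache.get(q) != null) {
                hits++;
            } else {
                cache.put(q, Boolean.TRUE);
            }
        }
        return queries.length == 0 ? 0.0 : (double) hits / queries.length;
    }

    public static void main(String[] args) {
        // Toy log: one popular "head" query mixed with one-off "tail" queries.
        String[] log = {"weather", "q1", "weather", "q2", "weather", "q3", "q4", "q5"};
        System.out.println("capacity 2: " + hitRate(log, 2));
        System.out.println("capacity 8: " + hitRate(log, 8));
    }
}
```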

The long tail explains - why I would want to encourage users down the long tail - users find more value in niche content.

The long tail explains:

why you don't want to discourage poor user generated content (UGC) on your site and instead want to focus on creating powerful filters that can match the correct content with the correct user - i.e. what you classify as poor content may be highly valuable content to others - the site should simply not be showing you this content. The cost of storing the content is very small as it is not limited by physical "bookshelf space".

The long tail explains what will happen to consumption if the filters and content matching improve - it suggests "a couch potato will still watch the same number of hours of tv, irrespective of the number of channels"; Amazon users buy books that are more valuable to themselves (and Amazon can charge a premium on this niche content).

The long tail is a fractal - no matter how custom your field, you will still find a long tail. Take for example "blogs on mobile css".

    The Free book

    I hope it will explain:

    • why yahoo.com has free news, mail, games - they are like a free paper magazine - giving away content to get eyeballs (readership).
    • why open source exists
    • The motivation of why people contribute to answers.yahoo.com. For:
      • reputation
      • attention
      • respect
      • fame
      • fun

    I was working on some Java code and as usual Oracle was throwing a cryptic error saying "table not found" but not specifying the table name. It's not my code and the code is in a complete mess - I wanted a quick fix, so in the method that calls preparedStatement.execute() I wrapped the call with

    try {
        preparedStatement.execute();
    } catch (SQLException e) {
        // Rethrow with the SQL text attached so the log shows which statement failed.
        throw new Exception("Could not execute |" + sql + "|", e);
    }

    Two points:

    point 1) why can't the Oracle driver do this for us and give clear error messages? 4 years ago I tried to fix this problem by decompiling the classes12.jar file - but I think the jad decompiler was not working and I could not get the code to recompile.

    point 2) I know that Java was designed with the feature of showing the types of exceptions thrown in the method signature. Over the 4+ years I worked with Java, I found this feature to be completely pointless.

    reason a) When an exception is thrown 95% of the time - the code simply prints the exception to the log and maps it to an error message for the user - knowledge of the specific exceptions has little value. If you want to do something special for a particular type of exception - you will discover this Exception type during testing and can add a specific catch for it.

    reason b) It creates tighter coupling between the caller and the method - since the caller is encouraged to know about the specific exception (in this case: SQLException). The caller just wanted to read the data - he didn't need to know that a SQL database was being used. If a different persistent store (such as a file) was swapped in, the caller code needs to be changed. With the refactoring tools available this is not really a problem - my point is that it creates tighter coupling. Creating a hierarchy of Exception types (eg. PersistanceException) always looks beautiful on a class diagram - but seems to encourage overdesign with multiple layers of abstraction before the need for this abstraction exists.

    reason c) All methods can throw runtime exceptions (eg NullPointerException) - so the method signature is just a subset of all of the exceptions that can be thrown. You still have to expect and handle exceptions that are not listed.

    Personally I think we should avoid this "feature" of java and standardize on throws Exception in method signatures throughout the code :)

    http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html

    This book is fantastic if you want to understand the basics of search.

    I work with a lot of search experts, and by working through this book I am able to have basic conversations with them.

    I intend to do a summary of this book at some stage.

    Is the Patriot Act causing a similar problem to the old law that stopped US companies from selling certificates stronger than 40 bits? That was a situation in which non-US companies had an advantage - like Thawte, which was producing > 40-bit keys for international sale when US companies could not. Will web 3.0 consist of apps running as hosted services like salesforce.com, or will companies insist on servers hosted internally, like the Google search appliance?

    http://tech.slashdot.org/tech/08/03/24/1959201.shtml

    "The issue here is not with users voluntarily using Google services (search, gmail, etc.). Rather it is with companies who want to outsource their data needs to Google. In addition to the visible public products that Google has, it also offers corporate solutions: for instance if a company wants to outsource their email system, or have Google run search and collaborative software for use inside the company."

    "Google is trying hard to make these new kinds of products work. But unfortunately U.S. laws mean that any data that ends up on Google servers can be snooped by U.S. authorities. Many companies don't like the idea that the U.S. government will have such broad access to their data. In many countries where strong privacy laws exist (Canada, U.K., etc.), allowing the data to be managed by a U.S. company would then actually be illegal--since the company couldn't guarantee integrity or privacy of the data."

    "The end result of this is that Google is at disadvantage in the global marketplace because of the over-reaching U.S. laws. Google isn't the only one, of course: I'm sure U.S. companies have been losing lots of contracts because international businesses are wary of storing or moving data through U.S. systems since it is now well-known that such systems are not immune to U.S. government monitoring or interference."

    http://tech.slashdot.org/tech/08/03/24/1959201.shtml 

    An interesting comment on slashdot - indicating that Google will crawl URLs seen in Gmail emails.

    "Nothing, absolutely nothing, stops Google from harvesting everything they can get their hands on- and they have the storage systems and human expertise to do it.

    Case and point: I emailed a link to a wiki I had just set up to 3 people, two of whom had Gmail accounts. A spider from Google hit the page hours before anyone else did, hitting the wiki just after I emailed the link out. There were no public links to the site, and no referral URL."

    At the moment there seem to be two opposing mentalities in how to work

    Mechanism 1: Prioritize and escalate

    When someone gives you a task to do, either say "no, that is not in my priority list" or ask your manager "what is the priority on this?"

    When you encounter a problem, escalate it.

    For example:

    Your manager says "dig a hole here". You pick up a spade and start to dig. If you hit a rock, put the spade down and escalate to your manager.

    Mechanism 2: Get the shit done
    This is a very dotcom approach of bonding together as a team and getting stuff out the door. If something is in the way, either move it or go around it.

    You could run your UGC website active-active with Akamai (i.e. servers in both colos are serving traffic load). Let's say the colos are west coast and east coast. The problem is that if your site has user generated content (UGC), some users (1%) will find themselves being thrown from one set of servers to the other in the middle of their sessions. Given that the east coast database may contain something that they just added, and that full bi-directional database replication is hard to do and is delayed, many UGC sites simply run active-passive.

    Chris: Flipping is rare.
    Chris: Around 1 %.
    Chris: But it can flip anytime.
    Paul Birnie: but over what time period
    Chris: It will flip when an ISP makes a network change.
    Chris: This will happen every day, several times somewhere.
    Chris: There will be a constant number of networks flipping all the time.
