April 17, 2006

Kaboodle releases new features and announces Series A

I wrote some time ago about my involvement in Kaboodle, the company bringing social bookmarking to the masses - especially useful in the shopping, travel and research areas. Since the launch of the company, a lot of feedback has been gathered from the community of users, and many new features have been developed - including new search and sharing capabilities, a new home page, a Firefox extension that has made it onto the Firefox recommended list, tagging, etc. Matt Marshall had a nice piece about the company in the San Jose Mercury News this weekend.

The purpose of Kaboodle has remained the same: provide an easy way to collect detailed information about items you are researching - across web sites and services - and then make a purchase decision (or any other kind of decision). I have seen my 8-year-old son, my wife and my mother use and collaborate on the service in that context. The interesting development for me is the increased possibility of benefiting from other users' research and selections - as I am connected to them (in the social network sense) or as I discover their pages through keyword search or tag-based navigation.

Kaboodle is also announcing today the close of a $3.55 Series A round involving a number of prominent angels - including my friends Ron Conway, Rajeev Motwani and Georges Harik - as well as yours truly. It has been a pleasure to work with Manish Chandra and his team, and I look forward to a continued involvement as the company further develops.

February 27, 2006

Edgeio just opened its doors and announced its Angel financing

One of the most exciting and scary moments for a startup is the moment it ships its product/service for the very first time “for real” – without any password, limitation, etc. It is often the result of months of effort by a dedicated team, and gets you to deal with a new constituency: users. Tonight is Edgeio’s turn, and this is indeed an exciting moment.

As Om Malik wrote, Edgeio has received a lot of coverage already, so I won’t go into much detail on the functionality: Edgeio extracts listings (classifieds, jobs, etc.) made available via RSS feeds and tagged “listing”. Once a listing has been picked up, the publisher can add metadata such as the main category under which the item should appear, its price and additional tags. Users can search the marketplace, or use tags and location as filters. There is quite a detailed FAQ here. Here is the item I just posted on my blog listed in Edgeio, and check the Edgeio widget on the right side of my blog.
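To make the mechanism concrete, here is a minimal sketch of how listings tagged “listing” could be picked out of an RSS 2.0 feed. It is not Edgeio's actual code: the sample feed, the function name `extract_listings` and the choice of the RSS `category` element to carry the tag are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up feed with one tagged listing and one ordinary post.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Road bike for sale</title>
      <link>http://example.com/bike</link>
      <category>listing</category>
      <category>bicycles</category>
    </item>
    <item>
      <title>Thoughts on RSS</title>
      <link>http://example.com/rss</link>
      <category>opinion</category>
    </item>
  </channel>
</rss>"""


def extract_listings(feed_xml, marker="listing"):
    """Return the feed items whose categories include the marker tag."""
    root = ET.fromstring(feed_xml)
    listings = []
    for item in root.iter("item"):
        # Normalize category text so "Listing" and " listing " also match.
        tags = [c.text.strip().lower() for c in item.findall("category") if c.text]
        if marker in tags:
            listings.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                # Remaining tags can serve as search filters.
                "tags": [t for t in tags if t != marker],
            })
    return listings


for listing in extract_listings(SAMPLE_FEED):
    print(listing["title"], listing["link"], listing["tags"])
```

The publisher-added metadata described above (main category, price) would be layered on top of records like these once an item has been picked up.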

I became aware of Edgeio months ago upon reconnecting with Keith Teare (who I had met years ago), and meeting Mike Arrington as he started TechCrunch. The concept of edge aggregation of semi-structured data was clearly the next logical step in pushing RSS beyond text syndication, hence the appeal of the Edgeio solution.

Disclosure: I was lucky (and grateful) that Keith and Mike asked me to get involved in the company as a (paid) advisor about six months ago. More recently, I also invested in the $1.5M angel round that was just announced.

A lot has been said about Edgeio, and a number of questions and suggestions will be addressed by the team in the coming weeks – and there are still a lot of interesting features coming up. In the meantime, take a look at the service and send us your feedback.

A bonus: here is the end of the IM chat we had with the team just after opening the site. Congratulations again to you guys.

More:

  • Keith Teare announces the launch
  • Fred Oliviera, who designed the Edgeio UI, chimes in
  • If you are interested in keeping track of Edgeio’s development, the blog is here
  • The TechCrunch review is here, written independently by Nik Cubrilovic

September 01, 2005

Lots of new good from Buzznet

Quite a buzz today around the official launch of Buzznet and Flickr photo printing powered by our friends from Qoop, even if the functionality has been available to Buzzneters for at least a month (press release here). Posters are a particular favorite of mine, especially for $10 apiece (plus tax & shipping).

I thought that this was a good opportunity to summarize some of the features Buzznet has added over the same period.

So, here is what has been added:

  • Journals, so that you can publish your thoughts and personal diaries alongside your photos. And links to pictures can be added to text posts with just a few clicks. New journal posts are displayed on your Buzznet home page.
  • A new search interface: keyword search and a powerful people search. The keyword search is also integrated in your home page (below your list of galleries).
  • Presence: you can now see who in your friends list is online.
  • QuickEdit of galleries has been enhanced to batch edit all data related to a number of pictures in one go.

Much more functionality is coming in the next few weeks, as well as the much-anticipated overhaul of the back-end server infrastructure.

And if you are in LA on Sept 1st (tonight), don't miss the Buzznet Party (the first 300 Buzzneters showing up will get free admission).

Disclosure: I am a shareholder and consultant to Buzznet Inc.

August 22, 2005

Glenbrook Networks: Trawling the Deep Web

In my last post, referring to the San Jose Mercury News piece on Glenbrook Networks, I mentioned that we would dig further into the technology used to build the Glendor Showcase. This first post covers the extraction of data from the Deep Web.

The majority of web pages one can access through search engines were collected by crawling the so-called Static or Surface Web. It is the smaller portion of the Internet, reportedly containing between 8 and 20 billion pages (Google vs. Yahoo index sizes). Though this number is already very large, the total number of pages available on the Web is estimated at 500 billion. The part beyond the Surface Web is often referred to as the Deep Web, Dynamic Web, or Invisible Web. All these names reflect some of the features of this gigantic source of information - stored deep down in databases, rendered through DHTML, not accessible to standard crawlers. Pages in the Deep Web often do not have a standard URL, and cannot be addressed in a standard fashion. In many cases, they do not even exist until a user asks a question by filling in fields in a form, and a response (page) is generated. Typical examples of Deep Web applications are airline reservation systems, online dictionaries, etc.

It is supposedly quite easy for a human to navigate through the Deep Web. One just needs to fill in a form by choosing one of several options, like destinations and dates on a travel site, or entering a word to search for a meaning or a translation. It is much more difficult for a machine to do so automatically and generically. Because the Deep Web contains a lot of factual information, it can be seen metaphorically as an ocean with a lot of fish. That is why we call the system that navigates the Deep Web a trawler.

There are two major problems with navigating the Deep Web automatically. First, the trawler needs to understand what questions to ask through the aforementioned forms, and ask them exhaustively. Second, the trawler cannot easily navigate from one page to another, since pages do not have set URLs or might not even exist. That's why the trawler needs to remember where it came from and return to the surface (like a whale) before "diving" again to ask the next question.

If the number of sites is relatively small, say a few thousand, each set of forms could be described manually through a templating system. The major limitations of that approach are poor scalability and a lack of resilience to changes in page formats.
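As an illustration of the manual templating approach and of asking questions exhaustively, here is a minimal sketch. The template layout, the `FLIGHT_FORM` example and the `enumerate_queries` helper are hypothetical and are not Glenbrook's format; the sketch simply enumerates every combination of field values, and builds each query from the form's own URL because result pages offer nothing stable to navigate from.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical hand-written template for one travel site's search form.
FLIGHT_FORM = {
    "door_url": "http://travel.example.com/search",
    "fields": {
        "origin": ["SFO", "SJC"],
        "destination": ["JFK", "CDG"],
        "cabin": ["economy", "business"],
    },
}


def enumerate_queries(template):
    """Yield one request URL per combination of field values."""
    names = list(template["fields"])
    options = [template["fields"][name] for name in names]
    for values in product(*options):
        # Each "dive" starts again from the door URL at the surface.
        yield template["door_url"] + "?" + urlencode(dict(zip(names, values)))


queries = list(enumerate_queries(FLIGHT_FORM))
print(len(queries), "queries, starting with", queries[0])
```

The number of queries is the product of the option counts, which is one reason a hand-maintained template per site stops scaling quickly.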

There is a third problem that is related to the size of the Deep Web. It is so big that one needs to focus on a particular subset (vertical) to have a chance to trawl it with some level of success, especially if high precision is an important factor. Since the task of determining what questions to ask includes understanding of semantics and context, the focus on a vertical comes in handy.

Glenbrook's approach to building a trawler is based on mimicking the behavior of a (human) user. It is a useful approach since the "doors" opening the Deep Web were built with a human in mind and reflect the standards (no matter how loose) that humans use to navigate the Web.

The Trawler consists of five layers:

  1. Discoverer - locates prospective target home pages in Surface Web
  2. Scout - navigates Surface Web part of a web site and finds the "doors" - DHTML pages that contain forms leading to the Deep Web part of a web site
  3. Locksmith - fills up the forms with various requests and collects responses
  4. Assessor - analyses responses and makes a decision to use this door as candidate to query the Deep Web part of the site or move elsewhere
  5. Harvester - collects all relevant pages from Surface and Deep Web parts of the web site
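The five layers above can be sketched as a small pipeline. This is a toy model under stated assumptions, not Glenbrook's implementation: the "web" is an in-memory dictionary, every URL and page field is made up, and each layer is reduced to the simplest rule that matches its description.

```python
# Toy surface web: URL -> page. All URLs and fields are invented.
SURFACE = {
    "http://acme.example/": {"links": ["http://acme.example/careers"], "form": None},
    "http://acme.example/careers": {
        "links": [],
        "form": {"field": "dept", "options": ["eng", "sales"]},
    },
    "http://blog.example/": {"links": [], "form": None},
}

# Deep Web pages exist only as answers to (door URL, option) queries.
DEEP = {
    ("http://acme.example/careers", "eng"): "Job: Software Engineer",
    ("http://acme.example/careers", "sales"): "Job: Account Manager",
}


def discoverer(seeds):
    """Layer 1: keep candidate home pages that exist on the surface web."""
    return [url for url in seeds if url in SURFACE]


def scout(home):
    """Layer 2: follow surface links and return the pages holding a form."""
    doors, seen, queue = [], set(), [home]
    while queue:
        url = queue.pop()
        if url in seen or url not in SURFACE:
            continue
        seen.add(url)
        if SURFACE[url]["form"]:
            doors.append(url)
        queue.extend(SURFACE[url]["links"])
    return doors


def locksmith(door):
    """Layer 3: submit every form option and collect the responses."""
    form = SURFACE[door]["form"]
    return {option: DEEP.get((door, option)) for option in form["options"]}


def assessor(responses):
    """Layer 4: accept a door if at least one query returned a page."""
    return any(responses.values())


def harvester(door, responses):
    """Layer 5: keep the door page and every non-empty deep response."""
    return [door] + [page for page in responses.values() if page]


def trawl(seeds):
    harvested = []
    for home in discoverer(seeds):
        for door in scout(home):
            responses = locksmith(door)
            if assessor(responses):
                harvested.extend(harvester(door, responses))
    return harvested


pages = trawl(["http://acme.example/", "http://blog.example/", "http://missing.example/"])
print(pages)
```

A real Assessor would need a far richer decision than "any response came back", but the division of labor between the layers stays the same.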

After all potentially relevant pages are harvested the Extractor takes over. The Extractor is a hybrid system that applies Pattern Recognition, Natural Language Processing and other AI techniques to extract facts, combine them and populate a database that is used to provide factual answers to search queries.
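As a rough illustration of turning harvested text into structured facts, the sketch below uses a single regular expression. It is a deliberately simple stand-in for the hybrid system described above, not a representation of it; the page texts, the pattern and the `extract_facts` name are assumptions.

```python
import re

# Made-up harvested page texts.
PAGES = [
    "Job: Software Engineer | Location: Palo Alto, CA",
    "Job: Account Manager | Location: San Jose, CA",
    "Welcome to our careers page",
]

# Assumed layout: "Job: <title> | Location: <place>".
FACT_PATTERN = re.compile(
    r"Job:\s*(?P<title>[^|]+?)\s*\|\s*Location:\s*(?P<location>.+)"
)


def extract_facts(pages):
    """Return one structured record per page that matches the pattern."""
    facts = []
    for text in pages:
        match = FACT_PATTERN.search(text)
        if match:
            facts.append({
                "title": match.group("title"),
                "location": match.group("location").strip(),
            })
    return facts


facts = extract_facts(PAGES)

# Once facts are structured, a query becomes a simple filter.
san_jose_jobs = [f["title"] for f in facts if f["location"] == "San Jose, CA"]
print(san_jose_jobs)
```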

The Extractor will be the subject of another post.

July 05, 2005

Introducing the Glendor Jobs Search Showcase

I have done tens and tens of product launches in my 15+ years in the software industry, and they always look the same: even with a lot of planning, you end up spending the last night polishing the final details, and sending a bunch of emails to your poor team members who would like nothing more than to relax a bit after having been under pressure for a long time. And the excitement of unveiling something that you have been working on for months or years is (thankfully) always there.

It is therefore my great pleasure to announce the official launch of the Glendor job showcase, developed by one of my clients, Glenbrook Networks. Glenbrook was founded in 2001 with the objective of delivering a next-generation search technology, one that would enable the extraction of information in a given vertical market with a high degree of precision. Over the past four years, the company has developed a unique technology platform that automatically extracts unstructured data from Internet sources (company web sites, online publications, semi-structured feeds,…) and turns it into structured facts that can be aggregated and stored in a database. Unlike standard search engines, its products are capable of providing precise answers to complex business or temporal queries. The company currently licenses its products and data services to search engines and large information providers, and will at some point develop its own vertical search engine(s) in specific markets.

One of the common issues one deals with in the search and information extraction space is that you need to demonstrate, at some scale, the capabilities of your product. It is even worse when you have a platform that can equally be used in a number of verticals. That’s why we decided a couple of months ago to develop a vertical search engine showcase – in the jobs market. Why a showcase, and not a beta? Because it does not intend to cover hundreds of thousands of companies, and millions of listings – just enough to prove that the technology delivers on its claim to fame: automatically extracting job listings from “the long tail”, which in the jobs market refers to individual company web sites and local classifieds sites. So the initial scope of the showcase is to extract jobs from hundreds of Bay Area company web sites, local jobs from one major board, and eventually a classifieds site.

Why the jobs vertical market – which is already well served by talented teams? Because extracting listings from company web sites exercises all aspects of our technology to produce quality, structured results: surface and dynamic web crawling, layout recognition, natural language processing, and so on. And we believe that the “deep web”, guesstimated at 500B+ documents a few years ago, is where the action is going to be: extracting information available behind dynamic forms and DHTML rendering, and delivering high quality results. And this deep web crawling requirement can be found in local search, travel search, fraud detection, etc. – and is a tough nut to crack automatically, mixing AI and search algorithms.

We’ve also strived to build this showcase as a (modest) Web 2.0 application: we deliver search results through RSS, and we map job listings onto Google Maps. Yep, a first in the jobs listings space, though probably only for a few days or weeks: now that the API has been released, we expect most of the players in the space to add that cool functionality (our engineers built this application without the API). And we have a couple of other things up our sleeves that we will be rolling out in the coming weeks.
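For readers wondering what delivering search results through RSS can look like, here is a minimal sketch that serializes a list of job results as an RSS 2.0 feed. It is not the showcase's code: the `results_to_rss` function, the field names and the example URLs are assumptions.

```python
import xml.etree.ElementTree as ET


def results_to_rss(query, results):
    """Serialize search results as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Jobs matching '{query}'"
    ET.SubElement(channel, "link").text = "http://jobs.example.com/search"
    ET.SubElement(channel, "description").text = "Search results as a feed"
    for job in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = job["title"]
        ET.SubElement(item, "link").text = job["url"]
        # The location goes in the description so feed readers display it.
        ET.SubElement(item, "description").text = job["location"]
    return ET.tostring(rss, encoding="unicode")


feed = results_to_rss("engineer", [
    {"title": "Software Engineer", "url": "http://acme.example/jobs/1", "location": "Palo Alto, CA"},
    {"title": "QA Engineer", "url": "http://acme.example/jobs/2", "location": "San Jose, CA"},
])
print(feed)
```

Subscribing to such a feed turns a saved search into a standing alert, since new matching jobs show up as new items.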

So enough said: have a go at the jobs showcase, make sure to try the mapping of jobs (and play with the zoom), do let us know if you find any issues, and give us your feedback. We also have a blog that will talk about the showcase and the typical challenges one faces in developing a vertical search engine. It will also relay interesting news related to the development of the vertical search industry - which is booming, if you consider that Vertical Leap gathered almost 300 people for its first edition last week.

You can also find some information about Glenbrook's technology and products. Feel free to get in touch to find out more about the showcase, or what we can do to help.

And congratulations to the Glenbrook team for this great work.

Update: Glenbrook Networks, and its Deep Web trawling technology, have been featured in the San Jose Mercury News. Read about it here.

September 01, 2004

An Ultra reliable DNS is key to your web infrastructure (and Amazon seems to agree)

It is with great personal satisfaction that I spotted that Amazon.com's DNS entry is now pointing to UltraDNS servers (like softtechvc.com, linkedin.com, and 8,000 other companies and 26 TLDs). The press release just came out, announcing that Amazon had selected UltraDNS to optimize its Web site infrastructure.

Why is DNS important? Because, as most of you know, it is like the door knob to an Internet presence. If it does not work, nobody can get to your web site (the browser reports that the server cannot be found, which is different from a 404, an error returned by a server that was actually reached) or send you email. Unfortunately this happens more often than one would like to think: DDoS attacks, operational errors, and simple mistakes lead to unavailability of Internet resources for minutes, hours or even days (in the worst case, it can take up to 48 hours for an updated DNS entry to reach all Internet name servers).
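To see why name resolution comes before everything else, here is a minimal sketch that checks whether a host name resolves at all. The `resolves` helper is a name made up for this example; it relies only on Python's standard `socket.getaddrinfo`. The `.invalid` top-level domain is reserved and should never resolve, so it stands in for a broken DNS entry.

```python
import socket


def resolves(hostname):
    """Return True if the system resolver can map hostname to an address."""
    try:
        socket.getaddrinfo(hostname, 80)
        return True
    except socket.gaierror:
        # Resolution failed: no connection, and therefore no HTTP
        # status code of any kind, is possible for this name.
        return False


for name in ("127.0.0.1", "no-such-host.invalid"):
    print(name, "resolves" if resolves(name) else "does not resolve")
```

When resolution fails, the request never leaves the starting block, which is why a DNS outage takes down web and email alike even if the servers behind the name are healthy.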

Continue reading "An Ultra reliable DNS is key to your web infrastructure (and Amazon seems to agree)" »
