Where's the open source distributed search?

Back before Google, a lot of hackers were writing search engines in their free time. The general consensus, at least as I recall it, was that search was a problem that needed to be solved, and that all the existing solutions more or less sucked. Today, search covers a huge territory and there are still plenty of problems left to solve, but for the most part, web search is extremely usable and reliable. It's not perfect, and there's room for improvement, but it gets the job done. I don't know many people these days who spend their time hacking on search. Why re-create such a low-level service when there are so many innovative, higher-level web applications to be built?

The thing is, search is the operating system of the web. The fact that we have no open-source/open-data search infrastructure is as bad as if there were no Linux or OpenBSD. If Google, Yahoo and MS weren't providing such a great product, my guess is that the hacker community would be attacking this problem like Captain Kirk on a lizard monster.

Where We Are:

Currently, there are a number of open source projects related to general web search. Most notably, the Java-based Lucene project is a solid foundation for indexing and information retrieval, and it's what the Nutch search engine is built on.
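
For anyone who hasn't poked at search internals: the core data structure behind full-text indexing libraries like Lucene is an inverted index, a map from each term to the documents that contain it. The toy Python sketch below only illustrates that idea; it is not Lucene's API (Lucene is Java and does far more, such as scoring, stemming and on-disk segments), and the tokenizer and the `build_index`/`search` names are hypothetical choices for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: map each term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so intersecting does not mutate the index.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {1: "open source search", 2: "distributed search engine", 3: "open data"}
    idx = build_index(docs)
    print(search(idx, "open search"))  # {1}
```

A crawler's job is to feed documents into something like `build_index`; a distributed engine has to split and replicate that index across many machines.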

There are a few distributed crawlers, like Grub and Majestic-12. Unfortunately, both of them pass their data to a central, private storage system. The hard work of crawling and indexing is open for everyone to participate in, but the resulting data is not.

Where We Need To Be:

In my mind, search hackers need to create an open source solution for the following:

  • A distributed mechanism for crawling and indexing the web on a mass scale.
  • Distributed, decentralized, redundant data storage for the cache and index.
  • An end-user, public facing interface for querying the distributed index.
  • A mechanism for retrieving or crawling a local, private slice of the index and cache, for research or personal use.
  • A way to publish alternate indexing models to the distributed grid.

All of these tools need to be designed with the assumption that anyone can and will have access to the system's data, and as the system grows, there will be people, corporations, and governments hell-bent on corrupting the search infrastructure to their advantage.
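
The post doesn't prescribe a storage design, but one common technique for "distributed, decentralized, redundant" storage is consistent hashing with replication, the idea behind DHTs such as Chord and Dynamo-style stores. The Python sketch below assumes that approach: each key (a URL, or an index term) is placed on `replicas` distinct peers found by walking clockwise around a hash ring. The peer names, the replication factor of 3 and the 64 virtual nodes per peer are hypothetical parameters chosen for illustration.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring (SHA-1 digest as a 160-bit integer)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring that places each key on several distinct peers."""

    def __init__(self, peers, replicas=3, vnodes=64):
        self.replicas = replicas  # hypothetical default: keep 3 copies of each key
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> peer name
        for peer in peers:
            self.add_peer(peer, vnodes)

    def add_peer(self, peer, vnodes=64):
        # Each peer owns several virtual points so load spreads more evenly.
        for i in range(vnodes):
            point = _hash(f"{peer}#{i}")
            self._owner[point] = peer
            bisect.insort(self._points, point)

    def peers_for(self, key):
        """Walk clockwise from the key's position, collecting distinct peers."""
        if not self._points:
            return []
        wanted = min(self.replicas, len(set(self._owner.values())))
        start = bisect.bisect(self._points, _hash(key))
        chosen = []
        for offset in range(len(self._points)):
            point = self._points[(start + offset) % len(self._points)]
            peer = self._owner[point]
            if peer not in chosen:
                chosen.append(peer)
                if len(chosen) == wanted:
                    break
        return chosen


if __name__ == "__main__":
    ring = HashRing(["peer-a", "peer-b", "peer-c", "peer-d"])
    # Three distinct peers; which ones depends on the hash values.
    print(ring.peers_for("http://example.com/"))
```

Replication only protects against peers going offline. On its own it does nothing about the people, corporations and governments trying to corrupt the data; that would take something extra, such as signed index entries or cross-checking the replicas against each other.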

It's not an easy problem to solve, but you've got to admit it's an interesting problem. Anyone keen on being the Torvalds of search?

Where To Begin:

The Lucene Project
Nutch Open Source Search Engine
Open Source Search Wiki

Have I missed anything? Please share your thoughts on open source search in the comments.


Comments

Posted by: flowgar on October 19, 2007 at 4:30 AM

Have a look at FAROO, a peer-to-peer web search engine (although not open source). It was selected as one of the finalists at the TechCrunch40 conference. Distributed crawler, distributed index, distributed ranking.

See also Building An Open Source, Distributed Google Clone

