Perl coder needed (was Cataloging Audi Info)

Brett Dikeman brett at cloud9.net
Tue Mar 19 00:54:33 EST 2002


At 11:47 PM -0500 3/18/02, TM wrote:
>Question for all of you:
>How do you catalog all of the info that you've collected over time?
>
>I have a ton of collected posts and emails that I need to organize in
>some fashion and was wondering how you did it. I'm using Outlook and
>am thinking of just creating a whole new personal folder just for Audi
>stuff and trying to organize everything by subject matter, periodically
>archiving the whole mess to CDR.

I hadn't wanted to discuss it on-list (I wanted it to be a surprise), 
but a few friends and I have been working on a much better archive 
format.  We started work before Christmas, but it fizzled out big 
time.  So much for the surprise (we originally thought we'd have 
something to show by January 1st and wanted to present it as a New 
Year's surprise 
to the list.)  At the moment, we're hung up on scripts to do two 
things:

a) Import a single message, received through stdin (called from 
/etc/aliases), i.e. keep the archive up to date as new mail arrives.

b) Import an entire mbox of messages.  Both Majordomo's archive 
program and Mailman's archive program store messages in this 
format (in addition to the HTML).  This is not -nearly- as simple as 
it sounds.  All sorts of different header formats (and different 
headers), forwarded emails, attachments, wrong dates, etc. all make 
the problem pretty messy.
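
As an illustration only (not the scripts mentioned above, and in 
Python rather than Perl), the two import paths could be sketched 
roughly as below.  The function names and the set of fields are 
assumptions made up for the example; the real DB layout isn't shown 
here.  It takes only the first text/plain part and stores no date 
when the Date header can't be parsed, which glosses over most of the 
messy cases listed in (b).

```python
import email
import mailbox
from email.utils import parsedate_to_datetime


def extract_fields(msg):
    """Pull out the fields a (hypothetical) archive row might need."""
    # Old list mail often has a missing or malformed Date header;
    # store None in that case instead of guessing.
    try:
        sent = parsedate_to_datetime(str(msg.get("Date", "")))
    except (TypeError, ValueError):
        sent = None

    # Use the first text/plain part; attachments and HTML are skipped.
    body = ""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "latin-1"
            try:
                body = payload.decode(charset, errors="replace")
            except LookupError:  # header names a charset Python lacks
                body = payload.decode("latin-1", errors="replace")
            break

    return {
        "message_id": msg.get("Message-ID"),
        "sender": msg.get("From"),
        "subject": msg.get("Subject"),
        "date": sent,
        "body": body,
    }


def import_one(stream):
    """Case (a): one message read from a binary stream such as stdin."""
    return extract_fields(email.message_from_binary_file(stream))


def import_mbox(path):
    """Case (b): yield one record per message in an mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            yield extract_fields(msg)
    finally:
        box.close()
```

For case (a), a script wrapping import_one(sys.stdin.buffer) could be 
hooked up with an /etc/aliases entry that pipes to it, along the 
lines of  archive: "|/path/to/import_one.py"  where both the alias 
name and the path are placeholders.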

Once we get the data in, -basic- searching isn't a problem, and we've 
already got some basic frontend stuff set up...that stuff is pretty 
easy.

Basically, I -desperately- need someone who has experience with Perl 
coding and some SQL (specifically PostgreSQL, though pgsql follows 
the SQL standard closely, so general SQL experience should carry 
over) to look over what's already done (the import-one-message script 
is partially done), get up to speed on our DB layout etc. and help us 
finish both scripts.

There's another problem, and I'll mention it in hopes a lightbulb 
goes off in someone's head...we need a full text search engine that 
can index content in an SQL database.  You wouldn't believe how few 
of these things there are, and how much people want for them. 
Searching HTML and plaintext files?  Free or pennies, even ones that 
STORE their indexes in an SQL db.  Actually indexing text that is IN 
an SQL database?  $50,000+.  There's -some- full-text support 
partially integrated into pgsql already, but its feature set doesn't 
match our needs very well.  While intelligence about word forms isn't 
really necessary, 
ability to handle odd pseudo-words like "5kstq" is ESSENTIAL for 
obvious reasons, and the search system needs to recognize such words 
entirely on its own.
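
To make the pseudo-word requirement concrete, here is a toy sketch 
(Python, in memory, and not any existing search product or the pgsql 
module) of an index that does no stemming and uses no dictionary, so 
a token like "5kstq" is kept whole just because it is a run of 
letters and digits.  The function names and the (id, body) row shape 
are made up for the example.

```python
import re
from collections import defaultdict

# A token is any run of letters and digits; nothing is stemmed or
# checked against a word list, so "5kstq" survives as a single term.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def build_index(rows):
    """Build an inverted index: token -> set of message ids.

    rows is an iterable of (message_id, body) pairs, e.g. the result
    of a SELECT over the archive table.
    """
    index = defaultdict(set)
    for msg_id, body in rows:
        for token in tokenize(body):
            index[token].add(msg_id)
    return index


def search(index, query):
    """Return the ids of messages containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)
```

With rows like (1, "My 5kstq needs a new bomb"), a search for "5kstq" 
matches message 1 regardless of case.  A real engine would also need 
ranking, phrase search and incremental updates, none of which this 
attempts.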

Once we get over the import hurdle (even just importing new messages), 
I can promise you'll all really like what you see.  It will be 
very much a work in progress, but I have a lot of exciting and useful 
features in mind...but again...the big problem right now is getting 
the stuff into our DB.

Please contact me off-list if you're able to help; it will be -much- 
appreciated.

Brett
-- 
----
"They that give up essential liberty to obtain temporary
safety deserve neither liberty nor safety." - Ben Franklin
http://www.users.cloud9.net/~brett/


