MESSAGE
DATE: 2007-11-18
FROM: Ruben Safir
SUBJECT: Re: [NYLXS - HANGOUT] Website Updates
On Sun, Nov 18, 2007 at 12:50:44AM -0500, Ron Guerin wrote:
> Ruben Safir wrote:
> > On Sun, Nov 18, 2007 at 12:17:55AM -0500, Ron Guerin wrote:
> >> Ruben Safir wrote:
> >>
> >> Sorry, I missed this before...
> >>
> >>> Maybe there is a magic way around this if the header tells me how many lines of
> >>> content there are. Then I can gobble up the content without viewing
> >>> the individual lines.
> >> Yeah, chop off the headers by assuming everything from the start of the
> >> file to the first blank line is a header. Parse those to your heart's
> >> content for headers.
> >>
> >>> I'm open to suggestions. Meanwhile I just noticed that the message body is being doubled so I
> >>> need to look at the code again in the morning when I get home from work.
> >> Then when you get to the body, don't try to parse the entire body for
> >> everything either. Keep cutting it down, parse out the pieces you don't
> >> need to do anything else with, like the binaries. You don't need to be
> >> running text searches on those, it's just going to burn up cycles and
> >> heaven forbid, actually match something.
> >>
> >> Then when you've got only the parts of the body you want to search, run
> >> regexes on that last bit of remaining content.
> >>
> >
> > How do you know when the body ends? The body ends with a line feed and a From line.
> >
> > I can get the headers like you suggest, but the delimiter is the next header.
> >
> > From ruben-at-mrbrklyn.com Sun 18 Nov 00:38:27 2007
> > Smilies :) ;)
>
> Oh, I see. You know,... Kevin was right. You should be using the
> existing message parsing libraries for Perl, because you don't know what
> you're looking for here. And I'm not being snippy, I don't know what
> you're looking for here either, *except* I do know there's a message
> delimiter there that you're not picking up. There's no hard and fast
> rule about that delimiter, IIRC, which is one really good reason to use
> an existing, well-vetted library to read these mbox files.
>
> Just breaking these down into individual messages would make your life a
> whole lot easier.
>
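To make the "chop off the headers at the first blank line" advice concrete, here is a minimal sketch. It is written in Python for illustration rather than the Perl under discussion, and the function name split_headers_and_body is hypothetical. It assumes one message per string, "\n" line endings, and that only the last value of a repeated header matters.

```python
def split_headers_and_body(message):
    """Split one message at the first blank line and parse the header block.

    Assumption: later duplicates of a header (e.g. Received) overwrite
    earlier ones, which is fine for a sketch but lossy in real use.
    """
    head, _, body = message.partition("\n\n")
    headers = {}
    last = None
    for line in head.splitlines():
        if line.startswith("From "):
            # mbox separator line, not a "Name: value" header; skip it.
            continue
        if line[:1] in (" ", "\t") and last is not None:
            # Folded continuation line: append to the previous header.
            headers[last] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip().lower()
            headers[last] = value.strip()
    return headers, body
```

Everything after the first blank line is returned untouched, so the body can be cut down further (attachments and so on) before any regexes run on it.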
I could use an existing library, but then I wouldn't be climbing this learning curve, and I have a use for this knowledge in a future project.
That being said, the closest thing to a true standard for mboxes says that messages are delimited by "From " at the start of a line (followed by a space, not the colon used by the From: header), then the sender's email address and a date.
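A rough sketch of splitting on that delimiter, in Python for illustration rather than Perl. The function name split_mbox and the pattern are assumptions, not a standard: the pattern requires the separator to follow a blank line (or start the file) and to end in a four-digit year, which cuts down on false matches from body lines that happen to begin with "From ". mbox variants differ on the details, so a vetted library is still the safer choice for real mail.

```python
import re

# Assumed separator shape: "From ", an address token, then text ending in a year.
FROM_LINE = re.compile(r"^From \S+ +.*\d{4}$")

def split_mbox(text):
    """Return a list of raw messages, each starting with its separator line."""
    messages = []
    current = []
    prev_blank = True  # the start of the file counts as a boundary
    for line in text.splitlines(keepends=True):
        if prev_blank and FROM_LINE.match(line.rstrip("\r\n")):
            if current:
                messages.append("".join(current))
            current = [line]
        elif current:
            # Lines before the first separator are dropped by this sketch.
            current.append(line)
        prev_blank = line.strip() == ""
    if current:
        messages.append("".join(current))
    return messages
```

Each returned string is one message, so headers and body can then be handled one message at a time instead of scanning the whole file.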
Ruben
> - Ron
-- 
http://www.mrbrklyn.com - Interesting Stuff
http://www.nylxs.com - Leadership Development in Free Software
So many immigrant groups have swept through our town that Brooklyn, like Atlantis, reaches mythological proportions in the mind of the world - RI Safir 1998
http://fairuse.nylxs.com DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002
"Yeah - I write Free Software...so SUE ME"
"The tremendous problem we face is that we are becoming sharecroppers to our own cultural heritage -- we need the ability to participate in our own society."
"> I'm an engineer. I choose the best tool for the job, politics be damned.< You must be a stupid engineer then, because politics and technology have been attached at the hip since the 1st dynasty in Ancient Egypt. I guess you missed that one."
© Copyright for the Digital Millennium