MESSAGE
DATE | 2007-11-27 |
FROM | Ruben Safir
SUBJECT | Re: [NYLXS - HANGOUT] Website Updates
On Tue, Nov 20, 2007 at 12:23:05PM -0500, Ron Guerin wrote:
> Ruben Safir wrote:
> > On Sun, Nov 18, 2007 at 01:08:12PM -0500, Ron Guerin wrote:
> >> Ron Guerin wrote:
> > >> You sure there's a colon there?
> >>
> >
> > There is no colon
>
> In that case, you should be good to go.
>
OK
I've reworked the program and put some cheats into it by separating the headers out. First, we have the line in the headers that says
Line: some number
That will let me glob in the majority of the messages. In the process, I've actually greatly simplified the code.
I also simplified the regex for the From line.
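For illustration only, here is a minimal Python sketch of the simplified split Ron suggests in the quoted text: treat any line that begins with "From " as the start of a new message and check nothing else. This is not the code posted at the URL below; the names `FROM_LINE` and `split_mbox` are made up for the example.

```python
import re

# Assumption: a From_ line is any line that starts with the five
# characters "From " (F, r, o, m, space). Nothing else is validated.
FROM_LINE = re.compile(r"^From ")


def split_mbox(text):
    """Split mbox text into messages, each returned as a list of lines."""
    messages = []
    current = None
    for line in text.splitlines():
        if FROM_LINE.match(line):
            # A new From_ line opens a new message.
            current = [line]
            messages.append(current)
        elif current is not None:
            # Everything else belongs to the message already open.
            current.append(line)
    return messages
```

Lines that appear before the first From_ line are dropped, and a body line that really starts with "From " has to be escaped in the mbox already (commonly as ">From "), otherwise it will be treated as a delimiter.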
The code is now at
http://www.mrbrklyn.com/prog.html
I'll get a chance to review it later, but you can take a look and give me any thoughts you might have.
Ruben
> >> The best description of an mbox that I was able to turn up in Google,
> >> suggests the delimiter you're showing above is invalid.
> >>
> >>   A message encoded in mbox format begins with a From_ line,
> >>   continues with a series of non-From_ lines, and ends with a
> >>   blank line. A From_ line means any line that begins with
> >>   the characters F, r, o, m, space
> >>
> >>   The final line is a completely blank line (no spaces or
> >>   tabs). Notice that blank lines may also appear elsewhere in
> >>   the message. If the last line of the message was a partial
> >>   line, it writes two newlines; otherwise it writes one.
> >>
> >>   The From_ line always looks like From envsender date
> >>   moreinfo. envsender is one word, without spaces or tabs; it
> >>   is usually the envelope sender of the message. date is the
> >>   delivery date of the message. It always contains exactly 24
> >>   characters in asctime format. moreinfo is optional; it may
> >>   contain arbitrary information.
> >>
> >
> > That should be the regex.
>
> Note what it says below about corruption in the From_ line in the other
> message. This means your regex should only be looking for lines
> starting with "f","r","o","m","space" to split the messages out of the
> mbox. The other info may or may not be there, or may or may not be in
> the correct format. So all that checking you were doing on the address
> and date should go away, and that's going to speed things up too. Only
> regex for that stuff after you've already determined it's a From_ line,
> assuming you even care about that information. But since it doesn't
> have to be there, don't put it in the regex you use to split messages
> out of the mbox.
>
> > Meanwhile I have a new problem tonight. I have a dead computer
> > in the livingroom. The soundcard lost all but two channels, so
> > I cleaned the machine and now its not starting. Any clues on
> > where to find an ATX mainboard that will accept a Duron 800
> > CPU.
>
> Not a clue. I used to love hardware, but it no longer loves me, and the
> feeling is mutual.
>
> - Ron
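As a rough sketch of the second half of Ron's advice (look at the sender and date only after a line is already known to be a From_ line), something like the following Python could be used. It assumes the layout quoted above: one-word envsender, a 24-character date, and optional moreinfo. The names `FROM_FIELDS` and `parse_from_line` are hypothetical, and a line that does not fit is simply reported as unparsed rather than rejected as a delimiter.

```python
import re
import time

# Layout assumed from the quoted description:
#   From envsender date moreinfo
# envsender is one word, date is exactly 24 characters, moreinfo is optional.
FROM_FIELDS = re.compile(r"^From (\S+) (.{24})(?: (.*))?$")


def parse_from_line(line):
    """Return the fields of a From_ line, or None if it does not fit."""
    m = FROM_FIELDS.match(line)
    if m is None:
        return None
    sender, date_text, moreinfo = m.groups()
    try:
        # asctime layout, e.g. "Tue Nov 20 12:23:05 2007"
        date = time.strptime(date_text, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        # Corrupt or non-asctime date: keep the sender, drop the date.
        date = None
    return {"sender": sender, "date": date, "moreinfo": moreinfo}
```

Because this runs only on lines already picked out as delimiters, a corrupt address or date costs nothing during the split itself.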
--
http://www.mrbrklyn.com - Interesting Stuff
http://www.nylxs.com - Leadership Development in Free Software
So many immigrant groups have swept through our town that Brooklyn, like Atlantis, reaches mythological proportions in the mind of the world - RI Safir 1998
http://fairuse.nylxs.com DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002
"Yeah - I write Free Software...so SUE ME"
"The tremendous problem we face is that we are becoming sharecroppers to our own cultural heritage -- we need the ability to participate in our own society."
"> I'm an engineer. I choose the best tool for the job, politics be damned.< You must be a stupid engineer then, because politics and technology have been attached at the hip since the 1st dynasty in Ancient Egypt. I guess you missed that one."
© Copyright for the Digital Millennium