Here's an example of the file I'm trying to parse:

    xx00135                   abcdefghij risk solutions            page no :      7
    beg per: 03/17/2014            current company                       03/18/2014
    end per: 03/18/2014       qa process - reject report                   20:28:36

    batch: 123456789
    contrib: 987654321 - abcde fghi-san diego                                                     quote back: 1a23b45c79

    code   account no           typ company name         beg date end date err
    ------ -------------------- --- -------------------- -------- -------- ---
    12345  1234567890001        ab  abcde fghi products  20140314 20140914 059


    xx00135                   abcdefghij risk solutions            page no :      8
    beg per: 03/17/2014            current company                       03/18/2014
    end per: 03/18/2014       qa process - reject report                   20:28:36

    batch: 234567890
    contrib: 987654321 - abcde fghi-san diego                                                     quote back: 5f7a657g87

    code   account no           typ company name         beg date end date err
    ------ -------------------- --- -------------------- -------- -------- ---
    12346  2345678901           ab  abcde fghi products  20140129 20140729 059
    12346  3456789012           ab  abcde fghi products  20140317 20140917 059


    xx00135                   abcdefghij risk solutions            page no :      9
    beg per: 03/17/2014            current company                       03/18/2014
    end per: 03/18/2014       qa process - reject report                   20:28:36

    batch: 345678901
    contrib: 987654321 - abcde fghi-san diego                                                     quote back: 6k75l8791l

    code   account no           typ company name         beg date end date err
    ------ -------------------- --- -------------------- -------- -------- ---
    12346  4567890123           ab  abcde fghi products  20140317 20140917 059
    12346  4567890123           ab  abcde fghi products  20140317 20140917 059

    number of sets rejected :         13
    total sets in batch:     16,940                             *** end of report ***

And here is a collection of snippets from the module:

    module XX00135 (parsefile) where

    import Control.Applicative ((<$>), (<*>), (<*))
    import Text.ParserCombinators.Parsec hiding (Line)

    data Line = Line { code    :: String
                     , account :: String
                     , atype   :: String
                     , company :: String
                     , begdate :: String
                     , enddate :: String
                     , errcode :: String }

    data Page = Page { periodbeginning :: String
                     , periodend       :: String
                     , reportdate      :: String
                     , batch           :: String
                     , contrib         :: String
                     , quoteback       :: String
                     , linelist        :: [Line] }

    data Report = Report { pages :: [Page] }


    parsereportdate :: Parser String
    parsereportdate =
      manyTill anyChar (string "current company") >> spaces >> count 10 anyChar

    headers :: Parser String
    headers =
      choice [ try (string "\n")
             , try (string "code   account no           typ company name         beg date     end date err")
             , try (string "------ -------------------- --- -------------------- -------- -------- ---") ]

    line :: Parser Line
    line =
      Line <$> count  6 anyChar <* space
           <*> count 20 anyChar <* space
           <*> count  3 anyChar <* space
           <*> count 20 anyChar <* space
           <*> count  8 anyChar <* space
           <*> count  8 anyChar <* space
           <*> count  3 anyChar <* newline

    page :: Parser Page
    page =
      Page <$> (manyTill anyChar (string "beg per:")    >> space >> count 10 anyChar)
           <*> parsereportdate
           <*> (manyTill anyChar (string "end per:")    >> space >> count 10 anyChar)
           <*> (manyTill anyChar (string "batch:")      >> space >> count  9 anyChar)
           <*> (space >> string "contrib:"              >> space >> count  9 anyChar)
           <*> (manyTill anyChar (string "quote back:") >> space >> count 10 anyChar
           <*   skipMany1 headers)
           <*> (manyTill line (twonewlines <|> footer))

    report :: Parser Report
    report = Report <$> manyTill page (try footer)

    twonewlines :: Parser ()
    twonewlines = (count 2 newline) >> return ()

    footer :: Parser ()
    footer = (space >> string "number of sets rejected" >> manyTill anyChar (string "*** end of report ***") >> optional eof) >> return ()

    parsefile :: [(String, String)] -> String -> String
    parsefile errors text =
      let rs = case parse (manyTill report eof) "" text of
          ...
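To make the fixed-width idea behind line easier to try in isolation, here is a minimal sketch that parses a single detail row. The names Row, field, detailRow, sampleRow and parsedSample are hypothetical and not part of the module above; the column widths (6, 20, 3, 20, 8, 8, 3) are taken from the line parser, and the sample row is copied from the report excerpt. It assumes the parsec package, which ships with recent GHC installs.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical record for one detail row; every column is kept as the raw
-- fixed-width string, padding included.
data Row = Row
  { code, account, atype, company, begdate, enddate, errcode :: String }
  deriving (Show, Eq)

-- Read exactly n characters, then the single space that separates columns.
field :: Int -> Parser String
field n = count n anyChar <* char ' '

-- Column widths follow the dashed header line of the report.
detailRow :: Parser Row
detailRow =
  Row <$> field 6 <*> field 20 <*> field 3 <*> field 20
      <*> field 8 <*> field 8  <*> count 3 anyChar

-- Sample row copied from the report excerpt (no trailing newline here).
sampleRow :: String
sampleRow = "12345  1234567890001        ab  abcde fghi products  20140314 20140914 059"

parsedSample :: Either ParseError Row
parsedSample = parse detailRow "sample" sampleRow
```

Because each field is read with count, the padding stays in the result (code comes back as "12345 " with its trailing space), so any trimming would be a separate step after parsing.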

There are 115 lines in the full file. When I cat the file and pipe it to Haskell, I get:

    (line 116, column 1):
    unexpected end of input
    expecting "beg per:"

I had this working by ignoring the footer and everything that followed it. My full use case is to cat multiple files and pipe them to Haskell, meaning I cannot throw away the footer and everything that follows it. Once I started trying to ignore the footer instead of throwing it away, the problems began. I'm sure it's something simple, and I'm just confused and overlooking the obvious.

Let me know if you need more code. There are a few transformations after parsing, and I didn't want to clutter the code with unnecessary detail.


I've resolved the problem. The code is a little different now, and I'm not sure exactly which change solved it. I spent a lot of time staring at the code and making little changes here and there. I think, though, that it had to do with cat appending a newline to the file. I changed footer to:

    footer = space >> string "number of sets rejected"
           >> anyChar `manyTill` (string "*** end of report ***") >> newline >> string ""

Now footer consumes the newline at the end of the file and returns a String. I use footer in eop (end of page):

    eop =
      choice [ count 2 newline
             , footer ]

And I use eop in the last line of page:

    <*> line `manyTill` eop
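One thing to keep in mind with a choice like eop: in Parsec, an alternative that fails after consuming input stops the remaining alternatives from being tried unless it is wrapped in try. The sketch below illustrates this with a simplified stand-in for footer; the names eopNoTry, eopTry and sample are hypothetical, and whether this matters for the real report depends on what follows the last detail line of a page.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified stand-in for the report footer (an assumption for this sketch,
-- not the full footer parser).
footer :: Parser String
footer = newline >> string "number of sets rejected"

-- Without try: when count 2 newline consumes one newline and then fails,
-- choice reports the failure and never attempts footer.
eopNoTry :: Parser String
eopNoTry = choice [ count 2 newline, footer ]

-- With try: a failed alternative rewinds the input before the next one runs.
eopTry :: Parser String
eopTry = choice [ try (count 2 newline), try footer ]

-- Hypothetical input: a single newline followed by the footer text.
sample :: String
sample = "\nnumber of sets rejected"

resultNoTry, resultTry :: Either ParseError String
resultNoTry = parse eopNoTry "sample" sample
resultTry   = parse eopTry   "sample" sample
```

On this input resultNoTry is a Left (the first alternative consumed a newline before failing), while resultTry succeeds through the footer branch.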

report is now:

    report = count 2 newline >> Report <$> many page

I also changed page. I think it was consuming anyChar in unexpected ways. I now throw away the first line of each page:

    page = firstline >>
      Page <$> (string "beg per:" >> space >> count 10 anyChar)
           ...

    firstline =
      string "xx00135                   abcdefghij risk solutions            page no :"
      >> spaces >> many digit >> newline
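As a variation on firstline, the following sketch matches the header line without hard-coding its exact spacing and returns the page number instead of discarding it. The name firstLineNo and the choice to return an Int are assumptions made for illustration; the report id and the "page no :" label are taken from the sample file.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Match the first line of a page and return its page number.
-- Everything between the report id and the "page no :" label is skipped.
firstLineNo :: Parser Int
firstLineNo = do
  _ <- string "xx00135"
  _ <- manyTill anyChar (try (string "page no :"))
  spaces                 -- padding before the number
  n <- many1 digit       -- many1 rejects a missing page number
  _ <- newline
  return (read n)

-- Header line copied from the sample report, with its trailing newline.
sampleHeader :: String
sampleHeader =
  "xx00135                   abcdefghij risk solutions            page no :      7\n"

parsedHeader :: Either ParseError Int
parsedHeader = parse firstLineNo "sample" sampleHeader
```

The try around the label is there so that a partial match of "page no :" inside the skipped text does not abort manyTill.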

I think that covers the important changes I made to get the parse to succeed. It now parses a single file from the cat command, as well as multiple files concatenated with the cat command. Yay! I love Haskell.
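For the concatenated-files case, a stripped-down sketch of the overall shape might look like the following. It is not the module above: report here is a hypothetical simplification that keeps the raw text up to the end-of-report marker, and reports and oneFile are made-up names. The point it tries to show is that letting each report swallow trailing whitespace makes the parse indifferent to whether a file ends with a newline.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified report: all text up to the end-of-report marker, followed by
-- any trailing whitespace (including a final newline, if present).
report :: Parser String
report = manyTill anyChar (try (string "*** end of report ***")) <* spaces

-- One or more reports back to back, then end of input.
reports :: Parser [String]
reports = many1 report <* eof

-- Hypothetical stand-in for the contents of one report file.
oneFile :: String
oneFile = "page text\nnumber of sets rejected : 13\n*** end of report ***\n"

-- Two files concatenated, as cat would produce them.
parsedTwo :: Either ParseError [String]
parsedTwo = parse reports "stdin" (oneFile ++ oneFile)
```

With this shape, parse reports also accepts a single file whose final newline is missing, since spaces matches zero or more whitespace characters.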


