invoice

Classes for different invoice types

class pdf2xlsx.invoice.CreditEntry(entry_tuple=None, invo=None)

These entries contain negative prices because they belong to credit invoices.

class pdf2xlsx.invoice.CreditInvoice(no=0, orig_date='', pay_due='', total_sum=0, entries=None, orig_invo_no=0)

Credit invoice class

parse_line(line)
Parameters:line (str) – The actual line to parse
Returns:True when the parsing of the Invoice was started
Return type:bool
xlsx_write(worksheet, row, col)

Write the invoice information to a template xlsx file.

Parameters:
  • worksheet (Worksheet) – Worksheet class to write info
  • row (int) – Row number to start writing
  • col (int) – Column number to start writing
Returns:

the next position of cursor row,col

Return type:

tuple of (int,int)

class pdf2xlsx.invoice.Entry(entry_tuple=None, invo=None)

Parse, store, and write invoice entries to xlsx. The invoice information is stored in the EntryTuple namedtuple. Parsing is controlled by a state variable (entry_found). Because each invoice entry is split across two lines, the tmp_str attribute stores the first part of the entry. The ME values are configurable, so they cannot be created at class level; they need to be recomputed at every instantiation.

Parameters:
  • entry_tuple (EntryTuple) – The invoice entry
  • invo (Invoice) – The parent invoice containing this entry
line2entry(line)

Extracts entry information from the given line. Searches for nine groups in the line; see the implementation of entry_pattern. The line should match the following pattern:

NNNNNN-NNN STR+WSPACE PREDEFSTR INTEGER INTEGER-. INTEGER% INTEGER-. INTEGER-. INTEGER%

Where:
  • N: a single digit, 0-9
  • STR+WSPACE: a string containing white space, numbers, and special characters
  • PREDEFSTR: a predefined string without white space
  • INTEGER: a decimal number of unknown length
  • INTEGER-.: a decimal number with thousands grouped by ".", e.g. 1.589.674
  • INTEGER%: an integer followed by a percent sign

Parameters:line (str) – Line to parse; it should begin with NNNNNN-NNN
Returns:The actual invoice entry
Return type:EntryTuple
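
The pattern described above could be expressed as a regular expression along these lines. This is a minimal sketch, not the module's actual entry_pattern: the ME_VALUES list, the to_int helper, the sample line, and the mapping of groups to EntryTuple field names (taken from the field order) are all assumptions made for illustration.

```python
import re

# Hypothetical unit values; the real ME values are configurable.
ME_VALUES = ["db", "kg", "m"]

GROUPED = r"(\d{1,3}(?:\.\d{3})*)"  # INTEGER-. : thousands grouped by "."

# Assumed reconstruction of the nine groups, in EntryTuple field order.
ENTRY_PATTERN = re.compile(
    r"(\d{6}-\d{3})\s+"                                    # kod: NNNNNN-NNN
    r"(.+?)\s+"                                            # nev: STR+WSPACE
    r"(" + "|".join(map(re.escape, ME_VALUES)) + r")\s+"   # ME: PREDEFSTR
    r"(\d+)\s+"                                            # mennyiseg: INTEGER
    + GROUPED + r"\s+"                                     # BEgysegar: INTEGER-.
    r"(\d+)%\s+"                                           # Kedv: INTEGER%
    + GROUPED + r"\s+"                                     # NEgysegar: INTEGER-.
    + GROUPED + r"\s+"                                     # osszesen: INTEGER-.
    r"(\d+)%"                                              # AFA: INTEGER%
)


def to_int(grouped):
    """Convert a dot-grouped number such as '1.589.674' to an int."""
    return int(grouped.replace(".", ""))


# Made-up sample line, already joined from the two PDF lines.
line = "123456-789 Sample item 2x4 db 3 1.500 10% 1.350 4.050 27%"
match = ENTRY_PATTERN.match(line)
```

The lazy `(.+?)` group lets the name contain spaces and digits while still stopping at the first token that is a known ME value.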
parse_line(line)

Parse raw text that is supplied line by line. The PDF has the following structure, repeated n times (the parentheses indicate what should be collected):

<irrelevant text> (NNNNNN-NNN ...
...) <irrelevant text>

When the invoice code is found, the parser waits for one additional line before passing the entry to the line2entry converter.

Parameters:line (str) – The actual line to parse
Returns:True when an entry was found
Return type:bool
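
The two-line handling described above might look roughly like the following. This is a hedged sketch of the idea only: TwoLineEntryParser, CODE_START, and the joined attribute are hypothetical names, and the real Entry class may join the lines or report its result differently.

```python
import re

# Assumed shape of the code that opens an entry: NNNNNN-NNN.
CODE_START = re.compile(r"\d{6}-\d{3}")


class TwoLineEntryParser:
    """Hypothetical stand-in for Entry's parsing state, not the real class."""

    def __init__(self):
        self.entry_found = False  # True while waiting for the second line
        self.tmp_str = ""         # first half of the entry
        self.joined = None        # complete two-line entry text

    def parse_line(self, line):
        if self.entry_found:
            # Second line: join it with the stored first half.
            self.joined = self.tmp_str + " " + line.strip()
            self.entry_found = False
            self.tmp_str = ""
            return True
        if CODE_START.match(line):
            # First line: remember it and wait for the next one.
            self.tmp_str = line.strip()
            self.entry_found = True
        return False


parser = TwoLineEntryParser()
lines = ["page header", "123456-789 Sample item", "db 3 1.500", "page footer"]
results = [parser.parse_line(text) for text in lines]
```

In this sketch the method returns True only on the line that completes an entry, at which point the joined text would be handed to line2entry.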
xlsx_write(worksheet, row, col)

Write the entry information to a template xlsx file.

Parameters:
  • worksheet (Worksheet) – Worksheet class to write info
  • row (int) – Row number to start writing
  • col (int) – Column number to start writing
Returns:

the next position of cursor row,col

Return type:

tuple of (int,int)
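
The cursor convention (take a starting row and column, return the next position) allows write calls to be chained. The sketch below assumes that "next position" means the following row in the same column, which the documentation does not state. FakeWorksheet, Item, and write_entry are hypothetical stand-ins; the worksheet library actually used is not named here.

```python
from collections import namedtuple

# Shortened, hypothetical record; the real EntryTuple has nine fields.
Item = namedtuple("Item", ["kod", "nev", "osszesen"])


class FakeWorksheet:
    """Stand-in that records writes; not a real xlsx library class."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


def write_entry(worksheet, row, col, item):
    """Write one field per column, then return the next cursor position."""
    for offset, value in enumerate(item):
        worksheet.write(row, col + offset, value)
    return row + 1, col  # assumption: continue on the next row


ws = FakeWorksheet()
row, col = 0, 0
items = [Item("123456-789", "Sample item", 1800),
         Item("234567-890", "Other item", 500)]
for item in items:
    # Each call picks up where the previous one left off.
    row, col = write_entry(ws, row, col, item)
```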

class pdf2xlsx.invoice.EntryTuple(kod, nev, ME, mennyiseg, BEgysegar, Kedv, NEgysegar, osszesen, AFA)
AFA

Alias for field number 8

BEgysegar

Alias for field number 4

Kedv

Alias for field number 5

ME

Alias for field number 2

NEgysegar

Alias for field number 6

kod

Alias for field number 0

mennyiseg

Alias for field number 3

nev

Alias for field number 1

osszesen

Alias for field number 7
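
For orientation, a namedtuple with this signature can be declared as shown below. The field order follows the field numbers listed above; the sample values are made up for illustration and do not come from a real invoice.

```python
from collections import namedtuple

# Field order matches the documented aliases (kod is field 0, AFA is field 8).
EntryTuple = namedtuple(
    "EntryTuple",
    ["kod", "nev", "ME", "mennyiseg", "BEgysegar", "Kedv",
     "NEgysegar", "osszesen", "AFA"],
)

# Hypothetical sample values.
entry = EntryTuple("123456-789", "Sample item", "db", 2, 1000, 10, 900, 1800, 27)

# Fields are reachable both by name and by position.
same_code = entry.kod == entry[0]
```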

class pdf2xlsx.invoice.Invoice(no=0, orig_date='', pay_due='', total_sum=0, entries=None)

Parse, store, and write invoice information to xlsx, such as the invoice number, invoice date, payment date, and total sum. It also contains a list of Entry objects, which are also extracted from the raw string. Parsing of the raw string is controlled by three state variables: no_parsed, orig_date_parsed and pay_due_parsed. These represent the structure of the PDF.

Parameters:
  • no (int) – Invoice number, default:0
  • orig_date (str) – Invoice date stored as a string YYYY.MM.DD
  • pay_due (str) – Payment Date stored as string YYYY.MM.DD
  • total_sum (int) – Total price of invoice
  • entries (list) – List of Entry objects, one for each entry in the invoice

[TODO] implement state pattern for parsing ???
[TODO] implement _to_money as a mixin class

parse_line(line)

Parse raw text that is supplied line by line. The PDF has the following structure (the parentheses indicate what should be collected):

<irrelevant text>
Számla sorszáma: (NNNNNNNN) ...
<irrelevant text>
Számla kelte: (YYYY.MM.DD|DD.MM.YYYY) ...
<irrelevant text>
FIZETÉSI HATÁRIDŐ:(YYYY.MM.DD|DD.MM.YYYY) (NNN[.NNN.NNN])
<irrelevant text>

This structure is parsed using the three state variables, and the results are stored in the class attributes.

Parameters:line (str) – The actual line to parse
Returns:True when the parsing of the Invoice was started
Return type:bool
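
One way to picture the three state variables is a parser that looks for each label only after the previous one has been found. The following is a minimal sketch under that assumption. InvoiceHeaderParser and the three regular expressions are hypothetical, date normalization to YYYY.MM.DD is omitted, and the real Invoice.parse_line may differ.

```python
import re

DATE = r"(\d{4}\.\d{2}\.\d{2}|\d{2}\.\d{2}\.\d{4})"
NO_RE = re.compile(r"Számla sorszáma:\s*(\d+)")
DATE_RE = re.compile(r"Számla kelte:\s*" + DATE)
DUE_RE = re.compile(
    r"FIZETÉSI HATÁRIDŐ:\s*" + DATE + r"\s+(\d{1,3}(?:\.\d{3})*)"
)


class InvoiceHeaderParser:
    """Hypothetical stand-in for Invoice's header parsing."""

    def __init__(self):
        self.no, self.orig_date, self.pay_due, self.total_sum = 0, "", "", 0
        self.no_parsed = False
        self.orig_date_parsed = False
        self.pay_due_parsed = False

    def parse_line(self, line):
        if not self.no_parsed:
            found = NO_RE.search(line)
            if found:
                self.no = int(found.group(1))
                self.no_parsed = True
        elif not self.orig_date_parsed:
            found = DATE_RE.search(line)
            if found:
                self.orig_date = found.group(1)
                self.orig_date_parsed = True
        elif not self.pay_due_parsed:
            found = DUE_RE.search(line)
            if found:
                self.pay_due = found.group(1)
                # Strip the "." thousands separators before converting.
                self.total_sum = int(found.group(2).replace(".", ""))
                self.pay_due_parsed = True
        # Assumption: "started" means the invoice number has been seen.
        return self.no_parsed


parser = InvoiceHeaderParser()
sample = [
    "page header",
    "Számla sorszáma: 12345678 ...",
    "other text",
    "Számla kelte: 2017.03.01 ...",
    "FIZETÉSI HATÁRIDŐ:2017.03.31 1.589.674",
]
started = [parser.parse_line(text) for text in sample]
```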
xlsx_write(worksheet, row, col)

Write the invoice information to a template xlsx file.

Parameters:
  • worksheet (Worksheet) – Worksheet class to write info
  • row (int) – Row number to start writing
  • col (int) – Column number to start writing
Returns:

the next position of cursor row,col

Return type:

tuple of (int,int)

pdf2xlsx.invoice.get_invo_type(pdf_line)

TODO: add title parsing to decide between invoice types

pdf2xlsx.invoice.invo_parser(pdf_file, logger)

Factory to generate the appropriate invoice type based on the title in the PDF