Getting Started

Once the translateME package is installed, you can begin translating PDF, Excel, and CSV files. The package is intended primarily for evaluating survey documentation: deciding whether the underlying microdata is worth extracting and, if so, which indicators to extract. A command-line tool is in development; for now, use the package from the Python_ shell or from an interactive session such as Jupyter_ Notebooks_. The first step is to import and instantiate the Translate class:

from translate import translate

x = translate.Translate(key="developerKey", filename="path/to/file")

Two things to note. First, a developer key for the Google Translate API is needed to begin translating documents; you can get one by signing up for a free trial account on Google Cloud. Second, a valid file path must be passed to the Translate class.
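Both requirements can be checked before the class is instantiated. The following is a minimal sketch, not part of translateME: the helper name load_settings and the environment variable TRANSLATE_DEVELOPER_KEY are hypothetical, chosen only for illustration. It reads the key from the environment so it does not have to be typed into a notebook, and it checks that a local path exists while letting URLs through unchecked.

```python
import os
from pathlib import Path


def load_settings(filename, env_var="TRANSLATE_DEVELOPER_KEY"):
    """Return (key, filename) after basic sanity checks.

    Both this helper and the env_var name are hypothetical; they are
    not provided by the translateME package.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    # Remote files cannot be checked on disk, so only test local paths.
    is_url = str(filename).startswith(("http://", "https://"))
    if not is_url and not Path(filename).is_file():
        raise FileNotFoundError(f"No such file: {filename}")
    return key, str(filename)


# Usage, assuming the package is installed and the variable is set:
# from translate import translate
# key, filename = load_settings("path/to/file")
# x = translate.Translate(key=key, filename=filename)
```

Failing early this way gives a clearer error than one raised partway through a translation request.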

PDF Files

A PDF file can be either local or hosted on the internet. To translate the document:

text = x.translatePDF()

This returns a string that can be further manipulated. To write the translated PDF out as an HTML or TXT file instead:

x.translatePDF(writeDoc=True, outpath="/file/to/output/", outname="output_name", type="html")

x.translatePDF(writeDoc=True, outpath="/file/to/output/", outname="output_name", type="txt")

If the PDF contains tables, exporting to HTML is highly recommended, since HTML generally preserves the original table layout.
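As one example of manipulating the string returned by translatePDF(), the sketch below counts how often a few topics of interest appear in the translated text, which can help judge whether a survey's documentation covers the indicators you care about. The helper count_keywords and the sample text are illustrative assumptions, not part of translateME; a hard-coded string stands in for the real return value.

```python
import re
from collections import Counter


def count_keywords(text, keywords):
    """Count case-insensitive whole-word occurrences of each keyword."""
    counts = Counter()
    for word in keywords:
        pattern = r"\b" + re.escape(word) + r"\b"
        counts[word] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Stand-in for the string returned by x.translatePDF().
text = (
    "The survey records household income and education. "
    "Income is reported monthly."
)
hits = count_keywords(text, ["income", "education", "health"])
```

A keyword with zero hits suggests the topic is not documented, though translation wording can vary, so synonyms are worth trying as well.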

Excel / CSV Files

The translateME package also uses the Pandas_ Excel and CSV readers to translate tables written in foreign languages. The translated tables can be exported as Excel or CSV files, or returned as pd.DataFrame objects for further manipulation:

df = x.translateXL()
x.translateXL(writeDoc=True, outpath="/file/to/output/", outname="output_name")
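Once the table is available as a pd.DataFrame, ordinary pandas operations apply. The sketch below looks for columns whose names mention a topic of interest; it assumes the column headers are translated along with the cell contents, which this guide does not confirm. The helper find_columns and the sample data are hypothetical, and a hand-built DataFrame stands in for the value returned by translateXL().

```python
import pandas as pd


def find_columns(df, keyword):
    """Return the column names that contain keyword, ignoring case."""
    return [col for col in df.columns if keyword.lower() in str(col).lower()]


# Stand-in for the DataFrame returned by x.translateXL().
df = pd.DataFrame(
    {
        "Household ID": [1, 2],
        "Monthly income": [300, 450],
        "Region": ["North", "South"],
    }
)
income_cols = find_columns(df, "income")
subset = df[income_cols]  # keep only the matching columns
```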