pandas - 1/1
read_csv and zip files
2016-02-06
pandas.read_csv is no longer able to extract a dataframe from a zip file. The accepted values for the parameter compression changed, and zip disappeared from the list. I assume the reason is that a zip file can contain many files.
pyquickhelper now implements a function read_csv which extracts every dataframe in a zip file and falls back to the regular pandas function if no zip format is detected. For a zip file, it returns a dictionary of dataframes indexed by their name in the archive.
from pyquickhelper.pandashelper import read_csv
dfs = read_csv("url_or_filename.zip", compression="zip")
print(dfs["dataframe.txt"].head())
If only one file should be converted into a dataframe, use the parameter fvalid:
from pyquickhelper.pandashelper import read_csv
dfs = read_csv("url_or_filename.zip", compression="zip",
               fvalid=lambda name: name == "the_file.txt")
print(dfs["the_file.txt"].head())
The other files are loaded as text. In more detail, when the input is a zip file, the function reads the dataframes roughly as follows:
import io
import zipfile
import pandas

def read_zip(local_file, encoding="utf8", **params):
    # Load the whole archive into memory.
    with open(local_file, "rb") as f:
        content = f.read()
    dfs = {}
    with zipfile.ZipFile(io.BytesIO(content)) as myzip:
        for info in myzip.infolist():
            name = info.filename
            with myzip.open(name, "r") as f:
                text = f.read().decode(encoding)
            # The member is already decompressed text, so no compression
            # argument is passed to pandas; params goes to pandas.read_csv.
            dfs[name] = pandas.read_csv(io.StringIO(text), **params)
    return dfs
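The filtering described above can be tried without pyquickhelper and without any file on disk. The following is only a sketch of the idea, not the library's actual code: the helper read_zip_members and its behaviour (members accepted by fvalid become dataframes, the others are kept as decoded text) are assumptions made for illustration, built on zipfile and pandas.read_csv alone.

```python
import io
import zipfile

import pandas


def read_zip_members(data, fvalid=None, encoding="utf8", **params):
    """Hypothetical helper (not pyquickhelper's code).

    data holds the bytes of a zip archive. Members accepted by fvalid are
    parsed with pandas.read_csv, the others are returned as plain text.
    """
    results = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                # Directory entries carry no content.
                continue
            text = archive.read(info.filename).decode(encoding)
            if fvalid is None or fvalid(info.filename):
                results[info.filename] = pandas.read_csv(io.StringIO(text), **params)
            else:
                results[info.filename] = text
    return results


# Build a small archive in memory: one table and one file that is not a table.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("the_file.txt", "a,b\n1,2\n3,4\n")
    archive.writestr("readme.txt", "not a table")

dfs = read_zip_members(buffer.getvalue(),
                       fvalid=lambda name: name == "the_file.txt")
print(dfs["the_file.txt"].head())
print(type(dfs["readme.txt"]).__name__)
```

Returning a dictionary keyed by member name keeps the result unambiguous when the archive holds several files, which is the situation a single-dataframe return value cannot describe.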