textfile utility module
First, import the module:
from dataIO import textfile
Having encoding problems in Python 2? Do this:
text = "English, 中文, にほんご, ру́сский язы́к, ..."
textfile.write(text, "text.txt") # write
text = textfile.read("text.txt") # read
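The usual fix for Python 2 encoding trouble is to decode explicitly on read and encode explicitly on write. The sketch below is not dataIO's code. It assumes UTF-8 as the default and uses the standard library's `io.open`, which accepts an `encoding` argument on both Python 2 and Python 3. The names `write_text` and `read_text` are hypothetical.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_text(text, path, encoding="utf-8"):
    """Hypothetical helper: encode a unicode string and write it to path."""
    with io.open(path, "w", encoding=encoding) as f:
        f.write(text)


def read_text(path, encoding="utf-8"):
    """Hypothetical helper: read path and decode its bytes into a unicode string."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


# Usage: round-trip mixed-script text through a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), "text.txt")
demo_text = u"English, 中文, にほんご, ру́сский язы́к"
write_text(demo_text, demo_path)
roundtrip = read_text(demo_path)
```

Because encoding and decoding happen at the file boundary, the rest of the program only ever handles unicode strings.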
If your file uses a specific encoding, pass it explicitly:
text = textfile.read("text.txt", encoding="Your encoding")
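The `encoding` argument matters because the same bytes decode differently, or not at all, under different codecs. As a standard-library illustration (not dataIO code), the sketch below writes 中文 in GBK, an encoding chosen only for this example. It then reads the text back once with the matching codec and once as UTF-8.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Assumption for illustration: a file that some other tool saved in GBK.
gbk_path = os.path.join(tempfile.mkdtemp(), "gbk.txt")
with io.open(gbk_path, "w", encoding="gbk") as f:
    f.write(u"中文")

# Decoding with the encoding the file was actually written in works.
with io.open(gbk_path, "r", encoding="gbk") as f:
    decoded = f.read()

# Decoding the same bytes as UTF-8 fails: these GBK bytes are not valid UTF-8.
try:
    with io.open(gbk_path, "r", encoding="utf-8") as f:
        f.read()
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```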
If you don’t know the encoding, you can let dataIO detect it for you. This feature requires chardet; install it with pip install chardet.
text = textfile.smartread("text.txt")
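smartread relies on chardet, whose `chardet.detect(raw_bytes)` returns a guess such as `{'encoding': ..., 'confidence': ...}`. So that the sketch below runs without third-party packages, it substitutes a simpler technique. It tries a short candidate list of encodings in order and keeps the first one that decodes without error. The candidate list and the name `guess_read` are hypothetical, and this approach is much cruder than chardet's statistical detection.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical candidate list; order matters because latin-1 accepts any byte sequence.
CANDIDATES = ("utf-8", "gbk", "shift_jis", "latin-1")


def guess_read(path, candidates=CANDIDATES):
    """Return (text, encoding) using the first candidate that decodes without error."""
    with io.open(path, "rb") as f:
        raw = f.read()
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes; try the next one
    raise ValueError("none of the candidate encodings could decode %r" % path)


# Usage: a GBK file is rejected by UTF-8, then accepted by GBK.
sample = os.path.join(tempfile.mkdtemp(), "text.txt")
with io.open(sample, "w", encoding="gbk") as f:
    f.write(u"中文")
text, encoding = guess_read(sample)
```

A clean decode only shows that the bytes are valid in that codec, not that the guess is right. This is why a statistical detector such as chardet also reports a confidence value.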
Have lots of text files downloaded from the Internet with messed-up encodings? You can easily convert them to UTF-8 like this:
textfile.to_utf8("download.txt") # automatically generates a new file, so you never lose the original.
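Conceptually, the conversion decodes the old bytes and re-encodes them as UTF-8 into a separate file. The sketch below is not dataIO's to_utf8. The helper name `convert_to_utf8`, the explicit `source_encoding` parameter, and the `-utf8` output-naming rule are all assumptions for illustration.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def convert_to_utf8(src_path, source_encoding, dst_path=None):
    """Hypothetical helper: write a UTF-8 copy of src_path, leaving the original untouched."""
    if dst_path is None:
        # Assumed naming rule: download.txt -> download-utf8.txt
        root, ext = os.path.splitext(src_path)
        dst_path = root + "-utf8" + ext
    # newline="" disables newline translation, so \r\n line endings survive the copy.
    with io.open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return dst_path


# Usage: convert a Shift_JIS file and keep the original next to the new copy.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "download.txt")
with io.open(original, "w", encoding="shift_jis", newline="") as f:
    f.write(u"にほんご\r\n")
converted = convert_to_utf8(original, "shift_jis")
```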
If you need to skip the first M lines and then fetch the next N lines, without reading a very large file into memory, do this:
# skip first 2 lines, fetch next 1000 lines
for line in textfile.readlines("text.txt", skiplines=2, nlines=1000, strip="right"):
    print(line)  # do whatever you want with this line
For an explanation of the parameters, see the readlines() documentation.
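The memory savings come from iterating over the file object lazily instead of calling `read()`. The sketch below mimics the skip-then-fetch pattern with `itertools.islice`. The name `iter_lines` is hypothetical, and reading `strip="right"` as `str.rstrip()` (which also drops the trailing newline) is an assumption about what that parameter means.

```python
import io
import itertools
import os
import tempfile


def iter_lines(path, skiplines=0, nlines=None, strip=None, encoding="utf-8"):
    """Hypothetical helper: lazily yield nlines lines of path after skipping skiplines."""
    stop = None if nlines is None else skiplines + nlines
    with io.open(path, "r", encoding=encoding) as f:
        # islice advances the file iterator one line at a time,
        # so only the current line is held in memory.
        for line in itertools.islice(f, skiplines, stop):
            if strip == "right":
                line = line.rstrip()  # assumed meaning of strip="right"
            yield line


# Usage: a 10-line file, skip 2 lines, fetch the next 3.
sample = os.path.join(tempfile.mkdtemp(), "text.txt")
with io.open(sample, "w", encoding="utf-8") as f:
    f.write(u"".join(u"line %d\n" % i for i in range(10)))
picked = list(iter_lines(sample, skiplines=2, nlines=3, strip="right"))
```

Stopping after `nlines` lines means the rest of the file is never read at all.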
The readchunks() method does something similar. The only difference is that it reads multiple lines at a time and yields them together as a chunk. For example, suppose text.txt contains:
Group ID, Series ID
1, 1
1, 2
1, 3
2, 1
2, 2
2, 3
3, 1
3, 2
3, 3
Then you can do this:
for chunk in textfile.readchunks("text.txt", skiplines=1, chunksize=3):
    print(chunk)  # do whatever you want with this chunk
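In the call above, `skiplines=1` skips the header row. Assuming `chunksize` counts lines, each chunk of this particular file holds the three rows of one group. The sketch below shows the chunking idea over a lazy file iterator. The name `iter_chunks` is hypothetical, and representing a chunk as a list of right-stripped lines is an assumption, not necessarily what readchunks yields.

```python
import io
import itertools
import os
import tempfile


def iter_chunks(path, skiplines=0, chunksize=1, encoding="utf-8"):
    """Hypothetical helper: lazily yield lists of up to chunksize lines after skiplines."""
    with io.open(path, "r", encoding=encoding) as f:
        lines = (line.rstrip() for line in itertools.islice(f, skiplines, None))
        while True:
            chunk = list(itertools.islice(lines, chunksize))
            if not chunk:
                break
            yield chunk  # the last chunk may be shorter than chunksize


# Usage: rebuild the example file, skip the header, read 3 lines per chunk.
sample = os.path.join(tempfile.mkdtemp(), "text.txt")
rows = [u"Group ID, Series ID"] + [u"%d, %d" % (g, s) for g in (1, 2, 3) for s in (1, 2, 3)]
with io.open(sample, "w", encoding="utf-8") as f:
    f.write(u"\n".join(rows) + u"\n")
chunks = list(iter_chunks(sample, skiplines=1, chunksize=3))
```

Only one chunk is materialized at a time, so memory use depends on `chunksize` rather than on the size of the file.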