I have made a lot of programs in Python, but all of them were either from a tutorial or somewhere else, except one or two. So yesterday, my dad asked me to make some graphs in Excel for him, but the data was a picture scanned with CamScanner and made into a PDF, so it took me a lot of time to copy it all into Excel because I had to type it all out. This gave me the idea to make a program in Python which converts PDF to CSV, for people like me in need, as a fun project.

Step 3 - Conversion Formatting of TXT to CSV

I used modules like tesseract and pdf2image.

Kindly review the code in your free time and tell me if there's something I need to work on.

EDIT & UPDATE - As you guys suggested, I have added argparse and configparser; users can now convert a file by setting the path in config.ini and entering 'python pdf2csv.py -i filetoconvert.pdf'. Now improving the actual conversion and studying pathlib.

EDIT & UPDATE 2 - Implemented pathlib in the code, so no more playing with filenames. Also, I have cleaned up the program a little bit and now it contains only 1 Python file.

Like everyone says, move all the config values to command line arguments with suitable defaults. The argparse module doc is quite readable. See also configparser, which you could use to establish the defaults for seldom-changed parameters like the paths to tesseract and poppler. That done, you could have all the code in one module and, with the parameter values stored by argparse, you could stop passing many parameters around in function calls.

Pdf2csv, line 34: when does convert_from_path return multiple images, and why are you discarding all but the last? (i.e. writing all of them to the same file, doesn't that overwrite any previous with the last?) Anyway, some docstring here would help.

Line 52: I agree that's a horrible way to handle the filename.
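To make the review suggestions concrete, here is a minimal sketch of how argparse, configparser and pathlib might fit together. It is not the code under review: only the `-i` flag and the `config.ini` filename come from the post, while the `[paths]` section, the `--tesseract`, `--poppler` and `-o` options, and the helper names `load_defaults`, `parse_args` and `page_path` are assumptions made up for illustration.

```python
import argparse
import configparser
from pathlib import Path


def load_defaults(config_path="config.ini"):
    """Read seldom-changed tool paths from an INI file.

    The [paths] section and its keys are hypothetical; a missing file or
    section simply falls back to the built-in defaults.
    """
    config = configparser.ConfigParser()
    config.read(config_path)  # a missing file is silently ignored
    section = config["paths"] if config.has_section("paths") else {}
    return {
        "tesseract": section.get("tesseract", "tesseract"),
        "poppler": section.get("poppler", ""),
    }


def parse_args(argv=None, config_path="config.ini"):
    """Parse the command line, using config.ini values as defaults."""
    defaults = load_defaults(config_path)
    parser = argparse.ArgumentParser(description="Convert a scanned PDF to CSV.")
    parser.add_argument("-i", "--input", type=Path, required=True,
                        help="PDF file to convert")
    parser.add_argument("-o", "--output", type=Path, default=None,
                        help="CSV file to write (default: input name with .csv)")
    parser.add_argument("--tesseract", default=defaults["tesseract"],
                        help="path to the tesseract executable")
    parser.add_argument("--poppler", default=defaults["poppler"],
                        help="path to the poppler binaries")
    args = parser.parse_args(argv)
    if args.output is None:
        # pathlib swaps the extension without any string slicing
        args.output = args.input.with_suffix(".csv")
    return args


def page_path(base, page_number):
    """Return a per-page output name, e.g. scan.csv -> scan_page2.csv.

    pdf2image's convert_from_path returns one image per PDF page, so giving
    each page its own file avoids overwriting earlier pages with the last one.
    """
    return base.with_name(f"{base.stem}_page{page_number}{base.suffix}")
```

With this layout the rest of the program can take the single `args` object instead of a long list of separate parameters, and a multi-page PDF can be written page by page with `page_path(args.output, n)`.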