We will see how to extract text from PDF files and all Microsoft Office files.
Extracting text from PDF:
$ pip install slate
$ pip install pdfminer
import slate

# Open the PDF in binary mode and let slate extract its text
with open('sample.pdf', 'rb') as f:
    pdf_text = slate.PDF(f)

print(pdf_text)

Output:
['Sample text...', '......', '......']
* The PDF class in slate takes a file-like object and extracts all the text from the PDF file. It returns the output as a list of strings (one for each page).
* NOTE: If the PDF file is password-protected, pass the password as the second parameter.
import slate

# The second argument is the password of the protected PDF
with open('test_doc.pdf', 'rb') as f:
    pdf_text = slate.PDF(f, "pass the PDF file password here")

print(pdf_text)

Output:
['Sample text...', '......', '......']
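Since the extracted text comes back as one string per page, a little post-processing is often useful. The following is only a minimal sketch: it assumes the result can be treated as an ordinary list of strings, and the helper names `pages_containing` and `join_pages`, as well as the `sample_pages` data, are made up for illustration and are not part of slate.

```python
def pages_containing(pages, keyword):
    """Return 1-based page numbers whose text contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [number for number, text in enumerate(pages, start=1)
            if needle in text.lower()]


def join_pages(pages, separator="\n\n"):
    """Merge per-page strings into one string, skipping blank pages."""
    return separator.join(text.strip() for text in pages if text.strip())


# Hypothetical stand-in for the per-page list described above.
sample_pages = ["Invoice 42\nTotal: 10", "   ", "Terms and conditions for the invoice"]

print(pages_containing(sample_pages, "invoice"))  # [1, 3]
print(join_pages(sample_pages))
```

With a real file, `pdf_text` from the examples above would take the place of `sample_pages`.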