
doccounter's People

Contributors

joeblurton


doccounter's Issues

Other languages support

It shows very strange word/line/page counts for Russian-language text. I think other non-Latin scripts are affected too.

Thanks!

It shows the following error when used with a docx file

Warning: ZipArchive::close(): Invalid or uninitialized Zip object in /home2/correcta/public_html/filetotext2/class.doccounter.php on line 240

It also shows the correct word count, which is what I want, but the warning above appears as well.
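A warning of this kind can appear when close() is called on a ZipArchive that was never opened successfully or was already closed. Whether that is what happens on line 240 of class.doccounter.php is an assumption, since that code is not shown here. A minimal sketch of the guard, using a hypothetical helper name read_zip_entry that is not part of doccounter, and requiring the PHP zip extension:

```php
<?php
// Hypothetical helper, not part of doccounter: reads one entry from a
// zip-based file (such as .docx) and only calls close() after a successful open().
function read_zip_entry(string $path, string $entry): ?string
{
    $zip = new ZipArchive();

    // open() returns true on success and an integer error code otherwise,
    // so the comparison has to be strict.
    if ($zip->open($path) !== true) {
        return null; // nothing was opened, so there is nothing to close
    }

    // getFromName() returns the entry contents, or false if the entry is missing.
    $contents = $zip->getFromName($entry);
    $zip->close();

    return $contents === false ? null : $contents;
}
```

For a .docx file the main text usually lives in the entry word/document.xml, so a call could look like read_zip_entry($file, 'word/document.xml'), with a null result treated as "could not read".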

Permission denied error when unlink is run

I'm using this repo to play around with in a Laravel project and have required the repo via composer.

When running the following code:

require base_path('vendor/joeblurton/doccounter/class.doccounter.php');

$docCounter = new \DocCounter();

$docFile = storage_path('app/files/doc-file.doc');

$docCounter->setFile($docFile);

print_r($docCounter->getInfo());

I get the following error:

ErrorException in class.doccounter.php line 297:
unlink(tmp.pdf): Permission denied

See line of code referenced in error here: https://github.com/joeblurton/doccounter/blob/master/class.doccounter.php#L297

I am using Windows 10 and Laragon (Apache, PHP 7) as my environment.

Do I need to modify my permissions on the folder the file is getting executed from as mentioned here: http://stackoverflow.com/a/13595087/648247 (unsure how to do this in Win 10) or is the file not being closed properly by fpdf/fpdi prior to the unlink() call as mentioned here: http://stackoverflow.com/a/23259767/648247?
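One way to sidestep folder permissions is to stop writing tmp.pdf relative to the working directory and use a uniquely named file in the system temp directory instead. The sketch below assumes that approach; with_temp_file is a hypothetical helper, not something doccounter provides, and it does not address the second possibility raised above (on Windows, unlink() can still fail while another handle to the file is open).

```php
<?php
// Hypothetical helper, not part of doccounter: runs $work with the path of a
// unique temporary file and removes the file afterwards, even on exceptions.
function with_temp_file(callable $work)
{
    // tempnam() creates an empty, uniquely named file and returns its path,
    // or false on failure.
    $tmp = tempnam(sys_get_temp_dir(), 'dc_');
    if ($tmp === false) {
        throw new RuntimeException('Could not create a temporary file');
    }

    try {
        return $work($tmp);
    } finally {
        // Only unlink if the file is still there; $work may have removed it.
        if (is_file($tmp)) {
            unlink($tmp);
        }
    }
}
```

Any file handles opened inside the callback should be closed before it returns, so that the cleanup in the finally block can succeed on Windows as well.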

My document has 7 words, but it displays one word

I created a plugin that counts the words in an uploaded document, and I use your package for the word count. After a file is uploaded, your package reports only one word, but my uploaded document has seven words. How can I fix this issue?

PDF and Doc file with same contents produce different word count

To test this package, I created a doc file and a pdf file with the same text contents, and I got the following differing word counts:

  • .doc 518
  • .pdf 513

I believe this happens because the regex used to split the text for the .doc file does not account for the string <br />. There are 5 paragraphs in the text, so there are 5 <br /> strings, which results in five more words. The regex in question is used in the following method:

function str_word_count_utf8($str) {
    return count(preg_split('~[^\p{L}\p{N}\']+~u',$str));
}

Here is the text content inside the documents:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec ante magna, rutrum quis convallis sed, porttitor a est. In ac odio ante. Suspendisse potenti. Sed cursus et magna vitae mollis. Cras vel tortor urna. Pellentesque diam purus, placerat quis lacus eu, accumsan facilisis lectus. Duis sodales, leo at sollicitudin tincidunt, elit neque consequat metus, vitae lacinia diam felis quis augue. Nunc quis mi ipsum. Nulla et semper elit. Aliquam et sem consectetur, aliquet lectus nec, fringilla mi. Nullam eu augue sit amet est vestibulum egestas aliquam vitae quam. Aliquam consectetur nunc vitae dignissim efficitur. Interdum et malesuada fames ac ante ipsum primis in faucibus. Cras sagittis tortor ante, ac semper urna finibus a. 
Nulla facilisi. Vivamus condimentum libero eget lacus suscipit cursus. Proin congue neque nunc, a volutpat orci viverra accumsan. Vestibulum vulputate pretium ipsum. Aenean tortor nisi, interdum a arcu porttitor, tempor mollis quam. Integer in eros pretium, pretium lacus at, dictum leo. In ut elit vel dolor efficitur luctus. Quisque pellentesque, est non viverra fringilla, tellus mi gravida est, scelerisque fermentum turpis quam a tortor. Mauris eu placerat dolor. Ut gravida est aliquet diam luctus, ac ornare justo pretium. Cras a aliquam felis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Sed molestie arcu vel diam congue aliquet. Maecenas ullamcorper velit id orci ullamcorper ultricies. Nulla justo tellus, tristique quis ex nec, iaculis tristique metus. 
Sed maximus finibus dui sit amet tempor. Nulla ut justo volutpat, iaculis lacus in, volutpat arcu. Suspendisse dapibus arcu ac luctus lobortis. Vivamus ante ante, cursus ut nisl ut, maximus maximus tellus. Maecenas eget orci nec enim maximus vulputate. Nunc egestas, est ac tristique dignissim, nisi ligula feugiat enim, a feugiat massa risus at risus. Praesent facilisis turpis id odio tincidunt, sed tincidunt mi lobortis. Nullam gravida mollis magna ut congue. 
Fusce ultrices mauris quis lobortis rutrum. Nunc risus sapien, egestas ut orci a, lobortis viverra est. In elementum elementum leo, quis aliquam purus vulputate ac. Aliquam erat volutpat. Duis ex neque, convallis sit amet sodales et, consectetur et tortor. Donec commodo ligula vitae nulla faucibus, eget bibendum massa blandit. Pellentesque porta placerat nibh, sit amet volutpat felis consequat vitae. Interdum et malesuada fames ac ante ipsum primis in faucibus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Curabitur non dignissim purus. Etiam eget sapien mattis, semper tortor vitae, sodales augue. Cras posuere ornare posuere. Nulla facilisi. Cras aliquet mi et tortor fringilla, eu eleifend felis cursus. 
Donec in leo facilisis, maximus tortor non, placerat neque. Donec vulputate erat purus, vitae auctor nisi iaculis vitae. Curabitur nisl lectus, dignissim eu convallis id, volutpat ac lectus. Proin bibendum dictum ex ut porta. Morbi leo tortor, fringilla et lectus nec, facilisis lobortis ex. Donec aliquam nec mauris id tincidunt. Cras at nunc odio. Donec eu nisi sit amet massa rutrum vehicula eget quis augue. Maecenas porta ultricies scelerisque. Duis fringilla tempor dui. Praesent nunc est, hendrerit eget viverra in, varius sed ante. Quisque hendrerit felis et purus euismod consequat.

When running the following code for the .doc file

$docFile = storage_path('app/files/doc-file.doc');
$docCounter->setFile($docFile);
$docWordCount = $docCounter->read_doc_file();

the value of $docWordCount is as follows:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec ante magna, rutrum quis convallis sed, porttitor a est. In ac odio ante. Suspendisse potenti. Sed cursus et magna vitae mollis. Cras vel tortor urna. Pellentesque diam purus, placerat quis lacus eu, accumsan facilisis lectus. Duis sodales, leo at sollicitudin tincidunt, elit neque consequat metus, vitae lacinia diam felis quis augue. Nunc quis mi ipsum. Nulla et semper elit. Aliquam et sem consectetur, aliquet lectus nec, fringilla mi. Nullam eu augue sit amet est vestibulum egestas aliquam vitae quam. Aliquam consectetur nunc vitae dignissim efficitur. Interdum et malesuada fames ac ante ipsum primis in faucibus. Cras sagittis tortor ante, ac semper urna finibus a.<br />\rNulla facilisi. Vivamus condimentum libero eget lacus suscipit cursus. Proin congue neque nunc, a volutpat orci viverra accumsan. Vestibulum vulputate pretium ipsum. Aenean tortor nisi, interdum a arcu porttitor, tempor mollis quam. Integer in eros pretium, pretium lacus at, dictum leo. In ut elit vel dolor efficitur luctus. Quisque pellentesque, est non viverra fringilla, tellus mi gravida est, scelerisque fermentum turpis quam a tortor. Mauris eu placerat dolor. Ut gravida est aliquet diam luctus, ac ornare justo pretium. Cras a aliquam felis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Sed molestie arcu vel diam congue aliquet. Maecenas ullamcorper velit id orci ullamcorper ultricies. Nulla justo tellus, tristique quis ex nec, iaculis tristique metus.<br />\rSed maximus finibus dui sit amet tempor. Nulla ut justo volutpat, iaculis lacus in, volutpat arcu. Suspendisse dapibus arcu ac luctus lobortis. Vivamus ante ante, cursus ut nisl ut, maximus maximus tellus. Maecenas eget orci nec enim maximus vulputate. Nunc egestas, est ac tristique dignissim, nisi ligula feugiat enim, a feugiat massa risus at risus. Praesent facilisis turpis id odio tincidunt, sed tincidunt mi lobortis. 
Nullam gravida mollis magna ut congue.<br />\rFusce ultrices mauris quis lobortis rutrum. Nunc risus sapien, egestas ut orci a, lobortis viverra est. In elementum elementum leo, quis aliquam purus vulputate ac. Aliquam erat volutpat. Duis ex neque, convallis sit amet sodales et, consectetur et tortor. Donec commodo ligula vitae nulla faucibus, eget bibendum massa blandit. Pellentesque porta placerat nibh, sit amet volutpat felis consequat vitae. Interdum et malesuada fames ac ante ipsum primis in faucibus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Curabitur non dignissim purus. Etiam eget sapien mattis, semper tortor vitae, sodales augue. Cras posuere ornare posuere. Nulla facilisi. Cras aliquet mi et tortor fringilla, eu eleifend felis cursus.<br />\rDonec in leo facilisis, maximus tortor non, placerat neque. Donec vulputate erat purus, vitae auctor nisi iaculis vitae. Curabitur nisl lectus, dignissim eu convallis id, volutpat ac lectus. Proin bibendum dictum ex ut porta. Morbi leo tortor, fringilla et lectus nec, facilisis lobortis ex. Donec aliquam nec mauris id tincidunt. Cras at nunc odio. Donec eu nisi sit amet massa rutrum vehicula eget quis augue. Maecenas porta ultricies scelerisque. Duis fringilla tempor dui. Praesent nunc est, hendrerit eget viverra in, varius sed ante. Quisque hendrerit felis et purus euismod consequat.<br />\r

Please note the <br />. You might want to copy and paste it into a code editor to inspect it properly.
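The effect of the <br /> markers on the count can be reproduced with a short sample. The sketch below reuses str_word_count_utf8 as quoted above and compares it with a hypothetical variant, count_words_ignoring_markup, which replaces tags with spaces and drops empty pieces before counting. The variant is an illustration of one possible fix, not the package's code.

```php
<?php
// As quoted in this issue: every run of letters, digits, or apostrophes
// becomes one piece, so the "br" inside "<br />" is counted as a word.
function str_word_count_utf8($str) {
    return count(preg_split('~[^\p{L}\p{N}\']+~u', $str));
}

// Hypothetical variant: replace markup with spaces first, then split and
// discard empty pieces (such as the one after a trailing separator).
function count_words_ignoring_markup(string $str): int {
    $plain = preg_replace('~<[^>]+>~', ' ', $str);
    return count(preg_split('~[^\p{L}\p{N}\']+~u', $plain, -1, PREG_SPLIT_NO_EMPTY));
}

$sample = "Lorem ipsum.<br />\rDolor sit amet.<br />\r";

echo str_word_count_utf8($sample), "\n";         // 8: five words, two "br", one empty piece
echo count_words_ignoring_markup($sample), "\n"; // 5
```

Note that the original function also counts an empty piece when the text starts or ends with a separator, which is a second, smaller source of differences between inputs.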

When running the following code for the .pdf file

$pdfFile = storage_path('app/files/pdf-file.pdf');
$docCounter->setFile($pdfFile);
$pdfWordCount = $docCounter->pdf2text();

the value of $pdfWordCount is as follows:

""
    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec ante magna, rutrum quis convallis \n
    sed, porttitor a est. In ac odio ante. Suspendisse potenti. Sed cursus et magna vitae mollis. Cras \n
    vel  tortor  urna.  Pellentesque  diam  purus,  placerat  quis  lacus  eu,  accumsan  facilisis  lectus.  Duis \n
    sodales,  leo  at  sollicitudin  tincidunt,  elit  neque  consequat  metus,  vitae  lacinia  diam  felis  quis \n
    augue. Nunc quis mi ipsum. Nulla et semper elit. Aliquam et sem consectetur, aliquet lectus nec, \n
    fringilla  mi.  Nullam  eu augue  sit  amet  est  vestibulum  egestas  aliquam  vitae  quam.  Aliquam \n
    consectetur nunc  vitae  dignissim  efficitur.  Interdum  et  malesuada fames  ac ante  ipsum  primis in \n
    faucibus. Cras sagittis tortor ante, ac semper urna finibus a. \n
    Nulla facilisi. Vivamus condimentum libero eget lacus suscipit cursus. Proin congue neque nunc, \n
    a  volutpat  orci  viverra  accumsan.  Vestibulum  vulputate  pretium  ipsum.  Aenean  tortor  nisi, \n
    interdum  a  arcu  porttitor,  tempor  mollis  quam.  Integer  in  eros  pretium,  pretium  lacus  at,  dictum \n
    leo.  In  ut  elit  vel  dolor  efficitur  luctus.  Quisque  pellentesque,  est  non  viverra  fringilla,  tellus  mi \n
    gravida est, scelerisque fermentum turpis quam a tortor. Mauris eu placerat dolor. Ut gravida est \n
    aliquet diam luctus, ac ornare justo pretium. Cras a aliquam felis. Class aptent taciti sociosqu ad \n
    litora  torquent  per  conubia  nostra,  per  inceptos  himenaeos.  Sed  molestie  arcu  vel  diam  congue \n
    aliquet.  Maecenas  ullamcorper  velit  id  orci  ullamcorper  ultricies.  Nulla  justo  tellus,  tristique  quis \n
    ex nec, iaculis tristique metus. \n
    Sed  maximus  finibus  dui  sit  amet  tempor.  Nulla  ut  justo  volutpat,  iaculis  lacus  in,  volutpat  arcu. \n
    Suspendisse  dapibus  arcu  ac  luctus  lobortis.  Vivamus  ante  ante,  cursus  ut  nisl  ut,  maximus \n
    maximus tellus. Maecenas eget orci nec enim maximus vulputate. Nunc egestas, est ac tristique \n
    dignissim,  nisi  ligula  feugiat  enim,  a  feugiat  massa  risus  at  risus.  Praesent  facilisis  turpis  id  odio \n
    tincidunt, sed tincidunt mi lobortis. Nullam gravida mollis magna ut congue. \n
    Fusce  ultrices  mauris  quis  lobortis  rutrum.  Nunc  risus  sapien,  egestas  ut  orci  a,  lobortis  viverra \n
    est.  In  elementum  elementum  leo,  quis  aliquam  purus  vulputate  ac.  Aliquam  erat  volutpat.  Duis \n
    ex  neque,  convallis  sit  amet  sodales  et,  consectetur  et  tortor.  Donec  commodo  ligula  vitae  nulla \n
    faucibus,  eget  bibendum  massa  blandit.  Pellentesque  porta  placerat  nibh,  sit  amet  volutpat  felis \n
    consequat  vitae.  Interdum  et  malesuada  fames  ac  ante  ipsum  primis  in  faucibus.  Lorem  ipsum \n
    dolor  sit  amet,  consectetur  adipiscing  elit.  Class  aptent  taciti  sociosqu  ad  litora  torquent  per \n
    conubia  nostra,  per  inceptos  himenaeos.  Curabitur  non  dignissim  purus.  Etiam  eget  sapien \n
    mattis,  semper  tortor  vitae,  sodales  augue.  Cras  posuere  ornare  posuere.  Nulla  facilisi.  Cras \n
    aliquet mi et tortor fringilla, eu eleifend felis cursus. \n
    Donec  in  leo  facilisis,  maximus  tortor  non,  placerat  neque.  Donec  vulputate  erat  purus,  vitae \n
    auctor  nisi  iaculis  vitae.  Curabitur  nisl  lectus,  dignissim  eu  convallis  id,  volutpat  ac  lectus.  Proin \n
    bibendum  dictum  ex  ut  porta.  Morbi  leo  tortor,  fringilla  et  lectus  nec,  facilisis  lobortis  ex.  Donec \n
    aliquam nec mauris id tincidunt. Cras at nunc odio. Donec eu nisi sit amet massa rutrum vehicula \n
    eget  quis  augue.  Maecenas  porta  ultricies  scelerisque.  Duis  fringilla  tempor  dui.  Praesent  nunc \n
    est,  hendrerit  eget  viverra  in,  varius  sed  ante.  Quisque  hendrerit  felis  et  purus  euismod \n
    consequat.
    ""

Making the package compatible with composer/packagist

Firstly, thanks for putting this repo together.

Secondly, would you accept pull requests that would reshape this repo to be compatible with composer/packagist? Alternatively, I will create a fork so there would be a composer/packagist compatible version.
