
Imagemagick memory usage/crash #98

Closed
caffeineflo opened this issue Mar 23, 2016 · 15 comments · Fixed by #100

@caffeineflo
Contributor

I recently started using this project on a Raspberry Pi 2 B attached to a feed scanner. Smaller documents seem to work just fine, while anything bigger than 20 MB breaks because ImageMagick crashes during conversion.

The "full" error looks to be:
Mar 23 18:57:30 secondary-pi2 python[208]: Consuming /home/paperless/documents/scans/scan.tiff
Mar 23 18:57:30 secondary-pi2 python[208]: Generating greyscale image from /home/paperless/documents/scans/scan.tiff
Mar 23 18:59:17 secondary-pi2 python[208]: convert: unable to extent pixel cache `No such file or directory' @ fatal/cache.c/CacheSignalHandler/3381.
Mar 23 18:59:31 secondary-pi2 python[208]: Generating the thumbnail
Mar 23 19:02:26 secondary-pi2 python[208]: convert: unable to extent pixel cache `No such file or directory' @ fatal/cache.c/CacheSignalHandler/3381.
Mar 23 19:02:29 secondary-pi2 python[208]: OCR FAILURE for /home/paperless/documents/scans/scan.tiff: No images found

A Google search for that error reveals that it's an out-of-memory condition that convert runs into before crashing. There seems to be a --limit-memory option that would most likely solve this.
Other systems will most likely run into the same problem once they reach their own file-size limit.

@danielquinn
Collaborator

Interesting. A couple of months ago I tried consuming a 157-page PDF and it didn't bat an eye, but maybe I missed something about the memory consumption. I'll give it another look this weekend and see what I can do about the memory usage.

If you can provide a copy of a PDF that causes these big memory footprints, that'd help with my debugging.

@danielquinn
Collaborator

Ok, so I dug into this some more, and here's what I found:

  • You can indeed limit memory usage with -limit memory nMiB, where n is something tiny like 20 in your case.
  • When you do this, however, ImageMagick dumps a bunch of temporary files into /tmp, which fills up pretty fast since many modern systems mount /tmp as tmpfs. The error you get in this case is something akin to convert: unable to extent pixel cache `No such file or directory' @ fatal/cache.c/CacheSignalHandler/3394.
  • The workaround for this is to point the MAGICK_TMPDIR environment variable somewhere with lots of scratch space (see the sketch after this list).
  • You can also set the memory limit with an environment variable: MAGICK_MEMORY_LIMIT.
  • Setting these variables when you invoke document_consumer seems to have the desired effect.
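
If you want to reproduce this by hand, the manual invocation looks roughly like the following (the paths, limit, and output format here are purely illustrative, not the exact command the consumer runs):

MAGICK_TMPDIR=/path/with/lots/of/space convert -limit memory 20MiB scan.tiff -type Grayscale scan.png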

So, given that only one of these problems can be fixed with a command-line argument, I propose that the solution be an entry in the documentation about the edge case of large documents on small machines. The fix would be to set these environment variables before executing the consumer, or simply to execute it like this:

MAGICK_MEMORY_LIMIT=20000000 MAGICK_TMPDIR=/home/daniel/tmp ./manage.py document_consumer

The alternative is to write a convoluted series of changes to settings.py, consumer.py, and paperless.conf.example to allow users to set this in the config file. I'm not thrilled with this idea, but am open to being persuaded. What do you think?

@caffeineflo
Contributor Author

Hi @danielquinn,
thanks for looking into this so fast (and thanks for the project in general, it's super awesome).
I meant to share the TIFF file that crashed the consumer, but unfortunately it contains legal documents that I'd rather not put on GitHub...

What I don't understand is that ImageMagick apparently already had something like -limit memory nMiB set, because the error message I received looks exactly like the one you describe after setting a memory limit for ImageMagick. So I wonder whether convert detects and sets a memory limit automatically, and whether we would then only need to set the MAGICK_TMPDIR env var on affected systems?

Anyway, about your suggested solution: I like the idea of putting those env vars into the documentation so people can set them as needed, although I do wonder whether it wouldn't be better to have MAGICK_TMPDIR and MAGICK_MEMORY_LIMIT in settings.py, to avoid interfering with other processes that use ImageMagick and would otherwise pick up the options set through those environment variables.

It's really up to you as the project owner which solution you prefer; thinking of less experienced users, having those options in settings.py wouldn't be a bad idea, though it's definitely more work.

Edit:
I just tried it again to confirm what's happening on the system. convert requests about 1-2 GB of RAM but doesn't use /tmp or anything like that before it simply dies.

@danielquinn
Collaborator

Ok, now I'm confused: did the above trick work for you?

I've been trying to discourage users from editing settings.py at all, and instead have it grab values from the environment (see the twelve-factor app), which in most cases come automatically from /etc/paperless.conf. So if you're editing settings.py, you may want to look at the .example file I've got in the repository root instead. It'll make upgrades easier.
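
For what it's worth, the pattern in settings.py is roughly the following sketch (the key names here are illustrative of the approach, not necessarily the final ones):

import os

# Values arrive via the environment, usually exported from /etc/paperless.conf.
CONVERT_MEMORY_LIMIT = os.environ.get("PAPERLESS_CONVERT_MEMORY_LIMIT", 0)
CONVERT_TMPDIR = os.environ.get("PAPERLESS_CONVERT_TMPDIR")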

The other thing I should clarify is that if you start document_consumer as I have shown above, the environment variables defined there only affect the process in question; that is, in this case, they would only apply to document_consumer. The only way to have environment variables affect other processes is to make them a permanent part of your system environment, typically by adding lines like export MY_VAR_NAME="whatever" to /etc/profile or something similar (it depends on the distro). So this method is totally safe.

But if you think all this is probably too confusing for new users, I'll add it to the list of things to add :-)

@danielquinn
Collaborator

Gah, don't worry about it; I'm already about halfway to rolling this in, so I'll just finish the job :-)

@danielquinn danielquinn mentioned this issue Mar 25, 2016
@danielquinn
Collaborator

Ok I've just coded what I think should solve this, but I've left it as a PR in case you'd like to offer some input before I merge it. I'm going to fiddle with a few other issues tonight as I'm on a train for a bit, so if I don't hear back from you by tomorrow, I'll just merge it and close this issue.

Also, take a look at the changelog. I gave you credit :-)

@caffeineflo
Contributor Author

Sorry it took me so long to get back to you and sorry for my complicated way of explaining my situation + findings to you!

The PR looks good and things work for me on my RPi, although, as expected, really slowly :) but time is not of the essence for me here.
I'm using paperless mainly as a way for my folks back in Germany to send me my mail, to avoid those horrible iPad pictures I got before. So I've set up an old HP OfficeJet 4500, a CGI script that triggers the feed scanner on this all-in-one printer, and paperless to consume the scans.
That helps me get my mail in the best way possible and keep up with all the paperwork (which is especially important in Germany, since once you do your tax return you have to find all that paper again).

Again, great project! And thanks for the fast response here!

@danielquinn
Collaborator

Stories like this are a big reason I keep developing on this. It's so nice to hear about people using something you've built and the different ways in which it's helped them.

@ahxxm

ahxxm commented Apr 10, 2016

Ah, found this issue... please update the Docker image as well.

@ahxxm

ahxxm commented Apr 10, 2016

I've built the Docker image from the latest master (as of now), with MAGICK_MEMORY_LIMIT=20000000 in docker-compose.env, but the memory used by convert still explodes a while after it starts generating the greyscale image... not sure why.
My environment: Manjaro Linux (Arch), Docker 1.10.3 build 20f81dd, docker-compose 1.6.2 (pip version).

@danielquinn
Collaborator

I'm reasonably sure that Docker shouldn't factor into this at all, but my Docker experience is limited, so perhaps I can loop in @pitkley here to correct my ignorance on that front.

My guess though is that you're not taking advantage of the optional configuration variables on your system, specifically the last two in the paperless.conf.example file:

PAPERLESS_CONVERT_MEMORY_LIMIT=0
PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless

If these values are set in /etc/paperless.conf to something that makes sense (explanations are in the .example file), that should solve the problem for you. If you're trying to tweak your Docker setup to set environment variables like MAGICK_MEMORY_LIMIT directly, you're going to have a bad time.
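
For example, a memory-constrained box might use something along these lines in /etc/paperless.conf (the numbers are purely illustrative; the .example file explains the details):

PAPERLESS_CONVERT_MEMORY_LIMIT=20000000
PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless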

@ahxxm

ahxxm commented Apr 10, 2016

From what I can see, docker-compose reads docker-compose.env for environment variables instead of paperless.conf; the .conf file isn't passed into the container, so os.path.exists("/etc/paperless.conf") will always be False inside the container.
The thing I got wrong is that MAGICK_MEMORY_LIMIT is the final variable name used in the code; I should set PAPERLESS_CONVERT_MEMORY_LIMIT in docker-compose.env and let it be passed through to settings.py, which makes CONVERT_MEMORY_LIMIT truthy, so that run_convert finally runs under that limit.
Now it runs happily.
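
In other words, for a Docker setup the relevant setting goes into docker-compose.env rather than /etc/paperless.conf, roughly like this (the value is illustrative):

PAPERLESS_CONVERT_MEMORY_LIMIT=20000000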

@pitkley
Member

pitkley commented Apr 10, 2016

Glad you got it working. Both those variables are indeed independent of the Docker image itself, and setting them in docker-compose.env is the best way to go about using them.

Setting MAGICK_MEMORY_LIMIT directly would theoretically work, since it gets set globally on container start. The reason it doesn't work in practice is that the environment gets cleared before running convert.

@danielquinn Maybe this should rather be environment = os.environ, so as not to clear out every available environment variable?
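
A minimal sketch of what that suggestion amounts to, assuming run_convert shells out via subprocess (the function signature here is illustrative, not the actual consumer.py code):

import os
import subprocess

def run_convert(*args, memory_limit=None, tmpdir=None):
    # Start from the parent environment (copied so it isn't mutated) instead of
    # an empty dict, so variables like MAGICK_MEMORY_LIMIT set at container
    # start survive the call.
    environment = os.environ.copy()
    if memory_limit:
        environment["MAGICK_MEMORY_LIMIT"] = str(memory_limit)
    if tmpdir:
        environment["MAGICK_TMPDIR"] = tmpdir
    subprocess.check_call(["convert"] + list(args), env=environment)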

danielquinn added a commit that referenced this issue Apr 10, 2016
@danielquinn
Collaborator

@pitkley good call. I've updated consumer.py as per your suggestion.

@robsdedude

robsdedude commented Jun 2, 2019

I had the same problem, but adding PAPERLESS_CONVERT_MEMORY_LIMIT=128MB to docker-compose.env alone didn't solve it. I also had to add MAGICK_MAP_LIMIT=512MB. Maybe it would be a smart idea to make paperless set MAGICK_MAP_LIMIT (and not only MAGICK_MEMORY_LIMIT) when PAPERLESS_CONVERT_MEMORY_LIMIT is set.

EDIT: It turned out I also had to set MAGICK_AREA_LIMIT=5MP to make it run on my Raspberry Pi 3B+.
Note: these values are more or less randomly chosen and can definitely be optimized.
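
Putting that together, the combination that worked on the Pi 3B+ was the following set of lines in docker-compose.env (values as reported above, admittedly arbitrary):

PAPERLESS_CONVERT_MEMORY_LIMIT=128MB
MAGICK_MAP_LIMIT=512MB
MAGICK_AREA_LIMIT=5MP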
