OK, this has been bugging me since I-don't-know-when, and since there are tons of people here at Ipernity who deal with images and computing all the time, I'll seek help here.
The task is really simple: filter out landscape / portrait format images from a pool of pictures.
I'd like this to happen on the command-line - if possible with out-of-the-box tools and not with special software.
If it were a cross-platform solution, that'd be great but primarily I need a Linux-based idea.
It would be nice if in the end I had a simple script that I point at a directory of images (possibly with subdirectories to search recursively). It "scans" these images for their height and width values, figures out whether each one is landscape (width > height), portrait (height > width) or square (width = height), and prints a list of the filenames that have the requested orientation.
This list could then be directed into a text-file or piped to another program etc.
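The classification rule itself is small enough to sketch in plain bash. The function name `orientation` is made up for this illustration, and the sketch assumes that width and height are already known as whole pixel counts:

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one image from its pixel dimensions.
# Usage: orientation WIDTH HEIGHT
orientation() {
    local width=$1 height=$2
    if (( width > height )); then
        echo landscape
    elif (( width < height )); then
        echo portrait
    else
        echo square
    fi
}
```

Called as `orientation 800 600`, it should print the word for that image; how the two numbers are obtained is a separate question.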
Possible tools could be the "identify" command from the great ImageMagick toolbox or maybe the "list" option of the equally great feh image viewer.
Both are capable of figuring out the dimensions of an image.
identify is probably the better choice, since it allows a range of operations on the parameters it reads from an image.
It can also read out EXIF information.
[Update]
OK, maybe another possibility would be the excellent Perl library exiftool which - as its name suggests - is a Swiss-Army-knife to read/manipulate metadata.
You can specify certain tags to be read out such as ImageSize which will present you directly with the desired values.
I just learned that the identify-command has a similar option.
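Both tools can print the dimensions as a single WIDTHxHEIGHT string, which then has to be split into two numbers. A minimal bash sketch follows; `split_size` is a name invented for this example, and the two commands in the comments are only how such a string might be obtained, so check them against the installed versions:

```shell
#!/usr/bin/env bash
# Possible ways to obtain a "WIDTHxHEIGHT" string (not run here):
#   identify -format '%wx%h\n' img_1654.jpg
#   exiftool -s3 -ImageSize img_1654.jpg
#
# Hypothetical helper: split "800x600" into "800 600".
split_size() {
    local size=$1
    local width=${size%%x*}    # everything before the first "x"
    local height=${size#*x}    # everything after the first "x"
    echo "$width $height"
}
```

The two values can then be read into variables with `read -r w h < <(split_size 800x600)`.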
[Update]
[First possible solution]
for a in *; do identify -format "%f:%[fx:w/h]\n" "$a"; done
This prints the filename (%f) followed by a colon, then the result of width divided by height (fx:w/h) and a newline, e.g.
img_1654.jpg:1.5
img_1655.jpg:0.666667
It's fast and the CPU load looks reasonable even though not necessarily light.
Next steps are probably filtering out the pictures of the desired orientation (>1, <1, =1) and printing out the filenames.
I'm not sure if what I am doing here is really clever or just script-kiddie hacking.
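The filtering step mentioned above could be done in a single awk pass over the `filename:ratio` lines. This is only a sketch: the function name `filter_by_ratio` and its argument words are invented here, and it assumes exactly one `filename:ratio` pair per line.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read "filename:ratio" lines on stdin and print the
# filenames whose ratio matches the requested orientation.
# Usage: ... | filter_by_ratio landscape|portrait|square
filter_by_ratio() {
    awk -v want="$1" '
    {
        # Split at the LAST colon so filenames containing ":" survive.
        pos = match($0, /:[^:]*$/)
        if (pos == 0) next
        name  = substr($0, 1, pos - 1)
        ratio = substr($0, pos + 1) + 0     # force a numeric comparison

        if      (ratio > 1) kind = "landscape"
        else if (ratio < 1) kind = "portrait"
        else                kind = "square"

        if (kind == want) print name
    }'
}
```

The output of the loop can be piped straight into it, for example `... | filter_by_ratio portrait > portrait.txt`.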
[First possible solution]
[Quick-and-dirty solution for the moment]
OK, this will do for the moment, but it's not much more than a combination of commands that still needs to be turned into a proper script that accepts input.
for a in *jpg; do identify -format "%[fx:w/h]%%%f\n" "$a"; done | sort | grep '^0' | sed 's/^[^%]*%//'
As explained above, this uses the identify command from the ImageMagick toolbox to extract the width and height of each image. The two values are divided and printed in the form result%filename. The output is sorted so that the portrait pictures (< 1) come first, followed by any squares and then the landscape ones. The grep keeps only the lines that begin with 0, i.e. a ratio below 1, which are the pictures in portrait layout; merely excluding lines that begin with 1 would not be enough, because a wide panorama with a ratio of 2 or more would slip through. Finally, sed strips everything up to the first % so that only the filenames remain.
Not really super-great but OK.
I would still be glad about input and advice. :)
[Quick-and-dirty solution for the moment]
I wonder what the best way to go about this task is, especially which approach is the fastest and least CPU-intensive.
Is it easier to read out the EXIF information and work with that because only the image headers need to be processed? feh - as far as I understand - loads the whole image to determine its dimensions and this takes time and calculation, esp. on large images.
I would be really thankful for ideas, pointers to other people who may have tackled this problem before, etc.
Thank you and - while I'm at it - Merry Christmas. :)
Paul Schubert says:
Awk automatically splits lines into words; in Perl that's either a command-line option or a single statement. Both languages can also split e.g. "800x600" into "800 600". The filename in the first output field of identify is preserved and can be printed however you need it.
Compare the image width with the image height, or the aspect ratio with 1. The result is either boolean (landscape / portrait), i.e. 0 or 1, or, if "square" is included, one of -1, 0, 1 or "less, equal, greater". That's easy to convert into the output strings or return values that you need.
You can call identify directly from the Perl script.
And I would strongly prefer not to use "for" in the shell, but rather pass "*" or "*.jpg" as a command-line parameter either to identify or to the Perl script that calls identify.
A further advantage of Perl is that you can quite easily search subdirectories. Remember that you can do everything with Perl that you can do with a shell script, plus much more, and some things better.
A disadvantage of Perl is the size of the program. You can decide for yourself in each case whether you prefer the small text tools, including sed and awk, or whether the single, very powerful Perl justifies the use of the quite big compiler. Keep in mind that in most cases you don't need anything from the gigantic Perl library.
I don't care much about CPU usage when I use Perl, but it should always be better than a pipeline, especially a long one. With the small amount of information that you need to process, you won't even have to care about memory usage. (I don't expect that you have less than 256 kB of memory or more than 100,000 image files.)
According to common sense, reading the file header requires less CPU and memory than reading EXIF. It's simply less information and, afaik, located in front of the EXIF block. But it's hard to tell whether the tool you use reads more information from the image files than you need.
In any case, calling identify (or any standalone program) for every image file is the worst possible solution in terms of CPU time, although calling a whole pipeline for every image file is surely even worse.
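The points above (one identify call for many files, a plain width/height comparison, and recursion into subdirectories) might be combined roughly as follows. This is a sketch under assumptions rather than a finished tool: `scan_dims` and `filter_orientation` are invented names, only *.jpg and *.jpeg files are considered, filenames are assumed not to contain newlines, and `-ping` is used in the hope that identify then reads no more of each file than it needs for the dimensions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "WIDTH HEIGHT PATH" for every JPEG below a directory.
# "-exec ... {} +" hands many files to each identify call instead of one per file.
scan_dims() {
    find "${1:-.}" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) \
        -exec identify -ping -format '%w %h %i\n' {} +
}

# Hypothetical filter: keep only the paths with the requested orientation.
# Usage: ... | filter_orientation landscape|portrait|square
filter_orientation() {
    awk -v want="$1" '
    {
        w = $1 + 0; h = $2 + 0
        path = $0
        sub(/^[0-9]+ [0-9]+ /, "", path)   # drop the two numbers, keep spaces in the path
        if      (w > h) kind = "landscape"
        else if (w < h) kind = "portrait"
        else            kind = "square"
        if (kind == want) print path
    }'
}

# Example (not run here):
#   scan_dims ~/pictures | filter_orientation portrait > portrait.txt
```

Comparing the two integers directly avoids the floating-point ratio, and the whole job needs only find, a handful of identify processes and one awk process.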
Tell me if you want more explicit help. I would prefer a Perl script but can handle bash, sed, awk and the text tools as well.
P.S. I have bash, Perl, Python and the GNU tools running under Windows as well as under Linux.
Roberto Ballerini - travelingpro says:
--
Seen in modalo home page (?)