Underscan Published on December 21, 2008
by Underscan

Underscan's blog

More information

This post is public
Attribution + non Commercial
  1. Read 383 times

[RFH] Identifying landscape / portrait format images via CLI

Sunday December 21, 2008 at 02:04PM

OK, this has been bugging me since I-don't-know-when, and since there are tons of people here at Ipernity who deal with images and computing all the time, I'll seek help here.

The task is really simple: filter out landscape / portrait format images from a pool of pictures.

I'd like this to happen on the command-line - if possible with out-of-the-box tools and not with special software.
If it were a cross-platform solution, that'd be great but primarily I need a Linux-based idea.

It would be nice if in the end I had a simple script that I can point at a directory of images (possibly with subdirectories to search recursively). It "scans" these images for their height and width values, figures out whether they are landscape (width > height), portrait (height > width) or square (width = height), and prints a list of the filenames that match the orientation I asked for.

This list could then be directed into a text-file or piped to another program etc.
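
To make the wish list above a bit more concrete, here is a minimal sketch of such a script. It assumes that ImageMagick's identify is installed and that no filename contains a newline; the function names filter_orientation and list_by_orientation are made up for this example. The %w, %h and %i escapes print the pixel width, the pixel height and the input filename.

```shell
#!/bin/sh
# Sketch only: assumes ImageMagick's `identify` is installed. The function
# names are invented for this example.

# filter_orientation WANTED
# Reads "width height path" lines on stdin and prints the paths whose
# orientation is WANTED (landscape, portrait or square).
filter_orientation() {
    awk -v wanted="$1" '{
        if ($1 > $2)      kind = "landscape"
        else if ($1 < $2) kind = "portrait"
        else              kind = "square"
        if (kind == wanted) {
            sub(/^[0-9]+ [0-9]+ /, "")   # strip the two numbers, keep the path
            print
        }
    }'
}

# list_by_orientation DIR WANTED
# Walks DIR recursively and hands the JPEGs to identify in batches, so that
# identify is started once per batch instead of once per file.
list_by_orientation() {
    find "$1" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) \
        -exec identify -format '%w %h %i\n' {} + 2>/dev/null |
        filter_orientation "$2"
}
```

Something like list_by_orientation ~/Pictures portrait > portraits.txt would then write the list to a text file, and the same output could be piped into another program instead.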

Possible tools could be the "identify" command from the great ImageMagick toolbox or maybe the "list" option of the equally great feh image viewer.

Both are capable of figuring out the dimensions of an image.

identify is probably the better choice since it allows a range of operations on the values it reads out of an image.
It can also read out EXIF information.

[Update]
OK, maybe another possibility would be the excellent Perl library exiftool which - as its name suggests - is a Swiss-Army-knife to read/manipulate metadata.
You can specify certain tags to be read out such as ImageSize which will present you directly with the desired values.
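
A hedged sketch of how the exiftool route might look: instead of printing ImageSize and comparing afterwards, it lets exiftool do the comparison itself with its -if option and prints only the matching paths with -p. The function names exif_condition and exif_list are invented for this example. Note that this compares the stored pixel dimensions and ignores any EXIF Orientation flag.

```shell
#!/bin/sh
# Sketch only: assumes exiftool is installed. Function names are invented.

# exif_condition WANTED
# Prints the exiftool -if expression for landscape, portrait or square.
exif_condition() {
    case $1 in
        landscape) printf '%s\n' '$ImageWidth > $ImageHeight' ;;
        portrait)  printf '%s\n' '$ImageWidth < $ImageHeight' ;;
        square)    printf '%s\n' '$ImageWidth == $ImageHeight' ;;
        *)         echo "unknown orientation: $1" >&2; return 1 ;;
    esac
}

# exif_list DIR WANTED
# -r recurses into subdirectories, -q suppresses status messages,
# -p prints one line per file that passed the -if condition.
exif_list() {
    cond=$(exif_condition "$2") || return 1
    exiftool -q -r -if "$cond" -p '$Directory/$FileName' "$1"
}
```

For example, exif_list ~/Pictures landscape would print the landscape pictures below ~/Pictures, one path per line.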

I just learned that the identify-command has a similar option.
[Update]

[First possible solution]
for a in *; do identify -format "%f:%[fx:w/h]\n" "$a"; done

This will print out the filename (%f) followed by a colon and then the result of width divided by height (fx:w/h), e.g.:
img_1654.jpg:1.5
img_1655.jpg:0.666667

It's fast and the CPU load looks reasonable even though not necessarily light.

Next steps are probably filtering out the pictures of the desired orientation (>1, <1, =1) and printing out the filenames.
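
One way that filtering step might look, as a rough sketch: a small awk filter that reads "filename:ratio" lines and keeps the names whose ratio is below, equal to or above 1. The function name by_ratio and its lt / eq / gt arguments are made up for this example. It splits on the last colon, so filenames that themselves contain a colon still work.

```shell
#!/bin/sh
# Sketch only: by_ratio is an invented helper name.

# by_ratio OP
# Reads "filename:ratio" lines on stdin and prints the filenames whose ratio
# is lt (portrait), eq (square) or gt (landscape) compared with 1.
by_ratio() {
    awk -v op="$1" '{
        n = match($0, /:[^:]*$/)          # position of the last colon
        if (n == 0) next                  # skip lines without a ratio
        name  = substr($0, 1, n - 1)
        ratio = substr($0, n + 1) + 0     # force a numeric value
        if ((op == "lt" && ratio < 1) ||
            (op == "eq" && ratio == 1) ||
            (op == "gt" && ratio > 1))
            print name
    }'
}

# Usage with the identify loop:
#   for a in *.jpg; do identify -format "%f:%[fx:w/h]\n" "$a"; done | by_ratio lt
```

Comparing the ratio as a number rather than as text also catches very wide pictures with a ratio of 2 or more.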

I'm not sure if what I am doing here is really clever or just script-kiddie hacking.
[First possible solution]

[Quick-and-dirty solution for the moment]
OK, this will do for the moment, but it's not much more than a combination of commands that still needs to be turned into a proper script that accepts input.

for a in *jpg; do identify -format "%[fx:w/h]%%%f\n" "$a"; done | sort -n | grep -v '^[1-9]' | sed 's/^[^%]*%//'

As explained above, this uses the identify command from the ImageMagick toolbox to extract the width and height of the images. The two values are divided and printed in the form result%filename. The output is then sorted numerically so that the portrait format pictures (< 1) come first, followed by possible squares and then the landscape format ones. An inverted grep drops every line that starts with a digit from 1 to 9, i.e. every ratio of 1 or more, so only the pictures in portrait layout remain. (Matching only a leading 1 would let very wide pictures with a ratio of 2 or more slip through.) Finally sed strips everything up to the first % so that only the filenames remain.

Not really super-great but OK.

I would still be glad about input and advice. :)
[Quick-and-dirty solution for the moment]

I wonder what the best way to go about this task is, especially which is the fastest and least CPU-intensive way.
Is it easier to read out the EXIF information and work with that because only the image headers need to be processed? feh - as far as I understand - loads the whole image to determine its dimensions and this takes time and calculation, esp. on large images.

I would be really thankful for ideas, pointers to other people who may have tackled this problem before, etc.

Thank you and - while I'm at it - Merry Christmas. :)

2 Comments

Paul Schubert says:
Pipe the output of identify into awk or Perl.
Awk automatically splits lines into words; in Perl that's either a command-line option or a single statement. Both languages can also split e.g. "800x600" into "800 600". The filename in the first output field of identify is preserved and can be printed however you need it.
Compare the image width with the image height, or the aspect ratio with 1. The result is either boolean (landscape / portrait), 0,1, or, with "square" added, -1,0,1 or "less, equal, greater". That's easy to convert into the output strings or return values that you need.
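
As a rough illustration of the awk variant described here (a sketch, not a finished tool): it assumes the default identify output, where the first field is the filename and the third is the geometry such as 800x600, so it breaks on filenames that contain spaces. The function name compare_dims is invented.

```shell
#!/bin/sh
# Sketch only: compare_dims is an invented helper name.

# compare_dims
# Reads default `identify` output lines on stdin and prints
# "<filename> <sign>", where sign is -1 (portrait), 0 (square)
# or 1 (landscape).
compare_dims() {
    awk '{
        split($3, dim, "x")      # "800x600" becomes dim[1]=800, dim[2]=600
        w = dim[1] + 0
        h = dim[2] + 0
        sign = (w > h) - (w < h)
        print $1, sign
    }'
}

# Usage (one identify process for all files, no shell loop):
#   identify *.jpg | compare_dims
```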

You can call identify directly from the Perl script.
And I would strongly prefer not to use "for" in the shell, but rather to pass "*" or "*.jpg" as a command-line parameter either to identify or to the Perl script that calls identify.
A further advantage of Perl is that you can quite easily search subdirectories. Remember that you can do everything with Perl that you can do with a shell script, plus much more, and some things better.
A disadvantage of Perl is the size of the program. You can decide for yourself in each case whether you prefer the small text tools, including sed and awk, or whether the single very powerful Perl justifies the use of the quite big compiler. Keep in mind that in most cases you don't need anything from the gigantic Perl library.

I don't care much about CPU usage when I use Perl, but it should always be better than a pipeline, especially a long one. With the small amount of information that you need to process you won't even have to care about memory usage. (I don't expect that you have less than 256 kB of memory or more than 100,000 image files.)

Common sense says that reading the file header requires less CPU and memory than reading EXIF. It's simply less information and, afaik, located in front of the EXIF block. But it's hard to tell whether the tool that you use reads more information from the image files than you need.
In any case, calling identify (or any standalone program) once for every image file is the worst possible solution in terms of CPU time, although calling a whole pipeline for every image file is surely even worse.

Tell me if you want more explicit help. I would prefer a Perl script but can handle bash, sed, awk and the text tools as well.
P.S. I have bash, perl, python, and the GNU tools running under Windows as well as under Linux.
Posted 10 months ago. ( permalink )
Roberto Ballerini - travelingpro says:
Reading EXIF data should be less resource-intensive, but not all images contain EXIF data. Anyway, if I remember correctly, height and width are stored in the header of all the most widely used graphics formats, as they are shown in almost every file browser. ImageMagick seems a good choice to me: low footprint and specialised tools. Renaming with a prefix and then sorting/grepping can be a way to divide the files that works on every OS.
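
A cautious sketch of the prefix idea for JPEG files, assuming ImageMagick's identify is available. The function names prefixed_path and tag_by_orientation are invented, and the script only prints the mv commands (a dry run) instead of renaming anything.

```shell
#!/bin/sh
# Sketch only: assumes ImageMagick's `identify`; function names are invented.

# prefixed_path KIND PATH
# Prints PATH with "KIND_" prepended to the file name, e.g.
# prefixed_path portrait pics/a.jpg prints pics/portrait_a.jpg
prefixed_path() {
    dir=$(dirname "$2")
    base=$(basename "$2")
    printf '%s/%s_%s\n' "$dir" "$1" "$base"
}

# tag_by_orientation FILE...
# Prints one "mv" command per readable image; nothing is renamed.
tag_by_orientation() {
    for f in "$@"; do
        dims=$(identify -format '%w %h' "$f" 2>/dev/null) || continue
        w=${dims%% *}    # text before the space: width
        h=${dims##* }    # text after the space: height
        if   [ "$w" -gt "$h" ]; then kind=landscape
        elif [ "$w" -lt "$h" ]; then kind=portrait
        else                         kind=square
        fi
        echo mv -- "$f" "$(prefixed_path "$kind" "$f")"
    done
}
```

After checking the printed commands, the output of tag_by_orientation *.jpg could be piped to sh to actually rename the files (as long as the names contain no spaces or other shell-special characters), and a plain ls or grep on the prefix then separates the groups.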

Posted 10 months ago. ( permalink )
