Photo import workflow

Posted on Sunday, 1 July 2012

Introduction

Since I'm writing about workflows today, I thought I'd also quickly chuck in a guide to how I get the photos and movies I've taken with my iPhone onto my laptop and, specifically, imported into Aperture.

The Mechanics

This requires a few moving parts to produce a final workflow. The high-level process is:


  1. Plug iPhone into a USB port
  2. Copy photos from the iPhone into a temporary directory, deleting them as they are successfully retrieved
  3. Import the photos into Aperture, ensuring they are copied into its library and deleted from the temporary directory

Simple, right? Well, yes and no.

Retrieval from iPhone

This really ought to be easier than it is, but at least it is possible.

Aperture can import photos from devices, but it doesn't seem to offer the ability to delete them from the device after import. That alone makes it not worth bothering with unless you want to build up a ton of old photos on your phone.

OS X does ship with a tool that can import photos from camera devices and delete them afterwards: AutoImporter.app. You won't find it without looking hard, though; it lives at:

/System/Library/Image Capture/Support/Application/AutoImporter.app

If you run that tool, you will see no window, just a dock icon and some menus. Go into its Preferences and you will be able to choose a directory to import to, and whether or not the files should be deleted after import.


Easy!

Importing into Aperture


This involves using Automator to build a Folder Action workflow on the directory that AutoImporter is pulling the photos into. All it does is check whether AutoImporter is still running (waiting if so), then launch Aperture and tell it to import everything from that directory into a particular Project, deleting the source files afterwards.
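To give a feel for the moving parts, here is a hedged sketch of the "wait for AutoImporter" step as a shell script (something a Run Shell Script action could host; the process-name check and five-second poll are my assumptions, and the actual import-and-delete is done by the workflow's Aperture import step, not this script):

#!/bin/bash
# Poll until AutoImporter.app has exited, so we don't import a
# half-written file.
while ps -axc -o comm | grep -qx AutoImporter; do
  sleep 5
done
# Bring Aperture to the front; the import into the target Project
# (and deletion of the source files) is handled by the workflow.
open -a Aperture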


That's it!


Really, that's all there is. Now, whenever you plug in your iPhone, all the pictures and movies you've taken recently will get imported into Aperture for you to process, archive, touch up, export, or whatever else it is that you do with your photos and movies.


Paperless workflow

Introduction


This is going to be quite a long post, but hopefully interesting to a particular crowd of people.
I'm going to tell you all about how I have designed and built a paperless workflow for myself.

Background


This came about some months ago when I needed to find several important documents that were spread through the various organised files in which I keep things. The search took much longer than I would have liked, partly because I am not very efficient at putting paper into the files.
You could suggest that I just get better at doing that, but even if I did, it would still only make me quicker at finding paperwork in the files on my shelf. If I want to really kick things up a gear, the files need to be electronic, accessible from anywhere and powerfully searchable.

The hardware


I started thinking about what I would want. Obviously a scanner was going to be the first prerequisite for digitising my papers, but what kind to get? After investigating what other people had already said about paperless workflows, it seemed like the ScanSnap range of scanners was a popular choice, but they are quite expensive and it's one more thing on my desk. Instead I decided to go for a multi-function inkjet printer: their scanners are good enough, and even though they're bigger than a ScanSnap, I'm also getting a printer into the bargain.
So which one to get? Well, that depended on which features were important. My highest priority in this project was that the process of taking a document from paper to my laptop had to be as simple as possible, which, in the realm of scanning devices, means you need one that can automatically scan both sides of the paper.
This turns out to be quite rare in multi-function printers, but after a great deal of research I found the Epson Stylus Office BX635FWD, which has a duplex ADF (Automatic Document Feeder), is very well supported in Mac OS X, and is a decent printer (which, for bonus points, supports Apple's AirPrint and Google's Cloud Print standards).

The setup of the Epson was extremely pleasing - it has a little LCD screen and various buttons, which meant that I could power it up and join it to my WiFi network without having to connect it to a computer via USB at all. I then added it as a printer on my laptop (which was easy since the printer was already announcing itself on the WiFi network) and OS X was happy to do both printing and scanning over WiFi.

I then investigated the Epson software for it and found that I didn't have to install a giant heap of drivers and applications; I could pick and choose which pieces I wanted. Specifically, I was interested in whether I could react to the Scan button being pressed on the printer, even though it was not connected via USB. It turns out that this is indeed possible, via a little application called EEventManager. With that set up to process the scans to my liking (specifically: colour, 300DPI, assembled into a PDF and saved into a particular temporary directory), the hardware stage of the project was over.

With the ability to turn paper into a PDF with a couple of button presses on the printer itself, I was ready to figure out what to do with it next.

The software


As people with a focus on paperless workflows (such as David Sparks) have rightly pointed out, there are several stages to a paperless workflow - capture, processing and recall. At this point I had the capture stage sorted, so the next was processing.

When you have a PDF with scanned images inside it, you obviously can't do anything with the text on the pages: it isn't computer-readable text, it's a picture. It turns out, though, that it is possible to tell the PDF what the words are and where they sit on the page, which makes the text selectable. So my attention turned to OCR (Optical Character Recognition) software. I didn't engage in a particularly detailed survey, because I came across a great deal on Nuance's PDF Converter For Mac product and was so impressed with its trial copy that I snapped up the deal and forged ahead. I hear good things about PDFPen, but I've never tried it.

Automation


Having a directory full of scanned documents and some OCR software is a good place to be, but it's not a great place to be unless you can automate it. Fortunately, OS X has some pretty excellent automation tools.
The magic all happens in a single Automator workflow configured as a Folder Action on the directory that EEventManager is saving the PDFs into.

It finds any PDF files in that temporary folder, then loops over them, opening each one in Nuance PDF Converter, running the OCR function and saving the PDF. Each file is then moved to an archive directory and renamed to a generic date/time-based filename. That's it.
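The OCR pass is driven through Nuance's own application, but the final move-and-rename is plain shell. As a hedged sketch of that last step (the directory names are hypothetical):

# Move each processed PDF into the archive under a date/time name.
for f in "$HOME/Scans/Inbox"/*.pdf; do
  [ -e "$f" ] || continue
  mv -n "$f" "$HOME/Scans/Archive/$(date '+%Y-%m-%d_%H.%M.%S')_$RANDOM.pdf"
done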

That's it


Like I said, that's it. If you've been paying attention, at this point you'll say "but wait, you said there was a third part of a paperless workflow - you need tools to recall the documents later!". You would be right to say that, but the good news is that OS X solves this problem for you with zero additional effort.
As soon as the PDF is saved with the computer-readable text that the OCR function produces, it is indexed by the system-wide search engine, Spotlight. Now all you need to do is hit Cmd-Space and type some keywords, and you'll see all your matching documents and be able to get a preview. You can also open the search in a Finder window to see larger previews, change the sorting, edit the search terms, and so on.
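The same Spotlight index is reachable from a terminal via mdfind(1), if that's more your speed; for example (the archive path is hypothetical):

mdfind -onlyin ~/Scans/Archive "electricity bill"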

Future work


While that is it, there are things I'd like to do in future. Specifically, I don't currently have an easy way to pull in attachments from emails, or downloaded PDFs; I have to go and drag them into the archive folder and optionally rename them. However, if you have your email hooked into the system email client (Mail.app) then it is already being indexed by Spotlight, attachments included, so there's no immediate hurry to figure out a solution for that.

I do also like the idea of detecting specific keywords (e.g. company names) in the documents and using those to file the PDFs in subdirectories, but I'm not sure if I actually need/want it, so for now I'm sticking with one huge directory of everything.


A sysadmin talks OpenSSH tips and tricks

Posted on Tuesday, 7 February 2012

My take on more advanced SSH usage
I've seen a few articles recently on sites like Hacker News which claimed to cover some advanced SSH techniques/tricks. They were good articles, but for me (as a systems administrator) they didn't get into the really powerful guts of OpenSSH.
So I figured I ought to step up and write about some of the more advanced tricks that I have either used or seen others use. These will most likely be relevant to people who manage tens or hundreds of servers via SSH. Some of them are about actual configuration options for OpenSSH; others are recommendations for ways of working with it.

Generate your ~/.ssh/config
This isn't strictly an OpenSSH trick, but it's worth noting. If you have other sources of knowledge about your systems, automation can do a lot of the legwork for you in creating an SSH config. A perfect example here would be if you have some kind of database which knows about all your servers - you can use that to produce a fragment of an SSH config, then download it to your workstation and concatenate it with various other fragments into a final config. If you mix this with distributed version control, your entire team can share a broadly identical SSH config, with allowance for each person to have a personal fragment for their own preferences and personal hosts. I can't recommend this sort of collaborative working enough.
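As a hedged illustration of the idea, a fragment generator need be nothing more than a loop over an exported host list (hosts.txt, the fragments directory and the option choices here are all hypothetical):

# Emit one Host stanza per line of hosts.txt, then assemble all
# fragments into the final config.
while read -r host; do
  printf 'Host %s\n  User sysadmin\n  ForwardAgent no\n\n' "$host"
done < hosts.txt > ~/.ssh/fragments/10-servers
cat ~/.ssh/fragments/* > ~/.ssh/config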


Generate your ~/.ssh/known_hosts
This follows on from the previous item. If you have some kind of database of servers, teach it each server's SSH host key (usually found in /etc/ssh/ssh_host_rsa_key.pub); you can then export a file with the keys and hostnames in the correct format to use as a known_hosts file, e.g.:

server1.company.com,10.0.0.101 ssh-rsa BLAHBLAHCRYPTOMUMBO
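If your database already stores the public keys, exporting is a pure formatting job. If it only knows names and addresses, ssh-keyscan(1) can bootstrap the file, though you should verify the gathered keys out-of-band before trusting them. A hedged sketch (servers.txt is hypothetical, holding "name ip" pairs):

# Collect each server's RSA host key and emit a known_hosts line.
while read -r name ip; do
  key=$(ssh-keyscan -t rsa "$name" 2>/dev/null | cut -d' ' -f2-)
  [ -n "$key" ] && echo "$name,$ip $key"
done < servers.txt > ~/.ssh/generated_known_hosts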
You can then associate this with all the relevant hosts by including something like this in your ~/.ssh/config:
Host *.mycompany.com
  UserKnownHostsFile ~/.ssh/generated_known_hosts
  StrictHostKeyChecking yes
This brings some serious advantages:
  • Safer - because you have pre-loaded all of the host keys and enabled strict host key checking, SSH will refuse to connect if a machine's key has changed.
  • Discoverable - if you have tab completion, your shell will let you explore your infrastructure just by prodding the Tab key.
Keep your private keys private
This seems like it ought to be more obvious than it perhaps is... the private halves of your SSH keys are very privileged things. You should treat them with a great deal of respect. Don't put them on multiple machines, and don't back them up - SSH keys are cheap to generate and revoke, so if you lose one, replace it rather than restore it.


Know your limits
If you're going to write a config snippet that applies to a lot of hosts you can't match with a wildcard, you may end up with a very long Host line in your SSH config. It's worth remembering that there is a limit to the length of lines: 1024 characters. If you're going to need to exceed that, you will have to have multiple Host sections with the same options, as in the sketch below.
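For example (the hostnames are illustrative):

Host web-01.company.com web-02.company.com web-03.company.com
  User deploy
Host web-04.company.com web-05.company.com web-06.company.com
  User deploy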

Set sane global defaults
HashKnownHosts no
Host *
  GSSAPIAuthentication no
  ForwardAgent no
These are very sane global defaults:
  • Known hosts hashing is good for keeping your hostnames secret from people who obtain your known_hosts file, but it is also really very inconvenient, as you are then unable to get any useful information out of the file yourself (such as tab completion). If you're still feeling paranoid, you might consider tightening the permissions on your known_hosts file, as it may be readable by other users on your workstation.
  • GSSAPI is very unlikely to be something you need; it just slows things down if it's enabled.
  • Agent forwarding can be tremendously dangerous and should, I think, be actively and passionately discouraged. It ought to be a nice feature, but it requires that you trust remote hosts unequivocally, as if they had your private keys - because functionally speaking, they do. They don't actually hold the private key material, but any sufficiently privileged process on the remote server can connect back to the SSH agent running on your workstation and ask it to respond to challenges from an SSH server. If you keep your keys unlocked in an SSH agent, this gives any privileged attacker on a server you are logged into trivial access to any other machine your keys can SSH into. If you somehow depend on using agent forwarding with Internet-facing servers, please reconsider your security model (unless you are able to robustly and accurately argue why your usage is safe - but in that case, you don't need to be reading a post like this!).
Notify useful metadata
If you're using a Linux or OS X desktop, you have something like notify-send(1) or Growl available for desktop notifications. You can hook this into your SSH config to display useful metadata to yourself. The easiest way to do this is via the LocalCommand option:
Host *
  PermitLocalCommand yes
  LocalCommand /home/user/bin/ssh-notify.sh %h
This will call the ssh-notify.sh script every time you SSH to a host, passing the hostname you gave as an argument. In the script you probably want to ensure you're actually in an interactive terminal and not some kind of backgrounded batch session - this can be done trivially by checking that tty -s returns zero. The script then just needs to fetch some metadata about the server you're connecting to (e.g. its physical location, the services that run on it, its hardware specs, etc.) and format it into a command that will display a notification.
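A hedged sketch of what ssh-notify.sh might look like on a Linux desktop (the flat metadata file is a stand-in for whatever inventory system you really have; notify-send(1) comes from libnotify):

#!/bin/sh
# Only notify for interactive sessions, not batch jobs.
tty -s || exit 0
host="$1"
# Look up metadata about the host; a flat file of "hostname details"
# lines stands in for a real inventory here.
meta=$(grep -m1 "^$host " "$HOME/.ssh/host-metadata" 2>/dev/null)
notify-send "SSH: $host" "${meta:-no metadata on file}"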

Sidestep overzealous key agents
If you have a lot of SSH keys in your ssh-agent (say, more than about five), you may have noticed that SSHing to machines which want a password, or machines for which you wish to use a specific key that isn't in your agent, can be quite tricky. The reason is that OpenSSH currently seems to consult the agent in preference to obeying command-line options (i.e. -i) or config file directives (i.e. IdentityFile or PreferredAuthentications). You can force the behaviour you are asking for with the IdentitiesOnly option, e.g.:
Host server1.company.com
  IdentityFile /some/rarely/used/ssh.key
  IdentitiesOnly yes
(on a command line you would add this with -o IdentitiesOnly=yes)

Match hosts with wildcards
Sometimes you need to talk to a lot of almost identically-named servers. Obviously SSH has a way to make this easier or I wouldn't be mentioning this. For example, if you needed to ssh to a cluster of remote management devices:
Host *.company.com management-rack-??.company.com
  User root
  PreferredAuthentications password
This will match anything ending in .company.com and also anything that starts with management-rack- and then has two characters, followed by .company.com.

Per-host SSH keys
You may have some machines where you use a different key for each machine. By naming the keys after the fully qualified domain names of the hosts they relate to, you can skip a more tedious SSH config with something like the following:
Host server-??.company.com
  IdentityFile /some/path/id_rsa-%h
(the %h will be substituted with the FQDN you're SSHing to. The ssh_config man page lists a few other available substitutions.)

Use fake, per-network port forwarding hosts
If you have network management devices which require web access that you normally forward ports for with the -L option, consider constructing a fake host in your SSH config which establishes all of the port forwards you need for that network/datacentre/etc:
Host port-forwards-site1.company.com
  Hostname server1.company.com
  LocalForward 1234 10.0.0.101:1234
This also means that your forwards will be on the same port each time, which makes saving certificates in your browser a reasonable undertaking. All you need to do is ssh port-forwards-site1.company.com (using nifty Tab completion of course!) and you're done. If you don't want it tying up a terminal you can add the options -f and -N to your command line, which will establish the ssh connection in the background.
If you're using programs which support SOCKS (e.g. Firefox and many other desktop Linux apps), you can use the DynamicForward option to send traffic over the SSH connection without having to add LocalForward entries for each port you care about. Using it with a browser extension such as FoxyProxy (which lets you configure multiple proxies based on wildcard/regexp URL matches) makes for a very flexible setup.
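A hedged example of such a SOCKS entry (the port and names are illustrative):

Host socks-site1.company.com
  Hostname server1.company.com
  DynamicForward 1080

Point the browser's SOCKS proxy at localhost port 1080 while that session is up, and matching traffic flows over the SSH connection.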

Use an SSH jump host
Rather than have tens/dozens/hundreds/etc of servers holding their SSH port open to the Internet and being battered with brute force password cracking attempts, you might consider having a single host listening (or a single host per network perhaps), which you can proxy your SSH connections through.
If you do consider something like this, you must resist the temptation to place private keys on the jump host - to do so would utterly defeat the point.
Instead, you can use an old, but very nifty trick that completely hides the jump host from your day-to-day usage:
Host jumphost.company.com
  ProxyCommand none
Host *.company.com
  ProxyCommand ssh jumphost.company.com nc -q0 %h %p
You might wonder what on earth that is doing, but it's really quite simple. The first Host stanza just means we won't use any special commands to connect to the jump host itself. The second Host stanza says that in order to connect to anything ending in .company.com (but excluding jumphost.company.com because it just matched the previous stanza) we will first SSH to the jump host and then use nc(1) (i.e. netcat) to connect to the relevant port (%p) on the host we originally asked for (%h). Your local SSH client now has a session open to the jump host which is acting like it's a socket to the SSH port on the host you wanted to talk to, so it just uses that connection to establish an SSH session with the machine you wanted. Simple!

For those of you lucky enough to be running an OpenSSH client that is version 5.4 or newer, you can replace the jump host ProxyCommand with:
ProxyCommand ssh -W %h:%p jumphost.company.com
Re-use existing SSH connections
Some people swear by this trick, but because I'm very close to my servers and have a decent CPU, the setup time for connections doesn't bother me. Folks who are many milliseconds from their servers, or who don't have unquenchable techno-lust for new workstations, may appreciate saving some time when establishing SSH connections.
The idea is that OpenSSH can place connections into the background automatically and re-use those existing secure channels when you ask for new ssh(1), scp(1) or sftp(1) connections to hosts you have already spoken to. The configuration I would recommend for this would be:
Host *
  ControlMaster auto
  ControlPath ~/.ssh/control/%h-%r-%p
  ControlPersist 600
This will do several things:
  • ControlMaster auto will cause OpenSSH to establish the "master" connection sockets as needed, falling back to normal connections if something is wrong.
  • The ControlPath option specifies where the connection sockets will live. Here we are placing them in a directory and giving them filenames that consist of the hostname, login username and port, which ought to be sufficient to uniquely identify each connection. If you need to get more specific, you can place this section near the end of your config and have explicit ControlPath entries in earlier Host stanzas.
  • ControlPersist 600 causes the master connections to die if they are idle for 10 minutes. The default is that they live on as long as your network is connected - if you have hundreds of servers this will add up to an awful lot of ssh(1) processes running on your workstation! Depending on your needs, 10 minutes may not be long enough.
Note: You should make the ~/.ssh/control directory ahead of time and ensure that only your user can access it.
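For example:

mkdir -p ~/.ssh/control
chmod 700 ~/.ssh/control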

Cope with old/buggy SSH devices
Perhaps you have a bunch of management devices in your infrastructure and some of them are a few years old already. Should you find yourself trying to SSH to them, you might find that your connections don't work very well. Perhaps your SSH client is too new and is offering algorithms their creaky old SSH servers can't abide. You can strip the long default list of algorithms down to just the ones a particular device supports, e.g.:
Host power-device-1.company.com
  HostKeyAlgorithms ssh-rsa,ssh-dss
That's all folks
Those are the most useful tips and tricks I have for now. Hopefully someone will read this and think "hah! I can do much more advanced stuff than that!" and one-up me :)
Do feel free to comment if you do have something sneaky to add, I'll gladly steal your ideas!


Evil shell genius

Posted on Monday, 23 January 2012

Jono Lange was committing acts of great evil in Bash earlier today. I gave him a few pointers and we agreed that it was sufficiently evil that it deserved a blog post.

So, if you find yourself wishing you could get pretty desktop notifications when long-running shell commands complete, see his post here for the details.


HP Microserver Remote Access helper

Posted on Friday, 6 January 2012

I've only had the Remote Access card installed in my HP Microserver for a few hours and already I am bored of accessing it by logging into the web UI, navigating to the right part of the UI, clicking a button to download a .jnlp file and then running that with javaws(1).

Instead, I have written some Python that will login for you, fetch the file and execute javaws. Much better!

You can find the code here; you'll want to have python-httplib2 installed.


HP Microserver Remote Access Card

Posted on Thursday, 5 January 2012

I've been using an HP ProLiant Microserver (N36L) as my fileserver at home for about a year and it's been a really reliable little workhorse.
Today I gave it a bit of a spruce-up with 8GB of RAM and the Remote Access Card option.

Since it came with virtually no documentation, and since I can't find any reference online to anyone else having had the same issue I had, I'm writing this post so Google can help future travellers.

When you are installing the card, check the BIOS's PCI Express options and make sure the machine is set to automatically choose which graphics card to use. I had hard-coded it to use the onboard VGA controller.

The reason for this is that the RAC is actually a graphics card, so the BIOS needs to be able to activate it as the primary card.

If you don't change this setting, the RAC will appear to work normally, but its vKVM remote video feature will only ever show you a green window with the words "OUT OF RANGE" in yellow letters.

Annoyingly, I thought this was my 1920x1080 monitor confusing things, so it took me longer to fix this than it should have, but there we go.