From Google Books to PDF lickety-split!

So I got a Kindle for Xmas and wanted to start sticking some content on it. Google Books has some gems (some in full!), so I have just figured out this process for getting them into PDF format and then onto my Kindle. I use PDF because you get images from Google Books, not text. Somebody might want to find a command-line image-to-text converter (OCR) and stick it in at the end of this process (depending on how keen you are). There are some large textbooks I would consider doing it to, but for now…

  1. Use Firefox with the GreaseMonkey add-on and the Google Book Downloader script to generate a list of links to all the page images
  2. This will generate a long list of links, one for each page, so you can then use the Firefox add-on DownThemAll! to… download them all…
    • create a folder for them all to live in
    • under “Fast filter” I entered “‘books?id=” to select the book page links
    • Set the “Renaming Mask” to “*text*” (less the quotes)
  3. This should give you a folder with all the book pages as images. To convert these to PDF you will need ImageMagick installed (on the Mac I recommend using Homebrew to get it installed quickly). Simply crack open the terminal/command line:
    cd /folder/with/the/downloaded/images
    and then:
    convert *.png mynewbook.pdf

…and you're done!
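
If you end up doing this for more than one book, the last step could be wrapped in a little shell function. This is only a rough sketch, not part of the process as I ran it: the name build_pdf and its exit codes are made up for illustration, and it assumes the pages were saved as .png files. It also relies on the shell expanding *.png in alphabetical order, so pages whose names are not zero-padded may come out in the wrong order.

```shell
#!/bin/sh
# build_pdf DIR OUTPUT: combine every PNG in DIR into a single PDF.
# (Hypothetical helper; the name and exit codes are illustrative only.)
build_pdf() {
    dir=$1
    out=$2

    # Collect the page images; the shell expands the glob alphabetically.
    set -- "$dir"/*.png

    # If nothing matched, the first argument is not an existing file.
    if [ ! -e "$1" ]; then
        echo "no PNG images found in $dir" >&2
        return 2
    fi

    # Fail with a clear message if ImageMagick is not installed.
    if ! command -v convert >/dev/null 2>&1; then
        echo "ImageMagick (convert) not found" >&2
        return 1
    fi

    # Same command as in step 3, with the page list passed explicitly.
    convert "$@" "$out"
}
```

You would then call it as: build_pdf /folder/with/the/downloaded/images mynewbook.pdf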

If you have an eBook reader then you might want to import the PDF with Calibre and upload it to your device.

Defining the Semantic of Markup

Semantics in regard to HTML markup is murky water. This is because web pages are usually not essay-style documents, which HTML was designed to mark up, and they contain information that is not actually relevant to what the page is about. Examples would be: menus, shopping cart information, summaries of forum activity, and the other half of HTML design: user/human interfaces. To say or even think that HTML can encapsulate all the “meanings” that human language structures can come up with (which are actually infinite), not to mention the non-language structures found on web systems representing a computer system interface, is naive. It is also an assumption that has never been backed up in argument by any standards body, and that's because it's simply wrong. The Microformat standard and now the POSH process seem to be unwittingly dealing with the problem without understanding it. This is actually an applied philosophy problem!
