
grawity's journal

Opera's OBML format

A while ago, I came across old backups of my previous phone and the one before it; among them were several dozen of Opera Mini's "saved webpages" in .obml format. The problem is that OBML is an undocumented binary format – it's not recognized by the desktop Opera browser, it's not very portable across devices (the W760i had trouble loading pages saved on the S68), and older format versions aren't even supported by newer Opera Mini releases (I used to keep 3–4 old releases installed for that).

The usual method other people use to read their old .obml saves is to run the actual Opera Mini client inside MicroEmulator, but that still has the format version problem, and there's no way to export long texts from the emulated app to the host (not even copy & paste). Aside from MicroEmulator, the only app I've found that understands the format is OBML Viewer. It does work well, but it is limited to fairly old format versions, it's Windows-only and closed-source, and Chrome keeps telling me that the website is really shady.

So I ended up writing my own OBML parser – at first only to extract the original URLs from the pages, but as some of those pages have since been taken down, I eventually extended it to dump the body text; now it fully converts the OBML files to regular HTML pages, with embedded images, layouts, forms, and everything. (It's not a perfect conversion, as OBML layouts are entirely pixel-positioned and depend on the device's font metrics, but it's probably as close as I could get.)
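To give a rough idea of what converting a pixel-positioned layout to HTML involves, here is a minimal sketch. It is not the actual parser: the `TextBox` record, its field names, and the `boxes_to_html` function are hypothetical and made up for illustration, and the real OBML chunks carry more information than this. It only assumes that each piece of text arrives with a pixel position and size, and places it with absolute CSS positioning.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextBox:
    # Hypothetical record; these field names are illustrative, not OBML's.
    x: int
    y: int
    width: int
    height: int
    text: str


def boxes_to_html(boxes, page_width):
    """Render pixel-positioned boxes as absolutely positioned <div>s."""
    # The outer container establishes the coordinate system for the boxes.
    parts = [f'<div style="position:relative;width:{page_width}px">']
    for b in boxes:
        style = (f"position:absolute;left:{b.x}px;top:{b.y}px;"
                 f"width:{b.width}px;height:{b.height}px")
        # Escape the text so characters like "<" survive as literal text.
        parts.append(f'<div style="{style}">{escape(b.text)}</div>')
    parts.append("</div>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(boxes_to_html([TextBox(0, 0, 100, 12, "Hello")], 240))
```

Since the box sizes were computed for the phone's fonts, text rendered this way in a desktop browser can overflow or leave gaps, which is the font-metrics problem mentioned above.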

Along with the parser code, there's some basic documentation on the file structure. There are still a few unknown-purpose chunks and several unrecognized header fields, but it's enough to reproduce the page layout. Right now the parser and format documentation are in my hacks repo, though I'll eventually Git-ify them properly.

