Thursday, March 6, 2014

I miss Steve Jobs

This morning, my dad sent me this screenshot from his several-month-old MacBook Pro... sigh.

Friday, January 17, 2014

How to get to the Marieta islands

On our recent trip to Sayulita, our friends recommended we check out the Marieta islands for the amazing hidden beach that you have to swim into.  Unfortunately, we were there during a busy time of year and all of the charter boat companies in Sayulita were sold out.  That was actually a good thing, because it turns out they tend to oversell their boats and pack people into them. Also, Sayulita and Puerto Vallarta are pretty far from the islands. That's a whole day trip right there.  Asking Google for directions only came up with these commercial companies, so we thought we weren't going to get to go.

So, instead of playing that game, we drove to the small town of Punta Mita, which is the closest town to the islands.  We had a good lunch at one of the local places and then started walking down the main strip.  Within 5 minutes someone approached us and offered to take us out on a private tour for a decent rate.  I'm sure the price was somewhat inflated for the gringos, but whatever, we had the whole boat to ourselves and it wasn't going to take all day.

On the way out to the islands, we saw a bunch of humpback whales... the guy who offered the boat up gave a money back guarantee that we'd see whales. I guess we were paying for sure now.

Once we were at the island, they anchored the boat, we jumped in the water and swam through the cave to the beach, and we hung out for a while on the sand and people watched.  After we swam back to the boat, we drove around the entire island, saw another cave (no beach) and got sprayed by swells hitting the rocks.

Unfortunately, we didn't see any more whales on the way back as the day was pretty much over.  That said, it was an excellent adventure and I'm really glad about how it all worked out.

Tuesday, January 14, 2014

After all these years, I still write Java code...

I wish I had another hand so I could give Scala three thumbs down!

"New languages are indubitably exciting to learn and play with, and everyone is interested in improving upon what we already have. Sometimes it appears that languages like Scala or Clojure or Ceylon are the fix that the Java ecosystem needs to improve productivity and obtain that linear scalability that simply doesn't exist with traditionally written Java applications. But the fact is, Java, despite some misgivings, is a well thought out language that is both powerful and consistent, making it easy to learn, and more importantly, easy to maintain. Sure, new systems will appear that will try to knock the crown off the Java language, but for now, the want-to-be emperors of the JVM are increasingly being shown to be wearing no clothes."

Pretty much exactly how I've felt all along.

Wednesday, January 1, 2014

Rental Car Insurance

I recently took a trip to Puerto Vallarta, Mexico with my wife and rented a car for our 8 day stay. I had done the research in the past with regards to insurance and knew that as long as I use my VISA card, I should decline any extra coverage that the rental car agency offers. It turns out that in Mexico, you need at least one basic level of coverage called SLI or SAI insurance. This covers you against damage to someone else. This ran us an extra $115 on our bill. If we had gone for their full inclusive insurance, it would have been an additional $300! Way more than we paid to rent the car.

I've also read articles like this one that claim "Declining to buy the insurance (some of which is mandatory, anyway) is foolhardy to the extreme, but buying the full package without knowing what you're buying is only slightly less so." The article must be out of date or I got bad information from the person at the rental counter, but the SLI/SAI insurance was mandatory.

Well, to make a long story shorter, our brand new rental car got a nice big scratch on the side of it...

In the US, this would cost probably $1000-1500 to fix. This had me worried that we would run into all sorts of trouble at the rental car company, so on our last day, we left a bit early to take care of things. I kept thinking, maybe we should have gotten the all inclusive insurance.

When we arrived, they noticed the scratch immediately, of course. They were very nice about it and simply asked for a copy of the original rental agreement through the 3rd party company that we got the car from. They asked me to write up an accident report detailing what happened. This was a simple sentence. Then, they told me the price for the damage... only about $89! I didn't argue it. I've opened a ticket with VISA and I expect they will pay it after some period of time.

This got me thinking... the rental car place must have their own insurance which covers mishaps like this. Why in the world would anyone go with the all inclusive insurance for $300+ when simple damage can be paid for relatively cheaply through the rental car company itself? Sure, there is probably a risk of total loss of the car, but that is super rare. Even then, VISA would cover it under their own insurance.

So, unless you don't have VISA coverage, don't fret about not getting the extra insurance. I'm sure others have worse stories, but this one turned out pretty well for me.

Wednesday, October 24, 2012

How to determine DKIM key length?

I read this article in Wired about a researcher who figured out that Google was using a weak (512-bit) key for its implementation of DKIM. It turns out this is old news as someone did the same thing to Facebook.

This got me wondering if the keys for our emailing service are long enough. But, how to easily determine the length of the key? Turns out it is kind of convoluted so I decided to repeat the info here for my own benefit when I forget about this later.

Take your public DKIM key (probably from your DNS TXT record). It looks like this one from Google: 86400 IN TXT "k=rsa\; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAp5kQ31/aZDreQqR9/ikNe00ywRvZBFHod6dja+Xdui4C1y8SVrkUMQQLOO49UA+ROm4evxAru5nGPbSl7WJzyGLl0z8Lt+qjGSa3+qxf4ZhDQ2chLS+2g0Nnzi6coUpF8r" "juvuWHWXnzpvLxE5TQdfgp8yziNWUqCXG/LBbgeGqCIpaQjlaA6GtPbJbh0jl1NcQLqrOmc2Kj2urNJAW+UPehVGzHal3bCtnNz55sajugRps1rO8lYdPamQjLEJhwaEg6/E50m58BVVdK3KHvQzrQBwfvm99mHLALJqkFHnhyKARLQf8tQMy8wVtIwY2vOUwwJxt3e0KcIX6NtnjSSwIDAQAB"

Save the p= part of the TXT record into a file (google.key) that is line wrapped to around 78 columns (yes, it needs to be line wrapped or the openssl command used below breaks). Google seems to store their key in two parts so I removed the " " that is embedded in the middle of the blob above. You will also need to wrap it in the BEGIN/END PUBLIC KEY lines, so the file looks like this:

-----BEGIN PUBLIC KEY-----
(the p= value from above, line wrapped)
-----END PUBLIC KEY-----

Then run openssl rsa -noout -text -pubin < google.key

Which then outputs this chunk with your answer:

Modulus (2048 bit):
Exponent: 65537 (0x10001)
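If you would rather skip the file wrangling, the same check can be sketched in node. This is a minimal sketch, not what I originally ran: the function name dkimKeyBits is made up for illustration, it assumes an RSA key, and it assumes a node version recent enough to expose asymmetricKeyDetails on key objects. It takes the TXT record text as printed by a DNS lookup, glues the quoted chunks back together, and asks node's built-in crypto module for the modulus size.

```javascript
const crypto = require('crypto');

// Hypothetical helper: returns the RSA modulus size (in bits) of a DKIM TXT record.
function dkimKeyBits(txtRecord) {
  // Join the quoted chunks ("..." "...") and drop the quotes and the
  // backslash that DNS tools print in front of ';'.
  const joined = txtRecord.replace(/"\s*"/g, '').replace(/["\\]/g, '');

  // Pull out the base64 blob that follows the p= tag.
  const match = joined.match(/(?:^|;)\s*p=([A-Za-z0-9+/=\s]+)/);
  if (!match) throw new Error('no p= tag found in record');

  // The blob is a DER-encoded SubjectPublicKeyInfo structure, the same
  // bytes that sit between the BEGIN/END PUBLIC KEY lines in a PEM file.
  const der = Buffer.from(match[1].replace(/\s+/g, ''), 'base64');
  const key = crypto.createPublicKey({ key: der, format: 'der', type: 'spki' });

  // asymmetricKeyDetails.modulusLength is the key size in bits for RSA keys.
  return key.asymmetricKeyDetails.modulusLength;
}
```

Pass it the whole record value, quotes and all, and compare the result with the Modulus line that openssl prints.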

Wednesday, May 9, 2012

Scrape webpages with node.js

I recently had the task of scraping data from a website so I chose to use node.js in order to get a bit more experience with it. After checking out a few different options for scraping, I finally settled on the node.io project, which provided the most robust handling and configuration features that I could find.

The hard part about scraping data from websites is coming up with ways to quickly and reliably pick out pieces from the document object model (DOM). These days, I spend a lot of time using the jQuery selector syntax to develop my site, which means that ideally I'd find a solution that can download a webpage and then provide me with jQuery-like functions and selectors to pick out pieces from the DOM. For this purpose, node.io uses a project called node-soupselect by default, but I found the selector syntax to be lacking. Thus, I layered another project called cheerio on top. Whatever you do, don't use jsdom as it is too slow and very strict in its processing of HTML.

node.io stands out from the rest of the projects because it applies a 'jobs' approach to scraping. This is something that I used in another project of mine and it worked out really well. In other words, you write a 'job' which gets executed and if there is a failure during the run of the job, you have control over what you do next (skip, fail, retry).
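To make the skip/fail/retry idea concrete, here is a rough sketch in plain JavaScript. It is not node.io's API: runJobs, onFailure, maxRetries and the decision strings are all names I made up for illustration, and a real scraper would be asynchronous. It only shows the shape of the control flow.

```javascript
// Hypothetical sketch of a 'jobs' runner; none of these names come from node.io.
// job(input) does the work and may throw.
// onFailure(err, input, attempt) returns 'retry', 'skip' or 'fail'.
function runJobs(inputs, job, onFailure, maxRetries = 3) {
  const results = [];
  for (const input of inputs) {
    for (let attempt = 1; ; attempt++) {
      try {
        results.push(job(input));
        break; // success: move on to the next input
      } catch (err) {
        let decision = onFailure(err, input, attempt);
        // Stop retrying once the retry budget is used up.
        if (decision === 'retry' && attempt > maxRetries) decision = 'fail';
        if (decision === 'retry') continue; // run the same input again
        if (decision === 'skip') break;     // drop this input, keep going
        throw err;                          // 'fail': abort the whole run
      }
    }
  }
  return results;
}
```

The point of the design is that the job itself only knows how to do the work, while the decision about what a failure means lives in one place.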

In order to develop and debug your code, you shouldn't continually hit the website that you are scraping from. What I do is download the page I want to scrape and then run a local webserver to serve up the file. I found the python webserver the easiest to use as it is just one simple command, python -m SimpleHTTPServer 8000 (on Python 3, python -m http.server 8000), which will serve up files from whichever directory you run that command from.

For debugging code, I recommend setting up the excellent node-inspector, which will allow you to set up breakpoints in your code and step through to inspect objects as necessary, just like you do with web page JavaScript development. This becomes invaluable with JavaScript because the lack of types makes it hard to know what properties objects have. For logging output to the console so that I could keep track of the execution, I ended up with the nlogger project, which I wasn't super happy with, but it worked well enough for this project.

Writing your first job is easy and if you are a CoffeeScript (CS) fan, node.io will automatically compile your CS files for you.  If you aren't a CS fan, I apologize as my example is in CS. This simple job @get's a page and selects the <title> element:

nodeio = require 'node.io'
cheerio = require 'cheerio'
log = require('nlogger').logger(module)

# running total of pages processed, only used in the log output
count = 0
class InputScraper extends nodeio.JobClass
    input: [476,1184]
    run: (inputId) ->
        @get "http://localhost:8000/#{inputId}.html", (err, data, headers) =>
            @exit(err) if err?
            try
                log.info('Started: {}', inputId)
                # load the page into cheerio for jQuery-like selectors
                $ = cheerio.load(data)
                @emit($('title').text())
                count++
                log.info("(#{count}) Finished inputId: #{inputId}")
            catch error
                log.error('Error: {} : {}', inputId, error.stack)
    output: (data) ->
        # log whatever was passed to @emit()
        log.info(data)
@class = InputScraper
@job = new InputScraper({spoof:true, max: 1})


The 'run' function is called for each piece of input data in the array (476, 1184). @get() grabs the html page data. On success, @get() executes the callback function which loads the data into cheerio and @emit()'s the title. When the 'run' function is complete, node.io calls output which logs the @emit() data to the console.

In my code, the line above the @emit(), I have another class function which I pass the $ cheerio object into which handles all of my parsing and the result is an object that I pass into @emit(). This allows me to re-use the InputScraper boilerplate to parse all sorts of different pages.

I also run things directly with node 'nodeio.start(@job)' so that I can use the node-inspector more easily. This means that I also end up actually compiling the CS myself using my answer on StackOverflow.

Obviously, this is a pretty simple example, but it should get you up and running with the framework quickly. The speed of this is fairly impressive. On my desktop and home network, I was able to crawl and extract data from about 2000 webpages using max: 20 in about 1.5 minutes. Most of the time is spent downloading the pages and the parsing only takes a few milliseconds.

Thursday, March 29, 2012

The first comprehensive review of 22 online athletic event registration services. The Good, the Bad, and the Ugly.

Before we decided to build Voost, we spent a lot of time studying the myriad registration services already available online. There are a staggeringly large number of choices with wildly differing levels of sophistication. Comparing them is incredibly difficult, in no small part due to the fact that many of these websites seem to go out of their way to hide critical information like fees and disbursement schedules.

The result is over a hundred hours of tedious research: creating accounts, combing through documentation and FAQs, setting up test events, going through registration processes, calculating fees, etc. While we certainly have biases, we have tried to present the information as objectively as possible - this isn't a cheap marketing gimmick designed to show green checkboxes for us and red Xes for our competitors. We freely admit that there are features other services have that Voost does not (yet!) - and this information is reflected on the table.

The table is not complete. We have left fields blank where we just couldn't figure out the answers (and believe me, we tried). Despite our best efforts, there may also be errors - again, some websites seem to deliberately befuddle attempts at objective comparison.

Wednesday, January 25, 2012

GitHire spam again...

Just keeping this around for posterity since they deleted the HN posting, and maybe others will want to see the truth if they are googling around for why they are getting spam from these guys.

I called them out for being the spammers that they are, and I got a rather odd response:
Hi LatchKey,
I'm really sorry that we sent you that email. We just launched a little over a week ago with this crazy idea, and were extremely surprised at how quickly we were overwhelmed with orders.
We made a bad judgement call in sending some emails to people asking if anyone is interested in jobs.
If it makes you feel any better, you can see that we aren't finding very many talented engineers, and we will likely need to refund a lot of money in a few days.
We are honestly trying to be a great service for software developers and employers. We need feedback from people like yourself to learn how we can be the best service possible to reshape the hiring industry.
We actually sent you an email, but never heard back. Please let us know if you're interested in continuing this discussion further on or off of a public message board.
Thanks for keeping us honest.
The HN thread goes on with a lot of people pointing out their own issues with GitHire and I have my response to the above quote here:

Sorry, but I just don't have any tolerance for spammers or people trying to profit off my profile without my permission.

Sunday, January 22, 2012

Scalable System Architecture Comedy

I was reading Scaling a PHP MySQL Web Application, which is a technical document published on the Oracle website. As I was scrolling down the page, I saw the typical load balancing diagram in Figure 1 that you always see for any PHP/MySQL web application.

But then, as the article goes on, it gets more entertaining. It goes on to Figure 3, showing Multiple MySQL Slaves, which is now 4 machines.

But wait, there's more. Now you need a dedicated database slave for each Web server, so the picture expands to even more lines and arrows in Figure 4. A total of 8 machines.

As you keep scrolling, you get to Figure 5. A real gem of an image. Arrows in every direction. Arrows jumping through other arrows. 8 machines, but a completely incomprehensible image.

Ok, now we've randomized the connections between all the web servers and database slaves.

Could you imagine one of these machines going down or throwing errors and trying to figure out which one it is or how it connects to the other machines?

We all know that as systems grow, they get more complex. That said, if you draw an incomprehensible picture of your architecture, it is a clear sign that you are doing it wrong.

Monday, January 16, 2012

GitHire is a spammer

I've kept up with all the recent Hacker News articles on GitHire. They seemed like a rather interesting service because I believe that hosting your projects on a site like GitHub is the best kind of resume a software engineer can have.

That is, until they just spammed me with a whole list of completely random and unrelated jobs. I could understand it if I signed up to their site and requested spam (aka: LinkedIn), but I'm definitely not interested as I'm in the middle of starting my own company!

After checking out their site, I realized they've also got a profile up for me that I had no hand in creating, nor desire having. Apologies if that link stops working, hopefully they will remove my information soon. Maybe I should be pissed that I'm only in their Top 50%? ;-)

Since I wanted to remove myself, I clicked the "Opt out of Githire" link, which then took me to a page on GitHub to authorize their application?!?

Hell no, I'm not going to authorize your application, just so I can opt out of your website. That is wrong on so many levels.

Anyway, I cc'd [email protected] on my response to 'Steve', so hopefully they will be going away soon. I can't see how GitHub is allowing this company to exist, when they so clearly violate their terms of service policy.