Friday, January 8, 2010

you really need a library for that?

Oh Ruby masters. Really? Are you kidding me?

8 comments:

Paul Dix said...

yes, yes we do. URI.parse will parse a domain, but it won't tell you which part is the TLD and it won't canonicalize a URL. Parsing out the actual domain (and not the TLD) is more complex than just finding the .whatever.

Jon Scott Stevens said...

paul, it isn't complex and you don't need a whole library for what amounts to about 2 lines of code.

Paul Dix said...

it's definitely more complex than that. have a look at the public suffix list and you'll get it. If you can do it in two lines you're probably doing it wrong. Meaning oversimplifying and just looking for (com|org|net) or something silly like that.

either way, that's two lines of code that I won't have to write and test again.
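
To make the disagreement concrete, here is a minimal sketch of the difference between taking the last label and matching against a suffix list. It assumes a tiny hand-picked rule set (SUFFIXES is hypothetical; the real public suffix list is far longer and also has wildcard and exception rules, which this ignores), and the helper names are made up for illustration. It is not how domainatrix is implemented.

```ruby
# Hypothetical, hand-picked subset of suffix rules for illustration only.
# The real public suffix list has thousands of entries plus wildcard and
# exception rules, none of which are handled here.
SUFFIXES = %w[com net org uk co.uk kg com.kg].freeze

# Naive approach: whatever follows the last dot.
def last_label(host)
  host.downcase.split(".").last
end

# Longest suffix of the host that appears in the rule set, or nil.
def public_suffix(host, rules = SUFFIXES)
  labels = host.downcase.split(".")
  # Candidates are tried from longest to shortest, so "co.uk" wins over "uk".
  (0...labels.length).each do |i|
    candidate = labels[i..-1].join(".")
    return candidate if rules.include?(candidate)
  end
  nil
end

# The registrable domain is the public suffix plus one more label to its left.
def registrable_domain(host, rules = SUFFIXES)
  host = host.downcase
  suffix = public_suffix(host, rules)
  return nil if suffix.nil? || suffix == host
  host.split(".").last(suffix.split(".").length + 1).join(".")
end
```

With these rules, last_label("www.pauldix.co.uk") gives "uk", while public_suffix gives "co.uk" and registrable_domain gives "pauldix.co.uk". Only the last one tells you who actually owns the name.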

Jon Scott Stevens said...

/**
 * Gets the top level domain of the host, e.g. ".com".
 * @return .com if the TLD can't be parsed out.
 */
public static String getTLDFromHost(String host)
{
    // Strip a single trailing dot (fully qualified form such as "example.com.").
    host = host.endsWith(".") && host.length() > 1 ? host.substring(0, host.length() - 1) : host;
    int val = host.lastIndexOf('.');
    if (val > 0)
        return host.substring(val);
    else
        return ".com";
}

Paul Dix said...

thanks, you've proven my point.

Jon Scott Stevens said...

Proven what point? That you have to do disk io to parse a file every time someone wants to instantiate your class?

I'm looking at your unit tests and you don't even understand what a TLD is. Please do some reading: http://en.wikipedia.org/wiki/Top-level_domain

from: http://github.com/pauldix/domainatrix/blob/master/spec/domainatrix/domain_parser_spec.rb

it "parses the tld" do
  @domain_parser.parse("http://pauldix.net")[:tld].should == "net"
  @domain_parser.parse("http://pauldix.co.uk")[:tld].should == "co.uk"
  @domain_parser.parse("http://pauldix.com.kg")[:tld].should == "com.kg"
  @domain_parser.parse("http://pauldix.com.aichi.jp")[:tld].should == "com.aichi.jp"
end

The TLD for pauldix.com.aichi.jp is ".jp", not "com.aichi.jp"

Paul Dix said...

ok, I should probably call it a public suffix. That would be more accurate. as for the disk IO, RTFM. If you're using the lib correctly (as documented in the README) that IO happens once. I take it this isn't a problem you've actually had to solve before.
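
For readers following the disk IO point, this is a rough sketch of the "read once, reuse" pattern being described. SuffixRules, its path argument, and load_count are hypothetical names for illustration, not domainatrix's actual API; the only detail taken from the real list format is that comment lines start with "//".

```ruby
require "set"

# Hypothetical loader: parses a rules file on first use and keeps the result.
class SuffixRules
  # Number of times the file has actually been read (for demonstration).
  attr_reader :load_count

  def initialize(path)
    @path = path
    @rules = nil
    @load_count = 0
  end

  # Reads the file only on the first call; later calls reuse the cached Set.
  def rules
    @rules ||= load_rules
  end

  def include?(suffix)
    rules.include?(suffix)
  end

  private

  # Skips blank lines and "//" comment lines, keeps everything else as a rule.
  def load_rules
    @load_count += 1
    File.readlines(@path, chomp: true)
        .map(&:strip)
        .reject { |line| line.empty? || line.start_with?("//") }
        .to_set
  end
end
```

As long as one SuffixRules instance is kept around and shared, every lookup after the first is an in-memory Set check rather than a file read.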

Jon Scott Stevens said...

Seriously, what use case do you have to parse out the public suffix (aka: eTLD) of a domain? From the usage examples on your site, it appears as though you really just want the TLD. In that case, you really don't need to use the public suffix list at all. You just need the few lines of code I gave you.

And yes, I do understand the concept of initialization of a class.