goblu, Ohio. A little east of Toledo. A little north of Genoa. It’s a paper town. You may think that it might, oh, I don’t know, be known for having a big paper mill. Or, perhaps, was once the home of an important news *paper*. Good guesses, but you’d be wrong.
+++
goblu, Ohio, and its sister city, beatosu, Ohio, don’t exist. So, who put “Go Blue!” and “Beat OSU” on the map – literally? Peter Fletcher, a Michigan alumnus and chairman of the State Highway Commission in 1978.
Now, you’re going to ask, why? Well, craps and cackles is one reason. The other, some speculate, is that people who make maps by copying other maps fall into the trap of including things that could only come from you. He should have named it gotcha.
Let’s see how this plays in Prospect Park. One map maker sued another for stealing NYC taxi maps lock stock and two smoking trap streets. No fuzzy dice. The court said you can’t mix facts and fiction and expect to hold copyright. Because, “If such were the law, information could never be reproduced or widely disseminated.”
That was in 1991. Frankly, I wish I had known this back then. In the latter part of ’91, I was in grad school. Profs would create note sets for classes. Kinko’s (the predecessor to FedEx Office) would sell copied, stapled note sets for $20+. This was when I used ATMs that dispensed $5 bills because sawbucks were out of my league. Now, I could have copied the pages from a friend for three or four dollars. But I was under the impression copying stuff was, oh, I don’t know, wrong. Or illegal. Going into a library and photocopying stuff seemed even more out of bounds. Stupid. Stupid. Stupid.
I was thinking small. Watch. In 2004, Google started to copy every page of every book in every library. Now, if I took a book, I’d get fined. If Google takes every book, they’re fine. Because, in Google’s aggregation engine, those books were… wait for it… transformed. Eleven years later, in 2015, a federal appeals court sided with… Google. The Supreme Court declined to hear the case the following year.
From 1988 to 1992, internet traffic grew from a million packets of information to 150 billion. What happens to 150 billion packets of information that can be copyrighted? They get stolen. We’ve spent thirty-plus years using search engines that aggregate content. We’ve spent most of that time being targeted with data about us we didn’t know was public. The companies that amassed the most data most quickly won the biggest prizes. And, for the most part, it was legal.
As we leave the aggregation phase of the internet and enter the artificial era, you’d think, ok, hope, ok-ok, pray, that intellectual property rights would be protected. That people who create content would somehow come out on top. And that all of this would unleash a wave of novel ideas that ushers in a digital renaissance so we all benefit. You’d be hoping up the wrong tree.
The best protection a site has from being scraped is topping it with a file called robots.txt. While robots.txt is not legally binding, Perplexity’s CEO said the company honors the polite ask. Turns out robots.txt is less effective than a ripped lambskin condom, and sites that use it rely on the kindness of strangers more than Tennessee’s Blanche did in Streetcar.
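For the technical folks reading this, here is roughly what honoring that polite ask looks like. It’s a minimal sketch in Python using the standard library’s urllib.robotparser. The bot name ExampleBot, the example.com URL, the sample rules, and the may_fetch helper are all made up for illustration. None of this is Perplexity’s actual code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: one named bot is told to
# stay out of everything, everyone else is allowed in.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the rules in robots_txt allow user_agent to fetch url.

    The rules are parsed from a string here so the sketch needs no network
    access; a real crawler would download the site's /robots.txt first.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/essay"
    print("ExampleBot allowed:", may_fetch("ExampleBot", page))
    print("SomeOtherBot allowed:", may_fetch("SomeOtherBot", page))
```

Notice who runs the check: the crawler. A scraper that never calls something like may_fetch, or that shows up under a different name, sails right past the file.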
Perplexity hid its IP address and scraped sites like Condé Nast, The Guardian, Forbes, and The New York Times. For non-technical folks reading this, this is like walking into a store wearing a mask and walking out with a bunch of stuff without paying. Masks don’t make things legal.
This is where an aggravating story becomes squirrely.
Aravind Srinivas, Perplexity’s CEO, said they didn’t do it. Then he said they used a third-party service they didn’t control. That’s, “We didn’t rob the bank. We just asked someone to go in, rob it, and bought stuff from the person on the way out.” Then he said Perplexity honors robots.txt, but if a user persists in asking for something Perplexity doesn’t have, then, yeah, sure, Perplexity has to ignore the robots file. That’s, “We don’t rob banks unless someone we know asks us for the stuff inside.”
You see, Aravind is a smart guy. PhD from UC Berkeley in computer science smart. We don’t get the things he gets. He explains all of this like this, “We never ripped off content from anybody. Our engine is not training on anyone else’s content. We simply aggregate what other companies’ AI systems generate.” Ah, Google vs. books.
And, here is where it goes from squirrely to face palm emoji.
Forbes (of all publishers) found that Perplexity didn’t just aggregate content and transform it. They flat out duped it. goblu, beatosu, and all.
How do we know? A developer named Robb Knight set up a sting operation. That’s like putting ink all over the money in the vault. He coded his site so it couldn’t be scraped and put content there that he knew to be unique. Then, he asked Perplexity just the right question. It answered with his unique text. That’s like getting the robber to pay with the stolen, inked bills.
gotcha.
And, from face palm emoji to what the hell was I thinking?
Jeff Bezos is an investor in Perplexity. You may also know that Mr. Bezos started Amazon. Ok. That’s totally normal. Rich folks make investments. Perplexity runs on Amazon’s web services (AWS). Also, totally normal. Lots and lots of companies do.
This part is not normal. People who own sites are complaining to Amazon that Perplexity disregarded Amazon’s terms of service, “Thou shalt not steal, thou shalt honor thy mother and thy father and robots.txt, those kinds of thous.” Forget crappy digital laws. Amazon has to investigate if Perplexity broke Amazon’s laws. Jeff Bezos must be so happy.
You can’t make this stuff up.