Is Web Scraping Legal?

There is no cookie-cutter answer to this question. As with many legal questions of this type, the answer is "it depends."
For recruiting, scraping poses a unique problem because in some cases you are dealing with laws that cover private, sensitive information, such as HIPAA. You may not know you are breaking a law, and if the program you are using is constantly absorbing new data, tuning the keywords you feed to its algorithm can become a headache. And the results won't always be accurate.

Reproducing copyrighted content is clearly problematic. That said, facts themselves are not protected by copyright. 
A narrative work that includes or explains facts can be protected by copyright, so there is a chance that web scraping can result in copyright infringement. It is a question of what is taken and how it is used. Everyone, including Yahoo, Google, the NSA, the CIA and people you probably don't even want to think about, follows you and collects your personal data. This is becoming a very big security problem, especially in relation to executives and the kinds of harassment, threats and stalking that can occur if companies set out to hunt someone down based on their social chatter.

These days it's not that difficult to find someone based on their web presence. Obviously this is a greater concern for someone with kids or a political career. Ultimately, you are responsible for your own identity on the web. For businesses and companies, the rule is that scraping cannot infringe on copyrighted material, but this is also very murky and not always clearly settled in the court system. There is nothing special about accessing data for yourself with a browser; you can use other means, i.e. scraping.

The complications start if you want to use scraped data for other purposes, especially commercial ones. However, even then you may be able to do a lot. A good example is deep linking, a practice where links to pages within a site are placed on another site, bypassing the target site's home page. There have been several legal precedents for such cases, and in several of them courts have ruled that deep linking is legal, even when it includes short descriptions and metadata from the target pages, as long as it is clear that the site where the deep links are placed is not claiming ownership of the data. This is obviously a serious problem for people (inexperienced bloggers, perhaps) who have no idea that their content is building and driving traffic to another site that may rank higher. The list goes on. There may also be contractual problems with respect to terms of use violations.

Recently Craigslist has sued or threatened to sue third parties that scrape its data and republish the ads from Craigslist. See Judge Throws Out Craigslist’s Copyright Lawsuit, But It Can Still Sue 3Taps Over Data Use | TechCrunch. This is one of those situations where an hour or two of attorney time may save you large headaches later. First things first: I am not an attorney and these comments are solely based on my experience working at Scrapinghub, so please seek legal advice accordingly.

Here are a few things to consider when scraping public data from websites (note that the following addresses only US law):

As long as they don't crawl at a disruptive rate, scrapers do not breach any contract (in the form of terms of use) or commit a crime (as defined in the Computer Fraud and Abuse Act).

A website's user agreement is not enforceable as a browsewrap agreement because companies do not provide sufficient notice of the terms to site visitors.

Scrapers access website data as a visitor, following paths similar to those of a search engine. This can be done without registering as a user (and explicitly accepting any terms).

In Nguyen v. Barnes & Noble, Inc. the courts ruled that simply placing a link to the terms of use at the bottom of a webpage is not sufficient to "give rise to constructive notice."

In other words, there is nothing on a public page that would imply that merely accessing the information is subject to any contractual terms. A scraper gives neither explicit nor implicit assent to any agreement and therefore breaches no contract.

Social networks, for example, present the value of becoming a user (based on the call-to-action on the public page) as the ability to: i) gain access to full profiles, ii) identify common friends/connections, iii) get introduced to other members, and iv) contact members directly. As long as scrapers make no attempt to perform any of these actions, they do not gain "unauthorized access" to the services and thus do not violate the CFAA.

LinkedIn: it's expensive, but you can use it for massive amounts of networking in short bursts. They attempt to strong-arm you into annual (one-year) fees; don't pay them.

If you haven't checked out my LinkedIn Hacks you need to.

Sell or be sold.


Love ya,
Jack

1 comment:

  1. Hey, to make a distinction between "scraping" and "crawling,"
    the facts can be put this way:

    Google crawls websites to provide search. Scraping implies that you are then redistributing the content for your own purposes (e.g. I could access Facebook to seed my own social site). I'm not a lawyer, but obviously this type of use would be somewhat malevolent, except there is no way to stop it.

    Also, with web crawling, sites have a lot of control over this via robots.txt and sitemaps, right? They can specify not to be crawled at all, or even recommend how often they should be crawled (so as not to overburden their servers). Thanks for the free articles, Jack.
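
The robots.txt point can be checked in code before a crawler touches a page. Here is a minimal sketch, assuming Python's standard-library `urllib.robotparser`; the robots.txt rules, the `example-bot` user agent and the `example.com` URLs are all made up for illustration. Keep in mind that robots.txt is advisory: it only works when the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# A real crawler would download this from the site's /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer access questions."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "example-bot" is an assumed user-agent name; it falls under the "*" rules.
allowed_public = parser.can_fetch("example-bot", "https://example.com/jobs/123")
allowed_private = parser.can_fetch("example-bot", "https://example.com/private/profile")

# Seconds the site asks crawlers to wait between requests (None if not specified).
delay = parser.crawl_delay("example-bot")

print(allowed_public, allowed_private, delay)
```

A polite crawler would skip any URL where `can_fetch` returns `False` and sleep for `delay` seconds between requests when a crawl delay is given.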
