Scraping expert needed to develop web scraping engine with Wordpress insert - Upwork

Status: CLOSED
Budget: 501-1000 EUR
Created: 2019-05-25
Ends: 2019-06-01
Offered by: None
Description: Need an expert to develop an extensible engine that accomplishes the following:


JavaScript/AJAX content is unlikely; the source sites are simple.


1) Scraping

- Crawl predefined URLs to identify child listings that will be scraped

- Extract data from both the parent and child pages using CSS selectors and XPath, combining them into a single listing item

- Download a single image from the child listing

- Potentially push the image to a host via ftp

- Cleanse and normalize the data

- Dynamically match the listing to an existing array of make and model taxonomy using fuzzy string matching to set the listing's categories

- Run on a schedule

- Output to JSON feed(s)
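
The fuzzy make and model matching step above could be sketched as follows. This is only an illustration using Python's standard difflib module: the TAXONOMY contents, the function names and the 0.8 cutoff are assumptions, not part of the brief, and the real make and model list would come from the client.

```python
import difflib

# Hypothetical taxonomy for illustration; the real list comes from the client.
TAXONOMY = {
    "Toyota": ["Corolla", "Camry", "Tacoma"],
    "Volkswagen": ["Golf", "Jetta", "Passat"],
    "Chevrolet": ["Silverado", "Malibu"],
}


def best_match(tokens, candidates, cutoff=0.8):
    """Return the candidate most similar to any single title token, or None."""
    best, best_score = None, cutoff
    for token in tokens:
        for candidate in candidates:
            score = difflib.SequenceMatcher(
                None, token.lower(), candidate.lower()
            ).ratio()
            if score >= best_score:
                best, best_score = candidate, score
    return best


def match_make_model(title):
    """Match the make first, then search only that make's models."""
    tokens = title.replace("-", " ").split()
    make = best_match(tokens, TAXONOMY.keys())
    model = best_match(tokens, TAXONOMY[make]) if make else None
    return make, model
```

With this sample taxonomy, a misspelled title such as "2014 Toyta Corola LE automatic" resolves to Toyota and Corolla. Comparing one token at a time keeps the sketch short but will miss multi-word model names, which would need n-gram comparison or a dedicated fuzzy matching library.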


2) Importing to Wordpress

- Access JSON feed(s) above

- Utilize native Wordpress functions to insert and update custom post with custom taxonomy

- Insert where the item is new, update where the feed's last modified date is later than the stored one, and designate as "No longer available" where the item is no longer listed within the feed
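
The insert, update and "No longer available" rule above comes down to comparing two keyed sets. A minimal sketch of that decision logic follows, written in Python for illustration only, since the import itself would run inside WordPress, presumably through functions such as wp_insert_post and wp_update_post. It assumes the stock number is the matching key, that dates are ISO 8601 strings, and that an update is due when the feed's last modified date is later than the one stored with the post. The name plan_sync is hypothetical.

```python
from datetime import datetime


def plan_sync(feed_items, stored_posts):
    """Classify listings by comparing the feed with what is already stored.

    Both arguments map stocknumber -> ISO 8601 last-modified string.
    Returns three sorted lists: (to_insert, to_update, to_retire).
    """
    to_insert, to_update = [], []
    for stock, modified in feed_items.items():
        if stock not in stored_posts:
            # Not seen before: create a new post.
            to_insert.append(stock)
        elif datetime.fromisoformat(modified) > datetime.fromisoformat(
            stored_posts[stock]
        ):
            # Feed copy is newer than the stored copy: update the post.
            to_update.append(stock)
    # Stored but absent from the feed: mark "No longer available".
    to_retire = [stock for stock in stored_posts if stock not in feed_items]
    return sorted(to_insert), sorted(to_update), sorted(to_retire)
```

Computing the three lists before touching the database keeps the import easy to test and lets it be dry-run against a feed without side effects.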



Data fields for extraction:

source = root domain

location = set if the URL matches a predefined pattern

link = listing url

linktext = anchor (a href) text of the listing url

title = listing title

make = Vehicle manufacturer. Listing title will need to be parsed to match against existing make list.

model = Vehicle model. Listing title will need to be parsed to match against existing model list.

stocknumber = a unique ID assigned by the source site

transmission = text field

mileage = text field

price = text field

currency = determined by pre-set source data

image = one image per listing, typically a jpg

content = the listing's description

contact = an email or phone number, pattern established on a per source basis
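
A subset of the fields above could be assembled and serialized along these lines. This is a hedged sketch: the Listing class, the helper names and the email pattern are illustrative assumptions. It also assumes North American number formatting (commas as thousands separators), and it approximates the root domain by dropping a leading "www.", which is not a full registrable-domain lookup.

```python
import json
import re
from dataclasses import asdict, dataclass
from urllib.parse import urlparse


@dataclass
class Listing:
    source: str
    link: str
    title: str
    stocknumber: str
    price: str
    mileage: str
    currency: str
    contact: str


# Illustrative pattern; the brief says contact patterns are set per source.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def whole_number(text):
    """'$12,500.00' -> '12500'; assumes commas are thousands separators."""
    cleaned = re.sub(r"[^\d.]", "", text)
    return cleaned.split(".")[0]


def build_listing(link, title, stocknumber, price, mileage, description, currency):
    host = urlparse(link).hostname or ""
    # Approximation of "root domain": strip a leading "www." only.
    source = host[4:] if host.startswith("www.") else host
    match = EMAIL_RE.search(description)
    return Listing(
        source=source,
        link=link,
        title=" ".join(title.split()),  # collapse stray whitespace
        stocknumber=stocknumber.strip(),
        price=whole_number(price),
        mileage=whole_number(mileage),
        currency=currency,  # supplied from pre-set source data
        contact=match.group(0) if match else "",
    )


def to_feed(listings):
    """Serialize listings to a JSON feed string."""
    return json.dumps([asdict(item) for item in listings], indent=2)
```

Price and mileage stay as strings here because the brief describes them as text fields; converting them to integers would be a one-line change if the WordPress side prefers numbers.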

Budget: $225

Posted On: May 25, 2019 03:00 UTC
Category: Data Science & Analytics > Data Extraction / ETL

Skills: Data Scraping, PHP, Scrapy, Web Scraping
Country: Canada



Job Type(s):
  • PHP
  • CSS
Database:
Operating System: Linux
Number of Bids: 0