Scraping expert needed to develop a web scraping engine with WordPress insert - Upwork
Status | CLOSED |
Budget | 501-1000 EUR |
Created: | 2019-05-25 |
Expires: | 2019-06-01 |
Offered by: | None |
Description: | Need an expert to develop an engine that I can extend, which accomplishes the following. JavaScript/AJAX content is unlikely; the source sites are simple.
1) Scraping
- Crawls predefined URLs to identify child listings that will be scraped
- Extract data from both the parent and the child page using CSS and XPath selectors, combining them into a single listing item
- Download a single image from the child listing
- Potentially push the image to a host via FTP
- Cleanse and normalize the data
- Dynamically match the listing against an existing array of make and model taxonomy terms, using fuzzy string matching, to set the listing's categories
- Runs on a schedule
- Output to JSON feed(s)
2) Importing to WordPress
- Read the JSON feed(s) above
- Use native WordPress functions to insert and update custom posts with a custom taxonomy
- Insert where the item is new; update where its last-modified date is newer than the stored one; mark as "No longer available" where the item is no longer listed in the feed
Data fields for extraction:
- source = root domain
- location = set when the URL matches a predefined pattern
- link = listing URL
- linktext = anchor (a href) text of the listing URL
- title = listing title
- make = vehicle manufacturer; the listing title will need to be parsed to match against the existing make list
- model = vehicle model; the listing title will need to be parsed to match against the existing model list
- stocknumber = straightforward, a UID on the source
- transmission = text field
- mileage = text field
- price = text field
- currency = determined by preset per-source data
- image = typically one JPG image
- content = the listing's description
- contact = an email address or phone number; the pattern is established on a per-source basis
Budget: $225
Posted On: May 25, 2019 03:00 UTC
Category: Data Science & Analytics > Data Extraction / ETL
Skills: Data Scraping, PHP, Scrapy, Web Scraping
Country: Canada |
Job type(s): | |
Database: | |
Operating system: | Linux |
Number of proposals: | 0 |
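The posting asks for data from the parent and child pages to be combined "into a single listing item" and written to JSON feed(s). A minimal sketch of that step, assuming the CSS/XPath extraction has already produced one dict per page: the field list is copied from the posting, while the precedence rule (non-empty child values win) and the `listings` top-level key are assumptions for illustration.

```python
import json

# Field names copied from the posting's "Data fields for extraction" list.
FIELDS = [
    "source", "location", "link", "linktext", "title", "make", "model",
    "stocknumber", "transmission", "mileage", "price", "currency",
    "image", "content", "contact",
]


def merge_listing(parent: dict, child: dict) -> dict:
    """Combine parent-page and child-page values into one listing item.

    Assumption: a non-empty child value overrides the parent's value;
    empty strings and None from the child are ignored. Fields that
    neither page supplied come out as None.
    """
    merged = dict(parent)
    merged.update({k: v for k, v in child.items() if v not in (None, "")})
    return {field: merged.get(field) for field in FIELDS}


def to_feed(items: list) -> str:
    """Serialize listing items as a JSON feed ('listings' key is hypothetical)."""
    return json.dumps({"listings": items}, indent=2, ensure_ascii=False)
```

For example, a parent row supplying `link`, `linktext` and `price` merged with a child page supplying `title` and `stocknumber` (but an empty `price`) keeps the parent's price and fills every other listed field with `None`, so every item in the feed has the same keys.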
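The "cleanse and normalize" step, together with the rules that `source` is the root domain, `location` comes from a predefined URL pattern and `currency` from preset per-source data, could be driven by a small per-source configuration table. This is a sketch under stated assumptions: the `SOURCES` table, its domain and patterns are hypothetical, "root domain" is approximated by stripping a leading `www.`, and converting the `price` and `mileage` text fields to numbers is an assumed normalization rather than something the posting specifies.

```python
import re
from urllib.parse import urlparse

# Hypothetical per-source configuration: the domain, URL patterns and
# currency below are made up for illustration.
SOURCES = {
    "example-dealer.ca": {
        "currency": "CAD",
        "locations": [(r"/toronto/", "Toronto"), (r"/ottawa/", "Ottawa")],
    },
}


def root_domain(url: str) -> str:
    """Naive root domain: the (lowercased) hostname minus a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def to_number(text):
    """Keep only digits and dots: '$12,500' -> 12500.0, '85,000 km' -> 85000.0.

    Assumes commas and spaces are thousands separators; returns None when
    nothing parseable remains (e.g. 'call for price').
    """
    cleaned = re.sub(r"[^\d.]", "", text or "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def normalize(raw: dict) -> dict:
    """Fill source/location/currency and clean title, price and mileage."""
    url = raw["link"]
    source = root_domain(url)
    config = SOURCES.get(source, {})
    # First predefined pattern that matches the URL sets the location.
    location = next(
        (name for pattern, name in config.get("locations", []) if re.search(pattern, url)),
        None,
    )
    return {
        **raw,  # keep every other scraped field unchanged
        "source": source,
        "location": location,
        "title": " ".join(raw.get("title", "").split()),  # collapse whitespace
        "price": to_number(raw.get("price")),
        "mileage": to_number(raw.get("mileage")),
        "currency": config.get("currency"),
    }
```

Keeping these rules in data rather than code fits the request for an engine that can be extended: adding a site would mean adding a `SOURCES` entry. Note that `to_number` would misread European-style prices such as `1.500`, so a real per-source config would likely need its own separator setting.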
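For the `contact` field, which the posting describes as "an email or phone number, pattern established on a per source basis", one simple approach is a regular expression per source applied to the listing text. The domains and both patterns below are hypothetical examples (a basic email pattern and a North American phone pattern), not a complete validator.

```python
import re

# Hypothetical per-source contact patterns.
CONTACT_PATTERNS = {
    # Basic email shape: local part, '@', then one or more dotted labels.
    "example-dealer.ca": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # North American phone numbers such as (416) 555-0199 or 416.555.0199.
    "another-site.com": re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"),
}


def extract_contact(source: str, text: str):
    """Return the first match of the source's contact pattern, or None."""
    pattern = CONTACT_PATTERNS.get(source)
    if pattern is None:
        return None  # no pattern configured for this source
    match = pattern.search(text)
    return match.group(0) if match else None
```

For instance, `extract_contact("another-site.com", "Phone: (416) 555-0199, ask for Sam")` returns `"(416) 555-0199"`, and an unconfigured source returns `None` rather than guessing.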
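The import rules (insert new items, update when the last-modified date is newer, mark as "No longer available" when an item drops out of the feed) amount to a diff between the feed and the posts already stored. In WordPress itself this would run in PHP, where functions such as `wp_insert_post()`, `wp_update_post()` and `wp_set_object_terms()` perform the writes; the sketch below shows only the decision logic, in Python. Keying posts on `(source, stocknumber)` is an assumption (the posting says the stock number is unique per source), as are the dict shapes and ISO 8601 timestamps.

```python
from datetime import datetime

UNAVAILABLE = "No longer available"  # status label taken from the posting


def plan_sync(feed_items, existing):
    """Decide which feed items to insert, update or mark unavailable.

    feed_items: dicts with 'source', 'stocknumber' and an ISO 8601 'modified'.
    existing:   {(source, stocknumber): {'modified': str, 'status': str}}
                describing posts already in WordPress (hypothetical shape).
    Timestamps are assumed to be consistently naive or consistently aware.
    """
    plan = {"insert": [], "update": [], "expire": []}
    seen = set()
    for item in feed_items:
        key = (item["source"], item["stocknumber"])
        seen.add(key)
        current = existing.get(key)
        if current is None:
            plan["insert"].append(key)  # not stored yet
        elif datetime.fromisoformat(item["modified"]) > datetime.fromisoformat(current["modified"]):
            plan["update"].append(key)  # feed copy is newer than the stored post
    for key, post in existing.items():
        # Stored but absent from the feed: mark it, unless already marked.
        if key not in seen and post["status"] != UNAVAILABLE:
            plan["expire"].append(key)
    return plan
```

One case the posting leaves open: an item that reappears in the feed after being marked "No longer available" keeps that status here unless its modified date moves forward, so relisting behaviour would need to be decided with the client.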