The site is dynamic, so you can use selenium:
from selenium import webdriver
import collections
from bs4 import BeautifulSoup as soup
import re
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://money.usnews.com/investing/stocks/stocks-under-10')
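# Keep clicking "Load More" until the link is gone, so every row is in the page.
# Newer Selenium 4 releases removed find_element_by_link_text;
# there, use d.find_element(By.LINK_TEXT, "Load More") instead.
while True:
    try:
        d.find_element_by_link_text("Load More").click()
    except Exception:
        break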
company = collections.namedtuple('company', ['name', 'abbreviation', 'description', 'stats'])
headers = [['a', {'class':'search-result-link'}], ['a', {'class':'text-muted'}], ['p', {'class':'text-small show-for-medium-up ellipsis'}], ['dl', {'class':'inline-dl'}], ['span', {'class':'stock-trend'}], ['div', {'class':'flex-row'}]]
final_data = [[getattr(i.find(a, b), 'text', None) for a, b in headers] for i in soup(d.page_source, 'html.parser').find_all('div', {'class':'search-result flex-row'})]
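# Strip the trailing newline/indent from the description and tokenize the stats text.
new_data = [[i[0], i[1], re.sub(r'\n+\s{2,}', '', i[2]), [re.findall(r'[$\w.%/]+', cell) for cell in i[3:]]] for i in final_data]
# Keep only the tokens that contain a digit and pair them with their labels.
final_results = [i[:3]+[dict(zip(['Price', 'Daily Change', 'Percent Change'], filter(lambda x:re.findall(r'\d', x), i[-1][0])))] for i in new_data]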
new_results = [company(*i) for i in final_results]
Output (first company):
company(name=u'Aileron Therapeutics Inc', abbreviation=u'ALRN', description=u'Aileron Therapeutics, Inc. is a clinical stage biopharmaceutical company, which focuses on developing and commercializing stapled peptides. Its ALRN-6924 product targets the tumor suppressor p53 for the treatment of a wide variety of cancers. It also offers the MDMX and MDM2. The company was founded by Gregory L. Verdine, Rosana Kapeller, Huw M. Nash, Joseph A. Yanchik III, and Loren David Walensky in June 2005 and is headquartered in Cambridge, MA.more\n', stats={'Daily Change': u'$0.02', 'Price': u'$6.04', 'Percent Change': u'0.33%'})
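To see what the two regex steps do without launching a browser, here is a minimal offline sketch. The sample strings are assumptions about what `.text` might return for one result row, not data scraped from the site.

```python
import re

# Hypothetical sample text, shaped like one scraped result row (assumption).
raw_description = "Aileron Therapeutics, Inc. is a clinical stage company.\n      "
raw_stats = "Price $6.04 Daily Change $0.02 Percent Change 0.33%"

# Remove a run of newlines followed by two or more whitespace characters.
description = re.sub(r'\n+\s{2,}', '', raw_description)

# Split the stats text into tokens made of $, word characters, ., % and /.
tokens = re.findall(r'[$\w.%/]+', raw_stats)

# Keep only tokens that contain a digit, then pair them with their labels.
values = [t for t in tokens if re.search(r'\d', t)]
stats = dict(zip(['Price', 'Daily Change', 'Percent Change'], values))

print(description)
print(stats)
```

The pairing relies on the values appearing in the order price, daily change, percent change; if the page lists them differently, the labels would need to be reordered.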
Edit:
To get all the abbreviations:
abbrevs = [i.abbreviation for i in new_results]
Output:
[u'ALRN', u'HAIR', u'ONCY', u'EAST', u'CERC', u'ENPH', u'CASI', u'AMBO', u'CWBR', u'TRXC', u'NIHD', u'LGCY', u'MRNS', u'RFIL', u'AUTO', u'NEPT', u'ARQL', u'ITUS', u'SRAX', u'APTO']
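Because each result is a namedtuple, other fields can be read the same way. Below is a small sketch using hand-written records in place of `new_results`; the second record (`HYPO`) and the `by_abbrev` lookup are hypothetical, added only for illustration.

```python
import collections

company = collections.namedtuple('company', ['name', 'abbreviation', 'description', 'stats'])

# Hand-written stand-ins for new_results; the second entry is made up.
sample_results = [
    company('Aileron Therapeutics Inc', 'ALRN', '...',
            {'Price': '$6.04', 'Daily Change': '$0.02', 'Percent Change': '0.33%'}),
    company('Hypothetical Co', 'HYPO', '...',
            {'Price': '$2.50', 'Daily Change': '$0.10', 'Percent Change': '4.17%'}),
]

# Same attribute access as in the answer.
abbrevs = [c.abbreviation for c in sample_results]

# Index the records by ticker so a single company can be looked up directly.
by_abbrev = {c.abbreviation: c for c in sample_results}

# Prices are stored as strings like '$6.04'; drop the '$' to get a number.
price = float(by_abbrev['ALRN'].stats['Price'].lstrip('$'))

print(abbrevs)
print(price)
```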