Beautiful Soup: get span content

I have parsed an HTML page using BeautifulSoup:

badges = soup.body.find('div', attrs={'class': 'col-md-11'})

After this, my badges object looks like this:

<div class="col-md-11">
   <h4>
      <span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
      <font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
      <span style="color:green;font-weight:bold;"> [activ]</span>
   </h4>
   <p>
      <span class="fas fa-map-marker text-red padding-right-sm"></span>Sediu principal in Baroul Dolj, adresa: mun.Craiova, str.Mihail kogalniceanu, nr.16, jud.Dolj, tel.
   </p>
   <p>
      <span class="padding-right-md text-primary"><span class="fal fa-phone text-primary padding-right-sm"></span></span>
      <span class="text-nowrap"><span class="fal fa-envelope text-info padding-right-sm"></span>paul_iulyan@yahoo.com</span>
   </p>
</div>

Now I want to extract the name and bar (NEDELCU Paul-Iulian, Baroul Dolj), the status ([activ]), the address line (Sediu principal in Baroul Dolj, adresa: mun.Craiova, str.Mihail kogalniceanu, nr.16, jud.Dolj, tel.) and the e-mail (paul_iulyan@yahoo.com).

I tried badges.span.span, but that doesn't work.

python, html, web-scraping, beautifulsoup

0

2 answers


2 accepted

You can use soup.find together with next_sibling to reach the text nodes that sit outside the tags.

Demo:

from bs4 import BeautifulSoup
s = """<div class="col-md-11">
   <h4>
      <span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
      <font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
      <span style="color:green;font-weight:bold;"> [activ]</span>
   </h4>
   <p>
      <span class="fas fa-map-marker text-red padding-right-sm"></span>Sediu principal in Baroul Dolj, adresa: mun.Craiova, str.Mihail kogalniceanu, nr.16, jud.Dolj, tel.
   </p>
   <p>
      <span class="padding-right-md text-primary"><span class="fal fa-phone text-primary padding-right-sm"></span></span>
      <span class="text-nowrap"><span class="fal fa-envelope text-info padding-right-sm"></span>paul_iulyan@yahoo.com</span>
   </p>
</div>"""

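soup = BeautifulSoup(s, "html.parser")

# The name is inside <font>; ", Baroul Dolj" is the text node right after it
val = soup.find("font", {"style": "font-weight:bold;"})
print("{}{}".format(val.text, val.next_sibling).strip())
# Status text, e.g. "[activ]"
print(soup.find("span", {"style": "color:green;font-weight:bold;"}).text.strip())
# The address is the text node that follows the map-marker icon span
print(soup.find("span", class_="fas fa-map-marker text-red padding-right-sm").next_sibling.strip())
# The e-mail is the text of the span with class "text-nowrap"
print(soup.find("span", class_="text-nowrap").text.strip())

Output:

NEDELCU Paul-Iulian, Baroul Dolj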
[activ]
Sediu principal in Baroul Dolj, adresa: mun.Craiova, str.Mihail kogalniceanu, nr.16, jud.Dolj, tel.
paul_iulyan@yahoo.com
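
As for why badges.span.span did not work: attribute-style access such as tag.span is shorthand for tag.find("span"), which returns only the first matching descendant. Here the first span is the empty icon span, and it has no span inside it, so the second .span evaluates to None. A minimal sketch, using a shortened copy of the markup from the question (the variable names first, nested and all_spans are only illustrative):

```python
from bs4 import BeautifulSoup

html = """<div class="col-md-11">
   <h4>
      <span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
      <font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
      <span style="color:green;font-weight:bold;"> [activ]</span>
   </h4>
</div>"""

soup = BeautifulSoup(html, "html.parser")
badges = soup.find("div", attrs={"class": "col-md-11"})

first = badges.span   # same as badges.find("span"): the empty icon span
nested = first.span   # the icon span has no span inside it, so this is None

print(first.get("class"))
print(nested)

# To reach the other spans, collect all of them instead of chaining .span
all_spans = badges.find_all("span")
print(len(all_spans))
```

Chaining .span only walks downward through first matches, so it cannot reach sibling spans; find_all, find with attributes, or a CSS selector is usually the safer choice.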

1

A more compact solution with a single select() call on the badges object from the question:

result = []  # collected text values, in document order

for el in badges.select('h4 font, h4 span:nth-of-type(3), p:nth-of-type(1), p:nth-of-type(2) > span.text-nowrap'):
    if el.name == 'font':
        # The text node right after <font> holds ", Baroul Dolj"
        result.extend([el.text.strip(), el.next_sibling.strip()])
    else:
        result.append(el.text.strip())

print(result)

The output (formatted):

['NEDELCU Paul-Iulian',
 ', Baroul Dolj',
 '[activ]',
 'Sediu principal in Baroul Dolj, adresa: mun.Craiova, str.Mihail kogalniceanu, nr.16, jud.Dolj, tel.',
 'paul_iulyan@yahoo.com']
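
If the exact element each value comes from does not matter, a possible alternative is to iterate over stripped_strings, which yields every non-empty text node with surrounding whitespace removed. This is only a sketch under the assumption that the markup matches the question exactly. Note that it also returns the "Avocat definitiv" label, which was not in the requested list and can be dropped if unwanted. The variable name texts is illustrative.

```python
from bs4 import BeautifulSoup

html = """<div class="col-md-11">
   <h4>
      <span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
      <font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
      <span style="color:green;font-weight:bold;"> [activ]</span>
   </h4>
   <p>
      <span class="fas fa-map-marker text-red padding-right-sm"></span>Sediu principal in Baroul Dolj, adresa: mun.Craiova, str.Mihail kogalniceanu, nr.16, jud.Dolj, tel.
   </p>
   <p>
      <span class="padding-right-md text-primary"><span class="fal fa-phone text-primary padding-right-sm"></span></span>
      <span class="text-nowrap"><span class="fal fa-envelope text-info padding-right-sm"></span>paul_iulyan@yahoo.com</span>
   </p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
badges = soup.find("div", attrs={"class": "col-md-11"})

# stripped_strings walks all descendant text nodes and skips whitespace-only ones
texts = list(badges.stripped_strings)

for text in texts:
    print(text)
```

This trades precision for brevity: the values come back as a flat list in document order, so any change in the page layout shifts their positions.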