How do I find one tag with multiple text contents with BeautifulSoup?

You are calling soup.find, not soup.find_all. find returns only the first matching p tag, while find_all returns a list of all of them. You cannot call .text.strip() on a list, so wrap it in a list comprehension that does it for each item:

soup = BeautifulSoup(r.text, 'html5lib')
content = [i.text.strip() for i in soup.find_all('p')]

Now content is a list of strings. To turn this list into a single string, run:

content = ' '.join(content)
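To see the difference between find and find_all without hitting a real site, here is a minimal sketch on a made-up HTML string. SAMPLE_HTML and the variable names are assumptions for illustration only, and it uses the built-in "html.parser" instead of html5lib so nothing extra needs installing:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
SAMPLE_HTML = """
<div>
  <p> First paragraph. </p>
  <p>Second paragraph.</p>
  <p>  Third paragraph.  </p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns only the first matching tag
first_only = soup.find("p").text.strip()

# find_all() returns every matching tag, so strip each one individually
paragraphs = [p.text.strip() for p in soup.find_all("p")]

# Collapse the list into a single space-separated string
joined = " ".join(paragraphs)

print(first_only)
print(joined)
```

With find you only ever see the first paragraph; the comprehension over find_all is what picks up the rest.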

As for the image, soup.find_all('img') also returns a list of images. If you're sure there is only one image in the HTML, you can run soup.find('img'), and you can get the image link with soup.find('img')['src'].
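One caveat: if a page has no img tag, soup.find('img') returns None and indexing it with ['src'] raises an error. A defensive variant might look like the sketch below. The helper name first_image_src is hypothetical (not part of BeautifulSoup), and the HTML strings are made up:

```python
from bs4 import BeautifulSoup


def first_image_src(html):
    """Return the src of the first <img> in html, or None if unavailable.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    img = soup.find("img")
    if img is None:
        # No <img> tag on the page at all
        return None
    # Tag.get returns None instead of raising when the attribute is missing
    return img.get("src")


print(first_image_src('<p><img src="a.png"><img src="b.png"></p>'))
print(first_image_src("<p>no image here</p>"))
```

Whether you want None or an exception for a missing image depends on how strict the scraper should be.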

Putting it all together:

import requests
from bs4 import BeautifulSoup

kompas = requests.get('https://url_on_html.com/')
beautify = BeautifulSoup(kompas.content, 'html5lib')

# Each news block is a div with class jeg_block_container
news = beautify.find_all('div', {'class': 'jeg_block_container'})
arti = []

for each in news:
    title = each.find('h3', {'class': 'jeg_post_title'}).text
    lnk = each.a.get('href')

    # Fetch the linked article and parse it
    r = requests.get(lnk)
    soup = BeautifulSoup(r.text, 'html5lib')

    # Join the stripped text of every paragraph into one string
    content = [i.text.strip() for i in soup.find_all('p')]
    content = ' '.join(content)

    # src of the first image on the article page
    images = soup.find('img')['src']

    arti.append({
        'Headline': title,
        'Link': lnk,
        'image': images,
        'content': content
    })
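When filtering by CSS class, BeautifulSoup accepts either an attrs dictionary or the class_ keyword argument. Here is a small offline sketch of both forms. The markup and the example.com links are invented for illustration; only the class names come from the code above:

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics one matching block and one non-matching block.
SAMPLE_HTML = """
<div class="jeg_block_container">
  <h3 class="jeg_post_title"><a href="https://example.com/a">First headline</a></h3>
</div>
<div class="other">
  <h3 class="jeg_post_title"><a href="https://example.com/b">Ignored</a></h3>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Two equivalent ways to filter on the class attribute
by_dict = soup.find_all("div", {"class": "jeg_block_container"})
by_keyword = soup.find_all("div", class_="jeg_block_container")

# Pull the headline text and link out of the first matching block
block = by_dict[0]
title = block.find("h3", class_="jeg_post_title").text.strip()
link = block.a.get("href")

print(len(by_dict), title, link)
```

Testing selectors on a saved or hand-written snippet like this is a cheap way to check them before looping over live requests.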
