Beautiful

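Python lxml/Beautiful Soup: finding all links on a web page

I am writing a script that reads a web page and builds a database of links matching certain criteria. Right now I am stuck with lxml and trying to work out how to get all of them from the html...
    result = self._openurl(self.mainurl)
    content = result.read()
    html = lxml.html.fromstring(content)
    print lxml.html.find_rel_links(html, 'href')
Best answer: Use XPath. Something like this (cannot test from here):
    urls = html.xpath('//a/@href')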
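
A minimal sketch of the XPath approach from the answer, assuming lxml is installed. The `content` string is a hypothetical stand-in for the page that the question reads over HTTP. As far as I know, `find_rel_links` matches on the `rel` attribute rather than on `href`, which would explain why the original call returned nothing useful.

```python
import lxml.html

# Hypothetical page content; the question reads this from a URL instead.
content = """
<html><body>
  <a href="https://example.com/a">A</a>
  <a href="/relative/b">B</a>
  <a name="no-href">not a link</a>
</body></html>
"""

html = lxml.html.fromstring(content)

# '//a/@href' selects the href attribute of every <a> element that has one,
# so anchors without an href are skipped automatically.
urls = html.xpath('//a/@href')
print(urls)
```

Relative URLs come back exactly as written in the page, so a later step would still need to resolve them against the base URL before storing them.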

Correct encoding of a website in Python (Beautiful Soup)

I am trying to load an HTML page and output its text. Even though I fetch the page correctly, BeautifulSoup somehow destroys the encoding. Source:
    # -*- coding: utf-8 -*-
    import requests
    from BeautifulSoup import BeautifulSoup
    url = "http://www.columbia.edu/~fdc/utf8/"
    r = requests.get(url)
    encodedText = r.text.encode("utf-8")
    soup = BeautifulSoup(encodedText)
    text = str(soup.findAll(text=True))
    print text.deco...

Beautiful Python BeautifulSoup encoding utf-8 mojibake
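
The excerpt is cut off before any answer, so the following is only a sketch of one common cause of this kind of mojibake, not a diagnosis of the asker's page. It assumes the bytes on the wire are UTF-8 but were decoded as ISO-8859-1 before reaching the parser (which can happen when `r.text` is used and the server does not declare a charset). The sample string is hypothetical.

```python
# Hypothetical sample text standing in for the page body.
original = "\u00dcn\u00efc\u00f6d\u00e9 \u2013 \u2713"
raw_bytes = original.encode("utf-8")   # the bytes the server would send

# Decoding with the wrong codec does not raise; it silently yields mojibake.
wrong = raw_bytes.decode("iso-8859-1")

# Decoding with the codec the bytes were written in restores the text.
right = raw_bytes.decode("utf-8")

print(wrong == original, right == original)
```

If this is the cause, handing the parser the raw bytes (`r.content`) or setting `r.encoding = "utf-8"` before reading `r.text` would be the usual things to try.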

python - Parsing a web page in Python with Beautiful Soup

I am having some trouble getting data from a website. The page source is here: view-source:http://release24.pl/wpis/23714/%22La+mer+a+boire%22+%282011%29+FRENCH.DVDRip.XviD-AYMO It contains something like this:
    INFORMACJE O FILMIE
    Tytuł............................................: La mer à boire
    Ocena.............................................: IMDB - 6.3/10 (24)
    Produkcja............

python Beautiful beautifulsoup urllib

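python - Extracting the 'src' attribute from an 'img' tag with Beautiful Soup

Consider: I want to use BeautifulSoup to extract the source (i.e. src) attribute from an image (i.e. img) tag. I use BeautifulSoup 4, and I cannot get src with a.attrs['src'], but I can get href. What should I do? Best answer: You can use BeautifulSoup to extract the src attribute of an HTML img tag. In my example, htmlText contains the img tag itself, but this can also be used with a URL, together with urllib2. For a URL:
    from BeautifulSoup import BeautifulSoup as BSHTML
    import urllib2
    page = urllib2.ur...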

python regex beautifulsoup
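
A short sketch of reading `src` from `img` tags, written against the `bs4` package (Beautiful Soup 4) rather than the older `BeautifulSoup` import used in the answer. The `html_text` string is a hypothetical example. One plausible reading of the question is that `a` was an anchor tag, which has `href` but no `src`, so the lookup has to run on the `img` tags themselves.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one image with a src attribute, one without.
html_text = '<p><img src="/images/logo.png" alt="Logo"><img alt="no source"></p>'
soup = BeautifulSoup(html_text, "html.parser")

# Tag.get returns None instead of raising KeyError when the attribute is absent.
sources = [img.get("src") for img in soup.find_all("img")]

# Passing src=True keeps only the tags that actually carry the attribute.
with_src = [img["src"] for img in soup.find_all("img", src=True)]

print(sources)
print(with_src)
```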

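Python - Scraping JavaScript with Selenium and Beautiful Soup

I am trying to scrape a JavaScript-enabled page with BS and Selenium. So far I have the following code. Somehow it still does not detect the JavaScript (and returns null values). In this case I am trying to scrape the Facebook comments at the bottom. (Inspect Element shows the class as postText.) Thanks for your help!
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.keys import Keys
    import BeautifulSoup
    browser = webdrive...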

JavaScript Beautiful browser import python selenium screen-scraping beautifulsoup

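python - Cannot import Beautiful Soup

I am trying to use BeautifulSoup, and despite using the import statement: from bs4 import BeautifulSoup I get the error: ImportError: cannot import name BeautifulSoup. import bs4 does not give any error. I have also tried import bs4.BeautifulSoup, as well as importing just bs4 and creating a BeautifulSoup object: bs4.BeautifulSoup(). Any guidance would be appreciated. Best answer: The problem was that I had named my file HTMLParser.py, and that name is already used somewhere in the bs4 module. Thanks to everyone who helped!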

Beautiful python BeautifulSoup
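
The accepted answer comes down to a local file shadowing a module that bs4 imports. One way to check for that kind of clash is to ask Python where a module name resolves. The helper below is a hypothetical diagnostic, not part of bs4, and uses only the standard library.

```python
import importlib.util


def module_origin(name):
    """Return the file a module name resolves to, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# A standard-library package should resolve inside the Python installation.
# If a name resolves to a file in your own project directory instead,
# that file is shadowing the real module.
print(module_origin("json"))
print(module_origin("no_such_module_xyz"))
```

Running it on the module named in the traceback shows quickly whether a script in the working directory is being picked up in its place.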

python - Beautiful Soup Unicode encode error

I am trying the following code with a particular HTML file:
    from BeautifulSoup import BeautifulSoup
    import re
    import codecs
    import sys
    f = open('test1.html')
    html = f.read()
    soup = BeautifulSoup(html)
    body = soup.body.contents
    para = soup.findAll('p')
    print str(para).encode('utf-8')
I get the following error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in positio...

Beautiful Unicode python beautifulsoup
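
The excerpt ends before any answer, so this is only a sketch of the mechanism behind the error message, written for Python 3. In the Python 2 code above, `str(para)` most likely performs an implicit ASCII encode before `.encode('utf-8')` ever runs. The sample string is hypothetical.

```python
# Hypothetical text containing U+2019 (right single quotation mark),
# the character named in the error message.
text = "It\u2019s"

try:
    text.encode("ascii")        # ASCII cannot represent U+2019
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc.reason)

# Encoding straight to UTF-8 works for any Unicode text.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)
```

The practical point is to keep the text as Unicode and encode it exactly once, at output time, with a codec that can represent every character in it.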

python - Removing <br> tags from a parsed Beautiful Soup list?

I am currently entering a for loop that has all the rows I want:
    page = urllib2.urlopen(pageurl)
    soup = BeautifulSoup(page)
    tables = soup.find("td", "bodyTd")
    for row in tables.findAll('tr'):
At this point I have my information, but the <br> tags are ruining my output. What is the cleanest way to remove them? Best answer:
    for e in soup.findAll('br'):
        e.extract()

Beautiful python beautifulsoup html-parsing

python - Getting content by class name with Beautiful Soup

Using the BeautifulSoup module, how can I get the data of the div tag whose class name is feeditemcontent cxfeeditemcontent? Is it: soup.class['feeditemcontent cxfeeditemcontent'] or: soup.find_all('class') This is the HTML source: The actual data is somewhere here This is the Python code:
    from BeautifulSoup import BeautifulSoup
    html_doc = open('home.jsp.html', 'r')
    soup = BeautifulSoup(html_doc)
    class="fe...

class-name Beautiful class python beautifulsoup
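
The excerpt stops before the answer, so the following is a hedged sketch of how a class lookup usually looks with the `bs4` package. It assumes the div carries two space-separated classes, feeditemcontent and cxfeeditemcontent. The markup is hypothetical, since the original HTML was stripped from the excerpt.

```python
from bs4 import BeautifulSoup

# Hypothetical markup reconstructed around the text quoted in the question.
html_doc = (
    '<div class="feeditemcontent cxfeeditemcontent">'
    'The actual data is somewhere here</div>'
    '<div class="other">ignored</div>'
)
soup = BeautifulSoup(html_doc, "html.parser")

# "class" is a Python keyword, so bs4 spells the keyword argument class_.
# Matching on one class also finds tags that carry additional classes.
divs = soup.find_all("div", class_="feeditemcontent")
texts = [d.get_text() for d in divs]

# A CSS selector can require both classes at once.
both = soup.select("div.feeditemcontent.cxfeeditemcontent")

print(texts)
print(len(both))
```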