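There are many ways to "scrape" a site into Python. One particularly powerful option is the read_html function in Pandas. In this video, I show how to use it to read data and then manipulate it as needed. For free weekly articles about Python and software engineering, check out my Better developers newsletter.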
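
As a rough illustration of the idea, here is a minimal sketch of calling read_html and then working with the result. The HTML string, its column names, and its values are made-up sample data, not taken from the video; a real call would more often pass a URL or a file path. It also assumes an HTML parser that Pandas can use (lxml, or BeautifulSoup with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample page, used here so the example runs without a network.
HTML = """
<html><body>
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alpha</td><td>10</td></tr>
  <tr><td>Beta</td><td>25</td></tr>
</table>
</body></html>
"""

# read_html returns a list of DataFrames, one per <table> element it finds.
tables = pd.read_html(StringIO(HTML))
df = tables[0]

# From here on, the table is an ordinary DataFrame and can be manipulated as such.
top_name = df.sort_values("Score", ascending=False).iloc[0]["Name"]
print(len(tables), list(df.columns), top_name)
```

Because the return value is a list, a page with several tables usually needs an index (or the `match` argument) to pick out the one of interest.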

Scraping HTML tables into Pandas with read_html
6 thoughts on "Scraping HTML tables into Pandas with read_html"

  1. Carlos Franchy says:

    Hello! First of all, nice video! I'm working on a project where pd.read_html is very useful. The problem I have is that some of the data are PNG images, like the flag in your example. Is there any way to convert these PNGs into arrays? Thanks!

  2. Siddharth Siddhu says:

    Hello Mr. Lerner, this is Siddharth. I am using read_html in my Python code to read HTML files, and it seems quite powerful. For HTML files smaller than 2 MB, the command takes a few seconds to run. But for large files of 5 MB or more, read_html takes about half an hour for me. Could you please suggest a quicker way to read large HTML files?

  3. ye says:

    Thanks a lot, sir. I first tried it with the predetermined link from my online course, and it always raised a KeyError, but when I tried another URL, it worked. How could this happen?

  4. Mandar Raut says:

    Hi Reuven, this was helpful, thank you.
    But I need some more help. For example, I have a set of links from the same website, and I am trying to get the HTML tables (specification tables) from them. The issue is that I can only save the tables separately for each product, which means that if I have 20 links, I end up saving 20 different Excel files.
    What I want is to save all of the HTML tables into one Excel file. Because these are specification tables, most of the time they have the same headers with different values. So whenever we scrape a table, its values should be appended below the previous ones under the matching headers, and if we find a new header, it should be added to the headers with its value under it.
    Please help me with this. I am unable to do so.
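
The merge described in comment 4 (rows stacked under shared headers, with new headers added as they appear) could be sketched with pd.concat, which aligns frames by column name. This is only one possible approach, and it rests on assumptions: the two small frames and their "Spec"/"Value" column names are hypothetical stand-ins for two-column specification tables that would otherwise come from pd.read_html, and `to_row` is an illustrative helper, not part of Pandas.

```python
import pandas as pd

# Hypothetical stand-ins for the spec tables scraped from each product page.
# In practice each frame might come from something like pd.read_html(url)[0].
product_a = pd.DataFrame({"Spec": ["Weight", "Color"], "Value": ["1 kg", "Red"]})
product_b = pd.DataFrame({"Spec": ["Weight", "Voltage"], "Value": ["2 kg", "12 V"]})


def to_row(spec_table):
    """Turn a two-column Spec/Value table into a single-row DataFrame."""
    return spec_table.set_index("Spec")["Value"].to_frame().T


# concat aligns on column names: shared headers line up, and a header seen
# for the first time becomes a new column, with NaN in the earlier rows.
combined = pd.concat([to_row(product_a), to_row(product_b)], ignore_index=True)
print(combined)

# Writing a single file is then one call (needs an Excel writer such as openpyxl):
# combined.to_excel("specs.xlsx", index=False)
```

With 20 links, the same pattern would apply to a list of 20 frames passed to pd.concat in one call, so only one Excel file is written at the end.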


