跳转至内容
  • 社区首页
  • 版块
  • 最新
  • 标签
  • 热门
折叠

GitHub中文论坛

96Jennifer9

96Jennifer

@96Jennifer
关于
帖子
14
主题
7
分享
0
群组
0
粉丝
0
关注
0

帖子

最新 最佳 有争议的

  • 各位大佬都是做的哪里的python爬虫实例呀?
    96Jennifer9 96Jennifer

    想问问各位怎么保持自己爬虫实例一直不退,而且还一直进步的?你们都是做的github上的实战项目吗?可以分享给我一些经验和好的实战项目吗?非常感谢😊

    综合交流

  • 定时爬取天气到mysql
    96Jennifer9 96Jennifer

    程序不报错,但是也爬不出来东西

    import requests
    import pymysql
    import datetime
    import re
    import pandas as pd
    import time
    from pypinyin import lazy_pinyin
    from bs4 import BeautifulSoup
    
    
    def run():
        db = pymysql.connect(host='localhost', user='root', password='789456', db='test', charset='utf8mb4')
        cursor = db.cursor()
        sql_insert = 'INSERT INTO new_d(date, city, temp,low,top, quality, wind) ' \
                     'VALUES (%s, %s,%s, %s, %s, %s, %s)'
        #读取需要爬取的城市名单
        city = pd.read_excel('D:\city.xls')['城市']
        late_url = ['/15']
        for j in range(0, len(city)):
            try:
                # 将城市的中文转换成拼音
                word = ''.join(lazy_pinyin(city[j]))
                print('正在查询%s的天气预报' % str(city[j]))
                base_url = 'https://www.tianqi.com/'
                # 拼接三段形成url
                url = base_url + word + late_url
                headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'}
                response = requests.get(url, headers=headers).text
                # 用BeautifulSoup解析网页
                soup = BeautifulSoup(response, 'lxml')
                # 定位未来15天天气的数据
                future_list = str(soup.find_all("div", {"class": "box_day"}))
                # 正则匹配时间
                date_list = re.findall(r'<h3><b>(.*?)</b>', future_list)
                temp_list = re.findall(r'<li class="temp">(.*?)</b>', future_list)
                quality_list = re.findall(r'空气质量:(.*?)">', future_list)
                wind_list = re.findall(r'<li>(.*?)</li>', future_list)
                print(quality_list)
                for n in range(0, len(date_list)):
                    date = date_list[n]
                    temp = temp_list[n]
                    quality = quality_list[n]
                    wind = wind_list[n]
                    fir_temp_list = temp.split(' ')[0]
                    sec_weather_list = temp.split(' ')[1]
                    # print(sec_weather_list)
                    new_list = sec_weather_list.split('~<b>')
                    low = new_list[0]
                    top = new_list[1]
                    print(date, fir_temp_list, low, top, quality, wind)
    
                    # print(temp.split(' ~ ')[1])
                    # print(city[j], today[0], weather, low_temp, top_temp, real_shidu, fengxiang[0], ziwaixian[0])
                    cursor.execute(sql_insert, (date, city[j], fir_temp_list, low, top,
                                                quality, wind))
                    db.commit()
                time.sleep(1)
            except:
                pass
        db.close()
    if __name__ == '__main__':
        #通过每隔20秒检测一遍是否到达时间
        while 1:
            now_h = datetime.datetime.now().hour
            now_m = datetime.datetime.now().minute
            if now_h == 9 and now_m == 30:
                run()
            else:
                time.sleep(2)
    
    

    3421c7d8-f9ec-4477-bac3-5376e2e23502-image.png

    这下面是城市名的爬取

    import requests
    from bs4 import BeautifulSoup
    url = "https://www.tianqi.com/chinacity.html"
    header = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36"}
    res = requests.get(url=url, headers=header)
    html = res.text.encode('iso-8859-1')
    soup = BeautifulSoup(html, "html.parser", from_encoding="utf-8")
            # 获取所以的a
    city = soup.find('div', class_="citybox")
    a = city.find_all('a')
    f = open('city.txt', 'w', encoding='utf-8')
    for di in a:
        text = di.get_text()
        print(text)
        f.write(text + '\n')
    
    
    
    技术交流

  • 如何读取json格式的内容,报错TypeError
    96Jennifer9 96Jennifer

    @k1995 你真聪明,非常感谢😆

    综合交流

  • 如何读取json格式的内容,报错TypeError
    96Jennifer9 96Jennifer

    @mango 应该是怎么样的?但是我照着的别人视频一样的代码就能出来结果

    综合交流

  • 如何读取json格式的内容,报错TypeError
    96Jennifer9 96Jennifer

    找到了动态网页的内容,如何提取json字典表里面所需的内容?

    import re
    import json
    import requests
    headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'}
    for i in range(1,2):
        param = {
            'cb': 'jQuery112408933494064226006_1636169759698',
            'pn': f'{i}',
            'pz': '20',
            'po': '1',
            'np': '1',
            'ut': 'bd1d9ddb04089700cf9c27f6f7426281',
            'fltt': '2',
            'invt': '2',
            'fid': 'f3',
            'fs': 'm:0 t:80',
            'fields': 'f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152',
            '_': '1636169759852'
        }
        url = "http://90.push2.eastmoney.com/api/qt/clist/get"
        r = requests.get(url, params=param, headers=headers)
        #r=str(r)
        print(r.text)
        obj=re.search(r"data:(.*)",r.text)
        content=obj.group().replace("data:", "")
        print(content)
        # json_str = json.dumps(param)
        # print("JSON 对象:", json_str)
     
    

    代码报错说TypeError
    跟着别人视频写的代码,人家都没报错,不知道我这是哪里出毛病了?

    8328f1d7-f077-47f3-b744-9dde8e49f90d-image.png

    综合交流

  • 如何MATLAB中输入这种积分公式?
    96Jennifer9 96Jennifer

    1b9f3915-3759-40c4-8fb2-62e1590ab270-image.png
    谢谢各位

    综合交流

  • 关于动态爬取json格式文件
    96Jennifer9 96Jennifer

    @96jennifer 原因找到了
    因为param中批量加入引号,引号与内容有空格,
    更新代码

    import requests
    
    headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'}
    for i in range(1,55):
        param = {
            'cb': 'jQuery112408933494064226006_1636169759698',
            'pn': f'{i}',
            'pz': '20',
            'po': '1',
            'np': '1',
            'ut': 'bd1d9ddb04089700cf9c27f6f7426281',
            'fltt': '2',
            'invt': '2',
            'fid': 'f3',
            'fs': 'm:0 t:80',
            'fields': 'f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152',
            '_': '1636169759852'
        }
        url = "http://90.push2.eastmoney.com/api/qt/clist/get"
        r = requests.get(url, params=param, headers=headers)
        print(r.text)
    
    综合交流

  • 怎么解决pycharm爬取天气预报存入mysql总是重复存入数据?
    96Jennifer9 96Jennifer

    @k1995 好的,谢谢
    😊

    技术交流

  • 关于动态爬取json格式文件
    96Jennifer9 96Jennifer

    请问朋友们知道为什么打印text显示jQuery,里面的data为null?
    我跟着视频做的博主data里面都是字典表内容,这个是我的

    import requests
    
    headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'}
    for i in range(1,2):
        param = {
            'cb': ' jQuery112408933494064226006_1636169759698',
            'pn': f' {i}',
            'pz': ' 20',
            'po': ' 1',
            'np': ' 1',
            'ut': ' bd1d9ddb04089700cf9c27f6f7426281',
            'fltt': ' 2',
            'invt': ' 2',
            'fid': ' f3',
            'fs': ' m:0 t:80',
            'fields': ' f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152',
            '_': '1636169759852'
        }
        url = "http://49.push2.eastmoney.com/api/qt/clist/get"
        r = requests.get(url, params=param, headers=headers)
        print(r.text)
    

    8f83f02c-8850-4ea5-97ee-a335689b1151-image.png
    不知道哪里有问题?求解答。网址为“东方财富网”

    综合交流

  • 怎么解决pycharm爬取天气预报存入mysql总是重复存入数据?
    96Jennifer9 96Jennifer

    @k1995 mysql会显示duplicate entry

    还有一个问题请问这代码怎么设置一个简hui单的每天某一时间爬取呀,或者不停爬,不用手点击

    技术交流

  • 怎么解决pycharm爬取天气预报存入mysql总是重复存入数据?
    96Jennifer9 96Jennifer

    @k1995 上面那个alter 在mysql里面报错

    技术交流

  • 怎么解决pycharm爬取天气预报存入mysql总是重复存入数据?
    96Jennifer9 96Jennifer

    @k1995 写selenium是因为我不会爬取那个空气质量,是动态网页不会爬

    技术交流

  • 这论坛怎么看自己的问题呀
    96Jennifer9 96Jennifer

    不知道我的问题有没有人回复

    综合交流

  • 怎么解决pycharm爬取天气预报存入mysql总是重复存入数据?
    96Jennifer9 96Jennifer

    a6c24e19-bf32-473e-bb90-7a59a0132198-image.png
    52eb6dd7-234d-420f-9a47-5a6026d26404-image.png
    29日是爬取的最后一天数据,存完了还是从爬取当天25日又存了两遍同样的数据,这是怎么回事呀?
    能不能从pycharm中写代码抑制它存入2遍或者多遍呀?后面附了代码,(还有有没有大神能教一个简单实用的不间断爬取的函数呀import time然后定义一个函数def round_time():...这种

    import re
    import time
    import requests
    import pymysql
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    import datetime
    conn = pymysql.connect(host='localhost', user='root', passwd='789456', db='test', port=3306, charset='utf8')
    cursor=conn.cursor()
    
    url = 'https://tianqi.2345.com/'
    html = requests.get(url).text
    Pattern = re.compile('{"temp":(.*?)}')
    datas = re.findall(Pattern, html)
    # fd = open('weather_data.txt', 'w', encoding='utf8')
    # fd.write('日期,时间,温度,天气,风向,风级,空气质量\n')
    
    url = 'https://tianqi.2345.com/'
    service=Service('C:\Program Files\Google\Chrome\Application\chromedriver_win32\chromedriver.exe')
    browser=webdriver.Chrome(service=service)
    browser.get(url)
    soup=BeautifulSoup(browser.page_source,'lxml')
    data_quality=soup.find('div','banner-right-canvas-kq-i clearfix').find_all('i')
    print('列表信息')
    print(data_quality)
    for num in data_quality:
        quality=num.get_text()
        for line in datas:
            data = '"temp":' + line.encode('utf-8').decode('unicode_escape')
            tmp = re.findall('"temp":"(.*?)"', data)
            weather = re.findall('"weather":"(.*?)"', data)
            day = re.findall('"day":"(.*?)"', data)
            tm = re.findall('"time_origin_text":"(.*?)"', data)
            wind_direction = re.findall('"wind_direction":"(.*?)"', data)
            wind_level = re.findall('"wind_level":"(.*?)"', data)
            print(day[0], tm[0], tmp[0] + '°', weather[0], wind_direction[0], wind_level[0], quality[0])
            # fd.write('{},{},{},{},{},{}\n'.format(day[0], time[0], tmp[0]+'°', weather[0], wind_direction[0], wind_level[0]))
            # fd.close()
            sql = "INSERT INTO mytable(day,tm,temp,weather,wind,wscale,quality) VALUES ('%s','%s','%s','%s','%s','%s','%s')" % (
            day[0], tm[0], tmp[0] + '°', weather[0], wind_direction[0], wind_level[0], quality[0])
            #cursor.execute(sql)
            cursor.execute(sql)
    conn.commit()
    conn.close()
    
    
    
    
    
    
    
    
    
    技术交流
  • 登录

  • 第一个帖子
    最后一个帖子
0
  • 社区首页
  • 版块
  • 最新
  • 标签
  • 热门