[내일배움캠프 Day 14-5] Temporary implementation of the crawling feature

Working on a practice crawling implementation that scrapes the Baekjoon problem list.

import requests
from bs4 import BeautifulSoup


def fetch_baekjoon_problem_data():
    # URL of the Baekjoon problem list page
    url = 'https://www.acmicpc.net/problemset'  # adjust this URL as needed for the actual problem list page

    # acmicpc.net may reject requests without a browser-like User-Agent, so send one explicitly
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    html = response.text

    soup = BeautifulSoup(html, 'html.parser')

    # Find the problem table (matching one class, so the order of class names does not matter)
    table = soup.find('table', class_='table-bordered')

    # Return an empty list if the table is missing
    if not table:
        print("No table found.")
        return []

    rows = table.find_all('tr')[1:]  # skip the first row, which is the header

    problem_list = []
    for row in rows:
        columns = row.find_all('td')

        # Extract problem info only when the row has enough cells
        if len(columns) >= 3:
            problem_id = columns[0].text.strip()  # problem number
            title_element = columns[1].find('a')  # the title is inside an <a> tag

            if title_element:
                title = title_element.text.strip()  # problem title
                description = columns[2].text.strip()  # problem description (third column)

                problem = {
                    'problem_id': problem_id,
                    'title': title,
                    'description': description,
                }

                # Print or store the problem info
                print(problem)
                problem_list.append(problem)

    return problem_list


if __name__ == '__main__':
    # Crawl the Baekjoon problem data
    problems = fetch_baekjoon_problem_data()

    # Print how many problems were crawled
    print(f"Crawled {len(problems)} problems.")
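To work on the parsing logic without hitting the site on every run, the row-extraction loop can be split into a pure function that takes an HTML string. This is a minimal sketch, assuming the same table layout the crawler above expects (problem number in the first cell, linked title in the second, extra text in the third). `parse_problem_rows` and `SAMPLE_HTML` are hypothetical; the sample markup only mimics the expected structure and is not copied from Baekjoon.

```python
from bs4 import BeautifulSoup


def parse_problem_rows(html):
    """Hypothetical helper: extract problem dicts from a problem-list table in `html`."""
    soup = BeautifulSoup(html, 'html.parser')
    # Match a single class so the order of class names in the attribute does not matter
    table = soup.find('table', class_='table-bordered')
    if table is None:
        return []

    problems = []
    for row in table.find_all('tr'):
        cells = row.find_all('td')  # header rows use <th>, so they yield no <td> cells
        if len(cells) < 3:
            continue
        link = cells[1].find('a')  # title is expected inside an <a> tag
        if link is None:
            continue
        problems.append({
            'problem_id': cells[0].get_text(strip=True),
            'title': link.get_text(strip=True),
            'description': cells[2].get_text(strip=True),
        })
    return problems


# Hypothetical sample markup mimicking the table structure the crawler expects
SAMPLE_HTML = """
<table class="table table-bordered table-striped">
  <thead><tr><th>#</th><th>Title</th><th>Info</th></tr></thead>
  <tbody>
    <tr><td>1000</td><td><a href="/problem/1000">A+B</a></td><td>info</td></tr>
    <tr><td>1001</td><td><a href="/problem/1001">A-B</a></td><td></td></tr>
  </tbody>
</table>
"""

if __name__ == '__main__':
    for problem in parse_problem_rows(SAMPLE_HTML):
        print(problem)
```

With this split, `fetch_baekjoon_problem_data` could call `parse_problem_rows(response.text)` and keep only the network request itself.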

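import requests

url = 'https://www.acmicpc.net/problemset'
response = requests.get(url, timeout=10)
html = response.text

# Print the HTTP status code and the full HTML to see what the page actually returned
print(response.status_code)
print(html)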
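Printing the whole page works for a one-off check, but the output is long and each run sends another request. One alternative, sketched here assuming a local file is acceptable, is to save the HTML once and re-read it while adjusting the parser. `save_snapshot`, `load_snapshot`, `has_marker`, and the default file name are hypothetical helpers, not part of the original code.

```python
from pathlib import Path


def save_snapshot(html, path='problemset.html'):
    """Hypothetical helper: write fetched HTML to disk and return the file path."""
    target = Path(path)
    target.write_text(html, encoding='utf-8')  # UTF-8 keeps Korean text intact
    return target


def load_snapshot(path='problemset.html'):
    """Hypothetical helper: read a previously saved HTML snapshot."""
    return Path(path).read_text(encoding='utf-8')


def has_marker(html, marker='table-bordered'):
    """Hypothetical check: does a string the parser relies on appear in the page?"""
    return marker in html
```

For example, call `save_snapshot(html)` once after the request above, then build the soup from `load_snapshot()` while tweaking selectors. If `has_marker` returns `False`, the response is probably not the problem list (an error or blocked-request page, for instance).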

https://softwaree.tistory.com/74

Let's crawl web pages with Python. (1) Creating a Django project

Introduction... In this hands-on exercise, we crawl new posts from a web page with Python and save them to a DB using Django. We use crontab to run the crawler periodically, and new…

softwaree.tistory.com

noaetoile's blog
