In this tutorial, we demonstrate how to leverage ScrapeGraph's powerful scraping tools together with Gemini AI to automate the collection, parsing, and analysis of competitor information. Using ScrapeGraph's SmartScraperTool and MarkdownifyTool, you can extract detailed insights about product offerings, pricing strategies, technology stacks, and market presence directly from competitor websites. The tutorial then employs Gemini's language model to synthesize these disparate data points into structured, actionable intelligence. Throughout the process, ScrapeGraph keeps the raw extraction accurate and scalable, allowing analysts to focus on strategic interpretation rather than manual data gathering.
%pip install --quiet -U langchain-scrapegraph langchain-google-genai pandas matplotlib seaborn
This command quietly installs or upgrades the latest versions of the essential libraries: langchain-scrapegraph for advanced web scraping, langchain-google-genai for integrating Gemini AI, and the data-analysis tools pandas, matplotlib, and seaborn, ensuring your environment is ready for the competitive-intelligence workflow.
import getpass
import os
import json
import pandas as pd
from typing import List, Dict, Any
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
We import the essential Python libraries for a secure, data-driven pipeline: getpass and os manage secrets and environment variables, json handles serialized data, and pandas provides robust DataFrame operations. The typing module supplies type hints for readability, while datetime records timestamps. Finally, matplotlib.pyplot and seaborn equip us with tools for creating insightful visualizations.
if not os.environ.get("SGAI_API_KEY"):
    os.environ["SGAI_API_KEY"] = getpass.getpass("ScrapeGraph AI API key:\n")
if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API key for Gemini:\n")
We check whether the SGAI_API_KEY and GOOGLE_API_KEY environment variables are already set; if not, the script securely prompts the user for the ScrapeGraph and Google (Gemini) API keys via getpass and stores them in the environment for subsequent authenticated requests.
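This env-var-or-prompt pattern generalizes nicely. Below is a minimal, standalone sketch of it as a reusable helper; the `require_api_key` name and the `DEMO_API_KEY` placeholder are illustrative, not part of any library, and no real key is needed to run it:

```python
import os
import getpass

def require_api_key(var_name: str, prompt: str) -> str:
    """Return the key from the environment, prompting once only if it is missing."""
    if not os.environ.get(var_name):
        os.environ[var_name] = getpass.getpass(prompt)
    return os.environ[var_name]

# With the variable pre-set, the interactive prompt is skipped entirely:
os.environ["DEMO_API_KEY"] = "sk-demo"
key = require_api_key("DEMO_API_KEY", "Demo API key: ")
print(key)  # sk-demo
```

Because the helper only prompts when the variable is absent, re-running a notebook cell never asks for the key twice.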
from langchain_scrapegraph.tools import (
    SmartScraperTool,
    SearchScraperTool,
    MarkdownifyTool,
    GetCreditsTool,
)
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig, chain
from langchain_core.output_parsers import JsonOutputParser
smartscraper = SmartScraperTool()
searchscraper = SearchScraperTool()
markdownify = MarkdownifyTool()
credit = GetCreditsTool()
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.1,
    convert_system_message_to_human=True
)
Here we import and instantiate the ScrapeGraph tools (SmartScraperTool, SearchScraperTool, MarkdownifyTool, and GetCreditsTool) for extracting and processing web data, then configure ChatGoogleGenerativeAI with the "gemini-1.5-flash" model, using a low temperature for more deterministic output and converting system messages to human messages for compatibility. We also bring in ChatPromptTemplate, RunnableConfig, chain, and JsonOutputParser from langchain_core to structure prompts and parse model outputs.
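A recurring detail in this tutorial is that the model's output may or may not be valid JSON, so the analysis code tries structured parsing first and falls back to the raw text. A minimal standalone sketch of that JSON-first, text-fallback pattern using only the standard library (the sample strings and the `parse_model_output` helper are fabricated for illustration):

```python
import json

def parse_model_output(text: str):
    """Return parsed JSON when possible, otherwise fall back to the raw text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text

structured = parse_model_output('{"threats": ["pricing pressure"]}')
fallback = parse_model_output("The model answered in plain prose.")
print(structured)  # {'threats': ['pricing pressure']}
print(fallback)    # The model answered in plain prose.
```

Keeping the fallback means a single malformed response degrades gracefully to text instead of crashing the whole analysis run.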
class CompetitiveAnalyzer:
    def __init__(self):
        self.results = []
        self.analysis_timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    def scrape_competitor_data(self, url: str, company_name: str = None) -> Dict[str, Any]:
        """Scrape comprehensive data from a competitor website."""
        extraction_prompt = """
        Extract the following information from this website:
        1. Company name and tagline
        2. Main products/services offered
        3. Pricing information (if available)
        4. Target audience/market
        5. Key features and benefits highlighted
        6. Technology stack mentioned
        7. Contact information
        8. Social media presence
        9. Recent news or announcements
        10. Team size indicators
        11. Funding information (if mentioned)
        12. Customer testimonials or case studies
        13. Partnership information
        14. Geographic presence/markets served

        Return the information in a structured JSON format with clear categorization.
        If information is not available, mark as 'Not Available'.
        """
        try:
            result = smartscraper.invoke({
                "user_prompt": extraction_prompt,
                "website_url": url,
            })
            markdown_content = markdownify.invoke({"website_url": url})
            competitor_data = {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": result,
                "markdown_length": len(markdown_content),
                "analysis_date": self.analysis_timestamp,
                "success": True,
                "error": None
            }
            return competitor_data
        except Exception as e:
            return {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": None,
                "error": str(e),
                "success": False,
                "analysis_date": self.analysis_timestamp
            }

    def analyze_competitor_landscape(self, competitors: List[Dict[str, str]]) -> Dict[str, Any]:
        """Analyze multiple competitors and generate insights."""
        print(f"🔍 Starting competitive analysis for {len(competitors)} companies...")
        for i, competitor in enumerate(competitors, 1):
            print(f"📊 Analyzing {competitor['name']} ({i}/{len(competitors)})...")
            data = self.scrape_competitor_data(
                competitor['url'],
                competitor['name']
            )
            self.results.append(data)

        analysis_prompt = ChatPromptTemplate.from_messages([
            ("system", """
            You are a senior business analyst specializing in competitive intelligence.
            Analyze the scraped competitor data and provide comprehensive insights including:
            1. Market positioning analysis
            2. Pricing strategy comparison
            3. Feature gap analysis
            4. Target audience overlap
            5. Technology differentiation
            6. Market opportunities
            7. Competitive threats
            8. Strategic recommendations

            Provide actionable insights in JSON format with clear categories and recommendations.
            """),
            ("human", "Analyze this competitive data: {competitor_data}")
        ])

        clean_data = []
        for result in self.results:
            if result['success']:
                clean_data.append({
                    'company': result['company_name'],
                    'url': result['url'],
                    'data': result['scraped_data']
                })

        analysis_chain = analysis_prompt | llm | JsonOutputParser()
        try:
            competitive_analysis = analysis_chain.invoke({
                "competitor_data": json.dumps(clean_data, indent=2)
            })
        except Exception:
            # Fall back to the raw text response if the model's output is not valid JSON
            analysis_chain_text = analysis_prompt | llm
            competitive_analysis = analysis_chain_text.invoke({
                "competitor_data": json.dumps(clean_data, indent=2)
            })
        return {
            "analysis": competitive_analysis,
            "raw_data": self.results,
            "summary_stats": self.generate_summary_stats()
        }

    def generate_summary_stats(self) -> Dict[str, Any]:
        """Generate summary statistics from the analysis."""
        successful_scrapes = sum(1 for r in self.results if r['success'])
        failed_scrapes = len(self.results) - successful_scrapes
        return {
            "total_companies_analyzed": len(self.results),
            "successful_scrapes": successful_scrapes,
            "failed_scrapes": failed_scrapes,
            "success_rate": f"{(successful_scrapes / len(self.results) * 100):.1f}%" if self.results else "0%",
            "analysis_timestamp": self.analysis_timestamp
        }

    def export_results(self, filename: str = None):
        """Export results to JSON and CSV files."""
        if not filename:
            filename = f"competitive_analysis_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
        with open(f"{filename}.json", 'w') as f:
            json.dump({
                "results": self.results,
                "summary": self.generate_summary_stats()
            }, f, indent=2)
        df_data = []
        for result in self.results:
            if result['success']:
                df_data.append({
                    'Company': result['company_name'],
                    'URL': result['url'],
                    'Success': result['success'],
                    'Data_Length': len(str(result['scraped_data'])) if result['scraped_data'] else 0,
                    'Analysis_Date': result['analysis_date']
                })
        if df_data:
            df = pd.DataFrame(df_data)
            df.to_csv(f"{filename}.csv", index=False)
        print(f"✅ Results exported to {filename}.json and {filename}.csv")
The CompetitiveAnalyzer class orchestrates end-to-end competitor research: it scrapes detailed company information using the ScrapeGraph tools, compiles and cleans the results, and then leverages Gemini to generate structured competitive insights. It also tracks success rates and timestamps, and provides utility methods to export both raw and summarized data to JSON and CSV for easy downstream reporting and analysis.
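The success-rate bookkeeping in generate_summary_stats() is worth seeing in isolation. Here is a standalone sketch of that logic run on mock scrape results (the company dicts below are fabricated stand-ins, not real scraper output):

```python
# Mock results in the same shape scrape_competitor_data() produces.
mock_results = [
    {"company_name": "Acme", "success": True},
    {"company_name": "Globex", "success": True},
    {"company_name": "Initech", "success": False},
]

successful = sum(1 for r in mock_results if r["success"])
failed = len(mock_results) - successful
# Guard against division by zero when nothing has been scraped yet.
success_rate = f"{successful / len(mock_results) * 100:.1f}%" if mock_results else "0%"

summary = {
    "total_companies_analyzed": len(mock_results),
    "successful_scrapes": successful,
    "failed_scrapes": failed,
    "success_rate": success_rate,
}
print(summary["success_rate"])  # 66.7%
```

Keeping failures in the results list (rather than discarding them) is what makes this summary honest: a low success rate flags sites that block scraping before you trust the downstream analysis.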
def run_ai_saas_analysis():
    """Run a comprehensive analysis of AI/SaaS competitors."""
    analyzer = CompetitiveAnalyzer()
    ai_saas_competitors = [
        {"name": "OpenAI", "url": "https://openai.com"},
        {"name": "Anthropic", "url": "https://anthropic.com"},
        {"name": "Hugging Face", "url": "https://huggingface.co"},
        {"name": "Cohere", "url": "https://cohere.ai"},
        {"name": "Scale AI", "url": "https://scale.com"},
    ]
    results = analyzer.analyze_competitor_landscape(ai_saas_competitors)

    print("\n" + "=" * 80)
    print("🎯 COMPETITIVE ANALYSIS RESULTS")
    print("=" * 80)
    print("\n📊 Summary Statistics:")
    stats = results['summary_stats']
    for key, value in stats.items():
        print(f"  {key.replace('_', ' ').title()}: {value}")

    print("\n🔍 Strategic Analysis:")
    if isinstance(results['analysis'], dict):
        for section, content in results['analysis'].items():
            print(f"\n  {section.replace('_', ' ').title()}:")
            if isinstance(content, list):
                for item in content:
                    print(f"    • {item}")
            else:
                print(f"    {content}")
    else:
        print(results['analysis'])

    analyzer.export_results("ai_saas_competitive_analysis")
    return results
The function above kicks off the competitive analysis by instantiating CompetitiveAnalyzer and defining the key AI/SaaS players to evaluate. It then runs the full scraping-and-insights workflow, prints formatted summary statistics and strategic findings, and finally exports the detailed results to JSON and CSV for further use.
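The report-formatting step (turning snake_case keys into readable headings and bulleting list values) works on any nested dict, so it can be exercised without calling Gemini. A standalone sketch on a fabricated analysis dict:

```python
# A fabricated stand-in for the dict Gemini's analysis chain would return.
analysis = {
    "market_positioning": ["Broad developer reach", "Enterprise focus"],
    "competitive_threats": "Rapid model commoditization",
}

lines = []
for section, content in analysis.items():
    # 'market_positioning' -> 'Market Positioning:'
    lines.append(f"{section.replace('_', ' ').title()}:")
    if isinstance(content, list):
        lines.extend(f"  • {item}" for item in content)
    else:
        lines.append(f"  {content}")

print("\n".join(lines))
```

Building a list of lines first (rather than printing directly) also makes the formatter trivial to unit-test or redirect into a report file.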
def run_ecommerce_analysis():
    """Analyze e-commerce platform competitors."""
    analyzer = CompetitiveAnalyzer()
    ecommerce_competitors = [
        {"name": "Shopify", "url": "https://shopify.com"},
        {"name": "WooCommerce", "url": "https://woocommerce.com"},
        {"name": "BigCommerce", "url": "https://bigcommerce.com"},
        {"name": "Magento", "url": "https://magento.com"},
    ]
    results = analyzer.analyze_competitor_landscape(ecommerce_competitors)
    analyzer.export_results("ecommerce_competitive_analysis")
    return results
The function above sets up a CompetitiveAnalyzer to evaluate major e-commerce platforms, scraping details from each site, generating strategic insights, and then exporting the findings to JSON and CSV files under the name "ecommerce_competitive_analysis".
@chain
def social_media_monitoring_chain(company_urls: List[str], config: RunnableConfig):
    """Monitor social media presence and engagement strategies of competitors."""
    social_media_prompt = ChatPromptTemplate.from_messages([
        ("system", """
        You are a social media strategist. Analyze the social media presence and strategies
        of these companies. Focus on:
        1. Platform presence (LinkedIn, Twitter, Instagram, etc.)
        2. Content strategy patterns
        3. Engagement tactics
        4. Community building approaches
        5. Brand voice and messaging
        6. Posting frequency and timing

        Provide actionable insights for improving social media strategy.
        """),
        ("human", "Analyze social media data for: {urls}")
    ])
    social_data = []
    for url in company_urls:
        try:
            result = smartscraper.invoke({
                "user_prompt": "Extract all social media links, community engagement features, and social proof elements",
                "website_url": url,
            })
            social_data.append({"url": url, "social_data": result})
        except Exception as e:
            social_data.append({"url": url, "error": str(e)})

    analysis_chain = social_media_prompt | llm
    analysis = analysis_chain.invoke({"urls": json.dumps(social_data, indent=2)}, config=config)
    return {
        "social_analysis": analysis,
        "raw_social_data": social_data
    }
This chained function defines a pipeline for gathering and analyzing competitors' social media footprints: it uses ScrapeGraph's smart scraper to extract social media links and engagement elements from each URL, then feeds that data into Gemini with a prompt focused on platform presence, content strategy, and community tactics. Finally, it returns both the raw scraped information and the AI-generated, actionable social media insights in a single structured output.
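A key design choice in the loop above is that a failed scrape records an error entry instead of aborting the run. The sketch below isolates that error-tolerant accumulation pattern with the scraper call replaced by a stub (`fake_scrape` and the sample URLs are purely illustrative):

```python
import json

def fake_scrape(url: str) -> dict:
    """Stand-in for smartscraper.invoke(); fails for one URL to show the error path."""
    if "broken" in url:
        raise ValueError("site unreachable")
    return {"social_links": [f"{url}/twitter"]}

social_data = []
for url in ["https://example.com", "https://broken.example.com"]:
    try:
        social_data.append({"url": url, "social_data": fake_scrape(url)})
    except Exception as e:
        # Record the failure alongside successes so one bad site never kills the batch.
        social_data.append({"url": url, "error": str(e)})

payload = json.dumps(social_data, indent=2)
```

The resulting list always has one entry per input URL, which keeps the downstream prompt payload aligned with the original URL list.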
def check_credits():
    """Check available API credits."""
    try:
        credits_info = credit.invoke({})
        print(f"💳 Available Credits: {credits_info}")
        return credits_info
    except Exception as e:
        print(f"⚠️ Could not check credits: {e}")
        return None
The function above calls the GetCreditsTool to retrieve and display your available ScrapeGraph API credits, printing the result (or a warning if the check fails) and returning the credit information, or None on error.
if __name__ == "__main__":
    print("🚀 Advanced Competitive Analysis Tool with Gemini AI")
    print("=" * 60)

    check_credits()

    print("\n🤖 Running AI/SaaS Competitive Analysis...")
    ai_results = run_ai_saas_analysis()

    run_additional = input("\n❓ Run e-commerce analysis as well? (y/n): ").lower().strip()
    if run_additional == 'y':
        print("\n🛒 Running E-commerce Platform Analysis...")
        ecom_results = run_ecommerce_analysis()

    print("\n✨ Analysis complete! Check the exported files for detailed results.")
Finally, this last code block serves as the script's entry point: it prints a header, checks API credits, then kicks off the AI/SaaS competitor analysis (and optionally the e-commerce analysis) before signaling that all results have been exported.
In conclusion, integrating ScrapeGraph's scraping capabilities with Gemini AI transforms a traditionally time-consuming competitive-intelligence workflow into an efficient, repeatable pipeline. ScrapeGraph handles the heavy lifting of fetching and normalizing web-based information, while Gemini's language understanding turns that raw data into high-level strategic recommendations. As a result, businesses can rapidly assess market positioning, identify feature gaps, and uncover emerging opportunities with minimal manual intervention. By automating these steps, users gain speed and consistency, as well as the flexibility to extend their analysis to new competitors or markets as needed.
Check out the Notebook on GitHub. All credit for this research goes to the researchers of this project.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.