Criteria	Points
Check if the agent considered at least three surf spot options	1
Check if the agent gathered wind forecasts for each surf spot being evaluated	1
Check if the agent used any web search tools to explore which surf spots should be considered	1

🏄 Surf Spot Finder

Find the best surfing spots based on your location and preferences! GitHub Repo

👈 Configure your search parameters in the sidebar and click Run to start!

🛠️ Available Tools

The AI agent built for this project has several tools it can use to find the best surf spot. The agent is free to use any of these tools, or none of them, to accomplish the task.

🌤️ get_wave_forecast

Fetches the wave forecast for a given latitude and longitude using the Open-Meteo Marine Weather API. Returns data such as wave height, direction, and period.
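To give a feel for what a request to the Marine Weather API might look like, here is a minimal sketch that only builds the request URL. The endpoint and the `hourly` variable names are taken from the public Open-Meteo documentation; the helper name `build_wave_forecast_url` is hypothetical, and the project's own `get_wave_forecast` may request different fields.

```python
from urllib.parse import urlencode

# Assumption: public Open-Meteo Marine endpoint; not confirmed as what this app calls.
MARINE_URL = "https://marine-api.open-meteo.com/v1/marine"


def build_wave_forecast_url(lat: float, lon: float) -> str:
    """Build a URL asking for hourly wave height, direction, and period."""
    params = {
        "latitude": lat,
        "longitude": lon,
        # Hourly variables as named in the Open-Meteo Marine docs.
        "hourly": "wave_height,wave_direction,wave_period",
    }
    # safe="," keeps the comma-separated variable list readable in the URL.
    return f"{MARINE_URL}?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    print(build_wave_forecast_url(36.95, -122.02))
```

Fetching the URL with any HTTP client should return JSON; keeping URL construction separate makes it easy to inspect the request before sending it.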

🌤️ get_wind_forecast

Fetches the wind forecast for a given latitude and longitude using the Open-Meteo Forecast API. Returns data such as wind speed, direction, and gusts.
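Open-Meteo responses are column-oriented: the `hourly` object holds one list per variable, all aligned with the `time` list. The sketch below turns such a payload into one record per hour, which is often easier to reason about. The sample payload is made up for illustration, the variable names (`wind_speed_10m`, `wind_direction_10m`, `wind_gusts_10m`) follow the public Open-Meteo docs, and the helper `hourly_rows` is hypothetical rather than part of this project.

```python
def hourly_rows(payload: dict) -> list[dict]:
    """Convert Open-Meteo's column-oriented 'hourly' block into per-hour rows."""
    hourly = payload["hourly"]
    keys = list(hourly)  # e.g. ["time", "wind_speed_10m", ...]
    # zip(*columns) walks all variable lists in lockstep, one hour at a time.
    return [dict(zip(keys, values)) for values in zip(*(hourly[k] for k in keys))]


if __name__ == "__main__":
    # Illustrative sample data, not a real forecast.
    sample = {
        "hourly": {
            "time": ["2025-01-01T00:00", "2025-01-01T01:00"],
            "wind_speed_10m": [12.0, 8.5],
            "wind_direction_10m": [270, 265],
            "wind_gusts_10m": [20.1, 14.3],
        }
    }
    rows = hourly_rows(sample)
    calmest = min(rows, key=lambda r: r["wind_speed_10m"])
    print(calmest["time"])
```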

📍 get_area_lat_lon

Gets the latitude and longitude for a given area name. Uses the Nominatim API.
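As a rough illustration of the geocoding step, the sketch below builds a Nominatim search URL and extracts coordinates from a search response. It assumes the public `https://nominatim.openstreetmap.org/search` endpoint with `format=json`, which returns a list of place objects whose `lat` and `lon` fields are strings. The helper names are hypothetical and the project's `get_area_lat_lon` may work differently.

```python
from urllib.parse import urlencode

# Assumption: public Nominatim instance; its usage policy asks for an identifying User-Agent.
NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(area_name: str) -> str:
    """Build a search URL that asks for the single best match as JSON."""
    return f"{NOMINATIM_URL}?{urlencode({'q': area_name, 'format': 'json', 'limit': 1})}"


def parse_lat_lon(results: list[dict]) -> tuple[float, float]:
    """Return (lat, lon) of the first match; Nominatim sends both as strings."""
    if not results:
        raise ValueError("no match found for the given area name")
    top = results[0]
    return float(top["lat"]), float(top["lon"])


if __name__ == "__main__":
    print(build_search_url("Santa Cruz, California"))
    # Illustrative response fragment, not real API output.
    print(parse_lat_lon([{"lat": "36.9741", "lon": "-122.0308"}]))
```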

🌐 search_web

Searches the web for information. Returns a list of snippets from the search results.

🌐 visit_webpage

Visits a webpage and extracts its main content.

▲ Some tools may not be listed depending on configuration. Please check the code for more details.

📊 Custom Evaluation

The Surf Spot Finder includes an evaluation system that lets you customize how the agent's performance is assessed. You can find these settings in the sidebar under the "Custom Evaluation" expander.

What is Custom Evaluation?

The Custom Evaluation feature uses an LLM-as-a-Judge approach to evaluate how well the agent performs its task. The judge LLM receives the complete agent trace, not just the final answer, and assesses the agent's performance against the criteria you set. You can customize:

Evaluation Model: Choose which LLM should act as the judge
Evaluation Criteria: Define specific checkpoints that the agent should meet
Scoring System: Assign points to each criterion
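One way to picture how criteria and points could fit together is the sketch below. It represents each checkpoint as a criterion plus a point value, assembles a prompt for the judge from the full trace, and totals the points for the criteria the judge marks as passed. This is an illustration only: `Checkpoint`, `build_judge_prompt`, and `total_score` are hypothetical names, the prompt wording is invented, and the app's actual evaluation code may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    criteria: str  # what the judge should check for in the trace
    points: int    # points awarded if the criterion is met


def build_judge_prompt(trace: str, checkpoints: list[Checkpoint]) -> str:
    """Assemble a prompt that shows the judge every criterion and the full trace."""
    numbered = [f"{i}. ({c.points} pt) {c.criteria}" for i, c in enumerate(checkpoints, 1)]
    return (
        "Read the complete agent trace and answer PASS or FAIL for each criterion.\n\n"
        "Criteria:\n" + "\n".join(numbered) + "\n\nTrace:\n" + trace
    )


def total_score(checkpoints: list[Checkpoint], verdicts: list[bool]) -> tuple[int, int]:
    """Return (points earned, points possible) given one verdict per checkpoint."""
    if len(verdicts) != len(checkpoints):
        raise ValueError("need exactly one verdict per checkpoint")
    earned = sum(c.points for c, passed in zip(checkpoints, verdicts) if passed)
    return earned, sum(c.points for c in checkpoints)


if __name__ == "__main__":
    checkpoints = [
        Checkpoint("Check if the agent considered at least three surf spot options", 1),
        Checkpoint("Check if the agent gathered wind forecasts for each surf spot being evaluated", 1),
        Checkpoint("Check if the agent used any web search tools to explore which surf spots should be considered", 1),
    ]
    print(build_judge_prompt("<agent trace goes here>", checkpoints))
    print(total_score(checkpoints, [True, True, False]))
```

Weighting a criterion more heavily is then just a matter of raising its `points` value.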

How to Use Custom Evaluation

1. Select an Evaluation Model: Choose which LLM you want to use as the judge
2. Edit Checkpoints: Use the data editor to:

Add new evaluation criteria
Modify existing criteria
Adjust point values
Remove criteria you don't want to evaluate

Example Criteria

You can evaluate things like:

Tool usage and success
Order of operations
Quality of final recommendations
Response completeness
Number of steps taken