11 changes: 8 additions & 3 deletions apps/chrome-extension/src/utils/eventOptimizer.ts
@@ -1,5 +1,6 @@
import Service from '@midscene/core';
import type { Rect, UIContext } from '@midscene/core';
+import { ScreenshotItem } from '@midscene/core';
import type { RecordedEvent } from '@midscene/recorder';
import { globalModelConfigManager } from '@midscene/shared/env';
import { compositeElementInfoImg } from '@midscene/shared/img';
@@ -135,9 +136,13 @@ export const generateAIDescription = async (
const descriptionPromise = (async () => {
try {
const mockContext: UIContext = {
-screenshotBase64: event.screenshotBefore as string,
-size: { width: event.pageInfo.width, height: event.pageInfo.height },
-};
+screenshot: ScreenshotItem.create(event.screenshotBefore as string),
+shotSize: {
+width: event.pageInfo.width,
+height: event.pageInfo.height,
+},
+shrunkShotToLogicalRatio: 1,
+} as UIContext;

const service = new Service(mockContext);
const rect = extractRect(event);
2 changes: 1 addition & 1 deletion apps/report/src/components/detail-panel/index.tsx
@@ -159,7 +159,7 @@ const DetailPanel = (): JSX.Element => {
}

contextLocatorView =
-highlightElements.length > 0 && activeTask.uiContext?.size ? (
+highlightElements.length > 0 && activeTask.uiContext?.shotSize ? (
<ScreenshotDisplay
title={isPageContextFrozen ? 'UI Context (Frozen)' : 'UI Context'}
>
5 changes: 4 additions & 1 deletion apps/site/docs/en/api.mdx
@@ -27,6 +27,9 @@ All agents share these base options:
- Using Node.js: `npx serve`
- Using Python: `python -m http.server` or `python3 -m http.server`
Then access the report via `http://localhost:3000` (or the port shown in the terminal).
+- `screenshotShrinkFactor: number`: Controls the scaling ratio of screenshots to reduce the image size sent to the AI model, thereby reducing token consumption. The default value is 1 (no scaling). If set to 2, the width and height of the screenshot will be halved, and the area will be reduced to a quarter of the original. You can adjust this value based on your actual situation to find the best balance between image clarity and token consumption.
+- For mobile devices, setting `screenshotShrinkFactor` to 2 can reduce token consumption while maintaining clarity, but it is not recommended to set it higher than 3, as this may cause the image to be too blurry and affect the AI model's understanding.
+- For web pages, if the content is complex or contains a lot of details, it is not recommended to set `screenshotShrinkFactor` to avoid overly blurry screenshots. Additionally, if you want higher clarity for web page screenshots, you can configure Puppeteer or Playwright's `deviceScaleFactor` to 2, which will allow Puppeteer or Playwright to render the page as if it were a high-definition screen.
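
The arithmetic behind `screenshotShrinkFactor` described in the added lines above can be sketched as follows. This is only an illustration under stated assumptions: `shrinkShotSize` and `areaRatio` are hypothetical helpers that are not part of Midscene, and the rounding and the "factor must be at least 1" check are assumptions rather than documented behavior.

```typescript
// Hypothetical helpers for illustration only; not part of the Midscene API.
interface ShotSize {
  width: number;
  height: number;
}

// Divide both dimensions by the shrink factor (rounding is an assumption).
function shrinkShotSize(original: ShotSize, shrinkFactor: number): ShotSize {
  if (!Number.isFinite(shrinkFactor) || shrinkFactor < 1) {
    // Assumption: 1 means "no scaling", so smaller values are rejected here.
    throw new Error('shrinkFactor must be a finite number >= 1');
  }
  return {
    width: Math.round(original.width / shrinkFactor),
    height: Math.round(original.height / shrinkFactor),
  };
}

// Ratio of the shrunk pixel area to the original pixel area.
function areaRatio(original: ShotSize, shrunk: ShotSize): number {
  return (shrunk.width * shrunk.height) / (original.width * original.height);
}

const originalShot: ShotSize = { width: 1080, height: 1920 };
const shrunkShot = shrinkShotSize(originalShot, 2);
console.log(shrunkShot); // { width: 540, height: 960 }
console.log(areaRatio(originalShot, shrunkShot)); // 0.25
```

With a factor of 2, each dimension is halved and the pixel area drops to a quarter, which matches the trade-off between clarity and token consumption that the documentation describes.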

### Custom model configuration

@@ -777,7 +780,7 @@ function aiLocate(
height: number;
};
center: [number, number];
-scale: number; // device pixel ratio
+dpr: number; // device pixel ratio
}>;
```

6 changes: 2 additions & 4 deletions apps/site/docs/en/integrate-with-any-interface.mdx
@@ -71,7 +71,6 @@ export interface SampleDeviceConfig {
deviceName?: string;
width?: number;
height?: number;
-dpr?: number;
}

/**
@@ -86,7 +85,6 @@ export class SampleDevice implements AbstractInterface {
deviceName: config.deviceName || 'Sample Device',
width: config.width || 1920,
height: config.height || 1080,
-dpr: config.dpr || 1,
};
}

@@ -101,12 +99,12 @@ export class SampleDevice implements AbstractInterface {

/**
* Required: Get interface dimensions
+* The width and height here refer to the logical size of the interface, not considering the device pixel ratio (dpr). The coordinates obtained from actions like defineActionTap are also based on this logical coordinate system. You can convert logical coordinates to physical coordinates in your action implementations if needed.
*/
async size(): Promise<Size> {
return {
width: this.config.width,
height: this.config.height,
-dpr: this.config.dpr,
};
}

@@ -287,7 +285,7 @@ These are the required methods that you need to implement:

- `interfaceType: string`: define a name for the interface, this will not be provided to the AI model
- `screenshotBase64(): Promise<string>`: take a screenshot of the interface and return the base64 string with the `'data:image/` prefix
-- `size(): Promise<Size>`: the size and dpr of the interface, which is an object with the `width`, `height`, and `dpr` properties
+- `size(): Promise<Size>`: the size of the interface, which is an object with the `width` and `height` properties
- `actionSpace(): DeviceAction[] | Promise<DeviceAction[]>`: the action space of the interface, which is an array of `DeviceAction` objects. Use predefined actions or define any custom action.

Type signatures:
5 changes: 4 additions & 1 deletion apps/site/docs/zh/api.mdx
@@ -29,6 +29,9 @@ Midscene 针对每个不同环境都有对应的 Agent。每个 Agent 的构造
- 使用 Node.js:`npx serve`
- 使用 Python:`python -m http.server` 或 `python3 -m http.server`
然后通过 `http://localhost:3000`(或终端显示的端口)访问报告。
+- `screenshotShrinkFactor: number`: 控制截图的缩放比例,以减少发送给 AI 模型的图像大小,从而减少 token 消耗。默认值为 1(不缩放)。如果将其设置为 2,则截图的宽高将缩小为原来的一半,面积缩小为原来的四分之一。你可以根据实际情况调整这个值,以在图像清晰度和 token 消耗之间找到最佳平衡点。
+- 对于移动端设备,将 `screenshotShrinkFactor` 设置为 2 可以在保持清晰度的同时减少 token 的消耗,但不建议设置的值超过 3,否则可能会导致图像过于模糊,影响 AI 模型的理解。
+- 对于 Web 页面,如果页面内容比较复杂或包含大量细节,不建议设置 `screenshotShrinkFactor`,以避免截图过于模糊。此外,如果为了让 Web 页面截图有更高的清晰度,可以配置 Puppeteer 或 Playwright 的 `deviceScaleFactor` 为 2,这可以让 Puppeteer 或 Playwright 按照高清屏的方式来渲染页面。

### 自定义模型

@@ -773,7 +776,7 @@ function aiLocate(
height: number;
};
center: [number, number];
-scale: number; // device pixel ratio
+dpr: number; // device pixel ratio
}>;
```

6 changes: 2 additions & 4 deletions apps/site/docs/zh/integrate-with-any-interface.mdx
@@ -70,7 +70,6 @@ export interface SampleDeviceConfig {
deviceName?: string;
width?: number;
height?: number;
-dpr?: number;
}

/**
@@ -85,7 +84,6 @@ export class SampleDevice implements AbstractInterface {
deviceName: config.deviceName || 'Sample Device',
width: config.width || 1920,
height: config.height || 1080,
-dpr: config.dpr || 1,
};
}

@@ -100,12 +98,12 @@ export class SampleDevice implements AbstractInterface {

/**
* 必需:获取界面尺寸
+* 这里的宽高是指界面的逻辑尺寸(logical size),不需要考虑设备像素比(dpr)。defineActionTap 等动作得到的坐标也是基于这个逻辑尺寸的坐标系。你可以在动作实现中根据需要将逻辑坐标转换为物理坐标。
*/
async size(): Promise<Size> {
return {
width: this.config.width,
height: this.config.height,
-dpr: this.config.dpr,
};
}

@@ -260,7 +258,7 @@ import { AbstractInterface } from '@midscene/core';

- `interfaceType: string`:为界面定义一个名称,这不会提供给 AI 模型
- `screenshotBase64(): Promise<string>`:截取界面的屏幕截图并返回带有 `'data:image/` 前缀的 base64 字符串
-- `size(): Promise<Size>`:界面的大小和 dpr,它是一个具有 `width`、`height` 和 `dpr` 属性的对象
+- `size(): Promise<Size>`:界面的大小,它是一个具有 `width` 和 `height` 属性的对象
- `actionSpace(): DeviceAction[] | Promise<DeviceAction[]>`:界面的动作空间,它是一个 `DeviceAction` 对象数组。在这里你可以使用预定义动作,或是自定义交互操作。

类型签名:
2 changes: 0 additions & 2 deletions packages/android/src/device.ts
@@ -505,7 +505,6 @@ ${Object.keys(size)
return {
physicalWidth: Number.parseInt(match[1], 10),
physicalHeight: Number.parseInt(match[2], 10),
-dpr: this.devicePixelRatio,
orientation: screenSize.orientation,
isCurrentOrientation: screenSize.isCurrentOrientation,
};
@@ -890,7 +889,6 @@ ${Object.keys(size)
return {
width: logicalWidth,
height: logicalHeight,
-dpr: this.devicePixelRatio,
};
}
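
With `dpr` no longer returned from `size()`, the documentation updated in this PR says the reported width and height are logical sizes and that an action implementation may convert logical coordinates to physical ones when it needs to. A minimal sketch of such a conversion is shown below. It assumes the common convention that physical pixels equal logical pixels multiplied by the device pixel ratio; `logicalToPhysical` is a hypothetical helper, not a Midscene API, and this diff does not show how Midscene performs the conversion internally.

```typescript
// Hypothetical helper for illustration only; not part of the Midscene API.
interface Point {
  x: number;
  y: number;
}

// Assumption: physical = logical * dpr, rounded to whole pixels.
// The dpr value is something the device implementation tracks on its own,
// since it is no longer part of the object returned by size().
function logicalToPhysical(point: Point, dpr: number): Point {
  if (!Number.isFinite(dpr) || dpr <= 0) {
    throw new Error('dpr must be a finite number > 0');
  }
  return {
    x: Math.round(point.x * dpr),
    y: Math.round(point.y * dpr),
  };
}

// A tap at the center of a 540x960 logical screen on a device with dpr 2.
const logicalTap: Point = { x: 270, y: 480 };
console.log(logicalToPhysical(logicalTap, 2)); // { x: 540, y: 960 }
```

Keeping the conversion inside the action implementation means callers only ever deal with the logical coordinate system that `size()` reports.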

2 changes: 0 additions & 2 deletions packages/android/src/scrcpy-device-adapter.ts
@@ -23,7 +23,6 @@ interface ResolvedScrcpyConfig {
export interface DevicePhysicalInfo {
physicalWidth: number;
physicalHeight: number;
-dpr: number;
orientation: number;
isCurrentOrientation?: boolean;
}
@@ -174,7 +173,6 @@ export class ScrcpyDeviceAdapter {
return {
width: resolution.width,
height: resolution.height,
-dpr: deviceInfo.dpr,
};
}

6 changes: 3 additions & 3 deletions packages/android/tests/unit-test/agent.test.ts
@@ -137,7 +137,7 @@ describe('AndroidAgent', () => {
interfaceType: 'android',
actionSpace: vi.fn().mockReturnValue([]),
screenshotBase64: vi.fn(),
-size: vi.fn().mockResolvedValue({ width: 0, height: 0, dpr: 1 }),
+size: vi.fn().mockResolvedValue({ width: 0, height: 0 }),
getElementsInfo: vi.fn(),
url: vi.fn(),
launch: vi.fn(),
@@ -165,7 +165,7 @@ describe('AndroidAgent', () => {
interfaceType: 'android',
actionSpace: vi.fn().mockReturnValue([]),
screenshotBase64: vi.fn(),
-size: vi.fn().mockResolvedValue({ width: 0, height: 0, dpr: 1 }),
+size: vi.fn().mockResolvedValue({ width: 0, height: 0 }),
getElementsInfo: vi.fn(),
url: vi.fn(),
launch: vi.fn(),
@@ -192,7 +192,7 @@ describe('AndroidAgent', () => {
interfaceType: 'android',
actionSpace: vi.fn().mockReturnValue([]),
screenshotBase64: vi.fn(),
-size: vi.fn().mockResolvedValue({ width: 0, height: 0, dpr: 1 }),
+size: vi.fn().mockResolvedValue({ width: 0, height: 0 }),
getElementsInfo: vi.fn(),
url: vi.fn(),
launch: vi.fn(),
6 changes: 1 addition & 5 deletions packages/android/tests/unit-test/page.test.ts
@@ -131,7 +131,7 @@ describe('AndroidDevice', () => {
const size1 = await device.size();
const size2 = await device.size();

-expect(size1).toEqual({ width: 540, height: 960, dpr: 2 });
+expect(size1).toEqual({ width: 540, height: 960 });
expect(size2).toEqual(size1);
// Caching is removed, so it should be called twice
expect(vi.spyOn(device as any, 'getScreenSize')).toHaveBeenCalledTimes(2);
@@ -194,7 +194,6 @@ describe('AndroidDevice', () => {
vi.spyOn(device, 'size').mockResolvedValue({
width: 1080,
height: 1920,
-dpr: 2,
});
vi.spyOn(ImgUtils, 'isValidImageBuffer').mockReturnValue(true);
vi.spyOn(ImgUtils, 'resizeAndConvertImgBuffer').mockImplementation(
@@ -1174,7 +1173,6 @@ describe('AndroidDevice', () => {
vi.spyOn(device, 'size').mockResolvedValue({
width: 1080,
height: 1920,
-dpr: 1,
});
});

@@ -1956,7 +1954,6 @@ describe('AndroidDevice', () => {
vi.spyOn(deviceWithDisplay, 'size').mockResolvedValue({
width: 1080,
height: 1920,
-dpr: 2,
});

await deviceWithDisplay.screenshotBase64();
@@ -2091,7 +2088,6 @@ describe('AndroidDevice', () => {
vi.spyOn(deviceWithDisplay, 'size').mockResolvedValue({
width: 1080,
height: 1920,
-dpr: 2,
});

await deviceWithDisplay.screenshotBase64();
5 changes: 0 additions & 5 deletions packages/android/tests/unit-test/scrcpy-adapter.test.ts
@@ -45,7 +45,6 @@ vi.mock('@midscene/shared/img', () => ({
const defaultDeviceInfo: DevicePhysicalInfo = {
physicalWidth: 1080,
physicalHeight: 1920,
-dpr: 2.625,
orientation: 0,
};

@@ -159,7 +158,6 @@ describe('ScrcpyDeviceAdapter', () => {
const highRes: DevicePhysicalInfo = {
physicalWidth: 1440,
physicalHeight: 3120,
-dpr: 3.2,
orientation: 0,
};
const config = adapter.resolveConfig(highRes);
@@ -175,7 +173,6 @@ describe('ScrcpyDeviceAdapter', () => {
const highRes: DevicePhysicalInfo = {
physicalWidth: 1440,
physicalHeight: 3120,
-dpr: 3.2,
orientation: 0,
};
const config = adapter.resolveConfig(highRes);
@@ -187,7 +184,6 @@ describe('ScrcpyDeviceAdapter', () => {
const landscape: DevicePhysicalInfo = {
physicalWidth: 1920,
physicalHeight: 1080,
-dpr: 2,
orientation: 1,
};
const config = adapter.resolveConfig(landscape);
@@ -237,7 +233,6 @@ describe('ScrcpyDeviceAdapter', () => {
expect(size).toEqual({
width: 576,
height: 1024,
-dpr: 2.625,
});
});
});
1 change: 0 additions & 1 deletion packages/computer/src/device.ts
@@ -396,7 +396,6 @@ Available Displays: ${displays.length > 0 ? displays.map((d) => d.name).join(',
return {
width: screenSize.width,
height: screenSize.height,
-dpr: 1, // Desktop typically uses logical pixels
};
} catch (error) {
debugDevice(`Failed to get screen size: ${error}`);