Implement MarkdownAuditorService - Audit markdown files and detect auto-generated docs
Migrated from llm/agent-buildkit#4 on 2025-10-12T20:16:19.127Z. Original author: @group_286_bot_4df652bbc58e62e505f1a777cd8f21b8 | Created: 2025-10-11T03:56:42.780Z
MarkdownAuditorService Implementation
Parent Issue
Related to #1 (Project Health Auditor Epic)
Objective
Implement MarkdownAuditorService that audits all markdown files in a project, detects auto-generated documentation (with "DO NOT EDIT" headers), and identifies duplicate README files.
Contract (Port Interface)
Implements: `IMarkdownAuditorPort` from `/src/services/ports/project-health.port.ts`
Required Methods
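```typescript
// MarkdownAuditReport is presumably the type inferred from MarkdownAuditReportSchema
interface IMarkdownAuditorPort {
  auditMarkdown(projectPath: string): Promise<MarkdownAuditReport>;
  isGenerated(filePath: string): Promise<boolean>;
  findDuplicateReadmes(projectPath: string): Promise<string[]>;
}
```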
Responsibilities (SINGLE RESPONSIBILITY)
Audits ONLY markdown files. It does NOT:
- Audit scripts (ScriptAuditorService)
- Analyze structure (StructureAnalyzerService)
- Calculate health (HealthScoringService)
Implementation Details
File Location
`/src/services/domain/project-health/MarkdownAuditorService.ts`
Auto-Generated Detection Patterns
Must detect these patterns in markdown files:
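```typescript
// Markers searched for in the file header (first 50 lines, see isGenerated()).
// `/Generated by/i` and `/Auto-generated/i` are broad and can also match ordinary prose.
const GENERATED_PATTERNS: RegExp[] = [
  /<!--\s*DO NOT EDIT/i,
  /<!--\s*AUTO[-_]?GENERATED/i,
  /<!--\s*This file is automatically generated/i,
  /^\s*<!--\s*@generated/im,
  /@generated automatically/i,
  /Generated by/i,
  /Auto-generated/i,
  /This file was automatically/i,
];
```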
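When tuning these rules, it helps to see which one fired for a given header. Below is a small sketch using a hypothetical helper, `firstGeneratedMatch`, that returns the first matching pattern. The list mirrors the one above, with each whitespace run written as `\s*` so that `<!--DO NOT EDIT-->` (no space) is still caught.

```typescript
// Mirrors GENERATED_PATTERNS, with `\s*` for optional whitespace.
const GENERATED_PATTERNS: RegExp[] = [
  /<!--\s*DO NOT EDIT/i,
  /<!--\s*AUTO[-_]?GENERATED/i,
  /<!--\s*This file is automatically generated/i,
  /^\s*<!--\s*@generated/im,
  /@generated automatically/i,
  /Generated by/i,
  /Auto-generated/i,
  /This file was automatically/i,
];

// Hypothetical helper: the first pattern matching `header`, or null.
// None of the patterns carry the `g` flag, so `test()` keeps no
// lastIndex state between calls.
function firstGeneratedMatch(header: string): RegExp | null {
  return GENERATED_PATTERNS.find((pattern) => pattern.test(header)) ?? null;
}

// The no-space marker is still matched by the first rule.
console.log(firstGeneratedMatch('<!--DO NOT EDIT-->\n# API')?.source);
// A hand-written page matches nothing.
console.log(firstGeneratedMatch('# Guide\n\nWritten by hand.'));
// The broad `/Generated by/i` rule also fires on ordinary prose.
console.log(firstGeneratedMatch('Diagrams generated by Mermaid.')?.source);
```

The last call shows a likely false positive. If that matters in practice, those broad rules could be restricted to comment syntax.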
Algorithm: isGenerated()
```typescript
async isGenerated(filePath: string): Promise<boolean> {
  const content = await this.fs.readFile(filePath, 'utf-8');

  // Check the first 50 lines for generation markers
  const lines = content.split('\n').slice(0, 50);
  const header = lines.join('\n');

  for (const pattern of GENERATED_PATTERNS) {
    if (pattern.test(header)) {
      return true;
    }
  }

  // Check whether the file lives in a generated docs directory (POSIX separators assumed)
  const generatedDirs = ['api-docs', 'generated', 'dist/docs', '.docusaurus'];
  for (const dir of generatedDirs) {
    if (filePath.includes(`/${dir}/`)) {
      return true;
    }
  }

  return false;
}
```
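Separating the file read from the decision makes the 50-line header window and the directory rule easy to unit-test. Below is a sketch of a hypothetical pure variant, `isGeneratedContent(filePath, content)`, which takes content that has already been read. For brevity it uses only two of the marker patterns, and it assumes POSIX path separators.

```typescript
// Subset of GENERATED_PATTERNS, kept short for this sketch.
const MARKERS: RegExp[] = [/<!--\s*DO NOT EDIT/i, /^\s*<!--\s*@generated/im];
const GENERATED_DIRS = ['api-docs', 'generated', 'dist/docs', '.docusaurus'];
const HEADER_LINES = 50;

// Pure variant of isGenerated(): the caller supplies the content, so no
// filesystem is needed to exercise the rules.
function isGeneratedContent(filePath: string, content: string): boolean {
  // Only the first HEADER_LINES lines are scanned for markers.
  const header = content.split('\n').slice(0, HEADER_LINES).join('\n');
  if (MARKERS.some((pattern) => pattern.test(header))) {
    return true;
  }
  // Otherwise fall back to the directory heuristic.
  return GENERATED_DIRS.some((dir) => filePath.includes(`/${dir}/`));
}

// Marker on line 62, outside the 50-line window.
const lateMarker = '# Title\n' + 'text\n'.repeat(60) + '<!-- DO NOT EDIT -->\n';

console.log(isGeneratedContent('/docs/a.md', '<!-- DO NOT EDIT -->\n# A')); // true
console.log(isGeneratedContent('/docs/b.md', lateMarker)); // false
console.log(isGeneratedContent('/site/.docusaurus/c.md', '# C')); // true (directory rule)
```

The second call shows one consequence of the header window: a marker placed at the bottom of a long file is not detected.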
Algorithm: findDuplicateReadmes()
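```typescript
// requires: import { dirname, resolve } from 'path';
async findDuplicateReadmes(projectPath: string): Promise<string[]> {
  // Case-insensitive match on the file name: README.md, readme.md, Readme.md, ...
  const readmes = await this.findFiles(projectPath, /^README\.md$/i);

  // The README directly in the project root is always valid; any README in a
  // subdirectory is a duplicate. Comparing directories (rather than the literal
  // path `<root>/README.md`) keeps a root `readme.md` from being flagged.
  const root = resolve(projectPath);
  return readmes.filter((file) => dirname(resolve(file)) !== root);
}
```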
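Because the filename match is case-insensitive, testing for equality with the literal `join(projectPath, 'README.md')` would report a root-level `readme.md` as a duplicate. Comparing parent directories avoids that. Below is a dependency-free sketch of the filtering step over a list of paths. `dirOf` and `duplicateReadmes` are hypothetical names, and absolute POSIX paths are assumed.

```typescript
// Directory part of a POSIX path: '/a/b.md' -> '/a', '/b.md' -> '/'.
function dirOf(path: string): string {
  const idx = path.lastIndexOf('/');
  if (idx === -1) return '.';
  return idx === 0 ? '/' : path.slice(0, idx);
}

// Keeps README files (any casing) that are NOT directly in projectPath.
function duplicateReadmes(projectPath: string, paths: string[]): string[] {
  // Normalize trailing slashes, but leave the filesystem root as '/'.
  const root = projectPath.length > 1 ? projectPath.replace(/\/+$/, '') : projectPath;
  return paths
    .filter((p) => /^readme\.md$/i.test(p.slice(p.lastIndexOf('/') + 1)))
    .filter((p) => dirOf(p) !== root);
}

console.log(
  duplicateReadmes('/proj', [
    '/proj/readme.md', // root README, lowercase: valid
    '/proj/src/README.md', // duplicate
    '/proj/docs/Readme.md', // duplicate
    '/proj/docs/guide.md', // not a README
  ]),
);
```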
Algorithm: auditMarkdown()
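```typescript
// requires: import { basename } from 'path';
async auditMarkdown(projectPath: string): Promise<MarkdownAuditReport> {
  const files: MarkdownInfo[] = [];

  // 1. Find all markdown files
  const mdFiles = await this.findFiles(projectPath, /\.md$/i);

  // 2. Analyze each file
  for (const file of mdFiles) {
    const isGen = await this.isGenerated(file);
    const hasDoNotEdit = await this.hasDoNotEditMarker(file);

    files.push({
      path: file,
      isGenerated: isGen,
      hasDoNotEdit,
      recommendation: this.getRecommendation(projectPath, file, isGen, hasDoNotEdit),
    });
  }

  // 3. Find duplicate READMEs
  const duplicateReadmes = await this.findDuplicateReadmes(projectPath);

  // 4. Calculate summary (manual is derived so that generated + manual === total)
  const generated = files.filter((f) => f.isGenerated).length;
  const summary = {
    total: files.length,
    generated,
    manual: files.length - generated,
  };

  // Assumption: MarkdownAuditReportSchema carries a `duplicateReadmes` field
  // (otherwise the result of step 3 is unused).
  return { projectName: basename(projectPath), files, duplicateReadmes, summary };
}
```
Recommendations Logic
```typescript
// requires: import { basename, dirname, relative } from 'path';
// Message wording is a placeholder; only the DUPLICATE keyword is relied on by the tests.
private getRecommendation(
  projectPath: string,
  path: string,
  isGenerated: boolean,
  hasDoNotEdit: boolean,
): string {
  const filename = basename(path);
  // Directory relative to the project root; '.' means the root itself.
  const relDir = dirname(relative(projectPath, path));

  // Duplicate README (any README outside the project root)
  if (filename.toUpperCase() === 'README.MD' && relDir !== '.') {
    return 'DUPLICATE README: merge into the root README.md';
  }

  // Generated without DO NOT EDIT marker
  if (isGenerated && !hasDoNotEdit) {
    return 'Add a DO NOT EDIT header to this generated file';
  }

  // DO NOT EDIT but not in a generated dir
  if (hasDoNotEdit && !path.includes('/generated/') && !path.includes('/api-docs/')) {
    return 'Move this file into a generated docs directory';
  }

  // Manual markdown cluttering the project root
  if (!isGenerated && relDir === '.' && filename.toUpperCase() !== 'README.MD') {
    const docsPath = `docs/${filename}`;
    return `Move to ${docsPath}`;
  }

  return 'OK';
}
```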
Helper Methods
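```typescript
// requires: import { join } from 'path';
// Directories that are never descended into. The contents are an assumption:
// SKIP_DIRS is referenced by findFiles() but not otherwise specified.
const SKIP_DIRS = new Set(['node_modules', '.git']);

private async hasDoNotEditMarker(filePath: string): Promise<boolean> {
  const content = await this.fs.readFile(filePath, 'utf-8');
  // Only the first 500 characters are checked
  return /DO NOT EDIT/i.test(content.slice(0, 500));
}

private async findFiles(rootPath: string, pattern: RegExp): Promise<string[]> {
  const results: string[] = [];

  // Arrow function so that `this.fs` refers to the service instance; inside a
  // nested `function` declaration, `this` would be undefined.
  const scan = async (dir: string): Promise<void> => {
    const entries = await this.fs.readdir(dir);

    for (const entry of entries) {
      const fullPath = join(dir, entry);
      const stat = await this.fs.stat(fullPath);

      if (stat.isDirectory()) {
        if (!SKIP_DIRS.has(entry)) {
          await scan(fullPath);
        }
      } else if (pattern.test(entry)) {
        results.push(fullPath);
      }
    }
  };

  await scan(rootPath);
  return results;
}
```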
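The traversal in `findFiles()` works as follows: it recurses depth-first into every directory not in `SKIP_DIRS`, and it tests the pattern against the entry *name* rather than the full path. Below is that walk sketched over an in-memory tree. The tree follows the shape of the `setupDirectory()` call in the tests, where keys ending in `/` are directories. `findInTree` is a hypothetical name, and the `SKIP_DIRS` contents are assumed.

```typescript
// Nested tree: keys ending in '/' are directories, other keys are files.
type Tree = { [name: string]: string | Tree };

// Assumed skip list (the spec does not define SKIP_DIRS).
const SKIP_DIRS = new Set(['node_modules', '.git']);

// Depth-first walk mirroring findFiles(): recurse into non-skipped
// directories, collect files whose name matches `pattern`.
function findInTree(tree: Tree, pattern: RegExp, prefix = ''): string[] {
  const results: string[] = [];
  for (const [key, value] of Object.entries(tree)) {
    if (typeof value === 'string') {
      if (pattern.test(key)) results.push(`${prefix}/${key}`);
    } else {
      const name = key.replace(/\/$/, '');
      if (!SKIP_DIRS.has(name)) {
        results.push(...findInTree(value, pattern, `${prefix}/${name}`));
      }
    }
  }
  return results;
}

const tree: Tree = {
  'README.md': '# Root',
  'src/': { 'README.md': '# Src' },
  'node_modules/': { 'pkg/': { 'README.md': '# Pkg' } },
  'docs/': { 'guide.md': '# Guide', 'notes.txt': 'n' },
};

console.log(findInTree(tree, /\.md$/i));
```

The README under `node_modules/pkg/` is never visited, which is the main reason for the skip list.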
Dependencies
- IFileSystemPort: For file operations (inject via constructor)
- Zod schemas: MarkdownAuditReportSchema, MarkdownInfoSchema
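The Zod schemas themselves are not reproduced in this issue. Below is a rough, hypothetical picture of what `MarkdownInfoSchema` is expected to enforce. It uses a plain TypeScript interface, with field names taken from `auditMarkdown()` and the example output, plus a hand-written type guard that stands in for Zod's `safeParse`.

```typescript
interface MarkdownInfo {
  path: string;
  isGenerated: boolean;
  hasDoNotEdit: boolean;
  recommendation: string;
}

// Stand-in for a schema check: narrows `unknown` to MarkdownInfo only
// when every field is present with the expected primitive type.
function isMarkdownInfo(value: unknown): value is MarkdownInfo {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.path === 'string' &&
    typeof v.isGenerated === 'boolean' &&
    typeof v.hasDoNotEdit === 'boolean' &&
    typeof v.recommendation === 'string'
  );
}

// Illustrative values only.
const sample = { path: '/README.md', isGenerated: false, hasDoNotEdit: false, recommendation: 'OK' };
console.log(isMarkdownInfo(sample)); // true
console.log(isMarkdownInfo({ ...sample, isGenerated: 'no' })); // false
```

In the service itself, the interface would more naturally be derived with `z.infer<typeof MarkdownInfoSchema>`, so that the type and the schema cannot drift apart.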
Acceptance Criteria
- Implements all 3 methods from IMarkdownAuditorPort
- Constructor accepts IFileSystemPort for testability
- Detects all auto-generation patterns
- Identifies duplicate README files
- Provides actionable recommendations
- Returns data matching Zod schemas
- Unit tests with mock filesystem (>90% coverage)
- Integration tests with real markdown files
Testing Strategy
```typescript
describe('MarkdownAuditorService', () => {
  let service: MarkdownAuditorService;
  let mockFs: MockFileSystem;

  beforeEach(() => {
    mockFs = new MockFileSystem();
    service = new MarkdownAuditorService(mockFs);
  });

  it('detects DO NOT EDIT marker', async () => {
    mockFs.addFile('/docs/api.md', '<!-- DO NOT EDIT -->\n# API');
    const isGen = await service.isGenerated('/docs/api.md');
    expect(isGen).toBe(true);
  });

  it('detects @generated marker', async () => {
    mockFs.addFile('/types.md', '<!-- @generated -->\n# Types');
    const isGen = await service.isGenerated('/types.md');
    expect(isGen).toBe(true);
  });

  it('finds duplicate README files', async () => {
    mockFs.setupDirectory('/', {
      'README.md': 'Main readme',
      'src/': { 'README.md': 'Duplicate 1' },
      'docs/': { 'README.md': 'Duplicate 2' },
    });

    const duplicates = await service.findDuplicateReadmes('/');

    // Sort so the assertion does not depend on directory traversal order
    expect([...duplicates].sort()).toEqual(['/docs/README.md', '/src/README.md']);
  });

  it('provides recommendations for duplicate READMEs', async () => {
    mockFs.addFile('/src/README.md', '# Readme');
    const result = await service.auditMarkdown('/');

    const srcReadme = result.files.find((f) => f.path === '/src/README.md');
    expect(srcReadme?.recommendation).toContain('DUPLICATE');
  });
});
```
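The tests rely on a `MockFileSystem` with `addFile()` and `setupDirectory()`, which the issue does not define. Below is a minimal in-memory sketch. Its `readFile`, `readdir` and `stat` methods mirror the calls the service makes on `IFileSystemPort`, but the real port's signatures are an assumption. Directories exist implicitly as prefixes of stored file paths.

```typescript
type Tree = { [name: string]: string | Tree };

// Hypothetical in-memory filesystem for unit tests (absolute POSIX paths).
class MockFileSystem {
  private files = new Map<string, string>();

  // Registers a file; parent directories are implicit.
  addFile(path: string, content: string): void {
    this.files.set(path, content);
  }

  // Flattens a nested tree ('name/' keys are directories) into addFile() calls.
  setupDirectory(root: string, tree: Tree): void {
    const base = root.endsWith('/') ? root : `${root}/`;
    for (const [key, value] of Object.entries(tree)) {
      if (typeof value === 'string') {
        this.addFile(`${base}${key}`, value);
      } else {
        this.setupDirectory(`${base}${key.replace(/\/$/, '')}`, value);
      }
    }
  }

  async readFile(path: string, _encoding: string): Promise<string> {
    const content = this.files.get(path);
    if (content === undefined) throw new Error(`ENOENT: ${path}`);
    return content;
  }

  // Immediate children (files and implicit directories) of `dir`.
  async readdir(dir: string): Promise<string[]> {
    const prefix = dir.endsWith('/') ? dir : `${dir}/`;
    const names = new Set<string>();
    for (const path of this.files.keys()) {
      if (path.startsWith(prefix)) {
        names.add(path.slice(prefix.length).split('/')[0]);
      }
    }
    return [...names];
  }

  // A path is a directory if any stored file lives beneath it.
  async stat(path: string): Promise<{ isDirectory(): boolean }> {
    const prefix = path.endsWith('/') ? path : `${path}/`;
    const isDir = [...this.files.keys()].some((p) => p.startsWith(prefix));
    if (!isDir && !this.files.has(path)) throw new Error(`ENOENT: ${path}`);
    return { isDirectory: () => isDir };
  }
}
```

Keeping directories implicit avoids a separate directory table. The trade-off is that empty directories cannot be represented, which these tests never need.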
Example Output
```json
{
"projectName": "agent-buildkit",
"files": [
{
"path": "/README.md",
"isGenerated": false,
"hasDoNotEdit": false,
"recommendation": "
```
Estimated Effort
8 hours
- Implementation: 4 hours
- Tests: 3 hours
- Documentation: 1 hour
References
- Port: `/src/services/ports/project-health.port.ts:IMarkdownAuditorPort`
- Reference: `/src/services/domain/project-health/ProjectDiscoveryService.ts`
- Schemas: `/src/types/dto/project-health.schemas.ts` (MarkdownAuditReportSchema)
Labels
`enhancement`, `service-implementation`, `documentation`, `health-monitoring`