The mental journey of a log analysis tool

Background

  • Language selection: on the one hand, Go is my personal preference; on the other, a compiled language should in theory be faster.
  • Other reasons: the existing tool is a set of shell scripts built from various Linux commands, and it is not very accurate at filtering logs.

Tool concept

  • Solve the current problem of incomplete log queries
  • Be more efficient
  • Provide more comprehensive analysis functions
  • Support multiple output formats so the results can feed later alerting and analysis

Tool implementation process

Tool vision:

  • First of all, how do we locate the required log data accurately?
    • There are many log files in the log directory, with all kinds of log content
    • Logs are rotated continuously, and each type of log has several rotated files
  • The iterative process of the solution
  • Filtering out the necessary files
    • First version
      • My idea was to use a structure like the one below to represent each file:
//AccessFile describes one access log file
type AccessFile struct {
    FirstLine *AccessLog  // the first line of the file
    LastLine  *AccessLog  // the last line of the file
    Stat      os.FileInfo // file metadata from os.Stat
    Filename  string      // file name
    File      *os.File    // file handle
    StartFlag int64       // first offset where matching data may start
    EndFlag   int64       // last offset where matching data may end
    All       bool        // the whole file falls inside the requested range
    Some      bool        // only part of the file falls inside the requested range
}
 
The functions that read the first and last lines look like this:
//ReadFileFirstLine reads the first line of the file
func ReadFileFirstLine(filename string) (line string) {
    file, err := os.OpenFile(filename, os.O_RDONLY, os.ModePerm)
    if err != nil {
        panic(err)
    }
    defer file.Close()
    var linebyte = make([]byte, 5*logger.KB)
    length, err := file.Read(linebyte)
    if length <= 0 {
        return ""
    }
    if err != nil && err != io.EOF {
        panic(err)
    }
    // only parse the bytes that were actually read
    linebuf := bytes.NewReader(linebyte[:length])
    linebufio := bufio.NewReader(linebuf)
    lineb, _, err := linebufio.ReadLine()
    if err == io.EOF {
        return ""
    }
    if err != nil {
        panic(err)
    }
    return string(lineb)
}
//ReadFileLastLine reads the last line of the file
func ReadFileLastLine(filename string) (line string) {
    stat, err := os.Stat(filename)
    if err != nil {
        panic(err)
    }
    file, err := os.OpenFile(filename, os.O_RDONLY, os.ModePerm)
    if err != nil {
        panic(err)
    }
    defer file.Close()
    var linebyte = make([]byte, 5*logger.KB)
    indexlog := stat.Size() - 5*logger.KB
    if indexlog < 0 {
        indexlog = 0
    }
    DeBugPrintln("file: ", filename, "filesize:", stat.Size())
    length, err := file.ReadAt(linebyte, indexlog)
    if length <= 0 {
        return ""
    }
    if err != nil && err != io.EOF {
        panic(err)
    }
    linebuf := string(linebyte[:length])
    linelist := strings.Split(linebuf, "\n")
    if len(linelist) < 2 {
        return ""
    }
    // the buffer normally ends with a newline, so the last complete line is
    // the second-to-last element after splitting
    line = linelist[len(linelist)-2]
    DeBugPrintln(line)
    return line
}
  • Using the structure above, each log file in the directory is mapped to an object, and useless files are then filtered out with the rather crude method below
Pseudo reasoning:
Suppose the log interval we need is logstime ~ logetime (logstime is earlier than logetime).
The timestamps of the first and last lines of a file are filestime ~ fileetime (given how logs are written, filestime must be earlier than fileetime).
That leaves the following six orderings:
1. filestime -> fileetime -> logstime -> logetime: f.All=false, f.Some=false
2. filestime -> logstime -> fileetime -> logetime: f.All=false, f.Some=true
3. logstime -> filestime -> logetime -> fileetime: f.All=false, f.Some=true
4. logstime -> logetime -> filestime -> fileetime: f.All=false, f.Some=false
5. filestime -> logstime -> logetime -> fileetime: f.All=false, f.Some=true
6. logstime -> filestime -> fileetime -> logetime: f.All=true, f.Some=false
By implementing this comparison we can discard the files that fall entirely outside the range we need and keep only the files that still have to be analysed.
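The comparison can be written as a small helper. The sketch below is illustrative rather than the tool's actual code: it assumes the first-line and last-line timestamps have already been parsed into time.Time values, and the function name is made up for the example.
import "time"

// classifyFile maps the six orderings above onto the All/Some flags.
// filestime/fileetime are the timestamps of the file's first and last
// lines; logstime ~ logetime is the requested interval.
func classifyFile(filestime, fileetime, logstime, logetime time.Time) (all, some bool) {
    switch {
    case !fileetime.After(logstime) || !logetime.After(filestime):
        // cases 1 and 4: the file ends before the interval starts,
        // or starts after it ends, so there is no overlap at all
        return false, false
    case !logstime.After(filestime) && !fileetime.After(logetime):
        // case 6: the whole file lies inside the requested interval
        return true, false
    default:
        // cases 2, 3 and 5: partial overlap, only part of the file is needed
        return false, true
    }
}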
  • Second version
    • After discussing with some developer colleagues, I found that every file also has an mtime. Since the log files containing the information we need should have an mtime within or after the requested time range, the filtering above can be optimised a little.
Filtering logic:
Assume the file's modification time is mtime and the log interval is still logstime ~ logetime.
There are three cases:
mtime -> logstime -> logetime: skip the file directly, it cannot contain the logs we need
logstime -> mtime -> logetime: the file must contain information from the requested range
logstime -> logetime -> mtime: the file may contain the information we need
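A minimal sketch of this mtime pre-filter, assuming the candidate file names and the parsed interval start time are already at hand (the function and variable names are illustrative, not the tool's own):
import (
    "os"
    "time"
)

// mtimeFilter drops files whose modification time is earlier than the
// start of the requested interval; they cannot contain the logs we need.
// Everything else is kept for the first-line/last-line comparison.
func mtimeFilter(filenames []string, logstime time.Time) []string {
    var candidates []string
    for _, name := range filenames {
        stat, err := os.Stat(name)
        if err != nil {
            continue // unreadable files are simply skipped
        }
        if stat.ModTime().Before(logstime) {
            continue // mtime -> logstime -> logetime: no useful data
        }
        candidates = append(candidates, name)
    }
    return candidates
}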
  • After this pre-filter, the first-version method (comparing the first and last lines) is applied to the remaining files.
  • At this point we have found all the log files that may contain the information we need.
  • Reading and processing the filtered files
    • First version
      • Check the All and Some fields of each file object. If All = true, the entire file matches.
      • Traverse all the objects once and drop those that contain none of the required content.
      • Process the remaining file objects in turn. For a file whose content is entirely needed, read the content directly from the file handle; if Some is true, use a regular expression to locate the first required record, read from there, and put the valid data into the filter.
      • There is a pitfall here that only surfaced later in practice: log files are usually quite large, and this approach loads an entire file into memory, which easily exhausts memory and crashes the program.
    • Second version
      • The module that judges and filters files stays the same: the first and last lines are still used to compare times.
      • The main change addresses the problem above of loading the whole log and then matching. That is clearly unreasonable: we rarely need that much data, and loading it all also costs a lot of I/O.
      • Solution: split the file into blocks of a fixed size (for example 50 MB) and read them by offset. Processing a block is much faster, and reading small blocks puts far less pressure on I/O. This may miss a little data at block boundaries, but for log analysis that loss is completely acceptable.
      • Code block:
for _, uf := range upro.LogFile {
    var n int64
    DeBugPrintln(uf.Filename)
    for n < uf.Stat.Size() {
        // read one block of zonesize bytes starting at offset n
        var linedata = make([]byte, zonesize)
        nu, err := uf.File.ReadAt(linedata, n)
        DeBugPrintln(nu, n, err)
        if err != nil && err != io.EOF {
            break
        }
        // process each block concurrently
        wg.Add(1)
        go proUpstreamLogFile(uf.All, uf.Some, linedata, upro, host, directory, &wg)
        n += int64(nu)
    }
    wg.Wait()
    // stop once enough records have been collected
    if upro.AllNum >= upro.MaxSize {
        break
    }
}
  • Filtering the captured data
    • First version
      • After the processing above, the required data has been captured into our filter.
      • The filter then processes the data and extracts exactly the records we need (an illustrative sketch follows below).
      • This part is handled on demand; there are no major changes for now, the goal is simply to minimise repeated, useless passes over the data.
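The filter step itself is not shown in the original code; the sketch below only illustrates the idea. It keeps lines that fall inside the requested interval and contain an optional keyword, and it relies on a hypothetical parseLineTime helper whose timestamp layout is an assumption about the log format.
import (
    "errors"
    "strings"
    "time"
)

// filterLines keeps only the captured lines whose timestamp falls inside
// the requested interval and that contain an optional keyword.
func filterLines(lines []string, logstime, logetime time.Time, keyword string) []string {
    var out []string
    for _, l := range lines {
        t, err := parseLineTime(l)
        if err != nil {
            continue // skip lines whose timestamp cannot be parsed
        }
        if t.Before(logstime) || t.After(logetime) {
            continue // outside the requested interval
        }
        if keyword != "" && !strings.Contains(l, keyword) {
            continue // does not contain the required content
        }
        out = append(out, l)
    }
    return out
}

// parseLineTime extracts a leading timestamp from a log line; the
// "2006-01-02 15:04:05" layout is an assumption, not the tool's format.
func parseLineTime(line string) (time.Time, error) {
    const layout = "2006-01-02 15:04:05"
    if len(line) < len(layout) {
        return time.Time{}, errors.New("line too short to contain a timestamp")
    }
    return time.Parse(layout, line[:len(layout)])
}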

Some experiences in the process of going online

Background: logs are stored on the servers themselves. When there are thousands of services whose logs need to be processed, what should we do?

Here are my thoughts on the problems I encountered:

  • Could the server that distributes the command file sit behind a CDN? Even at around 2 MB per file, downloading it to tens of thousands of machines still puts real pressure on a single server.
  • The command is a general-purpose tool. Could it be distributed once, saved locally, and called directly the next time?
  • How should updates to the file be handled? It comes down to md5: compare checksums and then decide whether the program needs to be updated.

The following is my current plan:

  • When we need to run the tool, the server first sends the md5 of the latest command file. The agent then checks whether a command file already exists locally. If it does not, it downloads the latest command file and proceeds to execute it. If it does exist, it computes the md5 of the local file and compares it with the md5 sent by the server: if the two checksums match, the existing command file is used directly; otherwise the local file is updated first.
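A minimal sketch of that check, using the standard library's md5 and assuming the server's checksum arrives as a hex string (the path, function name, and return convention are illustrative):
import (
    "crypto/md5"
    "encoding/hex"
    "io"
    "os"
)

// commandUpToDate reports whether the local command file exists and its
// md5 matches the one sent by the server; if it returns false the caller
// should download the latest command file before executing it.
func commandUpToDate(localPath, serverMD5 string) (bool, error) {
    f, err := os.Open(localPath)
    if os.IsNotExist(err) {
        return false, nil // no local copy yet, a download is required
    }
    if err != nil {
        return false, err
    }
    defer f.Close()

    h := md5.New()
    if _, err := io.Copy(h, f); err != nil {
        return false, err
    }
    return hex.EncodeToString(h.Sum(nil)) == serverMD5, nil
}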
  • Note: without this kind of review, many risk points are genuinely hard to foresee at the start. When developing something, it is still worth discussing it with more experienced colleagues; they have a lot of experience and can offer us plenty of guidance.

Finally, thank you all for reading. If you have more ideas, you are welcome to discuss them with me.

Reference: The mental journey of a log analysis tool, Cloud+ Community, Tencent Cloud, https://cloud.tencent.com/developer/article/1385375