Do1e
Nanjing University Unified Identity Authentication Verification Code Recognition: A Full Process Open Source Practice from Dataset Construction to Model Deployment

This article is synced to xLog by Mix Space; for the best reading experience, visit the original: https://www.do1e.cn/posts/deepl/nju-captcha


Preface#

In my earlier project NJUlogin, logging in with account and password required captcha recognition; at the time I used ddddocr, which already achieved good accuracy.

I also deployed a server and asked a friend to help write a Tampermonkey script, so the captcha is filled in automatically whenever I need to log in (the browser autofills the username and password), and all I have to do is click login.

Recently, I thought about making the recognition model lighter for easier deployment on edge devices, which led to this project. (Could you give it a Star? >︿< If you just want to use it and don't want to understand the related technology, scroll to the end for the recommended NJU server API version.)

https://github.com/Do1e/NJUcaptcha

Implementation Effect

Data Collection#

https://github.com/Do1e/NJUcaptcha/tree/main/build_dataset

The dataset construction is basically automated, mainly relying on the following two tools:

  • ddddocr: for preliminary captcha recognition
  • NJUlogin: to verify the correctness of the recognition results

I slightly modified NJUlogin so it could tell whether a recognition result was correct, then saved the images into different folders accordingly. The incorrectly recognized ones (a few hundred or so?) just needed to be renamed by hand.
Collecting 100,000 images took about 3 to 4 days of running in the background, and the time.sleep interval couldn't be too small, otherwise the IP would get blocked. >︿<
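In code, the collection loop looks roughly like the sketch below. ddddocr's classification call is its actual API; session.get_captcha and session.try_login are hypothetical stand-ins for my modified NJUlogin interface:

```python
import hashlib
import time

def make_filename(captcha_text: str, img_bytes: bytes) -> str:
    """Build the dataset filename: {captcha text}_{image md5}.jpg (text lowercased)."""
    return f"{captcha_text.lower()}_{hashlib.md5(img_bytes).hexdigest()}.jpg"

def collect_one(session, ocr, ok_dir: str = "correct", bad_dir: str = "wrong") -> None:
    """One iteration: fetch a captcha, guess it with ddddocr, verify by logging in.

    session.get_captcha() and session.try_login() are hypothetical names for the
    (slightly modified) NJUlogin interface; ocr is a ddddocr.DdddOcr instance.
    """
    img_bytes = session.get_captcha()
    guess = ocr.classification(img_bytes)  # ddddocr's recognition call
    folder = ok_dir if session.try_login(guess) else bad_dir
    with open(f"{folder}/{make_filename(guess, img_bytes)}", "wb") as f:
        f.write(img_bytes)
    time.sleep(2)  # too small a delay and the IP gets blocked
```

Images landing in the wrong-guess folder then only need a manual rename to the correct text.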

And so this dataset was born; you're welcome to download and use it. It contains 100,000 captcha images, named {captcha text}_{image md5}.jpg, with all captcha texts in lowercase.
Dataset download link: NJU-captcha-dataset.7z
Decompression password: @Do1e

The dataset loading code is as follows:

https://github.com/Do1e/NJUcaptcha/blob/main/model/dataset.py
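Given that naming scheme, labels can be recovered straight from the filenames. A minimal sketch of the parsing and label encoding (the digits-plus-lowercase-letters charset is my assumption; the post only says the texts are lowercase):

```python
import os
import string

# Assumed charset: digits plus lowercase letters (36 classes).
CHARSET = string.digits + string.ascii_lowercase
CHAR2IDX = {c: i for i, c in enumerate(CHARSET)}

def label_from_filename(path: str) -> str:
    """Recover the captcha text from a '{text}_{md5}.jpg' filename."""
    stem = os.path.basename(path).rsplit(".", 1)[0]
    return stem.split("_", 1)[0]

def encode_label(text: str) -> list[int]:
    """Map each character to its CHARSET index, ready for a per-position classifier."""
    return [CHAR2IDX[c] for c in text]
```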

Recognition Model#

https://github.com/Do1e/NJUcaptcha/tree/main/model

With the data in hand, I could design the model and train it. This time, I completely let AI help me with the model design, and the results are quite good.

  • Model size: 12.98 MiB -> 2.25 MiB
  • Accuracy: 99.37% -> 99.83%
  • Throughput: 173.95 images/sec -> 1076.56 images/sec (AMD Ryzen 7 8845H)

https://github.com/Do1e/NJUcaptcha/blob/main/model/model.py

Maybe it can be a bit smaller? Let's save that for the next upgrade
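The repo's model.py has the real architecture; for a fixed-length captcha a common output head is one score vector per character position, and decoding is then just a per-position argmax over the charset. A sketch under that assumption:

```python
# Assumed: the network outputs one row of scores per character position.
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumed 36-class charset

def decode(logits: list[list[float]]) -> str:
    """Greedily pick the highest-scoring character at each position."""
    chars = []
    for scores in logits:
        best = max(range(len(scores)), key=lambda i: scores[i])
        chars.append(CHARSET[best])
    return "".join(chars)
```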

Server Deployment#

https://github.com/Do1e/NJUcaptcha/tree/main/service

Previously, I had also implemented a simple recognition server with FastAPI, which recognizes a received base64 image and returns the captcha text. This time I took the opportunity to deploy it on Vercel. Test command under Linux:

curl -s -L "https://authserver.nju.edu.cn/authserver/captcha.html" -o "captcha.jpg" \
  && [ -f "captcha.jpg" ] \
  && curl -s -X POST \
       -H "Content-Type: application/x-www-form-urlencoded" \
       -d "captcha=$(base64 -i captcha.jpg | tr -d '\n')" \
       "https://njucaptcha.vercel.app" \
  || { echo "Failed to download captcha image"; exit 1; }
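The same request can be assembled from Python with only the standard library; it mirrors the curl command's captcha=<base64> form field (the helper name is mine):

```python
import base64
import urllib.parse
import urllib.request

def build_request(img_bytes: bytes, url: str = "https://njucaptcha.vercel.app"):
    """POST captcha=<base64 of the image> as a form-urlencoded body."""
    body = urllib.parse.urlencode({"captcha": base64.b64encode(img_bytes).decode()})
    return urllib.request.Request(
        url,
        data=body.encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

# To actually run it (needs network access):
# img = urllib.request.urlopen("https://authserver.nju.edu.cn/authserver/captcha.html").read()
# print(urllib.request.urlopen(build_request(img)).read().decode())
```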

Tampermonkey Script Auto-fill#

As mentioned in the preface, to achieve automatic captcha recognition and input during login, I wrote a Tampermonkey script for auto-filling. The previous version was server-based:

https://github.com/Do1e/NJUcaptcha/blob/main/njucaptcha.user.js

The open-source code still uses the Vercel service, which is very slow and cannot be used when logging into p.nju. ( ̄﹃ ̄)

My own solution is to set up a service on campus and map it to my public server through frp, and access the internal service when logging into p.nju:

const url_pub = 'https://example.com/';
const url_nju = 'https://nju.example.com/';
const currentUrl = window.location.href;
const serverUrl = currentUrl.includes('//p.nju.edu.cn') ? url_nju : url_pub;

The most challenging part of the entire project was running ONNX inference directly on the client side; it took me several hours with AI tools to get it working. It is implemented using ONNX Runtime Web.

https://github.com/Do1e/NJUcaptcha/blob/main/njucaptcha_onnx.user.js

One downside of the ONNX version is that, when nothing is cached yet, it needs an internet connection to download some required inference dependencies; after the first use they are cached (though ort-wasm-simd-threaded.jsep.mjs and ort-wasm-simd-threaded.jsep.wasm can only be cached for 7 days, which isn't very long; if anyone knows how to achieve a near-permanent cache like @resource, feel free to submit a PR).

In summary, both solutions have their pros and cons. The most recommended is to deploy and use it yourself as I did, or directly use the NJU server API version I provided at the end.

The above versions of the Tampermonkey script can be installed directly by clicking the links below (provided you have the Tampermonkey extension installed):

| | Vercel API version | NJU server API version | ONNX local-inference version |
| :--- | :--- | :--- | :--- |
| Advantages | No VPN/proxy needed | Best practice; personally I consider it close to perfect | Very fast (fills in before the page even finishes loading) and works when logging into p.nju (once cached) |
| Disadvantages | Very slow, and unusable when logging into p.nju | Needs deployment on both an on-campus and a public server, so I won't be able to use it after graduation | With nothing cached yet, needs a VPN/proxy to download some files and cannot be used on p.nju; the cache lasts only 7 days |


Note: this code is licensed under the GPL-3.0 open-source license, so please ignore the license notice below. I'm too lazy to change the webpage code; my own website gets the final say, right?
